Data

The Data class represents a dataset. Each datapoint consists of a discrete value for each variable and corresponds to a configuration of the system. The class supports the computation of key information-theoretic quantities, such as the log-likelihood, log-evidence, entropy, and both the geometric and parametric model complexities. A Data object also serves as the input to the search algorithms implemented in the MCMSearch and BasisSearch classes, which infer the best minimally complex model (MCM) and the optimal basis representation of the system, respectively.

class mcmpy.Data

Initialization

__init__(filename: str, n_var: int, n_states: int)

Constructs a new Data object by loading a dataset from a file.

Parameters:
  • filename (str) – Path to the file containing the dataset.

  • n_var (int) – Number of variables in the system.

  • n_states (int) – Number of values each variable can take.

Methods

log_evidence_icc(mcm: MCM)

Computes the log-evidence per ICC of the dataset for a given MCM.

Parameters:

mcm (MCM) – The MCM object for which the log-evidence will be computed per ICC.

Returns:

The log-evidence per ICC.

Return type:

numpy.ndarray

log_evidence_icc(partition: numpy.ndarray)

Computes the log-evidence per ICC of the dataset for a given partition.

Parameters:

partition (numpy.ndarray) – The partition of the MCM for which the log-evidence will be computed per ICC. The required format for this parameter is described in the MCM attributes array and array_gray_code.

Returns:

The log-evidence per ICC.

Return type:

numpy.ndarray

log_evidence(mcm: MCM)

Computes the total log-evidence of the dataset for a given MCM.

Parameters:

mcm (MCM) – The MCM object for which the log-evidence will be computed.

Returns:

The total log-evidence.

Return type:

float

log_evidence(partition: numpy.ndarray)

Computes the total log-evidence of the dataset for a given partition.

Parameters:

partition (numpy.ndarray) – The partition of the MCM for which the log-evidence will be computed. The required format for this parameter is described in the MCM attributes array and array_gray_code.

Returns:

The total log-evidence.

Return type:

float
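To make the relation between the per-ICC array and the total value concrete, here is a minimal sketch that does not use mcmpy and is not its implementation. It assumes the closed-form evidence of an ICC under a Jeffreys prior as given in the MCM literature, natural logarithms, and that the total is the sum over ICCs. The helper name `log_evidence_per_icc` and the list-of-index-lists partition are hypothetical and differ from the array format that mcmpy expects.

```python
import math
from collections import Counter

import numpy as np


def log_evidence_per_icc(data, partition, q):
    """Log-evidence of each ICC (hypothetical helper, natural log).

    data      : (N, n) integer array with entries in 0..q-1
    partition : list of lists of column indices, one list per ICC
                (illustrative format, not the one mcmpy uses)
    """
    n_samples = data.shape[0]
    result = []
    for icc in partition:
        half_states = q ** len(icc) / 2.0  # q^r / 2 for an ICC of r variables
        # Count how often each configuration of the ICC's variables occurs.
        counts = Counter(map(tuple, data[:, icc]))
        log_ev = math.lgamma(half_states) - math.lgamma(n_samples + half_states)
        for k in counts.values():
            # log[Gamma(k + 1/2) / Gamma(1/2)]; unobserved states contribute 0.
            log_ev += math.lgamma(k + 0.5) - math.lgamma(0.5)
        result.append(log_ev)
    return np.array(result)


data = np.array([[0, 0, 1], [0, 0, 1], [1, 1, 0], [1, 0, 0]])
per_icc = log_evidence_per_icc(data, [[0, 1], [2]], q=2)
total = per_icc.sum()  # assumption: the total is the sum over ICCs
```

As a sanity check on the formula, a single observation of one binary variable has evidence 1/2 under this prior, so the sketch returns ln(1/2) for it.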

log_likelihood_icc(mcm: MCM)

Computes the log-likelihood per ICC of the dataset for a given MCM.

Parameters:

mcm (MCM) – The MCM object for which the log-likelihood will be computed per ICC.

Returns:

The log-likelihood per ICC.

Return type:

numpy.ndarray

log_likelihood_icc(partition: numpy.ndarray)

Computes the log-likelihood per ICC of the dataset for a given partition.

Parameters:

partition (numpy.ndarray) – The partition of the MCM for which the log-likelihood will be computed per ICC. The required format for this parameter is described in the MCM attributes array and array_gray_code.

Returns:

The log-likelihood per ICC.

Return type:

numpy.ndarray

log_likelihood(mcm: MCM)

Computes the total log-likelihood of the dataset for a given MCM.

Parameters:

mcm (MCM) – The MCM object for which the log-likelihood will be computed.

Returns:

The total log-likelihood.

Return type:

float

log_likelihood(partition: numpy.ndarray)

Computes the total log-likelihood of the dataset for a given partition.

Parameters:

partition (numpy.ndarray) – The partition of the MCM for which the log-likelihood will be computed. The required format for this parameter is described in the MCM attributes array and array_gray_code.

Returns:

The total log-likelihood.

Return type:

float
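The per-ICC log-likelihood can be illustrated with a short standalone sketch. It assumes that each ICC is a complete model on its own variables, so its maximum-likelihood distribution is the empirical distribution of those variables, and that natural logarithms are used. The function name and the list-of-index-lists partition are hypothetical; this is not mcmpy's code or its partition format.

```python
from collections import Counter

import numpy as np


def log_likelihood_per_icc(data, partition):
    """Maximum log-likelihood of each ICC (hypothetical helper, natural log).

    data      : (N, n) integer array, one datapoint per row
    partition : list of lists of column indices, one list per ICC
    """
    n_samples = data.shape[0]
    result = []
    for icc in partition:
        # Occurrence count k_s of each observed configuration s of the ICC.
        counts = Counter(map(tuple, data[:, icc]))
        # Sum over observed configurations of k_s * ln(k_s / N).
        result.append(sum(k * np.log(k / n_samples) for k in counts.values()))
    return np.array(result)


data = np.array([[0, 0, 1], [0, 0, 1], [1, 1, 0], [1, 0, 0]])
per_icc = log_likelihood_per_icc(data, [[0, 1], [2]])
total = per_icc.sum()  # assumption: the total is the sum over ICCs
```

For this toy dataset the two entries evaluate to -6 ln 2 and -4 ln 2, since the first ICC sees counts (2, 1, 1) and the second sees counts (2, 2) out of four datapoints.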

complexity_geometric_icc(mcm: MCM)

Computes the geometric complexity per ICC of the dataset for a given MCM.

Parameters:

mcm (MCM) – The MCM object for which the geometric complexity will be computed per ICC.

Returns:

The geometric complexity per ICC.

Return type:

numpy.ndarray

complexity_geometric_icc(partition: numpy.ndarray)

Computes the geometric complexity per ICC of the dataset for a given partition.

Parameters:

partition (numpy.ndarray) – The partition of the MCM for which the geometric complexity will be computed per ICC. The required format for this parameter is described in the MCM attributes array and array_gray_code.

Returns:

The geometric complexity per ICC.

Return type:

numpy.ndarray

complexity_geometric(mcm: MCM)

Computes the total geometric complexity of the dataset for a given MCM.

Parameters:

mcm (MCM) – The MCM object for which the geometric complexity will be computed.

Returns:

The total geometric complexity.

Return type:

float

complexity_geometric(partition: numpy.ndarray)

Computes the total geometric complexity of the dataset for a given partition.

Parameters:

partition (numpy.ndarray) – The partition of the MCM for which the geometric complexity will be computed. The required format for this parameter is described in the MCM attributes array and array_gray_code.

Returns:

The total geometric complexity.

Return type:

float

complexity_parametric_icc(mcm: MCM)

Computes the parametric complexity per ICC of the dataset for a given MCM.

Parameters:

mcm (MCM) – The MCM object for which the parametric complexity will be computed per ICC.

Returns:

The parametric complexity per ICC.

Return type:

numpy.ndarray

complexity_parametric_icc(partition: numpy.ndarray)

Computes the parametric complexity per ICC of the dataset for a given partition.

Parameters:

partition (numpy.ndarray) – The partition of the MCM for which the parametric complexity will be computed per ICC. The required format for this parameter is described in the MCM attributes array and array_gray_code.

Returns:

The parametric complexity per ICC.

Return type:

numpy.ndarray

complexity_parametric(mcm: MCM)

Computes the total parametric complexity of the dataset for a given MCM.

Parameters:

mcm (MCM) – The MCM object for which the parametric complexity will be computed.

Returns:

The total parametric complexity.

Return type:

float

complexity_parametric(partition: numpy.ndarray)

Computes the total parametric complexity of the dataset for a given partition.

Parameters:

partition (numpy.ndarray) – The partition of the MCM for which the parametric complexity will be computed. The required format for this parameter is described in the MCM attributes array and array_gray_code.

Returns:

The total parametric complexity.

Return type:

float
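For orientation, the two complexity terms of a single ICC can be sketched as below. This assumes the standard asymptotic expressions from the MCM literature, extended from binary to \(q\)-state variables by treating an ICC of \(r\) variables as a multinomial model with \(q^r\) states, and natural logarithms. Whether mcmpy uses exactly these expressions is an assumption, and both function names are hypothetical.

```python
import math


def parametric_complexity_sketch(rank, q, n_samples):
    """(q^r - 1)/2 * ln(N / (2*pi)) for an ICC with r = rank variables."""
    return (q ** rank - 1) / 2.0 * math.log(n_samples / (2.0 * math.pi))


def geometric_complexity_sketch(rank, q):
    """ln( pi^(q^r / 2) / Gamma(q^r / 2) ) for an ICC with r = rank variables."""
    half_states = q ** rank / 2.0
    return half_states * math.log(math.pi) - math.lgamma(half_states)


# One binary variable observed 100 times.
c_param = parametric_complexity_sketch(rank=1, q=2, n_samples=100)
c_geom = geometric_complexity_sketch(rank=1, q=2)
```

In this sketch the parametric term grows with the number of datapoints, while the geometric term depends only on the size of the ICC.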

minimum_description_length(mcm: MCM)

Computes the minimum description length of the dataset for a given MCM.

Parameters:

mcm (MCM) – The MCM object for which the minimum description length will be computed.

Returns:

The minimum description length.

Return type:

float

minimum_description_length(partition: numpy.ndarray)

Computes the minimum description length of the dataset for a given partition.

Parameters:

partition (numpy.ndarray) – The partition of the MCM for which the minimum description length will be computed. The required format for this parameter is described in the MCM attributes array and array_gray_code.

Returns:

The minimum description length.

Return type:

float

entropy(base: int)

Computes the entropy of the dataset using a logarithm of the given base.

Parameters:

base (int, optional) – Base of the logarithm used to compute the entropy. Defaults to \(q\), the number of states each variable can take.

Returns:

The entropy of the dataset.

Return type:

float
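The role of the base can be seen in a small standalone sketch of the empirical entropy. It assumes that the entropy is the plug-in estimate computed from the relative frequencies of the observed datapoints; the function name is hypothetical and the code does not call mcmpy.

```python
import math
from collections import Counter

import numpy as np


def empirical_entropy(data, base):
    """Plug-in entropy of the rows of `data` with logarithms in `base`."""
    n_samples = data.shape[0]
    # Occurrence count of each distinct datapoint (row).
    counts = Counter(map(tuple, data))
    return -sum(
        (k / n_samples) * math.log(k / n_samples, base) for k in counts.values()
    )


data = np.array([[0, 0, 1], [0, 0, 1], [1, 1, 0], [1, 0, 0]])
h_bits = empirical_entropy(data, base=2)      # relative frequencies 1/2, 1/4, 1/4
h_nats = empirical_entropy(data, base=math.e)
```

Changing the base only rescales the result: the value in base 2 equals the value in base e divided by ln 2.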

entropy_of_spin_operator(spin_op: numpy.ndarray)

Computes the entropy of a given spin operator applied to the dataset.

Parameters:

spin_op (numpy.ndarray) – Spin operator for which the entropy will be computed. The spin operator should be given as an array of length \(n\) where each entry represents the exponent of the corresponding variable in the operator.

Returns:

The entropy of the spin operator when applied to the dataset.

Return type:

float
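The exponent representation of a spin operator can be made concrete with a short sketch. It assumes that the value of an operator on a datapoint is determined by the sum of exponent times variable value, taken modulo \(q\), and that the entropy is the plug-in entropy of that value over the dataset. Both the convention and the function name are assumptions for illustration, not mcmpy's implementation.

```python
import numpy as np


def spin_operator_entropy(data, spin_op, q, base):
    """Entropy of the operator value sum_i spin_op[i] * s_i (mod q).

    data    : (N, n) integer array with entries in 0..q-1
    spin_op : length-n integer array of exponents in 0..q-1
    """
    # Operator value of every datapoint, an integer in 0..q-1.
    values = (data @ spin_op) % q
    counts = np.bincount(values, minlength=q)
    # Keep observed values only, so that 0 * log(0) never appears.
    probs = counts[counts > 0] / data.shape[0]
    return float(-(probs * np.log(probs)).sum() / np.log(base))


data = np.array([[0, 0, 1], [0, 0, 1], [1, 1, 0], [1, 0, 0]])
h_single = spin_operator_entropy(data, np.array([0, 0, 1]), q=2, base=2)
h_pair = spin_operator_entropy(data, np.array([1, 1, 0]), q=2, base=2)
```

Here `[0, 0, 1]` selects the third variable alone, which is evenly split in the toy data, while `[1, 1, 0]` combines the first two variables and yields a more biased, lower-entropy value.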

Attributes

n: int

The number of variables in the system (read-only).

q: int

The number of states each variable can take (read-only).

N: int

The total number of datapoints in the dataset, counting duplicates (read-only).

N_unique: int

The number of unique datapoints in the dataset (read-only).
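The difference between N and N_unique can be illustrated on an in-memory array with plain NumPy. The variable names below are illustrative, and the snippet does not read these values from a Data object.

```python
import numpy as np

# Four datapoints over three variables; the first two rows are identical.
data = np.array([[0, 0, 1], [0, 0, 1], [1, 1, 0], [1, 0, 0]])

n_datapoints, n_variables = data.shape       # counterparts of N and n
n_unique = np.unique(data, axis=0).shape[0]  # counterpart of N_unique
```

For this array the counterparts of N, n, and N_unique are 4, 3, and 3.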