Data
The Data class represents a dataset. Each datapoint consists of a discrete value for each variable and corresponds to a configuration of the system.
This class supports the computation of key information-theoretic quantities, such as the log-likelihood, log-evidence, entropy, and both the geometric and parametric model complexities.
Additionally, a Data object serves as the input to the search algorithms implemented in the MCMSearch and BasisSearch classes, which infer the best minimally complex model (MCM) or the optimal basis representation of the system, respectively.
- class mcmpy.Data
Initialization
- __init__(filename: str, n_var: int, n_states: int)
Constructs a new Data object by loading a dataset from a file.
- Parameters:
filename (str) – Path to the file.
n_var (int) – Number of variables in the system.
n_states (int) – Number of values each variable can take.
Methods
- log_evidence_icc(mcm: MCM)
Computes the log-evidence per independent complete component (ICC) of the dataset for a given MCM.
- Parameters:
mcm (MCM) – The MCM object for which the log-evidence will be computed per ICC.
- Returns:
The log-evidence per ICC.
- Return type:
numpy.ndarray
- log_evidence_icc(partition: numpy.ndarray)
Computes the log-evidence per ICC of the dataset for a given partition.
- Parameters:
partition (numpy.ndarray) – The partition of the MCM for which the log-evidence will be computed per ICC. The required format for this parameter is described in the MCM attributes array and array_gray_code.
- Returns:
The log-evidence per ICC.
- Return type:
numpy.ndarray
- log_evidence(mcm: MCM)
Computes the total log-evidence of the dataset for a given MCM.
- Parameters:
mcm (MCM) – The MCM object for which the log-evidence will be computed.
- Returns:
The total log-evidence.
- Return type:
float
- log_evidence(partition: numpy.ndarray)
Computes the total log-evidence of the dataset for a given partition.
- Parameters:
partition (numpy.ndarray) – The partition of the MCM for which the log-evidence will be computed. The required format for this parameter is described in the MCM attributes array and array_gray_code.
- Returns:
The total log-evidence.
- Return type:
float
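The relation between the per-ICC and total log-evidence can be illustrated with a small NumPy sketch. It does not call mcmpy. It assumes a Dirichlet-multinomial evidence with a Jeffreys prior on each ICC, which is the form usually quoted for MCMs, and it represents the partition as a list of lists of column indices. Both choices are assumptions made for illustration; the format expected by mcmpy is the one described under the MCM attributes.

```python
import math
import numpy as np

def log_evidence_icc(samples: np.ndarray, partition: list, q: int) -> np.ndarray:
    """Log-evidence of each ICC under a Jeffreys prior (illustrative assumption).

    `partition` is a hypothetical list of lists of column indices,
    not the array format used by mcmpy.
    """
    n_datapoints = samples.shape[0]
    per_icc = []
    for component in partition:
        # Half the number of states of an ICC containing len(component) variables.
        half_states = q ** len(component) / 2.0
        # Count how often each observed sub-configuration occurs.
        _, counts = np.unique(samples[:, component], axis=0, return_counts=True)
        value = math.lgamma(half_states) - math.lgamma(n_datapoints + half_states)
        value += sum(math.lgamma(k + 0.5) - math.lgamma(0.5) for k in counts)
        per_icc.append(value)
    return np.array(per_icc)

def log_evidence(samples: np.ndarray, partition: list, q: int) -> float:
    """Total log-evidence: the sum of the per-ICC contributions."""
    return float(log_evidence_icc(samples, partition, q).sum())

# Two perfectly correlated binary variables.
samples = np.array([[0, 0], [1, 1]])
separate = log_evidence(samples, [[0], [1]], q=2)  # each variable in its own ICC
merged = log_evidence(samples, [[0, 1]], q=2)      # both variables in one ICC
```

In this toy dataset the merged partition obtains the larger log-evidence, which is the kind of comparison a partition search relies on.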
- log_likelihood_icc(mcm: MCM)
Computes the log-likelihood per ICC of the dataset for a given MCM.
- Parameters:
mcm (MCM) – The MCM object for which the log-likelihood will be computed per ICC.
- Returns:
The log-likelihood per ICC.
- Return type:
numpy.ndarray
- log_likelihood_icc(partition: numpy.ndarray)
Computes the log-likelihood per ICC of the dataset for a given partition.
- Parameters:
partition (numpy.ndarray) – The partition of the MCM for which the log-likelihood will be computed per ICC. The required format for this parameter is described in the MCM attributes array and array_gray_code.
- Returns:
The log-likelihood per ICC.
- Return type:
numpy.ndarray
- log_likelihood(mcm: MCM)
Computes the total log-likelihood of the dataset for a given MCM.
- Parameters:
mcm (MCM) – The MCM object for which the log-likelihood will be computed.
- Returns:
The total log-likelihood.
- Return type:
float
- log_likelihood(partition: numpy.ndarray)
Computes the total log-likelihood of the dataset for a given partition.
- Parameters:
partition (numpy.ndarray) – The partition of the MCM for which the log-likelihood will be computed. The required format for this parameter is described in the MCM attributes array and array_gray_code.
- Returns:
The total log-likelihood.
- Return type:
float
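As an illustration of what the per-ICC and total log-likelihood measure, the following sketch computes the maximum log-likelihood of a partition directly from counts with NumPy. It is not the mcmpy implementation. It assumes natural logarithms, a partition that covers every variable, and a hypothetical partition format (a list of lists of column indices).

```python
import numpy as np

def log_likelihood_icc(samples: np.ndarray, partition: list) -> np.ndarray:
    """Maximum log-likelihood of each ICC, computed from empirical counts.

    `partition` is a hypothetical list of lists of column indices.
    """
    n_datapoints = samples.shape[0]
    per_icc = []
    for component in partition:
        # Occurrences of each observed sub-configuration of this ICC.
        _, counts = np.unique(samples[:, component], axis=0, return_counts=True)
        per_icc.append(float((counts * np.log(counts / n_datapoints)).sum()))
    return np.array(per_icc)

def log_likelihood(samples: np.ndarray, partition: list) -> float:
    """Total log-likelihood: the sum over all ICCs."""
    return float(log_likelihood_icc(samples, partition).sum())

samples = np.array([
    [0, 0, 0],
    [0, 0, 1],
    [1, 1, 0],
    [1, 1, 1],
])
partition = [[0, 1], [2]]
per_icc = log_likelihood_icc(samples, partition)
total = log_likelihood(samples, partition)
```

Each ICC in this example sees two sub-configurations with two occurrences each, so both contribute the same amount to the total.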
- complexity_geometric_icc(mcm: MCM)
Computes the geometric complexity per ICC of the dataset for a given MCM.
- Parameters:
mcm (MCM) – The MCM object for which the geometric complexity will be computed per ICC.
- Returns:
The geometric complexity per ICC.
- Return type:
numpy.ndarray
- complexity_geometric_icc(partition: numpy.ndarray)
Computes the geometric complexity per ICC of the dataset for a given partition.
- Parameters:
partition (numpy.ndarray) – The partition of the MCM for which the geometric complexity will be computed per ICC. The required format for this parameter is described in the MCM attributes array and array_gray_code.
- Returns:
The geometric complexity per ICC.
- Return type:
numpy.ndarray
- complexity_geometric(mcm: MCM)
Computes the total geometric complexity of the dataset for a given MCM.
- Parameters:
mcm (MCM) – The MCM object for which the geometric complexity will be computed.
- Returns:
The total geometric complexity.
- Return type:
float
- complexity_geometric(partition: numpy.ndarray)
Computes the total geometric complexity of the dataset for a given partition.
- Parameters:
partition (numpy.ndarray) – The partition of the MCM for which the geometric complexity will be computed. The required format for this parameter is described in the MCM attributes array and array_gray_code.
- Returns:
The total geometric complexity.
- Return type:
float
- complexity_parametric_icc(mcm: MCM)
Computes the parametric complexity per ICC of the dataset for a given MCM.
- Parameters:
mcm (MCM) – The MCM object for which the parametric complexity will be computed per ICC.
- Returns:
The parametric complexity per ICC.
- Return type:
numpy.ndarray
- complexity_parametric_icc(partition: numpy.ndarray)
Computes the parametric complexity per ICC of the dataset for a given partition.
- Parameters:
partition (numpy.ndarray) – The partition of the MCM for which the parametric complexity will be computed per ICC. The required format for this parameter is described in the MCM attributes array and array_gray_code.
- Returns:
The parametric complexity per ICC.
- Return type:
numpy.ndarray
- complexity_parametric(mcm: MCM)
Computes the total parametric complexity of the dataset for a given MCM.
- Parameters:
mcm (MCM) – The MCM object for which the parametric complexity will be computed.
- Returns:
The total parametric complexity.
- Return type:
float
- complexity_parametric(partition: numpy.ndarray)
Computes the total parametric complexity of the dataset for a given partition.
- Parameters:
partition (numpy.ndarray) – The partition of the MCM for which the parametric complexity will be computed. The required format for this parameter is described in the MCM attributes array and array_gray_code.
- Returns:
The total parametric complexity.
- Return type:
float
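The two complexity terms can be sketched from the ICC ranks alone. The closed forms below are the ones commonly quoted for a complete model on \(q^r\) states (an ICC containing \(r\) variables): a parametric term \(\frac{q^r - 1}{2}\log\frac{N}{2\pi}\) and a geometric term \(\frac{q^r}{2}\log\pi - \log\Gamma\left(\frac{q^r}{2}\right)\). Whether mcmpy evaluates exactly these expressions is an assumption; the function names mirror the documented methods only for readability.

```python
import math
import numpy as np

def complexity_parametric_icc(ranks: list, q: int, n_datapoints: int) -> np.ndarray:
    """Parametric complexity per ICC: (K / 2) * log(N / (2 * pi)), with K = q**r - 1."""
    scale = math.log(n_datapoints / (2.0 * math.pi))
    return np.array([(q ** r - 1) / 2.0 * scale for r in ranks])

def complexity_geometric_icc(ranks: list, q: int) -> np.ndarray:
    """Geometric complexity per ICC: (M / 2) * log(pi) - log Gamma(M / 2), with M = q**r."""
    return np.array([
        (q ** r / 2.0) * math.log(math.pi) - math.lgamma(q ** r / 2.0)
        for r in ranks
    ])

# Hypothetical MCM with two ICCs containing 1 and 2 binary variables, N = 100.
ranks = [1, 2]
parametric = complexity_parametric_icc(ranks, q=2, n_datapoints=100)
geometric = complexity_geometric_icc(ranks, q=2)

# The totals are the sums of the per-ICC values.
parametric_total = float(parametric.sum())
geometric_total = float(geometric.sum())
```

Under these formulas the geometric term depends only on the ranks and on \(q\), while the parametric term also grows with the number of datapoints.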
- minimum_description_length(mcm: MCM)
Computes the minimum description length of the dataset for a given MCM.
- Parameters:
mcm (MCM) – The MCM object for which the minimum description length will be computed.
- Returns:
The minimum description length.
- Return type:
float
- minimum_description_length(partition: numpy.ndarray)
Computes the minimum description length of the dataset for a given partition.
- Parameters:
partition (numpy.ndarray) – The partition of the MCM for which the minimum description length will be computed. The required format for this parameter is described in the MCM attributes array and array_gray_code.
- Returns:
The minimum description length.
- Return type:
float
- entropy(base: int)
Computes the entropy of the dataset using a given base.
- Parameters:
base (int, optional) – Base of the logarithm used to compute the entropy. Defaults to \(q\), the number of states each variable can take.
- Returns:
The entropy of the dataset.
- Return type:
float
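For reference, the empirical entropy of a dataset in a chosen base can be sketched in a few lines of NumPy. This is a stand-alone illustration that assumes the entropy is the Shannon entropy of the empirical distribution over full configurations; it does not call mcmpy.

```python
import numpy as np

def empirical_entropy(samples: np.ndarray, base: float) -> float:
    """Shannon entropy of the empirical distribution over rows, in the given base."""
    _, counts = np.unique(samples, axis=0, return_counts=True)
    probs = counts / counts.sum()
    return float(-(probs * np.log(probs)).sum() / np.log(base))

# Four equally frequent configurations of two binary variables.
samples = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
entropy_base_2 = empirical_entropy(samples, base=2)
entropy_base_4 = empirical_entropy(samples, base=4)
```

Changing the base only rescales the result: the same dataset has half the entropy when measured in base 4 instead of base 2.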
- entropy_of_spin_operator(spin_op: numpy.ndarray)
Computes the entropy of a given spin operator applied to the dataset.
- Parameters:
spin_op (numpy.ndarray) – Spin operator for which the entropy will be computed. The spin operator should be given as an array of length \(n\) where each entry represents the exponent of the corresponding variable in the operator.
- Returns:
The entropy of the spin operator when applied to the dataset.
- Return type:
float
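The exponent representation of a spin operator can be made concrete with a short sketch. It assumes that the value of the operator on a datapoint \(s\) is determined by \(\sum_i \alpha_i s_i \bmod q\), where \(\alpha\) is the exponent array, and that the entropy is taken over the distribution of that value in base \(q\). Both points are assumptions made for illustration, not statements about the mcmpy internals.

```python
import numpy as np

def spin_operator_entropy(samples: np.ndarray, spin_op: np.ndarray, q: int) -> float:
    """Entropy (base q) of the value taken by a spin operator across the dataset."""
    samples = np.asarray(samples)
    spin_op = np.asarray(spin_op)
    if spin_op.shape != (samples.shape[1],):
        raise ValueError("spin_op must have one exponent per variable")
    # Assumed operator value on each datapoint: sum_i alpha_i * s_i modulo q.
    values = (samples @ spin_op) % q
    counts = np.bincount(values, minlength=q)
    probs = counts[counts > 0] / counts.sum()
    return float(-(probs * np.log(probs)).sum() / np.log(q))

# Two perfectly correlated binary variables.
samples = np.array([[0, 0], [1, 1], [0, 0], [1, 1]])
single = spin_operator_entropy(samples, np.array([1, 0]), q=2)  # first variable alone
pair = spin_operator_entropy(samples, np.array([1, 1]), q=2)    # product of both
```

Here the single-variable operator is maximally uncertain, while the pairwise operator always takes the same value, so its entropy vanishes.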
Attributes
- n: int
The number of variables in the system (read-only).
- q: int
The number of states each variable can take (read-only).
- N: int
The total number of datapoints in the dataset, counting duplicates (read-only).
- N_unique: int
The number of unique datapoints in the dataset (read-only).
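The difference between N and N_unique can be seen on a small in-memory array. The sketch below uses NumPy only; the `describe` helper and the array layout (one row per datapoint, one column per variable) are hypothetical and are not part of mcmpy.

```python
import numpy as np

def describe(samples: np.ndarray, q: int) -> dict:
    """Return n, q, N and N_unique for a 2-D array with entries in 0..q-1."""
    if samples.min() < 0 or samples.max() >= q:
        raise ValueError("every entry must lie in 0..q-1")
    n_datapoints, n_variables = samples.shape
    # Distinct rows correspond to distinct configurations of the system.
    n_unique = np.unique(samples, axis=0).shape[0]
    return {"n": n_variables, "q": q, "N": n_datapoints, "N_unique": n_unique}

# Six datapoints over three variables with three states each.
samples = np.array([
    [0, 1, 2],
    [0, 1, 2],
    [1, 1, 0],
    [2, 0, 0],
    [1, 1, 0],
    [0, 1, 2],
])
summary = describe(samples, q=3)
```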