causationentropy.core package

Core algorithms and mathematical implementations for causal discovery.

Subpackages

causationentropy.core.information package

causationentropy.core.discovery module

The core module for causal network discovery algorithms.

Author: Kevin Slote
Email: kslote@clarkson.edu
Version: 1.1.0

causationentropy.core.discovery.discover_network(data, method='standard', information='gaussian', max_lag=5, alpha_forward=0.05, alpha_backward=0.05, metric='euclidean', bandwidth='silverman', k_means=5, n_shuffles=200, n_jobs=-1)

Infer a causal graph via Optimal Causation Entropy (oCSE).

This function implements the optimal Causation Entropy algorithm for causal network discovery from multivariate time series data. The algorithm uses conditional mutual information to identify causal relationships between variables across different time lags.

The core principle is based on the Causation Entropy framework, which quantifies causal relationships using information-theoretic measures. For variables \(X_i\) and \(X_j\) with lag \(\tau\), the conditional mutual information is computed as:

\[I\!\left(X_j^{(t-\tau)}; X_i^{(t)} \,\middle|\, \mathbf{Z}_i^{(t)}\right) \;=\; H\!\left(X_i^{(t)} \,\middle|\, \mathbf{Z}_i^{(t)}\right) \;-\; H\!\left(X_i^{(t)} \,\middle|\, X_j^{(t-\tau)}, \mathbf{Z}_i^{(t)}\right)\]

where \(\mathbf{Z}_i^{(t)}\) represents the conditioning set for variable \(i\) at time \(t\).
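As a rough illustration of the identity above, the following sketch evaluates the conditional mutual information for jointly Gaussian scalars, where the difference of conditional entropies reduces to \(-\tfrac{1}{2}\ln(1-\rho^2)\) with \(\rho\) the partial correlation. The helper names (`pearson`, `gaussian_cmi_scalar`) and the simulated series are assumptions made for this example and are not part of the package; the conditioning set is taken to be the single lagged target value.

```python
import math
import random

def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def gaussian_cmi_scalar(x, y, z):
    """I(x; y | z) in nats, assuming x, y, z are jointly Gaussian scalars."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    # Partial correlation of x and y after removing the linear effect of z.
    partial = (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    # H(y | z) - H(y | x, z) = -0.5 * ln(1 - partial^2) for Gaussians.
    return -0.5 * math.log(1 - partial ** 2)

# Hypothetical system: X_i(t) depends on its own past and on X_j(t - 1).
rng = random.Random(0)
T = 2000
x_j = [rng.gauss(0, 1) for _ in range(T)]
unrelated = [rng.gauss(0, 1) for _ in range(T)]
x_i = [0.0] * T
for t in range(1, T):
    x_i[t] = 0.5 * x_i[t - 1] + 0.8 * x_j[t - 1] + 0.3 * rng.gauss(0, 1)

target = x_i[1:]   # X_i at time t
cond = x_i[:-1]    # conditioning set: X_i at time t - 1
cmi_driver = gaussian_cmi_scalar(x_j[:-1], target, cond)
cmi_noise = gaussian_cmi_scalar(unrelated[:-1], target, cond)
print(f"driver: {cmi_driver:.3f} nats, unrelated series: {cmi_noise:.5f} nats")
```

The true driver yields a clearly positive value while the unrelated series stays near zero, which is the contrast the permutation tests described below are meant to detect.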

The algorithm proceeds in two main phases:

Forward Selection: Iteratively selects predictors that maximize conditional mutual information with the target variable, conditioned on already selected predictors.
Backward Elimination: Removes predictors that do not maintain statistical significance when conditioned on all other selected predictors.

Statistical significance is assessed via permutation tests, where the null hypothesis assumes no causal relationship exists between variables.

Parameters:

data (array-like of shape (T, n) or DataFrame) – Multivariate time series data where T is the number of time points and n is the number of variables. Variables correspond to columns.
method (str, default='standard') –
Causal discovery algorithm variant. Options:
- 'standard': Uses initial conditioning set of lagged target variables
- 'alternative': No initial conditioning set
- 'information_lasso': Information-theoretic variant with LASSO regularization
- 'lasso': Pure LASSO-based selection
information (str, default='gaussian') –
Information measure estimator type. Options:
- 'gaussian': Assumes Gaussian distributions
- 'knn': k-nearest neighbor estimator
- 'kde': Kernel density estimation
- 'geometric_knn': Geometric mean k-NN estimator
- 'poisson': Poisson distribution assumption
max_lag (int, default=5) – Maximum time lag to consider in causal relationships. The algorithm examines lags from 1 to max_lag (inclusive).
k_means (int, default=5) – Number of clusters for k-means based estimators (when applicable).
alpha_forward (float, default=0.05) – Significance level for forward selection permutation tests. Lower values require stronger evidence for causal relationships.
alpha_backward (float, default=0.05) – Significance level for backward elimination permutation tests.
metric (str, default='euclidean') – Distance metric for k-NN based estimators.
n_shuffles (int, default=200) – Number of permutations for statistical significance testing. Higher values provide more accurate p-value estimates but increase computational cost.
n_jobs (int, default=-1) – Number of parallel jobs for computation. -1 uses all available processors.

Returns:

G – Multi-directed graph representing the discovered causal network. Nodes correspond to variables and edges represent causal relationships. Multiple edges between the same node pair represent relationships at different time lags. Edge attributes include:

'lag': Time delay \(\tau\) of the causal relationship
'cmi': Conditional mutual information value for this edge
'p_value': Empirical p-value from permutation test

Return type:

networkx.MultiDiGraph

Raises:

NotImplementedError – If an unsupported method or information type is specified.
ValueError – If the time series is too short for the chosen max_lag.
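To make the role of max_lag concrete, here is a minimal sketch of how lagged predictors for lags 1 through max_lag could be laid out, and why a series that is too short cannot be used. The function names, the column ordering, and the exact length check are assumptions for illustration; the package's internal layout may differ.

```python
def lagged_design(data, max_lag):
    """Pair each row data[t] with the rows at lags 1..max_lag (hypothetical layout)."""
    T = len(data)
    if T <= max_lag:
        # Assumed check: at least one row must remain after dropping max_lag rows.
        raise ValueError("time series too short for the chosen max_lag")
    n = len(data[0])
    targets = [list(data[t]) for t in range(max_lag, T)]
    predictors = [
        [data[t - lag][j] for lag in range(1, max_lag + 1) for j in range(n)]
        for t in range(max_lag, T)
    ]
    return targets, predictors

def column_to_lag_variable(col, n):
    """Map a predictor column index back to (lag, variable index)."""
    return col // n + 1, col % n

# Toy series with two variables so the lag structure is easy to read.
data = [[t, 10 * t] for t in range(6)]
targets, predictors = lagged_design(data, max_lag=2)
print(targets[0], predictors[0])
print(column_to_lag_variable(3, n=2))
```

With this layout each selected column index can be translated back into a (lag, variable) pair, which is the information an edge's lag attribute records.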

Notes

The algorithm’s computational complexity is approximately \(O(T \cdot n^2 \cdot \tau_{max} \cdot N_{shuffle})\), where \(T\) is the time series length, \(n\) is the number of variables, \(\tau_{max}\) is the maximum lag, and \(N_{shuffle}\) is the number of permutations.

For optimal performance with high-dimensional data, consider:

Reducing max_lag for shorter time series
Using ‘gaussian’ information type for continuous data
Adjusting n_shuffles based on desired statistical precision

Examples

>>> import numpy as np
>>> from causationentropy.core.discovery import discover_network
>>>
>>> # Generate sample time series data
>>> T, n = 1000, 3
>>> data = np.random.randn(T, n)
>>>
>>> # Discover causal network
>>> G = discover_network(data, max_lag=3, alpha_forward=0.01)
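The returned graph can be inspected with ordinary networkx calls. The sketch below builds a small MultiDiGraph by hand, with made-up node names and attribute values, purely to show how the documented edge attributes ('lag', 'cmi', 'p_value') might be read back; it does not call discover_network and the numbers are not real results.

```python
import networkx as nx

# Hand-built stand-in for a discovered network (illustrative values only).
G = nx.MultiDiGraph()
G.add_nodes_from(["x0", "x1", "x2"])
G.add_edge("x0", "x1", lag=1, cmi=0.21, p_value=0.005)
G.add_edge("x0", "x1", lag=3, cmi=0.08, p_value=0.03)  # parallel edge, different lag
G.add_edge("x2", "x0", lag=2, cmi=0.12, p_value=0.01)

# Each edge carries its own attribute dictionary.
edges = sorted(
    (u, v, d["lag"], d["cmi"], d["p_value"]) for u, v, d in G.edges(data=True)
)
for u, v, lag, cmi, p in edges:
    print(f"{u} -> {v}  lag={lag}  cmi={cmi:.2f}  p={p:.3f}")

# Keep only edges whose empirical p-value is below a stricter cutoff.
strong = [(u, v, lag) for u, v, lag, cmi, p in edges if p < 0.01]
```

Because parallel edges are allowed, the same node pair can appear once per lag, so filtering is best done per edge rather than per node pair.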


causationentropy.core.discovery.standard_optimal_causation_entropy(X, Y, Z_init, rng, alpha1=0.05, alpha2=0.05, n_shuffles=200, information='gaussian', metric='euclidean', k_means=5, bandwidth='silverman')

Execute the standard optimal Causation Entropy algorithm with initial conditioning set.

This function implements the standard oCSE algorithm that begins with a non-empty initial conditioning set (typically lagged target variables). The algorithm combines forward selection and backward elimination phases to identify significant causal predictors.

The conditional mutual information for candidate predictor \(X_j\) given current conditioning set \(\mathbf{Z}\) is:

\[I(X_j; Y | \mathbf{Z}) = \sum_{x_j,y,\mathbf{z}} p(x_j,y,\mathbf{z}) \log \frac{p(x_j,y|\mathbf{z})}{p(x_j|\mathbf{z})p(y|\mathbf{z})}\]
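For discrete data, the sum above can be evaluated directly from empirical frequencies. The following plug-in estimator is a minimal sketch of that formula, not the estimator used by the package; `plugin_cmi` and the toy samples are names and data introduced for this example.

```python
import math
from collections import Counter

def plugin_cmi(samples):
    """Plug-in estimate of I(X; Y | Z) in nats from (x, y, z) tuples."""
    n = len(samples)
    c_xyz = Counter(samples)
    c_xz = Counter((x, z) for x, _, z in samples)
    c_yz = Counter((y, z) for _, y, z in samples)
    c_z = Counter(z for _, _, z in samples)
    total = 0.0
    for (x, y, z), c in c_xyz.items():
        # p(x,y|z) / (p(x|z) p(y|z)) expressed with raw counts.
        ratio = c * c_z[z] / (c_xz[(x, z)] * c_yz[(y, z)])
        total += (c / n) * math.log(ratio)
    return total

# Y copies X (maximal dependence) versus X and Y independent, Z held constant.
copied = [(0, 0, 0), (1, 1, 0)] * 50
independent = [(x, y, 0) for x in (0, 1) for y in (0, 1)] * 25
cmi_copied = plugin_cmi(copied)
cmi_independent = plugin_cmi(independent)
print(cmi_copied, cmi_independent)
```

When Y is an exact copy of a balanced binary X the estimate equals ln 2, and it is exactly zero when the empirical joint distribution factorizes.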

Parameters:

X (array-like of shape (T, n)) – Predictor variables matrix.
Y (array-like of shape (T, 1)) – Target variable column.
Z_init (array-like of shape (T, p)) – Initial conditioning set (e.g., lagged target values).
rng (numpy.random.Generator) – Random number generator for reproducible results.
alpha1 (float, default=0.05) – Significance level for forward selection phase.
alpha2 (float, default=0.05) – Significance level for backward elimination phase.
n_shuffles (int, default=200) – Number of permutations for statistical testing.
information (str, default='gaussian') – Information measure estimator type.

Returns:

S – Indices of selected predictor variables that passed both forward and backward phases.

Return type:

list of int

causationentropy.core.discovery.alternative_optimal_causation_entropy(X, Y, rng, alpha1=0.05, alpha2=0.05, n_shuffles=200, information='gaussian', metric='euclidean', k_means=5, bandwidth='silverman')

Execute the alternative optimal Causation Entropy algorithm without initial conditioning.

This variant of the oCSE algorithm starts with an empty conditioning set, building causal relationships purely from the forward selection process. This approach may be more suitable when no prior knowledge about lagged dependencies exists.

Parameters:

X (array-like of shape (T, n)) – Predictor variables matrix.
Y (array-like of shape (T, 1)) – Target variable column.
rng (numpy.random.Generator) – Random number generator for reproducible results.
alpha1 (float, default=0.05) – Significance level for forward selection phase.
alpha2 (float, default=0.05) – Significance level for backward elimination phase.
n_shuffles (int, default=200) – Number of permutations for statistical testing.
information (str, default='gaussian') – Information measure estimator type.

Returns:

S – Indices of selected predictor variables.

Return type:

list of int

causationentropy.core.discovery.information_lasso_optimal_causation_entropy(X, Y, rng, criterion='bic', max_lambda=100, cross_val=10, information='gaussian')

Execute information-theoretic variant of oCSE with LASSO regularization.

This method combines information-theoretic causal discovery with LASSO regularization to handle high-dimensional predictor spaces. The approach balances causal relationship strength with model complexity.

Parameters:

X (array-like of shape (T, n)) – Predictor variables matrix.
Y (array-like of shape (T, 1)) – Target variable column.
rng (numpy.random.Generator) – Random number generator.
criterion (str, default='bic') – Information criterion for model selection (‘bic’ or ‘aic’).
max_lambda (int, default=100) – Maximum number of LASSO iterations.
cross_val (int, default=10) – Cross-validation folds (currently unused).
information (str, default='gaussian') – Information measure estimator type.

Returns:

S – Indices of selected predictor variables.

Return type:

list of int

Notes

This is a simplified implementation that delegates to LASSO. Future versions will incorporate information-theoretic weighting into the regularization.

causationentropy.core.discovery.lasso_optimal_causation_entropy(X, Y, rng, criterion='bic', max_lambda=100, cross_val=10)

Execute LASSO-based variable selection for causal discovery.

This method uses LASSO (Least Absolute Shrinkage and Selection Operator) regression for variable selection in causal discovery. The LASSO objective function is:

\[\min_{\boldsymbol{\beta}} \frac{1}{2n} ||\mathbf{y} - \mathbf{X}\boldsymbol{\beta}||_2^2 + \lambda ||\boldsymbol{\beta}||_1\]

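where \(\lambda\) is the regularization parameter that controls sparsity. In this formula \(n\) denotes the number of samples (written \(T\) in the parameter descriptions), not the number of variables.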
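The objective can be minimized with cyclic coordinate descent and soft-thresholding. The sketch below is a plain-Python illustration of that standard technique under the objective as written; it is not the solver the package uses, and the function names, the fixed iteration count, and the simulated data are assumptions.

```python
import random

def soft_threshold(value, lam):
    """Shrink value toward zero by lam; return 0.0 inside [-lam, lam]."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0

def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimize (1/2n)||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of column j with the residual that ignores column j.
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][k] * beta[k] for k in range(p) if k != j))
                for i in range(n)
            ) / n
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
    return beta

# Hypothetical data: only the first predictor drives the target.
rng = random.Random(0)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
y = [2.0 * row[0] + 0.1 * rng.gauss(0, 1) for row in X]
beta = lasso_coordinate_descent(X, y, lam=0.1)
selected = [j for j, b in enumerate(beta) if b != 0.0]
print(beta, selected)
```

The indices of the non-zero coefficients play the role of the returned selection S; larger values of \(\lambda\) shrink more coefficients to exactly zero.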

Parameters:

X (array-like of shape (T, n)) – Predictor variables matrix.
Y (array-like of shape (T, 1)) – Target variable column.
rng (numpy.random.Generator) – Random number generator (unused in current implementation).
criterion (str, default='bic') – Information criterion for regularization parameter selection.
max_lambda (int, default=100) – Maximum number of LASSO iterations.
cross_val (int, default=10) – Cross-validation folds (currently unused).

Returns:

S – Indices of variables with non-zero LASSO coefficients.

Return type:

list of int

Notes

Uses LassoLarsIC when the number of samples exceeds the number of predictors plus one, otherwise falls back to standard LASSO regression.

causationentropy.core.discovery.alternative_forward(X_full, Y, rng, alpha=0.05, n_shuffles=200, information='gaussian', metric='euclidean', k_means=5, bandwidth='silverman')

Forward selection phase of oCSE without initial conditioning set.

This function implements the forward selection phase starting with an empty conditioning set. At each step, it evaluates the conditional mutual information between each remaining candidate predictor and the target, conditioned on already selected predictors.

The selection criterion at each step is:

\[j^* = \arg\max_{j \in \text{candidates}} I(X_j^{(t)}; Y^{(t+\tau)} | \mathbf{S}^{(t)})\]

where \(\mathbf{S}^{(t)}\) represents the current set of selected predictors.

Parameters:

X_full (array-like of shape (T, n)) – Complete predictor matrix containing values at time t.
Y (array-like of shape (T, 1)) – Target variable column containing values at time t+τ.
rng (numpy.random.Generator) – Random number generator for permutation tests.
alpha (float, default=0.05) – Significance level for permutation tests. Predictors must achieve conditional mutual information above the (1-α) percentile of the null distribution.
n_shuffles (int, default=200) – Number of permutations to generate for statistical testing.
information (str, default='gaussian') – Information measure estimator type used for conditional mutual information computation.

Returns:

S – Indices of selected predictor variables that passed the significance test.

Return type:

list of int

Notes

The algorithm terminates when no remaining candidate achieves statistical significance or when all candidates have been evaluated. Each selection updates the conditioning set for subsequent iterations.
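The forward phase described above can be sketched as a greedy loop. This is an illustrative reimplementation under a Gaussian assumption, not the package's code: `gaussian_cmi` and `forward_sketch` are hypothetical helpers, and the stopping rule (stop as soon as the best candidate fails its permutation test) is an assumption consistent with the termination note.

```python
import numpy as np

def gaussian_cmi(X, Y, Z=None):
    """I(X; Y | Z) in nats under a joint-Gaussian assumption (log-determinant form)."""
    def logdet(*blocks):
        blocks = [b for b in blocks if b is not None]
        if not blocks:
            return 0.0
        cov = np.atleast_2d(np.cov(np.hstack(blocks), rowvar=False))
        return np.linalg.slogdet(cov)[1]
    return 0.5 * (logdet(X, Z) + logdet(Y, Z) - logdet(Z) - logdet(X, Y, Z))

def forward_sketch(X_full, Y, rng, alpha=0.05, n_shuffles=200):
    """Greedy forward selection starting from an empty conditioning set."""
    selected, remaining = [], list(range(X_full.shape[1]))
    while remaining:
        Z = X_full[:, selected] if selected else None
        scores = [gaussian_cmi(X_full[:, [j]], Y, Z) for j in remaining]
        best = int(np.argmax(scores))
        j_star, cmi = remaining[best], scores[best]
        # Null distribution: permute the candidate's rows, keep Y and Z fixed.
        Xj = X_full[:, [j_star]]
        null = [gaussian_cmi(Xj[rng.permutation(len(Xj))], Y, Z)
                for _ in range(n_shuffles)]
        if cmi < np.percentile(null, 100 * (1 - alpha)):
            break  # best remaining candidate is not significant
        selected.append(j_star)
        remaining.remove(j_star)
    return selected

# Hypothetical data: columns 0 and 2 drive the target, columns 1 and 3 do not.
rng = np.random.default_rng(0)
X_full = rng.standard_normal((500, 4))
Y = X_full[:, [0]] + 0.5 * X_full[:, [2]] + 0.3 * rng.standard_normal((500, 1))
selected = forward_sketch(X_full, Y, rng, alpha=0.01, n_shuffles=100)
print(selected)
```

By construction columns 0 and 2 carry the signal, so they are picked first; any index selected after them would be a false positive of the kind backward elimination is meant to remove.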

causationentropy.core.discovery.standard_forward(X_full, Y, Z_init, rng, alpha=0.05, n_shuffles=200, information='gaussian', metric='euclidean', k_means=5, bandwidth='silverman')

Standard forward selection phase of oCSE with initial conditioning set.

This function implements forward selection starting with a non-empty initial conditioning set Z_init, typically consisting of lagged values of the target variable. This approach incorporates prior knowledge about temporal dependencies in the causal discovery process.

At each iteration, the algorithm selects the predictor that maximizes conditional mutual information with the target, given the current conditioning set:

\[j^* = \arg\max_{j \in \text{candidates}} I(X_j^{(t)}; Y^{(t+\tau)} | \mathbf{Z}^{(t)})\]

where \(\mathbf{Z}^{(t)} = \mathbf{Z}_{\text{init}} \cup \mathbf{S}^{(t)}\) combines the initial conditioning set with currently selected predictors.

Parameters:

X_full (array-like of shape (T, n)) – Complete predictor matrix at time t.
Y (array-like of shape (T, 1)) – Target variable at time t+τ.
Z_init (array-like of shape (T, p)) – Initial conditioning set, typically containing lagged target values.
rng (numpy.random.Generator) – Random number generator for permutation tests.
alpha (float, default=0.05) – Forward selection significance threshold for permutation tests.
n_shuffles (int, default=200) – Number of shuffles for significance testing.
information (str, default='gaussian') – Information measure estimator type.

Returns:

S – Indices of selected predictor variables from X_full.

Return type:

list of int

Notes

The initial conditioning set Z_init remains constant throughout the forward selection, while newly selected predictors are added to form the complete conditioning set for subsequent iterations.

causationentropy.core.discovery.backward(X_full, Y, S_init, rng, alpha=0.05, n_shuffles=200, information='gaussian', metric='euclidean', k_means=5, bandwidth='silverman')

Backward elimination phase of optimal Causation Entropy.

This function performs backward elimination to remove spurious causal relationships identified during forward selection. For each predictor selected in the forward phase, it tests whether the predictor maintains statistical significance when conditioned on all other selected predictors.

For each predictor \(X_j\) in the selected set, the test evaluates:

\[I(X_j^{(t)}; Y^{(t+\tau)} | \mathbf{S}_{-j}^{(t)}) > \text{threshold}\]

where \(\mathbf{S}_{-j}^{(t)}\) represents all selected predictors except \(X_j\).

Parameters:

X_full (array-like of shape (T, n)) – Complete predictor matrix at time t, unchanged throughout the process.
Y (array-like of shape (T, 1)) – Target variable at time t+τ.
S_init (list of int) – Indices of predictor variables selected during the forward phase.
rng (numpy.random.Generator) – Random number generator for permutation order and significance testing.
alpha (float, default=0.05) – Significance level for backward elimination tests.
n_shuffles (int, default=200) – Number of permutation shuffles for statistical testing.
information (str, default='gaussian') – Information measure estimator type.

Returns:

S_final – Subset of S_init containing predictors that maintained statistical significance during backward elimination.

Return type:

list of int

Notes

Predictors are evaluated in random order to avoid selection bias. A predictor is removed if its conditional mutual information with the target, given all other selected predictors, falls below the significance threshold.

The backward phase is essential for controlling false positive rates in causal discovery, as forward selection may include predictors that become redundant when considered alongside other selected variables.
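A minimal sketch of the elimination loop, again under a Gaussian assumption, may help fix ideas. `gaussian_cmi` and `backward_sketch` are hypothetical helpers written for this example, and the choice to update the retained set immediately after each removal is an assumption rather than documented behavior.

```python
import numpy as np

def gaussian_cmi(X, Y, Z=None):
    """I(X; Y | Z) in nats under a joint-Gaussian assumption (log-determinant form)."""
    def logdet(*blocks):
        blocks = [b for b in blocks if b is not None]
        if not blocks:
            return 0.0
        cov = np.atleast_2d(np.cov(np.hstack(blocks), rowvar=False))
        return np.linalg.slogdet(cov)[1]
    return 0.5 * (logdet(X, Z) + logdet(Y, Z) - logdet(Z) - logdet(X, Y, Z))

def backward_sketch(X_full, Y, S_init, rng, alpha=0.05, n_shuffles=200):
    """Drop predictors that are not significant given the other retained ones."""
    S = list(S_init)
    for j in rng.permutation(S_init):  # random evaluation order
        j = int(j)
        others = [k for k in S if k != j]
        Z = X_full[:, others] if others else None
        Xj = X_full[:, [j]]
        cmi = gaussian_cmi(Xj, Y, Z)
        # Null distribution: permute the tested predictor, keep Y and Z fixed.
        null = [gaussian_cmi(Xj[rng.permutation(len(Xj))], Y, Z)
                for _ in range(n_shuffles)]
        if cmi < np.percentile(null, 100 * (1 - alpha)):
            S.remove(j)
    return S

# Hypothetical data: column 1 is a noisy copy of the true driver in column 0.
rng = np.random.default_rng(0)
T = 500
X0 = rng.standard_normal((T, 1))
X1 = X0 + 0.5 * rng.standard_normal((T, 1))
Y = X0 + 0.3 * rng.standard_normal((T, 1))
X_full = np.hstack([X0, X1])
kept = backward_sketch(X_full, Y, [0, 1], rng)
print(kept)
```

Column 1 is informative about Y on its own but adds nothing once column 0 is in the conditioning set, which is the redundancy the backward phase targets; column 0 stays informative under either conditioning set.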

causationentropy.core.discovery.shuffle_test(X, Y, Z, observed_cmi, alpha=0.05, n_shuffles=500, rng=None, information='gaussian', metric='euclidean', k_means=5, bandwidth='silverman')

Permutation test for conditional mutual information significance.

This function performs a permutation test to assess the statistical significance of the conditional mutual information I(X;Y|Z). The test generates a null distribution by computing conditional mutual information on permuted versions of the predictor X, while keeping Y and Z unchanged.

The null hypothesis is that X and Y are conditionally independent given Z:

\[H_0: I(X; Y | Z) = 0\]

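The null distribution is estimated empirically. For each of the n_shuffles random permutations \(\pi_s\) of the rows of X, the statistic is recomputed with Y and Z held fixed:

\[\text{CMI}_{\text{null}}^{(s)} = I(X_{\pi_s}; Y | Z), \qquad s = 1, \dots, N_{shuffle}\]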

Statistical significance is assessed by comparing the observed conditional mutual information to the (1-α) percentile of the null distribution.
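The procedure can be sketched in a few lines. The version below uses a Gaussian estimator and mirrors the documented result keys, but it is an illustrative stand-in: `gaussian_cmi` and `shuffle_test_sketch` are hypothetical names, and details such as how ties are counted are assumptions.

```python
import numpy as np

def gaussian_cmi(X, Y, Z=None):
    """I(X; Y | Z) in nats under a joint-Gaussian assumption (log-determinant form)."""
    def logdet(*blocks):
        blocks = [b for b in blocks if b is not None]
        if not blocks:
            return 0.0
        cov = np.atleast_2d(np.cov(np.hstack(blocks), rowvar=False))
        return np.linalg.slogdet(cov)[1]
    return 0.5 * (logdet(X, Z) + logdet(Y, Z) - logdet(Z) - logdet(X, Y, Z))

def shuffle_test_sketch(X, Y, Z, observed_cmi, alpha=0.05, n_shuffles=500, rng=None):
    """Permutation test: shuffle the rows of X, keep Y and Z unchanged."""
    rng = np.random.default_rng(rng)  # accepts None, an int seed, or a Generator
    null = np.empty(n_shuffles)
    for s in range(n_shuffles):
        null[s] = gaussian_cmi(X[rng.permutation(len(X))], Y, Z)
    threshold = np.percentile(null, 100 * (1 - alpha))
    return {
        "Threshold": float(threshold),
        "Value": float(observed_cmi),
        "Pass": bool(observed_cmi >= threshold),
        "P_value": float(np.mean(null >= observed_cmi)),
    }

# Hypothetical data in which X influences Y beyond what Z explains.
rng = np.random.default_rng(0)
T = 300
Z = rng.standard_normal((T, 1))
X = rng.standard_normal((T, 1))
Y = 0.8 * X + 0.5 * Z + 0.3 * rng.standard_normal((T, 1))
observed = gaussian_cmi(X, Y, Z)
result = shuffle_test_sketch(X, Y, Z, observed, alpha=0.05, n_shuffles=200, rng=rng)
print(result)
```

Note that the observed value is computed on the unshuffled data before the test is called, matching the observed_cmi parameter described below.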

Parameters:

X (array-like of shape (T, k_x)) – Predictor variable(s) under test. Must be 2-D even when k_x=1.
Y (array-like of shape (T, 1)) – Target variable column.
Z (array-like of shape (T, k_z) or None) – Current conditioning set. If None, tests marginal mutual information.
observed_cmi (float) – Conditional mutual information value computed on original (unshuffled) data.
alpha (float, default=0.05) – Significance level for the test. Lower values require stronger evidence.
n_shuffles (int, default=500) – Number of random permutations to generate for the null distribution.
rng (int, numpy.random.Generator, or None) – Random number generator or seed for reproducible results.
information (str, default='gaussian') – Information measure estimator type used for conditional mutual information.

Returns:

result – Dictionary containing test results:

'Threshold': float, the (1-α) percentile of the null distribution
'Value': float, the observed conditional mutual information value
'Pass': bool, True if observed_cmi >= threshold (statistically significant)
'P_value': float, empirical p-value (proportion of null values >= observed)

Return type:

dict

Notes

The permutation test is based on the assumption that under the null hypothesis, the predictor X is exchangeable with respect to the target Y when conditioned on Z. This provides a non-parametric approach to significance testing that does not require distributional assumptions.

For computational efficiency, consider reducing n_shuffles for preliminary analyses, though this may reduce the precision of p-value estimates.

Examples

>>> import numpy as np
>>> from causationentropy.core.discovery import shuffle_test
>>>
>>> # Generate sample data
>>> X = np.random.randn(100, 1)
>>> Y = np.random.randn(100, 1)
>>> Z = np.random.randn(100, 2)
>>> observed = 0.15
>>>
>>> # Perform permutation test
>>> result = shuffle_test(X, Y, Z, observed, alpha=0.05, n_shuffles=1000)
>>> print(f"Significant: {result['Pass']}, p-value = {result['P_value']:.3f}")

Selection Algorithms

causationentropy.core.discovery.standard_forward(X_full, Y, Z_init, rng, alpha=0.05, n_shuffles=200, information='gaussian', metric='euclidean', k_means=5, bandwidth='silverman')[source]

Standard forward selection phase of oCSE with initial conditioning set.

This function implements forward selection starting with a non-empty initial conditioning set Z_init, typically consisting of lagged values of the target variable. This approach incorporates prior knowledge about temporal dependencies in the causal discovery process.

At each iteration, the algorithm selects the predictor that maximizes conditional mutual information with the target, given the current conditioning set:

\[j^* = \arg\max_{j \in \text{candidates}} I(X_j^{(t)}; Y^{(t+\tau)} | \mathbf{Z}^{(t)})\]

where \(\mathbf{Z}^{(t)} = \mathbf{Z}_{\text{init}} \cup \mathbf{S}^{(t)}\) combines the initial conditioning set with currently selected predictors.

Parameters:

X_full (array-like of shape (T, n)) – Complete predictor matrix at time t.
Y (array-like of shape (T, 1)) – Target variable at time t+τ.
Z_init (array-like of shape (T, p)) – Initial conditioning set, typically containing lagged target values.
rng (numpy.random.Generator) – Random number generator for permutation tests.
alpha (float, default=0.05) – Forward selection significance threshold for permutation tests.
n_shuffles (int, default=200) – Number of shuffles for significance testing.
information (str, default='gaussian') – Information measure estimator type.

Returns:

S – Indices of selected predictor variables from X_full.

Return type:

list of int

Notes

The initial conditioning set Z_init remains constant throughout the forward selection, while newly selected predictors are added to form the complete conditioning set for subsequent iterations.

causationentropy.core.discovery.alternative_forward(X_full, Y, rng, alpha=0.05, n_shuffles=200, information='gaussian', metric='euclidean', k_means=5, bandwidth='silverman')[source]

Forward selection phase of oCSE without initial conditioning set.

This function implements the forward selection phase starting with an empty conditioning set. At each step, it evaluates the conditional mutual information between each remaining candidate predictor and the target, conditioned on already selected predictors.

The selection criterion at each step is:

\[j^* = \arg\max_{j \in \text{candidates}} I(X_j^{(t)}; Y^{(t+\tau)} | \mathbf{S}^{(t)})\]

where \(\mathbf{S}^{(t)}\) represents the current set of selected predictors.

Parameters:

X_full (array-like of shape (T, n)) – Complete predictor matrix containing values at time t.
Y (array-like of shape (T, 1)) – Target variable column containing values at time t+τ.
rng (numpy.random.Generator) – Random number generator for permutation tests.
alpha (float, default=0.05) – Significance level for permutation tests. Predictors must achieve conditional mutual information above the (1-α) percentile of the null distribution.
n_shuffles (int, default=200) – Number of permutations to generate for statistical testing.
information (str, default='gaussian') – Information measure estimator type used for conditional mutual information computation.

Returns:

S – Indices of selected predictor variables that passed the significance test.

Return type:

list of int

Notes

The algorithm terminates when no remaining candidate achieves statistical significance or when all candidates have been evaluated. Each selection updates the conditioning set for subsequent iterations.

causationentropy.core.discovery.backward(X_full, Y, S_init, rng, alpha=0.05, n_shuffles=200, information='gaussian', metric='euclidean', k_means=5, bandwidth='silverman')[source]

Backward elimination phase of optimal Causation Entropy.

This function performs backward elimination to remove spurious causal relationships identified during forward selection. For each predictor selected in the forward phase, it tests whether the predictor maintains statistical significance when conditioned on all other selected predictors.

For each predictor \(X_j\) in the selected set, the test evaluates:

\[I(X_j^{(t)}; Y^{(t+\tau)} | \mathbf{S}_{-j}^{(t)}) > \text{threshold}\]

where \(\mathbf{S}_{-j}^{(t)}\) represents all selected predictors except \(X_j\).

Parameters:

X_full (array-like of shape (T, n)) – Complete predictor matrix at time t, unchanged throughout the process.
Y (array-like of shape (T, 1)) – Target variable at time t+τ.
S_init (list of int) – Indices of predictor variables selected during the forward phase.
rng (numpy.random.Generator) – Random number generator for permutation order and significance testing.
alpha (float, default=0.05) – Significance level for backward elimination tests.
n_shuffles (int, default=200) – Number of permutation shuffles for statistical testing.
information (str, default='gaussian') – Information measure estimator type.

Returns:

S_final – Subset of S_init containing predictors that maintained statistical significance during backward elimination.

Return type:

list of int

Notes

Predictors are evaluated in random order to avoid selection bias. A predictor is removed if its conditional mutual information with the target, given all other selected predictors, falls below the significance threshold.

The backward phase is essential for controlling false positive rates in causal discovery, as forward selection may include predictors that become redundant when considered alongside other selected variables.
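A minimal sketch of the elimination loop, under stated assumptions: `partial_corr_cmi` and `backward_eliminate` are hypothetical names, the CMI of two scalar Gaussian variables is obtained from their partial correlation as -0.5·log(1 - ρ²), and each predictor is conditioned on the predictors still retained at the moment it is tested. The library's own conditioning and estimator may differ.

```python
import numpy as np

def partial_corr_cmi(x, y, Z):
    """Gaussian CMI of scalar series x, y given Z via partial correlation."""
    def residual(v):
        # Regress v on an intercept plus the conditioning columns
        D = np.column_stack([np.ones(len(v)), Z])
        coef, *_ = np.linalg.lstsq(D, v, rcond=None)
        return v - D @ coef
    rho = np.corrcoef(residual(x), residual(y))[0, 1]
    return -0.5 * np.log(1.0 - rho**2)

def backward_eliminate(X, y, S_init, rng, alpha=0.05, n_shuffles=200):
    """Drop predictors that are not significant given the other retained ones."""
    S = list(S_init)
    for j in rng.permutation(S_init):  # random evaluation order
        j = int(j)
        others = [k for k in S if k != j]
        Z = X[:, others]
        observed = partial_corr_cmi(X[:, j], y, Z)
        null = [partial_corr_cmi(rng.permutation(X[:, j]), y, Z)
                for _ in range(n_shuffles)]
        if observed < np.quantile(null, 1 - alpha):
            S = others  # predictor j is redundant given the rest
    return S

# Toy data: x1 is a noisy proxy of x0, and only x0 drives y
rng = np.random.default_rng(1)
T = 400
x0 = rng.standard_normal(T)
x1 = x0 + 0.5 * rng.standard_normal(T)
y = x0 + 0.5 * rng.standard_normal(T)
X = np.column_stack([x0, x1])
S_final = backward_eliminate(X, y, [0, 1], rng, n_shuffles=100)
print(S_final)
```

Here x0 stays significant even when conditioned on x1, while x1 carries little extra information once x0 is known and is usually removed.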

Statistical Testing

causationentropy.core.discovery.shuffle_test(X, Y, Z, observed_cmi, alpha=0.05, n_shuffles=500, rng=None, information='gaussian', metric='euclidean', k_means=5, bandwidth='silverman')[source]

Permutation test for conditional mutual information significance.

This function performs a permutation test to assess the statistical significance of the conditional mutual information I(X;Y|Z). The test generates a null distribution by computing conditional mutual information on permuted versions of the predictor X, while keeping Y and Z unchanged.

The null hypothesis is that X and Y are conditionally independent given Z:

\[H_0: I(X; Y | Z) = 0\]

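The null distribution is estimated empirically by recomputing the statistic on each shuffled copy of X:

\[\text{CMI}_{\text{null}}^{(s)} = I(\pi_s(X); Y | Z), \qquad s = 1, \dots, n_{\text{shuffles}}\]

where \(\pi_s\) is a random permutation of the rows (time indices) of X.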

Statistical significance is assessed by comparing the observed conditional mutual information to the (1-α) percentile of the null distribution.

Parameters:

X (array-like of shape (T, k_x)) – Predictor variable(s) under test. Must be 2-D even when k_x=1.
Y (array-like of shape (T, 1)) – Target variable column.
Z (array-like of shape (T, k_z) or None) – Current conditioning set. If None, tests marginal mutual information.
observed_cmi (float) – Conditional mutual information value computed on original (unshuffled) data.
alpha (float, default=0.05) – Significance level for the test. Lower values require stronger evidence.
n_shuffles (int, default=500) – Number of random permutations to generate for the null distribution.
rng (int, numpy.random.Generator, or None) – Random number generator or seed for reproducible results.
information (str, default='gaussian') – Information measure estimator type used for conditional mutual information.

Returns:

result – Dictionary containing test results:

’Threshold’: float, the (1-α) percentile of the null distribution
’Value’: float, the observed conditional mutual information value
’Pass’: bool, True if observed_cmi >= threshold (statistically significant)
’P_value’: float, empirical p-value (proportion of null values >= observed)

Return type:

dict

Notes

The permutation test is based on the assumption that under the null hypothesis, the predictor X is exchangeable with respect to the target Y when conditioned on Z. This provides a non-parametric approach to significance testing that does not require distributional assumptions.

For computational efficiency, consider reducing n_shuffles for preliminary analyses, though this may reduce the precision of p-value estimates.
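The threshold and empirical p-value described above can be sketched as follows. This is an illustration under assumptions, not the library code: `permutation_test` and `gaussian_mi` are hypothetical names, and the toy statistic covers only the marginal case (Z is None) for scalar Gaussian variables.

```python
import numpy as np

def gaussian_mi(X, Y, Z=None):
    """Marginal Gaussian MI for scalar columns: -0.5 * log(1 - r^2). Z is ignored."""
    r = np.corrcoef(X[:, 0], Y[:, 0])[0, 1]
    return -0.5 * np.log(1.0 - r**2)

def permutation_test(stat, X, Y, Z=None, alpha=0.05, n_shuffles=500, rng=None):
    """Compare stat(X, Y, Z) with its distribution under row-shuffled X."""
    rng = np.random.default_rng(rng)  # accepts None, a seed, or a Generator
    observed = stat(X, Y, Z)
    null = np.array([stat(X[rng.permutation(len(X))], Y, Z)
                     for _ in range(n_shuffles)])
    threshold = np.quantile(null, 1 - alpha)
    return {
        "Threshold": float(threshold),
        "Value": float(observed),
        "Pass": bool(observed >= threshold),
        "P_value": float(np.mean(null >= observed)),
    }

rng = np.random.default_rng(42)
X = rng.standard_normal((200, 1))
Y_dep = X + 0.5 * rng.standard_normal((200, 1))   # depends on X
Y_ind = rng.standard_normal((200, 1))             # independent of X

dep = permutation_test(gaussian_mi, X, Y_dep, rng=rng)
ind = permutation_test(gaussian_mi, X, Y_ind, rng=rng)
print(dep["Pass"], dep["P_value"])
print(ind["Pass"], ind["P_value"])
```

With n_shuffles permutations the p-value is resolved only in steps of 1/n_shuffles, which is why fewer shuffles give coarser estimates.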

Examples

>>> import numpy as np
>>> from causationentropy.core.discovery import shuffle_test
>>>
>>> # Generate sample data
>>> X = np.random.randn(100, 1)
>>> Y = np.random.randn(100, 1)
>>> Z = np.random.randn(100, 2)
>>> observed = 0.15
>>>
>>> # Perform permutation test
>>> result = shuffle_test(X, Y, Z, observed, alpha=0.05, n_shuffles=1000)
>>> print(f"Significant: {result['Pass']}, p-value = {result['P_value']:.3f}")

causationentropy.core.linalg module

Linear algebra utilities for information-theoretic computations.

causationentropy.core.linalg.correlation_log_determinant(A, epsilon=1e-10)[source]

Compute the logarithm of the determinant of a correlation matrix.

This function calculates the signed log-determinant of the correlation matrix derived from the input data matrix A. The correlation matrix is defined as:

\[\mathbf{R}_{ij} = \frac{\text{Cov}(X_i, X_j)}{\sqrt{\text{Var}(X_i) \text{Var}(X_j)}}\]

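The returned value is built from the sign and log-magnitude of the determinant:

\[\operatorname{sign}(\det \mathbf{R}) \cdot \log \lvert \det \mathbf{R} \rvert\]

For a valid, non-singular correlation matrix the determinant is positive, so this equals \(\log \det \mathbf{R}\).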

This approach provides numerical stability for matrices that may be close to singular.

Parameters:

A (array-like of shape (n_samples, n_features)) – Input data matrix where rows are samples and columns are features.
epsilon (float, default=1e-10) – Small regularization parameter (currently unused but reserved for potential numerical stabilization).

Returns:

log_det – Logarithm of the determinant of the correlation matrix. Returns 0.0 for degenerate cases (empty matrix or scalar).

Return type:

float

Notes

Special Cases:

Empty matrix (n_features = 0): Returns 0.0
Scalar correlation (1x1 matrix): Returns 0.0
Singular matrix: May return -inf or raise warnings

Numerical Considerations:

Uses numpy.linalg.slogdet for stable computation of log-determinant
Handles edge cases gracefully without exceptions
More stable than computing log(det(R)) directly

Applications:

Gaussian mutual information calculation
Model selection criteria (AIC, BIC)
Multivariate normality testing
Information-theoretic measures

Interpretation:

The determinant of a correlation matrix lies in [0, 1], so the log-determinant is never positive
Values near zero: Near-independence of variables
Large negative values: Strong linear dependence (multicollinearity), near-singular correlation matrix
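A minimal NumPy sketch of the computation, assuming the behaviour described above. `corr_logdet` is a hypothetical stand-in, not the library function, and it returns only the log-magnitude from numpy.linalg.slogdet, which coincides with the log-determinant whenever the correlation matrix is positive definite.

```python
import numpy as np

def corr_logdet(A):
    """Log-determinant of the correlation matrix of the columns of A."""
    A = np.asarray(A, dtype=float)
    if A.ndim != 2 or A.shape[1] <= 1:
        return 0.0  # empty or single-column input: degenerate case
    R = np.corrcoef(A, rowvar=False)
    sign, logabsdet = np.linalg.slogdet(R)
    return float(logabsdet)  # sign is +1 for a positive-definite R

rng = np.random.default_rng(0)

# Three independent columns: R is close to the identity
ld_indep = corr_logdet(rng.standard_normal((1000, 3)))

# Two nearly collinear columns: R is close to singular
base = rng.standard_normal((1000, 1))
ld_corr = corr_logdet(np.hstack([base, base + 0.1 * rng.standard_normal((1000, 1))]))

print(f"independent: {ld_indep:.3f}, correlated: {ld_corr:.3f}")
```

The independent case lands just below zero and the nearly collinear case is strongly negative, matching the interpretation given above.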

Examples

>>> import numpy as np
>>> from causationentropy.core.linalg import correlation_log_determinant
>>>
>>> # Independent variables
>>> A_indep = np.random.randn(100, 3)
>>> log_det_indep = correlation_log_determinant(A_indep)
>>> print(f"Independent variables log-det: {log_det_indep:.3f}")
>>>
>>> # Correlated variables
>>> A_corr = np.random.randn(100, 1)
>>> A_corr = np.hstack([A_corr, A_corr + 0.1*np.random.randn(100, 1)])
>>> log_det_corr = correlation_log_determinant(A_corr)
>>> print(f"Correlated variables log-det: {log_det_corr:.3f}")
>>>
>>> # Expected: log_det_corr < log_det_indep due to correlation

See also

numpy.corrcoef: Compute correlation coefficients
numpy.linalg.slogdet: Compute sign and log-determinant

causationentropy.core.linalg.subnetwork(G, lag)[source]

Extract a subgraph containing only edges at a specific lag.

The general return value from discover_network is a NetworkX MultiDiGraph with lag, p-value, and cmi encoded as edge attributes. This method returns a DiGraph containing only edges at the specified lag value.

Since the input is a MultiDiGraph, bidirectional connections at the same lag are represented as two separate directed edges: one from i to j and one from j to i.

Parameters:

G (nx.MultiDiGraph) – The causal network graph from discover_network with edge attributes ‘lag’, ‘cmi’, and ‘p_value’.
lag (int) – The time lag to extract. Only edges with this lag value will be included.

Returns:

H – A directed graph containing only the edges at the specified lag. Edge attributes ‘cmi’ and ‘p_value’ are preserved.

Return type:

nx.DiGraph

Examples

>>> import networkx as nx
>>> from causationentropy.core.linalg import subnetwork
>>>
>>> G = nx.MultiDiGraph()
>>> G.add_edge(0, 1, lag=1, cmi=0.5, p_value=0.01)
>>> G.add_edge(1, 2, lag=2, cmi=0.3, p_value=0.05)
>>>
>>> H1 = subnetwork(G, lag=1)
>>> H1.number_of_edges()
1

causationentropy.core.linalg.companion_matrix(G)[source]

Construct the block companion matrix for a causal network.

This function stores the causal graph in the matrix form that this library works with; it is not necessarily the standard graph-theoretic construction.

The companion matrix is a block-structured matrix used in vector autoregression (VAR) and dynamical systems analysis:

\[\begin{split}C = \begin{bmatrix} A^{(1)} & A^{(2)} & \cdots & A^{(K-1)} & A^{(K)} \\ I & 0 & \cdots & 0 & 0 \\ 0 & I & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & I & 0 \end{bmatrix}\end{split}\]

Each \(A^{(k)}\) is the adjacency matrix of edges with lag = k, and I represents an identity matrix of size n x n.

Parameters:

G (nx.MultiDiGraph) – The causal network graph from discover_network. Must contain edge attribute ‘lag’.

Returns:

C – The block companion matrix. Returns empty (0, 0) array if max_lag = 0.

Return type:

np.ndarray of shape (n_nodes * max_lag, n_nodes * max_lag)

Notes

Nodes are ordered according to NetworkX’s default ordering (sorted)
Edges with lag=0 are ignored (contemporaneous effects not included)
The matrix enables analysis of temporal dynamics via eigenvalue analysis
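The block layout can be sketched from a plain edge list with NumPy. This is an illustration under assumptions: `companion_from_edges` is a hypothetical helper, nodes are integers 0..n-1, and each lag block uses the orientation A[source, target] = 1, which may differ from the library's convention.

```python
import numpy as np

def companion_from_edges(edges, n_nodes):
    """Build the block companion matrix from (source, target, lag) triples."""
    lags = [lag for _, _, lag in edges if lag >= 1]  # lag-0 edges are ignored
    K = max(lags, default=0)
    if K == 0:
        return np.zeros((0, 0))
    n = n_nodes
    C = np.zeros((n * K, n * K))
    # Top block row: [A^(1) | A^(2) | ... | A^(K)]
    for src, dst, lag in edges:
        if lag >= 1:
            C[src, (lag - 1) * n + dst] = 1.0
    # Identity blocks on the sub-diagonal shift the lagged state down one block
    C[n:, : n * (K - 1)] = np.eye(n * (K - 1))
    return C

# Same toy graph as the example for companion_matrix
edges = [(0, 1, 1), (1, 2, 2)]
C = companion_from_edges(edges, n_nodes=3)
print(C.shape)  # (6, 6)
```

Once the matrix is built, numpy.linalg.eigvals(C) gives the eigenvalues used in the temporal-dynamics analysis mentioned above.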

Examples

>>> import networkx as nx
>>> import numpy as np
>>> from causationentropy.core.linalg import companion_matrix
>>>
>>> G = nx.MultiDiGraph()
>>> G.add_nodes_from([0, 1, 2])
>>> G.add_edge(0, 1, lag=1, cmi=0.5, p_value=0.01)
>>> G.add_edge(1, 2, lag=2, cmi=0.3, p_value=0.05)
>>>
>>> C = companion_matrix(G)
>>> print(C.shape)
(6, 6)

causationentropy.core.plotting module

Plotting utilities for visualization.

causationentropy.core.plotting.optimize_circular_order(G, seed_order=None, max_iters=3000, block_moves=True, rng=7)[source]

Parameters:

G (Graph)
seed_order (List)
max_iters (int)
block_moves (bool)
rng (int)

Return type:

List

causationentropy.core.plotting.roc_curve(TPRs, FPRs)[source]

Plot Receiver Operating Characteristic (ROC) curve.

This function creates a ROC curve visualization, which is a graphical plot that illustrates the diagnostic ability of a binary classifier system. The ROC curve plots the True Positive Rate against the False Positive Rate at various threshold settings.

The ROC curve is defined by the parametric equations:

\[ \begin{align}\begin{aligned}\text{TPR}(t) = \frac{\text{TP}(t)}{\text{TP}(t) + \text{FN}(t)} = \frac{\text{TP}(t)}{P}\\\text{FPR}(t) = \frac{\text{FP}(t)}{\text{FP}(t) + \text{TN}(t)} = \frac{\text{FP}(t)}{N}\end{aligned}\end{align} \]

where t is the classification threshold, P is the total number of positives, and N is the total number of negatives.

Parameters:

TPRs (array-like) – True Positive Rates (Sensitivity, Recall) for different thresholds. Values should be in [0, 1].
FPRs (array-like) – False Positive Rates (1 - Specificity) for different thresholds. Values should be in [0, 1].

Notes

ROC Curve Interpretation:

Perfect classifier: Curve passes through (0, 1), i.e. full TPR at zero FPR
Random classifier: Diagonal line from (0, 0) to (1, 1)
Worse-than-random classifier: Curve below the diagonal

Key Points:

(0, 0): No false positives, but also no true positives (very conservative)
(1, 1): All positives detected, but all negatives misclassified (very liberal)
(0, 1): Perfect classification (ideal classifier)

AUC (Area Under Curve):

AUC = 1.0: Perfect classifier
AUC = 0.5: Random classifier
AUC < 0.5: Worse than random (can be inverted)

Applications:

Medical diagnosis evaluation
Network reconstruction assessment
Causal discovery method comparison
Binary classification performance analysis

The function automatically computes and displays the AUC value on the plot using the trapezoidal integration rule.

Examples

>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> from causationentropy.core.plotting import roc_curve
>>>
>>> # Perfect classifier example
>>> tpr_perfect = [0, 1, 1]
>>> fpr_perfect = [0, 0, 1]
>>>
>>> plt.figure(figsize=(8, 6))
>>> roc_curve(tpr_perfect, fpr_perfect)
>>> plt.legend(['Perfect Classifier'])
>>> plt.show()
>>>
>>> # Random classifier comparison
>>> tpr_random = [0, 0.5, 1]
>>> fpr_random = [0, 0.5, 1]
>>> roc_curve(tpr_random, fpr_random)
>>> plt.legend(['Random Classifier'])

See also

causationentropy.core.stats.auc: Compute area under ROC curve
causationentropy.core.stats.Compute_TPR_FPR: Compute TPR and FPR from confusion matrix

causationentropy.core.plotting.plot_causal_network(G, pos=None, seed=7, figsize=(14, 14), dpi=None, node_size=8000, node_color='white', node_linewidth=5.0, edge_width_range=(1.0, 8.0), arrowsize=30, stats_fontsize=16, label_fontsize=16, legend_fontsize=10, colorbar_fontsize=14, title_fontsize=24, colormaps=None, colorblind_safe=False, show_colorbar=True, use_pvalue_alpha=True, pvalue_threshold=0.05, show_edge_labels=False, show_statistics=True, title='Discovered Causal Network', save_path=None, file_format='png', transparent=False, show_plot=True)[source]

Plot a causal network from a MultiDiGraph object with production-quality styling.

This function visualizes a causal network, accounting for edge attributes like lag, p-value, and conditional mutual information (CMI). It is designed to produce publication-quality plots with high readability, customizable styling, and multiple output options.

Parameters:

G (nx.MultiDiGraph) – The causal network graph to plot. Expected to have ‘lag’, ‘cmi’, and optionally ‘p_value’ as edge attributes.
pos (dict, optional) – A dictionary with nodes as keys and positions as values. If not provided, an optimized circular layout will be computed.
seed (int, default=7) – Seed for the random number generator used in layout optimization.
figsize (tuple of float, default=(14, 14)) – Figure size in inches (width, height).
dpi (int, optional) – Resolution in dots per inch. If None, uses matplotlib’s default (~100). Use 300+ for publication quality.
node_size (int, default=8000) – Size of nodes in the plot.
node_color (str, default='white') – Color of nodes.
node_linewidth (float, default=5.0) – Width of node borders.
edge_width_range (tuple of float, default=(1.0, 8.0)) – Range for edge width scaling based on CMI values (min_width, max_width).
arrowsize (int, default=30) – Size of arrow heads on directed edges.
label_fontsize (int, default=16) – Font size for node labels.
title_fontsize (int, default=24) – Font size for the plot title.
colormaps (list of str, optional) – List of matplotlib colormap names to use for different lags. If None, defaults to [‘Blues’, ‘Greens’, ‘Oranges’, ‘Purples’, ‘Reds’]. If colorblind_safe=True, this is overridden with accessible palettes.
colorblind_safe (bool, default=False) – If True, use colorblind-safe color palettes (viridis, plasma, cividis, etc.).
show_colorbar (bool, default=True) – If True, display a colorbar showing the CMI scale for each lag.
use_pvalue_alpha (bool, default=True) – If True, use p-value to set edge transparency (more significant = more opaque). Requires ‘p_value’ in edge attributes.
pvalue_threshold (float, default=0.05) – Significance threshold for edge transparency. Edges with p >= threshold are drawn with reduced opacity; no edges are removed.
show_edge_labels (bool, default=False) – If True, display CMI values as edge labels.
show_statistics (bool, default=True) – If True, display network statistics (nodes, edges, max lag) in a text box.
title (str, default='Discovered Causal Network') – Title for the plot.
save_path (str, optional) – Path to save the figure. If None, figure is not saved.
file_format (str, default='png') – File format for saving (‘png’, ‘pdf’, ‘svg’, ‘eps’, etc.).
transparent (bool, default=False) – If True, save figure with transparent background.
show_plot (bool, default=True) – If True, display the plot using plt.show().
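stats_fontsize (int, default=16) – Font size for the network statistics text box.
legend_fontsize (int, default=10) – Font size for legend text.
colorbar_fontsize (int, default=14) – Font size for colorbar labels.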

Returns:

fig (matplotlib.figure.Figure) – The figure object containing the plot.
ax (matplotlib.axes.Axes) – The axes object containing the plot.

Notes

Layout Optimization:

Nodes are arranged in a circular layout. If pos is not provided, the node order is optimized to minimize edge crossings and improve readability using simulated annealing.

Edge Rendering:

Edges for lag 1 are drawn as straight lines (connectionstyle radius=0).
Edges for lags > 1 are drawn as arcs outside the circle, with arc radius increasing for higher lags to prevent overlap.
Edge color and width are scaled by the CMI value within each lag group.
Edge transparency (alpha) is modulated by p-value if use_pvalue_alpha=True.

Color Accessibility:

When colorblind_safe=True, the function uses perceptually uniform and colorblind-accessible colormaps from matplotlib (viridis, plasma, cividis, inferno, magma). This ensures the plot is accessible to readers with color vision deficiencies.

P-value Visualization:

If edge attributes include ‘p_value’ and use_pvalue_alpha=True:

Edges with p < pvalue_threshold are drawn with alpha=1.0 (fully opaque)
Edges with p >= pvalue_threshold are drawn with alpha=0.3 (translucent)

This provides a visual indication of statistical significance.

Publication Quality:

DPI parameter allows control of resolution (use 300-600 for journals)
Large fonts and thick lines ensure readability when scaled
Optional statistics box provides at-a-glance network properties
Multiple export formats supported (PNG, PDF, SVG, EPS)

Examples

Basic usage with default settings:

>>> import networkx as nx
>>> from causationentropy.core.plotting import plot_causal_network
>>> G = nx.MultiDiGraph()
>>> G.add_edge('X1', 'X2', lag=1, cmi=0.5, p_value=0.01)
>>> G.add_edge('X2', 'X3', lag=1, cmi=0.3, p_value=0.03)
>>> fig, ax = plot_causal_network(G)

Colorblind-safe plot with custom styling:

>>> fig, ax = plot_causal_network(
...     G,
...     colorblind_safe=True,
...     figsize=(16, 16),
...     node_size=10000,
...     show_edge_labels=True
... )

Save high-resolution plot for publication:

>>> fig, ax = plot_causal_network(
...     G,
...     dpi=600,
...     save_path='causal_network.pdf',
...     file_format='pdf',
...     show_plot=False
... )

Customize colors and disable statistics:

>>> custom_cmaps = ['YlOrRd', 'PuBu', 'BuGn']
>>> fig, ax = plot_causal_network(
...     G,
...     colormaps=custom_cmaps,
...     show_statistics=False,
...     title='Custom Causal Network'
... )

See also

optimize_circular_order: Optimize node ordering for circular layout
causationentropy.core.discovery.discover_network: Discover causal networks

causationentropy.core.stats module

Statistical utilities and performance metrics.

causationentropy.core.stats.auc(TPRs, FPRs)[source]

Compute Area Under the ROC Curve (AUC) using trapezoidal integration.

The Area Under the Curve provides a single scalar measure of classifier performance across all classification thresholds. It is computed as:

\[\text{AUC} = \int_0^1 \text{TPR}(\text{FPR}) \, d(\text{FPR})\]

where TPR (True Positive Rate) and FPR (False Positive Rate) define the ROC curve. The integral is approximated using the trapezoidal rule:

\[\text{AUC} \approx \sum_{i=1}^{n-1} \frac{1}{2}[\text{TPR}_i + \text{TPR}_{i+1}][\text{FPR}_{i+1} - \text{FPR}_i]\]

Parameters:

TPRs (array-like) – True Positive Rates (sensitivities) corresponding to different thresholds. Should be sorted in ascending order of FPR.
FPRs (array-like) – False Positive Rates (1 - specificities) corresponding to different thresholds. Should be sorted in ascending order.

Returns:

AUC – Area under the ROC curve. Values range from 0 to 1, where:

1.0: Perfect classifier performance
0.5: Random classifier performance
0.0: Perfectly wrong classifier (can be inverted)

Return type:

float

Notes

The AUC metric provides several interpretations:

Probabilistic: Probability that a randomly chosen positive instance ranks higher than a randomly chosen negative instance
Geometric: Area under the ROC curve in TPR-FPR space
Performance: Single-number summary of classifier quality across thresholds

Advantages:

Scale-invariant: Measures prediction quality regardless of classification threshold
Aggregated: Provides performance summary across all thresholds

Limitations:

Can be overly optimistic for imbalanced datasets
Doesn’t reflect class distribution in deployment
May not align with specific cost considerations
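The trapezoidal sum above can be written directly in NumPy. This is a minimal sketch: `trapezoid_auc` is a hypothetical name, and the inputs are assumed to be already sorted by ascending FPR, as the parameter descriptions require.

```python
import numpy as np

def trapezoid_auc(tprs, fprs):
    """Trapezoidal area under the ROC curve; inputs sorted by ascending FPR."""
    tprs = np.asarray(tprs, dtype=float)
    fprs = np.asarray(fprs, dtype=float)
    # Average height of adjacent points times the width of each FPR step
    return float(np.sum(0.5 * (tprs[1:] + tprs[:-1]) * np.diff(fprs)))

auc_perfect = trapezoid_auc([0, 1, 1], [0, 0, 1])
auc_random = trapezoid_auc([0, 0.5, 1], [0, 0.5, 1])
print(auc_perfect, auc_random)  # 1.0 0.5
```

The vertical segment of the perfect curve (FPR fixed at 0) has zero width and contributes nothing, so the whole area comes from the horizontal segment at TPR = 1.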

Examples

>>> import numpy as np
>>> from causationentropy.core.stats import auc
>>>
>>> # Perfect classifier
>>> tpr_perfect = np.array([0, 1, 1])
>>> fpr_perfect = np.array([0, 0, 1])
>>> print(f"Perfect AUC: {auc(tpr_perfect, fpr_perfect)}")
>>>
>>> # Random classifier
>>> tpr_random = np.array([0, 0.5, 1])
>>> fpr_random = np.array([0, 0.5, 1])
>>> print(f"Random AUC: {auc(tpr_random, fpr_random)}")

causationentropy.core.stats.Compute_TPR_FPR(A, B)[source]

Compute True Positive Rate and False Positive Rate for binary adjacency matrices.

This function evaluates the performance of a predicted network (B) against a ground truth network (A) by computing standard classification metrics:

\[\text{TPR} = \frac{\text{TP}}{\text{TP} + \text{FN}} = \frac{\text{TP}}{P}\]

\[\text{FPR} = \frac{\text{FP}}{\text{FP} + \text{TN}} = \frac{\text{FP}}{N}\]

where:

TP: True positives (correctly predicted edges)
FN: False negatives (missed edges)
FP: False positives (incorrectly predicted edges)
TN: True negatives (correctly predicted non-edges)
P: Total positive edges in ground truth
N: Total negative edges in ground truth

Parameters:

A (array-like of shape (n, n)) – Ground truth binary adjacency matrix. Should contain only 0s and 1s.
B (array-like of shape (n, n)) – Predicted binary adjacency matrix. Should contain only 0s and 1s and have the same shape as A.

Returns:

TPR (float) – True Positive Rate (Sensitivity, Recall). Fraction of actual edges that were correctly identified.
FPR (float) – False Positive Rate (1 - Specificity). Fraction of actual non-edges that were incorrectly predicted as edges.

Notes

This implementation assumes:

Matrices are square and binary
Self-loops are excluded (diagonal elements ignored)
Matrices represent undirected graphs (symmetric)

Interpretation:

TPR (Sensitivity): How well the method detects true connections
FPR (1-Specificity): How often the method falsely detects connections

Performance Assessment:

High TPR, Low FPR: Excellent performance
High TPR, High FPR: Sensitive but not specific
Low TPR, Low FPR: Conservative approach
Low TPR, High FPR: Poor performance

Applications:

Network reconstruction evaluation
Causal discovery validation
ROC curve generation
Method comparison and benchmarking
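The counting behind these rates can be sketched with boolean masks. This is an illustration under the assumptions listed above (square binary matrices, diagonal ignored): `tpr_fpr` is a hypothetical name, and returning 0.0 when a denominator is zero is a choice made for this sketch, not documented library behaviour.

```python
import numpy as np

def tpr_fpr(A, B):
    """TPR and FPR of prediction B against ground truth A, ignoring the diagonal."""
    A = np.asarray(A).astype(bool)
    B = np.asarray(B).astype(bool)
    off_diag = ~np.eye(A.shape[0], dtype=bool)  # mask out self-loops
    a, b = A[off_diag], B[off_diag]
    tp = np.sum(a & b)
    fn = np.sum(a & ~b)
    fp = np.sum(~a & b)
    tn = np.sum(~a & ~b)
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return float(tpr), float(fpr)

# Ground truth: 3-node chain; prediction: fully connected graph
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
B_over = np.array([[0, 1, 1],
                   [1, 0, 1],
                   [1, 1, 0]])

print(tpr_fpr(A, A))       # (1.0, 0.0)
print(tpr_fpr(A, B_over))  # (1.0, 1.0)
```

In the overpredicted case all four true edges are found, but both off-diagonal non-edges are also predicted, so the FPR reaches 1.0.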

Examples

>>> import numpy as np
>>> from causationentropy.core.stats import Compute_TPR_FPR
>>>
>>> # Ground truth: simple 3-node chain
>>> A = np.array([[0, 1, 0],
...               [1, 0, 1],
...               [0, 1, 0]])
>>>
>>> # Perfect prediction
>>> B_perfect = A.copy()
>>> tpr, fpr = Compute_TPR_FPR(A, B_perfect)
>>> print(f"Perfect: TPR={tpr:.2f}, FPR={fpr:.2f}")
>>>
>>> # Overprediction (extra edge)
>>> B_over = np.array([[0, 1, 1],
...                    [1, 0, 1],
...                    [1, 1, 0]])
>>> tpr, fpr = Compute_TPR_FPR(A, B_over)
>>> print(f"Overpredicted: TPR={tpr:.2f}, FPR={fpr:.2f}")
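
The counts behind the two rates can also be tallied by hand. The following is a minimal sketch, not the library's implementation: `confusion_rates` is a hypothetical helper that follows the assumptions stated in the Notes (binary square matrices, diagonal ignored) and uses plain nested lists so it runs without dependencies. If the library counts entries differently, its output for the same inputs may differ.

```python
def confusion_rates(A, B):
    """Return (TPR, FPR) for two square binary adjacency matrices.

    Hypothetical helper for illustration only. Diagonal entries
    (self-loops) are skipped, as described in the Notes.
    """
    n = len(A)
    tp = fn = fp = tn = 0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # ignore self-loops
            truth, pred = int(A[i][j]), int(B[i][j])
            if truth and pred:
                tp += 1  # edge present and predicted
            elif truth:
                fn += 1  # edge present but missed
            elif pred:
                fp += 1  # edge predicted but absent
            else:
                tn += 1  # non-edge correctly left out
    # Guard against graphs with no edges or no non-edges.
    tpr = tp / (tp + fn) if (tp + fn) else float("nan")
    fpr = fp / (fp + tn) if (fp + tn) else float("nan")
    return tpr, fpr


# Same 3-node chain and overprediction as in the example above.
A = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
B_over = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]

print(confusion_rates(A, A))       # (1.0, 0.0)
print(confusion_rates(A, B_over))  # (1.0, 1.0)
```

Under these assumptions the chain has P = 4 off-diagonal edge entries and N = 2 non-edge entries, so predicting every off-diagonal entry as an edge yields FP = 2 and an FPR of 1.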

causationentropy.core.stats.auc(TPRs, FPRs)

Compute Area Under the ROC Curve (AUC) using trapezoidal integration.

The Area Under the Curve provides a single scalar measure of classifier performance across all classification thresholds. It is computed as:

\[\text{AUC} = \int_0^1 \text{TPR}(\text{FPR}) \, d(\text{FPR})\]

where TPR (True Positive Rate) and FPR (False Positive Rate) define the ROC curve. The integral is approximated using the trapezoidal rule:

\[\text{AUC} \approx \sum_{i=1}^{n-1} \frac{1}{2}[\text{TPR}_i + \text{TPR}_{i+1}][\text{FPR}_{i+1} - \text{FPR}_i]\]
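
As a minimal sketch of the trapezoidal sum above (not the library's own code), the hypothetical helper `trapezoid_auc` accumulates one trapezoid per pair of adjacent ROC points. It assumes the points are already sorted by ascending FPR and uses plain Python lists so it runs without dependencies.

```python
def trapezoid_auc(tprs, fprs):
    """Trapezoidal area under a ROC curve given as matched point lists.

    Hypothetical helper for illustration. Assumes the points are
    sorted by ascending FPR.
    """
    if len(tprs) != len(fprs):
        raise ValueError("tprs and fprs must have the same length")
    area = 0.0
    for i in range(len(fprs) - 1):
        width = fprs[i + 1] - fprs[i]                # FPR_{i+1} - FPR_i
        mean_height = 0.5 * (tprs[i] + tprs[i + 1])  # (TPR_i + TPR_{i+1}) / 2
        area += mean_height * width
    return area


print(trapezoid_auc([0, 1, 1], [0, 0, 1]))      # 1.0 (perfect classifier)
print(trapezoid_auc([0, 0.5, 1], [0, 0.5, 1]))  # 0.5 (diagonal ROC curve)
```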

Parameters:

TPRs (array-like) – True Positive Rates (sensitivities) corresponding to different thresholds. Should be sorted in ascending order of FPR.
FPRs (array-like) – False Positive Rates (1 - specificities) corresponding to different thresholds. Should be sorted in ascending order.

Returns:

AUC – Area under the ROC curve. Values range from 0 to 1, where:

0.5: Random classifier performance
1.0: Perfect classifier performance
0.0: Perfectly wrong classifier (inverting its predictions gives a perfect one)

Return type:

float

Notes

The AUC metric provides several interpretations:

Probabilistic: Probability that a randomly chosen positive instance ranks higher than a randomly chosen negative instance
Geometric: Area under the ROC curve in TPR-FPR space
Performance: Single-number summary of classifier quality across thresholds

Advantages:

Threshold-invariant: Measures prediction quality regardless of the classification threshold chosen
Aggregated: Provides a performance summary across all thresholds

Limitations:

Can be overly optimistic for imbalanced datasets
Does not reflect the class distribution in deployment
May not align with specific cost considerations
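
The probabilistic and geometric interpretations agree numerically, which a small sketch can illustrate. Everything below is hypothetical and for illustration only: `pairwise_auc` counts how often a positive outranks a negative (ties count one half), `roc_auc` integrates the ROC curve obtained by thresholding at every distinct score, and the score lists are invented.

```python
def pairwise_auc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs ranked correctly."""
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos_scores) * len(neg_scores))


def roc_auc(pos_scores, neg_scores):
    """Trapezoidal area under the ROC curve built from every distinct score."""
    thresholds = sorted(set(pos_scores) | set(neg_scores), reverse=True)
    tprs, fprs = [0.0], [0.0]  # start at the origin of ROC space
    for th in thresholds:
        tprs.append(sum(p >= th for p in pos_scores) / len(pos_scores))
        fprs.append(sum(q >= th for q in neg_scores) / len(neg_scores))
    return sum(
        0.5 * (tprs[i] + tprs[i + 1]) * (fprs[i + 1] - fprs[i])
        for i in range(len(fprs) - 1)
    )


# Invented scores: four true edges and two non-edges.
pos = [0.9, 0.8, 0.7, 0.4]
neg = [0.6, 0.2]

print(pairwise_auc(pos, neg))  # 0.875 (7 of 8 pairs ranked correctly)
print(roc_auc(pos, neg))       # 0.875
```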

Examples

>>> import numpy as np
>>> from causationentropy.core.stats import auc
>>>
>>> # Perfect classifier
>>> tpr_perfect = np.array([0, 1, 1])
>>> fpr_perfect = np.array([0, 0, 1])
>>> print(f"Perfect AUC: {auc(tpr_perfect, fpr_perfect)}")
>>>
>>> # Random classifier
>>> tpr_random = np.array([0, 0.5, 1])
>>> fpr_random = np.array([0, 0.5, 1])
>>> print(f"Random AUC: {auc(tpr_random, fpr_random)}")
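
For ROC curve generation, one plausible workflow is to threshold a matrix of edge scores at several levels, compute a (TPR, FPR) pair for each level, and integrate the resulting points. The sketch below uses hypothetical stand-ins (`rates`, `trapezoid`) in place of the library functions so that it is self-contained, and the score matrix `S` and the threshold grid are invented for illustration. Thresholds are visited in descending order so that FPR ascends, as the Parameters section requires.

```python
def rates(A, B):
    """(TPR, FPR) over off-diagonal entries; hypothetical stand-in."""
    n = len(A)
    tp = fn = fp = tn = 0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # ignore self-loops
            if A[i][j] and B[i][j]:
                tp += 1
            elif A[i][j]:
                fn += 1
            elif B[i][j]:
                fp += 1
            else:
                tn += 1
    return tp / (tp + fn), fp / (fp + tn)


def trapezoid(tprs, fprs):
    """Trapezoidal rule over ROC points sorted by ascending FPR."""
    return sum(
        0.5 * (tprs[i] + tprs[i + 1]) * (fprs[i + 1] - fprs[i])
        for i in range(len(fprs) - 1)
    )


# Ground truth chain and invented edge scores (higher = more confident).
A = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
S = [[0.0, 0.9, 0.2], [0.8, 0.0, 0.4], [0.6, 0.7, 0.0]]

tprs, fprs = [], []
for th in [1.0, 0.75, 0.5, 0.3, 0.0]:  # descending thresholds
    # Binarize the scores: predict an edge wherever the score reaches th.
    B = [[1 if s >= th else 0 for s in row] for row in S]
    tpr, fpr = rates(A, B)
    tprs.append(tpr)
    fprs.append(fpr)

roc_area = trapezoid(tprs, fprs)
print(list(zip(fprs, tprs)))
print(roc_area)  # 0.8125
```

A coarse threshold grid only approximates the full ROC curve, so the area depends on which thresholds are sampled.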
