Infer a causal graph via Optimal Causation Entropy (oCSE).
This function implements the optimal Causation Entropy algorithm for causal network discovery
from multivariate time series data. The algorithm uses conditional mutual information to
identify causal relationships between variables across different time lags.
The core principle is based on the Causation Entropy framework, which quantifies causal
relationships using information-theoretic measures. For variables \(X_i\) and \(X_j\)
with lag \(\tau\), the conditional mutual information is computed as:
\[I(X_j^{(t-\tau)}; X_i^{(t)} | \mathbf{Z}_i^{(t)})\]
where \(\mathbf{Z}_i^{(t)}\) represents the conditioning set for variable \(i\) at time \(t\).
The algorithm proceeds in two main phases:
- Forward Selection: Iteratively selects predictors that maximize conditional mutual
  information with the target variable, conditioned on already selected predictors.
- Backward Elimination: Removes predictors that do not maintain statistical significance
  when conditioned on all other selected predictors.
Statistical significance is assessed via permutation tests, where the null hypothesis assumes
no causal relationship exists between variables.
Parameters:
data (array-like of shape (T, n) or DataFrame) – Multivariate time series data where T is the number of time points and n is the number
of variables. Variables correspond to columns.
metric (str, default='euclidean') – Distance metric for k-NN based estimators.
n_shuffles (int, default=200) – Number of permutations for statistical significance testing. Higher values
provide more accurate p-value estimates but increase computational cost.
n_jobs (int, default=-1) – Number of parallel jobs for computation. -1 uses all available processors.
Returns:
G – Multi-directed graph representing the discovered causal network. Nodes correspond to
variables and edges represent causal relationships. Multiple edges between the same
node pair represent relationships at different time lags. Edge attributes include:
- 'lag': Time delay \(\tau\) of the causal relationship
- 'cmi': Conditional mutual information value for this edge
- 'p_value': Empirical p-value from permutation test
Return type:
networkx.MultiDiGraph
Raises:
NotImplementedError – If an unsupported method or information type is specified.
ValueError – If the time series is too short for the chosen max_lag.
Notes
The algorithm’s computational complexity is approximately \(O(T \cdot n^2 \cdot \tau_{max} \cdot N_{shuffle})\),
where \(T\) is the time series length, \(n\) is the number of variables,
\(\tau_{max}\) is the maximum lag, and \(N_{shuffle}\) is the number of permutations.
For optimal performance with high-dimensional data, consider:
- Reducing max_lag for shorter time series
- Using ‘gaussian’ information type for continuous data
- Adjusting n_shuffles based on desired statistical precision
Examples
>>> import numpy as np
>>> from causationentropy.core.discovery import discover_network
>>>
>>> # Generate sample time series data
>>> T, n = 1000, 3
>>> data = np.random.randn(T, n)
>>>
>>> # Discover causal network
>>> G = discover_network(data, max_lag=3, alpha_forward=0.01)
Execute the standard optimal Causation Entropy algorithm with initial conditioning set.
This function implements the standard oCSE algorithm that begins with a non-empty
initial conditioning set (typically lagged target variables). The algorithm combines
forward selection and backward elimination phases to identify significant causal predictors.
The conditional mutual information for candidate predictor \(X_j\) given current
conditioning set \(\mathbf{Z}\) is:
\[I(X_j; Y | \mathbf{Z}) = \sum_{x_j,y,\mathbf{z}} p(x_j,y,\mathbf{z}) \log \frac{p(x_j,y|\mathbf{z})}{p(x_j|\mathbf{z})p(y|\mathbf{z})}\]
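For jointly Gaussian variables the conditional mutual information has a closed form in terms of covariance log-determinants. The following is a minimal sketch under that Gaussian assumption; `gaussian_cmi` and `_logdet_cov` are hypothetical helper names used for illustration, not the library's estimator.

```python
import numpy as np

def _logdet_cov(*blocks):
    """Log-determinant of the sample covariance of the column-stacked blocks."""
    M = np.hstack(blocks)
    cov = np.atleast_2d(np.cov(M, rowvar=False))
    sign, logdet = np.linalg.slogdet(cov)
    return logdet

def gaussian_cmi(X, Y, Z=None):
    """I(X; Y | Z) in nats under a joint-Gaussian assumption (hypothetical helper)."""
    if Z is None or Z.shape[1] == 0:
        # Marginal case: I(X;Y) = 0.5 * [log|S_X| + log|S_Y| - log|S_XY|]
        return 0.5 * (_logdet_cov(X) + _logdet_cov(Y) - _logdet_cov(X, Y))
    return 0.5 * (_logdet_cov(X, Z) + _logdet_cov(Y, Z)
                  - _logdet_cov(Z) - _logdet_cov(X, Y, Z))

rng = np.random.default_rng(0)
T = 2000
Z = rng.standard_normal((T, 1))
X = Z + 0.5 * rng.standard_normal((T, 1))   # X is driven by Z
Y = Z + 0.5 * rng.standard_normal((T, 1))   # Y is driven by Z only
mi = gaussian_cmi(X, Y)        # clearly positive: X and Y share the driver Z
cmi = gaussian_cmi(X, Y, Z)    # close to zero: conditioning on Z removes the dependence
```

Conditioning on the common driver is what separates a direct relationship from an indirect one, which is why the selection phases always score candidates against the current conditioning set.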
Parameters:
X (array-like of shape (T, n)) – Predictor variables matrix.
Y (array-like of shape (T, 1)) – Target variable column.
Z_init (array-like of shape (T, p)) – Initial conditioning set (e.g., lagged target values).
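The three inputs can be derived from a raw (T, n) series by shifting columns. The sketch below assumes the predictors are the lagged values of the non-target variables and that Z_init holds the lagged values of the target; `build_lagged` is a hypothetical helper, and the library may assemble these matrices differently.

```python
import numpy as np

def build_lagged(data, target, max_lag):
    """Return (X, Y, Z_init) for one target column (hypothetical helper).

    X      : lags 1..max_lag of every non-target variable
    Y      : target at time t, shape (T - max_lag, 1)
    Z_init : lags 1..max_lag of the target itself, shape (T - max_lag, max_lag)
    """
    data = np.asarray(data, dtype=float)
    T, n = data.shape
    if T <= max_lag:
        raise ValueError("time series too short for the chosen max_lag")
    Y = data[max_lag:, [target]]
    others = [j for j in range(n) if j != target]
    X_cols, Z_cols = [], []
    for lag in range(1, max_lag + 1):
        lagged = data[max_lag - lag:T - lag]   # row k holds the values at time t - lag
        X_cols.append(lagged[:, others])
        Z_cols.append(lagged[:, [target]])
    return np.hstack(X_cols), Y, np.hstack(Z_cols)

data = np.arange(20, dtype=float).reshape(10, 2)   # toy series: row t is [2t, 2t + 1]
X, Y, Z_init = build_lagged(data, target=0, max_lag=2)
```

The first max_lag rows are dropped because their lagged values do not exist, so every returned matrix has T - max_lag rows.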
causationentropy.core.discovery.alternative_optimal_causation_entropy(X, Y, rng, alpha1=0.05, alpha2=0.05, n_shuffles=200, information='gaussian', metric='euclidean', k_means=5, bandwidth='silverman')
Execute the alternative optimal Causation Entropy algorithm without initial conditioning.
This variant of the oCSE algorithm starts with an empty conditioning set, building
causal relationships purely from the forward selection process. This approach may
be more suitable when no prior knowledge about lagged dependencies exists.
Parameters:
X (array-like of shape (T, n)) – Predictor variables matrix.
Y (array-like of shape (T, 1)) – Target variable column.
causationentropy.core.discovery.information_lasso_optimal_causation_entropy(X, Y, rng, criterion='bic', max_lambda=100, cross_val=10, information='gaussian')
Execute information-theoretic variant of oCSE with LASSO regularization.
This method combines information-theoretic causal discovery with LASSO regularization
to handle high-dimensional predictor spaces. The approach balances causal relationship
strength with model complexity.
Parameters:
X (array-like of shape (T, n)) – Predictor variables matrix.
Y (array-like of shape (T, 1)) – Target variable column.
This is a simplified implementation that delegates to LASSO. Future versions
will incorporate information-theoretic weighting into the regularization.
causationentropy.core.discovery.lasso_optimal_causation_entropy(X, Y, rng, criterion='bic', max_lambda=100, cross_val=10)
Execute LASSO-based variable selection for causal discovery.
This method uses LASSO (Least Absolute Shrinkage and Selection Operator) regression
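for variable selection in causal discovery. The LASSO objective function is:
\[\min_{\beta} \frac{1}{2T} \lVert Y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_1\]
where \(\lambda \geq 0\) controls the strength of the sparsity penalty.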
Uses LassoLarsIC when the number of samples exceeds the number of predictors plus one,
otherwise falls back to standard LASSO regression.
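To illustrate how an L1 penalty performs variable selection, here is a minimal sketch that swaps in plain NumPy coordinate descent with soft-thresholding. It is not the LassoLarsIC path described above: no information criterion is used, and the fixed penalty `lam` and the name `lasso_select` are assumptions made for the example.

```python
import numpy as np

def lasso_select(X, y, lam, n_iter=200):
    """Coordinate-descent LASSO for (1/2T)||y - Xb||^2 + lam * ||b||_1.

    Returns the indices of predictors with non-zero coefficients.
    """
    X = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize columns
    y = y.ravel() - y.mean()
    T, n = X.shape
    beta = np.zeros(n)
    for _ in range(n_iter):
        for j in range(n):
            # Partial residual with predictor j taken out of the fit
            r = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ r / T
            # Soft-thresholding; columns have unit variance, so no rescaling is needed
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0)
    return np.flatnonzero(beta)

rng = np.random.default_rng(1)
X = rng.standard_normal((400, 6))
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + 0.5 * rng.standard_normal(400)
selected = lasso_select(X, y, lam=0.2)   # indices of the surviving predictors
```

Predictors whose correlation with the residual stays below the penalty are shrunk exactly to zero, which is what makes the method usable as a selection step.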
causationentropy.core.discovery.alternative_forward(X_full, Y, rng, alpha=0.05, n_shuffles=200, information='gaussian', metric='euclidean', k_means=5, bandwidth='silverman')
Forward selection phase of oCSE without initial conditioning set.
This function implements the forward selection phase starting with an empty conditioning
set. At each step, it evaluates the conditional mutual information between each remaining
candidate predictor and the target, conditioned on already selected predictors.
Parameters:
alpha (float, default=0.05) – Significance level for permutation tests. Predictors must achieve
conditional mutual information above the (1-α) percentile of the null distribution.
n_shuffles (int, default=200) – Number of permutations to generate for statistical testing.
information (str, default='gaussian') – Information measure estimator type used for conditional mutual information computation.
Returns:
S – Indices of selected predictor variables that passed the significance test.
The algorithm terminates when no remaining candidate achieves statistical significance
or when all candidates have been evaluated. Each selection updates the conditioning
set for subsequent iterations.
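The forward loop described above can be sketched as follows. This is an illustrative version that assumes a Gaussian CMI estimator and a row-shuffling permutation test; `forward_select` and `gaussian_cmi` are hypothetical names, not the library's functions.

```python
import numpy as np

def gaussian_cmi(X, Y, Z):
    """Gaussian estimate of I(X; Y | Z); Z may have zero columns."""
    def logdet(*blocks):
        cov = np.atleast_2d(np.cov(np.hstack(blocks), rowvar=False))
        return np.linalg.slogdet(cov)[1]
    if Z.shape[1] == 0:
        return 0.5 * (logdet(X) + logdet(Y) - logdet(X, Y))
    return 0.5 * (logdet(X, Z) + logdet(Y, Z) - logdet(Z) - logdet(X, Y, Z))

def forward_select(X_full, Y, rng, alpha=0.05, n_shuffles=200):
    """Greedy forward selection starting from an empty conditioning set."""
    T, n = X_full.shape
    S = []
    remaining = list(range(n))
    while remaining:
        Z = X_full[:, np.array(S, dtype=int)]      # shape (T, len(S)), possibly (T, 0)
        scores = [gaussian_cmi(X_full[:, [j]], Y, Z) for j in remaining]
        best = remaining[int(np.argmax(scores))]
        observed = max(scores)
        # Null distribution: shuffle the candidate's rows, keep Y and Z fixed
        Xb = X_full[:, [best]]
        null = [gaussian_cmi(Xb[rng.permutation(T)], Y, Z) for _ in range(n_shuffles)]
        if observed < np.quantile(null, 1 - alpha):
            break                                  # best candidate is not significant
        S.append(best)
        remaining.remove(best)
    return S

rng = np.random.default_rng(0)
T = 500
X_full = rng.standard_normal((T, 5))
Y = 2.0 * X_full[:, [0]] - 1.5 * X_full[:, [2]] + 0.5 * rng.standard_normal((T, 1))
S = forward_select(X_full, Y, rng)   # the true drivers 0 and 2 are picked first
```

Because only the best candidate is tested in each round, the loop can still admit an occasional spurious predictor at level alpha, which is the case the backward phase is meant to handle.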
causationentropy.core.discovery.standard_forward(X_full, Y, Z_init, rng, alpha=0.05, n_shuffles=200, information='gaussian', metric='euclidean', k_means=5, bandwidth='silverman')
Standard forward selection phase of oCSE with initial conditioning set.
This function implements forward selection starting with a non-empty initial conditioning
set Z_init, typically consisting of lagged values of the target variable. This approach
incorporates prior knowledge about temporal dependencies in the causal discovery process.
At each iteration, the algorithm selects the predictor that maximizes conditional mutual
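information with the target, given the current conditioning set:
\[j^* = \arg\max_j I(X_j; Y | \mathbf{Z}_{init}, \mathbf{X}_S)\]
where \(S\) is the set of predictors selected so far.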
The initial conditioning set Z_init remains constant throughout the forward selection,
while newly selected predictors are added to form the complete conditioning set for
subsequent iterations.
causationentropy.core.discovery.backward(X_full, Y, S_init, rng, alpha=0.05, n_shuffles=200, information='gaussian', metric='euclidean', k_means=5, bandwidth='silverman')
Backward elimination phase of optimal Causation Entropy.
This function performs backward elimination to remove spurious causal relationships
identified during forward selection. For each predictor selected in the forward phase,
it tests whether the predictor maintains statistical significance when conditioned on
all other selected predictors.
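For each predictor \(X_j\) in the selected set, the test evaluates:
\[I(X_j; Y | \mathbf{X}_{S \setminus \{j\}})\]
where \(S\) is the current set of selected predictors.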
Predictors are evaluated in random order to avoid selection bias. A predictor is
removed if its conditional mutual information with the target, given all other
selected predictors, falls below the significance threshold.
The backward phase is essential for controlling false positive rates in causal
discovery, as forward selection may include predictors that become redundant
when considered alongside other selected variables.
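A minimal sketch of the elimination pass, again assuming a Gaussian CMI estimator and a row-shuffling permutation test. The names `backward_eliminate` and `gaussian_cmi` are hypothetical and the code is illustrative rather than the library's implementation.

```python
import numpy as np

def gaussian_cmi(X, Y, Z):
    """Gaussian estimate of I(X; Y | Z); Z may have zero columns."""
    def logdet(*blocks):
        cov = np.atleast_2d(np.cov(np.hstack(blocks), rowvar=False))
        return np.linalg.slogdet(cov)[1]
    if Z.shape[1] == 0:
        return 0.5 * (logdet(X) + logdet(Y) - logdet(X, Y))
    return 0.5 * (logdet(X, Z) + logdet(Y, Z) - logdet(Z) - logdet(X, Y, Z))

def backward_eliminate(X_full, Y, S_init, rng, alpha=0.05, n_shuffles=200):
    """Drop each predictor that is not significant given the other survivors."""
    S = list(S_init)
    T = X_full.shape[0]
    for j in rng.permutation(S_init):              # random evaluation order
        j = int(j)
        rest = np.array([k for k in S if k != j], dtype=int)
        Xj, Z = X_full[:, [j]], X_full[:, rest]
        observed = gaussian_cmi(Xj, Y, Z)
        null = [gaussian_cmi(Xj[rng.permutation(T)], Y, Z) for _ in range(n_shuffles)]
        if observed < np.quantile(null, 1 - alpha):
            S.remove(j)                            # redundant given the others
    return sorted(S)

rng = np.random.default_rng(0)
T = 500
X_full = rng.standard_normal((T, 3))
X_full[:, 1] = X_full[:, 0] + rng.standard_normal(T)   # noisy copy of column 0
Y = 2.0 * X_full[:, [0]] + 0.5 * rng.standard_normal((T, 1))
kept = backward_eliminate(X_full, Y, S_init=[0, 1, 2], rng=rng)
```

Column 1 is correlated with the target only through column 0, so its CMI given column 0 is close to zero; it is the kind of predictor this pass is expected to remove in most runs.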
Permutation test for conditional mutual information significance.
This function performs a permutation test to assess the statistical significance of
the conditional mutual information I(X;Y|Z). The test generates a null distribution
by computing conditional mutual information on permuted versions of the predictor X,
while keeping Y and Z unchanged.
The null hypothesis is that X and Y are conditionally independent given Z:
\[H_0: I(X; Y | Z) = 0\]
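The null distribution is estimated empirically from the shuffled data. Each permutation \(\pi_b\) of the rows of X yields one null value:
\[\text{CMI}_{\text{null}}^{(b)} = I(\pi_b(X); Y | Z), \quad b = 1, \dots, N_{shuffle}\]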
Statistical significance is assessed by comparing the observed conditional mutual
information to the (1-α) percentile of the null distribution.
Parameters:
X (array-like of shape (T, k_x)) – Predictor variable(s) under test. Must be 2-D even when k_x=1.
Y (array-like of shape (T, 1)) – Target variable column.
Z (array-like of shape (T, k_z) or None) – Current conditioning set. If None, tests marginal mutual information.
observed_cmi (float) – Conditional mutual information value computed on original (unshuffled) data.
alpha (float, default=0.05) – Significance level for the test. Lower values require stronger evidence.
n_shuffles (int, default=500) – Number of random permutations to generate for the null distribution.
rng (int, numpy.random.Generator, or None) – Random number generator or seed for reproducible results.
information (str, default='gaussian') – Information measure estimator type used for conditional mutual information.
Returns:
result – Dictionary containing test results:
- 'Threshold': float, the (1-α) percentile of the null distribution
- 'Value': float, the observed conditional mutual information value
- 'Pass': bool, True if observed_cmi >= threshold (statistically significant)
- 'P_value': float, empirical p-value (proportion of null values >= observed)
The permutation test is based on the assumption that under the null hypothesis,
the predictor X is exchangeable with respect to the target Y when conditioned on Z.
This provides a non-parametric approach to significance testing that does not
require distributional assumptions.
For computational efficiency, consider reducing n_shuffles for preliminary analyses,
though this may reduce the precision of p-value estimates.
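The test can be sketched in a few lines. In this illustration the estimator is passed in as a callable (`cmi_fn`), which is an assumption of the example rather than part of the documented signature; `shuffle_test` and `gaussian_mi` are hypothetical names, and only the marginal case (Z is None) is exercised.

```python
import numpy as np

def gaussian_mi(X, Y, Z=None):
    """Marginal Gaussian mutual information for single columns (ignores Z)."""
    r = np.corrcoef(X[:, 0], Y[:, 0])[0, 1]
    return -0.5 * np.log(1.0 - r**2)

def shuffle_test(X, Y, Z, observed_cmi, cmi_fn, alpha=0.05, n_shuffles=500, rng=None):
    """Permutation test: shuffle the rows of X, keep Y and Z unchanged."""
    rng = np.random.default_rng(rng)               # accepts None, a seed, or a Generator
    T = X.shape[0]
    null = np.array([cmi_fn(X[rng.permutation(T)], Y, Z) for _ in range(n_shuffles)])
    threshold = float(np.quantile(null, 1 - alpha))
    return {
        "Threshold": threshold,
        "Value": float(observed_cmi),
        "Pass": bool(observed_cmi >= threshold),
        "P_value": float(np.mean(null >= observed_cmi)),
    }

rng = np.random.default_rng(42)
X = rng.standard_normal((300, 1))
Y = 0.8 * X + rng.standard_normal((300, 1))
observed = gaussian_mi(X, Y)
result = shuffle_test(X, Y, None, observed, gaussian_mi, n_shuffles=200, rng=rng)
```

With n_shuffles permutations the smallest non-zero empirical p-value is 1/n_shuffles, which is the precision trade-off mentioned above.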
Compute the logarithm of the determinant of a correlation matrix.
This function calculates the signed log-determinant of the correlation matrix
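derived from the input data matrix A. The correlation matrix is defined as:
\[R_{ij} = \frac{\operatorname{Cov}(A_i, A_j)}{\sigma_i \sigma_j}\]
where \(A_i\) is the i-th column of A and \(\sigma_i\) is its standard deviation.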
Special Cases:
- Empty matrix (n_features = 0): Returns 0.0
- Scalar correlation (1x1 matrix): Returns 0.0
- Singular matrix: May return -inf or raise warnings
Numerical Considerations:
- Uses numpy.linalg.slogdet for stable computation of log-determinant
- Handles edge cases gracefully without exceptions
- More stable than computing log(det(R)) directly
Applications:
- Gaussian mutual information calculation
- Model selection criteria (AIC, BIC)
- Multivariate normality testing
- Information-theoretic measures
Interpretation:
- The determinant of a valid correlation matrix lies in [0, 1], so the log-determinant is never positive
- Values near zero: Near-independence of variables
- Large negative values: Multicollinearity, near-singular correlation matrix
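A minimal sketch of the computation with `numpy.corrcoef` and `numpy.linalg.slogdet`, assuming variables are stored in columns. The name `logdet_corr` is hypothetical, and the treatment of the empty and single-column cases follows the special cases listed above.

```python
import numpy as np

def logdet_corr(A):
    """Log-determinant of the correlation matrix of the columns of A."""
    A = np.asarray(A, dtype=float)
    if A.ndim != 2 or A.shape[1] <= 1:
        return 0.0                       # empty or single-column input
    R = np.corrcoef(A, rowvar=False)
    sign, logdet = np.linalg.slogdet(R)  # stable alternative to log(det(R))
    return float(logdet)

rng = np.random.default_rng(0)
independent = rng.standard_normal((1000, 3))
collinear = independent.copy()
collinear[:, 1] = collinear[:, 0] + 0.01 * rng.standard_normal(1000)  # near-duplicate column

ld_indep = logdet_corr(independent)   # slightly below zero
ld_coll = logdet_corr(collinear)      # strongly negative
```

For two variables with correlation r the determinant is 1 - r^2, so the log-determinant tends to minus infinity as |r| approaches 1.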
Extract a subgraph containing only edges at a specific lag.
The general return value from discover_network is a NetworkX MultiDiGraph
with lag, p-value, and cmi encoded as edge attributes. This method returns a
DiGraph containing only edges at the specified lag value.
Since the input is a MultiDiGraph, bidirectional connections at the same lag
are represented as two separate directed edges: one from i to j and one from
j to i.
Parameters:
G (nx.MultiDiGraph) – The causal network graph from discover_network with edge attributes
‘lag’, ‘cmi’, and ‘p_value’.
lag (int) – The time lag to extract. Only edges with this lag value will be included.
Returns:
H – A directed graph containing only the edges at the specified lag.
Edge attributes ‘cmi’ and ‘p_value’ are preserved.
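The extraction can be sketched with plain NetworkX calls. `lag_subgraph` is a hypothetical name, the toy graph is built by hand rather than returned by discover_network, and keeping every node (including nodes left without edges) is an assumption of this example.

```python
import networkx as nx

def lag_subgraph(G, lag):
    """Return a DiGraph holding only the edges of G whose 'lag' attribute equals lag."""
    H = nx.DiGraph()
    H.add_nodes_from(G.nodes(data=True))          # keep all nodes and their attributes
    for u, v, attrs in G.edges(data=True):
        if attrs.get("lag") == lag:
            H.add_edge(u, v, cmi=attrs.get("cmi"), p_value=attrs.get("p_value"))
    return H

# Hand-built graph with the same edge attributes as the documented return value
G = nx.MultiDiGraph()
G.add_edge("X0", "X1", lag=1, cmi=0.30, p_value=0.005)
G.add_edge("X0", "X1", lag=2, cmi=0.12, p_value=0.020)
G.add_edge("X1", "X0", lag=1, cmi=0.08, p_value=0.040)

H = lag_subgraph(G, lag=1)   # keeps X0 -> X1 and X1 -> X0 at lag 1
```

Fixing the lag leaves at most one edge per ordered node pair, so a plain DiGraph is enough to hold the result.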
Construct the block companion matrix for a causal network.
This method stores the causal graph in the block structure that this library uses
internally, which is not necessarily the standard graph-theoretic construction.
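The companion matrix is a block-structured matrix used in vector autoregression (VAR)
and dynamical systems analysis. In the standard VAR(p) convention with lag matrices \(A_1, \dots, A_p\), it takes the form:
\[C = \begin{bmatrix} A_1 & A_2 & \cdots & A_{p-1} & A_p \\ I & 0 & \cdots & 0 & 0 \\ 0 & I & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & I & 0 \end{bmatrix}\]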
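As an illustration, the sketch below assembles a companion matrix in the standard VAR(p) layout, with the lag matrices in the top block row and shifted identity blocks underneath. That layout and the name `companion_matrix` are assumptions; the library's own block ordering may differ.

```python
import numpy as np

def companion_matrix(A_list):
    """Stack lag matrices A_1..A_p (each n x n) into an (n*p) x (n*p) companion matrix."""
    p = len(A_list)
    n = A_list[0].shape[0]
    C = np.zeros((n * p, n * p))
    C[:n, :] = np.hstack(A_list)          # top block row: [A_1 ... A_p]
    C[n:, :-n] = np.eye(n * (p - 1))      # identity blocks shift the state down one lag
    return C

A1 = np.array([[0.5, 0.1],
               [0.0, 0.3]])
A2 = np.array([[0.2, 0.0],
               [0.0, 0.1]])
C = companion_matrix([A1, A2])

# Largest eigenvalue modulus; below 1 means the VAR process is stable
spectral_radius = np.max(np.abs(np.linalg.eigvals(C)))
```

Writing a lag-p system as a lag-1 system on the stacked state lets stability be read directly from the eigenvalues of a single matrix.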
This function creates a ROC curve visualization, which is a graphical plot
that illustrates the diagnostic ability of a binary classifier system.
The ROC curve plots the True Positive Rate against the False Positive Rate
at various threshold settings.
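The ROC curve is defined by the parametric equations:
\[\text{TPR}(t) = \frac{\text{TP}(t)}{P}, \qquad \text{FPR}(t) = \frac{\text{FP}(t)}{N}\]
where t is the classification threshold, TP(t) and FP(t) are the numbers of true and
false positives at that threshold, P is the total number of positives,
and N is the total number of negatives.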
Parameters:
TPRs (array-like) – True Positive Rates (Sensitivity, Recall) for different thresholds.
Values should be in [0, 1].
FPRs (array-like) – False Positive Rates (1 - Specificity) for different thresholds.
Values should be in [0, 1].
Notes
ROC Curve Interpretation:
- Perfect classifier: Curve passes through (0, 1) - high TPR, zero FPR
- Random classifier: Diagonal line from (0, 0) to (1, 1)
- Worse-than-random classifier: Curve below the diagonal
Key Points:
- (0, 0): No false positives, but also no true positives (very conservative)
- (1, 1): All positives detected, but all negatives misclassified (very liberal)
- (0, 1): Perfect classification (ideal classifier)
AUC (Area Under Curve):
- AUC = 1.0: Perfect classifier
- AUC = 0.5: Random classifier
- AUC < 0.5: Worse than random (can be inverted)
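The TPR and FPR arrays that this function expects can be produced by sweeping a threshold over per-edge scores. The sketch below is one way to do that; `roc_points`, the toy scores, and the toy labels are all hypothetical and not part of the library.

```python
import numpy as np

def roc_points(scores, labels):
    """Return (fpr, tpr) arrays obtained by sweeping a threshold from high to low."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    P, N = labels.sum(), (~labels).sum()
    # Start above the maximum score, where nothing is predicted positive
    thresholds = np.concatenate(([np.inf], np.sort(np.unique(scores))[::-1]))
    tpr = np.array([(scores >= t)[labels].sum() / P for t in thresholds])
    fpr = np.array([(scores >= t)[~labels].sum() / N for t in thresholds])
    return fpr, tpr

# Toy example: a score per candidate edge and whether the edge exists in the true network
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]
fpr, tpr = roc_points(scores, labels)   # runs from (0, 0) to (1, 1)
```

Lowering the threshold can only add predicted positives, so both rates are non-decreasing along the sweep.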
Plot a causal network from a MultiDiGraph object with publication-quality styling.
This function visualizes a causal network, accounting for edge attributes
like lag, p-value, and conditional mutual information (CMI). It is designed
to produce publication-quality plots with high readability, customizable
styling, and multiple output options.
Parameters:
G (nx.MultiDiGraph) – The causal network graph to plot. Expected to have ‘lag’, ‘cmi’, and
optionally ‘p_value’ as edge attributes.
pos (dict, optional) – A dictionary with nodes as keys and positions as values. If not
provided, an optimized circular layout will be computed.
seed (int, default=7) – Seed for the random number generator used in layout optimization.
figsize (tuple of float, default=(14, 14)) – Figure size in inches (width, height).
dpi (int, optional) – Resolution in dots per inch. If None, uses matplotlib’s default (~100).
Use 300+ for publication quality.
node_size (int, default=8000) – Size of nodes in the plot.
node_color (str, default='white') – Color of nodes.
node_linewidth (float, default=5.0) – Width of node borders.
edge_width_range (tuple of float, default=(1.0, 8.0)) – Range for edge width scaling based on CMI values (min_width, max_width).
arrowsize (int, default=30) – Size of arrow heads on directed edges.
label_fontsize (int, default=16) – Font size for node labels.
title_fontsize (int, default=24) – Font size for the plot title.
colormaps (list of str, optional) – List of matplotlib colormap names to use for different lags.
If None, defaults to [‘Blues’, ‘Greens’, ‘Oranges’, ‘Purples’, ‘Reds’].
If colorblind_safe=True, this is overridden with accessible palettes.
colorblind_safe (bool, default=False) – If True, use colorblind-safe color palettes (viridis, plasma, cividis, etc.).
show_colorbar (bool, default=True) – If True, display a colorbar showing the CMI scale for each lag.
use_pvalue_alpha (bool, default=True) – If True, use p-value to set edge transparency (more significant = more opaque).
Requires ‘p_value’ in edge attributes.
pvalue_threshold (float, default=0.05) – Significance threshold for edge transparency. When use_pvalue_alpha=True,
edges with p >= threshold are drawn with reduced opacity.
show_edge_labels (bool, default=False) – If True, display CMI values as edge labels.
show_statistics (bool, default=True) – If True, display network statistics (nodes, edges, max lag) in a text box.
title (str, default='Discovered Causal Network') – Title for the plot.
save_path (str, optional) – Path to save the figure. If None, figure is not saved.
file_format (str, default='png') – File format for saving (‘png’, ‘pdf’, ‘svg’, ‘eps’, etc.).
transparent (bool, default=False) – If True, save figure with transparent background.
show_plot (bool, default=True) – If True, display the plot using plt.show().
Returns:
fig (matplotlib.figure.Figure) – The figure object containing the plot.
ax (matplotlib.axes.Axes) – The axes object containing the plot.
Notes
Layout Optimization:
Nodes are arranged in a circular layout. If pos is not provided, the
node order is optimized to minimize edge crossings and improve readability
using simulated annealing.
Edge Rendering:
Edges for lag 1 are drawn as straight lines (connectionstyle radius=0).
Edges for lags > 1 are drawn as arcs outside the circle, with arc
radius increasing for higher lags to prevent overlap.
Edge color and width are scaled by the CMI value within each lag group.
Edge transparency (alpha) is modulated by p-value if use_pvalue_alpha=True.
Color Accessibility:
When colorblind_safe=True, the function uses perceptually uniform and
colorblind-accessible colormaps from matplotlib (viridis, plasma, cividis,
inferno, magma). This ensures the plot is accessible to readers with
color vision deficiencies.
P-value Visualization:
If edge attributes include ‘p_value’ and use_pvalue_alpha=True:
- Edges with p < pvalue_threshold are drawn with alpha=1.0 (fully opaque)
- Edges with p >= pvalue_threshold are drawn with alpha=0.3 (translucent)
This provides a visual indication of statistical significance.
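The mapping from edge attributes to drawing styles can be sketched as below. The two opacity levels follow the rule stated above; the linear rescaling of CMI onto edge_width_range and the name `edge_styles` are assumptions made for illustration.

```python
import numpy as np

def edge_styles(cmi, p_values, width_range=(1.0, 8.0), pvalue_threshold=0.05):
    """Map CMI values to line widths and p-values to opacities for one lag group."""
    cmi = np.asarray(cmi, dtype=float)
    p_values = np.asarray(p_values, dtype=float)
    lo, hi = width_range
    span = cmi.max() - cmi.min()
    # Linear rescaling onto [0, 1]; identical CMI values all map to the midpoint
    scaled = (cmi - cmi.min()) / span if span > 0 else np.full_like(cmi, 0.5)
    widths = lo + scaled * (hi - lo)
    # Significant edges are opaque, the rest translucent
    alphas = np.where(p_values < pvalue_threshold, 1.0, 0.3)
    return widths, alphas

widths, alphas = edge_styles(cmi=[0.1, 0.2, 0.3], p_values=[0.01, 0.05, 0.20])
```

An edge with p exactly equal to the threshold falls on the translucent side, matching the p >= pvalue_threshold rule.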
Publication Quality:
- DPI parameter allows control of resolution (use 300-600 for journals)
- Large fonts and thick lines ensure readability when scaled
This function creates a ROC curve visualization, which is a graphical plot
that illustrates the diagnostic ability of a binary classifier system.
The ROC curve plots the True Positive Rate against the False Positive Rate
at various threshold settings.
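The ROC curve is defined by the parametric equations:
\[
\mathrm{TPR}(t) = \frac{\mathrm{TP}(t)}{P}, \qquad
\mathrm{FPR}(t) = \frac{\mathrm{FP}(t)}{N}
\]
where t is the classification threshold, TP(t) and FP(t) are the numbers of
true and false positives at that threshold, P is the total number of positives,
and N is the total number of negatives.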
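To make the threshold sweep concrete, the sketch below builds (FPR, TPR) pairs from a vector of scores and binary labels. The helper `roc_points` and the toy data are hypothetical; the plotting function itself takes precomputed TPRs and FPRs.

```python
import numpy as np


def roc_points(scores, labels):
    """Compute FPR(t) and TPR(t) for every distinct score threshold t."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    P = labels.sum()      # total positives
    N = (~labels).sum()   # total negatives
    if P == 0 or N == 0:
        raise ValueError("need at least one positive and one negative label")
    # Sweep from +inf (nothing predicted positive) down through each score.
    thresholds = np.concatenate(([np.inf], np.unique(scores)[::-1]))
    tprs = np.array([(labels & (scores >= t)).sum() / P for t in thresholds])
    fprs = np.array([(~labels & (scores >= t)).sum() / N for t in thresholds])
    return fprs, tprs, thresholds


# Toy example: three positives and three negatives.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
labels = [1, 1, 0, 1, 0, 0]
fprs, tprs, thresholds = roc_points(scores, labels)
```

Lowering the threshold can only add predicted positives, so both rates are non-decreasing along the sweep and the curve runs from (0, 0) to (1, 1).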
Parameters:
TPRs (array-like) – True Positive Rates (Sensitivity, Recall) for different thresholds.
Values should be in [0, 1].
FPRs (array-like) – False Positive Rates (1 - Specificity) for different thresholds.
Values should be in [0, 1].
Notes
ROC Curve Interpretation:
- Perfect classifier: Curve passes through (0, 1) - high TPR, zero FPR
- Random classifier: Diagonal line from (0, 0) to (1, 1)
- Worse-than-random classifier: Curve below the diagonal (predictions can be inverted)
Key Points:
- (0, 0): No false positives, but also no true positives (very conservative)
- (1, 1): All positives detected, but all negatives misclassified (very liberal)
- (0, 1): Perfect classification (ideal classifier)
AUC (Area Under Curve):
- AUC = 1.0: Perfect classifier
- AUC = 0.5: Random classifier
- AUC < 0.5: Worse than random (can be inverted)
Parameters:
TPRs (array-like) – True Positive Rates (sensitivities) corresponding to different thresholds.
Should be sorted in ascending order of FPR.
FPRs (array-like) – False Positive Rates (1 - specificities) corresponding to different thresholds.
Should be sorted in ascending order.
Returns:
AUC – Area under the ROC curve. Values range from 0 to 1, where:
- 0.5: Random classifier performance
- 1.0: Perfect classifier performance
- 0.0: Perfectly wrong classifier (can be inverted)
Interpretation:
- Probabilistic: Probability that a randomly chosen positive instance
  ranks higher than a randomly chosen negative instance
- Geometric: Area under the ROC curve in TPR-FPR space
- Performance: Single-number summary of classifier quality across thresholds
Advantages:
- Threshold-invariant: Measures prediction quality regardless of classification threshold
- Aggregated: Provides performance summary across all thresholds
Limitations:
- Can be overly optimistic for imbalanced datasets
- Doesn’t reflect class distribution in deployment
- May not align with specific cost considerations
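The geometric and probabilistic readings of AUC can be checked against each other on a small example. The sketch below is illustrative only: `auc_trapezoid` and `auc_pairwise` are hypothetical helpers, and trapezoidal integration is an assumption about how the area might be computed, not a statement of the library's method.

```python
import numpy as np


def auc_trapezoid(TPRs, FPRs):
    """Area under the ROC curve by the trapezoidal rule (FPRs sorted ascending)."""
    tpr = np.asarray(TPRs, dtype=float)
    fpr = np.asarray(FPRs, dtype=float)
    if tpr.ndim != 1 or tpr.shape != fpr.shape:
        raise ValueError("TPRs and FPRs must be 1-D arrays of equal length")
    if np.any(np.diff(fpr) < 0):
        raise ValueError("FPRs must be sorted in ascending order")
    # Sum of trapezoids: width in FPR times mean height in TPR.
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))


def auc_pairwise(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)


# ROC points for scores [0.9, 0.8, 0.7, 0.6, 0.55, 0.4], labels [1, 1, 0, 1, 0, 0].
FPRs = [0, 0, 0, 1 / 3, 1 / 3, 2 / 3, 1]
TPRs = [0, 1 / 3, 2 / 3, 2 / 3, 1, 1, 1]
area = auc_trapezoid(TPRs, FPRs)
rank_prob = auc_pairwise([0.9, 0.8, 0.7, 0.6, 0.55, 0.4], [1, 1, 0, 1, 0, 0])
```

On this toy data both computations give 8/9: eight of the nine positive-negative pairs are ranked correctly.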
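The rates are defined as:
\[
\mathrm{TPR} = \frac{TP}{TP + FN} = \frac{TP}{P}, \qquad
\mathrm{FPR} = \frac{FP}{FP + TN} = \frac{FP}{N}
\]
where: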
- TP: True positives (correctly predicted edges)
- FN: False negatives (missed edges)
- FP: False positives (incorrectly predicted edges)
- TN: True negatives (correctly predicted non-edges)
- P: Total positive edges in ground truth
- N: Total negative edges in ground truth
Parameters:
A (array-like of shape (n, n)) – Ground truth binary adjacency matrix. Should contain only 0s and 1s.
B (array-like of shape (n, n)) – Predicted binary adjacency matrix. Should contain only 0s and 1s
and have the same shape as A.
Returns:
TPR (float) – True Positive Rate (Sensitivity, Recall). Fraction of actual edges
that were correctly identified.
FPR (float) – False Positive Rate (1 - Specificity). Fraction of actual non-edges
that were incorrectly predicted as edges.
Notes
This implementation assumes:
- Matrices are square and binary
- Self-loops are excluded (diagonal elements ignored)
- Matrices represent undirected graphs (symmetric)
Interpretation:
- TPR (Sensitivity): How well the method detects true connections
- FPR (1-Specificity): How often the method falsely detects connections
Performance Assessment:
- High TPR, Low FPR: Excellent performance
- High TPR, High FPR: Sensitive but not specific
- Low TPR, Low FPR: Conservative approach
- Low TPR, High FPR: Poor performance
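Under the assumptions listed in the notes (square binary matrices, diagonal ignored), the two rates can be computed as in the sketch below. The function name `tpr_fpr` and the NaN convention for empty classes are assumptions of this example, not the documented behavior.

```python
import numpy as np


def tpr_fpr(A, B):
    """Compare a predicted adjacency matrix B with ground truth A, ignoring the diagonal."""
    A = np.asarray(A).astype(bool)
    B = np.asarray(B).astype(bool)
    if A.ndim != 2 or A.shape[0] != A.shape[1] or A.shape != B.shape:
        raise ValueError("A and B must be square matrices of the same shape")
    off_diag = ~np.eye(A.shape[0], dtype=bool)  # mask out self-loops
    TP = np.sum(A & B & off_diag)
    FN = np.sum(A & ~B & off_diag)
    FP = np.sum(~A & B & off_diag)
    TN = np.sum(~A & ~B & off_diag)
    P, N = TP + FN, FP + TN
    # Assumed convention: return NaN when a class is empty.
    tpr = TP / P if P else float("nan")
    fpr = FP / N if N else float("nan")
    return float(tpr), float(fpr)


A = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
B = [[0, 1, 1],
     [1, 0, 0],
     [1, 0, 0]]
TPR, FPR = tpr_fpr(A, B)
```

For symmetric matrices every edge is counted twice in both numerator and denominator, so the ratios match those computed from the upper triangle alone.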