causationentropy.core.information.conditional_mutual_information.conditional_mutual_information(X, Y, Z=None, method='gaussian', metric='euclidean', k=6, bandwidth='silverman', kernel='gaussian')[source]
Compute conditional mutual information using the specified estimation method.
This function provides a unified interface for computing conditional mutual information
I(X;Y|Z) using various estimation approaches. The choice of method depends on the
data type, dimensionality, and distributional assumptions.
Conditional mutual information quantifies the information shared between X and Y
when conditioning on Z:
\[I(X; Y | Z) = H(X | Z) - H(X | Y, Z)\]
Equivalently:
\[I(X; Y | Z) = H(X, Z) + H(Y, Z) - H(Z) - H(X, Y, Z)\]
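The two expressions above can be checked against each other on a small discrete example. The sketch below is illustrative only: the toy joint distribution and the helper names `entropy` and `marginal` are assumptions made for this example and are not part of the library.

```python
import math
from collections import defaultdict
from itertools import product

def entropy(pmf):
    """Shannon entropy in nats of a dict mapping outcomes to probabilities."""
    return -sum(p * math.log(p) for p in pmf.values() if p > 0)

def marginal(joint, axes):
    """Marginalize a joint pmf over (x, y, z) tuples onto the given axes."""
    out = defaultdict(float)
    for outcome, p in joint.items():
        out[tuple(outcome[a] for a in axes)] += p
    return dict(out)

# Toy joint distribution (hypothetical): Z is a fair coin,
# X copies Z with probability 0.9, Y copies X with probability 0.8.
joint = {}
for x, y, z in product((0, 1), repeat=3):
    p_z = 0.5
    p_x = 0.9 if x == z else 0.1
    p_y = 0.8 if y == x else 0.2
    joint[(x, y, z)] = p_z * p_x * p_y

H_xyz = entropy(joint)
H_xz = entropy(marginal(joint, (0, 2)))
H_yz = entropy(marginal(joint, (1, 2)))
H_z = entropy(marginal(joint, (2,)))

# Form 1: H(X|Z) - H(X|Y,Z), using H(A|B) = H(A,B) - H(B).
cmi_conditional = (H_xz - H_z) - (H_xyz - H_yz)
# Form 2: H(X,Z) + H(Y,Z) - H(Z) - H(X,Y,Z).
cmi_joint = H_xz + H_yz - H_z - H_xyz

print(f"conditional form: {cmi_conditional:.6f}")
print(f"joint form:       {cmi_joint:.6f}")
```

Both forms agree up to floating-point rounding, since the second is the first with each conditional entropy expanded into joint entropies.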
Parameters:
X (array-like of shape (n_samples, n_features_x)) – First variable.
Y (array-like of shape (n_samples, n_features_y)) – Second variable.
Z (array-like of shape (n_samples, n_features_z) or None) – Conditioning variable. If None, computes marginal mutual information I(X;Y).
Raises:
ValueError – If an unsupported method is specified.
Notes
Method Selection Guidelines:
Gaussian: Best for linear relationships, exact under Gaussianity
KDE: Good for smooth nonlinear dependencies, but suffers from the curse of dimensionality
k-NN: Robust for moderate dimensions, adapts to local density
Geometric k-NN: Effective for manifold data with intrinsic structure
Poisson: Specifically for discrete count data
Computational Complexity:
- Gaussian: O(n³) for matrix operations
- KDE: O(n²) for density evaluation
- k-NN: O(n² log n) for neighbor finding
Sample Size Requirements:
- Increase with dimensionality and complexity of dependencies
- k-NN methods generally require fewer samples than KDE
- Parametric methods (Gaussian) most sample-efficient when assumptions hold
Examples
>>> import numpy as np
>>> from causationentropy.core.information.conditional_mutual_information import conditional_mutual_information
>>>
>>> # Generate sample data
>>> n = 1000
>>> X = np.random.randn(n, 2)
>>> Y = np.random.randn(n, 1)
>>> Z = np.random.randn(n, 1)
>>>
>>> # Compute conditional MI using different methods
>>> cmi_gauss = conditional_mutual_information(X, Y, Z, method='gaussian')
>>> cmi_knn = conditional_mutual_information(X, Y, Z, method='knn', k=3)
>>>
>>> print(f"Gaussian CMI: {cmi_gauss:.3f}")
>>> print(f"k-NN CMI: {cmi_knn:.3f}")
causationentropy.core.information.conditional_mutual_information.gaussian_conditional_mutual_information(X, Y, Z=None)[source]
Compute conditional mutual information for multivariate Gaussian variables.
For multivariate Gaussian variables, the conditional mutual information has
a closed-form expression using covariance matrix determinants:
\[I(X; Y | Z) = \frac{1}{2} \log \frac{|\Sigma_{XZ}| |\Sigma_{YZ}|}{|\Sigma_Z| |\Sigma_{XYZ}|}\]
where \(\Sigma_{\cdot}\) denotes the covariance matrix of the subscripted variables.
Parameters:
X (array-like of shape (N, k_x)) – First variable with N samples and k_x features.
Y (array-like of shape (N, k_y)) – Second variable with N samples and k_y features.
Z (array-like of shape (N, k_z) or None) – Conditioning variable with N samples and k_z features.
If None, computes marginal mutual information I(X;Y).
This implementation uses log-determinants of correlation matrices for
numerical stability, employing the signed log-determinant function
to handle potential numerical issues.
The Gaussian assumption implies that:
- All conditional dependencies are captured by linear relationships
- Higher-order moments beyond covariance carry no information
- The estimator is exact under Gaussianity
For non-Gaussian data, this estimator captures only linear (covariance-based)
dependence and may underestimate the true conditional mutual information.
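The closed-form expression can be sketched with NumPy as follows. This is a minimal illustration, not the library's implementation: the names `logdet_corr` and `gaussian_cmi_sketch` and the synthetic data are assumptions. It uses correlation matrices and `np.linalg.slogdet`, as the notes describe. The determinant ratio is unchanged by per-feature rescaling, so correlation matrices give the same value as covariance matrices.

```python
import numpy as np

def logdet_corr(*blocks):
    """Log-determinant of the correlation matrix of horizontally stacked blocks."""
    data = np.hstack(blocks)
    # corrcoef returns a scalar for a single column, so force a 2-D matrix.
    corr = np.atleast_2d(np.corrcoef(data, rowvar=False))
    _, logdet = np.linalg.slogdet(corr)
    return logdet

def gaussian_cmi_sketch(X, Y, Z=None):
    """Hypothetical helper: Gaussian I(X;Y|Z) in nats, or I(X;Y) when Z is None."""
    if Z is None:
        return 0.5 * (logdet_corr(X) + logdet_corr(Y) - logdet_corr(X, Y))
    return 0.5 * (
        logdet_corr(X, Z) + logdet_corr(Y, Z) - logdet_corr(Z) - logdet_corr(X, Y, Z)
    )

# Synthetic check: X and Y are both noisy copies of Z, so they are
# dependent marginally but (nearly) independent once Z is known.
rng = np.random.default_rng(0)
n = 5000
Z = rng.standard_normal((n, 1))
X = Z + 0.5 * rng.standard_normal((n, 1))
Y = Z + 0.5 * rng.standard_normal((n, 1))

mi_xy = gaussian_cmi_sketch(X, Y)
cmi_xy_given_z = gaussian_cmi_sketch(X, Y, Z)
print(f"I(X;Y)   = {mi_xy:.3f}")
print(f"I(X;Y|Z) = {cmi_xy_given_z:.4f}")
```

In this setup the marginal estimate is clearly positive while the conditional estimate is close to zero, which is the behavior the formula predicts for a common-cause structure.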
causationentropy.core.information.conditional_mutual_information.kde_conditional_mutual_information(X, Y, Z, bandwidth='silverman', kernel='gaussian')[source]
Estimate conditional mutual information using Kernel Density Estimation.
This function computes conditional mutual information using the entropy decomposition:
\[I(X; Y | Z) = H(X, Z) + H(Y, Z) - H(Z) - H(X, Y, Z)\]
where each entropy term is estimated using kernel density estimation.
Parameters:
X (array-like of shape (n_samples, n_features_x)) – First variable.
Y (array-like of shape (n_samples, n_features_y)) – Second variable.
Z (array-like of shape (n_samples, n_features_z) or None) – Conditioning variable. If None, reduces to marginal mutual information.
bandwidth (str or float, default='silverman') – Bandwidth parameter for KDE.
kernel (str, default='gaussian') – Kernel function for density estimation.
Returns:
I – Estimated conditional mutual information in nats.
The KDE approach can capture nonlinear conditional dependencies but suffers from:
- Curse of dimensionality for high-dimensional conditioning sets
- Bandwidth selection sensitivity
- Computational complexity scaling with sample size
Consider k-NN methods for high-dimensional problems or large datasets.
causationentropy.core.information.conditional_mutual_information.knn_conditional_mutual_information(X, Y, Z, metric='minkowski', k=1)[source]
Estimate conditional mutual information using the k-nearest neighbor method.
This function estimates conditional mutual information from mutual information
terms, each computed with the KSG k-NN estimator. The chain rule for mutual
information gives the exact decomposition:
\[I(X; Y | Z) = I(X; Y, Z) - I(X; Z)\]
where \(I(X; Y, Z)\) is the mutual information between X and the joint variable (Y, Z).
Parameters:
X (array-like of shape (n_samples, n_features_x)) – First variable.
Y (array-like of shape (n_samples, n_features_y)) – Second variable.
Z (array-like of shape (n_samples, n_features_z) or None) – Conditioning variable. If None, computes marginal mutual information.
metric (str, default='minkowski') – Distance metric for k-NN calculations.
This implementation uses the decomposition approach rather than direct
conditional MI estimation. The accuracy depends on:
Quality of marginal MI estimates
Dimensionality of the joint space
Sample size relative to effective dimensionality
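The chain rule of mutual information, I(X;Y|Z) = I(X;Y,Z) - I(X;Z), is one exact way to assemble a conditional estimate from two marginal MI estimates. The sketch below checks it numerically. It uses the closed-form Gaussian MI in place of the KSG estimator so the result can be compared with a known value. The helper `gaussian_mi` and the data model are assumptions for illustration, and this is not necessarily the decomposition the library applies internally.

```python
import numpy as np

def gaussian_mi(A, B):
    """Closed-form Gaussian MI in nats from sample covariance log-determinants."""
    def logdet(M):
        cov = np.atleast_2d(np.cov(M, rowvar=False))
        return np.linalg.slogdet(cov)[1]
    return 0.5 * (logdet(A) + logdet(B) - logdet(np.hstack((A, B))))

rng = np.random.default_rng(1)
n = 4000
Z = rng.standard_normal((n, 1))
X = Z + rng.standard_normal((n, 1))
Y = X + Z + rng.standard_normal((n, 1))

# Chain rule: I(X; Y | Z) = I(X; Y, Z) - I(X; Z)
cmi_chain = gaussian_mi(X, np.hstack((Y, Z))) - gaussian_mi(X, Z)

# Given Z, Var(Y) = 2 and Var(Y | X) = 1, so the true value is 0.5 * log(2).
cmi_true = 0.5 * np.log(2.0)
print(f"chain-rule estimate: {cmi_chain:.3f}, analytic value: {cmi_true:.3f}")
```

Because the estimate is a difference of two MI estimates, errors in either term carry over directly, which is why the quality of the marginal estimates matters.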
causationentropy.core.information.conditional_mutual_information.geometric_knn_conditional_mutual_information(X, Y, Z, metric='euclidean', k=1)[source]
Estimate conditional mutual information using the geometric k-nearest neighbor method.
This function applies the geometric k-NN entropy estimator to compute
conditional mutual information via the entropy decomposition:
\[I(X; Y | Z) = H_{\text{geom}}(X, Z) + H_{\text{geom}}(Y, Z) - H_{\text{geom}}(Z) - H_{\text{geom}}(X, Y, Z)\]
The geometric correction accounts for local manifold structure, providing
improved estimates for data with non-uniform density or intrinsic dimensionality
lower than the ambient space.
Parameters:
X (array-like of shape (n_samples, n_features_x)) – First variable.
Y (array-like of shape (n_samples, n_features_y)) – Second variable.
Z (array-like of shape (n_samples, n_features_z) or None) – Conditioning variable. If None, computes marginal mutual information.
metric (str, default='euclidean') – Distance metric for neighbor calculations.
The geometric approach is particularly effective for:
- Data on lower-dimensional manifolds
- Non-uniform density distributions
- Cases where local geometric structure is important
The method accounts for the effective local dimensionality through
geometric corrections to the standard k-NN entropy estimates.
causationentropy.core.information.conditional_mutual_information.poisson_conditional_mutual_information(X, Y, Z)[source]
Estimate conditional mutual information for multivariate Poisson distributions.
This function computes conditional mutual information for discrete count data
assuming Poisson distributions. The estimation uses the covariance structure
of the multivariate Poisson distribution:
\[I(X; Y | Z) = H(X, Z) + H(Y, Z) - H(Z) - H(X, Y, Z)\]
where entropies are computed using Poisson-specific formulations that account
for the discrete nature and parameter structure of Poisson variables.
Parameters:
X (array-like of shape (n_samples, n_features_x)) – Count data for the first set of Poisson variables.
Y (array-like of shape (n_samples, n_features_y)) – Count data for the second set of Poisson variables.
Z (array-like of shape (n_samples, n_features_z) or None) – Count data for the conditioning Poisson variables.
If None, computes marginal mutual information.
Returns:
I – Estimated conditional mutual information for Poisson data.
This implementation is specifically designed for discrete count data where:
- Variables follow Poisson distributions
- Dependencies are captured through covariance structure
- Joint distributions maintain Poisson-like properties
Applications include:
- Gene expression count data
- Event occurrence data
- Discrete interaction networks
- Epidemiological count models
X (array-like of shape (n_samples, n_features)) – Input data for entropy estimation.
bandwidth (str or float, default='silverman') – Bandwidth selection method or explicit bandwidth value.
If ‘silverman’, uses Silverman’s rule of thumb.
kernel (str, default='gaussian') – Kernel function type. Options include ‘gaussian’, ‘tophat’, ‘epanechnikov’,
‘exponential’, ‘linear’, ‘cosine’.
Returns:
H – Estimated differential entropy in nats (natural units).
The KDE entropy estimator can suffer from boundary effects and may be biased
for small sample sizes. The choice of bandwidth critically affects the estimate:
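Too small: Undersmoothed; the density estimate spikes at the sample points, so an estimate evaluated on those same samples tends to underestimate entropy
Too large: Oversmoothed; the density estimate is spread too widely, so entropy tends to be overestimated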
Silverman’s rule provides a reasonable default bandwidth for Gaussian-like data.
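A KDE entropy estimate of this kind can be sketched with NumPy alone. The version below is an assumption-laden illustration rather than the library's code: it uses an isotropic Gaussian kernel, a resubstitution estimate (the density is evaluated at the samples used to fit it), and a multivariate form of Silverman's rule with the average per-feature standard deviation as the scale. The name `kde_entropy_sketch` is hypothetical.

```python
import numpy as np

def kde_entropy_sketch(X, bandwidth=None):
    """Resubstitution entropy estimate (nats) with an isotropic Gaussian kernel."""
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    if bandwidth is None:
        # Silverman's rule of thumb in d dimensions; using the mean
        # per-feature standard deviation as the scale is an assumption.
        sigma = X.std(axis=0, ddof=1).mean()
        bandwidth = sigma * (4.0 / ((d + 2) * n)) ** (1.0 / (d + 4))
    # Squared Euclidean distances between all pairs of samples.
    diff = X[:, None, :] - X[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    # Log of the Gaussian kernel evaluated at every pair.
    log_kernel = (
        -0.5 * sq_dist / bandwidth ** 2
        - 0.5 * d * np.log(2 * np.pi * bandwidth ** 2)
    )
    # log p_hat(x_i) = log-mean-exp over the kernels centered at all samples.
    log_density = np.logaddexp.reduce(log_kernel, axis=1) - np.log(n)
    # Entropy estimate: negative average log-density at the samples.
    return -np.mean(log_density)

rng = np.random.default_rng(0)
samples = rng.standard_normal((1000, 1))
h_kde = kde_entropy_sketch(samples)
h_true = 0.5 * np.log(2 * np.pi * np.e)  # entropy of a standard normal
print(f"KDE estimate: {h_kde:.3f}, analytic value: {h_true:.3f}")
```

Passing an explicit `bandwidth` value to the sketch makes it easy to see how strongly the estimate depends on that choice.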
Estimate entropy using the geometric k-nearest neighbor method.
This function implements the geometric k-NN entropy estimator from Lord, Sun, and Bollt.
The method estimates differential entropy by analyzing the geometric properties of
k-nearest neighbor configurations in the data space.
where \(N\) is the sample size, \(d\) is the dimension, \(\rho_i\) is the
distance to the k-th nearest neighbor of point \(i\), and the geometric correction
accounts for the local geometry of the nearest neighbor configuration.
Parameters:
X (array-like of shape (N, d)) – Input data matrix where N is the number of samples and d is the dimensionality.
Xdist (array-like of shape (N, N)) – Pairwise distance matrix between all points in X.
k (int, default=1) – Number of nearest neighbors to consider for entropy estimation.
Returns:
H_X – Estimated differential entropy using the geometric k-NN method.
High-dimensional data where traditional methods may fail
Data with non-uniform density distributions
Cases where the underlying geometry is important
The geometric correction term accounts for the local dimensionality and shape
of the data manifold, making this estimator more robust than standard k-NN methods.
Estimate entropy for Poisson-distributed random variables.
This function computes the entropy of Poisson random variables with given rate
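parameters. For a Poisson random variable X with parameter λ, the entropy is:
\[H(X) = -\sum_{k=0}^{\infty} P(X = k) \log P(X = k), \qquad P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}\]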
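The entropy of a Poisson variable has no elementary closed form, but it can be evaluated by summing the probability mass function until the remaining terms are negligible. The sketch below does this with the standard library only. The function name `poisson_entropy_sketch` and the stopping tolerance are assumptions for illustration, not the library's implementation.

```python
import math

def poisson_entropy_sketch(lam, tol=1e-12):
    """Entropy in nats of Poisson(lam) by direct summation of -p * log(p)."""
    if lam <= 0:
        return 0.0
    entropy = 0.0
    k = 0
    while True:
        # log P(X = k) = k * log(lam) - lam - log(k!)
        log_p = k * math.log(lam) - lam - math.lgamma(k + 1)
        p = math.exp(log_p)
        entropy -= p * log_p
        # Past the mode the pmf decreases monotonically, so stop once it is tiny.
        if k > lam and p < tol:
            break
        k += 1
    return entropy

for lam in (1.0, 5.0, 50.0):
    gaussian_approx = 0.5 * math.log(2 * math.pi * math.e * lam)
    print(f"lambda={lam:5.1f}  H={poisson_entropy_sketch(lam):.4f}  "
          f"0.5*log(2*pi*e*lambda)={gaussian_approx:.4f}")
```

For large λ the result approaches the Gaussian approximation 0.5 log(2πeλ), which gives a quick sanity check on the summation.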
Estimate joint entropy for multivariate Poisson distributions.
This function computes the joint entropy of a multivariate Poisson distribution
using the covariance matrix structure. The joint entropy decomposes into:
where the first term represents marginal entropies and the second captures
the interaction effects through covariances.
Parameters:
Cov (array-like of shape (n, n)) – Covariance matrix of the multivariate Poisson distribution.
Diagonal elements represent marginal variances (= means for Poisson).
Off-diagonal elements represent covariances between variables.
Returns:
joint_entropy – Estimated joint entropy of the multivariate Poisson distribution.
This decomposition assumes a specific form for multivariate Poisson distributions
where the interaction structure is captured through the covariance terms.
The method:
Computes marginal entropies using diagonal elements (Poisson parameters)
Adds covariance contributions from off-diagonal elements
This approach is computationally efficient for high-dimensional Poisson models.
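For multivariate Gaussian variables, the mutual information has the closed form:
\[I(X; Y) = \frac{1}{2} \log \frac{|\Sigma_X| |\Sigma_Y|}{|\Sigma_{XY}|}\]
where \(\Sigma_X\), \(\Sigma_Y\) are the covariance matrices of X and Y,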
and \(\Sigma_{XY}\) is the joint covariance matrix of the concatenated vector [X, Y].
This implementation uses correlation matrices and their log-determinants for
numerical stability.
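Correlation matrices can replace covariance matrices here because the determinant ratio is invariant to rescaling individual features. The following sketch checks this on deliberately badly scaled data. The helper `half_logdet_ratio` and the synthetic data are assumptions for illustration only.

```python
import numpy as np

def half_logdet_ratio(X, Y, matrix_fn):
    """0.5 * log(|M_X| |M_Y| / |M_XY|), with M built by np.cov or np.corrcoef."""
    def logdet(data):
        M = np.atleast_2d(matrix_fn(data, rowvar=False))
        return np.linalg.slogdet(M)[1]
    return 0.5 * (logdet(X) + logdet(Y) - logdet(np.hstack((X, Y))))

rng = np.random.default_rng(0)
# Two features whose scales differ by six orders of magnitude.
X = rng.standard_normal((2000, 2)) * [1e3, 1e-3]
# Y is the sum of the two standardized features plus unit-variance noise.
Y = X @ np.array([[1e-3], [1e3]]) + rng.standard_normal((2000, 1))

mi_cov = half_logdet_ratio(X, Y, np.cov)
mi_corr = half_logdet_ratio(X, Y, np.corrcoef)
print(f"covariance-based:  {mi_cov:.6f}")
print(f"correlation-based: {mi_corr:.6f}")
```

The two values agree up to rounding. The correlation form works with unit-diagonal matrices whose entries all lie in [-1, 1], regardless of the original feature scales.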
Parameters:
X (array-like of shape (n_samples, n_features_x)) – First multivariate Gaussian variable.
Y (array-like of shape (n_samples, n_features_y)) – Second multivariate Gaussian variable. Must have the same number of samples as X.
This estimator is exact for multivariate Gaussian data and provides the
theoretical benchmark for other mutual information estimators.
The Gaussian assumption implies:
- All marginal and joint distributions are multivariate normal
- Linear relationships capture all dependencies
- Higher-order moments beyond covariance are uninformative
For non-Gaussian data, this estimator captures only linear dependencies
and may underestimate the true mutual information.
causationentropy.core.information.mutual_information.kde_mutual_information(X, Y, bandwidth='silverman', kernel='gaussian')[source]
Estimate mutual information using Kernel Density Estimation.
This function computes mutual information using the relationship:
\[I(X; Y) = H(X) + H(Y) - H(X, Y)\]
where each entropy term is estimated using KDE. The joint entropy H(X,Y)
is computed on the concatenated space [X, Y].
Parameters:
X (array-like of shape (n_samples, n_features_x)) – First variable.
Y (array-like of shape (n_samples, n_features_y)) – Second variable.
bandwidth (str or float, default='silverman') – Bandwidth selection method for kernel density estimation.
kernel (str, default='gaussian') – Kernel function type.
The KDE approach can capture nonlinear dependencies but is sensitive to:
- Bandwidth selection (affects bias-variance tradeoff)
- Curse of dimensionality for high-dimensional data
- Sample size requirements for reliable density estimates
Consider using k-NN methods for high-dimensional data or small samples.
causationentropy.core.information.mutual_information.knn_mutual_information(X, Y, metric='euclidean', k=1)[source]
Estimate mutual information using the k-nearest neighbor (KSG) method.
This function implements the Kraskov-Stögbauer-Grassberger estimator,
which uses k-nearest neighbor statistics to estimate mutual information:
\[I(X; Y) = \psi(k) + \psi(N) - \langle \psi(n_x + 1) + \psi(n_y + 1) \rangle\]
where \(\psi\) is the digamma function, \(N\) is the total number of samples,
\(n_x\) and \(n_y\) are the numbers of neighbors in the marginal spaces
within the distance to the k-th neighbor in the joint space.
Parameters:
X (array-like of shape (n_samples, n_features_x)) – First variable.
Y (array-like of shape (n_samples, n_features_y)) – Second variable.
metric (str, default='euclidean') – Distance metric for neighborhood calculations.
k (int, default=1) – Number of nearest neighbors to consider.
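The neighbor-counting procedure described above can be sketched in a brute-force form. This is a hedged illustration of the first KSG algorithm, not the library's implementation: it uses the max-norm in the joint space (as in the original KSG formulation, whereas this function exposes a `metric` argument), it evaluates digamma only at integers through harmonic sums, and the names `digamma_int` and `ksg_mi_sketch` are hypothetical.

```python
import numpy as np

EULER_GAMMA = 0.5772156649015329

def digamma_int(n):
    """Digamma at a positive integer: psi(n) = -gamma + sum_{j=1}^{n-1} 1/j."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, n))

def ksg_mi_sketch(X, Y, k=3):
    """KSG (algorithm 1) mutual information estimate in nats, O(N^2) memory."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    n = X.shape[0]
    # Max-norm (Chebyshev) distances within each marginal space.
    dx = np.abs(X[:, None, :] - X[None, :, :]).max(axis=-1)
    dy = np.abs(Y[:, None, :] - Y[None, :, :]).max(axis=-1)
    # Max-norm distance in the joint space is the larger marginal distance.
    dz = np.maximum(dx, dy)
    np.fill_diagonal(dz, np.inf)  # a point is not its own neighbor
    eps = np.sort(dz, axis=1)[:, k - 1]  # distance to the k-th joint neighbor
    # Marginal neighbors strictly closer than eps; subtract 1 for the point itself.
    n_x = (dx < eps[:, None]).sum(axis=1) - 1
    n_y = (dy < eps[:, None]).sum(axis=1) - 1
    mean_psi = np.mean(
        [digamma_int(a + 1) + digamma_int(b + 1) for a, b in zip(n_x, n_y)]
    )
    return digamma_int(k) + digamma_int(n) - mean_psi

rng = np.random.default_rng(0)
n = 1000
x = rng.standard_normal((n, 1))
y_dep = 0.8 * x + 0.6 * rng.standard_normal((n, 1))  # correlation 0.8 with x
y_ind = rng.standard_normal((n, 1))                   # independent of x

mi_dep = ksg_mi_sketch(x, y_dep, k=3)
mi_ind = ksg_mi_sketch(x, y_ind, k=3)
mi_true = -0.5 * np.log(1 - 0.8 ** 2)  # analytic MI for bivariate Gaussian
print(f"dependent:   {mi_dep:.3f} (analytic {mi_true:.3f})")
print(f"independent: {mi_ind:.3f} (analytic 0.000)")
```

The pairwise distance matrices make this version practical only for a few thousand samples. A tree-based neighbor search would be the usual choice for larger data.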
This estimator is particularly effective for:
- Data lying on lower-dimensional manifolds
- Non-uniform density distributions
- Cases where local geometry matters
The geometric correction helps account for the intrinsic dimensionality
of the data, potentially providing more accurate estimates than standard k-NN methods.