Information Theory

Information-theoretic measures for causal discovery.

Main Functions

causationentropy.core.information.conditional_mutual_information.conditional_mutual_information(X, Y, Z=None, method='gaussian', metric='euclidean', k=6, bandwidth='silverman', kernel='gaussian')[source]

Compute conditional mutual information using specified estimation method.

This function provides a unified interface for computing conditional mutual information I(X;Y|Z) using various estimation approaches. The choice of method depends on the data type, dimensionality, and distributional assumptions.

Conditional mutual information quantifies the information shared between X and Y when conditioning on Z:

\[I(X; Y | Z) = H(X | Z) - H(X | Y, Z)\]

Equivalently:

\[I(X; Y | Z) = H(X, Z) + H(Y, Z) - H(Z) - H(X, Y, Z)\]
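The two forms agree because each conditional entropy expands into a difference of joint entropies. A short derivation, using only the standard identity H(A | B) = H(A, B) - H(B):

\[\begin{aligned} H(X | Z) &= H(X, Z) - H(Z) \\ H(X | Y, Z) &= H(X, Y, Z) - H(Y, Z) \\ H(X | Z) - H(X | Y, Z) &= H(X, Z) + H(Y, Z) - H(Z) - H(X, Y, Z) \end{aligned}\]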

Parameters:

X (array-like of shape (n_samples, n_features_x)) – First variable.
Y (array-like of shape (n_samples, n_features_y)) – Second variable.
Z (array-like of shape (n_samples, n_features_z) or None) – Conditioning variable. If None, computes marginal mutual information I(X;Y).
method (str, default='gaussian') –
Estimation method. Available options:
- 'gaussian': Assumes multivariate Gaussian distributions
- 'kde' or 'kernel_density': Kernel density estimation
- 'knn': k-nearest neighbor (KSG) estimator
- 'geometric_knn': Geometric k-NN with manifold corrections
- 'poisson': For discrete count data with Poisson assumptions
metric (str, default='euclidean') – Distance metric for k-NN based methods.
k (int, default=6) – Number of nearest neighbors for k-NN methods.
bandwidth (str or float, default='silverman') – Bandwidth parameter for KDE methods.
kernel (str, default='gaussian') – Kernel function for KDE methods.

Returns:

I – Estimated conditional mutual information in nats.

Return type:

float

Raises:

ValueError – If an unsupported method is specified.

Notes

Method Selection Guidelines:

Gaussian: Best for linear relationships, exact under Gaussianity
KDE: Good for smooth nonlinear dependencies, curse of dimensionality
k-NN: Robust for moderate dimensions, adapts to local density
Geometric k-NN: Effective for manifold data with intrinsic structure
Poisson: Specifically for discrete count data

Computational Complexity:

- Gaussian: O(n³) for matrix operations
- KDE: O(n²) for density evaluation
- k-NN: O(n² log n) for neighbor finding

Sample Size Requirements:

- Increase with dimensionality and complexity of dependencies
- k-NN methods generally require fewer samples than KDE
- Parametric methods (Gaussian) are the most sample-efficient when their assumptions hold

Examples

>>> import numpy as np
>>> from causationentropy.core.information.conditional_mutual_information import conditional_mutual_information
>>>
>>> # Generate sample data
>>> n = 1000
>>> X = np.random.randn(n, 2)
>>> Y = np.random.randn(n, 1)
>>> Z = np.random.randn(n, 1)
>>>
>>> # Compute conditional MI using different methods
>>> cmi_gauss = conditional_mutual_information(X, Y, Z, method='gaussian')
>>> cmi_knn = conditional_mutual_information(X, Y, Z, method='knn', k=3)
>>>
>>> print(f"Gaussian CMI: {cmi_gauss:.3f}")
>>> print(f"k-NN CMI: {cmi_knn:.3f}")

causationentropy.core.information.conditional_mutual_information.gaussian_conditional_mutual_information(X, Y, Z=None)[source]

Compute conditional mutual information for multivariate Gaussian variables.

For multivariate Gaussian variables, the conditional mutual information has a closed-form expression using covariance matrix determinants:

\[I(X; Y | Z) = \frac{1}{2} \log \frac{|\Sigma_{XZ}| |\Sigma_{YZ}|}{|\Sigma_Z| |\Sigma_{XYZ}|}\]

This can also be expressed as:

\[I(X; Y | Z) = \frac{1}{2} [\log |\Sigma_{XZ}| + \log |\Sigma_{YZ}| - \log |\Sigma_Z| - \log |\Sigma_{XYZ}|]\]

where \(\Sigma_{\cdot}\) denotes the covariance matrix of the subscripted variables.

Parameters:

X (array-like of shape (N, k_x)) – First variable with N samples and k_x features.
Y (array-like of shape (N, k_y)) – Second variable with N samples and k_y features.
Z (array-like of shape (N, k_z) or None) – Conditioning variable with N samples and k_z features. If None, computes marginal mutual information I(X;Y).

Returns:

I – Conditional mutual information in nats.

Return type:

float

Notes

This implementation uses log-determinants of correlation matrices for numerical stability, employing the signed log-determinant function to handle potential numerical issues.

The Gaussian assumption implies that:

- All conditional dependencies are captured by linear relationships
- Higher-order moments beyond covariance carry no information
- The estimator is exact under Gaussianity

For non-Gaussian data, this estimator captures only linear (covariance-level) dependencies and may underestimate the true conditional mutual information.
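The closed-form expression can be sketched in a few lines of NumPy. This is an illustrative sketch, not the library's implementation: `logdet_cov` and `gaussian_cmi_sketch` are hypothetical names, and the sketch uses sample covariance matrices directly. The determinant ratio is unchanged when each variable is rescaled, so a correlation-based version would give the same value.

```python
import numpy as np

def logdet_cov(*blocks):
    """Log-determinant of the sample covariance of the column-stacked blocks."""
    data = np.hstack(blocks)
    cov = np.atleast_2d(np.cov(data, rowvar=False))
    _sign, logdet = np.linalg.slogdet(cov)
    return logdet

def gaussian_cmi_sketch(X, Y, Z=None):
    """I(X;Y|Z) in nats under a joint-Gaussian assumption (illustrative only)."""
    if Z is None:
        # Marginal case: 0.5 * log(|S_X| |S_Y| / |S_XY|)
        return 0.5 * (logdet_cov(X) + logdet_cov(Y) - logdet_cov(X, Y))
    return 0.5 * (logdet_cov(X, Z) + logdet_cov(Y, Z)
                  - logdet_cov(Z) - logdet_cov(X, Y, Z))

rng = np.random.default_rng(0)
n = 5000
Z = rng.standard_normal((n, 1))
X = Z + 0.5 * rng.standard_normal((n, 1))  # X is driven by Z only
Y = Z + 0.5 * rng.standard_normal((n, 1))  # Y is driven by Z only

mi_xy = gaussian_cmi_sketch(X, Y)        # positive: X and Y share the driver Z
cmi_xy_z = gaussian_cmi_sketch(X, Y, Z)  # near zero: conditioning on Z removes the link
print(f"I(X;Y)   = {mi_xy:.3f}")
print(f"I(X;Y|Z) = {cmi_xy_z:.3f}")
```

In this construction X and Y have correlation 0.8 through the common driver, so the marginal value should sit near -0.5 log(1 - 0.64), while the conditional value should be close to zero.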

Nonparametric Estimators

causationentropy.core.information.conditional_mutual_information.kde_conditional_mutual_information(X, Y, Z, bandwidth='silverman', kernel='gaussian')[source]

Estimate conditional mutual information using Kernel Density Estimation.

This function computes conditional mutual information using the entropy decomposition:

\[I(X; Y | Z) = H(X, Z) + H(Y, Z) - H(Z) - H(X, Y, Z)\]

where each entropy term is estimated using kernel density estimation.

Parameters:

X (array-like of shape (n_samples, n_features_x)) – First variable.
Y (array-like of shape (n_samples, n_features_y)) – Second variable.
Z (array-like of shape (n_samples, n_features_z) or None) – Conditioning variable. If None, reduces to marginal mutual information.
bandwidth (str or float, default='silverman') – Bandwidth parameter for KDE.
kernel (str, default='gaussian') – Kernel function for density estimation.

Returns:

I – Estimated conditional mutual information in nats.

Return type:

float

Notes

The KDE approach can capture nonlinear conditional dependencies but suffers from:

- Curse of dimensionality for high-dimensional conditioning sets
- Bandwidth selection sensitivity
- Computational complexity scaling with sample size

Consider k-NN methods for high-dimensional problems or large datasets.

causationentropy.core.information.conditional_mutual_information.knn_conditional_mutual_information(X, Y, Z, metric='minkowski', k=1)[source]

Estimate conditional mutual information using k-nearest neighbor method.

This function implements conditional mutual information estimation using the relationship:

\[I(X; Y | Z) = I(X; Y, Z) - I(X; Z)\]

where both mutual information terms are estimated using the KSG k-NN estimator.

The approach leverages the chain rule for mutual information:

\[I(X; Y, Z) = I(X; Z) + I(X; Y | Z)\]

Parameters:

X (array-like of shape (n_samples, n_features_x)) – First variable.
Y (array-like of shape (n_samples, n_features_y)) – Second variable.
Z (array-like of shape (n_samples, n_features_z) or None) – Conditioning variable. If None, computes marginal mutual information.
metric (str, default='minkowski') – Distance metric for k-NN calculations.
k (int, default=1) – Number of nearest neighbors.

Returns:

I – Estimated conditional mutual information in nats.

Return type:

float

Notes

This implementation uses the decomposition approach rather than direct conditional MI estimation. The accuracy depends on:

Quality of marginal MI estimates
Dimensionality of the joint space
Sample size relative to effective dimensionality

causationentropy.core.information.conditional_mutual_information.geometric_knn_conditional_mutual_information(X, Y, Z, metric='euclidean', k=1)[source]

Estimate conditional mutual information using geometric k-nearest neighbor method.

This function applies the geometric k-NN entropy estimator to compute conditional mutual information via the entropy decomposition:

\[I(X; Y | Z) = H_{\text{geom}}(X, Z) + H_{\text{geom}}(Y, Z) - H_{\text{geom}}(Z) - H_{\text{geom}}(X, Y, Z)\]

The geometric correction accounts for local manifold structure, providing improved estimates for data with non-uniform density or intrinsic dimensionality lower than the ambient space.

Parameters:

X (array-like of shape (n_samples, n_features_x)) – First variable.
Y (array-like of shape (n_samples, n_features_y)) – Second variable.
Z (array-like of shape (n_samples, n_features_z) or None) – Conditioning variable. If None, computes marginal mutual information.
metric (str, default='euclidean') – Distance metric for neighbor calculations.
k (int, default=1) – Number of nearest neighbors.

Returns:

I – Estimated conditional mutual information using geometric k-NN method.

Return type:

float

Notes

The geometric approach is particularly effective for:

- Data on lower-dimensional manifolds
- Non-uniform density distributions
- Cases where local geometric structure is important

The method accounts for the effective local dimensionality through geometric corrections to the standard k-NN entropy estimates.

Distribution-Specific Estimators

causationentropy.core.information.conditional_mutual_information.poisson_conditional_mutual_information(X, Y, Z)[source]

Estimate conditional mutual information for multivariate Poisson distributions.

This function computes conditional mutual information for discrete count data assuming Poisson distributions. The estimation uses the covariance structure of the multivariate Poisson distribution:

\[I(X; Y | Z) = H(X, Z) + H(Y, Z) - H(Z) - H(X, Y, Z)\]

where entropies are computed using Poisson-specific formulations that account for the discrete nature and parameter structure of Poisson variables.

Parameters:

X (array-like of shape (n_samples, n_features_x)) – Count data from first Poisson variables.
Y (array-like of shape (n_samples, n_features_y)) – Count data from second Poisson variables.
Z (array-like of shape (n_samples, n_features_z) or None) – Count data from conditioning Poisson variables. If None, computes marginal mutual information.

Returns:

I – Estimated conditional mutual information for Poisson data.

Return type:

float

Notes

This implementation is specifically designed for discrete count data where:

- Variables follow Poisson distributions
- Dependencies are captured through covariance structure
- Joint distributions maintain Poisson-like properties

Applications include:

- Gene expression count data
- Event occurrence data
- Discrete interaction networks
- Epidemiological count models

Entropy Functions

causationentropy.core.information.entropy.l2dist(a, b)[source]

Compute the Euclidean (L2) distance between two points.

\[d(a, b) = ||a - b||_2 = \sqrt{\sum_{i=1}^{n} (a_i - b_i)^2}\]

Parameters:

a (array-like) – Input points or vectors.
b (array-like) – Input points or vectors.

Returns:

distance – Euclidean distance between a and b.

Return type:

float

causationentropy.core.information.entropy.hyperellipsoid_check(svd_Yi, Z_i)[source]

Check if points lie within a hyperellipsoid defined by SVD components.

This function determines whether points in Z_i fall within the unit hyperellipsoid defined by the singular value decomposition of Yi.

Parameters:

svd_Yi (tuple) – SVD decomposition (U, S, Vt) of the reference matrix.
Z_i (array-like) – Points to test for inclusion in the hyperellipsoid.

Returns:

inside – True if all points lie within the hyperellipsoid, False otherwise.

Return type:

bool

Notes

This is used in the geometric k-NN entropy estimation to assess the local geometric configuration of nearest neighbors.

causationentropy.core.information.entropy.kde_entropy(X, bandwidth='silverman', kernel='gaussian')[source]

Estimate entropy using Kernel Density Estimation (KDE).

This function computes the differential entropy of a continuous random variable using kernel density estimation. The entropy is defined as:

\[H(X) = -\int f(x) \log f(x) \, dx\]

where \(f(x)\) is the probability density function estimated via KDE:

\[\hat{f}(x) = \frac{1}{n h^d} \sum_{i=1}^{n} K\left(\frac{x - x_i}{h}\right)\]

with kernel function \(K\), bandwidth \(h\), and data dimension \(d\) (n_features); for \(d = 1\) the normalization reduces to \(1/(nh)\).

Parameters:

X (array-like of shape (n_samples, n_features)) – Input data for entropy estimation.
bandwidth (str or float, default='silverman') – Bandwidth selection method or explicit bandwidth value. If ‘silverman’, uses Silverman’s rule of thumb.
kernel (str, default='gaussian') – Kernel function type. Options include ‘gaussian’, ‘tophat’, ‘epanechnikov’, ‘exponential’, ‘linear’, ‘cosine’.

Returns:

H – Estimated differential entropy in nats (natural units).

Return type:

float

Notes

The KDE entropy estimator can suffer from boundary effects and may be biased for small sample sizes. The choice of bandwidth critically affects the estimate:

Too small: Undersmoothed, the density spikes at the samples and entropy is underestimated
Too large: Oversmoothed, the density is spread too widely and entropy is overestimated

Silverman’s rule provides a reasonable default bandwidth for Gaussian-like data.
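A minimal sketch of a KDE entropy estimate, assuming a resubstitution estimator (the average of -log f̂ at the sample points) with an isotropic Gaussian kernel. The name `kde_entropy_sketch` and the specific form of Silverman's rule used here are assumptions for illustration; the library's own bandwidth rule and estimator may differ.

```python
import numpy as np

def kde_entropy_sketch(X, bandwidth="silverman"):
    """Resubstitution entropy estimate in nats with a Gaussian kernel (illustrative)."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    n, d = X.shape
    if isinstance(bandwidth, str):
        # Assumed rule of thumb: h = sigma * (4 / ((d + 2) n))^(1 / (d + 4))
        sigma = X.std(axis=0, ddof=1).mean()
        h = sigma * (4.0 / ((d + 2) * n)) ** (1.0 / (d + 4))
    else:
        h = float(bandwidth)
    # Squared Euclidean distances between all pairs of samples, shape (n, n)
    diff = X[:, None, :] - X[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    # Gaussian kernel normalization 1 / ((2 pi)^(d/2) h^d), kept in log form
    log_norm = -0.5 * d * np.log(2.0 * np.pi) - d * np.log(h)
    density = np.exp(log_norm - 0.5 * sq_dist / h ** 2).mean(axis=1)
    # Entropy is the negative mean log-density at the sample points
    return -np.mean(np.log(density))

rng = np.random.default_rng(0)
x = rng.standard_normal(1000)
h_est = kde_entropy_sketch(x)
h_true = 0.5 * np.log(2.0 * np.pi * np.e)  # entropy of a standard normal
print(f"estimate = {h_est:.3f}, analytic = {h_true:.3f}")
```

Passing a float as `bandwidth` bypasses the rule of thumb, which makes it easy to probe how sensitive the estimate is to this choice.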

causationentropy.core.information.entropy.geometric_knn_entropy(X, Xdist, k=1)[source]

Estimate entropy using geometric k-nearest neighbor method.

This function implements the geometric k-NN entropy estimator from Lord, Sun, and Bollt. The method estimates differential entropy by analyzing the geometric properties of k-nearest neighbor configurations in the data space.

The entropy estimate is given by:

\[H(X) = \log N + \log \frac{\pi^{d/2}}{\Gamma(1 + d/2)} + \frac{d}{N} \sum_{i=1}^{N} \log \rho_i + \text{geometric correction}\]

where \(N\) is the sample size, \(d\) is the dimension, \(\rho_i\) is the distance to the k-th nearest neighbor of point \(i\), and the geometric correction accounts for the local geometry of the nearest neighbor configuration.

Parameters:

X (array-like of shape (N, d)) – Input data matrix where N is the number of samples and d is the dimensionality.
Xdist (array-like of shape (N, N)) – Pairwise distance matrix between all points in X.
k (int, default=1) – Number of nearest neighbors to consider for entropy estimation.

Returns:

H_X – Estimated differential entropy using the geometric k-NN method.

Return type:

float

Notes

This estimator is particularly effective for:

High-dimensional data where traditional methods may fail
Data with non-uniform density distributions
Cases where the underlying geometry is important

The geometric correction term accounts for the local dimensionality and shape of the data manifold, making this estimator more robust than standard k-NN methods.
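The Xdist argument is a precomputed pairwise distance matrix. A minimal sketch of building one with NumPy broadcasting and reading off the k-th neighbor distances \(\rho_i\), assuming Euclidean distance; `pairwise_l2` is a hypothetical helper and not part of the library.

```python
import numpy as np

def pairwise_l2(X):
    """Return the (N, N) matrix of Euclidean distances between rows of X."""
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt(np.sum(diff ** 2, axis=-1))

X = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
Xdist = pairwise_l2(X)

# After sorting each row, column 0 is the zero self-distance,
# so column k holds the distance to the k-th nearest neighbor.
k = 1
rho = np.sort(Xdist, axis=1)[:, k]
print(Xdist)
print(rho)
```

The broadcasted difference uses O(N² d) memory, so for large N a chunked or tree-based neighbor search is usually preferable.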

causationentropy.core.information.entropy.poisson_entropy(lambdas)[source]

Estimate entropy for Poisson-distributed random variables.

This function computes the entropy of Poisson random variables with given rate parameters. For a Poisson random variable X with parameter λ, the entropy is:

\[H(X) = -\sum_{k=0}^{\infty} P(X = k) \log P(X = k)\]

where \(P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}\).

The summation is truncated when the cumulative probability reaches a specified tolerance to ensure numerical stability.

Parameters:

lambdas (array-like) – Rate parameters for the Poisson distributions. Can be scalar or array. Values are automatically converted to absolute values.

Returns:

est – Estimated entropy values in nats. Shape matches the input lambdas.

Return type:

float or array-like

Notes

This implementation:

Uses adaptive truncation based on cumulative probability mass
Handles numerical stability by setting log(0) terms to zero
Returns real values even if complex arithmetic is used internally

The estimator is particularly useful for count data and discrete event processes where Poisson assumptions are appropriate.
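The truncated summation can be sketched with the standard library alone. This is an illustration under stated assumptions rather than the library's code: `poisson_entropy_sketch`, the tolerance `tol`, and the cap on the number of terms are all hypothetical choices.

```python
import math

def poisson_entropy_sketch(lam, tol=1e-12):
    """Entropy (nats) of Poisson(lam), summing -p log p until the mass is covered."""
    lam = abs(float(lam))
    if lam == 0.0:
        return 0.0  # a point mass at zero carries no uncertainty
    log_lam = math.log(lam)
    entropy = 0.0
    cumulative = 0.0
    # Safety cap well beyond the bulk of the distribution (an assumed heuristic)
    max_terms = int(lam + 20.0 * math.sqrt(lam) + 50)
    for k in range(max_terms):
        # log P(X = k), computed in log space so k! never overflows
        log_p = k * log_lam - lam - math.lgamma(k + 1)
        p = math.exp(log_p)
        entropy -= p * log_p
        cumulative += p
        if cumulative >= 1.0 - tol:
            break  # remaining tail mass is below the tolerance
    return entropy

h_small = poisson_entropy_sketch(1.0)
h_large = poisson_entropy_sketch(50.0)
gaussian_approx = 0.5 * math.log(2.0 * math.pi * math.e * 50.0)
print(f"H(Poisson(1))  = {h_small:.4f}")
print(f"H(Poisson(50)) = {h_large:.4f}  (Gaussian approximation {gaussian_approx:.4f})")
```

For large rates the result should approach the Gaussian approximation 0.5 log(2πeλ), which gives a quick sanity check on the truncation.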

causationentropy.core.information.entropy.poisson_joint_entropy(Cov)[source]

Estimate joint entropy for multivariate Poisson distributions.

This function computes the joint entropy of a multivariate Poisson distribution using the covariance matrix structure. The joint entropy decomposes into:

\[H(\mathbf{X}) = \sum_{i} H(X_i) + \sum_{i<j} \text{Cov}(X_i, X_j)\]

where the first term represents marginal entropies and the second captures the interaction effects through covariances.

Parameters:

Cov (array-like of shape (n, n)) – Covariance matrix of the multivariate Poisson distribution. Diagonal elements represent marginal variances (= means for Poisson). Off-diagonal elements represent covariances between variables.

Returns:

joint_entropy – Estimated joint entropy of the multivariate Poisson distribution.

Return type:

float

Notes

This decomposition assumes a specific form for multivariate Poisson distributions where the interaction structure is captured through the covariance terms.

The method:

Computes marginal entropies using diagonal elements (Poisson parameters)
Adds covariance contributions from off-diagonal elements

This approach is computationally efficient for high-dimensional Poisson models.

Mutual Information Functions

causationentropy.core.information.mutual_information.gaussian_mutual_information(X, Y)[source]

Compute mutual information for multivariate Gaussian variables using log-determinants.

For multivariate Gaussian random variables, the mutual information has a closed-form expression in terms of the covariance matrices:

\[I(X; Y) = \frac{1}{2} \log \frac{|\Sigma_X| |\Sigma_Y|}{|\Sigma_{XY}|}\]

where \(\Sigma_X\), \(\Sigma_Y\) are the covariance matrices of X and Y, and \(\Sigma_{XY}\) is the joint covariance matrix of the concatenated vector [X, Y].

This implementation uses correlation matrices and their log-determinants for numerical stability.

Parameters:

X (array-like of shape (n_samples, n_features_x)) – First multivariate Gaussian variable.
Y (array-like of shape (n_samples, n_features_y)) – Second multivariate Gaussian variable. Must have the same number of samples as X.

Returns:

I – Mutual information in nats (natural units).

Return type:

float

Notes

This estimator is exact for multivariate Gaussian data and provides the theoretical benchmark for other mutual information estimators.

The Gaussian assumption implies:

- All marginal and joint distributions are multivariate normal
- Linear relationships capture all dependencies
- Higher-order moments beyond covariance are uninformative

For non-Gaussian data, this estimator captures only linear dependencies and may underestimate the true mutual information.
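As a worked special case of the closed form above, take scalar X and Y with variances \(\sigma_X^2\), \(\sigma_Y^2\) and correlation \(\rho\). Then \(|\Sigma_X| = \sigma_X^2\), \(|\Sigma_Y| = \sigma_Y^2\), and \(|\Sigma_{XY}| = \sigma_X^2 \sigma_Y^2 (1 - \rho^2)\), so the variances cancel:

\[I(X; Y) = \frac{1}{2} \log \frac{\sigma_X^2 \sigma_Y^2}{\sigma_X^2 \sigma_Y^2 (1 - \rho^2)} = -\frac{1}{2} \log (1 - \rho^2)\]

For example, \(\rho = 0.8\) gives approximately 0.511 nats, and \(\rho = 0\) gives zero. The cancellation of the variances is also why working with correlation matrices leaves the result unchanged.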

causationentropy.core.information.mutual_information.kde_mutual_information(X, Y, bandwidth='silverman', kernel='gaussian')[source]

Estimate mutual information using Kernel Density Estimation.

This function computes mutual information using the relationship:

\[I(X; Y) = H(X) + H(Y) - H(X, Y)\]

where each entropy term is estimated using KDE. The joint entropy H(X,Y) is computed on the concatenated space [X, Y].

Parameters:

X (array-like of shape (n_samples, n_features_x)) – First variable.
Y (array-like of shape (n_samples, n_features_y)) – Second variable.
bandwidth (str or float, default='silverman') – Bandwidth selection method for kernel density estimation.
kernel (str, default='gaussian') – Kernel function type.

Returns:

I – Estimated mutual information in nats.

Return type:

float

Notes

The KDE approach can capture nonlinear dependencies but is sensitive to:

- Bandwidth selection (affects bias-variance tradeoff)
- Curse of dimensionality for high-dimensional data
- Sample size requirements for reliable density estimates

Consider using k-NN methods for high-dimensional data or small samples.

causationentropy.core.information.mutual_information.knn_mutual_information(X, Y, metric='euclidean', k=1)[source]

Estimate mutual information using k-nearest neighbor (KSG) method.

This function implements the Kraskov-Stögbauer-Grassberger estimator, which uses k-nearest neighbor statistics to estimate mutual information:

\[I(X; Y) = \psi(k) + \psi(N) - \langle \psi(n_x + 1) + \psi(n_y + 1) \rangle\]

where \(\psi\) is the digamma function, \(N\) is the total number of samples, \(n_x\) and \(n_y\) are the numbers of neighbors in the marginal spaces within the distance to the k-th neighbor in the joint space.

Parameters:

X (array-like of shape (n_samples, n_features_x)) – First variable.
Y (array-like of shape (n_samples, n_features_y)) – Second variable.
metric (str, default='euclidean') – Distance metric for neighborhood calculations.
k (int, default=1) – Number of nearest neighbors to consider.

Returns:

I – Estimated mutual information in nats.

Return type:

float

Notes

The KSG estimator:

Is asymptotically consistent
Adapts to local density variations
Works well for continuous data
Can handle moderate dimensionality

Choice of k involves a bias-variance tradeoff:

- Small k: Lower bias, higher variance
- Large k: Higher bias, lower variance
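The KSG formula can be sketched with brute-force distances in NumPy. This is an illustration, not the library's implementation: `ksg_mi_sketch` and `digamma_int` are hypothetical names, the sketch uses the max-norm from the original KSG construction rather than the `metric` argument, and it assumes continuous data with no duplicate points. Because the digamma function is only needed at positive integers here, it is computed from harmonic numbers.

```python
import numpy as np

def digamma_int(n):
    """Digamma at positive integers: psi(n) = -gamma + sum_{j=1}^{n-1} 1/j."""
    n = np.asarray(n, dtype=int)
    harmonic = np.concatenate(([0.0], np.cumsum(1.0 / np.arange(1, n.max() + 1))))
    return -np.euler_gamma + harmonic[n - 1]

def ksg_mi_sketch(X, Y, k=3):
    """KSG-style mutual information estimate in nats (brute force, illustrative)."""
    X = np.asarray(X, dtype=float).reshape(len(X), -1)
    Y = np.asarray(Y, dtype=float).reshape(len(Y), -1)
    n = len(X)
    # Max-norm distances in each marginal space, shape (n, n)
    dx = np.abs(X[:, None, :] - X[None, :, :]).max(axis=-1)
    dy = np.abs(Y[:, None, :] - Y[None, :, :]).max(axis=-1)
    dz = np.maximum(dx, dy)       # max-norm distance in the joint space
    np.fill_diagonal(dz, np.inf)  # a point is not its own neighbor
    eps = np.sort(dz, axis=1)[:, k - 1]  # distance to the k-th joint neighbor
    # Marginal neighbors strictly closer than eps; subtract 1 to drop the point itself
    nx = np.maximum((dx < eps[:, None]).sum(axis=1) - 1, 0)
    ny = np.maximum((dy < eps[:, None]).sum(axis=1) - 1, 0)
    return (digamma_int(k) + digamma_int(n)
            - np.mean(digamma_int(nx + 1) + digamma_int(ny + 1)))

rng = np.random.default_rng(1)
n = 1000
x = rng.standard_normal((n, 1))
y_dep = 0.8 * x + 0.6 * rng.standard_normal((n, 1))  # correlation 0.8 with x
y_ind = rng.standard_normal((n, 1))                  # independent of x

mi_dep = ksg_mi_sketch(x, y_dep, k=3)
mi_ind = ksg_mi_sketch(x, y_ind, k=3)
print(f"dependent:   {mi_dep:.3f}  (Gaussian reference {-0.5 * np.log(1 - 0.8**2):.3f})")
print(f"independent: {mi_ind:.3f}")
```

The brute-force distance matrices cost O(n²) memory, which is fine for a demonstration but is the reason practical implementations typically use tree-based neighbor searches.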

causationentropy.core.information.mutual_information.geometric_knn_mutual_information(X, Y, metric='euclidean', k=1)[source]

Estimate mutual information using geometric k-nearest neighbor method.

This function applies the geometric k-NN entropy estimator to compute mutual information via the entropy decomposition:

\[I(X; Y) = H_{\text{geom}}(X) + H_{\text{geom}}(Y) - H_{\text{geom}}(X, Y)\]

The geometric correction accounts for local manifold structure and provides improved estimates for data with non-uniform density distributions.

Parameters:

X (array-like of shape (n_samples, n_features_x)) – First variable.
Y (array-like of shape (n_samples, n_features_y)) – Second variable.
metric (str, default='euclidean') – Distance metric for neighbor calculations.
k (int, default=1) – Number of nearest neighbors.

Returns:

I – Estimated mutual information using geometric k-NN method.

Return type:

float

Notes

This estimator is particularly effective for:

- Data lying on lower-dimensional manifolds
- Non-uniform density distributions
- Cases where local geometry matters

The geometric correction helps account for the intrinsic dimensionality of the data, potentially providing more accurate estimates than standard k-NN methods.
