Glossary
This glossary provides definitions of key terms and concepts used throughout the Causation Entropy library and documentation.
- Causal Discovery
The process of inferring causal relationships between variables from observational data, without direct experimental intervention. Distinguished from correlation analysis by attempting to identify directional, mechanistic relationships.
- Optimal Causation Entropy (oCSE)
An information-theoretic measure of causal influence based on conditional mutual information. Quantifies how much information a potential cause provides about an effect, beyond what is already known from other variables.
- Conditional Mutual Information (CMI)
A measure of mutual dependence between two variables given knowledge of a third variable (or set of variables). Mathematically:
\[I(X; Y | Z) = H(X | Z) - H(X | Y, Z)\]
- Conditioning Set
The set of variables \(\mathbf{Z}\) whose values are treated as known when computing conditional mutual information. In causal discovery, this typically includes confounding variables and previously selected predictors.
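As a rough illustration of conditional mutual information and the role of the conditioning set, the sketch below evaluates \(I(X; Y | Z)\) for a small discrete joint distribution using the identity \(H(X | Z) - H(X | Y, Z)\). The toy distributions and the helper names (`marginal`, `entropy`, `conditional_mutual_information`) are assumptions made for this example and are not part of the library's API.

```python
from collections import defaultdict
from math import log2

def marginal(joint, axes):
    """Sum a joint distribution {(x, y, z): p} down to the given axes."""
    out = defaultdict(float)
    for outcome, p in joint.items():
        out[tuple(outcome[a] for a in axes)] += p
    return out

def entropy(dist):
    """Shannon entropy in bits of a distribution {outcome: p}."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

def conditional_mutual_information(joint):
    """I(X; Y | Z) = H(X | Z) - H(X | Y, Z), with axes 0=X, 1=Y, 2=Z."""
    h_z = entropy(marginal(joint, [2]))
    h_xz = entropy(marginal(joint, [0, 2]))
    h_yz = entropy(marginal(joint, [1, 2]))
    h_xyz = entropy(marginal(joint, [0, 1, 2]))
    h_x_given_z = h_xz - h_z        # H(X | Z)
    h_x_given_yz = h_xyz - h_yz     # H(X | Y, Z)
    return h_x_given_z - h_x_given_yz

# Hypothetical case 1: Z is a fair coin and X = Y = Z (common cause).
# Conditioning on Z removes all dependence between X and Y.
common_cause = {(0, 0, 0): 0.5, (1, 1, 1): 0.5}

# Hypothetical case 2: X is a fair coin, Y = X, and Z is an independent coin.
# Conditioning on Z leaves the one bit shared by X and Y untouched.
direct_link = {(x, x, z): 0.25 for x in (0, 1) for z in (0, 1)}

cmi_common_cause = conditional_mutual_information(common_cause)
cmi_direct_link = conditional_mutual_information(direct_link)
```

In the first case the conditioning set explains away the dependence, which is why confounders belong in \(\mathbf{Z}\).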
- False Discovery Rate (FDR)
The expected proportion of false positives among all discoveries (rejected null hypotheses). In causal discovery, this controls the expected fraction of incorrectly identified causal relationships.
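For a single run, the realized false discovery proportion can be computed by comparing inferred edges against a known ground truth; the FDR is the expectation of that proportion over repeated runs. The sketch below assumes edges are represented as `(source, target)` tuples and also reports the FPR and TPR defined elsewhere in this glossary. The function name and the example network are hypothetical.

```python
def discovery_rates(true_edges, inferred_edges, all_possible_edges):
    """Return the false discovery proportion, FPR and TPR for one inferred network."""
    true_edges = set(true_edges)
    inferred_edges = set(inferred_edges)
    tp = len(inferred_edges & true_edges)   # correctly inferred edges
    fp = len(inferred_edges - true_edges)   # inferred edges that do not exist
    fn = len(true_edges - inferred_edges)   # existing edges that were missed
    tn = len(set(all_possible_edges) - true_edges - inferred_edges)
    return {
        "FDR": fp / (fp + tp) if (fp + tp) else 0.0,
        "FPR": fp / (fp + tn) if (fp + tn) else 0.0,
        "TPR": tp / (tp + fn) if (tp + fn) else 0.0,
    }

# Hypothetical three-node example: all directed edges without self-loops.
nodes = [0, 1, 2]
possible = [(i, j) for i in nodes for j in nodes if i != j]
rates = discovery_rates(
    true_edges=[(0, 1), (1, 2)],
    inferred_edges=[(0, 1), (0, 2)],
    all_possible_edges=possible,
)
```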
- False Positive Rate (FPR)
The probability of incorrectly identifying a causal relationship when none exists. Also known as Type I error rate or \(1 - \text{specificity}\).
\[\text{FPR} = \frac{\text{FP}}{\text{FP} + \text{TN}}\]
- Forward Selection
A greedy algorithm phase that iteratively selects the predictor variable with the highest conditional mutual information with the target, subject to statistical significance constraints.
- Backward Elimination
A pruning phase that removes previously selected predictors that no longer maintain statistical significance when conditioned on all other selected variables.
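The two phases can be sketched on a toy discrete dataset as follows. This is a minimal illustration, not the library's implementation: it uses a plug-in conditional mutual information estimate and a fixed threshold in place of a statistical significance test, and every name and value in it is an assumption. The variable `q` is a coarse proxy that looks most informative at first and becomes redundant once the true drivers `a`, `b`, `c` are selected.

```python
from collections import Counter
from itertools import product
from math import log2

def entropy(rows):
    """Plug-in Shannon entropy in bits of a list of hashable outcomes."""
    n = len(rows)
    return -sum(c / n * log2(c / n) for c in Counter(rows).values())

def cmi(data, x, y, cond):
    """I(x; y | cond) = H(x, Z) + H(y, Z) - H(x, y, Z) - H(Z)."""
    def h(names):
        return entropy([tuple(row[k] for k in names) for row in data])
    z = list(cond)
    return h([x] + z) + h([y] + z) - h([x, y] + z) - h(z)

def forward_select(data, target, candidates, threshold):
    """Greedily add the candidate with the highest CMI given those already selected."""
    selected, remaining = [], list(candidates)
    while remaining:
        scores = {c: cmi(data, c, target, selected) for c in remaining}
        best = max(remaining, key=scores.get)
        if scores[best] < threshold:   # stand-in for a significance test
            break
        selected.append(best)
        remaining.remove(best)
    return selected

def backward_eliminate(data, target, selected, threshold):
    """Drop any variable whose CMI given all other kept variables is too small."""
    kept = list(selected)
    for var in list(selected):
        others = [v for v in kept if v != var]
        if cmi(data, var, target, others) < threshold:
            kept.remove(var)
    return kept

# Toy data: y = a + b + c, and q = 1 when at least two of a, b, c are 1.
data = []
for a, b, c in product([0, 1], repeat=3):
    total = a + b + c
    data.append({"a": a, "b": b, "c": c, "q": int(total >= 2), "y": total})

after_forward = forward_select(data, "y", ["q", "a", "b", "c"], threshold=0.01)
after_backward = backward_eliminate(data, "y", after_forward, threshold=0.01)
```

Forward selection picks `q` first because it carries the most information about `y` on its own; backward elimination then removes it because it adds nothing once `a`, `b`, and `c` are known.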
- Granger Causality
A statistical concept of causality based on predictability: X is said to Granger-cause Y if past values of X contain information that helps predict Y beyond what is contained in past values of Y alone.
- Information Criterion
A measure used for model selection that balances goodness of fit against model complexity. Common examples include AIC (Akaike Information Criterion) and BIC (Bayesian Information Criterion).
- k-Nearest Neighbor (k-NN) Estimator
A non-parametric method for estimating probability densities and information measures based on distances to the k-th nearest neighbor in the data space.
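One well-known instance of this idea is the Kozachenko-Leonenko entropy estimator, sketched below for one-dimensional data. The brute-force neighbor search, the integer-only digamma helper, and the choice `k=3` are simplifications assumed for the example; the library's estimators may differ.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329

def digamma_int(n):
    """Digamma at a positive integer: psi(n) = -gamma + sum_{j=1}^{n-1} 1/j."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, n))

def knn_entropy_1d(samples, k=3):
    """Kozachenko-Leonenko differential entropy estimate (nats) for 1-D samples."""
    n = len(samples)
    log_dist_sum = 0.0
    for i, x in enumerate(samples):
        # Brute-force distances from sample i to every other sample.
        dists = sorted(abs(x - other) for j, other in enumerate(samples) if j != i)
        log_dist_sum += math.log(dists[k - 1])  # distance to the k-th neighbor
    unit_ball_volume = 2.0  # length of the interval [-1, 1] in one dimension
    return (digamma_int(n) - digamma_int(k)
            + math.log(unit_ball_volume) + log_dist_sum / n)

# Uniform(0, 1) has differential entropy 0 nats, so the estimate should be near 0.
rng = random.Random(0)
samples = [rng.random() for _ in range(500)]
estimate = knn_entropy_1d(samples, k=3)
```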
- Kernel Density Estimation (KDE)
A non-parametric method for estimating probability density functions by placing kernel functions (typically Gaussian) at each data point and summing their contributions.
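A minimal one-dimensional Gaussian KDE can be written as below. The fixed bandwidth and the three sample points are arbitrary choices for illustration; practical estimators usually select the bandwidth from the data.

```python
import math

def gaussian_kde(points, bandwidth):
    """Return a density function built from one Gaussian kernel per data point."""
    n = len(points)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Sum the kernel contributions of all data points at location x.
        return norm * sum(
            math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points
        )

    return density

density = gaussian_kde([0.0, 1.0, 2.0], bandwidth=0.5)

# Riemann-sum check that the estimate integrates to approximately 1.
step = 0.01
total_mass = sum(density(-5.0 + step * i) for i in range(1200)) * step
```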
- Lag
The time delay \(\tau\) between a potential cause and its effect in time series analysis. A lag of \(\tau\) means the cause variable at time \(t-\tau\) potentially influences the effect variable at time \(t\).
- LASSO (Least Absolute Shrinkage and Selection Operator)
A regularization method that performs variable selection by adding an L1 penalty term to the loss function:
\[\min_\beta \frac{1}{2n}||y - X\beta||_2^2 + \lambda ||\beta||_1\]
- Maximum Lag
The maximum time delay \(\tau_{\max}\) considered in causal discovery. Variables are tested as potential causes at lags \(1, 2, \ldots, \tau_{\max}\).
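To make the lag bookkeeping concrete, the sketch below builds the set of lagged candidate values for each time step, pairing every variable at lags \(1, \ldots, \tau_{\max}\) with the target at time \(t\). The dictionary-based layout and the function name are assumptions for this example only.

```python
def lagged_candidates(series, target, max_lag):
    """Pair each target value at time t with all variables at lags 1..max_lag.

    series: dict mapping variable name -> list of observations (equal lengths).
    Returns (rows, targets), where each row maps (name, lag) -> value at t - lag.
    """
    n = len(series[target])
    rows, targets = [], []
    for t in range(max_lag, n):  # the first max_lag steps lack a full history
        rows.append({
            (name, lag): values[t - lag]
            for name, values in series.items()
            for lag in range(1, max_lag + 1)
        })
        targets.append(series[target][t])
    return rows, targets

# Hypothetical two-variable series.
series = {"x": [0, 1, 2, 3, 4], "y": [10, 11, 12, 13, 14]}
rows, targets = lagged_candidates(series, target="y", max_lag=2)
```

Each row then holds one candidate value per (variable, lag) pair, so a series of length \(n\) yields \(n - \tau_{\max}\) usable rows.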
- Mutual Information
A measure of mutual dependence between two variables, quantifying the amount of information obtained about one variable by observing another:
\[I(X; Y) = H(X) - H(X | Y) = H(Y) - H(Y | X)\]
- Network Inference
The process of reconstructing the structure of a network (graph) from observational data on the nodes. In causal discovery, this involves identifying directed edges representing causal relationships.
- Permutation Test
A non-parametric statistical test that assesses significance by comparing the observed test statistic to a distribution generated by randomly permuting the data under the null hypothesis.
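A generic permutation test can be sketched as follows. For brevity the test statistic here is the absolute Pearson correlation, used as a stand-in; in an information-theoretic setting the statistic would instead be a (conditional) mutual information estimate. The function names, the data, and the default of 199 permutations are assumptions for illustration.

```python
import random

def pearson_abs(x, y):
    """Absolute Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return abs(cov) / (var_x * var_y) ** 0.5

def permutation_p_value(x, y, statistic, n_permutations=199, seed=0):
    """Estimate a p-value by shuffling x, which breaks any dependence on y."""
    rng = random.Random(seed)
    observed = statistic(x, y)
    shuffled = list(x)
    exceed = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if statistic(shuffled, y) >= observed:
            exceed += 1
    # The +1 terms count the observed statistic as one draw from the null.
    return (exceed + 1) / (n_permutations + 1)

# Hypothetical strongly dependent data.
x = list(range(30))
y = [2 * v + (v * 7) % 5 for v in x]
p_value = permutation_p_value(x, y, pearson_abs)
```

The relationship is declared significant when `p_value` falls below the chosen significance level \(\alpha\).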
- Statistical Significance
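A result is statistically significant when it would be unlikely to arise if the null hypothesis (no relationship) were true. Significance is typically assessed by comparing a p-value, the probability under the null hypothesis of obtaining a test statistic at least as extreme as the one observed, to a significance level \(\alpha\) (commonly 0.05).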
- Time Series
A sequence of data points indexed by time, typically collected at successive, equally-spaced points in time.
- Transfer Entropy
An information-theoretic measure of directed information transfer between time series, closely related to Granger causality but based on information theory rather than linear prediction.
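With a history length of one step, transfer entropy from X to Y reduces to the conditional mutual information \(I(Y_t; X_{t-1} | Y_{t-1})\). The sketch below estimates it with a plug-in estimator on binary series; the history length, the estimator, and the synthetic data are assumptions chosen to keep the example short.

```python
import random
from collections import Counter
from math import log2

def entropy(rows):
    """Plug-in Shannon entropy in bits of a list of hashable outcomes."""
    n = len(rows)
    return -sum(c / n * log2(c / n) for c in Counter(rows).values())

def transfer_entropy(source, target):
    """Plug-in TE (bits), history length 1: I(target_t; source_{t-1} | target_{t-1})."""
    triples = list(zip(target[1:], source[:-1], target[:-1]))
    h_tz = entropy([(t, z) for t, _, z in triples])
    h_sz = entropy([(s, z) for _, s, z in triples])
    h_tsz = entropy(triples)
    h_z = entropy([z for _, _, z in triples])
    return h_tz + h_sz - h_tsz - h_z

# Synthetic example: x is a random bit stream and y copies x with a delay of 1.
rng = random.Random(1)
x = [rng.randint(0, 1) for _ in range(2000)]
y = [0] + x[:-1]

te_x_to_y = transfer_entropy(x, y)  # close to 1 bit: x drives y
te_y_to_x = transfer_entropy(y, x)  # close to 0 bits: y tells nothing new about x
```

The asymmetry between the two directions is what makes transfer entropy a directed measure.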
- True Positive Rate (TPR)
The probability of correctly identifying a causal relationship when it exists. Also known as sensitivity, recall, or statistical power.
\[\text{TPR} = \frac{\text{TP}}{\text{TP} + \text{FN}}\]
- Vector Autoregression (VAR)
A multivariate extension of autoregressive models where each variable is regressed on lagged values of itself and all other variables in the system:
\[\mathbf{x}_t = \mathbf{A}_1 \mathbf{x}_{t-1} + \cdots + \mathbf{A}_p \mathbf{x}_{t-p} + \boldsymbol{\epsilon}_t\]
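The recursion above can be simulated directly. The sketch below is a plain-Python illustration with nested lists as matrices; the function name, its parameters, and the coefficient values are hypothetical and chosen only to show how each \(\mathbf{A}_k\) acts on the state \(k\) steps back.

```python
import random

def simulate_var(coeffs, initial, n_steps, noise_std=0.0, seed=0):
    """Simulate x_t = A_1 x_{t-1} + ... + A_p x_{t-p} + eps_t.

    coeffs:  list [A_1, ..., A_p] of square matrices given as nested lists.
    initial: list of p starting state vectors, oldest first.
    Returns the full history, including the initial states.
    """
    rng = random.Random(seed)
    dim = len(initial[0])
    history = [list(v) for v in initial]
    for _ in range(n_steps):
        # Start from the noise term, then add each lagged contribution.
        new = [rng.gauss(0.0, noise_std) for _ in range(dim)]
        for lag, a in enumerate(coeffs, start=1):
            past = history[-lag]
            for i in range(dim):
                new[i] += sum(a[i][j] * past[j] for j in range(dim))
        history.append(new)
    return history

# Hypothetical VAR(1): variable 0 drives variable 1 through the 0.4 entry.
a1 = [[0.5, 0.0],
      [0.4, 0.5]]
noise_free = simulate_var([a1], initial=[[1.0, 0.0]], n_steps=2)
noisy = simulate_var([a1], initial=[[1.0, 0.0]], n_steps=100, noise_std=0.1)
```

The zero in the top-right entry of `a1` encodes the absence of an edge from variable 1 to variable 0, which is the kind of structure causal discovery aims to recover.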
Mathematical Notation
Common mathematical symbols used throughout the documentation:
| Symbol | Meaning |
|---|---|
| \(H(X)\) | Entropy of random variable X |
| \(I(X; Y)\) | Mutual information between X and Y |
| \(I(X; Y \mid Z)\) | Conditional mutual information between X and Y given Z |
| \(X^{(t)}\) | Variable X at time t |
| \(X_i^{(t-\tau)}\) | Variable i at time t-τ (lag τ) |
| \(\mathbf{Z}_i^{(t)}\) | Conditioning set for variable i at time t |
| \(\tau\) | Time lag |
| \(\tau_{\max}\) | Maximum lag considered |
| \(\alpha\) | Significance level (e.g., 0.05) |
| \(\lambda\) | Regularization parameter |
| \(\mathbf{A}\) | Adjacency matrix |
| \(\rho\) | Spectral radius or correlation coefficient |
| \(\epsilon\) | Error term or small constant |
| \(\psi(\cdot)\) | Digamma function |
| \(\Gamma(\cdot)\) | Gamma function |
| \(\lvert\mathbf{M}\rvert\) | Determinant of matrix M |
| \(\mathbf{I}_n\) | n×n identity matrix |
| \(\mathbb{E}[\cdot]\) | Expected value |
| \(\text{Var}(\cdot)\) | Variance |
| \(\text{Cov}(\cdot, \cdot)\) | Covariance |
Abbreviations
| Abbreviation | Full Term |
|---|---|
| oCSE | Optimal Causation Entropy |
| CMI | Conditional Mutual Information |
| MI | Mutual Information |
| KDE | Kernel Density Estimation |
| k-NN | k-Nearest Neighbor |
| KSG | Kraskov-Stögbauer-Grassberger (estimator) |
| LASSO | Least Absolute Shrinkage and Selection Operator |
| VAR | Vector Autoregression |
| AIC | Akaike Information Criterion |
| BIC | Bayesian Information Criterion |
| ROC | Receiver Operating Characteristic |
| AUC | Area Under Curve |
| TPR | True Positive Rate |
| FPR | False Positive Rate |
| FDR | False Discovery Rate |
| TE | Transfer Entropy |
| GC | Granger Causality |