========
Glossary
========

This glossary provides definitions of key terms and concepts used throughout the Causation Entropy library and documentation.

.. glossary::

   Causal Discovery
      The process of inferring causal relationships between variables from observational data, without direct experimental intervention. It differs from correlation analysis in that it attempts to identify directional, mechanistic relationships.

   Optimal Causation Entropy (oCSE)
      An information-theoretic measure of causal influence based on conditional mutual information. Quantifies how much information a potential cause provides about an effect, beyond what is already known from other variables.

   Conditional Mutual Information (CMI)
      A measure of mutual dependence between two variables given knowledge of a third variable (or set of variables). Mathematically:

      .. math::

         I(X; Y | Z) = H(X | Z) - H(X | Y, Z)

   Conditioning Set
      The set of variables :math:`\mathbf{Z}` that are conditioned on when computing conditional mutual information. In causal discovery, this typically includes confounding variables and previously selected predictors.

   False Discovery Rate (FDR)
      The expected proportion of false positives among all discoveries (rejected null hypotheses). In causal discovery, this controls the expected fraction of incorrectly identified causal relationships.

   False Positive Rate (FPR)
      The probability of incorrectly identifying a causal relationship when none exists. Also known as Type I error rate or :math:`1 - \text{specificity}`.

      .. math::

         \text{FPR} = \frac{\text{FP}}{\text{FP} + \text{TN}}

   Forward Selection
      A greedy algorithm phase that iteratively selects the predictor variable with the highest conditional mutual information with the target, subject to statistical significance constraints.

   Backward Elimination
      A pruning phase that removes previously selected predictors that no longer maintain statistical significance when conditioned on all other selected variables.

   Granger Causality
      A statistical concept of causality based on predictability: X is said to Granger-cause Y if past values of X contain information that helps predict Y beyond what is contained in past values of Y alone.

   Information Criterion
      A measure used for model selection that balances goodness of fit against model complexity. Common examples include AIC (Akaike Information Criterion) and BIC (Bayesian Information Criterion).

   k-Nearest Neighbor (k-NN) Estimator
      A non-parametric method for estimating probability densities and information measures based on distances to the k-th nearest neighbor in the data space.

   Kernel Density Estimation (KDE)
      A non-parametric method for estimating probability density functions by placing kernel functions (typically Gaussian) at each data point and summing their contributions.

   Lag
      The time delay :math:`\tau` between a potential cause and its effect in time series analysis. A lag of :math:`\tau` means the cause variable at time :math:`t-\tau` potentially influences the effect variable at time :math:`t`.

   LASSO (Least Absolute Shrinkage and Selection Operator)
      A regularization method that performs variable selection by adding an L1 penalty term to the loss function:

      .. math::

         \min_\beta \frac{1}{2n}||y - X\beta||_2^2 + \lambda ||\beta||_1

   Maximum Lag
      The maximum time delay :math:`\tau_{\max}` considered in causal discovery. Variables are tested as potential causes at lags :math:`1, 2, \ldots, \tau_{\max}`.

   Mutual Information
      A measure of mutual dependence between two variables, quantifying the amount of information obtained about one variable by observing another:

      .. math::

         I(X; Y) = H(X) - H(X | Y) = H(Y) - H(Y | X)

   Network Inference
      The process of reconstructing the structure of a network (graph) from observational data on the nodes. In causal discovery, this involves identifying directed edges representing causal relationships.

   Permutation Test
      A non-parametric statistical test that assesses significance by comparing the observed test statistic to a distribution generated by randomly permuting the data under the null hypothesis.

   Statistical Significance
      A judgment that an observed relationship is unlikely to have arisen by chance alone. It is typically assessed with a p-value, the probability under the null hypothesis of obtaining a test statistic at least as extreme as the one observed, which is compared to a significance level :math:`\alpha` (commonly 0.05).

   Time Series
      A sequence of data points indexed by time, typically collected at successive, equally spaced points in time.

   Transfer Entropy
      An information-theoretic measure of directed information transfer between time series, closely related to Granger causality but based on information theory rather than linear prediction.

   True Positive Rate (TPR)
      The probability of correctly identifying a causal relationship when it exists. Also known as sensitivity, recall, or statistical power.

      .. math::

         \text{TPR} = \frac{\text{TP}}{\text{TP} + \text{FN}}

   Vector Autoregression (VAR)
      A multivariate extension of autoregressive models where each variable is regressed on lagged values of itself and all other variables in the system:

      .. math::

         \mathbf{x}_t = \mathbf{A}_1 \mathbf{x}_{t-1} + \cdots + \mathbf{A}_p \mathbf{x}_{t-p} + \boldsymbol{\epsilon}_t

Mathematical Notation
=====================

Common mathematical symbols used throughout the documentation:

.. list-table:: Mathematical Symbols
   :widths: 15 85
   :header-rows: 1

   * - Symbol
     - Meaning
   * - :math:`H(X)`
     - Entropy of random variable X
   * - :math:`I(X; Y)`
     - Mutual information between X and Y
   * - :math:`I(X; Y | Z)`
     - Conditional mutual information between X and Y given Z
   * - :math:`X^{(t)}`
     - Variable X at time t
   * - :math:`X_i^{(t-\tau)}`
     - Variable i at time t-τ (lag τ)
   * - :math:`\mathbf{Z}_i^{(t)}`
     - Conditioning set for variable i at time t
   * - :math:`\tau`
     - Time lag
   * - :math:`\tau_{\max}`
     - Maximum lag considered
   * - :math:`\alpha`
     - Significance level (e.g., 0.05)
   * - :math:`\lambda`
     - Regularization parameter
   * - :math:`\mathbf{A}`
     - Adjacency matrix
   * - :math:`\rho`
     - Spectral radius or correlation coefficient
   * - :math:`\epsilon`
     - Error term or small constant
   * - :math:`\psi(\cdot)`
     - Digamma function
   * - :math:`\Gamma(\cdot)`
     - Gamma function
   * - :math:`|\mathbf{M}|`
     - Determinant of matrix M
   * - :math:`\mathbf{I}_n`
     - n×n identity matrix
   * - :math:`\mathbb{E}[\cdot]`
     - Expected value
   * - :math:`\text{Var}(\cdot)`
     - Variance
   * - :math:`\text{Cov}(\cdot, \cdot)`
     - Covariance

Abbreviations
=============

.. list-table:: Common Abbreviations
   :widths: 20 80
   :header-rows: 1

   * - Abbreviation
     - Full Term
   * - oCSE
     - optimal Causation Entropy
   * - CMI
     - Conditional Mutual Information
   * - MI
     - Mutual Information
   * - KDE
     - Kernel Density Estimation
   * - k-NN
     - k-Nearest Neighbor
   * - KSG
     - Kraskov-Stögbauer-Grassberger (estimator)
   * - LASSO
     - Least Absolute Shrinkage and Selection Operator
   * - VAR
     - Vector Autoregression
   * - AIC
     - Akaike Information Criterion
   * - BIC
     - Bayesian Information Criterion
   * - ROC
     - Receiver Operating Characteristic
   * - AUC
     - Area Under the Curve
   * - TPR
     - True Positive Rate
   * - FPR
     - False Positive Rate
   * - FDR
     - False Discovery Rate
   * - TE
     - Transfer Entropy
   * - GC
     - Granger Causality
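
Worked Example
==============

The entropy identities given for Mutual Information and Conditional Mutual Information, and the idea behind a Permutation Test, can be checked numerically on a small synthetic data set. The following is a minimal sketch that uses only the Python standard library. All function names (``entropy``, ``mutual_information``, ``conditional_mutual_information``, ``permutation_p_value``, ``noisy_copy``) are hypothetical and are not part of the Causation Entropy library API. The sketch assumes discrete data and uses a simple plug-in (frequency count) entropy estimate rather than the k-NN or KDE estimators defined in this glossary. The data are synthetic: X and Y are both noisy copies of a common binary driver Z.

```python
import math
import random
from collections import Counter


def entropy(*columns):
    """Plug-in joint entropy (in bits) of one or more discrete columns."""
    joint = list(zip(*columns))
    n = len(joint)
    return -sum((c / n) * math.log2(c / n) for c in Counter(joint).values())


def mutual_information(x, y):
    """I(X; Y) = H(X) - H(X | Y), using H(X | Y) = H(X, Y) - H(Y)."""
    return entropy(x) - (entropy(x, y) - entropy(y))


def conditional_mutual_information(x, y, z):
    """I(X; Y | Z) = H(X | Z) - H(X | Y, Z)."""
    h_x_given_z = entropy(x, z) - entropy(z)
    h_x_given_yz = entropy(x, y, z) - entropy(y, z)
    return h_x_given_z - h_x_given_yz


def permutation_p_value(x, y, n_permutations=200, seed=0):
    """P-value for I(X; Y) under the null that X and Y are independent."""
    rng = random.Random(seed)
    observed = mutual_information(x, y)
    shuffled = list(x)
    exceed = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # breaks any pairing between x and y
        if mutual_information(shuffled, y) >= observed:
            exceed += 1
    # Add-one correction keeps the estimate strictly positive.
    return (exceed + 1) / (n_permutations + 1)


def noisy_copy(values, flip_prob, rng):
    """Copy a binary sequence, flipping each bit with probability flip_prob."""
    return [v ^ (rng.random() < flip_prob) for v in values]


# Synthetic data (an assumption for illustration): Z drives both X and Y.
rng = random.Random(42)
z = [rng.randint(0, 1) for _ in range(2000)]
x = noisy_copy(z, 0.1, rng)
y = noisy_copy(z, 0.1, rng)

print("I(X; Y)     =", round(mutual_information(x, y), 4))
print("I(X; Y | Z) =", round(conditional_mutual_information(x, y, z), 4))
print("p-value     =", permutation_p_value(x, y))
```

Because X and Y share the driver Z, their mutual information should be clearly positive and significant under the permutation test, while the conditional mutual information given Z should be close to zero. This is the effect a conditioning set is meant to capture: dependence that is explained by a confounder disappears once that confounder is conditioned on. Note that the permutation test here shuffles X globally, which is appropriate for the unconditional case only.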