Statistical Utilities

causationentropy.core.stats.auc(TPRs, FPRs)[source]

Compute Area Under the ROC Curve (AUC) using trapezoidal integration.

The Area Under the Curve provides a single scalar measure of classifier performance across all classification thresholds. It is computed as:

\[\text{AUC} = \int_0^1 \text{TPR}(\text{FPR}) \, d(\text{FPR})\]

where TPR (True Positive Rate) and FPR (False Positive Rate) define the ROC curve. The integral is approximated using the trapezoidal rule:

\[\text{AUC} \approx \sum_{i=1}^{n-1} \frac{1}{2}[\text{TPR}_i + \text{TPR}_{i+1}][\text{FPR}_{i+1} - \text{FPR}_i]\]

Parameters:

TPRs (array-like) – True Positive Rates (sensitivities) corresponding to different thresholds. Should be sorted in ascending order of FPR.
FPRs (array-like) – False Positive Rates (1 - specificities) corresponding to different thresholds. Should be sorted in ascending order.

Returns:

AUC – Area under the ROC curve. Values range from 0 to 1, where: - 0.5: Random classifier performance - 1.0: Perfect classifier performance - 0.0: Perfectly wrong classifier (can be inverted)

Return type:

float

Notes

The AUC metric provides several interpretations:

Probabilistic: Probability that a randomly chosen positive instance ranks higher than a randomly chosen negative instance
Geometric: Area under the ROC curve in TPR-FPR space
Performance: Single-number summary of classifier quality across thresholds

Advantages: - Scale-invariant: Measures prediction quality regardless of classification threshold - Aggregated: Provides performance summary across all thresholds

Limitations: - Can be overly optimistic for imbalanced datasets - Doesn’t reflect class distribution in deployment - May not align with specific cost considerations

Examples

>>> import numpy as np
>>> from causationentropy.core.stats import auc
>>>
>>> # Perfect classifier
>>> tpr_perfect = np.array([0, 1, 1])
>>> fpr_perfect = np.array([0, 0, 1])
>>> print(f"Perfect AUC: {auc(tpr_perfect, fpr_perfect)}")
>>>
>>> # Random classifier
>>> tpr_random = np.array([0, 0.5, 1])
>>> fpr_random = np.array([0, 0.5, 1])
>>> print(f"Random AUC: {auc(tpr_random, fpr_random)}")

causationentropy.core.stats.Compute_TPR_FPR(A, B)[source]

Compute True Positive Rate and False Positive Rate for binary adjacency matrices.

This function evaluates the performance of a predicted network (B) against a ground truth network (A) by computing standard classification metrics:

\[\text{TPR} = \frac{\text{TP}}{\text{TP} + \text{FN}} = \frac{\text{TP}}{P}\]

\[\text{FPR} = \frac{\text{FP}}{\text{FP} + \text{TN}} = \frac{\text{FP}}{N}\]

where: - TP: True positives (correctly predicted edges) - FN: False negatives (missed edges) - FP: False positives (incorrectly predicted edges) - TN: True negatives (correctly predicted non-edges) - P: Total positive edges in ground truth - N: Total negative edges in ground truth

Parameters:

A (array-like of shape (n, n)) – Ground truth binary adjacency matrix. Should contain only 0s and 1s.
B (array-like of shape (n, n)) – Predicted binary adjacency matrix. Should contain only 0s and 1s and have the same shape as A.

Returns:

TPR (float) – True Positive Rate (Sensitivity, Recall). Fraction of actual edges that were correctly identified.
FPR (float) – False Positive Rate (1 - Specificity). Fraction of actual non-edges that were incorrectly predicted as edges.

Notes

This implementation assumes: - Matrices are square and binary - Self-loops are excluded (diagonal elements ignored) - Matrices represent undirected graphs (symmetric)

Interpretation: - TPR (Sensitivity): How well the method detects true connections - FPR (1-Specificity): How often the method falsely detects connections

Performance Assessment: - High TPR, Low FPR: Excellent performance - High TPR, High FPR: Sensitive but not specific - Low TPR, Low FPR: Conservative approach - Low TPR, High FPR: Poor performance

Applications: - Network reconstruction evaluation - Causal discovery validation - ROC curve generation - Method comparison and benchmarking

Examples

>>> import numpy as np
>>> from causationentropy.core.stats import Compute_TPR_FPR
>>>
>>> # Ground truth: simple 3-node chain
>>> A = np.array([[0, 1, 0],
...               [1, 0, 1],
...               [0, 1, 0]])
>>>
>>> # Perfect prediction
>>> B_perfect = A.copy()
>>> tpr, fpr = Compute_TPR_FPR(A, B_perfect)
>>> print(f"Perfect: TPR={tpr:.2f}, FPR={fpr:.2f}")
>>>
>>> # Overprediction (extra edge)
>>> B_over = np.array([[0, 1, 1],
...                    [1, 0, 1],
...                    [1, 1, 0]])
>>> tpr, fpr = Compute_TPR_FPR(A, B_over)
>>> print(f"Overpredicted: TPR={tpr:.2f}, FPR={fpr:.2f}")