Model Selection and Information Criteria

HIcosmo provides a complete set of model selection tools, including AIC, BIC, and Bayesian evidence computation. These tools help compare how well different cosmological models fit the data while accounting for their complexity.

Information Criteria Overview

| Criterion | Formula | Use Case | Interpretation |
|-----------|---------|----------|----------------|
| AIC | \(\chi^2_{\min} + 2k\) | Small samples, exploratory analysis | Smaller values indicate a better model |
| BIC | \(\chi^2_{\min} + k \ln N\) | Large samples, model selection | Stronger penalty on the number of parameters |
| Bayesian Evidence | \(\int L(\theta) \pi(\theta) d\theta\) | Bayesian model comparison | Accounts for prior volume |

Where:

  • \(k\) = number of free parameters

  • \(N\) = number of data points

  • \(\chi^2_{\min}\) = minimum chi-squared value
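The two information criteria are simple to evaluate once \(\chi^2_{\min}\), \(k\), and \(N\) are known. A minimal sketch with illustrative values (not output of a real fit):

```python
import math

chi2_min = 1387.32  # illustrative minimum chi-squared
k = 4               # number of free parameters
N = 1580            # number of data points

aic = chi2_min + 2 * k            # AIC = chi2_min + 2k
bic = chi2_min + k * math.log(N)  # BIC = chi2_min + k ln N

print(f"AIC = {aic:.2f}, BIC = {bic:.2f}")
```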

Quick Start

Compute information criteria in three lines:

from hicosmo.visualization import information_criteria

result = information_criteria('my_chain')
print(f"AIC = {result['aic']:.2f}, BIC = {result['bic']:.2f}")

Formula Details

Chi-Squared Statistic

For a Gaussian likelihood function:

\[\log L = -\frac{\chi^2}{2} + \text{normalization constant}\]

Where the normalization constant includes:

  • \(-\frac{1}{2} \log |C|\): determinant of the covariance matrix

  • Other constants independent of the parameters

Extracting \(\chi^2\) from the log-likelihood:

\[\chi^2 = -2 \times (\log L - \text{normalization constant})\]

Note

HIcosmo automatically handles the normalization constant, ensuring correct AIC/BIC values. No manual intervention is needed.
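To make the relation concrete, here is the extraction done by hand with illustrative placeholder values (HIcosmo performs this step for you):

```python
log_likelihood_max = 2300.68      # illustrative maximum log-likelihood
normalization_constant = 2994.34  # illustrative -0.5*log|C| + constant terms

# chi2 = -2 * (log L - normalization constant)
chi2_min = -2 * (log_likelihood_max - normalization_constant)
print(f"chi2_min = {chi2_min:.2f}")
```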

AIC (Akaike Information Criterion)

Akaike Information Criterion:

\[\text{AIC} = \chi^2_{\min} + 2k\]

Characteristics:

  • Penalty term is independent of sample size

  • Suitable for small sample cases

  • Tends to favor more complex models

Interpreting ΔAIC:

| ΔAIC | Interpretation |
|------|----------------|
| < 2 | No significant difference |
| 2 - 4 | Weak evidence in favor of the lower-AIC model |
| 4 - 7 | Moderate evidence |
| > 10 | Strong evidence |

BIC (Bayesian Information Criterion)

Bayesian Information Criterion:

\[\text{BIC} = \chi^2_{\min} + k \ln N\]

Characteristics:

  • Penalty term increases with sample size

  • Stronger penalty against overfitting

  • Consistent estimator (selects the true model when the sample is large enough)
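The difference between the two penalties is easy to see numerically: the BIC penalty \(k \ln N\) exceeds the AIC penalty \(2k\) as soon as \(\ln N > 2\), i.e. for \(N \gtrsim 8\). A minimal sketch:

```python
import math

k = 3  # example parameter count
for N in (8, 100, 1580):
    aic_penalty = 2 * k
    bic_penalty = k * math.log(N)
    print(f"N={N:5d}: AIC penalty = {aic_penalty}, BIC penalty = {bic_penalty:.2f}")
```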

Jeffreys Scale Interpretation of ΔBIC:

| ΔBIC | Evidence Strength |
|------|-------------------|
| < 2 | Not worth more than a bare mention |
| 2 - 6 | Positive evidence |
| 6 - 10 | Strong evidence |
| > 10 | Decisive evidence |
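The scale above can be encoded as a small helper (a hypothetical convenience function, not part of HIcosmo's API):

```python
def jeffreys_bic_label(delta_bic: float) -> str:
    """Map |dBIC| to the Jeffreys-scale evidence strength."""
    d = abs(delta_bic)
    if d < 2:
        return "Not worth more than a bare mention"
    elif d < 6:
        return "Positive evidence"
    elif d < 10:
        return "Strong evidence"
    return "Decisive evidence"

print(jeffreys_bic_label(7.3))
```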

Bayesian Evidence

Bayesian evidence (marginal likelihood):

\[Z = \int L(\theta) \pi(\theta) d\theta\]

Where \(\pi(\theta)\) is the prior distribution.

Bayes Factor:

\[B_{12} = \frac{Z_1}{Z_2} = \exp(\log Z_1 - \log Z_2)\]
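Given two log-evidence values, the Bayes factor follows directly from this formula (the numbers below are illustrative):

```python
import math

log_z1 = -540.2  # illustrative log-evidence for model 1
log_z2 = -543.1  # illustrative log-evidence for model 2

delta_log_z = log_z1 - log_z2
bayes_factor = math.exp(delta_log_z)  # B12 = Z1/Z2
print(f"Delta log Z = {delta_log_z:.1f}, B12 = {bayes_factor:.1f}")
```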

Jeffreys Scale:

| Δlog Z | Bayes Factor B | Evidence Strength |
|--------|----------------|-------------------|
| < 1.0 | < 2.7 | Not worth more than a bare mention |
| 1.0 - 2.5 | 2.7 - 12 | Weak evidence |
| 2.5 - 5.0 | 12 - 150 | Moderate evidence |
| > 5.0 | > 150 | Strong evidence |

Warning

HIcosmo uses the Harmonic Mean Estimator to compute Bayesian evidence. This method can be unstable; for accurate Bayesian evidence computation, consider using nested sampling (e.g., dynesty).
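For reference, the harmonic mean estimator approximates \(1/Z \approx \frac{1}{n}\sum_i 1/L(\theta_i)\) over posterior samples, i.e. \(\log Z \approx \log n - \mathrm{logsumexp}(-\log L_i)\). A standalone sketch of the standard estimator (not HIcosmo's internal code) makes the instability visible: the sum is dominated by the lowest-likelihood samples.

```python
import numpy as np

def harmonic_mean_log_evidence(log_likelihoods: np.ndarray) -> float:
    """Harmonic mean estimate of log Z from posterior log-likelihood samples.

    log Z ~= log n - logsumexp(-log L_i); a single very low log L_i can
    dominate the sum, which is why this estimator has large variance.
    """
    neg = -np.asarray(log_likelihoods, dtype=float)
    m = neg.max()
    logsumexp = m + np.log(np.exp(neg - m).sum())  # numerically stable logsumexp
    return np.log(len(log_likelihoods)) - logsumexp

# Sanity check: for a constant likelihood, log Z equals that constant
samples = np.full(1000, 2.5)
print(harmonic_mean_log_evidence(samples))
```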

Handling the Normalization Constant

Why Is the Normalization Constant Needed?

Consider the full form of a Gaussian likelihood function:

\[\log L = -\frac{1}{2} (d - t)^T C^{-1} (d - t) - \frac{1}{2} \log |C| - \frac{N}{2} \log(2\pi)\]

Where:

  • The first term \(-\frac{\chi^2}{2}\) depends on the model parameters

  • The latter two terms are the normalization constant, independent of the model parameters

Problem: If \(\log L\) is used directly to compute AIC/BIC:

# Wrong!
aic = 2*k - 2*log_likelihood  # yields a negative value, because log L still contains the normalization constant

Solution: Extract \(\chi^2\) first, then compute:

# Correct
chi2 = -2 * (log_likelihood - normalization_constant)
aic = chi2 + 2*k  # Positive value

HIcosmo’s Implementation

HIcosmo automatically handles the normalization constant; no manual intervention is needed:

  1. Likelihood functions provide the normalization constant:

class PantheonPlusLikelihood(Likelihood):
    def get_normalization_constant(self) -> float:
        """Return the normalization constant"""
        return -0.5 * self.log_det_cov
  2. Automatically stored when saving MCMC results:

# Normalization constant is automatically included when saving the chain
mcmc.save_results('my_chain')
# metadata contains normalization_constant
  3. Automatically read by information criteria:

result = information_criteria('my_chain')
# Automatically reads normalization_constant from metadata

Impact on Model Comparison

Key conclusion: The normalization constant has no impact on model comparison results!

\[\Delta\text{AIC} = \text{AIC}_1 - \text{AIC}_2 = (\chi^2_1 + 2k_1) - (\chi^2_2 + 2k_2)\]

The normalization constant is the same (using the same data) and cancels out in the difference.

# Verification
delta_aic_without_norm = aic1 - aic2
delta_aic_with_norm = aic1_corrected - aic2_corrected
# Both are equal!

However, the absolute values of AIC/BIC require correct normalization to be meaningful:

  • Reduced \(\chi^2 = \chi^2_{\min}/N \approx 1\) indicates a good fit

  • AIC, BIC must be positive numbers

API Reference

information_criteria Function

from hicosmo.visualization import information_criteria

result = information_criteria(
    samples='my_chain',           # Chain name, path, or sample dictionary
    num_params=None,              # Number of free parameters (auto-inferred)
    num_data=None,                # Number of data points (auto-inferred)
    log_likelihood_fn=None,       # Likelihood function (optional)
    evidence_method='harmonic_mean',  # Evidence estimation method
)

Return Value:

{
    'num_params': 4,                    # Number of free parameters k
    'num_data': 1580,                   # Number of data points N
    'n_samples': 10000,                 # MCMC sample count
    'log_likelihood_max': 2300.68,      # Maximum log-likelihood
    'normalization_constant': 2994.34,  # Normalization constant
    'chi2_min': 1387.32,                # Minimum chi-squared
    'aic': 1395.32,                     # AIC
    'bic': 1416.78,                     # BIC
    'log_evidence': 2298.5,             # log(Bayesian evidence)
    'log_evidence_method': 'harmonic_mean',
}

aic and bic Functions

Compute AIC or BIC independently:

from hicosmo.visualization.information_criteria import aic, bic

# Compute AIC
aic_value = aic(
    log_likelihood_max=2300.68,
    num_params=4,
    normalization_constant=2994.34
)

# Compute BIC
bic_value = bic(
    log_likelihood_max=2300.68,
    num_params=4,
    num_data=1580,
    normalization_constant=2994.34
)

Complete Examples

Single-Model Information Criteria

from hicosmo import MCMC
from hicosmo.models import LCDM
from hicosmo.likelihoods import SN_likelihood
from hicosmo.visualization import information_criteria

# Run MCMC
likelihood = SN_likelihood(LCDM, 'pantheon+')
params = {
    'H0': (67.0, 60.0, 80.0),
    'Omega_m': (0.3, 0.1, 0.5),
}

mcmc = MCMC(params, likelihood, chain_name='lcdm_pantheon')
mcmc.run(num_samples=5000, num_chains=4)

# Compute information criteria
result = information_criteria('lcdm_pantheon')

print(f"Number of free parameters k = {result['num_params']}")
print(f"Number of data points N = {result['num_data']}")
print(f"chi2_min = {result['chi2_min']:.2f}")
print(f"Reduced chi2 = {result['chi2_min']/result['num_data']:.4f}")
print(f"AIC = {result['aic']:.2f}")
print(f"BIC = {result['bic']:.2f}")

Multi-Model Comparison

from hicosmo import MCMC
from hicosmo.models import LCDM, wCDM, CPL
from hicosmo.likelihoods import SN_likelihood
from hicosmo.visualization import information_criteria
import numpy as np

# Define three models
models = {
    'LCDM': (LCDM, ['H0', 'Omega_m']),
    'wCDM': (wCDM, ['H0', 'Omega_m', 'w']),
    'CPL': (CPL, ['H0', 'Omega_m', 'w0', 'wa']),
}

results = {}

for name, (model_class, free_params) in models.items():
    # Build parameter configuration
    params = {
        'H0': (67.0, 60.0, 80.0),
        'Omega_m': (0.3, 0.1, 0.5),
    }
    if 'w' in free_params:
        params['w'] = (-1.0, -2.0, 0.0)
    if 'w0' in free_params:
        params['w0'] = (-1.0, -2.0, 0.0)
        params['wa'] = (0.0, -1.0, 1.0)

    # Run MCMC
    likelihood = SN_likelihood(model_class, 'pantheon+')
    mcmc = MCMC(params, likelihood, chain_name=f'{name.lower()}_chain')
    mcmc.run(num_samples=5000, num_chains=4)

    # Compute information criteria
    results[name] = information_criteria(f'{name.lower()}_chain')

# Print comparison table
print("\nModel Selection Results")
print("=" * 70)
print(f"{'Model':<8} {'k':<4} {'chi2_min':<12} {'AIC':<12} {'BIC':<12} {'DBIC':<10}")
print("-" * 70)

bic_ref = results['LCDM']['bic']
for name, ic in results.items():
    delta_bic = ic['bic'] - bic_ref
    print(f"{name:<8} {ic['num_params']:<4} {ic['chi2_min']:<12.2f} "
          f"{ic['aic']:<12.2f} {ic['bic']:<12.2f} {delta_bic:<+10.2f}")

print("-" * 70)

# Interpret results
print("\nModel Selection Interpretation (Jeffreys Scale):")
for name, ic in results.items():
    if name == 'LCDM':
        continue
    delta = abs(ic['bic'] - bic_ref)
    if delta < 2:
        strength = "Not worth more than a bare mention"
    elif delta < 6:
        strength = "Positive evidence"
    elif delta < 10:
        strength = "Strong evidence"
    else:
        strength = "Decisive evidence"
    better = "LCDM" if ic['bic'] > bic_ref else name
    print(f"  {name} vs LCDM: |DBIC| = {delta:.1f} -> {strength} in favor of {better}")

Bayesian Evidence Comparison

from hicosmo.visualization import information_criteria
import numpy as np

# Assume MCMC has been run for multiple models
chains = ['lcdm_chain', 'wcdm_chain', 'cpl_chain']
names = ['LCDM', 'wCDM', 'CPL']

print("\nBayesian Evidence Comparison")
print("=" * 60)
print(f"{'Model':<8} {'log(Z)':<15} {'Dlog(Z)':<15} {'B':<10}")
print("-" * 60)

results = [information_criteria(c) for c in chains]
log_z_ref = results[0]['log_evidence']

for name, ic in zip(names, results):
    delta = ic['log_evidence'] - log_z_ref
    bayes_factor = np.exp(delta) if abs(delta) < 100 else float('inf')
    print(f"{name:<8} {ic['log_evidence']:<15.4f} {delta:<+15.4f} {bayes_factor:<10.2f}")

print("-" * 60)
print("\nJeffreys Scale: |Dlog(Z)| > 5 indicates strong evidence")

FAQ

Q: Are negative AIC/BIC values correct?

No. AIC = chi2 + 2k and BIC = chi2 + k*ln(N) must be positive, because chi2 >= 0.

If you see negative values, it may be due to using an uncorrected log-likelihood. HIcosmo handles this automatically.
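A quick numerical check of the wrong and correct formulas, using illustrative values:

```python
log_likelihood_max = 2300.68      # includes the normalization constant
normalization_constant = 2994.34  # illustrative value
k = 4

wrong_aic = 2 * k - 2 * log_likelihood_max  # negative: normalization not removed
chi2_min = -2 * (log_likelihood_max - normalization_constant)
correct_aic = chi2_min + 2 * k              # positive, as AIC must be

print(wrong_aic, correct_aic)
```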

Q: Should I use AIC or BIC?

  • AIC: Suitable for small samples and exploratory analysis; lighter penalty on complex models

  • BIC: Suitable for large samples and model selection; stronger penalty against overfitting

In cosmology, BIC is generally recommended, as datasets are typically large (N > 1000).

Q: What if the Bayesian evidence estimate is unstable?

The Harmonic Mean Estimator can indeed be unstable. Recommendations:

  1. Increase the number of MCMC samples

  2. Average over multiple chains

  3. For accurate computation, use nested sampling (e.g., dynesty, polychord)

Q: Does the normalization constant affect model comparison?

No. When comparing models using the same data, the normalization constant is identical and cancels in ΔAIC and ΔBIC.

However, the normalization constant does affect absolute values:

  • Ensures chi2 > 0

  • Ensures AIC, BIC are positive

  • Reduced chi2 ~ 1 indicates a good fit

Next Steps