Model Selection and Information Criteria

HIcosmo provides a complete set of model selection tools, including AIC, BIC, and Bayesian evidence computation. These tools help compare how well different cosmological models fit the data while accounting for their complexity.

Information Criteria Overview

| Criterion | Formula | Use Case | Interpretation |
|-----------|---------|----------|----------------|
| AIC | \(\chi^2_{\min} + 2k\) | Small samples, exploratory analysis | Smaller values indicate a better model |
| BIC | \(\chi^2_{\min} + k \ln N\) | Large samples, model selection | Stronger penalty on the number of parameters |
| Bayesian Evidence | \(\int L(\theta) \pi(\theta) d\theta\) | Bayesian model comparison | Accounts for prior volume |

Where:

  • \(k\) = number of free parameters

  • \(N\) = number of data points

  • \(\chi^2_{\min}\) = minimum chi-squared value
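The two information criteria are simple to evaluate once \(\chi^2_{\min}\), \(k\), and \(N\) are known. A minimal sketch with illustrative values (not output of a real fit):

```python
import math

chi2_min = 1387.32  # illustrative minimum chi-squared
k = 4               # number of free parameters
N = 1580            # number of data points

aic = chi2_min + 2 * k            # AIC = chi2_min + 2k
bic = chi2_min + k * math.log(N)  # BIC = chi2_min + k ln N

print(f"AIC = {aic:.2f}, BIC = {bic:.2f}")
```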

Quick Start

Compute information criteria in three lines:

from hicosmo.visualization import information_criteria

result = information_criteria('my_chain')
print(f"AIC = {result['aic']:.2f}, BIC = {result['bic']:.2f}")

Formula Details

Chi-Squared Statistic

For a Gaussian likelihood function:

\[\log L = -\frac{\chi^2}{2} + \text{normalization constant}\]

Where the normalization constant includes:

  • \(-\frac{1}{2} \log |C|\): determinant of the covariance matrix

  • Other constants independent of the parameters

Extracting \(\chi^2\) from the log-likelihood:

\[\chi^2 = -2 \times (\log L - \text{normalization constant})\]

Note

HIcosmo automatically handles the normalization constant, ensuring correct AIC/BIC values. No manual intervention is needed.
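To make the relation concrete, here is the extraction done by hand with illustrative placeholder values (HIcosmo performs this step for you):

```python
log_likelihood_max = 2300.68      # illustrative maximum log-likelihood
normalization_constant = 2994.34  # illustrative -0.5*log|C| + constant terms

# chi2 = -2 * (log L - normalization constant)
chi2_min = -2 * (log_likelihood_max - normalization_constant)
print(f"chi2_min = {chi2_min:.2f}")
```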

AIC (Akaike Information Criterion)

Akaike Information Criterion:

\[\text{AIC} = \chi^2_{\min} + 2k\]

Characteristics:

  • Penalty term is independent of sample size

  • Suitable for small sample cases

  • Tends to favor more complex models

Interpreting ΔAIC:

| ΔAIC | Interpretation |
|------|----------------|
| < 2 | No significant difference |
| 2 - 4 | Weak evidence in favor of the lower-AIC model |
| 4 - 7 | Moderate evidence |
| > 10 | Strong evidence |

BIC (Bayesian Information Criterion)

Bayesian Information Criterion:

\[\text{BIC} = \chi^2_{\min} + k \ln N\]

Characteristics:

  • Penalty term increases with sample size

  • Stronger penalty against overfitting

  • Consistent estimator (selects the true model when the sample is large enough)
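The difference between the two penalties is easy to see numerically: the BIC penalty \(k \ln N\) exceeds the AIC penalty \(2k\) as soon as \(\ln N > 2\), i.e. for \(N \gtrsim 8\). A minimal sketch:

```python
import math

k = 3  # example parameter count
for N in (8, 100, 1580):
    aic_penalty = 2 * k
    bic_penalty = k * math.log(N)
    print(f"N={N:5d}: AIC penalty = {aic_penalty}, BIC penalty = {bic_penalty:.2f}")
```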

Jeffreys Scale Interpretation of ΔBIC:

| ΔBIC | Evidence Strength |
|------|-------------------|
| < 2 | Not worth more than a bare mention |
| 2 - 6 | Positive evidence |
| 6 - 10 | Strong evidence |
| > 10 | Decisive evidence |
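The scale above can be encoded as a small helper (a hypothetical convenience function, not part of HIcosmo's API):

```python
def jeffreys_bic_label(delta_bic: float) -> str:
    """Map |dBIC| to the Jeffreys-scale evidence strength."""
    d = abs(delta_bic)
    if d < 2:
        return "Not worth more than a bare mention"
    elif d < 6:
        return "Positive evidence"
    elif d < 10:
        return "Strong evidence"
    return "Decisive evidence"

print(jeffreys_bic_label(7.3))
```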

Bayesian Evidence

Bayesian evidence (marginal likelihood):

\[Z = \int L(\theta) \pi(\theta) d\theta\]

Where \(\pi(\theta)\) is the prior distribution.

Bayes Factor:

\[B_{12} = \frac{Z_1}{Z_2} = \exp(\log Z_1 - \log Z_2)\]
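Given two log-evidence values, the Bayes factor follows directly from this formula (the numbers below are illustrative):

```python
import math

log_z1 = -540.2  # illustrative log-evidence for model 1
log_z2 = -543.1  # illustrative log-evidence for model 2

delta_log_z = log_z1 - log_z2
bayes_factor = math.exp(delta_log_z)  # B12 = Z1/Z2
print(f"Delta log Z = {delta_log_z:.1f}, B12 = {bayes_factor:.1f}")
```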

Jeffreys Scale:

| Δlog Z | Bayes Factor B | Evidence Strength |
|--------|----------------|-------------------|
| < 1.0 | < 2.7 | Not worth more than a bare mention |
| 1.0 - 2.5 | 2.7 - 12 | Weak evidence |
| 2.5 - 5.0 | 12 - 150 | Moderate evidence |
| > 5.0 | > 150 | Strong evidence |

Warning

HIcosmo uses the Harmonic Mean Estimator to compute Bayesian evidence. This method can be unstable; for accurate Bayesian evidence computation, consider using nested sampling (e.g., dynesty).
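For reference, the harmonic mean estimator approximates \(1/Z \approx \frac{1}{n}\sum_i 1/L(\theta_i)\) over posterior samples, i.e. \(\log Z \approx \log n - \mathrm{logsumexp}(-\log L_i)\). A standalone sketch of the standard estimator (not HIcosmo's internal code) makes the instability visible: the sum is dominated by the lowest-likelihood samples.

```python
import numpy as np

def harmonic_mean_log_evidence(log_likelihoods: np.ndarray) -> float:
    """Harmonic mean estimate of log Z from posterior log-likelihood samples.

    log Z ~= log n - logsumexp(-log L_i); a single very low log L_i can
    dominate the sum, which is why this estimator has large variance.
    """
    neg = -np.asarray(log_likelihoods, dtype=float)
    m = neg.max()
    logsumexp = m + np.log(np.exp(neg - m).sum())  # numerically stable logsumexp
    return np.log(len(log_likelihoods)) - logsumexp

# Sanity check: for a constant likelihood, log Z equals that constant
samples = np.full(1000, 2.5)
print(harmonic_mean_log_evidence(samples))
```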

Handling the Normalization Constant

Why Is the Normalization Constant Needed?

Consider the full form of a Gaussian likelihood function:

\[\log L = -\frac{1}{2} (d - t)^T C^{-1} (d - t) - \frac{1}{2} \log |C| - \frac{N}{2} \log(2\pi)\]

Where:

  • The first term \(-\frac{\chi^2}{2}\) depends on the model parameters

  • The latter two terms are the normalization constant, independent of the model parameters

Problem: If \(\log L\) is used directly to compute AIC/BIC:

# Wrong!
aic = 2*k - 2*log_likelihood  # yields a negative value, because log L still contains the normalization constant

Solution: Extract \(\chi^2\) first, then compute:

# Correct
chi2 = -2 * (log_likelihood - normalization_constant)
aic = chi2 + 2*k  # Positive value

HIcosmo’s Implementation

HIcosmo automatically handles the normalization constant; no manual intervention is needed:

  1. Likelihood functions provide the normalization constant:

class PantheonPlusLikelihood(Likelihood):
    def get_normalization_constant(self) -> float:
        """Return the normalization constant"""
        return -0.5 * self.log_det_cov
  2. Automatically stored when saving MCMC results:

# Normalization constant is automatically included when saving the chain
mcmc.save_results('my_chain')
# metadata contains normalization_constant
  3. Automatically read by information criteria:

result = information_criteria('my_chain')
# Automatically reads normalization_constant from metadata

Impact on Model Comparison

Key conclusion: The normalization constant has no impact on model comparison results!

\[\Delta\text{AIC} = \text{AIC}_1 - \text{AIC}_2 = (\chi^2_1 + 2k_1) - (\chi^2_2 + 2k_2)\]

The normalization constant is the same (using the same data) and cancels out in the difference.

# Verification
delta_aic_without_norm = aic1 - aic2
delta_aic_with_norm = aic1_corrected - aic2_corrected
# Both are equal!

However, the absolute values of AIC/BIC require correct normalization to be meaningful:

  • Reduced \(\chi^2 = \chi^2_{\min}/N \approx 1\) indicates a good fit

  • AIC, BIC must be positive numbers

API Reference

information_criteria Function

from hicosmo.visualization import information_criteria

result = information_criteria(
    samples='my_chain',           # Chain name, path, or sample dictionary
    num_params=None,              # Number of free parameters (auto-inferred)
    num_data=None,                # Number of data points (auto-inferred)
    log_likelihood_fn=None,       # Likelihood function (optional)
    evidence_method='harmonic_mean',  # Evidence estimation method
)

Return Value:

{
    'num_params': 4,                    # Number of free parameters k
    'num_data': 1580,                   # Number of data points N
    'n_samples': 10000,                 # MCMC sample count
    'log_likelihood_max': 2300.68,      # Maximum log-likelihood
    'normalization_constant': 2994.34,  # Normalization constant
    'chi2_min': 1387.32,                # Minimum chi-squared
    'aic': 1395.32,                     # AIC
    'bic': 1416.78,                     # BIC
    'log_evidence': 2298.5,             # log(Bayesian evidence)
    'log_evidence_method': 'harmonic_mean',
}

aic and bic Functions

Compute AIC or BIC independently:

from hicosmo.visualization.information_criteria import aic, bic

# Compute AIC
aic_value = aic(
    log_likelihood_max=2300.68,
    num_params=4,
    normalization_constant=2994.34
)

# Compute BIC
bic_value = bic(
    log_likelihood_max=2300.68,
    num_params=4,
    num_data=1580,
    normalization_constant=2994.34
)

Complete Examples

Single-Model Information Criteria

from hicosmo import MCMC
from hicosmo.models import LCDM
from hicosmo.likelihoods import SN_likelihood
from hicosmo.visualization import information_criteria

# Run MCMC
likelihood = SN_likelihood(LCDM, 'pantheon+')
params = {
    'H0': (67.0, 60.0, 80.0),
    'Omega_m': (0.3, 0.1, 0.5),
}

mcmc = MCMC(params, likelihood, chain_name='lcdm_pantheon')
mcmc.run(num_samples=5000, num_chains=4)

# Compute information criteria
result = information_criteria('lcdm_pantheon')

print(f"Number of free parameters k = {result['num_params']}")
print(f"Number of data points N = {result['num_data']}")
print(f"chi2_min = {result['chi2_min']:.2f}")
print(f"Reduced chi2 = {result['chi2_min']/result['num_data']:.4f}")
print(f"AIC = {result['aic']:.2f}")
print(f"BIC = {result['bic']:.2f}")

Multi-Model Comparison

from hicosmo import MCMC
from hicosmo.models import LCDM, wCDM, CPL
from hicosmo.likelihoods import SN_likelihood
from hicosmo.visualization import information_criteria
import numpy as np

# Define three models
models = {
    'LCDM': (LCDM, ['H0', 'Omega_m']),
    'wCDM': (wCDM, ['H0', 'Omega_m', 'w']),
    'CPL': (CPL, ['H0', 'Omega_m', 'w0', 'wa']),
}

results = {}

for name, (model_class, free_params) in models.items():
    # Build parameter configuration
    params = {
        'H0': (67.0, 60.0, 80.0),
        'Omega_m': (0.3, 0.1, 0.5),
    }
    if 'w' in free_params:
        params['w'] = (-1.0, -2.0, 0.0)
    if 'w0' in free_params:
        params['w0'] = (-1.0, -2.0, 0.0)
        params['wa'] = (0.0, -1.0, 1.0)

    # Run MCMC
    likelihood = SN_likelihood(model_class, 'pantheon+')
    mcmc = MCMC(params, likelihood, chain_name=f'{name.lower()}_chain')
    mcmc.run(num_samples=5000, num_chains=4)

    # Compute information criteria
    results[name] = information_criteria(f'{name.lower()}_chain')

# Print comparison table
print("\nModel Selection Results")
print("=" * 70)
print(f"{'Model':<8} {'k':<4} {'chi2_min':<12} {'AIC':<12} {'BIC':<12} {'DBIC':<10}")
print("-" * 70)

bic_ref = results['LCDM']['bic']
for name, ic in results.items():
    delta_bic = ic['bic'] - bic_ref
    print(f"{name:<8} {ic['num_params']:<4} {ic['chi2_min']:<12.2f} "
          f"{ic['aic']:<12.2f} {ic['bic']:<12.2f} {delta_bic:<+10.2f}")

print("-" * 70)

# Interpret results
print("\nModel Selection Interpretation (Jeffreys Scale):")
for name, ic in results.items():
    if name == 'LCDM':
        continue
    delta = abs(ic['bic'] - bic_ref)
    if delta < 2:
        strength = "Not worth more than a bare mention"
    elif delta < 6:
        strength = "Positive evidence"
    elif delta < 10:
        strength = "Strong evidence"
    else:
        strength = "Decisive evidence"
    better = "LCDM" if ic['bic'] > bic_ref else name
    print(f"  {name} vs LCDM: |DBIC| = {delta:.1f} -> {strength} in favor of {better}")

Bayesian Evidence Comparison

from hicosmo.visualization import information_criteria
import numpy as np

# Assume MCMC has been run for multiple models
chains = ['lcdm_chain', 'wcdm_chain', 'cpl_chain']
names = ['LCDM', 'wCDM', 'CPL']

print("\nBayesian Evidence Comparison")
print("=" * 60)
print(f"{'Model':<8} {'log(Z)':<15} {'Dlog(Z)':<15} {'B':<10}")
print("-" * 60)

results = [information_criteria(c) for c in chains]
log_z_ref = results[0]['log_evidence']

for name, ic in zip(names, results):
    delta = ic['log_evidence'] - log_z_ref
    bayes_factor = np.exp(delta) if abs(delta) < 100 else float('inf')
    print(f"{name:<8} {ic['log_evidence']:<15.4f} {delta:<+15.4f} {bayes_factor:<10.2f}")

print("-" * 60)
print("\nJeffreys Scale: |Dlog(Z)| > 5 indicates strong evidence")

FAQ

Q: Are negative AIC/BIC values correct?

No. AIC = chi2 + 2k and BIC = chi2 + k*ln(N) must be positive, because chi2 >= 0.

If you see negative values, it may be due to using an uncorrected log-likelihood. HIcosmo handles this automatically.
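A quick numerical check of the wrong and correct formulas, using illustrative values:

```python
log_likelihood_max = 2300.68      # includes the normalization constant
normalization_constant = 2994.34  # illustrative value
k = 4

wrong_aic = 2 * k - 2 * log_likelihood_max  # negative: normalization not removed
chi2_min = -2 * (log_likelihood_max - normalization_constant)
correct_aic = chi2_min + 2 * k              # positive, as AIC must be

print(wrong_aic, correct_aic)
```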

Q: Should I use AIC or BIC?

  • AIC: Suitable for small samples and exploratory analysis; lighter penalty on complex models

  • BIC: Suitable for large samples and model selection; stronger penalty against overfitting

In cosmology, BIC is generally recommended, as datasets are typically large (N > 1000).

Q: What if the Bayesian evidence estimate is unstable?

The Harmonic Mean Estimator can indeed be unstable. Recommendations:

  1. Increase the number of MCMC samples

  2. Average over multiple chains

  3. For accurate computation, use nested sampling (e.g., dynesty, polychord)

Q: Does the normalization constant affect model comparison?

No. When comparing models using the same data, the normalization constant is identical and cancels in ΔAIC and ΔBIC.

However, the normalization constant does affect absolute values:

  • Ensures chi2 > 0

  • Ensures AIC, BIC are positive

  • Reduced chi2 ~ 1 indicates a good fit

Next Steps