Model Selection and Information Criteria
========================================

HIcosmo provides a complete set of model selection tools, including AIC, BIC, and Bayesian evidence computation. These tools help compare the goodness of fit of different cosmological models.

Information Criteria Overview
-----------------------------

.. list-table::
   :header-rows: 1
   :widths: 15 25 30 30

   * - Criterion
     - Formula
     - Use Case
     - Interpretation
   * - **AIC**
     - :math:`\chi^2_{\min} + 2k`
     - Small samples, exploratory analysis
     - Smaller values indicate a better model
   * - **BIC**
     - :math:`\chi^2_{\min} + k \ln N`
     - Large samples, model selection
     - Stronger penalty on the number of parameters
   * - **Bayesian Evidence**
     - :math:`\int L(\theta) \pi(\theta) d\theta`
     - Bayesian model comparison
     - Accounts for prior volume

Where:

- :math:`k` = number of free parameters
- :math:`N` = number of data points
- :math:`\chi^2_{\min}` = minimum chi-squared value

Quick Start
-----------

**3 lines to compute information criteria**:

.. code-block:: python

   from hicosmo.visualization import information_criteria

   result = information_criteria('my_chain')
   print(f"AIC = {result['aic']:.2f}, BIC = {result['bic']:.2f}")

Formula Details
---------------

Chi-Squared Statistic
~~~~~~~~~~~~~~~~~~~~~

For a Gaussian likelihood function:

.. math::

   \log L = -\frac{\chi^2}{2} + \text{normalization constant}

Where the normalization constant includes:

- :math:`-\frac{1}{2} \log |C|`: determinant of the covariance matrix
- Other constants independent of the parameters

Extracting :math:`\chi^2` from the log-likelihood:

.. math::

   \chi^2 = -2 \times (\log L - \text{normalization constant})

.. note::

   HIcosmo automatically handles the normalization constant, ensuring correct AIC/BIC values. No manual intervention is needed.

AIC (Akaike Information Criterion)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Akaike Information Criterion:

.. math::

   \text{AIC} = \chi^2_{\min} + 2k

**Characteristics**:

- Penalty term is independent of sample size
- Suitable for small sample cases
- Tends to favor more complex models

**Interpreting ΔAIC**:

.. list-table::
   :header-rows: 1
   :widths: 30 70

   * - ΔAIC
     - Interpretation
   * - < 2
     - No significant difference
   * - 2 - 4
     - Weak evidence in favor of the lower-AIC model
   * - 4 - 7
     - Moderate evidence
   * - > 10
     - Strong evidence

BIC (Bayesian Information Criterion)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Bayesian Information Criterion:

.. math::

   \text{BIC} = \chi^2_{\min} + k \ln N

**Characteristics**:

- Penalty term increases with sample size
- Stronger penalty against overfitting
- Consistent estimator (selects the true model when the sample is large enough)

**Jeffreys Scale Interpretation of ΔBIC**:

.. list-table::
   :header-rows: 1
   :widths: 30 70

   * - ΔBIC
     - Evidence Strength
   * - < 2
     - Not worth more than a bare mention
   * - 2 - 6
     - Positive evidence
   * - 6 - 10
     - Strong evidence
   * - > 10
     - Decisive evidence

Bayesian Evidence
~~~~~~~~~~~~~~~~~

Bayesian evidence (marginal likelihood):

.. math::

   Z = \int L(\theta) \pi(\theta) d\theta

Where :math:`\pi(\theta)` is the prior distribution.

**Bayes Factor**:

.. math::

   B_{12} = \frac{Z_1}{Z_2} = \exp(\log Z_1 - \log Z_2)

**Jeffreys Scale**:

.. list-table::
   :header-rows: 1
   :widths: 30 30 40

   * - Δlog Z
     - Bayes Factor B
     - Evidence Strength
   * - < 1.0
     - < 2.7
     - Not worth more than a bare mention
   * - 1.0 - 2.5
     - 2.7 - 12
     - Weak evidence
   * - 2.5 - 5.0
     - 12 - 150
     - Moderate evidence
   * - > 5.0
     - > 150
     - Strong evidence

.. warning::

   HIcosmo uses the **Harmonic Mean Estimator** to compute Bayesian evidence. This method can be unstable; for accurate Bayesian evidence computation, consider using nested sampling (e.g., dynesty).

Handling the Normalization Constant
-----------------------------------

Why Is the Normalization Constant Needed?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Consider the full form of a Gaussian likelihood function:

.. math::

   \log L = -\frac{1}{2} (d - t)^T C^{-1} (d - t) - \frac{1}{2} \log |C| - \frac{N}{2} \log(2\pi)

Where:

- The first term, equal to :math:`-\frac{\chi^2}{2}`, depends on the model parameters
- The latter two terms are the normalization constant, independent of the model parameters

**Problem**: If :math:`\log L` is used directly to compute AIC/BIC:

.. code-block:: python

   # Wrong!
   aic = 2*k - 2*log_likelihood  # Will yield negative values

**Solution**: Extract :math:`\chi^2` first, then compute:

.. code-block:: python

   # Correct
   chi2 = -2 * (log_likelihood - normalization_constant)
   aic = chi2 + 2*k  # Positive value

HIcosmo's Implementation
~~~~~~~~~~~~~~~~~~~~~~~~

HIcosmo automatically handles the normalization constant; no manual intervention is needed:

1. **Likelihood functions provide the normalization constant**:

   .. code-block:: python

      class PantheonPlusLikelihood(Likelihood):
          def get_normalization_constant(self) -> float:
              """Return the normalization constant"""
              return -0.5 * self.log_det_cov

2. **Automatically stored when saving MCMC results**:

   .. code-block:: python

      # Normalization constant is automatically included when saving the chain
      mcmc.save_results('my_chain')  # metadata contains normalization_constant

3. **Automatically read by information criteria**:

   .. code-block:: python

      result = information_criteria('my_chain')
      # Automatically reads normalization_constant from metadata

Impact on Model Comparison
~~~~~~~~~~~~~~~~~~~~~~~~~~

**Key conclusion**: The normalization constant has **no impact on model comparison results**!

.. math::

   \Delta\text{AIC} = \text{AIC}_1 - \text{AIC}_2 = (\chi^2_1 + 2k_1) - (\chi^2_2 + 2k_2)

The normalization constant is the same (using the same data) and cancels out in the difference.

.. code-block:: python

   # Verification
   delta_aic_without_norm = aic1 - aic2
   delta_aic_with_norm = aic1_corrected - aic2_corrected
   # Both are equal!
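This cancellation can be checked numerically with a short standalone sketch. It uses plain Python (no HIcosmo calls); the chi-squared values and parameter counts are illustrative, and only the normalization constant echoes the example numbers used elsewhere on this page:

```python
# Illustrative numbers: two models fit to the same dataset, so they share
# the same normalization constant.
norm_const = 2994.34
chi2 = {'LCDM': 1387.32, 'wCDM': 1383.10}   # minimum chi-squared per model
k = {'LCDM': 2, 'wCDM': 3}                  # free parameters per model

# Best-fit log-likelihood: log L = -chi2/2 + normalization constant
log_like_max = {m: -0.5 * chi2[m] + norm_const for m in chi2}

# AIC from the corrected chi2 (positive values)...
aic_chi2 = {m: chi2[m] + 2 * k[m] for m in chi2}
# ...versus AIC from the raw, uncorrected log-likelihood (negative values)
aic_raw = {m: -2 * log_like_max[m] + 2 * k[m] for m in chi2}

# The constant shifts both models by the same amount, so the difference is identical
delta_chi2 = aic_chi2['wCDM'] - aic_chi2['LCDM']
delta_raw = aic_raw['wCDM'] - aic_raw['LCDM']
assert abs(delta_chi2 - delta_raw) < 1e-6
print(round(delta_chi2, 2))  # -2.22
```

The absolute AIC values differ by twice the normalization constant between the two computations, yet ΔAIC comes out the same either way, which is the point of the "no impact on model comparison" conclusion above.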
However, the **absolute values** of AIC/BIC require correct normalization to be meaningful:

- Reduced :math:`\chi^2 = \chi^2_{\min}/N \approx 1` indicates a good fit
- AIC, BIC must be positive numbers

API Reference
-------------

information_criteria Function
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

   from hicosmo.visualization import information_criteria

   result = information_criteria(
       samples='my_chain',               # Chain name, path, or sample dictionary
       num_params=None,                  # Number of free parameters (auto-inferred)
       num_data=None,                    # Number of data points (auto-inferred)
       log_likelihood_fn=None,           # Likelihood function (optional)
       evidence_method='harmonic_mean',  # Evidence estimation method
   )

**Return Value**:

.. code-block:: python

   {
       'num_params': 4,                    # Number of free parameters k
       'num_data': 1580,                   # Number of data points N
       'n_samples': 10000,                 # MCMC sample count
       'log_likelihood_max': 2300.68,      # Maximum log-likelihood
       'normalization_constant': 2994.34,  # Normalization constant
       'chi2_min': 1387.32,                # Minimum chi-squared
       'aic': 1395.32,                     # AIC
       'bic': 1417.88,                     # BIC
       'log_evidence': 2298.5,             # log(Bayesian evidence)
       'log_evidence_method': 'harmonic_mean',
   }

aic and bic Functions
~~~~~~~~~~~~~~~~~~~~~

Compute AIC or BIC independently:

.. code-block:: python

   from hicosmo.visualization.information_criteria import aic, bic

   # Compute AIC
   aic_value = aic(
       log_likelihood_max=2300.68,
       num_params=4,
       normalization_constant=2994.34
   )

   # Compute BIC
   bic_value = bic(
       log_likelihood_max=2300.68,
       num_params=4,
       num_data=1580,
       normalization_constant=2994.34
   )

Complete Examples
-----------------

Single-Model Information Criteria
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

   from hicosmo import MCMC
   from hicosmo.models import LCDM
   from hicosmo.likelihoods import SN_likelihood
   from hicosmo.visualization import information_criteria

   # Run MCMC
   likelihood = SN_likelihood(LCDM, 'pantheon+')
   params = {
       'H0': (67.0, 60.0, 80.0),
       'Omega_m': (0.3, 0.1, 0.5),
   }
   mcmc = MCMC(params, likelihood, chain_name='lcdm_pantheon')
   mcmc.run(num_samples=5000, num_chains=4)

   # Compute information criteria
   result = information_criteria('lcdm_pantheon')

   print(f"Number of free parameters k = {result['num_params']}")
   print(f"Number of data points N = {result['num_data']}")
   print(f"chi2_min = {result['chi2_min']:.2f}")
   print(f"Reduced chi2 = {result['chi2_min']/result['num_data']:.4f}")
   print(f"AIC = {result['aic']:.2f}")
   print(f"BIC = {result['bic']:.2f}")

Multi-Model Comparison
~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

   from hicosmo import MCMC
   from hicosmo.models import LCDM, wCDM, CPL
   from hicosmo.likelihoods import SN_likelihood
   from hicosmo.visualization import information_criteria

   # Define three models
   models = {
       'LCDM': (LCDM, ['H0', 'Omega_m']),
       'wCDM': (wCDM, ['H0', 'Omega_m', 'w']),
       'CPL': (CPL, ['H0', 'Omega_m', 'w0', 'wa']),
   }

   results = {}
   for name, (model_class, free_params) in models.items():
       # Build parameter configuration
       params = {p: (67.0 if p == 'H0' else 0.3, 0.1, 0.9) for p in free_params}
       if 'w' in free_params:
           params['w'] = (-1.0, -2.0, 0.0)
       if 'w0' in free_params:
           params['w0'] = (-1.0, -2.0, 0.0)
           params['wa'] = (0.0, -1.0, 1.0)

       # Run MCMC
       likelihood = SN_likelihood(model_class, 'pantheon+')
       mcmc = MCMC(params, likelihood, chain_name=f'{name.lower()}_chain')
       mcmc.run(num_samples=5000, num_chains=4)

       # Compute information criteria
       results[name] = information_criteria(f'{name.lower()}_chain')

   # Print comparison table
   print("\nModel Selection Results")
   print("=" * 70)
   print(f"{'Model':<8} {'k':<4} {'chi2_min':<12} {'AIC':<12} {'BIC':<12} {'DBIC':<10}")
   print("-" * 70)

   bic_ref = results['LCDM']['bic']
   for name, ic in results.items():
       delta_bic = ic['bic'] - bic_ref
       print(f"{name:<8} {ic['num_params']:<4} {ic['chi2_min']:<12.2f} "
             f"{ic['aic']:<12.2f} {ic['bic']:<12.2f} {delta_bic:<+10.2f}")
   print("-" * 70)

   # Interpret results
   print("\nModel Selection Interpretation (Jeffreys Scale):")
   for name, ic in results.items():
       if name == 'LCDM':
           continue
       delta = abs(ic['bic'] - bic_ref)
       if delta < 2:
           strength = "Not worth more than a bare mention"
       elif delta < 6:
           strength = "Positive evidence"
       elif delta < 10:
           strength = "Strong evidence"
       else:
           strength = "Decisive evidence"
       better = "LCDM" if ic['bic'] > bic_ref else name
       print(f"  {name} vs LCDM: |DBIC| = {delta:.1f} -> {strength} in favor of {better}")

Bayesian Evidence Comparison
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

   import numpy as np

   from hicosmo.visualization import information_criteria

   # Assume MCMC has been run for multiple models
   chains = ['lcdm_chain', 'wcdm_chain', 'cpl_chain']
   names = ['LCDM', 'wCDM', 'CPL']

   print("\nBayesian Evidence Comparison")
   print("=" * 60)
   print(f"{'Model':<8} {'log(Z)':<15} {'Dlog(Z)':<15} {'B':<10}")
   print("-" * 60)

   results = [information_criteria(c) for c in chains]
   log_z_ref = results[0]['log_evidence']

   for name, ic in zip(names, results):
       delta = ic['log_evidence'] - log_z_ref
       bayes_factor = np.exp(delta) if abs(delta) < 100 else float('inf')
       print(f"{name:<8} {ic['log_evidence']:<15.4f} {delta:<+15.4f} {bayes_factor:<10.2f}")
   print("-" * 60)
   print("\nJeffreys Scale: |Dlog(Z)| > 5 indicates strong evidence")

FAQ
---

Q: Are negative AIC/BIC values correct?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**No.** AIC = chi2 + 2k and BIC = chi2 + k*ln(N) must be positive, because chi2 >= 0. If you see negative values, it may be due to using an uncorrected log-likelihood. HIcosmo handles this automatically.

Q: Should I use AIC or BIC?
~~~~~~~~~~~~~~~~~~~~~~~~~~~

- **AIC**: Suitable for small samples and exploratory analysis; lighter penalty on complex models
- **BIC**: Suitable for large samples and model selection; stronger penalty against overfitting

In cosmology, **BIC** is generally recommended, as datasets are typically large (N > 1000).

Q: What if the Bayesian evidence estimate is unstable?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The Harmonic Mean Estimator can indeed be unstable. Recommendations:

1. Increase the number of MCMC samples
2. Average over multiple chains
3. For accurate computation, use nested sampling (e.g., dynesty, polychord)

Q: Does the normalization constant affect model comparison?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**No.** When comparing models using the same data, the normalization constant is identical and cancels in DAIC and DBIC. However, the normalization constant does affect **absolute values**:

- Ensures chi2 > 0
- Ensures AIC, BIC are positive
- Reduced chi2 ~ 1 indicates a good fit

Next Steps
----------

- `Visualization `_: Plot constraint figures
- `Samplers `_: Learn about MCMC configuration
- `Fisher Forecasts `_: Perform survey predictions
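As a small closing utility, the Jeffreys-scale ΔBIC thresholds quoted throughout this page can be collected into a few lines of plain Python. This is a sketch, not part of the HIcosmo API; `interpret_delta_bic` is a hypothetical helper name:

```python
def interpret_delta_bic(delta_bic: float) -> str:
    """Map |dBIC| to the Jeffreys-scale wording used on this page."""
    # Sign does not matter for evidence strength, only magnitude
    delta = abs(delta_bic)
    if delta < 2:
        return "Not worth more than a bare mention"
    elif delta < 6:
        return "Positive evidence"
    elif delta < 10:
        return "Strong evidence"
    return "Decisive evidence"


print(interpret_delta_bic(1.3))    # Not worth more than a bare mention
print(interpret_delta_bic(7.5))    # Strong evidence
print(interpret_delta_bic(-12.0))  # Decisive evidence
```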