Dataset Management

HIcosmo discovers datasets automatically, so you can list all available observational data without hardcoding dataset names. When you add a new dataset to the data directory, the system detects it.
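The idea of directory-based discovery can be illustrated with a minimal sketch. This is not HIcosmo's implementation: the function name discover_datasets is hypothetical, and the sketch assumes a layout in which each category directory holds one subdirectory per dataset.

```python
import tempfile
from pathlib import Path
from typing import Dict, List


def discover_datasets(root: Path) -> Dict[str, List[str]]:
    """Map each category directory under root to its sorted dataset subdirectories."""
    found: Dict[str, List[str]] = {}
    for category_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        found[category_dir.name] = sorted(
            child.name for child in category_dir.iterdir() if child.is_dir()
        )
    return found


# Build a throwaway tree that mimics one category with two datasets.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "bao_data" / "desi_2024").mkdir(parents=True)
    (root / "bao_data" / "sixdf").mkdir(parents=True)
    demo = discover_datasets(root)
    print(demo)
```

Because the names come from the file system rather than from a hardcoded list, a new subdirectory shows up the next time the scan runs.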

Viewing Available Datasets

Quick View

The simplest way is to call show_available_datasets():

from hicosmo.likelihoods import show_available_datasets

# Show all available datasets
show_available_datasets()

Example output:

======================================================================
                    HIcosmo Available Datasets
======================================================================

  BAO (Baryon Acoustic Oscillations)
  --------------------------------------------------
    • desi_2024        : DESI 2024 DR1 BAO measurements (z=0.3-2.3)
    • boss_dr12        : BOSS DR12 Luminous Red Galaxy BAO
    • sdss_dr12        : SDSS DR12 consensus BAO
    • sdss_dr16        : SDSS DR16 LRG and QSO BAO
    • sixdf            : 6dF Galaxy Survey BAO (z=0.1)

  SN (Type Ia Supernovae)
  --------------------------------------------------
    • pantheon+shoes   : Pantheon+ with SH0ES Cepheid calibration

  CMB (Cosmic Microwave Background)
  --------------------------------------------------
    • planck2018_distance : Planck 2018 distance priors (built-in)

  Lensing (Strong Gravitational Lensing)
  --------------------------------------------------
    • h0licow          : H0LiCOW strong lensing time delays (6 lenses)
    • tdcosmo          : TDCOSMO hierarchical strong lensing (7 lenses)
    • tdcosmo2025      : TDCOSMO 2025 updated analysis

======================================================================

Programmatic Access

If you need to use the dataset list in code:

from hicosmo.likelihoods import list_all_datasets

# Get all datasets (as a dictionary)
datasets = list_all_datasets()

# View BAO datasets
print(datasets['bao'])
# ['desi_2024', 'boss_dr12', 'sdss_dr12', 'sdss_dr16', 'sixdf']

# View SN datasets
print(datasets['sn'])
# ['pantheon+shoes']

# Dynamically select datasets
for bao_name in datasets['bao']:
    print(f"Available BAO dataset: {bao_name}")
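One common use of the dictionary is to check a user-supplied name before building a likelihood. The sketch below assumes only the dictionary shape shown above (category mapped to a list of names). The helper validate_selection is hypothetical, and the literal dictionary stands in for the return value of list_all_datasets() so the snippet runs without HIcosmo installed.

```python
def validate_selection(datasets, category, name):
    """Return name if it is listed under category, else raise a helpful error."""
    available = datasets.get(category, [])
    if name not in available:
        raise ValueError(
            f"Unknown {category!r} dataset {name!r}; available: {available}"
        )
    return name


# Literal copy of the example output above, standing in for list_all_datasets().
example = {
    "bao": ["desi_2024", "boss_dr12", "sdss_dr12", "sdss_dr16", "sixdf"],
    "sn": ["pantheon+shoes"],
}
chosen = validate_selection(example, "bao", "sdss_dr16")
print(chosen)
```

Failing early with the list of valid names is usually easier to debug than a file-not-found error raised later during loading.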

View by Category

View datasets of a specific category only:

from hicosmo.likelihoods import show_available_datasets

# Show BAO datasets only
show_available_datasets('bao')

# Show lensing datasets only
show_available_datasets('lensing')

Class Method Queries

Each Likelihood class also provides an available_datasets() method:

from hicosmo.likelihoods.bao import DESI2024BAO
from hicosmo.likelihoods.sn import PantheonPlusLikelihood

# Available datasets for the BAO class
print(DESI2024BAO.available_datasets())
# ['desi_2024', 'boss_dr12', 'sdss_dr12', 'sdss_dr16', 'sixdf']

# Available datasets for the SN class
print(PantheonPlusLikelihood.available_datasets())
# ['pantheon+shoes']

Dataset Categories

HIcosmo supports the following categories of observational data:

Category   Full Name                      Included Datasets
--------   ----------------------------   ---------------------------------------
bao        Baryon Acoustic Oscillations   DESI 2024, BOSS DR12, SDSS DR12/16, 6dF
sn         Type Ia Supernovae             Pantheon+, Pantheon+SH0ES
cmb        Cosmic Microwave Background    Planck 2018 distance priors
h0         Direct H0 Measurements         SH0ES
lensing    Strong Gravitational Lensing   H0LiCOW, TDCOSMO, TDCOSMO 2025

Using Discovered Datasets

A discovered dataset name can be passed to a likelihood and then used in an MCMC analysis:

from hicosmo.likelihoods import (
    BAO_likelihood, SN_likelihood,
    list_all_datasets
)
from hicosmo.models import LCDM
from hicosmo.samplers import MCMC

# 1. View available datasets
datasets = list_all_datasets()
print("BAO datasets:", datasets['bao'])

# 2. Select a dataset to create a likelihood
bao = BAO_likelihood(LCDM, 'desi_2024')
sn = SN_likelihood(LCDM, 'pantheon+shoes')

# 3. Joint analysis
joint = bao + sn

# 4. Run MCMC
params = {
    'H0': (70.0, 60.0, 80.0),
    'Omega_m': (0.3, 0.1, 0.5),
}
mcmc = MCMC(params, joint, chain_name='joint_analysis')
mcmc.run(num_samples=2000)
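Before a long run it can help to sanity-check the parameter dictionary. The sketch below assumes that each tuple is ordered as (starting value, lower bound, upper bound). The values above are consistent with that reading, but the text does not state it, so treat the ordering as an assumption. The helper check_param_ranges is hypothetical and not part of HIcosmo.

```python
def check_param_ranges(params):
    """Raise ValueError unless each entry satisfies lower <= start <= upper."""
    # Assumed tuple order: (starting value, lower bound, upper bound).
    for name, (start, lower, upper) in params.items():
        if not lower <= start <= upper:
            raise ValueError(
                f"{name}: start {start} is outside [{lower}, {upper}]"
            )
    return True


params = {
    "H0": (70.0, 60.0, 80.0),
    "Omega_m": (0.3, 0.1, 0.5),
}
ok = check_param_ranges(params)
print(ok)
```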

Default Data Directory

Data files are stored in the following location:

  • Default path: hicosmo/data/

  • Environment variable: set HICOSMO_DATA to specify a custom path
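A lookup of this kind is often written as "use the environment variable if it is set, otherwise fall back to the default". The sketch below shows that pattern under stated assumptions: resolve_data_dir and DEFAULT_DATA_DIR are hypothetical names, and the precedence rule (a non-empty HICOSMO_DATA wins) is an assumption about how such a lookup typically behaves, not a description of HIcosmo's code.

```python
import os
from pathlib import Path

DEFAULT_DATA_DIR = Path("hicosmo") / "data"  # default path quoted in the text


def resolve_data_dir(environ=None) -> Path:
    """Return the HICOSMO_DATA path if set and non-empty, else the default."""
    env = os.environ if environ is None else environ
    custom = env.get("HICOSMO_DATA", "").strip()
    return Path(custom).expanduser() if custom else DEFAULT_DATA_DIR


# Passing a plain dict keeps the demo independent of the real environment.
print(resolve_data_dir({}))
print(resolve_data_dir({"HICOSMO_DATA": "/tmp/my_cosmo_data"}))
```

In a shell, the equivalent setup would be to export HICOSMO_DATA before starting Python.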

Directory Structure

hicosmo/data/
├── bao_data/
│   ├── desi_2024/
│   │   ├── desi_2024_gaussian_bao_ALL_GCcomb_mean.txt
│   │   └── desi_2024_gaussian_bao_ALL_GCcomb_cov.txt
│   ├── boss_dr12/
│   ├── sdss_dr12/
│   ├── sdss_dr16/
│   └── sixdf/
├── sne/
│   ├── Pantheon+SH0ES.dat
│   └── Pantheon+SH0ES_STAT+SYS.cov
├── h0licow/
└── tdcosmo/

Adding New Datasets

To add a new dataset, place the data files in the correct directory:

  1. BAO data: Create a new directory under bao_data/

  2. SN data: Place in the sne/ directory

  3. Others: Place in the corresponding category directory

Example: Adding a new BAO dataset

# Create a new dataset directory
mkdir hicosmo/data/bao_data/my_new_survey

# Copy data files
cp my_data.txt hicosmo/data/bao_data/my_new_survey/

Then refresh the data registry:

from hicosmo.likelihoods import list_all_datasets

# Force refresh to discover new datasets
datasets = list_all_datasets(refresh=True)
print(datasets['bao'])
# [..., 'my_new_survey']  # New dataset automatically discovered!
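The refresh=True argument suggests that discovery results may be cached between calls. A minimal sketch of that pattern follows. The class CachedListing is hypothetical and is not HIcosmo's registry. It only illustrates why a new directory can stay invisible until a refresh is requested.

```python
import tempfile
from pathlib import Path


class CachedListing:
    """Cache the subdirectory names of one directory until refresh is requested."""

    def __init__(self, directory):
        self.directory = Path(directory)
        self._cache = None

    def names(self, refresh=False):
        # Rescan the directory only on the first call or when asked to refresh.
        if refresh or self._cache is None:
            self._cache = sorted(
                p.name for p in self.directory.iterdir() if p.is_dir()
            )
        return list(self._cache)


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "sixdf").mkdir()
    listing = CachedListing(tmp)
    before = listing.names()             # first call scans the directory
    (Path(tmp) / "my_new_survey").mkdir()
    stale = listing.names()              # cached result, new directory not seen
    fresh = listing.names(refresh=True)  # rescan picks up the new directory
    print(before, stale, fresh)
```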

DataRegistry Advanced Usage

For more complex needs, you can use the DataRegistry class directly:

from hicosmo.data_registry import DataRegistry

# Create a registry instance
registry = DataRegistry()

# Get detailed information
info = registry.get_info('bao', 'desi_2024')
print(f"Name: {info.name}")
print(f"Description: {info.description}")
print(f"Number of files: {len(info.files)}")
print(f"Path: {info.path}")

# Export as dictionary (serializable to JSON)
data = registry.to_dict()
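The structure returned by to_dict() is not shown here, so the following sketch uses a hypothetical stand-in record with the same four attributes used above (name, description, files, path). It shows one way such records could be written out as JSON. DatasetRecord and records_to_json are illustrative names only.

```python
import json
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import List


@dataclass
class DatasetRecord:  # hypothetical stand-in, not HIcosmo's class
    name: str
    description: str
    path: Path
    files: List[str] = field(default_factory=list)


def records_to_json(records):
    """Serialize records to JSON, converting Path objects to strings."""
    payload = {r.name: asdict(r) for r in records}
    # default=str handles values json cannot encode natively, such as Path.
    return json.dumps(payload, default=str, indent=2, sort_keys=True)


record = DatasetRecord(
    name="desi_2024",
    description="DESI 2024 DR1 BAO measurements (z=0.3-2.3)",
    path=Path("hicosmo/data/bao_data/desi_2024"),
    files=[
        "desi_2024_gaussian_bao_ALL_GCcomb_mean.txt",
        "desi_2024_gaussian_bao_ALL_GCcomb_cov.txt",
    ],
)
text = records_to_json([record])
round_trip = json.loads(text)
print(text)
```

If the real dictionary contains only strings, numbers, lists, and nested dictionaries, json.dumps(data) should work without the default=str fallback.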

Complete Example

"""Complete example of dataset discovery and usage"""
from hicosmo.likelihoods import (
    show_available_datasets,
    list_all_datasets,
    BAO_likelihood,
    SN_likelihood,
)
from hicosmo.models import LCDM

# 1. View all available datasets
print("=" * 50)
print("Step 1: View available datasets")
print("=" * 50)
show_available_datasets()

# 2. Programmatic access
print("\nStep 2: Get dataset list")
datasets = list_all_datasets()
for category, names in datasets.items():
    if names:
        print(f"  {category}: {names}")

# 3. Create likelihood
print("\nStep 3: Create Likelihood")
bao = BAO_likelihood(LCDM, datasets['bao'][0])  # Use the first BAO dataset
print(f"  Created: {bao}")

# 4. Test likelihood
print("\nStep 4: Test Likelihood")
log_L = bao(H0=70.0, Omega_m=0.3)
print(f"  log(L) = {log_L:.2f}")