Dataset Management
HIcosmo discovers datasets automatically, so you can list all available observational data without hardcoding dataset names. When you add a new dataset to the data directory, the system detects it automatically.
Viewing Available Datasets
Quick View
The simplest way is to call show_available_datasets():
from hicosmo.likelihoods import show_available_datasets
# Show all available datasets
show_available_datasets()
Output example:
======================================================================
HIcosmo Available Datasets
======================================================================
BAO (Baryon Acoustic Oscillations)
--------------------------------------------------
• desi_2024 : DESI 2024 DR1 BAO measurements (z=0.3-2.3)
• boss_dr12 : BOSS DR12 Luminous Red Galaxy BAO
• sdss_dr12 : SDSS DR12 consensus BAO
• sdss_dr16 : SDSS DR16 LRG and QSO BAO
• sixdf : 6dF Galaxy Survey BAO (z=0.1)
SN (Type Ia Supernovae)
--------------------------------------------------
• pantheon+shoes : Pantheon+ with SH0ES Cepheid calibration
CMB (Cosmic Microwave Background)
--------------------------------------------------
• planck2018_distance : Planck 2018 distance priors (built-in)
Lensing (Strong Gravitational Lensing)
--------------------------------------------------
• h0licow : H0LiCOW strong lensing time delays (6 lenses)
• tdcosmo : TDCOSMO hierarchical strong lensing (7 lenses)
• tdcosmo2025 : TDCOSMO 2025 updated analysis
======================================================================
Programmatic Access
If you need to use the dataset list in code:
from hicosmo.likelihoods import list_all_datasets
# Get all datasets (as a dictionary)
datasets = list_all_datasets()
# View BAO datasets
print(datasets['bao'])
# ['desi_2024', 'boss_dr12', 'sdss_dr12', 'sdss_dr16', 'sixdf']
# View SN datasets
print(datasets['sn'])
# ['pantheon+shoes']
# Dynamically select datasets
for bao_name in datasets['bao']:
    print(f"Available BAO dataset: {bao_name}")
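Because the result is a plain dictionary mapping category names to lists of dataset names, ordinary dictionary and list operations are enough to filter it. The sketch below works on a hard-coded stand-in dictionary shaped like the output above rather than calling HIcosmo, and `select_datasets` is a hypothetical helper, not part of the library.

```python
# Stand-in for the dictionary returned by list_all_datasets(); the
# entries are copied from the example output above.
datasets = {
    "bao": ["desi_2024", "boss_dr12", "sdss_dr12", "sdss_dr16", "sixdf"],
    "sn": ["pantheon+shoes"],
}


def select_datasets(datasets, category, prefix=""):
    """Return names in `category` that start with `prefix` (hypothetical helper)."""
    # .get() keeps an unknown category from raising KeyError.
    return [name for name in datasets.get(category, []) if name.startswith(prefix)]


sdss_only = select_datasets(datasets, "bao", prefix="sdss")
print(sdss_only)
```

Returning an empty list for an unknown category lets calling code loop over the result without a separate existence check.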
View by Category
View datasets of a specific category only:
# Show BAO datasets only
show_available_datasets('bao')
# Show lensing datasets only
show_available_datasets('lensing')
Class Method Queries
Each Likelihood class also provides an available_datasets() method:
from hicosmo.likelihoods.bao import DESI2024BAO
from hicosmo.likelihoods.sn import PantheonPlusLikelihood
# Available datasets for the BAO class
print(DESI2024BAO.available_datasets())
# ['desi_2024', 'boss_dr12', 'sdss_dr12', 'sdss_dr16', 'sixdf']
# Available datasets for the SN class
print(PantheonPlusLikelihood.available_datasets())
# ['pantheon+shoes']
Dataset Categories
HIcosmo supports the following categories of observational data:
| Category | Full Name | Included Datasets |
|---|---|---|
| BAO | Baryon Acoustic Oscillations | DESI 2024, BOSS DR12, SDSS DR12/16, 6dF |
| SN | Type Ia Supernovae | Pantheon+, Pantheon+SH0ES |
| CMB | Cosmic Microwave Background | Planck 2018 distance priors |
| H0 | Direct H0 Measurements | SH0ES |
| Lensing | Strong Gravitational Lensing | H0LiCOW, TDCOSMO, TDCOSMO 2025 |
Using Discovered Datasets
Use discovered datasets for MCMC analysis:
from hicosmo.likelihoods import (
    BAO_likelihood, SN_likelihood,
    list_all_datasets
)
from hicosmo.models import LCDM
# 1. View available datasets
datasets = list_all_datasets()
print("BAO datasets:", datasets['bao'])
# 2. Select a dataset to create a likelihood
bao = BAO_likelihood(LCDM, 'desi_2024')
sn = SN_likelihood(LCDM, 'pantheon+shoes')
# 3. Joint analysis
joint = bao + sn
# 4. Run MCMC
from hicosmo.samplers import MCMC
params = {
    'H0': (70.0, 60.0, 80.0),
    'Omega_m': (0.3, 0.1, 0.5),
}
mcmc = MCMC(params, joint, chain_name='joint_analysis')
mcmc.run(num_samples=2000)
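The `bao + sn` step combines two likelihoods into one. For statistically independent datasets the joint likelihood is the product of the individual likelihoods, so the log-likelihoods add. The following is a minimal sketch of how such composition could be expressed with `__add__`; the classes `LogLike`, `GaussianLogLike`, and `JointLogLike` are hypothetical toy stand-ins and do not reflect how HIcosmo implements it.

```python
class LogLike:
    """Base class: adding two log-likelihoods yields a joint object."""

    def __add__(self, other):
        return JointLogLike([self, other])

    def __call__(self, **params):
        raise NotImplementedError


class GaussianLogLike(LogLike):
    """Gaussian constraint on a single named parameter (toy example)."""

    def __init__(self, name, mean, sigma):
        self.name, self.mean, self.sigma = name, mean, sigma

    def __call__(self, **params):
        # Log-likelihood up to an additive constant.
        return -0.5 * ((params[self.name] - self.mean) / self.sigma) ** 2


class JointLogLike(LogLike):
    """Sum of the log-likelihoods of independent parts."""

    def __init__(self, parts):
        self.parts = list(parts)

    def __call__(self, **params):
        return sum(part(**params) for part in self.parts)


a = GaussianLogLike("H0", mean=70.0, sigma=2.0)
b = GaussianLogLike("Omega_m", mean=0.3, sigma=0.1)
joint = a + b
print(joint(H0=72.0, Omega_m=0.4))
```

Each part receives the full parameter set and reads only the parameters it needs, which is why the joint object can forward the same keyword arguments to every part.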
Default Data Directory
Data file storage location:
Default path: hicosmo/data/
Environment variable: You can specify a custom path via HICOSMO_DATA
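The lookup described here can be sketched as a small function. This is an illustration under assumptions: `resolve_data_dir` is a hypothetical name, the rule that `HICOSMO_DATA` takes precedence over the default is assumed rather than confirmed, and the default is written as a relative path for simplicity.

```python
import os
from pathlib import Path

# Default location named above, written here as a relative path.
DEFAULT_DATA_DIR = Path("hicosmo") / "data"


def resolve_data_dir(env=None):
    """Return the path from HICOSMO_DATA if set, else the default (hypothetical)."""
    env = os.environ if env is None else env
    custom = env.get("HICOSMO_DATA")
    # Treat an empty string the same as an unset variable.
    return Path(custom).expanduser() if custom else DEFAULT_DATA_DIR


print(resolve_data_dir({}))
print(resolve_data_dir({"HICOSMO_DATA": "/tmp/my_cosmo_data"}))
```

Passing the environment as an argument keeps the function easy to test without modifying the real process environment.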
Directory Structure
hicosmo/data/
├── bao_data/
│ ├── desi_2024/
│ │ ├── desi_2024_gaussian_bao_ALL_GCcomb_mean.txt
│ │ └── desi_2024_gaussian_bao_ALL_GCcomb_cov.txt
│ ├── boss_dr12/
│ ├── sdss_dr12/
│ ├── sdss_dr16/
│ └── sixdf/
├── sne/
│ ├── Pantheon+SH0ES.dat
│ └── Pantheon+SH0ES_STAT+SYS.cov
├── h0licow/
└── tdcosmo/
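With a layout like this, directory-based discovery can be as simple as listing subdirectories. The sketch below shows the idea for a category such as `bao_data/`, where each dataset has its own folder. `discover_subdir_datasets` is a hypothetical function and is not HIcosmo's actual discovery code. It scans a temporary directory so that it runs anywhere.

```python
import tempfile
from pathlib import Path


def discover_subdir_datasets(category_dir):
    """List dataset names as the sorted subdirectory names of `category_dir`."""
    root = Path(category_dir)
    if not root.is_dir():
        return []
    return sorted(p.name for p in root.iterdir() if p.is_dir())


# Build a throwaway copy of part of the layout above and scan it.
with tempfile.TemporaryDirectory() as tmp:
    bao_dir = Path(tmp) / "bao_data"
    for name in ("desi_2024", "boss_dr12", "sixdf"):
        (bao_dir / name).mkdir(parents=True)
    found = discover_subdir_datasets(bao_dir)

print(found)
```

Sorting the names makes the result independent of the order in which the file system returns directory entries.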
Adding New Datasets
Adding a new dataset only requires placing the data files in the correct directory:
BAO data: Create a new directory under bao_data/
SN data: Place in the sne/ directory
Others: Place in the corresponding category directory
Example: Adding a new BAO dataset
# Create a new dataset directory
mkdir hicosmo/data/bao_data/my_new_survey
# Copy data files
cp my_data.txt hicosmo/data/bao_data/my_new_survey/
Then refresh the data registry:
from hicosmo.likelihoods import list_all_datasets
# Force refresh to discover new datasets
datasets = list_all_datasets(refresh=True)
print(datasets['bao'])
# [..., 'my_new_survey'] # New dataset automatically discovered!
DataRegistry Advanced Usage
For more complex needs, you can use the DataRegistry class directly:
from hicosmo.data_registry import DataRegistry
# Create a registry instance
registry = DataRegistry()
# Get detailed information
info = registry.get_info('bao', 'desi_2024')
print(f"Name: {info.name}")
print(f"Description: {info.description}")
print(f"Number of files: {len(info.files)}")
print(f"Path: {info.path}")
# Export as dictionary (serializable to JSON)
data = registry.to_dict()
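Once the registry contents are available as a plain dictionary, the standard `json` module can serialize them. The sketch below uses a small stand-in dictionary, because the exact structure returned by `to_dict()` is not shown here and may differ.

```python
import json

# Stand-in for registry.to_dict(); the real structure may differ.
data = {
    "bao": ["desi_2024", "boss_dr12"],
    "sn": ["pantheon+shoes"],
}

# Serialize with stable key order, then parse back to confirm a round trip.
text = json.dumps(data, indent=2, sort_keys=True)
restored = json.loads(text)
print(text)
```

A round trip like this only works when every value is a JSON-compatible type such as a string, number, list, or dictionary.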
Complete Example
"""Complete example of dataset discovery and usage"""
from hicosmo.likelihoods import (
    show_available_datasets,
    list_all_datasets,
    BAO_likelihood,
    SN_likelihood,
)
from hicosmo.models import LCDM
# 1. View all available datasets
print("=" * 50)
print("Step 1: View available datasets")
print("=" * 50)
show_available_datasets()
# 2. Programmatic access
print("\nStep 2: Get dataset list")
datasets = list_all_datasets()
for category, names in datasets.items():
    if names:
        print(f" {category}: {names}")
# 3. Create likelihood
print("\nStep 3: Create Likelihood")
bao = BAO_likelihood(LCDM, datasets['bao'][0]) # Use the first BAO dataset
print(f" Created: {bao}")
# 4. Test likelihood
print("\nStep 4: Test Likelihood")
log_L = bao(H0=70.0, Omega_m=0.3)
print(f" log(L) = {log_L:.2f}")