Dataset Management
==================

HIcosmo discovers datasets automatically, so you can list all available observational data without hardcoding dataset names. When you add a new dataset to the data directory, the system detects it.

Viewing Available Datasets
--------------------------

Quick View
~~~~~~~~~~

The simplest way to list datasets is to call ``show_available_datasets()``:

.. code-block:: python

   from hicosmo.likelihoods import show_available_datasets

   # Show all available datasets
   show_available_datasets()

Output example::

   ======================================================================
                         HIcosmo Available Datasets
   ======================================================================

   BAO (Baryon Acoustic Oscillations)
   --------------------------------------------------
     • desi_2024            : DESI 2024 DR1 BAO measurements (z=0.3-2.3)
     • boss_dr12            : BOSS DR12 Luminous Red Galaxy BAO
     • sdss_dr12            : SDSS DR12 consensus BAO
     • sdss_dr16            : SDSS DR16 LRG and QSO BAO
     • sixdf                : 6dF Galaxy Survey BAO (z=0.1)

   SN (Type Ia Supernovae)
   --------------------------------------------------
     • pantheon+shoes       : Pantheon+ with SH0ES Cepheid calibration

   CMB (Cosmic Microwave Background)
   --------------------------------------------------
     • planck2018_distance  : Planck 2018 distance priors (built-in)

   Lensing (Strong Gravitational Lensing)
   --------------------------------------------------
     • h0licow              : H0LiCOW strong lensing time delays (6 lenses)
     • tdcosmo              : TDCOSMO hierarchical strong lensing (7 lenses)
     • tdcosmo2025          : TDCOSMO 2025 updated analysis

   ======================================================================

Programmatic Access
~~~~~~~~~~~~~~~~~~~

To use the dataset list in code, call ``list_all_datasets()``:

.. code-block:: python

   from hicosmo.likelihoods import list_all_datasets

   # Get all datasets (as a dictionary)
   datasets = list_all_datasets()

   # View BAO datasets
   print(datasets['bao'])
   # ['desi_2024', 'boss_dr12', 'sdss_dr12', 'sdss_dr16', 'sixdf']

   # View SN datasets
   print(datasets['sn'])
   # ['pantheon+shoes']

   # Dynamically select datasets
   for bao_name in datasets['bao']:
       print(f"Available BAO dataset: {bao_name}")

View by Category
~~~~~~~~~~~~~~~~

To show only one category, pass its name to ``show_available_datasets()``:

.. code-block:: python

   from hicosmo.likelihoods import show_available_datasets

   # Show BAO datasets only
   show_available_datasets('bao')

   # Show lensing datasets only
   show_available_datasets('lensing')

Class Method Queries
~~~~~~~~~~~~~~~~~~~~

Each Likelihood class also provides an ``available_datasets()`` method:

.. code-block:: python

   from hicosmo.likelihoods.bao import DESI2024BAO
   from hicosmo.likelihoods.sn import PantheonPlusLikelihood

   # Available datasets for the BAO class
   print(DESI2024BAO.available_datasets())
   # ['desi_2024', 'boss_dr12', 'sdss_dr12', 'sdss_dr16', 'sixdf']

   # Available datasets for the SN class
   print(PantheonPlusLikelihood.available_datasets())
   # ['pantheon+shoes']

Dataset Categories
------------------

HIcosmo supports the following categories of observational data:

.. list-table:: Dataset Categories
   :header-rows: 1
   :widths: 15 35 50

   * - Category
     - Full Name
     - Included Datasets
   * - ``bao``
     - Baryon Acoustic Oscillations
     - DESI 2024, BOSS DR12, SDSS DR12/16, 6dF
   * - ``sn``
     - Type Ia Supernovae
     - Pantheon+, Pantheon+SH0ES
   * - ``cmb``
     - Cosmic Microwave Background
     - Planck 2018 distance priors
   * - ``h0``
     - Direct H0 Measurements
     - SH0ES
   * - ``lensing``
     - Strong Gravitational Lensing
     - H0LiCOW, TDCOSMO, TDCOSMO 2025

Using Discovered Datasets
-------------------------

Use discovered datasets for MCMC analysis:

.. code-block:: python

   from hicosmo.likelihoods import (
       BAO_likelihood,
       SN_likelihood,
       list_all_datasets
   )
   from hicosmo.models import LCDM

   # 1. View available datasets
   datasets = list_all_datasets()
   print("BAO datasets:", datasets['bao'])

   # 2. Select a dataset to create a likelihood
   bao = BAO_likelihood(LCDM, 'desi_2024')
   sn = SN_likelihood(LCDM, 'pantheon+shoes')

   # 3. Joint analysis
   joint = bao + sn

   # 4. Run MCMC
   from hicosmo.samplers import MCMC

   params = {
       'H0': (70.0, 60.0, 80.0),
       'Omega_m': (0.3, 0.1, 0.5),
   }

   mcmc = MCMC(params, joint, chain_name='joint_analysis')
   mcmc.run(num_samples=2000)

Default Data Directory
----------------------

Data file storage location:

- **Default path**: ``hicosmo/data/``
- **Environment variable**: You can specify a custom path via ``HICOSMO_DATA``

Directory Structure
~~~~~~~~~~~~~~~~~~~

.. code-block:: text

   hicosmo/data/
   ├── bao_data/
   │   ├── desi_2024/
   │   │   ├── desi_2024_gaussian_bao_ALL_GCcomb_mean.txt
   │   │   └── desi_2024_gaussian_bao_ALL_GCcomb_cov.txt
   │   ├── boss_dr12/
   │   ├── sdss_dr12/
   │   ├── sdss_dr16/
   │   └── sixdf/
   ├── sne/
   │   ├── Pantheon+SH0ES.dat
   │   └── Pantheon+SH0ES_STAT+SYS.cov
   ├── h0licow/
   └── tdcosmo/

Adding New Datasets
-------------------

To add a new dataset, place its data files in the correct directory:

1. **BAO data**: Create a new directory under ``bao_data/``
2. **SN data**: Place in the ``sne/`` directory
3. **Others**: Place in the corresponding category directory

Example: adding a new BAO dataset:

.. code-block:: bash

   # Create a new dataset directory
   mkdir hicosmo/data/bao_data/my_new_survey

   # Copy data files
   cp my_data.txt hicosmo/data/bao_data/my_new_survey/

Then refresh the data registry:

.. code-block:: python

   from hicosmo.likelihoods import list_all_datasets

   # Force refresh to discover new datasets
   datasets = list_all_datasets(refresh=True)
   print(datasets['bao'])
   # [..., 'my_new_survey']  # New dataset automatically discovered!

DataRegistry Advanced Usage
---------------------------

For more complex needs, you can use the ``DataRegistry`` class directly:

.. code-block:: python

   from hicosmo.data_registry import DataRegistry

   # Create a registry instance
   registry = DataRegistry()

   # Get detailed information
   info = registry.get_info('bao', 'desi_2024')
   print(f"Name: {info.name}")
   print(f"Description: {info.description}")
   print(f"Number of files: {len(info.files)}")
   print(f"Path: {info.path}")

   # Export as dictionary (serializable to JSON)
   data = registry.to_dict()

Complete Example
----------------

.. code-block:: python

   """Complete example of dataset discovery and usage"""

   from hicosmo.likelihoods import (
       show_available_datasets,
       list_all_datasets,
       BAO_likelihood,
       SN_likelihood,
   )
   from hicosmo.models import LCDM

   # 1. View all available datasets
   print("=" * 50)
   print("Step 1: View available datasets")
   print("=" * 50)
   show_available_datasets()

   # 2. Programmatic access
   print("\nStep 2: Get dataset list")
   datasets = list_all_datasets()
   for category, names in datasets.items():
       if names:
           print(f"  {category}: {names}")

   # 3. Create likelihood
   print("\nStep 3: Create Likelihood")
   bao = BAO_likelihood(LCDM, datasets['bao'][0])  # Use the first BAO dataset
   print(f"  Created: {bao}")

   # 4. Test likelihood
   print("\nStep 4: Test Likelihood")
   log_L = bao(H0=70.0, Omega_m=0.3)
   print(f"  log(L) = {log_L:.2f}")
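To illustrate the directory-based discovery described under Adding New Datasets, the following is a minimal standalone sketch using only the Python standard library. It is not HIcosmo's actual implementation: the helper names ``resolve_data_dir`` and ``discover_bao_datasets`` are hypothetical, and the sketch assumes that ``HICOSMO_DATA``, when set, replaces the default path and that each subdirectory of ``bao_data/`` counts as one dataset.

```python
import os
import tempfile
from pathlib import Path


def resolve_data_dir(default="hicosmo/data"):
    """Hypothetical helper: use $HICOSMO_DATA if set, else the default path.

    The override precedence is an assumption, not confirmed HIcosmo behavior.
    """
    return Path(os.environ.get("HICOSMO_DATA", default))


def discover_bao_datasets(data_dir):
    """Hypothetical helper: return sorted subdirectory names of bao_data/."""
    bao_root = Path(data_dir) / "bao_data"
    if not bao_root.is_dir():
        return []
    return sorted(p.name for p in bao_root.iterdir() if p.is_dir())


# Demonstrate on a throwaway directory so no real data files are touched.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("desi_2024", "sixdf"):
        (Path(tmp) / "bao_data" / name).mkdir(parents=True)
    before = discover_bao_datasets(tmp)

    # Adding a directory is enough for the next scan to pick it up.
    (Path(tmp) / "bao_data" / "my_new_survey").mkdir()
    after = discover_bao_datasets(tmp)

print(before)  # ['desi_2024', 'sixdf']
print(after)   # ['desi_2024', 'my_new_survey', 'sixdf']
```

Because the scan runs each time it is called, this sketch needs no refresh step. A registry that caches its results would need something like the ``refresh=True`` argument shown earlier.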