Dosage¶
- bgen_reader.compute_dosage(expec, alt=None)[source]
Compute dosage from allele expectation.
- Parameters
expec (array_like) – Allele expectations encoded as a samples-by-alleles matrix.
alt (array_like, optional) – Alternative allele index. If
None
, the allele having the minor allele frequency for the providedexpec
is used as the alternative. Defaults toNone
.
- Returns
Dosage encoded as an array of size equal to the number of samples.
- Return type
Examples
First a quick-start example.¶>>> from bgen_reader import allele_expectation, compute_dosage >>> from bgen_reader import example_filepath, read_bgen >>> >>> filepath = example_filepath("example.32bits.bgen") >>> >>> # Read the example. >>> bgen = read_bgen(filepath, verbose=False) >>> >>> # Extract the allele expectations of the fourth variant. >>> variant_idx = 3 >>> e = allele_expectation(bgen, variant_idx) >>> >>> # Compute the dosage when considering the first allele >>> # as the reference/alternative one. >>> alt_allele_idx = 1 >>> d = compute_dosage(e, alt=alt_allele_idx) >>> >>> # Print the dosage of the first five samples only. >>> print(d[:5]) [1.96185308 0.00982666 0.01745552 1.00347899 1.01153563]
Genotype probabilities, allele expectations and frequencies.¶>>> from bgen_reader import ( ... allele_expectation, ... allele_frequency, ... compute_dosage, ... example_filepath, ... read_bgen, ... ) >>> from pandas import DataFrame >>> from xarray import DataArray >>> >>> # Download an example >>> filepath = example_filepath("example.32bits.bgen") >>> >>> # Open the bgen file. >>> bgen = read_bgen(filepath, verbose=False) >>> variants = bgen["variants"] >>> genotype = bgen["genotype"] >>> samples = bgen["samples"] >>> >>> variant_idx = 3 >>> variant = variants.loc[variant_idx].compute() >>> # Print the metadata of the fourth variant. >>> print(variant) id rsid chrom pos nalleles allele_ids vaddr 3 SNPID_5 RSID_5 01 5000 2 A,G 16034 >>> geno = bgen["genotype"][variant_idx].compute() >>> metageno = DataFrame({k: geno[k] for k in ["ploidy", "missing"]}, ... index=samples) >>> metageno.index.name = "sample" >>> print(metageno) ploidy missing sample sample_001 2 False sample_002 2 False sample_003 2 False sample_004 2 False ... ... ... sample_497 2 False sample_498 2 False sample_499 2 False sample_500 2 False [500 rows x 2 columns] >>> p = DataArray( ... geno["probs"], ... name="probability", ... coords={"sample": samples}, ... dims=["sample", "genotype"], ... ) >>> # Print the genotype probabilities. >>> print(p.to_series().unstack(level=-1)) genotype 0 1 2 sample sample_001 0.00488 0.02838 0.96674 sample_002 0.99045 0.00928 0.00027 sample_003 0.98932 0.00391 0.00677 sample_004 0.00662 0.98328 0.01010 ... ... ... ... sample_497 0.00137 0.01312 0.98550 sample_498 0.00552 0.99423 0.00024 sample_499 0.01266 0.01154 0.97580 sample_500 0.00021 0.98431 0.01547 [500 rows x 3 columns] >>> alleles = variant["allele_ids"].values[0].split(",") >>> e = DataArray( ... allele_expectation(bgen, variant_idx), ... name="expectation", ... coords={"sample": samples, "allele": alleles}, ... dims=["sample", "allele"], ... ) >>> # Print the allele expectations. >>> print(e.to_series().unstack(level=-1)) allele A G sample sample_001 0.03815 1.96185 sample_002 1.99017 0.00983 sample_003 1.98254 0.01746 sample_004 0.99652 1.00348 ... ... ... sample_497 0.01587 1.98413 sample_498 1.00528 0.99472 sample_499 0.03687 1.96313 sample_500 0.98474 1.01526 [500 rows x 2 columns] >>> rsid = variant["rsid"].values[0] >>> chrom = variant["chrom"].values[0] >>> variant_name = f"{chrom}:{rsid}" >>> f = DataFrame(allele_frequency(e), columns=[variant_name], index=alleles) >>> f.index.name = "allele" >>> # Allele frequencies. >>> print(f) 01:RSID_5 allele A 305.97218 G 194.02782 >>> alt = f.idxmin().values[0] >>> alt_idx = alleles.index(alt) >>> d = compute_dosage(e, alt=alt_idx).to_series() >>> d = DataFrame(d.values, columns=[f"alt={alt}"], index=d.index) >>> # Dosages when considering G as the alternative allele. >>> print(d) alt=G sample sample_001 1.96185 sample_002 0.00983 sample_003 0.01746 sample_004 1.00348 ... ... sample_497 1.98413 sample_498 0.99472 sample_499 1.96313 sample_500 1.01526 [500 rows x 1 columns]