Expectation

bgen_reader.allele_expectation(bgen, variant_idx)[source]

Allele expectation.

Compute the expectation of each allele from the genotype probabilities.

Parameters
  • bgen (bgen_file) – Bgen file handler.

  • variant_idx (int) – Variant index.

Returns

Samples-by-alleles matrix of allele expectations.

Return type

numpy.ndarray

Note

This function supports unphased genotypes only.

Examples

>>> from bgen_reader import allele_expectation, example_filepath, read_bgen
>>>
>>> from texttable import Texttable
>>>
>>> filepath = example_filepath("example.32bits.bgen")
>>>
>>> # Read the example.
>>> bgen = read_bgen(filepath, verbose=False)
>>>
>>> variants = bgen["variants"]
>>> samples = bgen["samples"]
>>> genotype = bgen["genotype"]
>>>
>>> genotype = bgen["genotype"]
>>> # This `compute` call will return a pandas data frame,
>>> variant = variants[variants["rsid"] == "RSID_6"].compute()
>>> # from which we retrieve the variant index.
>>> variant_idx = variant.index.values[0]
>>> print(variant)
        id    rsid chrom   pos  nalleles allele_ids  vaddr
4  SNPID_6  RSID_6    01  6000         2        A,G  19377
>>> genotype = bgen["genotype"]
>>> # Samples is a pandas series, and we retrieve the
>>> # sample index from the sample name.
>>> sample_idx = samples[samples == "sample_005"].index.values[0]
>>>
>>> genotype = bgen["genotype"]
>>> # This `compute` call will return a dictionary from which
>>> # we can get the probability matrix the corresponding
>>> # variant.
>>> p = genotype[variant_idx].compute()["probs"][sample_idx]
>>>
>>> genotype = bgen["genotype"]
>>> # Allele expectation makes sense for unphased genotypes only,
>>> # which is the case here.
>>> e = allele_expectation(bgen, variant_idx)[sample_idx]
>>>
>>> genotype = bgen["genotype"]
>>> alleles = variant["allele_ids"].values[0].split(",")
>>>
>>> genotype = bgen["genotype"]
>>>
>>> # Print what we have got in a nice format.
>>> table = Texttable()
>>> table = table.add_rows(
...     [
...         ["", "AA", "AG", "GG", "E[.]"],
...         ["p"] + list(p) + ["na"],
...         ["#" + alleles[0], 2, 1, 0, e[0]],
...         ["#" + alleles[1], 0, 1, 2, e[1]],
...     ]
... )
>>> print(table.draw())
+----+-------+-------+-------+-------+
|    |  AA   |  AG   |  GG   | E[.]  |
+====+=======+=======+=======+=======+
| p  | 0.012 | 0.987 | 0.001 | na    |
+----+-------+-------+-------+-------+
| #A | 2     | 1     | 0     | 1.011 |
+----+-------+-------+-------+-------+
| #G | 0     | 1     | 2     | 0.989 |
+----+-------+-------+-------+-------+