Bgen-reader’s documentation
>>> # Download a sample file
>>> from bgen_reader import example_filepath
>>> bgen_file = example_filepath("example.bgen")
>>> # Read from the file
>>> from bgen_reader import open_bgen
>>> bgen = open_bgen(bgen_file, verbose=False)
>>> probs0 = bgen.read(0) # Read 1st variant
>>> print(probs0.shape) # Shape of the NumPy array
(500, 1, 3)
>>> probs_all = bgen.read() # Read all variants
>>> print(probs_all.shape) # Shape of the NumPy array
(500, 199, 3)
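`read()` returns a plain NumPy array, so downstream computations need only NumPy. The sketch below assumes unphased, diploid, biallelic data whose three probabilities are ordered as homozygous first allele, heterozygous, and homozygous second allele. Under that assumption, the expected count of the second allele (the dosage) is a weighted sum over the last axis. The array is synthetic random data with the same shape as `probs_all` above, not output from `example.bgen`. `expected_dosage` is a hypothetical helper, not part of bgen-reader.

```python
import numpy as np


def expected_dosage(probs: np.ndarray) -> np.ndarray:
    """Hypothetical helper: expected count of the second allele.

    Assumes probs has shape (nsamples, nvariants, 3) and that the last
    axis holds probabilities for the genotypes (A/A, A/B, B/B) in that order.
    """
    if probs.shape[-1] != 3:
        raise ValueError("expected 3 genotype probabilities per distribution")
    # Copies of the second allele carried by each genotype.
    allele_counts = np.array([0.0, 1.0, 2.0])
    # Weighted sum over the last axis; NaN (missing) entries propagate.
    return probs @ allele_counts


# Synthetic stand-in with the same shape as probs_all: 500 samples, 199 variants.
rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(3), size=(500, 199))  # each probability row sums to 1

dosage = expected_dosage(probs)
print(dosage.shape)  # (500, 199)
```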
Bgen is a file format for storing large genetic datasets. It supports both unphased genotypes and phased haplotype data with variable ploidy and numbers of alleles. It was designed to provide a compact data representation without sacrificing variant access performance. This Python package is a wrapper around the bgen library, a low-memory-footprint reader that efficiently reads bgen files. It fully supports bgen format specifications 1.2 and 1.3, as well as their optional compressed formats.
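Variable ploidy and allele counts are why the size of the last dimension in the arrays above (3 for `example.bgen`) can change from file to file. The sketch below assumes the counting rules of the BGEN 1.2/1.3 layout. Phased data hold one distribution over alleles per haplotype. Unphased data hold one probability per unordered genotype. The function name `ncombinations` is hypothetical and not part of bgen-reader.

```python
from math import comb


def ncombinations(ploidy: int, nalleles: int, phased: bool) -> int:
    """Hypothetical helper: probabilities per sample for one variant.

    Phased: one distribution over `nalleles` per haplotype, giving
    ploidy * nalleles values.
    Unphased: one value per multiset of `ploidy` alleles drawn from
    `nalleles`, giving C(ploidy + nalleles - 1, nalleles - 1) values.
    """
    if phased:
        return ploidy * nalleles
    return comb(ploidy + nalleles - 1, nalleles - 1)


print(ncombinations(2, 2, phased=False))  # 3: A/A, A/B, B/B
print(ncombinations(2, 2, phased=True))   # 4: two haplotypes x two alleles
print(ncombinations(2, 3, phased=False))  # 6
print(ncombinations(1, 2, phased=False))  # 2: haploid, biallelic
```

The diploid, biallelic, unphased case gives 3, which matches the `(500, 199, 3)` shape printed above.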
We offer two APIs (interfaces to the library):
The Dask-Inspired API (original) offers compatibility with previous versions of this library, a dataframe-based interface, and good sustained reading speeds (about 250,000 distributions per second).
The NumPy-Inspired API (new) offers an array-based interface and faster sustained reading speeds (about 4 million distributions per second). Both APIs are memory-efficient.
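The two quoted speeds translate into read times for large datasets. This is back-of-the-envelope arithmetic, not a benchmark. It assumes one "distribution" is the probability vector of one sample at one variant, which is one (sample, variant) cell of the arrays read above. The dataset size of 100,000 samples by 10,000 variants is hypothetical and chosen only for illustration.

```python
# Assumption: one "distribution" = one (sample, variant) probability vector.
DASK_RATE = 250_000     # approx. distributions/second, Dask-Inspired API
NUMPY_RATE = 4_000_000  # approx. distributions/second, NumPy-Inspired API


def estimated_read_seconds(nsamples: int, nvariants: int, rate: float) -> float:
    """Rough sustained read time; ignores start-up cost and I/O variation."""
    return nsamples * nvariants / rate


# Hypothetical dataset size, for illustration only.
nsamples, nvariants = 100_000, 10_000
dask_s = estimated_read_seconds(nsamples, nvariants, DASK_RATE)
numpy_s = estimated_read_seconds(nsamples, nvariants, NUMPY_RATE)
print(f"Dask-Inspired:  {dask_s:,.0f} s")    # 4,000 s
print(f"NumPy-Inspired: {numpy_s:,.0f} s")   # 250 s
print(f"speed-up: {dask_s / numpy_s:.0f}x")  # 16x
```

At these rates, the NumPy-Inspired API reads roughly 16 times faster. A read of about an hour becomes one of a few minutes.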
Comments and bugs
You can get the source and open issues on GitHub.