>>> # Download a sample file
>>> from bgen_reader import example_filepath
>>> bgen_file = example_filepath("example.bgen")
>>> # Read from the file
>>> from bgen_reader import open_bgen
>>> bgen = open_bgen(bgen_file, verbose=False)
>>> probs0 = bgen.read(0)  # Read 1st variant
>>> print(probs0.shape)  # Shape of the NumPy array
(500, 1, 3)
>>> probs_all = bgen.read()  # Read all variants
>>> print(probs_all.shape)  # Shape of the NumPy array
(500, 199, 3)
Bgen is a file format for storing large genetic datasets. It supports both unphased genotypes and phased haplotype data with variable ploidy and number of alleles. It was designed to provide a compact data representation without sacrificing variant access performance. This Python package is a wrapper around the bgen library, a reader with a low memory footprint that reads bgen files efficiently. It fully supports versions 1.2 and 1.3 of the bgen format specification, as well as their optional compressed formats.
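Because ploidy and allele count can vary, the number of probabilities stored per sample for a variant is not fixed. The trailing 3 in the array shapes of the example corresponds to the three unphased genotypes of a diploid, biallelic variant. The sketch below counts the probabilities per sample under the usual bgen layouts as we understand them: phased data stores one probability per allele for each haplotype, while unphased data stores one per unordered genotype. The function `ncombinations` is a hypothetical helper written for illustration; it is not part of `bgen_reader`.

```python
from math import comb


def ncombinations(ploidy: int, nalleles: int, phased: bool) -> int:
    """Hypothetical helper: probabilities stored per sample for one variant."""
    if phased:
        # Phased: each of the `ploidy` haplotypes carries one probability per allele.
        return ploidy * nalleles
    # Unphased: one probability per unordered genotype, i.e. the number of
    # multisets of size `ploidy` drawn from `nalleles` alleles.
    return comb(ploidy + nalleles - 1, nalleles - 1)


print(ncombinations(2, 2, phased=False))  # diploid, biallelic, unphased: 3
print(ncombinations(2, 2, phased=True))   # diploid, biallelic, phased: 4
print(ncombinations(2, 3, phased=False))  # diploid, triallelic, unphased: 6
print(ncombinations(1, 2, phased=False))  # haploid, biallelic: 2
```

The unphased count grows combinatorially with ploidy and allele count, which is one reason a compact, variable-length representation pays off.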
We offer two APIs (interfaces to the library):
The Dask-inspired API (original) offers compatibility with previous versions of this library, a dataframe-based interface, and good sustained reading speeds (about 250,000 distributions per second).
The NumPy-inspired API (new) offers an array-based interface and faster sustained reading speeds (about 4 million distributions per second). Both APIs are memory-efficient.
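To put the two quoted throughput figures in perspective, the back-of-the-envelope calculation below estimates how long a full read would take. It assumes one distribution per (sample, variant) pair and that the sustained rates above hold for the whole read; `estimate_read_seconds` is a hypothetical helper, and actual timings will depend on hardware, compression, and file layout.

```python
def estimate_read_seconds(nsamples: int, nvariants: int, rate: float) -> float:
    """Hypothetical helper: rough time to read every distribution at `rate`/s."""
    # Assumption: one distribution per sample per variant.
    return nsamples * nvariants / rate


DASK_RATE = 250_000     # distributions per second (Dask-inspired API, quoted figure)
NUMPY_RATE = 4_000_000  # distributions per second (NumPy-inspired API, quoted figure)

# Dimensions of the example file: 500 samples x 199 variants.
print(f"{estimate_read_seconds(500, 199, DASK_RATE):.3f} s")   # 0.398 s
print(f"{estimate_read_seconds(500, 199, NUMPY_RATE):.3f} s")  # 0.025 s
```

At these rates the NumPy-inspired API is roughly 16 times faster, a difference that matters mostly for large files where reads take minutes or hours rather than fractions of a second.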