xsra

A performant and storage-efficient CLI tool for extracting sequences from an SRA archive, with support for FASTA, FASTQ, and BINSEQ outputs. The NCBI Sequence Read Archive (SRA) is a repository of raw sequencing data. SRA files use a complex binary database format that most bioinformatics tools cannot read directly. This tool uses the ncbi_vdb C library via ncbi-vdb-sys to interact with SRA files through safe abstractions. It is designed to be a fast, storage-efficient, and more convenient replacement for the `fastq-dump` and `fasterq-dump` tools provided by the NCBI sra-tools. However, it is not a complete feature-for-feature replacement, and some functionality may be missing.
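For readers unfamiliar with the two text output formats, the sketch below shows how a single sequencing record is laid out as FASTQ (four lines: header, bases, separator, qualities) and as FASTA (two lines: header, bases; qualities are dropped). The `Record` struct and the `write_fastq`/`write_fasta` functions are hypothetical illustrations, not xsra's internal types or API, and the sketch assumes qualities are already Phred+33 encoded ASCII.

```rust
use std::io::{self, Write};

/// Hypothetical in-memory record; field names are illustrative, not xsra's types.
struct Record<'a> {
    name: &'a str,
    seq: &'a [u8],
    qual: &'a [u8], // assumed Phred+33 encoded ASCII quality scores
}

/// Write one record as a four-line FASTQ entry.
fn write_fastq<W: Write>(out: &mut W, rec: &Record) -> io::Result<()> {
    writeln!(out, "@{}", rec.name)?;
    out.write_all(rec.seq)?;
    out.write_all(b"\n+\n")?;
    out.write_all(rec.qual)?;
    out.write_all(b"\n")
}

/// Write one record as a two-line FASTA entry (quality scores are dropped).
fn write_fasta<W: Write>(out: &mut W, rec: &Record) -> io::Result<()> {
    writeln!(out, ">{}", rec.name)?;
    out.write_all(rec.seq)?;
    out.write_all(b"\n")
}

fn main() -> io::Result<()> {
    let rec = Record { name: "SRR000001.1", seq: b"ACGT", qual: b"IIII" };
    let stdout = io::stdout();
    let mut handle = stdout.lock();
    write_fastq(&mut handle, &rec)?;
    write_fasta(&mut handle, &rec)?;
    Ok(())
}
```

Writing through a generic `Write` keeps the formatting independent of the destination, so the same functions could feed a file, a buffer, or a compressing writer.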

Tool Features

Multi-threaded extraction of SRA to FASTA, FASTQ, and BINSEQ outputs

Optional built-in compression of FASTA and FASTQ outputs (gzip, bgzip, zstd)

Choice of BINSEQ output format (*.bq and *.vbq)

Segment-level extraction and filtering

Stream directly from SRA, prefetch to disk, or run on cloud providers
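To illustrate what segment-level extraction and filtering means, the sketch below assumes an SRA spot stores its segments (for example, the two mates of a paired-end read) back to back, together with a list of per-segment lengths. It splits the spot into segments and keeps only requested segment indices that meet a minimum length. The functions `split_segments` and `filter_segments` and the `wanted`/`min_len` parameters are hypothetical and do not correspond to xsra's actual code or command-line flags.

```rust
/// Split a spot's concatenated bases into segments using per-segment lengths.
/// Returns None if the lengths do not sum to the spot length (assumed invariant).
fn split_segments<'a>(seq: &'a [u8], lens: &[usize]) -> Option<Vec<&'a [u8]>> {
    if lens.iter().sum::<usize>() != seq.len() {
        return None;
    }
    let mut segments = Vec::with_capacity(lens.len());
    let mut start = 0;
    for &len in lens {
        segments.push(&seq[start..start + len]);
        start += len;
    }
    Some(segments)
}

/// Keep only segments whose index is in `wanted` and whose length is at least `min_len`.
/// Both parameters are hypothetical filters used for illustration, not xsra flags.
fn filter_segments<'a>(
    segments: &[&'a [u8]],
    wanted: &[usize],
    min_len: usize,
) -> Vec<(usize, &'a [u8])> {
    segments
        .iter()
        .enumerate()
        .filter(|(i, s)| wanted.contains(i) && s.len() >= min_len)
        .map(|(i, s)| (i, *s))
        .collect()
}

fn main() {
    // A hypothetical paired-end spot: two 4-base mates stored back to back.
    let spot = b"ACGTTTGA";
    let segs = split_segments(spot, &[4, 4]).expect("segment lengths must match the spot");
    // Keep only the second mate (index 1).
    for (i, s) in filter_segments(&segs, &[1], 1) {
        println!("segment {}: {}", i, String::from_utf8_lossy(s));
    }
}
```

Returning borrowed slices avoids copying bases, which matters when the same split is applied to millions of spots.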
