imgag / ngs-bits
Short-read and long-read sequencing tools for diagnostics
Repository Overview (README excerpt)
*ngs-bits* - Short-read and long-read sequencing tools for diagnostics

Installation
Binaries of *ngs-bits* are available via Bioconda:
• **Binaries** for Linux/macOS

Alternatively, *ngs-bits* can be built from sources. Use git to clone the most recent release (the source code package on GitHub does not contain the required sub-modules):
> git clone --recursive https://github.com/imgag/ngs-bits.git
> cd ngs-bits
> git checkout 2025_12
> git submodule update --recursive --init

Depending on your operating system, the build instructions vary slightly:
• Building from **sources** for Linux
• Building from **sources** for MacOS
• Building from **sources** for Windows

app requires a running server; instructions on how to deploy it on a Linux machine can be found here.

Support
Please report any issues or questions to the ngs-bits issue tracker.

Documentation
The documentation of individual tools is linked in the tools list below. For some tools the documentation pages contain only the command-line help; for other tools they contain more information.
If you want to contribute, check the development documentation.

License
*ngs-bits* is provided under the MIT license, but it is based on other software components with different licenses:
• Qt is our base framework for the graphical user interface, platform abstraction, data structures and much more.
• htslib for HTS data format support (BAM, VCF, ...)
• SimpleCrypt for weak encryption
• QR-Code-generator for QR code generation

ChangeLog
The change log is available on the releases page.

Citing
You can cite ngs-bits using Zenodo DOIs:
• 2025_12:
• 2025_09:
A list of all releases/DOIs can be found here.

Tools list
_ngs-bits_ contains a lot of tools that are used for NGS-based diagnostics in our institute. Some of the tools need the NGSD, a database that contains for example gene, transcript and exon data. Installation instructions for the NGSD can be found here.

Main tools
• SeqPurge - A highly-sensitive adapter trimmer for paired-end short-read data.
• SampleSimilarity - Calculates pairwise sample similarity metrics from VCF/BAM files.
• SampleIdentity - Tries to identify datasets that are from the same patient based on BAM/CRAM files of WGS/WES/lrGS/RNA sequencing.
• SampleGender - Determines sample gender based on a BAM file.
• SampleAncestry - Estimates the ancestry of a sample based on variants.
• CnvHunter - CNV detection from targeted resequencing data using non-matched control samples.
• RohHunter - ROH detection based on a variant list annotated with AF values.
• UpdHunter - UPD detection from trio variant data.

QC tools
The default output format of the quality control tools is qcML, an XML-based format for -omics quality control. It consists of an XML schema, which defines the overall structure of the format, and an ontology, which defines the QC metrics that can be used.
• ReadQC - Quality control tool for FASTQ files.
• MappingQC - Quality control tool for a BAM file.
• VariantQC - Quality control tool for a VCF file.
• SomaticQC - Quality control tool for tumor-normal pairs (paper).
• TrioMaternalContamination - Detects maternal contamination of a child using SNPs from parents.
• TrioMendelianErrors - Determines mendelian error rate from a trio VCF file.
• RnaQC - Calculates QC metrics for RNA samples.
• QcToTsv - Converts qcML files to a TSV file.

BAM tools
• BamClipOverlap - (Soft-)Clips paired-end reads that overlap.
• BamDownsample - Downsamples a BAM file to the given percentage of reads.
• BamExtract - Extract reads from BAM/CRAM by read name.
• BamFilter - Filters a BAM file by multiple criteria.
• BamHighCoverage - Determines high-coverage regions in a BAM file.
• BamInfo - Basic BAM information.
• BamToFastq - Converts a coordinate-sorted BAM file to FASTQ files.
• FastaFromBam - Download the reference genome FASTA file for a BAM/CRAM file.

BED tools
• BedAdd - Merges regions from several BED files.
• BedAnnotateFromBed - Annotates BED file regions with information from a second BED file.
• BedAnnotateGC - Annotates the regions in a BED file with GC content.
• BedAnnotateGenes - Annotates BED file regions with gene names (needs NGSD).
• BedChunk - Splits regions in a BED file to chunks of a desired size.
• BedCoverage - Annotates the regions in a BED file with the average coverage in one or several BAM files.
• BedExtend - Extends the regions in a BED file by _n_ bases.
• BedGeneOverlap - Calculates how much of each overlapping gene is covered (needs NGSD).
• BedHighCoverage - Detects high-coverage regions from a BAM file.
• BedInfo - Prints summary information about a BED file.
• BedIntersect - Intersects two BED files.
• BedLiftOver - Lift-over of regions in a BED file to a different genome build.
• BedLowCoverage - Calculates regions of low coverage based on an input BED and BAM file.
• BedMerge - Merges overlapping regions in a BED file.
• BedReadCount - Annotates the regions in a BED file with the read count from a BAM file.
• BedShrink - Shrinks the regions in a BED file by _n_ bases.
• BedSort - Sorts the regions in a BED file.
• BedSubtract - Subtracts one BED file from another BED file.
• BedToFasta - Converts BED file to a FASTA file (based on the reference genome).
• CnvReferenceCohort - Create a reference cohort for CNV calling from a list of coverage profiles.

FASTQ tools
• FastqAddBarcode - Adds sequences from separate FASTQ as barcodes to read IDs.
• FastqConvert - Converts the quality scores from Illumina 1.5 offset to Sanger/Illumina 1.8 offset.
• FastqConcat - Concatenates several FASTQ files into one output FASTQ file.
• FastqDownsample - Downsamples paired-end FASTQ files.
• FastqExtract - Extracts reads from a FASTQ file according to an ID list.
• FastqExtractBarcode - Moves molecular barcodes of reads to a separate file.
• FastqExtractUMI - Moves unique molecular identifier from read sequence to read ID.
• FastqFormat - Determines the quality score offset of a FASTQ file.
• FastqList…