Next Generation Sequencing (NGS) software packages


With Next Generation Sequencing (NGS) technology, it is now straightforward to sequence the whole genome, exome or transcriptome of an organism. Analysing the resulting data remains challenging, however, because the high-throughput output arrives as short reads that contain several kinds of artifacts. We have developed several modules for the analysis of NGS data generated by sequencing whole genomes, transcriptomes and human exomes.

Automated pipeline for whole genome assembly and annotation

We have developed an automated pipeline for the assembly and annotation of microbial genomes. The user provides the paths of the input sequencing read files and the parameters in a configuration file. The pipeline works in three steps: (i) filtering of genome sequencing data, (ii) genome assembly of the filtered reads, (iii) annotation of the assembled genome.

USAGE: assemb_anno.pl -i (Configuration file) -o (Output directory name)
Example Command: ./assemb_anno.pl -i Configuration_file -o my_out
-i Configuration_file
-o Output Directory


Benchmarking of Genome assemblers (GenomeABC server)

Recently, several algorithms have been developed for assembling whole genomes from short reads. A number of them are freely available as software packages, such as Velvet, SOAPdenovo, ABySS, Euler-sr, Edena and SSAKE. At present it is difficult for users to choose an appropriate assembler for their genomes because benchmarks of existing genome assemblers are lacking. We have developed the GenomeABC software for the benchmarking of assembled genomes. It includes three modules: (i) Benchmarking of genome assemblies, (ii) Generation of artificial genome and simulated reads, (iii) Generation of mutated genome and simulated reads.

(i) Benchmarking of genome assemblies
This is the main module of GenomeABC; it allows users to evaluate their assemblers. To use it, the user provides a reference genome and the contigs generated by an assembler. The module maps the contigs onto the reference genome with BLAT and compares them with the reference in order to evaluate the performance of the assembler.

USAGE: benchmarking_new_assembled_genome.pl -c (fasta format contig file) -r (fasta format reference genome file) -o (output file name)
Example Command: ./benchmarking_new_assembled_genome.pl -c contigs.fasta -r ref.fasta -o out.txt
-c Contig file in FASTA format
-r Reference genome file in FASTA format
-o Output file name
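
As a rough illustration of the comparison step, the sketch below computes what fraction of a reference genome is covered by mapped contigs, given their alignment intervals. It is not the GenomeABC implementation: the function name reference_coverage, the half-open interval format and the choice of metric are assumptions made for illustration, and parsing of the aligner output is not shown.

```python
def reference_coverage(intervals, reference_length):
    """Return the fraction of the reference covered by at least one contig.

    intervals: iterable of (start, end) half-open positions on the reference
    (hypothetical format; parsing of aligner output is not shown).
    """
    if reference_length <= 0:
        raise ValueError("reference_length must be positive")
    covered = 0
    current_start = current_end = None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end:
            # Disjoint from the running block: close it and start a new one.
            if current_end is not None:
                covered += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlapping or adjacent: extend the running block.
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start
    return covered / reference_length


# Hypothetical alignments of three contigs on a 1,000 bp reference.
alignments = [(0, 300), (250, 500), (700, 900)]
print(reference_coverage(alignments, 1000))
```

Overlapping alignments are merged before counting, so a region covered by two contigs is counted only once.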

(ii) Generation of artificial genome and simulated reads
This module allows users to generate an artificial genome of a specified size and nucleotide composition (percentages of A, T, G and C). It also generates simulated short reads (single-end or paired-end) from the artificial genome, using the specified read length, insert length and coverage.

USAGE: make_genome.pl -s (Genome size (put 5000000 for 5 Mb)) -a (A % (e.g. 25%)) -t (T % (e.g. 25%)) -g (G % (e.g. 25%)) -c (C % (e.g. 25%)) -l (Read length) -i (Insert length) -v (Coverage) -y (Type of reads) -o (Output directory)
-s Size of the genome to be created.
-a Percentage of A in the genome.
-t Percentage of T in the genome.
-g Percentage of G in the genome.
-c Percentage of C in the genome.
-l Read length.
-i Insert length.
-v Coverage.
-y Type of reads (single-end (1) or paired-end (2)).
-o Output directory name.
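
The idea behind the options above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the make_genome.pl code: the function names, the uniform sampling of read start positions and the seed parameter are all hypothetical, and only single-end reads are shown.

```python
import random


def make_genome(size, pct_a, pct_t, pct_g, pct_c, seed=None):
    """Build a random genome string with the requested base composition."""
    if abs(pct_a + pct_t + pct_g + pct_c - 100) > 1e-9:
        raise ValueError("base percentages must sum to 100")
    rng = random.Random(seed)
    weights = [pct_a, pct_t, pct_g, pct_c]
    return "".join(rng.choices("ATGC", weights=weights, k=size))


def simulate_single_end_reads(genome, read_length, coverage, seed=None):
    """Sample reads so that total read bases are about coverage * genome size."""
    if read_length > len(genome):
        raise ValueError("read_length exceeds genome length")
    rng = random.Random(seed)
    n_reads = int(coverage * len(genome) / read_length)
    last_start = len(genome) - read_length
    # Assumption: read start positions are drawn uniformly at random.
    starts = (rng.randint(0, last_start) for _ in range(n_reads))
    return [genome[s:s + read_length] for s in starts]


genome = make_genome(5000, 25, 25, 25, 25, seed=1)
reads = simulate_single_end_reads(genome, read_length=50, coverage=10, seed=1)
print(len(genome), len(reads))
```

With a 5,000 bp genome, 50 bp reads and 10x coverage, the sketch draws 1,000 reads (10 * 5000 / 50).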

(iii) Generation of mutated genome and simulated reads
This module allows users to mutate a genome. The user provides a reference genome and specifies the percentage of nucleotides to be mutated. The module randomly mutates that percentage of positions in the reference genome. It can also generate simulated short reads (single-end or paired-end). This module is useful for evaluating assemblers that assemble genomes based on similar reference genomes.

USAGE: make_mut_genome.pl -i (Input genome fasta file) -m (Percentage of mutation) -l (Read length) -f (Insert length) -c (Coverage) -y (Type of reads) -o (Output file)
-i Input genome file.
-m Percentage of mutation.
-l Read length.
-f Insert length.
-c Coverage.
-y Type of reads (single-end (1) or paired-end (2)).
-o Output directory name.
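
The random mutation step described above could look roughly like the following sketch. It assumes an uppercase A/C/G/T sequence and point substitutions only; the function name mutate_genome and the seed parameter are hypothetical, and make_mut_genome.pl may work differently.

```python
import random


def mutate_genome(genome, percent, seed=None):
    """Substitute `percent`% of positions, each with a different base."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    rng = random.Random(seed)
    n_mut = round(len(genome) * percent / 100)
    bases = list(genome)
    # sample() picks distinct positions, so exactly n_mut sites change.
    for pos in rng.sample(range(len(bases)), n_mut):
        alternatives = [b for b in "ACGT" if b != bases[pos]]
        bases[pos] = rng.choice(alternatives)
    return "".join(bases)


reference = "ACGT" * 250  # 1,000 bp toy reference
mutated = mutate_genome(reference, percent=2, seed=7)
print(sum(a != b for a, b in zip(reference, mutated)))
```

Because every chosen position receives a base different from the original, a 2% rate on a 1,000 bp sequence yields exactly 20 differences.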


Variation detection in normal-tumor paired data

We have developed a pipeline for the identification of SNPs and somatic variations in normal-tumor paired sequencing data. The user provides sequencing data from a tumor sample and a normal tissue sample of the same individual; the pipeline compares the two and identifies SNPs and somatic variations. It works in several steps using freely available tools: (i) filtering of sequencing data, (ii) alignment of filtered reads to the human genome, (iii) variation detection in the normal-tumor samples, (iv) mapping of somatic variations at the gene level.

USAGE: variation_detect.pl -i (Configuration file) -o (Output directory name)
Example Command: ./variation_detect.pl -i Configuration_file -o my_out
-i Configuration_file
-o Output Directory
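
Conceptually, comparing a tumor sample with its matched normal separates variants seen in both samples from those seen only in the tumor. The sketch below shows that idea as a plain set comparison. It is a deliberate simplification and not what the pipeline does internally: real somatic callers also weigh read counts and statistical support, and the tuple format, function name and example calls here are hypothetical.

```python
def classify_variants(normal_calls, tumor_calls):
    """Split variant calls into shared, tumor-only and normal-only sets.

    Each call is a (chromosome, position, ref_base, alt_base) tuple.
    """
    normal = set(normal_calls)
    tumor = set(tumor_calls)
    return {
        "shared": sorted(tumor & normal),       # seen in both samples
        "tumor_only": sorted(tumor - normal),   # candidate somatic variants
        "normal_only": sorted(normal - tumor),  # seen only in the normal sample
    }


# Hypothetical calls for illustration only.
normal_calls = [("chr1", 100, "A", "G"), ("chr2", 500, "C", "T")]
tumor_calls = [("chr1", 100, "A", "G"), ("chr3", 42, "G", "C")]
result = classify_variants(normal_calls, tumor_calls)
print(result["tumor_only"])
```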


Software packages (.deb) for genome assembly and annotation

We have also developed Debian (.deb) packages for whole genome assembly and annotation from Next Generation Sequencing (NGS) data. After installing OSDDlinux, the user can download and install these .deb packages on the system.
Program | Purpose | Usage
ABySS | Genome assembler | Command line
Amos | Genome assembler | Command line
Artemis | Genome Viewer | Graphical user interface
Augustus | Gene prediction | Command line
Amphora | Phylogenomic Inference Pipeline for Bacterial and Archaeal Sequences | Command line
Annovar | Variation prediction | Command line
Blat | Alignment tool, faster than BLAST | Command line
Blast | Alignment tool | Command line
Brig | Genome Viewer | Graphical user interface
Celera | Genome assembler | Command line
Chimerascan | Chimeric transcripts detector | Command line
Cufflinks | Transcript assembly, differential expression, and differential regulation for RNA-Seq | Command line
Edena | Genome assembler | Command line
EVM | Gene prediction | Command line
FastQC | Filter NGS data i.e. Short reads | Graphical user interface
FastXQC | Filter NGS data i.e. Short reads | Command line
Genemark | Gene prediction | Command line
Genosets | Comparative Genomics visualization | Graphical user interface
Glimmer | Gene prediction | Command line
IGV | Genome Viewer | Graphical user interface
JSpecies | Genome comparison | Graphical user interface
ALLPATHS-LG | Genome assembler | Command line
Maker | Genome annotation pipeline, Eukaryotes | Command line
Maq | Short reads aligner | Command line
Mauve | Genome Viewer | Command line
Mummer | Genome comparison | Command line
Mira | Genome assembler | Command line
NGS-QC toolkit | Filter NGS data i.e. Short reads | Command line
Pasha | Parallelized Short Read Assembly | Command line
Ray | Genome assembler | Command line
RNAmmer | RNA prediction | Command line
SOAPdenovo | Genome assembler | Command line
SOAP-aligner | Short reads aligner | Command line
Spades | Genome assembler | Command line
Tablet | Genome alignment viewer | Graphical user interface
Tophat | A spliced read mapper for RNA-Seq | Command line
Vaast | Variation prediction | Command line
VCFtools | Variation prediction | Command line
Newbler | Genome assembler | Command line


Installation instructions


Installation of whole genome assembly and annotation pipeline

This pipeline has been developed for whole genome assembly and annotation of microbial (bacterial and fungal) genomes. It integrates a variety of software and runs in three main steps.

(1) Filter the raw sequencing data

The first step filters the raw sequencing reads, retaining high-quality reads and removing those contaminated with vector or adaptor sequence. For this purpose, the NGS-QC toolkit is integrated in the pipeline. BioPerl is required for this software to work.
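
To make the notion of quality filtering concrete, here is a minimal sketch that keeps reads whose mean Phred score reaches a threshold. It is an assumption-laden illustration: the Phred+33 encoding, the mean-quality criterion, the threshold of 20 and the function names are chosen for the example, and the NGS-QC toolkit's own cutoffs and its adaptor/vector screening are not reproduced here.

```python
def mean_phred(quality_string, offset=33):
    """Mean Phred score of a FASTQ quality string (Phred+33 by default)."""
    return sum(ord(ch) - offset for ch in quality_string) / len(quality_string)


def filter_reads(records, min_mean_quality=20):
    """Keep (name, sequence, quality) records whose mean quality passes."""
    return [
        rec for rec in records
        if rec[2] and mean_phred(rec[2]) >= min_mean_quality
    ]


# Hypothetical reads for illustration only.
reads = [
    ("read1", "ACGT", "IIII"),  # 'I' encodes Q40
    ("read2", "ACGT", "!!!!"),  # '!' encodes Q0
    ("read3", "ACGT", "5555"),  # '5' encodes Q20
]
print([name for name, _, _ in filter_reads(reads)])
```

In this toy input, read1 and read3 pass the threshold and read2 is discarded.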

(2) Genome assembly of filtered data

The filtered reads are then assembled with user-defined parameters (e.g. hash length, K). The assembly results are presented to the user, who selects the best one. Velvet and SOAPdenovo are used for genome assembly at this step.
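
When comparing assemblies produced with different parameters, a commonly used summary is N50: the contig length at which the longest contigs together reach half of the total assembly length. The sketch below computes it for two hypothetical assemblies. Whether the pipeline itself reports N50 is not stated here, and the labels k31 and k41 and the contig lengths are made up for illustration.

```python
def n50(contig_lengths):
    """Length of the contig at which the cumulative sum reaches half the total."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    return 0  # empty assembly


# Hypothetical contig lengths from two assembly runs.
assemblies = {
    "k31": [100, 80, 50, 20],
    "k41": [200, 30, 20],
}
scores = {label: n50(lengths) for label, lengths in assemblies.items()}
best = max(scores, key=scores.get)
print(scores, best)
```

N50 alone does not capture misassemblies, so it is best read alongside other statistics such as total length and number of contigs.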

(3) Whole genome annotation

The best assembly is then used for genome annotation. Prokka and MAKER have been integrated for the annotation of bacterial and fungal genomes, respectively. The pipeline outputs the genome assembly set and the annotated genome files.

Dependencies: Several BioPerl libraries need to be installed for Prokka and MAKER to function fully. The user should be aware of the dependencies of the integrated software.

Benchmarking of genome assemblers (GenomeABC)

A standalone version of the GenomeABC server has been developed for analysing assembled genomes and benchmarking assemblers. It is a set of simple Perl scripts and is easy to use. BLAT and BioPerl are required to run GenomeABC.

Variation detection in normal-cancer paired data

This pipeline uses several software packages.
(1) The first step filters the raw sequencing reads, retaining high-quality reads and removing those contaminated with vector or adaptor sequence. For this purpose, the NGS-QC toolkit has been integrated in the pipeline.
(2) BWA has been integrated for the alignment of the filtered reads to the human reference genome.
(3) In the next step, SAMtools processes the alignment files.
(4) Finally, VarScan.v2.3.5 detects the somatic variations and SNPs in the given sequencing data.

The user should have all of this software installed to run the pipeline.

Debian packages of all this software can be downloaded from the OSDDlinux website (http://osddlinux.osdd.net/ngs.php).
Dependencies: All of the software mentioned above must be available in the default path, i.e. /gpsr/local/bin, to run this pipeline.
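
Before running the pipeline, it can help to confirm that the required executables are present in /gpsr/local/bin. The following sketch performs that check. The function name missing_tools is hypothetical, and the executable names passed in the example (bwa, samtools) are assumptions; the actual file names installed by the packages may differ.

```python
import os


def missing_tools(tool_names, bin_dir="/gpsr/local/bin"):
    """Return the names that are not executable files inside bin_dir."""
    missing = []
    for name in tool_names:
        path = os.path.join(bin_dir, name)
        if not (os.path.isfile(path) and os.access(path, os.X_OK)):
            missing.append(name)
    return missing


# Executable names below are assumptions, not confirmed package contents.
print(missing_tools(["bwa", "samtools"]))
```

An empty list means every listed tool was found; any names printed would need to be installed or linked into the directory first.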

Installation of debian (.deb) packages

The user can download .deb packages from the OSDDlinux page (http://osddlinux.osdd.net/ngs.php). Installing these packages requires the OSDDlinux operating system with the /gpsr/software directory. To install a package, execute the command:

sudo dpkg -i package.deb

The software is installed automatically in the /gpsr/software/ directory, and the executable files can be called from the /gpsr/local/bin directory.

Example:- sudo dpkg -i maq.deb

Installation location : /gpsr/software/

Executable present: /gpsr/local/bin