[Feature Review] Comprehensive Quality Control of Next Generation Sequencing Data

Vivian Zhang

Next-generation sequencing (NGS) technology is revolutionizing genomic research and has become one of the most commonly used methods in genomic and even clinical research. With increased data output capacity and dramatically reduced costs, researchers are producing trillions of base pairs (terabases) of data every day. With this much data, quality control is critical to ensure that results are reliable. Omicsoft's NGS analytics provides comprehensive QC functions for both raw and aligned NGS data, for DNA-Seq (Exome-Seq, WGS, and targeted sequencing) as well as RNA-Seq.

[NGS RAW DATA QC]

In Array Studio, the NGS Raw Data QC Wizard is an easy way to run multiple QC commands at once. The wizard provides options including Basic Statistics, Base Distribution, Quality BoxPlot, K-Mer Analysis, and Sequence Duplication.

The Basic Statistics module generates simple composition statistics for the files analyzed, such as sequence length and GC content. The NGS Base Distribution module checks for uniformity between the different bases, since one would expect to see a roughly equal distribution of the four bases across the length of the read. The Quality BoxPlot module shows the quality score at each base position in a file (aggregated over all reads from that file), giving the user an idea of where the quality score starts to drop off. The "K-Mer Analysis (K=5)" module measures the enrichment of every 5-mer within the sequence library. It derives an expected count for each k-mer from the base content of the library as a whole, then compares it with the number of times the k-mer actually appears to give an observed/expected ratio. This can help find over-represented sequences in the data that are not aligned.
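The base distribution and k-mer checks described above can be sketched in a few lines of Python. This is only an illustration of the general idea, not Array Studio's implementation: the function names `base_distribution` and `kmer_enrichment` are hypothetical, reads are assumed to be plain strings already parsed from a FASTQ file, and the expected k-mer count is assumed to be the product of the library-wide base frequencies multiplied by the total number of k-mers.

```python
from collections import Counter
from math import prod


def base_distribution(reads):
    """Return, for each read position, the fraction of A, C, G and T seen there."""
    counts = []  # counts[i] tallies the bases observed at position i
    for read in reads:
        for i, base in enumerate(read):
            if i == len(counts):
                counts.append(Counter())
            counts[i][base] += 1
    return [
        {b: c[b] / sum(c.values()) for b in "ACGT"}
        for c in counts
    ]


def kmer_enrichment(reads, k=5):
    """Return {kmer: observed / expected} for every k-mer present in the reads.

    Assumption: the expected count treats bases as independent, so it is
    (total k-mers) * product of the library-wide frequency of each base.
    """
    base_counts = Counter()
    kmer_counts = Counter()
    for read in reads:
        base_counts.update(read)
        for i in range(len(read) - k + 1):
            kmer_counts[read[i:i + k]] += 1

    total_bases = sum(base_counts.values())
    total_kmers = sum(kmer_counts.values())
    freq = {b: n / total_bases for b, n in base_counts.items()}

    ratios = {}
    for kmer, observed in kmer_counts.items():
        expected = total_kmers * prod(freq[b] for b in kmer)
        ratios[kmer] = observed / expected
    return ratios


if __name__ == "__main__":
    reads = ["ACGTACGTAC", "AAAAAAAAAA"]
    print(base_distribution(reads)[0])
    top = sorted(kmer_enrichment(reads).items(), key=lambda kv: -kv[1])[:3]
    print(top)
```

A ratio well above 1 flags a k-mer that occurs more often than the base composition alone would predict, which is the kind of signal that adapter or primer contamination tends to produce.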

[SEQUENCING ALIGNMENT AND ALIGNED DATA QC]

After raw data QC, the user can move on to the next step of the NGS analysis with more confidence in the result. The user can use the Omicsoft Sequence Aligner (OSA) to align the data to the genome of choice. OSA is a fast and accurate alignment tool for NGS data, and it is the base aligner for RNA-Seq, DNA-Seq, and miRNA-Seq data in FusionMap, Oshell, and Array Suite (Array Studio and ArrayServer).

Figure: Percentage of aligned reads that match the simulated data, for 10 million 100 bp paired-end simulated reads with 0%, 0.5% (default), 1%, and 2% error rates. Gene model provided (left) and not provided (right).

Figure: Alignment job run time for 10 million 100 bp paired-end simulated reads with 0%, 0.5% (default), 1%, and 2% error rates. Gene model provided

However, even with an accurate aligner like OSA, it is important to examine the quality of the aligned data. Omicsoft provides comprehensive DNA-Seq QC Metrics and RNA-Seq QC Metrics. These include alignment metrics, coverage metrics, duplication metrics, insert size metrics, flag metrics, profile metrics, and more. In total, more than 100 metrics ensure that the aligned data is fully examined and ready for downstream analysis. An example list of metrics for RNA-Seq data can be found under Aligned data QC.
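As a rough illustration of what flag, duplication, and insert size metrics measure, the sketch below tallies a few of them from SAM-formatted text lines. It is not Omicsoft's implementation and does not reproduce its metric definitions: the function name `alignment_metrics` and the output keys are hypothetical. The flag bits and the column positions (FLAG in column 2, TLEN in column 9) follow the SAM format specification.

```python
# SAM flag bits, as defined in the SAM format specification.
PROPER_PAIR = 0x2
UNMAPPED = 0x4
SECONDARY = 0x100
DUPLICATE = 0x400
SUPPLEMENTARY = 0x800


def alignment_metrics(sam_lines):
    """Summarize a few flag-based metrics from an iterable of SAM text lines.

    Assumption: secondary and supplementary records are skipped so that each
    read is counted once. Key names in the result are illustrative only.
    """
    total = mapped = proper = duplicates = 0
    insert_sizes = []

    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue

        total += 1
        if not (flag & UNMAPPED):
            mapped += 1
        if flag & DUPLICATE:
            duplicates += 1
        if flag & PROPER_PAIR:
            proper += 1
            tlen = int(fields[8])
            if tlen > 0:  # count each pair once, from the leftmost mate
                insert_sizes.append(tlen)

    mean_insert = sum(insert_sizes) / len(insert_sizes) if insert_sizes else None
    return {
        "total_reads": total,
        "mapped_reads": mapped,
        "properly_paired": proper,
        "duplicates": duplicates,
        "mean_insert_size": mean_insert,
    }


if __name__ == "__main__":
    example = [
        "@HD\tVN:1.6",
        "r1\t99\tchr1\t100\t60\t50M\t=\t200\t150\t*\t*",
        "r1\t147\tchr1\t200\t60\t50M\t=\t100\t-150\t*\t*",
        "r2\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
    ]
    print(alignment_metrics(example))
```

In practice these counts would usually be turned into rates (for example, mapped reads divided by total reads) and compared across samples to spot outliers.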