Next-generation sequencing (NGS) technology is revolutionizing genomic research and has become one of the most commonly used methods in genomic and even clinical research. With increased data output capacity and dramatically reduced costs, researchers are producing trillions of base pairs (terabases) of data every day. With this much data, quality control is critical to ensure that results are reliable. Omicsoft's NGS analytics provides comprehensive functions for QC of both raw and aligned NGS data, for DNA-Seq (Exome-Seq, WGS, and targeted sequencing) as well as RNA-Seq.
[NGS RAW DATA QC]
In Array Studio, the NGS Raw Data QC Wizard is an easy way to run multiple QC commands simultaneously. The wizard provides options including Basic Statistics, Base Distribution, Quality BoxPlot, K-Mer Analysis and Sequence Duplication.
The Basic Statistics module generates simple composition statistics for the files analyzed, such as sequence length and GC content. The NGS Base Distribution module checks for uniformity between the different bases, since one would expect to see a roughly equal distribution of the four bases across the length of the read. The Quality BoxPlot module shows the quality score at each base position in a file (aggregated over all reads from that file). It gives the user an idea of where the quality score starts to drop off for each file. The "K-Mer Analysis (K=5)" module measures the enrichment of every 5-mer within the sequence library. It calculates the level at which each k-mer would be expected to appear based on the base content of the library as a whole, and then uses the actual count of that k-mer to calculate an observed/expected ratio. This can help find over-represented sequences in the data.
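To make the base distribution and k-mer enrichment ideas concrete, here is a minimal Python sketch. It is not Omicsoft's implementation: the function names `base_distribution` and `kmer_enrichment` are hypothetical, reads are assumed to be plain strings already extracted from a FASTQ file, and the expected count assumes that bases occur independently at their library-wide frequencies.

```python
from collections import Counter

BASES = "ACGT"


def base_distribution(reads):
    """Return one dict per read position with the fraction of A, C, G and T."""
    max_len = max((len(r) for r in reads), default=0)
    dist = []
    for pos in range(max_len):
        # Only reads long enough to cover this position contribute.
        counts = Counter(r[pos] for r in reads if pos < len(r))
        total = sum(counts[b] for b in BASES)
        dist.append({b: (counts[b] / total if total else 0.0) for b in BASES})
    return dist


def kmer_enrichment(reads, k=5):
    """Return {kmer: observed/expected ratio} for every k-mer seen in the reads.

    Assumption: the expected count is the product of the library-wide base
    frequencies of the k-mer's bases, times the number of k-mer windows.
    """
    base_counts = Counter()
    observed = Counter()
    windows = 0
    for read in reads:
        base_counts.update(b for b in read if b in BASES)
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Skip windows containing ambiguous bases such as N.
            if all(b in BASES for b in kmer):
                observed[kmer] += 1
                windows += 1
    total_bases = sum(base_counts.values())
    if total_bases == 0:
        return {}
    freq = {b: base_counts[b] / total_bases for b in BASES}
    ratios = {}
    for kmer, count in observed.items():
        probability = 1.0
        for b in kmer:
            probability *= freq[b]
        ratios[kmer] = count / (probability * windows)
    return ratios
```

A ratio well above 1 flags a k-mer that appears more often than the overall base composition would suggest, which is the kind of signal the K-Mer Analysis module is described as surfacing.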
[SEQUENCING ALIGNMENT AND ALIGNED DATA QC]
After raw data QC, the user can move on to the next step of the NGS analysis with more confidence in the result. The user can use the Omicsoft Sequence Aligner (OSA), a fast and accurate alignment tool for NGS data, to align the data to the genome of choice. OSA is the base aligner for RNA-Seq, DNA-Seq, and miRNA-Seq data in FusionMap, Oshell, and Array Suite (ArrayStudio and ArrayServer).
However, even with an accurate aligner like OSA, it is important to examine the quality of the aligned data. Omicsoft provides comprehensive DNA-Seq QC Metrics and RNA-Seq QC Metrics. These include alignment metrics, coverage metrics, duplication metrics, insert size metrics, flag metrics, profile metrics and more. In total, more than 100 metrics ensure that the aligned data is fully examined and ready for downstream analysis. An example list of metrics for RNA-Seq data can be found under Aligned data QC.
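As an illustration of what flag and duplication metrics summarize, the following Python sketch tallies a few counts from the FLAG column of SAM-format alignment records. It is only an approximation under stated assumptions: the function name `flag_metrics` and the particular metric definitions are hypothetical and are not taken from Omicsoft's QC modules. The bit values (0x1 paired, 0x2 proper pair, 0x4 unmapped, 0x100 secondary, 0x400 duplicate, 0x800 supplementary) come from the SAM format specification.

```python
def flag_metrics(sam_lines):
    """Summarize SAM FLAG bits over primary alignment records.

    Assumptions: input is an iterable of SAM text lines; secondary and
    supplementary records are skipped so that each read is counted once.
    """
    counts = {"total": 0, "mapped": 0, "proper_pair": 0, "duplicate": 0}
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        flag = int(line.split("\t")[1])  # FLAG is the second SAM column
        if flag & 0x100 or flag & 0x800:
            continue  # secondary or supplementary alignment
        counts["total"] += 1
        if not flag & 0x4:
            counts["mapped"] += 1
        if flag & 0x1 and flag & 0x2:
            counts["proper_pair"] += 1
        if flag & 0x400:
            counts["duplicate"] += 1
    total, mapped = counts["total"], counts["mapped"]
    counts["mapping_rate"] = counts["mapped"] / total if total else 0.0
    counts["duplication_rate"] = counts["duplicate"] / mapped if mapped else 0.0
    return counts
```

In practice the lines could come from reading a SAM file directly, for example `flag_metrics(open("sample.sam"))`, where `sample.sam` is a placeholder file name. Duplicate flags are only meaningful if a duplicate-marking step has been run on the alignments beforehand.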