PRICING & INQUIRIES

For pricing and inquiries, send an email to sales@omicsoft.com.

5001 Weston Parkway, Suite 201
Cary, NC 27513
US

888-259-6642

Overview

Omicsoft is the leading provider of Next Generation Sequencing, Cancer Genomics, Immunology, and Bioinformatics solutions for Next Generation Sequencing Data and Gene Expression Analysis.

Exciting Updates and Latest News

Keeping you up-to-date with the latest in NGS, Bioinformatics Analysis, and cancer genomics with blogs on Array Suite, OncoLand (TCGA and more), ImmunoLand, and more.

[Array Studio Tutorial] Getting Started With Copy Number Variation Analysis

Vivian Zhang

Copy number variation (CNV) refers to large-scale changes at many locations in the genome, including insertions, deletions, inversions and duplications. A CNV can be defined as a DNA segment that is 1 kb or larger and present at variable copy number in comparison with a reference genome (Redon, R., et al. 2006). CNVs have been linked to many human diseases and have been found in all human populations. They also play an important role in evolution. Array Studio provides comprehensive functions to manage, visualize, analyze and integrate CNV data. In this article, we will introduce the basic functions of CNV analysis.

 

 

 

Array Studio can import SNP and Copy Number Variation (CNV) intensity data to analyze your samples for chromosome-wide and local amplifications, deletions, and Loss-of-Heterozygosity (LOH) events. Users can easily view, sort and filter the data.

Filtering Log2 ratio data imported from Affymetrix CEL files. Each column is a sample and each row is a probe set (SNP in this example). 

 

Array Studio can merge probe-level SNP intensity data to genomic regions, or segments, with predicted copy numbers for each segment. The CNV Segmentation command will generate segmentation results for Log2Ratio CNV Data, using a variety of criteria, to identify copy number segments, and any Loss of Heterozygosity segments. 
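
To make the idea of merging probe-level values into segments concrete, here is a minimal Python sketch. It is not Array Studio's segmentation algorithm: it simply labels each probe with a fixed log2-ratio threshold and merges consecutive probes that share a label. The function names (call_state, segment), the 0.3 and -0.3 thresholds and the probe values are all illustrative assumptions.

```python
from itertools import groupby

def call_state(log2_ratio, gain=0.3, loss=-0.3):
    """Label one probe as 'gain', 'loss' or 'neutral' (thresholds are illustrative)."""
    if log2_ratio >= gain:
        return "gain"
    if log2_ratio <= loss:
        return "loss"
    return "neutral"

def segment(probes):
    """Merge consecutive probes that share a state into segments.

    probes: list of (position, log2_ratio) tuples for one chromosome,
    sorted by position.
    """
    segments = []
    for state, run in groupby(probes, key=lambda p: call_state(p[1])):
        run = list(run)
        ratios = [ratio for _, ratio in run]
        segments.append({
            "start": run[0][0],
            "end": run[-1][0],
            "state": state,
            "n_probes": len(run),
            "mean_log2": sum(ratios) / len(ratios),
        })
    return segments

# Made-up probe data: a short run of strongly negative log2 ratios.
probes = [(100, 0.02), (200, -0.05), (300, -0.9), (400, -1.1), (500, -0.8), (600, 0.1)]
for seg in segment(probes):
    print(seg)
```

Real segmentation methods also model noise and choose breakpoints statistically, which is why the command exposes a variety of criteria rather than a single cutoff.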

The CNV Segmentation Command Window allows users to change segmentation parameters. 

 

 

After segmenting CNV data, Array Studio has multiple interactive Views to help quickly identify meaningful amplifications and deletions. As with all Array Studio Views, users can sort, filter, and customize the Views to maximize the ability to identify these changes. These data can also be viewed in the Omicsoft Genome Browser.

Genome View displays Probe-level signal and B-allele frequency

Segment View displays segmented signal intensity along each chromosome. For example, the TCGA tumor sample has a chromosome-wide loss of signal on chromosome 10 compared to the normal sample.

Segment Chromosome View displays copy number predictions along chromosome schematics. As we can see, again, the TCGA tumor sample has a loss of signal on chromosome 10. 

 

Omicsoft Genome Browser also allows users to integrate CNV data with DNA-seq data.

Omicsoft Genome Browser can display multiple data types, including CNV chip and DNA-Seq data.

For more details on how to use Genome Browser, please refer to these blog posts:

[Array Studio Analysis] Getting Started With Genome Browser: Basic Navigation, Visualization And Annotation 

[Array Studio Analysis] Genome Browser Advanced Analysis Of Variants, Fusion And Isoform Expression

 

Reference: 

Redon, R., et al. Global variation in copy number in the human genome. Nature 444, 444–454 (2006) doi:10.1038/nature05329

[Array Studio Video Tutorial] DNA-Seq Analysis: Sequence Variation, Copy Number Variation And More

Vivian Zhang

With mapped DNA-Seq data, Array Studio allows users to identify, visualize and annotate sequence mutations and copy number variations. In this article, we walk you through a few important DNA-Seq analysis modules.

 

 

1. Identify DNA Sequence Variation and Generate and Annotate VCF Variant Data

Users can run the Summarize Variant Data module to identify SNPs, insertions and deletions. This module also runs automatically in the DNA-Seq pipeline. The output Variant Report can be annotated with Mutation Annotator databases.

This module returns a report table showing the gene name for each annotated mutation, chromosome, position, reference allele, mutation allele, Annotation type (intron, non-synonymous, 5’ UTR, synonymous, 3’UTR, etc.), AAPosition (amino acid position of change), AAChange (amino acid change—if there is one), transcript ID, transcript name, transcript strand, distance to 3’ end, and distance to 5’ end.

Variant Call Format (VCF) is the most common format for reporting sequence variation. Array Studio variant detection can output merged or individual VCF files, and can organize and annotate these data for efficient filtering in Array Studio. For more on Array Studio and public database-based annotators, please check out our wiki page on Annotate Variant Files. Array Studio provides a large number of classifiers and annotators to improve the identification of interesting variants. Examples include: 1000 Genomes, ClinVar, GERP++, dbNSFP (database for nonsynonymous SNPs' functional predictions), GRASP (Genome-Wide Repository of Associations between SNPs and Phenotypes), GWAVA, HaploReg and RegulomeDB.
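
For readers who have not looked inside a VCF file, the sketch below shows how the eight fixed, tab-separated columns of a data line (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO) can be read in plain Python. It is a minimal illustration, not how Array Studio reads VCF files: the function names are hypothetical, the two example variants are made up, and per-sample genotype columns are ignored.

```python
def parse_vcf_line(line):
    """Split one VCF data line into its eight fixed columns."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, var_id, ref, alt, qual, filt, info = fields[:8]
    info_dict = {}
    for item in info.split(";"):
        if not item or item == ".":
            continue
        key, _, value = item.partition("=")
        # Flag-style INFO entries have no '=' and are stored as True.
        info_dict[key] = value if value else True
    return {
        "chrom": chrom,
        "pos": int(pos),
        "id": var_id,
        "ref": ref,
        "alt": alt.split(","),  # several alternate alleles are comma-separated
        "qual": None if qual == "." else float(qual),
        "filter": filt,
        "info": info_dict,
    }

def read_vcf(lines):
    """Yield parsed records, skipping header lines (they start with '#')."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        yield parse_vcf_line(line)

# Made-up example content, for illustration only.
example_lines = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t12345\t.\tC\tT\t50\tPASS\tDP=120;SOMATIC",
    "chr2\t67890\t.\tA\tG,T\t.\tPASS\t.",
]
for record in read_vcf(example_lines):
    print(record)
```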

 

2. Identify Somatic and Germline Mutations in Matched-Pair Data

If you have matched-pair DNA-Seq data from tumor and normal samples of the same subject, you can run the VarScan 2 matched-pair analysis. It compares the two samples from the same subject (tumor and normal) for differences in genotype, generates genotype calls for each sample, and flags germline versus somatic mutations.

For each variant, the result reports "Normal coverage/frequency", "Tumor coverage/frequency", "Somatic/variant p-value", "Call", "NormalGenotype", "TumorGenotype" and "FilteringStatus", allowing the user to filter the result and identify somatic mutations of interest.
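
The logic behind a matched-pair call can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the module's actual implementation: genotypes are reduced to the strings "ref", "het" and "hom", the classification rules are a plain reading of "same in both samples versus different", and the p-value is a one-sided Fisher's exact test on reference and variant read counts. The function names and read counts are hypothetical.

```python
from math import comb

def fisher_one_sided(normal_ref, normal_var, tumor_ref, tumor_var):
    """One-sided Fisher's exact test on a 2x2 table of read counts.

    Returns the probability of seeing at least `tumor_var` variant reads in
    the tumor if variant reads were spread randomly between the two samples.
    """
    total = normal_ref + normal_var + tumor_ref + tumor_var
    total_var = normal_var + tumor_var
    tumor_total = tumor_ref + tumor_var
    denominator = comb(total, tumor_total)
    p_value = 0.0
    for k in range(tumor_var, min(total_var, tumor_total) + 1):
        p_value += comb(total_var, k) * comb(total - total_var, tumor_total - k) / denominator
    return p_value

def classify(normal_genotype, tumor_genotype):
    """Assign a simplified call from the two genotypes ('ref', 'het' or 'hom')."""
    if normal_genotype == tumor_genotype:
        return "Reference" if normal_genotype == "ref" else "Germline"
    if normal_genotype == "ref":
        return "Somatic"  # variant present only in the tumor
    if normal_genotype == "het":
        return "LOH"      # heterozygous in normal, homozygous in tumor
    return "Unknown"

# Made-up read counts: no variant reads in normal, 20 of 60 reads in tumor.
print(classify("ref", "het"), fisher_one_sided(60, 0, 40, 20))
```

Filtering on the call together with the p-value and coverage columns is what narrows the report down to somatic mutations of interest.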

 

 

3. Summarize NGS Coverage Data to Detect Copy Number Variants and Visualize NGS Copy Number Variations

DNAseq Whole-Genome and Whole-Exome data can be processed in Array Studio to detect amplification and deletion events. By comparing the relative signal between samples from the same subject, regions with unusually high or low signal in the disease sample will be flagged as a potential Copy Number Variation (CNV) event. The result can be visualized in a few ways, including scatter plot, segment chromosome view, or in genome browser. 

Summary Copy Number report provides "Observation", "Log2Ratio", "Copy Number", "Normal Coverage", "Tumor Coverage", and segment information.
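
The relationship between the coverage, Log2Ratio and Copy Number columns can be illustrated with a short Python sketch. It assumes the simplest possible model: coverage is scaled by each sample's total read count, the normal sample is diploid, and the tumor sample is pure. The function name, the pseudocount and the numbers are hypothetical, and Array Studio's actual calculation may differ.

```python
from math import log2

def copy_number_from_coverage(tumor_cov, normal_cov, tumor_total, normal_total,
                              ploidy=2, pseudocount=0.5):
    """Estimate log2 ratio and copy number for one region.

    tumor_cov / normal_cov: read counts in the region for each sample.
    tumor_total / normal_total: total mapped reads per sample, used to
    correct for different sequencing depth.
    """
    tumor_norm = (tumor_cov + pseudocount) / tumor_total
    normal_norm = (normal_cov + pseudocount) / normal_total
    log2_ratio = log2(tumor_norm / normal_norm)
    # A log2 ratio of 0 corresponds to the normal ploidy (2 copies).
    return log2_ratio, ploidy * 2 ** log2_ratio

# Made-up region: twice the depth-normalized coverage in tumor.
ratio, copies = copy_number_from_coverage(2000, 1000, 50_000_000, 50_000_000)
print(round(ratio, 3), round(copies, 2))
```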

Genome browser view of coverage data. The highlighted example clearly has increased coverage in this genomic region, while the coverage is comparable in the adjacent regions.

 

For more details and additional DNA-Seq analysis functions, please check out our video tutorial Getting Started with DNAseq Analysis or search our wiki for your specific topic of interest.

 

[Event] Learn|Network|Impact 2017 OmicSoft User Group Meeting

Vivian Zhang

OmicSoft, now a QIAGEN company, would like to invite you to our annual Omicsoft User Group Meeting being held in Cambridge, MA on September 19-20, 2017. 

Registration and attendance are FREE for a limited time only. For registration and more details, please go directly to our UGM page.

In the past ten years, OmicSoft has helped numerous users from major pharma and biotech companies (as well as research institutions) accelerate their bioinformatics and genomics research (who are our customers?). Last year, OmicSoft successfully held our kick-off OmicSoft User Group Meeting. More than 30 leading pharmaceutical and biotech companies, more than 100 experts and scientists in the field of bioinformatics/genomics/genetics attended the meeting.

Last year, our action-packed one-day meeting provided an open platform for our users and industry peers to learn, to network, and to impact the development of OmicSoft products. Click here for the 2016 OmicSoft UGM meeting agenda. This year OmicSoft has had several milestones and technology breakthroughs, including our acquisition by QIAGEN, the Array Suite 10.0 release, Cloud-Based Lands, Single Cell RNA-Seq support, upcoming integration with QIAGEN's bioinformatics products, Web-based solutions and more. We are expanding the 2017 OmicSoft User Group Meeting into a two-day event with:

  • More product training - Get the most out of Omicsoft products, and QIAGEN's bioinformatics products
  • More user talks and networking opportunities - Learn from others' experiences and industry best practices, and expand your professional network
  • More One-On-One meetings - Get problems solved, questions answered and get personalized training from our experienced staff

 

Learn, network, impact. Come join us and leading pharma, biotech companies and research institutions.

  • Learn to Use OmicSoft Products More Efficiently 
  • Impact Future Product Development
  • Network with Peers and Industry Experts
  • Get One-On-One Help from Experts
  • Explore more QIAGEN Bioinformatics products

 

Please contact us for potential presentation and collaboration opportunities. 

[Array Studio Video Tutorial] DNA-Seq Analysis Basics: Getting Started With DNA-Seq Pipeline Analysis And Data QC

Vivian Zhang

Omicsoft Next Generation Sequencing (NGS) analysis includes bioinformatics tools for the entire process, from QC to alignment to post-alignment summarization and analysis. Array Studio provides a suite of tools to quickly, easily, and reliably process DNA-seq data. In this article, we introduce our tutorial on the DNA-Seq analysis pipeline and data QC. We will discuss downstream analysis functions in upcoming blog posts.

 

Getting Started with DNA-seq pipeline functions

 

1 Running the DNA-seq pipeline

In Array Studio, users can either execute each step of the analysis one by one or use the DNA-seq pipeline function. Our video tutorial will walk you through the functions automatically executed by the standard DNA-seq pipeline, starting with raw reads in .fastq format.

DNA-Seq Pipeline. Users can choose which analysis steps to perform, such as raw data QC, post-alignment data QC, and mutation and SNP summarization.

2 Raw Data QC

If you choose to perform analysis step by step, before aligning your DNA-seq data, you must first perform quality control (QC) on the raw data, to spot common problems like adapter or barcode sequence contamination, degraded quality at ends of reads, or problematic samples. The Array Studio Raw Data QC Wizard reports a number of useful measures of raw NGS quality.

Additional information about how to interpret these functions can be found in the RNA-seq Raw Data QC Analysis video.
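
As an illustration of one raw-data QC measure, degraded quality at the ends of reads, the following Python sketch computes the mean base quality at each read position from FASTQ records. It assumes well-formed four-line records and Phred+33 quality encoding. It is not the Raw Data QC Wizard itself; the function names and the two example reads are made up.

```python
def read_fastq(lines):
    """Yield (name, sequence, quality) from FASTQ lines (4 lines per record)."""
    it = iter(lines)
    for header in it:
        seq = next(it).strip()
        next(it)  # the '+' separator line
        qual = next(it).strip()
        yield header.strip()[1:], seq, qual

def mean_quality_by_position(records, offset=33):
    """Average Phred score at each read position (Phred+33 assumed)."""
    sums, counts = [], []
    for _, _, qual in records:
        for i, ch in enumerate(qual):
            if i == len(sums):
                sums.append(0)
                counts.append(0)
            sums[i] += ord(ch) - offset
            counts[i] += 1
    return [s / c for s, c in zip(sums, counts)]

# Two made-up reads; the second one loses quality toward its 3' end.
fastq_lines = ["@r1", "ACGT", "+", "IIII", "@r2", "ACG", "+", "I5!"]
print(mean_quality_by_position(read_fastq(fastq_lines)))
```

A steady drop in these per-position means toward the end of the read is the usual sign that trimming is worthwhile before alignment.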

3 Map DNA-seq Reads to Genome

Map (DNA-Seq) Reads To Genome is a part of the DNA-Seq pipeline. Users can also align reads independently. In the Advanced tab, the user can set a number of options, including read trimming, adapter stripping and more.

 

4 Aligned Data QC

Array Studio automatically generates an Alignment Report after aligning reads to the genome or transcriptome. Additional alignment statistics can be generated by running the Aligned Data QC module.

 

Additional aligned data QC metrics include: 

1 Alignment Metrics
2 Flag Metrics
3 Profile Metrics
4 Insert Size Metrics
5 Duplication Metrics
6 Strand Metrics
 

 

Please check out our video tutorial Getting Started with DNA-seq pipeline functions for a more detailed walkthrough.

[Omic Data Analysis Tutorial] Microarray Data Visualization, Statistical Inference and Pattern Discovery

Vivian Zhang

 

Whether you are working with microarray data or with RNA-seq data with calculated FPKM or read counts, it is important to perform downstream analysis to make sense of the data and identify interesting data patterns, samples, genes or proteins. In this article, we will introduce some commonly used visualization and statistical analysis functions that are covered in the second half of our Microarray Analysis video tutorials:

 

 

Visualize Data with Array Studio Views

-Omic Data are read-only data constructs. The most common way to explore -Omic data is to add "Views" onto your data, including a "table" view to directly visualize the numerical data values or a "chart" view, such as the Variable view and Scatter plot. 

1 The Table View

The most common way to look at your -Omic data is with the Table View. Although it looks like a standard spreadsheet, the Table View is actually a visualization of your underlying data. It is dynamically connected to the attached annotation and design metadata, and can be sorted and filtered without altering the underlying data. Array Studio can easily handle millions of rows and columns in the Table View.

Example functions introduced in the video tutorial will allow you to:

  • Sort and Filter Table Views 
  • Display context-specific details from metadata 
  • Convert read-only -Omic data to editable Table data 
  • Log2-transform your expression data 
  • Link to public databases through Web Details On-Demand
  • Visualize distribution of expression values with Kernel Density 
Example table view of microarray data. The details window displays the data details for selected probe sets.

 

2 Adding Additional Views: The Variable View and Scatter Plot

Depending on the contents of your -Omic data or table, Array Studio has about 40 views to interactively display your data. This video clip briefly walks through two of the more popular Views for gene-level data: the Variable View and the Pairwise Scatter Plot.

Array Studio not only provides dozens of views depending on the content of the data, but also allows users to tailor the visualizations to their preferred style. Some commonly used views for microarray data include BoxPlot, ScatterView, VariableView and VennDiagramView. The example chart was fine-tuned from a variable view into a violin plot grouped by time and treatment.

 

 

Statistical Inference and Pattern Discovery

Hierarchical Clustering and Pattern Matching to identify similar Gene Expression Dynamics

Gene expression data can be grouped using Hierarchical Clustering by Variables (e.g. genes) and Observations (e.g. samples) to reveal associations in your data.

In addition to visualizing the overall clustering pattern, you can also search datasets for variables/observations with patterns similar to your variable/observation of interest through Find Neighbors. You can display these comparisons in multiple ways, including pairwise correlation/MA plots, heatmaps, and 3D scatter plots.

Probes with a pattern similar to probe 1371785_at are detected through the Find Neighbors module. After a list of "neighbor" probes is created, users can visualize the data pattern among those probes through pairwise correlation plots or 3D scatter plots.

 

Discover Differentially-Expressed Genes by ANOVA

The One-Way ANOVA is used to research the effects of a single factor, while Two-Way ANOVA can be used to research the effects of two factors on expression data. This model generates an inference report, including an automatically generated Report View and VolcanoPlotView. Additionally, the Venn Diagram and Inference Report Summary can help to quickly visualize the differentially expressed genes.
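
For readers who want to see what a One-Way ANOVA computes for a single gene, here is a minimal Python sketch of the F statistic: the variance between group means divided by the variance within groups. It covers only the test statistic, not p-values, fold changes or the report views, and it is not Array Studio's implementation. The function name and the expression values are made up.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of values."""
    all_values = [v for group in groups for v in group]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(group) / len(group) for group in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Variation of individual values around their own group mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Made-up log2 expression values for one gene in three treatment groups.
control = [1.0, 2.0, 3.0]
low_dose = [2.0, 3.0, 4.0]
high_dose = [5.0, 6.0, 7.0]
print(one_way_anova_f([control, low_dose, high_dose]))
```

A large F value means the group means differ by more than the within-group scatter would explain; the p-value in an inference report is derived from this statistic and its degrees of freedom.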

Inference Report Summary and Venn Diagram help to quickly research significant genes and compare across groups. 

 

Identify Enriched Gene Ontology Terms

If you are interested in discovering pathways or functionally related genes that are enriched in your data, you can run the Gene Ontology (GO) module. This module will perform built-in gene ontology classification on one or more significant lists. Once you generate a list of significant variables, Array Studio can go through all possible GO terms (across different class levels) to see how many variables in the list are covered by the GO terms. You can infer different biological attributes (such as functions, corresponding biological process) of the variables in the list. 
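
One common way to turn "how many variables in the list are covered by a GO term" into a p-value is the hypergeometric test, sketched below in Python. Whether Array Studio uses exactly this test is an assumption; the function name and the counts in the example are hypothetical.

```python
from math import comb

def enrichment_p_value(hits, list_size, term_size, background_size):
    """P(at least `hits` genes of a GO term appear in the list by chance).

    hits:            genes in the significant list annotated with the term
    list_size:       genes in the significant list
    term_size:       genes annotated with the term in the background
    background_size: all genes considered
    """
    upper = min(list_size, term_size)
    favorable = sum(
        comb(term_size, k) * comb(background_size - term_size, list_size - k)
        for k in range(hits, upper + 1)
    )
    return favorable / comb(background_size, list_size)

# Made-up counts: 8 of 50 significant genes fall in a term with 100 members,
# out of 10000 background genes.
print(enrichment_p_value(8, 50, 100, 10000))
```

When many GO terms are tested at once, the resulting p-values are normally adjusted for multiple testing before interpretation.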

Example table results. Each Category lists a GO Term (with a link to the Gene Ontology website), as well as the number of hits for that category in a particular list (the column name is the list name). Corresponding p-values can also be generated.

[Omic Data Analysis Tutorial] Getting Started with Microarray Data Analysis

Vivian Zhang

In bioinformatics research, there are many different data sources, including microarray, sequence data, CNV data, ChIP-chip data, genotype data, etc. In Array Studio, we divide genomic data into two groups: -Omic data and Table data. -Omic data is basically a data matrix with annotation for both columns and rows, and microarray data is a standard example. The microarray tutorial is a great starting point for new users of Array Studio, whether or not you will be working directly with microarray data. In this article, we cover the basics of microarray analysis from our tutorial Getting Started with Array Studio Microarray Analysis.

 

1 Getting Started with Array Studio Microarray Analysis

When Array Studio is first installed, it will look similar to below. Array Studio organizes projects in the Solution Explorer. Any generated data or figure can be displayed in the middle of the window, while a Legend and Filter window appears on the right side of the window.

 

 

After you create a new project, Array Studio will guide you through importing your expression microarray datasets. The three basic -Omic data types are the OMIC measurement data table, the design table and the annotation table.

After data import and downstream analysis, Array Studio organizes the data in a project into four main data types: List Data, Table Data, -Omic Data and NGS Data. -Omic data is read-only table data with annotation and design tables attached (these can be modified). -Omic data and table data can be converted from one type to the other.

Array Studio provides several methods to reproduce analysis steps. Omicsoft scripts (Oscript) for analysis functions can be viewed in every function window, by right-clicking on an object name, or by viewing the full Audit Trail. Array Studio tracks all analysis steps done in a project, using its Audit Trail feature. It is important for data integrity needs, and for individual users to track the changes and reproduce the procedures.

 

2 Preparing Your Data for Downstream Analysis

Array Studio contains modules that can be run before downstream analysis to identify samples that deviate significantly from the rest of the data set, possibly indicating a failed sample that should be excluded.

QC by PCA and Removing Failed Samples

Principal Component Analysis (PCA) can identify variance in data sets, which can come from real differences between sample groups, or it can come from a failed microarray chip. Failed experiments can quickly be removed from your -Omic data objects for downstream analysis.

 

 

QC by Correlation of Expression

Array Studio can identify samples that deviate significantly from others in your data set, by calculating the correlation coefficient of each gene/probeset. Samples that correlate unusually poorly will be flagged as possible failed samples, and can be excluded from downstream analysis.
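
The idea of flagging poorly correlated samples can be sketched in Python as follows. This is an illustration under stated assumptions, not the module's actual method: each sample's expression profile is compared with every other sample using Pearson correlation, and a sample is flagged when its median correlation falls below a cutoff. The function names, the 0.8 cutoff and the tiny data set are made up.

```python
from math import sqrt
from statistics import median

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def flag_poorly_correlated(samples, threshold=0.8):
    """Return {sample: median correlation} for samples below the threshold.

    samples: dict of sample name -> expression values in the same gene order.
    """
    flagged = {}
    for name, values in samples.items():
        others = [pearson(values, v) for other, v in samples.items() if other != name]
        typical = median(others)  # median resists the influence of one bad sample
        if typical < threshold:
            flagged[name] = typical
    return flagged

# Made-up expression values for five genes in four samples.
samples = {
    "s1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "s2": [1.1, 2.0, 3.2, 3.9, 5.1],
    "s3": [2.0, 4.0, 6.0, 8.0, 10.0],
    "s4": [5.0, 1.0, 4.0, 2.0, 3.0],  # behaves unlike the others
}
print(flag_poorly_correlated(samples))
```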

For step-by-step instructions, please check out our video tutorial: Getting Started with Array Studio Microarray Analysis

[Array Studio Video Tutorial] RNA-Seq Advanced Analysis

Vivian Zhang

Finding genes or transcripts that are differentially expressed among different conditions is an important analysis step in understanding the functions of genetic variants. Array Studio contains a number of different modules for performing univariate analysis/differential expression, including One-Way ANOVA, Two-Way ANOVA, and the more advanced General Linear Model, as well as a few others. Statistical inference can be performed on your feature-level data, whether it was quantified in Array Studio or imported from external programs. In this article, we will introduce popular methods of Advanced Analysis of RNA-seq data.

 

 

1 ANOVA on RNA-seq Data

A One-Way ANOVA is used to research the effects of a single factor, while Two-Way ANOVAs can be used to research the effects of two factors on expression data. For example, if a user has an experiment with factors for time and treatment, this model can be quickly used to generate results (including fold changes, estimates, raw and adjusted p-values, LSMeans, and Estimate data). By selecting factor 1 and factor 2, and then the level to compare to, Array Studio will automatically create the comparisons and model for the user. This model generates an inference report, including an automatically generated Report View and VolcanoPlotView.

 

2 DESeq on RNA-seq Data

For RNA-Seq, read count is a good estimate of the abundance of the target transcript. Thus, it is of great interest to compare read counts between different conditions. The DESeq GLM test is a powerful tool for inferring differential expression of genes/transcripts from raw count data. It allows the user to model the data using a linear model and test for differential expression using the negative binomial distribution. The function should perform similarly to the DESeq R package. DESeq only works on raw counts of sequencing reads (with no additional background reads added to the dataset). After running the test, a report table is generated along with a scatter plot. A volcano plot is generated as well, as in the ANOVA analysis. For more details on how the DESeq method works and more functions, check out the DESeq R manual.
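
One reason DESeq needs raw counts is that it estimates its own per-sample scaling ("size factors") from them. The Python sketch below illustrates the median-of-ratios idea described for the DESeq method: each gene's count is divided by that gene's geometric mean across samples, and the median of those ratios per sample is the size factor. It is a simplified illustration, not the R package or the Array Studio function, and it does not perform the negative binomial test. The function name and counts are made up.

```python
from math import exp, log
from statistics import median

def size_factors(counts):
    """Median-of-ratios size factors.

    counts: dict of gene -> list of raw read counts, one per sample.
    Genes with a zero count in any sample are skipped, because their
    geometric mean is zero.
    """
    n_samples = len(next(iter(counts.values())))
    log_ratios = [[] for _ in range(n_samples)]
    for gene_counts in counts.values():
        if any(c == 0 for c in gene_counts):
            continue
        log_geo_mean = sum(log(c) for c in gene_counts) / n_samples
        for j, c in enumerate(gene_counts):
            log_ratios[j].append(log(c) - log_geo_mean)
    return [exp(median(r)) for r in log_ratios]

# Made-up counts: the second sample was sequenced about twice as deeply.
counts = {
    "geneA": [10, 20],
    "geneB": [100, 200],
    "geneC": [5, 10],
    "geneD": [0, 3],  # skipped: contains a zero
}
print(size_factors(counts))
```

Dividing each sample's counts by its size factor puts samples on a common scale, which is why pre-normalized values such as RPKM are not suitable input.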

 

3 Identifying Differential Usage of Isoforms

Array Studio uses a straightforward approach to identifying genes with differential transcript usage between groups. This function allows users to identify differentially expressed isoforms between comparisons. Based on transcript-level data (RPKM, FPKM or Count data), the function converts the expression values to ratios by dividing the value of each transcript by the sum of all transcripts in the same gene. The top-ranked p-value reflects the largest difference in relative transcript usage.
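
The conversion from transcript-level expression to usage ratios can be written out in a few lines of Python. The sketch covers only the ratio step described above, for one sample; the statistical test on the ratios is not shown. The function name, transcript IDs and values are hypothetical.

```python
from collections import defaultdict

def transcript_usage(expression, transcript_to_gene):
    """Convert transcript expression to within-gene usage ratios.

    expression:         dict of transcript ID -> expression value (one sample)
    transcript_to_gene: dict of transcript ID -> gene ID
    """
    gene_totals = defaultdict(float)
    for transcript, value in expression.items():
        gene_totals[transcript_to_gene[transcript]] += value

    usage = {}
    for transcript, value in expression.items():
        total = gene_totals[transcript_to_gene[transcript]]
        # A gene with no expression at all gets a usage of 0 for every transcript.
        usage[transcript] = value / total if total > 0 else 0.0
    return usage

# Made-up FPKM values: geneA has two isoforms, geneB has one.
expression = {"tx1": 30.0, "tx2": 10.0, "tx3": 5.0}
transcript_to_gene = {"tx1": "geneA", "tx2": "geneA", "tx3": "geneB"}
print(transcript_usage(expression, transcript_to_gene))
```

Because the ratios within a gene sum to 1, a shift in usage can be detected even when the gene's total expression is unchanged between groups.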

Differentially expressed isoforms report sorted by p-value. The user can directly visualize the difference in transcript usage in the genome browser.

Genome browser view can display exon junction reads. As shown, only 37 reads span a certain junction in lung, but 11000 reads span the same junction in skin.

 

For how to achieve the above results, please check out our video tutorial: Advanced Analysis of RNA-seq data

[Array Studio Video Tutorial] RNA-Seq Downstream Analysis: Normalization, Visualization and Data Integration

Vivian Zhang

After aligning data, there are a number of downstream analyses that can be done. For instance, the generated RPKM (or FPKM) dataset can be used, like microarray data, for clustering (log2 transformation may be necessary). Count data can be used to look for changes between groups of samples through DESeq analysis. A large number of visualization and QC functions are available to analyze feature-level RNA-seq data in Array Studio. In this article, we will introduce our video tutorials on RNA-Seq Downstream Analysis.

 

 

 

1 Normalizing and Transforming RNA-seq Data for MicroArray-type analysis

Array Studio has a large number of modules originally designed for Gene Expression MicroArray analysis, but these modules are also useful for analyzing feature-level (e.g. gene-level, exon-level) RNA-seq data. However, many of these modules expect normalized and log-transformed input data. Array Studio provides a number of methods for normalizing RNA-Seq data, including Log Geometric Mean, Mean, Median, Quantile, TMM (edgeR), TotalCount, RPKM to TPM, UpperQuartile, and LandNormalization. Array Studio also provides methods for normalizing and transforming -Omic data. 
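
Two of the transformations mentioned above are simple enough to show directly. The Python sketch below converts RPKM to TPM by rescaling a sample's values so that they sum to one million, then applies a log2 transform with a pseudocount so that zero values remain defined. The function names and the pseudocount of 1 are illustrative assumptions, not Array Studio's defaults.

```python
from math import log2

def rpkm_to_tpm(rpkm):
    """Rescale one sample's RPKM values so that they sum to 1,000,000."""
    total = sum(rpkm.values())
    if total == 0:
        raise ValueError("all RPKM values are zero")
    return {gene: value / total * 1e6 for gene, value in rpkm.items()}

def log2_transform(values, pseudocount=1.0):
    """log2(value + pseudocount), so that zeros map to a finite number."""
    return {gene: log2(value + pseudocount) for gene, value in values.items()}

# Made-up RPKM values for one sample.
rpkm = {"geneA": 1.0, "geneB": 3.0, "geneC": 0.0}
tpm = rpkm_to_tpm(rpkm)
print(tpm)
print(log2_transform(tpm))
```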

 

2 Attach new Views to Data

In Array Studio, data can be directly viewed in tables, but can also be displayed in up to 40 Views, depending on the contents of the underlying data. The very powerful Variable View is among its most popular views:

The Variable View allows the user to visualize one chart for each variable in the dataset. The example variable view shows the Log 2 FPKM values for gene CLDM18, categorized by tissue and gender.

 

3 Principal Component Analysis on normalized expression data

Principal Component Analysis (PCA) is an effective tool to group data by the components that contribute the greatest variance in the dataset. In other words, PCA can group your data based on variance, which should reflect differences between samples. Failed samples will often appear as outliers.

Both 2D and 3D PCA plots are commonly used to group data or identify outliers. 

 

4 Hierarchical Clustering of normalized expression data

Gene expression data can be grouped by Hierarchical Clustering by Variables (e.g. genes) and Observations (e.g. samples) to reveal associations in your data. Array Studio can easily handle Hierarchical Clustering of up to 20000 variables, far more than the capacity of many popular gene clustering programs.

The classic dendrogram is an older version of the dendrogram view. The new version is more interactive and provides more gene annotation information for downstream analysis.

 

5 RNAseq-MicroArray Integration

Feature-level (genes, transcripts, etc.) results from RNA-seq experiments can directly be compared to microarray data from the same samples, using the Microarray-Microarray Integration module. This module allows the user to create a duplex matrix (two values for each variable in the dataset) for two “microarray” data types. The resulting dataset can also contain correlation information for each variable, making it easy to figure out which variables correlate well between datasets.

Microarray-microarray integration module provides variable views on gene and sample level showing how well microarray and RNA-seq data correlate. 

 

To learn how to perform these downstream analyses on RNA-seq data, please check out our video tutorials on RNA-Seq Downstream Analysis.

[Array Studio Video Tutorial] RNA-Seq Analysis Basics: Getting Started with RNA-Seq Pipeline Analysis and Data QC

Vivian Zhang

Omicsoft Next Generation Sequencing (NGS) analysis includes bioinformatics tools for the entire process, from QC to alignment to post-alignment summarization and analysis. RNA-Seq data analysis is a critical part of Omicsoft's NGS bioinformatics tools. In this article, we introduce our tutorial on how to get started with RNA-seq pipeline analysis and data QC.

Getting Started with RNA-seq pipeline functions

 

1 Running the RNA-seq pipeline for a new project

A typical RNA-seq analysis project consists of steps from data quality control, alignment and aligned data quality control to data quantification, visualization, and statistical inference. In Array Studio, users can either execute each step of the analysis one by one or use the RNA-seq pipeline function. It only takes a few clicks to create a new RNA-seq project and run the RNA-seq pipeline.

RNA-Seq Pipeline. Users can choose which analysis steps to perform, such as raw data QC, post-alignment data QC, exon junctions, sequence quantification, and mutation and fusion detection.

 

 

2 Raw Data QC

If you choose to perform analysis step by step, before aligning your RNA-seq data, you must first perform quality control (QC) on the raw data, to spot common problems like adapter or barcode sequence contamination, degraded quality at ends of reads, or problematic samples. The Array Studio Raw Data QC Wizard reports a number of useful measures of raw NGS quality, and can be generated as part of the RNA-seq pipeline function. 

An example QC report includes:

  • Base Distribution 
  • Basic Stats 
  • Duplication Level 
  • Kmer Analysis 
  • Overall/Per-sequence Quality Reports 
  • Quality Box plot 
  • Over-represented Sequences 
  • Per-sequence GC report 
  • Sequence Length Report 
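
To make a few of these measures concrete, here is a minimal Python sketch that computes per-read GC content, mean base quality, and a read-length distribution from sequence/quality pairs. It is an illustration only, not how the Raw Data QC Wizard is implemented; the function names, the made-up reads, and the Phred+33 quality encoding are all assumptions.

```python
from collections import Counter

def gc_fraction(seq):
    """Fraction of G/C bases in a read (0.0 for an empty read)."""
    if not seq:
        return 0.0
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

def mean_quality(qual, offset=33):
    """Mean Phred score of a quality string (assumes Phred+33 encoding)."""
    if not qual:
        return 0.0
    return sum(ord(ch) - offset for ch in qual) / len(qual)

def summarize_reads(records):
    """Summarize an iterable of (sequence, quality_string) pairs."""
    length_counts = Counter()
    gc_values = []
    quality_values = []
    for seq, qual in records:
        length_counts[len(seq)] += 1          # sequence length report
        gc_values.append(gc_fraction(seq))    # per-sequence GC report
        quality_values.append(mean_quality(qual))  # per-sequence quality
    return {
        "length_counts": dict(length_counts),
        "gc": gc_values,
        "mean_quality": quality_values,
    }

# Two made-up reads, for illustration only.
reads = [("GGCCAT", "IIIIII"), ("ATAT", "##II")]
summary = summarize_reads(reads)
```

In practice the records would come from a FASTQ parser, and the summaries would be plotted rather than inspected as raw lists.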

 

3 Filtering and Trimming Raw Reads

Array Studio's NGS Filter function can trim low-quality bases from raw NGS data, filter out uniformly low-quality reads, and strip away adapter sequences. The RNA-seq pipeline assumes that input reads are pre-filtered and stripped, so only quality-based trimming and filtering will be performed in the pipeline (no adapter stripping). It is a good idea to run the Filter function on your reads, based on the raw data QC results, before running the RNA-seq pipeline.
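
The three operations described above (adapter stripping, quality-based trimming, and filtering) can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the NGS Filter algorithm: the adapter is matched exactly, qualities are assumed to be Phred+33 encoded, and the thresholds, function names, and example read are hypothetical.

```python
def strip_adapter(seq, qual, adapter):
    """Cut the read at the first exact occurrence of the adapter, if any."""
    pos = seq.find(adapter)
    if pos == -1:
        return seq, qual
    return seq[:pos], qual[:pos]

def trim_low_quality_tail(seq, qual, min_q=20, offset=33):
    """Drop bases from the 3' end until one with Phred >= min_q is reached."""
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]

def keep_read(seq, min_length=4):
    """Filter step: discard reads that became too short after trimming."""
    return len(seq) >= min_length

# Made-up read: 8 bases of insert followed by a 7-base adapter.
seq, qual = "ACGTACGTAGATCGG", "IIIIII##IIIIIII"
seq, qual = strip_adapter(seq, qual, adapter="AGATCGG")
seq, qual = trim_low_quality_tail(seq, qual)
passed = keep_read(seq)
```

Real trimming tools usually tolerate mismatches in the adapter and use windowed quality scores, so treat this only as a picture of the order of operations.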

 

4 Aligned Data QC

Array Studio automatically generates an Alignment Report after aligning reads to the genome or transcriptome. Additional alignment statistics can be generated by running the Aligned Data QC and RNA-seq 5'->3' Trend modules.

Alignment report is automatically generated after alignment. 



Additional aligned data QC metrics include: 

1 Alignment Metrics
2 Flag Metrics
3 Profile Metrics
4 Source Metrics
5 Insert Size Metrics
6 Duplication Metrics
7 Coverage Metrics
8 Strand Metrics
9 Feature Metrics

 

The best way to quickly learn how to perform these analysis steps is to watch our short video tutorials Getting Started with RNA-seq pipeline functions. Please stay tuned for more blog articles on RNA-seq analysis.

[Array Studio Analysis] Getting Started with RT-PCR Analysis

Vivian Zhang

Although RNA-seq has become an invaluable tool for studying gene expression, RT-PCR (reverse transcription-polymerase chain reaction) is still the most sensitive method and is widely used for small-scale mRNA expression studies and for validating RNA-seq results. In this article, we introduce how to perform RT-PCR analysis using Array Studio. For more details, please check out our tutorial series, Getting Started with RT-PCR Analysis, which has step-by-step video clips to help you quickly become an RT-PCR analysis expert.

The tutorial series provides video clips on:

 

1. Importing RT-PCR data

Array Studio allows users to import Ct or abundance data from text files or Excel spreadsheets. The Import RT-PCR Wizard function simplifies the data import and normalization processes.

Array Studio can process different data formats, whether the data is in "Tall-skinny" or Matrix format.
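
For readers unfamiliar with the two layouts, the following Python sketch converts a "tall-skinny" table (one row per sample/gene measurement) into a matrix (one row per gene, one column per sample). It only illustrates the relationship between the formats and is not part of Array Studio; the function name, sample IDs, and Ct values are made up.

```python
def tall_to_matrix(rows):
    """Pivot (sample, gene, ct) rows into a gene-by-sample matrix.

    Returns (genes, samples, matrix), where matrix[i][j] is the Ct of
    genes[i] in samples[j], or None when that combination is missing.
    """
    values = {}
    genes = {}    # dicts keep first-seen order
    samples = {}
    for sample, gene, ct in rows:
        genes.setdefault(gene, None)
        samples.setdefault(sample, None)
        values[(gene, sample)] = ct
    matrix = [[values.get((g, s)) for s in samples] for g in genes]
    return list(genes), list(samples), matrix

# Hypothetical tall-skinny input; CDH1 was not measured in sample S2.
tall = [
    ("S1", "GAPDH", 18.0),
    ("S1", "CDH1", 24.5),
    ("S2", "GAPDH", 18.4),
]
genes, samples, matrix = tall_to_matrix(tall)
```

The missing S2/CDH1 entry shows up as None in the matrix, which is the kind of gap the wizard's missing-value preview is meant to surface.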

With your data ready to import, Import RT-PCR Wizard offers step-by-step instructions on:

  • Choosing the correct input format
  • Selecting the annotation and data columns 
  • Previewing raw data for missing values 
  • Attaching Annotation and Design metadata 
  • Combining or removing technical replicates
  • Specifying default values for missing data 
  • Transforming Ct data to delta-Ct
  • Normalizing data 
  • Previewing data 

After importing through the RT-PCR Wizard, three data tables are generated: a data table, annotation table and design table. They are standard data formats in Array Studio.
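
As background on the delta-Ct step listed above: delta-Ct is conventionally the Ct of a target gene minus the Ct of a reference (housekeeping) gene measured in the same sample. The Python sketch below averages technical replicates and then computes delta-Ct. It is a minimal illustration that assumes GAPDH as the reference gene and uses made-up values; the options the wizard actually offers may differ.

```python
from statistics import mean

def average_replicates(ct_values):
    """Mean of technical replicate Ct values, ignoring missing (None) ones."""
    present = [v for v in ct_values if v is not None]
    return mean(present) if present else None

def delta_ct(sample_ct, reference_gene):
    """Return gene -> (Ct - reference Ct) for one sample's gene -> Ct dict."""
    ref = sample_ct.get(reference_gene)
    if ref is None:
        raise ValueError("reference gene has no Ct value in this sample")
    return {
        gene: ct - ref
        for gene, ct in sample_ct.items()
        if gene != reference_gene and ct is not None
    }

# Hypothetical replicate Ct values for one sample.
replicates = {"GAPDH": [18.0, 19.0], "CDH1": [24.0, 25.0, None]}
sample = {gene: average_replicates(v) for gene, v in replicates.items()}
result = delta_ct(sample, reference_gene="GAPDH")
```

A larger delta-Ct means the target needed more cycles than the reference to reach threshold, that is, lower relative abundance.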

 

2. Downstream Analysis

2.1 Visualizing RT-PCR Data- Adding Views to RT-PCR data

Once the data is in an Array Studio project, a variety of functions are available for downstream analysis. To start with, data visualization provides a good overview of the data. Array Studio has up to 40 different views available for your RT-PCR data. Here are a few commonly used views:

 

2.2 Data Processing-QC and Excluding/Subsetting Data

Sometimes a single assay or sample fails and should be removed from downstream analysis to allow more accurate detection of real differences among groups. These failed experiments can be detected and easily removed from an Array Studio data object. For example, we can use Principal Component Analysis (PCA) to detect and remove outlier samples. To further subset data, we can use hierarchical clustering.


3D PCA plot and hierarchical clustering heatmap. In the upper PCA plot, each dot represents a sample. In the bottom heatmap, data is clearly separated by source tissue but not so much by group.

 

 

2.3 Statistical Inference-Two-Way ANOVA of RT-PCR Data

Array Studio has several statistical inference modules for identifying statistically significant differences between groups, for example, ANOVA and the general linear model. Here is an example of a Two-Way ANOVA analysis.

Two-Way ANOVA analysis results using source tissue and group as factors. This analysis generates one volcano plot for each test, in addition to the report table. The volcano plot is interactive -- selecting a subset of samples in one plot automatically selects the corresponding samples in another plot. In this example, the three selected genes CDH1, PFN2 and NOTCH2 that are affected in Breast are not similarly affected in Lymphoid, with NOTCH2 affected in the opposite direction.


 

2.4 Omic Data Analysis-Integration of RT-PCR and RNA-seq/Microarray Data

RT-PCR data can be compared to other gene- or transcript-level data, such as from RNA-Seq or microarray, using Microarray-Microarray Integration. Take care that gene and sample identifiers correspond between the two data sets so that the right measurements are paired.
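
To show what identifier matching involves, here is a small Python sketch that pairs measurements from two data sets on (gene, sample) keys and reports the keys found in only one of them. It is not the Microarray-Microarray Integration module; the normalization rule (trimming whitespace and upper-casing), the function names, and all values are assumptions made for illustration.

```python
def normalize(identifier):
    """Assumed cleanup rule: trim whitespace and upper-case the identifier."""
    return identifier.strip().upper()

def normalize_keys(data):
    """Apply normalize() to the gene and sample in every (gene, sample) key."""
    return {(normalize(g), normalize(s)): v for (g, s), v in data.items()}

def match_measurements(rtpcr, rnaseq):
    """Pair values that share a (gene, sample) key; list unmatched keys."""
    left, right = normalize_keys(rtpcr), normalize_keys(rnaseq)
    shared = sorted(left.keys() & right.keys())
    paired = {key: (left[key], right[key]) for key in shared}
    only_left = sorted(left.keys() - right.keys())
    only_right = sorted(right.keys() - left.keys())
    return paired, only_left, only_right

# Made-up values; note the inconsistent spelling of "tfpi " in one key.
rtpcr = {("TFPI", "S1"): 6.0, ("tfpi ", "S2"): 5.5, ("CDH1", "S1"): 7.2}
rnaseq = {("TFPI", "S1"): 3.1, ("TFPI", "S2"): 3.4, ("NOTCH2", "S1"): 4.0}
paired, only_rtpcr, only_rnaseq = match_measurements(rtpcr, rnaseq)
```

Reviewing the unmatched lists before integrating is a quick way to catch identifier mismatches that would otherwise silently drop data.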

Variable view of RT-PCR and RNA-Seq data integration for gene TFPI as an example.


 

Please check out our tutorial series, Getting Started with RT-PCR Analysis, to learn how to perform the above RT-PCR analyses.

[Feature Update] New Array Studio Launcher Accelerates Software Download

Vivian Zhang

Quick Announcement: Omicsoft recently released a new version of Array Studio Launcher (http://www.omicsoft.com/software/ArrayStudioLauncher/publish.htm). When you launch Array Studio, the new software loading page is displayed.

The new Array Studio Launcher downloads software upgrades 10 times faster, saving up to several minutes of download time.