[Array Studio Video Tutorial] RNA-Seq Downstream Analysis: Normalization, Visualization and Data Integration
After aligning data, there are a number of downstream analyses that can be done. For instance, the generated RPKM (or FPKM) dataset can be treated like microarray data and used for clustering (a log2 transformation may be necessary). Count data can be used to look for changes between groups of samples through DESeq analysis. A large number of visualization and QC functions are available to analyze feature-level RNA-seq data in Array Studio. In this article, we introduce our video tutorials on RNA-Seq Downstream Analysis.
- 1 Normalizing and Transforming RNA-seq Data for MicroArray-type analysis
- 2 Attach new Views to Data
- 3 Principal Component Analysis on normalized expression data
- 4 Hierarchical Clustering of normalized expression data
- 5 RNAseq-MicroArray Integration
Array Studio has a large number of modules originally designed for Gene Expression MicroArray analysis, but these modules are also useful for analyzing feature-level (e.g. gene-level, exon-level) RNA-seq data. However, many of these modules expect normalized and log-transformed input data. Array Studio provides a number of methods for normalizing RNA-Seq data, including Log Geometric Mean, Mean, Median, Quantile, TMM (edgeR), TotalCount, RPKM to TPM, UpperQuartile, and LandNormalization. Array Studio also provides methods for normalizing and transforming -Omic data.
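To make the idea of normalizing and log-transforming concrete, here is a minimal Python sketch of two of the steps named above: RPKM to TPM conversion followed by a log2 transformation. It is an illustration only and not Array Studio's implementation; the function names, the pseudocount of 1, and the toy matrix are assumptions made for this example.

```python
import numpy as np

def rpkm_to_tpm(rpkm):
    """Convert a genes x samples RPKM matrix to TPM, one sample at a time."""
    rpkm = np.asarray(rpkm, dtype=float)
    # Rescale each sample (column) so that its values sum to one million.
    return rpkm / rpkm.sum(axis=0, keepdims=True) * 1e6

def log2_transform(values, pseudocount=1.0):
    """Log2-transform expression values; the pseudocount (an assumption) avoids log2(0)."""
    return np.log2(np.asarray(values, dtype=float) + pseudocount)

# Toy matrix: 3 genes (rows) x 2 samples (columns); the values are made up.
rpkm = np.array([[10.0, 20.0],
                 [30.0, 20.0],
                 [60.0, 160.0]])
tpm = rpkm_to_tpm(rpkm)
log_tpm = log2_transform(tpm)
```

After this step every sample sums to one million TPM, and the log-transformed matrix is in the form that microarray-style modules typically expect.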
In Array Studio, data can be viewed directly in tables, but can also be displayed in up to 40 Views, depending on the contents of the underlying data. Among its most popular views, Array Studio features the very powerful Variable View.
Principal Component Analysis (PCA) is an effective tool for summarizing data by the components that contribute the greatest variance in the dataset. In other words, PCA can group your data based on variance, which should reflect differences between samples. Problem samples (such as failed samples) will often appear as outliers in the PCA plot.
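As a rough sketch of what PCA does with normalized expression data, the Python example below centers each gene, computes the principal components with a singular value decomposition, and flags the sample that lies farthest from the others on the first component. This is not Array Studio's PCA module; the function name `pca_scores`, the simulated data, and the shifted "failed" sample are all assumptions made for illustration.

```python
import numpy as np

def pca_scores(expr, n_components=2):
    """PCA of a samples x genes matrix of log-transformed expression.

    Returns the sample scores and the fraction of variance explained
    by each of the requested components.
    """
    x = np.asarray(expr, dtype=float)
    centered = x - x.mean(axis=0)  # center each gene across samples
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    var_ratio = (s ** 2) / np.sum(s ** 2)
    return scores, var_ratio[:n_components]

rng = np.random.default_rng(0)
# Simulated data: 6 hypothetical samples x 50 genes.
expr = rng.normal(loc=8.0, scale=0.5, size=(6, 50))
expr[5] += 4.0  # shift the last sample to mimic a failed sample

scores, var_ratio = pca_scores(expr)
# The sample farthest from the median PC1 score is the outlier candidate.
outlier = int(np.argmax(np.abs(scores[:, 0] - np.median(scores[:, 0]))))
```

Plotting the first two columns of `scores` gives the familiar PCA scatter plot, in which the shifted sample sits well apart from the rest.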
Gene expression data can be grouped by Hierarchical Clustering by Variables (e.g. genes) and Observations (e.g. samples) to reveal associations in your data. Array Studio can easily handle Hierarchical Clustering of up to 20,000 variables, far more than the capacity of many popular gene clustering programs.
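The following Python sketch illustrates the idea of clustering both observations and variables with a deliberately naive average-linkage procedure on Euclidean distances. It is meant only to show the mechanics on a toy matrix: it does not reflect how Array Studio clusters data, it would be far too slow for thousands of genes, and the function name `average_linkage` and the data are assumptions. In practice a library routine such as `scipy.cluster.hierarchy.linkage` would be the usual choice.

```python
import numpy as np

def average_linkage(data, n_clusters):
    """Naive agglomerative clustering of rows (average linkage, Euclidean).

    Repeatedly merges the two closest clusters until n_clusters remain,
    then returns one integer label per row.
    """
    x = np.asarray(data, dtype=float)
    # Pairwise Euclidean distances between rows.
    dist = np.sqrt(((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=2))
    clusters = [[i] for i in range(len(x))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Mean distance over all member pairs of the two clusters.
                d = dist[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    labels = np.empty(len(x), dtype=int)
    for label, members in enumerate(clusters):
        labels[members] = label
    return labels

# Toy data: 4 hypothetical samples (rows) x 3 genes (columns).
samples = np.array([[1.0, 1.1, 0.9],
                    [1.2, 1.0, 1.0],
                    [5.0, 5.2, 4.9],
                    [5.1, 4.9, 5.0]])
sample_labels = average_linkage(samples, n_clusters=2)   # cluster observations
gene_labels = average_linkage(samples.T, n_clusters=2)   # cluster variables
```

Clustering the matrix and its transpose corresponds to clustering by Observations and by Variables, respectively.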
Feature-level (genes, transcripts, etc.) results from RNA-seq experiments can directly be compared to microarray data from the same samples, using the Microarray-Microarray Integration module. This module allows the user to create a duplex matrix (two values for each variable in the dataset) for two “microarray” data types. The resulting dataset can also contain correlation information for each variable, making it easy to figure out which variables correlate well between datasets.
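A small Python sketch can illustrate the kind of result described here: two matched datasets are paired value by value, and a per-variable Pearson correlation shows how well each gene agrees between platforms. The layout of the paired array (genes x samples x 2), the function name `integrate`, and the toy values are assumptions for this example; the actual output format of the Microarray-Microarray Integration module may differ.

```python
import numpy as np

def integrate(rnaseq, microarray):
    """Pair two genes x samples matrices that share gene and sample order.

    Returns the paired array (genes x samples x 2) and the Pearson
    correlation of each gene across samples (NaN if a gene is constant).
    """
    a = np.asarray(rnaseq, dtype=float)
    b = np.asarray(microarray, dtype=float)
    if a.shape != b.shape:
        raise ValueError("both datasets must have the same shape")
    duplex = np.stack([a, b], axis=-1)  # two values per gene and sample
    a_c = a - a.mean(axis=1, keepdims=True)
    b_c = b - b.mean(axis=1, keepdims=True)
    num = (a_c * b_c).sum(axis=1)
    denom = np.sqrt((a_c ** 2).sum(axis=1) * (b_c ** 2).sum(axis=1))
    corr = np.full(a.shape[0], np.nan)
    ok = denom > 0  # skip genes with no variation in either dataset
    corr[ok] = num[ok] / denom[ok]
    return duplex, corr

# Toy data: 3 genes x 4 samples on each platform; the values are made up.
rnaseq = np.array([[1.0, 2.0, 3.0, 4.0],
                   [4.0, 3.0, 2.0, 1.0],
                   [2.0, 2.0, 2.0, 2.0]])
microarray = np.array([[2.0, 4.0, 6.0, 8.0],
                       [1.0, 2.0, 3.0, 4.0],
                       [5.0, 6.0, 7.0, 8.0]])
duplex, corr = integrate(rnaseq, microarray)
```

In this toy example the first gene agrees perfectly between platforms, the second is anti-correlated, and the third has no defined correlation because its RNA-seq values do not vary.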
To learn how to perform these downstream analyses on RNA-seq data, please check out our video tutorials on RNA-Seq Downstream Analysis.