Finding genes or transcripts that are differentially expressed between conditions is an important analysis step in understanding the functions of genetic variants. Array Studio contains a number of modules for univariate analysis and differential expression testing, including One-Way ANOVA, Two-Way ANOVA, the more advanced General Linear Model, and a few others. Statistical inference can be performed on your feature-level data, whether it was quantified in Array Studio or imported from external programs. In this article, we introduce popular methods for advanced analysis of RNA-seq data.
A One-Way ANOVA is used to study the effect of a single factor on expression data, while a Two-Way ANOVA is used to study the effects of two factors. For example, if an experiment has factors for time and treatment, the Two-Way ANOVA can quickly generate results (including fold changes, raw and adjusted p-values, LSMeans, and estimate data). After the user selects factor 1 and factor 2, and then the level to compare to, Array Studio automatically creates the comparisons and the model. The analysis generates an inference report, including an automatically generated Report View and VolcanoPlotView.
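To make the ANOVA output more concrete, the following Python sketch computes a One-Way ANOVA F statistic and a log2 fold change for a single gene. It is an illustration only, not Array Studio's implementation: the function names `one_way_anova_f` and `log2_fold_change`, the example values, and the assumption that the input is already on a log2 scale are all hypothetical.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of values.

    Assumes at least two groups and non-zero within-group variance.
    """
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    group_means = [mean(g) for g in groups]

    # Variation explained by group membership.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Residual variation inside each group.
    ss_within = sum(
        (v - m) ** 2 for g, m in zip(groups, group_means) for v in g
    )

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


def log2_fold_change(treated, control):
    """Difference of group means; assumes values are already log2-scaled."""
    return mean(treated) - mean(control)


# Hypothetical log2 expression values for one gene in two conditions.
control = [5.0, 5.2, 4.8]
treated = [7.0, 7.1, 6.9]

f_stat, df_b, df_w = one_way_anova_f([control, treated])
lfc = log2_fold_change(treated, control)
print(f"F = {f_stat:.2f} on ({df_b}, {df_w}) df, log2 fold change = {lfc:.2f}")
```

A p-value would then come from the F distribution with the returned degrees of freedom, and a volcano plot places each gene by its fold change and its p-value.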
For RNA-Seq, read count is a good estimate of the abundance of the target transcript, so it is of great interest to compare read counts between conditions. The DESeq GLM test is a powerful tool for inferring differential expression of genes or transcripts from raw count data. It allows the user to fit a generalized linear model to the data and test for differential expression under a negative binomial distribution. The function should perform similarly to the DESeq R package. DESeq only works on raw counts of sequencing reads (with no additional background reads added to the dataset). After the test runs, a report table is generated along with a scatter plot. A volcano plot is generated as well, as in the ANOVA analysis. For more details on how the DESeq method works and on its other functions, see the DESeq R manual.
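One reason DESeq requires raw counts is that it estimates its own per-sample scaling. The Python sketch below illustrates the median-of-ratios size factor estimate described for the DESeq R package. It is a simplified illustration, not Array Studio's code: the function name `size_factors`, the gene names, and the counts are hypothetical, and the negative binomial model fitting itself is not shown.

```python
import math
from statistics import median


def size_factors(counts):
    """Estimate per-sample size factors by the median-of-ratios approach.

    counts: dict mapping gene -> list of raw counts, one per sample.
    Genes with a zero count in any sample are skipped, because their
    geometric mean is zero. At least one gene must remain.
    """
    n_samples = len(next(iter(counts.values())))

    # Log of the geometric mean across samples, per usable gene.
    log_geo_means = {
        gene: sum(math.log(c) for c in row) / n_samples
        for gene, row in counts.items()
        if all(c > 0 for c in row)
    }

    factors = []
    for j in range(n_samples):
        log_ratios = [
            math.log(counts[gene][j]) - lgm
            for gene, lgm in log_geo_means.items()
        ]
        factors.append(math.exp(median(log_ratios)))
    return factors


# Hypothetical raw counts: sample 2 was sequenced twice as deeply as sample 1.
raw_counts = {
    "geneA": [10, 20],
    "geneB": [100, 200],
    "geneC": [50, 100],
    "geneD": [0, 4],  # skipped when estimating size factors
}

sf = size_factors(raw_counts)
normalized = {g: [c / f for c, f in zip(row, sf)] for g, row in raw_counts.items()}
print(sf)
```

Dividing each sample's counts by its size factor puts the samples on a common scale, which is why pre-normalized values such as RPKM are not suitable input for this test.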
Array Studio uses a straightforward approach to identify genes with differential transcript usage between groups. This function allows the user to identify differentially expressed isoforms between comparisons. Starting from transcript-level data (RPKM, FPKM, or count data), the function converts the expression values to ratios by dividing the value of each transcript by the sum of all transcripts in the same gene. Transcripts ranked highest by p-value show the strongest evidence of a difference in relative transcript usage.
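The ratio conversion described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Array Studio's implementation: the function name `transcript_usage_ratios`, the transcript and gene identifiers, and the choice to return NaN when a gene has no expression in a sample are all hypothetical.

```python
import math
from collections import defaultdict


def transcript_usage_ratios(expression, transcript_to_gene):
    """Convert transcript-level values to within-gene usage ratios.

    expression: dict mapping transcript -> list of values, one per sample
                (RPKM, FPKM, or counts).
    transcript_to_gene: dict mapping transcript -> parent gene.
    """
    n_samples = len(next(iter(expression.values())))

    # Sum all transcripts of each gene, sample by sample.
    gene_totals = defaultdict(lambda: [0.0] * n_samples)
    for tx, values in expression.items():
        totals = gene_totals[transcript_to_gene[tx]]
        for j, v in enumerate(values):
            totals[j] += v

    # Divide each transcript by its gene total; NaN when the total is zero.
    ratios = {}
    for tx, values in expression.items():
        totals = gene_totals[transcript_to_gene[tx]]
        ratios[tx] = [v / t if t > 0 else math.nan for v, t in zip(values, totals)]
    return ratios


# Hypothetical data: two samples, gene G1 with two isoforms, G2 with one.
expression = {"tx1": [30.0, 10.0], "tx2": [10.0, 30.0], "tx3": [5.0, 0.0]}
transcript_to_gene = {"tx1": "G1", "tx2": "G1", "tx3": "G2"}

ratios = transcript_usage_ratios(expression, transcript_to_gene)
print(ratios["tx1"], ratios["tx2"])
```

In this example the total expression of G1 is the same in both samples, but the dominant isoform switches from tx1 to tx2. A gene-level comparison would miss this change, while a comparison of the ratios between groups picks it up.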
To learn how to produce the results above, please see our video tutorial: Advanced Analysis of RNA-seq data.