Choice of Gene Annotation on RNA-Seq Results

Matt Newman

Many users (and potential users) have asked us about our choice of gene annotation (or gene model, in Omicsoft lingo) in the OncoLand and ImmunoLand products. For the past three years, we've used an implementation of the UCSC gene model that we refer to as the "Omicsoft Gene Model". It consists of the UCSC gene annotation + mirBase (for miRNAs) + the mitochondrial genes from Ensembl.
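To make the idea of a composite gene model concrete, here is a minimal Python sketch of how three annotation sources could be combined. It is an illustration only, not our actual build pipeline. The function name `build_composite_model`, the set of mitochondrial chromosome names, and the decision to drop any mitochondrial records found in the UCSC input are all assumptions made for this example. It also assumes each source has already been converted to 9-column, tab-delimited GTF-style lines.

```python
from typing import Iterable, List, Tuple

# Assumed spellings of the mitochondrial chromosome across sources.
MITO_NAMES = {"chrM", "chrMT", "MT", "M"}


def _is_record(line: str) -> bool:
    """True for a 9-column annotation record (skips comments and blanks)."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return False
    return len(stripped.split("\t")) == 9


def _chrom(line: str) -> str:
    """Return the sequence name (first column) of an annotation record."""
    return line.split("\t", 1)[0]


def build_composite_model(
    ucsc_lines: Iterable[str],
    mirbase_lines: Iterable[str],
    ensembl_lines: Iterable[str],
) -> List[Tuple[str, str]]:
    """Combine three sources into a list of (origin, record) pairs.

    Hypothetical merge rules for this sketch:
      - UCSC:    keep every non-mitochondrial record
      - mirBase: keep every record (miRNAs)
      - Ensembl: keep only mitochondrial records
    """
    merged: List[Tuple[str, str]] = []
    for line in ucsc_lines:
        if _is_record(line) and _chrom(line) not in MITO_NAMES:
            merged.append(("UCSC", line.rstrip("\n")))
    for line in mirbase_lines:
        if _is_record(line):
            merged.append(("mirBase", line.rstrip("\n")))
    for line in ensembl_lines:
        if _is_record(line) and _chrom(line) in MITO_NAMES:
            merged.append(("Ensembl", line.rstrip("\n")))
    return merged
```

A real build would also need to reconcile chromosome naming between sources (for example "chr1" versus "1") and resolve overlapping or duplicate gene IDs, which this sketch does not attempt.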

It was recently announced that UCSC will be moving to the GENCODE basic gene annotation for future versions of the gene annotation for their GRCh38 reference library. This is really good news for everyone, as it will hopefully simplify and standardize the reporting of transcript and gene IDs across publications, tools, etc. It's something we are actively looking at for next year's releases of OncoLand and ImmunoLand as well (likely in addition to maintaining our current B37.3 and Omicsoft gene model results).

An interesting read on the effect of the gene annotation source on RNA-Seq can be found here: In it, the author found that the source of gene annotation has a profound effect on RNA-Seq alignment, gene expression calculations, and differential expression results.