Overview

Omicsoft is the leading provider of Next Generation Sequencing, Cancer Genomics, Immunology, and Bioinformatics solutions for Next Generation Sequencing Data and Gene Expression Analysis.

Exciting Updates and Latest News

Keeping you up-to-date with the latest in NGS, Bioinformatics Analysis, and cancer genomics with blogs on Array Suite, OncoLand (TCGA and more), ImmunoLand, and more.

Filtering by Category: Land Feature

[Land Update] Next Generation Of OmicSoft Lands on AWS Cloud

Vivian Zhang

In April 2017, Gary Ge from OmicSoft presented the webinar, Next Generation Of OmicSoft Lands on AWS Cloud, which described OmicSoft's transition into a cloud-based Land system. For those who missed the webinar, please watch the recording here, or read through this article on how our new Land technology may improve our service.

OmicSoft’s Land technology has enabled collection and management of large public data sets in curated knowledge bases in the fields of cancer genomics (OncoLand), cardiovascular, metabolic and immunology (DiseaseLand), as well as genetic research (GeneticsLand). With more data being curated daily, and more users requesting content faster, we have been focused on creating a better solution for public Land delivery.

With the rapid growth of our Land database, we now provide 32 Lands, including OncoLand and DiseaseLand, to customers. Previously, we delivered and updated all content, approximately 1.5TB of data, each quarter. Delivery often took 1 to 5 days and required server-based parallel publishing, which took a lot of effort for both OmicSoft and company IT/OmicSoft product administrators.

In 2014, OmicSoft released Studio on the Cloud, and has continued to improve its cloud implementation since. Studio on the Cloud allows users to seamlessly run all Array Studio analytics from Amazon, combining the storage of S3 (Amazon Simple Storage Service) with the analytical power of EC2 (Amazon Elastic Compute Cloud). OmicSoft has seen an increasing number of clients that implement mixed-mode solutions (a cloud solution in addition to their SGE/PBS/LSF cluster).

With new technology breakthroughs, OmicSoft now offers the cloud-based Land. Our cloud Land design enables Land streaming from Amazon AWS, which makes Land data delivery much easier. Here is a comparison of Land delivery performance:

 

The design has the following features: 

• 10x performance improvement for dynamic query
• Stream to client’s ArrayServer with server cache
• Quick land delivery with minimal local storage footprint
• Faster future content updates
• Potential On-demand content updates (particularly for DiseaseLand) 
• Virtual Lands that combine public cloud Lands with internal Lands on a local server

To date, most of our clients have chosen to switch to cloud-based Land delivery. Please speak to the OmicSoft support team or your company administrator to understand how cloud-based Land delivery can benefit your research.

[New Feature] Manage Land Sample Clinical Data

Vivian Zhang

Omicsoft has been working diligently over the past few months both to strengthen our ability to incorporate clinical data and to grow our list of curated clinical measurements from public datasets. Currently, there are more than 1000 different clinical measurement variables in total in OncoLand and DiseaseLand, including sample demographics, survival data, symptoms, treatments and more. Moreover, users often have their own sets of internal clinical data that they wish to add to the system. If you have not started leveraging the power of our clinical data subsystem, please take a look at OncoLand Case Study - Clinical Variables for a quick 10-minute video tutorial on how to use clinical data to identify novel associations.

To help users better manage Land clinical data, we recently implemented the Manage Sample Clinical Data function in Land. This function can be accessed through:

 

 

This function allows users to add clinical data, manage clinical variable metadata, remove samples and remove clinical variables:

Add Clinical Data

Adding clinical data is straightforward. In addition, "Metadata" for clinical data columns can be controlled by adding a second table. For example, clinical data column grouping can be controlled by a table where the first column contains the clinical data column names and the second column contains the category:

Add Clinical Variable Metadata
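As a rough illustration of the two-column layout described above, the following Python sketch writes such a metadata table as tab-delimited text. The header names, the tab delimiter, and the example variables are assumptions made for illustration only; the wiki page documents the format the Land actually expects.

```python
import csv
import io


def build_metadata_table(column_to_category):
    """Return a two-column, tab-delimited table: column name, then category.

    The header row ("ColumnName", "Category") is a hypothetical choice.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["ColumnName", "Category"])
    for column_name, category in column_to_category.items():
        writer.writerow([column_name, category])
    return buffer.getvalue()


# Made-up clinical variables grouped into made-up categories.
example_table = build_metadata_table(
    {"Age": "Demographics", "Gender": "Demographics", "OS_Days": "Survival"}
)
print(example_table)
```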

The function is straightforward to use, allowing users to manage their clinical data efficiently. For more details on the function, please refer to our wiki page.

Stay tuned for additional functionality coming at the end of this year, including support for CDISC formatted files, to include time-series measurement data.

[Important Land Update] Land Filter Now Carries Over Across Multiple Searches

Vivian Zhang

Omicsoft's Lands are known for being comprehensive, powerful and integrated, allowing users to navigate across samples, genes, data types, datasets and platforms. As comprehensive and flexible as it is, the system may appear complex to some users, given the growing numbers of samples, datasets and data types. To help users apply filters more easily and efficiently, Omicsoft recently improved the filtering logic in the Lands.

Previously, filters applied to one search did not carry over from the main Land tab, requiring users to apply filters all over again for any new search.

For example, if the user wants to compare gene expression FPKM for EGFR in KIRC (Kidney Renal Clear Cell Carcinoma) and KIRP (Kidney Renal Papillary Cell Carcinoma), the first step might be to filter the tumor types in the TCGA_B37 main tab (to see sample numbers and understand the overall distribution of samples). Next, the user can search for EGFR and go to the Gene FPKM view (Step 2). If the user then wanted to see the gene expression of TP53, the Land previously did not carry over the filter, and the user had to apply it again (Step 3).

Step 1: Filter for Tumor Type KIRC and KIRP, and check sample statistics (data availability in this example search):

Step 2: Search for Gene EGFR and open the Gene FPKM view:

Step 3: Search for Gene TP53; the sample filter does not carry over, and the user has to apply it all over again:

 

Now, with the new version, the filter carries over without any additional steps:

 

Imagine having already applied many filters and navigated to a group of samples or genes that look intriguing: having all of those filter steps applied directly to the new search is both easier and faster.

This filter logic applies to all left-hand side filter tabs including Sample, Comparison and all data type filter tabs. 
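The carry-over behavior can be pictured with a small Python sketch in which filters are stored on the session rather than on an individual search. This is only a toy model under stated assumptions: the class, method names, and FPKM values are hypothetical and do not describe how the Land is implemented.

```python
class LandSession:
    """Toy model of a Land tab whose filters persist across gene searches."""

    def __init__(self, samples):
        self.samples = samples  # list of dicts, one per sample
        self.filters = {}       # column name -> set of allowed values

    def add_filter(self, column, allowed_values):
        self.filters[column] = set(allowed_values)

    def clear_all_filters(self):
        self.filters.clear()

    def search(self, gene):
        """Return (sample ID, value) pairs for `gene`, honoring active filters."""
        return [
            (s["SampleID"], s["FPKM"][gene])
            for s in self.samples
            if all(s.get(col) in allowed for col, allowed in self.filters.items())
        ]


# Made-up samples and expression values, for illustration only.
samples = [
    {"SampleID": "S1", "TumorType": "KIRC", "FPKM": {"EGFR": 10.0, "TP53": 4.0}},
    {"SampleID": "S2", "TumorType": "KIRP", "FPKM": {"EGFR": 7.5, "TP53": 6.0}},
    {"SampleID": "S3", "TumorType": "BRCA", "FPKM": {"EGFR": 2.0, "TP53": 9.0}},
]
session = LandSession(samples)
session.add_filter("TumorType", ["KIRC", "KIRP"])  # Step 1
egfr = session.search("EGFR")                      # Step 2
tp53 = session.search("TP53")                      # filter is still active
session.clear_all_filters()                        # reset everything
unfiltered = session.search("TP53")
```

Because the filter state lives on the session object, the second search sees the same KIRC/KIRP restriction without any extra step.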

If the user does not want to apply the filters, simply click the Clear All Filters button to reset everything:

[New Feature] Geneset Analysis Updates: Enrichment Score, Volcano Plot and Summary Bar Plot

Vivian Zhang

Early this fall, Omicsoft released the new Geneset Analysis functionality (see the webinar Announcing GeneSet Analysis Functionality, integrated with Omicsoft’s Land databases and the blog post Geneset Analysis Functionality: Integrated With Omicsoft Land Databases). It helps users identify comparisons containing similar gene set enrichment, using both the tens of thousands of gene sets in the Lands and customer gene sets, with directional results. Geneset Analysis is under active development, and we would like to update you on a few new features added since its release.

GeneSet Analysis result

Gene Set Enrichment Analysis Report
 

The Geneset Enrichment Analysis Report lists the p-value, enrichment score, direction of enrichment and other annotation information:

Geneset Enrichment Analysis Report

Enrichment Volcano Plot

 

The Enrichment Volcano Plot is a plot of Enrichment Score vs P-Value. The enrichment score for a gene set is the degree to which the gene set is overrepresented at the top or bottom of the ranked list of genes in the comparisons. The plot helps to visualize gene sets that may merit further research, with an indication of the direction of enrichment.

Enrichment Volcano Plot
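To make the idea of "overrepresented at the top or bottom of the ranked list" concrete, here is a minimal Python sketch of one common formulation: an unweighted, Kolmogorov-Smirnov-style running sum, as used in classic gene set enrichment analysis. Whether Array Suite computes its enrichment score this way is an assumption, not something stated in this post, and the gene names are placeholders.

```python
def enrichment_score(ranked_genes, gene_set):
    """Signed maximum deviation of an unweighted running-sum statistic.

    ranked_genes: unique gene names ordered from most up to most down.
    Positive result: set concentrated near the top; negative: near the bottom.
    """
    members = set(gene_set) & set(ranked_genes)
    n_hit = len(members)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        return 0.0
    running = 0.0
    best = 0.0
    for gene in ranked_genes:
        if gene in members:
            running += 1.0 / n_hit   # step up on a gene-set member
        else:
            running -= 1.0 / n_miss  # step down otherwise
        if abs(running) > abs(best):
            best = running
    return best


ranked = ["A", "B", "C", "D", "E", "F"]  # placeholder gene names
top_score = enrichment_score(ranked, {"A", "B"})
bottom_score = enrichment_score(ranked, {"E", "F"})
```

A set sitting entirely at the top of the list scores close to +1, and one sitting entirely at the bottom scores close to -1, which is the direction information the volcano plot puts on one axis.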

Summary Bar Plot

The Summary Bar Plot helps to visualize the number of overlapping genes and links dynamically to those genes, with details shown in the details window.

Summary Bar Plot

If you are an Omicsoft Land customer, give it a try with the latest Array Suite version. Let us know any comments or suggestions you have!

[Land Tutorial] Getting Started with OncoLand

Vivian Zhang

OncoLand is an Oncology database and visualization software that helps users explore public and private cancer genomics datasets. It contains tens of thousands of carefully processed and curated oncology -Omic data samples. OmicSoft uses the Land framework to deliver an increasing number of large datasets, including data types such as RNA-Seq, DNA-Seq, miRNA-Seq, Copy Number Variation, Gene Expression Chip, Protein Expression, Methylation and hundreds of clinical measurements. 

OncoLand contains data from more than 10 large public datasets, including TCGA, CCLE, CGCI, ICGC, TARGET, Multiple Myeloma, GTEx, Blueprint and more. In this blog post, we will introduce our data content based on our video tutorials: Getting Started With OncoLand.

For more details about Land content, please refer to our NEW wiki pages: Introduction to TCGA Land Content and Introduction to CCLE Land Content.

A first look at OncoLand

Most of our OncoLand users are likely to be familiar with our Land interface. After selecting a Land, you are likely to see a graphical interface similar to the following:

Example TCGA_B37 default view, displaying Sample Distribution view. 

TCGALand Introduction and Overview

TCGA, The Cancer Genome Atlas, is a comprehensive and coordinated effort to accelerate the understanding of the molecular basis of cancer through the application of genome analysis technologies. TCGALand is OncoLand's signature Land; it contains RNA-Seq, Expression Array, DNA-Seq, CNV, Methylation, and Protein data from more than 30 tumor types.

TCGALand Sample Distribution across Tumor Type.

TCGALand provides table and figure views at the sample, gene and clinical data level. We will introduce genomic data views in a following article, or you can refer to our video tutorials: Getting Started With OncoLand. Here, we would like to highlight the clinical data views, which are introduced in the TCGALand Introduction and Overview video clip.

Clinical Significance - Group Association is a dynamic view showing the association of all clinical variables with the selected grouping variable. It quickly provides insights on which clinical variables are potentially associated with the selected grouping variable. 

Clinical Association for TCGALand Tumor Type.

Another useful view is the Survival View. It plots survival rate over time for the selected grouping variables.

TCGALand Survival Plot by Tumor Type.
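For readers who want to see how a survival-rate-over-time curve can be derived from follow-up data, here is a minimal Python sketch of the Kaplan-Meier estimator, a standard method for this kind of plot. The post does not say which estimator the Survival View uses, so treating it as Kaplan-Meier is an assumption, and the follow-up times below are made up.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival probability)] at each distinct event time.

    durations: follow-up time per sample.
    events: 1 if the event (e.g. death) was observed, 0 if censored.
    """
    data = sorted(zip(durations, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        time = data[i][0]
        deaths = 0
        leaving = 0
        # Group all samples that share this follow-up time.
        while i < len(data) and data[i][0] == time:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((time, survival))
        at_risk -= leaving
    return curve


# Made-up follow-up data: the second sample is censored at time 2.
curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
```

Running the function once per group (for example, once per tumor type) gives one curve per grouping value.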

CCLELand Introduction and Overview

The Cancer Cell Line Encyclopedia (CCLE) project is an effort to conduct a detailed genetic characterization of a large panel of human cancer cell lines. CCLE provides public access to analysis and visualization of DNA copy number, mRNA expression, mutation data and more, for 1000 cancer cell lines. CCLELand groups data according to Primary Site (Tissue), with histology as the secondary grouping.

CCLELand Primary Grouping by Primary Site, instead of Tumor Type.

Stay tuned for more on OncoLand! 

[New Feature] Geneset Analysis Functionality: integrated with Omicsoft Land databases

Vivian Zhang

Gene Set Analysis is a powerful tool for users who have their own gene signatures and would like to identify comparisons or other signatures with similar gene set enrichment, drawing on both the tens of thousands of comparisons in the Lands and, for on-premises customers, their own gene sets. Recently, Omicsoft officially released our new GeneSet Analysis function. For more details, check out our webinar recording, Announcing GeneSet Analysis Functionality, integrated with Omicsoft’s Land databases, presented by Matt Newman, VP of Business Development at Omicsoft, on September 28th, 2016.

Previously, Omicsoft's Land system offered a simplified GeneSet Enrichment Analysis. It allowed users to compare their own gene sets with those contained in the Lands: 

Although this was powerful enough to identify comparisons with similar gene sets, it had several limitations:

1. It was restricted to a specific Land of choice and not shared across Lands.

2. It did not take directionality into account.

3. It was not able to include genesets beyond Land data as target gene sets.

4. It required the user to be familiar with the Land system, and not just the analysis sub-system of Array Suite.

Omicsoft's Array Studio also provides a Molecular Signature module that allows users to compare against Broad's molecular signature database. However, that module also does not take directionality into account, and it requires users to add straight lists to Array Studio Projects, with no ability to incorporate inference reports, the data stored within the Lands, or customer gene sets.

 

In order to more fully leverage Omicsoft's data assets, we have officially released our new GeneSet Analysis module. The new GeneSet Analysis allows the users to query across OncoLand, DiseaseLand, Molecular Signatures, and more. 

GeneSet Analysis Wizard

In addition to the geneset databases included, the new GeneSet Analysis also provides directional results -- up and down p-values and directions.

GeneSet Analysis result

We are still actively developing the GeneSet Analysis module, constantly improving our content, functions and visualizations. Here are a couple of examples of what we are working on:

1. Multi-species data support in addition to human and mouse data

2. Additional visualizations based on table results

If you have any comments or suggestions, please let us know.

 

Want to give it a try? Please check out our latest webinar, Announcing GeneSet Analysis Functionality, integrated with Omicsoft’s Land databases, and our GeneSet Analysis wiki for a detailed illustration.

 

 

[OncoLand Case Study] Find genes that are frequently co-mutated with your gene-of-interest: Co-mutation of TP53 and ATRX when IDH1-R132 is mutated

Vivian Zhang

The IDH1 gene encodes isocitrate dehydrogenase, which is involved in NADPH production, especially in the brain. Mutations in IDH1 are frequently found in low-grade and high-grade gliomas: low grade (grade II), anaplastic (grade III), and glioblastoma (GBM, grade IV) (Research Article: IDH1 and IDH2 Mutations in Gliomas). These mutations play an important role in gliomagenesis and are therefore of clinical interest. We can query OncoLand to learn about IDH1 mutations and other genes that are frequently co-mutated. For details, please refer to our OncoLand case study wiki:

Identify mutation hotspots in a gene of interest

In several cancers, IDH1 is frequently mutated at arginine 132, which alters the enzyme's active site. We can visualize the frequencies of mutations at different sites in each tumor. As we can see, our data confirms that IDH1 arginine 132 is frequently mutated in low grade gliomas (LGG) and glioblastoma (GBM):

TCGALand DNA-Seq Somatic Mutation Site Distribution View. 

The user can create a SampleSet, for example the one shown below, IDH1_mutation, from the Analytics | Generate Sample Set | Generate Site Mutation Status SampleSet menu.

SampleSet: IDH1_mutation

Identify other genes that are co-mutated with your gene of interest

With the SampleSet, we can identify correlated gene mutations through Analytics | Integration Analysis | Sample Grouping to Mutation. The test may take a few minutes if all genes are queried, and the results will be available from the Analytics | Open Result Set menu. From the results table, we can rank genes by the PValue from the Fisher Exact Test to identify the correlated genes, for instance ATRX and TP53 in LGG and GBM:

Analytics | Integration Analysis | Sample Grouping to Mutation Test results. Rank by PValue, filter by only co-occurring gene in LGG and GBM.
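For readers unfamiliar with the Fisher Exact Test, the Python sketch below shows how a co-mutation p-value can be derived from a 2x2 table of sample counts. It is a standalone illustration using only the standard library, not the Array Suite implementation; the function names and the toy sample IDs are hypothetical, and the two-sided convention used (summing all tables no more probable than the observed one) is one common choice.

```python
from math import comb


def comutation_table(all_samples, group_samples, mutated_samples):
    """Count samples: (in both, group only, mutated only, in neither)."""
    universe = set(all_samples)
    group = set(group_samples) & universe
    mutated = set(mutated_samples) & universe
    both = len(group & mutated)
    group_only = len(group - mutated)
    mutated_only = len(mutated - group)
    neither = len(universe - group - mutated)
    return both, group_only, mutated_only, neither


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1 = a + b
    col1 = a + c
    total = a + b + c + d
    denominator = comb(total, col1)

    def probability(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(total - row1, col1 - x) / denominator

    lowest = max(0, col1 - (total - row1))
    highest = min(row1, col1)
    observed = probability(a)
    p_value = sum(
        p
        for p in (probability(x) for x in range(lowest, highest + 1))
        if p <= observed * (1 + 1e-9)  # tolerance for floating-point ties
    )
    return min(1.0, p_value)


# Toy data: six made-up samples, three in the SampleSet, same three mutated.
all_ids = ["S1", "S2", "S3", "S4", "S5", "S6"]
table = comutation_table(all_ids, ["S1", "S2", "S3"], ["S1", "S2", "S3"])
p_value = fisher_exact_two_sided(*table)
```

Repeating this per gene and sorting by p-value mirrors the ranking step described above.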

Visualize Co-mutation patterns with the Alteration Omicprint

There are several ways to visualize co-mutation frequencies of multiple genes. While the "Alteration Distribution" displays the number of samples mutated in any gene of the GeneSet, "Somatic Co-mutation Frequencies" will display the distribution of samples with different mutation loads. The "Alteration Omicprint" efficiently displays per-sample mutation status of one, ten, or even hundreds of genes. You can also generate custom Omicprints based on custom queries if you want to query mutation status. Please check out our case study tutorial videos to learn how to perform the analysis.

Alteration Omicprint displays gene alteration status for multiple genes for corresponding samples. Custom queries for IDH1 and TP53 somatic mutation status, and BMP2 RNA-Seq FPKM, are created. Next, check out the Custom Query Omicprint view. For each custom query, sample status is displayed. As we can see, samples with mutated IDH1 and TP53 frequently over-express BMP2 in GBM.

[Land Update] Brief Introduction of TumorMutation and OncoGEO in OncoLand

Vivian Zhang

In this blog post, we would like to introduce two recently updated Lands in OncoLand: TumorMutation2015 and OncoGEO2015.

TumorMutationLand is a collection of mutation and copy number tumor data from more than 2400 samples. The data are from important publications that are not included in other Lands.

TumorMutation2015 Land Data Availability (Partial list).

TumorMutation2015 Land Sample Distribution.

OncoGEO currently has over 1200 RNA-Seq samples from GEO and the Sequence Read Archive (SRA). It will serve as the future home of “comparison” data from GEO (similar to the data provided in ImmunoLand comparing Disease vs Normal, Treated vs Control, etc.).

OncoGEO2015 Land Data Availability.

OncoGEO2015 Land Sample Distribution.

[Feature Update] Powerful variant search in GeneticsLand

Vivian Zhang

Since last month's blog post, GeneticsLand: A Turnkey Solution For Genetic Data Storage, Analysis And Annotation, we have continued to rapidly improve the views and functionality of GeneticsLand. The recently improved Search Variants function provides informative details on Variant Annotation, Frequency across populations, GWAS and eQTL Details, Region Association Plot and Reference Links.

Example views:

Variant Annotation View.

Variant Frequency Across Different Population View.

GWAS Catalog View. Clicking on, for example, a Gene ID links to the Variants table of all variants in the displayed gene.

eQTL Information

Region Association View. The color represents the correlation with the queried variant.

Reference Links to public resources including dbSNP, SNPedia, GTEx, Google Scholar, HaploReg, RegulomeDB.


[Feature Update] Improved Mutation Annotation

Vivian Zhang

Mutation identification is one of the most important types of genomic research analyses. The genomic position of an identified mutation is a critical factor in assessing its importance and functional effect. Recently, we improved our mutation annotation categorization to help users better research mutations.

We have now added a third category, Consequence, alongside the original Type and Location of gene information:

The new category helps to clarify the effects of the mutation, including the following categories:

  • SYNONYMOUS: change of a single nucleotide in the CDS that does not cause an amino acid change
  • NON_SYNONYMOUS: change of a single nucleotide in the CDS that causes an amino acid change
  • FRAME_SHIFT: frameshift in the CDS (the total number of nucleotide changes is not a multiple of 3) caused by an insertion, deletion or indel
  • STOP_GAIN: mutation creating a stop codon
  • STOP_LOSS: mutation destroying a stop codon
  • NO_CONSEQUENCE: any consequence not described above, such as a SUBSTITUTION in an intergenic region. It is only a technical (not a biological) definition.
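The category definitions above can be read as a small decision procedure. The Python sketch below is one possible reading, not OmicSoft's annotation code: the function name and parameters are hypothetical, and the precedence (stop gain and stop loss checked before synonymous and non-synonymous) is an assumption, since the list does not say how overlapping cases are resolved.

```python
def classify_consequence(in_cds, ref_len, alt_len, ref_aa=None, alt_aa=None):
    """Map a mutation to one of the Consequence categories listed above.

    in_cds: True if the change falls inside a coding sequence.
    ref_len / alt_len: length in nucleotides of the reference and altered allele.
    ref_aa / alt_aa: one-letter amino acids for a single-nucleotide change,
                     with "*" standing for a stop codon.
    """
    if not in_cds:
        return "NO_CONSEQUENCE"
    if ref_len != alt_len:
        # Insertion, deletion or indel: frameshift unless a multiple of 3.
        if (alt_len - ref_len) % 3 != 0:
            return "FRAME_SHIFT"
        return "NO_CONSEQUENCE"
    if ref_len == 1 and ref_aa is not None and alt_aa is not None:
        if alt_aa == "*" and ref_aa != "*":
            return "STOP_GAIN"
        if ref_aa == "*" and alt_aa != "*":
            return "STOP_LOSS"
        if ref_aa == alt_aa:
            return "SYNONYMOUS"
        return "NON_SYNONYMOUS"
    return "NO_CONSEQUENCE"


# Example: arginine to histidine from a single-nucleotide change in the CDS.
example = classify_consequence(True, 1, 1, "R", "H")
```

In this reading, an in-frame insertion or deletion falls through to NO_CONSEQUENCE because none of the named categories describes it.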

[New Feature] Geneset Analysis - Visualize Expression Comparison for Any Set of Genes of Interest

Vivian Zhang

ImmunoLand is Omicsoft's most recently developed Land database. It is an immune-related genomics database and visualization software that helps users explore public and private immune-focused genomics datasets. In ImmunoLand, researchers can search a gene, multiple genes, a pathway, a project or multiple projects. With the recently implemented Gene Set Analysis, users can visualize comparison data for any set of genes of interest.

After the user creates a geneset, for example, GSE26927:

The user can go to the view directory and select Gene Set Analysis.

By selecting the geneset that was just created, the user will get a GeneSet Enrichment Analysis plot, displaying comparison P-Value of the comparisons that have overlapped genes with the selected geneset. As an alternative, the user can search for the geneset from the search gene toolbox.

[New Feature] Sample filtering made easy with new String Filter function

Vivian Zhang

At Omicsoft, we have a continually growing Land user base. The increasing number of public genomic research projects and datasets has made it possible to study public samples with a particular disease, gene mutation or clinical phenotype without spending millions of dollars to conduct the experiments. As we continue to improve Land sample search and filter capabilities, we are glad to introduce a new String Filter function that makes it easier to search for multiple samples, genes or any other conditions.

For any string variable, whether it is a sample ID, gene name, clinical measurement or something else, the user can filter on multiple strings using the Add String Filter function:

For example, if the user is interested in the expression of the EGFR gene and wants to look more closely at a few samples with high EGFR expression in breast cancer, he or she will likely check the Gene FPKM view of EGFR in TCGA Land and identify a few samples:

Samples with high EGFR expression in breast cancer patients are highlighted in pink.

Next, the user can right-click the SampleID filter, as shown in Figure 1, and choose Add String Filter (Select) to select sample ID names:

Or, the user can choose Add String Filter (Input) and simply paste in the sample IDs:

The string filter function applies to all string variable filters. Now, let's get started with fast string filtering on your sample of interest!
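Conceptually, a multi-string filter keeps every row whose value in a chosen column matches one of the supplied strings. The Python sketch below illustrates that idea on a small in-memory table; the function names, the pasted-text parsing rules (commas or whitespace as separators), and the sample values are assumptions for illustration, not the Array Studio implementation.

```python
def parse_pasted_ids(text):
    """Split pasted text into IDs, accepting commas or whitespace as separators."""
    return text.replace(",", "\n").split()


def string_filter(rows, column, wanted):
    """Keep rows whose `column` value is one of the `wanted` strings."""
    wanted_set = set(wanted)
    return [row for row in rows if row.get(column) in wanted_set]


# Made-up sample table and expression values.
rows = [
    {"SampleID": "S1", "EGFR_FPKM": 120.0},
    {"SampleID": "S2", "EGFR_FPKM": 3.5},
    {"SampleID": "S3", "EGFR_FPKM": 98.2},
]
selected = string_filter(rows, "SampleID", parse_pasted_ids("S1, S3\nS9"))
```

IDs that are not present in the table, such as "S9" here, simply match nothing. The same function works for any string column, such as a gene name or a clinical measurement.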

Note: Array Studio version requirement: v8.1.0.95 or higher.

[New Feature] ImmunoLand Update: Viewing Expression Level of Multiple Genes (Gene Pathway) through Multigene Variable View

Vivian Zhang

Genetic diseases are often the result of malfunctions in multiple genes or gene pathways. Being able to understand the correlation between genes, or to compare multiple genes, is crucial in genomic research. At Omicsoft, we try to provide multiple gene views and pathway views to make researchers' lives easier. In the recently released ImmunoLand, a new Multigene Variable view allows users to view the gene expression level of multiple genes of interest in the same chart.

Previously, ImmunoLand provided views for gene-level and transcript-level expression of a single gene:

Transcript FPKM of EGFR categorized by disease category. 

Now, a new multigene variable view is available:

Gene FPKM of SLC35E2B, BCAS3, BTRC and EYA1.

The user can further specify multiple grouping categories:

Gene FPKM of SLC35E2B, BCAS3, BTRC and EYA1 grouped by disease category.

[Feature Update] Checking Sample Details Made Easy: Improved Land Sample TableView Visualization

Vivian Zhang

At Omicsoft, we constantly expand our sample datasets, improve graphical visualization and introduce new features, all with the goal of helping researchers better conduct genomic research. For the 2015 Q2 Land updates, we formally released ImmunoLand and significantly improved clinical integration. (For more details, please watch our recorded webinar: Omicsoft 2015 Q2 Land Updates.) With more than 100,000 samples of different types of genomic data and hundreds of clinical measurements available, we also improved our user interface for better data display and query. One of these improvements is the new Land sample TableView visualization.

In the previous version, the sample TableView, where all the sample information including clinical information is displayed, looked like this:

The size of each cell was predefined regardless of the length of the content in the cell. To check longer content, the user needed to mouse over or manually expand the cell.

With the new user interface, the cell size is adjusted to the cell content, making all the information clear and easy to check at a glance:

At Omicsoft, we strive to keep improving to meet our customers' needs. While you are enjoying the convenience of the new TableView, talk to us if you have a suggestion or request to improve our product and service.

[Feature Review] Save Customized Views for Future Usage and Sharing in the "Lands"

Vivian Zhang

In Omicsoft's Lands (ImmunoLand and OncoLand), we have pre-configured over 40 views for different data types, including RNA-Seq, DNA-Seq, miRNA-Seq, Copy Number Variation, Gene Expression Chip, Protein Expression, Methylation and hundreds of clinical measurements. While we design our Lands to be extremely powerful in providing visualizations with customizable gene, sample and project filters along with customizable graphical designs, we acknowledge that it sometimes takes time to explore the data. For some of our customers, an admin or super user wants to configure views specific to their company or group. Or, users in a specific research group may want to set up customized views that are most commonly used for a specific disease or project. All of these customizations can be done through the Land custom view format.

How many steps does it take to display the expression of gene POLR3A at different stages of systemic sclerosis compared to normal control in study GSE58095? To draw a plot like the one below, the user needs to search for the gene POLR3A, click on the Expression | Expression Intensity view, filter for project GSE58095, change the grouping to disease category and make sure the color and scale are set as preferred.

Expression Intensity of POLR3A in different stages of systemic sclerosis in study GSE58095.

After the user has made it to this view and feels it can be potentially very informative for his or her project, the user can save the view: 

Furthermore, if the user wants to share the view with the whole team and would like to replicate this query for other genes or projects, the user can ask the admin to create custom views. To see how to create custom views in Land, please check out our wiki page, Custom Views in Land, or contact us. For a standard user, the query can be set up as a Custom View, or a selection of custom views can even be grouped into a specific project folder, like this one for Scleroderma Projects:

[GENOMIC RESEARCH] Mutation Analysis with improved mutation annotation system

Vivian Zhang

Identifying disease-associated gene mutations is an important part of genetic disease research and of the design of targeted drugs. To accelerate gene mutation analysis, Omicsoft's Land database provides rich mutation visualization views including mutation and somatic mutation site distribution, mutation landscape, and mutation genome browser. Omicsoft's recent improvement to its mutation annotation system allows users to annotate and filter mutations based on hundreds of criteria, including mutation confidence, position, gene information, functional mutation, eQTL information, regulation information, protein information, and clinical information, using publicly available databases as the source for annotation.

Land mutation annotation directory

A simple example is to compare the mutation distributions with and without synonymous mutations included:

Figure: TP53 mutation distribution with synonymous mutation included

Figure: TP53 mutation distribution with synonymous mutation excluded

With the improved annotation system, the user can also filter mutations and identify mutations associated with a clinical phenotype, using database resources like ClinVar.

Additional databases include SIFT, Polyphen, 1000 Genomes, ExAC, ESP6500, GTEx eQTL, RegulomeDB, and Interpro Domain, and the system is designed to support other databases or a customer's internal annotation systems as well.

Figure: TP53 mutation landscape in Li Fraumeni Syndrome samples

[IMMUNOLOGICAL RESEARCH] Research an Immunological Genome Study in ImmunoLand

Vivian Zhang

Traditionally, immunology studies have focused on a particular protein or pathway. However, immunological activity is a system-level response, which is well suited to large-scale integrative approaches and requires an overall perspective on the immune system(s). With advanced technologies enabling large-scale, genome-level approaches, immunology studies are embracing the era of immunogenomics (Related Reading: Beyond the transcriptome: completion of act one of the Immunological Genome Project).

ImmunoLand is Omicsoft's most recently developed Land database. It is an immune-related genomics database and visualization software that helps users explore public and private immune-focused genomics datasets. In ImmunoLand, researchers can search a gene, multiple genes, a pathway, a project or multiple projects across more than 22,000 samples from public projects, including GEO (Gene Expression Omnibus), SRA (Sequence Read Archive), ArrayExpress, dbGaP (The Database of Genotypes and Phenotypes), and other large data repositories like BluePrint, GTEx, and ImmGen (The Immunological Genome Project).

Here is how:

Immunological genomics studies currently span many different diseases, immune cells, activation responses, treatments, tissues, states of cell differentiation, and so forth. In ImmunoLand, each study in the database is carefully reviewed by Omicsoft’s curators: metadata is cleaned up, outliers are removed, and statistically driven comparisons are then generated for each study. ImmunoLand lets users search across projects or search directly for a project of interest. For example, let's search for the project GSE37448 from the Immunological Genome Project:

Figure: Gene Expression Intensity Heatmap categorized by disease category

By default, the view displays a heatmap of sample expression intensity, categorized by disease category. It is interesting to look at the heatmap of the genes with the highest differential expression across cell types:

Figure: Gene Expression Intensity Heatmap of genes with Gene Rank Expression Intensity <100

Figure: Expression Per-Gene View showing gene CD3G

Next, the user can search for their gene(s) of interest across projects to compare results across different comparisons (diseases, immune cells, activation responses, treatments, tissues, states of cell differentiation). The GSE37448 study was done in mouse. It might, for instance, be interesting to look at expression of the same gene in human organs by using ImmunoLand2015 instead of ImmunoMouse2015.

 

[Feature Review] Dynamic Correlation among Gene, Structural Variation and Protein

Vivian Zhang

Cancer is a complex disease, and like other complex diseases, changes in gene expression and structural variation correlate with each other and together play an integrated role in the development of cancer. Understanding the correlation among gene expression, structural variation and protein expression is indispensable in oncology research. 

Figure 1. Relationships between NRC signature genes and their driver-mutating genes in the protein interaction network. Jie Li et al. 2010

OncoLand provides dynamic correlation visualization for RNA-Seq, miRNA-Seq, somatic mutation, copy number variation and protein RPPA data.

Take ESR1 (estrogen receptor 1) as an example: the RNA-Seq Expression => RNA-Seq Expression view provides the correlation and scatterplot of ESR1 expression against all other genes:

In OncoLand, you can filter on any criteria of interest, such as tumor type, sample metadata, clinical subpopulation, and more.

For instance, if you are interested in estrogen receptor-positive samples in primary breast cancer, apply that filter:

The correlation and scatterplot view will dynamically change with the filter criteria:
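
To make the idea concrete, the sketch below recomputes a target gene's correlation with every other gene after a sample filter is applied. It is only an illustration under stated assumptions: Pearson correlation is assumed (the statistic OncoLand uses is not specified here), and the expression values, the placeholder genes `GENE_A` and `GENE_B`, and the `er_status` field are invented toy data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")  # correlation is undefined for a constant vector
    return sxy / math.sqrt(sxx * syy)


def correlate_with_all(expr, target, samples):
    """Correlate `target` with every other gene over the given samples.
    `expr` maps gene -> {sample: expression value}."""
    tx = [expr[target][s] for s in samples]
    return {
        gene: pearson(tx, [values[s] for s in samples])
        for gene, values in expr.items()
        if gene != target
    }


# Toy expression values and metadata (hypothetical, for illustration only).
expr = {
    "ESR1":   {"S1": 1, "S2": 2, "S3": 3, "S4": 8, "S5": 9, "S6": 10},
    "GENE_A": {"S1": 2, "S2": 4, "S3": 6, "S4": 16, "S5": 18, "S6": 20},
    "GENE_B": {"S1": 5, "S2": 3, "S3": 4, "S4": 1, "S5": 2, "S6": 3},
}
er_status = {"S1": "negative", "S2": "negative", "S3": "negative",
             "S4": "positive", "S5": "positive", "S6": "positive"}

all_samples = sorted(er_status)
er_positive = [s for s in all_samples if er_status[s] == "positive"]

all_r = correlate_with_all(expr, "ESR1", all_samples)
pos_r = correlate_with_all(expr, "ESR1", er_positive)
print(all_r)
print(pos_r)
```

In this toy data, `GENE_B` is negatively correlated with ESR1 across all samples but positively correlated within the filtered subset, which is why it can be worth re-examining a correlation after filtering.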

REFERENCE

Li, Jie, et al. "Identification of high-quality cancer prognostic markers and metastasis network modules." Nature Communications 1 (2010): 34.



Searching for the EGFR vIII variant in OncoLand

Matt Newman

OncoLand data is based on a known gene model (UCSC), so the default visualizations can sometimes miss certain "novel" transcripts. By storing exon junction-level coverage, we can search for "novel" junctions. In this example, we look at the EGFR vIII variant, which is not part of the UCSC gene model but can be discovered here by using the variety of tools in OncoLand.
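
The underlying idea, comparing observed junctions against those in the gene model, can be sketched in a few lines of Python. This is not how OncoLand stores or queries junctions. The exon-number representation, the read counts, and the `min_reads` threshold are assumptions for illustration. EGFR vIII is commonly described as a deletion of exons 2 to 7, so it would appear as a junction joining exon 1 directly to exon 8.

```python
# Toy gene model: only junctions between consecutive exons are "known".
# Exons are referred to by number for simplicity (an assumption; real
# junctions are usually defined by genomic coordinates).
known_junctions = {(i, i + 1) for i in range(1, 8)}

# Hypothetical junction read counts observed in one sample.
observed_counts = {
    (1, 2): 40,
    (2, 3): 38,
    (1, 8): 25,  # exon 1 joined to exon 8, absent from the toy gene model
    (3, 5): 1,   # low-support junction, likely noise
}


def find_novel_junctions(observed, known, min_reads=5):
    """Return (junction, reads) pairs that are absent from the gene model
    and supported by at least `min_reads` reads."""
    return sorted(
        (junction, reads)
        for junction, reads in observed.items()
        if junction not in known and reads >= min_reads
    )


novel = find_novel_junctions(observed_counts, known_junctions)
print(novel)
```

The read-support threshold is a design choice: without it, sporadic low-coverage junctions such as `(3, 5)` above would also be reported.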



Detection of high frequency mutations in tumor suppressors

Matt Newman

In this video, we use OncoLand to find a list of the highest frequency mutations per tumor type across TCGA, limited to a list of known tumor suppressors. This technique can easily be extended to any gene, and the categorization can be across all tumor types, histologies, or any other classification available in the system or that you create. It also applies to all datasets, not just TCGA, so it could easily be applied to ICGC, CCLE, etc.
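
For readers who want to see the logic of such a summary spelled out, here is a minimal Python sketch. It is an assumption-laden illustration, not the procedure shown in the video: the tumor type labels, sample IDs, and mutation calls are made up, and frequency is defined here as the fraction of samples in a tumor type carrying at least one mutation in the gene.

```python
from collections import defaultdict


def top_mutated_genes(calls, samples_per_type, gene_list, top_n=1):
    """For each tumor type, return the `top_n` genes from `gene_list`
    ranked by the fraction of samples carrying a mutation in the gene."""
    mutated = defaultdict(set)  # (tumor_type, gene) -> set of mutated samples
    for tumor_type, sample, gene in calls:
        if gene in gene_list:
            mutated[(tumor_type, gene)].add(sample)

    by_type = defaultdict(list)
    for (tumor_type, gene), samples in mutated.items():
        freq = len(samples) / samples_per_type[tumor_type]
        by_type[tumor_type].append((gene, freq))

    # Sort by descending frequency, then gene name for stable ties.
    return {
        tumor_type: sorted(genes, key=lambda gf: (-gf[1], gf[0]))[:top_n]
        for tumor_type, genes in by_type.items()
    }


# Hypothetical mutation calls: (tumor_type, sample_id, gene).
calls = [
    ("TumorA", "a1", "TP53"), ("TumorA", "a2", "TP53"),
    ("TumorA", "a2", "TP53"),  # second TP53 mutation in the same sample
    ("TumorA", "a3", "KRAS"), ("TumorA", "a1", "PTEN"),
    ("TumorB", "b1", "PTEN"), ("TumorB", "b2", "PTEN"),
    ("TumorB", "b2", "RB1"), ("TumorB", "b1", "KRAS"),
]
samples_per_type = {"TumorA": 4, "TumorB": 2}
tumor_suppressors = {"TP53", "PTEN", "RB1"}

result = top_mutated_genes(calls, samples_per_type, tumor_suppressors)
print(result)
```

Swapping `tumor_suppressors` for any other gene list, or replacing the tumor type label with a histology or custom grouping, gives the same summary over a different classification.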