PRICING & INQUIRIES

For pricing and inquiries, send an email to sales@omicsoft.com.

5001 Weston Parkway, Suite 201
Cary, NC 27513
US

888-259-6642

Overview

Omicsoft is the leading provider of Next Generation Sequencing (NGS), Cancer Genomics, Immunology, and Bioinformatics solutions for NGS Data and Gene Expression Analysis.

Exciting Updates and Latest News

Keeping you up to date with the latest in NGS, bioinformatics analysis, and cancer genomics, with blog posts on Array Suite, OncoLand (TCGA and more), ImmunoLand, and more.

[Array Studio Video Tutorial] DNA-Seq Analysis: Sequence Variation, Copy Number Variation And More

Vivian Zhang

With mapped DNA-Seq data, Array Studio allows users to identify, visualize and annotate sequence mutations and copy number variations. In this article, we walk you through a few important DNA-Seq analysis modules.

1. Identify DNA Sequence Variation and Generate and Annotate VCF Variant Data

Users can run the Summarize Variant Data module to identify SNPs, insertions and deletions; the module also runs automatically as part of the DNA-Seq pipeline. The output Variant Report can be annotated with the Mutation Annotator databases.

This module returns a report table showing, for each annotated mutation, the gene name, chromosome, position, reference allele, mutation allele, Annotation type (intron, non-synonymous, 5’ UTR, synonymous, 3’ UTR, etc.), AAPosition (amino acid position of the change), AAChange (amino acid change, if there is one), transcript ID, transcript name, transcript strand, distance to the 3’ end, and distance to the 5’ end.

Variant Call Format (VCF) is the most common format for reporting sequence variation. Array Studio variant detection can output merged or individual VCF files, and can organize and annotate these data for efficient filtering in Array Studio. Array Studio provides a large number of classifiers and annotators to improve the identification of interesting variants. Examples include: 1000Genomes, ClinVar, GERP++, dbNSFP (database for nonsynonymous SNPs' functional predictions), GRASP (Genome-Wide Repository of Associations between SNPs and Phenotypes), GWAVA, HaploReg and RegulomeDB. For more on these classifiers and public database-based annotators, please check out our wiki page on Annotate Variant Files.
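As a minimal illustration of the SNP / insertion / deletion distinction described above, the Python sketch below reads the REF and ALT columns of one tab-delimited VCF data line and classifies each ALT allele by comparing allele lengths. The helper names (`parse_vcf_line`, `classify`) and the example record are hypothetical, and this is not Array Studio's variant caller or annotator; it assumes simple base-level alleles and does not handle symbolic alleles such as `<DEL>`.

```python
# Illustrative sketch only: helper names and the example record are hypothetical.
from dataclasses import dataclass


@dataclass
class Variant:
    chrom: str
    pos: int
    ref: str
    alt: str


def parse_vcf_line(line: str) -> list:
    """Split one VCF data line (not a '#' header) into one Variant per ALT allele."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, _id, ref, alts = fields[:5]
    return [Variant(chrom, int(pos), ref, alt) for alt in alts.split(",")]


def classify(v: Variant) -> str:
    """Classify a simple REF/ALT pair by allele length (symbolic alleles not handled)."""
    if len(v.ref) == 1 and len(v.alt) == 1:
        return "SNP"
    if len(v.ref) < len(v.alt):
        return "insertion"
    if len(v.ref) > len(v.alt):
        return "deletion"
    return "MNP"  # equal-length, multi-base substitution


# Hypothetical record with two ALT alleles at the same position
record = "chr7\t55249071\t.\tC\tT,CTT\t50\tPASS\t."
for variant in parse_vcf_line(record):
    print(variant.chrom, variant.pos, variant.ref, variant.alt, classify(variant))
```

Real annotation (transcript consequence, amino acid change) needs a gene model on top of this; the sketch only shows where the basic variant type comes from.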

2. Identify Somatic and Germline Mutations in Matched-Pair Data

If you have matched-pair tumor and normal DNA-Seq data from the same subject, you can run the VarScan 2 matched-pair analysis. It compares the two samples (for example, tumor and normal) for differences in genotype, generates a genotype call for each sample, and flags each mutation as germline or somatic.

For each variant, the result reports "Normal coverage/frequency", "Tumor coverage/frequency", "Somatic/variant p-value", "Call", "NormalGenotype", "TumorGenotype" and "FilteringStatus", allowing the user to filter the result and identify somatic mutations of interest.
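To make the germline/somatic distinction concrete, here is a simplified Python sketch. It applies a two-sided Fisher's exact test to the normal and tumor read counts (VarScan 2 reports a Fisher's-exact-based somatic p-value, but this is not its implementation), then uses a hypothetical variant allele frequency cutoff (`min_vaf`) and significance level (`alpha`) to label a site. Real callers also model loss of heterozygosity, strand bias and other filters that are omitted here, and the function names are illustrative.

```python
# Simplified sketch; thresholds and function names are hypothetical.
from math import comb


def fisher_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: normal, tumor. Columns: reference reads, variant reads.
    """
    row1, col1, n = a + b, a + c, a + b + c + d

    def pmf(k: int) -> float:
        # Hypergeometric probability that the top-left cell equals k, given the margins
        return comb(col1, k) * comb(n - col1, row1 - k) / comb(n, row1)

    observed = pmf(a)
    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    # Sum all tables that are no more probable than the observed one
    return min(1.0, sum(p for p in map(pmf, range(lo, hi + 1)) if p <= observed * (1 + 1e-7)))


def call_site(normal_ref, normal_alt, tumor_ref, tumor_alt, min_vaf=0.2, alpha=0.05):
    """Label a site Germline, Somatic, Unknown or Reference (hypothetical rules)."""
    normal_vaf = normal_alt / max(1, normal_ref + normal_alt)
    tumor_vaf = tumor_alt / max(1, tumor_ref + tumor_alt)
    if normal_vaf >= min_vaf and tumor_vaf >= min_vaf:
        return "Germline"  # variant present in both samples
    if tumor_vaf < min_vaf:
        return "Reference"  # LOH (variant only in normal) is not modeled here
    p = fisher_two_sided(normal_ref, normal_alt, tumor_ref, tumor_alt)
    return "Somatic" if p < alpha else "Unknown"


print(call_site(30, 0, 15, 15))   # Somatic
print(call_site(15, 15, 14, 16))  # Germline
```

A site seen only in the tumor is called somatic only when the count difference is significant; otherwise it stays "Unknown", which mirrors why the report exposes both the call and the p-value for filtering.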

3. Summarize NGS Coverage Data to Detect Copy Number Variants and Visualize NGS Copy Number Variations

DNA-Seq whole-genome and whole-exome data can be processed in Array Studio to detect amplification and deletion events. By comparing the relative signal between samples from the same subject, regions with unusually high or low signal in the disease sample are flagged as potential Copy Number Variation (CNV) events. The results can be visualized in several ways, including a scatter plot, a segment chromosome view, or the genome browser.

The Summary Copy Number report provides "Observation", "Log2Ratio", "Copy Number", "Normal Coverage", "Tumor Coverage", and segment information.
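The Log2Ratio and Copy Number columns are related by simple arithmetic. The Python sketch below assumes a library-size-normalized tumor/normal coverage ratio with a small pseudocount, a diploid normal, and no tumor-purity adjustment, and it flags gains and losses with a hypothetical ±0.3 cutoff. Array Studio's own normalization, segmentation and thresholds are not described in this post and may differ; the function names are illustrative.

```python
# Sketch under stated assumptions; cutoff and names are hypothetical.
import math


def log2_ratio(tumor_cov, normal_cov, tumor_total, normal_total, pseudo=0.5):
    """Library-size-normalized log2(tumor / normal) coverage for one region."""
    tumor_norm = (tumor_cov + pseudo) / tumor_total
    normal_norm = (normal_cov + pseudo) / normal_total
    return math.log2(tumor_norm / normal_norm)


def copy_number(lr, ploidy=2):
    """Convert a log2 ratio to an estimated copy number (diploid normal, pure tumor)."""
    return ploidy * 2 ** lr


def flag_event(lr, cutoff=0.3):
    """Hypothetical cutoff: call a gain or loss when |log2 ratio| reaches it."""
    if lr >= cutoff:
        return "Amplification"
    if lr <= -cutoff:
        return "Deletion"
    return "Neutral"


# Region with twice the normalized coverage in the tumor: log2 ratio ~1, copy number ~4
lr = log2_ratio(400, 200, 1_000_000, 1_000_000, pseudo=0)
print(round(lr, 3), round(copy_number(lr), 2), flag_event(lr))
```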

Genome browser view of coverage data. The highlighted region clearly shows increased coverage, while coverage in the adjacent regions is comparable.

For more details and additional DNA-Seq analysis functions, please check out our video tutorial Getting Started with DNAseq Analysis, or search our wiki for your specific topic of interest.

[Event] Learn|Network|Impact 2017 OmicSoft User Group Meeting

Vivian Zhang

OmicSoft, now a QIAGEN company, would like to invite you to our annual OmicSoft User Group Meeting, to be held in Cambridge, MA, on September 19-20, 2017.

FREE registration and attendance, for a limited time only. For registration and more details, please go directly to our UGM page.

In the past ten years, OmicSoft has helped numerous users from major pharma and biotech companies (as well as research institutions) accelerate their bioinformatics and genomics research (who are our customers?). Last year, OmicSoft held its first User Group Meeting, attended by more than 100 bioinformatics, genomics and genetics experts and scientists from more than 30 leading pharmaceutical and biotech companies.

Last year, our action-packed one-day meeting provided an open platform for our users and industry peers to learn, to network, and to impact the development of OmicSoft products. Click here for the 2016 OmicSoft UGM agenda. This year OmicSoft has reached several milestones and technology breakthroughs, including our acquisition by QIAGEN, the Array Suite 10.0 release, Cloud-Based Lands, Single Cell RNA-Seq support, upcoming integration with QIAGEN's bioinformatics products, Web-based solutions and more. We are expanding the 2017 OmicSoft User Group Meeting into a two-day event with:

  • More product training - Get the most out of Omicsoft products and QIAGEN's bioinformatics products
  • More user talks and networking opportunities - Learn from others' experiences and industry best practices, and expand your professional network
  • More One-On-One meetings - Get problems solved, questions answered and personalized training from our experienced staff

Learn, network, impact. Come join us, along with leading pharma and biotech companies and research institutions.

  • Learn to Use OmicSoft Products More Efficiently 
  • Impact Future Product Development
  • Network with Peers and Industry Experts
  • Get One-On-One Help from Experts
  • Explore more QIAGEN Bioinformatics products

Please contact us for potential presentation and collaboration opportunities. 

[Ten Year Anniversary Release] Array Suite 10.0: Accelerating Bioinformatics Research For Ten Years

Vivian Zhang

OmicSoft, now a QIAGEN company, is excited to announce Array Suite 10.0, the ten-year anniversary release of its flagship software product. Array Suite provides the backbone of OmicSoft's software and data service offerings, including OncoLand, DiseaseLand and GeneticsLand. In the past ten years, Array Suite has helped numerous users from major pharma and biotech companies (as well as research institutions) accelerate their bioinformatics and genomics research.

Founded in 2007, OmicSoft had a vision to focus on biomarker data management, visualization, and analysis. Array Suite (Array Studio and Array Server) differs from standard desktop or open-source solutions: Array Studio provides the graphical user interface for NGS and OMIC analysis and visualization, and Array Server provides the enterprise back-end solution for pipelines, project management, sample/file management, data storage and the OMIC data warehouse (Land database). In January 2017, QIAGEN enhanced its portfolio with the acquisition of OmicSoft, allowing us to imagine new possibilities for integration with the larger QIAGEN bioinformatics portfolio. We will update everyone on these enhancements, and how they will benefit our users, in the near future.

"Although much has changed in the past ten years, in both software and the company itself, I'm proud that OmicSoft Corporation has remained unchanged in its fundamental desire to implement useful tools, driven by our customers' needs, in the -OMICS space. I am confident that this will continue into the future with our acquisition by QIAGEN, and I look forward to many more years of Array Studio helping to drive exciting breakthroughs and research by our customers." - Matt Newman, VP Business Development

OmicSoft is extremely proud of its customer-centric product development and customer support, and we look to continue this into the future, as we have for the past 10 years. With our latest update, this trend continues. Array Suite 10.0 includes revolutionary updates, with multiple technology breakthroughs including Cloud-Based Lands, Single Cell RNA-Seq support, ENCODE integration and many other updates to both analytics and framework.

Here is a list of some of our exciting updates:

1. Cloud-Based Lands  
2. Single Cell RNA-Seq support  
3. ENCODE integration in Omicsoft genome browser  
4. New gene set analysis
5. Streaming large tables  
6. Smart labeling in multi-charts
7. Smart caching for cloud/HTTP bam sources
8. New analytic modules including variable selection and prediction  
9. Significant improvements on plasmid-host integration  
10. Various genome browser improvements

For more details, please join our webinar, Omicsoft Array Suite 10.0 Release, on May 3rd, 2017. Jack Liu from Omicsoft will present the top ten new features and more. Register here.

[Array Studio Video Tutorial] DNA-Seq Analysis Basics: Getting Started With DNA-Seq Pipeline Analysis And Data QC

Vivian Zhang

Omicsoft's Next Generation Sequencing (NGS) analysis provides bioinformatics tools for the entire process, from QC to alignment to post-alignment summarization and analysis. Array Studio provides a suite of tools to quickly, easily, and reliably process DNA-seq data. In this article, we introduce our tutorial on the DNA-Seq analysis pipeline and data QC. We will discuss downstream analysis functions in upcoming posts.

Getting Started with DNA-seq pipeline functions

1 Running the DNA-seq pipeline

In Array Studio, users can either execute each step of the analysis one by one or use the DNA-seq pipeline function. Our video tutorial walks you through the functions automatically executed by the standard DNA-seq pipeline, starting with raw reads in .fastq format.

DNA-Seq Pipeline. Users can choose which analysis steps to perform, such as raw data QC, post-alignment data QC, and summarizing mutations and SNPs.

2 Raw Data QC

If you choose to run the analysis step by step, you must perform quality control (QC) on the raw data before aligning your DNA-seq data, to spot common problems like adapter or barcode sequence contamination, degraded quality at the ends of reads, or problematic samples. The Array Studio Raw Data QC Wizard reports a number of useful measures of raw NGS quality.

Additional information about how to interpret these metrics can be found in the RNA-seq Raw Data QC Analysis video.
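The "degraded quality at the ends of reads" check can be illustrated with a per-position mean quality profile. The Python sketch below is a minimal example, not the Raw Data QC Wizard itself; it assumes uncompressed four-line FASTQ records with Phred+33 quality encoding (the encoding used by current Illumina pipelines), and the function names are hypothetical.

```python
# Minimal sketch: hypothetical helpers, Phred+33 encoding assumed.
import io


def read_fastq(handle):
    """Yield (name, sequence, quality) from a FASTQ stream with 4 lines per record."""
    while True:
        header = handle.readline()
        if not header:
            return
        seq = handle.readline().rstrip()
        handle.readline()  # '+' separator line
        qual = handle.readline().rstrip()
        yield header[1:].rstrip(), seq, qual


def mean_quality_per_position(handle, offset=33):
    """Average Phred score at each read position (reads may differ in length)."""
    totals, counts = [], []
    for _name, _seq, qual in read_fastq(handle):
        for i, ch in enumerate(qual):
            if i == len(totals):
                totals.append(0)
                counts.append(0)
            totals[i] += ord(ch) - offset  # Phred+33: 'I' -> 40, '#' -> 2
            counts[i] += 1
    return [t / c for t, c in zip(totals, counts)]


fastq = "@read1\nACGT\n+\nIIII\n@read2\nACG\n+\nII#\n"
print(mean_quality_per_position(io.StringIO(fastq)))  # [40.0, 40.0, 21.0, 40.0]
```

A profile that falls off toward the last positions is the typical signature of degraded read ends, which is what motivates trimming before alignment.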

3 Map DNA-seq Reads to Genome

Map (DNA-Seq) Reads To Genome is a part of the DNA-Seq pipeline. Users can also align reads independently. In the Advanced tab, the user can set a number of options, including read trimming, adapter stripping and more.
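Adapter stripping, one of the Advanced options mentioned above, removes adapter sequence from the 3' end of reads. The Python sketch below shows the basic idea with exact matching only: it cuts at a full adapter match, or removes a partial adapter prefix at the very end of the read. It illustrates the concept under those assumptions and is not the aligner's trimming algorithm (real trimmers also tolerate mismatches); the function name and `min_overlap` parameter are hypothetical, and `AGATCGGAAGAGC` is used only as an example of a common Illumina adapter prefix.

```python
# Exact-match adapter trimming sketch; names and parameters are hypothetical.
def trim_adapter(seq: str, adapter: str, min_overlap: int = 3) -> str:
    """Remove a 3' adapter using exact matching (no mismatches allowed)."""
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    # Case 1: the full adapter occurs in the read; drop it and everything after it
    idx = seq.find(adapter)
    if idx != -1:
        return seq[:idx]
    # Case 2: the read ends with the first k bases of the adapter (longest k first)
    for k in range(min(len(adapter) - 1, len(seq)), min_overlap - 1, -1):
        if seq.endswith(adapter[:k]):
            return seq[:-k]
    return seq


ADAPTER = "AGATCGGAAGAGC"
print(trim_adapter("ACGTACGTAGATCGGAAGAGCTTT", ADAPTER))  # ACGTACGT
print(trim_adapter("ACGTACGTACGTAGATC", ADAPTER))  # ACGTACGTACGT
```

The `min_overlap` floor exists because very short suffixes (one or two bases) match an adapter prefix by chance far too often to be trimmed safely.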

4 Aligned Data QC

Array Studio automatically generates an Alignment Report after aligning reads to the genome or transcriptome. Additional alignment statistics can be generated by running the Aligned Data QC module.

Additional aligned data QC metrics include: 

1 Alignment Metrics
2 Flag Metrics
3 Profile Metrics
4 Insert Size Metrics
5 Duplication Metrics
6 Strand Metrics
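Two of the metrics listed above can be sketched in a few lines of Python. The example assumes reads have already been reduced to (chromosome, 5' position, strand) keys for duplication, and to SAM TLEN values for insert size. It treats reads as single-end when detecting duplicates, which is a simplification (paired-end tools usually consider both mates), and it is not the Aligned Data QC module's implementation; the function names are hypothetical.

```python
# Simplified QC metric sketch; function names are hypothetical.
from collections import Counter
from statistics import median


def duplication_rate(read_keys):
    """Fraction of reads sharing (chrom, 5' position, strand) with another read."""
    counts = Counter(read_keys)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return (total - len(counts)) / total  # every copy beyond the first is a duplicate


def median_insert_size(template_lengths):
    """Median absolute SAM TLEN, skipping 0 (single-segment or mates on different references)."""
    sizes = [abs(t) for t in template_lengths if t != 0]
    return median(sizes) if sizes else None


reads = [("chr1", 100, "+"), ("chr1", 100, "+"), ("chr1", 100, "-"), ("chr2", 50, "+")]
print(duplication_rate(reads))  # 0.25
print(median_insert_size([300, -300, 250, 0]))  # 300
```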

Please check out our video tutorial Getting Started with DNA-seq pipeline functions for a more detailed walkthrough.

[Land Tutorial] Getting Started With DiseaseLand (ImmunoLand And CVMLand) II

Vivian Zhang

Last time, we introduced some of the basic views in DiseaseLand. In this article, we introduce some advanced functionality: SampleSets and GeneSets.

A SampleSet is a powerful tool that allows users to create custom sample groupings based on data in the Land or on imported tables. This video tutorial demonstrates several ways to build a SampleSet from data using selections and filters, and then uses SampleSets in Land Analytics to scan the entire DiseaseLand for differential splicing.

Grouping lesional vs non-lesional Psoriasis samples in ImmunoLand (DiseaseLand). 

Serpinb7 Transcript Expression grouped by lesional vs. non-lesional samples. 

With the newly created SampleSet, users can visualize isoform differential expression at the transcript level in the Genome Browser:

Differential expression of Serpinb7 in lesional vs. non-lesional samples.

3. GeneSets

Besides the SampleSet tool, GeneSets are a powerful tool for grouping and comparing genes, including members of a gene family, a pathway, or co-regulated genes. These GeneSets can be used to discover DiseaseLand studies that share "genetic signatures" of common up- or down-regulated genes with your GeneSet.

After adding GeneSet Some_IBD_Genes, the GeneSet becomes available for search.

With the GeneSet created, users can visualize and perform analyses on the set of genes.

Heatmap of Comparison for GeneSet Some_IBD_Genes.

Stay tuned for a DiseaseLand comparison views tutorial!

[Land Tutorial] DiseaseLand (ImmunoLand And CVMLand) Comparison Views

Vivian Zhang

DiseaseLand features Comparison Views, allowing users to easily search and visualize statistical contrasts between groups of samples using common queries: Treated vs Control, Disease vs Normal, Responder vs Non-Responder, etc. By searching for a gene, the user can visualize its association with comparisons across thousands of projects, and interactively narrow down to find interesting projects. (Additional reading: ComparisonLand.) In this article, we show you how to use Comparison Views.

Video Tutorial: Comparison

Comparison Distribution by comparison types:

Comparison distribution by comparison type. Statistics are from a previous Land version; the actual distribution and number of comparisons are updated quarterly.

Search gene and view comparisons:

In DiseaseLand, you can search for a gene and view its expression in all samples or a single project, or you can visualize which comparisons detected up- or down-regulation of the gene. This way, you can identify projects of interest, and discover trends in your favorite gene's regulation.

Comparison details for Serpinb7 by treatment vs. control. Selecting the comparisons (dots) of interest displays detailed information.

Comparison Details Views:

Omicsoft uses manually curated metadata to generate statistical tests (called comparisons) for each project/study included in DiseaseLand, generally following the comparisons in the original paper. The Comparison collection is useful for finding common differential expression patterns/signatures between studies, such as between a microarray study and an NGS study, or for finding links between a gene knockout experiment and a compound treatment study.

When searching for project(s), several views are available. Here are a few example views:

Example Volcano Plot of project GSE38713.
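A volcano plot like the one above places each gene by effect size and significance. The Python sketch below shows one common way to compute those coordinates, assuming each comparison provides a log2 fold change and a p-value per gene; the cutoffs (`fc_cut`, `p_cut`), the function name and the gene names are hypothetical, and the Land's own thresholds may differ.

```python
# Common volcano-plot convention; cutoffs and names are hypothetical.
import math


def volcano_points(results, fc_cut=1.0, p_cut=0.05):
    """Map gene -> (log2 fold change, p-value) to gene -> (x, y, label)."""
    points = {}
    for gene, (log2_fc, p_value) in results.items():
        y = -math.log10(max(p_value, 1e-300))  # guard against p = 0
        if p_value < p_cut and log2_fc >= fc_cut:
            label = "up"
        elif p_value < p_cut and log2_fc <= -fc_cut:
            label = "down"
        else:
            label = "not significant"
        points[gene] = (log2_fc, y, label)
    return points


comparison = {"GENE_A": (2.0, 0.001), "GENE_B": (-1.5, 0.01), "GENE_C": (0.2, 0.5)}
for gene, (x, y, label) in volcano_points(comparison).items():
    print(gene, x, round(y, 2), label)
```

Using -log10(p) on the y axis puts the most significant genes at the top, so the two "arms" of the volcano hold the strongly up- and down-regulated genes.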

Example Venn Diagram of project GSE14905.

Example Significant Genes of project GSE58121, GSE63980 and GSE63980.

[Land Tutorial] Getting Started With DiseaseLand (ImmunoLand And CVMLand)

Vivian Zhang

Starting with the Q2 2016 release, Omicsoft replaced ImmunoLand and CVMLand with DiseaseLand in its Land user interface. DiseaseLand focuses on datasets of common genetic diseases, including but not limited to immunological diseases, neurological disorders, metabolic diseases and cardiovascular diseases.

Although relatively new, DiseaseLand has been gaining popularity among our prestigious client companies. For a quick review of DiseaseLand content, please check out our wiki article Introduction to DiseaseLand Content. Today, we would like to introduce our ImmunoLand video tutorials to help you quickly get started with DiseaseLand.

DiseaseLand is broadly similar to OncoLand. If you are already an OncoLand user, you will find that DiseaseLand has most of the advanced analytics functions covered in OncoLand training.

Besides sample, gene and variable views, DiseaseLand features comparison views, which allow users to easily search and visualize data using common queries: Treated vs Control, Disease vs Normal, Responder vs Non-Responder, etc. Comparison statistics are also available when taking a first look at DiseaseLand:

Comparison Distribution View. Comparison Details and Project Details are available once users select a group of samples. For example, Comparison Details include the comparison test method and comparison category, as highlighted in the figure.

At present, DiseaseLand data is primarily focused on gene expression from microarrays and NGS studies. The user can search for a gene of interest and narrow down to find interesting projects interactively. The default view for DiseaseLand is Disease vs Normal Comparison. By selecting the comparison of interest, sample details of the comparison will be displayed in the Details Window. More details on gene, probe and project level are also available. 

Experimental designs vary considerably between projects within DiseaseLand, and batch effects in microarray projects are difficult to remove. Omicsoft therefore created project-specific views that display expression values based on the experimental design within each project. The Expression Intensity Project View provides log2 expression intensity values. Users can easily filter for their project of interest, or apply other filters such as disease, clinical details, or even project contributors.

Expression Intensity Project View of Project GSE14580.

To maximize inter-study comparability of RNA-seq data in DiseaseLand, Omicsoft processes data from each study, starting from fastq files, through a common pipeline. Expression values from RNA-Seq studies are expressed as FPKM values, with upper quartile normalization. DiseaseLand offers project-specific views as well as a merged view of all samples. Views display log-transformed FPKM values. Samples and projects can be filtered interactively to allow exploration of the data.

Gene FPKM View of all psoriasis samples in DiseaseLand HumanDisease. Samples can be categorized by project name (Change Symbol Properties), and clicking a project name selects that project's samples and displays their details in the Details Window.
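The FPKM definition and an upper quartile scaling step can be written out as follows. FPKM itself is a standard formula; the normalization shown, which rescales each sample so that the 75th percentile of its non-zero values equals a fixed target, is one common variant and is an assumption here, as are the `target` value and the helper names. Omicsoft's exact pipeline settings are not described in this post.

```python
# FPKM is standard; the upper-quartile target and helper names are assumptions.
from statistics import quantiles


def fpkm(fragments, length_bp, total_mapped):
    """Fragments Per Kilobase of transcript per Million mapped fragments."""
    return fragments * 1e9 / (length_bp * total_mapped)


def upper_quartile(values):
    """75th percentile of the non-zero values (needs at least two non-zero values)."""
    nonzero = [v for v in values if v > 0]
    return quantiles(nonzero, n=4)[2]


def upper_quartile_normalize(values, target=1000.0):
    """Rescale one sample so its upper quartile equals `target` (hypothetical value)."""
    scale = target / upper_quartile(values)
    return [v * scale for v in values]


# 1,000 fragments on a 2 kb transcript in a library of 10 million mapped fragments
print(fpkm(1_000, 2_000, 10_000_000))  # 50.0
sample = [0.0, 1.0, 2.0, 3.0, 4.0, 10.0]
print(upper_quartile_normalize(sample))
```

Scaling by the upper quartile rather than the total is less sensitive to a handful of extremely highly expressed genes, which is why it is a popular choice for cross-study views.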

Stay tuned for more DiseaseLand tutorials!

[News Release] Announcing the Acquisition of Omicsoft Corporation by QIAGEN

Vivian Zhang

Dear Customer: 
 
We are excited to announce that as of January 9, 2017, QIAGEN has acquired OmicSoft, bringing OmicSoft into the larger QIAGEN Bioinformatics organization. 
 
QIAGEN is committed to providing you with the best Sample to Insight solutions in the world. The addition of the OmicSoft portfolio to QIAGEN will enable us to offer you the most comprehensive, integrated commercial bioinformatics solution. Through the efforts of our expanded team, we look forward to helping you gain even more valuable insights. 

Read the full press release here
 
Your channels for ordering and support of OmicSoft products will remain the same for now. This includes OmicSoft’s phone numbers, e-mail addresses and website. You can continue to place your OmicSoft orders and make related product inquiries as you have done before. Please also note that any other QIAGEN-related product questions should continue to be directed to your current QIAGEN contact.  
 
You are a valued customer, and we will notify you in advance of any changes to the ordering or support processes as we work to enable distribution of OmicSoft products through QIAGEN channels. 
 
If you have any questions, please do not hesitate to contact your QIAGEN and OmicSoft sales representatives. 
 
With best regards, 
 
 

[New Feature] Manage Land Sample Clinical Data

Vivian Zhang

Omicsoft has been working diligently over the past few months both to strengthen our ability to incorporate clinical data and to grow our list of curated clinical measurements from public datasets. Currently, OncoLand and DiseaseLand contain more than 1000 different clinical measurement variables in total, including sample demographics, survival data, symptoms, treatments and more. Moreover, users often have their own internal clinical data they wish to add to the system. If you have not started leveraging the power of our clinical data subsystem, please take a look at OncoLand Case Study - Clinical Variables, a quick 10-minute video tutorial on how to use clinical data to identify novel associations.

To help users better manage Land clinical data, we recently implemented the Manage Sample Clinical Data function in Land. This function can be accessed through:

This function allows users to add clinical data, manage clinical variable metadata, remove samples and remove clinical variables:

Add Clinical Data

Adding clinical data is straightforward. In addition, "Metadata" for clinical data columns can be controlled by adding a second table. For example, clinical data column grouping can be controlled by a table whose first column contains Clinical Data column names and whose second column contains the category:

Add Clinical Variable Metadata
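The two-column metadata table described above maps each clinical data column to a category. The Python sketch below shows how such a table could be used to group columns; it assumes a tab-delimited file with a header row, and the function name, column names, categories and the "Uncategorized" fallback are hypothetical rather than part of the Land's documented format.

```python
# Hypothetical sketch of applying a (column name, category) metadata table.
import csv
import io
from collections import defaultdict


def group_clinical_columns(metadata_tsv, clinical_columns):
    """Group clinical columns by category using a two-column (name, category) table."""
    groups = defaultdict(list)
    reader = csv.reader(io.StringIO(metadata_tsv), delimiter="\t")
    next(reader, None)  # skip the (assumed) header row
    assigned = set()
    for row in reader:
        if len(row) < 2:
            continue  # ignore blank or malformed lines
        column, category = row[0], row[1]
        if column in clinical_columns and column not in assigned:
            groups[category].append(column)
            assigned.add(column)
    # Columns without a metadata row fall into a catch-all group
    for column in clinical_columns:
        if column not in assigned:
            groups["Uncategorized"].append(column)
    return dict(groups)


metadata = "ColumnName\tCategory\nAge\tDemographics\nGender\tDemographics\nOverallSurvival\tSurvival\n"
print(group_clinical_columns(metadata, ["Age", "Gender", "OverallSurvival", "Smoking"]))
```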

The function is easy to use, allowing users to manage their clinical data efficiently and effectively. For more details on the function, please refer to our wiki page.

Stay tuned for additional functionality coming at the end of this year, including support for CDISC formatted files, to include time-series measurement data.

[Land Update] Omicsoft Quarterly Land Update Summary

Vivian Zhang

Omicsoft is excited to announce its latest Land updates, including OncoLand and DiseaseLand.

Highlights include:

OncoLand:

  • Official release of the B38 Human Lands, including TCGA, CCLE, GTEx, Blueprint, and Sanger
  • Additional samples in the TARGET and Blueprint Lands
  • 5700 new Somatic Mutation samples in the TumorMutation Land
  • 350 new samples and 80 new comparisons in the OncoGeo Land
  • 1750 new expression samples in the ClinicalOutcome Land
  • Updated clinical data and CNVCall data in TCGA
  • New Comparison data (Tumor vs Normal) for 24 tumor types in TCGA

DiseaseLand:

  • 3916 single cell samples from 5 projects to the Single Cell Human Land, including seven new cell types
  • 5466 single cell samples from 10 projects to the Single Cell Human Land, including seven new mouse cell types
  • 840 new RNA-Seq samples in Human DiseaseLand, along with additional comparisons
  • 2402 new RNA-Seq samples in Mouse DiseaseLand, along with additional comparisons
  • DiseaseLand now includes over 67,000 human samples, with 3,239 comparisons from 1000+ projects, and almost 21,000 mouse samples, with 2,248 comparisons from over 650 projects.


Integration with the recently introduced Gene Set Analysis module adds extra value to this release, as users can now query against all of the new Land data as well.

Matt Newman, VP of Business Development, will give a 45-minute overview of all the new datasets and visualizations included in this latest release on December 12th at 11:00 am EST. Please register here. We will contact users about this release update shortly after our webinar. Please stay tuned.

[Important Land Update] Land Filter Now Carries Over Across Multiple Searches

Vivian Zhang

Omicsoft's Lands are known for being comprehensive, powerful and integrated, allowing users to navigate across samples, genes, data types, datasets and platforms. With growing numbers of samples, datasets and data types, however, such a comprehensive and flexible system may appear complex to some users. To help users apply filters more easily and efficiently, Omicsoft recently improved the filtering logic in the Lands.

Previously, filters applied in one search did not carry over from the main Land tab, requiring users to reapply filters for every new search.

For example, if the user wants to compare gene expression FPKM for EGFR in KIRC (Kidney Renal Clear Cell Carcinoma) and KIRP (Kidney Renal Papillary Cell Carcinoma), the first step might be to filter the tumor types in the TCGA_B37 main tab to see sample numbers and understand the overall distribution of samples (Step 1). Next, the user can search for EGFR and go to the Gene FPKM view (Step 2). If the user then wanted to see the gene expression of TP53, the Land previously did not carry over the filter, and the user had to redo it (Step 3).

Step 1: Filter for Tumor Type KIRC and KIRP, and check sample statistics (data availability in this example search):

Step 2: Search for Gene EGFR and open the Gene FPKM view:

Step 3: Search for Gene TP53. The sample filter does not carry over, and the user has to redo the filter:

With the new version, the filter carries over without any additional steps:

Imagine you have already applied many filter steps and navigated to a group of samples or genes that looks intriguing: having all of those filters applied directly to the new search makes the next step much easier and saves considerable time.

This filter logic applies to all left-hand side filter tabs including Sample, Comparison and all data type filter tabs. 

If the user does not want to apply the filters, simply click the Clear All Filters button to reset everything:
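The carry-over behavior described above amounts to keeping filters in session state rather than attaching them to a single search, with Clear All Filters resetting that state. The Python sketch below illustrates the idea with hypothetical class and method names and made-up sample records; it is not the Land's internal design.

```python
# Conceptual sketch; class, methods and sample records are hypothetical.
from dataclasses import dataclass, field


@dataclass
class LandSession:
    samples: list  # each sample is a dict of annotations, e.g. {"id": ..., "TumorType": ...}
    filters: dict = field(default_factory=dict)  # annotation name -> allowed values

    def set_filter(self, column, allowed):
        self.filters[column] = set(allowed)

    def clear_all_filters(self):
        self.filters.clear()

    def search_gene(self, gene):
        """Every new search reuses the session filters instead of starting from scratch."""
        kept = [
            s["id"]
            for s in self.samples
            if all(s.get(col) in allowed for col, allowed in self.filters.items())
        ]
        return gene, kept


session = LandSession(samples=[
    {"id": "S1", "TumorType": "KIRC"},
    {"id": "S2", "TumorType": "KIRP"},
    {"id": "S3", "TumorType": "BRCA"},
])
session.set_filter("TumorType", ["KIRC", "KIRP"])
print(session.search_gene("EGFR"))  # ('EGFR', ['S1', 'S2'])
print(session.search_gene("TP53"))  # ('TP53', ['S1', 'S2']): filter carried over
session.clear_all_filters()
print(session.search_gene("TP53"))  # ('TP53', ['S1', 'S2', 'S3'])
```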

[New Feature] Geneset Analysis Updates: Enrichment Score, Volcano Plot and Summary Bar Plot

Vivian Zhang

Early this fall, Omicsoft released the new Geneset Analysis functionality (see the webinar Announcing GeneSet Analysis Functionality, integrated with Omicsoft’s Land databases, and the blog post Geneset Analysis Functionality: Integrated With Omicsoft Land Databases). It helps users identify comparisons with similar gene set enrichment, drawing on both the tens of thousands of gene sets in the Lands and customers' own gene sets, and reports the direction of enrichment. Geneset Analysis is under active development, and we would like to update you on a few new features added since its release.

GeneSet Analysis result

Gene Set Enrichment Analysis Report

The Geneset Enrichment Analysis Report lists the p-value, enrichment score, direction of enrichment and other annotation information:

Geneset Enrichment Analysis Report

Enrichment Volcano Plot

The Enrichment Volcano Plot is a plot of Enrichment Score vs. P-Value. The enrichment score for a gene set is the degree to which the gene set is overrepresented at the top or bottom of the ranked list of genes in a comparison. The plot helps identify gene sets that merit further research and indicates the direction of enrichment.

Enrichment Volcano Plot
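The enrichment score definition above (overrepresentation at the top or bottom of a ranked gene list) matches the classic running-sum statistic used in GSEA. The Python sketch below implements the unweighted form of that statistic as an illustration; Omicsoft's exact scoring, weighting and p-value calculation are not described in this post, so treat the function (a hypothetical name) as an assumption rather than the product's method.

```python
# Unweighted GSEA-style running-sum score; an illustration, not Omicsoft's method.
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score (Kolmogorov-Smirnov-like).

    Walk down the ranked list, stepping up for genes in the set and down otherwise.
    The score is the running sum's largest deviation from zero: positive means the
    set is concentrated at the top of the ranking, negative means at the bottom.
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(hits) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the ranked list but not cover all of it")
    running, best = 0.0, 0.0
    for hit in hits:
        running += 1.0 / n_hit if hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


ranked = [f"G{i}" for i in range(1, 11)]  # G1 = most up-regulated, G10 = most down-regulated
print(enrichment_score(ranked, {"G1", "G2", "G3"}))   # close to +1.0
print(enrichment_score(ranked, {"G8", "G9", "G10"}))  # close to -1.0
```

The sign of the score is what supplies the "direction of enrichment" shown on the volcano plot; its significance would normally come from a separate permutation or parametric test.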

Summary Bar Plot

The Summary Bar Plot shows the number of overlapping genes and links dynamically to those genes, with their details displayed in the Details Window.

Summary Bar Plot

If you are an Omicsoft Land customer, give it a try with the latest Array Suite version, and let us know any comments or suggestions you have!

[Land Tutorial] Getting Started with OncoLand

Vivian Zhang

OncoLand is an oncology database and visualization software that helps users explore public and private cancer genomics datasets. It contains tens of thousands of carefully processed and curated oncology -Omic data samples. OmicSoft uses the Land framework to deliver a growing number of large datasets, including data types such as RNA-Seq, DNA-Seq, miRNA-Seq, Copy Number Variation, Gene Expression Chip, Protein Expression, Methylation and hundreds of clinical measurements.

OncoLand contains data from more than 10 large public datasets, including TCGA, CCLE, CGCI, ICGC, TARGET, Multiple Myeloma, GTEx, Blueprint and more. In this post, we introduce our data content based on our video tutorials: Getting Started With OncoLand.

For more details about Land content, please refer to our NEW wiki pages: Introduction to TCGA Land Content and Introduction to CCLE Land Content

A first look at OncoLand

Most of our OncoLand users are likely familiar with our Land interface. After selecting a Land, you will see a graphical interface similar to the following:

Example TCGA_B37 default view, displaying Sample Distribution view. 

TCGALand Introduction and Overview

TCGA, The Cancer Genome Atlas, is a comprehensive and coordinated effort to accelerate the understanding of the molecular basis of cancer through the application of genome analysis technologies. TCGALand is OncoLand's signature Land; it contains RNA-Seq, Expression Array, DNA-Seq, CNV, Methylation, and Protein data from more than 30 tumor types.

TCGALand Sample Distribution across Tumor Type.

TCGALand provides table and figure views at the sample, gene and clinical data levels. We will introduce genomic data views in a following article, or you can refer to our video tutorials: Getting Started With OncoLand. Here, we would like to highlight the clinical data views, which are introduced in the TCGALand Introduction and Overview video clip.

Clinical Significance - Group Association is a dynamic view showing the association of all clinical variables with the selected grouping variable. It quickly provides insights on which clinical variables are potentially associated with the selected grouping variable. 

Clinical Association for TCGALand Tumor Type.

Another useful view is the Survival View, which plots the survival rate over time for the selected grouping variable.
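The article does not say how the Survival View estimates its curves. A common method for survival-over-time plots is the Kaplan-Meier estimator; the sketch below assumes that method and uses made-up follow-up data, purely to illustrate what such a curve represents.

```python
from collections import Counter

def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times:  follow-up time for each patient
    events: 1 if the event (e.g. death) was observed, 0 if the patient was censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    survival = 1.0
    curve = []
    for t in sorted(deaths):
        # Patients still under observation just before time t.
        at_risk = sum(1 for x in times if x >= t)
        survival *= 1 - deaths[t] / at_risk
        curve.append((t, survival))
    return curve

# Hypothetical follow-up data (months) for one tumor-type group.
times = [5, 8, 8, 12, 15, 20]
events = [1, 1, 0, 1, 0, 1]
curve = kaplan_meier(times, events)
```

Censored patients (event 0) never cause a drop in the curve, but they still count as "at risk" until they leave the study.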

TCGALand Survival Plot by Tumor Type.

CCLELand Introduction and Overview

The Cancer Cell Line Encyclopedia (CCLE) project is an effort to conduct a detailed genetic characterization of a large panel of human cancer cell lines. CCLE provides public access to analysis and visualization of DNA copy number, mRNA expression, mutation data and more for 1000 cancer cell lines. CCLELand groups data by Primary Site (tissue), with histology as the secondary grouping.

CCLELand Primary Grouping by Primary Site, instead of Tumor Type.

Stay tuned for more on OncoLand! 

[Omic Data Analysis Tutorial] Microarray Data Visualization, Statistical Inference and Pattern Discovery

Vivian Zhang

Whether you're dealing with microarray data or RNA-seq data with calculated FPKM values or read counts, it is important to perform downstream analysis to make sense of the data and identify interesting patterns, samples, genes or proteins. In this article, we introduce some commonly used visualization and statistical analysis functions that are covered in the second half of our Microarray Analysis video tutorials.

Visualize Data with Array Studio Views

-Omic Data are read-only data constructs. The most common way to explore -Omic data is to add "Views" onto your data, including a "table" view to directly visualize the numerical data values or a "chart" view, such as the Variable view and Scatter plot. 

1 The Table View

The most common way to look at your -Omic data is with the Table View. Although it looks like a standard spreadsheet, the Table View is actually a visualization of your underlying data. It is dynamically connected to the attached annotation and design metadata, and can be sorted and filtered without altering the underlying data. Array Studio can easily handle millions of rows and columns in the Table View.

Example functions introduced in the video tutorial will allow you to:

  • Sort and Filter Table Views 
  • Display context-specific details from metadata 
  • Convert read-only -Omic data to editable Table data 
  • Log2-transform your expression data 
  • Link to public databases through Web Details On-Demand
  • Visualize distribution of expression values with Kernel Density 
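Two of the operations listed above, log2 transformation and kernel density estimation, can be sketched in a few lines of Python. This is an illustration under our own assumptions (a pseudocount of 1 before the log2 transform, a Gaussian kernel with a fixed, hand-picked bandwidth), not Array Studio's implementation; the expression values are invented.

```python
import math

def log2_transform(values, pseudocount=1.0):
    """log2(x + pseudocount); the pseudocount keeps zero values finite."""
    return [math.log2(v + pseudocount) for v in values]

def gaussian_kde(data, grid, bandwidth):
    """Evaluate a Gaussian kernel density estimate of `data` at each point in `grid`."""
    norm = 1.0 / (len(data) * bandwidth * math.sqrt(2 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - d) / bandwidth) ** 2) for d in data)
        for x in grid
    ]

expression = [0, 1, 3, 7, 15, 31]        # hypothetical raw intensities
logged = log2_transform(expression)       # [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
grid = [i * 0.5 for i in range(13)]       # 0.0, 0.5, ..., 6.0
density = gaussian_kde(logged, grid, bandwidth=0.5)
```

A larger bandwidth gives a smoother curve; a smaller one shows more detail but also more noise.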
Example table view of microarray data. The details window displays the data details for selected probe sets.

2 Adding Additional Views: The Variable View and Scatter Plot

Depending on the contents of your -Omic data or table, Array Studio has about 40 views to interactively display your data. This video clip briefly walks through two of the more popular Views for gene-level data: the Variable View and the Pairwise Scatter Plot.

Array Studio not only provides dozens of views depending on the content of the data, but also allows users to tailor the visualizations to their preferred style. Some commonly used views for microarray data include BoxPlot, ScatterView, VariableView and VennDiagramView. The example chart was customized from a Variable View into a violin plot grouped by time and treatment.

Statistical Inference and Pattern Discovery

Hierarchical Clustering and Pattern Matching to Identify Similar Gene Expression Dynamics

Gene expression data can be grouped using Hierarchical Clustering by Variables (e.g. genes) and Observations (e.g. samples) to reveal associations in your data.

In addition to visualizing the overall clustering pattern, you can use Find Neighbors to search datasets for variables/observations whose patterns are similar to a variable/observation of interest. You can display these comparisons in multiple ways, including pairwise correlation/MA plots, heatmaps, and 3D scatter plots.

Probes with patterns similar to probe 1371785_at are detected through the Find Neighbors module. After a list of "neighbor" probes is created, users can visualize the data pattern among those probes through pairwise correlation plots or 3D scatter plots.

Discover Differentially-Expressed Genes by ANOVA

A One-Way ANOVA is used to study the effects of a single factor, while a Two-Way ANOVA can be used to study the effects of two factors on expression data. Each model generates an inference report, including an automatically generated Report View and VolcanoPlotView. Additionally, the Venn Diagram and Inference Report Summary can help to quickly visualize the differentially expressed genes.
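To show what a one-way ANOVA computes for a single gene, here is a minimal F-statistic sketch with invented data. A p-value would then come from the upper tail of the F distribution with (df_between, df_within) degrees of freedom, for example via `scipy.stats.f.sf`; Array Studio's own model fitting and reporting are more involved.

```python
from statistics import fmean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA of one gene.

    groups: list of lists, expression values for each level of the factor
    """
    all_values = [v for g in groups for v in g]
    grand_mean = fmean(all_values)
    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Variation of replicates around their own group mean.
    ss_within = sum((v - fmean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical log2 expression of one gene under three treatments.
f, df1, df2 = one_way_anova_f([[5.1, 5.3, 4.9], [6.0, 6.2, 5.8], [5.0, 5.2, 5.1]])
```

Running this per gene and collecting the p-values gives the raw input for a differential expression report.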

Inference Report Summary and Venn Diagram help to quickly research significant genes and compare across groups. 

Identify Enriched Gene Ontology Terms

If you are interested in discovering pathways or functionally related genes that are enriched in your data, you can run the Gene Ontology (GO) module. This module performs built-in gene ontology classification on one or more significant lists. Once you generate a list of significant variables, Array Studio goes through all possible GO terms (across different class levels) to see how many variables in the list are covered by each term. From this, you can infer different biological attributes (such as functions and the corresponding biological processes) of the variables in the list.
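The article does not name the statistic behind the GO p-values. A common choice for this kind of over-representation question is the one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test); the sketch below assumes that test, and the counts in it are illustrative.

```python
from math import comb

def enrichment_p_value(hits, list_size, term_size, universe_size):
    """One-sided hypergeometric p-value P(X >= hits) for GO term over-representation.

    hits:          genes in the significant list annotated to the GO term
    list_size:     genes in the significant list
    term_size:     genes in the universe annotated to the GO term
    universe_size: all genes measured on the platform
    """
    upper = min(list_size, term_size)
    tail = sum(
        comb(term_size, k) * comb(universe_size - term_size, list_size - k)
        for k in range(hits, upper + 1)
    )
    # Exact integer arithmetic until this final division.
    return tail / comb(universe_size, list_size)

# Hypothetical: 12 of 150 significant genes hit a term covering 200 of 20,000 genes.
p = enrichment_p_value(hits=12, list_size=150, term_size=200, universe_size=20000)
```

Only about 1.5 hits would be expected by chance here, so 12 hits yields a very small p-value. With thousands of GO terms tested, these p-values would normally also be corrected for multiple testing.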

Example table results. Each Category lists a GO term (with a link to the Gene Ontology website), as well as the number of hits for that category in a particular list (the column name is the list name). Corresponding p-values can also be generated.

[Omic Data Analysis Tutorial] Getting Started with Microarray Data Analysis

Vivian Zhang

In bioinformatics research, there are many different data sources, including microarray, sequence data, CNV data, ChIP-chip data, genotype data, etc. In Array Studio, we divide genomic data into two groups: -Omic data and Table data. -Omic data is essentially a data matrix with annotation for both columns and rows; microarray data is a standard example. The microarray tutorial is a great starting point for new users of Array Studio, whether or not you will be working directly with microarray data. In this article, we cover the basics of microarray analysis from our video tutorial Getting Started with Array Studio Microarray Analysis.

1 Getting Started with Array Studio Microarray Analysis

When first installed, Array Studio looks similar to the view below. Array Studio organizes projects in the Solution Explorer. Any generated data or figure can be displayed in the middle of the window, while a Legend and Filter window appears on the right side of the window.

After you create a new project, Array Studio will guide you through importing your expression microarray datasets. Three tables make up the basic -Omic data types: the -Omic measurement data table, the design table and the annotation table.

After data import and downstream analysis, Array Studio organizes a project's data into four main data types: List Data, Table Data, -Omic Data and NGS Data. -Omic data is read-only table data with annotation and design tables attached (these can be modified). -Omic data and Table data can be converted from one type to the other.
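To make the -Omic data layout concrete, here is a hypothetical Python model of it: a read-only measurement matrix with editable row annotation and column design metadata, plus a conversion to editable Table data. The class and method names are our own invention, not Array Studio's API or OScript.

```python
from dataclasses import dataclass, field

@dataclass
class OmicData:
    """Hypothetical sketch of an -Omic object; not Array Studio's actual API."""
    variables: list        # row IDs, e.g. probe sets or genes
    observations: list     # column IDs, e.g. sample names
    matrix: tuple          # measurements, one row per variable
    annotation: dict = field(default_factory=dict)  # editable, keyed by variable
    design: dict = field(default_factory=dict)      # editable, keyed by observation

    def __post_init__(self):
        # Freeze the measurements as nested tuples so only the metadata is editable.
        self.matrix = tuple(tuple(row) for row in self.matrix)

    def to_table(self):
        """Convert to editable Table data: a header row plus one list per variable."""
        header = ["Variable"] + list(self.observations)
        rows = [[var] + list(vals) for var, vals in zip(self.variables, self.matrix)]
        return [header] + rows

omic = OmicData(
    variables=["GeneA", "GeneB"],
    observations=["S1", "S2"],
    matrix=[[7.2, 8.1], [5.5, 5.9]],
    design={"S1": {"Treatment": "Control"}, "S2": {"Treatment": "Drug"}},
)
omic.design["S2"]["Treatment"] = "Vehicle"   # metadata can be edited
table = omic.to_table()
table[1][1] = 0.0                            # editing the table copy leaves omic.matrix intact
```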

Array Studio provides several ways to reproduce analysis steps. Omicsoft scripts (Oscript) for analysis functions can be viewed in every function window, by right-clicking on an object name, or by viewing the full Audit Trail. Array Studio's Audit Trail feature tracks all analysis steps done in a project. This is important for data integrity, and it lets individual users track changes and reproduce procedures.

2 Preparing Your Data for Downstream Analysis

Before downstream analysis, you can use Array Studio's QC modules to identify samples that deviate significantly from the rest of the data set, possibly indicating a failed sample that should be excluded from downstream analysis.

QC by PCA and Removing Failed Samples

Principal Component Analysis (PCA) can identify sources of variance in data sets, which can come from real differences between sample groups or from a failed microarray chip. Failed experiments can quickly be removed from your -Omic data objects before downstream analysis.

QC by Correlation of Expression

Array Studio can identify samples that deviate significantly from others in your data set by calculating correlation coefficients between samples across all genes/probesets. Samples that correlate unusually poorly will be flagged as possible failed samples and can be excluded from downstream analysis.

For step-by-step instructions, please check out our video tutorial: Getting Started with Array Studio Microarray Analysis

[Array Studio Video Tutorial] RNA-Seq Advanced Analysis

Vivian Zhang

Finding genes or transcripts that are differentially expressed among different conditions is an important analysis step in understanding the functions of genetic variants. Array Studio contains a number of modules for univariate analysis/differential expression, including One-Way ANOVA, Two-Way ANOVA, the more advanced General Linear Model, and a few others. Statistical inference can be performed on your feature-level data, whether it was quantified in Array Studio or imported from external programs. In this article, we introduce popular methods from our video tutorial Advanced Analysis of RNA-seq data.

1 ANOVA on RNA-seq Data

A One-Way ANOVA is used to study the effects of a single factor, while a Two-Way ANOVA can be used to study the effects of two factors on expression data. For example, if an experiment has factors for time and treatment, this model can quickly generate results (including fold changes, estimates, raw and adjusted p-values, LSMeans, and Estimate data). After you select factor 1 and factor 2, and then the level to compare to, Array Studio automatically creates the comparisons and model. This model generates an inference report, including an automatically generated Report View and VolcanoPlotView.
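The article does not specify which multiple-testing adjustment produces the adjusted p-values. The Benjamini-Hochberg false discovery rate procedure is a common choice. The sketch below implements it on a hypothetical list of raw p-values, one per gene.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values (FDR), returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p-values for four genes from one comparison.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.2])
```

Note that the gene with raw p = 0.03 ends up with the same adjusted value as the one with p = 0.04. This happens because the procedure forces adjusted values to be non-decreasing in rank.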

2 DESeq on RNA-seq Data

For RNA-Seq, read count is a good estimate of the abundance of the target transcript, so it is of great interest to compare read counts between conditions. The DESeq GLM test is a powerful tool for inferring differential expression of genes/transcripts from raw count data. It lets the user model the data with a generalized linear model and test for differential expression based on the negative binomial distribution. The function should perform similarly to the DESeq R package. DESeq only works on raw counts of sequencing reads (with no additional background reads added to the dataset). After running the test, a report table is generated along with a scatter plot. A volcano plot is generated as well, as in the ANOVA analysis. For more details on how the DESeq method works and on further functions, check out the DESeq R manual.
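Before its negative binomial test, DESeq normalizes samples with median-of-ratios size factors. Each sample's factor is the median, over genes, of that sample's count divided by the gene's geometric mean across samples. The sketch below shows only this normalization step, using invented counts. It is not Array Studio's implementation and does not fit the GLM.

```python
import math
from statistics import median

def size_factors(counts):
    """DESeq-style median-of-ratios size factors.

    counts: list of per-gene raw count lists, one value per sample
    """
    n_samples = len(counts[0])
    # Only genes with a non-zero count in every sample have a usable geometric mean.
    usable = [row for row in counts if all(c > 0 for c in row)]
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        log_ratios = [math.log(row[j]) - lg for row, lg in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors

counts = [
    [10, 20, 40],
    [50, 100, 200],
    [0, 5, 3],      # skipped when estimating factors: zero in one sample
    [30, 60, 120],
]
sf = size_factors(counts)
normalized = [[c / s for c, s in zip(row, sf)] for row in counts]
```

This normalization also explains why DESeq needs raw counts. Pre-scaled values such as FPKM would break the count model.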

3 Identifying Differential Usage of Isoforms

Array Studio uses a straightforward approach to identify genes with differential transcript usage between groups. This function allows users to identify differentially expressed isoforms between comparisons. Based on transcript-level data (RPKM, FPKM or Count data), the function converts the expression values to ratios, dividing the value of each transcript by the sum of all transcripts in the same gene. The top-ranked (smallest) p-values reflect the largest differences in relative transcript usage.
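The ratio conversion described above can be sketched as follows for one sample: every transcript's value is divided by the total for its gene. Transcript and gene IDs are hypothetical. The statistical test that then compares these ratios between groups is not shown, because the article does not specify it.

```python
def transcript_ratios(expression, gene_of):
    """Convert transcript-level values to within-gene proportions for one sample.

    expression: dict transcript ID -> RPKM/FPKM/count
    gene_of:    dict transcript ID -> gene ID
    """
    gene_totals = {}
    for tx, value in expression.items():
        gene_totals[gene_of[tx]] = gene_totals.get(gene_of[tx], 0.0) + value
    # Genes with zero total expression get ratio 0 to avoid division by zero.
    return {
        tx: (value / gene_totals[gene_of[tx]] if gene_totals[gene_of[tx]] > 0 else 0.0)
        for tx, value in expression.items()
    }

gene_of = {"TX1": "GENE1", "TX2": "GENE1", "TX3": "GENE2"}
lung = transcript_ratios({"TX1": 30.0, "TX2": 10.0, "TX3": 5.0}, gene_of)
skin = transcript_ratios({"TX1": 5.0, "TX2": 15.0, "TX3": 8.0}, gene_of)
```

Here GENE1 switches its dominant isoform from TX1 (75% in lung) to TX2 (75% in skin), even though its total expression is similar. A plain gene-level comparison would miss this kind of difference.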

Differentially expressed isoforms report, sorted by p-value. The user can directly visualize the difference in transcript usage in the Genome Browser.

The Genome Browser view can display exon junction reads. As shown, only 37 reads span a certain junction in lung, but 11,000 reads span the same junction in skin.

To learn how to achieve the above results, please check out our video tutorial: Advanced Analysis of RNA-seq data

[Array Studio Video Tutorial] RNA-Seq Downstream Analysis: Normalization, Visualization and Data Integration

Vivian Zhang

After aligning data, there are a number of downstream analyses that can be done. For instance, the generated RPKM (or FPKM) dataset can be treated like microarray data for clustering (log2 transformation may be necessary). Count data can be used to look for changes between groups of samples through DESeq analysis. A large number of visualization and QC functions are available to analyze feature-level RNA-seq data in Array Studio. In this article, we introduce our video tutorials on RNA-Seq Downstream Analysis.

1 Normalizing and Transforming RNA-seq Data for MicroArray-type analysis

Array Studio has a large number of modules originally designed for Gene Expression MicroArray analysis, but these modules are also useful for analyzing feature-level (e.g. gene-level, exon-level) RNA-seq data. However, many of these modules expect normalized and log-transformed input data. Array Studio provides a number of methods for normalizing RNA-Seq data, including Log Geometric Mean, Mean, Median, Quantile, TMM (edgeR), TotalCount, RPKM to TPM, UpperQuartile, and LandNormalization. Array Studio also provides methods for normalizing and transforming -Omic data. 
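As an illustration of two operations named above, the sketch below rescales one sample's RPKM/FPKM values to TPM (so they sum to one million) and then applies a log2 transform with an offset of 1. The offset and the sample values are our assumptions, not Array Studio defaults.

```python
import math

def rpkm_to_tpm(rpkm):
    """Rescale one sample's RPKM/FPKM values so they sum to one million (TPM)."""
    total = sum(rpkm)
    return [value * 1e6 / total for value in rpkm]

def log2_with_offset(values, offset=1.0):
    """log2(x + offset); the offset (an assumption here) keeps zeros finite."""
    return [math.log2(v + offset) for v in values]

sample_rpkm = [10.0, 30.0, 60.0, 0.0]   # hypothetical genes in one sample
tpm = rpkm_to_tpm(sample_rpkm)          # [100000.0, 300000.0, 600000.0, 0.0]
log_tpm = log2_with_offset(tpm)
```

Because TPM values in every sample add up to the same total, relative abundances are easier to compare across samples than with raw RPKM.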

2 Attach new Views to Data

In Array Studio, data can be viewed directly in tables, but can also be displayed in up to 40 Views, depending on the contents of the underlying data. Among its most popular views is the very powerful Variable View.

The Variable View allows the user to visualize one chart for each variable in the dataset. The example Variable View shows the log2 FPKM values for the gene CLDN18, categorized by tissue and gender.

3 Principal Component Analysis on normalized expression data

Principal Component Analysis (PCA) is an effective tool for summarizing data by the components that account for the greatest variance in the dataset. In other words, PCA can group your data based on variance, which should reflect differences between samples. Failed samples will often appear as outliers.
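For intuition, here is a small pure-Python sketch that finds the first principal component by power iteration and scores each sample on it. Production tools typically use a full SVD and report several components. This toy version, with invented data containing one deliberately aberrant sample, only illustrates why failed samples tend to stand out.

```python
import math
import random

def first_principal_component(rows, iterations=200, seed=0):
    """Return (unit loading vector, per-sample scores) for PC1 via power iteration.

    rows: one list per sample (observation), one column per gene (variable)
    """
    n_vars = len(rows[0])
    means = [sum(r[j] for r in rows) / len(rows) for j in range(n_vars)]
    centered = [[r[j] - means[j] for j in range(n_vars)] for r in rows]

    rng = random.Random(seed)
    v = [rng.random() for _ in range(n_vars)]
    for _ in range(iterations):
        # Apply X^T X to v without forming the covariance matrix explicitly.
        scores = [sum(x * w for x, w in zip(r, v)) for r in centered]
        v = [sum(s * r[j] for s, r in zip(scores, centered)) for j in range(n_vars)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    scores = [sum(x * w for x, w in zip(r, v)) for r in centered]
    return v, scores

# Hypothetical log2 expression: 5 samples x 3 genes; the last sample is aberrant.
rows = [
    [5.0, 7.0, 3.0],
    [5.1, 7.2, 3.1],
    [4.9, 6.9, 2.9],
    [5.0, 7.1, 3.0],
    [9.0, 2.0, 6.0],
]
pc1, scores = first_principal_component(rows)
```

The sign of a principal component is arbitrary, so it is the magnitude of a sample's score that marks it as an outlier.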

Both 2D and 3D PCA plots are commonly used to group data or identify outliers. 

4 Hierarchical Clustering of normalized expression data

Gene expression data can be grouped using Hierarchical Clustering by Variables (e.g. genes) and Observations (e.g. samples) to reveal associations in your data. Array Studio can easily handle Hierarchical Clustering of up to 20,000 variables, far more than the capacity of many popular gene clustering programs.

The Classic dendrogram is the older version of the dendrogram. The new version is more interactive and provides more gene annotation information for downstream analysis.

5 RNAseq-MicroArray Integration

Feature-level (genes, transcripts, etc.) results from RNA-seq experiments can directly be compared to microarray data from the same samples, using the Microarray-Microarray Integration module. This module allows the user to create a duplex matrix (two values for each variable in the dataset) for two “microarray” data types. The resulting dataset can also contain correlation information for each variable, making it easy to figure out which variables correlate well between datasets.

Microarray-microarray integration module provides variable views on gene and sample level showing how well microarray and RNA-seq data correlate. 

To learn how to perform these downstream analyses on RNA-seq data, please check out our video tutorials on RNA-Seq Downstream Analysis.

[New Feature] Geneset Analysis Functionality: integrated with Omicsoft Land databases

Vivian Zhang

Gene Set Analysis is a powerful tool for users who have their own gene signatures and would like to identify comparisons or other signatures with similar gene set enrichment, searching both the tens of thousands of comparisons in the Lands and, for on-premises customers, their own customer gene sets. Recently, Omicsoft officially released our new GeneSet Analysis function. For more details, check out our webinar recording Announcing GeneSet Analysis Functionality, integrated with Omicsoft’s Land databases, presented by Matt Newman, VP of Business Development at Omicsoft, on September 28th, 2016.

Previously, Omicsoft's Land system offered a simplified GeneSet Enrichment Analysis, which allowed users to compare their own gene sets with those contained in the Lands.

Although this was powerful enough to identify comparisons with similar gene sets, it had several limitations:

1. it was restricted within a specific Land of choice and not shared across Lands

2. it did not take directionality into account

3. it was not able to include other genesets beyond Land data as target gene sets 

4. it required the user to be familiar with the Land system, and not just the analysis sub-system of Array Suite.

Omicsoft's Array Studio also provides a Molecular Signature module that allows users to compare their lists to Broad's molecular signature database. However, the Molecular Signature module does not take directionality into account either, and it requires users to add plain lists to Array Studio Projects: it cannot incorporate inference reports or any of the important data stored within the Lands, nor can it easily incorporate customer gene sets.

In order to more fully leverage Omicsoft's data assets, we have officially released our new GeneSet Analysis module. The new GeneSet Analysis allows the users to query across OncoLand, DiseaseLand, Molecular Signatures, and more. 

GeneSet Analysis Wizard

In addition to the included geneset databases, the new GeneSet Analysis also provides directional results: up and down p-values and directions.

GeneSet Analysis result

We are still actively developing the GeneSet Analysis module, constantly improving its content, functions and visualizations. Here are a couple of examples of what we are working on:

1. Multi-species data support in addition to human and mouse data

2. Additional visualizations based on table results

If you have any comments or suggestions, please let us know.

Want to give it a try? Please check out our latest webinar Announcing GeneSet Analysis Functionality, integrated with Omicsoft’s Land databases and our GeneSet Analysis wiki for detailed illustration. 

[Array Studio Video Tutorial] RNA-Seq Analysis Basic functions: Reads Quantification, Exon Junction and Gene Fusion Detection

Vivian Zhang

RNA-Seq has become one of the most popular methods in gene- and transcript-level genomic research. It can help quantify gene and transcript expression, identify sequence variants, and detect gene-, transcript- or exon-level genomic events. Array Studio provides a variety of functions powerful enough for both small- and large-scale genomic research. In this article, we introduce a few of the most commonly used basic functions, including read quantification, gene annotation, exon junction detection and gene fusion detection.

Array Studio provides a number of modules and options for RNA-Seq quantification at the gene, transcript, exon and exon junction levels. Both FPKM and Count tables can be generated.
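For reference, FPKM is conventionally computed as fragments × 10^9 / (feature length in bp × total mapped fragments). The sketch below applies that formula to invented counts. How Array Studio defines feature length (for example, the union of exons) is not specified here.

```python
def fpkm(counts, lengths_bp, total_mapped):
    """FPKM = fragments * 1e9 / (feature length in bp * total mapped fragments)."""
    return [c * 1e9 / (length * total_mapped) for c, length in zip(counts, lengths_bp)]

# Hypothetical: two genes in a sample with 10 million mapped fragments.
values = fpkm(counts=[500, 1000], lengths_bp=[2000, 500], total_mapped=10_000_000)
# values == [25.0, 200.0]
```

The second gene has only twice the raw count of the first but an eight-fold higher FPKM, because it is four times shorter. This length correction is why Count tables and FPKM tables serve different purposes.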

Example RNA-seq gene count table and its corresponding design table.

Alternative splicing has been shown to play an important role in a number of human diseases, including cancer, cardiovascular and neurodegenerative diseases. In Omicsoft Array Studio and the Land products, we provide modules and visualization functions that make it easier for users to research splicing. In RNA-Seq analysis, besides gene and transcript counts, Array Studio can report exon junction counts as well. Results can be visualized in Omicsoft's Genome Browser.
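To illustrate where exon junction counts come from, the sketch below reads spliced alignments from their SAM CIGAR strings, where an `N` operation marks a skipped (intronic) reference region, and tallies reads per junction. The read positions are invented, and this is not how Array Studio itself is implemented.

```python
import re
from collections import Counter

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def junctions(ref_start, cigar):
    """Yield (intron_start, intron_end) for each 'N' operation in a CIGAR string.

    ref_start: 1-based leftmost mapping position (SAM POS field).
    Returned coordinates are 1-based and inclusive for the skipped reference bases.
    """
    pos = ref_start
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "N":
            yield (pos, pos + length - 1)
        if op in "MDN=X":  # operations that consume reference bases
            pos += length

# Hypothetical alignments: (POS, CIGAR).
reads = [(1000, "50M200N50M"), (1010, "40M200N60M"), (990, "100M")]
counts = Counter(j for start, cigar in reads for j in junctions(start, cigar))
```

The two spliced reads start at different positions but skip the same intron, so they support the same junction. The unspliced read contributes nothing.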

Exon junction report and genome browser view.

Mutation data allows users to compare mutation frequencies and research individual variants. Users can run the Summarize Variant Data module to annotate variants. Variants can be annotated in Mutation Reports or VCF files, and visualized directly in the Genome Browser.

Mutation annotation report and example genome browser view of variant V600E. 

Fusion genes are an important class of cancer mutations and can have multiple effects on a target gene. At Omicsoft, we provide a powerful fusion detection algorithm in FusionMap. FusionMap identifies unmapped reads that span multiple genomic locations, indicating possible gene fusion events.

The Map Fusion Reads module detects fusion genes from fusion junction-spanning reads, which can characterize fusion genes at base-pair resolution. It works with single-end or paired-end data. The Combined Fusion Analysis module runs fusion junction-spanning read detection and inter-transcript fusion read pair detection at the same time: it detects fusion junction-spanning reads from unmapped reads in BAM files, and inter-transcript fusion read pairs from singletons among the BAM alignment entries. It returns a report showing potential fusion genes and counts for each fusion junction. Combined Fusion Analysis can only be run on paired-end data.
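The inter-transcript read-pair part of fusion detection can be illustrated with a toy counter. Read pairs whose two mates are assigned to different genes are tallied per gene pair, and gene pairs with enough support are reported. This is a simplified sketch under our own assumptions (gene assignments already made, a minimum support of 2). FusionMap's actual algorithm, including its split-read alignment at base-pair resolution, is considerably more involved.

```python
from collections import Counter

def count_fusion_pairs(pairs, min_support=2):
    """Count read pairs whose mates are assigned to different genes.

    pairs: iterable of (gene_of_mate1, gene_of_mate2); None if a mate is unassigned
    """
    support = Counter()
    for g1, g2 in pairs:
        if g1 and g2 and g1 != g2:
            support[tuple(sorted((g1, g2)))] += 1  # orientation-agnostic key
    return {fusion: n for fusion, n in support.items() if n >= min_support}

# Hypothetical mate-to-gene assignments for six read pairs.
pairs = [("BCR", "ABL1"), ("ABL1", "BCR"), ("BCR", "BCR"),
         ("GAPDH", None), ("BCR", "ABL1"), ("TP53", "MDM2")]
candidates = count_fusion_pairs(pairs)
```

The minimum-support filter discards the single TP53/MDM2 pair. Isolated discordant pairs are often mapping artifacts rather than real fusions.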

The fusion report shows fusion count data with fusion annotation information attached. The fusion genome browser can display sequence information at base-pair resolution.

[Land Update] Omicsoft OncoLand 2016 Q2 Update

Vivian Zhang

We've reached the time for our OncoLand Quarterly Update, and we're excited to tell you what's new!

In our Q1 2016 release following our kick-off User Group Meeting, we made a major update to the Lands, including CCLE_B37, CGCI_B37, Hematology_B37, ICGC_B37, OncoGEO_B37, TARGET_B37, TCGA_B37, and TumorMutation_B37, and added two new Lands, ClinicalOutcome_B37 and expO_B37. In the Q2 update, we provided updates for Hematology_B37, ICGC_B37, TCGA_B37 and OncoGEO_B37.

Here are the sample statistics for the updated Lands. For details, please refer to the OncoLand 2016 Q2 Release Whitepaper.

Hematology_B37

•    60 samples (two cell lines under different conditions) with RNA-Seq data; based on SRA SRP041036
•    5484 samples with Affymetrix (U133 Plus 2.0) expression data; based on GEO GSE15695, GSE19784, GSE6891, GSE12417, GSE13159, GSE17855 and MMGP
•    767 samples with CNV data; based on GEO, MMRC Collection, HMCL69 cell line and Corral2012 study
•    203 samples with DNA-Seq somatic mutation data; based on MMRC Reference Collection
•    68 samples with DNA-Seq mutation data; based on HMCL69 cell line collection

ICGC_B37

•    577 samples with RNA-Seq data
•    779 samples with Methylation450 data
•    5587 samples with DNA-Seq Somatic Mutation data
•    2869 samples with CNV data

OncoGEO_B37

•    2001 samples with RNA-Seq data
•    4786 samples with expression data

TCGA_B37

•    22301 samples with CNV data
•    9677 samples with DNA-Seq Somatic Mutation data
•    2377 samples with Expression Ratio (Agilent) data
•    9793 samples with Methylation450 data
•    11022 samples with miRNA-Seq data
•    7933 samples with RPPA (protein array) data
•    4735 samples with RPPA_RBN (protein array) data
•    11291 samples with RNA-Seq data

Most users should have already been contacted about this release update, and if not, we will work with you to update your servers in the near future.