
Overview

Omicsoft is the leading provider of Next Generation Sequencing, Cancer Genomics, Immunology, and Bioinformatics solutions for Next Generation Sequencing Data and Gene Expression Analysis.

Exciting Updates and Latest News

Keeping you up-to-date with the latest in NGS, Bioinformatics Analysis, and cancer genomics with blogs on Array Suite, OncoLand (TCGA and more), ImmunoLand, and more.

Filtering by Tag: Land

[Land Update] Next Generation Of OmicSoft Lands on AWS Cloud

Vivian Zhang

In April 2017, Gary Ge from OmicSoft presented the webinar, Next Generation Of OmicSoft Lands on AWS Cloud, which described OmicSoft's transition into a cloud-based Land system. For those who missed the webinar, please watch the recording here, or read through this article on how our new Land technology may improve our service.

OmicSoft’s Land technology has enabled collection and management of large public data sets in curated knowledge bases in the fields of cancer genomics (OncoLand), cardiovascular, metabolic and immunology (DiseaseLand), as well as genetic research (GeneticsLand). With more data being curated daily, and more users requesting content faster, we have been focused on creating a better solution for public Land delivery.

With the rapid growth of our Land database, we now provide 32 Lands, including OncoLand and DiseaseLand, to customers. Previously, we delivered and updated all content, approximately 1.5TB of data, each quarter. Delivery often took 1 to 5 days and required server-based parallel publishing, which took a lot of effort from both OmicSoft and the customer's IT staff or OmicSoft product administrators.

In 2014, OmicSoft released Studio on the Cloud, and has continued to improve its cloud implementation since. Studio on the Cloud allows users to seamlessly run all Array Studio analytics from Amazon, combining the storage of S3 (Amazon Simple Storage Service) with the analytical power of EC2 (Amazon Elastic Compute Cloud). OmicSoft has seen an increasing number of clients implementing mixed-mode solutions (a cloud solution in addition to their SGE/PBS/LSF cluster).

With new technology breakthroughs, OmicSoft now offers the cloud-based Land. Our cloud Land design enables Land streaming from Amazon AWS, which makes Land data delivery much easier. Here is a comparison of Land delivery performance:

 

The design has the following features: 

• 10x performance improvement for dynamic query
• Stream to client’s ArrayServer with server cache
• Quick land delivery with minimal local storage footprint
• Faster future content updates
• Potential On-demand content updates (particularly for DiseaseLand) 
• Support for virtual Lands that combine public cloud Lands with internal Lands on local servers

To date, most of our clients have chosen to switch to cloud-based Land delivery. Please speak to the OmicSoft support team, or your company administrator to understand how cloud-based Land delivery can benefit your research.

[Land Tutorial] Getting Started With DiseaseLand (ImmunoLand And CVMLand) II

Vivian Zhang

Last time, we introduced some of the basic views in DiseaseLand. In this article, we would like to introduce you to some advanced functionality: SampleSets and GeneSets.

 

The SampleSet is a powerful concept and tool that allows users to create custom sample groupings, based on data in the Land or imported tables. This video tutorial demonstrates several ways to build a SampleSet from data using selection and filters, then uses SampleSets in Land Analytics to scan the entire DiseaseLand to discover differential splicing.

Grouping lesional vs non-lesional Psoriasis samples in ImmunoLand (DiseaseLand). 

Serpinb7 Transcript Expression grouped by lesional vs. non-lesional samples. 

With the newly created sampleset, users can visualize isoform differential expression at the transcript level in the Genome Browser:

Differential expression of Serpinb7 in lesional vs. non-lesional samples.

3. GeneSets

Besides the SampleSet tool, GeneSets are a powerful tool for grouping and comparing genes, including members of a gene family, a pathway, or co-regulated genes. These GeneSets can be used to discover DiseaseLand studies that share "genetic signatures" of common up- or down-regulated genes with your GeneSet.

After adding GeneSet Some_IBD_Genes, the geneset becomes available for search.

With the created geneset, users can visualize and perform analyses on the set of genes. 

Heatmap of Comparison for GeneSet Some_IBD_Genes.

 

Stay tuned for a DiseaseLand comparison views tutorial!

[Land Tutorial] Getting Started With DiseaseLand (ImmunoLand And CVMLand)

Vivian Zhang

 

Starting with the Q2 2016 release, Omicsoft replaced ImmunoLand and CVMLand with DiseaseLand in its Land user interface. DiseaseLand focuses on datasets of common genetic diseases, including but not limited to immunological diseases, neurological disorders, metabolic diseases and cardiovascular diseases.

Relatively new to customers, DiseaseLand has been gaining popularity among our prestigious client companies. For a quick review of DiseaseLand content, please check out our wiki article Introduction to DiseaseLand Content. Today, we would like to introduce to you our ImmunoLand video tutorials to help you quickly get started with DiseaseLand.

 

 

DiseaseLand is broadly similar to OncoLand. If you are already an OncoLand user, you will find that DiseaseLand has most of the advanced analytics functions covered in OncoLand training.

Besides sample, gene and variable views, DiseaseLand features comparison views. They allow users to easily search and visualize data using common queries such as Treated vs Control, Disease vs Normal, and Responder vs Non-Responder. Comparison statistics are also available when taking a first look at DiseaseLand:

Comparison Distribution View. Comparison Details and Project Details are available once the user selects a group of samples. For example, Comparison Details include information on the comparison test method and comparison category, as highlighted in the figure.

 

 

At present, DiseaseLand data is primarily focused on gene expression from microarrays and NGS studies. The user can search for a gene of interest and narrow down to find interesting projects interactively. The default view for DiseaseLand is Disease vs Normal Comparison. By selecting the comparison of interest, sample details of the comparison will be displayed in the Details Window. More details on gene, probe and project level are also available. 

 

 

Experimental designs in projects within DiseaseLand are quite different, and batch effects in microarray projects are difficult to remove. Omicsoft created project-specific views to display expression values based on the experimental design within each project. The Expression Intensity Project View provides log2 expression intensity values. Users can easily filter by project of interest, or by other fields such as disease, clinical details, or even project contributors.

Expression Intensity Project View of Project GSE14580.

 

 

To maximize inter-study comparisons of RNA-seq data in DiseaseLand, Omicsoft processes data from each study, starting from fastq files, through a common pipeline. Expression values from RNA-Seq studies are expressed as FPKM values, with upper quantile normalization. DiseaseLand offers project-specific views and also a merged view from all samples. Views display log-transformed FPKM values. Samples and projects can be filtered interactively to allow exploration of data.

Gene FPKM View of all psoriasis samples in DiseaseLand HumanDisease. Samples can be categorized by project name (Change Symbol Properties), and clicking a project name selects the samples from that project and displays their details in the Details Window.
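
The normalization described above can be illustrated with a minimal Python sketch. It assumes that "upper quantile normalization" means scaling each sample so that the 75th percentile of its non-zero FPKM values matches a common target, followed by a log2 transform with a small pseudocount. The target value, the pseudocount, the gene names and the function names are illustrative assumptions, not Omicsoft's actual pipeline parameters.

```python
import math


def upper_quartile(values):
    """Return the 75th percentile of the non-zero values (linear interpolation)."""
    nonzero = sorted(v for v in values if v > 0)
    if not nonzero:
        raise ValueError("sample has no expressed genes")
    pos = 0.75 * (len(nonzero) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return nonzero[lo] + (nonzero[hi] - nonzero[lo]) * (pos - lo)


def normalize_sample(fpkm, target=10.0, pseudocount=0.1):
    """Scale one sample to a shared upper quartile, then log2-transform.

    `target` and `pseudocount` are hypothetical defaults chosen for this sketch.
    """
    scale = target / upper_quartile(fpkm.values())
    return {gene: math.log2(value * scale + pseudocount) for gene, value in fpkm.items()}


# Two toy samples with the same relative expression but a different overall scale.
sample_a = {"GENE1": 0.0, "GENE2": 2.0, "GENE3": 4.0, "GENE4": 6.0, "GENE5": 8.0}
sample_b = {gene: 3.0 * value for gene, value in sample_a.items()}

norm_a = normalize_sample(sample_a)
norm_b = normalize_sample(sample_b)
```

After scaling, the two toy samples yield the same normalized values, which is the property that makes samples from different studies easier to compare in a merged view.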

 

Stay tuned for more DiseaseLand tutorials!

[Important Land Update] Land Filter Now Carries Over Across Multiple Searches

Vivian Zhang

Omicsoft's Lands are known for being comprehensive, powerful and integrated, allowing users to navigate across samples, genes, data types, datasets and platforms. Comprehensive and flexible as it is, the system may appear complex to some users, given the growing numbers of samples, datasets and data types. To help users apply filters more easily and efficiently, Omicsoft recently improved its filtering logic in the Lands.

Previously, filters applied to one search did not carry over from the main Land tab, requiring users to apply filters all over again for any new search.

For example, if the user wants to compare gene expression FPKM for EGFR in KIRC (Kidney Renal Clear Cell Carcinoma) and KIRP (Kidney Renal Papillary Cell Carcinoma), the first step might be to filter the tumor types in the TCGA_B37 main tab (to see sample numbers and understand the overall distribution of samples). Next, the user can search for EGFR and go to the Gene FPKM view (Step 2). If the user then wanted to see the gene expression of TP53, the Land previously did not carry over the filter, and the user needed to apply the filter again (Step 3).

Step 1: Filter for Tumor Type KIRC and KIRP, and check sample statistics (data availability in this example search):

Step 2: Search for Gene EGFR and view the Gene FPKM view:

Step 3: Search for Gene TP53. The sample filter does not carry over, and the user has to redo the filter all over again:

 

Now, with the new version, the filter carries over without any additional steps:

 

Imagine having already applied many filter steps and navigated to a group of samples or genes that appear intriguing: having all of those filter steps applied directly to the new search is easy and saves time.

This filter logic applies to all left-hand side filter tabs including Sample, Comparison and all data type filter tabs. 
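
The carry-over behavior can be pictured with a small Python model. This is only a conceptual sketch: the `LandSession` class, its method names, and the sample values are hypothetical and do not reflect how Array Studio implements filtering internally.

```python
class LandSession:
    """Toy model of a Land tab whose sample filters persist across gene searches."""

    def __init__(self, samples):
        self.samples = samples  # list of dicts, one per sample
        self.filters = {}       # column name -> set of allowed values

    def add_filter(self, column, allowed):
        self.filters[column] = set(allowed)

    def clear_all_filters(self):
        self.filters.clear()

    def search(self, gene):
        """Return (sample ID, FPKM) pairs for `gene`, honoring every active filter."""
        return [
            (s["SampleID"], s["FPKM"][gene])
            for s in self.samples
            if all(s[col] in allowed for col, allowed in self.filters.items())
        ]


# Invented sample records, for illustration only.
samples = [
    {"SampleID": "S1", "TumorType": "KIRC", "FPKM": {"EGFR": 12.0, "TP53": 30.1}},
    {"SampleID": "S2", "TumorType": "KIRP", "FPKM": {"EGFR": 8.5, "TP53": 22.4}},
    {"SampleID": "S3", "TumorType": "BRCA", "FPKM": {"EGFR": 3.2, "TP53": 18.9}},
]

session = LandSession(samples)
session.add_filter("TumorType", ["KIRC", "KIRP"])  # Step 1: filter tumor types
egfr = session.search("EGFR")                      # Step 2: filtered EGFR view
tp53 = session.search("TP53")                      # Step 3: the filter still applies
session.clear_all_filters()                        # equivalent of Clear All Filters
everything = session.search("TP53")
```

Because the filters live on the session rather than on an individual search, each new search sees them until they are explicitly cleared.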

If the user does not want to apply the filters, simply click the Clear All Filters button to reset everything:

[New Feature] Geneset Analysis Functionality: integrated with Omicsoft Land databases

Vivian Zhang

Gene Set Analysis is a powerful tool for users who have their own gene signatures and would like to identify comparisons or other signatures with similar gene set enrichment. It searches the tens of thousands of comparisons in the Lands and, for on-premises customers, their own customer gene sets as well. Recently, Omicsoft officially released our new GeneSet Analysis function. For more details, check out our webinar recording Announcing GeneSet Analysis Functionality, integrated with Omicsoft’s Land databases, presented by Matt Newman, VP of Business Development at Omicsoft, on September 28th, 2016.

Previously, Omicsoft's Land system offered a simplified GeneSet Enrichment Analysis. It allowed users to compare their own gene sets with those contained in the Lands: 

Although this was powerful enough to identify comparisons with similar gene sets, it had several limitations:

1. it was restricted within a specific Land of choice and not shared across Lands

2. it did not take directionality into account

3. it was not able to include other genesets beyond Land data as target gene sets 

4. it required the user to be familiar with the Land system, and not just the analysis sub-system of Array Suite.

Omicsoft's Array Studio also provides a Molecular Signature module that allows users to compare against Broad's molecular signature database. However, that module also does not take directionality into account, and it requires users to add plain lists to Array Studio Projects, with no ability to incorporate inference reports or any of the important data stored within the Lands, and no easy way to incorporate customer Gene Sets.

 

In order to more fully leverage Omicsoft's data assets, we have officially released our new GeneSet Analysis module. The new GeneSet Analysis allows the users to query across OncoLand, DiseaseLand, Molecular Signatures, and more. 

GeneSet Analysis Wizard

In addition to the geneset databases included, the new GeneSet Analysis also provides directional results -- up and down p-values and directions.

GeneSet Analysis result
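
To make the idea of directional results concrete, here is a minimal Python sketch that tests the up-regulated and down-regulated halves of a signature separately and reports one p-value for each. It uses a one-sided hypergeometric overlap test as a stand-in; the statistic actually used by the GeneSet Analysis module is not described in this post, and the function names and gene identifiers are hypothetical.

```python
from math import comb


def overlap_p_value(overlap, set_a_size, set_b_size, universe_size):
    """P(X >= overlap) under the hypergeometric null of randomly drawn gene sets."""
    total = comb(universe_size, set_b_size)
    upper = min(set_a_size, set_b_size)
    tail = sum(
        comb(set_a_size, i) * comb(universe_size - set_a_size, set_b_size - i)
        for i in range(overlap, upper + 1)
    )
    return tail / total


def directional_enrichment(query_up, query_down, target_up, target_down, universe_size):
    """Compare up and down genes separately and return both p-values."""
    return {
        "up_p": overlap_p_value(
            len(query_up & target_up), len(query_up), len(target_up), universe_size
        ),
        "down_p": overlap_p_value(
            len(query_down & target_down), len(query_down), len(target_down), universe_size
        ),
    }


# Invented gene identifiers in a toy universe of 20 genes.
result = directional_enrichment(
    query_up={"G1", "G2", "G3", "G4", "G5"},
    query_down={"G6", "G7"},
    target_up={"G1", "G2", "G3", "G4", "G5"},
    target_down={"G8", "G9"},
    universe_size=20,
)
```

In this toy case the up-regulated genes match perfectly and the down-regulated genes do not overlap at all, so the two p-values differ sharply. A non-directional test that pooled both lists would blur that distinction.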

We are still actively developing the GeneSet Analysis module, constantly improving our content, functions and visualizations. Here are a couple of examples of what we are working on:

1. Multi-species data support in addition to human and mouse data

2. Additional visualizations based on table results

If you have any comments or suggestions, please let us know.

 

Want to give it a try? Please check out our latest webinar Announcing GeneSet Analysis Functionality, integrated with Omicsoft’s Land databases and our GeneSet Analysis wiki for detailed illustration. 

 

 

[Land Update] Omicsoft OncoLand 2016 Q2 Update

Vivian Zhang

We've reached the time for our OncoLand Quarterly Update, and we're excited about what we have to tell you about!

In our Q1 2016 release following our kick-off User Group Meeting, we made a major update to the Lands, including CCLE_B37, CGCI_B37, Hematology_B37, ICGC_B37, OncoGEO_B37, TARGET_B37, TCGA_B37, and TumorMutation_B37, and added two new Lands, ClinicalOutcome_B37 and expO_B37. In the Q2 update, we provided updates for Hematology_B37, ICGC_B37, TCGA_B37 and OncoGEO_B37.

Here are the sample statistics for the updated Lands. For details, please refer to the OncoLand 2016 Q2 Release Whitepaper.

 

Hematology_B37

•    60 samples (two cell lines under different conditions) with RNA-Seq data; based on SRA SRP041036
•    5484 samples with Affymetrix (U133 Plus 2.0) expression data; based on GEO GSE15695, GSE19784, GSE6891, GSE12417, GSE13159, GSE17855 and MMGP
•    767 samples with CNV data; based on GEO, MMRC Collection, HMCL69 cell line and Corral2012 study
•    203 samples with DNA-Seq somatic mutation data; based on MMRC Reference Collection
•    68 samples with DNA-Seq mutation data; based on HMCL69 cell line collection

 

ICGC_B37

•    577 samples with RNA-Seq data
•    779 samples with Methylation450 data
•    5587 samples with DNA-Seq Somatic Mutation data
•    2869 samples with CNV data

 

OncoGEO_B37

•    2001 samples with RNA-Seq data
•    4786 samples with expression data

 

TCGA_B37

•    22301 samples with CNV data
•    9677 samples with DNA-Seq Somatic Mutation data
•    2377 samples with Expression Ratio (Agilent) data
•    9793 samples with Methylation450 data
•    11022 samples with miRNA-Seq data
•    7933 samples with RPPA (protein array) data
•    4735 samples with RPPA_RBN (protein array) data
•    11291 samples with RNA-Seq data

 

Most users should have already been contacted about this release update, and if not, we will work with you to update your servers in the near future.

 

[OncoLand Case Study] Empower OncoLand with Array Studio Analysis: Visualize "mutation burden" in each tumor in TCGALand

Vivian Zhang

One of the common goals in cancer research is identification of genes or samples with mutations that occur during tumor development. The number of identified mutations in cancer samples can vary wildly, but some tumors tend to aggregate widespread alterations. This Nature paper about the mutation landscape and significance across 12 major cancer types (as part of the TCGA Pan-Cancer effort) is a good example. In the very first figure, the authors investigated the mutation frequencies of six transition (Ti) and transversion (Tv) categories for each cancer type:

Figure 1: Mutation frequencies, spectra and contexts across 12 cancer types. Kandoth, Cyriac, et al. "Mutational landscape and significance across 12 major cancer types." Nature 502.7471 (2013): 333-339.

In another recent paper in Nature Communications, Whole-genome mutational burden analysis of three pluripotency induction methods, the authors researched mutational subtypes in each sample:

Figure 2: Characterization of variants caused by reprogramming method. Bhutani, Kunal, et al. "Whole-genome mutational burden analysis of three pluripotency induction methods." Nature communications 7 (2016).

Using OncoLand, you can easily calculate and visualize total mutation burden of every sample or tumor type. Check out this OncoLand case study: Visualize "mutation burden" of each tumor in TCGALand

1. Calculate total mutation burden of every sample in TCGALand.

To calculate the total number of mutations per tumor sample (mutation burden), you can simply use Summarize Sample Mutation Count under the Analytics tab in the Land. When you specify individual nucleotide changes, for example "A->C", the result reports the total number of mutations (within a selected GeneSet) found in each sample (from a selected SampleSet).
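
The kind of table this produces can be sketched in a few lines of Python. The sketch below is not the Land implementation: the record layout, the `mutation_burden` function and the toy mutation records are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def mutation_burden(mutations, gene_set=None, sample_set=None):
    """Count mutations per sample, split by nucleotide change such as "A->C".

    `mutations` is an iterable of (sample_id, gene, ref_base, alt_base) tuples.
    Passing None for `gene_set` or `sample_set` disables that restriction.
    """
    burden = defaultdict(Counter)
    for sample_id, gene, ref, alt in mutations:
        if gene_set is not None and gene not in gene_set:
            continue
        if sample_set is not None and sample_id not in sample_set:
            continue
        burden[sample_id][f"{ref}->{alt}"] += 1
    return {sample_id: dict(counts) for sample_id, counts in burden.items()}


# Toy records, for illustration only.
records = [
    ("S1", "TP53", "A", "C"),
    ("S1", "EGFR", "A", "C"),
    ("S1", "IDH1", "G", "A"),
    ("S2", "TP53", "C", "T"),
    ("S3", "KRAS", "G", "T"),
]

table = mutation_burden(records, gene_set={"TP53", "EGFR", "IDH1"}, sample_set={"S1", "S2"})
totals = {sample_id: sum(counts.values()) for sample_id, counts in table.items()}
```

Each sample ends up with one count per nucleotide change, and summing across changes gives the total burden for that sample.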

You can further summarize the data by downloading this TotalMutationBurdenByNTchange table to Array Studio's local analysis. For example, you can add a variable view to better visualize mutation burden across samples:

Mutation burden variable view. Y-axis represents mutation number. X-axis represents different samples.

2. Calculate average mutation burden in each tumor in TCGALand

Using local analysis functions, you can further research mutation burden in each tumor in Land data. The Summarize function allows users to calculate the mean mutation number grouped by tumor type, or by other preferred grouping options.

After stacking the table, you can plot another variable view to visualize the distribution of each type of nucleotide change in each tumor type.

Stack table by row to generate variable view

In this way, we can easily tell which tumor type has the highest mutation burden.
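
For readers who want to reproduce the summarize-and-stack steps outside Array Studio, a minimal Python sketch follows. The column names, the invented counts, and the helper functions `mean_by_group` and `stack` are assumptions for illustration. They mimic the steps described above rather than the Array Studio functions themselves.

```python
from collections import defaultdict
from statistics import mean


def mean_by_group(rows, group_key, value_keys):
    """Average each value column within each group (for example, tumor type)."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[group_key]].append(row)
    return {
        group: {key: mean(row[key] for row in members) for key in value_keys}
        for group, members in grouped.items()
    }


def stack(summary):
    """Reshape the wide summary into long (group, variable, value) rows."""
    return [
        (group, variable, value)
        for group, values in summary.items()
        for variable, value in values.items()
    ]


# Invented per-sample counts, for illustration only.
rows = [
    {"Sample": "S1", "TumorType": "LUAD", "A->C": 40, "C->T": 120},
    {"Sample": "S2", "TumorType": "LUAD", "A->C": 60, "C->T": 80},
    {"Sample": "S3", "TumorType": "LGG", "A->C": 2, "C->T": 10},
]

summary = mean_by_group(rows, "TumorType", ["A->C", "C->T"])
long_rows = stack(summary)
highest = max(summary, key=lambda tumor: sum(summary[tumor].values()))
```

The long format has one row per tumor type and nucleotide change, which is the shape a grouped variable view needs.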

To learn exactly how to perform the above analysis, please watch our OncoLand case study: Visualize "mutation burden" of each tumor in TCGALand.

[OncoLand Case Study] Find genes that are frequently co-mutated with your gene-of-interest: Co-mutation of TP53 and ATRX when IDH1-R132 is mutated

Vivian Zhang

The IDH1 gene encodes isocitrate dehydrogenase, which is involved in NADPH production, especially in the brain. Mutations in IDH1 are frequently found in low-grade and high-grade gliomas: low grade (grade II), anaplastic (grade III), and glioblastoma (GBM, grade IV) (Research Article: IDH1 and IDH2 Mutations in Gliomas). These mutations play an important role in gliomagenesis and thus have clinical interest. We can query OncoLand to learn about IDH1 mutations and other genes that are frequently co-mutated. For details, please refer to our OncoLand case study wiki:

Identify mutation hotspots in a gene of interest

In several cancers, IDH1 is frequently mutated at arginine 132, which alters the enzyme's active site. We can visualize the frequencies of mutations at different sites in each tumor. As we can see, our data confirms that IDH1 arginine 132 is frequently mutated in low grade gliomas (LGG) and glioblastoma (GBM):

TCGALand DNA-Seq Somatic Mutation Site Distribution View. 

The user can create a SampleSet, for example the one shown below, IDH1_mutation, from Analytics | Generate Sample Set | Generate Site Mutation Status SampleSet.

SampleSet: IDH1_mutation

Identify other genes that are co-mutated with your gene of interest

With the SampleSet, we can identify the gene mutations that are correlated through Analytics | Integration Analysis | Sample Grouping to Mutation. The test may take a few minutes if all genes are queried, and the results will be available from the Analytics | Open Result Set menu. From the results table, we can rank genes by the PValue from the Fisher Exact Test to identify the correlated genes, for instance ATRX and TP53 in LGG and GBM:

Analytics | Integration Analysis | Sample Grouping to Mutation Test results. Rank by PValue, filter by only co-occurring gene in LGG and GBM.

Visualize Co-mutation patterns with the Alteration Omicprint

There are several ways to visualize co-mutation frequencies of multiple genes. While the "Alteration Distribution" displays the number of samples mutated in any gene of the GeneSet, "Somatic Co-mutation Frequencies" will display the distribution of samples with different mutation loads. The "Alteration Omicprint" efficiently displays per-sample mutation status of one, ten, or even hundreds of genes. You can also generate custom Omicprints based on custom queries if you want to query mutation status. Please check out our case study tutorial videos to learn how to perform the analysis.

Alteration Omicprint displays gene alteration status for multiple genes for corresponding samples. Custom queries for IDH1 and TP53 somatic mutation status, and BMP2 RNA-Seq FPKM, are created. Next, check out the Custom Query Omicprint view. For each custom query, sample status is displayed. As we can see, samples with mutated IDH1 and TP53 frequently over-express BMP2 in GBM.

Bridging Bioinformatics|Genomics|Genetics Research: 2016 Omicsoft User Group Meeting

Vivian Zhang

 

  • Who Attended:
    • More than 30 leading pharmaceutical and biotech companies. 
    • More than 100 attendees who are experts and scientists in the field of bioinformatics/genomics/genetics.
  • What Occurred:
    • Numerous discussions among attendees on the future of biomarker discovery, as well as best practices of data management, visualization and analysis.

 

  

Omicsoft Corporation successfully held our kick-off Omicsoft User Group Meeting in Cambridge, MA on Wednesday May 4, 2016.

We would like to thank all speakers and attendees, all of whom are extremely important in helping build out our platform successfully. We've received extremely positive feedback from the meeting, and hope to do it again in the future. Feedback on our software and services helps drive our business, and the direct interaction with our customers during the event proved invaluable to us.

Highlights from the meeting:

  • Introduction of GeneticsLand for management of genetics data
  • Introduction to the future SingleCell Land
  • Overview on curation processes
  • Updates on current data subscription Lands

For more details, please visit our 2016 User Group Meeting webpage.

 

Above is just a glimpse of some exciting moments at our meeting. If you missed the meeting, we have uploaded our speaker presentations and videos to our 2016 User Group Meeting webpage.

If you have any questions about the meeting, please contact us.

 

[Land Update] GeneticsLand: Data Warehouse for Variant Level Data

Vivian Zhang

 

What is GeneticsLand?


GeneticsLand is a data warehouse for variant level data and provides a turnkey solution to genetic data storage, analysis, and annotation to facilitate a wide range of genetic-based activities in drug discovery and development. 

It allows users to import and export VCF files, array-based data, imputed array data, eQTL data, association results, and variant annotated data. GeneticsLand is a subscription, adding value regularly with new content after purchase. It is a search engine that not only stores millions of samples but also provides fast, easy and accurate search of variants, genes, chromosomal regions and phenotype data.


Why use GeneticsLand?

 

Big Data: GeneticsLand can store up to one million VCF samples per "Land" (database), assuming 100 million SNPs per sample, and essentially unlimited array-based files. It can support thousands of association studies. At the same time, data storage is compressed, significantly reducing the IT storage burden.

Fast: GeneticsLand can access 100 trillion data points and perform advanced visualizations and dynamic annotation in real time.  Queries of 10 billion data points can occur in much less than 1 second (0.01 seconds in benchmark testing).

Integrated Solution: GeneticsLand will provide a biologist-friendly, high-level client for genomic/genetic data integration (including a future web-based client), accelerating target identification and validation using our sophisticated query and visualization system. Our adjacent products, including OncoLand, ImmunoLand and CVMLand, work together with GeneticsLand to provide comprehensive genomic research tools that empower researchers in all disease fields.
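
As a quick sanity check, the storage and speed figures above can be multiplied out in a few lines of Python. It is an assumption of this sketch that the "100 trillion data points" figure corresponds to one million samples times 100 million SNPs per sample. The post does not state that connection explicitly, and the variable names are illustrative.

```python
# Figures quoted in the text above.
samples_per_land = 1_000_000       # up to one million VCF samples per Land
snps_per_sample = 100_000_000      # assumed 100 million SNPs per sample
total_data_points = samples_per_land * snps_per_sample

points_per_query = 10_000_000_000  # 10 billion data points per query
benchmark_seconds = 0.01           # benchmark time quoted for that query
points_per_second = points_per_query / benchmark_seconds

print(f"{total_data_points:,} genotype data points per Land")
print(f"about {points_per_second:.0e} data points scanned per second in the benchmark")
```

Under these assumptions the product is 10^14, which matches the quoted 100 trillion data points, and the benchmark implies a scan rate on the order of 10^12 data points per second.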

SNEAK PEEK OF GENETICSLAND

 

Search variant/genes for individual genotypes, SNP allele frequencies (population genetics), and annotations:

Access curated public association data: 

Visualize public and private eQTL data:

Correlate numerical clinical variables to genotypes:

How could GeneticsLand help you with your genetics research?

  • Big Picture - centralized storage and search engine
  • Everyday work
    • Search variant/gene for annotations
    • Search frequencies for a given variant/gene/region across projects
    • Access individual genotype data (for data managers) across projects
    • Access public association data (e.g. GRASP2)
    • Access public variant data
    • Access normalized data from any number of studies through exporting
    • Instant region/genome plots for association studies

Stay tuned for more information in the coming days and weeks! 

Contact us at sales@omicsoft.com with any questions or to request a demo.

 

[New Feature] Columns/levels/rows reorder in Land Made Easier

Vivian Zhang

At Omicsoft, we continue to expand and improve variable annotation, meta data and clinical measurement to provide data as comprehensive as possible. For example, in Land, a table or variable view may have hundreds of columns. 

In this case, being able to flexibly order and display columns, rows or levels becomes important. Recently, we improved our columns/levels/rows reorder functions in Land. For example, when the user wants to specify columns:

When there are many, or even hundreds of, listed columns, previously the user could only move a column one position at a time by clicking the up and down arrows. Now, we have implemented two additional buttons that allow the user to move a column to the top or bottom of the list with one click:

By clicking the highlighted buttons, the user can move the selected column up to the top or down to the bottom.
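
The difference between the old and new reorder actions is easy to see in a short Python sketch. The function names and column names below are hypothetical and only model the behavior of the buttons on an ordered list of unique column names.

```python
def move_up(columns, name):
    """Old behavior: swap `name` with the column directly above it."""
    i = columns.index(name)
    reordered = list(columns)
    if i > 0:
        reordered[i - 1], reordered[i] = reordered[i], reordered[i - 1]
    return reordered


def move_to_top(columns, name):
    """New button: move `name` to the first position in one step."""
    if name not in columns:
        raise ValueError(f"unknown column: {name}")
    return [name] + [c for c in columns if c != name]


def move_to_bottom(columns, name):
    """New button: move `name` to the last position in one step."""
    if name not in columns:
        raise ValueError(f"unknown column: {name}")
    return [c for c in columns if c != name] + [name]


columns = ["SampleID", "TumorType", "Gender", "Age", "Stage"]

one_step = move_up(columns, "Stage")     # "Stage" moves up a single position
top = move_to_top(columns, "Stage")      # "Stage" becomes the first column
bottom = move_to_bottom(top, "Stage")    # and goes back to the end
```

With hundreds of columns, reaching the top with the single-step action takes as many clicks as there are columns above the selected one, while the new buttons always take one.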


At Omicsoft, we believe that customer-oriented product design with attention to details determines excellence.



[NEW FEATURE] GENESET ANALYSIS - VISUALIZE EXPRESSION COMPARISON FOR ANY SET OF GENES OF INTEREST

Vivian Zhang

ImmunoLand is Omicsoft's most recently developed Land database. It is an immune-related genomics database and visualization software that helps users explore public and private immune-focused genomics datasets. In ImmunoLand, researchers can search a gene, multiple genes, a pathway, a project or multiple projects. With the recently implemented Gene Set Analysis, users can visualize comparison data for any set of genes of interest.

After the user creates a geneset, for example, GSE26927:

The user can go to the view directory and select Gene Set Analysis.

By selecting the geneset that was just created, the user will get a GeneSet Enrichment Analysis plot, displaying the P-Values of the comparisons whose genes overlap with the selected geneset. As an alternative, the user can search for the geneset from the search gene toolbox.

[New Land Update] GeneticsLand: a turnkey solution for genetic data storage, analysis and annotation

Vivian Zhang

GeneticsLand will provide a turnkey solution to genetic data storage, analysis, and annotation to facilitate a wide range of genetic-based activities in drug discovery and development. It serves as a gateway to bring both internal and external genetic data in one place to allow easy access and interpretation of genetic data.

GeneticsLand utilizes a proprietary data storage framework that enables efficient storage of whole genome sequencing, whole exome sequencing, GWAS and imputed data. Data quality control and analysis adopting up-to-date best practices are implemented in GeneticsLand to allow consistent and rapid genetic data QC and analysis. By linking both current and historical internal data with most-current external databases, GeneticsLand can help researchers interpret genetic analysis results.

GeneticsLand is a tool that will be used for drug target discovery, drug target validation, pharmacogenetic and pharmacogenomics studies. Expected release 2015 Q4. Contact us at sales@omicsoft.com if you are interested in a free trial. Also, talk to us if you have public or private datasets of interest. We would love to customize the Land for your research needs.

GeneticsLand sneak peek:

Figure. Variant Region Plot 

GeneticsLand has the following 6 major modules:

(1)     Genetics data management:

  • Stores array-based or sequencing-based genotype data, imputed dosage data on the server with powerful search/visualization/exporting functions
  • Manages QC results and association reports

(2)    Genetic variant management: manages a huge collection of variant annotation

(3)    Array-based data analysis pipeline: Genetic data analysis pipeline based on GWAS and imputed data

(4)    NGS data analysis pipeline:

  • Uses server, cluster or cloud to run OSA4+GATK or BWA+GATK pipeline
  • Generates QC and VCF from FASTQ in a simple yet powerful pipeline.
  • Includes built-in cloud/server based BAM streaming

(5)    Genetic search engine:

  • Provides gene based search engine for full genetic information – GWAS catalog, eQTL.
  • Makes use of LD information to provide insights for target validation and interpretation of genetic contribution to drug response

(6)    eQTL management: manages public or internal eQTL data in a highly searchable/visual user interface

Screenshots of GeneticsLand prototype:

Table. Variant Annotation

Figure. Variant Filter Directory. (Partial display)

Figure. LDL Association Plot

[New Feature] Sample filtering made easy with new String Filter function

Vivian Zhang

At Omicsoft, we have a continually growing Land user base. The increasing number of public genomic research projects and datasets has made it possible to study public samples with a particular disease, gene mutation or clinical phenotype without spending millions of dollars on experiments. As we continue to improve Land sample search and filter capabilities, we are glad to introduce a new String Filter function that makes it easier to search for multiple samples, genes or other conditions.

For any string variable, whether it is a sample ID, gene name, clinical measurement or something else, the user can filter on multiple strings using the Add String Filter function:

For example, a user who is interested in expression of the EGFR gene and wants to look more closely at a few breast cancer samples with high EGFR expression will likely check the gene FPKM view of EGFR in TCGA Land and identify a few samples:

Samples with high EGFR expression in breast cancer patients are highlighted in pink.

Next, the user can right-click the SampleID filter, as shown in Figure 1, and choose Add String Filter (Select) to select sample ID names:

Or, the user can choose Add String Filter (Input) and simply paste in the sample IDs:

The string filter function applies to all string variable filters. Now, let's get started with fast string filtering on your sample of interest!

Note: Array Studio version requirement: v8.1.0.95 or higher.
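
For readers who want to reproduce the idea on an exported sample table, the logic of a multi-string filter can be sketched in a few lines of Python. This is only an illustration, not how Array Studio implements the feature: the function names, the `SampleID` and `TumorType` column names and the sample IDs below are hypothetical placeholders.

```python
import re
from typing import Dict, List


def parse_pasted_ids(text: str) -> List[str]:
    """Split pasted text on commas, semicolons or whitespace.

    Empty tokens and repeated IDs are dropped; first-seen order is kept.
    """
    ids: List[str] = []
    for token in re.split(r"[,;\s]+", text.strip()):
        if token and token not in ids:
            ids.append(token)
    return ids


def string_filter(rows: List[Dict[str, str]], column: str,
                  wanted: List[str]) -> List[Dict[str, str]]:
    """Keep only the rows whose value in `column` is one of the wanted strings."""
    wanted_set = set(wanted)
    return [row for row in rows if row.get(column) in wanted_set]


# Hypothetical sample table and pasted input (placeholders, not Land data).
samples = [
    {"SampleID": "S-001", "TumorType": "breast"},
    {"SampleID": "S-002", "TumorType": "breast"},
    {"SampleID": "S-003", "TumorType": "lung"},
]
pasted = "S-001\nS-003, S-003"

kept = string_filter(samples, "SampleID", parse_pasted_ids(pasted))
print([row["SampleID"] for row in kept])
```

Exact, case-sensitive matching is assumed here; a real filter may match differently.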

[Feature Update] Checking Sample Details Made Easy: Improved Land Sample TableView Visualization

Vivian Zhang

At Omicsoft, we constantly expand our sample datasets, improve graphical visualization and introduce new features, all with the goal of helping researchers conduct genomic research. In the 2015 Q2 Land update, we formally released ImmunoLand and significantly improved clinical integration. (For more details, please watch our recorded webinar: Omicsoft 2015 Q2 Land Updates.) With more than 100,000 samples of different types of genomic data and hundreds of clinical measurements available, we also improved our user interface for better data display and query. One of these improvements is the new Land sample TableView visualization.

In the previous version, the sample TableView, which displays all sample information including clinical information, looked like this:

The size of each cell was predefined regardless of the length of its content. To check longer content, the user needed to mouse over or manually expand the cell.

With the new user interface, each cell is sized to fit its content, making all the information clear and easy to check at a glance:

At Omicsoft, we strive to keep improving to meet our customers' needs. While you enjoy the convenience of the new TableView, talk to us if you have suggestions or requests for improving our products and services.

New cancer genomics datasets (TCGA and more) with OncoLand's 2015 Q2 Release

Vivian Zhang

PR News Release

Omicsoft Corporation, an industry leader in cancer genomics, bioinformatics, and next generation sequencing storage and analysis, today announced the quarterly release of its OncoLand data service. Watch the Land 2015 Q2 Release Webinar.

Omicsoft Corporation provides a data service and oncology database platform, OncoLand, that focuses on management of both public and customer cancer datasets, including clinical, next generation sequencing, gene expression, copy number, protein, and methylation data. 

In the 2015 Q2 release, the biggest Land update since its introduction in 2013, Omicsoft highlights the following updates:

  • Introduction of a new clinical subsystem
  • 10+ patient-centric views now available
  • Introduction of new Lands, such as GenentechCellLine, and data for more than 10,000 new samples (requires controlled access to the Genentech Cell Line study published in Nature in 2014)
  • Feature updates:
      1. Dynamic correlation (among RNA-Seq, Mutation, CNV and protein expression data)
      2. Viral and bacterial data integration
      3. Sample-centric views
      4. Multiple grouping
      5. Geneset improvement
      6. Land Audit Trail
      7. "Missing data" visualizations

For more details about the feature updates, please watch the Land 2015 Q2 Release Webinar.

Along with the OncoLand release, Omicsoft is pleased to introduce the next release of ImmunoLand. ImmunoLand incorporates public immunology data in disease areas including arthritis, asthma, COPD, IBD (ulcerative colitis, Crohn’s disease), lupus, psoriasis and other skin diseases, infectious diseases and vaccines, and neuroimmune diseases (multiple sclerosis and more).

New users, please contact sales@omicsoft.com for a free trial and consultation. Existing users, please contact support@omicsoft.com for more details.

 

 

 

[Newly implemented features] Show Query Status-Audit Your Queries in Land

Vivian Zhang

Omicsoft has rapidly expanded its Land (OncoLand and ImmunoLand) datasets, and with sample and patient information becoming increasingly important, it is crucial to be able to navigate and filter sample information in order to fine-tune data visualizations. We aim to provide extremely detailed sample and clinical information.

With the powerful sample query ability and hundreds of clinical annotation items available, it can be hard to remember and trace back all the customized filters and queries.

To help navigate through the sample and clinical information jungle, Omicsoft has added a new feature: Show Query Status. 

With this feature, you don’t need to worry about being interrupted while navigating the data and forgetting which filters you applied. The Show Query Status function records it all for you. You can use this information when creating presentations, taking notes for your research, or simply to confirm which filters are applied to your current search query.

Navigate through the jungle faster and replicate your success path! 



Detection of high frequency mutations in tumor suppressors

Matt Newman

In this video, we use OncoLand to find a list of the highest frequency mutations per tumor type across TCGA, limited to a list of known tumor suppressors. This technique can easily be extended to any gene, and the categorization can be across all tumor types, histologies, or any other classification available in the system or that you create. It also applies to all datasets, not just TCGA, so it could easily be applied to ICGC, CCLE, etc.
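
The same kind of tabulation can be sketched outside OncoLand. The Python sketch below is only an illustration under stated assumptions, not how OncoLand computes it: the function name and the record layout of (sample ID, tumor type, gene) are hypothetical, and the sample IDs, tumor type labels and cohort sizes are made-up placeholders. Frequency is taken here to mean the fraction of samples in a tumor type that carry at least one mutation in the gene.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def top_mutated_genes(
    mutations: Iterable[Tuple[str, str, str]],  # (sample_id, tumor_type, gene)
    samples_per_tumor: Dict[str, int],          # cohort size per tumor type
    genes_of_interest: Set[str],                # e.g. a tumor suppressor list
    top_n: int = 3,
) -> Dict[str, List[Tuple[str, float]]]:
    """Rank genes of interest by mutation frequency within each tumor type."""
    # Collect the distinct mutated samples per (tumor type, gene) so that a
    # sample with several mutations in one gene is counted only once.
    mutated: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for sample_id, tumor_type, gene in mutations:
        if gene in genes_of_interest:
            mutated[(tumor_type, gene)].add(sample_id)

    ranked: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for (tumor_type, gene), sample_ids in mutated.items():
        frequency = len(sample_ids) / samples_per_tumor[tumor_type]
        ranked[tumor_type].append((gene, frequency))

    # Highest frequency first; ties are broken alphabetically by gene name.
    return {
        tumor_type: sorted(genes, key=lambda item: (-item[1], item[0]))[:top_n]
        for tumor_type, genes in ranked.items()
    }


# Placeholder records for illustration only (not real mutation calls).
records = [
    ("s1", "TumorA", "TP53"),
    ("s2", "TumorA", "TP53"),
    ("s2", "TumorA", "PTEN"),
    ("s3", "TumorA", "KRAS"),  # not in the gene list, so it is ignored
    ("s4", "TumorB", "RB1"),
    ("s4", "TumorB", "RB1"),   # second mutation in the same sample
]
cohort_sizes = {"TumorA": 4, "TumorB": 2}
suppressors = {"TP53", "PTEN", "RB1"}

result = top_mutated_genes(records, cohort_sizes, suppressors)
for tumor_type, genes in sorted(result.items()):
    print(tumor_type, genes)
```

Swapping the tumor type field for histology or any custom grouping changes the categorization without touching the rest of the sketch.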



Transcript-specific expression of PDLIM5 in prostate cancer vs all other tumors and normal samples

Matt Newman

Recently, I was asked a question by a customer: How can I find transcripts specific to prostate cancer? The added difficulty was that the answer needed to ensure that the transcript's expression was not simply tissue-specific and was confined to prostate cancer.

Using Omicsoft's OncoLand and Land analytics, we were able to accomplish this in a few minutes.

  1. Group samples across all Lands, including GTEx (normal samples) into Prostate cancer vs All.  Any sample from the PRAD grouping was labeled Prostate Cancer, and all other samples were labeled "All".
  2. Use Sample Grouping -> Expression Land Analytics module to find alternatively spliced transcripts across these two groups.
  3. As a further refinement, run the analytical module again, this time comparing just the PRAD tumor samples to the PRAD control normal samples.
  4. Merge the two results in Array Studio, then filter for the set of transcripts that go in the same direction in both comparisons (i.e., PRAD>Control and Prostate Cancer>All) with a mean fold change of >2.
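
The merge-and-filter in step 4 can be sketched in plain Python as follows. This is a minimal illustration, not the Array Studio procedure: the function name and the dictionary layout (transcript ID mapped to fold change) are hypothetical, the transcript IDs and values are made-up placeholders, fold changes are assumed to be linear ratios of the test group over the reference group, and the >2 cutoff is assumed to apply to each comparison separately.

```python
from typing import Dict, List


def consistent_transcripts(
    tumor_vs_normal: Dict[str, float],   # transcript ID -> fold change, PRAD vs control
    prostate_vs_all: Dict[str, float],   # transcript ID -> fold change, prostate vs all
    min_fold_change: float = 2.0,
) -> List[str]:
    """Return transcripts found in both results that are up in both comparisons."""
    # "Merge": keep only the transcripts reported by both comparisons.
    shared = tumor_vs_normal.keys() & prostate_vs_all.keys()
    # "Filter": same direction (up) and above the cutoff in each comparison.
    return sorted(
        transcript
        for transcript in shared
        if tumor_vs_normal[transcript] > min_fold_change
        and prostate_vs_all[transcript] > min_fold_change
    )


# Placeholder fold changes for illustration only (not real results).
prad_vs_control = {"tx_A": 3.1, "tx_B": 2.5, "tx_C": 0.4}
prostate_vs_all = {"tx_A": 4.0, "tx_B": 1.2, "tx_D": 5.0}

selected = consistent_transcripts(prad_vs_control, prostate_vs_all)
print(selected)
```

Here only `tx_A` passes: `tx_B` falls below the cutoff in one comparison, and `tx_C` and `tx_D` each appear in only one result set.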

This gave me a list of transcripts that can be considered alternatively spliced, specific to Prostate Cancer, including transcript uc003htj.4, which I show here.

Prostate cancer specific expression of PDLIM5 transcript across TCGA, ICGC, GTEx, and CGCI RNA-Seq Datasets.

Interestingly, expression of other PDLIM5 transcripts was specific to other tissues, including heart- and muscle-specific transcripts.