Studio on the Cloud: Leveraging the Amazon Cloud in a mixed computing environment
Array Suite empowers medium-sized and large pharmaceutical and biotech companies and research institutes to perform state-of-the-art NGS and OMIC analysis with superior accuracy and speed. However, maintaining a server or HPC cluster may not be cost-effective for small organizations or research units that have only modest demand for NGS and OMIC analysis. Even at large pharmaceutical and biotech companies, computing demand fluctuates over time, and many companies have started to use cloud solutions for internal data management and analysis.
Omicsoft has a long-term goal of being data-location agnostic: a customer can keep data locally, within their firewall, as well as store it with a variety of cloud providers. This makes sense in a world where collaborators use different platforms and sharing extremely large datasets safely and efficiently is increasingly important.
Omicsoft's cloud solutions help large and small pharmaceutical and biotech companies and research institutes manage their genomic and clinical data faster, more efficiently, and at lower cost than traditional on-premises computing.
Studio on the Cloud allows you to seamlessly run all Array Studio analytics on Amazon Web Services, combining the storage of S3 (Amazon Simple Storage Service) with the analytical power of EC2 (Amazon Elastic Compute Cloud). Users can easily scale up to any number of instances for each analysis while freely intermixing local, storage-based analytics with cloud-based analytics in Array Suite. A user creates a standard Server Project but, instead of adding data from the server, adds data from the cloud. Folders in S3 buckets are mapped to the ArrayServer folder structure, so the cloud data appears to the user just like server data.
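To make the folder mapping concrete, here is a minimal sketch that translates an S3 object URI into a server-style folder path. The mount root `/CloudData` and the rule "bucket becomes a top-level folder, key becomes the subpath" are hypothetical; the actual mapping rules ArrayServer uses are not described here.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hypothetical folder under which S3 buckets appear in the server folder tree.
CLOUD_ROOT = PurePosixPath("/CloudData")


def s3_to_server_path(s3_uri: str, root: PurePosixPath = CLOUD_ROOT) -> str:
    """Map s3://<bucket>/<key> to <root>/<bucket>/<key> (illustrative rule only)."""
    parsed = urlparse(s3_uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {s3_uri!r}")
    # S3 has no real directories; "/" in an object key acts as the folder separator.
    key = parsed.path.lstrip("/")
    return str(root / parsed.netloc / key)


if __name__ == "__main__":
    print(s3_to_server_path("s3://my-lab-bucket/rnaseq/run01/S1_R1.fastq.gz"))
```

Because the mapping is a pure string transformation, the same S3 folder always resolves to the same server path, which is what lets cloud data sit alongside local data in one project tree.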
Users can select raw data from a cloud folder or any of its subfolders. ArrayServer then launches one machine per sample, each an optimized EC2 instance with Omicsoft software installed and EBS storage attached. The input files in S3 are copied to each instance's EBS storage, the instance receives a message from ArrayServer, and it performs the analysis. When a job finishes, all results are uploaded to an S3 output folder.
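The per-sample workflow above can be sketched as a planning step that groups the selected input files by sample and produces one job description per sample: which S3 files to copy, the local EBS-backed working directory, and the S3 output folder. The file-naming convention (`<sample>_R1/_R2.fastq.gz`), the `/mnt/ebs` working directory, and the `Job` and `plan_jobs` names are all hypothetical; in Studio on the Cloud, ArrayServer performs this orchestration itself.

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List

# Assumed naming convention for paired-end reads: <sample>_R1.fastq(.gz) / <sample>_R2.fastq(.gz)
READ_PATTERN = re.compile(r"^(?P<sample>.+?)_R[12]\.fastq(\.gz)?$")


@dataclass
class Job:
    sample: str
    inputs: List[str]    # S3 keys to copy onto the instance's EBS volume
    local_dir: str       # hypothetical EBS-backed working directory
    output_prefix: str   # S3 folder that receives the results


def plan_jobs(keys: List[str], output_root: str, local_root: str = "/mnt/ebs") -> List[Job]:
    """Group input files by sample and emit one job (one EC2 instance) per sample."""
    by_sample: Dict[str, List[str]] = defaultdict(list)
    for key in keys:
        name = key.rsplit("/", 1)[-1]
        match = READ_PATTERN.match(name)
        if match is None:
            continue  # skip files that do not follow the assumed naming convention
        by_sample[match.group("sample")].append(key)
    return [
        Job(
            sample=sample,
            inputs=sorted(files),
            local_dir=f"{local_root}/{sample}",
            output_prefix=f"{output_root.rstrip('/')}/{sample}/",
        )
        for sample, files in sorted(by_sample.items())
    ]


if __name__ == "__main__":
    selected = ["raw/S1_R1.fastq.gz", "raw/S1_R2.fastq.gz", "raw/S2_R1.fastq.gz", "raw/notes.txt"]
    for job in plan_jobs(selected, "s3://my-lab-bucket/results"):
        print(job)
```

Keeping one job per sample mirrors the one-machine-per-sample design: samples are independent, so each instance can copy, analyze, and upload without coordinating with the others.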
Omicsoft has seen an increasing number of clients implement mixed-mode solutions (a cloud solution in addition to their SGE, PBS, or LSF cluster). Omicsoft integrates cloud and cluster seamlessly, so users can run jobs either on the cloud or on the local cluster, with maximum flexibility, according to their analytical needs. For more details on cloud configuration, logic, and cost comparisons, please see our wiki page "Example of running 100 CCLE samples on cloud".
As its user base grows, Omicsoft continues to improve the cloud implementation. Recently, we significantly improved the data transfer speed between S3 and EC2 by using the AWS Command Line Interface (CLI). For example, downloading the reference library now takes less than 1 minute, compared with up to 20 minutes previously. Because EC2 instances are billed for the time they run, faster transfers shorten analysis time and therefore reduce cost. For more technical details, please contact customer support.
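As a rough illustration of the kind of transfer involved, the snippet below builds AWS CLI commands that copy a reference library from S3 to an instance's local disk and upload results back. `aws s3 sync` and `aws s3 cp --recursive` are standard AWS CLI commands; the bucket, prefixes, local paths, and helper names are hypothetical, and which CLI commands or options Omicsoft actually uses is not stated here.

```python
import shlex
import subprocess
from typing import List


def s3_download_cmd(s3_prefix: str, local_dir: str) -> List[str]:
    # `aws s3 sync` copies only objects that are missing locally or differ in size/timestamp.
    return ["aws", "s3", "sync", s3_prefix, local_dir]


def s3_upload_cmd(local_dir: str, s3_prefix: str) -> List[str]:
    # `aws s3 cp --recursive` uploads every file under local_dir to the S3 prefix.
    return ["aws", "s3", "cp", local_dir, s3_prefix, "--recursive"]


def run(cmd: List[str], dry_run: bool = True) -> None:
    """Print the command; execute it only when dry_run is False (requires the AWS CLI)."""
    print(shlex.join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical locations for a reference library and a per-sample results folder.
    run(s3_download_cmd("s3://my-lab-bucket/references/genome-ref/", "/mnt/ebs/references/genome-ref/"))
    run(s3_upload_cmd("/mnt/ebs/results/S1/", "s3://my-lab-bucket/results/S1/"))
```

The CLI's `s3` commands split large files into multipart transfers and run several requests concurrently (tunable through the `s3.max_concurrent_requests` configuration setting); whether that accounts for the speedup reported above is not stated in the source.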