Amazon Web Services now hosting massive genetic database: the 1,000 Genomes Project

Amazon Web Services has announced it will be hosting 200 terabytes of genomic data from the 1,000 Genomes Project for free. Researchers will pay to use Amazon's cloud computing tools to analyze the data.

1,000 Genomes Project

Amazon Web Services has announced that it will host data collected by the 1,000 Genomes Project, an international effort to collect and catalog a vast amount of genetic information from anonymous donors worldwide, and will offer tools for analyzing it. Amazon will foot the bill for storing the roughly 200 terabytes of information, but will charge researchers who use its cloud computers to analyze specific sets of data. The company certainly stands to make money on the deal, but its computing tools may also give groups with limited computing resources of their own greater access to the data.

The Project's 1,700 DNA sequences are currently hosted by multiple institutions worldwide, including the National Institutes of Health and the European Bioinformatics Institute. Amazon's service, however, will let researchers import the data directly into Amazon Elastic Compute Cloud and Amazon Elastic MapReduce for analysis, removing the need to download the huge data set to a local computer. "Putting the data in the cloud provides a tremendous opportunity for researchers around the world who want to study large-scale human genetic variation but lack the computer capability to do so," says Richard Durbin, the Project's co-founder. The hosting effort is part of a larger Obama Administration plan for "Big Data" (PDF), which is meant to expand the resources currently available for accessing, organizing, and analyzing large collections of information.