Amazon Web Services has announced that it will provide space for hosting and analyzing data collected by the 1000 Genomes Project, an international effort to collect and catalog a vast amount of genetic information from anonymous donors worldwide. Amazon will foot the bill to host the roughly 200 terabytes of information, but will charge researchers to use its cloud computers if they want to analyze specific sets of data. The company certainly stands to make money on the deal, but its computing tools may also give groups with limited computing resources of their own greater access to the data.
The 1,700 DNA sequences from the Project are currently hosted by multiple institutions worldwide, including the National Institutes of Health and the European Bioinformatics Institute. However, Amazon's service will let researchers import the data directly into Amazon Elastic Compute Cloud and Amazon Elastic MapReduce for analysis, removing the need to download the huge data set to a local computer. "Putting the data in the cloud provides a tremendous opportunity for researchers around the world who want to study large-scale human genetic variation but lack the computer capability to do so," says Richard Durbin, the Project's co-founder. The project is part of a larger Obama Administration plan for "Big Data," which is meant to expand the resources currently available for accessing, organizing, and analyzing large collections of information.