Amazon Web Services and the U.S. National Institutes of Health said the complete 1000 Genomes Project is now available as a public data set.
The announcement was delivered at the White House Big Data Summit. The move puts the largest collection of human genetics data on AWS.
The collection of genetic variations weighs in at 200 terabytes and includes DNA sequenced from 1,700 people. The 1000 Genomes Project is an international research effort in which 75 companies and organizations are working to catalog the human genome.
With the 1000 Genomes Project data now available on AWS, various groups can access it for research. The project plans to include the genomes of 2,600 people from 26 populations around the world. The NIH will add those samples to the AWS version.
In a statement, NIH program director Lisa D. Brooks, Ph.D., said that the AWS data set will save time. Until now, the 1000 Genomes Project data was typically downloaded from government servers or shipped on disk.
AWS' genome data set is stored on S3 and its Elastic Block Store. That data can be used with Amazon's EC2 and Elastic MapReduce services. Amazon said the public data sets are free to access, but researchers need to pay for the compute resources they use.
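
For readers wondering what "free to access" looks like in practice, objects in a public S3 bucket can generally be fetched over plain HTTPS without AWS credentials. The following is a minimal sketch in Python using only the standard library. The bucket name `1000genomes` and the object key are assumptions made for illustration (the announcement does not name them), and the helper functions are hypothetical, not part of any AWS tooling.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Assumption: the article does not name the bucket; "1000genomes" is used
# here for illustration only.
BUCKET = "1000genomes"


def public_object_url(bucket: str, key: str) -> str:
    """Build a virtual-hosted-style HTTPS URL for an object in a public S3 bucket."""
    # quote() percent-encodes unsafe characters but leaves "/" separators intact.
    return f"https://{bucket}.s3.amazonaws.com/{quote(key)}"


def fetch_head(bucket: str, key: str, num_bytes: int = 1024) -> bytes:
    """Read the first num_bytes of a public object (needs network access)."""
    with urlopen(public_object_url(bucket, key)) as response:
        return response.read(num_bytes)


if __name__ == "__main__":
    # Hypothetical key; substitute a real object path from the data set.
    print(public_object_url(BUCKET, "example/path/sample file.txt"))
```

Only the URL is printed here; calling `fetch_head` would perform the actual download. Reading the data this way incurs no charge for the data set itself, while any EC2 or Elastic MapReduce capacity used to analyze it is billed as usual.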