The project will collect two samples from each individual: one from the tumor and one from normal tissue. Both will be tracked by bar code and DNA tests.
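As a rough illustration of the paired-sample idea, the sketch below groups samples by donor and keeps only donors that have both a tumor and a normal sample. Everything in it is an assumption made for illustration: the `Sample` fields, the bar code strings, and the `pair_samples` helper are hypothetical and do not reflect the project's actual tracking system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    barcode: str   # hypothetical bar code string, not the project's real format
    tissue: str    # "tumor" or "normal"
    donor_id: str  # hypothetical identifier for the individual


def pair_samples(samples):
    """Return {donor_id: (tumor, normal)} for donors that have both tissue types.

    If a donor has several samples of the same tissue type, the last one wins.
    """
    by_donor = {}
    for sample in samples:
        by_donor.setdefault(sample.donor_id, {})[sample.tissue] = sample
    return {
        donor: (tissues["tumor"], tissues["normal"])
        for donor, tissues in by_donor.items()
        if "tumor" in tissues and "normal" in tissues
    }
```

A donor with only a tumor sample is left out of the result, which is one simple way to flag an incomplete pair before any comparison between tumor and normal tissue is attempted.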
Data from the samples will be compiled and combined with other information. The National Cancer Institute has made privacy a big priority.
Data management will become difficult given all the variables in cancer tissue.
Once the data is compiled, it will be stored in a data coordination center. Middleware, specifically JBoss, will be the interface to the various databases involved with the project.
Analyzing the information will take some computing horsepower. The Cancer Genome Atlas plans to tap into other high-performance computing grids to analyze the information.
A committee will determine who has access to specific data on each sample.
One challenge with managing these genetic sequences is agreeing on data definitions.
If this plays out like the project to map human DNA, the public database will lead to more commercial development of cancer-targeting drugs.
The Cancer Genome Atlas is still in the pilot stage. More funding will be needed to expand the analysis to all forms of cancer.