The Cancer Genome Atlas (TCGA) database aims to characterize the changes that occur in genes due to cancer. Knowledge of such changes can be central to predicting a patient's life expectancy and to choosing the medication, or sequence of medications, that should be administered to ensure his or her survival. Until now, experts who needed this data had to wait in long data queues and write dedicated tools to analyze it.

LinkedTCGA remedies these problems with 20.4 billion triples that form a five-star Linked Data representation of the data made publicly available by TCGA. Researchers now need only a small set of SPARQL queries to gather this data in a standard format (RDF). LinkedTCGA publishes different types of data (clinical, exon, methylation, miRNA, etc.). The data for each tumor is distributed across three different SPARQL endpoints. Federated queries over these endpoints are answered by TopFed, a query engine dedicated to the TCGA data that was shown to significantly outperform state-of-the-art approaches (see publications). More information on the LinkedTCGA data can be found on the project website.

