GHO: Publishing and Interlinking the Global Health Observatory Dataset


The improvement of public health is one of the main indicators of societal progress. Statistical data for monitoring public health is highly relevant to a number of sectors, such as research (e.g. in the life sciences or economics), policy making, health care, the pharmaceutical industry and insurance. Such data is now available even on a global scale, e.g. in the Global Health Observatory (GHO) of the United Nations' World Health Organization (WHO). GHO comprises more than 50 different datasets and covers all 198 WHO member countries. It is updated as more recent or revised data becomes available or when the methodology changes. However, this data is only accessible via complex spreadsheets, so queries over the 50 datasets, as well as combinations with other datasets, are tedious and require a significant amount of manual work. By making the data available as RDF, we lower the barrier to data re-use and integration.


The converted GHO data is now available at http://gho.aksw.org/. Converting the entire GHO data yielded an RDF dataset containing almost 8 million triples. The following is an example of a single statistical item, the death value of 1098, from the GHO dataset, represented using the Data Cube vocabulary:

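# Prefix declarations added for completeness; the gho: namespace is assumed from the dataset URL.
@prefix qb:   <http://purl.org/linked-data/cube#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix gho:  <http://gho.aksw.org/> .

gho:o1     a               qb:Observation ;
       gho:Country     "Afghanistan" ;
       gho:stat_pop    24076 ;
       gho:Disease     "All Causes" ;
       gho:incidence   18437 .

gho:Country    a    qb:DimensionProperty ;
                    rdfs:label    "Country" .

gho:Disease    a    qb:DimensionProperty ;
                    rdfs:label    "Disease" .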
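
To illustrate how a spreadsheet row might map onto such an observation, here is a minimal Python sketch that turns CSV rows into Data Cube observations serialized as Turtle. It is not the OntoWiki-based process used to produce the actual dataset. The CSV column names, the `gho:` namespace URI and the helper functions are all assumptions made for this example. String cells are emitted as quoted literals so that the output is valid Turtle.

```python
import csv
import io

# Hypothetical namespace, assumed from the dataset URL; the real one may differ.
GHO = "http://gho.aksw.org/"

PREFIXES = (
    "@prefix qb:  <http://purl.org/linked-data/cube#> .\n"
    f"@prefix gho: <{GHO}> .\n"
)

# CSV column name -> (property, literal kind).
# The column names are invented for this sketch.
COLUMNS = {
    "country": ("gho:Country", "string"),
    "population": ("gho:stat_pop", "integer"),
    "disease": ("gho:Disease", "string"),
    "incidence": ("gho:incidence", "integer"),
}


def literal(value: str, kind: str) -> str:
    """Render a CSV cell as a Turtle literal."""
    if kind == "integer":
        return str(int(value))  # raises ValueError on non-numeric cells
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def row_to_observation(index: int, row: dict) -> str:
    """Serialize one CSV row as a qb:Observation in Turtle."""
    lines = [f"gho:o{index} a qb:Observation"]
    for column, (prop, kind) in COLUMNS.items():
        lines.append(f"    {prop} {literal(row[column], kind)}")
    return " ;\n".join(lines) + " .\n"


def csv_to_turtle(text: str) -> str:
    """Convert CSV text with a header row into a Turtle document."""
    reader = csv.DictReader(io.StringIO(text))
    body = "\n".join(
        row_to_observation(i, row) for i, row in enumerate(reader, start=1)
    )
    return PREFIXES + "\n" + body


SAMPLE = "country,population,disease,incidence\nAfghanistan,24076,All Causes,18437\n"
turtle = csv_to_turtle(SAMPLE)
print(turtle)
```

Each row becomes one observation whose subject is numbered in input order (`gho:o1`, `gho:o2`, and so on). A real conversion would more likely mint resources for countries and diseases rather than use plain string literals, so that they can be interlinked with other datasets.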

Further Information

  • This is a short presentation describing how the CSV files were converted to RDF using SCOVO (the Statistical Core Vocabulary) in OntoWiki. SCOVO is a predecessor of the Data Cube Vocabulary, and the conversion process is similar for both.
  • This is a position paper that was accepted for presentation at the Ontologies in Biomedicine and Life Sciences workshop, held in Mannheim, Germany, on September 9-10, 2010.
  • The dataset description has been published here.
  • This dataset is also part of the LODD datasets.

