Palmetto

Palmetto is a quality measuring tool for topics based on coherence calculations.

Topic modeling makes it possible to extract topics from a collection of documents automatically and without supervision. A disadvantage is that, in most cases, the resulting topics have to be evaluated manually by humans. Palmetto helps researchers by offering different coherence calculations for a topic's top words. These coherences are based on word co-occurrences in the English Wikipedia and have been shown to correlate with human ratings.
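
To make the idea of a co-occurrence-based coherence concrete, the following is a minimal sketch in Python. It is not Palmetto's implementation or API. It assumes one commonly used measure, normalized pointwise mutual information (NPMI) averaged over all pairs of top words, and it counts co-occurrences at the document level in a tiny made-up corpus rather than in Wikipedia. The function name `npmi_coherence` and the toy documents are hypothetical and exist only for illustration.

```python
import math
from itertools import combinations


def npmi_coherence(top_words, documents):
    """Average NPMI over all pairs of top words.

    Probabilities are estimated from boolean document co-occurrence:
    P(w) is the fraction of documents containing w, and P(w1, w2) is
    the fraction of documents containing both words.
    """
    if len(top_words) < 2:
        raise ValueError("need at least two top words")

    doc_sets = [set(doc) for doc in documents]
    n_docs = len(doc_sets)

    def prob(*words):
        hits = sum(1 for d in doc_sets if all(w in d for w in words))
        return hits / n_docs

    scores = []
    for w1, w2 in combinations(top_words, 2):
        p1, p2, p12 = prob(w1), prob(w2), prob(w1, w2)
        if p12 == 0.0:
            # The words never co-occur: NPMI reaches its lower bound.
            scores.append(-1.0)
        elif p12 == 1.0:
            # Both words occur in every document: PMI is 0 and the
            # normalizer is 0, so treat the pair as neutral.
            scores.append(0.0)
        else:
            pmi = math.log(p12 / (p1 * p2))
            scores.append(pmi / -math.log(p12))
    return sum(scores) / len(scores)


# Hypothetical toy reference corpus (tokenized documents).
toy_docs = [
    ["cat", "dog", "pet"],
    ["cat", "dog"],
    ["dog", "pet"],
    ["stock", "market"],
    ["stock", "market", "trade"],
    ["cat", "market"],
]

coherent = npmi_coherence(["cat", "dog", "pet"], toy_docs)
mixed = npmi_coherence(["cat", "market", "pet"], toy_docs)
print(coherent > mixed)
```

In this sketch, a topic whose top words tend to appear in the same documents receives a higher score than a topic that mixes unrelated words. A real evaluation would need a large reference corpus for the co-occurrence counts to be meaningful.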

The source code is dual-licensed and can be found on GitHub. For larger experiments, the program can be downloaded or the web service can be used. More on how Palmetto can be used can be found on this wiki page.

A Dutch index for Palmetto has been created by van der Zwaan, Marx and Kamps. Thus, Palmetto can be used for Dutch as well. The index can be downloaded here.

For researchers who want to try out different coherences themselves, Palmetto can be used as a Java library. It already contains more than 200,000 coherences that were evaluated for the publication "Exploring the Space of Topic Coherences".

The topics and human ratings used in this publication, as well as the Movie and RTL-Wiki corpora, can be found here. Since we did not create all of the datasets ourselves, please cite the creators or providers of the datasets where appropriate. You can find the references to their publications in the section of our paper that describes the datasets.

News

DBpedia Tutorial @ Knowledge Graph Conference 2021 (2021-04-09T13:20:50+02:00 by Julia Holze)

On May 4, 2021, we will organize a tutorial at the Knowledge Graph Conference (KGC) 2021.

DBpedia @ Google Summer of Code program 2021 (2021-03-15T09:41:22+01:00 by Julia Holze)

DBpedia, one of InfAI’s community projects, will participate in the Google Summer of Code (GSoC) program for the 10th time. The GSoC program aims to bring students from all over the globe into open source software development.

DBpedia’s New Website (2021-01-28T12:42:40+01:00 by Julia Holze)

We are proud to announce the completion of the new DBpedia website.

SANSA 0.7.1 (Semantic Analytics Stack) Released (2020-01-17T09:52:41+01:00 by Prof. Dr. Jens Lehmann)

We are happy to announce SANSA 0.7.1, the seventh release of the Scalable Semantic Analytics Stack. SANSA employs distributed computing via Apache Spark and Flink to enable scalable machine learning, inference, and querying for large knowledge graphs.

More Complete Resultset Retrieval from Large Heterogeneous RDF Sources (2019-12-05T15:46:09+01:00 by Andre Valdestilhas)

Over recent years, the Web of Data has grown significantly. Various interfaces such as LOD Stats, LOD Laundromat and SPARQL endpoints provide access to hundreds of thousands of RDF datasets, representing billions of facts.