Palmetto: a quality measuring tool for topics

Palmetto is a quality measuring tool for topics based on coherence calculations.


With Topic Modeling it is possible to extract topics from a collection of documents automatically and without supervision. A disadvantage of Topic Modeling is that in most cases the created topics have to be evaluated manually by humans. Palmetto is a tool that tries to help researchers by offering different coherence calculations for a topic's top words. These coherences are based on word co-occurrences in the English Wikipedia and have been shown to correlate with human ratings.
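To make the idea of a co-occurrence-based coherence concrete, here is a minimal sketch that scores a topic's top words with the average normalized pointwise mutual information (NPMI) over all word pairs. It is an illustration only and not Palmetto's implementation: the function name `npmi_coherence`, the tiny `documents` corpus, and the two example topics are all hypothetical, and Palmetto itself derives its counts from a Wikipedia index rather than from a toy list like this one.

```python
import math
from itertools import combinations

def npmi_coherence(top_words, documents):
    """Average NPMI over all pairs of top words.

    Probabilities are estimated from boolean document co-occurrence:
    a word (or word pair) counts once per document it appears in.
    This is an illustrative sketch, not Palmetto's implementation.
    """
    if len(top_words) < 2:
        raise ValueError("need at least two top words to form a pair")

    doc_sets = [set(doc) for doc in documents]
    n_docs = len(doc_sets)

    def prob(*words):
        # Fraction of documents that contain all of the given words.
        return sum(all(w in d for w in words) for d in doc_sets) / n_docs

    scores = []
    for w1, w2 in combinations(top_words, 2):
        p1, p2, p12 = prob(w1), prob(w2), prob(w1, w2)
        if p12 == 0:
            # Words never co-occur: NPMI takes its lower bound of -1.
            scores.append(-1.0)
            continue
        pmi = math.log(p12 / (p1 * p2))
        denom = -math.log(p12)
        # denom is 0 only if both words occur in every document;
        # treat that degenerate case as "no information" (0.0).
        scores.append(pmi / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)

# Hypothetical toy corpus standing in for a real reference corpus.
documents = [
    ["cat", "dog", "pet"],
    ["cat", "dog", "animal"],
    ["dog", "pet", "animal"],
    ["stock", "market", "trade"],
    ["stock", "market", "price"],
    ["market", "trade", "price"],
]

coherent_topic = ["cat", "dog", "pet"]
mixed_topic = ["cat", "market", "pet"]

print("coherent:", npmi_coherence(coherent_topic, documents))
print("mixed:   ", npmi_coherence(mixed_topic, documents))
```

In this toy setting the topic whose words tend to appear in the same documents receives a higher score than the topic that mixes unrelated words, which is the behaviour a coherence measure is meant to capture.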

The source code is dual-licensed and can be found on GitHub. For larger experiments, the program can be downloaded or the web service can be used. More on how Palmetto can be used can be found on this wiki page.

A Dutch index for Palmetto has been created by van der Zwaan, Marx and Kamps. Thus, Palmetto can be used for Dutch as well. The index can be downloaded here.

For researchers who want to try out different coherences themselves, it may be of interest that Palmetto can be used as a Java library and already contains more than 200,000 coherences that were evaluated for the publication "Exploring the Space of Topic Coherences".

The topics and human ratings used in this publication as well as the Movie and RTL-Wiki corpora can be found here. Since we did not create all datasets ourselves, please cite the creators/providers of the datasets where appropriate. You can find the references to their publications in our paper, in the section that describes the datasets.

