Palmetto: Palmetto is a quality measuring tool for topics

Palmetto is a quality measuring tool for topics based on coherence calculations.


Topic modeling makes it possible to extract topics from a collection of documents automatically and without supervision. A disadvantage is that, in most cases, the resulting topics have to be evaluated manually by humans. Palmetto is a tool that helps researchers by offering different coherence calculations for a topic's top words. These coherences are based on word co-occurrences in the English Wikipedia and have been shown to correlate with human ratings.
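To make the idea of a co-occurrence-based coherence more concrete, the following is a minimal sketch and not Palmetto's actual implementation. It assumes a tiny in-memory corpus instead of the English Wikipedia, estimates probabilities from document-level co-occurrence, and averages the normalized pointwise mutual information (NPMI) over all pairs of top words. The function name `npmi_coherence` and the toy documents are hypothetical and chosen only for illustration.

```python
import math
from itertools import combinations


def npmi_coherence(top_words, documents):
    """Mean NPMI over all pairs of a topic's top words (toy sketch).

    Probabilities are estimated as the fraction of documents that contain
    a word, or both words of a pair. This is an illustrative assumption,
    not the estimation procedure used by Palmetto.
    """
    if len(top_words) < 2:
        raise ValueError("need at least two top words")
    if not documents:
        raise ValueError("need at least one document")

    doc_sets = [set(doc) for doc in documents]
    n_docs = len(doc_sets)

    def prob(*words):
        # Fraction of documents that contain every given word.
        hits = sum(1 for doc in doc_sets if all(w in doc for w in words))
        return hits / n_docs

    scores = []
    for w1, w2 in combinations(top_words, 2):
        p1, p2, p12 = prob(w1), prob(w2), prob(w1, w2)
        if p12 == 0.0:
            # The words never co-occur: use the NPMI lower bound.
            scores.append(-1.0)
        elif p12 == 1.0:
            # Both words occur in every document, so NPMI is undefined (0/0);
            # scoring this as 0 is an assumption of this sketch.
            scores.append(0.0)
        else:
            pmi = math.log(p12 / (p1 * p2))
            scores.append(pmi / -math.log(p12))
    return sum(scores) / len(scores)


# Hypothetical toy corpus: each document is a list of tokens.
documents = [
    ["cat", "dog", "pet"],
    ["cat", "dog", "vet"],
    ["stock", "market", "trade"],
    ["stock", "market", "bank"],
]

coherent = npmi_coherence(["cat", "dog", "pet"], documents)
mixed = npmi_coherence(["cat", "stock", "pet"], documents)
print(f"coherent topic: {coherent:.3f}")
print(f"mixed topic:    {mixed:.3f}")
```

In this toy setting, a topic whose top words tend to appear in the same documents receives a higher score than a topic that mixes words from unrelated documents, which is the kind of signal a coherence is meant to provide.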

The source code is dual-licensed and can be found on GitHub. For larger experiments, the program can be downloaded or the web service can be used. More on how Palmetto can be used can be found on this wiki page.

A Dutch index for Palmetto has been created by van der Zwaan, Marx and Kamps. Thus, Palmetto can be used for Dutch as well. The index can be downloaded here.

Researchers who want to try out different coherences themselves may be interested to know that Palmetto can be used as a Java library and already contains more than 200,000 coherences that were evaluated for the publication Exploring the Space of Topic Coherences.

The topics and human ratings used in this publication, as well as the Movie and RTL-Wiki corpora, can be found here. Since we did not create all datasets ourselves, please cite the creators/providers of the datasets where appropriate. You can find the references to their publications in our paper, in the section that describes the datasets.


News

DBpedia @ Google Summer of Code – GSoC 2017 (2017-03-13T11:12:50+01:00 Christopher Schulz)

DBpedia, one of InfAI’s community projects, will be part of the 5th Google Summer of Code program. GSoC aims to bring students from all over the globe into open source software development.

New GERBIL release v1.2.5 – Benchmarking entity annotation systems (2017-03-10T11:49:51+01:00 by Ricardo Usbeck)

Dear all, the Smart Data Management competence center at AKSW is happy to announce GERBIL 1.2.5.

DBpedia Open Text Extraction Challenge – TextExt (2017-03-09T12:15:57+01:00 Christopher Schulz)

DBpedia, a community project affiliated with the Institute for Applied Informatics (InfAI) e.V., extracts structured information from Wikipedia & Wikidata. DBpedia has now started the DBpedia Open Text Extraction Challenge – TextExt.

The USPTO Linked Patent Dataset release (2017-02-24T17:18:51+01:00 by Mofeed Hassan)

Dear all, we are happy to announce the release of the USPTO Linked Patent Dataset. Patents are widely used to protect intellectual property and serve as a measure of innovation output.

Two accepted papers in ESWC 2017 (2017-02-22T17:43:38+01:00 by Dr. Mohamed Ahmed Sherif)

Hello Community! We are very pleased to announce the acceptance of two papers in the ESWC 2017 research track. ESWC 2017 will be held in Portoroz, Slovenia, from the 28th of May to the 1st of June.