A multilingual, cross-domain temporal tagger developed at the Database Systems Research Group at Heidelberg University.

texttechnologylab/heideltime

 
 

About TTLab's Extension of HeidelTime

HeidelTime is one of the most widespread and successful tools for detecting temporal expressions in texts. Since HeidelTime's pattern matching system is based on regular expressions, it can be extended in a convenient way. We present such an extension for the German resources of HeidelTime: HeidelTimeExt. The extension was developed by observing false negatives in real-world texts and various time banks. The gain in coverage is 2.7% or 8.5%, depending on the admitted degree of potential overgeneralization. We describe the development of HeidelTimeExt, its evaluation on text samples from various genres, and share some linguistic observations.
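
To make the idea of extending a regular-expression resource concrete, here is a minimal sketch in plain Java. It does not use HeidelTime's resource format or its actual German rules: the class name `PatternExtensionSketch` and both patterns are hypothetical, invented for illustration. It only shows how widening a pattern can recover a false negative (here, an abbreviated month name).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: these patterns are NOT taken from HeidelTime's German resources.
class PatternExtensionSketch {
    // Hypothetical base pattern: day, full month name, four-digit year ("3. Januar 2022").
    static final Pattern BASE = Pattern.compile(
        "\\b\\d{1,2}\\. (Januar|Februar|Dezember) \\d{4}\\b");

    // Hypothetical extended pattern: also accepts abbreviated months ("14. Feb. 2022").
    static final Pattern EXTENDED = Pattern.compile(
        "\\b\\d{1,2}\\. (Januar|Jan\\.|Februar|Feb\\.|Dezember|Dez\\.) \\d{4}\\b");

    // Collects every substring of the text that the given pattern matches.
    static List<String> findAll(Pattern pattern, String text) {
        List<String> hits = new ArrayList<>();
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        String text = "Treffen am 3. Januar 2022 und am 14. Feb. 2022.";
        System.out.println(findAll(BASE, text));     // [3. Januar 2022]
        System.out.println(findAll(EXTENDED, text)); // [3. Januar 2022, 14. Feb. 2022]
    }
}
```

The base pattern misses "14. Feb. 2022", while the extended one finds it. The trade-off mentioned above applies here as well: the more permissive a pattern becomes, the more likely it is to match strings that are not temporal expressions.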

How to Cite

Andy Luecking, Manuel Stoeckel, Giuseppe Abrami, and Alexander Mehler. 2022. I still have Time(s): Extending HeidelTime for German Texts. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 4723–4728, Marseille, France. European Language Resources Association. [PDF]

BibTeX

@inproceedings{luecking-etal-2022-still,
    title = "{I} still have Time(s): Extending {H}eidel{T}ime for {G}erman Texts",
    author = "Luecking, Andy  and
      Stoeckel, Manuel  and
      Abrami, Giuseppe  and
      Mehler, Alexander",
    editor = "Calzolari, Nicoletta  and
      B{\'e}chet, Fr{\'e}d{\'e}ric  and
      Blache, Philippe  and
      Choukri, Khalid  and
      Cieri, Christopher  and
      Declerck, Thierry  and
      Goggi, Sara  and
      Isahara, Hitoshi  and
      Maegaard, Bente  and
      Mariani, Joseph  and
      Mazo, H{\'e}l{\`e}ne  and
      Odijk, Jan  and
      Piperidis, Stelios",
    booktitle = "Proceedings of the Thirteenth Language Resources and Evaluation Conference",
    month = jun,
    year = "2022",
    address = "Marseille, France",
    publisher = "European Language Resources Association",
    url = "https://aclanthology.org/2022.lrec-1.505/",
    pages = "4723--4728",
    abstract = "HeidelTime is one of the most widespread and successful tools for detecting temporal expressions in texts. Since HeidelTime{'}s pattern matching system is based on regular expression, it can be extended in a convenient way. We present such an extension for the German resources of HeidelTime: HeidelTimeExt. The extension has been brought about by means of observing false negatives within real world texts and various time banks. The gain in coverage is 2.7 {\%} or 8.5 {\%}, depending on the admitted degree of potential overgeneralization. We describe the development of HeidelTimeExt, its evaluation on text samples from various genres, and share some linguistic observations. HeidelTimeExt can be obtained from \url{https://github.com/texttechnologylab/heideltime}."
}

Maven

Installing from GitHub Packages requires Maven to be set up for authentication with GitHub Packages. Add the repository and the dependency to your pom.xml:

<repositories>
  <repository>
    <id>github</id>
    <url>https://maven.pkg.github.com/texttechnologylab/*</url>
    <snapshots>
      <enabled>true</enabled>
    </snapshots>
  </repository>
</repositories>

<dependencies>
  <dependency>
    <groupId>org.texttechnologylab</groupId>
    <artifactId>heideltime</artifactId>
    <version>4.0.4</version>
  </dependency>
</dependencies>

<!-- Authentication is set up in your ~/.m2/settings.xml file; the server id must match the repository id above -->
<servers>
  <server>
    <id>github</id>
    <username>USERNAME</username>
    <password>TOKEN</password>
  </server>
</servers>

Alternatively, add the JitPack repository and the dependency to your pom.xml:

<repositories>
  <repository>
    <id>jitpack.io</id>
    <url>https://jitpack.io</url>
  </repository>
</repositories>

<dependencies>
  <dependency>
    <groupId>com.github.texttechnologylab</groupId>
    <artifactId>heideltime</artifactId>
    <version>4.0.4</version>
  </dependency>
</dependencies>

Original HeidelTime

HeidelTime is a multilingual, domain-sensitive temporal tagger developed at the Database Systems Research Group at Heidelberg University. It extracts temporal expressions from documents and normalizes them according to the TIMEX3 annotation standard. HeidelTime is available as a UIMA annotator and as a standalone version.
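
As a rough illustration of what "extract and normalize according to TIMEX3" means, the following self-contained Java sketch tags one simple English date format and attaches a normalized `value`. It is an assumption-laden toy, not HeidelTime's implementation or API: the class `Timex3Sketch`, its `tag` method, and the single pattern are all hypothetical. Only the `tid`, `type`, and `value` attribute names come from the TIMEX3 standard.

```java
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy example: NOT HeidelTime's API, only the extraction + normalization idea.
class Timex3Sketch {
    static final List<String> MONTHS = List.of(
        "January", "February", "March", "April", "May", "June",
        "July", "August", "September", "October", "November", "December");

    // Matches dates such as "March 5, 2021" (hypothetical pattern for this sketch).
    static final Pattern DATE = Pattern.compile(
        "\\b(" + String.join("|", MONTHS) + ") (\\d{1,2}), (\\d{4})\\b");

    // Wraps each matched date in a TIMEX3 tag with a normalized YYYY-MM-DD value.
    static String tag(String text) {
        Matcher m = DATE.matcher(text);
        StringBuilder out = new StringBuilder();
        int tid = 0;
        while (m.find()) {
            tid++;
            int month = MONTHS.indexOf(m.group(1)) + 1;
            int day = Integer.parseInt(m.group(2));
            String value = String.format(Locale.ROOT, "%s-%02d-%02d", m.group(3), month, day);
            String tagged = "<TIMEX3 tid=\"t" + tid + "\" type=\"DATE\" value=\"" + value + "\">"
                + m.group() + "</TIMEX3>";
            m.appendReplacement(out, Matcher.quoteReplacement(tagged));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(tag("The trial started on March 5, 2021."));
    }
}
```

For the input above, the sketch prints `The trial started on <TIMEX3 tid="t1" type="DATE" value="2021-03-05">March 5, 2021</TIMEX3>.` A real tagger also has to handle relative and underspecified expressions (for example "last Friday"), which this sketch does not attempt.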

HeidelTime currently contains hand-crafted resources for 13 languages: English, German, Dutch, Vietnamese, Arabic, Spanish, Italian, French, Chinese, Russian, Croatian, Estonian and Portuguese. In addition, starting with version 2.0, HeidelTime contains automatically created resources for more than 200 languages. Although these resources are of lower quality than the manually created ones, temporal tagging of many of these languages has never been addressed before. Thus, HeidelTime can be used as a baseline for temporal tagging of all these languages or as a starting point for developing temporal tagging capabilities for them.

HeidelTime distinguishes between news-style documents and narrative-style documents (e.g., Wikipedia articles) in all languages. In addition, colloquial English texts (e.g., tweets and SMS) and English scientific articles (e.g., clinical trials) are supported.

The original HeidelTime can be obtained on GitHub.

Want to see what it can do before you delve in? Take a look at HeidelTime's online demo.
