horas in time expressions is lemmatized horas and not featurized #58

@AngledLuffa

Here, we have constructions like:

# sent_id = CESS-CAST-AA-20000203-2543-s1
# text = El partido aplazado de la vigésima tercera jornada de la Liga de División de Honor de fútbol sala Playas Castellón-Caja San Fernando se disputará el martes 29 de febrero a las 20.45 horas.
30      a       a       ADP     _       _       32      case    32:case _
31      las     el      DET     _       Definite=Def|Gender=Fem|Number=Plur|PronType=Art        32      det     32:det  _
32      20.45   20.45   NUM     _       NumForm=Digit|NumType=Card      26      compound        26:compound     _
33      horas   horas   NOUN    _       _       26      compound        26:compound     Entity=CESSCASTAA200002032543c2)|SpaceAfter=No

In GSD, we instead get:

# sent_id = es-train-005-s288
# text = En concreto, la alerta se ha activado para la Cordillera Cantábrica leonesa y la comarca zamorana de Sanabria, en las que permanecerá en vigor entre las 00.00 y las 15.00 horas del martes.
31      las     el      DET     _       Definite=Def|Gender=Fem|Number=Plur|PronType=Art        33      det     _       _
32      15.00   15.00   NUM     _       NumForm=Digit|NumType=Card      33      nummod  _       _
33      horas   hora    NOUN    _       Gender=Fem|Number=Plur  29      conj    _       _
34-35   del     _       _       _       _       _       _       _       _
34      de      de      ADP     _       _       36      case    _       _
35      el      el      DET     _       Definite=Def|Gender=Masc|Number=Sing|PronType=Art       36      det     _       _
36      martes  martes  NOUN    _       _       29      nmod    _       SpaceAfter=No

I don't care too much which standard we follow, but it would be nicer for the models if horas had the same lemma in all contexts. If it were up to me, I'd follow the GSD model and lemmatize everything as hora.
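
For anyone who wants to check how widespread the mismatch is, here is a minimal sketch in plain Python that counts the lemmas assigned to a given form in CoNLL-U text and rewrites them to a single lemma. The function names (`iter_word_rows`, `lemma_counts`, `normalize`) and the two-row sample are hypothetical and made up for illustration. Filling an empty FEATS column with `Gender=Fem|Number=Plur` is an assumption based on the GSD annotation above, not a confirmed guideline.

```python
from collections import Counter


def iter_word_rows(conllu_text):
    """Yield the 10 tab-separated columns of every ordinary word line."""
    for line in conllu_text.splitlines():
        cols = line.split("\t")
        # Comments and blank lines never have 10 columns; multiword ranges
        # (34-35) and empty nodes (8.1) have a non-integer ID.
        if len(cols) == 10 and cols[0].isdigit():
            yield cols


def lemma_counts(conllu_text, form="horas"):
    """Count which lemmas are assigned to `form` (case-insensitive)."""
    return Counter(
        cols[2] for cols in iter_word_rows(conllu_text) if cols[1].lower() == form
    )


def normalize(conllu_text, form="horas", lemma="hora",
              feats="Gender=Fem|Number=Plur"):
    """Set one lemma for `form`; fill FEATS only where it is empty ("_")."""
    out = []
    for line in conllu_text.splitlines():
        cols = line.split("\t")
        if len(cols) == 10 and cols[0].isdigit() and cols[1].lower() == form:
            cols[2] = lemma
            if cols[5] == "_":
                cols[5] = feats  # assumed features, copied from the GSD example
            line = "\t".join(cols)
        out.append(line)
    return "\n".join(out)


# Hypothetical two-sentence sample mirroring the two annotations above.
sample = "\n".join([
    "# sent_id = example-1",
    "\t".join(["33", "horas", "horas", "NOUN", "_", "_",
               "26", "compound", "26:compound", "SpaceAfter=No"]),
    "",
    "# sent_id = example-2",
    "\t".join(["33", "horas", "hora", "NOUN", "_", "Gender=Fem|Number=Plur",
               "29", "conj", "_", "_"]),
])

print(lemma_counts(sample))
print(lemma_counts(normalize(sample)))
```

Running `lemma_counts` over each treebank file before and after `normalize` would show whether any `horas` lemma remains. The sketch leaves dependency relations untouched, so the `compound` vs. `nummod` difference seen above would still need a separate decision.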
