Try finetuning a language model for consultation topic classification #22

Description

@vitawasalreadytaken

In the past we ran some very rough experiments training LegalLMs from https://huggingface.co/collections/joelniklaus/legallms-65303ccfc2f20ed637f17cb6, mostly https://huggingface.co/joelniklaus/legal-swiss-roberta-large. Finetuning one of these (or another LM) could work better than our current simple pipeline of embed -> PCA -> LogisticRegression. The model input could simply be a concatenation of the consultation title, consultation description, organisation name, and the beginnings of some documents such as DRAFT and REPORT.
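As a rough illustration of the input concatenation described above, here is a minimal sketch. The `Consultation` class, its field names, the `build_model_input` helper, and the character limit are all hypothetical and not taken from the project's code; only the list of concatenated pieces comes from the description.

```python
from dataclasses import dataclass, field


@dataclass
class Consultation:
    # Hypothetical record; the field names are assumptions, not the project's schema.
    title: str
    description: str
    organisation_name: str
    # Maps a document type such as "DRAFT" or "REPORT" to its full text.
    documents: dict[str, str] = field(default_factory=dict)


def build_model_input(
    consultation: Consultation,
    document_types: tuple[str, ...] = ("DRAFT", "REPORT"),
    max_chars_per_document: int = 2000,  # arbitrary illustrative limit
    separator: str = "\n\n",
) -> str:
    """Concatenate title, description, organisation name and document beginnings."""
    parts = [
        consultation.title,
        consultation.description,
        consultation.organisation_name,
    ]
    for doc_type in document_types:
        text = consultation.documents.get(doc_type, "")
        # Keep only the beginning of each document.
        parts.append(text[:max_chars_per_document])
    # Skip empty pieces so missing fields do not leave stray separators.
    return separator.join(p.strip() for p in parts if p and p.strip())
```

Truncating by characters is only a crude stand-in here. Since the encoder has a fixed maximum sequence length, a real implementation would more likely budget the pieces by tokens using the model's tokenizer.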

A GPU is required for finetuning these models.
