In the past we ran some very rough experiments with fine-tuning LegalLMs from https://huggingface.co/collections/joelniklaus/legallms-65303ccfc2f20ed637f17cb6, mostly https://huggingface.co/joelniklaus/legal-swiss-roberta-large. Fine-tuning such a model (or another LM) could work better than our simple pipeline of embed -> PCA -> LogisticRegression. The input for the model could simply be a concatenation of the consultation title, the consultation description, the organisation name, and the beginnings of some documents such as DRAFT and REPORT.
Fine-tuning these models requires a GPU.
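
The input concatenation described above could be sketched roughly as follows. This is only an illustration, not existing project code: the function name `build_model_input`, its parameters, the `documents` mapping keyed by document type, the `"DRAFT: "` style prefix, the separator, and the 2000-character cut-off are all assumptions. The model's tokenizer would still truncate the result to its maximum sequence length.

```python
from typing import Mapping, Sequence


def build_model_input(
    title: str,
    description: str,
    organisation: str,
    documents: Mapping[str, str],
    doc_types: Sequence[str] = ("DRAFT", "REPORT"),  # assumed document type keys
    max_doc_chars: int = 2000,  # assumed cut-off for "the beginning" of a document
    separator: str = "\n\n",  # assumed field separator
) -> str:
    """Concatenate consultation metadata and document beginnings into one string."""
    parts = [title.strip(), description.strip(), organisation.strip()]
    for doc_type in doc_types:
        text = documents.get(doc_type, "").strip()
        if text:
            # Keep only the beginning of each document and label it with its type.
            parts.append(f"{doc_type}: {text[:max_doc_chars]}")
    # Skip empty fields so missing metadata does not leave blank segments.
    return separator.join(part for part in parts if part)


# Made-up example values, for illustration only.
example = build_model_input(
    title="Example consultation title",
    description="Example consultation description.",
    organisation="Example organisation",
    documents={"DRAFT": "Text of the draft ...", "REPORT": "Text of the report ..."},
)
```

The resulting string could then be tokenized and passed to a sequence classification head on top of the chosen LM. Which fields to include and how much of each document to keep would need to be tuned experimentally.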