LeXFiles and LegalLAMA: Facilitating English Multinational Legal Language Model Development
Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review
Standard
LeXFiles and LegalLAMA: Facilitating English Multinational Legal Language Model Development. / Chalkidis, Ilias; Garneau, Nicolas; Søgaard, Anders; Goantă, Cătălină; Katz, Daniel Martin.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics (ACL), 2023. p. 15513-15535.
RIS
TY - GEN
T1 - LeXFiles and LegalLAMA: Facilitating English Multinational Legal Language Model Development
T2 - 61st Annual Meeting of the Association for Computational Linguistics, ACL 2023
AU - Chalkidis, Ilias
AU - Garneau, Nicolas
AU - Søgaard, Anders
AU - Goantă, Cătălină
AU - Katz, Daniel Martin
N1 - Publisher Copyright: © 2023 Association for Computational Linguistics.
PY - 2023
Y1 - 2023
AB - In this work, we conduct a detailed analysis of the performance of legal-oriented pre-trained language models (PLMs). We examine the interplay between their original objective, acquired knowledge, and legal language understanding capacities, which we define as the upstream, probing, and downstream performance, respectively. We consider not only the models' size but also the pre-training corpora used as important dimensions in our study. To this end, we release a multinational English legal corpus (LeXFiles) and a legal knowledge probing benchmark (LegalLAMA) to facilitate training and detailed analysis of legal-oriented PLMs. We release two new legal PLMs trained on LeXFiles and evaluate them alongside others on LegalLAMA and LexGLUE. We find that probing performance strongly correlates with upstream performance in related legal topics. On the other hand, downstream performance is mainly driven by the model's size and prior legal knowledge, which can be estimated from upstream and probing performance. Based on these findings, we conclude that both dimensions are important for those seeking to develop domain-specific PLMs.
U2 - 10.18653/v1/2023.acl-long.865
DO - 10.18653/v1/2023.acl-long.865
M3 - Article in proceedings
AN - SCOPUS:85173828437
SP - 15513
EP - 15535
BT - Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
PB - Association for Computational Linguistics (ACL)
Y2 - 9 July 2023 through 14 July 2023
ER -