How far can we get with one GPU in 100 hours? CoAStaL at MultiIndicMT Shared Task

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

Rahul Aralikatte
Héctor Ricardo Murrieta Bello
Daniel Hershcovich
Marcel Bollmann
Anders Søgaard

This work shows that competitive translation results can be obtained in a constrained setting by incorporating the latest advances in memory and compute optimization. We train and evaluate large multilingual translation models using a single GPU for a maximum of 100 hours and get within 4-5 BLEU points of the top submission on the leaderboard. We also benchmark standard baselines on the PMI corpus and re-discover well-known shortcomings of translation systems and metrics.

Original language	English
Title of host publication	Proceedings of the 8th Workshop on Asian Translation (WAT2021)
Publisher	Association for Computational Linguistics
Publication date	2021
Pages	205-211
DOIs	https://doi.org/10.18653/v1/2021.wat-1.24
Publication status	Published - 2021
Event	8th Workshop on Asian Translation (WAT2021) - Online Duration: 5 Aug 2021 → 6 Aug 2021

Conference

Conference	8th Workshop on Asian Translation (WAT2021)
Location	Online
Period	5 Aug 2021 → 6 Aug 2021

Department of Computer Science