Reducing the carbon footprint of natural language processing

The GreenNLP project is building resources for sustainable NLP

The recent dramatic advances in natural language processing (NLP) technology, such as neural machine translation (NMT) and large language models (LLMs), are changing the way people work and interact with technology. These new NLP technologies have the potential to increase productivity and levels of automation in a wide variety of fields.

The downside of the new NLP technology is its enormous energy consumption. At a time when energy efficiency has become essential due to the climate crisis, the advances in NLP are vastly increasing the energy usage of the IT sector. The GreenNLP project addresses this issue by developing more environmentally sustainable ways of building and using NLP applications.

News from GreenNLP

New guide: working with LLMs on supercomputers

An important part of the GreenNLP project is to disseminate guides and best practices for the efficient training of large language models on supercomputers. The first version of our guide, “Working with large language models on supercomputers”, has been published on CSC’s documentation site.

                  

13 September 2024

Course on training AI models on LUMI

The LUMI User Support Team (LUST), together with CSC and DeiC, is arranging a two-day workshop on how to use LUMI for training AI models. Participants will get to try out fine-tuning a language model on LUMI and scaling it up to multiple GPUs and multiple nodes.

                  

20 May 2024

Industry and Society Webinar: Sustainable AI Solutions

GreenNLP is present at the Industry and Society Webinar on Sustainable AI Solutions, organized by FCAI on Tuesday, April 16, 2024, from 09:30 to 11:00.

                  

16 April 2024

Benchmarking large-scale LLM training on LUMI

CSC, a partner in the GreenNLP project, has evaluated the scalability of large language model (LLM) training on the LUMI supercomputer. The results indicate that there are no fundamental scaling bottlenecks even when training with thousands of GPUs.

                  

11 January 2024

First Call for papers: MOOMIN

The GreenNLP project is one of the organizers of the first edition of the MOOMIN workshop on Modular and Open Multilingual NLP, to be held at EACL 2024 on March 21 or 22, 2024.

                  

1 November 2023

Areas of research

Data curation

Reducing training costs through data curation and selection

Compact language models

Decreasing runtime costs with compact language models

Compact translation models

Decreasing runtime costs with compact translation models

Efficient computation

Reducing computation with efficient training and inference procedures

Modular NLP

Cost-efficient components with modular multilingual NLP

Reuse and sustainability

Documentation, packaging and distribution

Consortium partners