GreenNLP

Reuse and sustainability

Last updated: January 08, 2024

Documentation

  • Working with large language models on supercomputers
  • Measuring energy usage of GPU jobs

Open source software

GitHub workspaces

Project partners in GreenNLP publish their software packages and tools in the following GitHub workspaces:

  • Helsinki-NLP
  • TurkuNLP
  • CSC - IT Center for Science

Language modelling

  • TurkuNLP Megatron-DeepSpeed fork
  • LLM scaling on LUMI

Machine translation

  • OpusTools for accessing parallel data in OPUS
  • OpusFilter for parallel data curation
  • OPUS-MT for deploying pre-trained MT models
  • OPUS-MT-train for training NMT models
  • OPUS-MT dashboard
  • MAMMOTH for modular multilingual NMT

Tools for high-performance computing

  • Machine learning scripts
  • GPU energy usage counter for AMD/ROCm

Open data sets

Monolingual data

  • The HPLT data sets

Parallel data

  • OPUS - the open parallel corpus
  • The Tatoeba translation challenge

Benchmarks

  • MT testsets

Open models

Language models

  • TurkuNLP models on Hugging Face
  • Poro 34B from LumiOpen

Translation models

  • OPUS-MT models (Marian-NMT-based)
  • OPUS-MT models (converted to PyTorch/Transformers)
  • Marian-NMT models from the Tatoeba translation challenge

© 2025 GreenNLP