There are tons of LLMs out there. Some, such as OpenAI’s models, are only available via an API. These models are good and fast, but they can get costly if you need to make lots of API calls or process large amounts of data.
Luckily, there are open-weight models that you can run on your own computer or on a supercomputer cluster. This can be a lot cheaper, but it requires some expertise. With a little effort, you can optimize the models’ performance, saving yourself time, money, and electricity!
This page contains advice on how to run LLMs efficiently on local hardware, along with links to other useful resources.
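One of the first practical questions when running a model locally is whether its weights even fit in your (GPU) memory. A rough back-of-the-envelope estimate is parameter count times bytes per parameter; the sketch below illustrates this for a hypothetical 7-billion-parameter model at a few common quantization levels (the specific sizes and bit widths are just examples, and real memory use is somewhat higher due to activations and the KV cache):

```python
def model_memory_gib(n_params: float, bits_per_param: int) -> float:
    """Approximate memory needed for the weights alone, in GiB."""
    return n_params * bits_per_param / 8 / 2**30

# Example: a 7-billion-parameter model at different precisions.
for bits in (16, 8, 4):
    print(f"{bits}-bit: {model_memory_gib(7e9, bits):.1f} GiB")
# 16-bit: 13.0 GiB
# 8-bit: 6.5 GiB
# 4-bit: 3.3 GiB
```

This is why quantized models are popular for local inference: halving the bits per parameter roughly halves the memory footprint, often with only a modest loss in output quality.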
More to come…