LLM fine-tuning is becoming a critical capability. As organizations add Large Language Models (LLMs) to their operational systems, monitoring both model performance and model accuracy becomes increasingly important. This post, which covers LLM fine-tuning, is the first in a series on the challenges and techniques of deploying and managing LLMs.
1. Introduction – How does LLM fine-tuning work?
Although large language models (LLMs) have powerful natural-language capabilities, they are costly to train. Fortunately, tools such as Ollama have been developed for efficient deployment and tuning of LLMs. For example, they allow a data scientist to efficiently fine-tune selected parts of an underlying neural network model with new training data.
In this guide, we show how to fine-tune and improve an existing base LLM using new training data. Broadly speaking, fine-tuning consists of the following steps:
- Collect new training data
- Generate an adapter patch
- Patch the existing base model with the adapter patch
2. Collect new training data
We want to fine-tune a base model on a natural-language corpus, such as a chat-conversation excerpt. First, you need to get this training data into an appropriate format.
In the demo later in this guide, we obtain the chat content from the popular guanaco demo dataset. Alternatively, you can supply a custom chat training dataset in a variety of formats, such as JSON (JavaScript Object Notation).
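As a concrete illustration, here is a minimal sketch of loading such a dataset with the Hugging Face datasets library. The dataset name used below (mlabonne/guanaco-llama2-1k, a small guanaco variant often used in demos) is an assumption for illustration and may differ from your setup:

```python
# Minimal sketch: load a chat-style training dataset.
# Assumes the Hugging Face `datasets` library; the dataset name below is a
# commonly used guanaco demo variant (an assumption, not a requirement).
from datasets import load_dataset

dataset = load_dataset("mlabonne/guanaco-llama2-1k", split="train")
print(dataset[0]["text"])  # inspect one formatted chat training example
```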
3. Generate an adapter patch
With the training data in hand, we can now create an adapter patch model. The adapter patch lets us train LLMs efficiently by updating far fewer parameters, which saves computational resources and supports task-specific customization. Obtaining the adapter patch involves the following main steps; we will contextualize them with a demo setup afterwards.
- Load the training data
  - This step loads the training data (to be used in training, Step 4).
- Load the base model
  - This step loads the base model and extracts its tokenizer (to be used in training, Step 4).
  - The tokenizer defines how text is split into ‘token’ units for natural language processing.
- Load parameters
  - This step prepares the PEFT (Parameter-Efficient Fine-Tuning) parameters using the LoRA (Low-Rank Adaptation of Large Language Models) technique, along with the other training parameters (to be used in training, Step 4).
- Run training with the training data, base model, and parameters
  - This step calls the train function of your LLM library to perform the training.
- Save the new model and tokenizer
  - This step saves the model so it can later be used for inference.
- Convert the adapter patch to the appropriate GGML format with Ollama
  - This step converts the adapter model file into the GGML-formatted file required by the merge step that follows.
Let us try a demo setup via JupyterLab as follows [based on the datacamp link]:
- There is a demo notebook “demo.ipynb” in the current directory [our own generic demo link if provided]
The notebook cells corresponding to the main steps are as follows:
1. Load the training data
  - You may modify the new_model variable to your model-name preference
2. Load the base model and tokenizer
3. Load parameters
4. Run training with the training data, base model, and the parameters
5. Save the new model and tokenizer under the new_model name
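Consolidated into one place, these cells might look roughly like the sketch below. It assumes the Hugging Face transformers, peft, and trl libraries (argument names such as dataset_text_field vary across trl versions), and the base-model and dataset names are illustrative assumptions:

```python
# Sketch of the training steps; library APIs and names are assumptions that
# may vary with your transformers/peft/trl versions.
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import SFTTrainer

base_model = "NousResearch/Llama-2-7b-chat-hf"  # illustrative base model
new_model = "llama-2-7b-chat-custom"            # your model-name preference

# Step 1: load the training data
dataset = load_dataset("mlabonne/guanaco-llama2-1k", split="train")

# Step 2: load the base model and its tokenizer
model = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token

# Step 3: PEFT (LoRA) parameters and other training parameters
peft_config = LoraConfig(r=64, lora_alpha=16, lora_dropout=0.1, task_type="CAUSAL_LM")
training_args = TrainingArguments(
    output_dir="./results", num_train_epochs=1, per_device_train_batch_size=4
)

# Step 4: run training
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=peft_config,
    dataset_text_field="text",  # column holding the formatted chat text
    tokenizer=tokenizer,
    args=training_args,
)
trainer.train()

# Step 5: save the adapter model and the tokenizer
trainer.model.save_pretrained(new_model)
tokenizer.save_pretrained(new_model)
```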
After running, the tuned adapter model is saved as a binary file under the specified model directory (e.g. llama-2-7b-chat-custom/adapter_model.bin).
For Step 6, use the Ollama tooling to convert the adapter model file to the GGML file.
This can be encapsulated in a script as follows:
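A minimal sketch follows. It assumes a local clone of the llama.cpp repository, whose convert-lora-to-ggml.py script performed this conversion in older releases; the script's name and location are assumptions that may differ in your version:

```python
# Sketch: convert the saved PEFT adapter to GGML by invoking llama.cpp's
# conversion script (an assumption: present in older llama.cpp releases).
import subprocess

subprocess.run(
    ["python", "llama.cpp/convert-lora-to-ggml.py", "llama-2-7b-chat-custom"],
    check=True,
)
# Expected output: llama-2-7b-chat-custom/ggml-adapter-model.bin
```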
At the end of these steps, we have the GGML-formatted adapter model file (e.g. “ggml-adapter-model.bin”) that we will use to tune the base model.
4. Patch the existing base model with the adapter patch
Given the adapter patch, we can now fine-tune the base model in two steps:
- First, call the merge function of your LLM library to merge the loaded base model with the adapter patch.
  - The Hugging Face PEFT library, for instance, provides a merge_and_unload function for this purpose.
- Then, save the resulting model and tokenizer to your working directory. The model is written as a safetensors serialization file, while the tokenizer is written as JSON.
These steps can be encapsulated in a script.
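Here is a minimal sketch of such a script, assuming the Hugging Face transformers and peft libraries; the model and adapter paths are illustrative:

```python
# Sketch: merge the LoRA adapter into the base model and save the result.
# Assumes transformers + peft; model and adapter paths are illustrative.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained(
    "NousResearch/Llama-2-7b-chat-hf", torch_dtype=torch.float16
)
model = PeftModel.from_pretrained(base, "llama-2-7b-chat-custom")
merged = model.merge_and_unload()  # fold the LoRA weights into the base model

merged.save_pretrained("llama-2-7b-chat-merged")     # writes *.safetensors
tokenizer = AutoTokenizer.from_pretrained("NousResearch/Llama-2-7b-chat-hf")
tokenizer.save_pretrained("llama-2-7b-chat-merged")  # writes the tokenizer JSON
```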
The tuned model can now be used for inference to support your AI tasks.
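As a quick sanity check, you can run a prompt through the merged model, for example with the transformers pipeline API (the path and prompt below are illustrative):

```python
# Quick inference check with the merged model via the transformers pipeline.
from transformers import pipeline

generator = pipeline("text-generation", model="llama-2-7b-chat-merged")
print(generator("What is LLM fine-tuning?", max_new_tokens=64)[0]["generated_text"])
```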
5. Conclusion
In conclusion, we discussed how to fine-tune LLMs with training data drawn from a natural-language corpus, such as a conversation excerpt.
In particular, we learned how to prepare a custom corpus as structured training data (e.g. JSON-formatted data). We then followed a set of steps to efficiently train a base model on that data and obtain an adapter patch. Finally, we merged the base model with the generated adapter patch.
Congratulations, you now understand how LLM fine-tuning works!