Join our upcoming webinar “Deriving Business Value from LLMs and RAGs.”
Register now
llm fine-tuning superannotate databricks

Increasingly, enterprises building systems using large language models (LLMs) are exploring fine-tuning their LLMs. They usually do this when encountering one or more of the following problems.

  • The LLM needs domain-specific knowledge and help understanding jargon.
  • It’s hard to control the model output using just prompts.
  • The LLM has a high inference cost. This can be due to its large size or the fact that it requires many tokens for context in the prompt.

LLM fine-tuning takes pre-trained LLMs and trains them on smaller, specific datasets. Fine-tuning refines their capabilities and improves performance in a particular task or domain. You can fine-tune the language model with either continued pre-training or instruction fine-tuning. Continued pre-training is when the model is trained on more unstructured domain-specific data. In instruction fine-tuning, on the other hand, the model is trained on a set of carefully curated prompt-completion pairs.

In this blog post, we'll explore the benefits of LLM fine-tuning, the challenges companies face in doing so, and how SuperAnnotate's collaboration with Databricks offers a solution.

Fine-tuning benefits

Fine-tuning provides several benefits to organizations:

  • It allows the model to gain in-depth proprietary knowledge and understanding of jargon.
  • The model learns to mimic the style of the examples. This gives you more control of the behavior compared to prompt engineering.
  • It can reduce the inference costs of the model in two ways. It reduces the need for long prompts with context. It can also allow a smaller model to do the task that a larger model did before.

Fine-tuning challenges for enterprises

The increased need for fine-tuning has led several cloud data platforms to build dedicated tooling. Databricks, which works with Mosaic ML, is one of the companies that provides LLM fine-tuning tools on their data platform. (You can hear more about fine-tuning and pre-training large language models on Databricks here. These tools make fine-tuning easier, but two key things still prevent most enterprises from using them.

The first challenge is that fine-tuning tools assume that data in the required formats is ready to be trained on. Unfortunately, most companies do not have data in the required formats. Instruction fine-tuning teaches the model to mimic the style and tone of the training dataset. It is, therefore, essential that the data used for training strictly follows the intended behavior of the final model. Often, this requires creating new data from scratch using staff with internal know-how. In cases like customer support questions and answers, data might already exist. Still, this might contain data the model should not learn, such as promising refunds or accessing personal information. Therefore, the data needs to be cleaned and filtered before training.

The second of these blockers is model evaluation and red-teaming. (Red-teaming is a process where you try to find prompts that elicit unwanted responses, thereby identifying areas of improvement). Many platforms for fine-tuning AI models offer metrics and comparisons of prompts before and after fine-tuning for evaluation. However, metrics alone don't capture everything, and comparisons only available to data engineers aren't enough for making decisions about using the models across a whole company. The importance of thorough testing is highlighted by notable failures, like when an Air Canada chatbot mistakenly promised a refund that a court confirmed, and a chatbot for a Chevrolet dealership advised customers to consider rival brands.

Databricks x SuperAnnotate integration

Databricks and SuperAnnotate have worked together to make creating datasets and evaluating models easier by integrating our platforms. SuperAnnotate provides a powerful platform for creating, managing, and refining datasets with tools that support teamwork, whether with colleagues or external partners, to prepare datasets optimized for training models. The platform also enables direct integration with LLMs for detailed evaluations. It supports red-teaming to catch and correct any problematic behavior in models before they go live.

For more details on how to set up and use the integration, read our documentation or watch the video below.

Recommended for you

Stay connected

Subscribe to receive new blog posts and latest discoveries in the industry from SuperAnnotate