Fine-Tuning and Costs—Building on Top of a Base Model
Imagine you’re a news organization with a large archive of articles. Now, you want to specialize your newsroom to cover virtual training applications. Rather than rewrite the entire archive, you train your staff to handle this new topic. This process is similar to fine-tuning an LLM, which helps specialize a pre-trained model without starting from scratch.
In this post, we’ll walk through how fine-tuning works, provide cost calculations, and introduce the concept of content pairs for training.
1. What Is Fine-Tuning?
Fine-tuning adjusts a pre-trained LLM (e.g., GPT-3.5) by training it on domain-specific data. It modifies the model’s internal parameters to improve performance on tasks relevant to your business.
Fine-tuning uses content pairs, which consist of:
- Prompt: The input or instruction you want the model to respond to.
- Completion: The output or content you want the model to generate based on the prompt.
2. Example: Fine-Tuning with an Op-Ed Article
Suppose you want to fine-tune a model to generate insightful articles on virtual training applications. Here’s how you might structure your training data:
Prompt and Completion Pair
Prompt:
“Write an 1800-word op-ed about the rise of virtual training applications, focusing on how they are transforming industries like education, healthcare, and corporate training.”
Completion:
“In recent years, virtual training applications have become essential tools for organizations across various industries. As technological advancements accelerate, these platforms are bridging the gap between remote and in-person learning environments…”
This prompt-completion pair serves as one training example, and you may provide multiple articles for fine-tuning.
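Concretely, training data for OpenAI's fine-tuning API is uploaded as a JSONL file with one content pair per line. The legacy format uses prompt/completion objects as shown below (newer chat models use a messages-based format, but the idea is the same). This is a minimal sketch; the example text and the `training_data.jsonl` filename are illustrative:

```python
import json

# Hypothetical prompt/completion pairs for the op-ed example.
content_pairs = [
    {
        "prompt": "Write an 1800-word op-ed about the rise of virtual training applications.",
        "completion": "In recent years, virtual training applications have become essential tools...",
    },
    # ...more article pairs would follow
]

# JSONL: one JSON object per line, which is what the fine-tuning API expects.
with open("training_data.jsonl", "w", encoding="utf-8") as f:
    for pair in content_pairs:
        f.write(json.dumps(pair) + "\n")
```

Each line of the resulting file is one training example, so 100 op-eds would produce a 100-line JSONL file.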
3. Tokenization and Training Data Size
In AI models, both prompts and completions are broken down into tokens. A token can be as short as one character or as long as a word, depending on the text structure. For example, “virtual training” might be two tokens.
| Text | Approx. Tokens |
|---|---|
| "Write an op-ed on virtual training" | 8 tokens |
| "Virtual training is transforming industries." | 6 tokens |
A full 1800-word article plus its prompt typically comes to roughly 2000–2400 tokens (English prose averages about 1.3 tokens per word), so you'll need to account for the token count when calculating training costs.
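For a quick estimate without calling a tokenizer, the ~1.3 tokens-per-word rule of thumb can be sketched as a small helper. This is a heuristic only; exact counts require the model's actual tokenizer (e.g., OpenAI's tiktoken library), and the function name is our own:

```python
def estimate_tokens(text: str, tokens_per_word: float = 1.3) -> int:
    """Rough token estimate for English prose. Real counts require the
    model's tokenizer (e.g., tiktoken); 1.3 tokens/word is a rule of thumb."""
    return round(len(text.split()) * tokens_per_word)

# An 1800-word article lands near the figure used in the cost estimates:
estimate_tokens(" ".join(["word"] * 1800))  # ≈ 2340 tokens
```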
4. Cost Calculation for Fine-Tuning
Let’s assume you’re using OpenAI’s GPT-3.5 model for fine-tuning. At the time of writing, OpenAI charges $0.008 per 1,000 tokens for training. Here’s a cost breakdown:
- Number of Articles: 100 op-eds on virtual training
- Tokens per Article: ~2000 tokens
- Total Tokens:
100 articles x 2000 tokens = 200,000 tokens
Fine-Tuning Cost:
200,000 tokens ÷ 1,000 x $0.008 = $1.60
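The arithmetic can be wrapped in a small helper so you can plug in different corpus sizes. The function name is our own, and the default rate assumes OpenAI's published per-1,000-token GPT-3.5 fine-tuning training price at the time of writing; always check current pricing before budgeting:

```python
def fine_tuning_cost(num_examples: int, tokens_per_example: int,
                     price_per_1k_tokens: float = 0.008) -> float:
    """Estimate one-off training cost. The default rate is an assumption
    based on OpenAI's published GPT-3.5 fine-tuning pricing; verify it."""
    total_tokens = num_examples * tokens_per_example
    return total_tokens / 1000 * price_per_1k_tokens

fine_tuning_cost(100, 2000)  # 200,000 tokens at $0.008/1K → $1.60
```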
Training costs scale linearly with the amount of data and the number of training runs, so it’s important to be selective about your training data.
5. Inference Costs
Generating responses after fine-tuning also incurs costs. Suppose you generate an 1800-token response:
- Usage Cost: $0.002 per 1000 tokens x 1800 tokens = $0.0036 per query.
While this seems low, costs can accumulate with frequent queries.
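To see how per-query costs accumulate, here's a short sketch; the $0.002 per 1,000-token rate comes from the example above, and the monthly query volume is an illustrative assumption (note that fine-tuned models may be priced higher than base models for inference):

```python
def inference_cost(tokens: int, price_per_1k_tokens: float = 0.002) -> float:
    """Per-query cost at the assumed $0.002/1K-token rate; fine-tuned
    model inference may cost more, so check current rates."""
    return tokens / 1000 * price_per_1k_tokens

per_query = inference_cost(1800)   # $0.0036 per query, as above
monthly = per_query * 10_000       # a hypothetical 10,000 queries/month
```

At that volume the same "cheap" query works out to $36.00 per month, and heavier traffic scales proportionally.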
6. OpenAI vs. Open-Source Options (Llama)
OpenAI provides a convenient, service-based fine-tuning option with minimal upfront investment. However, if you’re using open-source models like Llama, you may face different trade-offs:
OpenAI
- Pros: Easy to set up, scalable infrastructure, fast deployment.
- Cons: Costs scale with usage and training data size.
Llama (Open Source)
- Pros: No direct service costs—full control over training and deployment.
- Cons: You need your own hardware (e.g., GPUs) or access to a cloud provider, which can require significant expertise and infrastructure.
Organizations must weigh whether they prefer the convenience of OpenAI or the cost control of managing their own infrastructure with open-source solutions.
7. Managing Costs and Risks
To prevent costs from spiraling, organizations should:
- Limit training data to high-value examples.
- Use data tagging and security standards to ensure sensitive data isn’t embedded unintentionally.
- Consider hybrid approaches where both fine-tuned and real-time retrieval systems work together.
For more details on tokenization, check out our post:
Next Up: What are tokens and why do they matter?
Conclusion
Fine-tuning can provide powerful, specialized AI capabilities, but it comes with costs that scale with your data volume and usage. By understanding content pairs, tokenization, and pricing, businesses can optimize their training pipelines. In the next post, we’ll explore how data tagging and security standards like Global Information Security Standards (GISS) can safeguard AI pipelines from risks related to sensitive data exposure.