GPT-4 can translate text. In some cases, it translates better than dedicated translation APIs, especially for context-heavy content where understanding the full paragraph matters. But using OpenAI's API as a translation engine is different from using a dedicated translation service, and the differences affect cost, speed, reliability, and integration complexity.
This guide covers the practical aspects of using GPT-4 for translation: how to prompt it, what it costs compared to alternatives, where it excels, and where dedicated translation APIs are a better fit.
How to use GPT-4 for translation
OpenAI doesn't have a dedicated translation endpoint. You send a chat completion request with a translation instruction as the system prompt and the source text as the user message.
Basic prompt
```python
import openai

client = openai.OpenAI()

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {
            "role": "system",
            "content": "You are a translator. Translate the following text from English to German. Return only the translation, no explanations."
        },
        {
            "role": "user",
            "content": "The server returned an error. Please try again later."
        }
    ],
    temperature=0
)

print(response.choices[0].message.content)
# "Der Server hat einen Fehler zurückgegeben. Bitte versuchen Sie es später erneut."
```
Setting temperature to 0 gives you the most deterministic output. For creative content like marketing copy, a slightly higher temperature (0.3-0.5) can produce more natural-sounding translations.
Better prompt with constraints
The basic prompt works for simple sentences. For production use, you need more constraints:
```python
{
    "role": "system",
    "content": "You are a professional translator specializing in software localization. Translate from English to German. Rules: 1) Preserve all HTML tags and attributes exactly. 2) Do not translate text inside code blocks or <code> tags. 3) Preserve all placeholder patterns like {variable} or %s. 4) Use formal register (Sie, not du). 5) Return only the translated text."
}
```
Without explicit rules about HTML tags, placeholders, and code blocks, GPT-4 sometimes translates things it shouldn't. The more constraints you add, the more reliable the output. But you're essentially building a translation engine from prompts, which is work that dedicated translation APIs have already done.
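Because GPT-4 occasionally violates these constraints anyway, it's worth validating the output before accepting it. Below is a minimal post-check sketch; the helper name and the exact set of placeholder patterns are illustrative assumptions, not part of any library:

```python
import re

# Patterns worth verifying after translation: {variable} placeholders,
# printf-style %s/%d, and HTML/XML tags. Extend for your own formats.
PLACEHOLDER_RE = re.compile(r"\{[^}]+\}|%[sd]|<[^>]+>")

def placeholders_preserved(source: str, translation: str) -> bool:
    """Return True if the translation contains exactly the same
    placeholders and tags as the source (order-insensitive)."""
    return sorted(PLACEHOLDER_RE.findall(source)) == \
           sorted(PLACEHOLDER_RE.findall(translation))
```

A mismatch means the model translated, dropped, or altered a placeholder, and the string should be retried or flagged for human review rather than shipped.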
Handling batch translation
Dedicated translation APIs let you send multiple strings in one request. With OpenAI, you can do something similar by formatting strings as a numbered list:
```python
{
    "role": "user",
    "content": "Translate each line separately, maintaining the numbered format:\n1. Save changes\n2. Delete account\n3. Settings\n4. Log out"
}
```
This works but introduces parsing complexity. You need to split the response back into individual strings and handle cases where GPT-4 merges or reorders lines (rare with temperature=0, but it happens).
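The parsing step can be made defensive so that a merged or dropped line triggers a retry instead of silently mis-mapping translations to the wrong strings. A minimal sketch (the function name and error-handling policy are assumptions):

```python
import re

def parse_numbered_response(response: str, expected_count: int) -> list[str]:
    """Split a numbered-list model response back into individual strings.

    Raises ValueError if the model merged, dropped, or renumbered lines,
    so the caller can retry the request instead of shipping bad mappings.
    """
    items: dict[int, str] = {}
    for line in response.strip().splitlines():
        match = re.match(r"^\s*(\d+)\.\s*(.*)$", line)
        if not match:
            raise ValueError(f"Unparseable line: {line!r}")
        items[int(match.group(1))] = match.group(2)
    if sorted(items) != list(range(1, expected_count + 1)):
        raise ValueError(f"Expected items 1..{expected_count}, got {sorted(items)}")
    return [items[i] for i in range(1, expected_count + 1)]
```

With this in place, a response that comes back with three lines instead of four fails loudly rather than shifting every subsequent translation by one.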
Cost comparison
This is where GPT-4 for translation gets expensive. OpenAI charges per token (roughly 4 characters per token for English, fewer for other languages). Translation requires both input and output tokens.
Current GPT-4 pricing (as of early 2026):
- GPT-4: $30 per 1M input tokens, $60 per 1M output tokens
- GPT-4 Turbo: $10 per 1M input tokens, $30 per 1M output tokens
- GPT-4o: $2.50 per 1M input tokens, $10 per 1M output tokens
- GPT-4o-mini: $0.15 per 1M input tokens, $0.60 per 1M output tokens
For translation, output length roughly equals input length (with some language-dependent variation). Converting tokens to characters and including the system prompt overhead:
| Model | Cost per 1M characters | Quality |
|---|---|---|
| GPT-4 | ~$22-35 | Excellent |
| GPT-4 Turbo | ~$10-15 | Excellent |
| GPT-4o | ~$3-5 | Very good |
| GPT-4o-mini | ~$0.20-0.40 | Good |
| Google Translate | $20 | Good |
| DeepL Pro | $25 | Very good |
| Langbly | $1.99-3.80 | Very good |
GPT-4o-mini looks cheap on paper, but the quality gap compared to GPT-4 or dedicated translation APIs is noticeable for complex content. GPT-4 itself is the most expensive option by far. GPT-4o hits a reasonable middle ground but is still 1.5-2.5x more expensive than Langbly.
The system prompt adds overhead on every request. A 200-token system prompt on a 100-token source text means you're paying for 300 input tokens, not 100. For short strings (UI labels, button text), the prompt overhead can double your effective cost.
Latency
GPT-4 is significantly slower than dedicated translation APIs. It generates tokens sequentially, so translation speed depends on output length.
Typical latencies for translating a 200-word paragraph:
- GPT-4: 3-8 seconds
- GPT-4 Turbo / GPT-4o: 1-3 seconds
- Google Translate: 100-300 milliseconds
- DeepL: 200-500 milliseconds
- Langbly: 200-800 milliseconds
For batch translation of documentation or pre-translation workflows, latency doesn't matter much. You're translating in the background anyway. For real-time translation (user types something, sees translation immediately), GPT-4 is too slow. Dedicated APIs respond 10-40x faster.
Where GPT-4 translation is better
Despite the cost and speed disadvantages, there are scenarios where GPT-4 produces better translations than dedicated APIs:
Creative and marketing content
Marketing headlines, taglines, and creative copy require understanding intent and cultural context. "Think Different" needs a culturally resonant translation, not a literal one. GPT-4 handles this better because it understands the persuasive intent behind the words.
Long-form content with document-level context
GPT-4 can process an entire document and maintain context across paragraphs. Traditional NMT engines translate sentence by sentence, which leads to inconsistent pronoun references and terminology across a document. GPT-4 can (usually) keep "she" referring to the same person throughout a 2,000-word article.
Content requiring cultural adaptation
When content needs to be adapted rather than translated (changing a sports reference from baseball to cricket for an Indian audience, for example), GPT-4 can make those adaptations if instructed. Dedicated translation APIs do literal translation and won't adapt cultural references.
Low-resource language pairs
For uncommon language pairs where dedicated translation engines have less training data, GPT-4 sometimes performs better because its training data includes a broader range of content in more languages.
Where dedicated APIs are better
High volume
If you're translating millions of characters monthly, the cost difference between GPT-4 ($3-35/M chars depending on model) and a dedicated translation API ($1.99-20/M chars) adds up fast. At 100M characters per month, GPT-4o costs $300-500; Langbly, at its lowest rate, costs $199.
Structured content
Dedicated translation APIs are built to handle HTML, XML, JSON, and other structured formats. They know not to translate attribute values, not to break tag nesting, and not to mangle placeholder strings. GPT-4 can be instructed to do this, but it occasionally fails, especially with complex nested markup.
Consistency
Send the same sentence to Google Translate twice and you get the same result. Send it to GPT-4 twice, even with temperature=0, and you might get slightly different translations. For software localization, where identical strings must translate identically, this non-determinism is a problem.
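One common mitigation is a translation-memory cache: once a string has been translated, every later occurrence reuses the stored result, so non-determinism can't produce divergent translations of the same string. A minimal sketch (class and method names are illustrative, and `translate_fn` stands in for whatever API call you use):

```python
class TranslationMemory:
    """Cache translations so identical (source, language) pairs always
    map to the same output, regardless of model non-determinism."""

    def __init__(self, translate_fn):
        self._translate = translate_fn  # e.g. a GPT-4 or translation API call
        self._cache: dict[tuple[str, str], str] = {}

    def translate(self, text: str, target_lang: str) -> str:
        key = (text, target_lang)
        if key not in self._cache:
            self._cache[key] = self._translate(text, target_lang)
        return self._cache[key]
```

In production you'd back the cache with a database rather than a dict, but the invariant is the same: the model is only ever asked once per unique string.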
Integration simplicity
Dedicated translation APIs have a simple request-response format: send text and target language, receive translation. With OpenAI, you're building prompt templates, parsing responses, handling token limits, and managing the prompt engineering yourself. It's more engineering work for a worse developer experience.
Rate limits and reliability
OpenAI has rate limits that vary by model and account tier. During peak times, you might hit rate limits or experience increased latency. Dedicated translation APIs have more predictable performance because they're designed for high-throughput translation workloads.
Practical recommendations
Use GPT-4 for translation when:
- You need creative/marketing copy translation with cultural adaptation
- Volume is low (under 1M characters per month)
- Latency doesn't matter (batch processing, not real-time)
- You need document-level context for long-form content
- You're already using OpenAI and want to minimize vendor dependencies
Use a dedicated translation API when:
- Volume is medium to high (over 1M characters per month)
- You need fast response times (real-time or near-real-time)
- You're translating structured content (HTML, JSON, XML)
- Consistency matters (same input should produce same output)
- You want a simpler integration with less maintenance
For most software localization projects, a dedicated translation API is the right choice. The integration is simpler, the cost is lower, the speed is faster, and the output quality is comparable for technical content. If you occasionally need GPT-4 quality for marketing copy, use it selectively for that content type while using a translation API for the bulk of your strings.
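That selective-routing recommendation can be sketched as a simple dispatch function. The engine labels and content-type categories below are illustrative, not any real API's vocabulary:

```python
def pick_engine(content_type: str, monthly_chars: int, realtime: bool) -> str:
    """Route a translation job per the guidelines above.

    content_type: an illustrative label like "marketing", "long-form",
    "ui", or "docs".
    """
    if realtime or monthly_chars > 1_000_000:
        # GPT-4 is 10-40x slower, and volume pricing favors dedicated APIs.
        return "dedicated-api"
    if content_type in ("marketing", "long-form"):
        # Cultural adaptation and document-level context favor GPT-4.
        return "gpt-4"
    # Structured strings, consistency, simpler integration.
    return "dedicated-api"
```

The thresholds are judgment calls; the point is that the routing decision is cheap to encode, so there's no need to pick a single engine for all content.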
Langbly provides context-aware translation quality through a simple REST API at $1.99-$3.80 per million characters. No prompt engineering, no token math, no parsing responses. It's Google Translate v2 compatible, so existing integrations work without changes.