GPT-4 can translate text. In some cases, it translates better than dedicated translation APIs, especially for context-heavy content where understanding the full paragraph matters. But using OpenAI's API as a translation engine is different from using a dedicated translation service, and the differences affect cost, speed, reliability, and integration complexity.
This guide covers the practical aspects of using GPT-4 for translation: how to prompt it, what it costs compared to alternatives, where it excels, and where dedicated translation APIs are a better fit.
How to use GPT-4 for translation
OpenAI doesn't have a dedicated translation endpoint. You send a chat completion request with a translation instruction as the system prompt and the source text as the user message.
Basic prompt
```python
import openai

client = openai.OpenAI()

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {
            "role": "system",
            "content": "You are a translator. Translate the following text from English to German. Return only the translation, no explanations."
        },
        {
            "role": "user",
            "content": "The server returned an error. Please try again later."
        }
    ],
    temperature=0
)

print(response.choices[0].message.content)
# "Der Server hat einen Fehler zurückgegeben. Bitte versuchen Sie es später erneut."
```
Setting temperature to 0 gives you the most deterministic output. For creative content like marketing copy, a slightly higher temperature (0.3-0.5) can produce more natural-sounding translations.
Better prompt with constraints
The basic prompt works for simple sentences. For production use, you need more constraints:
```python
{
    "role": "system",
    "content": "You are a professional translator specializing in software localization. Translate from English to German. Rules: 1) Preserve all HTML tags and attributes exactly. 2) Do not translate text inside code blocks or <code> tags. 3) Preserve all placeholder patterns like {variable} or %s. 4) Use formal register (Sie, not du). 5) Return only the translated text."
}
```
Without explicit rules about HTML tags, placeholders, and code blocks, GPT-4 sometimes translates things it shouldn't. The more constraints you add, the more reliable the output. But you're essentially building a translation engine from prompts, which is work that dedicated translation APIs have already done.
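Because GPT-4 occasionally violates these constraints anyway, it's worth validating the output before accepting it. Below is a minimal post-check sketch; the helper name and the exact set of placeholder patterns are illustrative assumptions, not part of any library:

```python
import re

# Patterns worth verifying after translation: {variable} placeholders,
# printf-style %s/%d, and HTML/XML tags. Extend for your own formats.
PLACEHOLDER_RE = re.compile(r"\{[^}]+\}|%[sd]|<[^>]+>")

def placeholders_preserved(source: str, translation: str) -> bool:
    """Return True if the translation contains exactly the same
    placeholders and tags as the source (order-insensitive)."""
    return sorted(PLACEHOLDER_RE.findall(source)) == \
           sorted(PLACEHOLDER_RE.findall(translation))
```

A mismatch means the model translated, dropped, or altered a placeholder, and the string should be retried or flagged for human review rather than shipped.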
Handling batch translation
Dedicated translation APIs let you send multiple strings in one request. With OpenAI, you can do something similar by formatting strings as a numbered list:
```python
{
    "role": "user",
    "content": "Translate each line separately, maintaining the numbered format:\n1. Save changes\n2. Delete account\n3. Settings\n4. Log out"
}
```
This works but introduces parsing complexity. You need to split the response back into individual strings and handle cases where GPT-4 merges or reorders lines (rare with temperature=0, but it happens).
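The parsing step can be made defensive so that a merged or dropped line triggers a retry instead of silently mis-mapping translations to the wrong strings. A minimal sketch (the function name and error-handling policy are assumptions):

```python
import re

def parse_numbered_response(response: str, expected_count: int) -> list[str]:
    """Split a numbered-list model response back into individual strings.

    Raises ValueError if the model merged, dropped, or renumbered lines,
    so the caller can retry the request instead of shipping bad mappings.
    """
    items: dict[int, str] = {}
    for line in response.strip().splitlines():
        match = re.match(r"^\s*(\d+)\.\s*(.*)$", line)
        if not match:
            raise ValueError(f"Unparseable line: {line!r}")
        items[int(match.group(1))] = match.group(2)
    if sorted(items) != list(range(1, expected_count + 1)):
        raise ValueError(f"Expected items 1..{expected_count}, got {sorted(items)}")
    return [items[i] for i in range(1, expected_count + 1)]
```

With this in place, a response that comes back with three lines instead of four fails loudly rather than shifting every subsequent translation by one.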
Cost comparison
This is where GPT-4 for translation gets expensive. OpenAI charges per token (roughly 4 characters per token for English, fewer for other languages). Translation requires both input and output tokens.
Current GPT-4 pricing (as of early 2026):
- GPT-4: $30 per 1M input tokens, $60 per 1M output tokens
- GPT-4 Turbo: $10 per 1M input tokens, $30 per 1M output tokens
- GPT-4o: $2.50 per 1M input tokens, $10 per 1M output tokens
- GPT-4o-mini: $0.15 per 1M input tokens, $0.60 per 1M output tokens
For translation, output length roughly equals input length (with some language-dependent variation). Converting tokens to characters and including the system prompt overhead:
| Model | Cost per 1M characters | Quality |
|---|---|---|
| GPT-4 | ~$22-35 | Excellent |
| GPT-4 Turbo | ~$10-15 | Excellent |
| GPT-4o | ~$3-5 | Very good |
| GPT-4o-mini | ~$0.20-0.40 | Good |
| Google Translate | $20 | Good |
| DeepL Pro | $25 | Very good |
| Langbly | $1.99-3.80 | Very good |
GPT-4o-mini looks cheap on paper, but the quality gap compared to GPT-4 or dedicated translation APIs is noticeable for complex content. GPT-4 itself is the most expensive option by far. GPT-4o hits a reasonable middle ground but is still 1.5-2.5x more expensive than Langbly.
The system prompt adds overhead on every request. A 200-token system prompt on a 100-token source text means you're paying for 300 input tokens, not 100. For short strings (UI labels, button text), the prompt overhead can double your effective cost.
Latency
GPT-4 is significantly slower than dedicated translation APIs. It generates tokens sequentially, so translation speed depends on output length.
Typical latencies for translating a 200-word paragraph:
- GPT-4: 3-8 seconds
- GPT-4 Turbo / GPT-4o: 1-3 seconds
- Google Translate: 100-300 milliseconds
- DeepL: 200-500 milliseconds
- Langbly: 200-800 milliseconds
For batch translation of documentation or pre-translation workflows, latency doesn't matter much. You're translating in the background anyway. For real-time translation (user types something, sees translation immediately), GPT-4 is too slow. Dedicated APIs respond 10-40x faster.
Where GPT-4 translation is better
Despite the cost and speed disadvantages, there are scenarios where GPT-4 produces better translations than dedicated APIs:
Creative and marketing content
Marketing headlines, taglines, and creative copy require understanding intent and cultural context. "Think Different" needs a culturally resonant translation, not a literal one. GPT-4 handles this better because it understands the persuasive intent behind the words.
Long-form content with document-level context
GPT-4 can process an entire document and maintain context across paragraphs. Traditional NMT engines translate sentence by sentence, which leads to inconsistent pronoun references and terminology across a document. GPT-4 can (usually) keep "she" referring to the same person throughout a 2,000-word article.
Content requiring cultural adaptation
When content needs to be adapted rather than translated (changing a sports reference from baseball to cricket for an Indian audience, for example), GPT-4 can make those adaptations if instructed. Dedicated translation APIs do literal translation and won't adapt cultural references.
Low-resource language pairs
For uncommon language pairs where dedicated translation engines have less training data, GPT-4 sometimes performs better because its training data includes a broader range of content in more languages.
Where dedicated APIs are better
High volume
If you're translating millions of characters monthly, the cost difference between GPT-4 ($3-35/M chars depending on model) and a dedicated translation API ($1.99-20/M chars) adds up fast. At 100M characters per month, GPT-4o costs $300-500; Langbly, at its lowest rate, costs $199.
Structured content
Dedicated translation APIs are built to handle HTML, XML, JSON, and other structured formats. They know not to translate attribute values, not to break tag nesting, and not to mangle placeholder strings. GPT-4 can be instructed to do this, but it occasionally fails, especially with complex nested markup.
Consistency
Send the same sentence to Google Translate twice and you get the same result. Send it to GPT-4 twice, even with temperature=0, and you might get slightly different translations. For software localization, where identical strings must translate identically, this non-determinism is a problem.
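One common mitigation is a translation-memory cache: once a string has been translated, every later occurrence reuses the stored result, so non-determinism can't produce divergent translations of the same string. A minimal sketch (class and method names are illustrative, and `translate_fn` stands in for whatever API call you use):

```python
class TranslationMemory:
    """Cache translations so identical (source, language) pairs always
    map to the same output, regardless of model non-determinism."""

    def __init__(self, translate_fn):
        self._translate = translate_fn  # e.g. a GPT-4 or translation API call
        self._cache: dict[tuple[str, str], str] = {}

    def translate(self, text: str, target_lang: str) -> str:
        key = (text, target_lang)
        if key not in self._cache:
            self._cache[key] = self._translate(text, target_lang)
        return self._cache[key]
```

In production you'd back the cache with a database rather than a dict, but the invariant is the same: the model is only ever asked once per unique string.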
Integration simplicity
Dedicated translation APIs have a simple request-response format: send text and target language, receive translation. With OpenAI, you're building prompt templates, parsing responses, handling token limits, and managing the prompt engineering yourself. It's more engineering work for a worse developer experience.
Rate limits and reliability
OpenAI has rate limits that vary by model and account tier. During peak times, you might hit rate limits or experience increased latency. Dedicated translation APIs have more predictable performance because they're designed for high-throughput translation workloads.
Practical recommendations
Use GPT-4 for translation when:
- You need creative/marketing copy translation with cultural adaptation
- Volume is low (under 1M characters per month)
- Latency doesn't matter (batch processing, not real-time)
- You need document-level context for long-form content
- You're already using OpenAI and want to minimize vendor dependencies
Use a dedicated translation API when:
- Volume is medium to high (over 1M characters per month)
- You need fast response times (real-time or near-real-time)
- You're translating structured content (HTML, JSON, XML)
- Consistency matters (same input should produce same output)
- You want a simpler integration with less maintenance
For most software localization projects, a dedicated translation API is the right choice. The integration is simpler, the cost is lower, the speed is faster, and the output quality is comparable for technical content. If you occasionally need GPT-4 quality for marketing copy, use it selectively for that content type while using a translation API for the bulk of your strings.
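That selective-routing recommendation can be sketched as a simple dispatch function. The engine labels and content-type categories below are illustrative, not any real API's vocabulary:

```python
def pick_engine(content_type: str, monthly_chars: int, realtime: bool) -> str:
    """Route a translation job per the guidelines above.

    content_type: an illustrative label like "marketing", "long-form",
    "ui", or "docs".
    """
    if realtime or monthly_chars > 1_000_000:
        # GPT-4 is 10-40x slower, and volume pricing favors dedicated APIs.
        return "dedicated-api"
    if content_type in ("marketing", "long-form"):
        # Cultural adaptation and document-level context favor GPT-4.
        return "gpt-4"
    # Structured strings, consistency, simpler integration.
    return "dedicated-api"
```

The thresholds are judgment calls; the point is that the routing decision is cheap to encode, so there's no need to pick a single engine for all content.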
Langbly provides context-aware translation quality through a simple REST API at $1.99-$3.80 per million characters. No prompt engineering, no token math, no parsing responses. It's Google Translate v2 compatible, so existing integrations work without changes.