Training and Improving Quality for One of the World’s Leading AI Tools

Location: California, USA
Sector: General content to train AI
Project: Translating and proofreading online news and magazine articles
Language(s): German > US English

 

Client

 

Our client is a multinational tech company providing services in search engine technology, online advertising, e-commerce and artificial intelligence. They recently launched a new large language model (LLM) that is designed to generate content, answer questions, caption images, translate audio and much more. It’s already being used by approximately 275 million people around the world. Providing high-quality training data for a large language model like our client’s is essential to help it to:

 

  • Learn language patterns from a large dataset
  • Acquire knowledge across a range of subjects
  • Understand how to perform a variety of tasks
  • Recognise cultural references and idioms

 

Challenge / Solution

 

Our task was to translate and proofread around 20,000 words of online content from German into US English within a particularly tight timeframe. Quality was crucial in this pilot project for our long-standing multinational client. We needed to deliver translations that were not only accurate and faithful to the source, but that also sounded natural in the target language, while rendering any cultural references, idioms or humour in a way that would resonate appropriately with native US English speakers.

 

We were provided with around 30 pages of reference material, quality guidelines and specific instructions to ensure the success of the project, as well as a video tutorial outlining the key requirements. Our client emphasised that all content must be translated and proofread by human linguists, as the results would be used to train their AI tool and needed to be free of any translation software bias.

 

A small, trusted team of our most highly qualified translators and proofreaders worked together on this project. We made sure that we followed the instructions and quality guidelines to a tee. Thorough and targeted research was required: some of the articles featured the names of German books and podcasts, while others referenced social media posts. When translating quotes, we followed the original English versions that had been posted online by established news outlets, while adhering to the client’s stylistic instructions.

 

We also made sure to strike the right balance between sticking closely to the source content, without any additions or omissions, and translating freely and creatively to produce natural-sounding translations; this approach was crucial to ensuring that the translations would be effective in training our client’s LLM.

 

We completed the project during an exceptionally busy run-up to Christmas and made sure we were available for linguistic quality assurance (LQA) checks carried out by our client. Our client’s reviewer confirmed that our translations were exceptional in terms of quality.

 

Impact

 

Training data is essential for LLMs such as the tool created by our client. Their AI platform is already being used by millions of people around the world to help them with tasks ranging from composing work emails to writing poetry! Companies that develop their own LLMs need specialised training data to:

 

  • ensure quality
  • differentiate their product from others on the market
  • address specific use cases
  • potentially reduce legal risks compared to using only public data sources

Our client was impressed by our team’s exceptionally well-written and thoroughly researched translations, which allowed them to further refine their LLM in the following ways:

 

  • The texts could be used to teach the software language patterns relating to vocabulary, grammar and sentence structure
  • The translations we provided covered topics ranging from cooking and music to travel and real estate, meaning that our client’s LLM could acquire a vast amount of information on a range of subjects
  • Our fully localised and culturally adapted translations included idioms and other cultural references that the software can use to better understand context and nuance
  • Our factually accurate and thoroughly researched translations provided our client’s LLM with diverse examples on a range of subjects to allow it to make efficient generalisations and respond appropriately to new inputs
  • Thanks to our team’s dedicated work on this project, our client can feel confident in the quality of the training data being fed to its LLM, which will ultimately make it stand out among its competitors

Our team was delighted to take on this project and found the broad range of content fascinating to translate. We felt incredibly proud to be helping our client further refine its world-leading AI tool by providing our usual high-quality translations.

 

Get in touch with us today for a consultation on your AI language strategy: info@ecls-translations.com.
