How AI translation evolved and what’s next

Richard Davies | Oct 25, 2023 1:15:06 PM

Using new generative AI tools like GPT to automate work is increasingly popular, driven by the push to improve efficiency. Automating translation is our area of expertise at Guildhawk.

Revolutionising Work with Humanity

Whether it’s industrialising translation of property leases to convert data into a new asset or creating multilingual scripts for Digital Humans, the possibilities are endless.

Clients often ask how we evolved from a human translation business in 2001 into developers of cutting-edge AI translation technologies. Today we have our own in-house developers, a vast data lake to train safe AI and a global network of qualified humans. This shows how much Guildhawk has transformed.

To understand why we evolved into a data-driven translation business over 20 years, and what’s coming next, you must go back 300 years or so.

I’ve summarised the major developments in Machine Translation that shaped where we are today and, importantly, how AI will evolve.

The Concept of Machine Translation

Machine Translation is the task of using software to translate a source text into a target language.

The goal is to produce an accurate translation that preserves meaning and ensures the output is culturally sensitive. Achieving accurate translations has long been a challenge in computational linguistics.

The concept of machine translation is not a new one. Philosophical discussions and experiments in this area have been ongoing since the 9th century. The Arabic cryptographer Al-Kindi developed a systematic process for translation using frequency analysis, probability, and statistics, principles still applied in modern machine translation (Wikipedia, n.d.).

The topic was revisited in the 17th century. Real progress began in the early 20th century, which saw the utilisation of term dictionaries and the management of simple grammatical structures between languages (Hutchins, 2014).

The first computer-based, automatic machine translation systems were proposed around the start of the 1950s, building on notable contributions made by Warren Weaver in 1949 (Locke & Booth, 1955).

This topic was widely researched in universities around the world, with significant success in many American institutions. Building on these research efforts, Georgetown University and IBM showcased in 1954 the first computer software capable of automatic translation between two languages, specifically from Russian to English (the Georgetown-IBM experiment).

This sparked considerable public interest and opened up large investment opportunities. However, subsequent reports indicated that systems based solely on term dictionaries struggled to resolve semantic ambiguity (Hutchins, 2004).

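A well-known example from Yehoshua Bar-Hillel illustrates this point: "Little John was looking for his toy box. Finally, he found it. The box was in the pen." (Bar-Hillel, 1960). A human reader knows that "pen" here means a playpen rather than a writing instrument, because a toy box cannot fit inside the latter, but a dictionary-based system has no such knowledge of the world to draw on.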
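
To make this failure mode concrete, here is a minimal sketch of a dictionary-only translator. The tiny English-to-French glossary and the function name `translate_word_by_word` are illustrative assumptions, not a reconstruction of any historical system. Because the look-up always takes the first listed sense, it cannot use context to decide what "pen" means.

```python
# Toy English-to-French glossary (an illustrative assumption, not real system data).
# Each word maps to its candidate senses, with the assumed most common sense first.
GLOSSARY = {
    "the": ["le"],
    "box": ["boîte"],
    "was": ["était"],
    "in": ["dans"],
    "pen": ["stylo", "parc"],  # writing pen vs. playpen
}


def translate_word_by_word(sentence: str) -> list[str]:
    """Translate each word in isolation, always taking the first listed sense."""
    words = sentence.lower().rstrip(".").split()
    # Words missing from the glossary are passed through unchanged.
    return [GLOSSARY.get(word, [word])[0] for word in words]


result = translate_word_by_word("The box was in the pen.")
print(result)  # the last word comes out as "stylo", the writing instrument
```

The look-up has no way to prefer "parc" here, which is exactly the kind of ambiguity Bar-Hillel described.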

These issues have largely been addressed in recent years, thanks to the advent of statistical machine translation and the emergence of large language models.

History of Machine Translation

Statistical Machine Translation (SMT)

Statistical Machine Translation became ubiquitous in the mid-2000s after the release of Google Translate (Och, 2005). Since then, its quality has improved significantly each year. However, it still faces challenges such as semantic ambiguity, contextual awareness, and the intricacies of cultural sensitivities.

These limitations necessitate human post-editing of translations. This symbiotic relationship between machine and human post-editors will likely be the future of machine translation for the long term.

Throughout the early years of the Cold War, the United States and the Soviet Union, both investing heavily in warfare, space travel, and nuclear capabilities, turned their attention to the early stages of dictionary-based machine translation.

Their aim was to benefit from faster dissemination of scientific knowledge, bridge communication gaps, and reduce the reliance on human interpreters, who were difficult to train and costly (Hutchins & Somers, 1992).

During this time, machine translation was still in its early stages, relying on term dictionary look-ups and simple grammar rule matching, which resulted in literal translations (Slocum, 1985).

Innovation of Rule-based Translation (RBMT)

The heavy investment during this stage led to the innovation of rule-based translation, which utilised linguistic rules to translate from the source to the target language (Arnold et al., 1994).

These rules, designed by expert translators, could translate terms between languages while also considering grammatical structures, syntax, and simple semantic patterns (Handbook of Natural Language Processing, n.d.). The design of these rule-based systems was costly and required deep domain expertise.
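
As a rough illustration of what such a rule looks like, the sketch below combines a dictionary look-up with a single hand-written reordering rule for English to Spanish. The lexicon, the part-of-speech tags and the function name `translate_with_rules` are hypothetical and chosen only for this example; a real rule-based system would need many thousands of rules.

```python
# Hypothetical mini-lexicon: English word -> (Spanish word, part of speech).
LEXICON = {
    "the": ("el", "DET"),
    "red": ("rojo", "ADJ"),
    "fast": ("rápido", "ADJ"),
    "car": ("coche", "NOUN"),
}


def translate_with_rules(sentence: str) -> str:
    """Look up each word, then apply one hand-written reordering rule.

    Raises KeyError for words outside the toy lexicon.
    """
    tagged = [LEXICON[word] for word in sentence.lower().split()]
    output = []
    i = 0
    while i < len(tagged):
        # Rule: English ADJ + NOUN becomes Spanish NOUN + ADJ.
        if i + 1 < len(tagged) and tagged[i][1] == "ADJ" and tagged[i + 1][1] == "NOUN":
            output += [tagged[i + 1][0], tagged[i][0]]
            i += 2
        else:
            output.append(tagged[i][0])
            i += 1
    return " ".join(output)


print(translate_with_rules("the red car"))  # el coche rojo
```

Every new construction (gender agreement, plurals, idioms) needs another rule like this one, which is where the cost and brittleness come from.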

This meant that costs rose steeply as the number of languages increased, because each additional language pair needed its own set of rules (Rule Based Systems, n.d.).

For the most part, these rule-based techniques produced good translations. However, they were brittle and struggled with idiomatic expressions, colloquialisms, and cultural nuances. Compared to human translations, the outputs often lacked fluency and were considered stiff (Arnold et al., 1994).

In the late 1980s, research began to shift away from brittle Rule-Based Machine Translation (RBMT) systems towards Statistical Machine Translation (SMT) (Hutchins & Somers, 1992).

Leading figures of this era included Peter Brown and Stephen Della Pietra, who advanced statistical models trained to estimate the probability that a word or phrase in one language is the translation of a word or phrase in another (Brown, Della Pietra, Della Pietra & Mercer, 1993).
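
The idea of learning word translation probabilities from data can be sketched in a few lines. The example below is a simplified version of the expectation-maximisation training used for the simplest word-based model described by Brown et al. (1993), often called IBM Model 1. It omits the NULL word and other details, and the three-sentence German-English corpus and the function name are illustrative assumptions.

```python
from collections import defaultdict

# Toy parallel corpus (German -> English); illustrative data only.
CORPUS = [
    ("das haus".split(), "the house".split()),
    ("das buch".split(), "the book".split()),
    ("ein buch".split(), "a book".split()),
]


def train_word_translation_probs(corpus, iterations=10):
    """Estimate t[(target, source)] = P(target word | source word) with EM."""
    target_vocab = {e for _, tgt in corpus for e in tgt}
    # Start from a uniform distribution over the target vocabulary.
    t = defaultdict(lambda: 1.0 / len(target_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (target, source) pairs
        total = defaultdict(float)  # expected count of each source word
        for src, tgt in corpus:
            for e in tgt:
                # E-step: share each target word across the source words
                # in proportion to the current probabilities.
                norm = sum(t[(e, f)] for f in src)
                for f in src:
                    share = t[(e, f)] / norm
                    count[(e, f)] += share
                    total[f] += share
        # M-step: re-estimate probabilities from the expected counts.
        for (e, f), c in count.items():
            t[(e, f)] = c / total[f]
    return t


probs = train_word_translation_probs(CORPUS)
for pair in [("house", "haus"), ("the", "haus"), ("the", "das")]:
    print(pair, round(probs[pair], 3))
```

No rule says that "haus" means "house". The model infers it because "das" also appears with "the" in another sentence, so "house" is the better explanation for "haus".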

The statistical methods showed significant performance improvements over rule-based systems, as they were trained on large bilingual datasets rather than human-defined rules (Koehn, 2010). However, Statistical Machine Translation had its drawbacks.

Although it produced statistically plausible translations, they were often not grammatically or contextually correct (Koehn, 2010). Its performance was also limited on much longer sentences (Callison-Burch et al., 2012).

Neural Machine Translation (NMT)

Neural Machine Translation (NMT) is the most recent approach to Machine Translation and has significantly improved translation performance over Statistical Machine Translation (SMT) techniques.

Neural Machine Translation employs deep neural network architectures pioneered by Geoffrey Hinton, Ilya Sutskever, Yann LeCun, and Jürgen Schmidhuber (Hinton, Osindero & Teh, 2006; Sutskever, Vinyals & Le, 2014; Schmidhuber, 2015).

These architectures consist of networks of neurons that learn a statistical mapping from source to target data across a vast bilingual corpus, resulting in translations with greater fluency and contextual awareness (Bahdanau, Cho & Bengio, 2014).

The Transformer architecture introduced by Vaswani et al. (2017), from which models like BERT and GPT stem, is now foundational for modern Neural Machine Translation systems.
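
At the heart of the Transformer is scaled dot-product attention, which lets the model weigh every source word when producing each output word. The sketch below computes it for a single query vector in plain Python. The two-dimensional vectors are made-up toy values, and real systems work on large matrices with learned parameters.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [x / total for x in exps]


def attention(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scale = math.sqrt(len(query))
    # Similarity of the query to each key, scaled by sqrt(dimension).
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    # The output is the weighted average of the value vectors.
    dim = len(values[0])
    context = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return weights, context


# Toy example: the query resembles the first key more than the second.
weights, context = attention(
    query=[1.0, 0.0],
    keys=[[1.0, 0.0], [0.0, 1.0]],
    values=[[10.0, 0.0], [0.0, 10.0]],
)
print(weights, context)
```

The first weight comes out larger than the second, so the output leans towards the first value vector. This is how the model focuses on the most relevant words in context.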

However, despite the considerable performance enhancements, Neural Machine Translation has its drawbacks. These colossal models are computationally demanding, requiring extensive resources for training.

They often generate generic and safe translations that may overlook nuances, and they can perpetuate biases found in large datasets. To address these challenges, the current approach often involves Human-in-the-Loop post-editing (Green, Chuang, Heer & Manning, 2014).

Real-time Translation Services

Machine Translation (MT) has seen monumental advances in recent years, enabling real-time translation services across various industries. Google pioneered this service with Google Translate (Wikipedia, 2022), and Microsoft quickly incorporated real-time translation into its products, such as Skype and Teams, significantly improving international communication (Bojar et al., 2018).

The availability of real-time translation has boosted intercontinental communication and reduced language barriers. The latest techniques suggest that machines can learn to translate between a pair of languages without being trained on a bilingual corpus for that specific pair, an approach known as Zero-Shot learning (Johnson et al., 2017).
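
In the system described by Johnson et al. (2017), a single multilingual model is told which language to produce by an artificial token placed at the start of the input, such as `<2es>` for Spanish. The sketch below shows only that input-preparation step. The function name is a hypothetical helper, and no translation model is included.

```python
def tag_for_target_language(source_sentence: str, target_lang: str) -> str:
    """Prefix the source sentence with a token naming the desired target language."""
    return f"<2{target_lang}> {source_sentence}"


# The same model input format is used for every language pair.
print(tag_for_target_language("Hello, how are you?", "es"))
print(tag_for_target_language("Hello, how are you?", "ja"))
```

Because the target language is just another input token, the model can be asked for a pair it never saw together in training, which is what makes zero-shot translation possible.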

Despite the enticing marketing, these models have been exposed to far more multilingual translation examples than a typical bilingual dataset contains, since they are trained on vast amounts of internet data (Brown et al., 2020).

The potential to train models on the entire collective knowledge of the human species will surely entice researchers to keep expanding the datasets used for modelling.

The Importance of Post-editing

Post-editing is the process of reviewing and refining machine-translated content to ensure the resulting translation is accurate, considers wider contextual information, and is culturally appropriate for the target audience.

Despite significant progress in recent years due to transformer-based neural machine translation models (Vaswani et al., 2017) and large language models like OpenAI's GPT (Brown et al., 2020), humans are still needed to account for linguistic and semantic nuances, cultural context, and idiomatic expressions (Koehn, 2017). Machines currently struggle with these edge cases (Specia & Farzindar, 2010; Guerberof Arenas, 2009).

Recognising this market gap, many translation companies now offer post-editing services, leading to reduced translation costs and faster content delivery (Plitt & Masselot, 2010).
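
One simple way to gauge how much work post-editing involved is to count the word-level edits between the machine output and the post-edited text. The sketch below uses a plain word-level edit distance. The function names and the normalisation by post-edited length are assumptions for illustration, and this is not the metric used in the studies cited above.

```python
def word_edit_distance(machine_output: str, post_edited: str) -> int:
    """Count word insertions, deletions and substitutions between two texts."""
    a, b = machine_output.split(), post_edited.split()
    previous = list(range(len(b) + 1))
    for i, word_a in enumerate(a, start=1):
        current = [i]
        for j, word_b in enumerate(b, start=1):
            cost = 0 if word_a == word_b else 1
            current.append(min(
                previous[j] + 1,         # delete word_a
                current[j - 1] + 1,      # insert word_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


def post_edit_rate(machine_output: str, post_edited: str) -> float:
    """Edits per word of the post-edited text (0.0 means nothing was changed)."""
    reference_length = max(len(post_edited.split()), 1)
    return word_edit_distance(machine_output, post_edited) / reference_length


mt = "The box was in the pen"
edited = "The box was in the playpen"
print(word_edit_distance(mt, edited), round(post_edit_rate(mt, edited), 3))
```

A low rate suggests the machine output needed only light correction, while a high rate suggests the editor largely rewrote it.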

This symbiosis between humans and machines will become even more closely intertwined in the future.

The Future of Generative AI techniques

Within the next few years, the synergy between human translators and artificial intelligence agents will become more pronounced. Translation quality will improve significantly, and translation times will decrease substantially as past translation memories are used to enhance the decision-making abilities of the artificial intelligence agents.
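
Reusing past translations usually starts with finding the stored segment closest to the new sentence. The sketch below does a fuzzy look-up with Python's standard `difflib` module. The two stored segment pairs, the 0.7 threshold and the function name are illustrative assumptions rather than a description of any particular product.

```python
from difflib import SequenceMatcher

# Hypothetical translation memory: previously approved source -> target pairs.
TRANSLATION_MEMORY = {
    "The tenant shall pay the rent monthly.": "Le locataire paie le loyer chaque mois.",
    "The lease begins on 1 January.": "Le bail commence le 1er janvier.",
}


def best_memory_match(sentence: str, memory: dict[str, str], threshold: float = 0.7):
    """Return (score, source, target) for the closest stored segment, or None."""
    best = None
    for source, target in memory.items():
        # ratio() is 1.0 for identical strings and falls towards 0.0 as they differ.
        score = SequenceMatcher(None, sentence.lower(), source.lower()).ratio()
        if score >= threshold and (best is None or score > best[0]):
            best = (score, source, target)
    return best


match = best_memory_match("The tenant shall pay the rent quarterly.", TRANSLATION_MEMORY)
print(match)
```

A close match like this could be shown to a human translator, or passed to an AI agent as context, so that only the changed words need attention.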

This advancement will be driven by generative AI techniques, such as large language models like OpenAI's GPT. Real-time multi-modal translations – encompassing text, images, and audio – will become the norm.

Earphones that cancel out the original speech and allow the wearer to hear the individual speaking in their chosen language will emerge.

Videos will be translated into multiple languages within minutes, facilitating the broader distribution of content and knowledge across international boundaries.

The future of translation will be shaped by machines and humans collaborating to tackle challenges neither can address alone.

Guildhawk 2101 AD

In conclusion, we can see that successful automation of work is best done in collaboration with expert humans. That is why our founder, Jurga Zilinskiene, who is a passionate coder, has grown our in-house software development team as an integral arm of our linguistic and creative teams.

That has enabled us to build AI-powered translation software with amazing new features like Expert-in-the-Loop (EITL). This is true human and machine collaboration: AI that benefits humanity.

When Guildhawk celebrates its centenary in 2101, Jurga hopes people will reflect on our evolution and see us as the place where humanity reigns.


  • Arnold, D., Balkan, L., Meijer, S., Humphreys, R. & Sadler, L., 1994. Machine Translation: An Introductory Guide. [pdf] [Accessed 23 October 2023].
  • Bahdanau, D., Cho, K. & Bengio, Y., 2014. Neural Machine Translation by Jointly Learning to Align and Translate. [online]
  • Bar-Hillel, Y., 1960. 'The present status of automatic translation of languages'. In: Advances in Computers, 1, pp. 91–163.
  • Bojar, O., Federmann, C., Fishel, M., Graham, Y., Haddow, B., Huck, M., Koehn, P. & Monz, C., 2018. Findings of the 2018 conference on machine translation (WMT18). In: Proceedings of the Third Conference on Machine Translation, Volume 2: Shared Task Papers, pp. 272–307, Brussels, Belgium. Association for Computational Linguistics.
  • Brown, P. F., Della Pietra, S. A., Della Pietra, V. J. & Mercer, R. L., 1993. The mathematics of statistical machine translation: Parameter estimation. Computational Linguistics, 19(2), pp. 263–311.
  • Brown, T.B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A. & Agarwal, S., 2020. Language Models are Few-Shot Learners. [online]
  • Callison-Burch, C., Koehn, P., Monz, C., Post, M., Soricut, R. & Specia, L., 2012. Findings of the 2012 workshop on statistical machine translation. Proceedings of the Seventh Workshop on Statistical Machine Translation.
  • Green, S., Chuang, J., Heer, J. & Manning, C.D., 2014. Predictive translation memory: a mixed-initiative system for human language translation. [online]
  • Guerberof Arenas, A., 2009. Productivity and quality in the post-editing of outputs from translation memories and machine translation. The International Journal of Localisation, 7(1), pp. 11–21. [online]
  • Hinton, G., Osindero, S. & Teh, Y.W., 2006. A fast learning algorithm for deep belief nets. Neural Computation. [online]
  • Hutchins, J., 2004. 'The early years of machine translation: a personal memoir'. In: Somers, H. (ed.) Computers and Translation: A translator's guide.
  • Hutchins, J., 2014. 'The history of machine translation in a nutshell'.
  • Hutchins, W. J. & Somers, H. L., 1992. An Introduction to Machine Translation. Academic Press.
  • Johnson, M., Schuster, M., Le, Q.V., Krikun, M., Wu, Y., Chen, Z., Thorat, N., Viégas, F., Wattenberg, M., Corrado, G., Hughes, M. & Dean, J., 2017. Google's Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation. [online]
  • Koehn, P., 2010. Statistical machine translation. 1st ed. New York, NY, USA: Cambridge University Press.
  • Koehn, P., 2017. Neural machine translation. arXiv preprint arXiv:1709.07809. [online]
  • Locke, W. N. & Booth, A. D., 1955. Machine Translation of Languages.
  • Plitt, M. & Masselot, F., 2010. A productivity test of statistical machine translation post-editing in a typical localisation context. The Prague Bulletin of Mathematical Linguistics, 93(1), pp. 7–16. [online]
  • Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł. & Polosukhin, I., 2017. Attention is all you need. [online]