Tutorial: How to Make a Universal Language Translator With Python

Speech recognition technology has attracted interest for a long time, and the ability to convert data from one form to another has become far more relevant in recent years. Libraries such as Keras, TensorFlow and PyTorch have made data science, Machine Learning (ML), deep learning and Artificial Intelligence (AI) approachable for many new and enthusiastic coders. Speech, text and language are three forms of data that carry a huge amount of information, and together they form a great resource of knowledge. Being able to transcribe English speech and translate it into Spanish, Arabic, Korean or any other language in real time can be a very valuable tool in one’s application.

This article is a simple, step-by-step introduction to three building blocks that let you use speech, text and language in your application.

Supplies

A Raspberry Pi with an internet connection
USB Speaker and Mic
Python3 installed
Basic knowledge of Python

*This article can also be used as a stepping stone to learn Python and Python modules and is thus relevant to novice coders as well.

*A useful Python cheatsheet for beginners can be found at https://www.youngwonks.com/resources/python-cheatsheet

Text to Speech

Text to Speech is the process of converting text provided by the user into audio, here in the form of a file that can be played back through a speaker or any other audio output.

Python Modules:

gTTS – pip3 install gtts
playsound – pip3 install playsound

Basic Syntax:

The basic syntax involves importing the modules and initializing the Google Text to Speech (gTTS) object with the string to be converted as a parameter. We then call the save method to save the audio file (in our case in .mp3 format) and finally use the playsound module to play back the audio.
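The steps just described can be sketched as follows. This is a minimal illustration and not necessarily the contents of texttospeech.py; the function name `speak` and the default file name `speech.mp3` are assumptions made for the example. The third-party imports sit inside the function only so that the function can be defined before the modules are installed.

```python
def speak(text, filename="speech.mp3"):
    """Convert text to speech, save it as an MP3 file and play it back."""
    # Imported here so the function can be defined even before
    # gtts and playsound have been installed.
    from gtts import gTTS
    from playsound import playsound

    tts = gTTS(text)      # initialize with the string to be converted
    tts.save(filename)    # write the audio to an .mp3 file (needs internet)
    playsound(filename)   # play the saved file through the audio output
    return filename
```

Calling `speak("Hello from my Raspberry Pi")` should save speech.mp3 in the current folder and play it through the connected speaker.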

Additional Syntax:
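Beyond the basic call, gTTS accepts a few optional parameters. The sketch below shows two of them, `lang` (the language of the text) and `slow` (slower speech), and is an illustration rather than the original script. The function name `save_speech` and its defaults are assumptions made for the example.

```python
def save_speech(text, lang="es", slow=False, filename="speech.mp3"):
    """Save text as spoken audio in a chosen language, without playing it."""
    # Imported here so the function can be defined before gtts is installed.
    from gtts import gTTS

    # lang is a language code such as "en", "es" or "ko";
    # slow=True asks for a slower reading speed.
    tts = gTTS(text, lang=lang, slow=slow)
    tts.save(filename)
    return filename
```

For example, `save_speech("Hola, ¿cómo estás?", lang="es", slow=True)` should produce a slowly spoken Spanish audio file.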

Downloads

texttospeech.py

Speech to Text

Speech to Text refers to the process of speech recognition and transcription and is essentially the reverse of what we learned in the previous step. Here we listen to the user’s speech and use the speech recognition Python module to create a transcript of the speech that has been recorded. Today, voice search and virtual assistant functionality is a common feature on most mobile platforms, including Android and iOS.

Voice recognition can help in customizing voice commands on one’s commonly used devices, thus making one’s workflow quite smooth. Voice to Text is also a required step for further translation to different languages.

Python Modules:

Speech Recognition – pip3 install SpeechRecognition
PyAudio – pip3 install PyAudio

Notes:

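PortAudio may additionally be needed on some Mac computers. It is a system library rather than a Python package, so install it with a package manager such as Homebrew: brew install portaudio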
On Windows with Python 3.7 or later, pip may fail to install PyAudio when no prebuilt package exists for your Python version. In that case, download the appropriate version from https://www.lfd.uci.edu/~gohlke/pythonlibs/ and install the downloaded file with pip.

Basic Syntax:

The basic syntax involves importing the speech_recognition module under the alias sr and initializing the Recognizer object. Next you initialize the microphone device and listen to it. The audio returned is then passed to a method that uses Google’s Speech to Text engine to return the transcript in the form of a string.
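A minimal sketch of these steps is shown below. It is an illustration and not necessarily the contents of speechtotext.py; the function name `listen_and_transcribe`, the ambient-noise adjustment and the error handling are assumptions added for the example.

```python
def listen_and_transcribe():
    """Record one phrase from the microphone and return it as text."""
    # Imported here so the function can be defined before
    # SpeechRecognition and PyAudio have been installed.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        # Optional: sample the background noise level before listening.
        recognizer.adjust_for_ambient_noise(source)
        print("Say something...")
        audio = recognizer.listen(source)

    try:
        # Send the recording to Google's speech engine (needs internet).
        return recognizer.recognize_google(audio)
    except sr.UnknownValueError:
        # The speech could not be understood.
        return None
    except sr.RequestError as error:
        # The service could not be reached.
        print(f"Speech service request failed: {error}")
        return None
```

Calling `listen_and_transcribe()` returns the transcript as a string, or `None` if nothing could be recognized.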

Downloads

speechtotext.py

Language Translation

The ability to translate text from one language to another is useful for many applications in the field of Natural Language Processing (NLP) and also provides access to otherwise inaccessible knowledge and information. The transcribed text can then be used to get a much deeper understanding of the data.

Python Modules:

Translate – pip3 install translate

Basic Syntax:

The basic syntax involves importing the Translator class and initializing it with the output language. We then call the translate method on the translator object with the text as the argument. In return, we receive a string containing the translated text.
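As a minimal sketch of these steps (an illustration, not necessarily the contents of texttranslate.py), the function below wraps the call. The function name `translate_text` and the default target language `"es"` are assumptions made for the example. The translate module sends the text to an online translation service, so an internet connection is assumed.

```python
def translate_text(text, to_lang="es"):
    """Translate text into the language given by to_lang."""
    # Imported here so the function can be defined before
    # the translate module has been installed.
    from translate import Translator

    translator = Translator(to_lang=to_lang)  # set the output language
    return translator.translate(text)         # returns the translated string
```

For example, `translate_text("Good morning", to_lang="ko")` should return the Korean translation as a string.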

Note:

Google Translate is another option that can be used for the same purpose. Many cloud Application Programming Interfaces (APIs) and speech recognition services are commercially available today, including from Amazon, and provide automatic speech recognition and translation that can be included in your web app or mobile app. Most of them are paid services, so check their pricing to see whether they offer a free tier. Also, because they run in the cloud, they may not be useful for an offline application.

Downloads

texttranslate.py

Going Further

The above tutorial provides a short and simple introduction to three concepts commonly used in the domain of NLP. A logical next step would be utilizing the blocks towards a particular application.

Some simple projects:

Speech to Text - Translation - Text to Speech
Tkinter and Text to Speech
Dictation app in Pygame
Dictation app in Tkinter
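The first project in the list above chains the three building blocks together. A minimal sketch is shown below; it is an illustration and not necessarily the contents of speech2text2translate2speech.py. The function name `translate_speech` and the file name are assumptions, and the sketch also assumes that the same language code (for example "es") is accepted by both the translate and gTTS modules.

```python
def translate_speech(to_lang="es", filename="translated.mp3"):
    """Listen to a phrase, translate it and speak the translation."""
    # Imported here so the function can be defined before
    # the third-party modules have been installed.
    import speech_recognition as sr
    from translate import Translator
    from gtts import gTTS
    from playsound import playsound

    # Step 1: Speech to Text
    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        print("Say something...")
        audio = recognizer.listen(source)
    spoken_text = recognizer.recognize_google(audio)

    # Step 2: Translation
    translated_text = Translator(to_lang=to_lang).translate(spoken_text)

    # Step 3: Text to Speech, spoken in the target language
    gTTS(translated_text, lang=to_lang).save(filename)
    playsound(filename)

    return spoken_text, translated_text
```

The function returns both the original transcript and its translation, which makes it easy to display them in a Tkinter or Pygame interface later.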

Some challenging projects:

1. Automatic Speech Recognition and Detection

2. Background Noise Removal

3. Translation and Dictation Mobile app in Flutter

Downloads

speech2text2translate2speech.py