Wearable Text to Audio Translator
by esarder in Circuits > Arduino
I made a (currently unfinished) headband that takes photos/video of the user's surroundings, sends them to a laptop to grab any text from the image and convert the text to an audio recording, then receives the .mp3 audio recording and plays it out of a speaker to the user.
Supplies
- Arduino Uno Rev 3
- ESP32-Cam WiFi Bluetooth Camera Module
- Speakers (any Arduino-compatible speakers; I used MakerHawk speakers)
- Socket-to-jack jumper wires
- Breadboard
Ideation and Brainstorming
Because we were asked to create a piece of tech for a community I don't belong to or have much knowledge of, I spent some time researching the struggles that low-vision and blind people face while navigating the world. Navigating unfamiliar situations is of course challenging in general, and many people are already working on wearable tech to help with it, like the WeWalk Smart Cane, which integrates map data to help users navigate their surroundings. I wanted to push a little further and see if there were any less obvious challenges facing the community. One article I read mentioned how much public information is visual: signs for out-of-service bathrooms, directions to the nearest subway station. My idea to create a headband that could pick up this visual text, translate it into audio, and provide it to visually impaired people stemmed from this challenge.
Install ESP32 in Arduino IDE
To use the ESP32-CAM, an additional board add-on must be installed in the Arduino IDE. I followed this tutorial: Using ESP32 Cam with Arduino
Connect the ESP32-CAM to the Arduino
Follow the wiring diagram above to connect your ESP32-CAM to the Arduino.
Photo to Text
The following code takes an image (presumably grabbed from the ESP32-CAM), reads any printed text from the photo as a string into the program using OpenCV and Pytesseract, and prints it to the terminal.
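To install OpenCV and Pytesseract, run the following commands in your terminal (pytesseract is a wrapper, so the Tesseract OCR engine itself also needs to be installed on your computer separately):
pip install opencv-python
pip install pytesseract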
Code:
# program: https://www.geeksforgeeks.org/text-detection-and-extraction-using-opencv-and-ocr/
# Import required packages
import cv2
import pytesseract

img = cv2.imread('arduino_text.jpg')
print(pytesseract.image_to_string(img))
Note: this program does not work well with hand-written text! That is a great next step for improving this project!
Reading this image into the program:
Produces the following in the terminal:
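Before the recognized text is handed to a text-to-speech step, it may help to tidy it up, since OCR output tends to come back with line breaks, blank lines, and stray whitespace. The sketch below is one possible way to do that using only the standard library. The names clean_ocr_text and has_speakable_text are hypothetical helpers for illustration and are not part of the original project.

```python
def clean_ocr_text(raw: str) -> str:
    """Collapse raw OCR output into a single line of text.

    Hypothetical helper: splits on any whitespace (spaces, tabs,
    newlines, form feeds) and rejoins the pieces with single spaces.
    """
    return " ".join(raw.split())


def has_speakable_text(text: str) -> bool:
    """Return True if the text contains at least one letter or digit.

    This is a simple assumption-based filter so that frames with no
    real text (only punctuation noise) are not sent to text-to-speech.
    """
    return any(ch.isalnum() for ch in text)


# Example: a sign split across two lines by the OCR step
print(clean_ocr_text("Out of\nOrder\n"))  # prints "Out of Order"
```

With something like this in place, the program could skip the audio step entirely whenever has_speakable_text returns False, so the wearer is not played an empty or garbled recording.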
Text to Audio
The following code takes text (presumably read in from the image sent by the ESP32-CAM), translates it into a voice recording of the text using Google Text-To-Speech (gTTS), then plays the .mp3 file.
To install gTTS, run the following command in your terminal:
pip install gTTS
Code
# reference: https://www.geeksforgeeks.org/convert-text-speech-python/
# Import the required module for text-to-speech conversion
from gtts import gTTS
# This module is imported so that we can play the converted audio
import os

# The text that you want to convert to audio
mytext = 'Out of Order'
# Language in which you want to convert
language = 'en'

# Pass the text and language to the engine. slow=False tells
# the module to read the text at normal speed rather than slowly
myobj = gTTS(text=mytext, lang=language, slow=False)

# Save the converted audio in an mp3 file named out_of_order.mp3
myobj.save("out_of_order.mp3")

# Open the mp3 with the default player. Running a bare file name like this
# works on Windows; on macOS use "open out_of_order.mp3" and on Linux
# "xdg-open out_of_order.mp3" instead
os.system("out_of_order.mp3")
This produces the attached .mp3 file!
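Opening the file by running its bare name only works on Windows, so playback may need a small platform check if the laptop side runs elsewhere. The sketch below shows one way this could look. player_command and play are hypothetical helper names, and it assumes the usual desktop launchers are available (start on Windows, open on macOS, xdg-open on Linux desktops).

```python
import platform
import subprocess


def player_command(mp3_path: str, system: str = "") -> list:
    """Build the command that opens an audio file in the default player.

    `system` defaults to the current platform name; it is a parameter
    only so the choice can be checked without running anything.
    """
    system = system or platform.system()
    if system == "Windows":
        # "start" is a cmd built-in; the empty string is the window title
        return ["cmd", "/c", "start", "", mp3_path]
    if system == "Darwin":
        # macOS
        return ["open", mp3_path]
    # Assumption: any other system is a Linux desktop with xdg-open
    return ["xdg-open", mp3_path]


def play(mp3_path: str) -> None:
    """Open the file with the platform's default audio player."""
    subprocess.run(player_command(mp3_path), check=True)


# Example usage (not run here): play("out_of_order.mp3")
```

Passing the command as a list instead of a single string also avoids problems if the file name ever contains spaces.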
Connect to Arduino IDE (AKA Where It All Went Wrong)
Where this project falls off is the connections. I ran into issues getting programs to upload to my ESP32-CAM through my Arduino. I did quite a bit of troubleshooting on this issue, including switching the power source from 3.3V to 5V and holding the Reset button on the ESP32-CAM down during the connection process. A few sources recommended attaching a 10 uF electrolytic capacitor between the EN pin and GND. This is the last of the recommendations that I have yet to try, beyond switching out my module for another, and it will be the next thing I pursue to try and fix the project.
Here are some resources for troubleshooting any errors that come up when connecting the ESP32-CAM:
- https://randomnerdtutorials.com/esp32-cam-troubleshooting-guide/
- https://rntlab.com/question/esp32-cam-a-fatal-error-occurred-failed-to-connect-to-esp32/
Next Steps
From here on out, most of my work is theoretical and yet to be tested. Ideally, once I'm able to upload sketches to the ESP32-CAM, I will be able to create a functional web server through which I can send images over WiFi from the ESP32-CAM to my computer for processing. Those images will be processed by the text recognition and text-to-speech programs above, and an .mp3 of the speech will be sent back to the Arduino to be played out of the speakers. I've attached some low-detail sketches of the general system.
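Since this part is untested, the following is only a rough sketch of how the laptop side of that loop might be organized, not a working implementation of the project. It assumes the ESP32-CAM serves a single JPEG over HTTP. CAPTURE_URL is a made-up address and path, and fetch_frame, image_to_speech, and the ocr and tts callables are hypothetical names introduced for illustration.

```python
import urllib.request
from typing import Callable, Optional

# Hypothetical address: replace with whatever the ESP32-CAM web server
# actually exposes once sketches can be uploaded to it.
CAPTURE_URL = "http://192.168.1.50/capture"


def fetch_frame(url: str = CAPTURE_URL, timeout: float = 5.0) -> bytes:
    """Download one image from the camera over WiFi as raw bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def image_to_speech(
    jpeg: bytes,
    ocr: Callable[[bytes], str],
    tts: Callable[[str], bytes],
) -> Optional[bytes]:
    """Run one image through text recognition, then text-to-speech.

    `ocr` turns image bytes into text and `tts` turns text into audio
    bytes. Both are passed in so the flow can be tried with stand-ins
    before the real OCR and speech steps are wired up.
    Returns None when no text was found, so nothing is sent back.
    """
    text = " ".join(ocr(jpeg).split())  # collapse line breaks and spaces
    if not text:
        return None
    return tts(text)


# Example with stand-in functions instead of real OCR and speech:
audio = image_to_speech(b"fake-jpeg", lambda img: "Out of\nOrder", str.encode)
print(audio)  # prints b'Out of Order'
```

Keeping the OCR and speech steps as plug-in functions means the network half and the processing half can be debugged separately, which seems useful while the upload problem is still unsolved.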
Sewing Headband
Though the electrical connections aren't... functional yet, I mocked up a headband for when this could actually be wearable! It consists of a piece of elastic threaded through cotton tubing, with a pocket at the front for the ESP32-CAM and a larger pocket in the back to house the Arduino. An additional pocket will be required to house the speaker, and I plan to thread the wires through a fabric casing when the project is finished to give it a clean look!