Voice Recognition with Python (speech_recognition and PyAudio)

Python has a handy library called speech_recognition, which we can use to write a program that transcribes a user’s voice into text.

Let’s have a look at how to do this. Note that, at the time of writing, I ran the code below with Python 3.6.0.

1. Install the speech_recognition library

First, we open a Windows terminal as administrator (I’m using Windows 10) and enter:

pip install SpeechRecognition


2. Install PyAudio

If we now try to run python -m speech_recognition, we get an error, because the PyAudio dependency, which speech_recognition needs in order to access the microphone, is not yet installed.


So we install it with pip as well:

pip install PyAudio
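Note that the pip package is called SpeechRecognition while the module you import is speech_recognition, and PyAudio is imported as pyaudio. As a quick sketch using only the standard library (the helper name missing_packages is hypothetical), the snippet below checks whether both modules can be found before you run any recognition code:

```python
import importlib.util

# pip package name -> module name used in "import ..."
REQUIRED_MODULES = {
    "SpeechRecognition": "speech_recognition",
    "PyAudio": "pyaudio",
}


def missing_packages(required=REQUIRED_MODULES):
    """Return the pip package names whose modules cannot be found."""
    return [
        package
        for package, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Please run: pip install " + " ".join(missing))
    else:
        print("speech_recognition and PyAudio are installed")
```

find_spec only locates the module without importing it, so the check is cheap and does not open the microphone.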

3. Import speech_recognition and record audio

Next, we open Python and enter the following code:

import speech_recognition as sr
# obtain audio from the microphone
r = sr.Recognizer()
with sr.Microphone() as source:
    print("Speak into the microphone")
    # listen() must run inside the with block, while the microphone stream is open
    audio = r.listen(source)

When prompted to speak into the microphone, I say the words “Hello, my name is Michael”.
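The recording step can be made a little more robust. The sketch below, assuming the same speech_recognition API used in this post, wraps it in a hypothetical helper, record_phrase: it first calls adjust_for_ambient_noise so the recognizer calibrates its energy threshold to the room, then passes timeout and phrase_time_limit to listen() so the program does not wait indefinitely for speech. The timeout values are illustrative defaults, not recommendations.

```python
def record_phrase(timeout=5, phrase_time_limit=10, calibration_seconds=1):
    """Record one phrase from the default microphone.

    Returns a speech_recognition.AudioData object, or None if no speech
    starts within `timeout` seconds.
    """
    # Imported here so the function can be defined without the library installed
    import speech_recognition as sr

    r = sr.Recognizer()
    with sr.Microphone() as source:
        # Listen to background noise briefly to set the energy threshold
        r.adjust_for_ambient_noise(source, duration=calibration_seconds)
        print("Speak into the microphone")
        try:
            # Give up if no speech starts within `timeout` seconds,
            # and stop recording after `phrase_time_limit` seconds
            return r.listen(source, timeout=timeout, phrase_time_limit=phrase_time_limit)
        except sr.WaitTimeoutError:
            return None
```

The returned AudioData can then be passed to a Recognizer's recognize_google method exactly like the audio variable above.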

4. Transcribe audio

Next, we use the Google Speech API (through recognize_google, which needs an internet connection) to transcribe the recorded audio:

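try:
    # send the recorded audio to Google's speech recognition service
    print("Transcription: " + r.recognize_google(audio))
except sr.UnknownValueError:
    # the service could not make out any words
    print("Audio unintelligible")
except sr.RequestError as e:
    # network problem, or the service rejected the request
    print("Cannot obtain results; {0}".format(e))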


Now, we can see that Python has transcribed my voice into text!
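recognize_google also accepts show_all=True, in which case it returns the raw response rather than only the top transcript. The sketch below assumes that response is a dict with an "alternative" list of entries holding a "transcript" and, usually only on the first entry, a "confidence", and that an empty list comes back when nothing is recognized. The helper name best_transcript is hypothetical.

```python
def best_transcript(response):
    """Pick the highest-confidence transcript from a show_all=True response.

    Assumed shapes: [] when nothing was recognized, otherwise a dict like
    {"alternative": [{"transcript": "...", "confidence": 0.9}, {"transcript": "..."}]}.
    Alternatives without a "confidence" key are ranked as 0.0; on a tie,
    the earliest alternative wins.
    """
    if not isinstance(response, dict):
        return None  # e.g. [] when the audio could not be recognized
    alternatives = response.get("alternative", [])
    if not alternatives:
        return None
    best = max(alternatives, key=lambda alt: alt.get("confidence", 0.0))
    return best.get("transcript")
```

For example, best_transcript(r.recognize_google(audio, show_all=True)) returns the top transcript string, or None if nothing was recognized.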

5. Read speech from audio file

It’s one thing to record speech from a microphone in real time, but what if we want to transcribe speech from an audio file?

Here is how we would do it.

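First, we need an audio file containing the speech “Hello, I am a data scientist”. Note that sr.AudioFile can read WAV, AIFF/AIFF-C and FLAC files, but not MP3, so an mp3 recording must first be converted (for example, with ffmpeg or an audio editor). Assume the converted file is called recording.wav.

This time, we use the speech_recognition library as before, except that our source is now an audio file rather than live input from the microphone:

import speech_recognition as sr
r = sr.Recognizer()
# open the WAV file and read its entire contents into an AudioData object
with sr.AudioFile("C:/Users/directory/recording.wav") as source:
    audio = r.record(source)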

Inspecting our audio variable shows that it now holds an AudioData object:

<speech_recognition.AudioData object at 0x7f8b65253978>
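An AudioData object represents mono audio as raw PCM bytes plus a sample rate and a sample width (available as audio.get_raw_data(), audio.sample_rate and audio.sample_width). Since one second of mono audio takes sample_rate × sample_width bytes (16,000 × 2 = 32,000 bytes for 16 kHz, 16-bit audio), its duration can be computed directly. The helper below is a hypothetical sketch of that arithmetic.

```python
def pcm_duration_seconds(frame_data, sample_rate, sample_width, channels=1):
    """Duration of raw PCM audio in seconds.

    sample_rate is in samples per second, sample_width in bytes per sample;
    channels defaults to 1 because AudioData holds mono audio.
    """
    bytes_per_second = sample_rate * sample_width * channels
    return len(frame_data) / bytes_per_second
```

For example, pcm_duration_seconds(audio.get_raw_data(), audio.sample_rate, audio.sample_width) gives the length of the loaded recording in seconds.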

We can then use the Google Speech API once again to transcribe the audio:

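try:
    s = r.recognize_google(audio)
    print("You said: " + s)
except sr.UnknownValueError:
    # the service could not make out any words
    print("Audio unintelligible")
except sr.RequestError as e:
    # network problem, or the service rejected the request
    print("Cannot obtain results; {0}".format(e))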

And here is our output:

You said: hello I am a data scientist
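Because sr.AudioFile only accepts WAV, AIFF/AIFF-C and FLAC input, a file in the wrong format is a common reason for the file-based step to fail. As a minimal sketch using only the standard library's wave module (the helper name describe_wav is hypothetical), the function below reports a WAV file's basic parameters, which is a quick way to confirm the file is a readable PCM WAV before opening it with sr.AudioFile.

```python
import wave


def describe_wav(path):
    """Return basic parameters of a PCM WAV file.

    Raises wave.Error if the file is not a WAV the wave module can read
    (for example, an MP3 renamed to .wav).
    """
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": rate,
            "duration_s": frames / rate,
        }
```

For example, describe_wav("C:/Users/directory/recording.wav") should return a dict of parameters for a valid WAV file.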

Conclusion

Here, we have looked at:

  • How to install speech_recognition and PyAudio
  • Recording and transcribing speech with speech_recognition and Google’s Speech API
  • How to transcribe speech from an audio file

Author: Michael Grogan

Michael Grogan is a machine learning consultant and educator, with a profound passion for statistics and data science.
