Speech to text

Installation

PyAudio is required for input from the Mac microphone. Before I could install PyAudio, I had to install portaudio. And before I could install portaudio, I had to chown (“change owner”) and chmod (“change mode”) the directory /usr/local/share/man/man7.

pip3 install SpeechRecognition

pip3 show SpeechRecognition
Name: SpeechRecognition
Version: 3.8.1
Summary: Library for performing speech recognition, with support for several engines and APIs, online and offline.
Home-page: https://github.com/Uberi/speech_recognition#readme
Author: Anthony Zhang (Uberi)
Author-email: azhang9@gmail.com
License: BSD
Location: /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages
Requires:
Required-by:

sudo chown -R $(whoami) /usr/local/share/man/man7
chmod u+w /usr/local/share/man/man7

ls -ld /usr/local/share/man/man7
drwxr-xr-x  26 myname  wheel  832 Mar 10  2019 /usr/local/share/man/man7

brew --help
brew list
gdbm		pkg-config	python3		sqlite
openssl		python		readline	xz

brew install portaudio

brew list
gdbm		pkg-config	python		readline	xz
openssl		portaudio	python3		sqlite

pip3 install PyAudio

pip3 show PyAudio
Name: PyAudio
Version: 0.2.11
Summary: PortAudio Python Bindings
Home-page: http://people.csail.mit.edu/hubert/pyaudio/
Author: Hubert Pham
Author-email: UNKNOWN
License: UNKNOWN
Location: /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages
Requires:
Required-by:
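
To confirm from inside Python that both packages landed where the interpreter can find them, a quick import check is one option. This is a minimal sketch: the helper name is_installed is made up for illustration, and it relies on the import names being speech_recognition and pyaudio (which differ from the pip names SpeechRecognition and PyAudio).

```python
import importlib.util

def is_installed(module_name):
    """Return True if this interpreter can find the top-level module."""
    return importlib.util.find_spec(module_name) is not None

# Import names, not pip names.
for name in ("speech_recognition", "pyaudio"):
    status = "found" if is_installed(name) else "NOT found"
    print(f"{name}: {status}")
```

If either line says NOT found, check that pip3 and python3 belong to the same Python installation (compare the Location line printed by pip3 show).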

python3 -m speech_recognition
A moment of silence, please...
Set minimum energy threshold to 50.8320699286259
Say something!   (I said “hello”.)
Got it! Now to recognize it...
You said hello
Say something!
Got it! Now to recognize it...
You said goodbye
Say something!
Got it! Now to recognize it...
You said stop
Say something!
Got it! Now to recognize it...
You said two houses both alike in dignity
Say something!
control-c
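
The energy threshold in the transcript above is a loudness level: sound below it is considered silence, and sound above it is considered speech. The sketch below assumes that "energy" means the root-mean-square (RMS) amplitude of a chunk of 16-bit samples and computes it in plain Python. The function name rms_energy and the sample values are invented for illustration; the library's own calculation may differ in detail.

```python
import array
import math

def rms_energy(frame_data):
    """Return the RMS amplitude of signed 16-bit, native-endian samples."""
    samples = array.array("h", frame_data)   # "h" = signed 16-bit integer
    if len(samples) == 0:
        return 0.0
    return math.sqrt(sum(sample * sample for sample in samples) / len(samples))

# Two made-up chunks of audio: one quiet, one loud.
quiet = array.array("h", [10, -10, 10, -10]).tobytes()
loud = array.array("h", [1000, -1000, 1000, -1000]).tobytes()
threshold = 50.8   # roughly the value printed in the transcript above

for label, data in (("quiet", quiet), ("loud", loud)):
    energy = rms_energy(data)
    verdict = "speech" if energy > threshold else "silence"
    print(f"{label}: energy = {energy:.1f}, treated as {verdict}")
```
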
"""
Print what the user says (or sings).

pip3 install SpeechRecognition
sudo chown -R $(whoami) /usr/local/share/man/man7
chmod u+w /usr/local/share/man/man7
brew install portaudio
pip3 install PyAudio

Python code copied from "Recognize speech input from the microphone" example in
https://github.com/Uberi/speech_recognition#readme
"""

import sys
import speech_recognition

recognizer = speech_recognition.Recognizer()
print(f"The energy threshold is {recognizer.energy_threshold}.")
print(f"The pause threshold is {recognizer.pause_threshold} second(s).")

with speech_recognition.Microphone() as source:   #Python context manager
    print("Please say something.  I'm listening to the microphone.")
    audio = recognizer.listen(source)

print("You seem to be done speaking, so I stopped listening to the microphone.")
print(f"size = {len(audio.frame_data):,} bytes")
print(f"sample rate = {audio.sample_rate:,} samples per second")
print(f"sample width = {audio.sample_width} bytes")

try:
    seconds = len(audio.frame_data) / (audio.sample_rate * audio.sample_width)
except ZeroDivisionError: #This style is EAFP (vs. LBYL).
    pass                  #Do nothing.
else:
    print(f"duration = {seconds:.3f} seconds")

#Recognize speech using Google Speech Recognition.

try:
    s = recognizer.recognize_google(audio, language = "en-US")   #s is a string.
except speech_recognition.UnknownValueError:     #unintelligible
    print("Google Speech Recognition could not understand audio", file = sys.stderr)
    sys.exit(1)
except speech_recognition.RequestError as error: #request failed, e.g., no Internet connection
    print(f"Could not request results from Google Speech Recognition service: {error}",
          file = sys.stderr)
    sys.exit(1)

print("Google Speech Recognition thinks you said")
print()
print(s)
sys.exit(0)

Run the program.

To make sure the macOS microphone is on, pull down the Apple menu in the upper left corner of the screen, select
System Preferences… → Sound → Input
and turn up the Input volume.

If the program is run from IDLE, macOS may ask for permission to use the microphone. Press OK.

“IDLE.app” would like to access the microphone.
Don’t Allow          OK

44100 = 210² is a perfect square.
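
The sample rate of 44,100 samples per second is 210 squared, and 210 is itself the product of the first four primes. A quick check of that arithmetic (the variable names are just for illustration):

```python
import math

sample_rate = 44100
root = round(math.sqrt(sample_rate))

print(root)                        # 210
print(root * root == sample_rate)  # True
print(2 * 3 * 5 * 7)               # 210
```
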

The energy threshold is 300.
The pause threshold is 0.8 second(s).
Please say something.  I'm listening to the microphone.
You seem to be done speaking, so I stopped listening to the microphone.
size = 471,040 bytes
sample rate = 44,100 samples per second
sample width = 2 bytes
duration = 5.341 seconds
Google Speech Recognition thinks you said

Double Double Toil and Trouble fire burn and cauldron bubble
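
The duration in the output above follows from the three numbers printed before it: bytes divided by (samples per second × bytes per sample). The sketch below reproduces that arithmetic, assuming one channel of audio, as the program's own formula does. The function name duration_seconds is made up for illustration.

```python
def duration_seconds(size_bytes, sample_rate, sample_width):
    """Return the length in seconds of one-channel audio."""
    return size_bytes / (sample_rate * sample_width)

seconds = duration_seconds(471_040, 44_100, 2)
print(f"duration = {seconds:.3f} seconds")   # duration = 5.341 seconds
```
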