Working with sound is cool because it's one of very few ways that the outside world can touch the inside of a computer. When I talk, I shake atoms back and forth, which push and pull on a microphone, which pushes and pulls on electrons in a wire, which gets converted to a voltage, which gets represented as a floating point number for my program to work with. And as a bonus it can turn around and do the reverse. It's all the hardware connections we need to re-create the Star Trek computer voice interface. (The algorithms still need some work.)


To get set up for sound in Python, you'll need the sounddevice package. This is a cross-platform tool (it works on Mac, Windows, and Linux) that talks to your machine's speakers and microphone. It converts sound to and from NumPy arrays, which is perfect if you want to use it for signal processing and machine learning applications, which we do.

At the command line run

python3 -m pip install sounddevice

Heads up: if you're working on Linux, you may get a "missing portaudio.h" error during the pip install. If that happens, you'll have to manually install PortAudio, another library that sounddevice relies on. On my Ubuntu machine that is done with

sudo apt install portaudio19-dev

Recording sound

To start recording, we have to specify the number of channels (1 for mono, 2 for stereo) and the number of samples. If our device has a sampling rate of 44,100 samples per second and we want to record for 2 seconds, that would give a total of n_samples = 44,100 x 2 = 88,200. (44,100 or 48,000 samples per second are what you're most likely to see on your computer.)

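import sounddevice as sd
samplerate = 44100  # samples per second
n_samples = 2 * samplerate  # 2 seconds of audio
recorded_array = sd.rec(n_samples, samplerate=samplerate, channels=2, blocking=True)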

The blocking=True option forces the rest of your code to wait until the recording is complete before it carries on. This records the audio as a NumPy array, which sets it up for whatever signal processing or machine learning shenanigans you can dream up.
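As a rough sketch of a first thing to do with that array: sounddevice hands it back with one row per sample and one column per channel, as float32 by default. The describe_recording helper below is a made-up name for illustration, and the array of zeros is a stand-in for a real recording so the snippet runs without a microphone.

```python
import numpy as np


def describe_recording(recorded_array, samplerate=44100):
    """Summarize an audio array shaped (n_samples, n_channels).

    `samplerate` is an assumption here; pass whatever rate you recorded at.
    """
    n_samples, n_channels = recorded_array.shape
    return {
        "seconds": n_samples / samplerate,
        "channels": n_channels,
        # Largest absolute sample value; useful for spotting silence or clipping.
        "peak": float(np.max(np.abs(recorded_array))),
    }


# Stand-in for the output of sd.rec: 2 seconds of stereo silence.
fake_recording = np.zeros((88200, 2), dtype=np.float32)
summary = describe_recording(fake_recording)
print(summary)
```

With a real recording, a peak near zero usually means the wrong input device or a muted microphone.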

Playing sound

There's a complementary one-liner to play back a NumPy array as sound.

sd.play(recorded_array, blocking=True)

This level of convenience for turning arrays into sound opens up a lot of possibilities for "auralizations": audio representations of data. (Here are the docs for play and rec.)
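Here's one way an auralization might look, as a minimal sketch rather than a recipe: each data value becomes a short sine tone, with bigger values mapped to higher pitches. The function names (tone, auralize, play_audio), the 220 to 880 Hz range, and the quarter-second note length are all choices I made up for illustration. The sounddevice import lives inside play_audio so that everything else runs on a machine with no speakers.

```python
import numpy as np

SAMPLERATE = 44100  # samples per second (assumed; match your device)


def tone(frequency_hz, seconds, samplerate=SAMPLERATE, amplitude=0.2):
    """Return a mono sine tone as a 1D float32 array."""
    n_samples = int(round(seconds * samplerate))
    t = np.arange(n_samples) / samplerate
    return (amplitude * np.sin(2 * np.pi * frequency_hz * t)).astype(np.float32)


def auralize(values, low_hz=220.0, high_hz=880.0, note_seconds=0.25):
    """Map each value to a pitch between low_hz and high_hz, one note per value."""
    values = np.asarray(values, dtype=float)
    span = values.max() - values.min()
    # Scale to the range 0..1; if all values are equal, use the lowest pitch.
    scaled = (values - values.min()) / span if span > 0 else np.zeros_like(values)
    frequencies = low_hz + scaled * (high_hz - low_hz)
    return np.concatenate([tone(f, note_seconds) for f in frequencies])


def play_audio(samples, samplerate=SAMPLERATE):
    """Play the array through the default output device."""
    import sounddevice as sd  # imported here so the rest works without audio hardware

    sd.play(samples, samplerate=samplerate, blocking=True)


audio = auralize([1, 3, 2, 5, 4])
# play_audio(audio)  # uncomment on a machine with speakers
```

The low amplitude is deliberate. Samples are floats between -1 and 1, and a full-scale sine wave is unpleasantly loud.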

Knock yourself out!