Local text-to-speech on Raspberry Pi with Python

Ats
May 11, 2024


I ran some experiments with text-to-speech (TTS) on a Raspberry Pi. This post documents what I did.


First of all

This is my development environment.

Hardware

  • Raspberry Pi 4 (2GB RAM)
  • ReSpeaker 2-Mics Pi HAT

Software

  • Raspberry Pi OS Bullseye

The reason I didn’t use Bookworm is that I couldn’t get it to work with the ReSpeaker 2-Mics Pi HAT.

Background

For a private project, I needed to make my edge device speak, like a Google Home or Alexa; it was just a prototype, so I was looking for an easy way to do it. I knew there were many libraries for this purpose but had never used any of them, so it seemed like good timing to start researching and testing.

What I did

I use many machine learning models from Hugging Face, both for work and for fun.

There is a wide variety of models for computer vision, NLP, and so on, and they are super easy to use. As a starting point, I wondered whether I could use Hugging Face on the Raspberry Pi. My one concern was that my Raspberry Pi has only 2GB of RAM, which might be a problem. I had read an article about trying to run an LLM on a Raspberry Pi, where the author ran into the limits of its RAM, CPU, and GPU.

The setup is quite simple. I installed the TensorFlow Lite runtime, PyTorch, and Transformers, following the README:

pip install tflite-runtime
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
pip install transformers
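
To confirm the installs before going further, a quick import check is enough (a trivial sketch):

# Sanity check: confirm the freshly installed packages import and print versions
import torch
import transformers

print("torch", torch.__version__)
print("transformers", transformers.__version__)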

While doing the setup, I got the following error related to Hugging Face.

Cannot access gated repo for url https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2/resolve/main/config.json.
Access to model mistralai/Mistral-7B-Instruct-v0.2 is restricted. You must be authenticated to access it.

I solved the error by following the comment.
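
For context, a gated repo requires you to accept the model’s terms on its Hugging Face page and then authenticate with an access token. A minimal sketch of the login step (the token string is a placeholder):

# Authenticate so gated repos can be downloaded. Create a token at
# https://huggingface.co/settings/tokens; the value below is a placeholder.
from huggingface_hub import login

login(token="hf_xxxxxxxxxxxxxxxx")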

Then I set up the Hugging Face environment on my Raspberry Pi and picked a small TTS model: bark-small, provided by Suno.

Looking at the list of files, the model size is 1.68 GB, so I thought it could work. I tested the following code snippet from the README.

from transformers import pipeline
import scipy

# Load the text-to-speech pipeline with the bark-small checkpoint
synthesiser = pipeline("text-to-speech", "suno/bark-small")

# Generate speech; do_sample makes the output non-deterministic
speech = synthesiser("Hello, my dog is cooler than you!", forward_params={"do_sample": True})

# Save the generated audio as a WAV file at the model's sampling rate
scipy.io.wavfile.write("bark_out.wav", rate=speech["sampling_rate"], data=speech["audio"])

Unfortunately, it didn’t finish and stopped in the middle of the script. It succeeded in downloading the model, but the CPU and 2GB of RAM couldn’t handle inference.
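
If you want to verify that memory is the bottleneck, watching the available RAM while the script runs is enough. A sketch using psutil (my assumption; something like htop works just as well):

# Print available RAM once per second; run alongside the TTS script to
# watch memory shrink. Requires: pip install psutil. Ctrl+C to stop.
import time
import psutil

while True:
    mem = psutil.virtual_memory()
    print(f"available: {mem.available / 1e6:.0f} MB / {mem.total / 1e6:.0f} MB")
    time.sleep(1)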

At that point I gave up on running Hugging Face models on the Raspberry Pi and changed direction: I would find a library optimized for edge devices. After a couple of hours of research, I found two options, espeak-ng and piper.

I installed the libraries with the following commands for each.

  • espeak-ng
sudo apt-get install espeak-ng
  • piper
pip install piper-tts

Both are quite easy to use. I just executed the following commands on the terminal (a Python sketch for calling them from a script follows the list).

  • espeak-ng
espeak-ng "Thank you for everything and good luck!"
  • piper
echo "Thank you for everything and good luck!" | piper --model en_US-lessac-medium --output_file welcome.wav
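
Since the end goal is a Python project, here is a minimal sketch wrapping the same two commands with subprocess (the function names are my own):

# Thin Python wrappers around the two CLI tools, using subprocess.
import subprocess

def speak_espeak(text: str) -> None:
    # espeak-ng synthesizes and plays the text directly
    subprocess.run(["espeak-ng", text], check=True)

def speak_piper(text: str, wav_path: str = "out.wav") -> None:
    # piper reads text from stdin and writes a WAV file
    subprocess.run(
        ["piper", "--model", "en_US-lessac-medium", "--output_file", wav_path],
        input=text.encode(),
        check=True,
    )

if __name__ == "__main__":
    speak_espeak("Thank you for everything and good luck!")
    speak_piper("Thank you for everything and good luck!")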

After a few tests, I noticed some pros and cons.

  • espeak-ng is much faster than piper
  • espeak-ng can speak more languages than piper
  • piper sounds much more natural than espeak-ng
  • piper is optimized for the Raspberry Pi 4

I would say piper is the best option if you are using a Raspberry Pi 4. If you need TTS for languages piper doesn’t support, or you have quite strict response-time constraints, espeak-ng would be your option; a small helper encoding that rule follows.
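
A hypothetical sketch of that decision rule (the language set is an illustrative subset, not piper’s full list):

# Hypothetical helper encoding the rule above: prefer piper for naturalness,
# fall back to espeak-ng for unsupported languages or tight latency budgets.
PIPER_LANGUAGES = {"en", "de", "es", "fr"}  # illustrative subset only

def choose_engine(language: str, low_latency: bool = False) -> str:
    if low_latency or language not in PIPER_LANGUAGES:
        return "espeak-ng"
    return "piper"

print(choose_engine("en"))                    # piper
print(choose_engine("en", low_latency=True))  # espeak-ng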

I wish I could run Hugging Face models on the Raspberry Pi, because that would let me do more on the edge. But it’s still too early; we still need models optimized for edge computers.

That’s it!

P.S.

I’ve found the TTS library. I’ll try it later.


Written by Ats

I like building tangible things with touch, gesture, and voice. Ruby on Rails / React Native / Yocto / Raspberry Pi / Interaction Design / CIID IDP alumni
