Local text-to-speech on Raspberry Pi with Python

Ats
May 11, 2024


I ran some experiments with text-to-speech (TTS) on a Raspberry Pi. This post documents what I did.


First of all

This is my development environment.

Hardware

  • Raspberry Pi 4 (2GB RAM)
  • ReSpeaker 2-Mics Pi HAT

Software

  • Raspberry Pi OS Bullseye

The reason I didn’t use Bookworm is that I couldn’t get it to work with the ReSpeaker 2-Mics Pi HAT.

Background

For a private project, I needed to make my edge device speak, like a Google Home or Alexa; it was just a prototype, so I was looking for an easy way to do it. I knew there were many libraries for this purpose but had never used any of them, so it seemed like good timing to start researching and testing.

What I did

I use many machine learning models from Hugging Face, both for work and for fun.

There is a wide variety of models for computer vision, NLP, and so on, and they are super easy to use. As a starting point, I wondered whether I could use Hugging Face on the Raspberry Pi. My one concern was that my Raspberry Pi has only 2GB of RAM, which might be a problem. I had read an article about trying to run an LLM on a Raspberry Pi, where the author ran into the limits of its RAM, CPU, and GPU.

The setup is quite simple. I installed the TensorFlow Lite runtime, PyTorch, and Transformers, following the README:

pip install tflite-runtime
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
pip install transformers
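
To confirm the installs before going further, a quick import check is enough (a trivial sketch):

# Sanity check: confirm the freshly installed packages import and print versions
import torch
import transformers

print("torch", torch.__version__)
print("transformers", transformers.__version__)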

While doing the setup, I got the following error related to Hugging Face.

Cannot access gated repo for url https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2/resolve/main/config.json.
Access to model mistralai/Mistral-7B-Instruct-v0.2 is restricted. You must be authenticated to access it.

I solved the error by following the comment.
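
For context, a gated repo requires you to accept the model’s terms on its Hugging Face page and then authenticate with an access token. A minimal sketch of the login step (the token string is a placeholder):

# Authenticate so gated repos can be downloaded. Create a token at
# https://huggingface.co/settings/tokens; the value below is a placeholder.
from huggingface_hub import login

login(token="hf_xxxxxxxxxxxxxxxx")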

Then I set up the Hugging Face environment on my Raspberry Pi and picked a small TTS model: bark-small, provided by Suno.

Looking at the list of files, the model size is 1.68 GB, so I thought it could work. I tested the following code snippet from the README.

from transformers import pipeline
import scipy

# Load the text-to-speech pipeline with the bark-small checkpoint
synthesiser = pipeline("text-to-speech", "suno/bark-small")

# Generate speech; do_sample makes the output non-deterministic
speech = synthesiser("Hello, my dog is cooler than you!", forward_params={"do_sample": True})

# Save the generated audio as a WAV file at the model's sampling rate
scipy.io.wavfile.write("bark_out.wav", rate=speech["sampling_rate"], data=speech["audio"])

Unfortunately, it didn’t finish and stopped in the middle of the script. It succeeded in downloading the model, but the CPU and 2GB of RAM couldn’t handle inference.
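
If you want to verify that memory is the bottleneck, watching the available RAM while the script runs is enough. A sketch using psutil (my assumption; something like htop works just as well):

# Print available RAM once per second; run alongside the TTS script to
# watch memory shrink. Requires: pip install psutil. Ctrl+C to stop.
import time
import psutil

while True:
    mem = psutil.virtual_memory()
    print(f"available: {mem.available / 1e6:.0f} MB / {mem.total / 1e6:.0f} MB")
    time.sleep(1)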

At that point I gave up on running Hugging Face models on the Raspberry Pi and changed direction: I would find a library optimized for edge devices. After a couple of hours of research, I found two options, espeak-ng and piper.

I installed the libraries with the following commands for each.

  • espeak-ng
sudo apt-get install espeak-ng
  • piper
pip install piper-tts

Both are quite easy to use. I just executed the following commands on the terminal (a Python sketch for calling them from a script follows the list).

  • espeak-ng
espeak-ng "Thank you for everything and good luck!"
  • piper
echo "Thank you for everything and good luck!" | piper --model en_US-lessac-medium --output_file welcome.wav
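
Since the end goal is a Python project, here is a minimal sketch wrapping the same two commands with subprocess (the function names are my own):

# Thin Python wrappers around the two CLI tools, using subprocess.
import subprocess

def speak_espeak(text: str) -> None:
    # espeak-ng synthesizes and plays the text directly
    subprocess.run(["espeak-ng", text], check=True)

def speak_piper(text: str, wav_path: str = "out.wav") -> None:
    # piper reads text from stdin and writes a WAV file
    subprocess.run(
        ["piper", "--model", "en_US-lessac-medium", "--output_file", wav_path],
        input=text.encode(),
        check=True,
    )

if __name__ == "__main__":
    speak_espeak("Thank you for everything and good luck!")
    speak_piper("Thank you for everything and good luck!")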

After a few tests, I noticed some pros and cons.

  • espeak-ng is much faster than piper
  • espeak-ng can speak more languages than piper
  • piper sounds much more natural than espeak-ng
  • piper is optimized for the Raspberry Pi 4

I would say piper is the best option if you are using a Raspberry Pi 4. If you need TTS for languages piper doesn’t support, or you have quite strict response-time constraints, espeak-ng would be your option; a small helper encoding that rule follows.
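
A hypothetical sketch of that decision rule (the language set is an illustrative subset, not piper’s full list):

# Hypothetical helper encoding the rule above: prefer piper for naturalness,
# fall back to espeak-ng for unsupported languages or tight latency budgets.
PIPER_LANGUAGES = {"en", "de", "es", "fr"}  # illustrative subset only

def choose_engine(language: str, low_latency: bool = False) -> str:
    if low_latency or language not in PIPER_LANGUAGES:
        return "espeak-ng"
    return "piper"

print(choose_engine("en"))                    # piper
print(choose_engine("en", low_latency=True))  # espeak-ng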

I wish I could run Hugging Face models on the Raspberry Pi, because that would let me do more on the edge. But it’s still too early; we still need models optimized for edge computers.

That’s it!

P.S.

I’ve found the TTS library. I’ll try it later.


Written by Ats

I like building tangible things with touch, gesture, and voice. Ruby on Rails / React Native / Yocto / Raspberry Pi / Interaction Design / CIID IDP alumni
