Python continuous audio recording with periodic saving

Ats

Mar 31, 2024

This post documents a way to save audio in chunks during a long recording session with Python.

First of all

This is my development environment.

Hardware

Raspberry Pi CM4 Model B
Waveshare Raspberry Pi CM4 IO board (https://www.waveshare.com/cm4-io-base-b.htm)
ReSpeaker 2-Mics Pi HAT (https://wiki.seeedstudio.com/ReSpeaker/)

Software

Raspberry Pi OS Bullseye
PyAudio (https://people.csail.mit.edu/hubert/pyaudio/)

Background

Last week, I recorded about 25 minutes of voice audio with PyAudio. Processing the audio stream afterward took over 10 minutes. I had simply copied and pasted the recording example from the PyAudio docs.

PyAudio (people.csail.mit.edu): PyAudio provides Python bindings for PortAudio, the cross-platform audio API.

I could probably improve that processing time somehow. However, I realized I didn't have to wait until the recording finished and could start saving while still recording.

First, I googled how to do that, because I assumed there would be existing approaches or libraries. After researching for a while, I didn't find any good solution, so I decided to document my own approach.

What I did

I didn't find anything that did exactly what I wanted. However, I found the following article by the Google team and thought I could adapt its code for my purpose.

Transcribe audio from streaming input | Cloud Speech-to-Text Documentation | Google Cloud (cloud.google.com)

That code snippet performs real-time transcription by sending API requests to a Google service, so I needed to make it save audio chunks instead of sending requests. The reference uses a generator to send the requests, but I decided to save the chunks inside the stream_callback function, because the PyAudio docs say that function is called in a thread separate from the main thread by default. I thought I could do it with little effort.

stream_callback is called in a separate thread (from the main thread).

PyAudio Documentation - PyAudio 0.2.14 documentation (people.csail.mit.edu)

So I changed the reference code as follows.

    def _fill_buffer(
        self: object,
        in_data: object,
        frame_count: int,
        time_info: object,
        status_flags: object,
    ) -> object:
        """Continuously collect data from the audio stream, into the buffer.

        Args:
            in_data: The audio data as a bytes object
            frame_count: The number of frames captured
            time_info: The time information
            status_flags: The status flags

        Returns:
            A (None, pyaudio.paContinue) tuple: no output data, keep the stream running
        """
        logger.info(f'fill buffer: {len(self._recording_frames)}')
        self._recording_frames.append(in_data)
        if len(self._recording_frames) >= RECORDING_CHUNK_SIZE:
            saving_frames = self._recording_frames[:]
            self._recording_frames = []
            self._count += 1
            self._save(saving_frames, self._count)

        return None, pyaudio.paContinue


    def _save(self, frames, count):
        with wave.open(f'{CHUNK_DIR}{self._session_id}_{count:02}.{FILE_EXTENSION}', 'wb') as wf:
            wf.setnchannels(self._channel)
            wf.setsampwidth(self._audio_interface.get_sample_size(self._sample_width))
            wf.setframerate(self._rate)
            wf.writeframes(b''.join(frames))
        logger.info(f'Finish recording: count: {count}, frames: {len(frames)}')
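The buffering and saving steps above can be tried without a microphone. The following is only a minimal sketch under my own assumptions: the class name ChunkRecorder, the helper functions, the file names, and every constant value (sample rate, buffer size, chunk size) are made up for illustration and are not taken from the article's repository. Silent byte buffers stand in for the in_data that PyAudio would pass to the callback.

```python
import tempfile
import wave
from pathlib import Path

# All values below are illustrative assumptions, not the article's settings.
RATE = 16000               # samples per second
CHANNELS = 1               # mono
SAMPLE_WIDTH = 2           # bytes per sample (16-bit PCM)
FRAMES_PER_BUFFER = 1600   # frames delivered per callback (0.1 s at 16 kHz)
RECORDING_CHUNK_SIZE = 10  # callbacks collected before a file is written


class ChunkRecorder:
    """Collects raw PCM buffers and writes one WAV file per full chunk."""

    def __init__(self, out_dir):
        self._out_dir = Path(out_dir)
        self._frames = []
        self._count = 0
        self.saved = []

    def on_audio(self, in_data):
        """Stand-in for the callback body: buffer, then flush when full."""
        self._frames.append(in_data)
        if len(self._frames) >= RECORDING_CHUNK_SIZE:
            # Hand off the full list and start a fresh one for new buffers.
            frames, self._frames = self._frames, []
            self._count += 1
            self.saved.append(self._save(frames, self._count))

    def _save(self, frames, count):
        path = self._out_dir / f"session_{count:02}.wav"
        with wave.open(str(path), "wb") as wf:
            wf.setnchannels(CHANNELS)
            wf.setsampwidth(SAMPLE_WIDTH)
            wf.setframerate(RATE)
            wf.writeframes(b"".join(frames))
        return path


def chunk_seconds():
    """Duration of audio held in one saved file."""
    return RECORDING_CHUNK_SIZE * FRAMES_PER_BUFFER / RATE


def run_demo(num_callbacks=25):
    """Feed silent buffers through the recorder and return the written files."""
    recorder = ChunkRecorder(tempfile.mkdtemp())
    silence = b"\x00" * (FRAMES_PER_BUFFER * CHANNELS * SAMPLE_WIDTH)
    for _ in range(num_callbacks):
        recorder.on_audio(silence)
    return recorder.saved
```

With these assumed numbers, each file holds 10 × 1600 / 16000 = 1 second of audio, and buffers that have not yet filled a chunk stay in memory until the next flush. Note that this sketch writes the file inside on_audio, just like the version above, so it keeps the same blocking behavior.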

However, the _save call blocked the main thread, even though the docs say the callback runs in a separate thread. I tested this by putting time.sleep in the callback function and checking the log: the main thread was blocked by the time.sleep, which in my case means the recording stopped for the duration of the sleep. I haven't figured out the reason yet. For now, I changed the code to create the thread myself, as shown below.

    def _fill_buffer(
        self: object,
        in_data: object,
        frame_count: int,
        time_info: object,
        status_flags: object,
    ) -> object:
        """Continuously collect data from the audio stream, into the buffer.

        Args:
            in_data: The audio data as a bytes object
            frame_count: The number of frames captured
            time_info: The time information
            status_flags: The status flags

        Returns:
            A (None, pyaudio.paContinue) tuple: no output data, keep the stream running
        """
        logger.info(f'fill buffer: {len(self._recording_frames)}')
        self._recording_frames.append(in_data)
        if len(self._recording_frames) >= RECORDING_CHUNK_SIZE:
            saving_frames = self._recording_frames[:]
            self._recording_frames = []
            self._count += 1

            # FIXME: Should not have to create a thread by myself
            self._create_chunk_saving_thread(saving_frames, self._count)

        return None, pyaudio.paContinue

    def _create_chunk_saving_thread(self, saving_frames, count):
        created_at = datetime.datetime.now().strftime('%y%m%d%H%M%S')

        saving_thread = Thread(
            target=self._save,
            args=(saving_frames, count, created_at,),
            daemon=True,
        )
        saving_thread.start()
        logger.info(f'Start saving: session_id: {self._session_id}, count: {self._count}')


    def _save(self, frames, count, start_time):
        with wave.open(f'{CHUNK_DIR}{self._session_id}_{count:02}.{FILE_EXTENSION}', 'wb') as wf:
            wf.setnchannels(self._channel)
            wf.setsampwidth(self._audio_interface.get_sample_size(self._sample_width))
            wf.setframerate(self._rate)
            wf.writeframes(b''.join(frames))
        logger.info(f'Finish recording: count: {count}, start_time: {start_time}, frames: {len(frames)}')

After that, chunk saving worked well without blocking audio recording. Here is the full code.

pyaudiorecorder/microphone_stream.py at 3c64cdced6751de09350846b1ebeeb237f488e4b ·…

atsss/pyaudiorecorder (github.com)
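The FIXME in the snippet above notes that a new thread is created for every chunk. One possible alternative, which is not what the article's code does, is a single long-lived writer thread fed through queue.Queue, so the callback only enqueues the frames. This is a rough sketch under assumptions: ChunkWriter, its methods, the file names, and the audio constants are all hypothetical.

```python
import queue
import tempfile
import threading
import wave
from pathlib import Path

RATE = 16000      # assumed sample rate
CHANNELS = 1      # assumed mono input
SAMPLE_WIDTH = 2  # assumed 16-bit samples

_STOP = object()  # sentinel telling the worker to exit


class ChunkWriter:
    """Writes queued chunks to WAV files on one background thread."""

    def __init__(self, out_dir):
        self._out_dir = Path(out_dir)
        self._queue = queue.Queue()
        self.written = []
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def submit(self, frames, count):
        """Only enqueues, so it returns quickly when called from a callback."""
        self._queue.put((frames, count))

    def close(self):
        """Ask the worker to finish the queued chunks, then wait for it."""
        self._queue.put(_STOP)
        self._thread.join()

    def _run(self):
        while True:
            item = self._queue.get()
            if item is _STOP:
                break
            frames, count = item
            path = self._out_dir / f"chunk_{count:02}.wav"
            with wave.open(str(path), "wb") as wf:
                wf.setnchannels(CHANNELS)
                wf.setsampwidth(SAMPLE_WIDTH)
                wf.setframerate(RATE)
                wf.writeframes(b"".join(frames))
            self.written.append(path)


def run_demo():
    """Submit three chunks of silence and return the files that were written."""
    writer = ChunkWriter(tempfile.mkdtemp())
    silence = b"\x00" * 3200  # 1600 mono 16-bit frames
    for count in range(1, 4):
        writer.submit([silence] * 5, count)
    writer.close()
    return writer.written
```

Because a single worker drains the queue in order, the files are written in the order the chunks were submitted. Calling close() before the program exits waits for pending chunks, which matters because daemon threads are stopped abruptly at interpreter shutdown.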

I’ll update this post once I find out why the callback function blocked the main thread.
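As a starting point for that investigation, a small wrapper could log which thread runs the callback and how long each call takes. This is only a sketch and not a diagnosis: timed_callback is a hypothetical helper, and the demo runs the callback on an ordinary Python thread, so it does not reproduce PyAudio's behavior. In a real test, the wrapped function would be passed as stream_callback instead.

```python
import threading
import time


def timed_callback(func, log):
    """Wrap a callback so each call records the thread name and duration."""
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            elapsed = time.perf_counter() - start
            log.append((threading.current_thread().name, elapsed))
    return wrapper


def run_demo():
    """Call a deliberately slow callback on a named thread and return the log."""
    log = []

    def slow_callback(in_data, frame_count, time_info, status_flags):
        time.sleep(0.05)  # stands in for a slow file write
        return None, 0

    wrapped = timed_callback(slow_callback, log)
    worker = threading.Thread(
        target=wrapped,
        args=(b"", 0, None, 0),
        name="audio-callback",
    )
    worker.start()
    worker.join()
    return log
```

Comparing the logged thread name with threading.main_thread().name, and the logged duration with the length of one audio buffer, would show whether the callback really runs off the main thread and whether it takes longer than the audio it receives.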

That’s it!