Antoine Chatry, web developer

How does my real-time YouTube transcription tool work?

I am proud to share my latest open-source project with you: a tool that transcribes YouTube videos in real time, while you watch them.

Here is a step-by-step guide to how it works:


1. Capturing the sound coming out of your computer

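The tool "listens" to exactly what your speakers are playing through a loopback device, i.e. an audio input that carries whatever the computer is playing. The best-known options are two virtual audio cables, VB-Cable (Windows) and BlackHole (macOS), or Stereo Mix on Windows when the sound card provides it and it is enabled.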
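To make step 1 more concrete, here is a minimal sketch of how a script might pick the loopback device among the computer's audio inputs. It assumes each device is described as a dict with `name` and `max_input_channels` keys, which is the shape returned by `sounddevice.query_devices()`; the `pick_loopback_device` helper and the name keywords are hypothetical and may need adjusting to what your system reports.

```python
from typing import Optional

# Hypothetical keywords: substrings that typically appear in loopback device
# names. Adjust them to match what your system actually reports.
LOOPBACK_KEYWORDS = ("cable output", "blackhole", "stereo mix")


def pick_loopback_device(devices: list[dict]) -> Optional[int]:
    """Return the index of the first input device that looks like a loopback.

    `devices` is a list of dicts with at least "name" and
    "max_input_channels" keys (e.g. the result of sounddevice.query_devices()).
    """
    for index, device in enumerate(devices):
        # Output-only devices (speakers, "CABLE Input") cannot be recorded from.
        if device["max_input_channels"] <= 0:
            continue
        name = device["name"].lower()
        if any(keyword in name for keyword in LOOPBACK_KEYWORDS):
            return index
    return None


if __name__ == "__main__":
    # Example device list; on a real machine it would come from the audio API.
    example_devices = [
        {"name": "Microphone (USB)", "max_input_channels": 1},
        {"name": "Speakers (Realtek Audio)", "max_input_channels": 0},
        {"name": "CABLE Output (VB-Audio Virtual Cable)", "max_input_channels": 2},
    ]
    print(pick_loopback_device(example_devices))  # prints 2
```

Once an index is found, it can be passed as the `device` argument of a recording stream, for instance `sounddevice.InputStream(device=index, ...)`.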

2. Cutting the sound into small pieces

Every ~5 seconds, the tool cuts off a chunk of audio: short enough to keep the delay low, long enough for the AI to have some context to work with.
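Step 2 boils down to buffering incoming audio until about five seconds have accumulated. The sketch below assumes mono float samples arriving in small blocks (as a recording callback would deliver them) at 16 kHz, the sample rate Whisper models work with; the `chunk_audio` name and the block sizes are illustrative, not taken from the project.

```python
from typing import Iterable, Iterator

SAMPLE_RATE = 16_000   # Whisper models work on 16 kHz audio
CHUNK_SECONDS = 5      # roughly the chunk length described above


def chunk_audio(blocks: Iterable[list[float]],
                sample_rate: int = SAMPLE_RATE,
                chunk_seconds: float = CHUNK_SECONDS) -> Iterator[list[float]]:
    """Group small blocks of samples into fixed-length chunks.

    Leftover samples that do not fill a whole chunk are yielded at the end,
    so the last words before stopping are not lost.
    """
    chunk_size = int(sample_rate * chunk_seconds)
    buffer: list[float] = []
    for block in blocks:
        buffer.extend(block)
        # A single large block may contain several complete chunks.
        while len(buffer) >= chunk_size:
            yield buffer[:chunk_size]
            buffer = buffer[chunk_size:]
    if buffer:
        yield buffer


if __name__ == "__main__":
    # 12 seconds of silence delivered in 0.1-second blocks (1,600 samples each).
    blocks = ([0.0] * 1_600 for _ in range(120))
    print([len(chunk) for chunk in chunk_audio(blocks)])  # [80000, 80000, 32000]
```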

3. Automatic transcription by local AI

Each chunk is sent to a Whisper model running through whisper.cpp. The model runs entirely on your machine → nothing is sent to the cloud, no subscription, no data leaks.
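One way to hand a chunk to whisper.cpp is to save it as a 16 kHz mono 16-bit WAV file, the format its command-line tool has traditionally required, and then call the CLI. This is a minimal sketch, not the project's actual code: the binary name `whisper-cli` (older builds ship it as `main`), the model path and the helper names are assumptions. It relies on the CLI flags `-m` (model), `-f` (input file) and `-nt` (no timestamps).

```python
import subprocess
import sys
import wave
from array import array

SAMPLE_RATE = 16_000

# Assumed paths: adjust to your whisper.cpp build and downloaded model.
WHISPER_BIN = "./whisper-cli"          # older builds name this binary "main"
MODEL_PATH = "models/ggml-base.en.bin"


def to_int16(sample: float) -> int:
    """Clip a float sample to [-1, 1] and scale it to a signed 16-bit integer."""
    clipped = max(-1.0, min(1.0, sample))
    return int(clipped * 32767)


def write_wav(path: str, samples: list[float], sample_rate: int = SAMPLE_RATE) -> None:
    """Save float samples as a mono 16-bit PCM WAV file."""
    pcm = array("h", (to_int16(s) for s in samples))
    if sys.byteorder == "big":
        pcm.byteswap()                 # WAV sample data is little-endian
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)            # 2 bytes = 16 bits per sample
        wav.setframerate(sample_rate)
        wav.writeframes(pcm.tobytes())


def whisper_command(wav_path: str, model: str = MODEL_PATH) -> list[str]:
    """Build the whisper.cpp CLI call for one chunk."""
    return [WHISPER_BIN, "-m", model, "-f", wav_path, "-nt"]


def transcribe_chunk(wav_path: str) -> str:
    """Run whisper.cpp locally and return what it printed on standard output."""
    result = subprocess.run(whisper_command(wav_path),
                            capture_output=True, encoding="utf-8", check=True)
    return result.stdout.strip()
```

Calling the CLI once per chunk keeps the sketch simple but reloads the model every time; calling whisper.cpp in-process (through its C API or Python bindings) avoids that cost.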

4. Automatic addition of timestamps

As soon as a chunk's text comes back, a timestamp is added in front of it:

[00:03:42] And that's where the story gets really interesting...
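A timestamp like the one above can be derived from the number of seconds elapsed since the capture started. This sketch assumes that reference point (the project may compute it differently); `format_timestamp` and `timestamped_line` are illustrative names.

```python
import time


def format_timestamp(elapsed_seconds: float) -> str:
    """Format elapsed seconds as [HH:MM:SS], dropping fractions of a second."""
    total = int(elapsed_seconds)
    hours, remainder = divmod(total, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"[{hours:02d}:{minutes:02d}:{seconds:02d}]"


def timestamped_line(text: str, start_time: float) -> str:
    """Prefix text with the time elapsed since start_time (a time.monotonic() value)."""
    return f"{format_timestamp(time.monotonic() - start_time)} {text}"


if __name__ == "__main__":
    print(format_timestamp(222))  # [00:03:42]
```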

5. Progressive writing to a text file

Each new line is appended as soon as it is ready to a .txt file stamped with the date and time. When the video ends → you have the complete transcript, ready to copy and paste or reread.
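Appending each line as soon as it is ready means the file stays usable even if the script stops unexpectedly. A minimal sketch, assuming the file name carries the session's start date and time; the `transcript_YYYY-MM-DD_HH-MM-SS.txt` pattern and the helper names are hypothetical.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def new_transcript_path(directory: str, now: Optional[datetime] = None) -> Path:
    """Build a file name stamped with the date and time the session started."""
    now = now or datetime.now()
    return Path(directory) / f"transcript_{now:%Y-%m-%d_%H-%M-%S}.txt"


def append_line(path: Path, line: str) -> None:
    """Append one line, closing the file right away so the text is written out."""
    with path.open("a", encoding="utf-8") as f:
        f.write(line + "\n")
```

Dashes instead of colons in the time part keep the name valid on Windows, where `:` is not allowed in file names.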

Quick summary & technologies used

In a nutshell:
You start a YouTube video → run the script → watch as usual → press Ctrl+C to stop → all the text is in a file, with timestamps.
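The whole loop, including the Ctrl+C stop, can be sketched as follows. To stay self-contained, it takes the audio chunks, the transcription function and the output function as parameters; these are placeholders standing in for the capture, whisper.cpp and file-writing steps, not the project's actual structure.

```python
import time
from typing import Callable, Iterable


def run(chunks: Iterable[object],
        transcribe: Callable[[object], str],
        write_line: Callable[[str], None]) -> int:
    """Transcribe chunks until they run out or the user presses Ctrl+C.

    Returns the number of lines written.
    """
    start = time.monotonic()
    written = 0
    try:
        for chunk in chunks:
            text = transcribe(chunk).strip()
            if not text:
                continue  # skip chunks where nothing was recognized
            elapsed = int(time.monotonic() - start)
            hours, rest = divmod(elapsed, 3600)
            minutes, seconds = divmod(rest, 60)
            write_line(f"[{hours:02d}:{minutes:02d}:{seconds:02d}] {text}")
            written += 1
    except KeyboardInterrupt:
        # Ctrl+C raises KeyboardInterrupt in the main thread: stop cleanly.
        print("Stopped.")
    return written
```

Here the timestamp is taken when the text comes back, so it trails the speech by roughly one chunk; subtracting the chunk length is one possible refinement.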

Technologies used:

Link to the project: https://github.com/AntoineChatry/realtime-youtube-transcribe
Happy transcribing, everyone! 🚀