* [Galene] Running galene-stt on the GPU
From: Juliusz Chroboczek @ 2024-11-20 18:34 UTC (permalink / raw)
To: galene
Performance of galene-stt
=========================
CPU/GPU                            Model      Times realtime
                                              (larger is better)
----------------------------------------------------------------
i5-8350U (4-core CPU)              base        1.17
Xeon(R) Gold 5220R (24-core CPU)   base        1.49
2 * RTX A6000 (GPU)                base       27.61
2 * RTX A6000 (GPU)                medium     12.68
2 * RTX A6000 (GPU)                large-v3    8.66
Clearly, whisper.cpp does not scale well on the CPU: the performance on
the 24-core server is only slightly better than on the 4-core laptop.
Hence, there is no hope of running anything better than the "base" model
on the CPU. Since the "base" model produces random gibberish, running
galene-stt on the CPU is not useful.
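
To put a number on "does not scale well", here is a rough sketch that compares the two CPU rows of the table. It is only indicative: the two machines differ in clock speed and microarchitecture as well as in core count, so the "efficiency" figure is a crude estimate, and the names used below are made up for the illustration.

```python
# Realtime factors copied from the table above ("base" model on both CPUs).
LAPTOP = {"cores": 4, "factor": 1.17}    # i5-8350U
SERVER = {"cores": 24, "factor": 1.49}   # Xeon Gold 5220R


def scaling(small, large):
    """Return (core ratio, speedup, efficiency) when going from small to large.

    Efficiency is the speedup divided by the core ratio; 1.0 would mean
    perfectly linear scaling with the number of cores.
    """
    core_ratio = large["cores"] / small["cores"]
    speedup = large["factor"] / small["factor"]
    return core_ratio, speedup, speedup / core_ratio


core_ratio, speedup, efficiency = scaling(LAPTOP, SERVER)
print(f"{core_ratio:.0f}x the cores, {speedup:.2f}x the speed, "
      f"{efficiency:.0%} scaling efficiency")
```

With six times as many cores, the server is only about 1.27 times faster, which is roughly a fifth of what linear scaling would give.
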
Performance on the GPU, on the other hand, is excellent. On two RTX A6000
cards using CUDA, we get over 8 times realtime using the large-v3 model
(the best model available at the time of writing, as far as I'm aware),
which means that we should be able to transcribe eight simultaneous
speakers.
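
The step from "times realtime" to a number of speakers is simple arithmetic, sketched below. It assumes that the load divides evenly across streams with no per-stream overhead, which is an idealisation; the real limit may well be lower. The function names are made up for the illustration.

```python
import math

CLIP_SECONDS = 42  # length of the test clip (see "Experimental setup")

# Realtime factors measured on the two RTX A6000 cards, from the table above.
FACTORS = {"base": 27.61, "medium": 12.68, "large-v3": 8.66}


def max_speakers(factor):
    """Upper bound on simultaneous realtime streams for a realtime factor."""
    return math.floor(factor)


def processing_seconds(audio_seconds, factor):
    """Time needed to transcribe audio_seconds of audio at the given factor."""
    return audio_seconds / factor


for model, factor in FACTORS.items():
    print(f"{model:9s} at most {max_speakers(factor):2d} speakers, "
          f"{processing_seconds(CLIP_SECONDS, factor):.2f} s for the clip")
```
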
Thanks to Jean-Baptiste Yunès for giving me access to the server.
Transcripts
===========
The "base" transcript is almost complete gibberish:
into little pieces.
because...
There's not an infinite number of wires between any people.
some places. Not everything is connected with its own individual pairs of
for some wires.
So we're going to have to share the path.
The only way that we could share the path among men.
If you think.
is to have the amount of work that we do per thing be controlled.
>> Yeah. All right.
Somebody can send.
and to a gigabyte into the path and I can't do anything until
until that gigabyte has gone away.
makes it.
seeing cars and trains on the city road.
roadway system.
you're gonna spend a lot of time in your car waiting.
you want to add an intersection for trying to go by.
you want to have.
Limited rent things.
The "medium" transcript is comprehensible, except for the interesting
hallucinations in the fourth caption and at the very end:
up into little pieces.
because...
There's not an infinite number of wires between any two wires.
places. Not everything is connected with its own individual parasol.
of wires. So we're going to have to share the path.
The only way that we could share the path among men.
many things.
is to have the amount of work that we do per thing be controlled.
rolled.
If somebody can send--
and a gigabyte into the path and I can't do anything until
until that gigabyte has gone away, you know.
much like.
fixing cars and trains on the city rail.
So roadway system.
and spent a lot of time in your car waiting at a--
to have an intersection for a train to go by. You don't want to do that. You want
have women that learn things.
The "large-v3" transcript avoids the first hallucination:
into little pieces.
Because
There's not an infinite number of wires between any of
in places. Not everything is connected with its own individual pairs of
of wires. So we're going to have to share the path.
The only way that we could share the path among men
many things
is to have the amount of work that we do per thing be controlled.
if somebody can send
like a gigabyte into the path and I can't do anything until I
that gigabyte has gone away.
It's like...
fixing cars and trains on the city roads.
So roadway system.
at an-- spent a lot of time in your car waiting at
have an intersection for a train to go by. You don't want to do that. You want to have
and have women who'd lend things.
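
Spotting the differences between transcripts by eye is tedious. The following sketch uses Python's standard difflib module to list the words that differ between two captions; the two strings are excerpted from the fourth captions of the "medium" and "large-v3" transcripts above. This is not part of galene-stt, just a quick way to compare outputs.

```python
import difflib

# Excerpts from the fourth caption of each transcript above.
medium = "Not everything is connected with its own individual parasol."
large_v3 = "Not everything is connected with its own individual pairs of"


def word_diff(a, b):
    """Return (dropped, added): words to drop from a and add to obtain b."""
    a_words, b_words = a.split(), b.split()
    matcher = difflib.SequenceMatcher(None, a_words, b_words)
    dropped, added = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            dropped.extend(a_words[i1:i2])
            added.extend(b_words[j1:j2])
    return dropped, added


dropped, added = word_diff(medium, large_v3)
print("medium only:  ", " ".join(dropped))
print("large-v3 only:", " ".join(added))
```
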
Experimental setup
==================
I've tested on a forty-two-second fragment of a talk by Van Jacobson from
2006. The clip has a lot of background noise, which is realistic for
videoconferencing, but Jacobson speaks slowly and in a very clear (to my
ear) American accent. The clip was obtained with the following commands:
    yt-dlp -f 234 'https://www.youtube.com/watch?v=gqGEMQveoqg'
    ffmpeg -ss 18:45 -t 42 -i "A New Way to look at Networking [gqGEMQveoqg].mp4" -acodec copy van.mp4
The clip was played from a Chromium-based browser through the instance of
Galene at galene.org, packet loss and all. The laptop was getting RTP/UDP
data, while the server was using RTP/TCP/TURN over an SSH tunnel. Don't
ask.
The commits used were:
    whisper.cpp  6266a9f9e56a5b925e9892acf650f3eb1245814d
    galene-stt   88b89cc0550d0b9599b0beeb180fcaf70e591186
The laptop was using gcc 14.2.0 and Go 1.23.3. The server was using
gcc 13.2.0, Go 1.22.2, and CUDA 12.6.r12.6.
-- Juliusz