From: Juliusz Chroboczek <jch@irif.fr>
To: galene@lists.galene.org
Subject: [Galene] Running galene-stt on the GPU
Date: Wed, 20 Nov 2024 19:34:43 +0100
Message-ID: <87o729oit8.wl-jch@irif.fr>

Performance of galene-stt
=========================

CPU/GPU                            Model      Times realtime
                                              (larger is better)
----------------------------------------------------------------
i5-8350U (4-core CPU)              base        1.17
Xeon(R) Gold 5220R (24-core CPU)   base        1.49
2 * RTX A6000 (GPU)                base       27.61
2 * RTX A6000 (GPU)                medium     12.68
2 * RTX A6000 (GPU)                large-v3    8.66

Clearly, whisper.cpp does not scale well on the CPU: the 24-core server
reaches 1.49 times realtime, only slightly better than the 1.17 of the
4-core laptop. Hence, there's no hope of running anything better than the
"base" model on the CPU. Since the "base" model produces random gibberish
(see the transcripts below), running galene-stt on the CPU is not useful.
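
To put a number on "does not scale well", here is a rough sketch that compares the speedup with the increase in core count. It only uses the two CPU rows of the table. The "efficiency" figure is an illustrative measure of my own choosing, and it ignores the fact that the two CPUs differ in per-core speed, so treat it as a ballpark rather than a benchmark.

```python
# Figures taken from the CPU rows of the table ("base" model, times realtime).
laptop_cores, laptop_speed = 4, 1.17    # i5-8350U
server_cores, server_speed = 24, 1.49   # Xeon Gold 5220R

# How many times more cores the server has, and how much faster it runs.
core_ratio = server_cores / laptop_cores
speedup = server_speed / laptop_speed

# Fraction of the ideal (linear) speedup actually obtained.
# Assumption: per-core performance is comparable, which is only roughly true.
efficiency = speedup / core_ratio

print(f"{core_ratio:.0f}x the cores, {speedup:.2f}x the speed, "
      f"scaling efficiency {efficiency:.0%}")
```

In other words, six times the cores buy roughly 1.27 times the speed.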

Performance on the GPU, on the other hand, is excellent. On two RTX A6000
cards using CUDA, we get over 8 times realtime with the large-v3 model (the
best model available at the time of writing, as far as I'm aware), which
means that we should be able to transcribe eight simultaneous speakers.
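
The step from "times realtime" to a number of speakers can be sketched as follows. This assumes that "times realtime" means audio duration divided by processing time, and that GPU throughput divides evenly between streams with no per-stream overhead; neither assumption has been measured here. The helper names are hypothetical and are not part of galene-stt.

```python
import math

# Times-realtime figures from the GPU rows of the table.
REALTIME_FACTOR = {"base": 27.61, "medium": 12.68, "large-v3": 8.66}

CLIP_SECONDS = 42  # length of the test clip


def max_speakers(factor: float) -> int:
    """Whole number of live streams that fit, assuming throughput
    is shared evenly and there is no per-stream overhead."""
    return math.floor(factor)


def processing_seconds(factor: float, audio_seconds: float) -> float:
    """Time needed to transcribe audio_seconds of speech, assuming
    factor = audio duration / processing time."""
    return audio_seconds / factor


for model, factor in REALTIME_FACTOR.items():
    print(f"{model:9} up to {max_speakers(factor):2d} speakers, "
          f"{processing_seconds(factor, CLIP_SECONDS):.2f} s for the clip")
```

Under these assumptions the large-v3 figure of 8.66 rounds down to eight speakers, which matches the estimate above.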
Thanks to Jean-Baptiste Yunès for giving me access to the server.
Transcripts
===========
The "base" transcript is almost complete gibberish:

    into little pieces.
    because...
    There's not an infinite number of wires between any people.
    some places. Not everything is connected with its own individual pairs of
    for some wires.
    So we're going to have to share the path.
    The only way that we could share the path among men.
    If you think.
    is to have the amount of work that we do per thing be controlled.
    >> Yeah. All right.
    Somebody can send.
    and to a gigabyte into the path and I can't do anything until
    until that gigabyte has gone away.
    makes it.
    seeing cars and trains on the city road.
    roadway system.
    you're gonna spend a lot of time in your car waiting.
    you want to add an intersection for trying to go by.
    you want to have.
    Limited rent things.

The "medium" transcript is comprehensible, except for the interesting
hallucinations in the fourth caption and at the very end:

    up into little pieces.
    because...
    There's not an infinite number of wires between any two wires.
    places. Not everything is connected with its own individual parasol.
    of wires. So we're going to have to share the path.
    The only way that we could share the path among men.
    many things.
    is to have the amount of work that we do per thing be controlled.
    rolled.
    If somebody can send--
    and a gigabyte into the path and I can't do anything until
    until that gigabyte has gone away, you know.
    much like.
    fixing cars and trains on the city rail.
    So roadway system.
    and spent a lot of time in your car waiting at a--
    to have an intersection for a train to go by. You don't want to do that. You want
    have women that learn things.

The "large-v3" transcript avoids the first hallucination:

    into little pieces.
    Because
    There's not an infinite number of wires between any of
    in places. Not everything is connected with its own individual pairs of
    of wires. So we're going to have to share the path.
    The only way that we could share the path among men
    many things
    is to have the amount of work that we do per thing be controlled.
    if somebody can send
    like a gigabyte into the path and I can't do anything until I
    that gigabyte has gone away.
    It's like...
    fixing cars and trains on the city roads.
    So roadway system.
    at an-- spent a lot of time in your car waiting at
    have an intersection for a train to go by. You don't want to do that. You want to have
    and have women who'd lend things.

Experimental setup
==================
I've tested on a forty-two-second fragment of a talk by Van Jacobson from
2006. The clip has a lot of background noise, which is realistic for
videoconferencing, but Jacobson speaks slowly and in a very clear (to my
ear) American accent. The clip was obtained with the following commands:

    yt-dlp -f 234 'https://www.youtube.com/watch?v=gqGEMQveoqg'
    ffmpeg -ss 18:45 -t 42 -i "A New Way to look at Networking [gqGEMQveoqg].mp4" -acodec copy van.mp4

The clip was played from a Chromium-based browser through the instance of
Galene at galene.org, packet loss and all. The laptop was getting RTP/UDP
data, while the server was using RTP/TCP/TURN over an SSH tunnel. Don't
ask.
The commits used were:

    whisper.cpp  6266a9f9e56a5b925e9892acf650f3eb1245814d
    galene-stt   88b89cc0550d0b9599b0beeb180fcaf70e591186

The laptop was using gcc 14.2.0 and Go 1.23.3. The server was using
gcc 13.2.0, Go 1.22.2, and CUDA 12.6.r12.6.
-- Juliusz