[Galene] Running galene-stt on the GPU

From: Juliusz Chroboczek @ 2024-11-20 18:34 UTC (permalink / raw)
  To: galene

Performance of galene-stt
=========================

    CPU/GPU                            Model       Times realtime
                                                   (larger is better)
    ---------------------------------------------------------------------
    i5-8350U (4-core CPU)              base              1.17
    Xeon(R) Gold 5220R (24-core CPU)   base              1.49
    2 * RTX A6000 (GPU)                base             27.61
    2 * RTX A6000 (GPU)                medium           12.68
    2 * RTX A6000 (GPU)                large-v3          8.66

Clearly, whisper.cpp does not scale well on the CPU: the 24-core server is
only about 27% faster than the 4-core laptop.  Hence, there's no hope of
running anything better than the "base" model on the CPU.  Since the "base"
model produces random gibberish, running galene-stt on the CPU is not
useful.
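
To put a number on the poor scaling, here is a rough sketch in Python that
compares the two CPU rows of the table.  The function names are mine, and
the comparison is crude: the two machines have different microarchitectures
and clock speeds, so this only indicates how far the result is from linear
scaling with core count.

```python
# (cores, times realtime) for the "base" model, copied from the table.
CPU_RESULTS = {
    "i5-8350U": (4, 1.17),
    "Xeon Gold 5220R": (24, 1.49),
}


def per_core_speed(cores, speed):
    """Times-realtime throughput divided evenly over the cores."""
    return speed / cores


def scaling_efficiency(small, large):
    """Fraction of ideal linear speedup achieved going from small to large.

    Each argument is a (cores, speed) pair.  1.0 would mean that the speed
    grew in proportion to the number of cores.
    """
    (cores_small, speed_small), (cores_large, speed_large) = small, large
    return (speed_large / speed_small) / (cores_large / cores_small)


for name, (cores, speed) in CPU_RESULTS.items():
    print(f"{name}: {per_core_speed(cores, speed):.3f} x realtime per core")

efficiency = scaling_efficiency(
    CPU_RESULTS["i5-8350U"], CPU_RESULTS["Xeon Gold 5220R"]
)
print(f"scaling efficiency: {efficiency:.0%}")
```

With six times as many cores, the server achieves roughly a fifth of the
speedup that linear scaling would give.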

Performance on the GPU, on the other hand, is excellent.  On two RTX A6000
GPUs using CUDA, we get over 8 times realtime using the large-v3 model (the
best model available at the time of writing, as far as I'm aware), which
means that we should be able to transcribe eight simultaneous speakers.
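
The arithmetic behind that estimate can be sketched as follows.  This is a
back-of-the-envelope calculation, not a measurement: it assumes that every
speaker talks continuously and that throughput divides evenly between
streams with no per-stream overhead, neither of which has been tested
here.  The function names are mine; the 42-second clip length is the one
described under "Experimental setup".

```python
import math

# Times realtime on 2 * RTX A6000, copied from the table.
GPU_RESULTS = {"base": 27.61, "medium": 12.68, "large-v3": 8.66}

CLIP_SECONDS = 42  # length of the test clip


def max_streams(speed):
    """Whole number of continuous realtime streams that fit in the budget.

    Assumes throughput divides evenly between streams (untested).
    """
    return math.floor(speed)


def processing_seconds(clip_seconds, speed):
    """Time needed to transcribe a clip at the given times-realtime speed."""
    return clip_seconds / speed


for model, speed in GPU_RESULTS.items():
    print(
        f"{model}: up to {max_streams(speed)} streams, "
        f"{processing_seconds(CLIP_SECONDS, speed):.2f} s for the clip"
    )
```

In practice speakers pause, so the real capacity may be higher; conversely,
any per-stream overhead would lower it.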

Thanks to Jean-Baptiste Yunès for giving me access to the server.

Transcripts
===========

The "base" transcript is almost complete gibberish:

    into little pieces.
    because...
    There's not an infinite number of wires between any people.
    some places. Not everything is connected with its own individual pairs of
    for some wires.
    So we're going to have to share the path.
    The only way that we could share the path among men.
    If you think.
    is to have the amount of work that we do per thing be controlled.
    >> Yeah. All right.
    Somebody can send.
    and to a gigabyte into the path and I can't do anything until
    until that gigabyte has gone away.
    makes it.
    seeing cars and trains on the city road.
    roadway system.
    you're gonna spend a lot of time in your car waiting.
    you want to add an intersection for trying to go by.
    you want to have.
    Limited rent things.

The "medium" transcript is comprehensible, except for the interesting
hallucinations in the fourth caption and at the very end:

    up into little pieces.
    because...
    There's not an infinite number of wires between any two wires.
    places. Not everything is connected with its own individual parasol.
    of wires. So we're going to have to share the path.
    The only way that we could share the path among men.
    many things.
    is to have the amount of work that we do per thing be controlled.
    rolled.
    If somebody can send--
    and a gigabyte into the path and I can't do anything until
    until that gigabyte has gone away, you know.
    much like.
    fixing cars and trains on the city rail.
    So roadway system.
    and spent a lot of time in your car waiting at a--
    to have an intersection for a train to go by. You don't want to do that. You want
    have women that learn things.

The "large-v3" transcript avoids the first hallucination ("parasol"), but
not the one at the very end:

    into little pieces.
    Because
    There's not an infinite number of wires between any of
    in places. Not everything is connected with its own individual pairs of
    of wires. So we're going to have to share the path.
    The only way that we could share the path among men
    many things
    is to have the amount of work that we do per thing be controlled.
    if somebody can send
    like a gigabyte into the path and I can't do anything until I
    that gigabyte has gone away.
    It's like...
    fixing cars and trains on the city roads.
    So roadway system.
    at an-- spent a lot of time in your car waiting at
    have an intersection for a train to go by. You don't want to do that. You want to have
    and have women who'd lend things.

Experimental setup
==================

I've tested on a forty-two-second fragment of a talk by Van Jacobson from
2006.  The clip has a lot of background noise, which is realistic for
videoconferencing, but Jacobson speaks slowly and in a very clear (to my
ear) American accent.  The clip was obtained with the following commands:

    yt-dlp -f 234 'https://www.youtube.com/watch?v=gqGEMQveoqg'
    ffmpeg -ss 18:45 -t 42 -i "A New Way to look at Networking [gqGEMQveoqg].mp4" -acodec copy van.mp4

The clip was played from a Chromium-based browser through the instance of
Galene at galene.org, packet loss and all.  The laptop was getting RTP/UDP
data, while the server was using RTP/TCP/TURN over an SSH tunnel.  Don't
ask.

The commits used were:

    whisper.cpp 6266a9f9e56a5b925e9892acf650f3eb1245814d
    galene-stt  88b89cc0550d0b9599b0beeb180fcaf70e591186

The laptop was using gcc 14.2.0 and Go 1.23.3.  The server was using
gcc 13.2.0, Go 1.22.2, and CUDA 12.6.r12.6.

-- Juliusz
