Galène videoconferencing server discussion list archives
From: Juliusz Chroboczek <jch@irif.fr>
To: galene@lists.galene.org
Subject: [Galene] Running galene-stt on the GPU
Date: Wed, 20 Nov 2024 19:34:43 +0100
Message-ID: <87o729oit8.wl-jch@irif.fr>

Performance of galene-stt
=========================

    CPU/GPU                            Model       Speed (x realtime,
                                                   larger is better)
    ----------------------------------------------------------------
    i5-8350U (4-core CPU)              base          1.17
    Xeon(R) Gold 5220R (24-core CPU)   base          1.49
    2 * RTX A6000 (GPU)                base         27.61
    2 * RTX A6000 (GPU)                medium       12.68
    2 * RTX A6000 (GPU)                large-v3      8.66

Clearly, whisper.cpp does not scale well on the CPU: the performance on
the 24-core server is only slightly better than on the 4-core laptop.
Since even the "base" model barely keeps up with realtime, there's no
hope of running anything better than the "base" model on the CPU.  And
since the "base" model produces random gibberish, running galene-stt on
the CPU is not useful.
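
To put a number on "does not scale well", here is a rough sketch that
compares the observed speedup with the ideal linear speedup.  The helper
name scaling_efficiency is made up for this illustration, and treating the
cores of two different CPU models as equivalent is a crude assumption.

```python
# Realtime factors for the "base" model, copied from the table above.
laptop_cores, laptop_speed = 4, 1.17     # i5-8350U
server_cores, server_speed = 24, 1.49    # Xeon Gold 5220R


def scaling_efficiency(cores_a, speed_a, cores_b, speed_b):
    """Observed speedup from machine a to machine b, divided by the
    ideal (linear) speedup.

    Hypothetical helper: it assumes the cores are comparable, which is
    only roughly true for these two machines.
    """
    observed = speed_b / speed_a
    ideal = cores_b / cores_a
    return observed / ideal


speedup = server_speed / laptop_speed
efficiency = scaling_efficiency(laptop_cores, laptop_speed,
                                server_cores, server_speed)
print(f"speedup {speedup:.2f}x on {server_cores // laptop_cores}x the cores, "
      f"efficiency {efficiency:.0%}")
```

Under these assumptions, six times the cores buy a speedup of about 1.27,
which is roughly a fifth of what linear scaling would give.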

Performance on the GPU, on the other hand, is excellent.  On two RTX
A6000s using CUDA, we get over 8 times realtime using the large-v3 model
(the best model available at the time of writing as far as I'm aware),
which means that we should be able to transcribe eight simultaneous
speakers.
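
The speaker estimate is just the realtime factor rounded down.  A minimal
sketch, assuming throughput divides evenly between streams and ignoring
any per-stream overhead (the function name max_speakers is made up for
this illustration):

```python
import math

# Realtime factors on 2 * RTX A6000, copied from the table above.
gpu_speed = {"base": 27.61, "medium": 12.68, "large-v3": 8.66}


def max_speakers(realtime_factor):
    """Whole number of audio streams that can be transcribed without
    falling behind.

    Assumes the available throughput is shared evenly between streams,
    which has not been measured here.
    """
    return math.floor(realtime_factor)


for model, factor in gpu_speed.items():
    print(f"{model:9s} {max_speakers(factor):2d} simultaneous speakers")
```

This is an upper bound: it has not been tested with several speakers at
once.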

Thanks to Jean-Baptiste Yunès for giving me access to the server.


Transcripts
===========

The "base" transcript is almost complete gibberish:

    into little pieces.
    because...
    There's not an infinite number of wires between any people.
    some places. Not everything is connected with its own individual pairs of
    for some wires.
    So we're going to have to share the path.
    The only way that we could share the path among men.
    If you think.
    is to have the amount of work that we do per thing be controlled.
    >> Yeah. All right.
    Somebody can send.
    and to a gigabyte into the path and I can't do anything until
    until that gigabyte has gone away.
    makes it.
    seeing cars and trains on the city road.
    roadway system.
    you're gonna spend a lot of time in your car waiting.
    you want to add an intersection for trying to go by.
    you want to have.
    Limited rent things.

The "medium" transcript is comprehensible, except for the interesting
hallucinations in the fourth caption and at the very end:

    up into little pieces.
    because...
    There's not an infinite number of wires between any two wires.
    places. Not everything is connected with its own individual parasol.
    of wires. So we're going to have to share the path.
    The only way that we could share the path among men.
    many things.
    is to have the amount of work that we do per thing be controlled.
    rolled.
    If somebody can send--
    and a gigabyte into the path and I can't do anything until
    until that gigabyte has gone away, you know.
    much like.
    fixing cars and trains on the city rail.
    So roadway system.
    and spent a lot of time in your car waiting at a--
    to have an intersection for a train to go by. You don't want to do that. You want
    have women that learn things.

The "large-v3" transcript avoids the first hallucination:

    into little pieces.
    Because
    There's not an infinite number of wires between any of
    in places. Not everything is connected with its own individual pairs of
    of wires. So we're going to have to share the path.
    The only way that we could share the path among men
    many things
    is to have the amount of work that we do per thing be controlled.
    if somebody can send
    like a gigabyte into the path and I can't do anything until I
    that gigabyte has gone away.
    It's like...
    fixing cars and trains on the city roads.
    So roadway system.
    at an-- spent a lot of time in your car waiting at
    have an intersection for a train to go by. You don't want to do that. You want to have
    and have women who'd lend things.


Experimental setup
==================

I tested on a forty-two-second fragment of a 2006 talk by Van Jacobson.
The clip has a lot of background noise, which is realistic for
videoconferencing, but Jacobson speaks slowly and in a very clear (to my
ear) American accent.  The clip was obtained with the following commands:

    yt-dlp -f 234 'https://www.youtube.com/watch?v=gqGEMQveoqg'
    ffmpeg -ss 18:45 -t 42 -i "A New Way to look at Networking [gqGEMQveoqg].mp4" -acodec copy van.mp4

The clip was played from a Chromium-based browser through the instance of
Galene at galene.org, packet loss and all.  The laptop was getting RTP/UDP
data, while the server was using RTP/TCP/TURN over an SSH tunnel.  Don't
ask.

The commits used were:

    whisper.cpp 6266a9f9e56a5b925e9892acf650f3eb1245814d
    galene-stt  88b89cc0550d0b9599b0beeb180fcaf70e591186

The laptop was using gcc 14.2.0 and Go 1.23.3.  The server was using
gcc 13.2.0, Go 1.22.2, and CUDA 12.6.r12.6.

-- Juliusz
