Galène videoconferencing server discussion list archives
From: Juliusz Chroboczek <jch@irif.fr>
To: galene@lists.galene.org
Subject: [Galene] More work on speech-to-text
Date: Fri, 08 Nov 2024 15:54:55 +0100
Message-ID: <87ed3l6aio.wl-jch@irif.fr>

Hi,

I've just finished doing some more work on speech-to-text support.
Galene-stt can now run in three modes:

  - dump a transcript to standard output; this is the default, and is
    useful if you're trying to follow a meeting that's not in a language
    you understand well;

  - dump a transcript to the chat; this is requested with the option
    "-chat", and I don't think it's very useful;

  - generate proper captions; this is requested with the option
    "-caption", and is pretty useful in general.
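For concreteness, the three modes correspond to invocations along these
lines (the model path and group URL here are just the ones used further
down in this message; substitute your own):

```shell
# Default mode: dump a transcript to standard output.
./galene-stt -model models/ggml-tiny-q5_1.bin https://galene.org:8443/group/public/stt

# Dump the transcript into the group's chat.
./galene-stt -model models/ggml-tiny-q5_1.bin -chat https://galene.org:8443/group/public/stt

# Generate proper captions.
./galene-stt -model models/ggml-tiny-q5_1.bin -caption https://galene.org:8443/group/public/stt
```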

You first need to create a group containing a user "speech-to-text" that
has permission to publish captions.  Here's what I did in order to create
the group <https://galene.org:8443/group/public/stt/>:

    galenectl create-group -group public/stt
    galenectl create-user -group public/stt -user speech-to-text -permissions caption
    galenectl set-password -group public/stt -user speech-to-text -type wildcard
    galenectl create-user -group public/stt -wildcard
    galenectl set-password -group public/stt -wildcard -type wildcard

Now run the galene-stt client on the fastest machine you have access to:

    ./galene-stt -model models/ggml-tiny-q5_1.bin -caption https://galene.org:8443/group/public/stt

Type `./galene-stt -help` for other options.  Whisper.cpp has a lot of
other options that I haven't exposed in galene-stt; please let me know
if there are any that you'd find useful.

  https://github.com/ggerganov/whisper.cpp/blob/master/examples/main/main.cpp#L125

The problem is, of course, that whisper.cpp (the speech-to-text library
I'm using) is too slow to produce real-time output; on my
(eight-year-old) laptop, I can only run it in real time with the "tiny"
model, which does not produce useful output in practice.  I'm
experimenting with running it on the GPU, but with little success so far.
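In case anyone else wants to experiment with GPU inference: whisper.cpp
itself can be built with CUDA support enabled.  The sketch below is based
on whisper.cpp's build documentation, not on anything galene-stt does
automatically, so treat it as a starting point rather than a recipe:

```shell
# Build whisper.cpp with CUDA enabled (requires the CUDA toolkit to be
# installed); GGML_CUDA is the CMake option described in whisper.cpp's
# README.
cmake -B build -DGGML_CUDA=1
cmake --build build -j --config Release
```

Whether the resulting library actually gets picked up by galene-stt will
depend on how you link against whisper.cpp on your system.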

The obvious solution would be to use the cloud instance of Whisper instead
of running the inference locally, but that raises serious privacy issues.
I won't be implementing it myself, but if you're not concerned about
privacy, please feel free to fork the galene-stt tool and announce your
fork on the list.

Next steps:

  - more work on audio segmentation;
  - GPU support.

-- Juliusz
