From: Juliusz Chroboczek <jch@irif.fr>
To: galene@lists.galene.org
Subject: [Galene] More work on speech-to-text
Date: Fri, 08 Nov 2024 15:54:55 +0100
Message-ID: <87ed3l6aio.wl-jch@irif.fr>
Hi,
I've just finished doing some more work on speech-to-text support.
Galene-stt can now run in three modes (example invocations below):
- dump a transcript to standard output; this is the default, and is
  useful if you're trying to follow a meeting that's not in a language
  you understand well;
- dump a transcript to the chat; this is requested with the option
  "-chat", and I don't think it's very useful;
- generate proper captions; this is requested with the option
  "-caption", and is pretty useful in general.
You first need to create a group containing a user "speech-to-text" that
has permission to publish captions. Here's what I did in order to create
the group <https://galene.org:8443/group/public/stt/>:
galenectl create-group -group public/stt
galenectl create-user -group public/stt -user speech-to-text -permissions caption
galenectl set-password -group public/stt -user speech-to-text -type wildcard
galenectl create-user -group public/stt -wildcard
galenectl set-password -group public/stt -wildcard -type wildcard
Now run the galene-stt client on the fastest machine you have access to:
./galene-stt -model models/ggml-tiny-q5_1.bin -caption https://galene.org:8443/group/public/stt
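If you don't have a model file yet, whisper.cpp ships a download script;
something like the following should fetch the quantised "tiny" model used
above (this assumes a recent whisper.cpp checkout and that your copy of
models/download-ggml-model.sh knows about the quantised variants):
# run from the top of a whisper.cpp checkout
sh ./models/download-ggml-model.sh tiny-q5_1
# this should leave models/ggml-tiny-q5_1.bin next to the other models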
Type `./galene-stt -help` for other options. Whisper.cpp has a lot of
other options that I haven't exposed in galene-stt; please let me know
if there are any that you'd find useful:
https://github.com/ggerganov/whisper.cpp/blob/master/examples/main/main.cpp#L125
The problem is, of course, that whisper.cpp (the speech-to-text library
I'm using) is too slow to produce real-time output; on my
(eight-year-old) laptop, the only model that keeps up with real time is
"tiny", which does not produce useful output in practice. I'm
experimenting with running it on the GPU, but with little success so far.
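For the adventurous, whisper.cpp itself documents a CUDA build along
these lines (this is a sketch taken from whisper.cpp's own instructions
rather than something I've validated end to end; getting galene-stt to
take advantage of the result is the part where I've had little success):
# in a whisper.cpp checkout, with the CUDA toolkit installed
cmake -B build -DGGML_CUDA=1
cmake --build build -j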
The obvious solution would be to use a cloud instance of Whisper instead
of running the inference locally, but that raises serious privacy issues.
I won't be implementing it myself, but if you're not concerned about
privacy, please feel free to fork the galene-stt tool and announce your
fork on the list.
Next steps:
- more work on audio segmentation;
- GPU support.
-- Juliusz