Date: Fri, 08 Nov 2024 15:54:55 +0100
Message-ID: <87ed3l6aio.wl-jch@irif.fr>
From: Juliusz Chroboczek
To: galene@lists.galene.org
Subject: [Galene] More work on speech-to-text

Hi,

I've just finished doing some more work on speech-to-text support.
Galene-stt can now run in three modes:

  - dump a transcript to standard output; this is the default, and is
    useful if you're trying to follow a meeting that's not in a language
    you understand well;

  - dump a transcript to the chat; this is requested with the option
    "-chat", and I don't think it's very useful;

  - generate proper captions; this is requested with the option
    "-caption", and is pretty useful in general.

You first need to create a group with a user "speech-to-text" with the
permission to publish captions.  Here's what I did in order to create the
group:

    galenectl create-group -group public/stt
    galenectl create-user -group public/stt -user speech-to-text -permissions caption
    galenectl set-password -group public/stt -user speech-to-text -type wildcard
    galenectl create-user -group public/stt -wildcard
    galenectl set-password -group public/stt -wildcard -type wildcard

Now run the galene-stt client on the fastest machine you have access to:

    ./galene-stt -model models/ggml-tiny-q5_1.bin -caption https://galene.org:8443/group/public/stt

Type `./galene-stt -help` for other options.  Whisper.cpp has a lot of
other options which I haven't exported in galene-stt; please let me know
if there are any that you'd find useful.

    https://github.com/ggerganov/whisper.cpp/blob/master/examples/main/main.cpp#L125

The problem is, of course, that whisper.cpp (the speech-to-text library
I'm using) is too slow to produce useful real-time output; on my
(eight-year-old) laptop, I'm only able to run it in real time using the
"tiny" model, which does not produce useful output in practice.  I'm
experimenting with running it on the GPU, but with little success so far.

The obvious solution would be to use the cloud instance of Whisper instead
of running the inference locally, but that raises serious privacy issues.
I won't be implementing it myself, but if you're not concerned about
privacy, please feel free to fork the galene-stt tool and announce your
fork on the list.
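
For anyone who would rather script the setup than type it, here is a
minimal Python sketch that rebuilds the command lines shown above.  The
helper names (setup_commands, stt_command, render) and the mode labels
"stdout", "chat" and "caption" are made up for illustration; only the
galenectl subcommands and the -model, -chat and -caption flags come from
this message.  It prints the commands for review and does not run them.

```python
import shlex

# Flags as given in the message; "stdout" is the default mode and takes
# no flag.  The mode labels themselves are hypothetical.
MODE_FLAGS = {"stdout": [], "chat": ["-chat"], "caption": ["-caption"]}


def setup_commands(group):
    """Return the five galenectl invocations from the message, for `group`."""
    g = ["-group", group]
    user = ["-user", "speech-to-text"]
    return [
        ["galenectl", "create-group", *g],
        ["galenectl", "create-user", *g, *user, "-permissions", "caption"],
        ["galenectl", "set-password", *g, *user, "-type", "wildcard"],
        ["galenectl", "create-user", *g, "-wildcard"],
        ["galenectl", "set-password", *g, "-wildcard", "-type", "wildcard"],
    ]


def stt_command(model, url, mode="caption", binary="./galene-stt"):
    """Return the galene-stt command line for one of the three modes."""
    if mode not in MODE_FLAGS:
        raise ValueError(f"unknown mode: {mode!r}")
    return [binary, "-model", model, *MODE_FLAGS[mode], url]


def render(commands):
    """Format argv lists as shell lines, for review before running them."""
    return "\n".join(shlex.join(argv) for argv in commands)


if __name__ == "__main__":
    print(render(setup_commands("public/stt")))
    print(render([stt_command("models/ggml-tiny-q5_1.bin",
                              "https://galene.org:8443/group/public/stt")]))
```

Keeping each command as an argument list means the same data could later
be handed to subprocess.run without going through a shell, if you decide
you trust it enough to execute.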

Next steps:

  - more work on audio segmentation;
  - GPU support.

-- Juliusz