Date: Wed, 20 Nov 2024 19:34:43 +0100
From: Juliusz Chroboczek
To: galene@lists.galene.org
Subject: [Galene] Running galene-stt on the GPU

Performance of galene-stt
=========================

CPU/GPU                            Model      Times realtime (larger is better)
------------------------------------------------------------------------------
i5-8350U (4-core CPU)              base        1.17
Xeon(R) Gold 5220R (24-core CPU)   base        1.49
2 * RTX A6000 (GPU)                base       27.61
2 * RTX A6000 (GPU)                medium     12.68
2 * RTX A6000 (GPU)                large-v3    8.66

Clearly, whisper.cpp does not scale well on the CPU: the 24-core server is only about 27% faster than the 4-core laptop. The "base" model barely runs faster than realtime on either machine, and the larger models are several times slower (on the GPU, "medium" is about 2.2 times and "large-v3" about 3.2 times slower than "base"). Assuming a similar ratio on the CPU, there's no hope of running anything better than the "base" model in real time on the CPU. Since the "base" model produces almost complete gibberish, running galene-stt on the CPU is not useful.

Performance on the GPU, on the other hand, is excellent. On two RTX A6000 GPUs using CUDA, we get 8.66 times realtime with the large-v3 model (the best model available at the time of writing, as far as I'm aware), which means that we should be able to transcribe roughly eight simultaneous speakers.

Thanks to Jean-Baptiste Yunès for giving me access to the server.

Transcripts
===========

The "base" transcript is almost complete gibberish:

into little pieces. because... There's not an infinite number of wires between any people. some places.
Not everything is connected with its own individual pairs of for some wires. So we're going to have to share the path. The only way that we could share the path among men. If you think. is to have the amount of work that we do per thing be controlled. >> Yeah. All right. Somebody can send. and to a gigabyte into the path and I can't do anything until until that gigabyte has gone away. makes it. seeing cars and trains on the city road. roadway system. you're gonna spend a lot of time in your car waiting. you want to add an intersection for trying to go by. you want to have. Limited rent things.

The "medium" transcript is comprehensible, except for the interesting hallucinations in the fourth caption and at the very end:

up into little pieces. because... There's not an infinite number of wires between any two wires. places. Not everything is connected with its own individual parasol. of wires. So we're going to have to share the path. The only way that we could share the path among men. many things. is to have the amount of work that we do per thing be controlled. rolled. If somebody can send-- and a gigabyte into the path and I can't do anything until until that gigabyte has gone away, you know. much like. fixing cars and trains on the city rail. So roadway system. and spent a lot of time in your car waiting at a-- to have an intersection for a train to go by. You don't want to do that. You want have women that learn things.

The "large-v3" transcript avoids the first hallucination:

into little pieces. Because There's not an infinite number of wires between any of in places. Not everything is connected with its own individual pairs of of wires. So we're going to have to share the path. The only way that we could share the path among men many things is to have the amount of work that we do per thing be controlled. if somebody can send like a gigabyte into the path and I can't do anything until I that gigabyte has gone away. It's like...
fixing cars and trains on the city roads. So roadway system. at an-- spent a lot of time in your car waiting at have an intersection for a train to go by. You don't want to do that. You want to have and have women who'd lend things.

Experimental setup
==================

I tested galene-stt on a forty-two-second fragment of a 2006 talk by Van Jacobson. The clip has a lot of background noise, which is realistic for videoconferencing, but Jacobson speaks slowly and with a very clear (to my ear) American accent. The clip was obtained with the following commands:

    yt-dlp -f 234 'https://www.youtube.com/watch?v=gqGEMQveoqg'
    ffmpeg -ss 18:45 -t 42 -i "A New Way to look at Networking [gqGEMQveoqg].mp4" -acodec copy van.mp4

The clip was played from a Chromium-based browser through the Galene instance at galene.org, packet loss and all. The laptop was receiving RTP over UDP, while the server was using RTP over TCP through TURN, over an SSH tunnel. Don't ask.

The commits used were:

    whisper.cpp  6266a9f9e56a5b925e9892acf650f3eb1245814d
    galene-stt   88b89cc0550d0b9599b0beeb180fcaf70e591186

The laptop was using gcc 14.2.0 and Go 1.23.3. The server was using gcc 13.2.0, Go 1.22.2, and CUDA 12.6.r12.6.

-- 
Juliusz