[personal profile] brainwane posting in [community profile] access_fandom
Whisper, from OpenAI, is an open source speech recognition tool that also does translation. You can try it right now at https://replicate.com/openai/whisper or install it on your own computer to run privately. You provide an audio file, and it emits a text transcript as well as .srt and .vtt subtitle files.

This is a really useful (and free!) tool. I have started using it regularly to make transcripts and captions/subtitles, and I just wrote a blog post to share how, and why -- plus my reflections on the ethics of using it and similar tools trained using machine learning.

Note that it works on existing files, but does not work for live-transcribing an event as it's happening.
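For readers curious what goes into an .srt file, here is a minimal sketch of turning timed transcript segments into SRT text. Whisper writes these files for you, so this is only an illustration. The segment shape used here (a dict with `start` and `end` in seconds plus `text`) is an assumption for the example, and the function names are made up for this sketch.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments) -> str:
    """Build SRT text from segments.

    Each segment is assumed (hypothetically) to be a dict with
    'start' and 'end' in seconds and a 'text' string.
    """
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = srt_timestamp(seg["start"])
        end = srt_timestamp(seg["end"])
        # One cue: sequence number, time range, then the caption text.
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    # Cues are separated by a blank line.
    return "\n".join(blocks)


if __name__ == "__main__":
    example = [
        {"start": 0.0, "end": 2.5, "text": " Hello there."},
        {"start": 2.5, "end": 5.0, "text": " Welcome to the show."},
    ]
    print(segments_to_srt(example))
```

The cleanup step still matters: a file like this is plain text, so correcting names or mishearings is a matter of editing the caption lines and leaving the numbers and timestamps alone.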

(no subject)

Date: 2022-12-23 09:46 pm (UTC)
From: [personal profile] deborah

I was seeing you post about this on Mastodon and I'm really of two minds about it. I wonder whether anyone has measured how many of the people who post craptions would actually have bothered to create real captions or transcripts without an automated tool -- and craptions as they exist today are generally considered worse than useless. You're a conscientious person and mention in your post that you clean them up, but so many people are willing to post completely unedited auto-generated captions and call it a day.

On the other hand, if the people who are using auto-generated captions wouldn't be creating captions or transcripts any other way, at least it's not a net negative. If the captions are bad, they shouldn't count toward any kind of required accessibility, but they don't hurt if they're not replacing something better.

(Are they really that much better than the current state of speech recognition, though? Even the built-in speech recognition currently packaged for free in most operating systems does a really good job with random voices. Not good enough for true transcripts, but good.

I mean, I get the point of what you are doing in your blog post if you are also adding timestamps that are suitable for SRT files, but you could do that with built-in speech recognition and not worry about the ethics of machine learning and large language models at all.)

Edited (Forgot clause) Date: 2022-12-23 09:47 pm (UTC)
