
Ai-Media's Matthew Mello Talks the Evolution of AI Captioning

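Tim Siglin, Founding Executive Director, Help Me Stream Research Foundation, and Contributing Editor, Streaming Media, sits down with Matthew Mello, Technical Sales Manager, Ai-Media, to discuss the evolution of AI captioning in this exclusive interview from Streaming Media East 2023.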

Siglin starts the conversation by asking Mello to talk a little about what Ai-Media does.

“We do live human and AI-based transcription for closed captioning,” Mello says. “Usually for live sports or news, but also any form of recorded content.”

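“So when I've seen human transcripts during a live event, one of the things I've noticed occasionally is things are done phonetically,” Siglin says. “Because you're hearing part of a word, you're trying to sort of get ahead of the word, and once you're past it, you don't have time to go back and correct it. When we use phones that have a certain level of machine learning, once it learns what you're doing, it'll occasionally go back and fix things. For me, if I try to type ‘Tom,’ because my name is Tim, it keeps correcting me to ‘Tim,’ right? I have to say, ‘No, it really is Tom.’ But do machine learning [transcription] systems spell out a word before it's fully spoken? Or do they wait until the word is fully done or the context is fully [understood], and then you see it on screen?”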

Mello says that there is an element of contextual learning to their AI transcriptions. “One benefit we have is that we have a base dictionary,” he says. “You can go in and kind of customize your dictionary on top of that as another layer. So let's say it was constantly saying your name was Tom when your name is Tim, you can go in and say, ‘Never say Tom, say Tim.’”
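The layered dictionary Mello describes can be illustrated with a small sketch. This is not Ai-Media's implementation: it assumes, purely for illustration, that the dictionary acts as a substitution table applied to recognized words, and the names `BASE_DICTIONARY`, `build_dictionary`, and `apply_dictionary` are hypothetical. Python's `collections.ChainMap` gives the "custom layer on top of the base" behavior directly, because lookups check the first mapping before falling back to the next.

```python
from collections import ChainMap

# Hypothetical base dictionary: recognized word (lowercase) -> preferred spelling.
BASE_DICTIONARY = {"tom": "Tom", "tim": "Tim", "nba": "NBA"}


def build_dictionary(custom_overrides):
    """Layer customer entries over the base; custom entries win on conflict."""
    return ChainMap(custom_overrides, BASE_DICTIONARY)


def apply_dictionary(words, dictionary):
    """Swap each recognized word for its dictionary form, leaving unknown words as-is."""
    return [dictionary.get(w.lower(), w) for w in words]


# "Never say Tom, say Tim": map the misrecognized form to the wanted one.
lookup = build_dictionary({"tom": "Tim"})
print(" ".join(apply_dictionary(["my", "name", "is", "tom"], lookup)))
# prints "my name is Tim"
```

Keeping the customer layer separate from the base means an override can be removed without touching the shared dictionary.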

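“I worked with a group that did speech-to-text with a product called Dragon NaturallySpeaking a number of years ago,” Siglin says. “And one of the things you have in English is ‘to, two, too.’ What typically happened with those systems is that they worked better if the base package had a distinct set of words. So it worked really well for medicine, it worked really well for law, because you had the Latin basis of a lot of those. General conversation didn't work very well until it had 10 to 15 minutes of information. So tell me how the state of the art has improved on that. If I've not trained the system to a particular voice, does this base library work well enough with a large language model behind it that it can recognize someone within the first few sentences they speak, rather than having to be trained?”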

“The newer models are getting much better,” Mello says. “It has its large dictionary, obviously, but it starts to tune into [context]. Let's say we're talking about a basketball game, an NBA game, and you have two teams that are playing each other. It can start to pick up which two teams are playing, then go through the current rosters and understand that you spell this player's name this way because they're part of this team. So it's starting to get more into that, which is part of the AI piece of this.”

“So it's the filtering piece that works there,” Siglin says. “Essentially it says, ‘I heard Celtics and Golden State Warriors,’ and then, ‘Oh, this must be basketball…’”

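“And it'll do things like capitalize Bucks,” Mello says. “Whereas in another case, it might not, because you could be talking about bucks, like in the wild. So it's starting to learn those things. That's the AI piece that has really come into automatic captioning recently.”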
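The “Bucks” versus “bucks” decision can be reduced to a toy sketch: infer a topic from words already heard, then use that topic to decide whether an ambiguous word is a proper noun. This is only an illustration of the idea Siglin and Mello describe. Production captioning engines use far richer language models. The keyword lists and the names `detect_topic` and `capitalize_in_context` are hypothetical.

```python
# Hypothetical topic cues; a real system would learn these rather than list them.
TOPIC_KEYWORDS = {
    "basketball": {"celtics", "warriors", "nba", "rebound", "dunk"},
    "wildlife": {"deer", "antlers", "forest", "hunting"},
}

# Ambiguous words and the topics in which they should be treated as proper nouns.
PROPER_NOUN_BY_TOPIC = {
    "bucks": {"basketball": "Bucks"},
}


def detect_topic(context_words):
    """Return the topic with the most keyword hits in the context, or None if no hits."""
    counts = {
        topic: sum(w.lower() in keywords for w in context_words)
        for topic, keywords in TOPIC_KEYWORDS.items()
    }
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else None


def capitalize_in_context(word, context_words):
    """Capitalize an ambiguous word only when the detected topic calls for it."""
    topic = detect_topic(context_words)
    forms = PROPER_NOUN_BY_TOPIC.get(word.lower(), {})
    return forms.get(topic, word)


print(capitalize_in_context("bucks", ["the", "Celtics", "and", "Warriors"]))  # Bucks
print(capitalize_in_context("bucks", ["two", "deer", "in", "the", "forest"]))  # bucks
```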

Siglin asks, “Where's the typical customer for you? Is it broadcasts, or streaming, or enterprise town hall meetings, that kind of thing?”

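“The biggest customer for us right now is broadcast,” Mello says. He notes that their product LEXI has been working within news broadcasts for the past three years. “For a while, it struggled with things like player names, punctuation, and some things that would sometimes break the context. We just launched a new version; we put a new engine in LEXI that does a much better job of picking up things like player names. And of course the punctuation is a bigger help, for things like readability.” He cites the example of their current conversation, in which LEXI adds a line break, along with a marker glyph, whenever one person starts speaking after another. “So that way it's much easier to understand context and the back and forth.”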
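The speaker-change formatting Mello points to can be sketched as follows. The sketch assumes that an upstream step has already labeled each transcribed segment with a speaker, and it uses “>>”, a common closed-captioning convention for a new speaker, as the marker. The article does not say which glyph LEXI uses, and the `Segment` type and `format_captions` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    speaker: str  # speaker label, assumed to come from an upstream diarization step
    text: str


SPEAKER_CHANGE_MARKER = ">>"  # assumed marker; common in broadcast captions


def format_captions(segments):
    """Start a new caption line, prefixed with the marker, whenever the speaker changes."""
    lines = []
    previous_speaker = None
    for seg in segments:
        if seg.speaker != previous_speaker:
            lines.append(f"{SPEAKER_CHANGE_MARKER} {seg.text}")
            previous_speaker = seg.speaker
        else:
            # Same speaker: keep appending to the current line.
            lines[-1] += " " + seg.text
    return lines


for line in format_captions([
    Segment("Siglin", "Where's the typical customer for you?"),
    Segment("Mello", "The biggest customer for us"),
    Segment("Mello", "right now is broadcast."),
]):
    print(line)
```

As the interview goes on to note, this kind of approach depends on the segments being cleanly attributed. When two people talk over each other, their words can still end up merged on one line.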

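“So the other problem in the older systems was if you and I talked over each other, it didn't know who to follow,” Siglin says. “But I think the new language models, because they can look at things like tone, can actually track multiple people talking.”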

“Exactly,” Mello says. “And you'll see it line break when I interject… it'll line break, and then continue with what you were saying.”

“But what if we're literally talking over each other?” Siglin asks.

“That's a good test!” Mello says.

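“We should try it with pundits,” Siglin says. “Because the problem we've always had with these systems is that all of a sudden, it has three words together that don't make sense, because it's you and me both talking, and we're not saying the same thing.”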

“There's a decent chance it would still kind of do that, like you said something, I said something, you said something, all in one line,” Mello says.

“But that would actually be better than what we had in the past, [where] it literally would just say ‘unintelligible’ at that point,” Siglin says. “So you said you've launched a new engine that does better with nuance and punctuation. Where do you see the next markets for what you're doing, beyond broadcast?”

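“The newest one that's been really exciting recently is sports,” Mello says. “Sports has traditionally been the domain of human captioning. Now [AI] is finding its place as it's become reliable. You never have to worry about scheduling a captioner, because sometimes they don't show up. The accuracy now is very good, and even if [the quality] is just a little lower than a human captioner, it's worth it… you can have it there when you need it, it's more affordable, and it's easy to use. So sports is a big one. There are other sectors, like government, where we're doing a lot of research right now to figure out the best path forward.”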

“Especially multilingual,” Siglin says. “So in Canada, where everything has to be in French and English, or in the EU, where everything has to be in multiple languages simultaneously, that's certainly a fascinating challenge as well.”

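“Sometimes you'll have English and French being spoken at the same time,” Mello says. “It's simultaneous, so you can't just set it to English; it needs to switch back and forth. There is some progress on that too that I've seen very recently.”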

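“There used to be, on satellite, I remember it was called SAP, the Secondary Audio Program,” Siglin says. “Where you could essentially flip over [to] French, flip over to your English or German… With captioning, if it's two languages being spoken simultaneously, some people might not want to get up and go to their set to change the captions from English to French because they'd prefer French. Are there models for how viewers choose what they'd like to see in the closed captions?”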

“If, let's say, most of the program is in English, but you wanted to have it available in French also, you can do that with a translation pretty easily, as a separate track,” Mello says. “So the viewer can decide if they want English or French captions. Now what's very new is being able to automatically detect languages and flip back and forth on the fly.”

Learn more about AI and streaming at Streaming Media Connect 2023.

