Ai-Media's Matthew Mello Talks the Evolution of AI Captioning

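Tim Siglin, Founding Executive Director, Help Me Stream Research Foundation, and Contributing Editor, Streaming Media, sits down with Matthew Mello, Technical Sales Manager, Ai-Media, to discuss the evolution of AI captioning in this exclusive interview from Streaming Media East 2023.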

Siglin starts the conversation by asking Mello to talk a little about what Ai-Media does.

“We do live human and AI-based transcription for closed captioning,” Mello says. “Typically for live sports or news broadcasts, but also for any kind of recorded content.”

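“So when I've seen human transcripts during a live event, one of the things I've noticed occasionally is things are done phonetically,” Siglin says. “Because you're hearing part of a word, you're trying to sort of get ahead of the word, and once you're past it, you don't have time to go back and correct it. When we use phones that have a certain level of machine learning, once it learns what you're doing, it'll occasionally go back and correct. For me, if I try to type ‘Tom,’ because my name is Tim, it'll constantly correct me to ‘Tim,’ right? And I have to say, ‘No, it really is Tom.’ But do machine learning [transcription] systems spell the word out before it's been fully spoken? Or do they wait until the word is fully done or the context is fully [understood], and then you see it on the screen?”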

Mello says that there is an element of contextual learning to their AI transcriptions. “One benefit we have is that we have a base dictionary,” he says. “You can go in and kind of customize your dictionary on top of that as another layer. So let's say it was constantly saying your name was Tom when your name is Tim, you can go in and say, ‘Never say Tom, say Tim.’”
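The layering Mello describes can be pictured with a small sketch. This is only an illustration under stated assumptions, not Ai-Media's implementation: the names `BASE_DICTIONARY`, `CUSTOM_LAYER`, and `apply_dictionary` are hypothetical, and a real captioning engine would apply such overrides inside the recognizer rather than on finished text.

```python
import re
from collections import ChainMap

# Hypothetical base vocabulary: recognized token -> preferred output form.
BASE_DICTIONARY = {"tom": "Tom", "tim": "Tim", "nba": "NBA"}

# Hypothetical customer layer: entries here take priority over the base.
CUSTOM_LAYER = {"tom": "Tim"}  # "Never say Tom, say Tim."

# ChainMap looks in the custom layer first, then falls back to the base.
LAYERED = ChainMap(CUSTOM_LAYER, BASE_DICTIONARY)


def apply_dictionary(text, dictionary=LAYERED):
    """Replace each word that has a dictionary entry; leave others unchanged."""

    def substitute(match):
        word = match.group(0)
        return dictionary.get(word.lower(), word)

    return re.sub(r"[A-Za-z']+", substitute, text)


if __name__ == "__main__":
    print(apply_dictionary("my name is tom"))
```

Keeping the custom entries in their own layer means the base dictionary is never modified, so removing an override simply restores the default behavior.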

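“I worked with a group that did speech-to-text with a product called Dragon NaturallySpeaking a number of years ago,” Siglin says. “And one of the things you have in English is ‘to, two, too.’ What typically happened in those systems was that it worked better if the base package had a distinct set of words. So it worked really well for medical, it worked really well for legal, because you had the Latin basis of a lot of those. General conversation didn't work very well until it had 10 to 15 minutes of information. So tell me how the state of the art has improved on that. If I've not trained the system to a particular voice, does the base library work well enough with large language models that it can recognize the first few sentences someone speaks, rather than having to be trained?”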

“The newer models are getting much better,” Mello says. “It has its large dictionary, obviously, but it starts to tune into [context]. Let's say we're talking about a basketball game, an NBA game, you have two teams that are playing each other. It can start to pick out which two teams are playing, then go through the current rosters and understand that you spell this player's name this way because he's part of this team. So it's starting to get more into that, which is part of the AI piece of this.”

“So it's the filtering piece that works there,” Siglin says. “Essentially, it says, ‘I hear the Celtics and the Golden State Warriors playing,’ and it says, ‘Oh, this must be basketball…’”

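“And it'll do things like capitalize Bucks,” Mello says. “Whereas in another context, it might not because you could be talking about bucks, like in the wild. So it's starting to learn those things. That's the AI piece that has really come into the automatic captioning space recently.”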
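The “Bucks” versus “bucks” distinction can be illustrated with a deliberately simple rule-based stand-in. The sketch below is an assumption-laden toy, not how Ai-Media's engine works: modern systems learn context statistically, whereas this example just checks for hypothetical cue words (`BASKETBALL_CUES`) before applying hypothetical team-name spellings (`TEAM_NAMES`).

```python
import re

# Hypothetical cue words that suggest the conversation is about basketball.
BASKETBALL_CUES = {"nba", "celtics", "warriors", "rebound", "dunk", "playoffs"}

# Hypothetical words that should be capitalized as team names in that context.
TEAM_NAMES = {
    "bucks": "Bucks",
    "celtics": "Celtics",
    "warriors": "Warriors",
}


def looks_like_basketball(text, min_cues=1):
    """Return True if the text contains at least min_cues distinct cue words."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    return len(words & BASKETBALL_CUES) >= min_cues


def capitalize_in_context(text):
    """Capitalize team names only when the surrounding text suggests basketball."""
    if not looks_like_basketball(text):
        return text

    def fix(match):
        word = match.group(0)
        return TEAM_NAMES.get(word.lower(), word)

    return re.sub(r"[A-Za-z']+", fix, text)


if __name__ == "__main__":
    print(capitalize_in_context("the bucks beat the celtics in the playoffs"))
    print(capitalize_in_context("two bucks wandered across the field"))
```

The second sentence is left untouched because nothing in it signals a sports context, which mirrors the behavior Mello describes.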

Siglin asks, “Where's the typical customer for you? Is it broadcasts, or streaming, or enterprise town hall meetings, that kind of thing?”

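“The biggest customer for us right now is broadcast,” Mello says. He notes that their product LEXI has been working within news broadcasts for the past three years. “For a while, it had some struggles with things like player names, punctuation, and some things that would sometimes break up the context. We just launched a new version, we put a new engine on LEXI, which is much better at picking up things like player names. And of course, punctuation is a big help with things like readability.” He cites the example of their current conversation, where LEXI would add a line break each time one person speaks after another and also insert a glyph. “So that way it's much easier to understand context and back and forth.”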

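“So the other problem in the older systems was if you and I talked over each other, it didn't know who to follow,” Siglin says. “But I think the new language models, because they can look at things like tonality, can actually track multiple people talking.”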

“Exactly,” Mello says. “And you'll see it line break when I interject…it'll line break, and then continue with what you were saying.”

“But what if we're literally talking over each other?” Siglin asks.


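“We should try it on the pundits,” Siglin says. “Because the problem we've always had with these systems is that all of a sudden it has three words together that make no sense, because it's you and me talking, and we're not saying different things.”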

“There's a decent chance it would still kind of do that, like you said something, I said something, you said something, all in one line,” Mello says.

“But that would actually be better than what we had in the past, [where] it literally would just say ‘unintelligible’ at that point,” Siglin says. “So you said you've rolled out a new engine that does better with nuance and punctuation. Where do you see the next markets for what you're doing, beyond broadcast?”

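“The newest one that's been really exciting recently is with sports,” Mello says. “Sports has been traditionally held by human captioning. Now [AI] is finding its place as it's become reliable. You never worry about scheduling a captioner because sometimes they don't show up. The accuracy is very good now, and even if [the quality] is just a little lower than a human captioner, it's worth it…you can have it there when you need it, it's more affordable, it's easy to use. So sports is a big one. And there are other sectors, like government, where we're doing a lot of research right now to figure out the best path forward.”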

“Especially multilingual,” Siglin says. “So in Canada where everything has to be in French and English, or if you're in the EU where everything has to be in multiple languages simultaneously, that's certainly a fascinating challenge as well.”

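“Sometimes you'll have English and French being spoken at the same time,” Mello says. “Simultaneous interpretation, so you can't just set it to English, it needs to switch back and forth. There is some progress on that too that I've seen very recently.”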

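“There used to be, on satellite, I think it was called SAP, which was the alternate audio channel,” Siglin says. “Where you could essentially flip over [to] French, flip over to your English or German…the captioning, if it's two languages being spoken simultaneously, some people might not want to get up and go to their set to change the captions from English to French because they'd prefer to hear French. Is there any model for how you select what they'd prefer to see in the closed captions?”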

“If, let's say, most of the program is in English, but you wanted to have it available in French also, you can do that with a translation pretty easily, and put it out as a separate track,” Mello says. “So the viewer can decide if they want English or French captions. Now what's very new is it being able to automatically detect languages. Flipping back and forth on the fly.”

