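Recommendation engines are strange things. Recently my better half has been retraining from interpreting and translation to work with subtitling. As a family, we have always had subtitles/captions switched on (we will get to the difference shortly), because my daughter has severe hearing loss in one ear.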

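And of course my work is in streaming media, so I suppose it should be no surprise that a post from David Ronca, Director of Video Encoding at Facebook, appeared at the top of my LinkedIn news feed: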

[Image: David Ronca's LinkedIn post]


Now, while I have worked in the emerging field of streaming video technology for more than 25 years, subtitles have always been "yet another data stream" that we need to handle rather than "a data stream I have to understand the particulars of." Like many readers, I have heard of .srt and WebVTT, and of .stl files. I have merged them with compressed video and audio, packaged them into container formats, and shipped them in quantity to audiences all over the world. I thought I was familiar with subtitles and timed captions.

Then I found myself, on the very same day I saw David's post, looking over my (fairly non-technical) wife's shoulder as she was being trained to use a subtitling system. I saw minute timing and positioning on an edit-decision timeline, in an application entirely focused on optimising the subtitle data tied to particular cuts and edits of programming, and I realised I really had no idea about the details of subtitle formats.

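While most of us in the industry can talk about the history of audio and video compression, and of distribution formats, very few of us really understand the timed-text formats, protocols, and data structures in the market, or why they exist.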

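So I reached out to the author of the post David had shared, Pierre-Anthony Lemieux, and asked him to walk me through the nuances of timed text, and to explain further what ttconv is and why David (and others) are excited about it.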

Pierre-Anthony has been very focussed on the specifics of timed text and subtitling for around 8 years. It has been an interesting time. These technologies obviously date back many years before the Internet was even an idea in the minds of Vint Cerf and Bob Kahn.



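In the silent film era, intertitles were important, for obvious reasons. Simply stopping the film, showing an intertitle, and then returning to the programming meant that localization of films for international multi-language distribution was really simple. Translating and replacing the intertitles could suddenly open a film up to a much larger global audience. For the first 20 years of cinema this was great, but by 1927 sound had arrived, which made intertitles largely redundant. With it came the complexity of audio dubbing (or even complete reshoots!) for multilingual audiences.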

This led to scrolling captions and the very earliest attempts to provide timed text with localization that could be versioned in post.

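Over the next five or six decades, localization was the key driver for creating timed text: new markets and new revenue made the extra production cost worthwhile. It was not until 1980, when ABC, NBC, and PBS first offered closed captioning for deaf viewers, that accessibility also became a key driver behind these technologies. Definitions of the different types of timed text, including subtitles, captions, and more, are listed below.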


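Annex B     Timed Text Definitions (Normative)

B.1           Subtitles

Subtitles are a textual representation of the audio track, usually just the dialog and usually in a language other than that of the audio track dialog, intended for foreign-language audiences.

B.2           Hearing Impaired Captions

Hearing impaired captions are a textual representation of the audio track, usually including all sounds and usually in the same language as the audio track dialog, intended for hearing impaired audiences.

B.3           Visually Impaired Text

Visually impaired text is a textual description of visual elements of the content, usually in the same language as the audio track dialog, intended for visually impaired audiences.

B.4           Commentary

Commentary provides additional information about the associated content (e.g. producer commentary), usually in the same language as the audio track dialog.

B.5           Karaoke

Karaoke is a textual representation of song lyrics, usually in the same language as the associated song.

B.6           Forced Narrative

Forced narrative is timed text related to foreign or alien language, or translated text that appears in the media, e.g. in a sign, that is intended to be displayed if no other timed text, e.g. subtitles or captions, is enabled.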

In 1990, the Television Decoder Circuitry Act mandated Line 21 reservation and caption decoders in televisions. But it wasn't until 1996 that regulatory forces acknowledged that hard-of-hearing audiences were inconsistently able to consume video/TV/film media and drafted policy mandating that TV broadcasting should always contain captions. It took until 2010 to mandate that other forms of video, those pertinent to our industry, should be produced with captioning.

Lemieux stressed that it was the emergence of global streaming platforms, with their entirely new and complex localization requirements, that was one of the critical influences on this legislative landscape.

Traditional content windowing meant that an English film would first roll out in English-speaking countries purely with English captions. Then, if market demand and response were good, the extra production overhead of foreign language subtitling or dubbing would be added on a territory-by-territory basis as the content was windowed for those countries.

But while legislation for accessibility had been the key driver to include timed text for much of the last century, global market forces, i.e. streaming, have once again extended this. Enter the PC/tablet/mobile era (is it too early to say "post-pandemic"?) and today many releases are captioned ready for launch into typically 10 language territories from the outset. Netflix has reached around 30 localized languages.

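Whereas by 1988 only around 200,000 captioning systems had been sold, today subtitling technology is scalable and inexpensive, sometimes delivered as a web application, with millions of professional subtitlers (as my wife will hopefully soon qualify to be) working on an endless sea of new content.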


So with all of that in mind, it was time to look into the tech. I asked Pierre-Anthony a (purposefully naive) fun question: "Isn't it just time stamps and some ASCII?" His answer was:

Yes, on one level it is … but there are now a number of legacy systems that produce that timed text formatted in specific ways. We have .srt files, which evolved from DVD. We have SCC and STL, which developed from US and European broadcast. We have formats specific to karaoke. We have standards that include positioning information, so timed text can also be overlaid over the speaker when filming contains a group of speakers. We have standards that enable coloring, again to further help the viewer separate speakers in the flow of dialog.

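When we look at different languages, there is obvious "directionality": some scripts read from right to left, or even vertically, and even within a single language there are different approaches. Japanese, as well as combining vertical and horizontal text, can have optional "ruby" characters (furigana) added to the base kanji text, which help less experienced readers work out subtle pronunciations that may carry different meanings. The positioning of these characters is very subtle, but of course needs to be expressible in timed text formatting, and often presents very challenging rendering problems.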

Each of these has its own legacy of production workflows, and this means that there are many legacy tools widely in use and the resulting timed-text data is not suited to modern standardisation. Many of these different formats are widespread in large content archives, and must be translated before the content can be repurposed for modern workflows. As the use of that media is changing, so too are the requirements on the conformity and portability of these timed text formats.

This is where ttconv comes in. ttconv is a format converter. We have introduced it to help translate legacy formats to what has been widely adopted as the timed text "lingua franca", which is the W3C-standardised IMSC format.
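To make the "time stamps and some text" view of a legacy file concrete, here is a minimal Python sketch of the kind of translation a converter performs: it reads SRT cues and writes them into a bare TTML-style document (the XML vocabulary that IMSC profiles). This is not ttconv's code or API. The function names `parse_srt` and `srt_to_ttml` are hypothetical, the output is not claimed to be a conformant IMSC 1.1 document, and everything Lemieux lists above (positioning, color, ruby, vertical text) is deliberately ignored.

```python
import re
import xml.etree.ElementTree as ET

TTML_NS = "http://www.w3.org/ns/ttml"
XML_NS = "http://www.w3.org/XML/1998/namespace"

# SRT timing line, e.g. "00:00:01,000 --> 00:00:03,500"
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*"
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


def _clock(h, m, s, ms):
    """Format SRT time fields as a TTML clock time, HH:MM:SS.mmm."""
    return f"{int(h):02d}:{m}:{s}.{ms}"


def parse_srt(text):
    """Hypothetical helper: return a list of (begin, end, lines) cues.

    Inline SRT markup such as <i>...</i> is kept as literal text.
    """
    cues = []
    # Cues are separated by blank lines.
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                g = match.groups()
                cues.append((_clock(*g[:4]), _clock(*g[4:]), lines[i + 1:]))
                break  # everything after the timing line is cue text
    return cues


def srt_to_ttml(text, lang="en"):
    """Hypothetical helper: serialise SRT cues as a bare TTML document."""
    ET.register_namespace("", TTML_NS)
    tt = ET.Element(f"{{{TTML_NS}}}tt", {f"{{{XML_NS}}}lang": lang})
    body = ET.SubElement(tt, f"{{{TTML_NS}}}body")
    div = ET.SubElement(body, f"{{{TTML_NS}}}div")
    for begin, end, lines in parse_srt(text):
        p = ET.SubElement(div, f"{{{TTML_NS}}}p", {"begin": begin, "end": end})
        for n, line in enumerate(lines):
            if n == 0:
                p.text = line
            else:
                # Line breaks inside a cue become <br/> elements.
                br = ET.SubElement(p, f"{{{TTML_NS}}}br")
                br.tail = line
    return ET.tostring(tt, encoding="unicode")


if __name__ == "__main__":
    sample = (
        "1\n00:00:01,000 --> 00:00:03,500\nHello there.\nSecond line.\n\n"
        "2\n00:00:04,000 --> 00:00:05,250\nGoodbye.\n"
    )
    print(srt_to_ttml(sample))
```

Even this toy version has to make decisions about time formats, line breaks, and language tagging, which hints at why converting formats that also carry position, color, and script direction calls for a dedicated tool.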

With this cogent explanation of ttconv, and of the problem it solves, I also followed up with David Ronca.

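I wanted to know why ttconv is solving a problem for Facebook in particular (the source of the original post). "Facebook provides a great platform for viewing video content, and subtitles are an important part of that platform, enabling accessibility and localization," says Ronca. "While IMSC 1.1 is the modern format for global subtitles, a large amount of global video content has subtitles in legacy formats. We felt the industry needed a modern, high-quality open source tool for validating these legacy subtitle formats and converting them to IMSC 1.1. We worked with Pierre and Sandflow to develop this new tool."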

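I asked whether he sees a pattern of peer companies adopting IMSC 1.1 widely. Is this a "movement" the industry should engage with more broadly? "IMSC 1.1 is the only timed text model that supports subtitles for all global languages, including subtitles with complex formatting and languages such as Japanese," says Ronca. "We do believe that, going forward, all new subtitle assets should be authored in IMSC 1.1."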

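Finally, I asked both if they had advice for Streaming Media readers seeking to improve timed text and subtitling workflows end to end.

"My personal recommendation to everyone in the industry is to migrate their tools and workflows to IMSC 1.1, and to use IMSC 1.1 for all newly created subtitle assets," says Ronca. "Further, I recommend integrating TTV into workflows to ensure that all incoming and outgoing subtitle assets conform to the IMSC 1.1 specification. This will maximize subtitle interoperability. Finally, treat subtitles as first-class citizens in streaming systems. Demand the same quality and QoS for subtitles as for audio and video."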

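Lemieux adds: "Treat timed text on a par with audio and video throughout the process. This means, for example, that timed text is an integral part of the distribution master. Invest in timed text authoring practices and formats that reach the whole world. This means educating upstream subtitle/caption providers and downstream platforms by, for example, creating authoring guidelines, samples, conversion and validation tools (such as ttconv), etc."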

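To sum up: timed text is not just an "option" for the deaf, or for those who want to watch foreign movies. One 2016 survey showed that 85% of all video on Facebook is watched with the sound off. So the next time you are scrolling through your news feed, or watching the sports channel in a noisy bar and keeping up only thanks to subtitles and captions, stop for a moment and think about the subtle but important role that timed text plays in day-to-day lives. It truly is the unsung hero of our industry.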

