Parser Captions – 3Play Media Support

Introduction

3Play Media is committed to an affordable transcription and captioning solution that scales without sacrificing quality or accuracy. Captions created and formatted entirely by hand have clear benefits, such as formatting decisions that follow logical or aesthetic breaks, but they also have drawbacks, such as high cost and long turnaround times. The state of the art in Natural Language Processing (NLP) does not yet reach the level of quality that can be achieved with sophisticated human judgment. Even so, 3Play Media's automated approach has already achieved impressive results and, for most applications, is more than adequate.

That said, there is always room for improvement. Our research and development teams have revamped our caption formatting algorithm to better replicate the decisions a grammar-conscious human would make when formatting the text of your captions.

This improvement comes without any impact on 3Play Media's pricing, transcription accuracy, or scalability.

The following describes a new patent-pending algorithm, referred to as Parser Driven Captions, which was released on 8/22/2013.

NOTE: New accounts added to this system on or after 8/22/13 have this algorithm finishing their caption files by default. Customers with accounts set up before 8/22/13 can have their accounts configured to use the newer algorithm. Individual projects, batches, and even files can also have this algorithm applied retroactively upon request.

What is a caption frame?

A caption file is composed of numerous caption frames. When presented as captions, the text of the spoken words in a media file is divided into caption frames. Each caption frame traditionally contains two lines of text, with a maximum of 32 characters per line.

A good way to think about a caption frame is to relate it to a slide in a slideshow. When a video is played with closed captions, text appears on the screen and is synced to the audio of the spoken words. The entire caption file is the slideshow running alongside your video or media file, and just as a slideshow contains multiple slides, a caption file contains multiple caption frames.

So in a film, as a character delivers a dramatic monologue, for example, the audience doesn't see an overwhelming block of text crammed onto one "slide" or caption frame. Instead, they see the monologue neatly divided over multiple caption frames. Each caption frame holds on screen for a certain period of time and changes to a new caption frame once the words that fit within the constraint of 2 lines of text and 32 characters per line have been uttered by the on-screen talent.
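To make the frame constraint concrete, here is a minimal Python sketch that splits text into frames of at most 2 lines and 32 characters per line. It is a naive greedy split for illustration only, not 3Play Media's algorithm: it ignores timing and grammar entirely, and the function name naive_caption_frames and its parameters are hypothetical.

```python
import textwrap


def naive_caption_frames(text, max_chars=32, max_lines=2):
    """Greedy split (illustrative only): wrap the text into lines of at
    most max_chars characters, then group every max_lines lines into
    one caption frame."""
    lines = textwrap.wrap(text, width=max_chars)
    return [lines[i:i + max_lines] for i in range(0, len(lines), max_lines)]


sample = (
    "When presented as captions, the text of the spoken words "
    "in a media file is divided into caption frames."
)

frames = naive_caption_frames(sample)
for frame in frames:
    print("\n".join(frame))
    print("--")  # marks the boundary between two frames
```

Because this split only counts characters, it will happily break a frame in the middle of a phrase. That weakness is what the formatting objectives discussed in this article are meant to address.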

The "Balancing Act" of Formatting Caption Frames

When 3Play Media creates caption frames (as part of our default transcription service), we take the following objectives into consideration:

Objective 1: Readability

How long the captions appear on screen given the number of words.

Objective 2: Logic

Sentences that are too long to fit on one caption frame
are divided in a way that "makes sense" to the reader.

Objective 3: Real estate

Without sacrificing Objective 1, and secondarily Objective 2,
the text from broken up sentences should be divided so as
to fill up all frames as best as possible.

The objectives above also point towards a higher-level goal:
to make captions that are easily readable.

The new algorithm addresses Objective 2 by favoring grammatical breaks. It makes explicit the tradeoff between using all of the real estate available for the captions (Objective 3) and forcing breaks in the caption frames that are less jarring to the reader (that is, breaks at what the Stanford hierarchical lexical statistical parser describes as high-level phrase boundaries).

This tradeoff is implemented as a weighted cost function, where the cost of a break is a combination of the "grammatical cost" (i.e. a cost derived from the parse tree structure), and the "fill cost" of both the current caption frame and the next caption frame. The fill cost for a frame is determined according to how much of the frame's textual real estate is used up by the characters in the caption.
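The shape of such a cost function can be sketched in Python as follows. This is a rough illustration under stated assumptions, not the patent-pending implementation. The weights, the squared form of the fill cost, the hand-assigned grammatical costs (which a real system would derive from the parse tree), and all function names are hypothetical. Frame capacity is simplified to 2 x 32 = 64 characters, ignoring line wrapping.

```python
def fill_cost(frame_text, max_chars=32, max_lines=2):
    """Assumed fill cost: squared fraction of the frame left unused
    (0.0 means the frame is completely full)."""
    capacity = max_chars * max_lines
    used = min(len(frame_text), capacity)
    return (1.0 - used / capacity) ** 2


def break_cost(grammatical_cost, current_frame, next_frame,
               w_grammar=0.6, w_fill=0.4):
    """Weighted combination of the grammatical cost and the fill cost
    of the current and next frames. The weights are assumptions."""
    fill = fill_cost(current_frame) + fill_cost(next_frame)
    return w_grammar * grammatical_cost + w_fill * fill


def best_break(candidates, **weights):
    """Return the label of the candidate break with the lowest cost."""
    return min(candidates, key=lambda c: break_cost(*c[1:], **weights))[0]


# Each candidate: (label, assumed grammatical cost, current frame, next frame)
candidates = [
    ("after comma", 0.1,
     "After the meeting ended,",
     "everyone walked back to their desks and started working again."),
    ("mid-phrase", 0.9,
     "After the meeting ended, everyone walked back to their",
     "desks and started working again."),
]

print(best_break(candidates))                              # grammar-heavy weights
print(best_break(candidates, w_grammar=0.05, w_fill=0.95)) # fill-heavy weights
```

With the grammar-heavy default weights, the break after the comma wins even though it leaves the first frame mostly empty. With fill-heavy weights, the mid-phrase break wins because it uses the frames more evenly. Shifting the weights is how the tradeoff between Objective 2 and Objective 3 becomes explicit.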

Flexible configurations

The following configurations need to be applied to your file, batch, or project by a 3Play Media team member. For captions created before 8/22/13, Parser Driven Captions offers a quick way to improve the aesthetics or timing of the captions, or to give the audience a smoother reading experience by applying logical breaks based on grammar.

The update to our caption formatting algorithm offers several options that can be configured upon request to adjust the overall timing of the frames and, in turn, improve readability (Objective 1). Parser Driven Captions also supports any number of lines per caption frame, whereas previously the limit was 2 lines per frame. This is in addition to the maximum line-length setting, which has always been available to change upon request. All aesthetic preferences that were available previously are still supported.

NOTE: If a request is made for a 3Play Media file that is listed as complete, the file may appear unavailable and temporarily return to the In Progress state while it regenerates.
