Vertical Caption (or Subtitle) Placement

What is Vertical Caption Placement?

Vertical caption (or subtitle) placement is our patent-pending solution to a common problem: some videos have text at the bottom of the screen that is important for understanding the video. For example, a documentary may display the names and titles of interview subjects as they are introduced. An "auction" reality TV show may display the current price of an item as the participants bid it up. Or an engineering lecture may show a formula near the bottom of the screen.

[Image: captions moving up to the top of the frame to avoid onscreen text at the bottom]

In each of these cases, placing a caption at the bottom of the frame, as is usually done, would obscure text that the viewer needs to see. Instead, the captions should be placed at the top of the screen during these time periods (as long as the top of the screen does not also contain such text). Repositioned captions must also be vertically "stable": they should not jump around in the middle of a sentence, and they should stay in each position long enough that their movement does not distract the viewer.

Our vertical caption placement functionality meets all of these requirements, and does so completely automatically, cost-effectively, and as part of the standard captioning and subtitling workflow.


Important note: By default, files uploaded to your account do not go through the vertical caption placement (VCP) algorithm. VCP can be requested as an add-on service during the upload process, or ordered for any completed file that has a source video.

How Does It Work?

The caption placement algorithm works in several stages:

First, every frame of the video is examined by a text detection algorithm that uses a statistical model to determine the likelihood that text exists at the bottom of the frame. This statistical model was developed using many thousands of hand-labeled video frames and is designed to identify regions where letter-like patterns appear. The algorithm does not attempt to “read” the text. It is interested only in determining the probability that text-like patterns are present in the search region.
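The detector's statistical model is not described in this article. Purely to illustrate the idea of scoring "text-like patterns" without reading them, the Python sketch below substitutes a much simpler, hypothetical heuristic: it measures how often neighboring pixels in the bottom band of a grayscale frame differ sharply (letter strokes create many such edges) and maps that edge density to a probability with a logistic curve. The function name, band size, edge threshold, and logistic parameters are all assumptions, not the production model.

```python
import math
from typing import List

# A grayscale frame: rows of pixel intensities in the range 0-255.
Frame = List[List[int]]


def bottom_band_text_probability(frame: Frame, band_fraction: float = 0.2,
                                 edge_threshold: int = 60,
                                 scale: float = 20.0, bias: float = 0.15) -> float:
    """Hypothetical stand-in for a trained text detector.

    Counts strong horizontal intensity changes in the bottom band of the
    frame and squashes the resulting edge density into a probability.
    It never tries to recognize characters.
    """
    height = len(frame)
    band_rows = max(1, int(round(height * band_fraction)))
    start = max(0, height - band_rows)
    edges = 0
    total = 0
    for row in frame[start:]:
        # Compare each pixel with its right-hand neighbor.
        for left, right in zip(row, row[1:]):
            total += 1
            if abs(right - left) >= edge_threshold:
                edges += 1
    density = edges / total if total else 0.0
    # Logistic curve: densities well above `bias` map to probabilities near 1.
    return 1.0 / (1.0 + math.exp(-scale * (density - bias)))
```

A trained model would replace the hand-set threshold and curve with learned parameters; the sketch only shows the shape of the input (one frame) and output (one probability for the bottom region).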

Next, the algorithm searches across time, comparing the vertical pixel locations of the candidate text regions from frame to frame. It then computes a time-dependent text probability that takes into account both the per-frame probabilities and the positional stability of the text across frames.
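One way to picture the time-dependent step is as a sliding window over frames. The sketch below is a hypothetical illustration, not the actual method: it averages the per-frame probabilities within a window and multiplies that average by a stability factor that shrinks as the vertical positions of the detected text regions spread out. The window size, position tolerance, and the neutral factor used when too few positions are available are all assumptions.

```python
import math
from statistics import pstdev
from typing import List, Optional


def temporal_text_probability(frame_probs: List[float],
                              text_rows: List[Optional[float]],
                              window: int = 5,
                              position_tolerance: float = 4.0) -> List[float]:
    """Hypothetical smoothing of per-frame text probabilities over time.

    frame_probs[i] is the per-frame text probability for frame i.
    text_rows[i] is the vertical pixel position of the candidate text
    region in frame i, or None if no region was found.
    """
    n = len(frame_probs)
    half = window // 2
    smoothed = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        mean_prob = sum(frame_probs[lo:hi]) / (hi - lo)
        rows = [r for r in text_rows[lo:hi] if r is not None]
        if len(rows) < 2:
            # Too little evidence to judge positional stability.
            stability = 0.5
        else:
            # Burned-in text stays put; jittery detections are likely noise.
            stability = math.exp(-pstdev(rows) / position_tolerance)
        smoothed.append(mean_prob * stability)
    return smoothed
```

With this design, a lower-third title that sits at the same row for many frames keeps its high probability, while detections that jump around vertically are discounted even if each frame scores well on its own.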

Then, if the time-dependent text probability is high enough, the algorithm checks for burned-in text at the top of the video over the same time region and compares the bottom and top text probabilities. If the bottom probability is higher, the captions are moved to the top of the video; if the top probability is higher, the captions are left in the bottom position. Usually, the top probability is very close to zero, because text rarely appears at the top of the frame.
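The comparison step can be expressed as a small decision function. In this sketch, the function name and the 0.5 threshold are hypothetical, and passing top_prob=None models a case where the top of the frame is not examined, so its probability is treated as zero.

```python
from typing import Optional


def choose_position(bottom_prob: float, top_prob: Optional[float],
                    threshold: float = 0.5) -> str:
    """Hypothetical placement decision for one time region.

    Returns "top" or "bottom". top_prob=None means the top of the
    frame was not checked, so its text probability is taken as zero.
    """
    if bottom_prob < threshold:
        # No convincing bottom text: keep the default bottom placement.
        return "bottom"
    top = 0.0 if top_prob is None else top_prob
    # Move captions up only if the bottom is more likely to hold text.
    return "top" if bottom_prob > top else "bottom"
```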

Finally, the algorithm applies time and continuity constraints to all time regions where captions are to be moved to the top of the video. In particular, if any part of a sentence is to be placed at the top, then all captions in that sentence are also placed at the top. Likewise, if a sentence is so short that the captions would jump back and forth between the top and bottom positions, the algorithm may leave its captions at the more common surrounding position (for example, resolving top-bottom-top to top-top-top).
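The two continuity rules above can be sketched as post-processing over a per-caption list of positions. This is an illustration under stated assumptions: each caption carries a sentence ID and sentences are contiguous, only short bottom runs sandwiched between top runs are absorbed (consistent with erring toward the top), and min_run is a hypothetical parameter.

```python
from typing import List


def apply_continuity(positions: List[str], sentence_ids: List[int],
                     min_run: int = 2) -> List[str]:
    """Hypothetical continuity pass over per-caption positions.

    positions[i] is "top" or "bottom" for caption i; sentence_ids[i]
    identifies the (contiguous) sentence that caption i belongs to.
    """
    # Rule 1: if any caption in a sentence goes to the top, the whole
    # sentence does.
    top_sentences = {s for p, s in zip(positions, sentence_ids) if p == "top"}
    result = ["top" if s in top_sentences else p
              for p, s in zip(positions, sentence_ids)]

    # Rule 2: absorb short bottom runs that sit between top runs
    # (e.g. top-bottom-top -> top-top-top) so captions don't bounce.
    n = len(result)
    i = 0
    while i < n:
        j = i
        while j < n and result[j] == result[i]:
            j += 1
        # result[i:j] is a maximal run, so its neighbors (if any) are "top".
        if result[i] == "bottom" and j - i < min_run and i > 0 and j < n:
            result[i:j] = ["top"] * (j - i)
        i = j
    return result
```

Because rule 1 leaves every sentence uniformly placed and sentences are contiguous, each run that rule 2 flips consists of whole sentences, so the sentence-level constraint still holds afterward.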

The vertical caption placement process is applied to the core timed text document, associating the top and bottom placement indicators with each frame. When you request an output format that supports caption (or subtitle) placement (see below), the core document is converted to that format, with the placement information translated in a format-compatible manner. The downloaded caption (or subtitle) file can then be used with a video player that supports caption placement.


Caption Placement Limitations

  • It is not possible to use the Same Day service in combination with caption placement due to the increased processing time.
  • The automatic caption placement algorithm (as opposed to the manual caption placement service) is designed to err on the side of moving captions to the top, which reduces the number of cases where bottom text is obscured. On rare occasions, captions are moved to the top unnecessarily.
  • Videos that have burned-in text at the top of the frame can be put through vertical caption placement, but we will NOT check the top of the frame before placement; that is, the top-of-frame text probability is always assumed to be zero.
  • Vertical caption placement can be run retroactively on files that have already been processed. However, it must be done within 60 days after file completion. The process requires the source video files, which are deleted from our system after 60 days.
  • A per-minute price increment applies. See pricing details.

Supported Output Formats

Below are the caption/subtitle output formats that support vertical placement.

