Alignment: Best Practices

What you should know about 3Play Media's Alignment Service...

Transcript Alignment Service

To achieve the best results with the transcript alignment service, be aware of the following best practices when submitting transcripts and media files.

If you are new to the transcript alignment process, we suggest setting up the alignment review tool before submitting a transcript for alignment.

See more information on the transcript alignment review tool

Upload:

Submitting the transcript from the account system

When uploading transcripts directly from the 3Play Media account system, it is best to drag and drop your unformatted UTF-8 encoded plain text (.TXT) file from your desktop to the text box associated with the media file you are aligning. 

Drag and drop is the best method for uploading UTF-8 encoded .TXT transcripts directly to the account system.
Cutting and pasting your transcript may introduce errors that will appear in the completed aligned file.

DO NOT drag/drop or cut/paste text from a Microsoft Word document, as this will result in encoding errors.
DO NOT submit songs, music, etc. This process is intended for speech, not singing.

If you exported from Word to UTF-8 and drag/drop still results in corrupted text, go back to Word, export to ASCII, confirm the file looks correct in a text editor (such as TextEdit), and then drag/drop that file.

Proofread your transcript after saving it to the text box that corresponds to your media file.
The text you see onscreen after clicking Save is the text that will be returned to you in your file.
If you see apostrophes or other characters converted to ???s or ?s, there are encoding issues that should be repaired before clicking Submit.

Submitting the transcript via FTP or API

For large numbers of alignment submissions, FTP/API uploads are best, provided each file has no extraneous formatting and all text corresponds to the audio.
See below for information regarding formatting and text.

Please submit only video/audio that is in a single language. Multilingual video/audio can cause alignment issues.

 

Text Requirements:

All transcripts submitted must be in a UTF-8 unformatted plain text (.TXT) format.
Submit plain text only (no tables, HTML markup, etc.).
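
If you prepare many transcripts, a quick local check before uploading can catch files that are not valid UTF-8 or that still contain markup. The following Python sketch is only an illustration: the function name find_text_problems and the tag pattern are assumptions made for this example, and it is not part of the 3Play Media account system or API.

```python
import re

# Rough pattern for HTML-style tags; an assumption made for illustration only.
HTML_TAG = re.compile(r"</?[A-Za-z][^>]*>")


def find_text_problems(raw: bytes) -> list[str]:
    """Return a list of problems found in a transcript's raw bytes."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError as err:
        # Typical when the file was saved in a legacy (non-UTF-8) encoding.
        return [f"not valid UTF-8 (first bad byte at offset {err.start})"]

    problems = []
    if "\ufffd" in text:
        # U+FFFD shows up when an earlier conversion already lost characters.
        problems.append("contains U+FFFD replacement characters")
    if HTML_TAG.search(text):
        problems.append("contains HTML-style markup")
    return problems
```

Read the file in binary mode (for example, open(path, "rb").read()) and pass the bytes to the function. An empty list means none of these particular problems were found; it does not guarantee that the transcript will align well.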

Exporting from Microsoft Word: 

When you save a Microsoft Word document as a .txt file, Microsoft Word special characters are not removed.
When you save the transcript in the upload interface, we convert these characters to their ASCII equivalents.
For example, an em-dash becomes an ASCII dash, and all MS "smart quotes" are turned into either an ASCII double-quote or an apostrophe, as appropriate.
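
If you would rather clean these characters yourself before uploading (for example, for FTP submissions), the same kind of conversion can be sketched in Python. This is a minimal illustration that covers only the characters named above; it is not 3Play Media's actual conversion, and the names WORD_TO_ASCII and normalize_word_characters are hypothetical.

```python
# Mapping from common Microsoft Word characters to ASCII equivalents.
# Only the cases mentioned in this article are covered; extend as needed.
WORD_TO_ASCII = str.maketrans({
    "\u2018": "'",   # left single smart quote
    "\u2019": "'",   # right single smart quote / apostrophe
    "\u201c": '"',   # left double smart quote
    "\u201d": '"',   # right double smart quote
    "\u2014": "-",   # em-dash
})


def normalize_word_characters(text: str) -> str:
    """Replace Word smart quotes and em-dashes with ASCII characters."""
    return text.translate(WORD_TO_ASCII)
```

For instance, a sentence typed in Word with curly quotes and an em-dash comes back with straight quotes and a plain hyphen, while all other characters are left untouched.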

SEE MORE INFORMATION ON EXPORTING FROM WORD TO UTF-8 TXT

All text in the transcript must correspond to the speech within the media file.

This means...

Remove any stage directions, overview information, and any other extraneous text.
Extraneous text other than properly formatted Speaker IDs will reduce alignment quality and may cause the process to fail.

Songs, singing, and music should not be submitted. The automated transcript alignment process relies on speech recognition and unfortunately doesn't work with songs, even if you provide the lyrics.

 

Paragraph breaks:

The processing assumes that each and every line-feed sequence in the transcript is intended to be a paragraph break/carriage return.
If this is not your intention, you must manually remove these line-feeds from your input transcript prior to upload.
Microsoft Word and other word processing applications typically offer a way to view the line feeds within your document.  

The alignment process parses the text itself, so remove line breaks that fall within a paragraph and submit the plain text transcript formatted like the first example below.

EXAMPLE TEXT WITHOUT LINE BREAKS (aka good formatting)

SPEAKER: Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.

 

EXAMPLE TEXT WITH LINE BREAKS (aka bad formatting)

SPEAKER: Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor

incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud

exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure

dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.

Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt

mollit anim id est laborum.
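
Removing hard line breaks by hand is tedious for long transcripts, so here is a minimal Python sketch of one way to automate it. It rests on a loudly stated assumption: a new paragraph starts only where a line begins with a Speaker ID in capitals followed by a colon, and every other line is joined to the paragraph before it. A transcript without Speaker IDs would therefore collapse into a single paragraph. The name unwrap_transcript is hypothetical and not part of any 3Play Media tool.

```python
import re

# Assumption: a line that starts with capitals (digits and spaces allowed)
# followed by a colon marks the start of a new paragraph.
SPEAKER_ID = re.compile(r"^[A-Z][A-Z0-9 ]*:")


def unwrap_transcript(text: str) -> str:
    """Join hard-wrapped lines so each paragraph sits on a single line."""
    paragraphs = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines left behind by hard wrapping
        if SPEAKER_ID.match(line) or not paragraphs:
            paragraphs.append(line)  # start a new paragraph
        else:
            paragraphs[-1] += " " + line  # continue the current paragraph
    return "\n".join(paragraphs)
```

Review the result before uploading, since a wrapped line that happens to begin with capitals and a colon would be treated as a new paragraph.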

Factors Affecting Alignment Quality

In general, this service is best suited for content that predominantly contains scripted speech.
Films and documentaries submitted for alignment that contain unscripted speech, competing audio (sound effects and music), or text that does not correspond to the content's speech may show captions that stay on screen too long or appear too early in the completed file.

SPEAKER IDS:

Speaker IDs are not required but can be included in the transcript uploaded along with a media file.

When desired, Speaker IDs in the transcript must adhere to the following formatting...

Speaker IDs must be in ALL CAPITAL LETTERS followed by a colon.
Incorrectly formatted Speaker IDs will result in alignment quality issues and may cause the process to fail.

Here are two examples of properly formatted Speaker IDs. Note that numbers are acceptable.

SPEAKER 1: Today we are going to review last night's assignment.

BOBBY:  My dog ate my homework.
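
To spot Speaker IDs that break this rule before you upload, you could scan the transcript with a short script. The Python sketch below is a heuristic under stated assumptions, not 3Play Media's own validation: it treats any short label at the start of a line followed by a colon as a possible Speaker ID and reports those that are not in all capitals. The names and the 30-character limit are hypothetical.

```python
import re

# Matches the rule above: capitals (digits and spaces allowed), then a colon.
VALID_ID = re.compile(r"^[A-Z][A-Z0-9 ]*:")
# Hypothetical heuristic: a short label of letters, digits, or spaces
# before a colon at the start of a line might be a Speaker ID.
LABEL = re.compile(r"^([A-Za-z][A-Za-z0-9 ]{0,30}):")


def find_bad_speaker_ids(text: str) -> list[str]:
    """Return line-initial labels that look like Speaker IDs but are not all caps."""
    bad = []
    for line in text.splitlines():
        line = line.strip()
        label = LABEL.match(line)
        if label and not VALID_ID.match(line):
            bad.append(label.group(1))
    return bad
```

Ordinary sentences that begin with a word and a colon will also be reported, so treat the output as a list of lines to review rather than a list of definite errors.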

 
Extraneous Text: 
 
Text which does NOT correspond to the audio interferes with the process.  

Occasional items (e.g., [CLAPPING], [MUSIC PLAYING]) shouldn't cause too many problems.

However, large chunks of text that do not correspond to the audio or significant sections of audio that do not correspond to the text will likely throw the alignment off in that time region.

Overlapping speakers:

...are difficult because the text cannot accurately show the ordering of words.
Regions containing overlapping speakers tend to be very confusing to the process.

Low-quality audio:

...will also negatively affect the alignment.  
This will usually be reflected in the audio difficulty rating for the file.   

It's important to realize that poor correspondence between the text and audio may also be reflected in the audio difficulty rating, even if the audio quality is actually good. 

See more information on audio difficulty

Stranded media files:

These are media files uploaded for alignment that do not have corresponding text files.

- Every morning, we check for these, and...

  • If a file has been stranded for more than 24 hours but less than 48, we send project admins a reminder and warn them that it will be deleted in 24 hours.
  • If it has been stranded for more than 48 hours (i.e., the warning has already been sent), we delete it and notify project admins.
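
The schedule above can be summarized as a small decision rule. The Python sketch below is only an illustration of that rule, not 3Play Media's implementation: the function name stranded_action is hypothetical, and how a file is treated at exactly 24 or 48 hours is an assumption, since the policy does not say.

```python
from datetime import datetime, timedelta


def stranded_action(uploaded_at: datetime, checked_at: datetime) -> str:
    """Return the action the daily check would take for a stranded media file."""
    age = checked_at - uploaded_at
    if age > timedelta(hours=48):
        return "delete"  # the warning has already been sent
    if age > timedelta(hours=24):
        return "remind"  # warn admins that deletion follows in 24 hours
    return "none"  # assumption: files at or under 24 hours are left alone
```

Because the check runs once each morning, a file uploaded without its transcript can expect a reminder on the first check after it passes 24 hours and deletion on a later check once it has passed 48 hours.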