Alignment: Best Practices – 3Play Media Support

Transform plain text transcripts into timed caption files, generate audio descriptions, and more using 3Play Media’s Alignment service. This guide outlines best practices to ensure your transcript aligns correctly with your media file.

Prerequisites

A Pro or Enterprise level 3Play Media account is required.
All transcripts must be submitted as unformatted plain text (.TXT) files in UTF-8 encoding. Submit plain text only (no tables, HTML markup, etc.).

Best Practices for Alignment

General Requirements

File Duration: Each media file should be no longer than 2 hours.
Language: Submit only one spoken language per file. Multilingual audio may cause errors.
Verbatim Text: The transcript must match the audio exactly, word for word.
Formatting: Remove all stage directions, summaries, links, formulas, or other non-spoken text.
No Music: Do not submit songs or music. The service is designed for spoken content only.

Text Formatting & Encoding

Paste as Plain Text Only: Do not copy/paste from Word or drag/drop content. Always paste plain text to avoid encoding issues.
Exporting from Word:
Microsoft Word does not remove special characters when saving to .txt.
→ Click here to learn how to export properly in UTF-8 format.
Use UTF-8 Encoding: Ensure your .TXT files are saved in UTF-8 format to preserve special characters like ñ or apostrophes (’).

Note: Incorrect encoding may convert characters to symbols like ??? or ??.
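The encoding advice above can be checked locally before a transcript is uploaded. The following is a minimal Python sketch, not part of 3Play Media's tooling: the function name `check_transcript_encoding` and the list of suspicious sequences are assumptions chosen for illustration. It confirms that a file decodes as UTF-8 and flags character sequences that often indicate a file was saved or pasted with the wrong encoding.

```python
from pathlib import Path

# Sequences that commonly appear when text was saved or decoded with the
# wrong encoding. Illustrative only; this list is an assumption, not a
# rule published by 3Play Media.
SUSPICIOUS = ("\ufffd", "â€™", "â€œ", "Ã±", "???")


def check_transcript_encoding(path):
    """Return a list of human-readable problems found in the file at `path`."""
    raw = Path(path).read_bytes()
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        # The file is not UTF-8 at all; report where decoding failed.
        return [f"not valid UTF-8 (byte offset {exc.start})"]

    problems = []
    for marker in SUSPICIOUS:
        if marker in text:
            problems.append(f"contains suspicious sequence {marker!r}")
    return problems
```

An empty list means the file decoded cleanly and none of the listed sequences were found. The check is a heuristic, so a transcript that legitimately contains "???" would be flagged and should simply be reviewed by eye.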

Proofread Before Submitting

Always review the transcript in the text box after saving.
The on-screen version is what will be used during alignment.
If you see strange characters (e.g., ???), fix encoding issues before submitting.

Upload Method Tips

Line Breaks: Remove unnecessary line breaks. Transcripts should flow as continuous text.
High-Volume Uploads:
Use FTP or API for bulk submissions, as long as files are correctly formatted and match the audio content.

✅ Example of good formatting "WITHOUT LINE BREAKS"

SPEAKER: Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.

❌ Example of bad formatting "WITH LINE BREAKS"

SPEAKER: Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor

incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud

exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure

dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.

Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt

mollit anim id est laborum.
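Removing hard line breaks like those in the bad example can be scripted. The Python sketch below is one possible approach under stated assumptions: the function name `unwrap_transcript` is hypothetical, and starting a new line at each speaker ID (capital letters, digits, and spaces followed by a colon) is an assumption about the desired output rather than a documented 3Play Media requirement.

```python
import re

# Assumed speaker ID pattern: capital letters, digits, and spaces, then a colon.
SPEAKER_ID = re.compile(r"^[A-Z0-9][A-Z0-9 ]*:")


def unwrap_transcript(text):
    """Join hard-wrapped lines so each speaker turn is one continuous line."""
    turns = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # drop blank lines left over from wrapping
        if SPEAKER_ID.match(line) or not turns:
            turns.append(line)  # a new speaker turn starts here
        else:
            turns[-1] += " " + line  # continuation of the current turn
    return "\n".join(turns)
```

A transcript with no speaker IDs collapses into a single line. Review the result before submitting, since a wrapped line that happens to begin with a capitalized word and a colon would be treated as a new speaker turn.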

Factors Affecting Alignment Quality

Content Type

Alignment is best suited for scripted speech.
Unscripted content such as interviews, films, or documentaries may result in:
- Captions appearing too early or too late
- Decreased alignment accuracy
Avoid competing audio, such as sound effects or background music.

Speaker IDs

Speaker IDs are optional but must be properly formatted if included.
Formatting rules:
- Use ALL CAPITAL LETTERS, followed by a colon (:)
- Numbers are acceptable

✅ Correct

SPEAKER 1: Today we are going to review last night's assignment.

BOBBY: My dog ate my homework.

❌ Incorrect

Speaker1 - Today we are going to review last night's assignment.

Bobby: My dog ate my homework.
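The speaker ID rules above lend themselves to a quick automated check. The following Python sketch is illustrative only: the function name `find_bad_speaker_ids` and both regular expressions are assumptions, and the "looks like a label" pattern is a heuristic that can flag ordinary sentences such as "Note: ...".

```python
import re

# A valid speaker ID per the rules above: capital letters and digits
# (spaces allowed between words, as in "SPEAKER 1"), followed by a colon.
VALID_ID = re.compile(r"^[A-Z0-9]+(?: [A-Z0-9]+)*:\s")

# Heuristic for text that looks like a speaker label: one or two short
# words at the start of a line, followed by ":" or " -".
LABEL_LIKE = re.compile(r"^([A-Za-z][A-Za-z0-9]*(?: [A-Za-z0-9]+)?)\s*(:| -)\s")


def find_bad_speaker_ids(text):
    """Return (line_number, label) pairs for labels that break the rules."""
    bad = []
    for number, line in enumerate(text.splitlines(), start=1):
        candidate = LABEL_LIKE.match(line)
        if candidate and not VALID_ID.match(line):
            bad.append((number, candidate.group(1)))
    return bad
```

Run against the four example lines above (one per line), the function reports only `Speaker1` and `Bobby`, the two labels that are not in all capitals followed by a colon.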

Extraneous Text

Do not include any text that is not spoken in the audio.
Occasional cues like [CLAPPING] or [MUSIC PLAYING] are acceptable in moderation.
Avoid large chunks of non-verbatim content or sections of audio without matching text.
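To spot extraneous text before submitting, a transcript can be scanned for items that are rarely spoken aloud. The Python sketch below is a rough aid under stated assumptions: the function name `scan_extraneous` and the URL and cue patterns are hypothetical, and the guidance above gives no numeric limit for what counts as "in moderation", so the function only reports what it finds and leaves the judgment to the reviewer.

```python
import re

# Assumed patterns: web links, and bracketed cues such as [CLAPPING].
URL = re.compile(r"https?://\S+|www\.\S+")
CUE = re.compile(r"\[[^\]\n]+\]")


def scan_extraneous(text):
    """Summarize text that is unlikely to be spoken in the audio."""
    urls = URL.findall(text)
    cues = CUE.findall(text)
    # Count the remaining words so the number of cues can be judged
    # against the length of the transcript.
    words = len(CUE.sub(" ", text).split())
    return {"urls": urls, "cues": cues, "words_outside_cues": words}
```

Links should generally be removed, while a handful of cues in a long transcript is consistent with the guidance above.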

Overlapping Speakers

Alignment cannot accurately determine the order of words in overlapping speech.
Avoid regions with multiple speakers talking simultaneously, as this may cause alignment failures.

Audio Quality

Low-quality audio negatively impacts alignment.
Audio difficulty ratings may increase due to:
- Background noise
- Mismatched text and speech
- Poor pronunciation or fast talking
Even with high-quality audio, poor text correspondence can still lead to difficulty warnings.

Stranded Media Files

These are media files uploaded for alignment that do not have corresponding text files.

If stranded for:
- 24–48 hours → Project admins will receive a reminder and warning.
- Over 48 hours → File is deleted and admins are notified.

Note: If you are new to the transcript alignment process, we suggest setting Alignment Review to Review All Files before submitting transcripts for alignment. Click here to read more.