Transform plain text transcripts into timed caption files, generate audio descriptions, and more using 3Play Media’s Alignment service. This guide outlines best practices to ensure your transcript aligns correctly with your media file.
Prerequisites
- A Pro or Enterprise level 3Play Media account is required.
- All transcripts must be submitted as unformatted, UTF-8 encoded plain text (.TXT) files. Submit plain text only (no tables, HTML markup, etc.).
Best Practices for Alignment
General Requirements
- File Duration: Each media file should be no longer than 2 hours.
- Language: Submit only one spoken language per file. Multilingual audio may cause errors.
- Verbatim Text: The transcript must match the audio exactly, word for word.
- Formatting: Remove all stage directions, summaries, links, formulas, or other non-spoken text.
- No Music: Do not submit songs or music. The service is designed for spoken content only.
Text Formatting & Encoding
- Paste as Plain Text Only: Do not copy/paste from Word or drag/drop content. Always paste plain text to avoid encoding issues.
- Exporting from Word: Microsoft Word does not remove special characters when saving to .txt.
  → Click here to learn how to export properly in UTF-8 format.
- Use UTF-8 Encoding: Ensure your .TXT files are saved in UTF-8 format to preserve special characters like ñ or apostrophes (’).
| Note: Incorrect encoding may convert characters to symbols like ??? or ??. |
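Before uploading, you may want to confirm locally that a transcript really is UTF-8. The following is a minimal Python sketch, not part of the 3Play Media service; the function name `check_utf8` is hypothetical, and it only tests whether the raw bytes decode as UTF-8.

```python
def check_utf8(data: bytes) -> tuple[bool, str]:
    """Return (ok, message) for the raw bytes of a transcript file."""
    try:
        # Strict decoding raises on any byte sequence that is not valid UTF-8.
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError as exc:
        return False, f"invalid UTF-8 near byte offset {exc.start}"
    return True, "valid UTF-8"


if __name__ == "__main__":
    sample = "mañana’s lesson"
    print(check_utf8(sample.encode("utf-8")))   # saved as UTF-8
    print(check_utf8(sample.encode("cp1252")))  # saved in a legacy Windows encoding
```

To check a file on disk, read it in binary mode (`open(path, "rb").read()`) and pass the bytes to the function.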
Proofread Before Submitting
- Always review the transcript in the text box after saving.
- The on-screen version is what will be used during alignment.
- If you see strange characters (e.g., ???), fix encoding issues before submitting.
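A visual review can be supplemented with a quick scan for the kinds of strange characters described above. This is a rough sketch under stated assumptions: the function name `find_suspicious` is hypothetical, and the two patterns it looks for (the Unicode replacement character and runs of two or more question marks) are heuristics chosen for illustration, so a match is a prompt to look closer, not proof of an encoding problem.

```python
import re

# Assumption: U+FFFD and runs of "?" are treated as signs of an encoding problem.
SUSPICIOUS = re.compile(r"\ufffd|\?{2,}")


def find_suspicious(text: str) -> list[tuple[int, str]]:
    """Return (character offset, matched text) for each suspicious run."""
    return [(m.start(), m.group()) for m in SUSPICIOUS.finditer(text)]


if __name__ == "__main__":
    for offset, found in find_suspicious("It??s time for ma\ufffdana"):
        print(offset, repr(found))
```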
Upload Method Tips
- Line Breaks: Remove unnecessary line breaks. Transcripts should flow as continuous text.
- High-Volume Uploads: Use FTP or API for bulk submissions, as long as files are correctly formatted and match the audio content.
| ✅ Example of good formatting "WITHOUT LINE BREAKS" | ❌ Example of bad formatting "WITH LINE BREAKS" |
| SPEAKER: Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum. | SPEAKER: Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum. |
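If a transcript arrives with hard line breaks, they can be collapsed before upload. Here is a minimal Python sketch; the helper name `collapse_line_breaks` is hypothetical. Note that it also squeezes repeated spaces and tabs into a single space, which is an assumption about what is acceptable for your text.

```python
def collapse_line_breaks(text: str) -> str:
    """Join a transcript into one continuous run of text."""
    # str.split() with no arguments splits on any whitespace run,
    # including "\n", "\r\n", tabs, and repeated spaces.
    return " ".join(text.split())


if __name__ == "__main__":
    wrapped = "SPEAKER: Lorem ipsum dolor sit amet,\r\nconsectetur adipiscing elit,\n\nsed do eiusmod."
    print(collapse_line_breaks(wrapped))
```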
Factors Affecting Alignment Quality
Content Type
- Best suited for scripted speech. Unscripted content such as interviews, films, or documentaries may result in:
  - Captions appearing too early or too late
  - Decreased alignment accuracy
- Avoid competing audio, such as sound effects or background music.
Speaker IDs
- Speaker IDs are optional but must be properly formatted if included.
- Formatting rules:
  - Use ALL CAPITAL LETTERS, followed by a colon (:)
  - Numbers are acceptable
| ✅ Correct | ❌ Incorrect |
| SPEAKER 1: Today we are going to review last night's assignment. BOBBY: My dog ate my homework. | Speaker1 - Today we are going to review last night's assignment. Bobby: My dog ate my homework. |
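The formatting rules for speaker IDs can be expressed as a small check. This is an illustrative Python sketch only, not the validation 3Play Media performs; the function name `is_valid_speaker_id` is hypothetical, and allowing single spaces between words (as in `SPEAKER 1:`) is inferred from the correct example.

```python
def is_valid_speaker_id(label: str) -> bool:
    """Check one label, e.g. "SPEAKER 1:", against the rules listed above."""
    # The label must end with a colon.
    if not label.endswith(":"):
        return False
    name = label[:-1]
    # No empty names and no stray spaces around the name.
    if not name or name != name.strip():
        return False
    # Only letters, digits, and spaces are allowed in the name.
    if not all(ch.isalnum() or ch == " " for ch in name):
        return False
    # Every letter must already be uppercase.
    return name == name.upper()


if __name__ == "__main__":
    for label in ["SPEAKER 1:", "BOBBY:", "Speaker1 -", "Bobby:"]:
        print(label, is_valid_speaker_id(label))
```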
Extraneous Text
- Do not include any text that is not spoken in the audio.
- Occasional cues like [CLAPPING] or [MUSIC PLAYING] are acceptable in moderation.
- Avoid large chunks of non-verbatim content or sections of audio without matching text.
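One way to gauge whether bracketed cues are still "in moderation" is to measure how much of the transcript they make up. The sketch below is a rough illustration: the function name `cue_share` is hypothetical, and this guide does not define a numeric limit, so any threshold you compare the result against is your own assumption.

```python
import re

# Matches bracketed cues such as [CLAPPING] or [MUSIC PLAYING].
CUE = re.compile(r"\[[^\[\]]+\]")


def cue_share(text: str) -> float:
    """Fraction of whitespace-separated words that sit inside bracketed cues."""
    total_words = len(text.split())
    if total_words == 0:
        return 0.0
    cue_words = sum(len(m.group().split()) for m in CUE.finditer(text))
    return cue_words / total_words


if __name__ == "__main__":
    sample = "[CLAPPING] Thank you all for coming. [MUSIC PLAYING]"
    print(f"{cue_share(sample):.1%} of words are cues")
```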
Overlapping Speakers
- Alignment cannot accurately determine the order of words in overlapping speech.
- Avoid regions with multiple speakers talking simultaneously, as this may cause alignment failures.
Audio Quality
- Low-quality audio negatively impacts alignment.
- Audio difficulty ratings may increase due to:
  - Background noise
  - Mismatched text and speech
  - Poor pronunciation or fast talking
- Even with high-quality audio, poor text correspondence can still lead to difficulty warnings.
Stranded Media Files
These are media files uploaded for alignment that do not have corresponding text files.
- If stranded for:
  - 24–48 hours → Project admins will receive a reminder and warning.
  - Over 48 hours → File is deleted and admins are notified.
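If you track uploads on your side, the stranded-file timeline above can be mirrored in a small helper. This is a sketch for your own bookkeeping, not how the service is implemented; the function name `stranded_status` and the status labels are hypothetical, and treating exactly 24 and exactly 48 hours as part of the reminder window is an assumption.

```python
from datetime import datetime, timedelta


def stranded_status(uploaded_at: datetime, now: datetime) -> str:
    """Classify a media file that still has no matching transcript."""
    age = now - uploaded_at
    if age > timedelta(hours=48):
        return "deleted"      # over 48 hours: file is deleted, admins notified
    if age >= timedelta(hours=24):
        return "reminder"     # 24 to 48 hours: admins get a reminder and warning
    return "ok"               # under 24 hours: no action yet


if __name__ == "__main__":
    uploaded = datetime(2024, 1, 1, 9, 0)
    for hours in (2, 30, 50):
        print(hours, stranded_status(uploaded, uploaded + timedelta(hours=hours)))
```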
| Note: If you are new to the transcript alignment process, we suggest setting Alignment Review to Review All Files before submitting transcripts for alignment. Click here to read more. |