Subtitle Tools in C Language
I decided to release some of the C tools I've written over the years for working with subtitle files. These include SubRip (.srt) tools, a SubStationAlpha (.ssa)-to-SubRip (.srt) converter, and a PGS (.sup) tool that extracts subtitle images as bitmaps and synchronizes timestamps to new anchor points.
In addition, I have included routines for YCbCr-to-RGB and RGB-to-YCbCr conversion, and for deriving the BT.709 RGB colorspace constants from the BT.709 color primaries.
Before using a SubRip file as input to any of the tools listed below, it's generally a good idea to run check.c on it to ensure it's in the proper format.
Table 1: | SubRip (.srt) Tools |
check.c | Check for errors in a SubRip (.srt) file and report results. Compile: gcc -Wall check.c -o check Usage: ./check inputfilename.srt If it finds an error, fix it and re-run check until no errors are found. Output: reports to stdout |
offset.c | Read an existing SubRip (.srt) file, apply a positive, negative, or zero offset to the timestamps, and save the result in an output file. Ignores existing subtitle numbers and renumbers them from 1 to N. I often use it just to renumber by entering 0 for the offsets. If present, the Byte Order Mark (BOM) of the input file will be included in the output file. Compile: gcc -Wall offset.c -o offset Usage: ./offset inputfilename.srt Output: out.srt |
sync.c | Read an existing SubRip (.srt) file and synchronize all timestamps to user-input anchor points. Subtitle durations are preserved. If present, the Byte Order Mark (BOM) of the input file will be included in the output file. Synchronization is accomplished by using "first" and "last" timestamps as anchor points. Choose "first" and "last" subtitles that are at or near the beginning and end of the feature in order to maximize scaling accuracy. Compile: gcc -Wall sync.c -o sync Usage: ./sync inputfilename.srt Output: out.srt |
ssa2srt.c | Read an existing SubStationAlpha (SSA) file and convert to SubRip (.srt) output file. Transfers styles and markups for font color, bold, italic, underline, strikeout, and alignment. Only recognizes V4 and V4+ Styles. Ignores those SSA style attributes and override tags not implemented in SubRip format. Warning: Unlike SubRip files, SSA files don't require subtitles to be in chronological order. The SubRip output file is not corrected for this rare situation; use reorder.c below. Compile: gcc -Wall ssa2srt.c -o ssa2srt Usage: ./ssa2srt inputfilename Output: out.srt |
ssa2srt-nostyles.c | Read an existing SubStationAlpha (SSA) file and convert to SubRip (.srt) output file. Doesn't transfer styles, only markups for font color, bold, italic, underline, strikeout, and alignment. Warning: Unlike SubRip files, SSA files don't require subtitles to be in chronological order. The SubRip output file is not corrected for this rare situation; use reorder.c below. Compile: gcc -Wall ssa2srt-nostyles.c -o ssa2srt-nostyles Usage: ./ssa2srt-nostyles inputfilename Output: out.srt |
reorder.c | Re-order non-chronological subtitles in a SubRip (.srt) file by sorting on start times. If present, the Byte Order Mark (BOM) of the input file will be included in the output file. Compile: gcc -Wall reorder.c -o reorder Usage: ./reorder inputfilename.srt Output: out.srt |
srt2txt.c | Read an existing SubRip (.srt) file and save only the text lines to an output file. This is useful if you want to submit the text to a translation tool/service without the subtitle numbers and timestamps. Once translated, you can use txt2srt.c to convert back to a SubRip file. Compile: gcc -Wall srt2txt.c -o srt2txt Usage: ./srt2txt inputfilename.srt [nospace] Output: out.txt Subtitle texts will be separated by blank lines unless nospace option is specified. |
txt2srt.c | Take the timestamps from a SubRip (.srt) file and the text from a text file and create a new .srt file. The text file must have the same number of subtitles as the SubRip file, and they must be separated by single blank lines. Compile: gcc -Wall txt2srt.c -o txt2srt Usage: ./txt2srt inputfilename.srt inputfilename.txt Output: out.srt |
fixtag.c | Read an existing SubRip (.srt) file and look for and fix some common markup tag errors. Tags included: italics, bold, underline, strikethrough, font color, and font size. Using the optional close argument will cause fixtag to append missing closing tags to the last line of text of the subtitle. You should always compare out.srt with the original subtitle file to determine the author's intent. If present, the Byte Order Mark (BOM) of the input file will be included in the output file. Compile: gcc -Wall fixtag.c -o fixtag Usage: ./fixtag inputfilename.srt [close] Output: out.srt |
time-text.c | Take the timestamps from one SubRip (.srt) file and the subtitle texts from another SubRip file and create a new SubRip file with those timestamps and subtitle texts. The two input SubRip files should have the same number of subtitles. If present, the Byte Order Mark (BOM) of the text SubRip file will be included in the output file. Compile: gcc -Wall time-text.c -o time-text Usage: ./time-text timeinputfile.srt textinputfile.srt Output: out.srt |
combine.c | Read an existing SubRip (.srt) file and combine subtitles with identical textual content and consecutive timestamps. Within each group of matching subtitles, it takes the starting timestamp from the first subtitle and the ending timestamp from the last. Writes a new SubRip file. If present, the Byte Order Mark (BOM) of the input file will be included in the output file. Compile: gcc -Wall combine.c -o combine Usage: ./combine inputfilename.srt Output: out.srt |
split.c | I only created this to produce test files for combine.c. Read an existing SubRip (.srt) file and split each subtitle into two identical subs. If present, the Byte Order Mark (BOM) of the input file will be included in the output file. Compile: gcc -Wall split.c -o split Usage: ./split inputfilename.srt Output: out.srt |
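The timestamp arithmetic behind an offset or a two-anchor synchronization can be sketched as follows. This is only an illustration, not the code from offset.c or sync.c: the function names are hypothetical, and the linear mapping between the "first" and "last" anchors is an assumption about how the scaling is done.

```c
#include <stdio.h>

/* Parse "HH:MM:SS,mmm" into milliseconds; returns -1 if the stamp is malformed. */
long srt_to_ms(const char *s)
{
    int h, m, sec, ms;

    if (sscanf(s, "%d:%d:%d,%d", &h, &m, &sec, &ms) != 4)
        return -1;
    if (h < 0 || m < 0 || m > 59 || sec < 0 || sec > 59 || ms < 0 || ms > 999)
        return -1;
    return ((h * 60L + m) * 60L + sec) * 1000L + ms;
}

/* Format milliseconds as "HH:MM:SS,mmm"; returns -1 for a negative time. */
int ms_to_srt(long t, char *buf, size_t len)
{
    if (t < 0)
        return -1;
    snprintf(buf, len, "%02ld:%02ld:%02ld,%03ld",
             t / 3600000, (t / 60000) % 60, (t / 1000) % 60, t % 1000);
    return 0;
}

/* Map time t linearly so that old_first lands on new_first and old_last
 * lands on new_last (assumed scaling model). If both old anchors coincide,
 * fall back to a plain offset. The result is rounded to the nearest ms. */
long sync_ms(long t, long old_first, long old_last, long new_first, long new_last)
{
    double x;

    if (old_last == old_first)
        return t + (new_first - old_first);
    x = (double)(t - old_first) * (double)(new_last - new_first)
        / (double)(old_last - old_first);
    return new_first + (long)(x < 0 ? x - 0.5 : x + 0.5);
}
```

A plain offset is just srt_to_ms(stamp) + offset, formatted back with ms_to_srt. To preserve a subtitle's duration when synchronizing, map only its start time with sync_ms and write the new end as the new start plus the original end minus the original start.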
Table 2: | PGS (.sup) Tool |
pgs.c | A tool to analyze a PGS (.sup) file and produce a report file. Optional functions include producing a bitmap file for each subtitle, applying an offset to timestamps, and synchronizing all timestamps to new anchor points. Subtitle durations are preserved when synchronizing to new anchor points. Compile: gcc -Wall pgs.c -lm -o pgs Usage: ./pgs filename.sup [option] Output: pgs.out, and optionally a bitmap file for each subtitle, or an offset/resynchronized PGS file out.sup |
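For readers curious what applying an offset to a PGS timestamp involves, here is a minimal sketch. It is not taken from pgs.c. It assumes the commonly documented segment layout: a 13-byte header made of the magic bytes "PG", a 32-bit big-endian presentation timestamp (PTS) in 90 kHz ticks, a 32-bit decoding timestamp, a 1-byte segment type, and a 2-byte payload size. All names below are hypothetical.

```c
#include <stdint.h>

#define PGS_HEADER_LEN 13   /* assumed header size, see the layout above */

/* Read a 32-bit big-endian value. */
uint32_t be32_read(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Write a 32-bit big-endian value. */
void be32_write(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);
    p[3] = (unsigned char)v;
}

/* Shift the PTS of one segment header by offset_ms milliseconds.
 * 1 ms equals 90 ticks of the 90 kHz clock. Returns 0 on success, or -1
 * (leaving the header untouched) if the magic bytes are missing or the
 * shifted timestamp would not fit in an unsigned 32-bit field. */
int pgs_shift_pts(unsigned char *hdr, long offset_ms)
{
    int64_t pts;

    if (hdr[0] != 'P' || hdr[1] != 'G')
        return -1;
    pts = (int64_t)be32_read(hdr + 2) + (int64_t)offset_ms * 90;
    if (pts < 0 || pts > (int64_t)UINT32_MAX)
        return -1;
    be32_write(hdr + 2, (uint32_t)pts);
    return 0;
}
```

A full pass over a file would call pgs_shift_pts on each header in turn, using the 2-byte size field at the end of the header to skip over the payload to the next segment.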
Table 3: | Chapter Tool |
chapters.c | Create an XML chapters file given the feature duration and desired number of chapters. The resulting .xml file can be added to a video container using tools like MKVToolNix GUI. There are often 12 to 16 chapters for a typical 1.5 hour feature. I usually use ffmpeg to find the duration, e.g., ffmpeg -i filename.mkv. Note that chapters.c expects the same time notation as ffmpeg, i.e., using fractions of a second. (This differs from the SubRip (.srt) file format, which uses ",milliseconds".) You can copy and paste the result from ffmpeg. Compile: gcc -Wall chapters.c -o chapters Usage: ./chapters Output: chapters.xml |
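As a rough sketch of the arithmetic involved, the functions below parse an ffmpeg-style duration and compute chapter start times. Evenly spaced chapters are an assumption here, since the spacing chapters.c uses is not stated above, and the function names are hypothetical. The XML output itself is left out.

```c
#include <stdio.h>

/* Parse an ffmpeg-style duration "HH:MM:SS.ff" into seconds;
 * returns -1.0 if the string is malformed. */
double duration_to_seconds(const char *s)
{
    int h, m;
    double sec;

    if (sscanf(s, "%d:%d:%lf", &h, &m, &sec) != 3)
        return -1.0;
    if (h < 0 || m < 0 || m > 59 || sec < 0.0 || sec >= 60.0)
        return -1.0;
    return h * 3600.0 + m * 60.0 + sec;
}

/* Start time in seconds of chapter i (0-based) out of n evenly spaced
 * chapters (assumed spacing). The caller must ensure n > 0 and 0 <= i < n. */
double chapter_start(double duration, int n, int i)
{
    return duration * i / n;
}

/* Format seconds as "HH:MM:SS.mmm", rounded to the nearest millisecond.
 * The caller must pass a non-negative time. */
void format_chapter_time(double t, char *buf, size_t len)
{
    long ms = (long)(t * 1000.0 + 0.5);

    snprintf(buf, len, "%02ld:%02ld:%02ld.%03ld",
             ms / 3600000, (ms / 60000) % 60, (ms / 1000) % 60, ms % 1000);
}
```

For a 1.5 hour feature ("01:30:00.00") split into 12 chapters, chapter_start gives a new chapter every 450 seconds, so the second chapter would begin at 00:07:30.000.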
Table 4: | Colorspace Tools |
References: ITU-R BT.709-6, ITU-T H.273 (V4), and SMPTE RP 177-1993 |
bt709.c | Derive the RGB color constants for the BT.709 YCbCr colorspace. The Normalized Primary Matrix (NPM), which includes KR, KG, and KB, is derived from the color primaries as defined in the BT.709 standard. Compile: gcc -Wall bt709.c -lm -o bt709 Usage: ./bt709 Output: reports to stdout |
ycbcr2rgb.c | Convert YCbCr (BT.709) to 8-bit sRGB. Assumes BT.709 color primaries and gamma-correction were used to produce YCbCr. Uses sRGB color primaries (same as BT.709) and applies sRGB gamma-correction to produce sRGB coordinates. Compile: gcc -Wall ycbcr2rgb.c -lm -o ycbcr2rgb Usage: ./ycbcr2rgb Output: reports to stdout |
rgb2ycbcr.c | Convert 8-bit sRGB to YCbCr (BT.709). Assumes sRGB gamma-correction was applied to sRGB. Uses BT.709 color primaries and applies BT.709 gamma-correction to produce YCbCr. Compile: gcc -Wall rgb2ycbcr.c -lm -o rgb2ycbcr Usage: ./rgb2ycbcr Output: reports to stdout |
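To illustrate how KR, KG, and KB follow from the color primaries, here is a minimal sketch in the spirit of the SMPTE RP 177 derivation. It is not the code from bt709.c, and the function names are hypothetical. Each primary's chromaticity (x, y) is expanded to XYZ with Y = 1, and the scale factors that make the three primaries sum to the white point are the luma coefficients (the Y row of the NPM). The second function applies the usual YCbCr-to-R'G'B' relations on normalized values, assuming Y' in [0, 1] and Cb, Cr in [-0.5, 0.5]. It leaves gamma untouched, so the re-encoding from BT.709 to sRGB gamma described for ycbcr2rgb.c is not shown.

```c
/* Determinant of a 3x3 matrix. */
static double det3(double m[3][3])
{
    return m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
         - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
         + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]);
}

/* Compute k[0..2] = KR, KG, KB from the xy chromaticities of the R, G, B
 * primaries (rows of xy) and of the white point, using Cramer's rule. */
void luma_coefficients(double xy[3][2], const double white[2], double k[3])
{
    double p[3][3], q[3][3], w[3], d;
    int r, c, j;

    for (c = 0; c < 3; c++) {            /* column c = XYZ of primary c, Y = 1 */
        p[0][c] = xy[c][0] / xy[c][1];
        p[1][c] = 1.0;
        p[2][c] = (1.0 - xy[c][0] - xy[c][1]) / xy[c][1];
    }
    w[0] = white[0] / white[1];          /* XYZ of the white point, Y = 1 */
    w[1] = 1.0;
    w[2] = (1.0 - white[0] - white[1]) / white[1];

    d = det3(p);
    for (c = 0; c < 3; c++) {            /* replace column c with w */
        for (r = 0; r < 3; r++)
            for (j = 0; j < 3; j++)
                q[r][j] = (j == c) ? w[r] : p[r][j];
        k[c] = det3(q) / d;
    }
}

/* Normalized Y'CbCr to gamma-encoded R'G'B' using the coefficients k. */
void ycbcr_to_rgb(const double k[3], double y, double cb, double cr, double rgb[3])
{
    rgb[0] = y + 2.0 * (1.0 - k[0]) * cr;                  /* R' */
    rgb[2] = y + 2.0 * (1.0 - k[2]) * cb;                  /* B' */
    rgb[1] = (y - k[0] * rgb[0] - k[2] * rgb[2]) / k[1];   /* G' */
}
```

With the BT.709 primaries R (0.640, 0.330), G (0.300, 0.600), B (0.150, 0.060) and the D65 white point (0.3127, 0.3290), luma_coefficients should yield values close to the familiar 0.2126, 0.7152, and 0.0722.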
P. David Buchan pdbuchan@gmail.com