Hi all - I've seen people searching for this, so I thought I'd share a method of extracting closed captions from Captivate. It's not perfect, but it will give you a text file you could then manipulate to generate WebVTT or SRT captions, and it's much faster than copy-and-paste.
NOTE: I've only tried this on Responsive Projects, but it should work on any HTML5 project, as long as a CPM.js file is created in the "assets" folder.
Captivate writes closed captions to the CPM.js file in the following format:
{sf:(time-in),ef:(time-out),t:'(text goes here)'}
A real-world example would look like: {sf:61,ef:138,t:'This is the first module '}
So, if we search the CPM.js file for text that matches that pattern, we'll have all the captions. After much fiddling with regular expressions (not my strong point), I came up with this:
{sf:([0-9]+),ef:([0-9]+),t:'(.*?)'}
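To make the three capture groups concrete, here is a minimal Python sketch (Python is my choice for illustration, not part of the original method) that runs the same pattern against the real-world example above. The name CAPTION_RE is hypothetical, and the braces are escaped only to make it obvious they are literal characters.

```python
import re

# Same pattern as above; braces escaped so they are clearly literal.
CAPTION_RE = re.compile(r"\{sf:([0-9]+),ef:([0-9]+),t:'(.*?)'\}")

sample = "{sf:61,ef:138,t:'This is the first module '}"

match = CAPTION_RE.search(sample)
# Group 1 = time-in, group 2 = time-out, group 3 = caption text.
start, end, text = match.groups()
print(start, end, repr(text))
```

Because the text group is lazy, it stops at the first '} it meets, so a caption that happened to contain that exact two-character sequence would be cut short.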
The pattern finds every caption in the CPM.js file, whether it belongs to video or audio. The next step is to separate those matches from the rest of the file. For that, I'm relying on the Windows PowerShell approach outlined in "Windows PowerShell: Extracting Strings Using Regular Expressions". Basically, the script reads through the CPM.js file, finds the matches, then copies them to a new text file. You end up with all your captions, each with its time-in and time-out, listed in one file. The process looks like this, assuming your Captivate project is in "c:\Captions" and is called Module1. Modify the paths as needed.
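# Paths assume the project lives in c:\Captions\Module1
$input_path = 'c:\Captions\Module1\assets\CPM.js'
$output_file = 'c:\Captions\Module1Captions.txt'
# Double quotes on the outside, because the pattern itself contains single quotes
$regex = "{sf:([0-9]+),ef:([0-9]+),t:'(.*?)'}"
# Find every match in the file and write each matched string to the output file
Select-String -Path $input_path -Pattern $regex -AllMatches | % { $_.Matches } | % { $_.Value } > $output_file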
(Credit where credit is due: this is taken directly from the page listed above. All I did was change the regular expression and the file names.)
This will match the caption pattern and copy everything to a new file, which looks like this:
{sf:1,ef:60,t:'Caption 1 '}
{sf:61,ef:138,t:'Caption 2 '}
{sf:139,ef:219,t:'Caption 3 '}
From this point, I do a little find-and-replace on the "sf" and "ef" pieces, and create a comma-separated list. I can then pull it into Excel or a database, and manipulate it further. I currently use a little ColdFusion to transform this into WebVTT and SRT files.
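For anyone who would rather script the find-and-replace step, here is a rough Python sketch of turning the extracted lines into a comma-separated list. It is not the ColdFusion I actually use; the function name captions_to_csv and the column names start, end and text are made up for illustration.

```python
import csv
import io
import re

CAPTION_RE = re.compile(r"\{sf:([0-9]+),ef:([0-9]+),t:'(.*?)'\}")

def captions_to_csv(lines):
    """Return CSV text with one row per line that matches the caption pattern."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["start", "end", "text"])  # hypothetical column names
    for line in lines:
        match = CAPTION_RE.search(line)
        if match:
            start, end, text = match.groups()
            writer.writerow([int(start), int(end), text])
    return buffer.getvalue()

# The three lines from the sample output above.
extracted = [
    "{sf:1,ef:60,t:'Caption 1 '}",
    "{sf:61,ef:138,t:'Caption 2 '}",
    "{sf:139,ef:219,t:'Caption 3 '}",
]
csv_text = captions_to_csv(extracted)
print(csv_text)
```

The csv module takes care of quoting any caption that contains a comma, which a plain find-and-replace would not. If you feed it the file written by the PowerShell step, note that Windows PowerShell 5.1 writes ">" redirects as UTF-16, so open the file with encoding="utf-16" in that case.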
Caveats:
1. Captivate calculates the time of each caption from the beginning of each slide, so you will have to account for that when creating a WebVTT file. I calculate the display time of each caption, and use that to figure out its time of appearance for a video as a whole.
2. Captivate occasionally repeats captions in the CPM.js file - often the first caption of a slide is also listed, with a different time code, as the last caption of the slide before.
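Both caveats can be handled in a few lines. The Python sketch below is only an illustration under some loud assumptions: that sf and ef are frame counts at 30 frames per second (I have not confirmed the unit, so adjust FPS), that display time is simply ef minus sf, that captions play back to back with no gaps, and that when two neighbouring captions share the same text the later one is the one to keep. All function names are hypothetical.

```python
FPS = 30  # ASSUMPTION: sf/ef are frame counts at 30 fps; change to suit your project

def drop_repeats(captions):
    """Drop a caption when the next one has the same text (keeps the later copy)."""
    kept = []
    for i, cap in enumerate(captions):
        nxt = captions[i + 1] if i + 1 < len(captions) else None
        if nxt is not None and nxt[2].strip() == cap[2].strip():
            continue
        kept.append(cap)
    return kept

def to_timeline(captions):
    """Lay captions end to end: each one starts when the previous one finishes."""
    clock = 0
    timeline = []
    for start, end, text in captions:
        duration = end - start  # ASSUMPTION: display time is ef minus sf
        timeline.append((clock, clock + duration, text.strip()))
        clock += duration
    return timeline

def vtt_time(frames, fps=FPS):
    """Format a frame count as a WebVTT timestamp (hh:mm:ss.mmm)."""
    total_ms = round(frames * 1000 / fps)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{ms:03d}"

def to_webvtt(timeline):
    """Build the text of a WebVTT file from (start, end, text) tuples."""
    lines = ["WEBVTT", ""]
    for start, end, text in timeline:
        lines.append(f"{vtt_time(start)} --> {vtt_time(end)}")
        lines.append(text)
        lines.append("")
    return "\n".join(lines)

# Two slides; "Caption 3" appears at the end of slide 1 and again at the start of slide 2.
captions = [
    (1, 60, "Caption 1 "),
    (61, 138, "Caption 2 "),
    (139, 150, "Caption 3 "),
    (1, 81, "Caption 3 "),
]
timeline = to_timeline(drop_repeats(captions))
print(to_webvtt(timeline))
```

For SRT the same timeline works; the timestamps use a comma instead of a full stop before the milliseconds, and each cue is preceded by a sequence number.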
Apart from these minor hiccups, it's been working quite well. When I get a second I'll try to develop a Mac version as well, as that's my primary platform, though please feel free to post something if you can already do it.
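In the meantime, a rough cross-platform equivalent of the extraction step could look like the Python sketch below, which should run on a Mac as well as on Windows. I'm assuming the CPM.js file can be read as UTF-8; the function names and the file paths in the usage note are placeholders, not anything Captivate defines.

```python
import re
from pathlib import Path

CAPTION_RE = re.compile(r"\{sf:([0-9]+),ef:([0-9]+),t:'(.*?)'\}")

def extract_captions(js_text):
    """Return every caption object found in the text, one string per match."""
    return [m.group(0) for m in CAPTION_RE.finditer(js_text)]

def extract_file(input_path, output_path):
    """Read a CPM.js file, write one match per line, and return the match count."""
    # ASSUMPTION: the file is UTF-8; undecodable bytes are replaced rather than fatal.
    text = Path(input_path).read_text(encoding="utf-8", errors="replace")
    matches = extract_captions(text)
    Path(output_path).write_text("\n".join(matches) + "\n", encoding="utf-8")
    return len(matches)

# Quick demo on an inline string instead of a real CPM.js file.
demo = "x=[{sf:1,ef:60,t:'Caption 1 '},{sf:61,ef:138,t:'Caption 2 '}];"
found = extract_captions(demo)
print(found)
```

To run it against a real project you would call something like extract_file("Module1/assets/CPM.js", "Module1Captions.txt"), with the paths changed to match your own folders.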
Hope this is helpful for someone!
G.