co-oCCur is a high-speed subtitle synchronization tool.
It is being developed as part of GSoC 2019 with CCExtractor Development.
Mentor: @cfsmp3
It consists of two tools:
Tool A
Use case: Synchronization of subtitles between two versions (for example, with and without commercials) of the same
audiovisual content.
It will take as input the original audiovisual content, the edited audiovisual content and the
subtitles document of the original audiovisual content.
Tool B
Use case: Synchronization of subtitles between two versions of the same audiovisual content in the absence of the
original content.
It will take as input the modified audiovisual content and the subtitle document for the original
audiovisual content.
This project is in its early stages and is taking baby steps towards the end goal. The available functionality will be refactored over time.
- Clone the repository from GitHub:
git clone https://github.com/sypai/co-oCCur
- Navigate to the install directory:
cd install
- Run build.sh:
./build.sh
- Sync!
./co_oCCur -tool [tool options] <tool specific arguments>
The parameters to be passed to co-oCCur.
[NOTE: This list might change in the future]
Parameter | Value | Description |
---|---|---|
-tool OR -t | NAME A OR B | Select the tool to be used for subtitle synchronization. REQUIREMENT: YES |
-org OR -o | FILE /path/to/original/audio.wav | Original audio file. REQUIREMENT: TOOL A, YES; TOOL B, NO |
-mod OR -m | FILE /path/to/modified/audio.wav | Modified audio file. REQUIREMENT: YES |
-srt OR -s | FILE /path/to/original/subtitle.srt | Original subtitle file. REQUIREMENT: YES |
[Restriction: Audio files must be PCM mono sampled at 16000 Hz]
- CMake
CMake minimum version 3.14 is required.
Running build.sh can result in:
bash: ./build.sh: Permission denied
Possible workarounds:
- Give it execute permission (only possible if the file system grants read/write rights)
cd co-oCCur/install
chmod +x build.sh
./build.sh
- Use CMake to build it
# Root Directory
cmake ./
make
- Audio Files
Make sure the audio is uncompressed raw PCM (16-bit signed integer), mono, sampled at 16000 Hz (enough to cover the frequency range of human speech).
Using ffmpeg you can run:
ffmpeg -i inputVideo.ts -acodec pcm_s16le -ac 1 -ar 16000 audioName.wav
- Subtitle Files
The input subtitle file should be a clean and proper SubRip (SRT) file.
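The audio requirements above (mono, 16-bit PCM, 16000 Hz) can be checked before running the tool. The following is a minimal sketch using Python's standard `wave` module; it is not part of co-oCCur, and the function name `check_wav` is an assumption made for illustration.

```python
import wave


def check_wav(path):
    """Return a list of problems; an empty list means the file matches
    the documented restriction (mono, 16-bit PCM, 16000 Hz)."""
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1:
            problems.append(f"expected 1 channel, found {wav.getnchannels()}")
        if wav.getsampwidth() != 2:  # sample width in bytes; 2 bytes = 16 bits
            problems.append(f"expected 16-bit samples, found {8 * wav.getsampwidth()}-bit")
        if wav.getframerate() != 16000:
            problems.append(f"expected 16000 Hz, found {wav.getframerate()} Hz")
    return problems
```

Calling `check_wav("audioName.wav")` on a file produced by the ffmpeg command above should return an empty list; any other result lists what needs to be converted.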
- IN:
./co_oCCur
- IN:
./co_oCCur -t A -o ./install/TestFiles/WavAudio/example.wav -m ./install/TestFiles/WavAudio/example1.wav -s ./install/TestFiles/Subtitles/example.srt
What will this trigger?
- Tool A to be used for synchronization.
- Read "example.wav" as original audio and extract audio fingerprints from it.
- Enrich the "example.srt" file with audio fingerprint anchors at corresponding timestamps.
- Read "example1.wav" as the modified audio file. Seek fingerprints at the offsets given by the enriched subtitle file, i.e. the timestamps of the fingerprint anchors.
- Compare the two fingerprints and detect the constant temporal offset.
- Adjust "example.srt" using the delta obtained and create a subtitle file "example_co_oCCur.srt".
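The last two steps above (detecting a constant offset and adjusting the subtitles) can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not co-oCCur's actual implementation: fingerprints are modelled as hypothetical `(timestamp_ms, fingerprint)` pairs, the offset is picked by a simple vote over matching fingerprints, and the function names are invented for this example.

```python
import re
from collections import Counter

# Matches SRT timestamps such as 00:01:02,345
TIMESTAMP = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")


def estimate_offset_ms(original, modified):
    """Vote for the most common time difference between equal fingerprints.

    Both arguments are lists of (timestamp_ms, fingerprint) pairs
    (a hypothetical representation chosen for this sketch).
    """
    times_by_print = {}
    for t_mod, fp in modified:
        times_by_print.setdefault(fp, []).append(t_mod)
    votes = Counter()
    for t_org, fp in original:
        for t_mod in times_by_print.get(fp, []):
            votes[t_mod - t_org] += 1
    if not votes:
        raise ValueError("no matching fingerprints")
    return votes.most_common(1)[0][0]


def _shift(match, delta_ms):
    h, m, s, ms = (int(g) for g in match.groups())
    # Clamp at zero so a negative delta never yields a negative timestamp.
    total = max(0, ((h * 60 + m) * 60 + s) * 1000 + ms + delta_ms)
    h, rem = divmod(total, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def shift_srt(text, delta_ms):
    """Shift every timing line (those containing '-->') by delta_ms."""
    lines = []
    for line in text.splitlines():
        if "-->" in line:
            line = TIMESTAMP.sub(lambda m: _shift(m, delta_ms), line)
        lines.append(line)
    return "\n".join(lines)
```

For example, if the modified audio starts 1.5 seconds later than the original, `shift_srt(srt_text, 1500)` moves a cue at `00:00:01,000 --> 00:00:02,500` to `00:00:02,500 --> 00:00:04,000`.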
GNU General Public License 3.0 (GPL-v3.0)
Check LICENSE.md
You may reach the CCExtractor community through the Slack channel where most CCExtractor developers hang out.
- CCExtractor channel on Slack
We foster a welcoming and respectful community. 👐
Any contribution to the project would be highly appreciated!