# SI25/mixcloud/fun_with_subtitles_01.sh

# yt-dlp installed with pip
# nb: you may need to update yt-dlp frequently to keep up with the changes commercial services make
# to stop downloading tools like yt-dlp from working!
#
# pip install --upgrade yt-dlp
#
# Download the playlist, also writing each recording's info (metadata) to a .info.json file
yt-dlp "https://www.mixcloud.com/radiowormrotterdam/playlists/worm-25/" --write-info-json
# let's move the files to a sub-folder
mkdir worm25
mv *.info.json *.webm worm25
#
# _.........._
# | |xpub | |
# | | | |
# | | | |
# | |________| |
# | ______ |
# | | | | |
# |__|____|_|__|
#
# In the early days of computer history,
# on computers using DOS (Disk Operating System),
# file names needed to conform to a very strict standard:
# at most 8 characters, using only letters A-Z, numbers 0-9, dash - and underscore _
# (and no spaces!)
# plus a dot and an extension of up to 3 letters indicating the type of file,
# for example README.TXT
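# A rough sketch of squeezing a long, modern file name into this 8.3 shape
# with bash. This is NOT how DOS or Windows actually generated short names,
# and the function name short_name is hypothetical: it upper-cases everything,
# drops any character outside A-Z 0-9 - _, then cuts the name to 8 characters
# and the extension to 3.
short_name () {
  local name="${1%.*}" ext="${1##*.}"
  # no dot in the name: there is no extension
  [ "$name" = "$1" ] && ext=""
  name=$(printf '%s' "$name" | tr 'a-z' 'A-Z' | tr -cd 'A-Z0-9_-')
  ext=$(printf '%s' "$ext" | tr 'a-z' 'A-Z' | tr -cd 'A-Z0-9_-')
  # ${ext:+...} only adds the dot when some extension is left
  printf '%s\n' "${name:0:8}${ext:+.${ext:0:3}}"
}
# usage: short_name "WORM 25 A history of WORM.webm"   # prints WORM25AH.WEB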
#
# Copy one file into this folder and give it a short name
#
cp worm25/WORM\ 25\ A\ history\ of\ WORM\ in\ 25\ Objects\ \#1\ Mia\ on\ the\ hand-towel\ dispenser\ \(04.06.24\)\ \[radiowormrotterdam_worm-25-a-history-of-worm-in-25-objects-1-mia-on-the-hand-towel-dispenser-040624\].webm w25mia.webm
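# Typing (and escaping) such a long name is error-prone. A sketch of an
# alternative, ASSUMING only one file in worm25/ matches the pattern: let a
# glob find the file, and refuse to copy unless exactly one existing file
# matches. The function name copy_one is hypothetical.
copy_one () {
  local dest="$1"
  shift
  # after the shell has expanded the glob, exactly one argument should be left,
  # and it should be a real file (an unmatched glob stays as literal text)
  if [ "$#" -ne 1 ] || [ ! -e "$1" ]; then
    echo "copy_one: expected exactly one matching file, got: $*" >&2
    return 1
  fi
  cp -- "$1" "$dest"
}
# usage: copy_one w25mia.webm worm25/*Mia*hand-towel*.webm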
# __
# .,-;-;-,. /'_\
# _/_/_/_|_\_\) /
# '-<_><_><_><_>=/\
# `/_/====/_/-'\_\
# "" "" ""
# NOTE:
# it may seem pointless to move files around and rename them with short names
# BUT...
# coding/working with the commandline requires A LOT OF FOCUS,
# so steps that reduce "cognitive load" (like having to sift through long lists of confusing filenames,
# which make working on the commandline slower and *harder to read*)
# are really worthwhile!
#
# Also when working with digital materials, it's often tempting to try to
# address an entire collection (in this case all the recordings, and the whole hour of each recording)
# BUT ...
# it's really important when testing things out that you focus on a small sample
# in this way you make experimentation, including the *necessary errors
# and missteps*, as fluid as possible to stay in the flow,
# so that you can get through the bugs
# to the interesting results that will give you the energy
# and confidence to keep going!
#
# so in this case....
#
# get something working for a 60 second sample
# THEN once you know it works...
# apply it to the whole hour long recording
# and eventually all the recordings of the playlist
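# A sketch of how the steps below could later be run over every recording,
# ASSUMING the downloads are the *.webm files in worm25/ (the function name
# process_all_dryrun is hypothetical). It only PRINTS the commands (a "dry run"),
# so you can check them before running anything for real; bash's printf %q
# quotes names containing spaces the way you would type them.
process_all_dryrun () {
  local f base
  for f in "$1"/*.webm; do
    # if nothing matches, the glob stays as literal text: skip it
    [ -e "$f" ] || continue
    # strip the .webm extension to name the outputs
    base="${f%.webm}"
    printf '%q ' ffmpeg -i "$f" "$base.wav"; echo
    printf '%q ' vosk-transcriber -l en-us -i "$base.wav" -t srt -o "$base.srt"; echo
  done
}
# usage: process_all_dryrun worm25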
# Working with a long file can also take a lot of time when experimenting, so
# use ffmpeg to make a 60 second extract
# -ss is the start time and -t the duration of the extract (both in seconds here)
# at the same time, convert the webm to wav (as input for vosk)
ffmpeg -i w25mia.webm -ss 120 -t 60 w25mia60.wav
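# ffmpeg also accepts times written as [HH:]MM:SS[.m...], so the command
# above could equally use -ss 00:02:00 -t 00:01:00. A small helper
# (hypothetical name: to_hms) turns whole seconds into HH:MM:SS, which also
# makes positions easier to compare with the subtitle timecodes further down.
to_hms () {
  printf '%02d:%02d:%02d\n' $(( $1 / 3600 )) $(( $1 % 3600 / 60 )) $(( $1 % 60 ))
}
# usage: ffmpeg -i w25mia.webm -ss "$(to_hms 120)" -t "$(to_hms 60)" w25mia60.wav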
# use vosk to transcribe the wav to SRT subtitles
# nb: w25mia60.wav is an INPUT and needs to already exist
# w25mia60.srt is an OUTPUT and will be (re)created
vosk-transcriber -l en-us -i w25mia60.wav -t srt -o w25mia60.srt
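# A quick sanity check (a sketch; the function name count_cues is hypothetical):
# every SRT cue has one timing line containing "-->", so counting those lines
# counts the subtitles vosk produced. ("--" tells grep that "-->" is the
# pattern, not an option.)
count_cues () {
  grep -c -- '-->' "$1"
}
# usage: count_cues w25mia60.srt ; head w25mia60.srt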
# SRT (SubRip Subtitle) comes from
# practices of PIRATE CURATION
# where films would be ripped from DVDs
# and distributed as video files
# SRT is then a simple (and small) text format for
# the missing subtitles, that can then
# be translated into many languages
# and distributed separately from the (heavier) video files
# Also, because it is so easy to edit and to understand,
# the format became so popular that it was used as the
# basis of the W3C's web standard WebVTT (Web Video Text Tracks)
#
# see also: https://www.opensubtitles.com/
#
# The two formats are nearly the same, except that
# the VTT file needs to have a "header" (the first line should be:)
# WEBVTT
# and the timecodes use a dot instead of a comma.
# SO
# 00:00:00,075 --> 00:00:02,610
# in SRT becomes in VTT:
# 00:00:00.075 --> 00:00:02.610
#
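# The same two changes can be sketched in plain bash + sed, ASSUMING a
# well-formed SRT file. This is NOT scripts/srt2vtt.py, only an illustration,
# and the function name srt2vtt_sed is hypothetical: it prints the WEBVTT header
# plus an empty line, then copies the SRT, turning the comma in every
# HH:MM:SS,mmm timestamp into a dot (commas in the subtitle text stay as they are).
srt2vtt_sed () {
  printf 'WEBVTT\n\n'
  sed -E 's/([0-9]{2}:[0-9]{2}:[0-9]{2}),([0-9]{3})/\1.\2/g' "$1"
}
# usage: srt2vtt_sed w25mia60.srt > w25mia60_sed.vtt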
# let's use a python script to convert from srt to vtt
scripts/srt2vtt.py w25mia60.srt w25mia60.vtt
# make an mp3 for the browser
ffmpeg -i w25mia60.wav w25mia60.mp3
# adjust vtt.html to point to the mp3 + vtt
# LOOK AT vtt.html
# NB: to see captions, we need to use <video> even though we just have <audio>
#
# This seems to work in Firefox but not Chrome/Chromium.
# NB: the video is made 100% wide and a fixed (small) height to make the caption size nice
# BUT...
# It would be even better to style the captions ourselves and bring them into the page
# like other HTML content!
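# A minimal sketch of such a page, written from the shell with a "here document".
# The function write_vtt_page and the file name vtt_sketch.html are hypothetical;
# this is NOT the vtt.html of this folder. The <track> element attaches the VTT
# file to the player, and the "default" attribute switches those captions on.
write_vtt_page () {
  cat > "$1" <<'EOF'
<!DOCTYPE html>
<html>
<head><meta charset="utf-8"><title>vtt sketch</title></head>
<body>
<video controls src="w25mia60.mp3" style="width: 100%; height: 120px;">
  <track kind="captions" src="w25mia60.vtt" srclang="en" label="English" default>
</video>
</body>
</html>
EOF
}
# usage: write_vtt_page vtt_sketch.html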
#
# VTT tracks also fire a *cuechange* event (oncuechange), which means we can program CUSTOM title behaviors!
#
# LOOK AT vtt_basic.html
# and vtt_custom.html
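# A sketch of the custom approach. The function write_cue_page and the file name
# vtt_cuechange_sketch.html are hypothetical; this is NOT the vtt_custom.html of
# this folder. The track is set to "hidden", so the browser keeps the cues active
# and fires the TextTrack "cuechange" event without drawing the captions itself;
# a small script then copies the active cue's text into an ordinary <p>, where
# it can be styled like any other HTML content.
write_cue_page () {
  cat > "$1" <<'EOF'
<!DOCTYPE html>
<html>
<head><meta charset="utf-8"><title>cuechange sketch</title></head>
<body>
<video controls src="w25mia60.mp3" style="width: 100%; height: 60px;">
  <track kind="captions" src="w25mia60.vtt" srclang="en" default>
</video>
<p id="caption"></p>
<script>
  // the TextTrack object behind the <track> element
  const track = document.querySelector("track").track;
  // "hidden": cues stay active and fire events, but are not drawn by the browser
  track.mode = "hidden";
  // cuechange fires whenever the set of active cues changes
  track.addEventListener("cuechange", () => {
    const cues = track.activeCues;
    document.getElementById("caption").textContent =
      cues && cues.length ? cues[0].text : "";
  });
</script>
</body>
</html>
EOF
}
# usage: write_cue_page vtt_cuechange_sketch.html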