What is Blender
People who know me know I love Blender. Not everyone knows what Blender is, though. It was first conceived as a 3D modelling and animation tool, but you can do a lot more with it, such as video editing (including sound).



Blender runs on all major operating systems: Linux, macOS and Windows. You can download it from the official Blender website.
Annotating
Blender is a very powerful tool, but it can be overwhelming at first sight. So if you have video or sound to annotate, I will show you in a few easy steps how to annotate it and export the annotations.
In case you didn’t know, video and sound each have their own rate: video has a frame rate, such as 24 frames per second, and sound has a sample rate, such as 24000 Hz. Keep this in mind when you edit or annotate video.
First, open Blender and choose the video editing workspace, either from File > New > Video Editing or from the splash screen.
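To make the two rates concrete, here is a small sketch in plain Python (it does not need Blender). The function names are my own, chosen only for illustration, and the numbers are the example values from above.

```python
def samples_per_frame(sample_rate_hz, fps):
    """How many audio samples play during one video frame."""
    return sample_rate_hz / fps


def frame_at_time(seconds, fps):
    """Index of the frame shown at a given time, counting from frame 0."""
    return int(seconds * fps)


# With 24000 Hz sound and 24 fps video, each frame spans 1000 samples
print(samples_per_frame(24000, 24))

# At 2.5 seconds into a 24 fps video, 60 frames have gone by
print(frame_at_time(2.5, 24))
```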


Once you see the main video editing workspace, simply drag your video or sound file into Blender, or open it from the file browser on the side. Video appears as a blue strip and sound as a green one.

You can add annotations as text effect strips and edit their text in the right-hand menu.
If for some reason the menu is not there, hover over the sequencer and press "N".

If your sequencer view is too narrow, zoom in with the mouse wheel. You can drag the darker ends of the annotation strip to make it as long as you need, or move the whole strip around.
Remember to save often with Ctrl+S or File > Save.
Export annotations
Blender has a built-in text editor and a Python console. It is scripted in Python through an API exposed as the bpy module. You can open a console in any sub-window within Blender. Let me show you how.
Blender can have many sub-windows, which let you customize your workflow, as in the first images I showed you. You can create any number of them by dragging the top-right corner of any sub-window.

Open a text editor window, create a new text and add this code:
    import bpy

    # Collect (start frame, text) for every text strip in the sequencer
    annotations = []
    for strip in bpy.data.scenes["Scene"].sequence_editor.sequences_all:
        if isinstance(strip, bpy.types.TextSequence):
            print(strip.text)
            annotations.append((strip.frame_start, strip.text))

    # Write the annotations sorted by start frame, one "frame:text" per line
    with open('filetosave.txt', 'w') as f:
        for frame_start, text in sorted(annotations):
            print((frame_start, text))
            f.write(f'{frame_start}:{text}\n')
Run the script (Text > Run Script) and it exports all your annotations to a txt file, together with the frame at which each one starts.
But a frame number is not a time, so you can modify the code to print the time instead by dividing the frame number by the fps of your video. If your fps is 24, then 240 frames are 10 seconds.
You can see your fps in the Output properties, under Frame Rate:
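As a rough sketch of that conversion, the helpers below turn a frame count into seconds and then into a readable timestamp. The function names are hypothetical, picked for this example. Inside Blender you could read the two rate values from bpy.context.scene.render.fps and bpy.context.scene.render.fps_base. I am assuming here that the frame count is measured from the start of the video, so if your scene starts at frame 1 you may want to subtract the start frame first.

```python
def frame_to_seconds(frame, fps, fps_base=1.0):
    """Convert a frame count to seconds.

    Blender's effective frame rate is fps / fps_base, so the
    time is frame / (fps / fps_base).
    """
    return frame * fps_base / fps


def format_timestamp(seconds):
    """Format a duration in seconds as HH:MM:SS.mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


# 240 frames at 24 fps are 10 seconds
print(format_timestamp(frame_to_seconds(240, 24)))  # 00:00:10.000
```

In the export script you could then write format_timestamp(frame_to_seconds(frame_start, fps)) in place of the raw frame number.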

Blender API
The code shows the kind of things you can get from your text strips. See more here: https://docs.blender.org/api/current/bpy.types.Sequence.html#bpy.types.Sequence
In this API, everything in the sequencer, whether video, sound, image or effect, is a sequence.
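Since every strip is a sequence, one loop can handle all of them. The sketch below counts strips by their type attribute, which holds values such as 'MOVIE', 'SOUND', 'IMAGE' or 'TEXT'. The helper name count_strip_types is my own, and the bpy import is guarded so the helper can also be tried outside Blender. It uses the same sequences_all collection as the export script, which I assume is available in your Blender version.

```python
from collections import Counter


def count_strip_types(strips):
    """Count how many strips there are of each type."""
    return Counter(strip.type for strip in strips)


try:
    import bpy  # only available when running inside Blender
except ImportError:
    bpy = None

if bpy is not None:
    editor = bpy.context.scene.sequence_editor
    # sequence_editor is None until the scene has a sequencer
    if editor is not None:
        for strip_type, count in count_strip_types(editor.sequences_all).items():
            print(strip_type, count)
```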
Annotating with images
You can add anything you want to the sequencer: images, sound and more video. Just drag it in, move it around and stretch it to cover the span you like, and voilà. Use the menus on the right to change properties of the image, such as its size, shape, rotation and location within the video.