Automatic generation of audio description and L2V

Sub-project 4 (SP4) of the IICT Flagship concerns the automatic generation of audio descriptions and voiced lower thirds ("synths").

Audio description makes a television program more accessible to people with visual impairments: a voice-over describes in detail the scenes that contain no speech, so that the audience can better follow what is happening on screen. The synth (or lower-third voice) is an audio commentary commonly used to convey information that supplements the narrative, for example by reading out text shown on screen. Producing both by hand is very time-consuming.

The Icare research institute is implementing several approaches, based on specialized algorithms, artificial intelligence and machine learning, to automate these tasks. This means automatically extracting visual information, generating a textual description and storing the result in a structured file. That file can then be vocalized with speech synthesis so that the viewer can listen to it.
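
As a rough illustration of the "structured file" step, the sketch below turns detected visual elements into timed text cues and serializes them as JSON. It is not the project's implementation: the cue fields (`start_s`, `end_s`, `kind`, `text`), the helpers `describe_scene` and `build_cues`, and the choice of JSON are all assumptions made for this example, and the description step is a trivial placeholder standing in for a real model.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class DescriptionCue:
    """One timed piece of text to be vocalized (hypothetical schema)."""
    start_s: float  # cue start, in seconds from the beginning of the program
    end_s: float    # cue end, in seconds
    kind: str       # "audio_description" or "lower_third" (assumed labels)
    text: str       # text handed to the speech synthesizer


def describe_scene(labels):
    """Placeholder for the text-generation step: joins labels into a sentence."""
    if not labels:
        return ""
    return "On screen: " + ", ".join(labels) + "."


def build_cues(detections):
    """Turn raw detections into cues, skipping segments with nothing to say."""
    cues = []
    for d in detections:
        text = describe_scene(d["labels"])
        if text:
            cues.append(DescriptionCue(d["start_s"], d["end_s"], d["kind"], text))
    return cues


def to_json(cues):
    """Serialize the cues as a JSON array, one object per cue."""
    return json.dumps([asdict(c) for c in cues], indent=2, ensure_ascii=False)


if __name__ == "__main__":
    # Hand-written stand-in for the output of a visual-extraction step.
    detections = [
        {"start_s": 12.0, "end_s": 15.5, "kind": "audio_description",
         "labels": ["a woman", "a bicycle"]},
        {"start_s": 20.0, "end_s": 23.0, "kind": "lower_third", "labels": []},
    ]
    print(to_json(build_cues(detections)))
```

Keeping the timing and the text together in one record means a later speech-synthesis stage only needs this file, not the video analysis results.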


Flagship Innosuisse PFFS-21-47: Inclusive Information and Communication Technologies