Automatic Generation of Audio Description and Lower Third Voice

Sub-project 4 (SP4) of the IICT Flagship focuses on the automatic generation of audio descriptions and lower third voices.

Audio description is a technique used to make television programs more accessible to people with visual impairments. A voice-over describes non-spoken scenes in detail so that the audience can follow what is happening on screen. The lower third voice (or ‘synthé’) is an audio commentary that conveys information complementary to the narrative, for example by reading on-screen text aloud. Producing either one manually is very time-consuming.

The Icare research institute is involved in implementing various approaches, based on specialized algorithms, artificial intelligence, and machine learning, to automate these tasks. Automation here means extracting visual information automatically, generating a textual description from it, and storing the result in a structured file. This file can then be read aloud by a speech synthesizer so that the viewer can access the information.
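As an illustration only, the structured file mentioned above could look like a list of timed text entries. The sketch below is not the project's actual format: the `Cue` class, its field names, the `to_structured_file` helper, and the choice of JSON are all assumptions made for this example.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Cue:
    """One timed text entry (hypothetical schema, not the SP4 format)."""
    start_s: float  # start time in seconds
    end_s: float    # end time in seconds
    kind: str       # "audio_description" or "lower_third"
    text: str       # text to be read by a speech synthesizer


def to_structured_file(cues):
    """Serialize cues to a JSON string, ordered by start time."""
    ordered = sorted(cues, key=lambda c: c.start_s)
    payload = {"cues": [asdict(c) for c in ordered]}
    return json.dumps(payload, ensure_ascii=False, indent=2)


# Example entries, invented for illustration.
cues = [
    Cue(12.0, 15.5, "lower_third", "Jane Doe, reporter"),
    Cue(3.0, 7.0, "audio_description", "A train crosses a snowy valley."),
]
document = to_structured_file(cues)
print(document)
```

Keeping start and end times with each entry would let a later speech synthesis step place every utterance in a gap in the original soundtrack, which is one plausible reason to store the text in a structured file rather than as free text.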

Innosuisse Flagship PFFS-21-47: Inclusive Information and Communication Technologies

Partners