YouTube’s AI can now describe sound effects


YouTube understands the power of video to tell stories, move people, and leave a lasting impression. Sound is one part of storytelling that many people take for granted, yet it adds color to the world around us.

YouTube has offered automatic captions for its videos since 2009. Now it is expanding the feature to caption three sound effects: applause, music, and laughter.

Those three are among the sounds that people most often caption manually, but YouTube says it is only in the early stages of improving captions for its deaf and hard-of-hearing users. According to the company, sounds like ringing, barking, and knocking are next in line, although they are harder to identify than simple laughter or music.

For now, automatic sound-effect captioning is restricted to exactly these three sounds. The reason, Google says, is that these are also the sounds that video producers most often caption manually today.

YouTube acknowledges that these captions are fairly simplistic. However, the company says the infrastructure behind the system will let it expand the framework and easily apply it to other sound classes. Future work might add other common sounds such as ringing, barking, and knocking, each of which presents its own challenges.
