Jay-Z Reciting Shakespeare? Audio Deepfakes are here

The 21st century has already brought massive advances in technology of all kinds, with numerous benefits to society. Should we count deepfakes as one of those advances? It’s now possible to take snippets of an artist’s voice, feed them into a speech-synthesis program, and come out with your own song that the artist has no control over whatsoever. How so?

One of the leading text-to-speech systems is Tacotron 2, developed by Google, which is often trained on public speech datasets such as LJ Speech. Systems like this learn from recordings of a person’s voice and create a synthetic ‘voice’ based on that audio. Once the voice is created, the user can type any sentence into the program and it will speak that sentence in the synthetic voice. Add some background beats and additional words, and it’s easy to see how you could make your own song! But what happens when an amateur uses audio clips from a mainstream artist to build the synthetic voice?

As it happens, Jay-Z has already run into this exact issue on YouTube. In 2020, a user uploaded an audio deepfake of Jay-Z reciting the “To be, or not to be” monologue from Shakespeare’s Hamlet. After Roc Nation, Jay-Z’s entertainment company, filed a DMCA takedown claim, YouTube initially removed the video, but the channel fought back. Its creator argued that Jay-Z didn’t write or perform the monologue; the synthetic voice did, and therefore Jay-Z had no claim to the rights. On that view, even though listeners seem to hear Jay-Z’s voice, the audio is not owned by the rapper in any way.

Some US states have passed laws against deceptive deepfakes, such as those designed to mislead voters before an election, but when it comes to music, there is a large gray area that remains undecided. Are these types of creations legal or not?

Another uncertain aspect of this technology is its effect on advertising and sponsorship. If an advertiser can recreate the sound of a famous musician without paying the big bucks for a real sponsorship, what’s to stop them? For example, if someone recreated Travis Scott’s vocal sound and used it in a commercial endorsing a product, would fans believe the rapper supports the product and be persuaded to buy it? In a typical sponsorship or endorsement deal, you’d expect Scott to get a cut of the sales, but since the text-to-speech system created the sound, he wouldn’t.

What are your thoughts? Do the artists whose voices deepfake creators imitate have rights in the work produced by text-to-speech programs? Or is this an instance of freedom of speech, where the original artists have no say? At least one thing is for sure: technology always outpaces regulation.
