On Wednesday, creative producer Charlie Holtz combined GPT-4 Vision (commonly called GPT-4V) and ElevenLabs voice cloning technology to create an unofficial AI version of famed naturalist David Attenborough narrating Holtz's every move on camera. As of Friday afternoon, the X post describing the stunt had received more than 21,000 likes.
“Here we have a remarkable specimen of Homo sapiens, distinguished by his silver circular spectacles and a mane of tousled curly locks,” the fake Attenborough said in the demo as Holtz looked on with a grin. “He wears what appears to be a blue fabric covering, which can only be assumed to be part of his mating display.”
“Look closely at the subtle arch of his eyebrow,” it continued, as if narrating a BBC wildlife documentary. “It is as if he is in the midst of an intricate ritual of curiosity or skepticism. The backdrop suggests a sheltered habitat, possibly a communal feeding area or a watering hole.”
How does it work? Every five seconds, a Python script takes a photo from Holtz's webcam and feeds it to GPT-4V—the version of OpenAI's language model that can process image inputs—via an API, along with a special prompt telling it to generate creative narration text in the style of Attenborough. That text is then fed into an ElevenLabs AI voice profile trained on audio samples of Attenborough's speech. Holtz posted the code (called "narrator") that pulls it all together on GitHub, and it requires API tokens for OpenAI and ElevenLabs, which cost money to run.
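The loop described above can be sketched in Python roughly as follows. To be clear, this is a hypothetical reconstruction, not Holtz's actual "narrator" code: the function names, prompt wording, and voice ID are assumptions, and actually running the capture loop requires paid OpenAI and ElevenLabs API keys plus a webcam.

```python
import base64
import os
import time


def frame_to_data_url(jpeg_bytes: bytes) -> str:
    """Encode a captured JPEG frame as the base64 data URL the GPT-4V API accepts."""
    b64 = base64.b64encode(jpeg_bytes).decode("ascii")
    return f"data:image/jpeg;base64,{b64}"


def narrate(data_url: str) -> str:
    """Ask GPT-4V for Attenborough-style narration of one frame (prompt text is a guess)."""
    from openai import OpenAI  # pip install openai; reads OPENAI_API_KEY from the env
    client = OpenAI()
    resp = client.chat.completions.create(
        model="gpt-4-vision-preview",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Narrate this image as if you were Sir David Attenborough "
                         "describing wildlife. Keep it brief and dramatic."},
                {"type": "image_url", "image_url": {"url": data_url}},
            ],
        }],
        max_tokens=120,
    )
    return resp.choices[0].message.content


def speak(text: str, voice_id: str) -> bytes:
    """Turn narration text into MP3 audio via ElevenLabs' text-to-speech endpoint."""
    import requests  # pip install requests
    r = requests.post(
        f"https://api.elevenlabs.io/v1/text-to-speech/{voice_id}",
        headers={"xi-api-key": os.environ["ELEVENLABS_API_KEY"]},
        json={"text": text},
        timeout=60,
    )
    r.raise_for_status()
    return r.content


def run(voice_id: str, interval: float = 5.0) -> None:
    """Capture a webcam frame every `interval` seconds and narrate it aloud."""
    import cv2  # pip install opencv-python
    cam = cv2.VideoCapture(0)
    try:
        while True:
            got_frame, frame = cam.read()
            if got_frame:
                encoded, jpeg = cv2.imencode(".jpg", frame)
                if encoded:
                    text = narrate(frame_to_data_url(jpeg.tobytes()))
                    audio_mp3 = speak(text, voice_id)
                    # Play audio_mp3 with your audio player of choice here.
            time.sleep(interval)
    finally:
        cam.release()
```

A polling loop like `run()` is the simplest way to chain the two APIs: each webcam frame becomes a base64 data URL, GPT-4V turns it into text, and ElevenLabs turns the text into audio, with the five-second interval keeping API costs bounded.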
While some of these AI capabilities have been available separately for some time, developers have only recently begun experimenting with chaining them together, thanks to API availability, which can produce striking demos like this one.
Later in the demo video, when Holtz holds up a cup with a drink, the fake Attenborough says, “Ah, in its natural environment, we observe the sophisticated Homo sapiens engaging in the critical ritual of hydration. This male has selected a small round container, likely filled with H2O, and tilts it skillfully toward his intake orifice. Such grace, such poise.”
In a different demo posted on X by Pietro Schirano, you can hear a cloned voice of Steve Jobs critiquing designs created in Figma, a design app. Schirano uses a similar technique, with an image fed to GPT-4V via the API (prompted to respond in the style of Jobs), and the resulting text fed into an ElevenLabs voice clone of Jobs.
We’ve covered voice cloning technology before; it is fraught with ethical and legal concerns because the software creates convincing imitations of a human voice, making it “say” things the real person never said. This has legal implications regarding celebrity publicity rights, and the technology has already been used to scam fans with fake celebrity content. ElevenLabs’ terms of service prohibit making voice clones of other people in a way that would violate “Intellectual Property Rights, Publicity Rights, and Copyright,” but such rules can be difficult to enforce.
While some people expressed dismay at someone imitating Attenborough’s voice without permission, many others seemed amused by the demo. “Okay, I’ll get David Attenborough to narrate videos of my son learning how to eat broccoli,” quipped Jeremy Nguyen in an X reply.