A research paper has shown that AI image generators can reproduce near-identical copies of copyrighted photos from their training data.
As debate swirls around image synthesis models and lawsuits are brought against Stable Diffusion, a team including researchers from Google has shown that diffusion models can memorize specific images and then recreate them.
Supporters of artificial intelligence (AI) argue that these models work much like human artists: just as a photographer is inspired by the work of others, machine learning models use their training data sets as inspiration to generate new pictures.
But in their paper, "Extracting Training Data from Diffusion Models," the researchers recreated faithful versions of original, copyrighted photos using both Stable Diffusion and Google’s Imagen.
“We study if diffusion models ‘memorize’ training examples, which we define as generating a near-identical copy of any image. We propose to extract memorized images by generating many times with the same prompt and flagging cases where many of the generations are the same,” writes Eric Wallace, a Ph.D. student at Berkeley AI Research who worked on the paper.
“Applying our method to Stable Diffusion and Google’s Imagen, we extract hundreds of images, and do so with high precision. Many of these images are copyright or licensed, and some are photos of individuals.”
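The extraction idea Wallace describes (generate many images from one prompt, then flag prompts whose outputs are nearly the same) can be illustrated with a short Python sketch. It is a simplified illustration, not the paper's code: it assumes each generated image has already been flattened into a numeric vector, uses plain Euclidean distance as the similarity measure, and flags a prompt when any single output has enough near-duplicates among the others, which is a simple neighbor count rather than a full clique search. The function names, the `distance_threshold` and `min_cluster_size` parameters, and the toy vectors are all hypothetical.

```python
import math
from typing import List, Sequence

# Hypothetical representation: each generated image is flattened into a
# list of pixel (or feature) values of equal length.
ImageVector = Sequence[float]


def l2_distance(a: ImageVector, b: ImageVector) -> float:
    """Euclidean distance between two equally sized image vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def flag_possible_memorization(
    generations: Sequence[ImageVector],
    distance_threshold: float,
    min_cluster_size: int,
) -> bool:
    """Hypothetical check: flag a prompt if some generation, together with its
    near-duplicates (closer than `distance_threshold`), forms a group of at
    least `min_cluster_size` images. Simplified neighbor count, not a clique search."""
    for i, candidate in enumerate(generations):
        near_duplicates = sum(
            1
            for j, other in enumerate(generations)
            if i != j and l2_distance(candidate, other) < distance_threshold
        )
        # Count the candidate itself plus its near-duplicates.
        if near_duplicates + 1 >= min_cluster_size:
            return True
    return False


if __name__ == "__main__":
    # Toy data (hypothetical): five 3-value "images" generated from one prompt.
    same_prompt_outputs: List[List[float]] = [
        [0.10, 0.20, 0.30],
        [0.11, 0.20, 0.29],
        [0.10, 0.21, 0.30],
        [0.90, 0.10, 0.50],
        [0.10, 0.19, 0.31],
    ]
    print(flag_possible_memorization(same_prompt_outputs, 0.05, 4))  # True
```

Here four of the five toy outputs sit almost on top of each other, so the prompt is flagged; a prompt whose outputs are all different would not be. In practice the threshold and group size would need tuning, and raw pixel distance is a crude similarity measure, so the sketch only shows the shape of the "many generations look the same" test.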
In a Twitter thread announcing the paper on January 31, 2023, Wallace wrote: “Models such as Stable Diffusion are trained on copyrighted, trademarked, private, and sensitive images. Yet, our new paper shows that diffusion models memorize images from their training data and emit them at generation time.” (Paper: https://t.co/LQuTtAskJ9)
The team recreated a portrait of Ann Graham Lotz using Stable Diffusion. A side-by-side comparison reveals flaws such as distortion and noise in the AI-generated image, but there is little doubt that it is the same picture.
However, two of the researchers told Gizmodo that the team tried 300,000 text prompts and found that the AI image generators reproduced a near-identical copy of a training image only 0.03% of the time. The rate was even lower for Stable Diffusion, which, unlike Google’s Imagen, is available to the public.
“The caveat here is that the model is supposed to generalize, it’s supposed to generate novel images rather than spitting out a memorized version,” Vikash Sehwag, a Ph.D. candidate at Princeton University, told Gizmodo.
Despite the low rate of image recreation, the fact that it happens at all is alarming. Some AI image generators reserve rights to the images they produce, effectively claiming copyright over them. That could be problematic if the AI generates a near-identical copy of an image taken by a photographer.