Generative AI’s astonishing ability to create images keeps improving and becoming more accessible. But because these models are trained on massive libraries of existing art, artists are frantically looking for ways to keep their work from being harvested without their permission. A new tool, ominously named Nightshade, could be the answer.
The trick involves optimized, prompt-specific “data poisoning attacks” that corrupt an image generator’s training data when the poisoned images are fed into it.
“Poisoning has been a known attack vector in machine learning models for years,” Professor Ben Zhao told Decrypt. “Nightshade is not interesting because it does poisoning, but because it poisons generative AI models, which nobody thought was possible because these models are so big.”
Combatting intellectual property theft and AI deepfakes has become crucial since generative AI models came into the mainstream this year. In July, a team of researchers at MIT similarly suggested injecting small bits of code into an image that would cause it to distort, rendering it unusable.
Generative AI refers to AI models that use prompts to generate text, images, music, or videos. Google, Amazon, Microsoft, and Meta have all invested heavily in bringing generative AI tools to consumers.
As Zhao explained, Nightshade gets around the problem of an AI model’s large datasets by targeting the prompt—for example, requests to create an image of a dragon, dog, or horse.
“Attacking the whole model makes no sense,” Zhao said. “What you do want to attack is individual prompts, debilitating the model and disabling it from generating art.”
To avoid detection, the research team explained, the text and image within the poisoned data must be crafted to appear natural and to deceive both automated alignment detectors and human inspectors.
Although the poisonous Nightshade dataset is merely a proof of concept, Zhao said the easiest way to deceive an AI model like Stable Diffusion into thinking a cat is a dog is by simply mislabeling a few hundred images of a cat as a dog.
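The mislabeling idea Zhao describes can be illustrated with a toy sketch. This is not Nightshade and not how Stable Diffusion is trained: the "model" here just remembers the average 2-D feature it saw for each caption, and the names `make_images`, `train`, and `closest_concept`, the feature centers, and the sample counts are all hypothetical choices made for illustration.

```python
import random
from statistics import fmean

def make_images(center, n, rng):
    """Return n hypothetical 2-D 'image features' scattered around center."""
    return [(rng.gauss(center[0], 0.1), rng.gauss(center[1], 0.1)) for _ in range(n)]

def train(pairs):
    """Toy 'generator': for each caption, store the average feature seen."""
    by_caption = {}
    for caption, feat in pairs:
        by_caption.setdefault(caption, []).append(feat)
    return {
        caption: (fmean(f[0] for f in feats), fmean(f[1] for f in feats))
        for caption, feats in by_caption.items()
    }

def closest_concept(feat, centers):
    """Name the true concept whose center is nearest to a generated feature."""
    return min(
        centers,
        key=lambda name: (feat[0] - centers[name][0]) ** 2
        + (feat[1] - centers[name][1]) ** 2,
    )

rng = random.Random(0)  # fixed seed so the run is repeatable
TRUE_CENTERS = {"cat": (0.0, 0.0), "dog": (1.0, 1.0)}  # assumed feature centers

# Clean training set: every image carries its correct caption.
clean = [("cat", f) for f in make_images(TRUE_CENTERS["cat"], 200, rng)]
clean += [("dog", f) for f in make_images(TRUE_CENTERS["dog"], 200, rng)]

# Poison: a few hundred cat images deliberately captioned "dog".
poison = [("dog", f) for f in make_images(TRUE_CENTERS["cat"], 300, rng)]

clean_model = train(clean)
poisoned_model = train(clean + poison)

# What does each model produce for the prompt "dog"?
clean_output = closest_concept(clean_model["dog"], TRUE_CENTERS)
poisoned_output = closest_concept(poisoned_model["dog"], TRUE_CENTERS)
print(clean_output, poisoned_output)
```

In this toy setup the clean model answers the prompt "dog" with something dog-like, while the poisoned one drifts toward cat. The ratio of poisoned to clean samples is exaggerated to make the effect obvious and says nothing about how many images a real model would need.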
Even without any coordination, artists could begin deploying these poison pills en masse, which could cause the AI model to collapse.
“Once enough attacks become active on the same model, the model becomes worthless,” Zhao said. “By worthless, I mean, you give it things like ‘give me a painting,’ and it comes out with what looks like a kaleidoscope of pixels. The model is effectively dumbed down to the version of something akin to a random pixel generator.”
Zhao said Nightshade does not require any action against the AI image generator itself; it takes effect only when the AI model attempts to consume data in which Nightshade has been included.
“It does nothing to them unless they take those images and put them into the training data,” he said, calling it less of an attack and more like self-defense or a barbed wire fence with poison tips aimed at AI developers who do not respect opt-out requests and do-not-scrape directives.
“This is designed to solve that problem,” Zhao said. “So we had this barbed wire sting with some poison. Unless you run around and get this stuff all over, you won’t suffer.”
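For context on the do-not-scrape directives Zhao mentions, here is a minimal sketch of how a well-behaved crawler could check one before downloading an image. It assumes the directive is published as a robots.txt file, which the article does not specify; the bot name `ExampleImageBot` and the `artist.example` URL are made up for illustration. The parser is Python's standard `urllib.robotparser`.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt an artist's site might publish.
ROBOTS_TXT = """\
User-agent: ExampleImageBot
Disallow: /gallery/

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse in-memory rules, no network call

image_url = "https://artist.example/gallery/dragon.png"

# A crawler that respects the directive checks before fetching.
scraper_allowed = parser.can_fetch("ExampleImageBot", image_url)
other_allowed = parser.can_fetch("SomeOtherBot", image_url)
print(scraper_allowed, other_allowed)
```

The check is purely voluntary: nothing stops a crawler from skipping it, which is the gap Zhao says Nightshade is meant to address.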
Edited by Ryan Ozawa.
Source: https://decrypt.co/203153/ai-prompt-data-poisoning-nightshared