Takes one to know one.
In an astonishing feat of hypocrisy that outshines Midjourney accusing Stable Diffusion of image theft last year, ChatGPT and DALL-E developer OpenAI has claimed that DeepSeek – a new kid in AI town whose rise recently caused NVIDIA to suffer the biggest single-day loss in US stock market history, with shares plummeting 17% (~$600 billion in market value) – stole its data to train its own AI model.
As reported by the Financial Times, OpenAI claims to have found evidence of "distillation" – a machine learning technique that transfers knowledge from a large AI model to a smaller one – allegedly perpetrated by the Chinese company.
According to the report, OpenAI and Microsoft last year investigated accounts believed to belong to DeepSeek that were using OpenAI's API. These accounts were subsequently blocked on suspicion of engaging in distillation, which violates OpenAI's terms of service prohibiting users from using the output of its services to "develop models that compete with OpenAI." When contacted by the FT, the ChatGPT developer declined to comment further or to provide any evidence linking DeepSeek to the alleged activity.
"There's a technique in AI called distillation, when one model learns from another model [and] kind of sucks the knowledge out of the parent model," White House advisor David Sacks said earlier about the purported distillation. "And there's substantial evidence that what DeepSeek did here is they distilled the knowledge out of OpenAI models, and I don’t think OpenAI is very happy about this."
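For readers unfamiliar with the term, the idea Sacks describes can be illustrated with a minimal, self-contained sketch: a "student" model is trained to match the softened output distribution of a larger "teacher" model, rather than learning from raw labeled data. All numbers and model outputs below are hypothetical, purely for illustration – this says nothing about what DeepSeek actually did.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a higher temperature yields a
    softer probability distribution that reveals more of the model's
    'dark knowledge' about non-top answers."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Cross-entropy of the student's softened distribution against
    the teacher's -- the quantity a distilled student minimizes."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return -sum(p * math.log(q) for p, q in zip(p_teacher, p_student))

# Hypothetical logits for one query: the student is nudged toward the
# teacher's full distribution, not just its single top answer.
teacher = [4.0, 1.0, 0.5]
student = [2.0, 1.5, 1.0]
print(distillation_loss(teacher, student) > 0)  # loss shrinks as the student matches the teacher
```

The loss is minimized exactly when the student reproduces the teacher's distribution – which is why training one model on another model's API outputs is, in effect, "sucking the knowledge out of the parent model."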
Needless to say, the accusation is filthy rich coming from OpenAI, given the sheer number of lawsuits filed against the company in recent years – many of them over, you guessed it, copyright infringement. From The New York Times and Canadian media companies to Indian book publishers, George R.R. Martin, and even Elon Musk – these are just some of the plaintiffs that have taken legal action against OpenAI, and that's not even mentioning the cases of image scraping for DALL-E, uncovered by Gary Marcus and Reid Southen a year ago.
Earlier, OpenAI itself admitted that "it would be impossible to train today's leading AI models without using copyrighted materials." Now that the company finds itself on the receiving end, however, using others' data for AI training is suddenly deemed unacceptable – once again exposing the "good for me but not for thee" delusion some of the big tech moguls seem to be living in.
If you want to protect your digital artwork from scraping, we highly recommend (not an ad) trying out Glaze, a tool that cloaks your art from AI scrapers and prevents style mimicry, as well as Nightshade, a tool that essentially "poisons" your images by distorting their feature representations in generative AI models. As an OpenAI spokesperson said in 2024, the company views the use of those technologies as "abuse" – an indirect confirmation of their effectiveness.
Don't forget to join our 80 Level Talent platform and our new Discord server, follow us on Instagram, Twitter, LinkedIn, Telegram, TikTok, and Threads, where we share breakdowns, the latest news, awesome artworks, and more.