
If You Want to Trap AI Training Bots, This Software Is for You

Let them walk around endlessly. 

Image credit: Lightspring/Shutterstock

With creative minds rallying against generative AI, the need to protect intellectual property is as great as ever. One developer, Aaron B, took a slightly sadistic route to fighting AI training crawlers: making them wander endlessly from one generated page to another.

His software, called Nepenthes, is a tarpit that generates endless sequences of pages with dozens of links. "Pages are randomly generated, but in a deterministic way, causing them to appear to be flat files that never change. Intentional delay is added to prevent crawlers from bogging down your server, in addition to wasting their time."
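To make the "random, but deterministic" idea concrete, here is a minimal Python sketch. It is not Nepenthes's actual code: the function name `render_page`, the `/maze/` URL prefix, the word list, and the parameters are all assumptions made for illustration. The trick it shows is seeding a random generator from a hash of the requested path, so the same URL always returns the same "random" page, and sleeping before responding to slow the crawler down.

```python
import hashlib
import random
import time

# Hypothetical vocabulary used to build link names.
WORDS = ["amber", "basil", "cedar", "delta", "ember", "fjord", "grove", "heron"]


def render_page(path: str, n_links: int = 12, delay_s: float = 0.0) -> str:
    """Return an HTML page whose content depends only on `path`."""
    # Seed a private RNG from a hash of the path: the same path always
    # produces the same page, so it looks like a flat file that never changes.
    digest = hashlib.sha256(path.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))

    # Intentional delay: wastes the crawler's time and limits the request rate.
    time.sleep(delay_s)

    links = []
    for _ in range(n_links):
        slug = "-".join(rng.choice(WORDS) for _ in range(3))
        # Every link points back into the same generated maze.
        links.append(f'<li><a href="/maze/{slug}">{slug}</a></li>')
    return "<html><body><ul>\n" + "\n".join(links) + "\n</ul></body></html>"
```

Calling `render_page("/maze/amber-basil-cedar")` twice returns identical HTML, while a different path yields a different set of links. Nothing has to be stored on disk, because each page is rebuilt from its URL on demand.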

You can also add a Markov babbler, which fills the pages with generated nonsense text for crawlers to scrape, to help accelerate model collapse in any LLM trained on it.
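For readers unfamiliar with the term, a Markov babbler can be sketched in a few lines of Python. This is a generic first-order word chain, not the babbler that ships with Nepenthes; the names `build_chain` and `babble` and the sample corpus are hypothetical. It records which words follow which in a source text, then walks those links at random to produce text that looks locally plausible but means nothing.

```python
import random
from collections import defaultdict


def build_chain(text: str) -> dict:
    """Map each word to the list of words that follow it in `text`."""
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return dict(chain)


def babble(chain: dict, n_words: int, seed: int) -> str:
    """Generate `n_words` (at least 1) words by walking the chain.

    Assumes `chain` is non-empty, i.e. the corpus had at least two words.
    """
    rng = random.Random(seed)
    starts = sorted(chain)  # sorted so a given seed always gives the same text
    word = rng.choice(starts)
    out = [word]
    while len(out) < n_words:
        followers = chain.get(word)
        # Dead end (a word that only appears last): restart somewhere else.
        word = rng.choice(followers) if followers else rng.choice(starts)
        out.append(word)
    return " ".join(out)


corpus = "the crawler reads the page and the page links to the next page"
chain = build_chain(corpus)
sample = babble(chain, n_words=20, seed=1)
```

With a fixed seed the output is repeatable, which fits the "flat files that never change" illusion described above.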

"It's less like flypaper and more an infinite maze holding a minotaur, except the crawler is the minotaur that cannot get out. The typical web crawler doesn't appear to have a lot of logic. It downloads a URL, and if it sees links to other URLs, it downloads those too. Nepenthes generates random links that always point back to itself – the crawler downloads those new links. Nepenthes happily just returns more and more lists of links pointing back to itself," Aaron B told 404 Media

Nepenthes can be used either defensively to flood out valid URLs on your site, preventing bots from accessing real content, or offensively by feeding them tons of useless data: "In short, let them suck down as much bullshit as they have diskspace for and choke on it."

You can check out Nepenthes here, but the creator warns that it is an intentionally malicious program, so proceed only if you're comfortable with it. Here is an example of how it works (yes, it's supposed to be that slow).

People unhappy with AI training on their works have been trying to come up with solutions against theft for a while. One of them is Glaze, a tool that lets artists apply "style cloaks" to their art to "mislead generative models that try to mimic a specific artist."

Talented coders come up with all sorts of ideas in pursuit of a weapon that can obstruct AI bots, and Nepenthes is here to offer its help.

"It's also sort of an art work, just me unleashing shear unadulterated rage at how things are going. I was just sick and tired of how the internet is evolving into a money extraction panopticon, how the world as a whole is slipping into fascism and oligarchs are calling all the shots - and it's gotten bad enough we can't boycott or vote our way out, we have to start causing real pain to those above for any change to occur," Aaron B said.

One Hacker News user, who said they were the CEO of SerpApi, a real-time API for accessing Google search results, claimed that crawlers have ways to handle such infinite sites. "If that’s true, I’ve several million lines of access log that says even Google Almighty didn’t graduate," Aaron B responded.

