AI Crawlers Are Bullying Open Source: Stop Grousing and Go Away
April 25, 2025
AI algorithms are built on open source technology. Unfortunately, generative AI is harming its mother code, explains Techdirt in "AI Crawlers Are Harming Wikimedia, Bringing Open Source Sites To Their Knees, And Putting The Open Web At Risk." Making generative AI work takes a lot of computing power, smart coding, and mounds of training data. Money can buy coding and power, but quality training data is incredibly difficult to obtain.
AI crawlers were unleashed on the Internet to scrape information and use it to train models. Wikimedia projects are the biggest information providers for these crawlers, and that is a big problem. Wikimedia, which claims to be "the largest collection of open knowledge in the world," says most of its traffic now comes from crawlers and that this traffic is driving up its costs:
“Since January 2024, we have seen the bandwidth used for downloading multimedia content grow by 50%. This increase is not coming from human readers, but largely from automated programs that scrape the Wikimedia Commons image catalog of openly licensed images to feed images to AI models. Our infrastructure is built to sustain sudden traffic spikes from humans during high-interest events, but the amount of traffic generated by scraper bots is unprecedented and presents growing risks and costs.”
This is bad because it strains Wikimedia's data centers and budget. Wikimedia isn't the only information source feeling the burn from AI crawlers. News sites and others are being wrung dry by crawlers for every last scrap of information:
“It’s increasingly clear that the reckless and selfish way in which AI crawlers are being deployed by companies eager to tap into today’s AI hype is bringing many sites around the Internet to their knees. As a result, AI crawlers are beginning to threaten the open Web itself, and thus the frictionless access to knowledge that it has provided to general users for the last 30 years.”
Silicon Valley might have good intentions, but dollars matter more. (Oh, I am not sure about the "good intentions.")
Whitney Grace, April 25, 2025