Galleries, Libraries, Archives, and Museums (GLAMs) and their digitized collections and data are the focus of the GLAM-E Lab.
The GLAM-E Lab is a joint initiative between the Centre for Science, Culture and the Law at the University of Exeter and the Engelberg Center on Innovation Law & Policy at NYU Law. It uses direct representation to develop model policies and terms for cultural institutions that are creating open access programs.
Are AI Bots Knocking Cultural Heritage Offline?
In late 2024, isolated reports began to appear from individual online cultural heritage collections. Those reports described servers and collections straining – and sometimes breaking – under the load of swarming bots. The bots were reportedly scraping all of the data from collections to build datasets to train AI models. This activity was overwhelming the systems designed to keep those collections online.
The GLAM-E Lab launched a survey and published the report, Are AI Bots Knocking Cultural Heritage Offline? in 2025.1 It focused on digitized collections and data connected to GLAMs – Galleries, Libraries, Archives, and Museums. Here are key findings:
Bots are widespread, although not universal. Of 43 respondents, 39 had experienced a recent increase in traffic. Twenty-seven of the 39 respondents experiencing an increase in traffic attributed it to AI training data bots, with an additional seven believing that bots could be contributing to the traffic.
This increase in traffic has been hard to anticipate because few respondents were actively tracking bot traffic prior to the bots triggering a crisis in their collection. Many respondents did not realize they were experiencing a growth in bot traffic until the traffic reached the point where it overwhelmed the service and knocked online collections offline.
Some respondents have been seeing an increase in bot traffic since 2021, while others did not experience their first spike until 2025.
Some bots clearly identify themselves, while others take a range of measures to hide their source.
When bots come, they tend to swarm for relatively brief periods of time. The frequency of these swarms may be increasing.
Robots.txt is not currently an effective way to prevent bots from overwhelming collections.
Respondents are deploying a range of home-grown and third-party firewall-based countermeasures to try to screen out bots based on IP address, geography, domain, and user agent string. Some of these efforts appear to be effective, although few are confident that they will be sustainable in the long term.
Respondents are reluctant to take more aggressive steps, such as moving collections behind login screens, for a variety of reasons: concerns about how effective those measures will be in the medium term, the negative impact such changes can have on welcome users, and the worry that login-based restrictions run counter to their larger goal of making collections easily available online.
Respondents worry that swarms of AI training data bots will create an environment of unsustainably escalating costs for providing online access to collections.
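The report's finding that robots.txt is ineffective follows from the protocol's design: it is purely advisory, and only compliant crawlers honor it. A minimal sketch of what an institution might publish is below; GPTBot (OpenAI) and CCBot (Common Crawl) are real, publicly documented crawler names, while the `Crawl-delay` directive is a widely recognized but non-standard extension that many crawlers ignore.

```text
# robots.txt — advisory only; nothing here is enforced by the server
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

# Non-standard directive: ask remaining crawlers to pace themselves
User-agent: *
Crawl-delay: 10
```

Bots that hide their source, as the report notes some do, simply never identify themselves under these user agent names, which is why the rules above cannot prevent swarming on their own.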
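The user-agent-string screening that respondents describe can be sketched as a simple match against known bot signatures. This is an illustrative sketch, not code from the report; the signature list is an assumption (though GPTBot, CCBot, and Bytespider are real, documented crawler names), and real deployments layer this with IP-, geography-, and domain-based firewall rules.

```python
# Minimal sketch of user-agent-based bot screening, one layer of the
# firewall-style countermeasures described in the report's findings.
# The signature list below is illustrative, not exhaustive.
BOT_SIGNATURES = ("gptbot", "ccbot", "bytespider", "python-requests")

def is_blocked(user_agent: str) -> bool:
    """Return True if the request's User-Agent matches a known bot signature."""
    ua = (user_agent or "").lower()
    # In this sketch, an empty User-Agent header is treated as suspicious.
    if not ua:
        return True
    return any(sig in ua for sig in BOT_SIGNATURES)

# Example: screen a few incoming requests.
for ua in (
    "Mozilla/5.0 (Windows NT 10.0) Firefox/126.0",
    "GPTBot/1.0 (+https://openai.com/gptbot)",
    "",
):
    print(repr(ua[:40]), "->", "blocked" if is_blocked(ua) else "allowed")
```

The limits the report identifies apply here too: bots that spoof a browser user agent pass this check unchanged, which is why respondents doubt such measures are sustainable long term.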
Lisa Janicke Hinchliffe3 interviewed Michael Weinberg, Co-Director of the GLAM-E Lab, at The Scholarly Kitchen2 about the study and what this phenomenon portends for information access and sustainable infrastructure.
Q. What risks do you foresee if bot-related overloads continue unchecked, especially for smaller or underfunded institutions? How might this situation affect the public mission of GLAMs in the next 3 to 5 years?
A. I worry that bots will drive up the costs of hosting current collections. They may also make institutions reluctant to create new open access and digitally accessible collections.
That would be a bad outcome for everyone involved, including the entities deploying the bots in the first place. Everyone benefits when there is a sustainable way to share collections online.
This concern was one of the significant reasons we decided to move forward with the report. There are a range of conversations happening around AI these days. One of the primary goals of this report is to isolate the technical impact that training bots are having on collections.
You can read the entire interview here: Are AI Bots Knocking Digital Collections Offline? An Interview with Michael Weinberg.
Michael Weinberg. 2025. Are AI Bots Knocking Cultural Heritage Offline? GLAM-E Lab. This report captures the impact that bots building datasets for AI model training were having on online cultural collections in early 2025. PDF below.
The Scholarly Kitchen was established in 2008 by the Society for Scholarly Publishing. Its mission is “[t]o advance scholarly publishing and communication, and the professional development of its members through education, collaboration, and networking.” The Scholarly Kitchen is a moderated and independent blog that aims to help fulfill this mission by bringing together differing opinions, commentary, and ideas, and presenting them openly. Its goals are to:
Keep SSP members and interested parties aware of new developments in publishing
Point to research reports and projects
Interpret the significance of relevant research in a balanced way (or occasionally in a provocative way)
Suggest areas that need more input by identifying gaps in knowledge
Translate findings from related endeavors (publishing outside STM, online business, user trends)
Attract the community of STM information experts interested in these things and give them a place to contribute
Lisa Janicke Hinchliffe is Professor and Coordinator for Research Professional Development in the University Library at the University of Illinois at Urbana-Champaign.
Pity the scrapers don’t share. Each AI company builds its own private dataset to populate its models, so a myriad of bots arrives to collect the same data. Maybe each seeks particular data, or maybe they are all trying to digest everything. I can see the traffic dilemma. Couldn’t there be some sharing of the haul? I have no real idea what data they seek or what they need. Not my study area.
Fee collection for bot behavior?
Once upon a time there was the Deja News archive of Usenet, which was immense and quite useful for history. My stuff was there. Then Google bought it, and I thought it was there forever. Not so: Google dumped a lot of it once its purpose was served. And that was just as storage costs were plummeting; what about today?
The bot-collected data will likely meet the same fate. If the AI company doesn’t make it, the stored data is likely gone.