#Crawlers

jtgd<p><span class="h-card" translate="no"><a href="https://mastodon.world/@olliefrancis" class="u-url mention" rel="nofollow noopener noreferrer" target="_blank">@<span>olliefrancis</span></a></span></p><p>You know how people who BitTorrent have software and lists of IP addresses to block? We need something like that for <a href="https://sfba.social/tags/AI" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>AI</span></a> <a href="https://sfba.social/tags/crawlers" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>crawlers</span></a> to stop this.</p>
mr.w0bb1t<p>"[..] 65% of our most expensive traffic comes from <a href="https://tldr.nettime.org/tags/bots" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>bots</span></a>" · How <a href="https://tldr.nettime.org/tags/crawlers" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>crawlers</span></a> impact the operations of the <span class="h-card" translate="no"><a href="https://wikimedia.social/@wikimediafoundation" class="u-url mention" rel="nofollow noopener noreferrer" target="_blank">@<span>wikimediafoundation</span></a></span> projects.</p><p>👉🏻 <a href="https://diff.wikimedia.org/2025/04/01/how-crawlers-impact-the-operations-of-the-wikimedia-projects/" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">diff.wikimedia.org/2025/04/01/</span><span class="invisible">how-crawlers-impact-the-operations-of-the-wikimedia-projects/</span></a></p>
Camelia :tranarchy_a_nonbinary: 🇵🇸<p>analyzing logs from yesterday, it looks like <a href="https://tech.lgbt/tags/GPTBot" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>GPTBot</span></a> ALONE (among other <a href="https://tech.lgbt/tags/crawlers" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>crawlers</span></a>) has been making several requests per second to my server FOR THE WHOLE DAY (during which I wasn't able to access my server, unfortunately). It constantly sent requests to the same page, over and over again, until I was able to block it.</p>
Inautilo<p><a href="https://mastodon.social/tags/Business" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Business</span></a> <a href="https://mastodon.social/tags/Introductions" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Introductions</span></a><br>Meet LLMs.txt · A proposed standard for AI website content crawling <a href="https://ilo.im/16318s" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://</span><span class="">ilo.im/16318s</span><span class="invisible"></span></a></p><p>_____<br><a href="https://mastodon.social/tags/SEO" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>SEO</span></a> <a href="https://mastodon.social/tags/GEO" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>GEO</span></a> <a href="https://mastodon.social/tags/AI" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>AI</span></a> <a href="https://mastodon.social/tags/Bots" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Bots</span></a> <a href="https://mastodon.social/tags/Crawlers" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Crawlers</span></a> <a href="https://mastodon.social/tags/LlmsTxt" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>LlmsTxt</span></a> <a href="https://mastodon.social/tags/RobotsTxt" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>RobotsTxt</span></a> <a href="https://mastodon.social/tags/Development" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Development</span></a> <a href="https://mastodon.social/tags/WebDev" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>WebDev</span></a> <a href="https://mastodon.social/tags/Backend" class="mention hashtag" 
rel="nofollow noopener noreferrer" target="_blank">#<span>Backend</span></a></p>
Βασίλης Βαλατσός<p>The best way to stop <a href="https://social.apotheke.earth/tags/chinese" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Chinese</span></a> <a href="https://social.apotheke.earth/tags/ai" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>ai</span></a> <a href="https://social.apotheke.earth/tags/crawlers" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>crawlers</span></a> DDOSing your site is to make sure you are blocked by the <a href="https://social.apotheke.earth/tags/great" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Great</span></a> <a href="https://social.apotheke.earth/tags/firewall" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Firewall</span></a></p><p><a href="https://aethrvmn.gr/glory-ccp" rel="nofollow noopener noreferrer" target="_blank">https://aethrvmn.gr/glory-ccp</a></p><p>Now I just need a copypasta like that for <a href="https://social.apotheke.earth/tags/israel" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Israel</span></a> so that <a href="https://social.apotheke.earth/tags/us" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>US</span></a> crawlers also stop crawling</p>
Kevin Karhan :verified:<p><span class="h-card" translate="no"><a href="https://fedi.chadthundercock.com/@chloe" class="u-url mention" rel="nofollow noopener noreferrer" target="_blank">@<span>chloe</span></a></span> is this a download-bomb to trap <a href="https://infosec.space/tags/Crawlers" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Crawlers</span></a>?</p>
Veronica Olsen 🏳️‍🌈🇳🇴🌻<p>Who could have guessed that an industry whose entire business model is based on theft would behave like malware attacks on the Internet? 🤔</p><p><a href="https://arstechnica.com/ai/2025/03/devs-say-ai-crawlers-dominate-traffic-forcing-blocks-on-entire-countries/?utm_source=mastodon&amp;utm_medium=social" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">arstechnica.com/ai/2025/03/dev</span><span class="invisible">s-say-ai-crawlers-dominate-traffic-forcing-blocks-on-entire-countries/?utm_source=mastodon&amp;utm_medium=social</span></a></p><p><a href="https://mastodon.online/tags/AI" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>AI</span></a> <a href="https://mastodon.online/tags/DDoS" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>DDoS</span></a> <a href="https://mastodon.online/tags/Crawlers" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Crawlers</span></a></p>
William Shotts<p>Using <a href="https://mstdn.social/tags/AI" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>AI</span></a> to fight AI.</p><p>Cloudflare builds an AI to lead AI scraper bots into a horrible maze of junk content • The Register<br><a href="https://www.theregister.com/2025/03/21/cloudflare_ai_labyrinth/" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://www.</span><span class="ellipsis">theregister.com/2025/03/21/clo</span><span class="invisible">udflare_ai_labyrinth/</span></a></p><p><a href="https://mstdn.social/tags/Cloud" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Cloud</span></a> <a href="https://mstdn.social/tags/CloudFlare" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>CloudFlare</span></a> <a href="https://mstdn.social/tags/crawlers" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>crawlers</span></a></p>
Albert Cardona<p>Web crawlers are out of control. If anyone has a suggestion on how to block them entirely, not via robots.txt but with e.g., a prompt that only a human can answer, I'd appreciate it.</p><p>Here, a list of the operating systems of my website's visitors.</p><p><a href="https://mathstodon.xyz/tags/webmaster" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>webmaster</span></a> <a href="https://mathstodon.xyz/tags/crawlers" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>crawlers</span></a></p>
Hacker News<p>A sysadmin's rant about feed readers and crawlers (2022) — <a href="http://rachelbythebay.com/w/2022/03/07/get/" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">http://</span><span class="ellipsis">rachelbythebay.com/w/2022/03/0</span><span class="invisible">7/get/</span></a><br><a href="https://mastodon.social/tags/HackerNews" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>HackerNews</span></a> <a href="https://mastodon.social/tags/sysadmin" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>sysadmin</span></a> <a href="https://mastodon.social/tags/rant" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>rant</span></a> <a href="https://mastodon.social/tags/feedreaders" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>feedreaders</span></a> <a href="https://mastodon.social/tags/crawlers" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>crawlers</span></a> <a href="https://mastodon.social/tags/technology" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>technology</span></a> #2022</p>
mr.w0bb1t<p>Trap <a href="https://tldr.nettime.org/tags/AI" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>AI</span></a> Aggressive <a href="https://tldr.nettime.org/tags/Crawlers" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Crawlers</span></a> .. This is a simple proof of concept of using a Markov Chain to generate an infinitely large website.</p><p>👉🏻 <a href="https://github.com/gw1urf/spigot" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://</span><span class="">github.com/gw1urf/spigot</span><span class="invisible"></span></a></p>
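The idea in the post above, a Markov chain feeding an endless site, can be sketched roughly as follows. This is not the code from the linked spigot repository; it is a minimal illustration under assumptions. `SAMPLE`, `build_chain`, and `generate_page` are hypothetical names, and seeding the generator on the URL path is just one way to make every generated page stable while still linking to ever more pages.

```python
import random
from collections import defaultdict

# Hypothetical seed text; any corpus of ordinary prose would do.
SAMPLE = (
    "the crawler follows every link and the link leads to another page "
    "and the page lists more links for the crawler to follow"
)

def build_chain(text):
    """Map each word to the list of words that follow it in the text."""
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return dict(chain)

def generate_page(chain, path, length=30, links=3):
    """Return (text, hrefs) for a URL path.

    The random generator is seeded with the path, so the same URL
    always yields the same page, and each page links to child paths
    that are generated the same way on request.
    """
    rng = random.Random(path)
    starts = sorted(chain)
    word = rng.choice(starts)
    out = [word]
    for _ in range(length - 1):
        followers = chain.get(word)
        # Restart from a random word when the chain hits a dead end.
        word = rng.choice(followers) if followers else rng.choice(starts)
        out.append(word)
    base = path.rstrip("/")
    hrefs = [f"{base}/{rng.randrange(10**6)}" for _ in range(links)]
    return " ".join(out), hrefs

if __name__ == "__main__":
    text, hrefs = generate_page(build_chain(SAMPLE), "/maze/1")
    print(text)
    print(hrefs)
```

Nothing is stored on disk: a page exists only while it is being served, which is what makes the site effectively unbounded.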
Juan<p>This is a terrifying read.</p><p>2025-01-23 The bots are at it again | <a href="https://alexschroeder.ch/view/2025-01-23-bots-devouring-the-web" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">alexschroeder.ch/view/2025-01-</span><span class="invisible">23-bots-devouring-the-web</span></a></p><p>Sure, my website is "tiny", 100% static, and very fine-tuned (I like to think); but if I had some dynamic content that was mildly popular... oh, dear.</p><p><a href="https://mastodon.gamedev.place/tags/web" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>web</span></a> <a href="https://mastodon.gamedev.place/tags/bots" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>bots</span></a> <a href="https://mastodon.gamedev.place/tags/crawlers" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>crawlers</span></a> <a href="https://mastodon.gamedev.place/tags/waste" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>waste</span></a></p>
Max Resing<p>It looks like LLM-producing companies that are massively <a href="https://infosec.exchange/tags/crawling" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>crawling</span></a> the <a href="https://infosec.exchange/tags/web" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>web</span></a> require the owners of a website to take action to opt out. Albeit I am not intrinsically against <a href="https://infosec.exchange/tags/generativeai" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>generativeai</span></a> and the acquisition of <a href="https://infosec.exchange/tags/opendata" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>opendata</span></a>, reading about hundreds of dollars of rising <a href="https://infosec.exchange/tags/cloud" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>cloud</span></a> costs for hobby projects is quite concerning. How is it accepted that hypergiants skyrocket the costs of tightly budgeted projects through massive spikes in egress traffic and increased processing requirements? Projects that run on a shoestring budget and are operated by volunteers who dedicated hundreds of hours without any reward other than believing in their mission?</p><p>I am mostly concerned about the default of opting out. Are the owners of those projects required to take action? Seriously? 
As an <a href="https://infosec.exchange/tags/operator" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>operator</span></a>, it would be my responsibility to methodically work my way through the crawling documentation of the hundreds of <a href="https://infosec.exchange/tags/LLM" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>LLM</span></a> <a href="https://infosec.exchange/tags/web" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>web</span></a> <a href="https://infosec.exchange/tags/crawlers" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>crawlers</span></a>? I am the one responsible for configuring a unique crawling specification in my robots.txt because hypergiants make it immanently hard to have generic <a href="https://infosec.exchange/tags/opt" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>opt</span></a>-out configurations that tackle LLM projects specifically?</p><p>I refuse to accept that this is our new norm: a norm in which hypergiants not only methodically exploit the work of thousands of individuals for their own benefit without returning a penny, but in which the resource owner is also required to prevent these crawlers from skyrocketing one's own operational costs.</p><p>We require a new <a href="https://infosec.exchange/tags/opt" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>opt</span></a>-in. Often, public and open projects are keen to share their data. They just don't like the idea of carrying the unpredictable, multitudinous financial burden of sharing the data without notice from said crawlers. Even <a href="https://infosec.exchange/tags/CommonCrawl" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>CommonCrawl</span></a> has fail-safe mechanisms to reduce the burden on website owners. 
Why are LLM crawlers above the guidelines of good <a href="https://infosec.exchange/tags/Internet" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Internet</span></a> citizenship?</p><p>To counter the most common argument already: Yes, you can deny-by-default in your robots.txt, but that excludes any non-mainstream browser, too.</p><p>Some concerning <a href="https://infosec.exchange/tags/news" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>news</span></a> articles on the topic:</p><ul><li><a href="https://archive.is/nQ6Gk" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://</span><span class="">archive.is/nQ6Gk</span><span class="invisible"></span></a></li><li><a href="https://archive.is/CRwVs" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://</span><span class="">archive.is/CRwVs</span><span class="invisible"></span></a></li></ul><p><a href="https://infosec.exchange/tags/webcrawling" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>webcrawling</span></a> <a href="https://infosec.exchange/tags/crawler" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>crawler</span></a> <a href="https://infosec.exchange/tags/web" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>web</span></a> <a href="https://infosec.exchange/tags/opensource" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>opensource</span></a></p>
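The deny-by-default trade-off raised in the post above can be checked with Python's standard `urllib.robotparser`. This is a minimal sketch under assumptions: `FriendlyBot` and `SomeNicheCrawler` are hypothetical user-agent names, and `ROBOTS_TXT` is an illustrative policy, not one taken from any of the linked articles.

```python
from urllib.robotparser import RobotFileParser

# Illustrative deny-by-default policy: one named agent is allowed,
# everything else is disallowed. "FriendlyBot" is a made-up name.
ROBOTS_TXT = """\
User-agent: FriendlyBot
Allow: /

User-agent: *
Disallow: /
"""

def allowed(agent, url="https://example.org/page"):
    """Return True if the policy above permits `agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)

if __name__ == "__main__":
    for agent in ("FriendlyBot", "GPTBot", "SomeNicheCrawler"):
        print(agent, allowed(agent))
```

Every agent that is not listed by name falls through to the `*` rule, so a small, well-behaved crawler is shut out together with the large ones. Note that robots.txt is advisory: it only affects crawlers that choose to honour it.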