Reddit will block the Internet Archive

Draugoth

Gold Member


Source
Reddit says that it has caught AI companies scraping its data from the Internet Archive's Wayback Machine, so it's going to start blocking the Internet Archive from indexing the vast majority of Reddit. The Wayback Machine will no longer be able to crawl post detail pages, comments, or profiles; instead, it will only be able to index the Reddit.com homepage, which effectively means Internet Archive will only be able to archive insights into which news headlines and posts were most popular on a given day.

"Internet Archive provides a service to the open web, but we've been made aware of instances where AI companies violate platform policies, including ours, and scrape data from the Wayback Machine," spokesperson Tim Rathschmidt tells The Verge.
The Internet Archive's mission is to keep a digital archive of websites on the internet and "other cultural artifacts," and the Wayback Machine is a tool you can use to look at pages as they appeared on certain dates, but Reddit believes not all of its content should be archived that way.

"Until they're able to defend their site and comply with platform policies (e.g., respecting user privacy, re: deleting removed content) we're limiting some of their access to Reddit data to protect redditors," Rathschmidt says.
 
I think I saw a metric that Reddit is the most sourced website for LLM development. Not Twitter, not Instagram or Facebook, but Reddit.
 
"Until they're able to defend their site and comply with platform policies (e.g., respecting user privacy, re: deleting removed content) we're limiting some of their access to Reddit data to protect redditors," Rathschmidt says.
...the entire point is that it preserves history as is; like a snapshot in time.
It's like being upset at a library for having archived old newspapers with interviews/articles of people that later wanted them removed.
 
Reddit blocks everything. You'll have better success breaking into Area 51 unnoticed than you would making a thread on a subreddit without it being auto-deleted or removed minutes later.
 
Reddit wants AI companies to go to them and pay them. Not get the data for free off Internet Archive.
It's nuts that reddit feels like it owns the data posted to that site, to the point of direct monetization at least.

And I say that as someone who worked at reddit (about 10 years ago) and still runs one of the top five subreddits on the site.
 
Smart move by Reddit. User data is the new gold in the AI economy, and the open web basically has zero of this anymore. Reddit data is much easier to label than say TikTok data.

But we definitely need some new unscrapable protocols.
 
Good. Reddit can spread less of its poison.
Just save the hassle and block Reddit. From everything 🤷‍♂️
No wonder most of the AIs are fucking unhinged.
AI should stop using reddit.
Have they considered that their userbase is braindead and that it would make the AI dumber?
You guys have to look at more than just all, popular, news, and frontpage sections. It's a vast ocean out there.
 
AI has given so many responses absolutely confident in its bullshit. And when I question or point out why it's wrong, it doubles down often.

Just like so many top-rated Reddit comments.
 
Have they considered that their userbase is braindead and that it would make the AI dumber?
I learned to invest because of Reddit (not YOLO WSB regard stuff).

Anytime I need to search for legit answers for something, Reddit has been my go-to.

Yeah, I bought their stock, but it's because I use it. Same goes for my AMD positions.

Considering my experience with Reddit, you have a mix of good and bad, and it largely comes down to political spectrum sadly.
 
You guys have to look at more than just all, popular, news, and frontpage sections. It's a vast ocean out there.
The issue with Reddit is of course that it's a huge network of silos. That's how the site is structured.

Large subs are basically cesspools of bots manufacturing woke socialist consensus and the worst examples of mindbroken humans in existence

There are thousands of small subs about every possible special subject imaginable which kind of orbit around the shithole of large subs and largely don't interact with them. For any number of niche hobbies, games, and interests, Reddit basically is the go-to for discussion about those topics. These small worlds are like the stars in a galaxy, and the large subs are like the supermassive black hole at the center of the galaxy. Sometimes a sub becomes too large and gets sucked into the enormous black hole in the center, such as what happened to r/wallstreetbets which used to be a nice sub about meme investing but has now become just another shithole after it got way too large

Oddly enough, there are still some surviving networks on Reddit which continue to thrive after almost all conservatives were purged from the site over the years. Any conservatives who were caught in the waves of purges when subs like r/thedonald were deleted either were sucked in the black hole and ceased to exist, or fled towards the Outer Rim and hid in small subs far from the center of the galaxy

And yet despite all this, Reddit continues to host the world's largest network of discussions about firearms. Yes, Reddit is the biggest place to talk about guns. Somehow the Reddit gun network has not yet been completely purged, although I can't imagine it will be allowed to remain unmolested forever because the large subs are filled with people who hate guns and hate people who own guns

Reddit is a strange place. That's the bottom line
 