this post was submitted on 31 Jul 2024


I am a Kagi user and have been for 7 months now. I signed up for the 300 searches per month plan because I felt like that would fit me well enough, and it turns out I average 87 searches per month. That's a lot lower than I thought it would be.

[–] [email protected] 4 points 3 months ago (2 children)

Reddit recently updated their robots.txt to disallow all crawlers. Google paid a bunch of money to have access to crawl Reddit.

You'll still see old stuff, but crawlers that care about robots.txt will get no new information.

[–] [email protected] 6 points 3 months ago (1 children)

The best part of that robots.txt is:

Reddit believes in an open internet, but not the misuse of public content.

Sure Jan.

[–] [email protected] 2 points 3 months ago

Reddit believes in ~~an open~~ pay to access internet, but not the ~~mis~~use of ~~public content~~ our content we didn't make.

[–] [email protected] 1 points 3 months ago (1 children)
[–] [email protected] 3 points 3 months ago

A few possibilities:

  1. Brave has stated in the past that they use the Googlebot user agent. If all Reddit does is check the user agent, it won't block Brave. This does, however, mean Brave is violating the robots.txt file.
  2. I saw some mentions of a Google fallback. I don't know if they still do it, but that could be another possibility.
  3. Brave ignores robots.txt files.
  4. Brave paid for access.

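To illustrate possibility 1, here is a rough sketch of what a user-agent-only check might look like. This is purely hypothetical: the function name and the substring match are my own assumptions, and I have no idea how Reddit actually filters crawlers.

```python
# Hypothetical sketch: a server-side filter that trusts the User-Agent header.
# Nothing here reflects Reddit's real implementation; it only shows why a
# check like this cannot tell a real Googlebot from a crawler borrowing its name.

def is_allowed_by_user_agent(user_agent: str) -> bool:
    """Allow a request only if the User-Agent header mentions Googlebot."""
    return "googlebot" in user_agent.lower()


# The header is just a string the client chooses, so any crawler can send it.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
OTHER_BOT_UA = "SomeOtherBot/1.0"  # made-up crawler name for illustration

if __name__ == "__main__":
    print(is_allowed_by_user_agent(GOOGLEBOT_UA))
    print(is_allowed_by_user_agent(OTHER_BOT_UA))
```

A check like this passes anything that sends the Googlebot string, which is why sites that really care usually verify the crawler some other way as well.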
No matter the reason, well-behaved crawlers will no longer crawl Reddit. Everything is disallowed in the robots.txt:

User-agent: *
Disallow: /
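
If you want to see how a well-behaved crawler would read those two lines, here is a minimal sketch using Python's standard `urllib.robotparser`. The helper name, the bot names, and the example URL are placeholders I made up; the rules are parsed from a string rather than fetched from the live site.

```python
from urllib.robotparser import RobotFileParser

# The two rules quoted above, parsed locally instead of fetched over the network.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_crawl_allowed(user_agent: str, url: str) -> bool:
    """Return True if the rules in ROBOTS_TXT permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Placeholder bot names and URL, for illustration only.
    for bot in ("ExampleBot", "AnotherBot"):
        print(bot, is_crawl_allowed(bot, "https://www.reddit.com/r/example/"))
```

Because the only group is `User-agent: *` with `Disallow: /`, every crawler that honors the file is told to stay away from every path. Nothing enforces that, though; it only works if the crawler chooses to check.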