this post was submitted on 26 Jul 2024

Technology

[–] [email protected] 4 points 3 months ago (1 children)

I'm kind of curious to understand how they're blocking other search engines. I was under the impression that search engines just crawl the same pages we see, and that the only way to 'hide' things from them was to not make them publicly available. Is this something that other search engines could choose to circumvent if they decided to?

[–] [email protected] 11 points 3 months ago (1 children)

Search engine crawlers identify themselves through their User-Agent header, so a site can keep them out both with an honor-based system (robots.txt) and with active blocking (returning a 403 error or similar when a crawler tries anyway).
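
To make the two mechanisms concrete, here is a minimal Python sketch. The robots.txt contents, the blocked-agent list, and both function names are made up for illustration and are not Reddit's actual configuration; only `urllib.robotparser` is a real standard-library module.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: allows one crawler, asks every other one to stay out.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""

# Hypothetical User-Agent substrings that the server refuses outright.
BLOCKED_AGENT_SUBSTRINGS = ("bingbot", "duckduckbot")


def polite_crawler_may_fetch(user_agent: str, url: str) -> bool:
    """Honor system: the check a well-behaved crawler runs before requesting."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


def server_status_for(user_agent: str) -> int:
    """Active blocking: the server picks a status code from the User-Agent."""
    agent = user_agent.lower()
    if any(blocked in agent for blocked in BLOCKED_AGENT_SUBSTRINGS):
        return 403  # Forbidden
    return 200
```

The first function only matters if the crawler chooses to run it. The second is enforced by the server, but it still relies on whatever User-Agent string the client decides to send.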

[–] [email protected] 2 points 3 months ago (1 children)

Thank you, I understand better now. So in theory, if one of the other search engines chose to not have their crawler identify itself, it would be more difficult for them to be blocked.

[–] [email protected] 1 points 3 months ago

This is where you get into the whole web scraping debate that also comes up with LLM "datasets".

If you, as a website host, detect a ton of requests coming from a single IP address, you can block that address. Scrapers can get around that by spreading their requests across different IP addresses, but there are other ways to detect that too!
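
As a rough illustration of the per-IP detection described above, here is a small sliding-window counter in Python. The class name, the limit, and the window length are all assumptions picked for the example; a real site would likely do this in its proxy or firewall rather than in application code.

```python
from collections import defaultdict, deque


class PerIpRateLimiter:
    """Refuse an IP once it exceeds max_requests within window_seconds."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # Maps each IP address to the timestamps of its recently allowed requests.
        self._hits = defaultdict(deque)

    def allow(self, ip: str, now: float) -> bool:
        hits = self._hits[ip]
        # Forget requests that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # too many recent requests from this address
        hits.append(now)
        return True
```

With `PerIpRateLimiter(3, 10.0)`, a fourth request from the same address within ten seconds is refused while a different address is unaffected, which is exactly why rotating IP addresses defeats this particular check.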

I'm not sure whether Reddit would try to sue Microsoft or DDG if they started serving results anyway through such methods. I don't believe it is explicitly disallowed. But if you were hoping to deal with Reddit in any way in the future, I doubt a move like this would get you in their good graces.

All that is to say: I won't visit Reddit at all anymore now that their results won't even show up when I search for something. This is a terrible move and will likely fracture the internet even more, as other websites may look to replicate this additional source of revenue.