this post was submitted on 24 May 2025
Science Memes
Welcome to c/science_memes @ Mander.xyz!
It does respect robots.txt, but that doesn't mean a URL blocked by robots.txt can't end up in the index. robots.txt controls crawling, not indexing. Here's an example.
Site X links to sitemap.html from its front page, and X's robots.txt disallows sitemap.html. When Google's crawler visits site X, it first loads robots.txt, follows its instructions, and skips sitemap.html.
Now site Y also links to sitemap.html on X. The crawler still checks that URL against X's robots.txt, because a robots.txt file only applies to URLs on its own host; Y's file has no say over X. So sitemap.html still isn't fetched, but the crawler now knows the URL exists and that other pages link to it, and Google may index the bare URL (typically shown without a description) on that basis alone, without ever seeing the page's content.
This behaviour is intentional.
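To make the crawl-vs-index distinction concrete, here's a toy model in Python. It's a sketch under stated assumptions, not how Googlebot is implemented: the hosts site-x.example and site-y.example, their robots.txt contents, and the helper may_fetch are all hypothetical. The one real piece is the standard library's urllib.robotparser. Following the robots.txt standard (RFC 9309), the fetch decision uses only the robots.txt of the host the URL points to, while merely discovering a link is enough to record that the URL exists.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt files for two made-up hosts (not real sites).
ROBOTS_TXT = {
    "site-x.example": "User-agent: *\nDisallow: /sitemap.html\n",
    "site-y.example": "User-agent: *\nDisallow:\n",  # empty Disallow = allow everything
}

# Links the toy crawler has discovered: (page the link was found on, link target).
LINKS = [
    ("https://site-x.example/", "https://site-x.example/sitemap.html"),
    ("https://site-y.example/", "https://site-x.example/sitemap.html"),
]


def may_fetch(url: str, agent: str = "Googlebot") -> bool:
    """Check `url` against the robots.txt of the host the URL points to.

    The page where the link was found is deliberately not an input:
    its robots.txt has no say over URLs on other hosts.
    """
    host = urlsplit(url).hostname or ""
    parser = RobotFileParser()
    # A host with no entry in ROBOTS_TXT is treated as having no rules at all.
    parser.parse(ROBOTS_TXT.get(host, "").splitlines())
    return parser.can_fetch(agent, url)


known_urls = set()  # URLs the crawler knows exist (candidates for a URL-only listing)
fetched = set()     # URLs whose content the crawler actually downloaded

for found_on, target in LINKS:
    known_urls.add(target)  # discovering a link records the URL either way
    allowed = may_fetch(target)
    if allowed:
        fetched.add(target)
    print(f"{target} linked from {found_on}: fetch allowed = {allowed}")

print("known:", sorted(known_urls))
print("fetched:", sorted(fetched))
```

Both links get the same answer (fetch not allowed), because site Y's rules are never consulted for a URL on site X. The URL still lands in known_urls, which is the part a search engine can list without having seen the page itself.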