Science Memes

Black Mirror AI (mander.xyz)

submitted 1 day ago by [email protected] to c/[email protected]


[–] [email protected] 2 points 18 hours ago

It does respect robots.txt, but that doesn't mean it won't index URLs blocked by robots.txt. The file controls crawling, not indexing, and it only applies to the host that serves it. Here's an example.

Site X links to sitemap.html from its front page and blocks that path in its robots.txt. When Google's crawler visits site X, it loads robots.txt first, follows its instructions, and skips sitemap.html.

Now suppose site Y also links to sitemap.html on X. Y's robots.txt doesn't block anything on X (and it cannot), and X's robots.txt still stops the crawler from fetching the page. But the crawler has now discovered the URL through Y's link, so it can list that URL in the index without ever reading the page, usually without a description.

This behaviour is intentional.
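
For what it's worth, here is a minimal sketch of the per-host check using Python's standard `urllib.robotparser`. The hosts `x.example` and `y.example`, the robots.txt bodies, the `ExampleBot` user agent, and the helpers `may_fetch` and `record_link` are all made up for illustration; this is not how any real search engine is implemented, just the general idea that a link found elsewhere makes a URL known without making it fetchable.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt bodies, keyed by the host that serves them.
ROBOTS_BY_HOST = {
    "x.example": "User-agent: *\nDisallow: /sitemap.html\n",
    "y.example": "User-agent: *\nDisallow:\n",
}


def may_fetch(url, user_agent="ExampleBot"):
    """Check a URL against the robots.txt of the host that owns the URL."""
    host = urlsplit(url).netloc
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed here.
    parser.parse(ROBOTS_BY_HOST.get(host, "").splitlines())
    return parser.can_fetch(user_agent, url)


def record_link(index, url, user_agent="ExampleBot"):
    """Record a URL discovered through a link on any site.

    The URL always becomes known; whether its content may be read
    depends only on the target host's robots.txt.
    """
    index[url] = "crawlable" if may_fetch(url, user_agent) else "url-only"
    return index


index = {}
# Link found on Y's page pointing at X's blocked file.
record_link(index, "https://x.example/sitemap.html")
# Link to an unblocked page on Y itself.
record_link(index, "https://y.example/page.html")
print(index)
```

The blocked file ends up as a `url-only` entry: the crawler knows the address exists but never reads the page.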