It occurs to me that this audience might not immediately understand how hard the chosen tasks are. I was fairly adversarial with my task selection.
Two of them are in RPython, an old, restricted dialect of Python 2.7 that chatbots will have trouble emitting because they're trained on the incompatible Python 3.x lineage. The odd task out asks the bot to read Raku, which is as tough as its legendary predecessor Perl 5, and to write low-level code that is very prone to crashing. All three tasks must be done relative to a Nix flake, which is easy for folks who are used to it but not typical for bots. The third task is an open-ended optimization problem where a top score will require full-stack knowledge and a strong sense of performance heuristics; I gave two examples of how to do it, but by construction neither example can result in an S-tier score if literally copied.
This test is meant to shame and embarrass those who attempt it. It also happens to be a slice of the stuff that I do in my spare time.
