okay so that post’s core supposition (“using ptx instead of cuda”) is just ~~fucking wrong~~ fucking weird and I’m not going to spend time on it, but it links to this tweet which has this:
DeepSeek customized parts of the GPU’s core computational units, called SMs (Streaming Multiprocessors), to suit their needs. Out of 132 SMs, they allocated 20 exclusively for server-to-server communication tasks instead of computational tasks
this still reads more like simply tuning allocation than outright scheduler and execution control (which your post alluded to)
[x] doubt
e: original wording because cuda still uses ptx anyway, whereas this post looks like it’s saying “they steered ptx directly”. at first I read the tweet more like “asm vs python” but it doesn’t appear to be what that part meant to convey. still doubting the core hypothesis tho
honest question: was this meant seriously, or in jest?