okay so that post’s core supposition (“using ptx instead of cuda”) is just ~~fucking wrong~~ fucking weird and I’m not going to spend time on it, but it links to this tweet which has this:
> DeepSeek customized parts of the GPU’s core computational units, called SMs (Streaming Multiprocessors), to suit their needs. Out of 132 SMs, they allocated 20 exclusively for server-to-server communication tasks instead of computational tasks
this still reads more like simply tuning resource allocation than outright scheduler/execution control (which is what your post alluded to)
[x] doubt
e: struck the original wording because cuda compiles through ptx anyway, whereas this post reads like it’s saying “they drove ptx directly”. at first I took the tweet more as an “asm vs python” comparison, but that doesn’t appear to be what that part meant to convey. still doubting the core hypothesis tho
on the one hand, I want to try to find which ~~vendor marketing material~~ "research paper" that paragraph was copied from, but on the other... after yesterday's adventures trying to get data out of PDFs and c.o.n.s.t.a.n.t.l.y getting "hey how about this LLM? it's so good![0]" search results, I'm fucking exhausted
[0]: also most of these are paired with pages of competence claims and feature boasts, and then a quiet "psssst: it's also a service, you send us your private data and we'll do with it whatever we want", hidden as well as they can manage