DeepSeek ditches Nvidia for Huawei chips in V4 launch
(cybernews.com)
Yeah, I can believe their interconnect is better, given Huawei's extensive history in networking.
W.r.t. TFLOPs, let me clarify what I meant. Even on traditionally compute-bound workloads (attention, etc.), it's surprisingly difficult on an H200 to make full use of the card's tensor core throughput before hitting VRAM bandwidth limits. Tensor core throughput has grown a lot faster than memory bandwidth has.
I've never written a kernel for Huawei chips, so I have no idea whether they have the same problem. But it shows up on many datacenter-class NVIDIA chips, which is why NVIDIA keeps introducing features (TMA, TMEM, etc.) to cut the time wasted waiting for memory.
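To put rough numbers on the bandwidth point, here's a back-of-envelope roofline sketch. The peak figures are my assumptions from approximate public H200 specs (about 989 dense BF16 TFLOPS and 4.8 TB/s of HBM bandwidth), not measurements. The GEMM traffic model is also idealized: it assumes each matrix is read or written exactly once, so real kernels will do worse. I'm using a plain GEMM as a stand-in for attention.

```python
# Back-of-envelope roofline model. All hardware numbers are assumptions
# taken from approximate public spec sheets, not measured values.

PEAK_FLOPS = 989e12   # assumed dense BF16 tensor core peak, FLOP/s
MEM_BW = 4.8e12       # assumed HBM bandwidth, bytes/s


def ridge_point(peak_flops, mem_bw):
    """Arithmetic intensity (FLOP/byte) needed to become compute-bound."""
    return peak_flops / mem_bw


def attainable_flops(intensity, peak_flops, mem_bw):
    """Roofline bound: the lower of the compute roof and the bandwidth roof."""
    return min(peak_flops, intensity * mem_bw)


def gemm_intensity(m, n, k, bytes_per_elem=2):
    """Ideal FLOP/byte for C[m,n] = A[m,k] @ B[k,n].

    Assumes A and B are each read once and C is written once,
    which is the best case for memory traffic.
    """
    flops = 2 * m * n * k
    traffic = bytes_per_elem * (m * k + k * n + m * n)
    return flops / traffic


print(f"ridge point: {ridge_point(PEAK_FLOPS, MEM_BW):.0f} FLOP/byte")
for shape in [(4096, 4096, 4096), (64, 4096, 4096), (1, 4096, 4096)]:
    ai = gemm_intensity(*shape)
    bound = attainable_flops(ai, PEAK_FLOPS, MEM_BW)
    kind = "compute-bound" if bound == PEAK_FLOPS else "bandwidth-bound"
    print(f"{shape}: {ai:.1f} FLOP/byte, {bound / 1e12:.0f} TFLOPS max, {kind}")
```

Under these assumptions a kernel needs roughly 200 FLOPs per byte moved before the tensor cores become the limit, so anything with a small batch or short tile dimension ends up on the bandwidth roof. As peak FLOPS grows faster than bandwidth, that ridge point moves further right.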