Recent content by Qesa

  1. Q

    RDNA4

    People don't know/don't care that RDNA2 achieved near-parity to Ampere through a higher bill of materials on a more advanced node, and don't know/don't care that for RDNA3/Ada that situation is reversed.
  2. Q

    RDNA4

    Why would you expect an N4 chip to have perf/xtor more in line with N6 than N5? All of which are on the same node. Notably, while the high level architecture is extremely similar except for clocking much higher, they're all worse perf/xtor than Ampere on 8lpp. Once again, all on the same node...
  3. Q

    NVIDIA discussion [2024]

    H100 is already pretty much a bundle of matrix multiply units though. They can strip out the vector and high precision stuff - and Blackwell is already doing the latter - but just how much of an SM's floor plan is currently dedicated to them? I don't think it's a lot. It doesn't appear to be...
  4. Q

    AMD Execution Thread [2024]

    Back in 2023 they disclosed Sony was 16% of their revenue in 2022, which came to $3.8B for the year. With Xbox as well, that's easily over a billion per quarter for consoles.
  5. Q

    Speculation and Rumors: Nvidia Blackwell ...

    Do we actually want to load the CPU less? Generally in games the GPU is loaded 100% while the CPU has idle cores. And decompressing assets is easy to offload so it shouldn't be something contributing to being single-thread bound. Even if GPU decompression is considerably faster than CPU, the...
  6. Q

    Speculation and Rumors: Nvidia Blackwell ...

    Sorry, not their white paper, their slide deck. Specifically this one: https://regmedia.co.uk/2023/12/06/amd_mi300_wire_rates.jpg
  7. Q

    Speculation and Rumors: Nvidia Blackwell ...

    So you're saying AMD are just lying in their white paper?
  8. Q

    Speculation and Rumors: Nvidia Blackwell ...

    MI300 has 4.8 TB/s bidirectional bisection bandwidth, that's less than half B100
  9. Q

    GTC 2024

    Assuming high level arch is largely unchanged compared to Hopper, it seems like a huge layout improvement. In a similarly sized die on the same node, it's got 20% more SMs, 2x L2$, 30% less power, at similar clock speeds
  10. Q

    RDNA3 Efficiency [Spinoff from RDNA4]

    I assume he's counting the 7900 XTX as 12288 cores rather than 6144. The issue isn't so much scheduling but register file bandwidth. Each SIMD lane can only load 4 values from registers per clock cycle. A single FMA needs 3, so it's impossible to do two together unless the value is already...
  11. Q

    Speculation and Rumors: Nvidia Blackwell ...

    They're well overdue for an architecture overhaul. Hopefully that will yield some perf/mm^2 and perf/W benefits independent of node. Perf/mm^2 regressed with Turing (even on the TU20x chips so it's not all attributable to RT/tensor cores) and while it's hard to compare Ampere and Ada thanks to...
  12. Q

    RDNA3 Efficiency [Spinoff from RDNA4]

    This doesn't really make sense as a concept. You can reduce clocks and voltage to exchange perf/mm^2 for perf/W. Case in point is N33 itself, 7600 XT has similar perf/W to 6600 XT (but much better perf/mm^2) while 7600 has ~20% better perf/W (and still slightly better perf/mm^2)
  13. Q

    AMD Execution Thread [2023]

    The only plausible configurations are an overclocked 7600 or further cut down 7700XT with 2 MCDs
  14. Q

    AMD Execution Thread [2023]

    With a snooper for eventual coherence. But also just don't expect coherence without using atomics or synchronisation. Likewise. However I've seen a fair bit of talk about it seamlessly behaving like a single chip which still doesn't seem to be the case.
  15. Q

    AMD Execution Thread [2023]

    The LLC being attached to memory controllers seems like an interesting decision. So the cache in each tile will only include data in memory in the same tile. If you're naively treating it as UMA, 3/4 of your L2 hits are going to be across tiles served at 1.2-1.5 TB/s rather than the on-paper 17...
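
The console revenue estimate in item 4 can be checked with a quick back-of-the-envelope sketch. Only the 16% share and the $3.8B figure come from the post; the even split across four quarters and the variable names are assumptions made for illustration, and the Xbox figure is not stated in the post, so the sketch only computes how much Xbox would need to contribute to pass $1B per quarter.

```python
# Figures quoted in the post (item 4).
SONY_SHARE_2022 = 0.16       # Sony's share of AMD revenue in 2022
SONY_REVENUE_2022 = 3.8e9    # USD for the full year

# Assumption: revenue is spread evenly over the year, which real
# console sales are not (they are seasonal).
QUARTERS_PER_YEAR = 4

# Average Sony revenue per quarter under the even-split assumption.
sony_per_quarter = SONY_REVENUE_2022 / QUARTERS_PER_YEAR

# Total 2022 revenue implied by the two quoted figures.
implied_total_revenue = SONY_REVENUE_2022 / SONY_SHARE_2022

# Xbox revenue per quarter needed to push consoles past $1B per quarter.
xbox_needed_per_quarter = max(0.0, 1e9 - sony_per_quarter)

print(f"Sony per quarter:        ${sony_per_quarter / 1e9:.2f}B")
print(f"Implied total revenue:   ${implied_total_revenue / 1e9:.2f}B")
print(f"Xbox needed per quarter: ${xbox_needed_per_quarter / 1e6:.0f}M")
```

Under these assumptions Sony alone averages just under a billion per quarter, so even a small Xbox contribution clears the "over a billion per quarter" mark.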