Recent content by Arun

  1. Speculation and Rumors: Nvidia Blackwell ...

    Some of the higher transistor density on B100 might just be a combination of more SRAM and maybe targeting slightly lower clocks (for lower power). It’s too early to conclude anything about it without more information on the microarchitecture imo.
  2. AMD Execution Thread [2024]

    I hope the idle power & casual-use battery life are somewhat competitive with my M2 Max 16" MacBook Pro, otherwise those specs are the exact opposite of a selling point for me… (I’d like to switch back to a PC laptop at some point, but there’s nothing remotely compelling right now, so I hope...
  3. AMD Execution Thread [2024]

    Uhm. I think you probably misunderstood my question. Compare these 2 graphs: -48% revenue but operating margin only dropped from 18% to 16%. Operating profit = Revenue - Cost Of Goods Sold (e.g. wafers from TSMC) - Operating Costs (e.g. R&D salaries). Typically when a company's revenue...
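The margin arithmetic referenced in this post can be sketched with a toy calculation. The -48% revenue drop and the 18% → 16% operating margins are from the post; the COGS and operating-cost figures below are made-up numbers chosen purely to reproduce those margins:

```python
# Illustrative operating-profit arithmetic (hypothetical numbers, not AMD's actuals).
def operating_profit(revenue, cogs, operating_costs):
    # Operating profit = Revenue - Cost Of Goods Sold - Operating Costs
    return revenue - cogs - operating_costs

# If revenue falls 48% but most of COGS scales down with it,
# operating margin can stay surprisingly stable.
rev_before, rev_after = 100.0, 52.0  # -48% revenue
profit_before = operating_profit(rev_before, cogs=60.0, operating_costs=22.0)
profit_after = operating_profit(rev_after, cogs=28.0, operating_costs=15.7)
print(profit_before / rev_before)            # 0.18 -> 18% margin
print(round(profit_after / rev_after, 2))    # 0.16 -> 16% margin
```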
  4. AMD Execution Thread [2024]

    Agreed, I don't quite understand, does anyone know? How does AMD's semi-custom business work in terms of gross and operating margins (for both consoles and Samsung)? Given the amounts of money involved, I assume they legally buy the chip from TSMC and sell it back to MS so there are costs...
  5. NVIDIA discussion [2024]

    Yep absolutely, that sets a minimum of what is in there, it's just that there’s probably more than we know - e.g. if raytracing was mostly per-SM but shared a bit of logic per-TPC, how could we possibly know that? These kinds of implementation details really aren’t relevant for marketing material. What...
  6. NVIDIA discussion [2024]

    I don’t think we “know” exactly what is in the SM vs TPC - e.g. based on my testing, I am fairly confident the L0 instruction caches are per-multiprocessor, the L1 instruction caches are per-TPC (32KiB, used to be 12KiB on Volta) and the L1.5 instruction/constant caches (128KiB) are per GPC. But...
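The scopes and sizes inferred in this post can be summarized in a small sketch. These are the author's test-based inferences, not official NVIDIA specs; the Volta figure is the post's comparison point, and the L0 size is not stated:

```python
# Instruction-cache hierarchy as described in the post (test-based inferences,
# not official specs). size_kib=None means the post doesn't state a size.
icache_hierarchy = [
    {"level": "L0 icache", "shared_by": "multiprocessor (SM)", "size_kib": None},
    {"level": "L1 icache", "shared_by": "TPC", "size_kib": 32},  # was 12 KiB on Volta
    {"level": "L1.5 instruction/constant cache", "shared_by": "GPC", "size_kib": 128},
]
for c in icache_hierarchy:
    print(f'{c["level"]}: per {c["shared_by"]}, {c["size_kib"]} KiB')
```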
  7. TSMC wafer pricing

    For posterity's sake, I would like to commemorate this as the first of many times I personally get confused between the TSMC A16 process and the Apple A16 SoC...🎉:( (doesn't help I worked on one of them obviously) Anyway, Apple M2 Ultra would like to say hello... and if you think you can...
  8. TSMC wafer pricing

    It does feel a bit like 16nm vs 20nm… In many ways, A16 feels like N2P+backside and not much else, just like 16nm was mostly just FinFET. It’s quite disappointing that N2P lost backside power delivery; that must be a very welcome surprise for Intel. I wonder how A16 compares to the *original* N2P...
  9. RDNA4

    This is just crazy baseless speculation based on no insider information, but the only way I can see those die sizes & specs being realistic is if at least several of the following are true: Either: There is little or no Infinity Cache. ... OR: Both 44 & 48 have new 6nm MCDs that are not...
  10. AMD Execution Thread [2024]

    Do you mean gaming benchmarks are much less than 40%? I could imagine they have quite different cache characteristics (see: relative V-Cache benefit) but from a “core” point of view I’d expect the kind of changes that help SIR2017 to also help most other workloads including games, so I’m curious...
  11. PS5 Pro *spawn

    BVH4 is just broken in my opinion: 64 bytes per BVH4 node with 128-byte cachelines doesn't make sense, since you're still fetching the 2nd set of 64 bytes you often won't need. The intersection HW isn't as expensive as the memory hierarchy bandwidth. So BVH8 is basically a "free"(-ish) improvement and it...
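The cacheline argument in this post can be checked with simple arithmetic. The 64-byte BVH4 node and 128-byte cacheline sizes are from the post; the 128-byte BVH8 node size is an assumption for illustration:

```python
# Cacheline-utilization sketch for BVH node fetches.
CACHELINE_BYTES = 128
BVH4_NODE_BYTES = 64    # per the post
BVH8_NODE_BYTES = 128   # assumed: one 8-wide node filling a whole line

# Fetching one BVH4 node still pulls in a full 128-byte cacheline, so the
# other 64 bytes (a neighbouring node) may be wasted bandwidth if that
# neighbour isn't the next node visited during traversal.
wasted_fraction_bvh4 = 1 - BVH4_NODE_BYTES / CACHELINE_BYTES
wasted_fraction_bvh8 = 1 - BVH8_NODE_BYTES / CACHELINE_BYTES
print(wasted_fraction_bvh4)  # 0.5 -> up to half the fetched bytes unused
print(wasted_fraction_bvh8)  # 0.0 -> node exactly fills the line
```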
  12. Hardware implementation of threading models in contemporary GPUs

    You're right, I think in A100 they got up to 1 MMA/WMMA instruction every 2 clocks, maybe that depends on the size/precision variant though... However, AI code has quite a lot of integer multiply-adds for address generation and load instructions, so the instruction decoder might still sometimes...
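The decoder-contention point can be sketched with back-of-envelope issue-slot accounting. The one-MMA-every-two-clocks rate is from the post; the surrounding instruction mix and issue width are assumed illustrative numbers, not measurements:

```python
# Issue-slot accounting sketch (hypothetical instruction mix).
mma_rate = 1 / 2      # MMA/WMMA issued per clock (the post's A100 figure)
other_per_mma = 1.5   # assumed: integer MAs + loads per MMA for addressing
issue_width = 1       # assumed: 1 instruction dispatched per clock per scheduler

# Clocks of issue work needed per MMA (the MMA itself plus its helpers)
# vs. clocks available between consecutive MMAs at the peak rate.
clocks_needed = (1 + other_per_mma) / issue_width   # 2.5
clocks_available = 1 / mma_rate                     # 2.0
decoder_bound = clocks_needed > clocks_available
print(decoder_bound)  # True -> issue/decode can become the bottleneck
```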
  13. Hardware implementation of threading models in contemporary GPUs

    TLDR: NVIDIA conditionals are more expensive than AMD conditionals partly as a result of Volta SIMT which is awesome but overkill for graphics/AI, but they could probably make it ~free with a bit more effort. This is a bump of a very old thread (pre-forum-closing-and-reopening) but a very...
  14. NVIDIA discussion [2024]

    I think Google's original incentive for TPUs was to be a more dedicated ASIC... back when they were designed in 2013(!) and used in production in 2015(!) which is 2++ years before Volta! But the amount of "dedicated AI silicon" has increased significantly every single generation, and I agree...
  15. NVIDIA discussion [2024]

    Skynet is the greatest threat ~~humanity~~ NVIDIA's stock price has ever seen. Jen-Hsun joking: "Room temperature comes in, Jacuzzi comes out" - Related: https://theconversation.com/swimming-pools-could-slash-bills-by-harvesting-heat-from-servers-heres-how-to-make-it-work-221693 (I'm not...