Recent content by Ext3h

  1. Ext3h

    Speculation and Rumors: Nvidia Blackwell ...

    That would be my expectation, yes. If there is any hardware, all 3 formats are supported. Uncertain about that, not on consumer-grade silicon. 64 GB/s would be over-engineered for several more SSD generations. But I do expect that non-consumer silicon does achieve that data rate, and be it only...
  2. Ext3h

    Speculation and Rumors: Nvidia Blackwell ...

    Not bitstream compatible, but only a couple of bit shuffles and extra padding bits away from being so. Also more constrained. If you have a hardware deflate decompressor, it is trivial to extend it for GDeflate. But it does require patching the frontend. Same as LZ4 is mostly just Deflate with a...
  3. Ext3h

    Speculation and Rumors: Nvidia Blackwell ...

    What do you mean by mistake? Take a look at the timeline. By the point NVIDIA sold/licensed GDeflate to Microsoft, they already had the hardware decompression unit ready. Does "Hardware accelerated JPEG decompression" in Hopper ring a bell? They had deflate support 2 years ago. GDeflate nicely fits...
  4. Ext3h

    Speculation and Rumors: Nvidia Blackwell ...

    Worth mentioning that the CPU typically can't handle the data rates necessary due to being too constrained in cache size / memory bandwidth to run decompression with large dictionaries on all cores simultaneously. Even spilling to L2 cache doesn't scale well on the CPU, LZ4 is actually even...
  5. Ext3h

    Shader Compilation on PC: About to become a bigger bottleneck?

    It actually didn't feel like Horizon Forbidden West managed to get around the stuttering issues at all, despite PSO collection. Even in the intro cinematic scenes, there are plenty of assets just popping in, and the stutter in some of the level transitions is massive. Can't tell whether it's...
  6. Ext3h

    Next gen lighting technologies - voxelised, traced, and everything else *spawn*

    https://github.com/NVIDIAGameWorks/SHARC/blob/180f825f5fbae03aaaa593727d80d1094f913f00/shaders/Include/HashGridCommon.h#L103 Didn't use surfels. Just a subdivision of the voxel into 8 axis aligned primary normal orientations. And it's simply recording voxels in world coordinates. With no extra...
  7. Ext3h

    Next gen lighting technologies - voxelised, traced, and everything else *spawn*

    Is SHaRC spatial only, or direction aware? Asking because the demo case appeared to have painstakingly avoided geometry where different lighting conditions would have applied to the front/back facing side of a geometry thinner than the voxel size. Simply didn't bother yet to take a look at the actual...
  8. Ext3h

    RDNA4

    Which is perfectly fine for rendering in screen-space, I suppose? You are still traversing after all, you did amortize your costs on the cluster level and had that capped to a constant number of clusters. However, once you leave screen-space or any form of coherent perspective projection and go...
  9. Ext3h

    RDNA4

    Likewise. CU local, but actually dedicated to this functionality as it does need to evaluate some logic for the actual cache lookup. Just a rough guesstimate, but about 100 entries (assuming an up to 10 level deep BVH tree up to the final triangle) should be able to achieve a somewhat steady level...
  10. Ext3h

    RDNA4

    It doesn't :) A cached nearest-hit is just a (yet to be confirmed) any-hit (or no-match due to rejection of the entire BLAS / flag based rejection) for another ray. So it's safe to share within a TLAS. Only when accepted, it's very likely to be not just a random any-hit but actually a good...
  11. Ext3h

    RDNA4

    Okay, so the patent exactly matches what you'd naively assume in 1 man-hour of brainstorming that a cache in that spot would do. That being patented is extremely bad news then, as working around that patent appears to be quite hard, and that guiding effect is a basic necessity to turn it...
  12. Ext3h

    RDNA4

    That's probably something different. If you have a cached path all the way to the most recently used triangle soup (or a non-exposed sub-group inside such), and you end up finding a hit there for an (at least somewhat) coherent ray, it's an instant, "free" hit without having to repeat any part...
  13. Ext3h

    RDNA4

    So far it's plausible though. Unless we see new instructions which hint at a fully offloaded, fixed function traversal approach, it's still going to be the same limitations as with RDNA3 - too much register pressure, too much synchronous alternation between hit shaders and triangle soup...
  14. Ext3h

    DirectStorage GPU Decompression, RTX IO, Smart Access Storage

    There was a catch - the NVMe spec didn't specify that different queues should have been flagged to permit out-of-order transfers so that a stalling CPU memory controller wouldn't prevent transfers to the GPU and vice versa. It didn't matter when you use one queue per CPU core (which simplifies...
  15. Ext3h

    DirectStorage GPU Decompression, RTX IO, Smart Access Storage

    Actually you don't. Well, kind of don't. You can just take 95% of the WebP codec but substitute the final deflate implementation with GDeflate. Everything else in that codec is already a perfect fit for the GPU. And suddenly you've got it on the GPU, with all the benefits. Benefits such as: Mipmap...