Recent content by nAo

  1. N

    GTC 2024

    Damn, I wish I was down in CA! Enjoy GTC :-)
  2. N

    AMD Execution Thread [2023]

    Both TensorRT and TensorRT-LLM are open source: https://github.com/NVIDIA/TensorRT https://github.com/NVIDIA/TensorRT-LLM
  3. N

    AMD Execution Thread [2023]

    How big is MI300?
  4. N

    AMD made a mistake letting NVIDIA drive Ray Tracing APIs

    The idea of opening up the BVH formats is appealing on paper, especially for very simple HW implementations, but there is a huge hidden cost in doing so. Once it's out, like with an ISA, you have to support it forever, on all past, present and future HW. This makes even less sense for something...
  5. N

    AMD made a mistake letting NVIDIA drive Ray Tracing APIs

    How would exposing a ray-triangle intersection instruction help accelerate Nanite? It would affect a fraction of some code that is already a relatively small fraction of the total frame time. Doesn't sound like much of a big deal. Moreover, would it really accelerate anything when it would...
  6. N

    GART: Games and Applications using RayTracing

    DMMs are not based on this paper.
  7. N

    AMD RX 7900XTX and RX 7900XT Reviews

    The fact it can achieve very high clocks on irregular workloads (blender?) where the CUs might be stalling a lot (i.e. not consuming as much power) could suggest that it is using more power than expected. OTOH on gaming workloads clocks are lower because it's much better utilized and it's more...
  8. N

    AMD RX 7900XTX and RX 7900XT Reviews

    It takes significant architectural effort to add 30% more cores and get 30% more performance. It ain't easy :)
  9. N

    AMD RDNA3 Specifications Discussion Thread

    Are there any numbers out there for the power overhead due to going with 6 chiplets for I/O?
  10. N

    GPU Ray Tracing Performance Comparisons [2021-2022]

    The SER API is public, you can download the SER SDK and use it right away: https://developer.nvidia.com/blog/improve-shader-performance-and-in-game-frame-rates-with-shader-execution-reordering/ SER in-depth whitepaper...
  11. N

    RDNA 2 Ray Tracing

    Not all implementations are the same :)
  12. N

    RDNA 2 Ray Tracing

    I wouldn't call a 16 entry stack a short stack. There are a few papers out there showing you can get great perf with fewer than 7-8 entries, for instance: https://www.embree.org/papers/2019-HPG-ShortStack.pdf
  13. N

    AMD: RDNA 3 Speculation, Rumours and Discussion

    You don't go very far with pure brute force RT as the computational and bandwidth costs would be insanely high :)
  14. N

    AMD: RDNA 3 Speculation, Rumours and Discussion

    No they don't. Whether upscaling is used or not the tensor cores also predict how to best fuse together the output of the optical flow generator with the optical flow/motion vectors coming from the application. It's well described here...
  15. N

    Polygons, voxels, SDFs... what will our geometry be made of in the future?

    I am not sure I follow you here, but intersecting a triangle or an AABB should not be any worse, latency wise, than issuing a texture sampler instruction. AFAIR RDNA2 sends the ray and AABB/triangle data to the HW intersectors, so it's not like the latter have to fetch anything from memory...