Recent content by TimothyFarrar

  1. GPU Ray-tracing for OpenCL

    Hello Jawed, I would not advise against vector types in general; rather, the best thing to do is to try the various options and profile to see what works best in your given situation. In general, vector loads could be a disadvantage if they increase kernel register count and result in lower...
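    (A minimal sketch of that tradeoff, written in CUDA rather than OpenCL purely for illustration; the kernels and the float4 layout are assumptions, not anything from the thread. The vector variant issues one 16-byte load per value but keeps four floats live in registers, which is exactly the register-count pressure mentioned above, so the honest answer is to build both and profile.)

        // Scalar version: one 4-byte load/store per thread per element.
        __global__ void scale_scalar(const float* in, float* out, int n)
        {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i < n) out[i] = in[i] * 2.0f;
        }

        // Vector version: one 16-byte load/store, but four values held in
        // registers at once, which can lower occupancy if registers run short.
        __global__ void scale_vector(const float4* in, float4* out, int n4)
        {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i < n4) {
                float4 v = in[i];
                v.x *= 2.0f; v.y *= 2.0f; v.z *= 2.0f; v.w *= 2.0f;
                out[i] = v;
            }
        }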
  2. RV870 texture filtering

    The irony in all this is that where the filtering happens is NOT as important as that other number, 272 billion unfiltered 32-bit fetches per sec, under the assumption that the programmer can now bypass the filtering bottleneck using unfiltered texture fetches and directly hit L1 (general...
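    (A hedged CUDA sketch of what "bypassing the filter" looks like from the programmer's side; texObj and its setup are assumed. With the sampler set to point filtering, each fetch returns one raw 32-bit texel instead of a bilinear blend, so throughput is bounded by the fetch/cache path rather than by the filtering units.)

        // Assumes texObj was created over a 32-bit float texture with
        // cudaTextureDesc::filterMode = cudaFilterModePoint (no filtering)
        // and non-normalized coordinates.
        __global__ void copy_unfiltered(cudaTextureObject_t texObj,
                                        float* out, int width, int height)
        {
            int x = blockIdx.x * blockDim.x + threadIdx.x;
            int y = blockIdx.y * blockDim.y + threadIdx.y;
            if (x < width && y < height) {
                // One unfiltered 32-bit fetch per thread: the texel comes back
                // exactly as stored, with no bilinear weighting applied.
                out[y * width + x] = tex2D<float>(texObj, x + 0.5f, y + 0.5f);
            }
        }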
  3. RV870 texture filtering

    Yeah, that is part of what I was wondering about ... how does nearest-point sampling of a 32-bit-per-pixel texture perform versus gather4 (or fetch4, or whatever the correct term is for DX11)?
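    (For reference, a small CUDA-flavoured sketch of the access pattern being asked about; texObj and the kernel are assumptions. gather4/fetch4 returns the four texels of a 2x2 footprint from a single request, which is what the four explicit point samples below reproduce one fetch at a time.)

        // Four explicit nearest-point samples of one 2x2 footprint. A gather4
        // style fetch (tex2Dgather in CUDA, Gather in D3D) would return these
        // same four texels, unblended, from a single request.
        __global__ void sample_footprint(cudaTextureObject_t texObj,
                                         float4* out, int width, int height)
        {
            int x = blockIdx.x * blockDim.x + threadIdx.x;
            int y = blockIdx.y * blockDim.y + threadIdx.y;
            if (x >= width - 1 || y >= height - 1) return;

            float a = tex2D<float>(texObj, x + 0.5f, y + 0.5f);
            float b = tex2D<float>(texObj, x + 1.5f, y + 0.5f);
            float c = tex2D<float>(texObj, x + 0.5f, y + 1.5f);
            float d = tex2D<float>(texObj, x + 1.5f, y + 1.5f);
            out[y * width + x] = make_float4(a, b, c, d);
        }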
  4. RV870 texture filtering

    Anyone want to fully clarify what "texture interpolators" actually means? Besides the confusing Anandtech article, HardOCP says "There are now 80 texture units and the HD 5870 can perform 68 billion bilinear filtered texels per second and up to 272 billion 32-bit fetches per sec.", implying...
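    (As a sanity check on those two figures, assuming the HD 5870's 850 MHz core clock: 80 texture units x 0.85 GHz is 68 billion bilinear results per second, and since each bilinear result consumes four 32-bit texels, 68 x 4 = 272 billion 32-bit fetches per second. The two numbers describe the same hardware, quoted after and before the filtering step respectively.)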
  5. RV870 texture filtering

    Thanks! Hardware.fr measured the triangle setup rate (tri/s) as the same as R700's. EDIT: same as in 1 tri/clock
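    (For scale, 1 triangle per clock at the 5870's 850 MHz core clock puts the theoretical setup ceiling at 850 Mtri/s; the "same as R700" statement is about the per-clock rate, not the absolute number.)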
  6. RV870 texture filtering

    So vertex throughput as in triangle setup? Anyone see any test results that actually show peak triangle setup rates on the 5870?
  7. Nvidia GT300 core: Speculation

    Seems to me that "Indexable registers" have been provided by what CUDA refers to as "shared memory" since the G80 chip.
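    (A minimal CUDA sketch of that point; the kernel and sizes are assumptions. Ordinary registers cannot be addressed with a runtime-computed index, but a per-thread slice of __shared__ memory can, which is why shared memory has effectively played the role of an indexable register file since G80.)

        #define SLOTS_PER_THREAD 8

        __global__ void indexable_scratch(const int* indices, float* out)
        {
            // Each thread owns an 8-entry scratch area in shared memory
            // (assumes blockDim.x <= 256, i.e. 8 KB of shared memory total).
            __shared__ float scratch[256 * SLOTS_PER_THREAD];
            float* mine = &scratch[threadIdx.x * SLOTS_PER_THREAD];

            for (int s = 0; s < SLOTS_PER_THREAD; ++s)
                mine[s] = (float)s;

            // Runtime-computed index: legal and fast against shared memory,
            // whereas indexing a register array this way forces the compiler
            // to spill it to (slow) local memory.
            int tid = blockIdx.x * blockDim.x + threadIdx.x;
            int i = indices[tid] & (SLOTS_PER_THREAD - 1);
            out[tid] = mine[i];
        }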
  8. NVIDIA shows signs ... [2008 - 2017]

    It sure is possible on current GPU hardware, and you would not use a linked list (a linked list is a painfully bad data structure for coherency and parallel processing on the CPU as well). No, it is quite different. Programming a cluster is typically all about message passing and limiting...
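    (A hedged CUDA illustration of why the linked list loses; the node layout and kernels are assumptions. Pointer chasing makes every step a dependent load from a scattered address, while the same data laid out flat and interleaved lets a warp's loads coalesce.)

        struct Node { float value; Node* next; };   // scattered, pointer-chased

        // Each thread walks its own list: every iteration stalls on a dependent
        // load from an effectively random address, so coalescing never happens.
        __global__ void sum_lists(Node* const* heads, float* out, int n)
        {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i >= n) return;
            float s = 0.0f;
            for (const Node* p = heads[i]; p != nullptr; p = p->next)
                s += p->value;
            out[i] = s;
        }

        // Same data flattened and interleaved (element j of list i lives at
        // values[j * n + i]): at each step adjacent threads read adjacent
        // addresses, so the loads collapse into a few wide transactions.
        __global__ void sum_interleaved(const float* values, const int* lengths,
                                        float* out, int n)
        {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i >= n) return;
            float s = 0.0f;
            for (int j = 0; j < lengths[i]; ++j)
                s += values[j * n + i];
            out[i] = s;
        }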
  9. NVIDIA shows signs ... [2008 - 2017]

    Are you sure about that? For example, it seems like there is an unspoken assumption here that because you haven't seen complex data structures on the GPU, that somehow isn't possible? IMO it is much easier to build a high-performance multi-reader multi-writer atomic queue for the GPU than it is...
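    (A minimal sketch of the kind of multi-reader/multi-writer queue meant here, with all names assumed and a bounded, power-of-two capacity taken to be large enough not to wrap live entries. Producers and consumers each claim a slot with one atomicAdd on a cursor, and a per-slot flag publishes the data; a production version needs more care around wrap-around and forward progress, but this is the core.)

        struct Queue {
            float*        items;     // capacity entries
            unsigned int* ready;     // 0 = empty, 1 = written
            unsigned int* writeCur;  // next slot to claim for writing
            unsigned int* readCur;   // next slot to claim for reading
            unsigned int  capacity;  // power of two
        };

        __device__ void push(Queue q, float v)
        {
            unsigned int slot = atomicAdd(q.writeCur, 1u) & (q.capacity - 1u);
            q.items[slot] = v;
            __threadfence();                 // make the item visible first...
            atomicExch(&q.ready[slot], 1u);  // ...then publish it
        }

        __device__ float pop(Queue q)
        {
            unsigned int slot = atomicAdd(q.readCur, 1u) & (q.capacity - 1u);
            while (atomicCAS(&q.ready[slot], 1u, 0u) != 1u)
                ;                            // spin until a producer publishes
            __threadfence();                 // then read the item it wrote
            return q.items[slot];
        }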
  10. Centralized Texturing

    Yes, memory is duplicated in TUs in both cases. The difference is the interconnect network needed in the distributed case. Jawed, I know you want to extend this topic into joint distributed TEX/RBE(ROP) units... which might make it easier to realize an advantage in the interconnect network...
  11. Centralized Texturing

    Shouldn't this rather be called distributed texturing? Central anything = serial bottleneck. Isn't one of the primary advantages of having localized texture units that you don't have to have a huge on-chip interconnect to transfer texture request packets around? Texture requests have a large...
  12. AMD: R8xx Speculation

    BTW, thanks for the ATI details!
  13. AMD: R8xx Speculation

    That is exactly the point I was attempting to make! Right, I wasn't implying that tessellation creates a new problem. It will likely be more frequent however...
  14. AMD: R8xx Speculation

    Yes, the order of writes is arbitrary. However, if read/write passes through the RBEs, and if RBEs shared tiles (or cachelines, or whatever), they would have to later join the results of those accesses on the shared tiles (rough coherency: colliding write order doesn't matter, but you cannot lose writes when...
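    (The "order doesn't matter, but nothing may be lost" contract is the same one software gets from atomic read-modify-write; a tiny CUDA analogue, with every name assumed:)

        // Many threads collide on the same tile word. The final value is
        // independent of which update "wins" any ordering race, because no
        // update can be dropped.
        __global__ void accumulate_coverage(unsigned int* tileMask,
                                            const int* bit, int n)
        {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i < n)
                atomicOr(tileMask, 1u << (bit[i] & 31));
        }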
  15. AMD: R8xx Speculation

    Hello Jawed. If they went this way, it would be interesting to see how they handle general RT read/write access (UAVs). Typically one would assume that RTs are divided into tiles and those tiles are distributed across the RBEs (ROPs) with a fixed mapping per RT. So the raster stage just sends out...
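    (A sketch of the kind of fixed tile-to-RBE mapping described above, with the tile size, RBE count, and function name all assumed. The point is only that the owner of a pixel is a pure function of its screen position, so the raster stage can route work with no arbitration or shared state.)

        #define TILE_SIZE 8   // assumed 8x8 pixel tiles
        #define NUM_RBES  4   // assumed number of back-ends

        // Pure function of screen position -> owning RBE.
        __host__ __device__ inline int owning_rbe(int x, int y, int rtWidthInTiles)
        {
            int tileX = x / TILE_SIZE;
            int tileY = y / TILE_SIZE;
            return (tileY * rtWidthInTiles + tileX) % NUM_RBES;
        }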