Recent content by TimothyFarrar

  1. GPU Ray-tracing for OpenCL

    Hello Jawed, I would not advise against vector types in general; rather, the best thing to do is to try the various options and profile to see what works best in your given situation. In general, vector loads could be a disadvantage if they increase kernel register count and result in lower...
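    (A minimal sketch of that tradeoff, written in CUDA rather than OpenCL purely for illustration; the kernels and the float4 layout are assumptions, not anything from the thread. The vector variant issues one 16-byte load per value but keeps four floats live in registers, which is exactly the register-count pressure mentioned above, so the honest answer is to build both and profile.)

        // Scalar version: one 4-byte load/store per thread per element.
        __global__ void scale_scalar(const float* in, float* out, int n)
        {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i < n) out[i] = in[i] * 2.0f;
        }

        // Vector version: one 16-byte load/store, but four values held in
        // registers at once, which can lower occupancy if registers run short.
        __global__ void scale_vector(const float4* in, float4* out, int n4)
        {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i < n4) {
                float4 v = in[i];
                v.x *= 2.0f; v.y *= 2.0f; v.z *= 2.0f; v.w *= 2.0f;
                out[i] = v;
            }
        }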
  2. RV870 texture filtering

    The irony in all this is that where the filtering happens is NOT as important as that other number, 272 billion unfiltered 32-bit fetches per sec, under the assumption that the programmer can now bypass the filtering bottleneck using unfiltered texture fetches and directly hit L1 (general...
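    (A hedged CUDA sketch of what "bypassing the filter" looks like from the programmer's side; texObj and its setup are assumed. With the sampler set to point filtering, each fetch returns one raw 32-bit texel instead of a bilinear blend, so throughput is bounded by the fetch/cache path rather than by the filtering units.)

        // Assumes texObj was created over a 32-bit float texture with
        // cudaTextureDesc::filterMode = cudaFilterModePoint (no filtering)
        // and non-normalized coordinates.
        __global__ void copy_unfiltered(cudaTextureObject_t texObj,
                                        float* out, int width, int height)
        {
            int x = blockIdx.x * blockDim.x + threadIdx.x;
            int y = blockIdx.y * blockDim.y + threadIdx.y;
            if (x < width && y < height) {
                // One unfiltered 32-bit fetch per thread: the texel comes back
                // exactly as stored, with no bilinear weighting applied.
                out[y * width + x] = tex2D<float>(texObj, x + 0.5f, y + 0.5f);
            }
        }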
  3. RV870 texture filtering

    Yeah, that is part of what I was wondering about ... how does nearest-point sampling of a 32-bit-per-pixel texture perform versus gather4 (or fetch4, or whatever the correct term is for DX11)?
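    (For reference, a small CUDA-flavoured sketch of the access pattern being asked about; texObj and the kernel are assumptions. gather4/fetch4 returns the four texels of a 2x2 footprint from a single request, which is what the four explicit point samples below reproduce one fetch at a time.)

        // Four explicit nearest-point samples of one 2x2 footprint. A gather4
        // style fetch (tex2Dgather in CUDA, Gather in D3D) would return these
        // same four texels, unblended, from a single request.
        __global__ void sample_footprint(cudaTextureObject_t texObj,
                                         float4* out, int width, int height)
        {
            int x = blockIdx.x * blockDim.x + threadIdx.x;
            int y = blockIdx.y * blockDim.y + threadIdx.y;
            if (x >= width - 1 || y >= height - 1) return;

            float a = tex2D<float>(texObj, x + 0.5f, y + 0.5f);
            float b = tex2D<float>(texObj, x + 1.5f, y + 0.5f);
            float c = tex2D<float>(texObj, x + 0.5f, y + 1.5f);
            float d = tex2D<float>(texObj, x + 1.5f, y + 1.5f);
            out[y * width + x] = make_float4(a, b, c, d);
        }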
  4. RV870 texture filtering

    Anyone want to fully clarify what "texture interpolators" actually means? Besides the confusing Anandtech article, HardOCP says "There are now 80 texture units and the HD 5870 can perform 68 billion bilinear filtered texels per second and up to 272 billion 32-bit fetches per sec.", implying...
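    (As a sanity check on those two figures, assuming the HD 5870's 850 MHz core clock: 80 texture units x 0.85 GHz is 68 billion bilinear results per second, and since each bilinear result consumes four 32-bit texels, 68 x 4 = 272 billion 32-bit fetches per second. The two numbers describe the same hardware, quoted after and before the filtering step respectively.)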
  5. RV870 texture filtering

    Thanks! Hardware.fr measured the triangle setup rate (tri/s) as the same as R700's. EDIT: same as in 1 tri/clock
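    (For scale, 1 triangle per clock at the 5870's 850 MHz core clock puts the theoretical setup ceiling at 850 Mtri/s; the "same as R700" statement is about the per-clock rate, not the absolute number.)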
  6. RV870 texture filtering

    So vertex throughput as in triangle setup? Anyone see any test results that actually show peak triangle setup rates on the 5870?
  7. Nvidia GT300 core: Speculation

    Seems to me that "Indexable registers" have been provided by what CUDA refers to as "shared memory" since the G80 chip.
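    (A minimal CUDA sketch of that point; the kernel and sizes are assumptions. Ordinary registers cannot be addressed with a runtime-computed index, but a per-thread slice of __shared__ memory can, which is why shared memory has effectively played the role of an indexable register file since G80.)

        #define SLOTS_PER_THREAD 8

        __global__ void indexable_scratch(const int* indices, float* out)
        {
            // Each thread owns an 8-entry scratch area in shared memory
            // (assumes blockDim.x <= 256, i.e. 8 KB of shared memory total).
            __shared__ float scratch[256 * SLOTS_PER_THREAD];
            float* mine = &scratch[threadIdx.x * SLOTS_PER_THREAD];

            for (int s = 0; s < SLOTS_PER_THREAD; ++s)
                mine[s] = (float)s;

            // Runtime-computed index: legal and fast against shared memory,
            // whereas indexing a register array this way forces the compiler
            // to spill it to (slow) local memory.
            int tid = blockIdx.x * blockDim.x + threadIdx.x;
            int i = indices[tid] & (SLOTS_PER_THREAD - 1);
            out[tid] = mine[i];
        }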
  8. NVIDIA shows signs ... [2008 - 2017]

    It sure is possible on current GPU hardware, and you would not use a linked list (a linked list is a painfully bad data structure for coherency and parallel processing on the CPU as well). No, it is quite different. Programming a cluster is typically all about message passing and limiting...
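    (A hedged CUDA illustration of why the linked list loses; the node layout and kernels are assumptions. Pointer chasing makes every step a dependent load from a scattered address, while the same data laid out flat and interleaved lets a warp's loads coalesce.)

        struct Node { float value; Node* next; };   // scattered, pointer-chased

        // Each thread walks its own list: every iteration stalls on a dependent
        // load from an effectively random address, so coalescing never happens.
        __global__ void sum_lists(Node* const* heads, float* out, int n)
        {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i >= n) return;
            float s = 0.0f;
            for (const Node* p = heads[i]; p != nullptr; p = p->next)
                s += p->value;
            out[i] = s;
        }

        // Same data flattened and interleaved (element j of list i lives at
        // values[j * n + i]): at each step adjacent threads read adjacent
        // addresses, so the loads collapse into a few wide transactions.
        __global__ void sum_interleaved(const float* values, const int* lengths,
                                        float* out, int n)
        {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i >= n) return;
            float s = 0.0f;
            for (int j = 0; j < lengths[i]; ++j)
                s += values[j * n + i];
            out[i] = s;
        }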
  9. NVIDIA shows signs ... [2008 - 2017]

    Are you sure about that? For example, it seems like there is an unspoken assumption here that because you haven't seen complex data structures on the GPU, that somehow isn't possible? IMO it is much easier to build a high-performance multi-reader multi-writer atomic queue for the GPU than it is...
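    (A minimal sketch of the kind of multi-reader/multi-writer queue meant here, with all names assumed and a bounded, power-of-two capacity taken to be large enough not to wrap live entries. Producers and consumers each claim a slot with one atomicAdd on a cursor, and a per-slot flag publishes the data; a production version needs more care around wrap-around and forward progress, but this is the core.)

        struct Queue {
            float*        items;     // capacity entries
            unsigned int* ready;     // 0 = empty, 1 = written
            unsigned int* writeCur;  // next slot to claim for writing
            unsigned int* readCur;   // next slot to claim for reading
            unsigned int  capacity;  // power of two
        };

        __device__ void push(Queue q, float v)
        {
            unsigned int slot = atomicAdd(q.writeCur, 1u) & (q.capacity - 1u);
            q.items[slot] = v;
            __threadfence();                 // make the item visible first...
            atomicExch(&q.ready[slot], 1u);  // ...then publish it
        }

        __device__ float pop(Queue q)
        {
            unsigned int slot = atomicAdd(q.readCur, 1u) & (q.capacity - 1u);
            while (atomicCAS(&q.ready[slot], 1u, 0u) != 1u)
                ;                            // spin until a producer publishes
            __threadfence();                 // then read the item it wrote
            return q.items[slot];
        }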
  10. Centralized Texturing

    Yes, memory is duplicated in TUs in both cases. The difference is the interconnect network needed in the distributed case. Jawed, I know you want to extend this topic into joint distributed TEX/RBE(ROP) units... which might make it easier to realize an advantage in the interconnect network...
  11. Centralized Texturing

    Shouldn't this rather be called distributed texturing? Central anything = serial bottleneck. Isn't one of the primary advantages of having localized texture units that you don't have to have a huge on-chip interconnect to transfer texture request packets around? Texture requests have a large...
  12. AMD: R8xx Speculation

    BTW, thanks for the ATI details!
  13. AMD: R8xx Speculation

    That is exactly the point I was attempting to make! Right, I wasn't implying that tessellation creates a new problem. It will likely be more frequent however...
  14. AMD: R8xx Speculation

    Yes, the order of writes is arbitrary. However, if read/write passes through the RBEs, and if RBEs shared tiles (or cachelines, or whatever), they would have to later join the results of those accesses on the shared tiles (rough coherency: colliding write order doesn't matter, but you cannot lose writes when...
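    (The "order doesn't matter, but nothing may be lost" contract is the same one software gets from atomic read-modify-write; a tiny CUDA analogue, with every name assumed:)

        // Many threads collide on the same tile word. The final value is
        // independent of which update "wins" any ordering race, because no
        // update can be dropped.
        __global__ void accumulate_coverage(unsigned int* tileMask,
                                            const int* bit, int n)
        {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i < n)
                atomicOr(tileMask, 1u << (bit[i] & 31));
        }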
  15. AMD: R8xx Speculation

    Hello Jawed. If they went this way, it would be interesting to see how they handle general RT read/write access (UAVs). Typically one would assume that RTs are divided into tiles and those tiles are distributed across the RBEs (ROPs) with a fixed mapping per RT. So the raster stage just sends out...
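    (A sketch of the kind of fixed tile-to-RBE mapping described above, with the tile size, RBE count, and function name all assumed. The point is only that the owner of a pixel is a pure function of its screen position, so the raster stage can route work with no arbitration or shared state.)

        #define TILE_SIZE 8   // assumed 8x8 pixel tiles
        #define NUM_RBES  4   // assumed number of back-ends

        // Pure function of screen position -> owning RBE.
        __host__ __device__ inline int owning_rbe(int x, int y, int rtWidthInTiles)
        {
            int tileX = x / TILE_SIZE;
            int tileY = y / TILE_SIZE;
            return (tileY * rtWidthInTiles + tileX) % NUM_RBES;
        }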