Re: CELL + nVidia = shaded RTRT potential
nAo said:
PZ said:
Wouldn't the CELL + nVidia ROP provide the essential combination for this to happen?
The APUs' local memory is so small that one can't expect to run a plain RT implementation on them without issuing a lot of external memory references via DMA requests. That's the kind of thing one doesn't want happening on a stream processor.
Maybe an advanced RT implementation on CELL could exploit ray coherence, grouping queries on some spatial subdivision structure to avoid a lot of scattered memory reads; otherwise the A|SPUs' built-in DMA prefetch mechanism would be ineffective!
Yes, you are right that the APU memory is limited. But I am not ready to give up yet. Here are some reasons supporting the potential for a ray-tracing form of rendering on a CELL + nVidia architecture:
1) As Jaws mentioned, GPU ray tracers already exist as proofs of concept of ray tracing on a streaming processor. They also highlight just how wonky it is to use GPUs as ray tracing engines. William Dally, talking about the Imagine stream processor from Stanford, also explicitly mentioned ray tracing on a streaming architecture. I agree it is not that great a match for RT, but it is a potential avenue.
2) There are opportunities for coherence in rays, especially rays with common origins (shooting multiple shadow rays from one point to multiple lights) or rays with common directions (see the sketch at the end of this post).
3) Hofstee, in his video, mentions global illumination (as I recall).
4) The paper I linked to above also includes some discussion of memory bandwidth and access patterns. They indicate that ray tracing does require fairly random access patterns, but that this keeps the different banks of memory fully utilized instead of everything clobbering one chunk.
Their ray tracing pipeline requires 75 KB of memory per pipeline, which could fit in an APU's local memory, plus an overall cache to prevent triangles from being intersected multiple times. They have 64 threads in flight at a time. They estimate a worst-case bandwidth of 2 GB/s for 1024x768 at 60 Hz, which is easy for Cell (see the quick check below).
They mention that their 90 MHz FPGA is equivalent to an 8-12 GHz CPU and about 3-5x a GPU implementation because of its special-purpose design. Cell is not that special purpose, but it does have part of the streaming advantage along with the clock-speed advantage. As you keenly mentioned, the critical element is the special bits they added for keeping track of traced objects and rebuilding the ray tracing object hierarchy as objects move. These kinds of things, special purpose to their chip and not efficiently portable, are the unknowns. If these unknowns turn out to be critical then yes, the dream is over (for now).
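On the bandwidth figure specifically, here is a quick back-of-the-envelope check. The arithmetic is mine, not the paper's, and I'm assuming decimal gigabytes:

[code]
#include <stdio.h>

int main(void)
{
    /* Assumed worst-case figure from the paper: 2 GB/s at 1024x768, 60 Hz. */
    const double width  = 1024.0;
    const double height = 768.0;
    const double hz     = 60.0;
    const double bw     = 2.0e9;                  /* 2 GB/s, decimal GB assumed  */

    double pixels_per_sec  = width * height * hz; /* ~47.2 Mpixels/s             */
    double bytes_per_pixel = bw / pixels_per_sec; /* ~42 bytes/pixel, worst case */

    printf("%.1f Mpixels/s, ~%.0f bytes of external traffic per pixel\n",
           pixels_per_sec / 1e6, bytes_per_pixel);
    return 0;
}
[/code]

That works out to roughly 42 bytes of external traffic per pixel in the worst case, which is consistent with the "easy for Cell" claim.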
Ray tracing is not sooo bad in this respect because it is completely demand driven: you only process pixels/triangles that contribute to the final scene, and in that sense you only ask of the memory what is relevant.
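To make point 2 a bit more concrete, here is roughly the kind of packetised traversal I have in mind. This is purely a sketch: the structures, the flat array standing in for external memory, and the pretend-DMA are all made up, and a real version would overlap the node fetches with the intersection work.

[code]
#include <string.h>

/* Sketch only: one DMA of an acceleration-structure node is amortised over a
 * whole packet of coherent rays (e.g. shadow rays from one shading point to
 * several lights), instead of streaming the structure once per ray. */

typedef struct { float org[3], dir[3], tmax; } Ray;
typedef struct { float lo[3], hi[3]; int child[2]; int leaf; } BVHNode;

#define NUM_NODES 1024
static BVHNode external_bvh[NUM_NODES];   /* stands in for external main memory */

/* Pretend-DMA: a real SPU version would issue an asynchronous get into local
 * store here, double-buffered against the intersection work below. */
static void dma_fetch_node(BVHNode *local, int idx)
{
    memcpy(local, &external_bvh[idx], sizeof *local);
}

/* Conservative test: does ANY ray in the packet hit the node's bounding box?
 * (Real code would use the shared origin to cheapen this further.) */
static int packet_hits_bbox(const Ray *rays, int n, const BVHNode *nd)
{
    for (int i = 0; i < n; ++i) {
        float tmin = 0.0f, tmax = rays[i].tmax;
        int hit = 1;
        for (int a = 0; a < 3; ++a) {
            float inv = 1.0f / rays[i].dir[a];
            float t0  = (nd->lo[a] - rays[i].org[a]) * inv;
            float t1  = (nd->hi[a] - rays[i].org[a]) * inv;
            if (t0 > t1) { float t = t0; t0 = t1; t1 = t; }
            if (t0 > tmin) tmin = t0;
            if (t1 < tmax) tmax = t1;
            if (tmin > tmax) { hit = 0; break; }
        }
        if (hit) return 1;
    }
    return 0;
}

static void intersect_leaf(Ray *rays, int n, const BVHNode *nd)
{
    (void)rays; (void)n; (void)nd;        /* triangle tests omitted */
}

/* Trace a packet of coherent rays: each node is fetched once and tested
 * against every ray in the packet before moving on. */
void trace_packet(Ray *rays, int n, int root)
{
    int stack[64], sp = 0;
    BVHNode node;                         /* lives in the APU's local store */

    stack[sp++] = root;
    while (sp > 0) {
        dma_fetch_node(&node, stack[--sp]);      /* one external read per node  */
        if (!packet_hits_bbox(rays, n, &node))
            continue;                            /* whole packet culled at once */
        if (node.leaf)
            intersect_leaf(rays, n, &node);
        else if (sp + 2 <= 64) {
            stack[sp++] = node.child[0];
            stack[sp++] = node.child[1];
        }
    }
}
[/code]

The point is just that one external fetch of a node is amortised over the whole packet, which is exactly the kind of coherence nAo is asking for.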
DeanoC once wrote that a future PS4 vs. XBOX3 comparison would come down to external memory latencies and not raw flop/s figures... I couldn't agree more.
One key performance figure I would like to know about the A|SPUs is the time needed to switch threads; it would be nice to have some kind of support for fast register saving and restoring, or fast register bank switching.
The A|SPUs' flexibility is a given... one can do everything; the real problem is doing everything without the A|SPUs sitting idle most of the time. A rough sketch of the kind of software latency hiding I mean follows.
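This is only a sketch; the names are invented and the "DMA" here is a plain memcpy standing in for whatever asynchronous mechanism the A|SPUs actually expose. The idea is simply double buffering: batch k+1 is (conceptually) in flight while batch k is being processed, so the APU never stalls waiting on main memory.

[code]
#include <string.h>

#define BATCH 64

typedef struct { float org[3], dir[3], tmax; } Ray;

/* Stand-in: a real SPU version would issue an asynchronous, tagged DMA get
 * here and only block on that tag in dma_wait(), which is what actually
 * hides the external-memory latency. */
static void dma_get_async(Ray *local, const Ray *external, int count, int tag)
{
    (void)tag;
    memcpy(local, external, count * sizeof(Ray));
}

static void dma_wait(int tag) { (void)tag; }

static void process_batch(Ray *rays, int n)
{
    (void)rays; (void)n;                  /* traversal / shading work goes here */
}

/* Double-buffered consumption of a ray stream (assumes total is a multiple
 * of BATCH): while batch k is processed, batch k+1 is already being fetched. */
void process_stream(const Ray *external_rays, int total)
{
    static Ray buf[2][BATCH];
    int cur = 0;

    dma_get_async(buf[cur], external_rays, BATCH, cur);

    for (int done = 0; done < total; done += BATCH) {
        int next = cur ^ 1;
        if (done + BATCH < total)          /* kick off the next fetch...        */
            dma_get_async(buf[next], external_rays + done + BATCH, BATCH, next);

        dma_wait(cur);                     /* ...then wait only for the current */
        process_batch(buf[cur], BATCH);
        cur = next;
    }
}
[/code]

Whether this is enough, or whether fast hardware thread/register-bank switching is also needed, is exactly the figure I'd like to see.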
Yes, I think if you look at the supercomputer market you can see that Cray has made advances not just through pure speed but because the data transfer systems in their machines are low latency (the lowest around).
I also think that Longhorn has some specific requirements about rendering latency and about handling huge numbers of threads (thousands). I believe they want every icon to be in 3D and in its own thread. Stuff like that must cause nVidia and ATI a lot of headaches.
Could this be why they seemingly won't use nVidia for the whole graphics pipeline, so that they can leave this road open?
Why are you saying they seemingly won't use nVidia IP for the whole pipeline? I don't think we have any data about that, or am I missing something here?
Your question is not unexpected anyway; at some point GPU manufacturers will have to introduce some very fast and flexible calculation cores... maybe something along the lines of the CELL architecture could be one of the answers, and NVIDIA would like to use it.
Too bad I think that's not the case... I'm still skeptical about CELL stuff being in the PS3 GPU.
ciao,
Marco
I am just taking a hint from the patent that shows Cell on the top side and the Pixel Engine on the bottom side, as well as an image from a patent (not sure which one) that shows the APUs set up as a vertex processing pipeline. Not an airtight case by any means, but it's all we have so far.