Xenos as Physics Processor?

Nemo80 said:
I don't know which Ruby you are talking about, but the one I know ran on my 9800 at about 15 fps, so I wouldn't expect anything less.

Anyway, it's DX9 all over, with likely zero CPU utilization, so porting a simple demo using the same API is not a big deal.
The Ruby Assassin video, shown at E3 and recently in a Tokyo demo. This Ruby is the high-end Ruby demo for the R520.

As for CPU utilization, that is the point. GPUs are doing the graphics work, so the CPU can do game stuff. Take a look at these slides. They go into full detail on how the graphics were done on the GPU.

Anyhow, what is your point? You first said Xenos is not as powerful as the R520; I demonstrated that it runs the R520 demos fine. Also, Xenos has a custom API, similar to DX9 but not the same.
 
This to me looks like a general industry-wide approach that's dividing labour into conventional processing and float-intensive tasks. We've got the PPU, Cell, and GPUs, and XeCPU's enhanced float performance, all running concurrently with a conventional processing backbone. nVidia's talk of heading the same way as Sony, and ATi talking of using a 'GPU' for other tasks, sets up a future, I believe, of your typical x86 processor paired with a float-monster card. There'll be PPUs, GPUs, Cell add-ons, and goodness knows what else. As they all share a common task, the complexity of using several specialist chipsets that gain performance at the cost of versatility seems not to be cost-effective. And this in my mind leads to a specialist-hardware-free future, where everything's achieved with insane amounts of programmable performance rather than efficient custom hardware.

I was thinking the other day whether consoles would benefit from more custom chips, as the Amiga did. Could a better machine be constructed with a few extra simple processors that handled certain workloads very well? Back in its day the Amiga's approach eclipsed the one-powerful-processor model of the PC, and then the PC pulled ahead. And then the PC was rekitted out with custom processors: a CPU, a GPU, an AudioPU. And it seems the future will go back to monolithic powerhouses. I wonder if it'll seesaw back and forth between complex multi-chip custom hardware and uniform programmable hardware, or if, once we next get to a monolithic solution (if that ever really happens), it'll stay that way.
 
Acert93 said:
The Ruby Assassin video, shown at E3 and recently in a Tokyo demo. This Ruby is the high-end Ruby demo for the R520.

Hm, never seen that one before. But it does look quite odd to me, nothing to be excited about. Actually, I think I've seen better-looking in-game graphics in some PC games.
 
The reasons you give are the same ones why, IMO, we won't see too many add-on cards/chips in the future. PCs were more diverse when the average PC cost well over US$2,000. You will have a hard time convincing OEMs and whitebox makers to include more chips, which is more expensive, to do these tasks. Margins are tight, and splitting the money even more ways hurts more companies.

If an existing part, be it CPU or GPU, can make itself more useful for these tasks, I think it will win out in the main consumer market (even if it stinks, comparatively). Being the first to have a mainstream solution pretty much means you win. E.g. CELL is great at floats, but at 250M transistors you are talking about adding a lot of cost, power consumption, and heat. A dual-purpose device, be it an Intel/AMD CPU or an NV/ATI GPU, would be more functional on the whole for most users. And of course it is the big companies who control the product channels. Intel has it on easy street because they can push out a new technology pretty easily. NV and ATI are next, in that they are in 40% of PCs, and with the move toward Vista they should be looking at explosive growth and SIGNIFICANTLY more impact on PC performance and expansion in the future.

The sound card market is a good example, IMO, where inferior general parts and offloading work onto the CPU spelled doom for that market. It was a growing pain, but right now I would say MOST gamers are using integrated sound. The difference is nominal, the savings in cost are real (last time I checked, ~$75 for a decent up-to-date sound card; $200+ for the new technology), and the performance difference is negligible. And of course developers are shunning/underutilizing the sound cards (read: Creative) because, well, they are not standard. Yeah, they are nice, but why spend a lot of time and money when most people won't notice, and therefore won't buy your product for that reason?

This is why I think PPUs will fail. They are an add-on card, and an expensive one at that. Further, they are only usable in certain situations (games that support them), which is a chicken-and-egg problem.

So personally, and this is just my opinion (which is obviously slanted from watching the PC sector since '86), due to the importance of sub-$1000 PCs, any technology that wants to make a splash and a significant impact needs to 1. control cost, 2. maximize penetration, and 3. offer a part that is useful as often as possible (either a dual-use part, like a GPU that can do both tasks, or multifunctional units incorporated into an existing product to lower the cost of redundancy, e.g. CPUs or GPUs that have specialized units or can spend resources on other side tasks at the cost of the primary goal).

The only players, IMO, who can do that are Intel, AMD, NV, and ATI. I think ATI/NV have an inside track, because the market overlap is EXCELLENT and their frequent product updates put them in a good position for this. I don't see or hear anything from Intel or AMD indicating they will have a solution coming soon.

So the solutions we see may not be excellent at first, but I think proprietary designs offered through add-on devices are a hard sell, even for CELL & PPUs in the PC space. Yeah, bad pun ;)
 
3roxor said:
Based on what? Looking at the 360 games, they draw a different picture.
Xenos has twice the pipeline horsepower - roughly.

That's just the start of it.

Jawed
 
Nemo80 said:
Hm, never seen that one before. But it does look quite odd to me, nothing to be excited about. Actually, I think I've seen better-looking in-game graphics in some PC games.

I'd be interested to know which ones; I thought I had seen them all! :)

J
 
Titanio said:
In terms of the computational model, they're SM3.0+. Really, SM3.0 upwards plus better precision everywhere is the significant breakthrough in terms of GPGPU, IMO - though yeah, not necessarily just the minimum DX9 spec (SM2.0 etc.).
Don't forget MEMEXPORT.

The biggest complaint the GPGPU guys have is about scatter - writing to arbitrary locations in memory. MEMEXPORT solves that.

Scatter is implemented in BrookGPU, but it's a bit kludgy by comparison with MEMEXPORT.
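
For illustration, here's the gather/scatter distinction in plain C - my own sketch, not actual BrookGPU or Xenos code:

/* Gather: each output element READS from a computed address.
   GPUs have long handled this via dependent texture fetches. */
void gather(float *out, const float *in, const int *idx, int n)
{
    for (int i = 0; i < n; i++)
        out[i] = in[idx[i]];
}

/* Scatter: each input element WRITES to a computed address. A pixel
   shader can normally write only to its own pixel's location in the
   render target, which is the limitation MEMEXPORT removes. */
void scatter(float *out, const float *in, const int *idx, int n)
{
    for (int i = 0; i < n; i++)
        out[idx[i]] = in[i];
}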

Jawed
 
Jawed said:
Don't forget MEMEXPORT.

I'm not ;) Just highlighting the intro of SM3.0 hardware as a tipping point toward a more favourable model for computation beyond graphics. That's obviously being built upon, MEMEXPORT being a very notable feature.
 
Jawed said:
Don't forget MEMEXPORT.

The biggest complaint the GPGPU guys have is about scatter - writing to arbitrary locations in memory. MEMEXPORT solves that.

Scatter is implemented in BrookGPU, but it's a bit kludgy by comparison with MEMEXPORT.

Jawed
I've a question on MEMEXPORT - won't that slow down the whole process? GPUs are so fast because of their wide SIMD architecture working on fast buses relative to the CPUs. If you go outside of that and have ALUs writing data to various different memory addresses, won't you hang up the memory accessing somewhat, banging into those darned slow memory latencies? And in the case of XB360, wouldn't GPGPU work on RAM be as slow in memory accesses as on XeCPU, in which case why would Xenos be better suited than XeCPU's VMX units when working with random memory accesses?
 
Shifty Geezer said:
I've a question on MEMEXPORT - won't that slow down the whole process? GPUs are so fast because of their wide SIMD architecture working on fast buses relative to the CPUs. If you go outside of that and have ALUs writing data to various different memory addresses, won't you hang up the memory accessing somewhat, banging into those darned slow memory latencies?

Well, you might be able to switch to a "non-blocked" thread in the meantime, perhaps. But the opportunity to do so may not be guaranteed the more of this kind of work you do (work that accesses main memory regularly).

Shifty Geezer said:
And in the case of XB360, wouldn't GPGPU work on RAM be as slow in memory accesses as on XeCPU, in which case why would Xenos be better suited than XeCPU's VMX units when working with random memory accesses?

The CPU has many advantages, but the parallelisation offered by a GPU is what would be attractive from a pure horsepower perspective. But obviously you'd need to tailor the work to fit the architecture and avoid memory access as much as possible, really. There has been the suggestion that Xenos is relatively light on cache compared even to GPUs generally, which is another thing you might have to be careful about from that perspective.

Does anyone have more detail on MEMEXPORT? Like how many MEMEXPORT "requests" you can have going at once, perhaps? It deals with vectorised data, correct? Is the granularity a 4-component vector of data, or...?
 
What differentiates a GPGPU application from a CPU trying to perform the same kinds of computations is how well the data (quantity) and the vector computation (GFLOPs) fit the streaming style of processing that a GPU supports.
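
As a concrete illustration (my own example, in plain C rather than shader code): a streaming-friendly workload applies the same independent arithmetic to every element of a large array, so it spreads trivially across many ALUs.

/* SAXPY-style streaming kernel: identical maths on every element, with
   no element depending on another, so it parallelises trivially across
   wide SIMD hardware. This is the shape of work a GPU is built for. */
void saxpy(float *y, const float *x, float a, int n)
{
    for (int i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}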

MEMEXPORT is not really meant for bulk writing to memory, as far as I can tell.

One of the problems in GPGPU applications is saving the state of computations for reference either on a later pass of the algorithm or for a different phase of processing.

The current approach, which requires the creation of triangles to be written into render-targets to save this data away, is just a serious bottleneck - analogous to the bottleneck created by particle effects.

So MEMEXPORT is not so much a high-bandwidth data export path (that's what the conventional pixel output is for, writing to textures which are then read back by the CPU) as a random-access intermediate data path.
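
To make the state-saving pattern concrete, here's a CPU-side analogy of the render-target "ping-pong" that multipass GPGPU relies on - my own sketch, with an arbitrary smoothing step standing in for the real per-element work:

#define N 1024

static float bufA[N], bufB[N];

/* Each pass reads the previous pass's results from one buffer and
   writes the new state into the other, just as a shader reads a
   texture and writes a render target. On a GPU every one of these
   "write" passes costs a full draw into a render target, which is
   the bottleneck MEMEXPORT sidesteps for intermediate data. */
void run_passes(int passes)
{
    float *src = bufA, *dst = bufB, *tmp;
    for (int p = 0; p < passes; p++) {
        for (int i = 0; i < N; i++)
            dst[i] = 0.5f * (src[i] + src[(i + 1) % N]);
        tmp = src; src = dst; dst = tmp;  /* output becomes next input */
    }
}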

Jawed
 
This has been hinted at by several people (Dave, DeanoC, ATI), so I think it is a safe bet that some devs will use it for more than gfx. And because, IMO, gfx with the power of Xenos are done, I think every effort in power and features for physics is a good thing. As a lot of people have already said, GPUs will be general processors, but with different performance depending on the task.
 
Jawed said:
Xenos has twice the pipeline horsepower - roughly.

That's just the start of it.

Jawed

How does it have 2x the power?
Can you please elaborate?
The only thing I remember is that Xenos can run longer shader code than the R520.
But the R520 can issue 2 vertices per cycle while Xenos can only issue 1.
 
Jawed said:
Xenos has twice the pipeline horsepower - roughly.

That's just the start of it.

Jawed

I'm confused. The X1800 XT is a 16-pipeline card, which is supposed to compete with the 7800 GTX, a 24-pipeline card, whose pipes are apparently as powerful as the 48 unified shaders in Xenos? Also, isn't nVidia going to release the 7800 Ultra, a possible 32-pipeline card?

Wouldn't all this put ATi at a huge disadvantage to nVidia on the PC GPU scene?
 
R520 has 16 pixel shader pipes and 8 vertex shader pipes - 24 total. Xenos has 48 pipes of roughly the same capability (but running at 500MHz instead of 625MHz).
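
Back-of-envelope, treating each pipe as roughly equivalent (my arithmetic, not from ATI):

R520: 24 pipes x 625MHz = 15,000 pipe-MHz
Xenos: 48 pipes x 500MHz = 24,000 pipe-MHz

So raw issue rate alone only gets you about 1.6x.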

But Xenos has other serious efficiency gains, such as the unified architecture, and the "unstoppable" ROPs (EDRAM).

So, overall, Xenos should be around twice as fast as R520, if not more.

I don't understand why people think 7800GTX/RSX is in the same ballpark as Xenos.

Jawed
 
So, overall, Xenos should be around twice as fast as R520, if not more.

As an admitted Xbox ******, I'd love to believe that. However, I just don't see it.

Each pipe in Xenos is one ALU. Each in R520 is two ALUs (at least). I don't see how they're comparable (but I'm hoping they are).

I just have a hard time believing ATI could top a 320M-transistor part (R520) with a 232M part. If they could, why the hell wouldn't they do that on the desktop?! You don't need the EDRAM there.

Plus, X360 games so far are simply not mind-blowing. I understand there are lots of reasons for that, and I'm hoping they get better, but...

If anything, given the way the 7800 GTX chews through PC games, I think Xenos has a tall order ahead. Hardly anything out there even dents the 7800 below 100 FPS without AA. And frankly, I think AA will be useless on consoles. The audience just won't care that much.
 