ATI - PS3 is Unrefined

Laa-Yosh said:
And how cool it is to have some texture thrashing in a PC game, right? ;)

What exactly do you mean by texture thrashing? Are you referring to textures being copied back and forth between system RAM and VRAM? If so, the PS3 NUMA has been specifically designed to avoid this, as was stated by KK himself...
 
Jaws said:
Are you referring to textures being copied back and forth between system RAM and VRAM?

Yep

If so, the PS3 NUMA has been specifically designed to avoid this, as was stated by KK himself...

I have no idea what he means with that...
 
Laa-Yosh said:
...
I have no idea what he means with that...

The solution to the texture thrashing problem that you're describing.

The difference from the PC architecture with PCI-e is that the CPU cannot address the GPU's VRAM, AFAIK. The VRAM will also hold a 'copy' of data from system RAM, e.g. texture data. With the PS3 using FlexIO and what KK described, this copy would be made redundant because the CELL could address VRAM and XDR and RSX could address XDR and VRAM...
 
Titanio said:
8 G70 vertex pipes, and 24 G70 pixel pipes comes out at ~255Gflops. That doesn't count the mini ALU in the pixel shaders or the FP normalise, on a positive note, or texture addressing, on a negative note. Xenos's count for 32-bit Gflops is closer to 210-220Gflops IIRC. It's just paper figures, of course.

Just for clarification, the Xenos figure is:

3 Arrays x 16 ALU/Array x 5 MACs per ALU x 2 Flops/MAC x .5 GHz = 240 GF/s.

Aaron Spink
speaking for myself inc.
 
aaronspink said:
Just for clarification, the Xenos figure is:

3 Arrays x 16 ALU/Array x 5 MACs per ALU x 2 Flops/MAC x .5 GHz = 240 GF/s.

Aaron Spink
speaking for myself inc.

It's 9 Flops/cycle per ALU x 48 ALUs x 0.5 GHz ~ 216 GFLOPS, 32 bit programmable.

As mentioned earlier it's from a recent MS leak and this site,

Article said:
There are 16 basic units in each of 3 blocks, for a total of 48 ????????. Each computing element is essentially a floating-point (FP) vector (SIMD) unit. It can perform a 4-element vector operation (multiply-add) and a scalar FP operation (1 element) simultaneously in 1 cycle (clock). Per cycle that comes to (4 elements × 2 operations + 1 scalar operation) × 48 = 432 flops; because the Xbox 360 GPU runs at 500MHz, peak performance is 432 × 500MHz = 216 GFLOPS ????.

http://www.beyond3d.com/forum/showthread.php?t=20082
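For anyone following the arithmetic, here is a minimal Python sketch of the two Xenos counts being argued over. The only difference is whether the scalar lane is counted as a MAC (2 flops) or as a single op (1 flop). The helper name `peak_gflops` and the variable names are made up for illustration; the unit counts and clock are the ones quoted in this thread, and these are paper peaks only.

```python
def peak_gflops(alus, flops_per_alu_per_clock, clock_ghz):
    """Paper peak: ALUs x flops per ALU per clock x clock in GHz."""
    return alus * flops_per_alu_per_clock * clock_ghz

XENOS_ALUS = 3 * 16       # 3 arrays x 16 ALUs per array, as quoted above
XENOS_CLOCK_GHZ = 0.5     # 500 MHz

# Count A: 5 MACs per ALU (vec4 + scalar), every MAC counted as 2 flops.
count_a = peak_gflops(XENOS_ALUS, 5 * 2, XENOS_CLOCK_GHZ)

# Count B: vec4 MAC (4 x 2 flops) plus a scalar op counted as 1 flop.
count_b = peak_gflops(XENOS_ALUS, 4 * 2 + 1, XENOS_CLOCK_GHZ)

print(f"5 MACs per ALU:     {count_a:.0f} GFLOPS")
print(f"9 flops per ALU:    {count_b:.0f} GFLOPS")
```

So the whole 240 vs 216 gap is the one extra flop per ALU per clock on the scalar lane (48 x 1 x 0.5 = 24 GFLOPS).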
 
Jaws said:

Oh, that's obvious, but to me it's the other way around - it is more like a result of this design, but not really the goal. And in a console environment, developers usually manage their textures far better than on a PC anyway (see the GS in PS2 ;)
 
Jaws said:
The solution to the texture thrashing problem that you're describing...
How much data is likely to be spent on Textures in next-gen games? With the possibility of procedural bumps and 'dirt' and more complex shaders, are textures going to be that prominent? I'd like to hear estimates. I'd have thought five 1024x1024 is an average for most key objects, with less for simpler objects. Are these guesses unrealistic?
 
Laa-Yosh said:
Oh, that's obvious, but to me it's the other way around - it is more like a result of this design, but not really the goal. And in a console environment, developers usually manage their textures far better than on a PC anyway (see the GS in PS2 ;)

Here's my take on the timeline. Three major original design goals were a 1080p framebuffer, a 65nm process, and a 2006 release. 65nm was a no-go for mass production, and consequently a 1080p framebuffer with eDRAM was not viable on 90nm. Hence no eDRAM, but off-die VRAM more akin to a PC's. However, FlexIO has much greater bandwidth than PCI-e, hence this CELL<->RSX goal...
 
Jaws said:
It's 9 Flops/cycle per ALU x 48 ALUs x 0.5 GHz ~ 216 GFLOPS, 32 bit programmable.

Until I get a better source, and the one you're pointing to isn't, I'm going to go with Dave's breakdown, which is 3x16x(4+1 MACs). This is 240 GFLOPS. So unless you can point me to an authoritative source, I'm going with the original, and continually confirmed, number of 240.

Aaron Spink
speaking for myself inc.
 
Titanio said:
8 G70 vertex pipes, and 24 G70 pixel pipes comes out at ~255Gflops. That doesn't count the mini ALU in the pixel shaders or the FP normalise, on a positive note, or texture addressing, on a negative note. Xenos's count for 32-bit Gflops is closer to 210-220Gflops IIRC. It's just paper figures, of course.

Apologies, I was only whacking away at pixel shading, despite my (mistaken) use of 8 vertex pipes in the sentence there too (and was using 4 components with madd for flops/ALU*clock). Besides that, isn't the mini-ALU more for fixed function anyway? But adding all this up only increases the theoretical delta between itself (G70) and R520, for a smaller and smaller real-world gain. Which is why I avoided even going for 10 flops (still has a nasty taste in my mouth) per ALU or the partial-precision normalize.

Jaws said:
7800GTX, 430 MHz ~ 199 Gflops, 32 bit programmable

1800XT, 625 MHz ~ 170 Gflops, 32 bit programmable

No, there is NO HUGE difference as you suggest with those numbers.

I was looking at just pixel shading. The latter would be 120Gflops (with the note that unless the mini-ALU can do more than add, 1/3 of that only comes into play occasionally). The former....165 again (with the case that half of it can be borrowed by texture duties, or useless in cases where the compiler can't issue another instruction to it before the primary ALU). I'm avoiding vertex shading because Huddy mentioned that you're rarely, if ever, vertex processing limited (and I don't think that's changed significantly since then... and that was speaking of 6 X800 VS, IIRC). So those numbers are just distracting otherwise. Just while we're playing with these stupid numbers.
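A rough Python sketch of how the pixel-only figures (165 and 120) and the whole-GPU figures (~199 and ~170) seem to relate. The unit layouts are assumptions chosen because they reproduce the numbers quoted in this thread, not confirmed breakdowns: 24 pixel pipes with two vec4 madd ALUs each for G70, 16 pixel pipes with one vec4 madd ALU plus an add-only mini-ALU for R520, and 8 vertex units doing a vec4 + scalar madd on both. `peak_gflops` is a made-up helper name.

```python
def peak_gflops(units, flops_per_unit_per_clock, clock_ghz):
    """Paper peak: units x flops per unit per clock x clock in GHz."""
    return units * flops_per_unit_per_clock * clock_ghz

MADD_VEC4 = 4 * 2   # 4 components, multiply-add counted as 2 flops
ADD_VEC4 = 4        # 4 components, add only (assumed mini-ALU capability)
VERTEX_UNIT = 5 * 2 # assumed vec4 + scalar madd per vertex unit

# Assumed G70 (7800GTX) layout at 430 MHz.
g70_pixel = peak_gflops(24, 2 * MADD_VEC4, 0.430)
g70_total = g70_pixel + peak_gflops(8, VERTEX_UNIT, 0.430)

# Assumed R520 (1800XT) layout at 625 MHz.
r520_pixel = peak_gflops(16, MADD_VEC4 + ADD_VEC4, 0.625)
r520_mini_alu = peak_gflops(16, ADD_VEC4, 0.625)
r520_total = r520_pixel + peak_gflops(8, VERTEX_UNIT, 0.625)

print(f"G70  pixel {g70_pixel:.1f}, total {g70_total:.1f} GFLOPS")
print(f"R520 pixel {r520_pixel:.1f}, total {r520_total:.1f} GFLOPS")
print(f"R520 mini-ALU share of pixel peak: {r520_mini_alu / r520_pixel:.2f}")
```

Under these assumptions the mini-ALU accounts for a third of the R520 pixel figure, which is the "1/3" caveat above, and half of the G70 pixel figure sits in the second ALU.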

Xenos, 500 MHz ~ 216 Gflops, 32 bit programmable*

Seems like a really good way to beef up complexity to make savings so that, what, one component can no longer madd? Sounds like extra compiler work, scheduling, etc. to me, for little gain.
 
aaronspink said:
Until I get a better source, and the one you're pointing to isn't, I'm going to go with Dave's breakdown, which is 3x16x(4+1 MACs). This is 240 GFLOPS. So unless you can point me to an authoritative source, I'm going with the original, and continually confirmed, number of 240.

Aaron Spink
speaking for myself inc.

The source is from an ATI technical presentation and also from a MS leaked doc.

Can you show me Dave's source?
 
Shifty Geezer said:
How much data is likely to be spent on Textures in next-gen games? With the possibility of procedural bumps and 'dirt' and more complex shaders, are textures going to be that prominent? I'd like to hear estimates. I'd have thought five 1024x1024 is an average for most key objects, with less for simpler objects. Are these guesses unrealistic?

Don't count that much on procedurals. The most you can get with them is Pixar's 'Bug's Life' - but that uses shaders with megabytes of code.
Bitmaps are a must IMHO, and count on hundreds of MBs of texture data for nextgen games. I think textures will be the largest chunk, then sounds, then game world data, then code and geometry and animation data.
Think about it, most PC games can use 60-80MB of texture data per frame, not per scene; and on a console, you'll need to keep more than that in memory, because you can't stream from DVD while turning around.

And you'll need a lot of texture memory because for each surface, you'll want a color, a normal, and a specular map at least, with parallax mapping requiring an additional height map. And nextgen graphics means a lot more individual objects as well, not just larger poly counts and textures with a similarly empty scenery...
 
dukmahsik said:
so what does all these numbers really say? rsx > xenos?

Yes, but only by a little bit.

As is to be expected from hardware which didn't tape out until months after another piece of comparable hardware...
 
Shifty Geezer said:
How much data is likely to be spent on Textures in next-gen games? With the possibility of procedural bumps and 'dirt' and more complex shaders, are textures going to be that prominent? I'd like to hear estimates. I'd have thought five 1024x1024 is an average for most key objects, with less for simpler objects. Are these guesses unrealistic?
Lair uses ten 4096x4096 textures (presumably quite a lot of those are normal maps, specular too?) per dragon:

http://www.igda.org/sf/0510.htm

So that's 160MB blown.
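A quick sanity check on that figure, as a Python sketch. Ten 4096x4096 textures come to 160MB only if each texel takes 1 byte (for example an 8-bit-per-texel compressed format such as DXT5) and mipmaps are ignored. The bytes-per-texel values here are assumptions, since the source only gives counts and dimensions; `texture_mib` is a made-up helper name.

```python
MIB = 1024 * 1024

def texture_mib(width, height, bytes_per_texel, count=1):
    """Memory for `count` textures of the given size, top mip level only."""
    return width * height * bytes_per_texel * count / MIB

# Ten 4096x4096 maps per dragon, assuming 1 byte per texel.
lair_compressed = texture_mib(4096, 4096, 1, count=10)

# Same textures stored uncompressed at 4 bytes per texel (RGBA8).
lair_uncompressed = texture_mib(4096, 4096, 4, count=10)

# The "five 1024x1024 per key object" guess, at 1 and 4 bytes per texel.
guess_compressed = texture_mib(1024, 1024, 1, count=5)
guess_uncompressed = texture_mib(1024, 1024, 4, count=5)

print(lair_compressed, lair_uncompressed)
print(guess_compressed, guess_uncompressed)
```

On these assumptions the dragon alone is 32 times the "five 1024x1024" estimate, and storing it uncompressed would take 640MB.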

Then you've got other kinds of maps, diffuse, ambient and baked-in lighting. Then you've got textures generated by rendering (e.g., for frame post-processing).

I don't see how texture usage is going to be light.

There should be a lot of procedural textures too - but they'd still have to go into memory, I assume.

Hopefully devs can talk about this without risking their NDAs.

Jawed
 
TurnDragoZeroV2G said:
...
I was looking at just pixel shading.

Yep, I know. I was just pointing out that 32-bit programmable flops, taking into account peak issue rates, are not that different between the two complete GPUs.

TurnDragoZeroV2G said:
The latter would be 120Gflops (with the note that unless the mini-ALU can do more than add, 1/3 of that only comes into play occasionally). The former....165 again (with the case that half of it can be borrowed by texture duties, or useless in cases where the compiler can't issue another instruction to it before the primary ALU). I'm avoiding vertex shading because Huddy mentioned that you're rarely, if ever, vertex processing limited (and I don't think that's changed significantly since then... and that was speaking of 6 X800 VS, IIRC). So those numbers are just distracting otherwise. Just while we're playing with these stupid numbers.

Just wanted to add that for G70, the second ALU is apparently NOT completely blocked by texturing, though there's no real confirmation:

http://www.beyond3d.com/forum/showthread.php?t=21746
 
dukmahsik said:
so what does all these numbers really say? rsx > xenos?

I'd stick to what you said a long time ago: basically, it's what's on screen that counts.

This is a technical forum so accurate figures are important if there's a conflict.
 