xbox360 gpu explained...or so

Perhaps they mean 4 ops being MUL4, ADD4, MUL1 and ADD1.


Jawed said:
192 ops per cycle :D in NVidia terminology. :LOL:
Which NVidia terminology is that? If you counted vector components separately, that would be 158 ALU ops already for NV40.
 
What's confusing me is that he says there are 4 ALUs per shader (48 shaders). It sounds like independent units somehow. It's probably just a very bad choice of words, but still.
 
Xmas said:
Perhaps they mean 4 ops being MUL4, ADD4, MUL1 and ADD1.


Jawed said:
192 ops per cycle :D in NVidia terminology. :LOL:
Which NVidia terminology is that? If you counted vector components separately, that would be 158 ALU ops already for NV40.

Precisely. It's really pointless, this stuff, ain't it?

Jawed
 
Is that how nV arrived at 136? (24 pipes * 4 components) + (10 vertex shaders * 4 components) = 136?
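For what it's worth, here is a minimal sketch of that component-counting arithmetic. The pipe and vertex shader counts are simply the ones guessed in the post above, not confirmed specs, and the function name is made up for illustration.

```python
def component_ops(pixel_pipes, vertex_shaders, components=4):
    """Count one op per vector component, per unit, per cycle."""
    return pixel_pipes * components + vertex_shaders * components


# Numbers taken from the post above (assumed, not an official spec).
print(component_ops(24, 10))  # 136
```

Counting this way only measures how wide the units are, which is why the totals are hard to compare across vendors.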
 
According to the slide that NVidia showed at E3, there is 1 texture addressing unit + 2 pixel shader ALUs + 2 SFUs in one pixel shader. That means 5~7 shader ops per pixel shader unit (the ALUs may support dual/co-issue).

And we know the NV4X/R3XX/R4XX vertex shader is a 4D (FP32) + 1D (FP32) structure; it is possible that the RSX's VS continues this structure, so there are 2 shader ops per vertex shader unit.
 
Hyp-X said:
Wow 4 * 48 = 196 :?:
He stated it multiple times - so it must be true...

It's 96 Shader ops per cycle.

48-way vect4 units
48-way scalar units

*official* spec from MS.
 
firingsquad said:
ATI: We have 48 shaders. And each shader, every cycle can do 4 floating-point operations, so that gives you 196.

techreport said:
On chip, the shaders are organized in three SIMD engines with 16 processors per unit, for a total of 48 shaders. Each of these shaders is comprised of four ALUs that can execute a single operation per cycle, so that each shader unit can execute four floating-point ops per cycle

anandtech said:
ATI was very light on details of their pipeline implementation on the 360's GPU, but we were able to get some more clarification on some items. Each of the 48 shader pipelines is able to process two shader operations per cycle (one scalar and one vector), offering a total of 96 shader ops per cycle across the entire array.

xbox360.ign.com said:
Given the Xbox 360 GPU's multithreading and balanced design, you really can't compare the two systems in terms of shading operations per clock. However, the Xbox 360's GPU can do 48 ALU operations (each can do a vector4 and scalar op per clock), 16 texture fetches, 32 control flow operations, and 16 programmable vertex fetch operations with tessellation per clock for a total of 48*2 + 16 + 32 + 16 = 160 operations per cycle or 160 * 500 = 80 GOps per second.

Another interesting tidbit...

xbox360.ign.com said:
Lastly, we were sent updated spec numbers on the Xbox's numbers, and we spoke with Microsoft's Vice President of hardware, Todd Holmdahl, about the Xbox 360's final transistor count.

Another bit of information sent our way is the final transistor count for Xbox 360's graphics subset. The GPU totals 332 million transistors, which is split between the two separate dies that make up the part. The parent die is the "main" piece of the GPU, handling the large bulk of the graphics rendering, and is comprised of 232 million transistors. The daughter die contains the system's 10MB of embedded DRAM and its logic chip, which is capable of some additional 3D math. The daughter die totals an even 100 million transistors, bringing the total transistor count for the GPU to 332 million.

Xbox360 vs PS3

http://xbox360.ign.com/articles/617/617951p1.html
 
rwolf said:
firingsquad said:
ATI: We have 48 shaders. And each shader, every cycle can do 4 floating-point operations, so that gives you 196.

That's 196 Flops per cycle and NOT 196 Shader ops per cycle.

This would suggest

196 Flops * 500 MHz ~ 98 GFLOPS for Xenos.

This is underpowered, and the 196 number is their misunderstanding. This 196 Flops per cycle is not correct.



rwolf said:
techreport said:
On chip, the shaders are organized in three SIMD engines with 16 processors per unit, for a total of 48 shaders. Each of these shaders is comprised of four ALUs that can execute a single operation per cycle, so that each shader unit can execute four floating-point ops per cycle

As above. It's a misunderstanding.

It should be,

48-way vec4 ~ 48*4 ~ 192 Flops per cycle

192 * 2 (FMADD) ~ 384 Flops per cycle

48-way scalar ~ 48 Flops per cycle

48 * 2 (FMADD) ~ 96 Flops per cycle

Xenos ~ 384 + 96 ~ 480 Flops per cycle

Xenos @ 500 MHz ~ 480*0.5 GHz ~ 240 GFlops

They have mixed up numbers and units as they are similar.

MS spec,

48-way vec4
48-way scalar

96 Shader ops per cycle.

EDIT:

rwolf said:
anandtech said:
ATI was very light on details of their pipeline implementation on the 360's GPU, but we were able to get some more clarification on some items. Each of the 48 shader pipelines is able to process two shader operations per cycle (one scalar and one vector), offering a total of 96 shader ops per cycle across the entire array.

This is correct.
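The two ways of counting above (shader ops vs. Flops) can be sketched as below. This is only a rough model under the assumptions stated in this post: 48 units, each with one vec4 ALU and one scalar ALU, every lane doing a multiply-add counted as 2 Flops, at 500 MHz. The constant and function names are made up for illustration.

```python
# Assumptions taken from the post, not from an official spec sheet.
SHADER_UNITS = 48      # 48-way vec4 + 48-way scalar
VEC_WIDTH = 4          # components in the vector ALU
FLOPS_PER_MADD = 2     # a multiply-add counted as two Flops
CLOCK_HZ = 500e6       # 500 MHz


def shader_ops_per_cycle(units=SHADER_UNITS):
    # One vector op plus one scalar op per unit per cycle.
    return units * 2


def flops_per_cycle(units=SHADER_UNITS):
    vector = units * VEC_WIDTH * FLOPS_PER_MADD
    scalar = units * FLOPS_PER_MADD
    return vector + scalar


def gflops(units=SHADER_UNITS, clock_hz=CLOCK_HZ):
    return flops_per_cycle(units) * clock_hz / 1e9


print(shader_ops_per_cycle())  # 96
print(flops_per_cycle())       # 480
print(gflops())                # 240.0
```

The "4 ops per shader" figure from the interviews would give 48 * 4 = 192 per cycle, which matches only the vec4 lanes counted once each.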
 
I would like to know how fast the embedded DRAM is: 500MHz or only 333MHz?

With 2 Terabit/s of bandwidth and 32 pixels in flight at all times, this would be 128 bit/pixel when the eDRAM runs at 500MHz, which would be good enough. But I suspect that the 2 Terabit/s is a bidirectional figure, simply because all the other internal buses are also counted that way, which would leave only 64 bit/pixel @ 500MHz, certainly not enough for anything more than plain Z+S and R+G+B+A.

Does anyone have more information about the eDRAM chip?


[edit]
To make it clearer:
IMHO 128 bit/pixel seems a little too much; on the other hand, 64 bit/pixel is not enough. Therefore I thought that 96 bit @ 333MHz could be just the right spot to manage HDR and AA too. By the way, 1280x720x96bit = ~10MB

Does someone agree? :D
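Here is a rough sketch of the per-pixel arithmetic in this post. It assumes "2 Terabit" means a decimal 2e12 bit/s, 32 pixels in flight, and that a bidirectional figure is split evenly between read and write; none of that is confirmed, and the function names are made up. With a decimal 2e12 the results come out a bit under the round 128/64/96 figures used above.

```python
def bits_per_pixel_per_clock(bandwidth_bits_per_s, clock_hz,
                             pixels_in_flight, bidirectional=False):
    """Bits available per pixel on each eDRAM clock."""
    bits_per_clock = bandwidth_bits_per_s / clock_hz
    if bidirectional:
        bits_per_clock /= 2  # assume an even read/write split
    return bits_per_clock / pixels_in_flight


def framebuffer_bytes(width, height, bits_per_pixel):
    """Storage needed for one full frame at the given pixel size."""
    return width * height * bits_per_pixel // 8


BANDWIDTH = 2e12  # assumed: 2 Tbit/s, decimal

print(bits_per_pixel_per_clock(BANDWIDTH, 500e6, 32))        # 125.0
print(bits_per_pixel_per_clock(BANDWIDTH, 500e6, 32, True))  # 62.5
print(bits_per_pixel_per_clock(BANDWIDTH, 333e6, 32, True))  # a little under 96
print(framebuffer_bytes(1280, 720, 96))                      # 11059200
print(framebuffer_bytes(1280, 720, 96) / 2**20)              # 10.546875 (MiB)
```

So a 1280x720 frame at 96 bit/pixel is about 10.5 MiB, close to the 10MB of eDRAM but not exactly equal to it.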
 
2 Tb (or is it Tib? probably something in between...) per second is an absolute value, whether it's running at 500MHz or 333MHz.

And I still don't see what they need this incredible bandwidth for. If the chip can only write 8 pixels/clock, what do they need 32 bytes per pixel (read and write) for?
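A minimal sketch of where a figure like 32 bytes per pixel comes from, assuming a decimal 2e12 bit/s, a 500 MHz GPU clock and 8 pixels written per clock (all assumptions for illustration, as is the function name):

```python
def bytes_per_pixel(bandwidth_bits_per_s, gpu_clock_hz, pixels_per_clock):
    """Bytes of eDRAM traffic available for each pixel the GPU outputs."""
    bytes_per_clock = bandwidth_bits_per_s / 8 / gpu_clock_hz
    return bytes_per_clock / pixels_per_clock


total = bytes_per_pixel(2e12, 500e6, 8)
print(total)      # 62.5 bytes per pixel in total
print(total / 2)  # 31.25 bytes each for read and write
```

That is roughly the 32 bytes read plus 32 bytes written mentioned above, far more than a 32-bit color plus Z/stencil pixel needs without multisampling.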
 
That 2Tb figure is from the interface running at 2GHz. I've yet to establish if the entire chip is running at 2GHz, the memory and the interface, or just the interface. Naturally, if the entire thing is running at 2GHz, that is going to have a considerable impact on fill-rate.
 
DaveBaumann said:
That 2Tb figure is from the interface running at 2GHz. I've yet to establish if the entire chip is running at 2GHz, the memory and the interface, or just the interface. Naturally, if the entire thing is running at 2GHz, that is going to have a considerable impact on fill-rate.
How, if the GPU itself is limited to outputting 2 quads with 32bit color/4 Z quads?
 
Xmas said:
How, if the GPU itself is limited to outputting 2 quads with 32bit color/4 Z quads?

Why do you say the GPU is limited to two quads? As far as I can tell the shader core is actually operating on 8 quads (when dealing with pixels). However, when doing Z/Stencil passes most of the shader core will be dedicated to geometry processing and will be wanting to output many quads to the ROP/Memory unit.
 
So, is it right to assume that ATI has basically split a somewhat traditional GPU into two parts?
They've placed the parts that need the most bandwidth onto the chip with the EDRAM, and separated it at the point where bandwidth needs are lower, right?
 
DaveBaumann said:
Xmas said:
How, if the GPU itself is limited to outputting 2 quads with 32bit color/4 Z quads?

Why do you say the GPU is limited to two quads? As far as I can tell the shader core is actually operating on 8 quads (when dealing with pixels). However, when doing Z/Stencil passes most of the shader core will be dedicated to geometry processing and will be wanting to output many quads to the ROP/Memory unit.

Why 8 quads? With 16 SIMD channels I count only 4 quads.

If I understand the block diagram correctly, there is no way to get pixels from the scan converter to pipe comm. This means that in Z/Stencil passes the data needs to go through every shader pipe.
 
Laa-Yosh said:
So, is it right to assume that ATI has basically split a somewhat traditional GPU into two parts?
They've placed the parts that need the most bandwidth onto the chip with the EDRAM, and separated it at the point where bandwidth needs are lower, right?
Pretty much, yes.
 