How to calculate theoretical max throughput in GB/s?

OpenGL guy · Dec 29, 2009

DavidGraham said:
Also, you can try to extrapolate from the previous texels/pixels/triangles/FLOPS data by knowing the word size for each of them. For example, pixels can be 32 bits in size (128 bits in HDR), vertices can be 64 or even 128 bits, texels could be the same, and FLOPs are usually 64 bits in size.

All calculations are done at 32 bits per component in the shaders, not 64 bits. Double precision is available in DX11, but rarely used.

DavidGraham · Dec 29, 2009

3dilettante said:
If you mean the physical sections of a register file that store an operand are of a fixed length, yes. They are physically set in silicon.
A FLOP, however, is not a physical object.
For example, a single-precision FLOP is going to use 32-bit operands, while double-precision will use 64-bit operands.
Internally, a GPU would treat two separate 32-bit register locations as a combined 64-bit value.
A floating-point ADD or MUL takes 2 operands, and two of them would take 4.
However, the general usage of FLOPs counts an FMADD as two FLOPs even though it takes only 3 operands.
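
The operand counting in the quoted post can be sketched in a few lines of Python, which also shows why a FLOPS figure does not convert to a single GB/s number. This is only an illustration: the table, the function names, and the 100 GFLOPS figure in the usage note are hypothetical, and it counts source operands only, ignoring the result write and any operand reuse inside the chip.

```python
# Sketch: source-operand traffic implied by a FLOP rate.
# OPS and both function names are illustrative, not taken from any vendor document.

OPS = {
    # name: (source operands, FLOPs counted)
    "ADD": (2, 1),
    "MUL": (2, 1),
    "FMADD": (3, 2),  # counted as two FLOPs, but reads only three operands
}

def operand_bytes_per_flop(op, bits_per_operand=32):
    """Bytes of source operands read per counted FLOP."""
    operands, flops = OPS[op]
    return operands * (bits_per_operand // 8) / flops

def operand_gb_per_s(gflops, op, bits_per_operand=32):
    """Source-operand traffic in GB/s for a given rate in GFLOPS."""
    return gflops * operand_bytes_per_flop(op, bits_per_operand)
```

For a hypothetical 100 GFLOPS of single-precision FMADDs, `operand_gb_per_s(100, "FMADD")` gives 600 GB/s of operand reads, while the same rate made up of ADDs gives 800 GB/s, so the answer depends on the instruction mix and the precision.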

These are amazing pieces of information, thank you a lot!

A software-visible "register" is not necessarily physically represented in a 1:1 manner.
Cypress uses 4x32-bit registers, which can be parceled out as individual 32-bit operands, subject to a lot of restrictions.

What do you mean by "subject to a lot of restrictions"? And is it true that the entire large Cypress chip has only 4 registers? That is a very low number. I remember the Pentium having about a hundred or so registers!

Some items may be a certain size at one point in the pipeline and have a different size altogether later, depending on factors like graphics mode, format, and what kind of work is being done.

Could you please give me some examples of such a case?

DavidGraham · Dec 29, 2009

OpenGL guy said:
All calculations are done at 32 bits per component in the shaders, not 64 bits. Double precision is available in DX11, but rarely used.

Good to know, thanks for correcting me, and sorry for the mistake. I guess that is why they invented floating point: it expands greatly upon regular integers.

3dilettante · Dec 29, 2009

DavidGraham said:
What do you mean by "subject to a lot of restrictions"? And is it true that the entire large Cypress chip has only 4 registers? That is a very low number. I remember the Pentium having about a hundred or so registers!

Cypress has registers that are 4x32 bits wide. Each 1/4 section can be the source of a 32-bit operand, but the logic behind which ones can be used in a given clock cycle is outlined in AMD documents concerning the RV770 ISA. Certain combinations are needed to net the maximum number of operands per clock.
The total number of registers in Cypress is massive, with 5 MiB in aggregate (though it is split up into many local banks, and a single thread can only address 128 registers).
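
As a rough sketch of the scale involved, the figures above can be turned into a register count. It assumes that "4x32 bits wide" means 16 bytes per register and that the limit of 128 refers to registers per thread; the function names are illustrative only.

```python
# Sketch: how many 4x32-bit registers fit in the 5 MiB aggregate register file.
# Assumes 16-byte registers and a per-thread limit of 128 registers.

MIB = 1024 * 1024

def total_registers(file_bytes=5 * MIB, components=4, bits_per_component=32):
    """Number of registers of the given width that fit in the register file."""
    register_bytes = components * bits_per_component // 8  # 16 bytes
    return file_bytes // register_bytes

def threads_at_full_allocation(per_thread_registers=128):
    """Threads that fit if every thread used its full register allocation."""
    return total_registers() // per_thread_registers

print(total_registers())             # 327680
print(threads_at_full_allocation())  # 2560
```

So the "only 4 registers" reading is off by several orders of magnitude: under these assumptions the chip holds 327,680 registers, each made of four 32-bit sections.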

Could you please give me some examples of such a case?

There are a bunch of threads in the beginners section that outline a lot of resources. I'm more comfortable with the hardware-level side than I am with the state machine GPUs emulate for the graphics pipeline.
Different software states and options can wildly influence how much data the GPU is going to be passing around, and much of what it does pass around is not visible to the outside world.

It is possible for certain shader types to amplify data and it is possible for the GPU to reject data outright. Formats and attributes can change which parts of the GPU get used and how much it stores at a given instant. A single triangle submitted to the GPU can result in a variable number of internal resources being allocated depending on what is going to happen to it.

Here's one thread that has some nice links.
http://forum.beyond3d.com/showthread.php?t=55568

trinibwoy · Dec 29, 2009

Frontino said:
Is there a way to know the datapath width of every single component of the chip, so I can do the math myself?

Unless you designed the chip yourself? No. The best you can hope for is to get numbers for the caches and other local memory.

Tahir2 · Dec 30, 2009

I believe what you are asking for would be a programmer's dream, and something akin to performance analysers.

I remember the Atari Jaguar had some ridiculous claims, e.g.:

"'Rendering' up to 850 million one-bit pixels/second"
