Branching Granularity

sauron

Hi everybody.
Here is my first beginner GPU question: what do the expressions "dynamic branching" and "branching granularity" mean?
I've read about them on this forum, but I don't yet have a clear idea of what they really mean.
Could you explain, for example, why G80 has a branching granularity of 32 pixels/16 vertices (with 128 SPs) while R600 has 64 pixels/64 vertices with 64 SPs?
Thanks in advance :smile:
 
Dynamic branching means that you branch on variables that can change per pixel, for example "if (color.a > 0.5f) {...} else {...}". With static branching you branch on values that are constant for the whole draw call, for example "if (DEFINED_DOLIGHTING) { doLighting(); }".
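To make the distinction concrete, here is a minimal Python sketch. It models a batch of pixels as a list and reports which path(s) the batch as a whole would need. `Pixel`, `static_paths` and `dynamic_paths` are hypothetical names used only for illustration, not any real shader API.

```python
from dataclasses import dataclass


@dataclass
class Pixel:
    alpha: float  # per-pixel data, e.g. color.a; can differ across a batch


def static_paths(batch: list[Pixel], do_lighting: bool) -> set[str]:
    # Static branch: the condition is one constant (like DEFINED_DOLIGHTING)
    # shared by every pixel, so the whole batch always takes the same path.
    return {"doLighting" if do_lighting else "skip" for _ in batch}


def dynamic_paths(batch: list[Pixel]) -> set[str]:
    # Dynamic branch: the condition reads per-pixel data (color.a > 0.5f),
    # so pixels in the same batch may take different paths.
    return {"if" if p.alpha > 0.5 else "else" for p in batch}


batch = [Pixel(0.2), Pixel(0.7), Pixel(0.9)]
print(sorted(static_paths(batch, True)))  # ['doLighting']
print(sorted(dynamic_paths(batch)))       # ['else', 'if']
```

The static case can never produce more than one path per batch; the dynamic case can, which is what makes branching granularity matter.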

Branching granularity has to do with exactly how branching works on a GPU. Since you always have many pixels (or vertices, or GS primitives) in flight, you need an efficient way to manage how the shader executes on different pixels. The GPU does this by tagging each pixel in a block with the path it will take at the branch. It then issues the instructions for the "if" part to the entire block, followed by the instructions for the "else" part. Of course, if no pixels in a block are tagged for the "else" part (or the "if" part), that part is skipped for the block entirely. That's why granularity is important: smaller blocks mean more efficient branching when neighbouring pixels take different paths.
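A minimal Python sketch of the tag-then-issue scheme described above, assuming a block is just a list of per-pixel values and the branch condition is x > 0. `execute_block`, `if_part` and `else_part` are hypothetical names; real hardware does this with per-lane execution masks inside the shader core, not Python loops.

```python
from typing import Callable, Optional


def execute_block(xs: list[float],
                  if_part: Callable[[float], float],
                  else_part: Callable[[float], float]
                  ) -> tuple[list[Optional[float]], list[str]]:
    mask = [x > 0 for x in xs]              # tag each pixel with the path it takes
    out: list[Optional[float]] = [None] * len(xs)
    issued: list[str] = []                  # which parts were fed to the block
    if any(mask):                           # skip the "if" part if no pixel is tagged for it
        issued.append("if")
        for i, x in enumerate(xs):
            if mask[i]:                     # untagged pixels idle during this part
                out[i] = if_part(x)
    if not all(mask):                       # skip the "else" part if no pixel is tagged for it
        issued.append("else")
        for i, x in enumerate(xs):
            if not mask[i]:
                out[i] = else_part(x)
    return out, issued


print(execute_block([1.0, -2.0, 3.0], lambda x: x * 2, lambda x: -x))
# ([2.0, 2.0, 6.0], ['if', 'else'])  mixed block: both parts issued
print(execute_block([1.0, 2.0], lambda x: x * 2, lambda x: -x))
# ([2.0, 4.0], ['if'])               coherent block: "else" part skipped
```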
 
I'll take it down a notch further than MDolenc, just in case you didn't quite understand.

GPUs are fast because they run the same shader program on many pixels at a time. Even before we actually called it a pixel shader, GPUs were doing the same sequence of operations on hundreds of pixels at a time. If creating a pixel took 5 steps, they'd do the first step on hundreds of pixels, then the second, and so on, until they were finished and could start the first step on the next batch of pixels.
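The lockstep idea can be sketched in a few lines of Python, assuming (purely for illustration) that a shader is a list of per-pixel steps. `shade_in_batches` and `Step` are hypothetical names; the point is only the loop order: each step runs over the whole batch before the next step starts.

```python
from typing import Callable

Step = Callable[[float], float]  # hypothetical: one shader "step" applied to a pixel value


def shade_in_batches(pixels: list[float], steps: list[Step], batch_size: int) -> list[float]:
    out: list[float] = []
    for start in range(0, len(pixels), batch_size):
        batch = pixels[start:start + batch_size]
        # Lockstep: finish step k on every pixel of the batch before step k+1.
        for step in steps:
            batch = [step(p) for p in batch]
        out.extend(batch)  # batch done; move on to the next one
    return out


steps = [lambda p: p + 1, lambda p: p * 2]
print(shade_in_batches([1, 2, 3, 4, 5], steps, batch_size=2))  # [4, 6, 8, 10, 12]
```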

"Dynamic branching" means that the shader program instructions depend on a condition, like this:
Code:
if (x > 0)
    (run 5 instruction cycles)
otherwise
    (run 20 different instruction cycles)
So you can see right away that you can no longer guarantee that every pixel will follow the same sequence of instructions, because which instructions run depends on whether x > 0 for that pixel.

If the hardware can't run different instructions for different pixels in a batch, it can still execute the code in the same step-by-step manner by simply idling the pixels that an instruction doesn't apply to. So if a batch has some pixels where x > 0 and some where x <= 0, every pixel in it takes 25 cycles (5 + 20), plus however many cycles it takes to process the "if" structure itself.
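The arithmetic above can be written as a tiny cost model in Python. It assumes one batch runs in lockstep and uses the 5- and 20-cycle figures from the example; `batch_cycles` and `branch_overhead` are hypothetical names, and the overhead defaults to 0 because the post doesn't give a number for it.

```python
IF_CYCLES = 5     # "run 5 instruction cycles" in the example above
ELSE_CYCLES = 20  # "run 20 different instruction cycles"


def batch_cycles(xs: list[float], branch_overhead: int = 0) -> int:
    """Cycles a lockstep batch spends on the example branch.

    Every pixel in the batch waits for this whole count, even while idle.
    branch_overhead is a hypothetical stand-in for the cost of the "if" itself.
    """
    takes_if = [x > 0 for x in xs]
    cycles = branch_overhead
    if any(takes_if):      # some pixel needs the 5-cycle path
        cycles += IF_CYCLES
    if not all(takes_if):  # some pixel needs the 20-cycle path
        cycles += ELSE_CYCLES
    return cycles


print(batch_cycles([1.0, 2.0, 3.0]))  # 5: every pixel agrees on x > 0
print(batch_cycles([-1.0, 0.0]))      # 20: every pixel agrees on x <= 0
print(batch_cycles([1.0, -1.0]))      # 25: mixed batch pays for both paths
```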

"Branching granularity" tells you how big this batch is. If all the pixels in a batch evaluate the condition the same way, they only have to run through one set of instructions, and a smaller batch makes it more likely that all its pixels agree. Factors of 2 or 4 in batch size don't usually make much difference in speed. It's when the difference is a factor of 10 or more, as with G7x compared to R5xx, that branching granularity can make a huge difference in performance, to the point of an X1600 doubling the speed of a 7900 GTX.
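To see how batch size plays out, the sketch below applies the 5/20-cycle cost from the example to a made-up row of 1024 pixels in which one contiguous run of pixels takes the "if" path, and averages the cost per pixel for several batch sizes. The pixel pattern and the helper name `avg_cycles_per_pixel` are hypothetical, and the batch sizes are illustrative rather than any particular GPU's.

```python
def avg_cycles_per_pixel(conds: list[bool], batch_size: int,
                         if_cycles: int = 5, else_cycles: int = 20) -> float:
    """Average cycles per pixel when pixels are grouped into batches of batch_size.

    conds[i] is True if pixel i takes the "if" path (x > 0 in the example).
    """
    if not conds:
        return 0.0
    total = 0
    for start in range(0, len(conds), batch_size):
        batch = conds[start:start + batch_size]
        cycles = (if_cycles if any(batch) else 0) + (else_cycles if not all(batch) else 0)
        total += cycles * len(batch)  # every pixel in the batch waits for the whole batch
    return total / len(conds)


# Hypothetical 1024-pixel row: only pixels 300..699 satisfy x > 0.
conds = [300 <= i < 700 for i in range(1024)]

for size in (1, 16, 64, 256, 1024):
    print(size, avg_cycles_per_pixel(conds, size))
# 1 14.140625   (per-pixel branching: the ideal)
# 16 14.53125
# 64 15.9375
# 256 22.5
# 1024 25.0     (one batch: every pixel pays 5 + 20)
```

Only the batches that straddle the edges of the region pay for both paths, so small batches stay close to the ideal while large ones approach the worst case. The single contiguous run is a deliberate simplification; real GPUs typically group pixels into screen-space blocks, so actual numbers depend on the hardware and on the shape of the region.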

However, you should note that dynamic branching is not very common in games at all; only recently have we seen limited use of it. Efficient branching requires major additions and changes to the hardware, and it is a big reason R5xx was so much bigger than G7x.
 
Why is that in general? Aren't more modern shadowing techniques very branchy?

DB (dynamic branching) performance was pretty poor on all pre-G8x cards (spanning two generations of NVIDIA SM3.0 cards), and then there were all those R4xx-series cards that don't support DB at all (they're SM2.0x), which delayed the uptake of SM3.0. Just too many things to support, I suppose.
 