I'll take it down a notch further than MDolenc, just in case you didn't quite understand.
GPUs are fast because they run the same shader program on many pixels at once. Even before we called it a pixel shader, GPUs were performing the same sequence of operations on hundreds of pixels at a time. If creating a pixel took 5 steps, the GPU would do the first step on hundreds of pixels, then the second step, and so on until that batch was finished, and only then start the first step on the next batch.
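Here is a minimal Python sketch of that step-by-step order. The function name and the "steps" are hypothetical stand-ins for the hardware's fixed operations, and real hardware applies each step to the batch in parallel lanes rather than in a loop. The point is only the ordering: every pixel in a batch finishes step 1 before any pixel starts step 2.

```python
def shade_in_batches(pixels, steps, batch_size):
    """Run each step over a whole batch before starting the next step.

    pixels: list of per-pixel values; steps: list of functions (hypothetical
    stand-ins for the GPU's operations). Returns the shaded values and a log
    of (batch_start, step_index) in the order they were executed.
    """
    out = []
    log = []
    for start in range(0, len(pixels), batch_size):
        batch = pixels[start:start + batch_size]
        for step_index, step in enumerate(steps):
            # The same operation is applied to every pixel in the batch
            # before any pixel moves on to the next operation.
            batch = [step(p) for p in batch]
            log.append((start, step_index))
        out.extend(batch)
    return out, log


# Example: two steps on 8 pixels, in batches of 4.
shaded, order = shade_in_batches(list(range(8)), [lambda p: p + 1, lambda p: p * 2], 4)
print(shaded)  # [2, 4, 6, 8, 10, 12, 14, 16]
print(order)   # [(0, 0), (0, 1), (4, 0), (4, 1)]
```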
"Dynamic branching" means that which shader instructions run depends on a condition, like this:
Code:
if (x > 0)
    (run 5 instruction cycles)
else
    (run 20 different instruction cycles)
So you can see right away that you can no longer guarantee every pixel will follow the same sequence of instructions, since the path each pixel takes depends on whether its own x is greater than 0.
A batch can't run different instructions on different pixels at the same time, but it can still execute the code in the same step-by-step manner by simply idling the pixels an instruction doesn't apply to. So if your batch has some pixels where x > 0 and some where x <= 0, every pixel takes 25 cycles (5 + 20), plus however many cycles the "if" itself costs.
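A rough Python model of that idling behaviour might look like this. It assumes the 5 and 20 cycle counts from the example above and a made-up cost of 1 cycle for evaluating the "if"; the names are hypothetical. The whole batch steps through the "then" side if any pixel needs it and through the "else" side if any pixel needs that, with a mask deciding which pixels actually keep a result.

```python
THEN_CYCLES = 5    # from the example above
ELSE_CYCLES = 20   # from the example above
IF_CYCLES = 1      # assumption: cost of evaluating the condition


def run_batch(xs):
    """Execute 'if (x > 0) ... else ...' on one batch in lockstep.

    Every pixel sees the same instruction stream; the mask records which
    pixels each side of the branch actually applies to.
    Returns (per-pixel results, cycles spent by the whole batch).
    """
    cycles = IF_CYCLES
    mask = [x > 0 for x in xs]
    results = [None] * len(xs)
    if any(mask):        # at least one pixel needs the 'then' side
        cycles += THEN_CYCLES
        for i, active in enumerate(mask):
            if active:
                results[i] = "then"
    if not all(mask):    # at least one pixel needs the 'else' side
        cycles += ELSE_CYCLES
        for i, active in enumerate(mask):
            if not active:
                results[i] = "else"
    return results, cycles


print(run_batch([1, 2, 3]))  # (['then', 'then', 'then'], 6)
print(run_batch([-1, 0]))    # (['else', 'else'], 21)
print(run_batch([3, -3]))    # (['then', 'else'], 26)
```

The mixed batch is the expensive case: 5 + 20 cycles plus the branch itself, even though each pixel only keeps one result.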
"Branching granularity" is the size of this batch. If all the pixels in a batch agree on the condition, the batch only has to run through one set of instructions, and the smaller the batch, the more likely it is that all its pixels agree. A difference of 2x or 4x in batch size doesn't usually affect speed that much. It's when the difference is 10x or more, as with G7x compared to R5xx, that branching granularity can make huge differences in performance on branch-heavy shaders, such as an X1600 doubling the speed of a 7900GTX.
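To see why a smaller batch helps, here is a rough Python sketch that splits a screen into square batches and reports the average cycles per pixel. Everything in it is an assumption for illustration: the same 5/20 cycle counts plus 1 cycle for the "if", a 256x256 screen, and a hypothetical screen-space circle standing in for the x > 0 test, on the idea that neighbouring pixels tend to agree. Only batches that straddle the circle's edge pay for both sides.

```python
THEN_CYCLES = 5    # from the example above
ELSE_CYCLES = 20   # from the example above
IF_CYCLES = 1      # assumption: cost of evaluating the condition


def batch_cost(conds):
    """Cycles for one lockstep batch, given each pixel's condition."""
    cost = IF_CYCLES
    if any(conds):
        cost += THEN_CYCLES
    if not all(conds):
        cost += ELSE_CYCLES
    return cost


def avg_cycles_per_pixel(width, height, tile, cond):
    """Average cycles per pixel with the screen split into tile x tile batches."""
    total = 0
    for ty in range(0, height, tile):
        for tx in range(0, width, tile):
            conds = [cond(x, y)
                     for y in range(ty, min(ty + tile, height))
                     for x in range(tx, min(tx + tile, width))]
            # Every pixel in the batch waits for the whole batch to finish.
            total += batch_cost(conds) * len(conds)
    return total / (width * height)


def inside_circle(x, y):
    # Hypothetical condition: true inside a circle in the middle of the screen.
    return (x - 128) ** 2 + (y - 128) ** 2 < 80 ** 2


for tile in (1, 4, 16, 64, 256):
    print(tile, round(avg_cycles_per_pixel(256, 256, tile, inside_circle), 2))
```

With nested tile sizes like these, the average can only stay the same or grow as batches get bigger, because a bigger batch is mixed whenever any of the smaller batches inside it is mixed.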
However, you should note that branching is not very common in games at all, and only recently have games started to use it in a limited way. Supporting it well requires major additions and changes to the hardware, and it is a big reason R5xx was so much bigger than G7x.