Bus inversion

mito

beyond noob
Veteran
From Beyond's X1950XTX article:

Bus inversion

GDDR4's design means that no power is consumed when sending a 1 down the wire; the device consumes power only when sending a 0. On any I/O transaction with the device, if more than half of the 8-bit field being transferred is zeros, the field is inverted to produce a majority of ones, and an inversion flag is set so the memory controller and device know how to decode it. There's a transistor cost involved, but it is small compared to the overall power savings possible.

Is this bus inversion a trend? Is the quantity of 0s less than the quantity of 1s?
 
Dynamic Bus Inversions are used to dampen the power swings that occur on clock edges. They may reduce total power dissipation, but that's not their main purpose. Their main purpose is to reduce bus noise and to reduce the strain on the power delivery circuit(s) on die or on board.

Energy is not usually consumed(*) for just holding a potential difference across a wire. Energy is consumed when changing that potential difference.

If you know you'll end up toggling more than half your pins (0->1 or 1->0), you can instead toggle the smaller portion and send a signal to the other end saying "well, now I inverted everything."

For example, let's say you need to send an 8-bit number 5, followed by the 8-bit number 254 on a bus. With normal buses, you'd end up doing something like:
Code:
    5   -   00000101
        |
        |   *****1**  (intermediate state)
        v
  254   -   11111110

With DBI, you can do:
Code:
    5   -   00000101 + 0   <-- We add a wire to tell if we need to invert or not
        |
        |   00000*01 + *  (intermediate state)
        v
  254   -   00000001 + 1

In this example, you end up toggling just 2 pins instead of 7, saving 71% of the drain.
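
The example above can be reproduced with a few lines of Python. This is only a sketch of the transition-counting idea: the function names are made up for illustration, and it assumes the invert wire is counted as a toggling pin just like the eight data wires.

```python
def toggles(a: int, b: int) -> int:
    """Number of wires that change state between two bus words."""
    return bin(a ^ b).count("1")


def dbi_ac_encode(prev_wires: int, prev_flag: int, data: int, width: int = 8):
    """Return (wires, flag) for `data`, picking the form that toggles fewer pins.

    Hypothetical helper: the invert wire is included in the toggle count.
    """
    mask = (1 << width) - 1
    inverted = ~data & mask
    cost_plain = toggles(prev_wires, data) + (prev_flag ^ 0)
    cost_inverted = toggles(prev_wires, inverted) + (prev_flag ^ 1)
    if cost_inverted < cost_plain:
        return inverted, 1
    return data, 0


prev_wires, prev_flag = 0b00000101, 0          # the bus currently holds 5
wires, flag = dbi_ac_encode(prev_wires, prev_flag, 254)

print(f"{wires:08b} + {flag}")                              # 00000001 + 1
print(toggles(prev_wires, 254))                             # 7 toggles without DBI
print(toggles(prev_wires, wires) + (prev_flag ^ flag))      # 2 toggles with DBI
```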

Of course, if the data going through the bus is effectively random, then you have a problem: Now you need to power 4.5 transitions/clock on average instead of 4, thus costing you more power, and putting more strain on your power distribution network. So you really should only be using this when you have a good idea how your command or values will change from one clock to the next.


(*) I'm ignoring leakage and resistive effects.
 
Actually, with this type of bus inversion with 8 data lines and 1 inversion line, you can keep the WORST-case transition rate down at 4 transitions per clock (instead of 8 without). The average-case switching is not very much affected by bus inversion, but the worst-case switching is halved.

The scheme that you describe is essentially the one used by the Pentium 4 FSB, but it does sound somewhat different from the scheme described for GDDR4 here (which appears to use an open-drain signalling scheme, with bus inversion limiting the potentially severe power consumption penalty of such a scheme).
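
The worst-case and average figures for the transition-based scheme can be checked by brute force. The sketch below assumes 8 data lines plus 1 inversion line, counts the inversion line as a toggling wire, and weights every previous bus state equally (which is only an approximation of a real traffic pattern). The names are hypothetical.

```python
from itertools import product

WIDTH = 8  # data lines; one extra wire carries the invert flag


def popcount(x: int) -> int:
    return bin(x).count("1")


def dbi_cost(prev_wires: int, prev_flag: int, data: int) -> int:
    """Toggles on all 9 wires when the cheaper of plain/inverted is sent."""
    plain = popcount(prev_wires ^ data) + prev_flag  # new flag would be 0
    inverted = (WIDTH + 1) - plain                   # every wire does the opposite
    return min(plain, inverted)


# Plain 8-bit bus: every (previous word, next word) pair.
plain_costs = [popcount(a ^ b) for a, b in product(range(256), repeat=2)]

# DBI bus: every (previous wires, previous flag, next word) combination.
dbi_costs = [
    dbi_cost(w, f, d)
    for w, f, d in product(range(256), range(2), range(256))
]

plain_avg = sum(plain_costs) / len(plain_costs)
dbi_avg = sum(dbi_costs) / len(dbi_costs)

print("worst case:", max(plain_costs), "->", max(dbi_costs))
print("average:   ", plain_avg, "->", round(dbi_avg, 2))
```

Under those assumptions the worst case drops from 8 to 4 toggles, and the average comes out at about 3.27 toggles per transfer against 4.0 for the plain bus.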
 
GDDR4 supports two different types of bus inversion:

The DC variant inverts based on the number of "0"s that need to be transmitted.
The AC variant inverts based on the number of signal transitions required.
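
A rough sketch of the DC variant, following the "more than half zeros" rule quoted at the top of the thread. The flag polarity (1 means inverted) and the handling of an exact 4/4 split are assumptions for illustration, not taken from the GDDR4 specification.

```python
def dbi_dc_encode(data: int, width: int = 8):
    """Invert the word if more than half of its bits are 0. Returns (wires, flag)."""
    mask = (1 << width) - 1
    data &= mask
    zeros = width - bin(data).count("1")
    if zeros > width // 2:
        return ~data & mask, 1   # assumed polarity: flag 1 = inverted
    return data, 0


def dbi_dc_decode(wires: int, flag: int, width: int = 8) -> int:
    """Undo the inversion on the receiving side."""
    mask = (1 << width) - 1
    return (~wires & mask) if flag else (wires & mask)


for value in (0b00000101, 0b11111110, 0b00001111):
    wires, flag = dbi_dc_encode(value)
    assert dbi_dc_decode(wires, flag) == value
    print(f"{value:08b} -> {wires:08b} + {flag}")
# 00000101 -> 11111010 + 1
# 11111110 -> 11111110 + 0
# 00001111 -> 00001111 + 0
```

Unlike the AC variant, this decision depends only on the word being sent, not on what was on the bus in the previous cycle.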
 
Random thought:

Would there be other more advanced schemes that could improve this further? I'm thinking that on a wider bus, say 32bit or larger, another bit or two could be used to select between a number of different encodings to reduce the number of transitions even further. For instance with two bits:
00 - normal
01 - all inverted
10 - even bits inverted
11 - odd bits inverted

Or something similar that may make more sense.
 
I was thinking along the same lines as you, Humus. I think it might be good for words of 16 bits or more, though.

edit: Fixed redundancy.
 
Splitting the bits into even/odd like that is not going to help very much - you are in effect just splitting your N-bit bus into two N/2-bit buses as far as the bus inversion is concerned, and if you are going to do that, it's probably better to keep each of the N/2-bit buses grouped separately rather than interlaced in an even/odd pattern (if all the inversions of a clock cycle e.g. hit the upper half of the two buses at the same time, the interlaced bus will suffer a substantially nastier spike than the separately-bundled buses). It may be possible to exploit known patterns in the data that you are going to read/write to do more "intelligent" bus inversion, but in that case it is probably better to use those patterns to just compress the data instead.
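
The "two N/2-bit buses" point can be checked directly: the four encodings from the 2-bit selector put exactly the same candidate words on the wires as two independent invert flags, one for the even bits and one for the odd bits. The sketch below uses an 8-bit bus rather than 32 bits to keep the enumeration small; the function names are made up for illustration.

```python
WIDTH = 8
MASK = (1 << WIDTH) - 1
EVEN = 0b01010101   # bits 0, 2, 4, 6
ODD = 0b10101010    # bits 1, 3, 5, 7


def selector_candidates(data: int) -> dict:
    """Words the 2-bit selector scheme from the post can choose between."""
    return {
        0b00: data,          # normal
        0b01: data ^ MASK,   # all inverted
        0b10: data ^ EVEN,   # even bits inverted
        0b11: data ^ ODD,    # odd bits inverted
    }


def split_bus_candidates(data: int) -> dict:
    """Words reachable with one invert flag per interleaved half-bus."""
    return {
        (inv_even, inv_odd): data ^ (EVEN * inv_even) ^ (ODD * inv_odd)
        for inv_even in (0, 1)
        for inv_odd in (0, 1)
    }


same = all(
    set(selector_candidates(d).values()) == set(split_bus_candidates(d).values())
    for d in range(1 << WIDTH)
)
print(same)  # True
```

The only difference between the two is how the choice is encoded on the extra wires, so the selector cannot do better on the data lines than two separately inverted half-buses.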
 