Bus inversion

mito · Aug 24, 2006

from beyond's x1950xtx article

Bus inversion

GDDR4's design means that no power is consumed when sending a 1 down the wire; the device consumes power only when sending a 0. On any I/O transaction with the device, if more than half of the bits in the 8-bit field being transferred are zeros, the field is inverted to give a majority of ones, and an inversion flag is set so the memory controller and device know how to decode it. There's a transistor cost involved, but it is small compared to the overall power savings possible.

Is this bus inversion a trend? Is the quantity of 0s less than the quantity of 1s?

Humus · Aug 24, 2006

mito said:
Is the quantity of 0s less than the quantity of 1s?

Well, assuming random data it will happen about half the time.

Bob · Aug 24, 2006

Dynamic Bus Inversions are used to dampen the power swings that occur on clock edges. They may reduce total power dissipation, but that's not their main purpose. Their main purpose is to reduce bus noise and to reduce the strain on the power delivery circuit(s) on die or on board.

Energy is not usually consumed(*) for just holding a potential difference across a wire. Energy is consumed when changing that potential difference.

If you know you'll end up toggling more than half your pins (0->1 or 1->0), you can instead toggle the smaller portion and send a signal to the other end saying "well, now I inverted everything."

For example, let's say you need to send an 8-bit number 5, followed by the 8-bit number 254 on a bus. With normal buses, you'd end up doing something like:

Code:

    5   -   00000101
        |
        |   *****1**  (intermediate state)
        v
  254   -   11111110

With DBI, you can do:

Code:

    5   -   00000101 + 0   <-- We add a wire to tell if we need to invert or not
        |
        |   00000*01 + *  (intermediate state)
        v
  254   -   00000001 + 1

In this example, you end up toggling just 2 pins instead of 7, saving 71% of the drain.
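A minimal Python sketch of the step above, assuming the rule "invert when sending the value as-is would toggle more than half of the data wires". The function name `dbi_ac_step` and the flag polarity (1 = inverted) are illustrative choices, not taken from any spec.

```python
def popcount(x: int) -> int:
    """Number of 1 bits in x."""
    return bin(x).count("1")


def dbi_ac_step(prev_data: int, prev_flag: int, value: int, width: int = 8):
    """Choose the wire state for `value` that toggles the fewest wires.

    Returns (data_wires, invert_flag, toggles). Hypothetical helper.
    """
    mask = (1 << width) - 1
    # Toggles needed if the value is sent as-is with the flag wire at 0.
    plain = popcount((prev_data ^ value) & mask) + prev_flag
    if plain > width // 2:
        data, flag = ~value & mask, 1   # send the complement, raise the flag
    else:
        data, flag = value & mask, 0    # send unchanged
    toggles = popcount(prev_data ^ data) + (prev_flag ^ flag)
    return data, flag, toggles


# Bus currently holds 5 (flag 0); next value is 254.
data, flag, toggles = dbi_ac_step(0b00000101, 0, 254)
print(f"{data:08b} + {flag}, {toggles} toggles")  # 00000001 + 1, 2 toggles
```

The receiver only needs the flag wire to recover the value: if the flag is 1, it complements the eight data wires.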

Of course, if the data going through the bus is effectively random, then you have a problem: Now you need to power 4.5 transitions/clock on average instead of 4, thus costing you more power, and putting more strain on your power distribution network. So you really should only be using this when you have a good idea how your command or values will change from one clock to the next.

(*) I'm ignoring leakage and resistive effects.

arjan de lumens · Aug 24, 2006

Bob said:
Of course, if the data going through the bus is effectively random, then you have a problem: Now you need to power 4.5 transitions/clock on average instead of 4, thus costing you more power, and putting more strain on your power distribution network. So you really should only be using this when you have a good idea how your command or values will change from one clock to the next.

(*) I'm ignoring leakage and resistive effects.

Actually, with this type of bus inversion with 8 data lines and 1 inversion line, you can keep the WORST-case transition rate down at 4 transitions per clock (instead of 8 without). The average-case switching is not very much affected by bus inversion, but the worst-case switching is halved.
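The worst-case figure can be checked by brute force. The sketch below assumes 8 data wires plus 1 inversion wire, the "invert if more than 4 of the 9 wires would toggle" rule, and uniformly random data; all names are hypothetical.

```python
def popcount(x: int) -> int:
    """Number of 1 bits in x."""
    return bin(x).count("1")


def toggles_with_dbi(prev_data: int, prev_flag: int, value: int, width: int = 8) -> int:
    """Wires (data + flag) that toggle when the encoder picks the cheaper option."""
    mask = (1 << width) - 1
    # Cost of sending the value as-is with the flag wire at 0.
    plain = popcount((prev_data ^ value) & mask) + prev_flag
    # Sending the complement toggles exactly the wires that `plain` leaves alone.
    return min(plain, width + 1 - plain)


worst_plain = worst_dbi = 0
total_plain = total_dbi = 0
n_plain = n_dbi = 0
for prev_data in range(256):
    for value in range(256):
        t = popcount(prev_data ^ value)          # no inversion wire at all
        worst_plain = max(worst_plain, t)
        total_plain += t
        n_plain += 1
        for prev_flag in (0, 1):
            t = toggles_with_dbi(prev_data, prev_flag, value)
            worst_dbi = max(worst_dbi, t)
            total_dbi += t
            n_dbi += 1

avg_plain = total_plain / n_plain
avg_dbi = total_dbi / n_dbi
print("worst case:", worst_plain, "without,", worst_dbi, "with inversion")
print("average:   ", avg_plain, "without,", round(avg_dbi, 2), "with inversion")
```

Under these assumptions the enumeration gives a worst case of 4 toggles with the inversion wire versus 8 without, and an average of roughly 3.27 versus 4.0. Real traffic is not uniformly random, so the average on an actual bus may look different.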

The scheme that you describe is essentially the one used by the Pentium 4 FSB, but it does sound somewhat different from the scheme described for GDDR4 here (which appears to be using an open-drain signalling scheme, and uses bus inversion to limit the potentially severe power consumption penalty of using such a scheme).

Demirug · Aug 24, 2006

arjan de lumens said:
Actually, with this type of bus inversion with 8 data lines and 1 inversion line, you can keep the WORST-case transition rate down at 4 transitions per clock (instead of 8 without). The average-case switching is not very much affected by bus inversion, but the worst-case switching is halved.

The scheme that you describe is essentially the one used by the Pentium 4 FSB, but it does sound somewhat different from the scheme described for GDDR4 here (which appears to be using an open-drain signalling scheme, and uses bus inversion to limit the potentially severe power consumption penalty of using such a scheme).

GDDR4 supports two different types of bus inversion.

The DC variant inverts based on the number of "0" bits that need to be transmitted.
The AC variant inverts based on the number of necessary signal switches.
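A rough Python sketch of the DC variant, assuming the rule quoted at the top of the thread (invert when more than half of the 8 bits are zeros). The function names and the flag polarity (1 = inverted) are assumptions for illustration, not the GDDR4 pin definition.

```python
def popcount(x: int) -> int:
    """Number of 1 bits in x."""
    return bin(x).count("1")


def dbi_dc_encode(value: int, width: int = 8):
    """Return (data_wires, invert_flag) so that at most half the data wires carry 0."""
    mask = (1 << width) - 1
    zeros = width - popcount(value & mask)
    if zeros > width // 2:
        return ~value & mask, 1   # majority zeros: send the complement
    return value & mask, 0        # already majority ones (or a tie): send as-is


def dbi_dc_decode(data: int, flag: int, width: int = 8) -> int:
    """Undo dbi_dc_encode on the receiving side."""
    mask = (1 << width) - 1
    return (~data & mask) if flag else (data & mask)


data, flag = dbi_dc_encode(0b00000101)     # six zeros, so it gets inverted
print(f"{data:08b} + {flag}")              # 11111010 + 1
print(dbi_dc_decode(data, flag))           # 5
```

Unlike the AC variant, this decision depends only on the value being sent, not on what the wires held during the previous clock.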

Humus · Aug 24, 2006

Random thought:

Would there be other more advanced schemes that could improve this further? I'm thinking that on a wider bus, say 32bit or larger, another bit or two could be used to select between a number of different encodings to reduce the number of transitions even further. For instance with two bits:
00 - normal
01 - all inverted
10 - even bits inverted
11 - odd bits inverted

Or something similar that may make more sense.

compres · Aug 25, 2006

I was thinking along the same lines as you, Humus. I think it might be good for words of 16 bits or more, though.

edit: Fixed redundancy.

arjan de lumens · Aug 25, 2006

Humus said:
Random thought:

Would there be other more advanced schemes that could improve this further? I'm thinking that on a wider bus, say 32bit or larger, another bit or two could be used to select between a number of different encodings to reduce the number of transitions even further. For instance with two bits:
00 - normal
01 - all inverted
10 - even bits inverted
11 - odd bits inverted

Or something similar that may make more sense.

Splitting the bits into even/odd like that is not going to help very much - you are in effect just splitting your N-bit bus into two N/2-bit buses as far as the bus inversion is concerned, and if you are going to do that, it's probably better to keep each of the N/2-bit buses grouped separately rather than interlaced in an even/odd pattern (if all the inversions of a clock cycle e.g. hit the upper half of the two buses at the same time, the interlaced bus will suffer a substantially nastier spike than the separately-bundled buses). It may be possible to exploit known patterns in the data that you are going to read/write to do more "intelligent" bus inversion, but in that case it is probably better to use those patterns to just compress the data instead.
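The equivalence claimed here can be checked numerically. The sketch below assumes a 32-bit bus, counts only data-wire toggles (the selector wires themselves are ignored), and uses hypothetical names throughout. It compares the four-mode scheme from the quoted post against two independent 16-bit bus inversions over the even and odd bit groups.

```python
import random

WIDTH = 32
MASK = (1 << WIDTH) - 1
EVEN = 0x55555555            # bits 0, 2, 4, ...
ODD = 0xAAAAAAAA             # bits 1, 3, 5, ...

# The four selector codes from the quoted post, as XOR masks on the data.
MODES = {0b00: 0, 0b01: MASK, 0b10: EVEN, 0b11: ODD}


def popcount(x: int) -> int:
    """Number of 1 bits in x."""
    return bin(x).count("1")


def four_mode_toggles(prev_wires: int, value: int) -> int:
    """Fewest data-wire toggles achievable by picking one of the four modes."""
    return min(popcount(prev_wires ^ value ^ m) for m in MODES.values())


def split_bus_toggles(prev_wires: int, value: int) -> int:
    """Fewest toggles when the even and odd halves are each inverted independently."""
    diff = prev_wires ^ value
    total = 0
    for group in (EVEN, ODD):
        d = popcount(diff & group)
        total += min(d, WIDTH // 2 - d)   # invert this half only if it helps
    return total


random.seed(0)
for _ in range(10_000):
    prev, val = random.getrandbits(WIDTH), random.getrandbits(WIDTH)
    assert four_mode_toggles(prev, val) == split_bus_toggles(prev, val)
print("four-mode scheme matches two independent 16-bit inversions")
```

The match follows from the fact that the four masks are exactly the four combinations of "invert even half or not" and "invert odd half or not", so the two selector bits act as two separate inversion flags.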
