Nvidia GT300 core: Speculation

If you really wanted to, you could serialize the ECC bits and stick with a "normal" bus width.

Code:
DDDDE
DDDDE
DDDDE
DDDDE

becomes

EEEE
DDDD
DDDD
DDDD
DDDD
 
The overall bandwidth overhead for the ECC bits is exactly the same. You're only getting there through frequency, not by using a wider bus.
True, but it's likely more difficult (though perhaps less expensive) to drive the bus at a higher frequency than construct a wider bus.
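To put rough numbers on that trade-off, here's a quick Python sketch. The 256-bit bus, 4 GT/s signalling and 12.5% ECC ratio are assumptions I picked for illustration, not GT300 figures.

Code:
# Toy comparison of "add ECC pins" vs "serialize ECC into extra beats".
GTPS = 4e9           # transfers per second per pin (assumed)
ECC_RATIO = 0.125    # ECC bits per data bit (assumed, e.g. 8 per 64)

def wider_bus(data_width=256):
    # ECC travels on extra pins alongside the data (the DDDDE layout)
    total_pins = data_width * (1 + ECC_RATIO)
    user_bw = data_width * GTPS / 8            # bytes/s of useful data
    return total_pins, user_bw

def serialized(data_width=256):
    # Same pin count; ECC takes a share of the beats (the EEEE row up front)
    total_pins = data_width
    user_bw = data_width * GTPS / 8 / (1 + ECC_RATIO)
    return total_pins, user_bw

for name, (pins, bw) in [("wider bus ", wider_bus()),
                         ("serialized", serialized())]:
    print(f"{name}: {pins:.0f} pins, {bw / 1e9:.1f} GB/s of user data")

Either way the ECC share is the same 1/8; the serialized case just has to claw the lost 12.5% back through clock speed instead of extra pins, which is exactly the frequency-versus-width question above.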
 
If you really wanted to, you could serialize the ECC bits and stick with a "normal" bus width.

Code:
DDDDE
DDDDE
DDDDE
DDDDE

becomes

EEEE
DDDD
DDDD
DDDD
DDDD
You can only do that if it fits the burst characteristics of the memory.

If I understand your example above, it looks like you're going from a burst of 4 to a burst of 5, which is not supported.
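A tiny Python check of the burst-fit problem. The 4/8 burst lengths are the GDDR5 chop and nominal values mentioned further down the thread; the helper itself is just my illustration.

Code:
# Why a 5-beat (or 9-beat) data+ECC pattern doesn't map onto the memory:
# transfers have to be made up of whole bursts.
SUPPORTED_BURSTS = (4, 8)    # burst chop and nominal burst length

def beats_actually_spent(beats_wanted):
    for bl in SUPPORTED_BURSTS:
        if beats_wanted <= bl:
            return bl
    # anything longer gets split into whole 8-beat bursts
    return 8 * -(-beats_wanted // 8)    # ceil to a multiple of 8

for beats in (4, 5, 8, 9):
    spent = beats_actually_spent(beats)
    print(f"want {beats} beats -> burst logic spends {spent}, {spent - beats} wasted")

So tacking one ECC beat onto a 4- or 8-beat burst either needs memory that supports the odd length or wastes most of a second burst.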
 
If you really wanted to, you could serialize the ECC bits and stick with a "normal" bus width.

Code:
DDDDE
DDDDE
DDDDE
DDDDE

becomes

EEEE
DDDD
DDDD
DDDD
DDDD

With serialization like that you either end up with odd memory sizes or require special memory chips.

In addition, you would either need chips that support x9 burst sizes or take up 2 burst slots. Nominal burst length is 8x while chop burst length is 4x.

Personally, I'm still partial to cherry-picked DDR3 if they want to implement ECC. It kills two birds with one stone: ECC and capacity, though it gives up bandwidth.
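Back-of-the-envelope on the cherry-picked DDR3 idea; the data rates below are round numbers I picked for illustration, not actual parts.

Code:
# Side-band ECC the DIMM way (a ninth chip per 64-bit channel) vs plain GDDR5.
def board(name, channels, chip_width, chips_per_channel, gtps, ecc_chips):
    data_bits = channels * chip_width * (chips_per_channel - ecc_chips)
    user_bw = data_bits / 8 * gtps          # GB/s of user data (gtps in GT/s)
    print(f"{name}: {data_bits}-bit data bus, {user_bw:.0f} GB/s, "
          f"ECC chips per channel: {ecc_chips}")

# 512-bit GDDR5, 16 x32 chips, no ECC
board("GDDR5, no ECC      ", channels=16, chip_width=32, chips_per_channel=1,
      gtps=4.0, ecc_chips=0)

# 512 data bits of DDR3 as eight 72-bit channels (8 data chips + 1 ECC chip)
board("DDR3, side-band ECC", channels=8, chip_width=8, chips_per_channel=9,
      gtps=1.6, ecc_chips=1)

Standard x8 DDR3 chips give you the ninth-chip ECC layout and much larger densities for capacity, but at rates like these you give up well over half the bandwidth, which is the trade described above.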
 
They are already sacrificing a bit of bandwidth to avoid errors: Tesla has a bandwidth of ~102 GB/s while the GTX 280 has 141 GB/s.
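For what it's worth, those two figures fall straight out of the memory clocks, assuming the Tesla in question is the C1060 and I have the usual 512-bit GDDR3 numbers right:

Code:
# Both boards have a 512-bit GDDR3 bus; only the memory data rate differs.
BUS_BYTES = 512 // 8                      # 64 bytes per transfer

for name, mtps in [("Tesla C1060", 1600), ("GTX 280", 2214)]:
    print(f"{name}: {BUS_BYTES * mtps / 1000:.1f} GB/s")

print(f"bandwidth given up: {1 - 1600 / 2214:.0%}")

That ~28% is given up just by clocking the memory lower, presumably for reliability, before any ECC bits enter the picture.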
 
With serialization like that you either end up with odd memory sizes or require special memory chips.

In addition, you would either need chips that support x9 burst sizes or take up 2 burst slots. Nominal burst length is 8x while chop burst length is 4x.

Personally, I'm still partial to cherry-picked DDR3 if they want to implement ECC. It kills two birds with one stone: ECC and capacity, though it gives up bandwidth.

I have been looking at how ECC is implemented today, and it seems to be done at the DIMM level rather than the chip level: by adding another chip, each bank goes from 8 chips and 64 bits to 9 chips and 72 bits.

I'm not sure it would be feasible to implement it that way on a graphics card.
On a graphics card the bus is divided into several smaller buses with perhaps one or two chips per bus.
Adding another chip per bus for ECC would lead to too much overhead.

I may very well have missed something (like a 36-bit GDDR5 chip), but I think adding ECC to a graphics card is difficult without making too big a sacrifice.
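To make the "too much overhead" point concrete, here's the relative cost of one extra ECC chip as a function of how many data chips share a partition (generic numbers, nothing GT300-specific):

Code:
# One extra chip per 72-bit DIMM rank is cheap; one extra chip per tiny
# GPU memory partition is not.
def sideband_ecc_overhead(data_chips_per_partition):
    return 1 / data_chips_per_partition     # extra chips, pins, board area

for setup, data_chips in [("DIMM: 8 x8 data chips + 1 ECC chip      ", 8),
                          ("GPU partition: 2 x32 chips + 1 ECC chip ", 2),
                          ("GPU partition: 1 x32 chip + 1 ECC chip  ", 1)]:
    print(f"{setup}: +{sideband_ecc_overhead(data_chips):.1%}")

The extra 1/8 of a DIMM's chips turns into half of, or all of, a partition's chips once you're down to one or two chips per bus.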
 
I'm not sure it would be feasible to implement it that way on a graphics card.
On a graphics card the bus is divided into several smaller buses with perhaps one or two chips per bus.
Adding another chip per bus for ECC would lead to too much overhead.
Why?

There's no question that ECC would take more board real estate. But there's no reason to believe that it would be too much.
 
Why?

There's no question that ECC would take more board real estate. But there's no reason to believe that it would be too much.

The GT200 has a 512-bit bus with 16 chips, i.e. 32 bits per chip; let's assume that the next generation won't go lower.
Let's also assume that you do your ECC encoding so that each 32 bits of data becomes 36 bits on the bus. That's an increase of 1/8, which sounds acceptable.

The problem comes when you organize your buses: two 288-bit (256 data) buses would fit, but then you would have to touch all nine chips on every read or write.

We have seen that the buses on graphics cards are kept quite narrow to be as effective as possible, and I don't see a viable solution if you don't want to increase your bus width and don't want to add more than 1/8 overhead.

But hey, I'd like to be proven wrong.
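One way to put a number on the "touch all nine chips" problem is the minimum access size it forces (a rough sketch; burst length 8 is the GDDR5 nominal, chip widths as above):

Code:
# Minimum user data moved per access = data chips * chip width * burst length.
BURST = 8            # beats per access (GDDR5 nominal burst length)
CHIP_WIDTH = 32      # bits per chip

def min_access_bytes(data_chips):
    return data_chips * CHIP_WIDTH * BURST // 8

print("2-chip, 64-bit partition          :", min_access_bytes(2), "bytes")
print("9-chip, 288-bit (256 data) channel:", min_access_bytes(8), "bytes")

Going from 64-byte to 256-byte minimum accesses is exactly the kind of granularity hit that today's small partitions are there to avoid.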
 
The GT200 has a 512-bit bus with 16 chips, i.e. 32 bits per chip; let's assume that the next generation won't go lower.
Let's also assume that you do your ECC encoding so that each 32 bits of data becomes 36 bits on the bus. That's an increase of 1/8, which sounds acceptable.

The problem comes when you organize your buses: two 288-bit (256 data) buses would fit, but then you would have to touch all nine chips on every read or write.

We have seen that the buses on graphics cards are kept quite narrow to be as effective as possible, and I don't see a viable solution if you don't want to increase your bus width and don't want to add more than 1/8 overhead.

But hey, I'd like to be proven wrong.
I think you're vastly overestimating the difficulty here. Yes, it will be a challenge. Yes, the board would have to be redesigned. Yes, it would probably end up being a significantly larger board, perhaps with even more layers than a non-ECC part. All this makes it more expensive: it doesn't make it impossible. And for the prices that professional cards go for, more expensive isn't too tremendous a barrier.
 
I think you're vastly overestimating the difficulty here. Yes, it will be a challenge. Yes, the board would have to be redesigned. Yes, it would probably end up being a significantly larger board, perhaps with even more layers than a non-ECC part. All this makes it more expensive: it doesn't make it impossible. And for the prices that professional cards go for, more expensive isn't too tremendous a barrier.

You are correct that the added cost of a couple of additional memory chips and a changed board layout isn't a major problem, and it could probably be more than made up for by the higher price of professional/compute boards.

I see the bus layout as the major problem, but it may be possible for Nvidia to change the bus layout for compute boards, or perhaps Jawed's memory hubs will solve the issue (but I still don't see how they would allow adding ECC to a narrow bus without a large number of chips, unless we start getting 36-bit memory chips).
 
I don't know if this is just a cover-your-ass patent or whether they plan to make anything worthwhile out of it, but it seems kind of dodgy to have one GPU decode the bitstream and another GPU process it. When they have fixed-function video decoders, why bother sending it over SLI for subsequent processing? It'll likely cost a bit more power and probably involve higher overhead, and all those quality enhancements amount to next to nothing for a modern GPU.

Come to think of it, they have far higher bandwidth across PCIe between GPUs sitting right there, and yet it has never been utilized, not even for CUDA, where it is a highly requested feature for those with multi-GPU systems.
 