Nvidia GT300 core: Speculation

If you really wanted to, you could serialize the ECC bits and stick with a "normal" bus width.

Code:
DDDDE
DDDDE
DDDDE
DDDDE

becomes

EEEE
DDDD
DDDD
DDDD
DDDD
 
The overall bandwidth overhead for the ECC bits is exactly the same. You're only getting there through frequency, not by using a wider bus.
True, but it's likely more difficult (though perhaps less expensive) to drive the bus at a higher frequency than construct a wider bus.
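To put rough numbers on that trade-off, here's a quick Python sketch. The 256-bit bus, 4 GT/s signalling and 12.5% ECC ratio are assumptions I picked for illustration, not GT300 figures.

Code:
# Toy comparison of "add ECC pins" vs "serialize ECC into extra beats".
GTPS = 4e9           # transfers per second per pin (assumed)
ECC_RATIO = 0.125    # ECC bits per data bit (assumed, e.g. 8 per 64)

def wider_bus(data_width=256):
    # ECC travels on extra pins alongside the data (the DDDDE layout)
    total_pins = data_width * (1 + ECC_RATIO)
    user_bw = data_width * GTPS / 8            # bytes/s of useful data
    return total_pins, user_bw

def serialized(data_width=256):
    # Same pin count; ECC takes a share of the beats (the EEEE row up front)
    total_pins = data_width
    user_bw = data_width * GTPS / 8 / (1 + ECC_RATIO)
    return total_pins, user_bw

for name, (pins, bw) in [("wider bus ", wider_bus()),
                         ("serialized", serialized())]:
    print(f"{name}: {pins:.0f} pins, {bw / 1e9:.1f} GB/s of user data")

Either way the ECC share is the same 1/8; the serialized case just has to claw the lost 12.5% back through clock speed instead of extra pins, which is exactly the frequency-versus-width question above.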
 
If you really wanted to, you could serialize the ECC bits and stick with a "normal" bus width.

Code:
DDDDE
DDDDE
DDDDE
DDDDE

becomes

EEEE
DDDD
DDDD
DDDD
DDDD
You can only do that if it fits the burst characteristics of the memory.

If I understand your example above, it looks like you're going from a burst of 4 to a burst of 5, which is not supported.
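A tiny Python check of the burst-fit problem. The 4/8 burst lengths are the GDDR5 chop and nominal values mentioned further down the thread; the helper itself is just my illustration.

Code:
# Why a 5-beat (or 9-beat) data+ECC pattern doesn't map onto the memory:
# transfers have to be made up of whole bursts.
SUPPORTED_BURSTS = (4, 8)    # burst chop and nominal burst length

def beats_actually_spent(beats_wanted):
    for bl in SUPPORTED_BURSTS:
        if beats_wanted <= bl:
            return bl
    # anything longer gets split into whole 8-beat bursts
    return 8 * -(-beats_wanted // 8)    # ceil to a multiple of 8

for beats in (4, 5, 8, 9):
    spent = beats_actually_spent(beats)
    print(f"want {beats} beats -> burst logic spends {spent}, {spent - beats} wasted")

So tacking one ECC beat onto a 4- or 8-beat burst either needs memory that supports the odd length or wastes most of a second burst.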
 
If you really wanted to, you could serialize the ECC bits and stick with a "normal" bus width.

Code:
DDDDE
DDDDE
DDDDE
DDDDE

becomes

EEEE
DDDD
DDDD
DDDD
DDDD

With serialization like that you either end up with odd memory sizes or require special memory chips.

In addition, you would either need chips that support x9 burst sizes or take up 2 burst slots. Nominal burst length is 8x while chop burst length is 4x.

Personally, I'm still partial to cherry-picked DDR3 if they want to implement ECC. It kills two birds with one stone: ECC and capacity, though it gives up bandwidth.
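Back-of-the-envelope on the cherry-picked DDR3 idea; the data rates below are round numbers I picked for illustration, not actual parts.

Code:
# Side-band ECC the DIMM way (a ninth chip per 64-bit channel) vs plain GDDR5.
def board(name, channels, chip_width, chips_per_channel, gtps, ecc_chips):
    data_bits = channels * chip_width * (chips_per_channel - ecc_chips)
    user_bw = data_bits / 8 * gtps          # GB/s of user data (gtps in GT/s)
    print(f"{name}: {data_bits}-bit data bus, {user_bw:.0f} GB/s, "
          f"ECC chips per channel: {ecc_chips}")

# 512-bit GDDR5, 16 x32 chips, no ECC
board("GDDR5, no ECC      ", channels=16, chip_width=32, chips_per_channel=1,
      gtps=4.0, ecc_chips=0)

# 512 data bits of DDR3 as eight 72-bit channels (8 data chips + 1 ECC chip)
board("DDR3, side-band ECC", channels=8, chip_width=8, chips_per_channel=9,
      gtps=1.6, ecc_chips=1)

Standard x8 DDR3 chips give you the ninth-chip ECC layout and much larger densities for capacity, but at rates like these you give up well over half the bandwidth, which is the trade described above.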
 
They are already sacrificing a bit of bandwidth to avoid errors: Tesla has a bandwidth of ~102 GB/s while the GTX 280 has 141 GB/s.
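For what it's worth, those two figures fall straight out of the memory clocks, assuming the Tesla in question is the C1060 and I have the usual 512-bit GDDR3 numbers right:

Code:
# Both boards have a 512-bit GDDR3 bus; only the memory data rate differs.
BUS_BYTES = 512 // 8                      # 64 bytes per transfer

for name, mtps in [("Tesla C1060", 1600), ("GTX 280", 2214)]:
    print(f"{name}: {BUS_BYTES * mtps / 1000:.1f} GB/s")

print(f"bandwidth given up: {1 - 1600 / 2214:.0%}")

That ~28% is given up just by clocking the memory lower, presumably for reliability, before any ECC bits enter the picture.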
 
With serialization like that you either end up with odd memory sizes or require special memory chips.

In addition, you would either need chips that support x9 burst sizes or take up 2 burst slots. Nominal burst length is 8x while chop burst length is 4x.

Personally, I'm still partial to cherry-picked DDR3 if they want to implement ECC. It kills two birds with one stone: ECC and capacity, though it gives up bandwidth.

I have been looking at how ECC is implemented today, and it seems to be done at the DIMM level rather than the chip level: by adding another chip, each bank goes from 8 chips and 64 bits to 9 chips and 72 bits.

I'm not sure it would be feasible to implement it that way on a graphics card.
On a graphics card the bus is divided into several smaller buses with perhaps one or two chips per bus.
Adding another chip per bus for ECC would lead to too much overhead.

I may very well have missed something (like a 36-bit GDDR5 chip), but I think adding ECC to a graphics card is difficult without making too big a sacrifice.
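To make the "too much overhead" point concrete, here's the relative cost of one extra ECC chip as a function of how many data chips share a partition (generic numbers, nothing GT300-specific):

Code:
# One extra chip per 72-bit DIMM rank is cheap; one extra chip per tiny
# GPU memory partition is not.
def sideband_ecc_overhead(data_chips_per_partition):
    return 1 / data_chips_per_partition     # extra chips, pins, board area

for setup, data_chips in [("DIMM: 8 x8 data chips + 1 ECC chip      ", 8),
                          ("GPU partition: 2 x32 chips + 1 ECC chip ", 2),
                          ("GPU partition: 1 x32 chip + 1 ECC chip  ", 1)]:
    print(f"{setup}: +{sideband_ecc_overhead(data_chips):.1%}")

The extra 1/8 of a DIMM's chips turns into half of, or all of, a partition's chips once you're down to one or two chips per bus.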
 
I'm not sure it would be feasible to implement it that way on a graphics card.
On a graphics card the bus is divided into several smaller buses with perhaps one or two chips per bus.
Adding another chip per bus for ECC would lead to too much overhead.
Why?

There's no question that ECC would take more board real estate. But there's no reason to believe that it would be too much.
 
Why?

There's no question that ECC would take more board real estate. But there's no reason to believe that it would be too much.

The GT200 has a 512-bit bus with 16 chips, i.e. 32 bits per chip; let's assume that the next generation won't go lower.
Let's also assume that you do your ECC encoding so that each 32 bits of data becomes 36 bits on the bus. That's an increase of 1/8, which sounds acceptable.

The problem comes when you organize your buses: two 288-bit (256 data) buses would fit, but then you would have to touch all nine chips on every read or write.

We have seen that the buses on graphics cards are kept quite narrow to be as effective as possible, and I don't see a viable solution if you don't want to increase your bus width and don't want to add more than 1/8 overhead.

But hey, I'd like to be proven wrong.
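One way to put a number on the "touch all nine chips" problem is the minimum access size it forces (a rough sketch; burst length 8 is the GDDR5 nominal, chip widths as above):

Code:
# Minimum user data moved per access = data chips * chip width * burst length.
BURST = 8            # beats per access (GDDR5 nominal burst length)
CHIP_WIDTH = 32      # bits per chip

def min_access_bytes(data_chips):
    return data_chips * CHIP_WIDTH * BURST // 8

print("2-chip, 64-bit partition          :", min_access_bytes(2), "bytes")
print("9-chip, 288-bit (256 data) channel:", min_access_bytes(8), "bytes")

Going from 64-byte to 256-byte minimum accesses is exactly the kind of granularity hit that today's small partitions are there to avoid.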
 
The GT200 has a 512-bit bus with 16 chips, i.e. 32 bits per chip; let's assume that the next generation won't go lower.
Let's also assume that you do your ECC encoding so that each 32 bits of data becomes 36 bits on the bus. That's an increase of 1/8, which sounds acceptable.

The problem comes when you organize your buses: two 288-bit (256 data) buses would fit, but then you would have to touch all nine chips on every read or write.

We have seen that the buses on graphics cards are kept quite narrow to be as effective as possible, and I don't see a viable solution if you don't want to increase your bus width and don't want to add more than 1/8 overhead.

But hey, I'd like to be proven wrong.
I think you're vastly overestimating the difficulty here. Yes, it will be a challenge. Yes, the board would have to be redesigned. Yes, it would probably end up being a significantly larger board, perhaps with even more layers than a non-ECC part. All this makes it more expensive: it doesn't make it impossible. And for the prices that professional cards go for, more expensive isn't too tremendous a barrier.
 
I think you're vastly overestimating the difficulty here. Yes, it will be a challenge. Yes, the board would have to be redesigned. Yes, it would probably end up being a significantly larger board, perhaps with even more layers than a non-ECC part. All this makes it more expensive: it doesn't make it impossible. And for the prices that professional cards go for, more expensive isn't too tremendous a barrier.

You are correct that the added cost of a couple of additional memory chips and a changed board layout isn't a major problem, and it could probably be more than made up for by the higher price of professional/compute boards.

I see the bus layout as the major problem, but it may be possible for Nvidia to change the bus layout for compute boards, or perhaps Jawed's memory hubs will solve the issue (but I still don't see how they would allow adding ECC to a narrow bus without a large number of chips, unless we start getting 36-bit memory chips).
 
I don't know if this is just a cover-your-ass patent or whether they plan to make anything worthwhile out of it, but it seems kind of dodgy to have one GPU decode the bitstream and another GPU process it. When they have fixed-function video decoders, why bother sending it over SLI for subsequent processing? It'll likely cost a bit more power and probably involve higher overhead, and all those quality enhancements amount to next to nothing for a modern GPU.

Come to think of it, they have far higher bandwidth across PCIe between GPUs sitting right there, and yet it has never been utilized, not even for CUDA, where it is a highly requested feature for those with multi-GPU systems.
 