A GPU's main problem is that it's a high-power device, drawing far more than consumer CPUs - feeding the core with power puts the squeeze on the pins available for I/O.
The one or more hub chips should be low power, so the balance of I/O to power should keep the size of those chips down. Secondly, the GPU<->hub I/O needs fewer pins than a GDDR interface does, which means more pins for power and/or a smaller GPU.
GDDR5 doesn't even use differential signalling. NVidia can make an entirely proprietary interconnect, and do what it likes to get the signalling it wants.
The alternative is multiple small hub chips, e.g. 4x 128-bit. The perimeter is 19mm of GDDR5 and 10mm of GPU connection per chip, ~30mm in total. The area would be, say, 19mm² of GDDR5, 10mm² of GPU connection and perhaps 7mm² of MC on 55nm, meaning the entire chip is <40mm². Each hub chip would be something like 4x11mm, say.
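To make that back-of-envelope arithmetic easy to check, here's a quick sketch using the figures above (all of them assumptions from the post, not measured numbers):

```python
# Rough hub-chip sizing, using the figures quoted above (all assumptions).
gddr5_edge_mm = 19       # die edge needed for the 128-bit GDDR5 interface
gpu_link_edge_mm = 10    # die edge needed for the GPU<->hub link
gddr5_area_mm2 = 19      # pad/PHY area, GDDR5 side
gpu_link_area_mm2 = 10   # pad/PHY area, GPU side
mc_area_mm2 = 7          # memory-controller logic on 55nm

io_edge_mm = gddr5_edge_mm + gpu_link_edge_mm                    # ~29-30 mm of I/O edge
content_mm2 = gddr5_area_mm2 + gpu_link_area_mm2 + mc_area_mm2   # ~36 mm² of content

# A 4 x 11 mm rectangle gives 2*(4+11) = 30 mm of edge and 44 mm² of die,
# which comfortably wraps the ~36 mm² of content estimated above.
print(io_edge_mm, content_mm2, 2 * (4 + 11), 4 * 11)
```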
Then there's the question of whether the ROPs go on the hub chip too, making room for vastly more ALUs. The Tesla (or should that be Fermi?) variant would then have hubs that are ROP-less and support ECC.
Why would you need such high speeds? I thought the whole point was that for an equivalent effective width you need fewer physical pins when using a proprietary differential signalling bus. So a 512-bit custom bus will actually be much smaller than a 512-bit GDDR5 interface. But it can run at the same 4.5GT/s speed as GDDR5 does; it'll just take up less space.
Differential signaling requires 2x the number of pins/wires per signal compared to single-ended. If you ran the custom differential interface at the same frequency, you would need ~2x the number of wires and pins for the signaling. You need to run the differential interconnect significantly faster (~2x) just to break even on pin count.
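A minimal sketch of the pin-count arithmetic behind that break-even claim, using the 512-bit/4.5 Gb/s figures from this discussion as the example:

```python
# Signal-pin count needed for a given bandwidth, single-ended vs differential.
def signal_pins(total_gbps, per_lane_gbps, differential):
    lanes = total_gbps / per_lane_gbps               # one lane per bit stream
    return int(lanes) * (2 if differential else 1)   # a differential pair costs two pins

target_gbps = 512 * 4.5                                     # 512-bit bus at 4.5 Gb/s/pin
print(signal_pins(target_gbps, 4.5, differential=False))    # 512: single-ended, GDDR5-style
print(signal_pins(target_gbps, 4.5, differential=True))     # 1024: same rate, twice the pins
print(signal_pins(target_gbps, 9.0, differential=True))     # 512: 2x the rate only breaks even
print(signal_pins(target_gbps, 18.0, differential=True))    # 256: ~4x the rate halves the pins
```

On that arithmetic the proprietary link only starts saving pins once it runs well past GDDR5 rates, which is the point being made.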
If that's the case then that patent only makes sense if you're trying to access a lot of very slow memory. Probably a dead end.
The basic premise of the patent was an IC that converted a parallel DRAM interface to a proprietary serial one.
Is this distinct from the FB-DIMM concept, other than the fact that Nvidia pasted "GPU" over the section that would have been a CPU?
Yeah, parasitic power is unquestionably an issue. If hubs are used, that might mean that laptop chips (e.g. 128-bit or 64-bit memory bus) don't use hubs, and that it's reserved for 256/512-bit chips.

The hub chips would actually have fairly high power density. High speed I/Os aren't known for being low power, and what you are essentially doing is doubling the number of high power I/Os on these hub chips. So you are actually looking at the range of 25-50W of purely parasitic power being added to the design.
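For a sense of where a figure in that range could come from, here's a crude energy-per-bit estimate; the 10-20 pJ/bit values are purely illustrative assumptions, not measurements of any real interface:

```python
# Crude estimate of the parasitic power an extra high-speed hop could add.
# The energy-per-bit figures below are illustrative assumptions only.
bandwidth_gb_per_s = 288                  # e.g. 512 bits x 4.5 Gb/s per pin / 8
bits_per_second = bandwidth_gb_per_s * 8e9

for pj_per_bit in (10, 20):               # assumed I/O energy for the extra hop
    watts = bits_per_second * pj_per_bit * 1e-12
    print(f"{pj_per_bit} pJ/bit -> ~{watts:.0f} W of added I/O power")   # ~23 W and ~46 W
```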
~30mm of I/O would require a ~8mm x 8mm hub chip. 64mm² x 4 is a lot of area.
The sideport on RV770 is the same bandwidth as PCI Express - it is seemingly nothing more than a 16-lane version 2 PCI Express port dedicated to inter-GPU communication. It's 38% of the area of the main PCI Express port, which appears to be a direct result of the less demanding physical environment required for the connection. That seems to imply substantially lower power consumption, too.

There are real limits. Considering the frequencies of GDDR5, it would be fully bleeding edge to do an interconnect at 2x GDDR5 frequencies, and it would likely require differential signaling, which means you are right back where you started as far as pin counts. For it to make sense, you would need to run the GPU<>HUB signaling at a minimum of 4x GDDR5 frequencies, a range still in the very early stages of advanced research at this point.
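For scale, a rough comparison of a PCI Express 2.0 x16-class sideport against a wide GDDR5 memory interface, using the 512-bit/4.5 Gb/s numbers from this thread:

```python
# Bandwidth scale: a PCIe-2.0-x16-class sideport vs a wide GDDR5 memory interface.
pcie2_lane_gb_per_s = 5 * (8 / 10) / 8          # 5 GT/s with 8b/10b coding -> 0.5 GB/s per lane
sideport_gb_per_s = 16 * pcie2_lane_gb_per_s    # x16 link: ~8 GB/s per direction

gddr5_gb_per_s = 512 * 4.5 / 8                  # 512-bit bus at 4.5 Gb/s/pin -> 288 GB/s

print(sideport_gb_per_s, gddr5_gb_per_s, gddr5_gb_per_s / sideport_gb_per_s)   # 8.0 288.0 36.0
```

A sideport-class link is fine for inter-GPU traffic, but a GPU<->hub link has to carry full memory bandwidth, which is the gap behind the "real limits" point above.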
You mean volumes aren't there for an ECC variant? If the scalable memory buffer is viable for Nehalem-EX, then why isn't an ECC-specific version viable for NVidia? Why should NVidia's volumes over 3 years, say, be painfully expensive? Particularly if, say, a 16GB ECC board can sell for $8000.

The volumes aren't there.
You seem to be suggesting that this kind of power will require a square chip, 8x8mm, as opposed to the rectangular 11x4mm that I was suggesting. And that stacked-I/O, as MfA was suggesting, wouldn't be an option, either.
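Making the square-versus-rectangle point explicit, with the same illustrative dimensions:

```python
# Edge length vs die area for the two hub-chip shapes being debated.
def edge_and_area(w_mm, h_mm):
    return 2 * (w_mm + h_mm), w_mm * h_mm

print(edge_and_area(8, 8))     # (32, 64): ~30mm of I/O fits, but costs 64 mm² of die
print(edge_and_area(4, 11))    # (30, 44): nearly the same I/O edge with ~30% less silicon
```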
The I/O for hub communication should benefit substantially from a constrained physical environment, too, in comparison with GDDR5 I/O. I guess some pins in GDDR5 that are power/ground would be traded for signal pins in a differential configuration too.
Just to put some numbers to it ... what does an extra substrate layer add to the cost of an AMB sized chip?

Stacked-I/O requires more package layers resulting in higher cost and worse signaling.
Yeah, it seems the pricing would enforce this as a solution for 512-bit GPUs only, which are clearly meant to leave NVidia at >$100 a piece.

Because the Nvidia solution is going to have to sell for <$100 total for the vast majority of the volume! There are different realities for something with an ASP of ~$100 and something with an ASP of >$1000.
GDDR5 is currently available in 6Gbps form, and is supposed to reach ~7Gbps.

It's going to have to do something. Given the physical constraints they might be able to push GDDR5 to around 6 or so GT/s, but then they'll likely be out of signaling margin. A reasonable solution is probably to go to something like a 5-7 GT/s SiBi signaling technology.
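For reference, the peak bandwidths those data rates would imply on a few example bus widths:

```python
# Peak bandwidth implied by the quoted GDDR5-class data rates (example bus widths).
for bus_bits in (256, 384, 512):
    for rate_gbps in (5.0, 6.0, 7.0):
        print(f"{bus_bits}-bit @ {rate_gbps} Gb/s/pin = {bus_bits * rate_gbps / 8:.0f} GB/s")
```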
SINCE WE BROKE the news about Nvidia's GT300 and its huge die size, more about the family has come to light. It will have some ankle-biting offspring in short order.
Yes, the oft-delayed monster is going to have children, four of them. For the sharp-eyed, this is more than Nvidia has done at any time in the past. While none of them have taped out yet, our moles say that Nvidia usually waits until they get the first silicon back to see if there are any changes that need to be made to subsequent designs. With mask sets running into the 7 figures, this is what you call a smart move.
If the hub is used to make the GPU smaller (not even sure if this would be by a significant degree), there'd be some power saving arising from the smaller GPU.
I am inclined to agree that this concept may have arisen in an era that didn't contemplate GDDR5, e.g. when GDDR4 signalled "not much performance gain" over GDDR3, or that it was purely defensive. The sheer performance of GDDR5 may undo this.
Will GDDR5 go differential in a couple of years' time?...
Jawed
Part of the GDDR5 interface is dedicated lines for error detection. Detected errors cause a re-transmission attempt and may be used as a signifier to kick off re-training to adapt to varying voltages/temperatures.

If you have ECC, I wonder if you need to worry about your MC<>DRAM interconnect a bit more and use something more robust than GDDRx?
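The protection being described is link-level detection with retry rather than end-to-end correction, which is easy to illustrate with a toy CRC check. The CRC-8 polynomial 0x07 below is the one commonly cited for GDDR5's EDC, but treat the burst size and framing as illustrative assumptions:

```python
# Toy link-level error detection: CRC over a burst, retry on mismatch.
def crc8(data: bytes, poly: int = 0x07) -> int:   # poly = x^8 + x^2 + x + 1
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = ((crc << 1) ^ poly) & 0xFF if crc & 0x80 else (crc << 1) & 0xFF
    return crc

burst = bytes(range(32))                      # pretend this is one write burst
sent_crc = crc8(burst)

corrupted = bytearray(burst)
corrupted[5] ^= 0x10                          # single-bit error on the wire
assert crc8(bytes(corrupted)) != sent_crc     # detected -> the controller retries,
                                              # but nothing here corrects data at rest
```

Detection plus retry covers the wire; ECC in the controller or DRAM is what covers the stored data, which is the distinction the question above is poking at.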
GDDR5 was likely designed with a notion that bit errors are more acceptable than they are in the CPU/GPGPU world.
Yeah, that's a serious problem.

GDDR5 was designed to be single ended for backwards compat. with GDDR3/4. I suspect GDDR6 may need diff. signaling.
However, figuring out how to support both high performance/high price DRAM and lower price DRAM for midrange/low-end systems on the same memory controller is very tricky.
I suspect the basic electrical differences between single-ended and differential will be a problem when looking at the interfacing from pads to pins and the types of pins. Unless there was a way to configure some pad/pin combinations in the interface to switch from being power/ground for single-ended to signalling for differential.

Maybe once GDDR5 is low price, the spec could be extended to be differential... and then diff. GDDR5 could be the 'cheap' DRAM for a GDDR6 memory controller. But I don't know whether anyone would like that, or if it's remotely feasible when you consider the actual trade-offs involved.
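Purely as a thought experiment on that pad-reuse idea (nothing here reflects a real interface definition), a sketch of a pad map that reassigns alternate power/ground positions to the complement legs of differential pairs in a hypothetical differential mode:

```python
# Hypothetical dual-mode pad map: single-ended mode uses alternate pads as
# power/ground returns; the imagined differential mode reuses those positions
# for the inverted leg of each pair. Purely illustrative.
from enum import Enum

class PadRole(Enum):
    SIGNAL = "signal"               # DQ, or the true leg of a pair
    POWER_GROUND = "power/ground"   # supply/return in single-ended mode
    COMPLEMENT = "diff complement"  # inverted leg in differential mode

def pad_roles(num_pads: int, differential: bool):
    roles = []
    for i in range(num_pads):
        if i % 2 == 0:
            roles.append(PadRole.SIGNAL)
        else:
            roles.append(PadRole.COMPLEMENT if differential else PadRole.POWER_GROUND)
    return roles

print([r.value for r in pad_roles(8, differential=False)])
print([r.value for r in pad_roles(8, differential=True)])
```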