Nvidia GT300 core: Speculation

Part of the GDDR5 interface is dedicated lines for error detection. Detected errors cause a re-transmission attempt and may be used as a signifier to kick off re-training to adapt to varying voltages/temperatures.

I'm aware of that. But the BER targeted by GDDR5 is probably worse than the target for DDR3 or FBD. That's my point. I'm aware that there is CRC over the interconnect, but the question is about the target BER relative to standard memory.

Yeah, that's a serious problem.

In fact this is a motivator in the patent document. The hub chips insulate the GPU chips from the vagaries of memory technology and varying interface types. But the hub chips incur a significant increase in entry-level cost for the board as a whole, as well as the power penalties.

Now you could argue that the rising tide of IGP softens that blow - i.e. that the entry-level cost for a board is rising anyway. But it seems the penalties are so severe that it might only be possible with the biggest GPUs. In which case the DDR flexibility would de-emphasise the problems the chip team has when they're building a new huge GPU over a multi-year timeline, enabling the existing strategy of delivering the halo chip as the first chip in a new architecture. The smaller chips would then be engineered for the specific memory types. This would make NVidia "laggy" on memory technology adoption - but NVidia is already laggy, if its GDDR5 adoption is taken at face value - even though NVidia had the first GDDR3 GPU.

You keep on thinking up ways that this could work, but the reality is that it's a bad idea.

Integrated memory controllers are the future, and the only systems that need memory oriented discrete components are ultra-high capacity and ultra-high reliability ones. Neither of those describe a GPU.

You end up adding a lot of latency, adding power, adding cost, and you don't get a whole lot in return. That sort of thing is handy when you have a massive split in the industry (e.g. RDR vs. DDR), but it's pretty clear that the consensus is GDDRx for the high-end and DDRx for the low-end.

There are a huge number of downsides and very few upsides.

GDDR3 seems to be facing a rapid tail-off - it may be only 18 months before it disappears entirely. ATI may not use it in the upcoming generation, sticking with DDR and GDDR5. Dunno if that effectively means that GDDR5 would face a yet-more-rapid tail-off if it were replaced by GDDR6.

The rising tide of IGP also hampers the economies of scale that make an iteration of GDDR viable. Against that, discrete GPUs are still undergoing growth. But the notebook sector is putting a squeeze on everything.

I think that's a very insightful comment and I wonder what the data shows. I think you're right, and I'd be curious to see if that means that the low-end of discrete moves upwards in the price stack.


@MFA:

Since you are looking at the difference between two pins, any changes (e.g. temperature, bad PCB design) that equally impact both pins will cancel each other out. That makes EMI easier to handle, for instance, and can reduce the amount of shielding needed.

Differential signaling is also more power efficient (smaller voltage swings on each line of the pair can produce the same overall voltage difference at the receiver).
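
A toy sketch of both effects, assuming an idealised receiver and made-up swing and noise figures (none of these are GDDR5 or XDR numbers):

Code:
import random

# Toy model of single-ended vs. differential reception.
# All voltages and noise figures are illustrative, not GDDR5/XDR numbers.

def transmit(bits, swing, common_mode_noise):
    """Return (line_p, line_n) voltages per bit, with the same
    common-mode disturbance added to both lines of the pair."""
    samples = []
    for b in bits:
        v = swing if b else -swing
        noise = common_mode_noise()          # hits both lines equally
        samples.append((v + noise, -v + noise))
    return samples

def receive_single_ended(samples, threshold=0.0):
    # Looks at one line only: the shared noise can push it across the threshold.
    return [1 if p > threshold else 0 for p, _ in samples]

def receive_differential(samples):
    # Looks at the difference: identical noise on both lines cancels out.
    return [1 if (p - n) > 0.0 else 0 for p, n in samples]

random.seed(0)
bits = [random.randint(0, 1) for _ in range(10000)]
noise = lambda: random.gauss(0.0, 0.3)       # large shared disturbance

# The differential link uses half the per-line swing yet sees the full
# 2*swing difference at the receiver, with the shared noise cancelled.
se = transmit(bits, swing=0.4, common_mode_noise=noise)
df = transmit(bits, swing=0.2, common_mode_noise=noise)

se_errors = sum(a != b for a, b in zip(bits, receive_single_ended(se)))
df_errors = sum(a != b for a, b in zip(bits, receive_differential(df)))
print("single-ended errors:", se_errors, " differential errors:", df_errors)

With the same disturbance on both lines, the differential receiver decodes cleanly at half the per-line swing, which is where the power argument comes from.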

If you look at all the new interfaces introduced in the last 10 years, they have all trended towards diff. signaling: Rambus, CSI, HT, FBD, PCI-e, etc.

DK
 
Differential signaling is also more power efficient (smaller voltage swings on each line of the pair can produce the same overall voltage difference at the receiver).

If you look at all the new interfaces introduced in the last 10 years, they have all trended towards diff. signaling: Rambus, CSI, HT, FBD, PCI-e, etc.
None of those have such favourable environments as GDDR5 (mere cm of PCB to travel) and none of those are driven by the same bandwidth-per-pin-über-alles mentality as GPU memory (at least for non-mobile GPUs).

Reduced power consumption is nice, but it's not going to save enough power/ground pins to offset the doubling with differential signalling. Increased SNR is nice, but the CMOS/substrate/PCBs have inherent bandwidth limits for binary signalling regardless of SNR ... and I think single ended signalling can push close enough to those with manageable power consumption.
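
Very rough pin arithmetic for that point, with the power/ground-per-signal ratios invented purely for illustration (the real allocation depends on the I/O standard, package and return-path design):

Code:
# Back-of-the-envelope pin budget for a 256-bit data bus.
# Both power/ground ratios below are invented for illustration only.

DATA_BITS = 256

# Single-ended: one wire per bit; assume ~1 power/ground pin per 2 signals
# (single-ended return currents need a generous P/G allocation).
se_signal = DATA_BITS
se_pg = se_signal // 2
print("single-ended:", se_signal, "signal +", se_pg, "P/G =", se_signal + se_pg, "pins")

# Differential: two wires per bit, but assume a leaner ~1 P/G pin per 4 wires,
# since much of the return current flows in the complementary wire.
diff_signal = DATA_BITS * 2
diff_pg = diff_signal // 4
print("differential:", diff_signal, "signal +", diff_pg, "P/G =", diff_signal + diff_pg, "pins")

Even with a generous P/G allocation charged to single-ended and a lean one for differential, the pair-per-bit doubling dominates the total.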

Without more complex encoding than binary, differential signalling will drive down and not up bandwidth per pin IMO.
 
None of those have such favourable environments as GDDR5 (mere cm of PCB to travel) and none of those are driven by the same bandwidth-per-pin-über-alles mentality as GPU memory (at least for non-mobile GPUs).

You're right about the signal environment - it is more favorable, not because of the length, but because there are no sockets or discontinuities.

I also think you are radically under-estimating the focus on bandwidth at AMD, Rambus and Intel. They focus on BW/pin, BW/mW and BW/mm2 and are very aggressive.

XDR2 is designed for 12.8 Gbps - way higher than anyone else is going right now - compared to ~7 Gbps for GDDR5, and it's unclear whether GDDR5 will ever hit 7 Gbps.

CSI is shipping at 6.4 Gbps, higher than GDDR5 ships today. I can't recall the HT speeds shipping now.

FBD ships at 4 Gbps, and has been doing so for a long time.

I can't find the pin count for a GDDR5 interface, so it's hard for me to compare.

Reduced power consumption is nice, but it's not going to save enough power/ground pins to offset the doubling with differential signalling. Increased SNR is nice, but the CMOS/substrate/PCBs have inherent bandwidth limits for binary signalling regardless of SNR ... and I think single ended signalling can push close enough to those with manageable power consumption.

Pretty much the entire industry agrees that differential is the way going forward to achieve superior gbps/mW and /mm2 and /pin. There's really no debate, it's just about how far you can keep on pushing single ended before the game is up.

I have a hard time believing that GDDRx is competitive when you look at mW/Gbps. Nat Semi has a 13 mW/Gbps PCIe interface available, and I've heard of stuff from Rambus as low as 2 mW/Gbps.
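
For scale, here is what those per-Gbps figures work out to at a GPU-class aggregate bandwidth; the 128 GB/s total is an assumed round number, the mW/Gbps values are the ones quoted above:

Code:
# Interface power at a GPU-class aggregate bandwidth, using the per-Gbps
# figures quoted above. The 128 GB/s total is an assumed round number.

aggregate_gbps = 128 * 8                      # 128 GB/s -> 1024 Gbit/s

for name, mw_per_gbps in [("PCIe PHY, 13 mW/Gbps", 13.0),
                          ("Rambus-class, 2 mW/Gbps", 2.0)]:
    watts = aggregate_gbps * mw_per_gbps / 1000.0
    print(f"{name}: {watts:.1f} W for {aggregate_gbps} Gbps")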

I don't know the numbers for CSI, XDR2 or HT, but I'd be curious if someone can dig them up.

But here's a presentation that pretty strongly states that the memory guys are dropping the ball on power:
http://www.ska.ac.za/ska2009/download/ska2009_elmegreen.pdf

Without more complex encoding than binary, differential signalling will drive down and not up bandwidth per pin IMO.

You're welcome to believe that, but if you look at the research done in industry and academia you'll find that the rest of the world believes otherwise, and they are investing millions and billions accordingly.

DK
 
Pretty much the entire industry agrees that differential is the way going forward to achieve superior gbps/mW and /mm2 and /pin. There's really no debate, it's just about how far you can keep on pushing single ended before the game is up.

Macri said advanced signaling technologies from Rambus will not be competitive, in part because they use a differential (two-wire) approach rather than the single-wire technique in GDDR5. The extra wire typically requires more pins and power. "We don't think a differential solution makes sense until you get to speeds of 8 to 10 Gbit/s," he said.

Going forward is all fine and well, but we are in the here and now. Single-ended isn't being pushed because the switch is hard - they switched to differential for the signals where it made sense, after all. It was pushed with GDDR5 because for GPUs, at the moment, it is still the optimal solution.

Once CMOS/substrates/PCBs/etc. have evolved a bit so XDR2 can really reach its higher design targets rather than just promising them, things will undoubtedly be different - but this is the here and now. Differential signalling would not give NVIDIA an appreciable increase in bandwidth per pin IMO.

About those billions, I think that's overstating the investments still being made. "Going forward" differential might make sense; the trouble is that a little further forward than that, optical makes even more sense ... investing a lot of money in, say, differential GDDR6 is one bulk silicon laser away from money down the drain. NEC was planning on optical CPU/memory interconnects in 2011 a couple of years back ... slightly too ambitious, but still. With the economic slowdown I think it's extremely likely differential will just be skipped entirely for memory (apart from the few customers Rambus manages to get).
 
I'm aware of that. But the BER targeted by GDDR5 is probably worse than the target for DDR3 or FBD. That's my point. I'm aware that there is CRC over the interconnect, but the question is about the target BER relative to standard memory.
No, earlier you were quite explicit in questioning the interconnect:

If you have ECC, I wonder if you need to worry about your MC<>DRAM interconnect a bit more and use something more robust than GDDRx?
GDDR5 was likely designed with a notion that bit errors are more acceptable than they are in the CPU/GPGPU world.

You keep on thinking up ways that this could work, but the reality is that it's a bad idea.
Hmm, well I'm not the one that paid real money for a patent application. Talk is cheap, eh? As long as I'm learning something I'm likely to keep prodding. Apart from anything else, I learn a bit more about the workings of existing stuff, and this discussion prompts me to ferret in this:

http://v3.espacenet.com/publication...T=D&date=20090205&CC=US&NR=2009032941A1&KC=A1

properly, some time...

We can't tell what NVidia thinks of the viability of these concepts, or what value/expectation NVidia once had. It might have been one of those recreational patents you see from time to time, "while we were building clouds we made some rainbows, let's patent them too."

You end up adding a lot of latency, adding power, adding cost, and you don't get a whole lot in return. That sort of thing is handy when you have a massive split in the industry (e.g. RDR vs. DDR), but it's pretty clear that the consensus is GDDRx for the high-end and DDRx for the low-end.
That split is actually very tortuous, though. It's one of the factors, along with the implicit timeliness, that the patent application points at.

A 2-3x performance step between successive memory generations, in a field such as GPUs where bandwidth demand doubles every 2 years but halo GPUs appear every 18 months or less, makes for a treacherous mix. The "abortion" that was GDDR4 shows how treacherous it is to try to plan ahead.
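
Rough arithmetic on that mismatch; the 2x-every-2-years and 18-month figures are the ones above, the 3-4 year gap between memory generations is my own assumption:

Code:
# Bandwidth demand ~doubles every 2 years, halo GPUs arrive every ~18 months,
# but a new GDDR generation lands only every few years (assumed 3-4 here).

demand_growth_per_year = 2 ** 0.5                 # ~1.41x per year

print(f"per 18-month halo cycle: {demand_growth_per_year ** 1.5:.2f}x more bandwidth wanted")

for gap_years in (3, 4):
    needed = demand_growth_per_year ** gap_years
    print(f"over a {gap_years}-year memory generation: {needed:.2f}x more bandwidth wanted")

So a 2-3x step per memory generation roughly keeps pace only if a new generation arrives every 3 years or so; any slip, and the intermediate halo parts have to bridge the gap with wider buses or by stretching the old generation's clocks.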

There are a huge number of downsides and very few upsides.
Well, how do you price flexibility? Also, unless you're Intel, who can afford/cajole early-adoption of each DDR iteration, what can you do? AMD is forced to take a back seat with DDR iterations - apart from anything else, of course, the IMC has made the iterations more troublesome these past few years. Clearly the guys at ATI took the bull by the horns, yet it's still seen as good fortune how GDDR5 and RV770 combined.

I think that's a very insightful comment and I wonder what the data shows. I think you're right, and I'd be curious to see if that means that the low-end of discrete moves upwards in the price stack.
Notebooks are selling more units than desktops, apart from anything else. And don't forget Intel, solely with IGP, commands the majority of the GPU market. Though it's worth noting that notebook and desktop systems can contain both IGP and discrete at the time of their original sale.

Since you are looking at the difference between two pins, any changes (e.g. temperature, bad PCB design) that equally impact both pins will cancel each other out. That makes EMI easier to handle, for instance, and can reduce the amount of shielding needed.

Differential signaling is also more power efficient (smaller voltage swings on each line of the pair can produce the same overall voltage difference at the receiver).
One caveat here is that DBI (and ABI) may be saving more power than is necessarily being credited. Inversion savings are effectively lost in a differential signalling system.
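
A minimal sketch of the kind of saving meant here, assuming a GDDR5-style convention where a driven low is the costly state (pseudo-open-drain), so a byte is inverted whenever more than half its bits would otherwise be low; the framing is illustrative, not a transcription of the actual GDDR5 DBI rules:

Code:
# Data Bus Inversion (DBI) sketch. Assumes a pseudo-open-drain style bus where
# a driven '0' is the costly state, so no byte is ever sent with more than four 0 bits.

def dbi_encode(byte):
    """Return (wire_byte, dbi_flag); invert the byte and assert the flag
    if sending it raw would drive more than four lines low."""
    zeros = 8 - bin(byte).count("1")
    if zeros > 4:
        return (~byte) & 0xFF, 1
    return byte, 0

def dbi_decode(wire_byte, dbi_flag):
    return (~wire_byte) & 0xFF if dbi_flag else wire_byte

for raw in (0x00, 0x0F, 0xFF):
    wire, flag = dbi_encode(raw)
    assert dbi_decode(wire, flag) == raw
    # Count driven lows on the wire, treating an asserted DBI flag as one low.
    lows = (8 - bin(wire).count("1")) + flag
    print(f"raw {raw:02x} -> wire {wire:02x}, dbi={flag}, driven lows: {lows}")

With differential pairs, one line of every pair is driven low on every bit anyway, so there is nothing left for inversion to save.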

If you look at all the new interfaces introduced in the last 10 years, they have all trended towards diff. signaling: Rambus, CSI, HT, FBD, PCI-e, etc.
Well, as soon as someone isolates the pure interconnect power consumption of GDDR5, as opposed to the MC+interface+wiring+memory, then we might get somewhere.

Jawed
 
No, earlier you were quite explicit in questioning the interconnect:

I think what I wrote was a little confusing. I was talking about the target BER of the interconnect accounting for error detection.

My expectation is that given the applications for GDDRx (graphics), they may not have designed the interconnect to be quite as robust against bit errors in transmission, relative to FBD or DDRx.

ECC is just one piece of the system, and increasing the reliability of storage is good, but only if the rest of the system is improved in tandem. Put another way, GDDR5 was formulated to be acceptable for GPUs; I don't know if it was designed to be acceptable for GPUs using ECC.

In all fairness, NV could then design a more tightly constrained memory controller interconnect (using more board layers, shorter distances, etc.) to accommodate that, but it's not as simple as 'add ECC and press play' : )

DK
 

I'm impressed - where'd you dig that up? And is it 61 pins on the DRAM or memory controller?

All I found was http://www.rambus.com/us/products/xdr2/xdr2_vs_gddr5.html which has different data, although it's not really clear how they are counting...and obviously Rambus isn't trying to laud GDDR5 here, so it probably needs to be verified and more thoroughly investigated.

Anyway, it would be interesting to see the BW/pin for various interfaces. All the papers I've read at ISSCC are pretty conclusive that differential is the only way forward.
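
As a crude starting point, here is that metric using only figures quoted in this thread (the 61-pin GDDR5 device interface, an assumed ~5 Gbps shipping data rate, and the 12.8 Gbps XDR2 design target) rather than verified datasheet numbers:

Code:
# Crude bandwidth-per-pin estimate from figures quoted in this thread.
# Assumed: a 32-bit GDDR5 device, ~5 Gbps per data pin shipping today,
# 61 pins for the device interface, 12.8 Gbps per XDR2 differential pair.

gddr5_data_pins = 32
gddr5_gbps_per_data_pin = 5.0        # assumed shipping speed, not the ~7 Gbps target
gddr5_total_pins = 61                # figure quoted earlier in the thread

gddr5_bw = gddr5_data_pins * gddr5_gbps_per_data_pin
print(f"GDDR5: {gddr5_bw:.0f} Gbps over {gddr5_total_pins} pins "
      f"= {gddr5_bw / gddr5_total_pins:.2f} Gbps/pin")

# XDR2: 12.8 Gbps per pair is 6.4 Gbps per data pin, but a fair comparison
# needs the full interface pin count, which isn't in the thread.
print(f"XDR2: {12.8 / 2:.1f} Gbps per data pin (data pins only)")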

David
 
Going forward is all fine and well, but we are in the here and now. Single-ended isn't being pushed because the switch is hard - they switched to differential for the signals where it made sense, after all. It was pushed with GDDR5 because for GPUs, at the moment, it is still the optimal solution.

As an FYI, as much respect as I have for Macri, iirc he was on the other side of the Rambus RDRAM battle and isn't exactly the most impartial in this area. Don't get me wrong, Joe is a smart guy, but from a pure technology perspective Rambus has basically been continuously ahead on the whole memory interface thing.

I honestly don't see GDDR5 getting much beyond 6 GT/s without branching into differential data variants.

Once CMOS/substrates/PCBs/etc. have evolved a bit so XDR2 can really reach its higher design targets rather than just promising them, things will undoubtedly be different - but this is the here and now. Differential signalling would not give NVIDIA an appreciable increase in bandwidth per pin IMO.

Given the distances involved, differential should be able to do 12+ GT/s right now. And as it stands, XDR2 is actually more pin efficient as well.

About those billions, I think that's overstating the investments still being made.

Actually, it's easily in the billions. Pretty much all the interconnect and networking work is being done on differential. It's the only way to get enough margin to keep pushing the GT/s/wire forward.

"Going forward" differential might make sense, the trouble is that a little forward from that optical makes even more sense

Optical is still a ways out and will remain relatively expensive for quite a while. There are also only a small handful of companies that have any real experience in Si photonics. In addition, there are numerous packaging problems still to overcome in order to use photonics in applications such as GPU/CPU-to-memory interconnects.

Also, the speeds need to increase to counteract the increased costs. Outside of networking, the first applications will likely be in captive silicon interconnects. We're still a ways off from connecting to memory with it.
 
Well, how do you price flexibility? Also, unless you're Intel, who can afford/cajole early-adoption of each DDR iteration, what can you do? AMD is forced to take a back seat with DDR iterations - apart from anything else, of course, the IMC has made the iterations more troublesome these past few years. Clearly the guys at ATI took the bull by the horns, yet it's still seen as good fortune how GDDR5 and RV770 combined.

GDDR5 and RV770 weren't fortune. They were basically designed together. GDDR5 is basically all ATI (Joe Macri et al.), which enabled them to get a fairly big jump on the RV770 side.
 
I'm impressed - where'd you dig that up? And is it 61 pins on the DRAM or memory controller?
http://www.qimonda-news.com/download/Qimonda_GDDR5_whitepaper.pdf

All I found was http://www.rambus.com/us/products/xdr2/xdr2_vs_gddr5.html which has different data, although it's not really clear how they are counting...and obviously Rambus isn't trying to laud GDDR5 here, so it probably needs to be verified and more thoroughly investigated.
Is XDR2 shipping in any products?

Jawed
 
I think what I wrote was a little confusing. I was talking about the target BER of the interconnect accounting for error detection.
As you'll see from the Qimonda whitepaper, GDDR5 guarantees detection of any single or double bit error. But this seems to apply only to reads, as writes aren't protected.
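
For reference, a minimal sketch of that read-path EDC idea: an 8-bit CRC per burst, recomputed at the controller and compared against what the DRAM sends back on the EDC pins. The polynomial and burst framing below are assumptions for illustration (CRC-8-ATM), not a transcription of the GDDR5 spec:

Code:
# Sketch of an 8-bit CRC over a read burst for GDDR5-style EDC.
# Polynomial and burst framing are assumptions for illustration only.

CRC8_POLY = 0x07   # x^8 + x^2 + x + 1 (ATM HEC); assumed, check the spec

def crc8(data):
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = ((crc << 1) ^ CRC8_POLY) & 0xFF if crc & 0x80 else (crc << 1) & 0xFF
    return crc

def read_ok(burst, edc_from_dram):
    """Controller side: recompute the CRC and compare. A mismatch triggers a
    retry (and possibly re-training); nothing is corrected in place."""
    return crc8(burst) == edc_from_dram

burst = bytes(range(8))                            # one illustrative 8-byte burst
edc = crc8(burst)                                  # what the DRAM would send back

corrupted = bytes([burst[0] ^ 0x01]) + burst[1:]   # single-bit error on the wire
print(read_ok(burst, edc))        # True  -> accept the data
print(read_ok(corrupted, edc))    # False -> request a re-transmission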

My expectation is that given the applications for GDDRx (graphics), they may not have designed the interconnect to be quite as robust against bit errors in transmission, relative to FBD or DDRx.
Does DDR provide any transmission error detection and recovery? As far as I can tell that's where FBD etc. come in.

Jawed
 
GDDR5 and RV770 weren't fortune. They were basically designed together. GDDR5 is basically all ATI (Joe Macri et al.), which enabled them to get a fairly big jump on the RV770 side.
http://www.anandtech.com/video/showdoc.aspx?i=3469&p=8

Maybe the mix of politics made it less a case of fortune than it would otherwise appear, but I think it's quite reasonable to observe that the developmental paths of GDDR3 and GDDR4 indicate a substantial risk.

NVidia still hasn't produced a shipping product with GDDR5, which must have a strong effect on the economic commitment of the GDDR manufacturers - NVidia's refusal to use GDDR4 would have had a chilling effect, though it seems GDDR4 was not ambitious enough anyway, so NVidia did the right thing.

The 40nm woes at TSMC have probably delayed NVidia's adoption of GDDR5 by ~9-12 months (GT214 disappeared, so now it's just GT215; GT215 doesn't appear as if it will ship until Q4, and GT300 is anyone's guess). GDDR3 scaling significantly further than originally planned also helps to delay adoption of GDDR5. GPU transistor budgets, and hence the efficiency of compression techniques, also have a significant effect on demand for bandwidth (i.e. reducing it).

Well, anyway, I'm sure you're quite familiar with these kinds of tangled webs of dependencies.

Jawed
 
NVidia still hasn't produced a shipping product with GDDR5, which must have a strong effect on the economic commitment of the GDDR manufacturers - NVidia's refusal to use GDDR4 would have had a chilling effect, though it seems GDDR4 was not ambitious enough anyway, so NVidia did the right thing.
Irrespective of NVIDIA, the fundamental difference between GDDR4 and GDDR5 was that there was only one DRAM vendor signed up to produce GDDR4, while we already knew that there were three vendors geared up for GDDR5.
 
Irrespective of NVIDIA, the fundamental difference between GDDR4 and GDDR5 was that there was only one DRAM vendor signed up to produce GDDR4, while we already knew that there were three vendors geared up for GDDR5.
Is that the main cause of the quick demise of GDDR4? I was under the impression that it was more a mix of various factors, including the lack of support from nVidia, its relatively short lifetime before GDDR5's introduction, and the higher-than-predicted scaling of GDDR3. BTW, I was surprised to find that quite a few HD 4670 and HD 4850 cards were released with GDDR4 memory; I thought that after R6xx ATi/AMD would have removed support for it from the memory controllers, but apparently they didn't.
 
Hmm, well I'm not the one that paid real money for a patent application. Talk is cheap, eh? As long as I'm learning something I'm likely to keep prodding. Apart from anything else, I learn a bit more about the workings of existing stuff, and this discussion prompts me to ferret in this:

http://v3.espacenet.com/publication...T=D&date=20090205&CC=US&NR=2009032941A1&KC=A1

properly, some time...
I just skimmed, but is Nvidia talking about building a conductive "footing" through the passivation layer to get a better interface with the solder?

We can't tell what NVidia thinks of the viability of these concepts, or what value/expectation NVidia once had. It might have been one of those recreational patents you see from time to time, "while we were building clouds we made some rainbows, let's patent them too."
For a corporation with deep pockets in today's torturous patent minefield, a large patent portfolio is an asset all on its own.
MAD works with IP as well (unless facing a patent troll...), and if you patent enough you can just extort a licensing fee out of others.

That split is actually very tortuous, though. It's one of the factors, along with the implicit timeliness, that the patent application points at.
Is there any indication that the cycle is going to get any faster?
It may very well slow down.
What lesson will the volume DRAM industry learn from Qimonda (now being liquidated) in trying to rely on pushing forward on boutique memory to save their bacon?


Clearly the guys at ATI took the bull by the horns, yet it's still seen as good fortune how GDDR5 and RV770 combined.
It was a two-party dance, and there's blood on the dance floor.
It may not happen again.
 
I just skimmed, but is Nvidia talking about building a conductive "footing" through the passivation layer to get a better interface with the solder?
This is an ATI patent application; it forms the solder ball upon the pad in a different way, with stronger corners. These corners are more robust against the kind of fracture, due to thermal expansion mismatches, that non-lead-based solder is prone to. The passivation layer is constructed in two parts, sandwiching the under bump metallisation layer, and it seems the underfill is the only thermally flexible interface.

I dare say the clever part is that this technique allows the under bump metallisation layer constructed on the outside of the chip to be formed into arbitrary shapes. So you get a "free" low-impedance metal layer on the outside of the chip, which is good for high-current connections, i.e. power. I guess it's also very useful in providing electrical shielding to enmesh signal pad/ball combinations. (This is the original reason my interest was piqued, as it seems to be a good way of providing a clean environment for the high speed GDDR5 signals - I presume they're fairly vulnerable at the interface of substrate and chip, i.e. at the solder balls. It also allows the GDDR5 interfacing pins to be concentrated on signalling, as power and ground can be brought across the chip, externally, without using solder balls for power amongst the balls for signal. This means the GDDR5 interface can take minimal area.)

These arbitrary shapes allow a reduction in the ball count, too, e.g. a power pin on the substrate can feed multiple pads on the chip surface, instead of requiring that each pad has a dedicated ball. Additionally the solder reflow process enforces a minimum spacing between balls, so by reducing the count of balls required to deliver ground and power connections, you can move the pads closer together. Power isn't a problem because the under-bump metallisation layer can be beefy, considerably more so than metal layers within the chip.

Is there any indication that the cycle is going to get any faster?
It may very well slow down.
Apart from the fact that memory performance is desperately behind other technology scalings, no, there's no reason for the cycle to get faster.

What lesson will the volume DRAM industry learn from Qimonda (now being liquidated) in trying to rely on pushing forward on boutique memory to save their bacon?
What boutique memory was Qimonda betting the farm on?

It was a two-party dance, and there's blood on the dance floor.
It may not happen again.
Now that Intel has woken up and discovered the need for efficient memory interfaces on consumer CPUs (as core counts race towards oblivion), maybe there'll be a new dance floor.

Jawed
 
Is XDR2 shipping?

I don't think Rambus has any announced design wins. I think my main point was that they can use a bog-standard 65nm process and a pretty simple board to hit 12.8 Gbps.

GDDR5 only ships at pretty low frequencies now, although there have probably been some high speed demos.

As you'll see from the Qimonda whitepaper, GDDR5 guarantees detection of any single or double bit error. But this seems to apply only to reads, as writes aren't protected.

It seems like the goal was to protect reading instructions stored in memory, not data. That would complicate ECC if the writes cannot be assumed to be reliable...but it's not necessarily a fatal flaw. It would all depend on the probability of a write error compared to the probability of a soft error in the DRAMs.

However, what I'm really wondering is what BER is considered acceptable and within spec for GDDR5 and what HW can achieve. Usually this is expressed as BER<N*10^-k for k>11.
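
To put such a spec in perspective, here is the raw error rate implied by a few BER targets at a GPU-class aggregate bandwidth; the 256-bit, 5 Gbps/pin configuration is an assumed example, not any particular product:

Code:
# Raw bit errors implied by a few BER targets at a GPU-class bandwidth.
# The 256-bit, 5 Gbps/pin interface is an assumed example configuration.

pins = 256
gbps_per_pin = 5.0
bits_per_second = pins * gbps_per_pin * 1e9       # 1.28e12 bits/s

for exponent in (12, 15, 18):
    ber = 10.0 ** -exponent
    errors_per_second = bits_per_second * ber
    if errors_per_second >= 1:
        print(f"BER 1e-{exponent}: ~{errors_per_second:.1f} errors/s")
    else:
        print(f"BER 1e-{exponent}: one error every ~{1 / errors_per_second:.0f} s")

A link that leans on detect-and-retry can tolerate the top of that range; a system treating memory as ECC-grade storage would want to sit much further down it.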

Does DDR provide any transmission error detection and recovery? As far as I can tell that's where FBD etc. come in.

Jawed

Errors are highly non-linear WRT transmit rate, getting exponentially worse at high speed.

Since DDR transmits data at pretty low rates, it actually isn't needed.

FBD needed it since it was designed for servers, and it's rather high speed.
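
A toy way to see that non-linearity, assuming (purely for illustration) Gaussian noise and an eye margin that shrinks in proportion to the unit interval as the rate goes up:

Code:
import math

# Toy model: BER from a Gaussian noise tail, with an eye margin that shrinks
# in proportion to the unit interval as the data rate rises. Numbers are illustrative.

def ber_from_margin(margin_sigmas):
    """One-sided probability that the noise exceeds the margin."""
    return 0.5 * math.erfc(margin_sigmas / math.sqrt(2.0))

# Assume 14 sigmas of margin at 1 Gbps, scaling inversely with the rate.
for rate_gbps in (1, 2, 4, 8):
    margin = 14.0 / rate_gbps
    print(f"{rate_gbps} Gbps: margin {margin:.1f} sigma, BER ~{ber_from_margin(margin):.1e}")

Halving the margin doesn't double the error rate, it multiplies it by many orders of magnitude - which is why a slow DDR link gets away without link-level detection while FBD-class rates don't.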

DK
 
What boutique memory was Qimonda betting the farm on?

The kind you can sell for a profit.

Now that Intel has woken up and discovered the need for efficient memory interfaces on consumer CPUs (as core counts race towards oblivion), maybe there'll be a new dance floor.

They have efficient interfaces; they are just tuned for very different workloads. GDDR5 would do kind of poorly for TPC-C...partially because you usually want as much memory as possible.

David
 
http://www.anandtech.com/video/showdoc.aspx?i=3469&p=8

Maybe the mix of politics made it less a case of fortune than it would otherwise appear, but I think it's quite reasonable to observe that the developmental paths of GDDR3 and GDDR4 indicate a substantial risk.

They basically designed the spec. They basically committed to a contract manufacturing order with the DRAM vendors. There is no politics or luck involved, just normal biz. It's not like they didn't have the majority of the control.

NVidia still hasn't produced a shipping product with GDDR5

Nvidia hasn't really introduced a real new product since the introduction of GDDR5. Also, ATI had about a year's head start on GDDR5, from everything publicly available.
 