NVIDIA Maxwell Speculation Thread

Is Charlie ever going to stop with his anti-NV agenda? What did they ever do to him? Does he really believe anyone still takes him seriously?
 
There are threads for tabloid news: "Nvidia shows signs of strain" and "AMD doom and gloom".
But lol this is all... weird. The first half is about some theory that architectures are grouped into major four-year ones, each including a two-year "refresh".
So, GT200 was a new architecture and Fermi is its "refresh", Kepler was new ground and Maxwell is a "refresh" again, so Volta is the following ground-breaking stuff, and here's why it's a disaster: because AMD failed at interposer memory.

That's funny crackpot stuff. GT200 was a big G80, not a new architecture. Fermi was quite different, by the author's own admission:
Tesla and Fermi are a single underlying “four-year” architecture with the latter having a massively updated uncore and interconnect network as the main new feature.
"Tesla and Fermi were the same, with the latter being completely different" (he omits to mention how SMs and where are double-precision FPUs have changed too)

If anything, maybe Kepler is more similar to Fermi? Hot clocks were dropped and the SMX has triple the "cores", but it still has GPCs, the same shader model, etc.

Also note that the GK110 part marketed as Titan is a completely different family from the smaller GK10x and GK11x lines; names aside, they are very different lines.
True, there are significant updates, but... the Kayla GPU has GK110's abilities. So GK11x (or GK20x) is the same as GK110, not GK104.

Then Maxwell is said to be a minor update to Kepler, just because... trust Charlie on this. It thus follows that Volta is completely different :LOL:. I'd wager that Volta is just Maxwell with stacked memory; Maxwell paves the way for both Tegra and Volta with its significant memory changes.
 
The really interesting part in there is the mention of Tiran, the slide showing that it was meant for stacked memory, and of course the question raised: what the hell happened to it?

I don't really buy that it worked fine and was scrapped for other reasons, and would be curious to know exactly what the problem(s) was(were). Charlie's rant aside, there's interesting information in there.
 
Tiran does exist as an island and it is a pretty interesting choice of codename.

The island is currently only inhabited by military personnel from Egypt and the Multinational Force and Observers [MFO], although it has been inhabited during many centuries in the past.

Chisholm Point is a cape of Tiran Island.

Some theologists claim that Tiran Island is the location of the parting of the Red Sea described in the Book of Exodus in the Torah.[citation needed]

Israel briefly took over the island during the Suez Crisis and again from 1967 to 1982 following the events of the Six Day War. Procopius writes that there was an autonomous Jewish community on the island (then called Iotabe) until the sixth century AD, when it was conquered by the Byzantine Empire.[5] This history figured in Israeli rhetoric during the Suez Crisis.[6]

Some sources report that many beaches on the island are mined.
 
The really interesting part in there is the mention of Tiran, the slide showing that it was meant for stacked memory, and of course the question raised: what the hell happened to it?

I don't really buy that it worked fine and was scrapped for other reasons, and would be curious to know exactly what the problem(s) was(were). Charlie's rant aside, there's interesting information in there.

I agree wholeheartedly. This is the problem with Charlie, isn't it?! Some interesting titbits projected through tinted glasses (his own special tint, not even green or red, more a sort of brownish-orangey-yellow, like diarrhoea).
 
The news is related to an architecture two generations from now, and if there were more information it'd almost support a new thread.

It's not inconceivable that AMD could have had a lead on 2.5D or on-package memory. We see that Intel is doing this in some form, and historically, as a CPU manufacturer, AMD has usually been forced to be a trailing implementer of Intel's packaging tech. The competitive pressures and historical expertise would mean that the crossover point for AMD was different from what it would be for designers that didn't need to push the physical implementation curve so hard.

Nvidia's historical needs haven't required the same level of physical implementation, so the same level of aggression might not pay off.

AMD has lost much of its presence in the markets that would have driven integrated-memory volumes, and perhaps some kind of hiccup in the standardization or economics of stacked memory or interposer tech could have caused a delay, even if the on-package memory tech AMD developed was sound.

Without access to the rest of the article I can't be certain, but I wouldn't say that what's publicly written is a sign of trouble of that magnitude. An advantage isn't an advantage if the leader doesn't pull the trigger. The longer the delay, the less Nvidia's alleged deficit matters, if it can keep to its timeline.
 
The really interesting part in there is the mention of Tiran, the slide showing that it was meant for stacked memory, and of course the question raised: what the hell happened to it?

I don't really buy that it worked fine and was scrapped for other reasons, and would be curious to know exactly what the problem(s) was(were). Charlie's rant aside, there's interesting information in there.

AMD missed the boat on Tiran, it seems.
 
The questions are:
1. How does the author know that Nvidia isn't experimenting with this stuff as well (behind closed doors, that is)?
2. How does the author know that Nvidia releasing stacked memory products only in 2016 is a technical and not an economic/political decision?
3. If AMD has this technology ready, and had it ready for Tahiti - why did they not use it? It would be quite stupid not to.

To me, the whole article is filled with inconclusive stuff, with assumptions and possibilities (and errors as some of you have already pointed out).
 
Even if those claims about AMD's plans, made at some point in 2009, for what they would be doing in 2012 :LOL: are true, I won't comment on who exactly he is speaking about here:

a very public acknowledgement of the abject mess the company is in

AMD missed the boat on Tiran, it seems.

Does anyone have access to the full article, and could they share what is written there?
 
If anything, maybe Kepler is more similar to Fermi? Hot clocks were dropped and the SMX has triple the "cores", but it still has GPCs, the same shader model, etc.
It's almost funny how he can be so completely wrong. If this were speculation, fair game, but this is stuff that has all been known for quite some time now...
You are quite right: you could call gt200 an "architecture refresh" of g8x/g9x, whereas kepler would be an architecture refresh of fermi. On a high level, gt200 is very similar to g8x/g9x, whereas kepler is very similar to fermi. There are always arguments about what constitutes a refresh, but no doubt each of those pairs is way more similar than fermi is to tesla.
Since I have no idea what maxwell/volta/einstein are going to look like, though, I can't tell whether there really is some scheme there, or whether it's more or less coincidence that there are larger and smaller two-year changes.
 
AFAIK (which is very little) you need a substrate to connect those RAMs to the GPU. And this substrate is made using some old silicon process. Which means you still need wafers etc.

I assume that the technological challenges were conquered quite a while ago, but that it's just too expensive.

Contemporary GPUs may be a bit BW starved, but not egregiously so. There is no point in doubling BW if you don't have the compute cycles to use it.

As long as this is the case, the old, cheap way is preferred. If silicon processes keep getting better (a given for the near future) and external DRAM does not (probably?), then stacked die will become inevitable. But we're not there yet.

It's fascinating to see what kind of stories can be spun around this.
 
It would be an interesting hypothetical if stacked memory and some kind of interposer-based tech were available for Tahiti, even if it provided the same bandwidth as the current solution.

Perf/mm has been harped on when comparing it to GK104, and the GDDR interface for Tahiti takes a rather large amount of die space. Shrinking that fraction of the die, and its contribution to Tahiti's bulk, could have led to a different outcome on that metric, if nothing else.
I wouldn't think changing that minor debate point would justify the effort.

edit:
The power savings from some kind of HBM setup might have been more noticeable.
 
Do you think it would save a lot of area? I really don't know...
Power should indeed be better.

In fact, do we have any kind of hint about how much the MCs consume overall on a GPU?
 
Do you think it would save a lot of area? I really don't know...
Power should indeed be better.

In fact, do we have any kind of hint about how much the MCs consume overall on a GPU?

I can't remember where or if I saw a direct comparison.
One rough idea of the area involved is the interface area for the VRAM used for the PS Vita.

http://www.ecnmag.com/news/2011/03/samsung-wide-io-memory-mobile-products-deeper-look
This had the area of the DRAM at 64.34mm2.

http://chipworksrealchips.blogspot.com/2012/07/sonys-ps-vita-uses-chip-on-chip-sip-3d.html
I used an annotated image from there that outlines the rough area of the interface.

With some very rough pixel counting, it's something like 6.63 mm2 for the interface.


Tahiti's GDDR5 interface, from my MS Paint pixel counting, is roughly 20% of the die shot, so it's about 77 mm2.

For Tahiti, using the GHz edition, it's 3.7 GB/s/mm2.
For the first gen WideIO, it's around a less-impressive 1.9 GB/s/mm2.
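
A quick back-of-the-envelope version of that density comparison (just a sketch; the 288 GB/s figure assumes the GHz Edition's 384-bit bus at 6 Gbps, and the two areas are the pixel-counting estimates above):

Code:
# Rough bandwidth density in GB/s per mm2 of interface area.
tahiti_bw = 384 * 6 / 8            # 384-bit GDDR5 at 6 Gbps -> 288 GB/s
tahiti_area = 77.0                 # mm2, ~20% of the Tahiti die shot

wideio_bw = 12.8                   # GB/s, first-gen Wide IO
wideio_area = 6.63                 # mm2, annotated interface area on the Vita SoC

print(tahiti_bw / tahiti_area)     # ~3.7 GB/s/mm2
print(wideio_bw / wideio_area)     # ~1.9 GB/s/mm2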

Crossover assumes one of the proposed next generation of Wide IO interfaces with DDR signalling and higher speeds kicks in.
The more modest proposal increases bandwidth by 2.6x to 266 Gbits/s.
Another had 1 TBit/s, which is past the crossover point, assuming the pad area doesn't increase too massively.

The more conservative option would give 4.9 GB/s/mm2 for the stacked version.
The TBit/s option would sway things further in favor of WideIO, if such a scheme came to fruition.
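
Extending the same sketch to those proposals (again assuming the pad area stays at the ~6.63 mm2 measured above, which is the shaky part):

Code:
# Hypothetical next-gen Wide IO options over the same 6.63 mm2 pad area.
modest_bw = 266 / 8                # 266 Gbit/s -> ~33 GB/s
aggressive_bw = 1000 / 8           # 1 Tbit/s  -> 125 GB/s

print(modest_bw / 6.63)            # ~5.0 GB/s/mm2, roughly the 4.9 quoted above
print(aggressive_bw / 6.63)        # ~18.9 GB/s/mm2, far past Tahiti's ~3.7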

Granted, I've made a bunch of assumptions about the applicability of the Vita's face-to-face microbump method of using TSV RAM to this example. It's a real-world example with a 40 µm bump pitch.
The proponents of TSV and interposer RAM claim a pitch a quarter of that is possible, but that's all in powerpoint slides.

The other side is that I am being pretty conservative about the annotated area of the interface, and this is not including the size and complexity of a GDDR5 controller geared for high clocks versus a wider, slow DDR bus. (edit: I am not sure which way that would go.)


The following paper puts a little over a third of the Radeon 6990's power budget in the memory controllers and DRAM.
http://www.cse.psu.edu/~juz138/files/islped209-zhao.pdf

A third of Tahiti's 250W TDP is 83 W.

In this arena, the existing low-power WideIO provides 12.8 GB/s, with a stacked variant drawing 367 mW. The area wouldn't be acceptable, but matching Tahiti's bandwidth would need only a little over 8 W for the DRAM, so power would be saved. We could double that or more to account for the MC power draw and still be below 80 W.
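
As a sketch of where the "little over 8 W" comes from (assuming you'd simply tile enough 12.8 GB/s Wide IO stacks to match Tahiti's ~288 GB/s, which is exactly the area problem mentioned above):

Code:
# Scale the 367 mW-per-stack Wide IO figure to Tahiti-class bandwidth.
stacks = 288.0 / 12.8              # ~22.5 stacks needed
dram_power = stacks * 0.367        # ~8.3 W for the DRAM itself
print(stacks, dram_power)

# Even doubled or tripled to cover the memory controllers, that stays
# far below the ~83 W that a third of the 250 W TDP implies.
print(250 / 3)                     # ~83 W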
 
The really interesting part in there is the mention of Tiran, the slide showing that it was meant for stacked memory, and of course the question raised: what the hell happened to it?
It got scrapped and was later picked up by NVIDIA and released with some changes. We now know it as the Titan.

(Sorry, couldn't resist.)

Anyways…

The questions are:
1. How does the author know that Nvidia isn't experimenting with this stuff as well (behind closed doors, that is)?
2. How does the author know that Nvidia releasing stacked memory products only in 2016 is a technical and not an economic/political decision?
3. If AMD has this technology ready, and had it ready for Tahiti - why did they not use it? It would be quite stupid not to.
I have another question:
4. Why did AMD plan stacked memory for Tiran, which was scheduled for around a late 2012 release, in the first place? I'm not sure if it was to get significantly more bandwidth, unless it was to be a massive chip, since (AFAIK) Tahiti doesn't seem to be particularly bandwidth starved. Maybe for power reasons, as mentioned above?

Also (related to #2), I wonder if NVIDIA isn't going to use stacked memory on Maxwell because they can get the necessary bandwidth with the usual GDDR5. A 512-bit bus and 7 Gbps memory gives 56% more bandwidth than the Titan, which might be enough for a "big" Maxwell. (This comparison ignores power and other considerations.)
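
The 56% falls straight out of the bus math (a sketch; Titan assumed at 384-bit and 6 Gbps):

Code:
# Peak GDDR5 bandwidth: hypothetical "big" Maxwell vs. Titan.
maxwell_bw = 512 * 7 / 8           # 512-bit at 7 Gbps -> 448 GB/s
titan_bw = 384 * 6 / 8             # 384-bit at 6 Gbps -> 288 GB/s

print(maxwell_bw / titan_bw - 1)   # ~0.56, i.e. about 56% more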
 