NVIDIA Maxwell Speculation Thread

I was referring to the geometry shader, not the overall performance.
I think that's mostly a result of a larger output buffer, but fundamentally it works the same. That's more of a tweak than an architectural change.
 
GT200 has an improved geometry shader, its performance is way higher than G80's, the register file size is doubled, etc. G80 is Compute Capability 1.0, GT200 is Compute Capability 1.3. GT200 fully decodes H.264 in hardware while G80 didn't. It's not just more units; it's very different internally almost everywhere.

http://www.anandtech.com/show/2549/5

Also, GT2xx GPUs can do something that G8x/G9x GPUs can't do.

You're following the discussion here, aren't you? If so, it shouldn't be too hard to understand my reasoning. I don't need any details on each architecture; I'm well enough informed, thank you.
 
High-level changes rather come in a NV30---G80---Kepler---etc. cadence (with Fermi being somewhat of an in-between case). Everything between G80 and Fermi was based on a similar architectural backbone. In hindsight, where are the "high-level improvements" at the competition? Does GCN sound to you like something they'll overhaul all that soon?

Wasn't GT200 just a G80 with more units and a handful of FP64 units added to the mix under that very same reasoning?

By "high-level changes" I mean "improvements that change enough stuff to be obvious on the simplistic block diagrams that IHVs are willing to disclose". By this definition, all recent NV architectures featured high-level changes.

GCN has already seen big front-end changes. What will happen over the next year is an open question.
 
Changes in the SM are much easier than changes in the geometry flow because you can keep the interfaces similar or even the same. GT200 was a smaller change but took longer than planned because it was not a high-priority project initially. Fermi was as big a change as G80. Very complex. Maybe a bit too much in one go. Kepler was easier.
 
By "high-level changes" I mean "improvements that change enough stuff to be obvious on the simplistic block diagrams that IHVs are willing to disclose". By this definition, all recent NV architectures featured high-level changes.

GCN has already seen big front-end changes. What will happen over the next year is an open question.

I would expect the current interconnect and GPCs to remain for at least a couple of generations past Kepler; I've also been told that it's not likely we'll see changes to the SIMD32 configuration of their ALUs any time soon. Apart from that there aren't going to be any huge fireworks IMHO, and as long as efficiency rises as expected with every new generation I don't think many will complain. What they could change is to decouple the ROPs from the MC, but I don't think that belongs to something people would consider "important".
 
GT200 was supposed to fill the ~18-month gap to the Fermi launch in late '09 that, as we all know, didn't happen. The strategy was similar to the DX9-DX10 transition, where G70 played the same role as GT200 -- both architectures were simply more of the same from the preceding generation, not a major invention. A straightforward performance upgrade with a few small arch tweaks.

Now, the situation with Fermi is a bit different, methinks. Kepler is definitely a bit more than a performance scale-up.
 
Kepler would have been a smaller "surprise" if 32nm hadn't been canned.
 
By "high-level changes" I mean "improvements that change enough stuff to be obvious on the simplistic block diagrams that IHVs are willing to disclose". By this definition, all recent NV architectures featured high-level changes

There's some stuff I would expect to see on Maxwell (the hierarchical scheduling and register file stuff that Dally and others wrote a paper on) that I would count as a pretty major change. I also remember seeing a presentation by Dally where he said that he's been trying to push for a different kind of network-on-chip at NVidia, so maybe something will happen there? Though, Ailuros says no and seems to have inside info.
 
Changes to the interconnect, which I understand to mean the network(s) by which different elements on the chip communicate. This sounds like it's a crossbar in current NVidia chips, but as far as I know, there are topologies (e.g. flattened butterfly) that have better energy-delay products.
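To make that concrete, here's a little brute-force sketch (mine, not anything NVidia has disclosed) comparing the average hop count of a k x k 2D mesh against a 2D flattened butterfly of the same size. Fewer hops per packet is one of the reasons the flattened butterfly tends to come out ahead on energy-delay, at the cost of the longer wires and higher-radix routers that the sketch ignores.

```python
# Back-of-envelope hop-count comparison: 2D mesh vs. 2D flattened butterfly.
# Purely illustrative -- not a model of any real GPU's interconnect.
from itertools import product

def mesh_hops(a, b):
    # Dimension-order routing on a mesh: one hop per step in x, then in y.
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def flattened_butterfly_hops(a, b):
    # Every row and every column is fully connected, so each dimension
    # that differs costs at most one hop.
    return (a[0] != b[0]) + (a[1] != b[1])

def average_hops(k, hops):
    nodes = list(product(range(k), repeat=2))
    pairs = [(a, b) for a in nodes for b in nodes if a != b]
    return sum(hops(a, b) for a, b in pairs) / len(pairs)

for k in (4, 8):
    print(f"{k}x{k}: mesh = {average_hops(k, mesh_hops):.2f} hops avg, "
          f"flattened butterfly = {average_hops(k, flattened_butterfly_hops):.2f} hops avg")
```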
 
Changes to the interconnect, which I understand to mean the network(s) by which different elements on the chip communicate. This sounds like it's a crossbar in current NVidia chips, but as far as I know, there are topologies (e.g. flattened butterfly) that have better energy-delay products.

Out of curiosity, is there an estimate of how much they could gain in terms of power with something like a flattened butterfly?

Any relevant literature which a layman like me would understand about 25% of? :)
 
I'm a hardware layman as well. I was thinking of this link (NoCs are discussed at the end). But it turns out I misremembered, and the comparison is between a mesh and flattened butterfly. So, err... ignore me. Perhaps someone with actual HW design experience can set me straight?
 
Judging by the "CUDA density" of GM107, it is not hard to expect that a 20nm GM110 could accommodate ~6500 CUDA cores (that's based on the assumption that they don't try to fit nonsense Denver/ARM cores into their high-end Maxwells).

At least they keep the CUDA core count in each SM the same as in the Kepler family, and possibly keep the SMEM (despite rumors saying otherwise, which I doubt), registers and instruction issue units the same as Kepler as well, which would save some work on tuning and optimizing current code for the upcoming Maxwell.
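For what it's worth, here's the back-of-envelope version of that estimate with the assumptions spelled out. The GM107 figures (roughly 640 CUDA cores on a ~148 mm² 28nm die) are public; the 20nm density gain and the hypothetical big-die size are my guesses, and the result moves a lot with both.

```python
# Rough CUDA-core scaling estimate. Only the GM107 figures are real;
# the density gain and the GM110 die size are assumptions.
gm107_cores = 640
gm107_area_mm2 = 148.0                                # 28nm die
cores_per_mm2_28nm = gm107_cores / gm107_area_mm2     # ~4.3 cores/mm^2

density_gain_20nm = 1.9      # assumed 28nm -> 20nm density improvement
gm110_area_mm2 = 550.0       # assumed reticle-limit die, roughly GK110-sized

estimated_cores = cores_per_mm2_28nm * density_gain_20nm * gm110_area_mm2
print(f"~{estimated_cores:.0f} CUDA cores")           # ~4500 with these inputs
```

With those inputs you land closer to ~4500 than ~6500, so the higher figure implies a bigger die, a better-than-2x density jump, or a leaner layout than GM107's.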
 
I'm a hardware layman as well. I was thinking of this link (NoCs are discussed at the end). But it turns out I misremembered, and the comparison is between a mesh and flattened butterfly. So, err... ignore me. Perhaps someone with actual HW design experience can set me straight?

Considering Dally speculated about a dragonfly for a hypothetical 2017 exascale machine (http://forum.beyond3d.com/showpost.php?p=1825220&postcount=673), I'd say that we still have time until we see interconnects that sophisticated. Is it even clear what the interconnect in current architectures looks like?
 
Judging by the "CUDA density" of GM107, it is not hard to expect that a 20nm GM110 could accommodate ~6500 CUDA cores (that's based on the assumption that they don't try to fit nonsense Denver/ARM cores into their high-end Maxwells).

Did you also count 64 FP64 units per SMX in that math? Because anything GM10x will obviously again be limited to just a fraction of the FP64 units compared to the top dog.
 
Judging by the "CUDA density" of GM107, it is not hard to expect that a 20nm GM110 could accommodate ~6500 CUDA cores (that's based on the assumption that they don't try to fit nonsense Denver/ARM cores into their high-end Maxwells).

Why would Denver 64-bit custom ARM cores be nonsense in the very high end?

It would seem like that is where it will make the most sense.

Even if the number of CUDA cores is reduced, that is more than made up for (and probably greatly exceeded) by replacing an Intel CPU with another GM110.
 
Why would Denver 64-bit custom ARM cores be nonsense in the very high end?

It would seem like that is where it will make the most sense.

Even if the number of CUDA cores is reduced, that is more than made up for (and probably greatly exceeded) by replacing an Intel CPU with another GM110.

The entire point of the Denver cores is to allow you to avoid having to transfer stuff over PCIe, since you would pretty much directly port the CPU portions over to running on the Denver cores.
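Rough numbers for why that matters (all of these are round figures I'm assuming, not measurements): for a bandwidth-bound job the PCIe copy alone can cost several times the actual GPU processing time, and on-package CPU cores make that copy largely disappear.

```python
# Toy comparison of PCIe transfer time vs. on-GPU processing time for a
# bandwidth-bound workload. All figures are assumed round numbers.
data_gb = 4.0             # working set shipped from host to device
pcie_bw_gb_s = 12.0       # ~practical PCIe 3.0 x16 bandwidth (assumed)
gpu_mem_bw_gb_s = 250.0   # ~GDDR5 bandwidth on a big GPU (assumed)
passes = 3                # how many times the kernel streams the data

transfer_s = data_gb / pcie_bw_gb_s
compute_s = passes * data_gb / gpu_mem_bw_gb_s

print(f"PCIe transfer: {transfer_s*1000:.0f} ms, GPU compute: {compute_s*1000:.0f} ms")
# With the CPU portion running on Denver cores next to the GPU's memory,
# the transfer term mostly goes away.
```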
 