I think that's mostly a result of a larger output buffer, but fundamentally it works the same. That's more of a tweak than an architectural change. I was referring to the geometry shader, not the overall performance.
GT200 has an improved geometry shader, its performance is way higher than G80's, the register file size is doubled, etc. G80 is Compute Capability 1.0, GT200 is Compute Capability 1.3. GT200 fully decodes H.264 in hardware, while G80 didn't. It's not just more units; it's very different internally almost everywhere.
http://www.anandtech.com/show/2549/5
Also, GT2xx GPUs can do things that G8x/G9x GPUs can't, double-precision (FP64) arithmetic being the obvious example.
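For concreteness, here's a minimal host-side sketch (my own illustration, not from any post above) that uses the CUDA runtime API to report each device's compute capability and whether it predates the FP64 support introduced with CC 1.3; a G80 board reports 1.0, a GT200 board reports 1.3.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        printf("Device %d: %s, compute capability %d.%d\n",
               dev, prop.name, prop.major, prop.minor);
        // Double-precision arithmetic is only available from CC 1.3 onwards.
        bool hasFp64 = (prop.major > 1) || (prop.major == 1 && prop.minor >= 3);
        printf("  native FP64: %s\n", hasFp64 ? "yes" : "no");
    }
    return 0;
}
```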
High-level changes rather come in an NV30 -> G80 -> Kepler -> etc. cadence (with Fermi being somewhat of an in-between case). Everything between G80 and Fermi was based on a similar architectural backbone. In hindsight, where are the "high-level improvements" at the competition? Does GCN sound to you like something they'll overhaul all that soon?
Wasn't GT200 just a G80 with more units and a handful of FP64 units added to the mix under that very same reasoning?
By "high-level changes" I mean "improvements that change enough stuff to be obvious on the simplistic block diagrams that IHVs are willing to disclose". By this definition, all recent NV architectures featured high-level changes.
GCN has already seen big front-end changes. What will happen over the next year is an open question.
By "high-level changes" I mean "improvements that change enough stuff to be obvious on the simplistic block diagrams that IHVs are willing to disclose". By this definition, all recent NV architectures featured high-level changes
Changes to the interconnect, which I understand to mean the network(s) by which different elements on the chip communicate. This sounds like it's a crossbar in current NVidia chips, but as far as I know, there are topologies (e.g. flattened butterfly) with better energy-delay products.
I'm a hardware layman as well. I was thinking of this link (NoCs are discussed at the end). But it turns out I misremembered, and the comparison is between a mesh and a flattened butterfly. So, err... ignore me. Perhaps someone with actual HW design experience can set me straight?
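To make the hop-count side of that comparison concrete, here's a back-of-envelope, host-only sketch (mine, assuming uniform random traffic on a k x k network of routers): a 2D mesh routes dimension by dimension, while a flattened butterfly gives every router a direct link to all routers in its row and column, so any packet needs at most two hops. Fewer router traversals per packet is one reason the flattened butterfly can come out ahead on energy-delay.

```cuda
#include <cstdio>
#include <cstdlib>

int main(int argc, char** argv) {
    const int k = (argc > 1) ? atoi(argv[1]) : 8;    // k x k nodes
    double meshHops = 0.0, fbflyHops = 0.0;
    const double pairs = (double)k * k * k * k;      // all (src, dst) pairs

    for (int sx = 0; sx < k; ++sx)
      for (int sy = 0; sy < k; ++sy)
        for (int dx = 0; dx < k; ++dx)
          for (int dy = 0; dy < k; ++dy) {
              // Mesh: one hop per step along each dimension.
              meshHops += std::abs(sx - dx) + std::abs(sy - dy);
              // Flattened butterfly: direct links within a row/column,
              // so at most one hop per dimension.
              fbflyHops += (sx != dx) + (sy != dy);
          }

    printf("k=%d: mesh avg hops = %.2f, flattened butterfly avg hops = %.2f\n",
           k, meshHops / pairs, fbflyHops / pairs);
    return 0;
}
```

For an 8x8 network this gives roughly 5.25 average hops for the mesh versus 1.75 for the flattened butterfly; actual energy and delay also depend on router radix and wire length, which this toy model ignores.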
Judging by the CUDA-core density of GM107, it's not hard to expect that a 20 nm GM110 could accommodate ~6500 CUDA cores (that's based on the assumption that they don't try to fit nonsensical Denver/ARM cores into their high-end Maxwells).
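For what it's worth, that extrapolation is easy to write down; the sketch below is mine, with every input a labeled assumption (GM107's public 640 cores on a ~148 mm^2 28 nm die being the only hard numbers). It just scales GM107's core density by an ideal 28 nm -> 20 nm shrink and an assumed big-die area budget. Since GM107's small die carries a disproportionate share of uncore, the naive result is a lower-bound ballpark rather than a prediction.

```cuda
#include <cstdio>

int main() {
    // GM107 (GTX 750 Ti) public specs: 640 CUDA cores on a ~148 mm^2 28 nm die.
    const double gm107Cores   = 640.0;
    const double gm107AreaMM2 = 148.0;

    // Assumptions for a hypothetical 20 nm GM110:
    const double dieAreaMM2 = 550.0;                           // big-die budget (assumed)
    const double areaScale  = (28.0 / 20.0) * (28.0 / 20.0);   // ideal 28 nm -> 20 nm shrink

    const double coresPerMM2 = gm107Cores / gm107AreaMM2;      // ~4.3 cores/mm^2 at 28 nm
    const double estimate    = coresPerMM2 * areaScale * dieAreaMM2;

    // GM107's die is dominated by uncore (display, video, memory I/O), so a
    // large die should pack SMs denser than this naive scaling suggests;
    // treat the printed figure as a conservative ballpark.
    printf("naive estimate: %.0f CUDA cores\n", estimate);
    return 0;
}
```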
This is the approach that Broadcom chose for its Videocore technology, which is fully controllable through a clearly documented Remote Procedure Call (RPC) interface. This allowed the Raspberry Pi Foundation to provide a binary-driver-free Linux-based OS on their products [34].
Why would a Denver 64-bit custom ARM core be nonsense in the very high end?
It would seem like that is where it will make the most sense.
Even if the number of CUDA cores is reduced, that is more than made up for (and probably greatly exceeded) by replacing an Intel CPU with another GM110.