NVIDIA Maxwell Speculation Thread

AnarchX · Feb 15, 2014

http://forums.laptopvideo2go.com/topic/30761-inf-v4146/

Can anybody explain why GM107 is listed as DEV.13 and DEV.17?
In the past at least the first two digits of the device ID stayed the same and only the last two digits changed for the different SKUs. Maybe two different GM107 - different fabs?
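
For readers following the device ID question: the convention described above is that the first two hex digits of the four-digit PCI device ID identify the chip and the last two the SKU. A minimal sketch of grouping IDs by that prefix is below. The sample IDs are made up for illustration and are not taken from the linked INF; `group_by_prefix` is a hypothetical helper name.

```python
from collections import defaultdict

def group_by_prefix(device_ids):
    """Group 4-hex-digit PCI device IDs by their first two digits."""
    groups = defaultdict(list)
    for dev in device_ids:
        dev = dev.upper()
        groups[dev[:2]].append(dev)
    return dict(groups)

# Hypothetical IDs, for illustration only (not read from the INF).
sample_ids = ["13A0", "13A1", "17B0"]
for prefix, ids in group_by_prefix(sample_ids).items():
    print(f"DEV.{prefix}: {ids}")
```

If one chip really shows up under two prefixes, as the INF seems to suggest for GM107, it would appear in two groups here.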

UniversalTruth · Feb 15, 2014

Alexko said:
Or maybe just the process, which would still increase performance per watt

they say second half of the year for the second generation - huh, I thought 20 nm would be available early next year...

anyways, there should be architectural improvements also present by then

constant · Feb 15, 2014

Which press conference are the images coming from?

I'm guessing that for mobile a 128 "core" SMM is more suitable than 192 cores; now they have smaller building blocks to build mobile solutions out of.
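
The granularity argument can be sketched with a little arithmetic, assuming chips are built from whole SMs only. The SM counts used here are arbitrary illustration values, not leaked configurations, and `core_counts` is a hypothetical helper.

```python
def core_counts(cores_per_sm, max_sms):
    """Core totals reachable with 1..max_sms whole SMs."""
    return [cores_per_sm * n for n in range(1, max_sms + 1)]

# 192-core blocks (Kepler SMX) versus 128-core blocks (Maxwell SMM),
# both listed up to the same 768-core ceiling.
smx_steps = core_counts(192, 4)
smm_steps = core_counts(128, 6)

print("SMX steps:", smx_steps)
print("SMM steps:", smm_steps)
```

Up to the same ceiling, the 128-core block gives six possible configurations where the 192-core block gives four, which is the "smaller building blocks" point.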

Remember that most of the low end mobile and desktop parts in the Kepler gen. were based off of rebranded Fermi parts.

This configuration makes much more sense. And hopefully we'll see an improvement in the SMEM / FPU ratio as well!

Can't wait for a whitepaper!!

AnarchX · Feb 15, 2014

CUDA 6.0 RC developer driver: http://forums.laptopvideo2go.com/topic/30763-v33259-windows-8-32bit-nvidia-mobile/

NVCUDA.dll contains interesting strings:
GK180 (was supposed to be the GPU of Tesla K40, but why does it have its own codename in CUDA?)
GK210 (as mentioned above a SM_37 Kepler)
SM_52 (GM2xx Maxwell compute capability?)

UniversalTruth · Feb 15, 2014

GK210 is not for this thread but this is supposed to replace GK110, right? And if so, the GM200 is still ages from being released, so is 20 nm too

Alexko · Feb 15, 2014

UniversalTruth said:
they say second half of the year for the second generation - huh, I thought 20 nm would be available early next year...

anyways, there should be architectural improvements also present by then

I think 20nm is available now, it just doesn't perform very well.

UniversalTruth · Feb 15, 2014

I meant available in a state suitable for mass production and release but I guess at current pricing and yields it is still a no go option

A1xLLcqAgt0qc2RyMz0y · Feb 15, 2014

UniversalTruth said:
GK210 is not for this thread but this is supposed to replace GK110, right? And if so, the GM200 is still ages from being released, so is 20 nm too

Well Nvidia is making something in 20nm.

A senior TSMC executive revealed recently that the company will begin 20nm production in the first quarter of 2014, contributing to the company's revenue in the following quarter.

Industry sources said TSMC's 20nm production capacity has been booked up with orders from industry giants including Apple Inc., Qualcomm Inc., Xilinx, Altera, Supermicro, NVIDIA, MediaTek and Broadcom Corp.

http://focustaiwan.tw/news/atod/201312050035.aspx

A1xLLcqAgt0qc2RyMz0y · Feb 15, 2014

NVIDIA Maxwell GeForce GTX 750 Ti and GTX 750 Official Specs Confirmed

http://wccftech.com/nvidia-maxwell-...ed-60watt-gpu-geforce-800-series-arrives-2014

NVIDIA GeForce 800 Series in Second Half of 2014

NVIDIA also confirmed during the conference that they are planning to introduce the GeForce 800 series, which is fully based on the Maxwell architecture, in the second half of 2014. This means that we will see the proper high-performance GPUs such as the replacements for GeForce GTX 780, GeForce GTX 770 and GeForce GTX 760 in Q3 2014. We have already noted codenames of the high-end Maxwell chips, which include GM200, GM204 and GM206. NVIDIA didn’t mention what process they would be based on, but early reports point to 20nm.

iMacmatician · Feb 15, 2014

So the 660 is still in the lineup even after the 750 Ti and 750 will be released. I guess I was too "optimistic" in thinking the 750 Ti could close that gap, given the rumored performance numbers. It seems a bit strange to me why they'd have a 600 series part in the middle of a bunch of 700 series parts (the 650 I can understand since it's at the bottom).

DSC · Feb 15, 2014

That's why I believe GM206 will come quickly to replace GK106 which has reached the end of its lifetime already. Nvidia needs a more power efficient and higher performing part for the $200 market and GK106 isn't ideal anymore 2 years since it launched.

dnavas · Feb 15, 2014

Ailuros said:
What is really awkward in those charts is that they're comparing a GK110 cluster with a GM107 cluster; I do get the point the slide is trying to make even with 192 vs. 4*32. Are there no dedicated FP units in Maxwell, or were some of the marketing guys just overeager and thought 256 looked "prettier" on the left?

I agree, this is odd [I assume by "fp" you're referring to the dp units]. Earlier I had asked where the dp units were and the logical response was that these units weren't in the block diagram in the previous release; now we're sitting here with these two charts which would seem to contradict that point. I don't think I buy the aesthetic or ignorance arguments -- not with Tegra K1 throwing around the 192 term with abandon and the marketing team making crop circles. A more likely explanation would be the dilution of "192" as a magic marketing term, but even that seems like a stretch. So, are the smaller Maxwell chips not dp-capable?

That seems likely at first blush -- we've violently agreed that there's no real market being fulfilled, so it seems completely reasonable to bifurcate your product line. But, then, why put the GK110 slide next to your GMxx7 slide? If there's no expectation for dp in your consumer product line, why raise the issue?

Another possibility is that these two models are somehow comparable, so the GM107 is capable of dp. Would it be possible that they made the alus capable of half-rate dp? Half of 128 alus is comparable to the 64 units in GK110, and presumably half-rate logic is cheaper by area and power.
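
The half-rate idea can be checked with a quick calculation. This is only a sketch of the hypothesis in the paragraph above, not a confirmed GM107 spec: the 0.5 rate is the guess being tested, and `dp_results_per_clock` is a hypothetical helper.

```python
def dp_results_per_clock(fp32_alus, dp_rate=0.0, dedicated_dp_units=0):
    """DP results per clock for one SM.

    dp_rate is the fraction of FP32 ALUs that can retire a DP op each
    clock; dedicated_dp_units counts separate DP hardware.
    """
    return fp32_alus * dp_rate + dedicated_dp_units

# GK110 SMX: 192 FP32 ALUs plus 64 dedicated DP units.
gk110_smx = dp_results_per_clock(192, dedicated_dp_units=64)

# GM107 SMM under the half-rate guess (assumption, not a known spec).
gm107_smm_guess = dp_results_per_clock(128, dp_rate=0.5)

print(gk110_smx, gm107_smm_guess)
```

Under that guess both clusters come out at 64 DP results per clock, which would make the side-by-side slide comparison less odd.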

Also curious -- the wording of increased performance per alu. Had the increase in performance been compared at the SMX to SMM level, an increase in utilization would be the reasonable assumption, but at the alu level, it implies the alu is capable of more. Can you issue separate mul & add, fp32 & int32, are there currently instructions that take more than one clock cycle to issue that can now get better throughput, or is there something else? I similarly find it odd that there is one scheduler and two dispatchers per 32 alus -- why does one need two dispatchers? 16 alu-wide dispatch? One external (tmu/sfu/???) and one internal (in which case, why co-locate them in the diagram)? Or is there some kind of co-issuing being done here? [Or, are those not dispatch units?]

Lots of questions, I wonder how many answers we'll get on Tuesday....

silent_guy · Feb 15, 2014

iMacmatician said:
It seems a bit strange to me why they'd have a 600 series part in the middle of a bunch of 700 series parts.

And this is exactly why rebranding an old part with a more consistent name that fits into the current product line makes so much sense.

tviceman · Feb 16, 2014

If the "new" information on die size (148mm^2) and transistor density (15% more dense than GK107) is to be believed here: http://videocardz.com/49557/exclusive-nvidia-maxwell-gm107-architecture-unveiled

Then that puts GM107 roughly at 12.67 million transistors per mm^2, for a total of 1.875 billion transistors. Compared to Bonaire, which has a die size of 160mm^2, Bonaire has 13 million transistors per mm^2 for a total of 2.08 billion.
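
The arithmetic behind those figures can be reproduced as a sketch. The GK107 baseline (about 1.3 billion transistors on 118 mm^2) is an assumption based on commonly quoted numbers and does not come from this thread; the 148 mm^2 die size and the 15% density gain are the rumored values from the link above.

```python
def density_m_per_mm2(transistors_millions, die_mm2):
    """Transistor density in millions of transistors per mm^2."""
    return transistors_millions / die_mm2

# Assumed GK107 baseline (commonly quoted, not stated in the thread).
GK107_TRANSISTORS_M = 1300
GK107_DIE_MM2 = 118

# Rumored GM107 figures from the linked article.
GM107_DIE_MM2 = 148
DENSITY_GAIN = 1.15

gk107_density = density_m_per_mm2(GK107_TRANSISTORS_M, GK107_DIE_MM2)
gm107_density = gk107_density * DENSITY_GAIN
gm107_total_m = gm107_density * GM107_DIE_MM2

# Bonaire: 2.08 billion transistors on 160 mm^2, as quoted above.
bonaire_density = density_m_per_mm2(2080, 160)

print(f"GM107: {gm107_density:.2f} M/mm^2, {gm107_total_m:.0f} M total")
print(f"Bonaire: {bonaire_density:.2f} M/mm^2")
```

With those inputs the estimate lands at about 12.67 M/mm^2 and roughly 1,875 million transistors for GM107, against 13 M/mm^2 for Bonaire.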

Supposedly "GTX 480" like performance will make the 750 ti around 25-30% faster than the r260x, while consuming ~20 watts less on average loads. All of this remains to be seen (and believed), but IMO Maxwell is shaping up to be an even more impressive successor to Kepler than Kepler was to Fermi (on the graphics front).

lanek · Feb 16, 2014

So nobody finds it really strange that Nvidia is introducing a new architecture with a low end chip ? .. when i see 1st generation on the slides, all is said.

itaru · Feb 16, 2014

This is probably a technique adopted in Maxwell.
In the study, the scheduler and the register file are made hierarchical, and the SM saves power when software optimizes for that hierarchy.
Maxwell will probably move from a two-level to a three-level hierarchy.
Various other techniques will likely be used as well.
The warp size may possibly change.

There is also a technique that divides a large warp into smaller warps and handles those.

A Hierarchical Thread Scheduler and Register File for Energy-Efficient Throughput Processors
https://research.nvidia.com/publica...r-file-energy-efficient-throughput-processors
http://www.cs.virginia.edu/~skadron/Papers/gebhart_tocs.pdf

silent_guy · Feb 16, 2014

lanek said:
when i see 1st generation on the slides, all is said.

I think I figured it out: there will be a second generation!

mczak · Feb 16, 2014

tviceman said:
Supposedly "GTX 480" like performance will make the 750 ti around 25-30% faster than the r260x, while consuming ~20 watts less on average loads.

The leaked (game) benchmarks do not really suggest that - more like 5-10% faster than r260x - mind you I would consider that quite impressive for a slightly less complex chip, with large disadvantages in raw numbers just about everywhere (mem bandwidth, alus, tmus). Power consumption though could potentially be more like a 40W advantage if that 60W TDP is any indication...
Maybe that's why AMD paper launched the r7 265 now, the gtx 750ti might beat the r7 260x but won't be able to touch the 265 (and MSRP is still quite close probably hence reviews comparing them). Of course power consumption as well as perf/w should be much better but (on desktop) this is just one factor. (On the power consumption front, if it's really that impressive I would believe they are using HPM otherwise this just seems too good to be true.)
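
To put rough numbers on the perf/W point, here is a sketch that uses only the figures in this post: a 5-10% speed lead, a 60 W TDP, and a possible 40 W advantage (so about 100 W implied for the r260x). TDP is not measured power draw, so treat the result as a ballpark; `perf_per_watt_ratio` is a hypothetical helper.

```python
def perf_per_watt_ratio(speedup, power_w, rival_power_w):
    """Perf/W of a card relative to a rival, given its speedup over it."""
    return speedup * rival_power_w / power_w

GM107_TDP_W = 60          # TDP figure mentioned in the post
POWER_ADVANTAGE_W = 40    # the "more like a 40W advantage" guess
rival_power_w = GM107_TDP_W + POWER_ADVANTAGE_W  # implied, not measured

for speedup in (1.05, 1.10):
    ratio = perf_per_watt_ratio(speedup, GM107_TDP_W, rival_power_w)
    print(f"{speedup:.2f}x speed -> {ratio:.2f}x perf/W")
```

Even with only a small speed lead, those inputs give roughly 1.75x to 1.83x the perf/W of the r260x.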

silent_guy · Feb 16, 2014

About the HPM: K1 is using that and reaching a 900MHz(?) clock speed. That's really not that much slower than, say, a GK110 (at least if you ignore boost.) You'd also think that they have power efficiency in mind when they make a mobile chip and thus don't go to the limit in terms of choosing the most aggressive, power hungry standard cells, so they may leave some things on the table in terms of clock speed for HPM.

So how much faster is HP really compared to HPM? Just 10%?

tviceman · Feb 16, 2014

lanek said:
So nobody finds it really strange that Nvidia is introducing a new architecture with a low end chip ? .. when i see 1st generation on the slides, all is said.

GK107 came out first with Kepler, so it really isn't strange. It's a sound strategy given the need to refresh their mobile parts in both performance and perf/watt in the face of ever-improving iGPUs.
