> R600 re-run style?

R600 didn't bring a new memory type? X1950 already used GDDR4, and GDDR5 didn't come 'till RV7xx.

I think he meant: bad performance despite a large increase in bandwidth.
I think we can confidently say that NVidia will have no choice but to abandon GDDR5 for the high end gaming market. It's a dead end. It sucks up way too much power for a start (10s of watts on current cards when GPU interfacing + memory power is all added up). And it can't remotely compete in terms of raw bandwidth. It's just a matter of time.
The more interesting question, to me, is whether AMD will fuck it up on the first go, R600 re-run style.
The only thing I could think of to escape the 96 ROPs would have been to decouple the ROPs from the MC, but I severely doubt it, and I don't think they'll do it anytime soon.
Maybe you shouldn't be thinking that it has twice the ROPs compared to GK110, but 32 more compared to GM204?
GK104 --> GM204 = 100% more clusters (yes, smaller clusters, blah blah blah), same amount of FP64 units, twice the ROPs, same amount of TMUs, 3.54B --> 5.2B transistors
GK110 --> GM200 = 60% more clusters (again, smaller clusters), 60% more FP64 units, twice the ROPs, 240 vs. 192 TMUs, 7.1B --> ~8.0B transistors?
As I said, I'd happily stand corrected, since as a layman that kind of stuff is pure and quite uneducated speculation on my part, but it's also my understanding that TMUs are quite expensive units. Having 20% fewer in the latter hypothetical case isn't exactly something to sneeze at either.
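As a quick sanity check, here is that arithmetic spelled out (a minimal sketch; the GM200 unit counts and the ~8.0B transistor figure are the speculated values from this thread, not confirmed specs):

```python
# Back-of-envelope scaling ratios for the two transitions discussed above.
# GK104 (8 SMX), GK110 (15 SMX) and GM204 (16 SMM) are known parts; the
# GM200 figures (24 SMMs -> 192 TMUs, ~8.0B transistors) are speculation.
gk104_to_gm204 = {"clusters": 16 / 8, "ROPs": 64 / 32, "transistors": 5.2 / 3.54}
gk110_to_gm200 = {"clusters": 24 / 15, "TMUs": 192 / 240, "transistors": 8.0 / 7.1}

for name, ratios in [("GK104 -> GM204", gk104_to_gm204),
                     ("GK110 -> GM200", gk110_to_gm200)]:
    for unit, ratio in ratios.items():
        print(f"{name} {unit}: {ratio - 1:+.0%}")
```

The -20% on TMUs is exactly the "20% fewer" mentioned above; the striking part is how little the transistor budget grows on the second transition (+13%) compared to the first (+47%).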
It's one thing to increase density when changing architecture, but I don't understand why a GM200 would have better transistor density than GM204. If they can push GM200 to Hawaii densities, then why didn't they do so for GM204 as well?
For actual size: GM204 is around 400mm2. For GM200, a 384-bit MC is a given, and 20 SMs wouldn't make a lot of sense, so let's say 24 SMs as well. That is +50% in both cases. The leaked cache size shows 3MB: another +50%. A naive pure +50% scaling of GM204 results in a die size of 600mm2. That's the number you have to start with. If it's 560mm2, then one way or the other they're going to have to find 40mm2 by scaling texture and ROP units by less than +50%, yet still add extra FP64 units as well?
I just don't see it.
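For what it's worth, here is that naive scaling spelled out (a minimal sketch; the ~400mm2 GM204 size, the 24-SMM assumption and the 560mm2 rumour are the speculative inputs from the post above):

```python
# Naive GM200 die-size estimate: scale GM204 uniformly by +50%,
# matching +50% SMMs (16 -> 24), +50% MC width (256 -> 384 bit)
# and +50% L2 (2 MB -> 3 MB, per the leaked cache size).
gm204_area_mm2 = 400.0                  # approximate GM204 die size
naive_gm200_area = gm204_area_mm2 * 1.5
print(f"naive +50% scaling: {naive_gm200_area:.0f} mm2")      # 600 mm2

rumoured_area = 560.0                   # the rumoured GM200 die size
shortfall = naive_gm200_area - rumoured_area
print(f"area to find elsewhere: {shortfall:.0f} mm2")         # 40 mm2
# Those 40 mm2 would have to come from scaling texture/ROP units by
# less than +50%, while still fitting in extra FP64 units.
```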
And then I believe they'll jump to Pascal (GP100) for that market (followed by Volta), rather than use a 16FF Maxwell refresh.
Pascal would probably be used on really high-end gaming products (Titan, Titan Z) but possibly not on a "GeForce GTX 1080 Ti" kind of card. I speculate that "regular high-end" gaming cards will still use GDDR5 rather than HBM (or whichever it is). If Pascal can only use the stacked-memory dies and not GDDR5 (no idea; flexible memory controllers are theoretically possible), then you need a GPU such as GM200, possibly followed by a similar GM300.
(That "GM300" for a 16FF Maxwell is just a putative name; maybe it would really be a Pascal with fewer features, branded as a Pascal, giving you a situation like GK104 vs GK110. Maybe that last parenthesis of mine is useless: if "GM300" comes out before Pascal, then it shall be named "GM300", i.e. "3rd generation Maxwell".)
> I think power consumption is usually not the highest concern when they're building a high-end graphics card...

NVIDIA's high-end GPUs have all stayed within 250 W (listed) TDP, so I presume that power savings from HBM can be transferred to higher core clocks or more SMs.
> If 64 ROPs are justified for GM204, then I don't see why 96 ROPs aren't justified for a chip with 50% more bandwidth.

I don't see them decoupling the ROPs and MC either. I speculated that they could reduce the number of ROPs per MC, but that does not seem too feasible either. True, if you look at it that way, it has 50% more ROPs than GM204. But from what I understand, ROPs are extremely bottlenecked by memory bandwidth, so increasing the number of ROPs without increasing bandwidth will not always yield a performance benefit. So I wonder if the large increase in ROPs is justified.
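To put rough numbers on the bandwidth-bottleneck point, here is a back-of-envelope sketch (the clock and memory speeds are illustrative assumptions, not known GM200 specs):

```python
# How much raw bandwidth could 96 ROPs demand at peak, vs. what a
# 384-bit GDDR5 bus supplies? All figures below are assumptions.
rops = 96
core_clock_ghz = 1.1              # assumed core clock
bytes_per_pixel = 4               # plain 32-bit colour write, no blending

fill_rate_gpix = rops * core_clock_ghz            # ~105.6 Gpixel/s
demand_gb_s = fill_rate_gpix * bytes_per_pixel    # ~422 GB/s of writes

bus_width_bits = 384
gddr5_gbps_per_pin = 7.0          # assumed GDDR5 data rate
supply_gb_s = bus_width_bits * gddr5_gbps_per_pin / 8   # 336 GB/s

print(f"peak ROP write demand: {demand_gb_s:.0f} GB/s")
print(f"raw memory bandwidth:  {supply_gb_s:.0f} GB/s")
# Even before textures, geometry or blending (which adds reads), peak
# ROP demand exceeds supply, so extra ROPs without extra bandwidth
# (or better compression) can easily go unused.
```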
> Historically, the transistor density has always increased with chip size. If we take Kepler, the densities for GK107, GK104 and GK110 were 11.02 M/mm2, 12.04 M/mm2 and 12.89 M/mm2. I suppose it's because the bigger chips have a higher proportion of ALUs and cache as a percentage of the overall die, and these are denser.

Yes, density is determined by random logic vs cache vs analog blocks (PLLs, unique IOs, etc.). (And standard cell library and metal stack-up, but let's assume those are constant within the same family.)
> The figures for Maxwell are GM107 with 12.63 M/mm2 and GM204 with 13.06 M/mm2. If we apply the same scaling as GK110 vs GK104 for GM200 vs GM204, GM200's density should be ~13.98 M/mm2.

Even disregarding what I said earlier, notice that GM107 to GM204 only increases by ~0.5 M/mm2, where it was ~1.0 for the equivalent Kepler step. So a naive conclusion for Maxwell would be +0.5 for GM204 to GM200 as well.
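The two extrapolations, side by side (a minimal sketch using only the density figures quoted above):

```python
# Two ways to extrapolate GM200's density from the figures above
# (all in Mtransistors/mm2).
gk107, gk104, gk110 = 11.02, 12.04, 12.89    # Kepler densities
gm107, gm204 = 12.63, 13.06                  # Maxwell densities

# (a) Apply the GK110/GK104 ratio to GM204:
print(f"ratio-based: {gm204 * gk110 / gk104:.2f} M/mm2")    # ~13.98

# (b) The absolute step shrank: +1.02 for GK107 -> GK104 but only
#     +0.43 for GM107 -> GM204, so assume a similar ~+0.5 step again:
print(f"delta-based: {gm204 + (gm204 - gm107):.2f} M/mm2")  # ~13.49
```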
Yes, that sounds right. And, while 64 ROPs may sound a bit like overkill, don't forget the chip has the 4x16 pixel rasterizer (and enough shader export) to go along with it. And for the typically bandwidth-limited nature of ROPs, the improved frame buffer compression certainly helped too. Plus, some operations are very slow with NVIDIA's ROPs and never even close to being bandwidth limited (fp32 RGBA blend), so doubling up would help there too (instead of having more complex ROPs).

> NVIDIA's high-end GPUs have all stayed within 250 W (listed) TDP, so I presume that power savings from HBM can be transferred to higher core clocks or more SMs.

If 64 ROPs are justified for GM204, then I don't see why 96 ROPs aren't justified for a chip with 50% more bandwidth.
> Also, since I'm curious… if the GM206 has a 192-bit bus, do you think it will have 24 or 48 ROPs?

A good question... I suspect it will be the same arrangement as the other GM2xx chips (hence 48). Assuming 2 GPCs, that would be more ROPs than really needed, but not too bad (certainly not as much overkill as some older chips). But all things considered, that's probably a better tradeoff than having "not enough" ROPs (and more in tradition with NVIDIA's past chips).
You're saying GDDR5 is a dead end for high-end cards because it sucks too much power?
I think power consumption is usually not the highest concern when they're building a high-end graphics card...
> The reason behind not deploying Maxwell in Tesla accelerators is said to be the lack of FP64 floating-point units and additional double-precision hardware. The reason behind not including DP FPUs in Maxwell might have to do with the extremely efficient design NVIDIA was aiming for. This, however, means that NVIDIA's upcoming Maxwell core, the GM200, the flagship core of the series, might just remain a GeForce-only offering, unlike Kepler, which was aimed at the HPC market first with the Titan supercomputer and only arrived as the GeForce GTX Titan a year after the initial Kepler cores.
>
> Since the DP FP64 FPU hardware blocks will be removed from the top-tier cards that are rumored to arrive next year, they will include several more FP32 FPUs to make use of the remaining core space, and that means better performance for the GeForce parts, since games have little to do with double-precision performance.
Ailuros, read the article -.-
The source is Kenichi Hayashi (NVIDIA platform business headquarters Director), based on a presentation made by him at a conference. Not some rumour from Chiphell...
> Since the DP FP64 FPU hardware blocks will be removed from the top-tier cards that are rumored to arrive next year, they will include several more FP32 FPUs to make use of the remaining core space, and that means better performance for the GeForce parts, since games have little to do with double-precision performance.
It's probably just me, but FP32 SPs don't just get randomly added to an architecture like Maxwell. They come in clusters, i.e. SMMs, and those clusters come with area-heavy units like TMUs.
If GM200 is indeed lacking most of the FP64 units then surely the most likely scenario is that NV has just chosen to add more of the same SMMs that we see in GM204 instead of changing the SMM itself.
I do have to say though, personally I'd be thrilled to see a big NV chip that's not also meant for teslas etc. If NV thinks that it's cost effective to support two different big dies for different markets then that's great news. I have my doubts though.
The consensus seems to be that if it is indeed not made for FP64 support, they will not do one before the next architecture.
That said, it seems everything is based on the lack of information about GM200 in this Tesla presentation.
If the leap is not big enough, maybe they have simply chosen not to talk about Maxwell in this presentation and to talk about the future ones instead.
Hardware-wise, those companies don't change their minds at the last minute, so whatever is planned has been planned for a long time.
Hey, GF100 and GF104 had different SM cluster sizes: 32 vs 48. Why can't they do that again? It would be in the opposite direction, sure, but it's not like NVIDIA is shy about making unexpected changes, quite the contrary (first 384-bit memory bus in G80, double-pumped SPs for three generations followed by their removal, an odd number of clusters in GK110, etc.). NVIDIA does whatever is needed to achieve their performance goals, so I would not be surprised to see quite some changes in GM200, rather than just an upscaled GM204. It should support exactly the same features though.
Yes, but the elephant in the room is GK210. Why do it if you would have a great GM200 coming in?
They might have been constrained by 28nm to the point that the increase in DP compute power would be less than what is possible with an optimised Kepler core (GK210) in a dual-chip card. They could have compared the numbers in advance, and since Kepler is already a mature architecture, it could have made more sense, and been safer, to do that rather than push for a larger GM200 chip on a new architecture.