Nvidia Volta Speculation Thread

To my knowledge, no Nvidia card in the last 3+ generations has failed to reach its turbo clockrates. Any reason to believe V100 will be any different?
It all depends on various variables - even gaming loads can occasionally push NVIDIA cards below their so-called "base clock", let alone below their boost clocks
 
It all depends on various variables - even gaming loads can occasionally push NVIDIA cards below their so-called "base clock", let alone below their boost clocks
I don't remember which benchmarker it was, but some site ran their whole lineup of benchmarks and used the lowest clock their test card showed as the baseline. They ended up with the base clock promised by NVIDIA. Which loads do you mean?
 
To my knowledge, no Nvidia card in the last 3+ generations has failed to reach its turbo clockrates. Any reason to believe V100 will be any different?
I am sitting right next to a Maxwell-based Titan X that settles at 1,002 MHz in CoD: Modern Warfare Remastered, so yes: I have reason to believe that under full load, turbo clocks are more or less... let's say not telling the whole truth.

Any reason to doubt the various performance numbers provided so far?
Yes, because they are marketing material from one IHV and as such they are selected to show the new products in a good light.

The comparison in terms of turbo clock rates would have yielded the same results, so my point still stands. If you want to compare oranges to my stated apples, be my guest, but please don't tell me afterwards that my arguments don't make sense when you are the one throwing oranges into the mix.
I want to compare real world achieved sustained performance instead of spreadsheet metrics. For BOTH cards, so no apples vs. oranges, as you keep misunderstanding or want to force into my argument for some reason.
 
I am sitting right next to a Maxwell-based Titan X that settles at 1,002 MHz in CoD: Modern Warfare Remastered, so yes: I have reason to believe that under full load, turbo clocks are more or less... let's say not telling the whole truth.
What does the first Titan X have to do with P100 or V100? Nothing?

Besides, I guess the card is throttling due to high temperature. That's not an issue in a server - it will just get louder ;)

I want to compare real world achieved sustained performance instead of spreadsheet metrics. For BOTH cards, so no apples vs. oranges
Then why come around with an orange Kepler-card? ;)

As long as we are speculating and no third party owns a V100, we have to rely on the spreadsheet. If you want to doubt the spreadsheet, do you have arguments for that? Do you have benchmarks where P100 cards dip way down in clock speed due to power and/or thermal issues?
 
I don't remember which benchmarker it was, but some site ran their whole lineup of benchmarks and used the lowest clock their test card showed as the baseline. They ended up with the base clock promised by NVIDIA. Which loads do you mean?
Sadly I can't seem to find it now; it was one of the "regular sites" (i.e. not some unknown no-name site from wherever).
What I do remember was a graph with clocks per game for their test routine, and in one game the clocks actually dropped below base clock. I'll try to dig around and see if I can find it, but it's hard when I'm not even sure whether it was a 980, 980 Ti, 1080 or something else.
 
I am sitting right next to a Maxwell-based Titan X that settles at 1,002 MHz in CoD: Modern Warfare Remastered, so yes: I have reason to believe that under full load, turbo clocks are more or less... let's say not telling the whole truth.

Well that's most definitely not the norm. Unless there's something wrong with Anandtech's review methodology or the games they test?

http://www.anandtech.com/show/9059/the-nvidia-geforce-gtx-titan-x-review/16

I also happen to have a 960 and a 1060 that always always always run above boost clocks... so let's just say I don't agree with your wild generalization and strawman argument.

Yes, because they are marketing material from one IHV and as such they are selected to show the new products in a good light.

History tells us that those claims are believable, based on previous launches. Besides, I don't think you can do much one way or another to show SGEMM in a good light...

So there's a lot of evidence that points to an increase in performance and perf-per-watt. There's none pointing to the contrary, or at least you have provided nothing. "I am a skeptic until something is fully proven" is not evidence. This is a speculation thread; if you are going to dispute someone's claim that "there appears to be a perf/watt increase", which is based on available evidence, I think you should provide evidence of your own rather than just dismissing anything and everything that you choose not to believe.
 
It all depends on various variables - even gaming loads can occasionally push NVIDIA cards below their so-called "base clock", let alone below their boost clocks

The point is not even whether the clocks go down in some workloads. What matters is whether Volta behaves differently from previous generations, and so far there's absolutely nothing that even suggests this is the case.
 
Well that's most definitely not the norm. Unless there's something wrong with Anandtech's review methodology or the games they test?
http://www.anandtech.com/show/9059/the-nvidia-geforce-gtx-titan-x-review/16
I also happen to have a 960 and a 1060 that always always always run above boost clocks... so let's just say I don't agree with your wild generalization and strawman argument.

I don't know which scenes Anandtech is testing, so I cannot say whether there's something wrong. As with all reviews, they test a certain sample of available games and applications and thus cannot catch all possible behaviours - I guess that goes without saying.

What I can say is that in our launch review we already had the Titan X (Maxwell) clocking under its typical boost in Risen 3, for example:
http://www.pcgameshardware.de/Gefor...orce-GTX-Titan-X-Test-Benchmark-1153643/2/#a1
(In case it is not obvious: when the free-running boost is slower than forced standard clocks, the card is below its typical boost.) Besides that, I just gave you an example of a card and a game exhibiting this behaviour right now (Titan X-M in CoD: MW Remastered).

You call that a strawman - be my guest.

History tells us that those claims are believable, based on previous launches. Besides, I don't think you can do much one way or another to show SGEMM in a good light...
History tells us that marketing material is carefully selected most of the time.

So there's a lot of evidence that points to an increase in performance and perf-per-watt. There's none pointing to the contrary, or at least you have provided nothing. "I am a skeptic until something is fully proven" is not evidence. This is a speculation thread; if you are going to dispute someone's claim that "there appears to be a perf/watt increase", which is based on available evidence, I think you should provide evidence of your own rather than just dismissing anything and everything that you choose not to believe.
For starters: I never said performance per watt would drop ("to the contrary"), nor did I say there'd be no performance increase per watt ("dispute someone's claim that 'there appears to be a perf/watt increase'"). Instead I said: "I want to compare real world achieved sustained performance instead of spreadsheet metrics. For BOTH cards, so no apples vs. oranges, as you keep misunderstanding or want to force into my argument for some reason."

I only caution against taking +50% perf/watt for granted when all the "evidence" for it is the spec sheet number for peak boost throughput; you can read it for yourself here:
https://forum.beyond3d.com/posts/1989770/

The point is not even if the clocks go down in some workloads. What matters is if Volta behaves differently than previous generations, and so far there's absolutely nothing that even suggests this is the case.
Of course: contrary to all the generations before, they have not stated a base clock for GV100 yet. Maybe an oversight, maybe it's a number that sounds a little too low for marketing.
And that's all I've been trying to tell you guys in this thread: it all depends on how much the clocks go down under full load. When we know that, we can estimate how much of an efficiency improvement Nvidia has achieved coming from GP100 - whether it's 50%, 5% or 150%.

The funny thing is: I don't care. I just want numbers that are not based on some crack pipe dream, but hard numbers from real sustained workloads.
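To put rough numbers on what I mean, here's a quick back-of-the-envelope sketch in Python. It assumes the SXM2 spec-sheet figures (P100: 10.6 FP32 TFLOPS at 300 W, V100: 15.7 FP32 TFLOPS at 300 W), assumes throughput scales linearly with clock, and uses purely hypothetical sustained-clock fractions for V100 - none of this is measured data, it's just to show how quickly the paper gain shrinks:

```python
# Back-of-the-envelope only: spec-sheet SXM2 numbers, linear clock scaling assumed,
# sustained-clock fractions are hypothetical placeholders, not measurements.
P100 = {"fp32_tflops_at_boost": 10.6, "tdp_watts": 300}
V100 = {"fp32_tflops_at_boost": 15.7, "tdp_watts": 300}

def perf_per_watt(card, sustained_clock_fraction=1.0):
    """Perf/watt if the card sustains the given fraction of its boost clock."""
    return card["fp32_tflops_at_boost"] * sustained_clock_fraction / card["tdp_watts"]

# P100 assumed to hold its boost clock; vary the fraction V100 can sustain.
for fraction in (1.00, 0.95, 0.90, 0.85):
    gain = perf_per_watt(V100, fraction) / perf_per_watt(P100) - 1
    print(f"V100 sustaining {fraction:.0%} of boost: {gain:+.0%} perf/watt vs. P100")
```

The advertised ~+48% on paper drops to roughly +33% if only 90% of boost can be held - which is exactly why the sustained clocks matter.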
 
Navi won't be alone...
Eh, I don't know. MCMs make the most sense for the GPU designers and retailers because they are more cost efficient (lower risk, more profit). They are not quite as nice for the customer: lower performance and lower performance/watt vs. big chips. The math that needs to be done is whether the power savings (both direct and cooling) translate to a cost saving for the consumer over the expected life of the product. If they do, and if Nvidia knows it is going to continue to have enough consumers who are willing to pay the additional upfront cost required to offset the additional design expense and time investment, then it is probably worthwhile for Nvidia to keep making massive chips.

On the other hand, in the absence of (meaningful) competition, Nvidia might choose the MCM-only path purely for short-term financial reasons (greed), but that also carries some risk with it... not to mention you lose a bit of the halo effect. I would guess large chips will continue to stick around for the foreseeable future, though the 800+ mm^2 of GV100 is likely a one-off.
 
Nvidia's MCM concept is comparing its MCM versus an unbuildable monolithic GPU. If we go with their projections, the performance per mm2 of silicon is inferior to having it all on one chip, but they can throw much more silicon at it than can be done otherwise.
I haven't read through the paper in depth, but for comparison in terms of interconnects, Nvidia's power numbers for PCB signalling seem to be in line with AMD's inter-socket GMI links: Nvidia projects 10 pJ/bit and AMD's EPYC slides had a link at 9 pJ/bit.

The on-package interconnect is projected to use improvements in organic-substrate manufacturing and Nvidia's ground-referenced signalling to achieve 0.5 pJ/bit. For comparison, EPYC's on-package links manage 2 pJ/bit.
It's somewhat difficult to get a clear picture of AMD's interposer-based link energies. HBM was noted as being ~6-7 pJ/bit in some of the comparisons to GDDR5, but I am not sure if the values are comparable.

AMD's proposed future scheme has active interposers for its chiplets, but then some undetermined signalling over an MCM. There are NoC and interposer-based projects that could drop to 0.3 pJ/bit, but AMD's proposal seems less concrete than Nvidia's.
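For a sense of scale, here's a tiny Python sketch converting those pJ/bit figures into link power at an assumed 1 TB/s of traffic - the bandwidth is just an illustrative placeholder, the energy-per-bit values are the ones quoted above:

```python
# Illustrative only: link power = energy per bit x bit rate.
# 1 TB/s of traffic is an assumed placeholder, not a published figure.
ASSUMED_TRAFFIC_BYTES_PER_S = 1e12
bits_per_s = ASSUMED_TRAFFIC_BYTES_PER_S * 8

links_pj_per_bit = {
    "Nvidia on-package GRS (projected)": 0.5,
    "EPYC on-package": 2.0,
    "HBM (as quoted vs. GDDR5)": 6.5,
    "EPYC inter-socket GMI": 9.0,
    "Nvidia PCB signalling (projected)": 10.0,
}

for name, pj_per_bit in links_pj_per_bit.items():
    watts = pj_per_bit * 1e-12 * bits_per_s
    print(f"{name}: ~{watts:.0f} W at 1 TB/s")
```

That's roughly 4 W vs. 80 W for the same traffic at the two ends of the range, which is why the on-package signalling energy is the headline number.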

Both Nvidia and AMD are proposing silicon disintegration strategies, with Nvidia moving some of its otherwise duplicated silicon to a separate system/IO die.
 
Nvidia's MCM concept is comparing its MCM versus an unbuildable monolithic GPU
No it isn't. GV100 is already 84 SMs at "12" nm..... 128 SMs at 7nm is certainly buildable.
 
No it isn't. GV100 is already 84 SMs at "12" nm..... 128 SMs at 7nm is certainly buildable.

The multi-GPU baseline is 2x 128 SM monolithic chips. Nvidia's projection seems to be saying that 256 SMs in an MCM is as feasible as creating a 2x monolithic setup.
The question then is how far both sides drop if you cut things in half.
 
You call that a strawman - be my guest.

It's a strawman because you are taking a few isolated cases and acting like they're evidence of normal behaviour.

History tells us that marketing material is carefully selected most of the time.

Not really in Nvidia's case for Maxwell and Pascal, since pretty much all claims were spot on. And for V100 they are claiming up to an 80% increase in common FP32 workloads. It's one thing for PR to be creative and a completely different thing to outright fabricate the numbers. So again, like Leier said, maybe the reality is not as good as what PR portrayed, but it shouldn't be too far off. And that increase is not coming from the node alone, because the node offers at most 25%, and that too is "carefully selected marketing material" from TSMC. So unless you are claiming that V100 will miss the mark by more than 25%, it clearly is more efficient.
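Just to spell out the arithmetic behind that (illustrative only, using the round numbers from this post):

```python
# Round numbers from this post, nothing more: "up to 80%" claimed FP32 gain,
# "at most 25%" from the process node.
claimed = 1.80
node_only = 1.25

from_architecture = claimed / node_only - 1   # ~44% would have to come from the design
break_even_miss = 1 - node_only / claimed     # claim could be ~31% off before node alone explains it

print(f"Share needed from architecture/clocks: ~{from_architecture:.0%}")
print(f"Claim could miss by ~{break_even_miss:.0%} before the gain is node-only")
```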

Of course: contrary to all the generations before, they have not stated a base clock for GV100 yet. Maybe an oversight, maybe it's a number that sounds a little too low for marketing.
And that's all I've been trying to tell you guys in this thread: it all depends on how much the clocks go down under full load. When we know that, we can estimate how much of an efficiency improvement Nvidia has achieved coming from GP100 - whether it's 50%, 5% or 150%.

Sure, only then will we be able to tell how much it increased - no one has disputed that - but we don't need that in order to know that it did increase significantly. Unless you're claiming that clocks are so low that V100 performance is equal to P100 performance, despite Nvidia advertising >50% performance improvements.

The funny thing is: I don't care. I just want numbers that are not based on some crack pipe dream, but hard numbers from real sustained workloads.

Wrong thread. If you only want hard numbers, avoid speculation threads. And I wouldn't say that official specifications are "numbers based on some crack pipe dream", but maybe that's just me.
 
[QUOTE="CarstenS, post: 1990661, member: 528]I only caution against taking +50% perf/watt for granted when all the „evidence“ for it is the spec sheet number of peak boost throughput, in fact, you read it for yourself here:
https://forum.beyond3d.com/posts/1989770/[/QUOTE]
Maybe you should start reading other people's posts. After taking those perf/watt improvements from NVIDIA for "granted", I wrote:

"Maybe that is exaggerated, but even if it is only 30 or 40 percent, that is an huge increase."

So you mean your whole post just depends on you not having read mine?

[QUOTE="CarstenS, post: 1990661, member: 528]Titan X (Maxwell)[/QUOTE]
Different architecture, different manufacturing process - so that is no proof. Especially since NVIDIA has optimized Pascal for higher clocks.

And I don't think that NVIDIA is stretching the truth very much when they give out tech specs for compute cards. That would be a disaster for them.
 
To lighten the atmosphere a bit: even if the concepts are different, I find it funny how similar both designs look, even more so when I remember the exascale system AMD presented some time ago (which would be more related to an MCM). Of course AMD was not directly proposing an MCM chip, but I take it that an MCM could allow this, at least on paper.

anyway ..

[Image: AMD EPYC Tech Day slide on the MCM package]
[Image: Nvidia MCM-GPU with GPC and DRAM dies]


AMD exascale APU with stacked DRAM (TOP-PIM):
[Image: AMD exascale APU slide]
 
No, I think they are talking about a 256-SM monolithic GPU (unbuildable), not a 128-SM one...
They look at both in the paper. But the unbuildable comparison is rather meaningless and uninteresting. Any solution wins vs a non-existent one...
 
The unbuildable one is merely a reference point to give readers an idea of how much efficiency such a package loses compared to a single chip with the same CUDA core count; I don't see the point of picking on this.

And I would be rather disappointed if Nvidia could only manage to put 256 SMs (assuming the same CUDA core count per SM as GV100/GP100) into this solution, since the density of TSMC's 7nm process is 3.3x that of its 16nm/12nm counterpart: assuming everything else remains the same, just scaling up a GV100 would give you more than 200 SMs on the 7nm process with a die size of 600 mm^2. Such a solution could drive manufacturing cost down, but given Nvidia's track record, just don't expect them to charge you any less.
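A quick sanity check on that scaling (napkin math only, using the figures from this post plus GV100's roughly 815 mm^2 die):

```python
# Napkin math using the figures quoted in this post; not a projection.
gv100_sms = 84            # GV100 SM count on TSMC "12 nm"
gv100_die_mm2 = 815       # GV100 die size, roughly
density_factor = 3.3      # quoted 7 nm vs. 16/12 nm density scaling
target_die_mm2 = 600      # hypothetical 7 nm die size

sms_same_area = gv100_sms * density_factor
sms_at_target = sms_same_area * target_die_mm2 / gv100_die_mm2
print(f"~{sms_same_area:.0f} SMs at {gv100_die_mm2} mm^2, ~{sms_at_target:.0f} SMs at {target_die_mm2} mm^2")
```

That lands at roughly 200+ SMs at 600 mm^2, which is where the number above comes from.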

Also, from a programming point of view, a few big GPUs are somewhat easier to program than many small GPU packages (currently a single dual-socket workstation can host up to 10 GPUs, so even if you could manage to put 256 SMs into a single package, you would still need multiple GPU packages/cards to achieve high single-node performance); the latter adds another layer of complexity - NUMA awareness - which could be a headache to deal with.
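As a toy illustration of that NUMA point (both bandwidth figures below are invented purely for illustration): the more of a kernel's traffic that has to cross the inter-die link instead of hitting local HBM, the faster effective bandwidth drops, and that is exactly what the programmer would have to manage.

```python
# Toy model only - both bandwidth figures are invented for illustration.
LOCAL_HBM_GBPS = 900      # assumed local memory bandwidth per die
INTER_DIE_GBPS = 180      # assumed share of inter-die link bandwidth

def effective_bandwidth(remote_fraction):
    """Harmonic mix of local and remote traffic at the given remote fraction."""
    return 1 / ((1 - remote_fraction) / LOCAL_HBM_GBPS + remote_fraction / INTER_DIE_GBPS)

for remote in (0.0, 0.1, 0.3, 0.5):
    print(f"{remote:.0%} remote traffic -> ~{effective_bandwidth(remote):.0f} GB/s effective")
```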
 