Nvidia Volta Speculation Thread

Would that make much of a difference on the gaming side of things? I was more interested in the fact that they have dedicated INT32 units alongside FP32. How much can Nvidia cut down on GV100 for a gaming chip?
It'd be highly situational: mainly ALU-heavy code with limited occupancy.

INT32 at full rate is more practical for addressing. It's a shift away from lower-precision, hidden addressing units now that unified memory is growing the addressing demands: larger pools and more frequent indexing into them.
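
For illustration, here's a minimal CUDA sketch of the kind of mix I mean (hypothetical kernel, purely illustrative, assumes the indices stay in bounds): the integer index math into a large pool can issue on Volta's separate INT32 pipe while the FP32 pipe handles the actual math, whereas on earlier parts both fight over the same ALUs.

Code:
// Hypothetical sketch: INT32 address math interleaved with FP32 work.
__global__ void gather_scale(const float* __restrict__ pool,
                             const int* __restrict__ indices,
                             float* __restrict__ out,
                             int n, float scale)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;  // INT32: thread indexing
    if (tid >= n) return;
    int idx = indices[tid] * 4 + (tid & 3);           // INT32: index into a large pool
    out[tid] = fmaf(pool[idx], scale, 1.0f);          // FP32: the "real" math
}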

I don't think they'd want to put more FP32 cores in the gaming chip than GV100 has unless they're really hurting for the GPU performance crown, so a 400-500mm² chip perhaps.
Probably fewer FP32 units as they shrink the die in line with GP102. I'd have to imagine GV102 is the top gaming chip at nearly half the size of GV100. It would be interesting to see a cost analysis of the double exposure. I could see going smaller than GP102 to ensure four HBM2 stacks fit: lots of bandwidth and capacity for pro variants. For pro and compute work, bandwidth and capacity would be more practical than raw TFLOPS.
 
So you know the bytes*-per-FLOPS ratio of GV102 already?

*both total and per second
That ratio need not be consistent. I'm only stating that some compute-heavy workloads could warrant a higher ratio. A Titan-tier chip could go 4 stacks just because it can. It could always look like a Vega 10, but I'd guess it ends up more like P100 without FP64. Avoid the double exposure and make it fit within traditional dimensions. Perhaps even relaxed a bit for better yields, but keeping the Tensor cores, where bandwidth is more of a concern. V100 is so large it's essentially a new tier. Those HBM2 stacks wouldn't need to be clocked super high either. 8GB stacks at 75% or even lower clocks would be very flexible and in line with existing models. Pair Tensor-disabled, cut chips with slower HBM2.
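
To put a very rough number on that, a back-of-envelope sketch with figures I'm assuming purely for illustration (four stacks at roughly V100-class speed run at 75% clock, against roughly V100-class FP32 throughput), not leaked specs:

Code:
// Back-of-envelope bytes-per-FLOP, plain host-side arithmetic.
#include <cstdio>

int main()
{
    // Assumed: 4 HBM2 stacks, ~225 GB/s each at full clock, run at ~75%.
    double bw_GBps = 4 * 225.0 * 0.75;                             // ~675 GB/s
    // Assumed: roughly V100-class FP32 throughput, no FP64/Tensor.
    double fp32_GFLOPS = 15000.0;
    printf("~%.3f bytes per FP32 FLOP\n", bw_GBps / fp32_GFLOPS);  // ~0.045
    return 0;
}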
 
That ratio need not be consistent. I'm only stating that some compute-heavy workloads could warrant a higher ratio.
Actually, you don't (or didn't), and it would not make any sense either, but anyway.

A Titan-tier chip could go 4 stacks just because it can. It could always look like a Vega 10, but I'd guess it ends up more like P100 without FP64. Avoid the double exposure and make it fit within traditional dimensions. Perhaps even relaxed a bit for better yields, but keeping the Tensor cores, where bandwidth is more of a concern. V100 is so large it's essentially a new tier. Those HBM2 stacks wouldn't need to be clocked super high either. 8GB stacks at 75% or even lower clocks would be very flexible and in line with existing models. Pair Tensor-disabled, cut chips with slower HBM2.
Since it's not AMD and I'll be safe from anklebiting, I would say that, for all we know about the current situation, Nvidia would be quite stupid to use HBM2 right now on something we envision as GV102, both from a margin and a competitive point of view, especially in light of G(ddr)6 being announced for early next year.
 
Since it's not AMD and I'll be safe from anklebiting, I would say that, for all we know about the current situation, Nvidia would be quite stupid to use HBM2 right now on something we envision as GV102, both from a margin and a competitive point of view, especially in light of G(ddr)6 being announced for early next year.

Wasn't it generally agreed that the 384-bit GDDR6 graphics card mentioned by SK Hynix was probably GV102?

https://www.anandtech.com/show/1129...gddr6-memory-for-graphics-cards-in-early-2018

What is noteworthy is that SK Hynix does disclose some details about the first graphics cards to use its GDDR6 memory. As it appears, that adapter will have a 384-bit memory bus and will thus support memory bandwidth upwards of 768 GB/s. Given the number of chips required for a 384-bit memory sub-system, it is logical to assume that the card will carry 12 GB of memory. SK Hynix is not disclosing the name of its partner among GPU developers, but it is logical to assume that we are talking a high-end product that will replace an existing graphics card.
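
The quoted figures hang together, for what it's worth; a quick sanity check using only the arithmetic the excerpt implies (assuming 12 chips with 32-bit interfaces and 1 GB, i.e. 8 Gb, densities):

Code:
// Sanity check on the SK Hynix figures quoted above.
#include <cstdio>

int main()
{
    int    bus_bits     = 384;    // 12 chips x 32-bit interface each
    double gbps_per_pin = 16.0;   // pin rate implied by the 768 GB/s figure
    double bw_GBps      = bus_bits * gbps_per_pin / 8.0;  // 768 GB/s
    int    capacity_GB  = (bus_bits / 32) * 1;            // 12 chips x 1 GB (8 Gb) each
    printf("%.0f GB/s, %d GB\n", bw_GBps, capacity_GB);   // 768 GB/s, 12 GB
    return 0;
}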

I mean, all but one Titan has released in Q1 (the outlier being August 2016's Titan X).

  • Titan - 2/2013
  • Titan Black - 2/2014
  • Titan Z - 3/2014
  • Titan X - 3/2015
  • Titan X - 8/2016
  • Titan Xp - 4/2017

So it seems like Nvidia will probably debut Volta by replacing the Titan Xp in Q1 2018 with a slightly cut down GV102 and its 384-bit bus.

EDIT: It just dawned on me that April isn't Q1, so the Titan Xp released in early Q2. You get the idea.
 
Since it's not AMD and I'll be safe from anklebiting, I would say that, for all we know about the current situation, Nvidia would be quite stupid to use HBM2 right now on something we envision as GV102, both from a margin and a competitive point of view, especially in light of G(ddr)6 being announced for early next year.
Perhaps GV102 isn't the correct codename here, but whatever sits a step down from V100. As I said above, there is enough of a size gap to fit a whole new SKU (~471mm² to 815mm²). So a GDDR6 and an HBM2 variant may exist between the expected V100 and GV104 parts. P100 was the last chip Nvidia designed around that size, and they chose HBM2. GDDR6 at that scale would lack capacity, power efficiency, and bandwidth.

In the case of Tensor operations, Google indicated, as expected, that bandwidth was the limiting factor. Even Volta is losing badly to the TPU2 in terms of efficiency and likely price. A cheaper server card that retains much of the bandwidth and capacity, without the oversized die, would make sense. The alternative leaves a really large gap in the lineup. GDDR6 is just too limited at serious performance levels. Nvidia would be left with a 12GB card facing 16/32GB Vega variants with better power efficiency, memory capacity/bandwidth, and unified memory implementation on x86. It still seems likely a licensing issue exists there, and I doubt Intel is feeling generous.
 
GP100 was top of the line, as is GV100; no difference there. Of course, there could be another SKU at 600 mm², but there's nothing in the earlier Pascal strategy to indicate that.
Google was limited while using 256-bit (G)DDR3 (edit: no G, plain DDR3) with a transfer rate of 30 GB/s!!, which is hardly comparable.
 
Even Volta is losing badly to the TPU2 in terms of efficiency and likely price.
Not in my reality. V100 is 120 TFLOPS at 300W vs 45 TFLOPS at 250W for TPU2. In fact, Google is investing more and more in V100 HGX racks.
But anyway, it doesn't matter anymore; the TPU team left Google to start Groq. So for now, the TPU project is dead and has no future at Google...

GDDR6 is just too limited at serious performance levels. Nvidia would be left with a 12GB card facing 16/32GB Vega variants with better power efficiency, memory capacity/bandwidth, and unified memory implementation on x86.
Vega and power efficiency in the same sentence :runaway: (please, no undervolting nonsense, Nvidia cards undervolt too)
I don't know which parallel world you are living in, but here on Earth, Vega with HBM2 barely competes with GDDR5 Pascal, and you're already making it a winner against an unannounced GDDR6 Volta part. Seriously?
[snipped insult]
 
On a more factual note, Nvidia announced at GTC 2017 Beijing that all three of the big Chinese IT giants (Alibaba, Baidu, Tencent) are adopting Volta HGX in their datacenters:
Alibaba Group Holding Ltd., Baidu Inc. and Tencent Holdings Ltd. are upgrading their data centers with Nvidia’s Volta-based platforms, which revolve around the V100 data center GPU
Hardware suppliers are Huawei, Inspur and Lenovo:
In addition to the Chinese cloud firms, Nvidia said that some of China’s largest server builders are adopting Nvidia hardware for their products. Huawei Investment & Holding Co. Ltd., Inspur International Ltd., Lenovo Group Ltd., are now using Nvidia’s HGX reference architecture to offer Volta-based systems for hyperscale data centers

http://www.marketwatch.com/story/nv...sed-chips-with-chinese-tech-giants-2017-09-25
 
It's GDDR6; Hynix already announced they are incorporating it in a high-end GPU, and they meant NVIDIA.

Hopefully they are better at it than they were with the 1GHz HBM2.

Wasn't it generally agreed that the 384-bit GDDR6 graphics card mentioned by SK Hynix was probably GV102?

https://www.anandtech.com/show/1129...gddr6-memory-for-graphics-cards-in-early-2018



I mean, all but one Titan has released in Q1 (the outlier being August 2016's Titan X).

  • Titan - 2/2013
  • Titan Black - 2/2014
  • Titan Z - 3/2014
  • Titan X - 3/2015
  • Titan X - 8/2016
  • Titan Xp - 4/2017

So it seems like Nvidia will probably debut Volta by replacing the Titan Xp in Q1 2018 with a slightly cut down GV102 and its 384-bit bus.

EDIT: It just dawned on me that April isn't Q1, so the Titan Xp released in early Q2. You get the idea.

No need for a Titan V if a GTX 2080 does the job. GV104 being the Titan card would be something, though.

Not in my reality. V100 is 120 TFLOPS at 300W vs 45 TFLOPS at 250W for TPU2. In fact, Google is investing more and more in V100 HGX racks.
But anyway, it doesn't matter anymore; the TPU team left Google to start Groq. So for now, the TPU project is dead and has no future at Google...


Vega and power efficiency in the same sentence :runaway: (please, no undervolting nonsense, Nvidia cards undervolt too)
I don't know which parallel world you are living in, but here on Earth, Vega with HBM2 barely competes with GDDR5 Pascal, and you're already making it a winner against an unannounced GDDR6 Volta part. Seriously?
IMHO, better to wait a bit for some facts before spreading FUD in the typical fanboy way...

Nvidia cards can undervolt, but to what degree? My custom GTX 1070 can use around 30% more power than the 1070 FE while only being 10% faster; Vega 56 uses another 10% over the former while being around similar performance. If I push the power limit slider on V56 to -10%, it loses 30MHz on the GPU, which is around 2.3%. It's quite obvious that undervolting the GTX 1070 FE wouldn't lead to similar gains in efficiency as the custom 1070, and I think that holds for Vega 56 as well.

There is the die-size factor to account for, but the story would be much different regarding efficiency if AMD were first to market with Vega and didn't have to push the chip, and Nvidia didn't have GP102 to easily give them the performance crown.
 
but the story would be much different regarding efficiency if AMD were first to market with Vega and didn't have to push the chip, and Nvidia didn't have GP102 to easily give them the performance crown.
The story would be different if AMD didn't launch Vega (the uarch that heavily relies on software) with half-baked software surrounding it.
On a scale of 1 to Fermi, this is Fermi in terms of bad launches.
 
The highest power consumption measured by Tom's Hardware for a 1070 was 200W. Vega 56 on BIOS 1, Balanced mode, measured in at 220W.
 
Nvidia cards can undervolt, but to what degree? My custom GTX 1070 can use around 30% more power than the 1070 FE while only being 10% faster; Vega 56 uses another 10% over the former while being around similar performance. If I push the power limit slider on V56 to -10%, it loses 30MHz on the GPU, which is around 2.3%. It's quite obvious that undervolting the GTX 1070 FE wouldn't lead to similar gains in efficiency as the custom 1070, and I think that holds for Vega 56 as well.
Google Nvidia Max-Q laptop reviews and you will see that Pascal can be much more power-efficient than what we see on desktop cards.
 
Vega and power efficiency in the same sentence :runaway: (please, no undervolting nonsense, Nvidia cards undervolt too)
Why would it need to be undervolted? That would surely help, but nobody is undervolting in the server market. If anything, they simply run cards slower to be more power efficient. The savings with memory types are well established. [snipped insult]

Not in my reality. V100 is 120 TFLOPS at 300W vs 45 TFLOPS at 250W for TPU2. In fact, Google is investing more and more in V100 HGX racks.
But anyway, it doesn't matter anymore; the TPU team left Google to start Groq. So for now, the TPU project is dead and has no future at Google...
So your reality broke down and you substituted figures to make it accurate? Take four chips, disable three of them, then presume the one remaining chip still consumes the same amount of power? I suppose that's one way to make 180 TFLOPS at 250W less than 120 TFLOPS at 300W.
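
Laying the two readings side by side, using only the figures quoted in this exchange (the per-chip TPU2 power isn't public, so take the middle line with salt):

Code:
// TFLOPS per watt using only the figures quoted in this thread.
#include <cstdio>

int main()
{
    printf("V100 (120 TFLOPS / 300 W):        %.2f TFLOPS/W\n", 120.0 / 300.0);  // 0.40
    printf("One TPU2 chip at board power:     %.2f TFLOPS/W\n",  45.0 / 250.0);  // 0.18
    printf("Full 4-chip TPU2 board (claimed): %.2f TFLOPS/W\n", 180.0 / 250.0);  // 0.72
    return 0;
}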

Google is investing in racks because there will always be some customers using different software. They're investing in AMD racks as well.

I don't know which parallel world you are living in, but here on Earth, Vega with HBM2 barely competes with GDDR5 Pascal, and you're already making it a winner against an unannounced GDDR6 Volta part. Seriously?
[snip insult]...
[snip sniping]. Compute and graphics are entirely separate areas. Vega is already beating P100 in some tests, as expected, yet you feel Nvidia's GDDR5 offerings are superior even to their largest chip? [snipped some more]
 