Sorry, my way of writing (and getting lazy in this heat) caused confusion.
When Nvidia talks about mixed precision for Pascal GP100, it means a single CUDA core that handles both FP32 and FP16 (as FP16x2, i.e. two FP16 operations packed into one FP32-width lane), and that packing is what gives the doubling of TFLOPS; the only other CUDA core type is the dedicated FP64 core.
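To make the FP16x2 point concrete, here is a minimal CUDA sketch (my own illustration, not Nvidia's code) using the __half2 type and the __hfma2 intrinsic from cuda_fp16.h. Each __hfma2 issues one instruction that performs two FP16 fused multiply-adds at once, which is where the doubled rate comes from (needs a compute capability 5.3+ part, e.g. Tegra X1 or GP100):

#include <cuda_fp16.h>

// A mixed-precision core executes this as one instruction per thread,
// operating on both FP16 halves of each __half2 simultaneously.
__global__ void fma_fp16x2(const __half2 *a, const __half2 *b,
                           const __half2 *c, __half2 *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = __hfma2(a[i], b[i], c[i]);  // (a*b + c) on two FP16 lanes
}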
In theory this could also be done for the FP64 CUDA cores if they overcome other limitations, which I think come back to register bandwidth, and maybe this is a capability of Volta.
The evolution of the mixed-precision FP32 CUDA core can be traced back to Tegra X1, where it was aimed at functions such as image recognition/deep learning; scroll down to "Double Speed FP16":
http://www.anandtech.com/show/8811/nvidia-tegra-x1-preview/2
So it is debatable whether this is the same mixed-precision FP32 CUDA core as in GP100 but fixed to FP16x2 operation, or another unique kind of CUDA core.
As it seems to be there for compatibility reasons, I would assume it is the same as in GP100, but this is Nvidia.
As Ryan mentions, there is one of these cores per SM, which gives FP16 a FLOP rate of 1/64 the FP32 rate (one FP16x2 core doing 2 FP16 ops per clock against 128 FP32 cores doing 128, so 2/128 = 1/64) and matches up with tests done by others. That is absolutely useless apart from compatibility testing, and tbh I am not sure how many CUDA developers will consider this card even for that.
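For anyone curious how those tests work, here is a rough micro-benchmark sketch of my own (not from the article or anyone's published test code) that times a chain of FP32 FMAs against the same chain in packed FP16x2. In principle the printed ratio should land near 1/64 on a GP104 card and near 2 on GP100 or Tegra X1; compile with nvcc -arch=sm_61 (or any sm_53+ target):

#include <cuda_fp16.h>
#include <cstdio>

#define ITERS 10000  // dependent FMAs per thread

__global__ void fp32_loop(float *out)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    float a = out[i];
    for (int k = 0; k < ITERS; ++k)
        a = __fmaf_rn(a, 1.0001f, 0.0001f);  // one FP32 FMA per iteration
    out[i] = a;
}

__global__ void fp16x2_loop(__half2 *out)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    __half2 a = out[i];
    const __half2 b = __float2half2_rn(1.0001f);
    const __half2 c = __float2half2_rn(0.0001f);
    for (int k = 0; k < ITERS; ++k)
        a = __hfma2(a, b, c);                // two FP16 FMAs per iteration
    out[i] = a;
}

int main()
{
    const int blocks = 64, threads = 256, n = blocks * threads;
    float *d32; __half2 *d16;
    cudaMalloc(&d32, n * sizeof(float));
    cudaMalloc(&d16, n * sizeof(__half2));
    cudaMemset(d32, 0, n * sizeof(float));
    cudaMemset(d16, 0, n * sizeof(__half2));

    cudaEvent_t t0, t1;
    cudaEventCreate(&t0);
    cudaEventCreate(&t1);
    float ms32, ms16;

    cudaEventRecord(t0);
    fp32_loop<<<blocks, threads>>>(d32);
    cudaEventRecord(t1);
    cudaEventSynchronize(t1);
    cudaEventElapsedTime(&ms32, t0, t1);

    cudaEventRecord(t0);
    fp16x2_loop<<<blocks, threads>>>(d16);
    cudaEventRecord(t1);
    cudaEventSynchronize(t1);
    cudaEventElapsedTime(&ms16, t0, t1);

    // fp16x2_loop does twice the FLOPs of fp32_loop for the same iteration
    // count, so the FP16:FP32 throughput ratio is 2 * ms32 / ms16.
    printf("FP16 rate relative to FP32: %.4fx\n", 2.0f * ms32 / ms16);

    cudaFree(d32);
    cudaFree(d16);
    return 0;
}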
Cheers