Nvidia Pascal Announcement

In both cases the ALU count suggests the GPU has disabled SMs. This will be the first Titan SKU using salvage parts.

Not really. The first Titan was also a cut-down part (remember the 780/Titan Black/780 Ti?). Potentially the same deal here: four different parts from the same chip, depending on what AMD is doing of course; Nvidia are in no rush. It has to be said though, the pricing is a joke, as it has been with almost every Pascal GPU so far.
 
In both cases the ALU count suggests the GPU has disabled SMs. This will be the first Titan SKU using salvage parts.

The original GK110 Titan was a salvage part. The Titan Black came later using the same chip fully enabled, and so did the 780 Ti.

EDIT: beat...
 
If Nvidia release a cut-down non-Titan version this year for ~$800, AMD are in for even more pain. For their sake, hopefully it plays out more like the 780 did.
The thing is, this is already a cut-down version, what the Ti usually would have been. I can only imagine them releasing this as a Ti when the fully enabled version (3840 cores, 1600+ MHz, 12 TF, 24 GB GDDR5X, which is cheaper than HBM(?) and better for marketing, still no bottleneck) comes out (Q1 '17).
 
The thing is, this is already a cut-down version, what the Ti usually would have been. I can only imagine them releasing this as a Ti when the fully enabled version (3840 cores, 1600+ MHz, 12 TF, 24 GB GDDR5X, which is cheaper than HBM(?) and better for marketing, still no bottleneck) comes out (Q1 '17).

Why not release two Titans and two GPUs under new naming, either 1085/1090 (if they want to keep the 10-series naming) or 1180/1180 Ti, like they did with Kepler? It's not like there's any competition on the horizon yet, and we all know the number one priority for Nvidia is to make as much money as they possibly can from enthusiasts (hello, Founders Edition™).
 
The thing is, this is already a cut-down version, what the Ti usually would have been. I can only imagine them releasing this as a Ti when the fully enabled version (3840 cores, 1600+ MHz, 12 TF, 24 GB GDDR5X, which is cheaper than HBM(?) and better for marketing, still no bottleneck) comes out (Q1 '17).
1080Ti won't wait for next year ;)
 
A second blog post from NVidia, more about machine learning with Titan X. A fun new detail: multiple Titan Xs were handed out to random audience members in a giveaway, so actual GPUs are out in the wild already.

The first blog post talks about the Pascal Titan X's machine-learning throughput (a mysterious 44 TOPS), which is not a reference to the P100's touted fp16x2 capability. Instead, a single line mentions something about a 4x-rate 8-bit "inference instruction".

This is something new, unless it's talking about GP104's DP4A and DP2A 8- and 16-bit instructions, which went unadvertised at the GTX 1080 launch but are accessible via CUDA and documented in the CUDA 8.0 RC PTX reference. DP4A is a "four-way byte dot product-accumulate" and DP2A a "two-way dot product-accumulate", which certainly could be the "inference instructions". Notably, the Tesla P100, with sm_60, lacks these instructions; GP104 is sm_61 and has them. (In my earlier post I mistakenly talked about sm_5x; Pascal is sm_6x.)
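For the curious, this is roughly what the DP4A path looks like from CUDA 8.0 RC. A minimal sketch, assuming an sm_61 part (GP104, and presumably this Titan); the kernel name, shape and fallback are my own illustration, not anything from Nvidia's posts:

Code:
// Sketch: int8 dot products via __dp4a (CUDA 8.0, sm_61+).
// Each int packs four signed 8-bit values; __dp4a computes
// c + a0*b0 + a1*b1 + a2*b2 + a3*b3 in a single instruction.
__global__ void int8_dot(const int* a, const int* b, int* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
#if __CUDA_ARCH__ >= 610
    out[i] = __dp4a(a[i], b[i], 0);   // the 4x-rate "inference instruction"
#else
    // Plain integer fallback on older chips: unpack and multiply-accumulate.
    const signed char* pa = (const signed char*)&a[i];
    const signed char* pb = (const signed char*)&b[i];
    int acc = 0;
    for (int k = 0; k < 4; ++k)
        acc += pa[k] * pb[k];
    out[i] = acc;
#endif
}
// Build with: nvcc -arch=sm_61 ...

That would also explain the 44 TOPS number: DP4A does four int8 multiply-accumulates per ALU per clock where an FMA does one, so 44 TOPS is simply 4x the ~11 TF FP32 figure.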
 
This thing can't compete with a full (96 CU) 16 GB HBM2 Vega, but since that's not coming out any time soon, Nvidia is taking advantage of the situation; at $1200+ the profit margin must be huge.

It's a bit rich to claim these days that a high-end nVidia chip won't be able to compete with anything AMD has to throw at it, with basically zero actual information to back that claim up. AMD has a ton of technical challenges to overcome just to get their performance up to this level, let alone to make it "not able to compete".
 
Why not release two Titans and two GPUs under new naming, either 1085/1090 (if they want to keep the 10-series naming) or 1180/1180 Ti, like they did with Kepler? It's not like there's any competition on the horizon yet, and we all know the number one priority for Nvidia is to make as much money as they possibly can from enthusiasts (hello, Founders Edition™).
Well, there it is, they are making money either way. The 1080 goes all the way to $1000 now; they will sell 1080s and Titans, then sell Tis and more Titans, without doing any extra work.

1080Ti won't wait for next year ;)
They could do it this year, it's just that there is no hurry :))

It's a bit rich to claim these days that a high-end nVidia chip won't be able to compete with anything AMD has to throw at it, with basically zero actual information to back that claim up. AMD has a ton of technical challenges to overcome just to get their performance up to this level, let alone to make it "not able to compete".
So I shouldn't talk for another half a year or something? I was simply answering the questions of "what" and "why".
AMD, on the other hand, won't be able to compete for the actual top-end crown, yet again.
 
I was simply answering the questions of "what" and "why".

You were claiming that this new Pascal Titan X would not be able to compete with a Vega-based 96 CU chip, with absolutely nothing to back that up, and in my opinion it's quite a tall order for AMD to live up to that claim of yours.

Let's see them make a 96 CU chip with core clocks good enough to get close to this, let alone far surpass it like you suggested... AMD is currently struggling to match the 1080 with two RX 480s in games that show good scaling, at a power consumption clearly over 300 W, so they have their hands full coming up with a Titan X beater.

I understand you said it's not coming soon, but you still seem to have quite a bit of faith in what this supposed chip will be able to do.
 
I don't know what to make of this; it looks to be a preemptive strike... or they might have a good idea of what Vega is bringing to the table (if they are basing predictions on P10).

Quite out of the blue, which is unusual by itself. I just don't see a stimulus for announcing something like this, well, outside of Intel. So I'm going to think for now it's more a focus on neural nets and what Knights Landing can do, as opposed to what it does for gaming.
Yeah, I am surprised it is so early, and launched without even 12 Gbps GDDR5X memory.

Just to say in general.
I think they are launching early because of news that Cray is selling more of the latest Intel-based FP32/FP16 accelerators than Pascal-based P100 solutions for now; Cray is a strong partner with both companies.
And I still say they are doing what they did with the GK110 die, which launched across all three segments, with this being the 'consumer-pro' model: they need another Tesla model below the P100 and a strong new Quadro, and that leads to the new Titan sharing this die.

[edited] As mentioned by some, dp4a is not supported by the P100 and so far is GP104-only.

However, it would be strange to launch this as a Deep Learning model, and at a university event, if it did not also have the mixed-precision FP32/FP16 CUDA cores.
While it has the same number of cores as the P100, it is interesting to see that it has a reduced transistor count, with the P100 having around 25% more; only so much room I guess, even with drastically less DP.
And as some of the other sites mention, it will also be interesting to see if it has the GPC/double-SM structure of the P100 or the more traditional 128 CUDA cores per SM; could go either way on that decision IMO.

Anyway, it does feel more like a forced, quicker response to Intel rather than AMD, with the next die down from the P100 now.
Cheers
 
You were claiming that this new Pascal Titan X would not be able to compete with a Vega-based 96 CU chip, with absolutely nothing to back that up, and in my opinion it's quite a tall order for AMD to live up to that claim of yours.

Let's see them make a 96 CU chip with core clocks good enough to get close to this, let alone far surpass it like you suggested... AMD is currently struggling to match the 1080 with two RX 480s in games that show good scaling, at a power consumption clearly over 300 W, so they have their hands full coming up with a Titan X beater.

I understand you said it's not coming soon, but you still seem to have quite a bit of faith in what this supposed chip will be able to do.
Yes, but because it's not meant to, not because it won't be close. I was saying that from the perspective of Nvidia being able to easily trump AMD, and thus doing so. If they don't release a fully enabled Titan at all, AMD might as well price their offering similarly (because it would look more appealing; heck, the RX 480 looks better to a lot of people than the 1060 right now), which is not happening.

A full Vega might not escape all the problems Polaris is having, but coming out almost a year later it should achieve 16 TF, which should put it right between the 3584- and 3840-core Titans in performance.
 
Considering how long it takes to design a GPU, it is very unlikely that anything AMD did, does or will do influenced the launch date of GP102. I dare say NV does not seem to have hit any delays with their Pascal line-up; they are on schedule, and that is what we are seeing.
 
Due to GP104's very high clock speeds, the practical difference to this new Titan seems really small.
A custom overclocked (e.g. AiO water-cooled?) GTX 1080 that manages 1.9-2 GHz should consume just about the same and give roughly 10% lower theoretical output. Given the higher clocks, and consequently higher single-threaded performance, there's a chance it would perform practically the same or even better.
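Rough numbers behind that ~10%, taking the new Titan's announced 1531 MHz boost at face value: a 2 GHz GTX 1080 gives 2560 x 2 x 2.0 GHz ≈ 10.2 TF, versus the Titan's 3584 x 2 x ~1.53 GHz ≈ 11.0 TF.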

Then again, the Titans were never about value in gaming, but rather low-cost compute perks or e-peens for rich gamers.
 
A second blog post from NVidia, more about machine learning with Titan X. A fun new detail: multiple Titan Xs were handed out to random audience members in a giveaway, so actual GPUs are out in the wild already.

The first blog post talks about the Pascal Titan X's machine-learning throughput (a mysterious 44 TOPS), which is not a reference to the P100's touted fp16x2 capability. Instead, a single line mentions something about a 4x-rate 8-bit "inference instruction".

This is something new, unless it's talking about GP104's DP4A and DP2A 8- and 16-bit instructions, which went unadvertised at the GTX 1080 launch but are accessible via CUDA and documented in the CUDA 8.0 RC PTX reference. DP4A is a "four-way byte dot product-accumulate" and DP2A a "two-way dot product-accumulate", which certainly could be the "inference instructions". Notably, the Tesla P100, with sm_60, lacks these instructions; GP104 is sm_61 and has them. (In my earlier post I mistakenly talked about sm_5x; Pascal is sm_6x.)
Yeah, good point about dp4a; forgot about the testing Scott did.
Would they release what they call a Deep Learning model at a university, in a scientific setting, if it did not support FP16?
That would restrict what it could be used for if it only had FP32 and int8.
Nvidia made a big thing about the first Deep Learning 'supercomputer', and that was the P100 with its mixed-precision FP32/FP16.
So that would mean they are again redefining the Deep Learning narrative if they now ignore applications requiring FP32/FP16 in this context.
To quote them from their big push on Deep Learning earlier in the year with the P100:
16nm FinFET fabrication technology for unprecedented energy efficiency; Chip on Wafer on Substrate with HBM2 for big data workloads; and new half-precision instructions to deliver more than 21 teraflops of peak performance for deep learning.
Also: https://devblogs.nvidia.com/parallelforall/inference-next-step-gpu-accelerated-deep-learning/
On supported chips, such as Tegra X1 or the upcoming Pascal architecture, FP16 arithmetic delivers up to 2x the performance of equivalent FP32 arithmetic. Just like FP16 storage, using FP16 arithmetic incurs no accuracy loss compared to running neural network inference in FP32.
This suggests they are also focused on half precision in this context. It's possible they have ignored all of this for the Titan, but that would limit its Deep Learning capabilities a lot.
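To illustrate the capability being discussed, here's a minimal sketch of what fp16x2 arithmetic looks like from CUDA via cuda_fp16.h. Assumptions flagged: it only hits the advertised 2x rate on chips with fast FP16 units (sm_53+, e.g. Tegra X1 or P100), and the kernel is my own toy example, not Nvidia's code:

Code:
#include <cuda_fp16.h>

// Sketch: a*x + y over packed half2 values (two fp16 lanes per register).
__global__ void saxpy_fp16(int n, __half2 alpha, const __half2* x, __half2* y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
#if __CUDA_ARCH__ >= 530
    y[i] = __hfma2(alpha, x[i], y[i]);   // two fp16 FMAs in one instruction
#else
    // Fallback: promote to fp32, compute, round back to fp16.
    float2 a  = __half22float2(alpha);
    float2 xv = __half22float2(x[i]);
    float2 yv = __half22float2(y[i]);
    y[i] = __floats2half2_rn(a.x * xv.x + yv.x, a.y * xv.y + yv.y);
#endif
}

On chips without the fast path the same code still compiles and runs, it just doesn't get the throughput benefit, which is exactly the Titan question here.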

Cheers
 
Considering how long it takes to design a GPU, it is very unlikely that anything AMD did, does or will do influenced the launch date of GP102. I dare say NV does not seem to have hit any delays with their Pascal line-up; they are on schedule, and that is what we are seeing.
Mmm... this doesn't have anything to do with the GPU design itself, though, and they were probably aware of their advantage for a long time.
 
A full Vega might not escape all the problems Polaris is having, but coming out almost a year later it should achieve 16 TF, which should put it right between the 3584- and 3840-core Titans in performance.

To hit 16 TF, a 96 CU Vega would need to be running at 1300 MHz (quick math below the list). In pure theory that would make it around 86% faster than the Fury X. If the Pascal Titan is 60% faster than the Titan X, that puts it around 70-75% faster than the Fury X. So while what you say is possible, it relies on three huge assumptions:

1. That a 96 CU part will be able to reach 1300 MHz, something the 36 CU 480 couldn't achieve at stock clocks.
2. That all parts of the Fury X are scaled up by 50%, and not just the CUs.
3. That performance scales exactly linearly with unit count and clock speed, which, looking at the 390X and Fury X in particular, has not been the case in the past.
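The quick math, assuming Vega keeps GCN's 64 ALUs per CU and 2 FLOPs per ALU per clock (an assumption, not a confirmed spec): 96 x 64 x 2 x 1.3 GHz ≈ 16.0 TF, against the Fury X's 4096 x 2 x 1.05 GHz ≈ 8.6 TF, which is where the ~86% comes from.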

IMO, AMD have their work cut out for them to match this part, and it's likely NV have a little left in the wings for a fully unlocked version too.
 
Due to GP104's very high clock speeds, the practical difference to this new Titan seems really small.
A custom overclocked (e.g. AiO water-cooled?) GTX 1080 that manages 1.9-2 GHz should consume just about the same and give roughly 10% lower theoretical output. Given the higher clocks, and consequently higher single-threaded performance, there's a chance it would perform practically the same or even better.

This data begs to differ:

[Chart: Performance vs. Power Consumption]


At 1500 MHz, Pascal seems to be much faster per MHz than at 1800+ MHz.

On top of that, the Titan also has a similar GFLOPS-to-bandwidth ratio as the 1060, which is arguably what helps the 1060 be 60% as fast as the 1080 despite having only 47%/49% (base/boost) of its theoretical output.
[Chart: Power Consumption vs. Clock Rate]
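For what it's worth, those ratios check out from the published specs: the 1060 does 1280 x 2 x 1.506/1.708 GHz ≈ 3.9/4.4 TF against the 1080's 2560 x 2 x 1.607/1.733 GHz ≈ 8.2/8.9 TF, i.e. 47%/49%; and on bandwidth, 4.4 TF over 192 GB/s (1060) and ~11 TF over 480 GB/s (Titan X) both land around 23 FLOPs per byte, versus ~28 for the 1080.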
 
Mmm... this doesn't have anything to do with the GPU design itself, though, and they were probably aware of their advantage for a long time.

Advantage compared to whom? When NV decided to do a GP102 and use GDDR5X, the only partly reliable data was the predicted availability of GDDR5X and HBM2, the predicted yields for certain die sizes at TSMC, and a target date for market entry. Those are very early design decisions in my book, and hardly influenced by AMD in any way.
 