Nvidia Pascal Announcement

Yes, but because it's not meant to compete, not because it won't be close. I was saying that from the perspective of Nvidia being able to easily trump AMD, and thus doing so. If they never release a fully enabled Titan at all, AMD might as well price their offering similarly (because it would look more appealing; heck, the RX 480 already looks better to a lot of people than the 1060 right now), which is not happening.

Full Vega might not escape all the problems Polaris is having, but coming out almost a year later it should achieve 16 TFLOPS, which should put it right between the 3584- and 3840-core Titans in performance.
They will face the same challenge as Nvidia: multiple segments that need different FP64/FP32/FP16 ratios, which also have to be balanced against power draw.
These days it is unlikely (well, apart from Nvidia's P100, it seems) that a manufacturer will dedicate a GPU die to just one segment out of the three when it comes to their top consumer GPU.
Remember that the previous gen was a bit of an anomaly in having minimal DP, and they now also need good FP16 and Int8 for certain research.
Cheers
 
Due to GP104's very high clock speeds, the practical difference to this new Titan seems really small.
A custom overclocked (e.g. AiO watercooled?) GTX 1080 that manages 1.9-2GHz should consume just about the same and get ~10% lower theoretical output. Given the higher clocks and correspondingly higher single-threaded performance, there's a chance it would perform practically the same or even better.
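A rough sanity check on that, paper FLOPS only (the Titan figures of 3584 cores at ~1.53 GHz boost are from the announcement; the 1080 clocks are the hypothetical OC targets above):

```python
# Paper FP32 throughput: cores * 2 ops (FMA) * clock in GHz -> TFLOPS.
def tflops(cores, ghz):
    return cores * 2 * ghz / 1000

titan = tflops(3584, 1.53)  # ~11.0 TFLOPS
for ghz in (1.9, 2.0):
    oc_1080 = tflops(2560, ghz)
    print(f"1080 @ {ghz} GHz: {oc_1080:.1f} TF, {(1 - oc_1080 / titan) * 100:.0f}% below")
# 1080 @ 1.9 GHz: 9.7 TF, 11% below
# 1080 @ 2.0 GHz: 10.2 TF, 7% below
```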

Then again, the Titans were never about value in gaming, but rather low-cost compute perks or e-peens for rich gamers.

Yeah, on paper it's a little disappointing as a gaming card vs the 1080. Looking forward to reviews.
 
This data begs to differ:
(...)
At 1500 MHz, Pascal seems to be much faster per MHz than at 1800+ MHz.

I don't see how those graphs support your claims. I also don't get how Tomshardware can build an "FPS-per-watt" curve (I would totally get it if it were a table).
Are they changing clocks on the fly? Which clocks are being achieved for each FPS value? Are they touching the core voltage, or is it all at stock voltage? Is the memory being overclocked? It seems like a lot of stuff must be oversimplified/assumed to make a curve like that.

The "performance per MHz" that you claim could be because the chip is hitting other bottlenecks, but the single-threaded advantage - as small as it could be - is still there.
 
Due to GP104's very high clock speeds, the practical difference to this new Titan seems really small.
A custom overclocked (e.g. AiO watercooled?) GTX 1080 that manages 1.9-2GHz should consume just about the same and get ~10% lower theoretical output. Given the higher clocks and correspondingly higher single-threaded performance, there's a chance it would perform practically the same or even better.

Then again, the Titans were never about value in gaming, but rather low-cost compute perks or e-peens for rich gamers.

Judging by how well GM204 -> GM200 clocked, I'd wager that the new Titan X will be able to hit 1.9-2 GHz easily, if not more (just like GP104). The difference should be substantial, and this is most probably the first GPU able to hit 4K60 in most titles with maxed settings.
 
I am really, really disappointed with the specs of this GP102.

Not only is it overpriced, but the best they can pull off is 11 TFLOPS SP at 16nm? That's only 44 GFLOPS/W SP.
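(For reference, that efficiency figure presumably divides the 11 TFLOPS by a 250 W board power; the TDP is my assumption here, not a quoted spec:)

```python
# 11 TFLOPS SP over an assumed 250 W board power.
print(f"{11.0e12 / 250 / 1e9:.0f} GFLOPS/W")  # 44
```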

What's the point of wasting so much silicon on uint8, which nobody besides a very limited set of deep-learning zealots would even touch? And what if current DL/AI algorithms evolve into something more complicated/smart than brainless, stupid GEMM and grey computation? wtf?

A month ago I had a meeting with professors from NUDT (China's National University of Defense Technology), who are about to release a GPU-like accelerator for China's next-gen exascale supercomputer. That accelerator delivers 30-60 GFLOPS/W DP and 60-120 GFLOPS/W SP at 14nm, which is roughly 2-3X faster at FP32 and probably 100X faster at FP64 than this overpriced piece of silicon, and it also supports an OpenCL/CUDA-like vectorized computing language.

It seems the lack of proper competition is turning Nvidia into another Intel.
 
Due to GP104's very high clock speeds, the practical difference to this new Titan seems really small.
A custom overclocked (e.g. AiO watercooled?) GTX 1080 that manages 1.9-2GHz should consume just about the same and get ~10% lower theoretical output. Given the higher clocks and correspondingly higher single-threaded performance, there's a chance it would perform practically the same or even better.

Then again, the Titans were never about value in gaming, but rather low-cost compute perks or e-peens for rich gamers.
As if GP102 won't OC...
Moreover, at 4K, GP104 is bandwidth limited. With 50% more bandwidth to start with, I predict GP102 will scale much better when overclocked than GP104 does. So even if it doesn't reach 2.1GHz but "only" 2GHz, the gap will remain at least the same when both are OC'd.
 
To hit 16 TFLOPS, a 96 CU Vega would need to be running at 1300 MHz. In pure theory that would make it around 95% faster than the Fury X. If the Pascal Titan is 60% faster than the Titan X, that puts it around 70-75% faster than the Fury X. So while what you say is possible, it relies on 3 huge assumptions (paper math sketched after the list):

1. That a 96 CU part will be able to reach 1300 MHz, something that the 36 CU 480 couldn't achieve at stock clocks.
2. That all parts of the Fury X are scaled up by 50%, and not just the CUs.
3. That performance scales exactly linearly with unit count and clock speed, which, looking at the 390X and Fury X in particular, has not been the case in the past.
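A quick sketch of that paper math (assuming GCN's 64 lanes per CU and 2 ops per FMA; the Fury X stock clock of 1050 MHz is the only other input):

```python
# FP32 throughput for a GCN part: CUs * 64 lanes * 2 ops (FMA) * clock.
def gcn_tflops(cus, ghz):
    return cus * 64 * 2 * ghz / 1000

# Clock a hypothetical 96 CU Vega needs for 16 TFLOPS:
clock_ghz = 16.0 / (96 * 64 * 2 / 1000)
print(f"{clock_ghz * 1000:.0f} MHz")         # ~1302 MHz

fury_x = gcn_tflops(64, 1.05)                # ~8.6 TFLOPS at stock
print(f"+{(16.0 / fury_x - 1) * 100:.0f}%")  # ~86% faster on paper
# (the ~95% figure above comes out if you use a 1.0 GHz / 8.2 TFLOPS baseline)
```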

IMO, AMD have their work cut out for them to match this part, and it's likely NV have a little left in the wings for a fully unlocked version too.

The Fury X seems severely bottlenecked by fillrate and geometry output, and Hawaii looks like a much more balanced chip in comparison. So far, the only situation where all that compute performance was put to good use without hitting the other bottlenecks has been Doom under Vulkan, which is obviously not enough to justify the GPU itself. IMO the only truly good thing that came out of Fiji was the Nano, which seems like a nice deal even today because it's selling close to 300€.

That said, let's hope Vega isn't just a simple do-over of Fiji, bottlenecks included, with "only" 64 ROPs and 4 geometry engines. Vega doesn't need 16 TFLOPS to be competitive with this new Titan; AMD needs to spend their transistors elsewhere, IMO.


Judging by how well GM204 -> GM200 clocked, I'd wager that the new Titan X will be able to hit 1.9-2 GHz easily, if not more (just like GP104). The difference should be substantial, and this is most probably the first GPU able to hit 4K60 in most titles with maxed settings.
So even if it doesn't reach 2.1GHz but "only" 2GHz, the gap will remain at least the same when both are OC'd.

You're both suggesting this card will do a 33% core overclock easily?
Wow...
 
I don't see how those graphs support your claims.

Maybe you need some new glasses then, because it's clear: at lower clock rates, it's performing relatively faster.

I also don't get how Tomshardware can build an "FPS-per-watt" curve (I would totally get it if it were a table).
Are they changing clocks on the fly? Which clocks are being achieved for each FPS value? Are they touching the core voltage, or is it all at stock voltage? Is the memory being overclocked? It seems like a lot of stuff must be oversimplified/assumed to make a curve like that.

I don't see how any of that is relevant at all, other than this being an attempt on your part to shoot the messenger instead of the message...
And as for how to build a curve out of a set of data points... come on now, is this the Beyond3D forum or a kindergarten?

The "performance per MHz" that you claim could be because the chip is hitting other bottlenecks, but the single-threaded advantage - as small as it could be - is still there.

Well, obviously those bottlenecks have a much larger impact than your alleged single-threaded advantage (for which I'd like to see definitive proof, btw). Which brings us to the Titan having most of those bottleneck limits set much higher: memory bandwidth 50% higher, pixel fillrate around 35% higher, more than likely a 50% bigger L2, more than likely 35% higher geometry rates, etc.
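Rough numbers behind those ratios (announced Titan specs vs GTX 1080 reference specs; the clock choices are mine, so treat the percentages as ballpark):

```python
# (memory bandwidth GB/s, ROPs, boost clock GHz)
titan   = (480, 96, 1.53)
gtx1080 = (320, 64, 1.73)

bw   = titan[0] / gtx1080[0] - 1
fill = (titan[1] * titan[2]) / (gtx1080[1] * gtx1080[2]) - 1
print(f"bandwidth: +{bw * 100:.0f}%, pixel fillrate: +{fill * 100:.0f}%")
# bandwidth: +50%, pixel fillrate: +33%
```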
 
You're both suggesting this card will do a 33% core overclock easily?
Wow...

Why not? The previous Titan X is "clocked" at 1000 MHz, yet it can easily be overclocked to 1450-1500 MHz. At "worst", that's a 45% overclock.

Edit: with the way Pascal GPUs are behaving so far, it'd be logical to expect at least 1900 MHz to be achievable.
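Put in numbers (the 1531 MHz boost figure is from the announced specs; everything else follows the post above):

```python
def oc_pct(stock_mhz, oc_mhz):
    # Overclock headroom relative to the out-of-the-box clock.
    return (oc_mhz / stock_mhz - 1) * 100

print(f"Maxwell Titan X, 1000 -> 1450 MHz: +{oc_pct(1000, 1450):.0f}%")  # +45%
print(f"Pascal Titan X, 1531 -> 1900 MHz: +{oc_pct(1531, 1900):.0f}%")   # +24%
```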
 
As if GP102 won't OC...
Moreover, at 4K, GP104 is bandwidth limited. With 50% more bandwidth to start with, I predict GP102 will scale much better when overclocked than GP104 does. So even if it doesn't reach 2.1GHz but "only" 2GHz, the gap will remain at least the same when both are OC'd.
If it's bandwidth starved, why can't the GTX 1080 even match the theoretical bandwidth difference of 25% with 25% more execution units and higher clocks to boot, compared to the 1070? In fact, the performance gap barely widens by a couple of percent at 4K compared to 1080p.
 
The core count is the same as P100's. This could be a new GP100 variant with GDDR5X instead of HBM2.
Or it could be GP102, far earlier than expected.
It's GP102, and the transistor count is far lower than GP100's, so I assume the FP64 units got cut.
 
It's GP102, and the transistor count is far lower than GP100's, so I assume the FP64 units got cut.
Yeah, I agree.
I would have expected DP to be around 1.2-1.8 TFLOPS, but I'm no longer convinced it even has that now (context: if it were also to be used as a Tesla part; the news mentioned it would be at least a Quadro part).
Cheers
 
This thing can't compete with a full (96 CU) 16GB HBM2 Vega, but since that's not coming out any time soon, Nvidia is taking advantage of the situation; at $1200+ the profit margin must be huge.
A 6144-ALU Vega 11 would be roughly 14.7 TFLOPS at 1200 MHz.

GP102 at 2 GHz (28 SMs, not the full complement of 30; 3584 ALUs) is 14.3 TFLOPS.
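The paper math behind both figures, for reference (both configurations are hypothetical):

```python
# FP32 TFLOPS = ALUs * 2 ops (FMA) * clock in GHz / 1000.
vega11 = 6144 * 2 * 1.2 / 1000  # 14.7 TFLOPS
gp102  = 3584 * 2 * 2.0 / 1000  # 14.3 TFLOPS
print(f"Vega 11 @ 1.2 GHz: {vega11:.1f} TF, GP102 @ 2 GHz: {gp102:.1f} TF")
```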

I agree with the poster above: AMD should spend their transistor budget elsewhere and produce a balanced GPU for a change. Shader throughput is only nice when you can put it to use; otherwise it's like a third nipple on the elbow: useless.
 
If it's bandwidth starved, why can't the GTX 1080 even match the theoretical bandwidth difference of 25% with 25% more execution units and higher clocks to boot, compared to the 1070? In fact, the performance gap barely widens by a couple of percent at 4K compared to 1080p.

Compared to the 1070, the 1080 has 33% more execution units and higher clocks to boot. Thus the aggregate outputs are 37% higher at boost clocks.

Performance aligns more with bandwidth than anything else, especially if we also take the 1060 into account: ~60% for both bandwidth and performance.

The 1070 doing slightly better could be explained by it having the same amount of ROPs and L2 as the 1080.
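Plugging in the reference specs (advertised boost clocks; real sustained clocks vary, so take the ratios as approximate):

```python
# (FP32 cores, boost MHz, memory bandwidth GB/s)
cards = {
    "GTX 1080": (2560, 1733, 320),
    "GTX 1070": (1920, 1683, 256),
    "GTX 1060": (1280, 1708, 192),
}
base = cards["GTX 1060"]
for name, (cores, mhz, bw) in cards.items():
    compute = (cores * mhz) / (base[0] * base[1])
    print(f"{name}: compute x{compute:.2f}, bandwidth x{bw / base[2]:.2f}")
# GTX 1080: compute x2.03, bandwidth x1.67
# GTX 1070: compute x1.48, bandwidth x1.33
```

Compute scales much faster than bandwidth up the stack, so if performance tracks bandwidth, the 1080 leaves proportionally more of its ALUs idle.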

[chart: perfrel_2560_1440.png, relative performance at 2560×1440]
 
Can anyone explain where they're getting 44 TOPS Int8?

It seems it's just 4x the FP32 rate, but that rate already counts FMA as two ops per cycle.

22 TOPS would make sense to me, as it would be 4x the Int32 rate with one operation per instruction, assuming a 1:1 ratio of FPU:ALU.

Just read about dp4a and dp2a; that would make sense.
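That's the accounting, I believe: dp4a does a 4-wide int8 dot product plus an accumulate in a single instruction, i.e. 8 ops vs the 2 of an FP32 FMA (sketch using the announced 3584 cores at ~1.53 GHz boost):

```python
instr_per_sec = 3584 * 1.53e9     # one instruction per core per clock
fp32 = instr_per_sec * 2 / 1e12   # FMA counts as 2 ops -> ~11.0 TFLOPS
int8 = instr_per_sec * 8 / 1e12   # dp4a counts as 8 ops -> ~43.9 TOPS
print(f"FP32: {fp32:.1f} TFLOPS, Int8 via dp4a: {int8:.1f} TOPS")
```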
 
Can anyone explain where they're getting 44 TOPS Int8?

It seems it's just 4x the FP32 rate, but that rate already counts FMA as two ops per cycle.

22 TOPS would make sense to me, as it would be 4x the Int32 rate with one operation per instruction, assuming a 1:1 ratio of FPU:ALU.

Just read about dp4a and dp2a; that would make sense.
Here's the dp4a testing done earlier in the year; I think it talked about dp2a as well, and also the limitation between sm_60 and sm_61 [edit: yep, just read it again and it does].
https://devtalk.nvidia.com/default/...e-gtx-1080-amp-gtx-1070/post/4889750/#4889750
The person to read is Scott Gray.
Cheers
 
Pascal Titan vs 1080 is very close to being the same as 980 Ti vs 980. The Titan will have a very clear edge in performance, and no 1080 model will be able to overcome that unless LN2 is used. The Titan should boost to 1600 MHz+ out of the box anyway and overclock reasonably close to the same frequency as the 1080 does, just like with the previous nVidia architectures, Kepler and Maxwell.
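For what it's worth, if both chips do land at roughly the same overclocked frequency, the remaining gap is just the core-count ratio (paper math, ignoring any other bottleneck):

```python
# At equal clocks the clock terms cancel; only core counts matter.
titan_cores, gtx1080_cores = 3584, 2560
print(f"x{titan_cores / gtx1080_cores:.2f}")  # x1.40 -> ~40% on paper
```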

Yes, the stock cooler with stock power targets and temp limits will hold it back somewhat, but those can be cranked up, and even at stock it will be out of reach of the 1080.

It would be nice to see this chip with better coolers though. They could still release a very formidable 1080 Ti with custom coolers, even if they cut 1 or 2 more SMs, and I do expect that to happen.
 