Nvidia Turing Speculation thread [2018]

ShaidarHaran · Aug 18, 2018

DavidGraham said:
It's definitely NOT 8% faster than 1080Ti, that's nonsense. Even a TitanXP is faster than that. Just wait for the performance reveal, you should be satisfied performance wise. Also the 1K price is for a custom OC'ed card from PNY, it could be a place holder.

The 8% is a bit tongue-in-cheek on my part. Should be at least 20%. Still though, that's a bit underwhelming compared to the last few generational performance deltas. Also with a 754mm^2 die I would expect more CUDA cores, but I understand that this generation is all about shifting the focus towards ray tracing.

McHuj · Aug 18, 2018

I just hope the ray tracing performance is enough to be noticeable in games and not just by pixel peepers.

lanek · Aug 18, 2018

ShaidarHaran said:
The 8% is a bit tongue-in-cheek on my part. Should be at least 20%. Still though, that's a bit underwhelming compared to the last few generational performance deltas. Also with a 754mm^2 die I would expect more CUDA cores, but I understand that this generation is all about shifting the focus towards ray tracing.

Ray tracing... The day we see a complete game rendered with ray tracing, wake me up. I have been working with ray tracing for 15 years. At the moment, 3D modeling software (3Ds, Blender, etc.) is moving away from ray tracing for real-time viewport work toward other types of engine (Eevee), so that we can model at the same quality we will later render the scene at with ray tracing. And games are moving to real-time ray tracing... yes...

Real time viewport .....

SpaceBeer · Aug 18, 2018

DavidGraham said:
The GTX 1070 had only 1920 cores vs 2816 for the 980Ti, the 1070 also had 25% less bandwidth, yet the 1070 was faster in every scenario.

And the GTX 1070 had ~55% higher clocks, so its theoretical performance (FP32, pixel and texture rates) is a bit higher than the 980 Ti's.
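
As a rough check of that arithmetic, here is a minimal sketch that compares theoretical FP32 throughput as cores x clock. It uses only the figures quoted in this thread (1920 vs 2816 cores, ~55% higher clocks). The function name is purely illustrative, and the clock ratio is an estimate, since real boost clocks vary from card to card.

```python
def relative_fp32(cores_a: int, cores_b: int, clock_ratio_a_over_b: float) -> float:
    """Return the theoretical FP32 throughput of card A relative to card B.

    Peak FP32 scales with (shader cores x clock). The per-cycle factor
    (2 FLOPs per core for a fused multiply-add) is the same on both cards,
    so it cancels out of the ratio.
    """
    return (cores_a / cores_b) * clock_ratio_a_over_b


# Figures quoted in the thread: GTX 1070 = 1920 cores, 980 Ti = 2816 cores.
# ASSUMPTION: the ~55% clock advantage is an estimate, not a measured value.
ratio = relative_fp32(1920, 2816, 1.55)
print(f"GTX 1070 vs 980 Ti, theoretical FP32: {ratio:.3f}x")
```

With these inputs the ratio lands at roughly 1.06, so "a bit higher" on paper. A smaller real-world clock gap would pull it back toward parity.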

Clukos · Aug 18, 2018

That TU104 die looks large. The Ti one must be enormous.

ImSpartacus · Aug 18, 2018

Clukos said:
That TU104 die looks large. The Ti one must be enormous.

Nvidia strongly alluded to the 102-type GPU being 754mm2.

If the 104-type GPU is about 2/3 of its bigger brother, then we're in the 500mm2 range.

DavidGraham · Aug 18, 2018

ShaidarHaran said:
Should be at least 20%. Still though

I will only say this once, you are extremely on the low side on this.

SpaceBeer said:
And the GTX 1070 had ~55% higher clocks, so its theoretical performance (FP32, pixel and texture rates) is a bit higher than the 980 Ti's.

Not that high in the end. They were almost the same, but with significantly lower bandwidth on the 1070.

Rootax · Aug 18, 2018

Maybe a stupid question, but can the RT cores assist the more traditional "CUDA cores" for non-RT compute stuff? Same thing with tensor cores?

Deleted member 2197 · Aug 18, 2018

ShaidarHaran said:
I think only compute workloads are able to use the VRAM on multiple cards in a contiguous pool like that. Graphics workloads still see each card's VRAM as a separate pool. Which won't help in my case.

Not sure this is the case. All articles I've read seem to indicate combined coherent memory access via NVLink is not limited to just compute workloads. Hopefully we should know more on Monday.

Turing will be the first to use the new fast GDDR6 memory and has a video buffer of up to 48GB. In addition, two Quadro RTX cards can be connected by a special NVLink bridge where the two cards have coherent access to the combined video memory of both cards. This means with two cards, the total local memory can be up to 96GB.

Supporting larger local memory is important for movies and complex CAD designs in order to handle the rendering of large scenes. In the past, professional movie renderers couldn’t use graphics cards for movie rendering due to the relatively small frame buffer sizes. Now, with up to 96GBs of accessible local memory, it should be possible to cover the vast majority of the rendered scenes. A number of the professional renderers for movies will be adding Nvidia Quadro RTX support.

https://www.eetimes.com/author.asp?section_id=36&doc_id=1333598&_mc=RSS_EET_EDT

ShaidarHaran · Aug 18, 2018

DavidGraham said:
I will only say this once, you are extremely on the low side on this.

Good. I hope so. It's atypical for a new generation to only be marginally faster than the last one. I can see the stated clocks could be way off from reality, as is the case with every Pascal card on the market, but it seems like NV quotes clockspeed based on a set of parameters that I have never encountered in the real world, even before installing watercooling on every card, as I eventually do. When I owned a Titan X (Pascal) it boosted to ~1800MHz with a waterblock installed but no overclocking applied. Same goes for my 1070 (about 1900MHz "stock" boost under water). The thing is, installing waterblocks on Pascal cards had no effect on maximum (pre-overclocked) boost clock in my experience, only sustainability.

DavidGraham said:
Not that high in the end. They were almost the same, but with significantly lower bandwidth on the 1070.

I've owned both 1070 and 980 Ti. I went from 980 Tri-SLI down to single card 980 Ti (because multi-GPU support has gotten worse and worse over the last few years), up to Titan X Pascal (significant upgrade), down to 1070 (made money on the deal and wasn't gaming as much). The 980 Ti and 1070 are indeed, roughly comparable in most workloads. However, my particular 980 Ti was a beast of an overclocker, hitting 1550MHz. My 1070 is only an average overclocker, reaching just over 2GHz. More recent releases tend to run slightly better on the 1070, with older titles being a bit faster on 980 Ti. Even ignoring my watercooling and overclocking I've run across others with similar experience with these cards. It is impressive what performance can be achieved by the 1070 with such low bandwidth, for sure. Pascal is very efficient in its bandwidth utilization.

ShaidarHaran · Aug 18, 2018

pharma said:
Not sure this is the case. All articles I've read seem to indicate combined coherent memory access via NVLink is not limited to just compute workloads. Hopefully we should know more on Monday.

https://www.eetimes.com/author.asp?section_id=36&doc_id=1333598&_mc=RSS_EET_EDT

I'll keep an eye on that, thanks. Might actually be worthwhile to pick up 2 cards again if their VRAM pools can be combined. Would be even better if the driver handles this automatically and game devs don't have to jump through hoops to make it work.

Communism · Aug 18, 2018

NVLink 2 is 25 gBps per direction per connection. Should be fine for SLI, but don't expect miracles.

Ike Turner · Aug 18, 2018

Rootax said:
Maybe a stupid question, but can the RT cores assist the more traditional "CUDA cores" for non-RT compute stuff? Same thing with tensor cores?

It will..

https://twitter.com/x/status/1030822744575954944

People shouldn't downplay the RT Cores IMO. But they also shouldn't expect anything more than shoddy Gameworks stuff on the gaming side. RTRT in games for useful features that don't require pixel peeping is still years away. But for everything non-gaming, Nvidia just brought one hell of a product, and I can't wait to grab one for myself for work: either a 2080Ti or a Quadro RTX 5000, depending on price and performance for what I need.

ShaidarHaran · Aug 18, 2018

Communism said:
NVLink 2 is 25 gBps per direction per connection. Should be fine for SLI, but don't expect miracles.

My desire for more VRAM is not contingent on a specific performance requirement. I simply need more VRAM in a single pool with a certain baseline of performance available. 16GB ought to get the job done.

CSI PC · Aug 18, 2018

ShaidarHaran said:
The 8% is a bit tongue-in-cheek on my part. Should be at least 20%. Still though, that's a bit underwhelming compared to the last few generational performance deltas. Also with a 754mm^2 die I would expect more CUDA cores, but I understand that this generation is all about shifting the focus towards ray tracing.

It also remains to be seen how much the evolved SM/TPC graphics pipeline stages (which also have a revised cache) improve performance, and what additional gains the RT cores bring for other functionality.
However, using RT cores beyond their traditional function will probably take time to expose, as it will require specific libraries/functions that do not exist yet. You could access them directly, in a similar way to Tensor Cores (which also had limited library support to begin with but could be used from C++ directly), but that is usually more specialist development and optimisation work.

silent_guy · Aug 18, 2018

Communism said:
NVLink 2 is 25 gBps per direction per connection. Should be fine for SLI, but don't expect miracles.

https://www.servethehome.com/nvidia-turing-introduced-with-the-quadro-rtx-line/

On the slide with the die spec, it says “100 GB/s”. Assuming that this is byte/s, and that this will be available for GeForce, then that’s nothing to sneeze at.

Communism · Aug 18, 2018

silent_guy said:
https://www.servethehome.com/nvidia-turing-introduced-with-the-quadro-rtx-line/

On the slide with the die spec, it says “100 GB/s”. Assuming that this is byte/s, and that this will be available for GeForce, then that’s nothing to sneeze at.

They are adding both directions for the 2 NVLink 2.0 ports together.

Also note that my original post already says gBps (gigabytes per second) as denoted by the capitalized B (if that was what was confusing you about my post).
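
To make the accounting explicit, here is a minimal sketch, assuming the 25 GB/s per direction per link figure and the two links mentioned above. The function names are illustrative only, and the byte-to-bit conversion is included just to show how far apart the two readings of the unit are.

```python
def aggregate_gbytes_per_s(per_direction: float, links: int, directions: int = 2) -> float:
    """Sum the bandwidth over every link and every direction (marketing-style total)."""
    return per_direction * links * directions


def gbytes_to_gbits(gbytes_per_s: float) -> float:
    """Convert gigabytes per second to gigabits per second (8 bits per byte)."""
    return gbytes_per_s * 8


PER_LINK_PER_DIRECTION = 25.0  # GB/s, the figure quoted in the thread
LINKS = 2                      # two NVLink ports, as described above

one_way = aggregate_gbytes_per_s(PER_LINK_PER_DIRECTION, LINKS, directions=1)
both_ways = aggregate_gbytes_per_s(PER_LINK_PER_DIRECTION, LINKS)

print(f"one direction, both links: {one_way:.0f} GB/s")
print(f"both directions, both links: {both_ways:.0f} GB/s")
print(f"single link, one direction: {gbytes_to_gbits(PER_LINK_PER_DIRECTION):.0f} Gbit/s")
```

So the "100 GB/s" on the slide and the 25 GB/s figure describe the same hardware, just counted differently.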

silent_guy · Aug 18, 2018

Communism said:
They are adding both directions for the 2 NVLink 2.0 ports together.

50 or 100 GB/s: both of those are nothing to sneeze at. ;-) Especially compared to what was available in the past. It won’t allow one to treat memory on the other side as identical to local memory, but it’s 3x better than anything one could reasonably have had in the past.

Communism said:
Also note that my original post already says gBps (gigabytes per second) as denoted by the capitalized B (if that was what was confusing you about my post).

My brain saw the lower case ‘g’ and used that to override the upper case ‘B’!

Communism · Aug 18, 2018

silent_guy said:
50 or 100 GB/s: both of those are nothing to sneeze at. ;-) Especially compared to what was available in the past. It won’t allow one to treat memory on the other side as identical to local memory, but it’s 3x better than anything one could reasonably have had in the past.

My brain saw the lower case ‘g’ and used that to override the upper case ‘B’!

While great, it's still stuck in the canyon where you cannot simultaneously work on the same "frame" with both GPUs without latency/lag-inducing queuing of multiple "frames" of input.

Probably need ~50% of local memory bandwidth as link bandwidth between dies before that can be a reality (at massive power cost at the very least, not to mention the die area cost).

Anything less than that and you are still left with AFR 2 with 2+ frames of input latency.
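
For a sense of scale, here is a rough sketch of how the link compares with local memory bandwidth under the ~50% rule of thumb above. The local memory configuration (384-bit bus at 14 Gbit/s per pin) is an assumption chosen for illustration and is not a figure from this thread. The 50 GB/s link figure is the two-link, one-direction number discussed earlier. Function names are hypothetical.

```python
def memory_bandwidth_gbytes(bus_width_bits: int, gbits_per_pin: float) -> float:
    """Peak memory bandwidth in GB/s from bus width and per-pin data rate."""
    return bus_width_bits * gbits_per_pin / 8


def link_fraction(link_gbytes: float, local_gbytes: float) -> float:
    """Fraction of local memory bandwidth that the inter-GPU link provides."""
    return link_gbytes / local_gbytes


# ASSUMPTION: 384-bit bus at 14 Gbit/s per pin; neither number comes from this thread.
local = memory_bandwidth_gbytes(384, 14.0)

# 50 GB/s per direction = 2 links x 25 GB/s, the figure discussed above.
frac = link_fraction(50.0, local)

# The ~50% of local bandwidth suggested as the threshold in the post.
needed = 0.5 * local

print(f"local memory: {local:.0f} GB/s")
print(f"link provides {frac:.1%} of local bandwidth")
print(f"~50% target would be {needed:.0f} GB/s per direction")
```

Under these assumptions the link is well short of the ~50% mark, which is the gap the post is pointing at.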

Deleted member 2197 · Aug 18, 2018

Assuming cards go up for pre-order on Monday, how long before we see the first reviews? I would imagine one of the things tested will be memory with the cards in NVLink mode.
