Nvidia Ampere Discussion [2020-05-14]

Yes, but does it matter? Transistors are cheap, power consumption isn't. Why not fill the whole die with FP32 units?
Because you could fill it with FP64 units instead, especially since you already need a full 32-bit multiplier if it's supposed to double as INT32. ;)
 
The 2080 Ti's actual boost flops are closer to 16.5 TF.
The 3070 is said to be "faster" than the 2080 Ti, not "like" it.
So yeah, flops may well end up being roughly equal, and the seemingly lower utilization may be a result of bandwidth or other limitations coming into play in older software.
Will a 20 TFLOPS RDNA2 card "trounce" the 20 TFLOPS 3070? Possibly, sometimes. Universally? Doubtful. And I'm not even accounting for DLSS here.
Note that I expect Navi 21 to be higher than 20 TFLOPS in actual shipping products. That one will likely be universally faster than the 3070, of course.
The flops and wattages to compare must be what Nvidia advertises, not other figures. If you say 16.5 for the 2080 Ti, you could also say 25.5 for the 3070, or whatever.
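For context, all of those figures fall out of the usual peak-FP32 formula: TFLOPS = ALUs × 2 FLOPs per clock (FMA) × clock. A minimal sketch, assuming the commonly cited unit counts and advertised boost clocks; the ~1.9 GHz "real" boost for the 2080 Ti is an assumption on my part, not something confirmed here:

```python
# Rough peak-FP32 sketch: TFLOPS = shader ALUs x 2 FLOPs/clock (FMA) x clock (GHz) / 1000.
# Unit counts and clocks below are the commonly cited specs plus an assumed observed boost.
def peak_tflops(alus, clock_ghz):
    return alus * 2 * clock_ghz / 1000

print(peak_tflops(4352, 1.545))  # 2080 Ti at advertised boost      -> ~13.4 TF
print(peak_tflops(4352, 1.900))  # 2080 Ti at an assumed real boost -> ~16.5 TF
print(peak_tflops(5888, 1.725))  # 3070 at advertised boost         -> ~20.3 TF
```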
 
Seems NVIDIA caught wind of the Xbox Series X methodology of calculating RT performance: Microsoft said the RT acceleration of Series X is equivalent to 13 TF of compute, for a total of 25 TF of compute across both the shaders and the RT cores.

Jensen took the hint and declared that, according to that Xbox Series X methodology, the 2080's RT cores have the equivalent of 34 TF of compute, in addition to another 11 TF of regular compute, for a total of 45 TF while ray tracing, which is 80% faster than Series X.

For Ampere, the 3080 alone delivers the equivalent of 58 TF from the RT cores, not counting the other 30 TF of regular compute, which amounts to a crazy 88 TF while ray tracing.
Without them telling us how they arrived at that number, it's quite bold talk.
MS's "13 TF of compute" is really up to 380G ray-box and 95G ray-triangle intersections per second peak. According to someone I've at least learned to trust to know what he's talking about, the 2080 Ti would reach around 444G ray-box intersections peak (68 × 4 × 1.635 GHz); ray-triangle is a little fuzzier, but his assumption was that it would be around 1/4 of the ray-box rate, like on XSX.
Assuming his 2080 Ti numbers aren't way off, there's no way the 2080's RT cores are anywhere near the "equivalent of 34 TF of compute" using the same methods MS did for their numbers.
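For anyone wanting to sanity-check that, here's a minimal sketch of the arithmetic. The per-clock rates (4 ray-box tests per CU/SM per clock, ray-triangle at 1/4 of ray-box) are the assumptions quoted above, and the XSX unit count and clock are my assumption, not taken from this thread:

```python
# Back-of-the-envelope check of the intersection-rate figures discussed above.
def ray_box_peak_gps(units, tests_per_clock, clock_ghz):
    """Peak ray-box intersections in billions per second."""
    return units * tests_per_clock * clock_ghz

xsx       = ray_box_peak_gps(52, 4, 1.825)   # assumed 52 CUs at 1.825 GHz
rtx2080ti = ray_box_peak_gps(68, 4, 1.635)   # 68 SMs at 1.635 GHz, per the post above

print(round(xsx), round(xsx / 4))    # 380 95  -- MS's "13 TF equivalent" rates
print(round(rtx2080ti / xsx, 2))     # 1.17    -- vs. the ~2.6x (34/13) that NVIDIA's
                                     #            "34 TF equivalent" claim would imply
print((34 + 11) / 25)                # 1.8     -- the "80% faster than Series X" math
```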
 
The flops and wattages to compare must be what Nvidia advertises, not other figures. If you say 16.5 for the 2080 Ti, you could also say 25.5 for the 3070, or whatever.
Depends on what you compare to what, and on what clocks Ampere, and the 3070 specifically, will actually run at. Again, we need to know the technical details.
 
It will be a cross-architecture comparison, but I have heard from several different sources that the RT performance differential referenced by Nvidia is going to be indicative of what we see in real-world games on average. Hopefully there will be an early-generation game with RT on XSX that we can test against the PC version, preferably with RT both on and off, and preferably with the same quality settings.

Perhaps we will know more as soon as the first GDC presentations about next gen RT pop up next year.
 
I dunno, bro. You think it's really bigger than the STG-2000 to the Riva 128?

The 1080 Ti versus the 980 Ti is about a 70% uplift in these tests, though I think it's fair to say the variance is larger, principally because the 980 Ti is short of memory at 4K.
Call me skeptical, but I'm doubtful tensor cores can offer the same decompression performance as dedicated, fixed-function units. Not to mention varying performance across the GPU lineup (how will GA106/107 do, especially?).
I presume that LOD will not help with decompression, since textures would be stored once for all MIPs in a block that needs to be decompressed, no matter the MIP level required to show on screen.

The reduced memory of the cheaper cards will also hurt in terms of "scratchpad space while decompressing", but that effect should be mitigated with the right kind of pipelining. Decompression workloads do tend to be bursty in nature, though, even at the most finely grained level.

I think it's going to be a couple of years before we see AAA games making heavy use of DirectStorage. So those GA106s will be dead anyway.
 
The 2080 Ti would reach around 444G ray-box intersections peak (68 × 4 × 1.635 GHz); ray-triangle is a little fuzzier, but his assumption was that it would be around 1/4 of the ray-box rate, like on XSX.
I don't believe this methodology is accurate at all. NVIDIA didn't state the output of each RT core.

Without them telling us how they arrived at that number, it's quite bold talk.
They know how Microsoft derived their numbers given the collaboration between NVIDIA and Microsoft.
 
I don't believe this methodology is accurate at all. NVIDIA didn't state the output of each RT core.

They know how Microsoft derived their numbers given the collaboration between NVIDIA and Microsoft.
Well, it is coming from someone who should know what he's talking about, but it's of course second-hand (or rather third-hand) information.

NVIDIA knows for sure, but without them giving the details of how they came up with their numbers, it's still marketing speak and may or may not be misleading. I, for one, don't believe for a second, without proof, that even the 2080 is over 260% the speed of the XSX in RT.
 
Without them telling us how they arrived at that number, it's quite bold talk.
MS's "13 TF of compute" is really up to 380G ray-box and 95G ray-triangle intersections per second peak. According to someone I've at least learned to trust to know what he's talking about, the 2080 Ti would reach around 444G ray-box intersections peak (68 × 4 × 1.635 GHz); ray-triangle is a little fuzzier, but his assumption was that it would be around 1/4 of the ray-box rate, like on XSX.
Assuming his 2080 Ti numbers aren't way off, there's no way the 2080's RT cores are anywhere near the "equivalent of 34 TF of compute" using the same methods MS did for their numbers.
That's for ray intersection, but what about BVH traversal?
 
If the NVENC encoder hasn't been improved, I may hunt for a 2080 Ti in the future. I'm planning on upgrading from a 1080p monitor to a 1440p one, as 4K is (I think) a waste of resources at 40 cm from the monitor. The 3080 looks nice, but holy moly at those power requirements, and I feel the 8 GB of the 3070 may be a disadvantage in the near future.
 
People should post screenshots more often instead of just links to volatile web resources. ;) It's gone already.

It was a 3070 Ti with 16 GB of VRAM. I didn't catch any price, if there was one. That page got taken down fast. I think NV wants to announce their products themselves.

But will the 3090 run Crysis Remastered? :p
 
If the NVENC encoder hasn't been improved, I may hunt for a 2080 Ti in the future.
The encoder stayed the same, says Nvidia; only the decoder was enhanced (for example, AV1).
The 3080 looks nice, but holy moly at those power requirements, [...]
Read the fine print: "2 - Recommendation is made based on PC configured with an Intel Core i9-10900K processor. A lower power rating may work depending on system configuration." Maybe it helps if your processor does not guzzle 250-ish watts on its own. :)
 
The encoder stayed the same, says Nvidia; only the decoder was enhanced (for example, AV1).

Read the fine print: "2 - Recommendation is made based on PC configured with an Intel Core i9-10900K processor. A lower power rating may work depending on system configuration." Maybe it helps if your processor does not guzzle 250-ish watts on its own. :)
It rarely exceeds 100W in games. I'm not sure why we're making a big deal about Cinebench consumption. But I'm curious about the potential PCIe 3.0 bottleneck.

Edit: looks like the upper bound is ~150 W, with average consumption around ~120 W when juiced up to 5.2 GHz at 1.4 V. A little tweaking and you could get it closer to 100 W. Not bad for top-tier gaming perf in my book.
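If anyone wants to put numbers on the PSU question, here's a very rough sketch of a gaming-load system budget. The 320 W figure is the 3080's advertised board power, the CPU draw is the ballpark quoted above, and the rest-of-system and headroom numbers are pure assumptions:

```python
# Very rough system power budget under gaming load (all figures approximate assumptions).
gpu_w  = 320   # RTX 3080 advertised board power
cpu_w  = 120   # i9-10900K gaming-load ballpark mentioned above
rest_w = 75    # assumed: motherboard, RAM, storage, fans
load_w = gpu_w + cpu_w + rest_w
print(load_w, round(load_w * 1.3))   # ~515 W load, ~670 W PSU with ~30% headroom
```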

 

The 1080 Ti versus the 980 Ti is about a 70% uplift in these tests, though I think it's fair to say the variance is larger, principally because the 980 Ti is short of memory at 4K.

I presume that LOD will not help with decompression, since textures would be stored once for all MIPs in a block that needs to be decompressed, no matter the MIP level required to show on screen.

The reduced memory of the cheaper cards will also hurt in terms of "scratchpad space while decompressing", but that effect should be mitigated with the right kind of pipelining. Decompression workloads do tend to be bursty in nature, though, even at the most finely grained level.

I think it's going to be a couple of years before we see AAA games making heavy use of DirectStorage. So those GA106s will be dead anyway.

Yeah, the STG-2000 to the Riva 128 was a game changer. The performance upgrade was drastic. It's hard to find benchmarks from back then, however.
 