DegustatoR
Legend
"Scalar datapath" though?
Isn't FP32 from A100 TCs a non-IEEE one?
This rate is maintained on all TC precision modes though which means that it's not coming from FP32 SIMDs, no?
I see two possibilities for gaming Ampere here:
1. Double width FP32 SIMDs which will likely lead to a double width of INT32 SIMD as well. They've done this previously between GP100 and GP10x.
2. A second 16-wide FP32 SIMD in place of the FP64 one of GA100. But for that to work well they'll need to be able to schedule FP32+FP32+INT32 or it will be either FP32+FP32 or FP32+INT32 per clock which will result in utilization issues.
Hey, I think that's the sane idea.
Maybe a fast path through the tensors for RT calcs or something
On the contrary, doubling the width won't require any changes to dispatch. Adding a second one of the same width will though, unless GA100 already can't schedule FP32+INT+FP64, which seems an unlikely scenario.
Doubling SIMD width would require a second dispatch unit otherwise you end up with the same utilization problem as Kepler.
I’m betting that the 2xFP32 is not general purpose. Maybe a fast path through the tensors for RT calcs or something. It’s mind-boggling to think Ampere will push 30+ TFLOPS of general compute.
Why though? Maxwell and Pascal were 32 wide and it always seemed excessive for gaming Turing to be 16 wide. If they go back to 32 wide for general math then you'll get something around 40 TFLOPS from GA102, which could be a good thing for its non-gaming applications too.
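To put a rough number on the "around 40 TFLOPS" estimate, here is a back-of-the-envelope sketch. The SM count (84), the four partitions per SM, and the 1.9 GHz clock are assumptions for illustration and are not stated in the thread; the function name is hypothetical.

```python
def peak_fp32_tflops(sm_count, fp32_lanes_per_sm, clock_ghz):
    """Theoretical peak FP32 throughput, counting one FMA (2 FLOPs) per lane per clock."""
    return sm_count * fp32_lanes_per_sm * 2 * clock_ghz / 1000.0


# Assumed inputs (not from the thread): 84 SMs, 4 partitions per SM, 1.9 GHz.
SM_COUNT = 84
PARTITIONS_PER_SM = 4
CLOCK_GHZ = 1.9

for simd_width in (16, 32):
    lanes = PARTITIONS_PER_SM * simd_width
    tflops = peak_fp32_tflops(SM_COUNT, lanes, CLOCK_GHZ)
    print(f"{simd_width}-wide FP32 SIMD per partition: {tflops:.1f} TFLOPS")
```

Under those assumptions, doubling the SIMD width from 16 to 32 simply doubles the peak figure; the actual number depends entirely on the real SM count and clocks.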
Yeah, that's true, I hadn't thought of this. Well, I guess we'll see soon.
The dispatcher issues 32 threads per clock. That’s enough to feed one 32-wide pipe.
GA100 can schedule FP32+INT32 concurrently because each pipe is only 16-wide. Issuing to any other execution unit (FP64, SFU, Load/Store) will cause bubbles in the main FP and INT pipelines.
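The dispatch argument above can be sketched as a toy model. This is only an illustration, not how the hardware scheduler actually works: it assumes a single dispatcher issuing one 32-thread warp instruction per clock, that a pipe of width w stays busy for 32/w clocks per instruction, and (optimistically) that there is always an instruction available for whichever pipe is free. The function name is hypothetical.

```python
def lane_utilization(pipe_widths, cycles=1000, warp_size=32):
    """Fraction of SIMD lane-cycles kept busy by one dispatcher issuing one warp per clock."""
    busy_until = [0] * len(pipe_widths)  # cycle at which each pipe becomes free again
    busy_lane_cycles = 0
    for cycle in range(cycles):
        # Only one warp instruction can be issued per clock: take the first free pipe.
        for i, width in enumerate(pipe_widths):
            if busy_until[i] <= cycle:
                occupancy = warp_size // width  # clocks needed to push 32 threads through
                busy_until[i] = cycle + occupancy
                # Count only the busy clocks that fall inside the simulated window.
                busy_lane_cycles += width * min(occupancy, cycles - cycle)
                break
    return busy_lane_cycles / (sum(pipe_widths) * cycles)


for name, widths in [
    ("FP32x16 + INT32x16", [16, 16]),
    ("FP32x32 + INT32x32", [32, 32]),
    ("2x FP32x16 + INT32x16", [16, 16, 16]),
]:
    print(f"{name}: {lane_utilization(widths):.2f}")
```

In this model two 16-wide pipes can both be kept busy by one dispatcher, while anything wider in total than 32 lanes leaves lanes idle unless a second dispatch unit is added.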
Isn't that the same utilization problem A100 should have with 4xFP16? But somehow they implemented it at least for some corner cases.
I still fondly remember how @aaronspink was dubious that GDDR would go beyond 6 Gbps:
"I honestly don't see GDDR5 getting much beyond 6 GT/s without branching into differential data variants."
Well, I will be completely floored and made a fool of. A completely non-standard memory standard with a "surprise motherf*cker!" announcement. What a damned weird thing to do.
Especially as a 384-bit bus with 18 Gbps GDDR6 could get the high-end 3090 enough bandwidth by itself, at least going by the leaked performance. But hell, maybe this means they're doing the weird cut-down bus thing again. RTX Titan 2 with 24 GB of RAM, 3090 with 10/20 GB? Or 11/22???
Will be interesting to see the announcement now. And I wonder if GDDR6X yield is low enough, or it's expensive enough versus normal GDDR6, that the surprise announcement somehow makes sense.
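For reference, the bandwidth and capacity figures floated above can be checked with some quick arithmetic. This is a sketch that assumes one memory chip per 32 bits of bus width and chip densities of 1 GB or 2 GB; the bus widths other than 384-bit are guesses at what a cut-down bus would look like, and the function names are hypothetical.

```python
def memory_bandwidth_gb_s(bus_width_bits, data_rate_gbps):
    """Peak bandwidth in GB/s: pins times per-pin data rate, divided by 8 bits per byte."""
    return bus_width_bits * data_rate_gbps / 8


def memory_capacity_gb(bus_width_bits, gb_per_chip, chip_width_bits=32):
    """Capacity assuming one chip per 32-bit slice of the bus (an assumption)."""
    return (bus_width_bits // chip_width_bits) * gb_per_chip


print(memory_bandwidth_gb_s(384, 18))  # 864.0 GB/s for 384-bit at 18 Gbps

for bus in (384, 352, 320):
    small = memory_capacity_gb(bus, 1)
    large = memory_capacity_gb(bus, 2)
    print(f"{bus}-bit: {small}/{large} GB")
```

Under these assumptions, 24 GB maps to a full 384-bit bus, while 11/22 GB and 10/20 GB would correspond to 352-bit and 320-bit buses respectively.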
Wow, they "lost" the comment section? That's... interesting.
nVidia responded on their dev blog but the comments are not visible anymore. CarstenS has copied the response: https://forum.beyond3d.com/posts/2128606/
Yep, I'm thinking it's something like this, too.
BTW: Chip on the back? Zotac (and someone before them) had those on the back of the PCB opposite of the GPU - but it was not another GPU, but a super-cap:
https://www.zotac.com/download/file...ery/graphics_cards/zt-t20820b-10p_image04.jpg
It's still supposedly Colorful's Vulcan, not reference, in the leaks.
It was on super expensive editions of cards earlier though. Maybe that's what's inflating the BOM, among other things such as those rumored high-speed-enabling PCBs.
It is, but the overall BOM is still crazy.