Nvidia Ampere Discussion [2020-05-14]

Bondrewd · Jul 7, 2020

Voxilla said:
The relatively high transistor density of the A100 might be a result of switching to the latest N7+ process.

Nah, it's due to all the MAC piles required for GEMM going brrrrr at alarming speeds.
See any other ML part; they all hit very nice density figures for their respective nodes.

Voxilla said:
Edit: I'm probably wrong about this as N7+ seems to be high density and not high performance.

Nah N7+ has all the same options as vanilla N7.

trinibwoy · Jul 7, 2020

Bondrewd said:
Nah, it's due to all the MAC piles required for GEMM going brrrrr at alarming speeds.

Isn’t cache also relatively dense? A100 has a load of that too.

Bondrewd · Jul 7, 2020

trinibwoy said:
Isn’t cache also relatively dense? A100 has a load of that too.

Kinda, depends on the implementation.
Zen2 CCD is mostly SRAM, but is nowhere near the peak theoretical density (or what mobile SoCs achieve for that matter).

Voxilla · Jul 8, 2020

Bondrewd said:
Nah, it's due to all the MAC piles required for GEMM going brrrrr at alarming speeds.
See any other ML part; they all hit very nice density figures for their respective nodes.

Tensor cores take only about 10% of the die in Turing; even if that were 20% for Ampere, it cannot explain the high transistor density.

Bondrewd said:
Nah N7+ has all the same options as vanilla N7.

If that is the case, N7+ is a more plausible explanation for the high density than vanilla N7.

Bondrewd · Jul 8, 2020

Voxilla said:
Tensor cores take only about 10% of the die in Turing

They're xtor-dense MAC piles.

Voxilla said:
If that is the case, N7+ is a more plausible explanation for the high density than vanilla N7.

Nah, see RDNA2 stuffz and Zen3 and Kirin 990 5G.

Voxilla · Jul 16, 2020

Just read about this new Graphcore AI processor, eclipsing the A100, with 59.4 billion transistors on 823 mm2.
That is a transistor density of 72 MT/mm2.
As this is more than the max of 66.7 MT/mm2 for vanilla 7nm, it must also be using N7+.
The high density is a result of the 900 MB of on-chip memory this Graphcore chip has.
1 bit of SRAM requires 6 transistors, so the 900 MB translates to 43.2 B transistors...
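A quick way to check the arithmetic in the post above is a back-of-envelope sketch like the one below. It assumes decimal units (1 MB = 10^6 bytes), a 6-transistor (6T) SRAM bit cell, and counts only the bit cells, ignoring sense amps, decoders and redundancy; the function names are illustrative, not from any tool.

```python
# Back-of-envelope check of the GC200 figures quoted in the thread.
# Assumptions: decimal units (1 MB = 10**6 bytes, 1 MT = 10**6 transistors)
# and a 6T SRAM bit cell with no periphery overhead counted.

TRANSISTORS = 59.4e9      # total transistor count
DIE_AREA_MM2 = 823.0      # die area in mm^2
SRAM_BYTES = 900e6        # on-chip SRAM, decimal MB
T_PER_BIT = 6             # 6T SRAM cell


def density_mt_per_mm2(transistors: float, area_mm2: float) -> float:
    """Average density in millions of transistors per mm^2."""
    return transistors / area_mm2 / 1e6


def sram_transistors(num_bytes: float, t_per_bit: int = T_PER_BIT) -> float:
    """Transistors in the bit cells alone (8 bits per byte)."""
    return num_bytes * 8 * t_per_bit


if __name__ == "__main__":
    avg = density_mt_per_mm2(TRANSISTORS, DIE_AREA_MM2)
    sram = sram_transistors(SRAM_BYTES)
    print(f"average density: {avg:.1f} MT/mm2")
    print(f"SRAM bit cells:  {sram / 1e9:.1f} B transistors")
```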

Kaotik · Jul 16, 2020

Voxilla said:
Just read about this new Graphcore AI processor, eclipsing the A100, with 59.4 billion transistors on 823 mm2.
That is a transistor density of 72 MT/mm2.
As this is more than the max of 66.7 MT/mm2 for vanilla 7nm, it must also be using N7+.
The high density is a result of the 900 MB of on-chip memory this Graphcore chip has.
1 bit of SRAM requires 6 transistors, so the 900 MB translates to 43.2 B transistors...

It doesn't say it's using HP cells, so it doesn't necessarily need to be N7+.

Voxilla · Jul 16, 2020

Kaotik said:
It doesn't say it's using HP cells, so it doesn't necessarily need to be N7+.

If it used HD cells at 91.2 MT/mm2 (which AFAIK are only suitable for mobile SoCs), the 900 MB would fit in 473 mm2, leaving the remaining 16.2 B non-SRAM transistors on 350 mm2, for a density of 46.3 MT/mm2. I don't think so.
And in case you still doubt it, SRAM is the densest way to pack transistors.
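The HD-cell scenario in the post above can be reproduced the same way. This sketch assumes, as the post does, that the 91.2 MT/mm2 HD figure can be applied directly to 43.2 B bit-cell transistors; `split_die` is a hypothetical helper. Without rounding the SRAM area to 473 mm2, the remaining logic density comes out slightly higher, at roughly 46.4 MT/mm2.

```python
# What-if split of the GC200 die, using the numbers from the thread.
# Assumption: the whole SRAM packs at the HD library density of 91.2 MT/mm2.

TOTAL_T = 59.4e9          # total transistors
DIE_MM2 = 823.0           # die area, mm^2
SRAM_T = 900e6 * 8 * 6    # 900 MB of 6T bit cells = 43.2 B transistors
HD_DENSITY = 91.2e6       # transistors per mm^2


def split_die(total_t, die_mm2, sram_t, sram_density):
    """Return (sram_area_mm2, logic_area_mm2, logic_density_mt_per_mm2)."""
    sram_area = sram_t / sram_density          # area taken by the SRAM
    logic_area = die_mm2 - sram_area           # whatever is left of the die
    logic_density = (total_t - sram_t) / logic_area / 1e6
    return sram_area, logic_area, logic_density


if __name__ == "__main__":
    sram_area, logic_area, logic_density = split_die(TOTAL_T, DIE_MM2, SRAM_T, HD_DENSITY)
    print(f"SRAM area:     {sram_area:.1f} mm2")
    print(f"logic area:    {logic_area:.1f} mm2")
    print(f"logic density: {logic_density:.1f} MT/mm2")
```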

Kaotik · Jul 16, 2020

Voxilla said:
If it used HD cells at 91.2 MT/mm2 (which AFAIK are only suitable for mobile SoCs), the 900 MB would fit in 473 mm2, leaving the remaining 16.2 B non-SRAM transistors on 350 mm2, for a density of 46.3 MT/mm2. I don't think so.
And in case you still doubt it, SRAM is the densest way to pack transistors.

Or else TSMC's transistor density estimates aren't based on SRAM cells alone, but assume a certain proportion of memory vs logic vs PHYs and whatnot.
For what it's worth, Hexus says it's N7.

https://hexus.net/tech/news/cpu/144154-graphcore-ipu-machine-m2000-1u-blade-capable-1petaflop/

Voxilla · Jul 16, 2020

Kaotik said:
Or else TSMC's transistor density estimates aren't based on SRAM cells alone, but assume a certain proportion of memory vs logic vs PHYs and whatnot.
For what it's worth, Hexus says it's N7.

https://hexus.net/tech/news/cpu/144154-graphcore-ipu-machine-m2000-1u-blade-capable-1petaflop/

No, that would be ridiculous. The first thing tried on a new process is always some SRAM, and the density is specified based on that.
Graphcore says: "...Graphcore Colossus™ Mk2 GC200 IPU. Developed using TSMC’s latest 7nm process"
Though they don't say anywhere that it is N7+, it doesn't take a genius to figure that out.
If it proves not to be N7+, I'll buy you a beer, or maybe something stronger.

Deleted member 13524 · Jul 16, 2020

900MB of SRAM.
Wow... imagine having that as a framebuffer.

Kaotik · Jul 16, 2020

Voxilla said:
No, that would be ridiculous. The first thing tried on a new process is always some SRAM, and the density is specified based on that.
Graphcore says: "...Graphcore Colossus™ Mk2 GC200 IPU. Developed using TSMC’s latest 7nm process"
Though they don't say anywhere that it is N7+, it doesn't take a genius to figure that out.
If it proves not to be N7+, I'll buy you a beer, or maybe something stronger.

I'm not an expert on this at any level, but let's see:
https://en.wikichip.org/wiki/7_nm_lithography_process#Industry
N7 with HD cells is supposed to offer around 91-92 MTrans/mm^2. We also know that with HD cells one SRAM cell is 0.027 µm^2, which fits into 1 mm^2 37 million times.
Since 1 SRAM cell is actually 6 transistors, that 37 million times turns into 222 million transistors.
So if the density was reported on just SRAM cells, it would be 222 MTrans/mm^2, not 91-92 MTrans/mm^2 for the HD cells, right?
(For what it's worth, wikichip did feel the need to point out that N7 can offer really dense SRAM cells)
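The cell-area arithmetic above can be written out as a small sketch. It assumes a 6T cell and 1 mm^2 = 10^6 µm^2, and it deliberately ignores array periphery (decoders, sense amps, I/O), which is likely one reason the result lands far above the 91-92 MTrans/mm^2 library figure; `bitcell_density` is an illustrative name.

```python
def bitcell_density(cell_area_um2: float, transistors_per_cell: int = 6):
    """Return (million cells per mm^2, million transistors per mm^2)
    for an idealized array of bit cells with no periphery at all."""
    cells_per_mm2 = 1e6 / cell_area_um2   # 1 mm^2 = 10**6 um^2
    return cells_per_mm2 / 1e6, cells_per_mm2 * transistors_per_cell / 1e6


if __name__ == "__main__":
    cells, mt = bitcell_density(0.027)   # N7 HD SRAM cell size quoted from wikichip
    print(f"{cells:.1f} M cells/mm^2 -> {mt:.1f} MTrans/mm^2")
```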

Voxilla · Jul 16, 2020

Kaotik said:
I'm not an expert on this at any level, but let's see:
https://en.wikichip.org/wiki/7_nm_lithography_process#Industry
N7 with HD cells is supposed to offer around 91-92 MTrans/mm^2. We also know that with HD cells one SRAM cell is 0.027 µm^2, which fits into 1 mm^2 37 million times.
Since 1 SRAM cell is actually 6 transistors, that 37 million times turns into 222 million transistors.
So if the density was reported on just SRAM cells, it would be 222 MTrans/mm^2, not 91-92 MTrans/mm^2 for the HD cells, right?
(For what it's worth, wikichip did feel the need to point out that N7 can offer really dense SRAM cells)

Apparently you can't just compute density from SRAM cell size.
For example, in this table a 14 nm SRAM cell is 0.05 um2, which would be 6x20 MT/mm2, but as the table shows it is actually only 6x13.7 MT/mm2.
In practice it must be even less: for the 7nm Samsung chip with a 0.026 um2 cell size on that same page, the 256 Mbit SRAM is 69.3 mm2, which equates to a mere 22 MT/mm2. If anybody has access to this paper, there might be more information there.
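The gap between ideal bit-cell density and a real SRAM macro can be sketched with the figures quoted in the post above. The sketch treats 256 Mbit as 256 × 10^6 bits, which is what reproduces the 22 MT/mm2 figure (with 2^20 bits per Mbit it would be about 23 MT/mm2), and assumes 6T cells; the helper names are illustrative. On those numbers, the bit cells themselves would cover under 10% of the quoted 69.3 mm2.

```python
def bitcell_limit_mt(cell_area_um2, transistors_per_cell=6):
    """Ideal MT/mm^2 if the whole area were tiled with bit cells."""
    return (1e6 / cell_area_um2) * transistors_per_cell / 1e6


def array_density_mt(bits, area_mm2, transistors_per_cell=6):
    """MT/mm^2 counting only bit-cell transistors over the quoted area."""
    return bits * transistors_per_cell / area_mm2 / 1e6


def bitcell_area_fraction(bits, cell_area_um2, area_mm2):
    """Share of the quoted area that the bit cells themselves would cover."""
    return bits * cell_area_um2 / 1e6 / area_mm2   # um^2 -> mm^2


if __name__ == "__main__":
    print(f"14 nm, 0.05 um2 cell: ideal {bitcell_limit_mt(0.05):.0f} MT/mm2")
    print(f"7 nm, 0.026 um2 cell: ideal {bitcell_limit_mt(0.026):.0f} MT/mm2")
    bits = 256e6      # 256 Mbit, decimal interpretation
    area = 69.3       # mm^2, as quoted
    print(f"256 Mbit in {area} mm2: {array_density_mt(bits, area):.1f} MT/mm2")
    print(f"bit-cell share of that area: {bitcell_area_fraction(bits, 0.026, area):.1%}")
```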

Megadrive1988 · Jul 16, 2020

ToTTenTranz said:
900MB of SRAM.
Wow... imagine having that as a framebuffer.

Holy shit that's a lot!

A1xLLcqAgt0qc2RyMz0y · Jul 20, 2020

This is the "Nvidia Ampere Discussion" thread.

Can the mods please remove the off-topic posts?

Deleted member 2197 · Jul 24, 2020

"A100 has now become the fastest GPU ever recorded on #OctaneBench: 446 OB4* #Ampere appears to be ~43% faster than #Turing in #OctaneRender - even w/ #RTX off!"

https://twitter.com/x/status/1286448029600374784

techuse · Jul 24, 2020

pharma said:
"A100 has now become the fastest GPU ever recorded on #OctaneBench: 446 OB4* #Ampere appears to be ~43% faster than #Turing in #OctaneRender - even w/ #RTX off!"

https://twitter.com/x/status/1286448029600374784

Is CUDA 11 also used for the Turing results? Nvidia has a history of not using newer versions of CUDA for older GPUs, because doing so would shrink some of the performance deficit.

PSman1700 · Jul 24, 2020

pharma said:
"A100 has now become the fastest GPU ever recorded on #OctaneBench: 446 OB4* #Ampere appears to be ~43% faster than #Turing in #OctaneRender - even w/ #RTX off!"

https://twitter.com/x/status/1286448029600374784

Yes, 50% faster is what I've been reading in many places now. Quite good, I think.

Deleted member 2197 · Jul 24, 2020

The OctaneBench results are comparing A100 (no RT cores available) vs the best Turing result ~~(with RT cores)~~.

dorf · Jul 24, 2020

pharma said:
The OctaneBench results are comparing A100 (no RT cores available) vs the best Turing result (with RT cores).

Does this benchmark even use RT cores?

Looking at the OctaneBench results page, the 1080 Ti is tied with the 2080 and the Titan V easily beats the Titan RTX.

https://render.otoy.com/octanebench/results.php?v=4.00&sort_by=avg&filter=&singleGPU=1
