AMD: Navi Speculation, Rumours and Discussion [2019-2020]

Status
Not open for further replies.
Is a big Navi (64 CU or more) really coming? I mean, they are already at 225 W with the 40 CU version on 7nm; I don't get how they can make a bigger chip without it consuming around 300 W...

The prices are what I predicted; AMD needs to make money. nVidia will have to move the 2060 and 2070 a bit, but it's still not a threat.

Meh, waiting for the "true" or "full" next gen now. Navi 10 seems, like always, 1 or 2 years late...
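
As a rough sanity check on the power worry above, here is a back-of-the-envelope sketch. It assumes board power scales linearly with CU count at unchanged clocks and voltage, which is a simplification (memory, fixed-function blocks and binning are all ignored). The 225 W and 40 CU figures are the ones quoted in the post; the function name is made up for illustration.

```python
def naive_board_power(base_watts: float, base_cus: int, target_cus: int) -> float:
    """Linear-in-CU estimate. This is an assumption, not a measured scaling law."""
    return base_watts * target_cus / base_cus


# Figures from the post: 225 W for the 40 CU part, hypothetical 64 CU "big Navi".
estimate_64cu = naive_board_power(225, 40, 64)
print(f"64 CU at the same clocks: ~{estimate_64cu:.0f} W")

# How far per-CU power would have to drop to fit a 300 W budget instead.
budget_watts = 300
required_fraction = budget_watts / estimate_64cu
print(f"Per-CU power needed for {budget_watts} W: {required_fraction:.0%} of the 40 CU part's")
```

Under that naive model a 64 CU part would overshoot 300 W, so a bigger chip would presumably need lower clocks or voltage per CU to stay in that range.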
 
Big edit - Going over this again is slightly confusing. The block diagram clearly shows 2 shader engines. Each Shader Engine has 10 "Dual compute units".

[Image: Navi block diagram (16-1080.9ce6ffcb.jpg)]


Now are these dual compute units the mixed wavefront thing they're doing? So is 1 "Dual Compute Unit" = two 32-thread (stream processor) units, i.e. one 64 SP unit? If so then the 5700 would need two of the block diagrams shown to match the given numbers.

Or these dual compute units are 2 64 SP units, and the block diagram represents a complete 5700. I'm honestly not sure which one it is, the terminology doesn't match up here.
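
The two readings can be checked with simple arithmetic. This is only a sketch using numbers that appear in the thread (2 shader engines, 10 dual compute units each, 64 SPs per CU, and the 40 CU figure for the full chip); the variable and function names are made up for illustration.

```python
SHADER_ENGINES = 2          # from the block diagram described above
DUAL_CUS_PER_ENGINE = 10    # "Dual compute units" per shader engine
SPS_PER_CU = 64             # figure quoted elsewhere in the thread


def total_sps(sps_per_dual_cu: int) -> int:
    """Total stream processors for the diagram, given SPs per dual compute unit."""
    return SHADER_ENGINES * DUAL_CUS_PER_ENGINE * sps_per_dual_cu


# Reading A: one dual compute unit = 2 x 32 lanes = a single 64 SP unit.
reading_a = total_sps(2 * 32)
# Reading B: one dual compute unit = 2 full CUs of 64 SPs each.
reading_b = total_sps(2 * SPS_PER_CU)

print("Reading A:", reading_a, "SPs =", reading_a // SPS_PER_CU, "CUs")
print("Reading B:", reading_b, "SPs =", reading_b // SPS_PER_CU, "CUs")
```

Only reading B gives 40 CUs from a single copy of the diagram; reading A gives 20, which is why it would need the diagram twice.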
Infinity Fabric?
 
Welcome to the end of CMOS as we knew them!
Process gains are now harder to come by and even harder to realize, particularly on larger dies.

251 mm² is a large die?

What's interesting is that one CU still consists of 64 SPs, but drops the 4 × 16 SIMD vector unit arrangement in favor of the new 2 × 32 SIMDs, which should help shader utilization.

It seems that 2 CUs together (a dual compute unit) share the local data cache, the shader instruction cache and the scalar data cache.
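
To make the 4 × 16 versus 2 × 32 point concrete, here is a simplified sketch. It assumes a wavefront instruction occupies a SIMD for ceil(wave size / SIMD width) cycles and ignores scheduling, latency hiding and everything else; the function names are invented for this example.

```python
import math


def lanes_per_cu(simd_count: int, simd_width: int) -> int:
    """Total vector lanes (SPs) in one CU."""
    return simd_count * simd_width


def issue_cycles(wave_size: int, simd_width: int) -> int:
    """Cycles one wavefront instruction occupies a SIMD in this simplified model."""
    return math.ceil(wave_size / simd_width)


old_lanes = lanes_per_cu(4, 16)   # older 4 x SIMD16 layout
new_lanes = lanes_per_cu(2, 32)   # new 2 x SIMD32 layout

wave64_on_simd16 = issue_cycles(64, 16)
wave32_on_simd32 = issue_cycles(32, 32)
wave64_on_simd32 = issue_cycles(64, 32)

print("Lanes per CU, old vs new:", old_lanes, new_lanes)
print("64-wide wave on SIMD16:", wave64_on_simd16, "cycles")
print("32-wide wave on SIMD32:", wave32_on_simd32, "cycle")
print("64-wide wave on SIMD32:", wave64_on_simd32, "cycles")
```

The lane count per CU is unchanged; what changes in this model is that a 32-wide wave can go through a SIMD in a single pass instead of a 64-wide wave taking four.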
 
Given that phone SoCs and CCDs are <80mm^2, Y E S.
And phone SoCs also draw way less power which helps a lot at these nodes.

A positive observation comes from another slide though where they claim 2.3 times the performance per unit area, so it’s not all doom and gloom on the process front. Just, well, mostly.
 
Fascinating that work can be issued in 32-work-item hardware threads. I was expecting 64 and 128.

nVidia does 32-wide, and approximately all privately written GPGPU code was optimized for that. AMD needs to match it to minimize the expense of translating code between the archs if it is to have any hope of ever properly competing in the space.
 
I mean, N5/3 also offer plenty of area, just even more marginal and harder to realize perf/power uplifts.
I don’t really know enough about the properties of the 3nm options (I only know that AMD’s chief of server products said that they’ll use it, but beyond that the crystal balls go really murky).
What you say rings true for 5nm. But since it seems to offer potentially significant gain in density, products like GPUs can improve their performance per Watt by compromising frequency and dialing back power draw and still gain in absolute performance as well as in performance/W and /mm2. The process gains just won’t be as large.

(Consumer CPUs though, where sea-of-cores/flock-of-chicken approaches aren’t terribly efficient, probably won’t see a lot of improvement.)
 
But since it seems to offer potentially significant gain in density, products like GPUs can improve their performance per Watt by compromising frequency and dialing back power draw and still gain in absolute performance as well as in performance/W and /mm2
Cost per mm^2 yielded is also going up.
Is client dGPU a big enough market to amortize several sizeable N5 dies across the product stack?
 
Navi 10 has a similar number of transistors and similar performance compared to TU106, but it's not faster. It seems that ray-tracing and tensor cores don't need much space.
 
Navi 10 has a similar number of transistors and similar performance compared to TU106, but it's not faster. It seems that ray-tracing and tensor cores don't need much space.

That's a bit of a stretch, isn't it? Before making that sort of comparison you would surely need to see how the transistors themselves are organised? Nvidia might simply be able to achieve the same performance with fewer non-RT, non-tensor transistors?
 
Navi 10 has a similar number of transistors and similar performance compared to TU106, but it's not faster. It seems that ray-tracing and tensor cores don't need much space.
I don't think you can really compare the transistor usage that closely.
 
Navi 10 has a similar number of transistors and similar performance compared to TU106, but it's not faster. It seems that ray-tracing and tensor cores don't need much space.
Pascal had 7.2B transistors for 2560 SP. Turing has 10.8B for 2304 SP. That’s 66% more transistors per SP. Obviously there are other arch changes, but RT and tensor cores have a huge footprint collectively, clearly.
 
Yea, this is my first impression. GCN finally got significant changes done to its architecture. Some of those should have been in Fiji or Vega already.
To be fair, some of it was in Vega, just disabled for various reasons. I'm curious how close were those features to working, and if a silicon respin or revision would have been enough to enable the functionality.
Pascal had 7.2B transistors for 2560 SP. Turing has 10.8B for 2304 SP. That’s 66% more transistors per SP. Obviously there are other arch changes, but RT and tensor cores have a huge footprint collectively, clearly.
Pascal or Volta? Both have to be considered, especially since quite a bit of Turing seems to be Volta derived.
 
To be fair, some of it was in Vega, just disabled for various reasons. I'm curious how close were those features to working, and if a silicon respin or revision would have been enough to enable the functionality.

Pascal or Volta? Both have to be considered, especially since quite a bit of Turing seems to be Volta derived.
Volta is actually higher than Turing. 21.1B for 5120 SP.
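
For reference, the transistors-per-SP arithmetic being traded back and forth here can be reproduced as follows. The transistor and SP counts are exactly the ones quoted in the posts above and are not independently verified; the metric is crude, since it lumps caches, memory controllers, RT and tensor hardware in with the shader cores. Names in the code are illustrative.

```python
# (transistor count, SP count) as quoted in the thread
chips = {
    "Pascal": (7_200_000_000, 2560),
    "Turing": (10_800_000_000, 2304),
    "Volta": (21_100_000_000, 5120),
}

# Transistors per SP for each architecture
per_sp = {name: transistors / sps for name, (transistors, sps) in chips.items()}

for name, value in per_sp.items():
    print(f"{name}: {value / 1e6:.2f}M transistors per SP")

# Relative increase from Pascal to Turing, the "66% more" figure above
turing_vs_pascal = per_sp["Turing"] / per_sp["Pascal"] - 1
print(f"Turing vs Pascal: {turing_vs_pascal:.1%} more per SP")
```

On these numbers Turing comes out roughly two thirds above Pascal per SP, and Volta lands between the two on a per-SP basis while being the largest in absolute transistor count.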
 
Ok guys, tell me if I am crazy here: is Navi using a chiplet setup?

[Image: COMPUTEX_KEYNOTE_DRAFT_FOR_PREBRIEF.26.05.19-page-0122.jpg]

Navi uses infinity fabric:
https://pics.computerbase.de/8/8/1/1/7/16-1080.9ce6ffcb.jpg

Ryzen 3000 chiplets:
http://www.comptoir-hardware.com/images/stories/_cpu/7nm_amd/ryzen3000-package.jpg

Huh? I only see a monolithic die there?

Are you mistaking a Shader Engine for a chiplet? They are sort of AMD's version of an Nvidia GPC, broadly speaking (except that geometry is outside of it on AMD).
 