AMD: Navi Speculation, Rumours and Discussion [2019-2020]

 
Not the same. The same gaming performance. They're sacrificing something here, and it's compute throughput.



How can "faster than 2070" at 200W be 40% behind nvidia in efficiency, if there's a 12.5% difference in power consumption?

Are you perhaps comparing real 7nm GPUs from AMD with imaginary 7nm GPUs from nvidia?
Well, if there are two companies that reliably get at least the nominal improvements from node changes, and usually more, they are Apple and Nvidia. It's quite logical that the hypothetical product Nvidia launches at 7nm and 175 watts will be way faster than a 2080, surely nearer to a 2080 Ti going by their last node changes. Add RT cores on top of that... they even got efficiency gains from the 14nm to 12nm transition!
 
Where have you seen the 200W number?
200W is just Vega 64's 300W with 50% efficiency uplift, so 300/1.5 = 200W.

RX 5700 should have performance close to Vega 64's, since AMD themselves compared it to an RTX 2070.

Looking at the framerates they got during the presentation (80 to 120 fps), I'm guessing they benchmarked Strange Brigade at 2560*1440 Ultra:

(image: Strange Brigade benchmark chart)



So its performance is similar to a Vega 64, perhaps closer to a Liquid Edition.


Well, if there are two companies that reliably get at least the nominal improvements from node changes, and usually more, they are Apple and Nvidia. It's quite logical that the hypothetical product Nvidia launches at 7nm and 175 watts will be way faster than a 2080, surely nearer to a 2080 Ti going by their last node changes. Add RT cores on top of that... they even got efficiency gains from the 14nm to 12nm transition!
Is nvidia going to release 7nm GPUs at all?
The last rumor I heard said they were going to skip this 7nm generation and go with Samsung's 7nm EUV in mid-2020.
And then by mid-2020 we might not even be looking at Navi from AMD's side, which is why we don't usually compare real GPUs with what-if imaginary ones.
 

Thanks, that makes sense.

However, I don't think we can conclude that Navi is either 40% or 12.5% behind Nvidia in efficiency, just that it is indeed behind, since the RTX 2070 is 12nm and Navi is 7nm. We have no idea how much of that efficiency increase comes from the process and how much from the architecture. Granted, AMD says it has 25% better IPC, but AFAIK that only applies to the CUs, not the front end? I guess we'll see soon anyway.
 
If you look at evolution, you call something a different species right near the branching point, not only once it no longer looks like its sibling.
I see no problem with naming the new branch RDNA. It's going to evolve into something rather different, I presume. If you are forced to drop RT hardware into it, why would you waste the RT silicon on a wide vector processor (a massive coherent parallel compute line)? Afterwards you play with non-coherent memory access and maybe smaller vector widths, and you're fully in RDNA and rather different from GCN, which gets divergent optimizations too.
If you embrace the force fully, you go full Lego, resulting in an entire clade of possible configurations.

edit: OK, now I get the moniker RDNA.
 
Dr Su stressed that Navi is a purposefully designed gaming GPU and that they started with a clean slate, like they did with Ryzen.

"AMD loves Gamers..."
Dr Su is a gamer herself and has been repeating that same mantra for over a year now, especially at gaming events. She just mentioned that Navi is purposely engineered for gaming, that Navi will be used in consoles etc., and that the RadeonDNA architecture was designed to scale.

Navi = RDNA = Gaming



If at 275mm2 Navi has RTX 2070 prowess, then what does a 325mm2 Navi chip have in store for us? (And what is the size of TU104 again?)
 
On reddit, a user superimposed the pic on the leaked PCB and they're getting a 252mm2 die size.


If AMD has doubled the front end, a bigger Navi shouldn't be much bigger than 400mm2 with double the shader count. Or maybe it triples it, at the limit for current GCN chips?
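As a back-of-the-envelope check on that 400mm2 figure (every input here is an assumption, including the share of the die taken by the shader array, so treat it as a sketch rather than a measurement):

```cpp
// Hypothetical die-size scaling sketch -- every input is an assumption.
#include <cstdio>

int main() {
    const double small_navi_mm2  = 252.0; // reddit-based estimate mentioned above
    const double shader_fraction = 0.60;  // assumed share of the die taken by CUs (a guess)

    double shader_mm2 = small_navi_mm2 * shader_fraction;
    double other_mm2  = small_navi_mm2 - shader_mm2;

    // Double only the shader array; front end, memory PHYs etc. held constant
    // (optimistic -- a wider memory bus would add more area).
    double big_navi_mm2 = other_mm2 + 2.0 * shader_mm2;

    std::printf("Hypothetical big Navi: ~%.0f mm2\n", big_navi_mm2); // ~403 mm2
    return 0;
}
```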

Did AMD mention the settings for the Strange Brigade benchmark? Nvidia released a driver in April that improved Turing performance on Vulkan:

https://www.nvidia.com/en-us/geforce/news/mortal-kombat-11-game-ready-driver/
 
No.
The game is also an odd choice to demo a µarch that, for once, isn't about throwing SPs at the problem.
 
Typically the reduction in ALU throughput is more severe than in other aspects (fillrate, bandwidth) for the salvage GPU...
Does it need to be "balanced"...
For GPUs in this size range the salvage bin would usually lose one CU per engine, and going by things like the RX 480/470 and Vega 64/60 the percentage isn't as problematic.
As for whether balanced disabling is strictly necessary, I am not sure. It seems like it's virtually always done this way, even though I'm sure there are many GPUs with 4 CUs disabled that had fewer than 4 defective.
I could see arguments for streamlining the number of salvage SKUs and differentiating the products more clearly, and for load-balancing reasons where having one SE significantly unbalanced with the others might cause complications.
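To put a number on "the percentage isn't as problematic", here is the quick arithmetic (the 40 CU Navi case is hypothetical, just following the one-CU-per-engine pattern above):

```cpp
// Fraction of CUs lost when a salvage bin drops one CU per shader engine
// (4 shader engines assumed, matching the examples in the post above).
#include <cstdio>

int main() {
    std::printf("RX 480 -> RX 470 (36 -> 32 CUs): %.1f%% lost\n", 100.0 * 4 / 36); // ~11.1%
    std::printf("Vega 64 -> 60 CUs:               %.1f%% lost\n", 100.0 * 4 / 64); // ~6.3%
    // Hypothetical 40 CU Navi losing 1 CU per engine (assumption, not a known SKU):
    std::printf("40 -> 36 CUs:                    %.1f%% lost\n", 100.0 * 4 / 40); // 10.0%
    return 0;
}
```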

Now imagine how a software renderer would do this: e.g. threads spawned across arbitrary processors to match the workload, with data locality being a parameter of the spawn algorithm and cost functions tuned to the algorithms.
The choice is in how those threads understand which part of the workload applies to them. If there are multiple back-end threads, are those threads, or sibling threads on the same core, going to perform the front-end work, or is there going to be a subset of threads/cores devoted to front-end work that produces coverage and culling results for the other threads to consume? Defining the number of producers and consumers, and then figuring out how the results are communicated, strikes me as a more fundamental consideration than the specifics of an API.
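As a toy illustration of that producer/consumer choice (nothing here is AMD's design; the work item, thread counts and queue are invented for the sketch), a couple of "front end" threads bin primitives into tiles and publish coverage work that "back end" threads consume:

```cpp
// Toy producer/consumer split between "front end" and "back end" threads.
// Everything here (work item layout, thread counts, queue) is illustrative only.
#include <condition_variable>
#include <cstdio>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

struct TileWork {           // hypothetical unit of binned coverage work
    int tile_id;
    int primitive_count;
};

std::queue<TileWork> work_queue;
std::mutex queue_mutex;
std::condition_variable queue_cv;
bool producers_done = false;

// "Front end": bins primitives into screen-space tiles and publishes work.
void front_end(int first_tile, int last_tile) {
    for (int t = first_tile; t < last_tile; ++t) {
        TileWork w{t, (t * 37) % 100 + 1};       // fake culling/binning result
        {
            std::lock_guard<std::mutex> lock(queue_mutex);
            work_queue.push(w);
        }
        queue_cv.notify_one();
    }
}

// "Back end": consumes tile work and shades it.
void back_end(int id) {
    for (;;) {
        std::unique_lock<std::mutex> lock(queue_mutex);
        queue_cv.wait(lock, [] { return !work_queue.empty() || producers_done; });
        if (work_queue.empty()) return;          // all producers finished
        TileWork w = work_queue.front();
        work_queue.pop();
        lock.unlock();
        std::printf("backend %d shades tile %d (%d prims)\n",
                    id, w.tile_id, w.primitive_count);
    }
}

int main() {
    std::vector<std::thread> threads;
    threads.emplace_back(front_end, 0, 8);       // 2 producers split the screen
    threads.emplace_back(front_end, 8, 16);
    for (int i = 0; i < 4; ++i)                  // 4 consumers
        threads.emplace_back(back_end, i);

    threads[0].join();                           // wait for the producers...
    threads[1].join();
    {
        std::lock_guard<std::mutex> lock(queue_mutex);
        producers_done = true;
    }
    queue_cv.notify_all();                       // ...then release the consumers
    for (size_t i = 2; i < threads.size(); ++i) threads[i].join();
    return 0;
}
```

Whether front-end and back-end work share cores or get dedicated ones then comes down to how many of each thread type you spawn and where you pin them.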

The problem with hardware is partly that the data pathways have to be defined in advance for all use cases and have to be sized for a specified peak workload. So the data structures are fixed, the buffer sizes are fixed and the ratios of maximum throughput are fixed.
If there's a crossbar between front ends, any shader engine could produce output relevant to any other engine and be the consumer of output from any other. The lack of guarantee about where any given primitive may be and the way that screen-space is tiled give the problem those parameters.
Losing the front-end crossbar places the onus on a heavier and invariant front-end load on all shader engines, and the reliance on the memory subsystem and its crossbar. The memory crossbar has its own burdens with regards to clients and their needs, and its considerations for connectivity and capacity are made heavier. At least in that ASCII rumor table, I'm wary of the individual shader engines not having the traditional amount of grunt to get through the more predictable but heavier base demand.

This would be similar to how unified shader architectures took over. To do that, substantial transistor budget was spent, but the rewards in performance were unignorable. Despite the fact that the hardware was no longer optimised specifically for the API of that time. Remember, for example, how vertex ALUs and pixel ALUs were different in their capability (ISA) before unified GPUs took over? (Though the API itself was changing to harmonise VS and PS instructions.)
The API was mostly agnostic to the details of how its instructions were carried out. The biggest hardware difference was the disparate precision requirements between pixel and vertex shaders, which in modern times has returned with the advent of FP16 pixel processing that the APIs did not drive.


Is anyone trying to measure that Navi's size?
I'm actually betting on the 14nm Vega parts as the comparison point.

If they were comparing to Radeon VII, then the +50% efficiency comparison would be much closer to the +25% IPC.

My guess is the architectural improvements are coming at a very low power cost, so they're getting (rough multiplication below):
- 1.25x more performance out of the new cache hierarchy and CU adjustments
- 1.2x higher clocks out of the 14nm -> 7nm transition (which is the clock difference between a 1.45GHz Vega 64 and a 1.75GHz Radeon VII)
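Multiplying those two factors out is all the 1.5x claim needs; a quick sketch (both factors are the guesses above, not AMD-confirmed numbers):

```cpp
// Rough perf/W arithmetic for the guess above: IPC gain x clock gain at similar power.
#include <cstdio>

int main() {
    const double ipc_gain   = 1.25;          // assumed architectural gain (AMD's +25% IPC claim)
    const double clock_gain = 1.75 / 1.45;   // Radeon VII vs Vega 64 clock, ~1.21x

    // If power stays roughly flat, perf/W scales with the product.
    std::printf("Implied perf/W uplift: %.2fx\n", ipc_gain * clock_gain); // ~1.51x
    return 0;
}
```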
The GDDR6 subsystem would have a greater area and power impact, which may obscure some of the benefits. Depending on how AMD handles the memory channels and infinity fabric, there could be an area and power cost commensurate to the bandwidth and number of fabric stops.
A 256-bit GDDR6 system has 8 chips, albeit with 2 16-bit channels each. A controller block could handle multiple channels, but the physical locality would be worse for the same amount of bandwidth compared to Vega. Vega 10 has its fabric and controllers on one side, while a GDDR6 system likely has blocks around 3 or 4 sides, and if they have Vega's way of connecting clients with a strip of infinity fabric, the area penalty might be closer to how Vega 20 has a significant amount of area around the GPU belonging to uncore elements like the fabric.
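For reference on the "same amount of bandwidth" point, here is the raw channel-count and peak-bandwidth arithmetic for such a setup (the 14 Gbps GDDR6 data rate is an assumption; the Vega 64 HBM2 figure is its stock spec):

```cpp
// Channel count and peak bandwidth for a 256-bit GDDR6 setup vs Vega 10's HBM2.
#include <cstdio>

int main() {
    // 8 GDDR6 chips x 32 bits each = 256-bit bus, with 2 x 16-bit channels per chip.
    const int chips = 8;
    std::printf("GDDR6 channels: %d x 2 = %d\n", chips, chips * 2);

    const double gddr6_gbps = 14.0;                    // assumed per-pin data rate
    std::printf("GDDR6 bandwidth: %.0f GB/s\n", 256 * gddr6_gbps / 8);          // 448 GB/s

    const double hbm2_gbps = 1.89;                     // Vega 64: 2048-bit @ ~945 MHz DDR
    std::printf("Vega 10 HBM2 bandwidth: ~%.0f GB/s\n", 2048 * hbm2_gbps / 8);  // ~484 GB/s
    return 0;
}
```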

Best guess from Andrei Frumusanu based on some other photos we have: 275mm2, +/- 5mm2.
For power efficiency, it would seem to be against a 14nm Vega 10 product. Lisa's specific words on the subject:

"And then, when you put that together, both the architecture – the design capability – as well as the process technology, we're seeing 1.5x or higher performance per watt capability on the new Navi products" (emphasis mine)
That puts Navi into a more realistic context. It's a necessary upwards step that I think will require that AMD keep closer to its promised cadence with Next-gen in order to provide a sufficiently compelling product pipeline going forward.

For all the times she's gone on stage, you'd think we'd just have the physical measurements for Lisa's hand for easier size analyses. ;)
Given how many fans seem to imagine a level of familiarity with AMD's CEO, you'd think there's even money on at least one of them already having 3D-printed replicas of it--for chip photo comparison reasons they'd say.

They might also be contractually obligated to keep Navi details under wraps until one or two console makers spill the beans on their next-gen's specs, during or around E3.

Just like they didn't disclose Vega 12's details until Apple announced their new MacBook Pros with the Pro Vega 16/20, and didn't disclose the TrueAudio DSP in March 2013's Bonaire until the PS4 launched in November of the same year.
If I recall correctly, AMD didn't disclose Bonaire was a next-gen Sea Islands GPU until after the consoles were announced.

What does this info mean for Navi?
Going from the differences:
"// First/last AMDGCN-based processors.
EF_AMDGPU_MACH_AMDGCN_FIRST = EF_AMDGPU_MACH_AMDGCN_GFX600,
- EF_AMDGPU_MACH_AMDGCN_LAST = EF_AMDGPU_MACH_AMDGCN_GFX909,
+ EF_AMDGPU_MACH_AMDGCN_LAST = EF_AMDGPU_MACH_AMDGCN_GFX1010,"

GFX6 was the first GCN architecture, and whatever the most recent GCN chip is gets put in the _LAST line.
It seems like the compiler developers do not consider it separate from its predecessors. At least from the standpoint of instruction generation, it has new features but still does the same general things.
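That first/last pair is just the bounds of a range check; a simplified mock of the pattern (the numeric values below are placeholders, not the real ELF.h constants) would be:

```cpp
// Simplified mock of the AMDGPU ELF machine-range check implied by the diff above.
// The numeric values are placeholders, NOT the real EF_AMDGPU_MACH_* constants.
#include <cstdio>

enum EF_AMDGPU_MACH {
    EF_AMDGPU_MACH_AMDGCN_GFX600  = 0x20,   // placeholder value
    EF_AMDGPU_MACH_AMDGCN_GFX909  = 0x31,   // placeholder value
    EF_AMDGPU_MACH_AMDGCN_GFX1010 = 0x33,   // placeholder value

    // First/last AMDGCN-based processors, as in the quoted change.
    EF_AMDGPU_MACH_AMDGCN_FIRST = EF_AMDGPU_MACH_AMDGCN_GFX600,
    EF_AMDGPU_MACH_AMDGCN_LAST  = EF_AMDGPU_MACH_AMDGCN_GFX1010,
};

bool isAMDGCN(unsigned mach) {
    return mach >= EF_AMDGPU_MACH_AMDGCN_FIRST &&
           mach <= EF_AMDGPU_MACH_AMDGCN_LAST;
}

int main() {
    std::printf("gfx1010 treated as GCN family: %s\n",
                isAMDGCN(EF_AMDGPU_MACH_AMDGCN_GFX1010) ? "yes" : "no");
    return 0;
}
```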
 
If you look at evolution, you call something a different species right near the branching point, not only once it no longer looks like its sibling.
I see no problem with naming the new branch RDNA. It's going to evolve into something rather different, I presume. If you are forced to drop RT hardware into it, why would you waste the RT silicon on a wide vector processor (a massive coherent parallel compute line)? Afterwards you play with non-coherent memory access and maybe smaller vector widths, and you're fully in RDNA and rather different from GCN, which gets divergent optimizations too.
If you embrace the force fully, you go full Lego, resulting in an entire clade of possible configurations.

edit: OK, now I get the moniker RDNA.
I like this analogy, because that's the way we do things in the real world. AMD says they rebuilt the whole thing from the ground up; that's fine, they absolutely can, and it can still end up looking very similar to GCN. Perhaps they've discarded or changed things they didn't like and made some foundational changes to certain items for the future, but it may still end up looking like GCN.

There's nothing wrong with this concept. A lot of our tools can be rebuilt from the ground up, but a hammer is a hammer: you can revisit the hammer and make things stronger here and lighter there, but at the end of the day it's largely going to look the same. I don't believe in the idea that an all-new, ground-up architecture would have to look wildly different. That doesn't make a lot of sense to me; if you've got something great that's working really well in some areas, why toss those ideas aside? Graphics pipelines are moving towards general compute now and away from fixed function, so how many different ways can we realistically slice and dice a GPU when there are still specific objectives they want to hit?
 
At least in that ASCII rumor table, I'm wary of the individual shader engines not having the traditional amount of grunt to get through the more predictable but heavier base demand.
There clearly will be an impact if this is true. Spending extra transistors on, amongst other things, adding CUs to compensate for the lack of global functionality seems entirely rational to me. Worst-case performance should be improved by softening (if not removing) global bottlenecks.

NVidia made this investment a long time ago. It's about time AMD caught up.
 
Best guess from Andrei Frumusanu based on some other photos we have: 275mm2, +/- 5mm2.
For power efficiency, it would seem to be against a 14nm Vega 10 product. Lisa's specific words on the subject:

"And then, when you put that together, both the architecture – the design capability – as well as the process technology, we're seeing 1.5x or higher performance per watt capability on the new Navi products" (emphasis mine)

If it's just 1.5x performance per watt, then a theoretical 80 CU / 512-bit-bus big Navi would have to be downclocked.

If we go by the 25% "IPC" increase on the Strange Brigade numbers, then we'd need a clock speed of around or over 2 GHz to hit, say, 110 fps with a 40 CU card, and a 32 CU card doesn't make sense for performance or die size. That's pretty high for a FinFET GPU, but then maxing out clock speeds regardless of (desktop) TDP has been AMD's own stated strategy, so that's hardly surprising.
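Roughly the arithmetic behind that 2 GHz figure, as a sketch only: the Vega 64 baseline frame rate and the assumption that performance scales linearly with CUs x clock x IPC are both assumptions, not measured data.

```cpp
// Back-of-the-envelope clock needed for ~110 fps on a 40 CU part,
// assuming performance scales as CUs x clock x IPC (a big simplification).
#include <cstdio>

int main() {
    // Assumed Vega 64 baseline in this benchmark (not a measured number).
    const double base_fps   = 90.0;
    const double base_cus   = 64.0;
    const double base_clock = 1.40;  // GHz, rough sustained Vega 64 clock
    const double ipc_gain   = 1.25;  // AMD's claimed +25% "IPC"

    const double target_fps = 110.0;
    const double navi_cus   = 40.0;

    double needed_clock = base_clock * (target_fps / base_fps)
                        * (base_cus / navi_cus) / ipc_gain;
    std::printf("Clock needed: ~%.2f GHz\n", needed_clock);  // ~2.19 GHz
    return 0;
}
```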
 
Going from the differences:
"// First/last AMDGCN-based processors.
EF_AMDGPU_MACH_AMDGCN_FIRST = EF_AMDGPU_MACH_AMDGCN_GFX600,
- EF_AMDGPU_MACH_AMDGCN_LAST = EF_AMDGPU_MACH_AMDGCN_GFX909,
+ EF_AMDGPU_MACH_AMDGCN_LAST = EF_AMDGPU_MACH_AMDGCN_GFX1010,"

GFX6 was the first GCN architecture, and whatever the most recent GCN chip is gets put in the _LAST line.
It seems like the compiler developers do not consider it separate from its predecessors. At least from the standpoint of instruction generation, it has new features but still does the same general things.

So it means Navi is still GCN (despite the RDNA moniker), and the "Next-gen" slated for 2020 on AMD's earnings-call roadmap is the real next gen. Thanks.

I like this analogy, because that's the way we do things in the real world. AMD says they rebuilt the whole thing from the ground up; that's fine, they absolutely can, and it can still end up looking very similar to GCN. Perhaps they've discarded or changed things they didn't like and made some foundational changes to certain items for the future, but it may still end up looking like GCN.

Didn't they say the same about Polaris and then Vega? I recall Bridgman talking about how they rebuilt every block in Polaris to get different Lego...
 
No.
The game is also an odd choice to demo a µarch that, for once, isn't about throwing SPs at the problem.
I think so too, about the odd choice. If Navi is completely revamped and focused on gaming, I would not use a benchmark where Vega already shines, but something where the previous µarch does not do so great, in order to fully highlight the improvements that were made. Makes me wonder.
 