Those HBM GPUs have power-of-two numbers of slices and channels, though, so sharing a slice is relatively easy.

The general rule is a slice per channel, though there's precedent for this not being the case. The Xbox One X is one example, as are the 4-stack HBM GPUs (Fury, Radeon VII, Arcturus), going by the driver values for the number of texture channel caches, which has been referenced elsewhere as representing the number of slices.
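As a quick illustration of why power-of-two counts make this cheap, here's a minimal sketch (the 256B stripe size, the 16-channel count, and the function names are all assumptions for illustration, not taken from any driver): the channel and slice index fall out of a bitmask on the address rather than a divide.

#include <stdint.h>

/* Assumed values for illustration: 256B stripes, 16 channels
 * (power of two), one L2 slice per channel. */
#define STRIPE_BYTES  256u
#define NUM_CHANNELS  16u

/* Power-of-two channel count: the index is just a bit field of the
 * address, no divider needed. */
static inline uint32_t channel_of(uint64_t addr)
{
    return (uint32_t)((addr / STRIPE_BYTES) & (NUM_CHANNELS - 1));
}

/* Slice per channel: the slice index is the same bit field. */
static inline uint32_t slice_of(uint64_t addr)
{
    return channel_of(addr);
}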
Having 20 slices on its own should be fine. The odd data point is the supposed leak of certain architectural values for the big RDNA2, which lists the count at 16.
16 slices to 20 channels would require, say, groups of 4 L2 slices bound to 5 channels (16 and 20 divided by their GCF of 4), each group using a local 4:5 crossbar so that memory blocks can be striped at the same granularity (256B?).
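Purely to illustrate that arithmetic, here's a rough sketch under assumed parameters (256B stripes, and a guessed round-robin slice selection within each group; none of this is confirmed): 20 channels split into 4 groups of 5, each group fronted by 4 L2 slices through a local 4:5 crossbar.

#include <stdint.h>

/* Leaked counts plus assumed parameters, for illustration only. */
#define STRIPE_BYTES     256u
#define NUM_CHANNELS     20u
#define NUM_SLICES       16u
#define GROUPS           4u                        /* GCF of 16 and 20 */
#define CHANNELS_PER_GRP (NUM_CHANNELS / GROUPS)   /* 5 */
#define SLICES_PER_GRP   (NUM_SLICES / GROUPS)     /* 4 */

/* Stripe the address space across all 20 channels. */
static inline uint32_t channel_of(uint64_t addr)
{
    return (uint32_t)((addr / STRIPE_BYTES) % NUM_CHANNELS);
}

/* Each group of 4 slices covers 5 channels through a local 4:5
 * crossbar.  How a stripe picks a slice within its group is a guess;
 * rotating on higher address bits is one option that lets every
 * channel reach all 4 slices of its group. */
static inline uint32_t slice_of(uint64_t addr)
{
    uint64_t stripe = addr / STRIPE_BYTES;
    uint32_t group  = (uint32_t)(stripe % NUM_CHANNELS) / CHANNELS_PER_GRP;
    uint32_t local  = (uint32_t)((stripe / NUM_CHANNELS) % SLICES_PER_GRP);
    return group * SLICES_PER_GRP + local;
}

The % 20 is the part that costs real hardware; it's the divider that the power-of-two cases above avoid.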
That's how I read it too. The whitepaper also said:

"Accesses from any of the L0 caches (instruction, scalar, or vector data) proceed to the graphics L1."

"a write to any line in the graphics L1 will invalidate that line and hit in the L2 or memory. There is an explicit bypass control mode so that shaders can avoid putting data in the graphics L1."

"Each shader array comprises 10-20 different agents that request data, but from the perspective of the L2 cache, only the graphics L1 is requesting data."

My interpretation is that the L1 cache controller evaluates requests and passes misses on to the L2. The various modes that bypass the L1 don't seem to bypass the controller; they just control whether the L1's storage will be used to service the request or whether it needs to invalidate data at the same time. Skipping the L1 means the cache itself isn't used, but the controlling logic would be using the same paths to get to the L2.
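To make that interpretation concrete, here's a minimal sketch (all names and the stub cache are mine, not AMD's) of a controller that sits on the path to the L2 in every mode: bypass only decides whether the L1 array is consulted, and writes still invalidate any matching L1 line.

#include <stdbool.h>
#include <stdint.h>

/* All names here are illustrative, not AMD's. */
typedef struct { uint64_t addr; bool is_write; bool bypass_l1; } Request;

/* Stubs standing in for the L1 array and the L2 interface. */
static bool l1_lookup(uint64_t addr)        { (void)addr; return false; }
static void l1_fill(uint64_t addr)          { (void)addr; }
static void l1_invalidate(uint64_t addr)    { (void)addr; }
static void forward_to_l2(const Request *r) { (void)r; }

/* Sketch of the graphics L1 controller: every request from the L0s
 * goes through it, whether or not the L1 storage itself is used. */
static void gl1_controller(const Request *r)
{
    if (r->is_write) {
        /* Writes invalidate any matching L1 line and go on to L2/memory. */
        l1_invalidate(r->addr);
        forward_to_l2(r);
        return;
    }

    if (r->bypass_l1) {
        /* Bypass mode: skip the L1 array, but the controller still
         * uses the same path to reach the L2. */
        forward_to_l2(r);
        return;
    }

    if (!l1_lookup(r->addr)) {
        forward_to_l2(r);   /* miss: fetch the line from the L2 */
        l1_fill(r->addr);   /* and keep a copy in the L1 array  */
    }
}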