AMD: Navi Speculation, Rumours and Discussion [2019-2020]

We're comparing fmax part to fmax part here, laddo.
Don't move the goalposts.
You are getting your 50%.
That's an easy target to beat when you're comparing a three-year-old product, clocked to its max and made on a four-year-old process, to a brand-new one on a new process node.
Kudos to AMD for explicitly calling out +50% perf at iso power in the Navi10 slides.
When you compare this to Radeon VII made on the same tech process and not clocked to the limits, it's not so rosy though. Maybe +8-10% efficiency on the same node?
That's why I am sceptical about RDNA2 slides, there is not a word on testing methodology and we don't know anything about RDNA2 chip configuration.
+50% performance at iso power is still an easy target for a much wider chip at fmin frequencies; that would put it slightly below 2080 Ti-level performance at slightly higher than 2080 power consumption.
Anyway, AMD will target competing products and good efficiency at moderate frequencies might change for bad efficiency at higher frequencies in a blink of an eye.
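To put some (made-up) numbers on that, here's a toy Python model of why a wide chip run low on the voltage/frequency curve beats a narrow chip at fmax on perf/W. Every constant here is an illustrative assumption, not an AMD figure:

```python
# Toy model: why "+50% perf at iso power" is easier for a wide, slow chip.
# All constants are illustrative assumptions, not AMD figures.

def perf(cus, ghz):
    return cus * ghz                      # arbitrary performance units

def power(cus, ghz, volts):
    return cus * ghz * volts ** 2         # dynamic power ~ C * f * V^2

configs = {
    "narrow @ fmax": (40, 1.9, 1.20),     # small chip pushed up the V/f curve
    "wide @ low f":  (80, 1.3, 0.85),     # 2x the CUs, clocked well down
}

for name, (cus, ghz, volts) in configs.items():
    p, w = perf(cus, ghz), power(cus, ghz, volts)
    print(f"{name}: perf={p:6.1f}  power={w:6.1f}  perf/W={p / w:.2f}")

# The wide part ends up around 2x the perf/W of the narrow one here,
# purely from sitting lower on the voltage/frequency curve -- no
# architectural improvement needed.
```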
 
As it stands now, the majority of them are seen as meaningless fragments
That's more than actual AMD marketing has said about any Q4'20 product, so isn't it valid anyway?
fragments without any factual or source backing
We're talking about the literal MS-presented data right here right now.
That's why I am sceptical about RDNA2 slides, there is not a word on testing methodology and we don't know anything about RDNA2 chip configuration.
It's an extension of that RDNA1 slide so I doubt they'd change their methodology in-flight.
and we don't know anything about RDNA2 chip configuration.
For N21, yes we do.
4*2*5*2, aka 80 CUs, 10 per SA.
might change for bad efficiency at higher frequencies in a blink of an eye.
PS5 runs 2.2 in console power envelopes so I sincerely doubt the fmax has an elbow until like 2.4-ish.
Even RNR tops at 2.5 for fmax, which translates to 1800-something peak freq on mobile on a good bin.
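For anyone not fluent in the shorthand, this is how I'm unpacking that 4*2*5*2 (my reading of the notation, not an official breakdown):

```python
# My reading of the "4*2*5*2" shorthand (an assumption about the
# notation, not an official breakdown):
shader_engines    = 4   # SEs in N21
arrays_per_engine = 2   # shader arrays (SAs) per SE
wgps_per_array    = 5   # WGPs per SA
cus_per_wgp       = 2   # CUs per WGP (RDNA dual compute unit)

total_cus     = shader_engines * arrays_per_engine * wgps_per_array * cus_per_wgp
cus_per_array = wgps_per_array * cus_per_wgp

print(total_cus, cus_per_array)   # 80 CUs total, 10 per SA
```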
 
Is it though? Was the XSX figure measured against something along the lines of Polaris? Vega IIRC had better IPC than Polaris, and RDNA1 offered ~25% IPC over Vega, didn't it? So it could very well be a comparison to RDNA1 instead of GCN 4.
Sorry for the post-necro, but I just got around to listening to the talk fully.

Mark Grossman, who presented that part, did explicitly say "architecturally, these CUs have 25% better performance per clock on average graphics workloads relative to the GCN generation".
 
Yeah, gotta keep it silent.
So if you have to make a liquid metal cooling solution the size of what? 4 PCIe slots? to make a ~270 mm^2 chip silent, then what does this tell you about it being okay to run at these clocks?

Btw, XSS GPU is rumored to be a 10 WGP part running at 1.55 GHz. One would think that going with a smaller, higher-clocked part for a budget console would be preferable here - if the part can easily run at 2.2 GHz, that is.
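Quick back-of-envelope on what that rumored config would give, assuming standard RDNA FP32 rates:

```python
# FP32 throughput for the rumored XSS config. The 10 WGP / 1.55 GHz
# figures are the rumor above; 64 lanes per CU and 2 FLOPs per lane
# per clock (FMA) are standard RDNA rates.
wgps, cus_per_wgp  = 10, 2
lanes_per_cu       = 64
flops_per_lane_clk = 2
ghz                = 1.55

tflops = wgps * cus_per_wgp * lanes_per_cu * flops_per_lane_clk * ghz / 1000
print(f"{tflops:.2f} TFLOPS")   # ~3.97 TFLOPS -- roughly a third of XSX
```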
 
So if you have to make a liquid metal cooling solution the size of what?
PS5 isn't LM.
4 PCIe slots?
Needs to be bigger tbh.
to make a ~270 mm^2 chip silent
YES.
Goddammit, yes, I'm tired of my living room being filled with jet-engine sounds every time I play any Sony 1st-party vidya game.
Also it's not 270mm^2.
what does this tell you about it being okay to run at these clocks?
It means they're not making another PS4Pro.
Btw, XSS GPU is rumored to be a 10 WGP part running at 1.55 GHz.
In no less than 3 SAs of 4; gotta keep the FF ratio set.
One would think that going with a smaller, higher-clocked part for a budget console would be preferable here
And waste area, because the CPU and all the other things necessary for XSX parity chew into it?
Why bother?
 

You know, I've always wondered why GPUs have the chip facing down. Heat rises, so wouldn't it be smarter to flip the card over and have the chip facing up, for better transfer to the heatsink and of course better transfer off the heatsink?

Also, wrap-around heatsinks could be a good way to save PCIe slots: have a few heat pipes that wrap around the side and head to the top, where they run through a fin setup, while still having a single-slot cooler at the bottom.
 
I assume there's a bigger chance of the slots below the GPU being empty than of there being free space up top, where after-market CPU coolers and tight motherboard layouts (RAM & CPU placement) get in the way.
 
I assume there's a bigger chance of the slots below the GPU being empty than of there being free space up top, where after-market CPU coolers and tight motherboard layouts (RAM & CPU placement) get in the way.
Maybe now, but my PC always had filled slots growing up... remember sound cards and modems? lol

Heat doesn't just magically rise; hot air generally rises, but the fans blowing into the GPU completely nullify that effect.

Heat pipe / fin layer
Fan layer
Graphics card
Graphics chip
Heatsink / fin layer

A setup like that can work
 
You know, I've always wondered why GPUs have the chip facing down. Heat rises, so wouldn't it be smarter to flip the card over and have the chip facing up, for better transfer to the heatsink and of course better transfer off the heatsink?

Also, wrap-around heatsinks could be a good way to save PCIe slots: have a few heat pipes that wrap around the side and head to the top, where they run through a fin setup, while still having a single-slot cooler at the bottom.

My Asus Radeon EAX 1300 Pro would like a word with you. ;)
 

Attachment: Asus Radeon EAX 1300 Pro.JPG
Sorry for the post-necro, but I just got around to listening to the talk fully.

Mark Grossman, who presented that part, did explicitly say "architecturally, these CUs have 25% better performance per clock on average graphics workloads relative to the GCN generation".

Damn, looks like the main benefit of RDNA2 then, efficiency-wise, is better power usage.

Regardless, a bit of updated maths time. Big Navi: 384-bit bus, 84 CUs, 19-21 teraflops? Seems like it might fit. Eh, whatever, guess we'll see.
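Sanity check on whether those guesses hang together -- the clocks below are my assumptions, picked to bracket the quoted range:

```python
# Does 84 CUs line up with 19-21 TFLOPS? FP32 rate is
# CUs * 64 lanes * 2 FLOPs/clock * clock; the clock values below are
# assumptions chosen to bracket the quoted teraflop figures.
cus, lanes, flops_per_clock = 84, 64, 2

for ghz in (1.8, 1.9, 2.0):
    tflops = cus * lanes * flops_per_clock * ghz / 1000
    print(f"{ghz:.1f} GHz -> {tflops:.1f} TFLOPS")

# 1.8 -> 19.4, 1.9 -> 20.4, 2.0 -> 21.5: so 19-21 TF implies roughly a
# 1.8-1.95 GHz sustained clock for an 84 CU part.
```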

Edit: oh OK, GPU fans facing down might be like survivorship bias. That is to say, heat rises, so you need to worry less about the heat already rising away from a GPU and more about the heat potentially trapped under it.
 
You know, I've always wondered why GPUs have the chip facing down. Heat rises, so wouldn't it be smarter to flip the card over and have the chip facing up, for better transfer to the heatsink and of course better transfer off the heatsink?
Heatpipes rely on capillary fluid transfer, so their efficiency is mostly unaffected by mounting orientation. Given the heat density of the GPU, convection force would have a marginal effect -- that's why the air around the cooler is blown with fans.
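For a rough sense of scale, here's Newton's law of cooling with textbook ballpark coefficients; all the numbers are illustrative assumptions:

```python
# Newton's law of cooling, Q = h * A * dT, with textbook ballpark
# heat-transfer coefficients. Every number here is an illustrative
# assumption, not a measurement.
fin_area_m2 = 0.3    # assumed effective fin area of a large GPU cooler
delta_t_k   = 45.0   # assumed fin-to-ambient temperature delta

for label, h in (("natural convection", 10.0), ("fan-forced flow", 80.0)):
    q_watts = h * fin_area_m2 * delta_t_k
    print(f"{label:18s} (h={h:4.0f} W/m^2K) -> ~{q_watts:.0f} W dissipated")

# ~135 W by buoyancy alone vs ~1080 W with fans: a 200-300 W GPU lives
# or dies by forced airflow, so which way the card faces barely matters.
```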

Also, wrap-around heatsinks could be a good way to save PCIe slots: have a few heat pipes that wrap around the side and head to the top, where they run through a fin setup, while still having a single-slot cooler at the bottom.
I personally experimented with a similar solution several years ago, and it was actually less efficient than the standard mounting:

(photo: nxC8Q6q.jpg)


This indeed saved some slot space, but only if there was enough room above the card for the fan mount.
 
Heatpipes rely on capillary fluid transfer, so their efficiency is mostly unaffected by mounting orientation. Given the heat density of the GPU, convection force would have a marginal effect -- that's why the air around the cooler is blown with fans.

Correct. Gamers Nexus did a pretty thorough review and analysis of cooler orientation; the tl;dr is that orientation doesn't matter.
 
@Bondrewd can you please expand upon your replies and aim for actual replies that progress the open discussion forward for everyone? As it stands now, the majority of them are seen as meaningless fragments without any factual or source backing. Your posting style is alienating the vast majority of long-time users. If you continue on your current path and it becomes a choice between them or you, the choice is obvious.

Not a long-time user, but thank god somebody said it. He's like a forum-poster version of YT channels that present speculation as "facts" from their "industry insiders".

Damn, looks like the main benefit of RDNA2 then, efficiency-wise, is better power usage.

Regardless, a bit of updated maths time. Big Navi: 384-bit bus, 84 CUs, 19-21 teraflops? Seems like it might fit. Eh, whatever, guess we'll see.

Edit: oh OK, GPU fans facing down might be like survivorship bias. That is to say, heat rises, so you need to worry less about the heat already rising away from a GPU and more about the heat potentially trapped under it.

I take this post from Ryan@AT to mean that where RDNA1 reorganised the CUs into DCUs, almost everything else was unchanged (or as unchanged as it can be when you completely re-do other parts of the GPU). So it would seem that they have reworked those other parts of the GPU this time around (the best example being the TMUs, which now incorporate the RT hardware).
 
You know, I've always wondered why GPUs have the chip facing down. Heat rises, so wouldn't it be smarter to flip the card over and have the chip facing up, for better transfer to the heatsink and of course better transfer off the heatsink?
ATX.
Goddamned ATX.
It's so long overdue for a replacement, yet PC vendors aren't even trying to do anything about it.
 
Apologies if I've misunderstood what you're saying here, but I think it works as follows:

XSX has a 320-bit bus, with 5 x 64-bit controllers. Each controller has its own L2 with 4 slices, so 5 x 4 makes the 20 slices, which also fits the 20 memory channels MS described.

5 MB total L2 fits with 1 MB of L2 for each of the five controllers.
The general rule is a slice per channel, though there's precedent for this not being the case. The Xbox One X is one example, as are the 4-stack HBM GPUs (Fury, Radeon VII, Arcturus), going by driver values for the number of texture channel caches--which was referenced in other places as representing the number of slices.

Having 20 slices on its own should be fine. The odd data point is the supposed leak of certain architectural values for the big RDNA2, which lists the count at 16.
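The slice-per-channel rule is easy to parameterize. A small sketch -- the 1 MB per controller comes from the XSX figures quoted above, and extending it to 384-bit is my extrapolation, not a known spec:

```python
# The slice-per-channel rule, parameterized by bus width: 64-bit
# controllers, 16-bit GDDR6 channels, one L2 slice per channel. The
# 1 MB-per-controller figure comes from the XSX numbers quoted above;
# applying it to 384-bit is an extrapolation, not a known spec.
def l2_layout(bus_bits, mb_per_controller=1.0):
    controllers = bus_bits // 64
    channels    = bus_bits // 16
    slices      = channels
    return controllers, channels, slices, controllers * mb_per_controller

for bus in (320, 384):
    c, ch, s, mb = l2_layout(bus)
    print(f"{bus}-bit: {c} controllers, {ch} channels, {s} slices, {mb:.0f} MB L2")

# 320-bit -> 5/20/20/5 MB, matching XSX; 384-bit -> 6/24/24, which is
# why the leaked "16 slices" for big RDNA2 looks like an outlier.
```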


Even if the L1s can't request more than 4 accesses per cycle, you'll still need the 5 controllers, each with their four L2 slices, to manage the 320-bit bus. Couldn't compute bypass the L1 and make full use of the L2 bandwidth, though... (genuine question)?

I haven't seen anything about Big Navi's L2 cache, but if it's a 384-bit bus, shouldn't it be 6 controllers and therefore 24 L2 slices?
My interpretation is that the L1 cache controller evaluates requests and passes misses on to the L2. The various modes that bypass the L1 don't seem to bypass the controller; they just control whether the L1's storage will be used to service the request or whether it needs to invalidate data at the same time. Skipping the L1 means the cache itself isn't used, but the controlling logic would be using the same paths to get to the L2. If there were a separate crossbar fabric from the one used by the L1, the L1's marketed benefit of simplifying the on-die interconnect would be counteracted to some extent.
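As a sketch of that interpretation (hypothetical, not actual RDNA logic):

```python
# A hypothetical sketch of the interpretation above (illustrative only,
# not actual RDNA logic): every request flows through the L1 controller;
# "bypass" modes skip the L1 storage but still take the same path to an
# L2 slice selected by an address hash.
NUM_L2_SLICES = 20            # XSX-style count, from the discussion above

def pick_l2_slice(address):
    return (address >> 7) % NUM_L2_SLICES   # assumed 128 B line hashing

def l1_controller(address, bypass_l1, l1_storage):
    if not bypass_l1 and address in l1_storage:
        return "L1 hit"
    # Miss or bypass: the routing to L2 is identical either way; the
    # only difference is whether the line gets installed in L1 storage.
    slice_id = pick_l2_slice(address)
    if not bypass_l1:
        l1_storage.add(address)
    return f"forwarded to L2 slice {slice_id}"

cache = set()
print(l1_controller(0x1000, bypass_l1=False, l1_storage=cache))  # miss -> L2
print(l1_controller(0x1000, bypass_l1=False, l1_storage=cache))  # L1 hit
print(l1_controller(0x1000, bypass_l1=True,  l1_storage=cache))  # bypass -> L2
```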

The big Navi RDNA2 table entry seems like an outlier if a lesser console chip can have 20 slices despite having a smaller number of channels, although it's not unprecedented for architectural values like the number of texture channel caches to be incorrect--particularly early on.
 