Current Generation Hardware Speculation with a Technical Spin [post GDC 2020] [XBSX, PS5]

I guess for me, the question is: if they bypass VRAM for SFS streaming, as per Beard Man's comments, and it's going directly to the GPU...
Well then... where is it going?
L2 is connected to the memory controllers...
L1 is per shader array...
L0 is per CU...
So where are the textures being dumped? And how do we quickly distribute that incoming data to all the shader arrays that need it?

If there were a cache of some size - not necessarily a full L3 - whose purpose is to hold one copy of everything, and which L1 or L2 could check before going out to memory... then perhaps this setup might make sense even in a smaller configuration - even something as small as ESRAM.

This is an interesting question. I don't know enough about this even in general, but thinking about it, I suppose if you are indeed treating it like it's in VRAM (which has been mentioned a few times), your IO unit would pass the data to whatever requested it, in as similar a manner as possible to how VRAM is accessed. So maybe you evict something from some level of cache and dump it there. For textures I suppose L1 would make the most sense?

Those "SoC memory coherency" things on the die shot seem kind of beefy, and there seems to be one per shader engine. I'll point my finger at them and say based on nothing in particular that they manage the job.

I suppose you'd have to be able to choose whether data was copied into VRAM afterwards, or simply used and discarded, to be fetched again if needed.
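
To make that idea concrete, here's a minimal sketch of the lookup order being speculated about, in Python. It's purely illustrative: the "scratch" buffer stands in for the hypothetical small on-die pool (the ESRAM-like idea), and none of the names correspond to real RDNA structures or any actual API.

```python
# Purely illustrative model of the speculated flow. "scratch" stands in for a
# hypothetical small on-die buffer for streamed tiles; none of this maps to
# real RDNA hardware or any actual API.

CACHE_LEVELS = ["L0 (per CU)", "L1 (per shader array)", "L2 (memory controllers)"]

def fetch_texel(address, caches, scratch, vram, copy_to_vram=True):
    # Normal cache walk: closest level first.
    for level, cache in zip(CACHE_LEVELS, caches):
        if address in cache:
            return level, cache[address]
    # Hypothetical path: the streamed tile sits in the small on-die buffer
    # and is treated as if it were already resident in VRAM.
    if address in scratch:
        data = scratch[address]
        if copy_to_vram:
            vram[address] = data      # keep a copy for later reuse
        else:
            scratch.pop(address)      # use once and discard; refetch via IO if needed
        return "scratch", data
    # Otherwise it's a regular trip out to memory.
    return "VRAM", vram.get(address)
```

The interesting knob is copy_to_vram, which corresponds to the "copied into VRAM afterwards vs. used and discarded" choice above.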
 

If true, I wonder how Sony is doing their RT? Or if the only difference is how the RT hardware is accessed? Or if there are one or two features of AMD's hardware RT that are on XBS and not PS5?

Mesh Shaders were already suspected to be different from whatever Sony is doing. VRS and SF [edit: fixed, previously I erroneously had SFS there] were suspected only because Sony haven't said anything about them. NOTE - previously that didn't mean the PS5 couldn't have them, just that they weren't mentioned. If this is true, then either it doesn't have them, or Sony aren't using AMD's hardware implementation for them.

Regards,
SB
 
If true, I wonder how Sony is doing their RT? Or if the only difference is how the RT hardware is accessed? Or if there are one or two features of AMD's hardware RT that are on XBS and not PS5?

Mesh Shaders were already suspected to be different from whatever Sony is doing. VRS and SFS were suspected only because Sony haven't said anything about them.

Regards,
SB

Not using DirectX?

Tommy McClain
 

How would Microsoft know what's in or not in Sony's APU?

Or how much of this is actually an advantage, rather than semantics covering an equivalent custom solution or even a deficiency?

E.g. mesh shaders vs. the geometry engine that's rumoured to be in RDNA3. (I haven't looked it up in detail, just using it as an example.)

PR is going to PR....?

Edit: talking of semantics, the VRS is custom and not RDNA2 - or at least that is what Microsoft has stated so far...
 
Not using DirectX?

Tommy McClain

Not using DirectX doesn't preclude using the hardware in RDNA2 that DX uses for RT. What the tweet is implying is that the XBS consoles are the only ones that fully implement all of the RT hardware that DX gives developers access to. I.e. either Sony are doing something different, or there's some bit of RDNA2 RT hardware that doesn't exist on the PS5.

This also doesn't mean that Sony don't have extra hardware added to support their implementation of RT on the PS5 SOC.

Regards,
SB
 
How would Microsoft know what's in or not in Sony's APU?

Or how much of this is actually an advantage, rather than semantics covering an equivalent custom solution or even a deficiency?

E.g. mesh shaders vs. the geometry engine that's rumoured to be in RDNA3. (I haven't looked it up in detail, just using it as an example.)

Could it be that in both Navi 2 and Navi 3, AMD has used Microsoft-patented tech to make VRS or Sampler Feedback or whatever work?

So either Sony doesn't have it, or they had to figure out another, non-patented way of doing it?
 
Not using DirectX doesn't preclude using the hardware in RDNA2 that DX uses for RT. What the tweet is implying is that the XBS consoles are the only ones that fully implement all of the RT hardware that DX gives developers access to. I.e. either Sony are doing something different, or there's some bit of RDNA2 RT hardware that doesn't exist on the PS5.

This also doesn't mean that Sony don't have extra hardware added to support their implementation of RT on the PS5 SOC.

Regards,
SB
Not necessarily; the tweet only implies that only Xbox fully supports RDNA 2, and then just describes RDNA 2 features (missing Infinity Cache though), so it implies the PS5 GPU is lacking some of those features, but not necessarily all of them.
 
Both of the consoles would benefit greatly from Infinity Cache, especially the PS5 given how high it's clocked. Any trip to VRAM would stall the pipeline and toss away its potential throughput (cycles) while waiting for data to come in.
So there's definitely a reason to have this in there.
But for a variety of reasons it also shouldn't be in there, and at least to me, the cons outweigh the pros:
a) silicon budget/die costs
b) backwards compatibility is going to be an issue
c) shrinking of the die is still tougher.

We've also seen this sort of cache augmentation in the past, and the combined bandwidth of 76 GB/s + 192 GB/s way surpassed what was available on the PS4 (176 GB/s), and it still got its ass whooped. That was 32 MB of ESRAM at 1/4 the resolution; this would be 128 MB of Infinity Cache at 4K.
shrug.

so I just don't see the consoles going this way. It makes sense for both of them to steer clear of IC.
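
For what it's worth, here's the back-of-envelope way I'd frame the bandwidth argument; a naive sketch where every number is a placeholder, not a claim about either console.

```python
# Naive effective-bandwidth blend for a large last-level cache.
# All figures below are placeholders, not claims about either console.

def effective_bandwidth(hit_rate, cache_bw_gbs, vram_bw_gbs):
    """Hits are served at cache speed, misses go out to VRAM."""
    return hit_rate * cache_bw_gbs + (1.0 - hit_rate) * vram_bw_gbs

# Illustrative only: even a 50% hit rate halves the traffic that reaches VRAM,
# which is the kind of saving that matters most at high GPU clocks.
print(effective_bandwidth(0.5, 1000.0, 448.0))  # -> 724.0 (GB/s, made-up inputs)
```

That only shows why the idea is tempting in the first place; it says nothing about whether the die area, BC, and shrink concerns above are worth it.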
 
A portable Xbox Series S with some amount of Infinity Cache to decrease RAM costs and power consumption? Maybe based on 5nm?
 
Interesting quote from this Anandtech article about RDNA2 RT:

https://www.anandtech.com/show/1620...-starts-at-the-highend-coming-november-18th/2

"Ray tracing itself does require additional functional hardware blocks, and AMD has confirmed for the first time that RDNA2 includes this hardware. Using what they are terming a ray accelerator, there is an accelerator in each CU. The ray accelerator in turn will be leaning on the Infinity Cache in order to improve its performance, by allowing the cache to help hold and manage the large amount of data that ray tracing requires, exploiting the cache’s high bandwidth while reducing the amount of data that goes to VRAM.

AMD is not offering any performance estimates at this time, or discussing in depth how these ray accelerators work. So that will be something else to look forward to once AMD offers deep dives on the technology."



So basically the RT hardware seems highly dependent on the Infinity Cache in terms of performance. What does this mean for consoles?...
 
So basically the RT hardware seems highly dependent on the Infinity Cache in terms of performance. What does this mean for consoles?...
I’m not sure if this is a statement from AMD or something Ryan is pondering. Best to ask him directly here @Ryan Smith

I can't see much use for a cache with incoherent rays. And given the size of the BVH structures (based upon our knowledge of Turing), we are looking at a 1 GB to 1.5 GB VRAM reservation IIRC. @Dictator will likely have a more accurate average for games here.

I would welcome statements from both Ryan and Alex here on what they think IC will mean for RT performance.
 
Game clocks are AMD's expected values.
If only there were reviewers who measured average clocks on Navi 10 in games and found out those are always above AMD's stated "game clocks"...



256-bit bus is used for cards all the way up to 2080 - 10TFs.
For 36 CUs it's not anemic.
I very specifically mentioned Big Navi on a 256-bit bus, which is neither related to the 2080 nor has 36 CUs in any of its SKUs.



What do you want the quotation on?
On your claims over what I'm thinking.
I never mentioned 128MB. Why would a 36 CU GPU have the same cache amount as an 80 CU GPU? 128MB and its die area on the PS5 are something you fabricated yourself; please refrain from putting words in my mouth.


even 64 MB is still far too large; it's going to take up 50mm^2
Quotation needed.
 
I can't see much use for a cache with incoherent rays. And given the size of the BVH structures (based upon our knowledge of Turing), we are looking at a 1 GB to 1.5 GB VRAM reservation IIRC.
Presumably you could at least have the first few levels of the BVH cached? Traversals all have to start there, so the reuse rate should be alright.

That’s assuming the BVH packs in such a way that levels are contiguously laid out in memory.
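
As a rough feel for how much of the tree that could cover, here's a toy sizing of the top levels; a full binary tree and 64 bytes per node are assumptions for illustration, not what Turing or RDNA2 actually use.

```python
# Toy sizing of the top of a BVH. Assumes a full binary tree and 64-byte
# nodes, which is an illustration, not the real Turing/RDNA2 node format.

NODE_BYTES = 64

def top_levels_bytes(levels):
    # A full binary tree holds 2^levels - 1 nodes in its first `levels` levels.
    return (2 ** levels - 1) * NODE_BYTES

for levels in (10, 16, 21):
    print(levels, "levels ->", round(top_levels_bytes(levels) / 2**20, 2), "MiB")
# 10 levels -> ~0.06 MiB, 16 -> ~4 MiB, 21 -> ~128 MiB.
# So a 128 MB cache could plausibly hold roughly the first twenty levels, which
# every traversal has to walk through, provided they are laid out contiguously.
```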
 
On your claims over what I'm thinking.
I never mentioned 128MB. Why would a 36 CU GPU have the same cache amount as an 80 CU GPU? 128MB and its die area on the PS5 are something you fabricated yourself; please refrain from putting words in my mouth.



Quotation needed.
The 32MB L3 on Zen 2 CPUs takes up roughly half of the 75mm^2 chiplet, so ~50mm^2 for 64MB really is a lower bound estimate here.
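
Quick sanity check on that scaling, treating SRAM area as linear in capacity (process and cell-library differences ignored), so it's only a lower-bound style estimate:

```python
# Linear scaling from Zen 2's L3: 32 MB in roughly half of a 75 mm^2 chiplet.
# Ignores process/cell-library differences, so treat it as a rough lower bound.

zen2_l3_mb = 32
zen2_l3_mm2 = 75 / 2                 # "roughly half of the 75 mm^2 chiplet"
mm2_per_mb = zen2_l3_mm2 / zen2_l3_mb

for cache_mb in (32, 64, 128):
    print(cache_mb, "MB ->", round(cache_mb * mm2_per_mb, 1), "mm^2")
# 32 MB -> 37.5 mm^2, 64 MB -> 75.0 mm^2, 128 MB -> 150.0 mm^2,
# which is why ~50 mm^2 for 64 MB reads as a lower bound, not an exaggeration.
```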

Also if you can’t be bothered to do your own research, please at least be polite to other members.
 