Current Generation Hardware Speculation with a Technical Spin [post GDC 2020] [XBSX, PS5]

RSX was basically a generation behind; the 8800 series launched in autumn 2006 (along with Intel's quad cores). It didn't help that the 8800 was one of the biggest jumps in history.
One would hope the PS5's GPU fares better than RSX did. But since we're already at about 14 TF for GPUs released in 2018 (not counting the Titan), the highest-end Navi/Ampere could be close to 20 TF, perhaps with HBM on AMD's side. That could be close to double, probably without downclocks.

Are we really comparing a PS5 with a 1200 dollar GPU that hasn't even released? *claps
 
Are we really comparing a PS5 with a 1200 dollar GPU that hasn't even released? *claps

We are comparing to what was available in 2006, what's available now, and what will be available this year. The discussion wasn't about prices or price/perf ratios. Besides, prices for this year's products haven't been made public yet.
 
Has this been discussed?


I'm trying to wrap my head around this. Apparently the 'Velocity Architecture' will make a 100GB pool where devs can put assets which a game can have instant access to.

I can't quite get my head around how this can be 'instant' - I'm thinking this 100GB must essentially live on the SSD and therefore be limited by the 4.8GB/s restriction.

Can anyone explain how this works please?
 
We are comparing to what was available in 2006, what's available now, and what will be available this year. The discussion wasn't about prices or price/perf ratios. Besides, prices for this year's products haven't been made public yet.

If the comparison is apples to oranges, then it's meaningless. Current top GPUs already outperform the XBSX and PS5; the comparison should be made at comparable price points.
 
Has this been discussed?

I'm trying to wrap my head around this. Apparently the 'Velocity Architecture' will make a 100GB pool where devs can put assets which a game can have instant access to.

I can't quite get my head around how this can be 'instant' - I'm thinking this 100GB must essentially live on the SSD and therefore be limited by the 4.8GB/s restriction.

Can anyone explain how this works please?

I remember reading about this, but I don't think it was aimed at helping with rendering RT graphics. As you say, the storage is so slow compared to a fast pool of RAM. I smell secret sauce and some cloud powder.
 
Has this been discussed?

I'm trying to wrap my head around this. Apparently the 'Velocity Architecture' will make a 100GB pool where devs can put assets which a game can have instant access to.

I can't quite get my head around how this can be 'instant' - I'm thinking this 100GB must essentially live on the SSD and therefore be limited by the 4.8GB/s restriction.

Can anyone explain how this works please?
Yes, it would be limited by that restriction, but since you only need part of that 100GB at any one time, it should be fast enough to feel "instant".
It sounds like they're using an HBCC-like memory controller, which would include page-based memory management.

edit: of course with HBM replaced by GDDR6, but bandwidth should still be high enough
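
To make it concrete, here's a rough sketch of page-based access over a big on-SSD pool (C++, with made-up page size and class names - just the general idea, not Microsoft's actual API). The game reads by offset as if all 100GB were resident; only the pages it actually touches ever go through the 2.4/4.8GB/s pipe, so streaming the whole pool (~20 seconds at 4.8GB/s) never has to happen.

Code:
// Sketch of demand paging over a large on-SSD asset pool.
// Hypothetical names and sizes; not the actual Velocity Architecture API.
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <iostream>
#include <unordered_map>
#include <vector>

constexpr size_t kPageSize = 64 * 1024;  // 64 KiB pages -- an assumption

class AssetPool {
public:
    explicit AssetPool(size_t poolBytes) : poolBytes_(poolBytes) {}

    // Copy `len` bytes starting at `offset` into `dst`, faulting pages in as needed.
    void read(size_t offset, size_t len, uint8_t* dst) {
        assert(offset + len <= poolBytes_);
        while (len > 0) {
            size_t page   = offset / kPageSize;
            size_t within = offset % kPageSize;
            size_t chunk  = std::min(len, kPageSize - within);
            const std::vector<uint8_t>& data = getPage(page);  // may hit the SSD
            std::memcpy(dst, data.data() + within, chunk);
            dst += chunk; offset += chunk; len -= chunk;
        }
    }

private:
    const std::vector<uint8_t>& getPage(size_t page) {
        auto it = resident_.find(page);
        if (it == resident_.end()) {
            // Only here does the drive's GB/s limit matter: one page, not 100GB.
            it = resident_.emplace(page, readPageFromSSD(page)).first;
        }
        return it->second;
    }

    // Stand-in for the real SSD read + decompression path.
    static std::vector<uint8_t> readPageFromSSD(size_t page) {
        std::cout << "faulting in page " << page << " from SSD\n";
        return std::vector<uint8_t>(kPageSize, static_cast<uint8_t>(page));
    }

    size_t poolBytes_;
    std::unordered_map<size_t, std::vector<uint8_t>> resident_;
};

int main() {
    AssetPool pool(100ull * 1024 * 1024 * 1024);  // the "100GB" virtual pool
    std::vector<uint8_t> buf(256 * 1024);
    // Read 256 KiB from a 5 GiB offset: only ~4 pages are actually fetched.
    pool.read(5ull * 1024 * 1024 * 1024, buf.size(), buf.data());
}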
 
RSX was basically a generation behind; the 8800 series launched in autumn 2006 (along with Intel's quad cores). It didn't help that the 8800 was one of the biggest jumps in history.
One would hope the PS5's GPU fares better than RSX did. But since we're already at about 14 TF for GPUs released in 2018 (not counting the Titan), the highest-end Navi/Ampere could be close to 20 TF, perhaps with HBM on AMD's side. That could be close to double, probably without downclocks.

Considering that the 2080 Ti is already well ahead of both the PS5 and XSX GPUs, it's little surprise that a 3080 Ti would wipe the floor with them.

The benefit of the console, especially for 2nd-gen games, is a mature toolset dedicated to the hardware, which lets developers max it out. No one is developing a game with a 2080 Ti as the baseline, after all.
 
Where there is a will, Nvidia will find a proprietary way!!!

https://wccftech.com/intel-optane-g...o-ssd-for-up-to-50-faster-gaming-performance/

Who would have thought SSDs would get such a spotlight for this new generation of consoles and PC rigs.
Well, Optane was already proprietary to a hardware set, so NV doing this is not exactly... like making a new competing API or whatever. I have to laugh at the nebulous "50% better gaming performance". Uh, what? lol

When was the last time your GPU rendering speed was limited by disk access? You're already in a horrible FPS place when that happens anyway.

edit: April Fools'? Oh god, I am not a fan of this day. First Crysis and now this stuff, haha
 
Well, Optane was already proprietary to a hardware set, so NV doing this is not exactly... like making a new competing API or whatever. I have to laugh at the nebulous "50% better gaming performance". Uh, what? lol

When was the last time your GPU rendering speed was limited by disk access? You're already in a horrible FPS place when that happens anyway.

edit: April Fools'? Oh god, I am not a fan of this day. First Crysis and now this stuff, haha
woosh ;)
edit: hold up this is better:
 
Generally links are fine as long as they are not pointing to illegal content, so links to documents that are under an NDA and have been obtained illegally are generally frowned upon.
The enforceable legality of many non-disclosure agreements is highly contested! ;)

I suggest we leave it to the courts unless any of the mods happens to be a practising lawyer. :yep2:
 
The enforceable legality of many non-disclosure agreements is highly contested! ;)
NDA or no, they're still under copyright. If we don't allow magazine scans or verbatim copy-pastes of entire articles, especially behind paywalls, I'm not sure we can justify links to knowingly copyrighted and restricted access materials.

The secret is to be very naive and blunder upon a link and share it everywhere without appreciating it - then it's just a mistake. ;)
 
But that doesn't explain (to me anyway!) why the BW drop was far higher than the CPU was using, and why that can't be fixed with a better memory controller. I would have expected (as did everyone else, because the BW drop came as a surprise) that while the CPU was accessing the RAM, the GPU had to wait, but it'd be 1:1 CPU usage to BW impact. What we saw on Liverpool was the RAM losing efficiency somehow, as if there was a switching penalty. I would hope AMD can fix that issue and have a near 1:1 impact on their console UMAs, so 1 ms of full RAM access for the CPU means only 1 ms less available for the GPU and the remaining frame time accessible at full rate.

This came up back with the launch of the current generation, with the ESRAM discussions and that PS4 contention slide.
There's a broad set of reasons, but a fundamental issue is that DRAM is not easy to get good utilization out of.
DRAM optimizes for density and cost, meaning that the speed of internal DRAM arrays has had limited increase.
There is a preference for hitting the same array, or running a very linear pattern so that the process for page activation and access can be pipelined without showing up as lost cycles on the bus.
To optimize for cost and traces, the bus for reads and writes is shared and needs to be turned around whenever the access type changes.

It helps if the controller can build up a long list of pending accesses, where it can then re-order, combine, or chain them so that they take advantage of pipelining in the DRAM, don't hop between banks or bank groups, or force bus turnaround that prevents any transfers from occurring for tens of cycles.
The trade-off is that collecting a large number of accesses means accepting more latency for the individual accesses.

GPUs are structured to tolerate long latency and to generate many accesses, and they accept a lot of reordering.
CPUs are latency-optimized and don't tolerate much reordering, and they can be running workloads that just don't make very good access patterns.
There's a balancing act in how long the controller can run with a high-utilization pattern before it needs to disrupt it in favor of a latency-sensitive client, and the process can mean changing banks or bus modes for dozens of cycles, and eating a similar penalty going back.

AMD has supposedly incorporated more intelligent controllers, and may have added more levels of priority. Zen has a much better memory subsystem than Jaguar and is more tolerant of latency, which could help. There are also cache subsystem changes and protocol differences that might reduce how many high-priority operations like atomics need to go to memory.
On the other hand, the higher performance for Zen can also give it greater ability to put demands on the memory subsystem versus the much more limited core performance and limited bandwidth of the coherent memory bus in the Jaguar SOCs.
Utilization should be better, but I don't think the utilization loss can go to zero.
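
To put rough numbers on that trade-off, here's a toy model (C++; all the cycle costs are invented, real GDDR6 timings differ): service a mixed read/write queue hitting several banks once in arrival order and once grouped by access type and bank. The grouped schedule loses far fewer cycles to turnarounds and bank hops, but whatever gets pushed to the back of the schedule eats the extra latency - which is exactly the CPU-vs-GPU balancing act above.

Code:
// Toy model of DRAM controller scheduling: cost of servicing a request queue
// in arrival order vs. grouped by bank and access type.
// All penalties are invented for illustration, not real GDDR6 timings.
#include <algorithm>
#include <iostream>
#include <vector>

struct Request { int bank; bool isWrite; };

constexpr int kTransferCycles   = 4;   // cycles of useful data per access (assumption)
constexpr int kTurnaroundCycles = 12;  // read<->write bus turnaround (assumption)
constexpr int kBankMissCycles   = 10;  // activating a different bank's row (assumption)

int cycles(const std::vector<Request>& order) {
    int total = 0, lastBank = -1;
    bool lastWrite = false, first = true;
    for (const Request& r : order) {
        if (!first && r.isWrite != lastWrite) total += kTurnaroundCycles;
        if (r.bank != lastBank)               total += kBankMissCycles;
        total += kTransferCycles;
        lastBank = r.bank; lastWrite = r.isWrite; first = false;
    }
    return total;
}

int main() {
    // Interleaved CPU/GPU traffic: banks and read/write alternate badly.
    std::vector<Request> queue = {
        {0,false},{3,true},{0,false},{3,true},{1,false},{2,true},{1,false},{2,true},
        {0,false},{3,true},{1,false},{2,true}
    };

    std::vector<Request> reordered = queue;
    // Group by access type, then by bank, so turnarounds and bank hops are rare.
    std::stable_sort(reordered.begin(), reordered.end(),
                     [](const Request& a, const Request& b) {
                         if (a.isWrite != b.isWrite) return !a.isWrite;  // reads first
                         return a.bank < b.bank;
                     });

    std::cout << "arrival order : " << cycles(queue)     << " cycles\n";
    std::cout << "reordered     : " << cycles(reordered) << " cycles\n";
    // The price of the reordered schedule is added latency for whichever
    // requests get pushed to the back of the queue.
}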


Has this been discussed?

I'm trying to wrap my head around this. Apparently the 'Velocity Architecture' will make a 100GB pool where devs can put assets which a game can have instant access to.

I can't quite get my head around how this can be 'instant' - I'm thinking this 100GB must essentially live on the SSD and therefore be limited by the 4.8GB/s restriction.

Can anyone explain how this works please?
An SSG version of a gaming card could be a pricier way to get a storage system into PCs with parameters similar to the customized console subsystems. It wouldn't require mass replacement of all the systems where the CPU doesn't have built-in compression and extra DMAC hardware and the motherboard lacks a PCIe 4.0 NVMe slot. There would need to be some transfers over the PCIe bus to the graphics card, but those could be limited to swapping a game's asset partition in and out rather than constant transfers.
It'd be a value-add for AMD's hardware, at least.
 
Yes, it would be limited by that restriction, but since you only need part of that 100GB at any one time, it should be fast enough to feel "instant".
It sounds like they're using an HBCC-like memory controller, which would include page-based memory management.

edit: of course with HBM replaced by GDDR6, but bandwidth should still be high enough

Thanks. I assume PS5 will have a similar system in place.
 
Thanks. I assume PS5 will have a similar system in place.
Cerny said in the presentation that access to the game assets is mapped, and the dev doesn't even need to know if or how the data is compressed; they address the virtual uncompressed data layout, and it's all transparent.

Of course, they still need to know it's coming from a 2.4GB/s or 5.5GB/s drive. Can't defy the laws of logic.
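
In effect it's a virtual-file view of the compressed data. A rough sketch of the idea (C++, with a toy run-length codec standing in for the real hardware decompressor and a made-up block size - not Sony's actual API): the dev reads by uncompressed offset, and the layer underneath finds the right compressed block and unpacks it without the caller ever knowing.

Code:
// Sketch of "address the virtual uncompressed layout" access: the caller
// reads by uncompressed offset; the I/O layer locates the compressed block
// and decompresses it transparently. Block size, table layout and the toy
// RLE codec are assumptions, not the actual Kraken-based pipeline.
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <iostream>
#include <vector>

constexpr size_t kBlock = 16;  // tiny uncompressed block size, for the demo only

// Toy run-length codec standing in for the real hardware decompressor.
std::vector<uint8_t> rleCompress(const std::vector<uint8_t>& in) {
    std::vector<uint8_t> out;
    for (size_t i = 0; i < in.size();) {
        size_t j = i;
        while (j < in.size() && in[j] == in[i] && j - i < 255) ++j;
        out.push_back(static_cast<uint8_t>(j - i));  // run length
        out.push_back(in[i]);                        // run value
        i = j;
    }
    return out;
}

std::vector<uint8_t> rleDecompress(const std::vector<uint8_t>& in) {
    std::vector<uint8_t> out;
    for (size_t i = 0; i + 1 < in.size(); i += 2)
        out.insert(out.end(), in[i], in[i + 1]);
    return out;
}

struct VirtualAssetFile {
    std::vector<std::vector<uint8_t>> compressedBlocks;  // "on disk" layout

    // Devs call this with offsets into the *uncompressed* data; they never
    // see which block it lives in or how it was compressed.
    void read(size_t offset, size_t len, uint8_t* dst) const {
        while (len > 0) {
            size_t block  = offset / kBlock;
            size_t within = offset % kBlock;
            std::vector<uint8_t> plain = rleDecompress(compressedBlocks[block]);
            size_t chunk = std::min(len, plain.size() - within);
            std::memcpy(dst, plain.data() + within, chunk);
            dst += chunk; offset += chunk; len -= chunk;
        }
    }
};

int main() {
    // Build a "file" of two blocks with very compressible content.
    std::vector<uint8_t> blockA(kBlock, 0xAA), blockB(kBlock, 0xBB);
    VirtualAssetFile file{{rleCompress(blockA), rleCompress(blockB)}};

    uint8_t buf[8];
    file.read(12, sizeof buf, buf);  // straddles the block boundary transparently
    for (uint8_t b : buf) std::cout << std::hex << int(b) << ' ';
    std::cout << '\n';               // prints: aa aa aa aa bb bb bb bb
}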
 
Utilization should be better, but I don't think the utilization loss can go to zero.

It can. There is no free lunch, which means most of the latency-sensitive work the CPU does should be limited to things in its cache.
A CPU that spams the memory bus with a lot of random requests will be underutilized in any case.
So essentially the CPU access pattern should be: prefetch, work in cache, write back, with the latency savings all going into the "work in cache" phase.
Any workload where you need to "scan" a lot of memory should go to the GPU. That's not negotiable.
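
In code, that pattern is basically cache blocking: sweep the data in tiles that fit in cache, do all the latency-sensitive work while the tile is hot, then stream it back out. A rough sketch (C++; the tile size is just a guess at an L2-ish footprint):

Code:
// Sketch of the "prefetch, work in cache, write back" pattern:
// touch memory in cache-sized tiles and do the repeated, latency-sensitive
// work while the tile is hot, instead of striding randomly across DRAM.
// The tile size is a guess at an L2-ish footprint, not a measured number.
#include <algorithm>
#include <cstddef>
#include <iostream>
#include <numeric>
#include <vector>

constexpr size_t kTileBytes = 256 * 1024;                 // ~L2-sized tile (assumption)
constexpr size_t kTileElems = kTileBytes / sizeof(float);

// Several passes over each tile while it is resident in cache, then one
// streaming write back out; DRAM mostly sees linear, prefetch-friendly traffic.
void processTiled(std::vector<float>& data) {
    for (size_t base = 0; base < data.size(); base += kTileElems) {
        size_t end = std::min(base + kTileElems, data.size());
        for (int pass = 0; pass < 4; ++pass)              // "work in cache"
            for (size_t i = base; i < end; ++i)
                data[i] = data[i] * 1.0001f + 0.5f;
    }                                                      // written back tile by tile
}

int main() {
    std::vector<float> data(8 * 1024 * 1024, 1.0f);        // 32 MB working set
    processTiled(data);
    std::cout << std::accumulate(data.begin(), data.end(), 0.0) << '\n';
}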
 