Next Generation Hardware Speculation with a Technical Spin [pre E3 2019]

MBTP · Apr 29, 2019

3dilettante said:
I think Gen-Z seems like a poor fit for the context of a standalone console, especially if AMD's existing technologies are in use.
Gen-Z seems more applicable to things beyond the reach of the infinity fabric, and between devices with storage in the same rack or between different points in a data center. Supporting a broad range of devices, dynamically allocated resources, massive aggregate bandwidth, RAS, and security for a data center is wasted on a console where the relevant components are meant for a single static context within the reach of a PCIe or infinity fabric link (if not on the same die).
There are some signs AMD could have given some capability to use Gen-Z in its server products, but its own internal interconnect methods could satisfy the needs of a consumer console APU or MCM at lower overhead and cost.

You are probably right. I'm more focused on how to solve the problem than on Gen-Z being part of the system. I just think the Gen-Z solution is great: it's a good example of what can be done, and it would give Sony more independence from AMD in future console iterations.

Shortbread · Apr 29, 2019

MBTP said:
While I was reading the Wired article, I noticed some nuances that led me to the idea that the SSD he is talking about is not a conventional one: neither SATA-based nor NVMe-based.
"What’s built into Sony’s next-gen console is something a little more specialized."

"but Cerny claims that it has a raw bandwidth higher than any SSD available for PCs. That’s not all. “The raw read speed is important,” Cerny says, “but so are the details of the I/O [input-output] mechanisms and the software stack that we put on top of the"

Based on those words, I would strongly suggest the possibility of custom, specialized hardware, and I would add the possibility that we are going to see direct lanes to the CPU, GPU, I/O, the 3D audio engine, and maybe even main memory, eliminating some of the bandwidth restrictions and latency that a conventional storage path imposes. I honestly don't know if Infinity Fabric will cope with those extraordinary connections, but Gen-Z can:

Scalable and provisioned memory infrastructure

Shared memory for data processing

Connectivity of processors, GPUs, accelerators, and optimized engines

Next generation DRAM, FLASH and storage class memory

Enabling persistent memory

"This leads to much simpler software and hardware, and this simplicity drives performance and lower costs. Gen-Z will provide this memory-semantic connectivity to devices including System on a Chip (SoC), data accelerators, storage, and memory on the motherboard and beyond the motherboard to rack scale."

The AnandTech article says:
"The Core Specification released today primarily addresses connecting processors to memory, with the goal of allowing the memory controllers in processors to be media-agnostic: the details of whether the memory is some type of DRAM (eg. DDR4, GDDR6) or a persistent memory like 3D XPoint are handled by a media controller at the memory end of a Gen-Z link, while the processor itself issues simple and generic read and write commands over the link. In this use case, Gen-Z doesn't completely remove the need for traditional on-die memory controllers or the highest-performance solutions like HBM2, but Gen-Z can enable more scalability and flexibility by allowing new memory types to be supported without altering the processor, and by providing access to more banks of memory than can be directly attached to the processor's own memory controller."

Source: anandtech.com/show/12431/genz-interconnect-core-specification-10-published

As I mentioned before, I think this has more to do with AMD's Infinity Fabric (or a variation of it) being integrated into the PS5's overall I/O circuitry and interfaces. I can't picture Sony investing millions in something new when AMD already makes this specific tech available to its partners and customers. Infinity Fabric's patents and documentation cover all of these aspects, including uses and scenarios that go beyond CPU/SoC integration of IF. I believe Sony is trying to resolve not only loading/install inefficiencies, but all inefficiencies related to internal and external I/O communication, especially those relating to VR gaming (PSVR2).

For me and others who are into VR gaming, the main problem with VR gaming, IMHO, is slow or late-loading assets, which break immersion (even with a top-of-the-line SSD/NVMe drive). I’ll put it another way: imagine running along a sky-bridge or rooftop, fully engaged and doing what’s required. Then suddenly, out of nowhere, you’re falling into the abyss of no return because of missing map assets/data tied to collision. If you thought VR motion sickness was bad, how about almost having a heart attack because your foundation or platform unexpectedly disappears from under you? Not cool at all... :|

3dilettante · Apr 29, 2019

Another open question about the demonstration in the Wired article is whether there were changes to the game, or possibly automatic measures, that do more than leverage raw bandwidth.
HBCC, or some variation of it that AMD posits for future products, can automatically load fragments of often-large asset blocks, or pages, from various levels of storage. This has its own benefits: trimming RAM requirements, further reducing transfer size and time, and possibly reducing the amount of structure building or resource registration needed, if the software can be adapted to partially resident or paged resources.
For example, perhaps a super-flexible system could do something like throw rays in a rendered scene and derive a subset of resources that can be rapidly loaded, rather than brute forcing enough bandwidth to load a whole sector of an open world.
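
A minimal sketch of the paging idea, assuming a simple LRU policy and a hypothetical fixed page size. It is not how HBCC works internally, only an illustration of loading the pages a view actually touches instead of whole assets; the "rays" step is reduced to a hypothetical lookup from the objects they hit to the pages those objects use.

```python
from collections import OrderedDict

PAGE_SIZE = 64 * 1024  # hypothetical page granularity in bytes


class PagedResidency:
    """Keeps at most `capacity` pages resident in RAM, evicting the least recently used."""

    def __init__(self, backing_store, capacity):
        self.backing_store = backing_store  # page_id -> bytes, stands in for the SSD
        self.capacity = capacity
        self.resident = OrderedDict()       # page_id -> bytes, stands in for RAM
        self.faults = 0
        self.bytes_loaded = 0

    def access(self, page_id):
        if page_id in self.resident:
            self.resident.move_to_end(page_id)  # mark as most recently used
            return self.resident[page_id]
        # Page fault: load only this page from storage, not the whole asset.
        self.faults += 1
        data = self.backing_store[page_id]
        self.bytes_loaded += len(data)
        self.resident[page_id] = data
        if len(self.resident) > self.capacity:
            self.resident.popitem(last=False)  # evict the least recently used page
        return data


def pages_for_visible(visible_objects, object_pages):
    """Collect only the pages belonging to objects that sampled rays actually hit."""
    needed = set()
    for obj in visible_objects:
        needed.update(object_pages[obj])
    return sorted(needed)


# Usage: a hypothetical 16-page asset where the current view touches only two objects.
store = {i: bytes(PAGE_SIZE) for i in range(16)}
object_pages = {"rock": [0, 1], "tree": [5], "house": [8, 9, 10, 11]}
cache = PagedResidency(store, capacity=4)
for page in pages_for_visible({"rock", "tree"}, object_pages):
    cache.access(page)
print(cache.faults, cache.bytes_loaded)  # 3 faults, 3 pages' worth of bytes
```

The LRU choice is just the simplest policy to show; anything that evicts cold pages would illustrate the same RAM-versus-transfer trade-off.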

Some of the benefits can depend on the flexibility of the storage and its interface. Flash doesn't have the physical burden of a spinning disk, although it is still subdivided into blocks and can experience performance inconsistencies with mixed/random workloads that technologies like 3D XPoint do not.
Incidentally, Zen supports the instructions Intel added to support byte-addressable non-volatile memory. Also fun for speculation is that Sony was, for a time, actively researching high-speed non-volatile memory.
However, the caveats are that it's not clear Sony's efforts have produced results (most disclosures are years old and not aimed at this low-cost space), and that the more finely addressable, higher-performance storage types run in the opposite direction on cost from the value-oriented large-block devices, which frequently have disappointing performance glass jaws.

MBTP · Apr 29, 2019

Shortbread said:
As I mentioned before, I think this has more to do with AMD's Infinity Fabric (or a variation of it) being integrated into the PS5's overall I/O circuitry and interfaces. I can't picture Sony investing millions in something new when AMD already makes this specific tech available to its partners and customers. Infinity Fabric's patents and documentation cover all of these aspects, including uses and scenarios that go beyond CPU/SoC integration of IF. I believe Sony is trying to resolve not only loading/install inefficiencies, but all inefficiencies related to internal and external I/O communication, especially those relating to VR gaming (PSVR2).

For me and others who are into VR gaming, the main problem with VR gaming, IMHO, is slow or late-loading assets, which break immersion (even with a top-of-the-line SSD/NVMe drive). I’ll put it another way: imagine running along a sky-bridge or rooftop, fully engaged and doing what’s required. Then suddenly, out of nowhere, you’re falling into the abyss of no return because of missing map assets/data tied to collision. If you thought VR motion sickness was bad, how about almost having a heart attack because your foundation or platform unexpectedly disappears from under you? Not cool at all... :|

True, you are both probably right about IF. I just think Gen-Z is a very tangible solution to the problem; AMD may use something very similar through Infinity Fabric anyway, or something even faster, since it's a closed system.

But I really can't see 2TB NVMe SSDs as the main storage solution, mostly because of his phrase "What’s built into Sony’s next-gen console is something a little more specialized."

MBTP · Apr 29, 2019

3dilettante said:
Another open question about the demonstration in the Wired article is whether there were changes to the game, or possibly automatic measures, that do more than leverage raw bandwidth.
HBCC, or some variation of it that AMD posits for future products, can automatically load fragments of often-large asset blocks, or pages, from various levels of storage. This has its own benefits: trimming RAM requirements, further reducing transfer size and time, and possibly reducing the amount of structure building or resource registration needed, if the software can be adapted to partially resident or paged resources.
For example, perhaps a super-flexible system could do something like throw rays in a rendered scene and derive a subset of resources that can be rapidly loaded, rather than brute forcing enough bandwidth to load a whole sector of an open world.

Some of the benefits can depend on the flexibility of the storage and its interface. Flash doesn't have the physical burden of a spinning disk, although it is still subdivided into blocks and can experience performance inconsistencies with mixed/random workloads that technologies like 3D XPoint do not.

Yes, I was going to write about XPoint possibly being the best case for performance and long-term reliability (more write cycles), and ask what you guys think about Cerny's "raw performance". Could this mean the performance outside the "burst" zone of the SLC cache, or does it simply relate to overall consistency of performance in multiple scenarios, like mixed/random workloads? Or is it just phrased to make it sound better than it really is under real-world conditions?

jlippo · Apr 29, 2019

milk said:
How do you use the ID buffer data to intersect against geometry? Isn't the ID buffer just a flat colour buffer with different values for each poly?

Yup, pretty sure it's a ROP write-to-buffer feature, just like color/Z/stencil writes.
You can select when you change the value, so you can separate different draw calls, objects, etc., so it does have some flexibility.

So yeah, you can write polygon IDs and, if needed, barycentric coordinates to a buffer, but I do not see how that is useful for actually finding intersections. (You need to rasterize or trace for that.)
It might be a way to defer shading or do some other silly things.
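
To illustrate the deferred-shading use, here is a minimal visibility-buffer style sketch. The 12/20-bit split between draw-call ID and primitive ID, and all names, are hypothetical. The rasterizer has already resolved visibility here, so the stored IDs only let a later pass fetch the right triangle and interpolate its attributes; they do not find any intersections themselves.

```python
# Hypothetical bit split for a 32-bit ID buffer: 12 bits draw ID, 20 bits primitive ID.
DRAW_BITS = 12
PRIM_BITS = 20
PRIM_MASK = (1 << PRIM_BITS) - 1


def pack_id(draw_id: int, prim_id: int) -> int:
    """Pack a draw-call ID and a primitive ID into one 32-bit value."""
    if not (0 <= draw_id < (1 << DRAW_BITS) and 0 <= prim_id <= PRIM_MASK):
        raise ValueError("ID out of range for the chosen bit split")
    return (draw_id << PRIM_BITS) | prim_id


def unpack_id(value: int) -> tuple:
    """Inverse of pack_id: returns (draw_id, prim_id)."""
    return value >> PRIM_BITS, value & PRIM_MASK


def deferred_shade(id_buffer, bary_buffer, triangles_per_draw, shade):
    """Shade each pixel from its stored IDs and two stored barycentrics."""
    out = []
    for packed, (b1, b2) in zip(id_buffer, bary_buffer):
        draw_id, prim_id = unpack_id(packed)
        tri = triangles_per_draw[draw_id][prim_id]
        b0 = 1.0 - b1 - b2  # third barycentric is implied by the other two
        out.append(shade(tri, (b0, b1, b2)))
    return out


# Usage: one triangle with a scalar attribute per vertex, interpolated at one pixel.
triangles = {3: [(0.0, 1.0, 2.0)]}


def interpolate(tri, bary):
    return sum(a * w for a, w in zip(tri, bary))


print(deferred_shade([pack_id(3, 0)], [(0.25, 0.25)], triangles, interpolate))
```

Storing only two barycentrics is a common space saving, since the three always sum to one.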

AlNom · Apr 29, 2019

https://www.amd.com/Documents/Radeon-Pro-SSG-Technical-Brief.pdf

Relevant to the GPU+SSD discussion?

Old news. :V

Gubbi · Apr 29, 2019

3dilettante said:
For example, perhaps a super-flexible system could do something like throw rays in a rendered scene and derive a subset of resources that can be rapidly loaded, rather than brute forcing enough bandwidth to load a whole sector of an open world.

Or just do as Rage did in 2010: if a texel is missing from main memory, use a lower mipmap level that is present, and demand-load the detailed texture for future use. It almost worked for Rage with conventional HDDs, and it worked fine with SSDs (and a bit more GPU RAM than was standard in 2010).
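
A minimal sketch of that fallback, assuming square pages of a hypothetical 128 texels and that the whole coarsest mip is kept resident so a fallback always exists. Real virtual texturing systems typically gather requests through a GPU feedback pass; here the lookup queues requests directly to keep the example short, and all names are hypothetical.

```python
from collections import deque

PAGE_TEXELS = 128  # hypothetical page edge length in texels


def page_of(u, v, mip):
    """Page coordinates at `mip` for a texel given in mip-0 coordinates."""
    return (mip, (u >> mip) // PAGE_TEXELS, (v >> mip) // PAGE_TEXELS)


class VirtualTexture:
    def __init__(self, size_texels, coarsest_mip):
        self.coarsest_mip = coarsest_mip
        self.resident = set()
        self.pending = set()
        self.requests = deque()
        # Keep every page of the coarsest mip resident so a fallback always exists.
        pages = max(1, (size_texels >> coarsest_mip) // PAGE_TEXELS)
        for px in range(pages):
            for py in range(pages):
                self.resident.add((coarsest_mip, px, py))

    def sample_page(self, u, v, wanted_mip):
        """Return the finest resident page at or coarser than wanted_mip."""
        wanted = page_of(u, v, wanted_mip)
        if wanted not in self.resident and wanted not in self.pending:
            self.pending.add(wanted)
            self.requests.append(wanted)  # demand-load for a future frame
        for mip in range(wanted_mip, self.coarsest_mip + 1):
            page = page_of(u, v, mip)
            if page in self.resident:
                return page
        raise RuntimeError("coarsest mip should always be resident")

    def service_requests(self, max_loads):
        """Stand-in for the I/O system: make up to max_loads queued pages resident."""
        for _ in range(min(max_loads, len(self.requests))):
            page = self.requests.popleft()
            self.pending.discard(page)
            self.resident.add(page)


# Usage: the first lookup falls back to the blurry coarsest mip, the next one is sharp.
vt = VirtualTexture(size_texels=4096, coarsest_mip=5)
first = vt.sample_page(1000, 2000, 0)   # (5, 0, 0): fallback page
vt.service_requests(max_loads=8)
second = vt.sample_page(1000, 2000, 0)  # (0, 7, 15): detail page now resident
```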

If your I/O system can sustain 300K reads per second, that's 5,000 reads per 60Hz frame. Using a 4K page size, that's 20MB of texture data loaded each frame (totalling about 1.2GB/s, well within specs).
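
The budget arithmetic can be checked in a few lines. This sketch assumes 4K pages of 4,096 bytes and decimal MB/GB; with those units, 5,000 reads per frame come to about 20.5MB per frame and about 1.23GB/s at 60Hz.

```python
def per_frame_budget(iops, page_bytes, fps):
    """Return (reads per frame, bytes per frame, bytes per second) for a sustained IOPS rate."""
    reads_per_frame = iops / fps
    bytes_per_frame = reads_per_frame * page_bytes
    bytes_per_second = iops * page_bytes
    return reads_per_frame, bytes_per_frame, bytes_per_second


reads, per_frame, per_second = per_frame_budget(iops=300_000, page_bytes=4096, fps=60)
print(f"{reads:.0f} reads/frame, {per_frame / 1e6:.1f} MB/frame, {per_second / 1e9:.2f} GB/s")
# 5000 reads/frame, 20.5 MB/frame, 1.23 GB/s
```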

Cheers

DieH@rd · Apr 29, 2019

It's nice that we're focused on storage talk just as Days Gone has released. This game has an almost 2-minute boot load just to reach a lightly animated main menu.

Mihailjones · Apr 29, 2019

DieH@rd said:
It's nice that we're focused on storage talk just as Days Gone has released. This game has an almost 2-minute boot load just to reach a lightly animated main menu.

Does it really take almost 2 minutes? I didn't notice it taking long at all; I'll have to check later.

Maybe I'm still so used to the Commodore 64 and cassette tapes that 2 minutes feels fast.

Speaking of Days Gone, I really like the vegetation and draw distance; finally the forest floor feels alive, with no noticeable gaps. I'm playing on a Pro, so I can't expect anything less than epic graphics from the PS5. To my eyes, Days Gone is a really impressive game graphically, so with a fast CPU + SSD, the PS5 will shine at this even if it won't have a super-fast GPU.

Deleted member 11852 · Apr 29, 2019

Mihailjones said:
Does it really take almost 2 minutes? I didn't notice it taking long at all; I'll have to check later.

It's a long load to the menu, then another fairly slow load to get in game. If you suspend/resume, you may not notice it.

Globalisateur · Apr 29, 2019

DSoup said:
It's a long load to the menu, then another fairly slow load to get in game. If you suspend/resume, you may not notice it.

I really don't understand that. What's the point? Do they preload the assets of the last saved game? The menu was so fast to load in KZ Shadow Fall. It's like we've regressed.

DieH@rd · Apr 29, 2019

Guerrilla Games devs talked about how they specifically optimized the main menu to appear as fast as possible, even negotiating with middleware partners not to show their [animated] logos at boot.

Mihailjones said:
Maybe I'm still so used to the Commodore 64 and cassette tapes that 2 minutes feels fast.

I still remember, as clearly as if it were yesterday, when my cousin brought a 5.25" floppy drive for the C64. It was such an incredible loading boost!

Nisaaru · Apr 29, 2019

SSDs these days can only dream of the subjective speed differential of C64 turbo floppy accelerators.

goonergaz · Apr 29, 2019

DieH@rd said:
Guerrilla Games devs talked about how they specifically optimized the main menu to appear as fast as possible, even negotiating with middleware partners not to show their [animated] logos at boot.

I still remember, as clearly as if it were yesterday, when my cousin brought a 5.25" floppy drive for the C64. It was such an incredible loading boost!

Nothing beat Football Manager, which (once loaded) made you endure a 15-minute ‘please wait’ screen!

MrFox · Apr 29, 2019

goonergaz said:
Nothing beat Football Manager, which (once loaded) made you endure a 15-minute ‘please wait’ screen!

You don't know what waiting is until you've tried to install Win95 (beta) from a big stack of floppies; I think it took the entire day. It was my first PC experience, after using an Amiga for 6 years, where installing an OS was basically copying a few files onto a blank disk or HDD and making it bootable. Installing a driver was copying a single file into /devices. I was not amused.

Shifty Geezer · Apr 29, 2019

Amiga to Windows was the worst technological transition humanity will ever face.

eloic · Apr 29, 2019

Shifty Geezer said:
Amiga to Windows was the worst technological transition humanity will ever face.

Ah... Amiga... which already had windows in its interface, BTW.

3dilettante · Apr 29, 2019

MBTP said:
Yes, I was going to write about XPoint possibly being the best case for performance and long-term reliability (more write cycles), and ask what you guys think about Cerny's "raw performance". Could this mean the performance outside the "burst" zone of the SLC cache, or does it simply relate to overall consistency of performance in multiple scenarios, like mixed/random workloads? Or is it just phrased to make it sound better than it really is under real-world conditions?

Raw performance in that context may mean the peak delivered bandwidth from whatever storage is in the system. Cerny seems concerned with how software and the system connect to and use the storage, since he cites I/O and software stack concerns. Peak bandwidth can depend on very linear, large-block accesses, and can assume things like perfectly contiguous data and a drive with plenty of unused room for garbage collection or for internally shifting to SLC mode.
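
Peak sequential figures assume large requests over contiguous data. A minimal sketch, using a hypothetical extent map of (logical offset, physical offset, length), shows how the same 1 MiB logical read becomes either a few large requests or many small scattered ones depending on fragmentation. It deliberately ignores controller-side behavior such as SLC caching and garbage collection.

```python
from typing import List, NamedTuple, Tuple


class Extent(NamedTuple):
    logical: int   # offset within the file, in bytes
    physical: int  # offset on the device, in bytes
    length: int    # extent length, in bytes


def device_requests(extents: List[Extent], offset: int, length: int,
                    max_request: int) -> List[Tuple[int, int]]:
    """Translate one logical read into (physical_offset, size) device requests.

    A request never crosses an extent boundary and never exceeds max_request bytes.
    """
    requests = []
    end = offset + length
    for ext in extents:
        lo = max(offset, ext.logical)
        hi = min(end, ext.logical + ext.length)
        pos = lo
        while pos < hi:
            size = min(max_request, hi - pos)
            requests.append((ext.physical + (pos - ext.logical), size))
            pos += size
    return requests


FILE_SIZE = 1 << 20   # 1 MiB file
BLOCK = 4096          # 4 KiB allocation unit
MAX_REQ = 128 * 1024  # 128 KiB maximum request size

contiguous = [Extent(0, 0, FILE_SIZE)]
# Hypothetical worst case: every 4 KiB block sits at a scattered physical location.
# (i * 97) % 256 is a permutation of 0..255 because 97 is odd.
fragmented = [Extent(i * BLOCK, ((i * 97) % 256) * BLOCK, BLOCK)
              for i in range(FILE_SIZE // BLOCK)]

print(len(device_requests(contiguous, 0, FILE_SIZE, MAX_REQ)))  # 8 requests of 128 KiB
print(len(device_requests(fragmented, 0, FILE_SIZE, MAX_REQ)))  # 256 requests of 4 KiB
```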

Gubbi said:
Or just do as Rage did in 2010: if a texel is missing from main memory, use a lower mipmap level that is present, and demand-load the detailed texture for future use. It almost worked for Rage with conventional HDDs, and it worked fine with SSDs (and a bit more GPU RAM than was standard in 2010).

If your I/O system can sustain 300K reads per second, that's 5,000 reads per 60Hz frame. Using a 4K page size, that's 20MB of texture data loaded each frame (totalling about 1.2GB/s, well within specs).

Cheers

I'm not sure how many affordable (any?) client drives can sustain 300K IOPS. Depending on how you characterize the access pattern as random versus linear, the 4K size is one of the worst sources of degradation for anything that isn't Optane.
The consumer NVMe QLC drives I've seen reviewed on AnandTech can have random 4K burst rates from ~75MB/s on an empty drive (a large amount of QLC cells set to SLC mode) down to ~33MB/s on a full drive. Sustained 4K rates are slightly worse.
128K sequential reads can reach 1.3GB/s as long as the data is contiguous and not fragmented (fragmentation converts the access into a set of random reads at ~150-200MB/s), although losing a third of that performance can be expected if the drive is full.
QLC is not unique in being affected this way, though some of the drops for the Crucial and Intel QLC drives are particularly stark outside of very inexpensive low-end drives.
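
To put the quoted review figures next to the 300K IOPS budget, a few lines convert throughput back into requests per second and bytes per frame. This assumes MB means 10^6 bytes and that 4K/128K mean 4,096/131,072-byte requests; the drive figures are the ones quoted in this post, not new measurements.

```python
MB = 1_000_000
FPS = 60


def iops_from_throughput(mb_per_s: float, request_bytes: int) -> float:
    """Requests per second implied by a throughput at a fixed request size."""
    return mb_per_s * MB / request_bytes


# Figures quoted above for consumer QLC NVMe drives: (label, MB/s, request size).
quoted = [
    ("4K random, empty drive", 75, 4096),
    ("4K random, full drive", 33, 4096),
    ("128K sequential, contiguous", 1300, 128 * 1024),
]

# Budget from the earlier post: 300K reads/s of 4 KiB pages at 60 Hz.
budget_mb_per_frame = 300_000 / FPS * 4096 / MB  # 20.48 MB per frame

for label, mb_per_s, request_bytes in quoted:
    iops = iops_from_throughput(mb_per_s, request_bytes)
    mb_per_frame = mb_per_s / FPS
    print(f"{label}: ~{iops:,.0f} IOPS, ~{mb_per_frame:.2f} MB per frame "
          f"({mb_per_frame / budget_mb_per_frame:.0%} of the budget)")
```

On these numbers, the contiguous sequential case only slightly exceeds the per-frame budget in bytes, while the 4K random cases deliver a few percent of it, at roughly 18K and 8K IOPS against the assumed 300K.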

That's not counting other unexpected problems, like Samsung's early TLC drives needing periodic refresh writes to keep read performance from dropping by 6x or more.
Given the complexity of the problem space, I'm not sure Sony can roll its own NAND SSD within a console budget that reaches that level of sustained performance without some bug or inconsistency that might make the results not worth the investment.

function · Apr 29, 2019

3dilettante said:
The consumer NVMe QLC drives I've seen reviewed on AnandTech can have random 4K burst rates from ~75MB/s on an empty drive (a large amount of QLC cells set to SLC mode) down to ~33MB/s on a full drive.

So you could set any QLC drive (controller permitting) to work in MLC mode? And would that bring with it MLC-like write endurance of 50~100k writes...?
