PlayStation 4 (codename Orbis) technical hardware investigation (news and rumours)

imaxx · Mar 2, 2014

hesido said:
Even though PS4 has gigs of mem, what it doesn't have is a 500MB/sec drive, so PRT could help reduce loading times, no?

Yep, that's one of the points of PRT.
The other (imho) is that you could later ship 30 gigs of textures and be sure your system won't die trying to load them all from disk, since it loads them 'on demand', which lets you make all your textures much denser than usual.

Deleted member 11852 · Mar 3, 2014

imaxx said:
Yep, that's one of the points of PRT.
The other (imho) is that you could later ship 30 gigs of textures and be sure your system won't die trying to load them all from disk, since it loads them 'on demand', which lets you make all your textures much denser than usual.

PRT seems aimed at solving the problem of ever larger textures and not enough video RAM, rather than disk I/O. You have a smart MMU that lets the graphics engine unload parts of textures from VRAM if they aren't needed, and make space for textures (or tiles from textures) that are. My understanding is that the shuttling isn't to/from disk but to/from main memory.
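A minimal sketch of the residency idea described above, assuming a fixed VRAM budget carved into 64KB tile slots and plain LRU eviction. The class name, the LRU policy and the `touch` interface are hypothetical illustrations, not how any actual PRT driver or the PS4 API works; real hardware tracks residency through page tables, not a Python dict.

```python
from collections import OrderedDict

TILE_BYTES = 64 * 1024  # PRT tiles are 64KB

class TileResidencyCache:
    """Hypothetical model of a fixed pool of VRAM tile slots with LRU eviction."""

    def __init__(self, vram_budget_bytes: int) -> None:
        self.capacity = vram_budget_bytes // TILE_BYTES
        self.resident: OrderedDict = OrderedDict()  # (texture, tile) -> None, oldest first
        self.uploads = 0    # tiles copied from main memory into VRAM
        self.evictions = 0  # tiles dropped from VRAM to make room

    def touch(self, texture: str, tile: int) -> bool:
        """Mark a tile as needed this frame; return True if it was already resident."""
        key = (texture, tile)
        if key in self.resident:
            self.resident.move_to_end(key)  # now the most recently used
            return True
        if len(self.resident) >= self.capacity:
            self.resident.popitem(last=False)  # evict the least recently used tile
            self.evictions += 1
        self.resident[key] = None
        self.uploads += 1
        return False
```

Touching a tile that isn't resident counts as an upload from main memory; once the pool is full, the least recently touched tile is the one dropped.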

I think the effect of PRT on disk I/O would probably be marginal, possibly a bit more depending on how the 64k texture tiles are laid out on disc. Given the cache size on the average HDD and the way drives cache sequential track reads, even if you only want twelve 64k tiles from a 2MB texture file, the HDD is probably going to read and cache that entire file anyway, so you'll save little to nothing, depending on your texture sizes, naturally.
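To put rough numbers on the read-ahead point, here is a toy model in which every tile request gets rounded out to an aligned read-ahead window. The window size and tile picks (0-based here) are assumptions for illustration only; real drive caching is more complicated and varies by model.

```python
TILE = 64 * 1024             # one 64KB PRT tile
FILE = 2 * 1024 * 1024       # a 2MB texture file, i.e. 32 tiles

def bytes_read(wanted_tiles, readahead, tile=TILE, file_size=FILE):
    """Bytes pulled off the platter if each tile request is rounded out to an
    aligned read-ahead window (a simplification, not real drive firmware)."""
    windows = {(t * tile) // readahead for t in wanted_tiles}
    return sum(min(readahead, file_size - w * readahead) for w in windows)

scattered = [0, 3, 5, 8, 11, 14, 17, 20, 23, 26, 29, 31]  # twelve hypothetical tiles
clustered = list(range(12))                                # twelve adjacent tiles
requested = len(scattered) * TILE                          # 768KB actually wanted

print(requested, bytes_read(scattered, 512 * 1024), bytes_read(clustered, 512 * 1024))
```

With an assumed 512KB window, twelve scattered tiles still drag in the whole 2MB, while twelve adjacent tiles cost 1MB, so layout on disc matters as much as tile count.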

PRT is a nod to approaches like id's megatexturing, but it's also fundamentally a JIT solution, i.e. load/unload textures partially or in full just in time, as the current rendering state requires. This is quite different from having oodles of GDDR memory and using a forecasting algorithm to preload the textures you'll most likely need so they're already there. Even if you don't need them this frame, oodles of RAM means you can keep them around until you definitely won't need them in the foreseeable future.

I'm not saying PRT couldn't be useful but it seems targeted at solving a different problem.

Grall · Mar 3, 2014

If you only need to load (roughly) the chunks of a large high-rez texture which are actually visible, of course that should positively impact load times...

Deleted member 11852 · Mar 3, 2014

Grall said:
If you only need to load (roughly) the chunks of a large high-rez texture which are actually visible, of course that should positively impact load times...

How big are these textures, and how do you envisage they are stored on disk so that just those tiles needed are read from disk? Bear in mind even the most rudimentary of modern HDDs do read-behind/read-ahead caching of approximately the track containing the actual data required.

taisui · Mar 3, 2014

I thought Rage already implemented a form of PRT via software with the megatexture.

Deleted member 11852 · Mar 3, 2014

taisui said:
I thought Rage already implemented a form of PRT via software with the megatexture.

The id Tech 5 engine supports megatextures, but PRT is implemented at the API level and supported by on-chip hardware.

Grall · Mar 3, 2014

DSoup said:
How big are these textures and how do you envisage they are stored on disk so that just those tiles needed are read from disk?

They could be...arbitrarily big, really. Rage-big? *shrug* Would depend on the game title. It's the concept which is the object of discussion, innit?

Bear in mind even the most rudimentary of modern HDDs do read-behind/read-ahead caching of approximately the track containing the actual data required.

Yes, but modern HDDs are also only fast when doing linear accesses, and reading a level's worth of very large, detailed, scattered textures (these consoles have 5+ gigs of free RAM after all, the majority of which undoubtedly gets sunk into textures, I'd think) would be slower than reading smaller chunks that initially only cover the visible screen.

Even the fastest consumer HDDs available today (that fit in a console, i.e. no 10k RPM drives) would probably whirr away for 20-30 seconds or maybe more to fill up 5 gigs. Loading only the visible stuff first would logically be quicker.
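Back-of-envelope version of that estimate. The throughput figures and the 12ms per-seek cost are assumptions picked for illustration, not measurements of the PS4's drive; the seek term is there to show why scattered reads hurt more than the raw byte count suggests.

```python
def load_seconds(total_bytes, mb_per_s, seeks=0, seek_ms=12.0):
    """Sequential transfer time plus a fixed penalty per seek (both assumed)."""
    return total_bytes / (mb_per_s * 1e6) + seeks * seek_ms / 1000.0

GB = 1e9
for speed in (100, 150, 200):  # assumed sequential throughput in MB/s
    print(speed, "MB/s:", round(load_seconds(5 * GB, speed), 1), "s for 5GB")

# Same 5GB split into 10,000 scattered chunks, one seek each (hypothetical).
print(round(load_seconds(5 * GB, 150, seeks=10_000), 1), "s scattered")
```

That works out to roughly 50, 33 and 25 seconds for a full 5GB fill at those assumed speeds, and the seek term quickly dominates once reads are scattered.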

DieH@rd · Mar 3, 2014

Found by gaffers in the "Naughty Dog 'surprised no one has found the secrets we've dropped around the internet'" thread:

http://vimeo.com/88095995
http://vimeo.com/88104881

Achievable on PS4, assuming ND is using every possible technique to pull this off [even Ryse-style pre-calculated physics streamed into the engine]? Can someone identify these vids as part of some other random project?

AlNom · Mar 3, 2014

Looks promising.

DieH@rd · Mar 3, 2014

Sadly, fake: http://www.youtube.com/watch?v=8jD1bz4N3_0

Deleted member 11852 · Mar 4, 2014

Grall said:
They could be...arbitrarily big, really. Rage-big? *shrug* Would depend on the game title. It's the concept which is the object of discussion, innit?

I think the concept is well established; I was discussing the application of PRT and questioning its relevance on a machine with gigabytes of GDDR. PRT would be genuinely useful for engines like id Tech 5, which support megatextures, but I don't see many games using megatextures for their game worlds.

Yes, but modern HDDs are also only fast when doing linear accesses, and reading a level's worth of very large, detailed, scattered textures (these consoles have 5+ gigs of free RAM after all, the majority of which undoubtedly gets sunk into textures, I'd think) would be slower than reading smaller chunks that initially only cover the visible screen.

I would expect all the textures (and tiles within a texture) within any given pak to be stored contiguously. If they are scattered about (fragmented), you have to do a bunch of sequential reads split by seeks, which will be terrible for performance.

But your HDD generally doesn't read small chunks of data, it reads great stretches of track; that's the read-behind/read-ahead buffer in effect. As I said in my previous post, if you have a 2MB texture on disc comprising thirty-two 64k PRT tiles and only want tiles 1-5, 7-11 and 21-32, the HDD cache will scoop up the entire file regardless of you only wanting bits of it.
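The tile ranges in that example can be turned into read requests with a simple coalescing pass. This is a hypothetical sketch: the `max_gap` knob (read through small unwanted gaps rather than issue a separate request) is an assumption about how a loader might trade bandwidth for fewer seeks, not something taken from any particular engine.

```python
TILE_KB = 64

def coalesce(tiles, max_gap=0):
    """Group tile indices into (first, last) runs; gaps of up to max_gap unwanted
    tiles are read through instead of starting a separate request."""
    runs = []
    for t in sorted(set(tiles)):
        if runs and t - runs[-1][1] - 1 <= max_gap:
            runs[-1][1] = t  # extend the current run
        else:
            runs.append([t, t])  # start a new run
    return [tuple(r) for r in runs]

wanted = list(range(1, 6)) + list(range(7, 12)) + list(range(21, 33))  # tiles 1-5, 7-11, 21-32
print(coalesce(wanted))             # three separate reads
print(coalesce(wanted, max_gap=1))  # read tile 6 anyway: two reads
print(len(wanted) * TILE_KB, "KB wanted of", 32 * TILE_KB, "KB")
```

That's 22 of the 32 tiles (1408KB of 2048KB) in three runs; with the drive's read-ahead likely covering the gaps anyway, the saving over reading the whole file is small.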

Even the fastest consumer HDDs available today (that fit in a console, i.e. no 10k RPM drives) would probably whirr away for 20-30 seconds or maybe more to fill up 5 gigs. Loading only the visible stuff first would logically be quicker.

And with streaming games you should be able to get going without filling up RAM. Take GTA, for example: when the game state initially begins (after starting GTA or loading a save game), you can see very little, usually the interior of a building. In the time it takes you to begin moving around the game map, the game has already begun loading geometry and textures for the things immediately around you, then higher-resolution versions of those textures, then things further away, and any RAM left over will be used to speculatively load things for where the game thinks you might be going.

So if you're driving a car north quite fast, it can speculatively grab textures needed to render geometry to the north. Unless I'm driving, of course ;-)
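A rough sketch of that kind of speculative prioritisation, assuming the world is split into grid cells and "north" is +y. The scoring formula (distance scaled down for cells ahead of the direction of travel) is a hypothetical heuristic for illustration; it isn't how GTA or any specific engine actually schedules streaming.

```python
import math

def prefetch_order(pos, velocity, cells, max_cells):
    """Rank candidate world cells for speculative streaming: closer cells first,
    with cells ahead of the direction of travel favoured over those behind."""
    px, py = pos
    vx, vy = velocity
    speed = math.hypot(vx, vy)

    def score(cell):
        dx, dy = cell[0] - px, cell[1] - py
        dist = math.hypot(dx, dy)
        if dist == 0 or speed == 0:
            return dist  # current cell, or standing still: distance only
        ahead = (dx * vx + dy * vy) / (dist * speed)  # cosine: -1 behind, 1 ahead
        return dist * (1.5 - ahead)  # hypothetical weighting; lower loads sooner

    return sorted(cells, key=score)[:max_cells]
```

Driving north at speed, a cell two units north scores ahead of one two units east, which in turn beats one two units south, so the limited RAM budget goes to where you're heading first.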

The comment about disk I/O derailed things. If a game designer is really looking to save disk I/O or reduce read times by not loading an entire texture, then they must know an awful lot about the game world, i.e. that an object is obscuring a number of tiles in any given texture, and that sounds an awful lot like starting to stream in a texture from disk too late. You want textures to be in video memory before you need them.

BoardBonobo · Mar 4, 2014

It's one of the reasons I thought I'd try a hybrid drive in the PS4: 1TB of HDD and 8GB of SSD.

pMax · Mar 4, 2014

DSoup said:
But your HDD generally wouldn't read small chunks of data ... You want textures to be in video memory before you need them.

The issue has more to do with the fact that, with PRT, you don't need to keep the whole texture in memory, only the chunks that fit.
So you can easily ship textures of any size you want, sure that at any given time only the useful part will stay in memory.

Essentially, it's like comparing loading a full file with fopen vs. mmap.
If you can add huge textures to your game FOR FREE, you are free to make them and splat them into your game to get nicer output "for free".
With PRT, any game can provide higher-quality textures without impacting memory use or load time/load performance.
Something you could not do till now.
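The fopen vs. mmap analogy in Python terms: both functions below return one 64KB tile from a texture file, but the first reads the whole file into memory while the second maps it and only slices out the bytes it needs, with the OS typically paging in just the touched region. The tile layout is hypothetical, and this only illustrates the analogy; PRT itself is handled by the GPU's page tables, not by file mapping.

```python
import mmap

TILE = 64 * 1024  # hypothetical layout: file stored as consecutive 64KB tiles

def read_tile_whole_file(path, index):
    """fopen-style: read everything, then keep one tile."""
    with open(path, "rb") as f:
        data = f.read()
    return data[index * TILE:(index + 1) * TILE]

def read_tile_mmap(path, index):
    """mmap-style: map the file and copy out only the tile we want."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            return m[index * TILE:(index + 1) * TILE]  # slice copies to bytes before unmap
```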

So yes, I personally expect developers to abuse the feature...

Deleted member 11852 · Mar 4, 2014

pMax said:
The issue has more to do with the fact that, with PRT, you don't need to keep the whole texture in memory, only the chunks that fit.

Yes, that's what I said at the start.

pMax said:
Essentially, it's like comparing loading a full file with fopen vs. mmap.

That's the ultimate goal; where the engine programmer doesn't even need to think about how much of a texture is in GDDR (vram) or DDR (system RAM), but that the API and graphics card sort this out. I don't know if the current implementations are quite there yet but I've not seen a whole lot written on PRT since the initial AMD reveal.

With PRT, any game can provide higher-quality textures without impacting memory use or load time/load performance. Something you could not do till now.

You can do this now (id Tech 5 did it over two years ago), but it means your engine has to create arbitrary textures to feed the GPU from the single large megatexture.
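For comparison, here is roughly the kind of indirection a software megatexture engine has to maintain itself, sketched under assumptions: a square virtual texture cut into 128x128-texel pages (a hypothetical size), a small pool of physical cache slots, and no eviction. With hardware PRT, this page-table bookkeeping is roughly what the GPU's MMU takes over.

```python
PAGE = 128  # texels per page side (hypothetical page size)

class VirtualTexture:
    """Toy indirection table mapping pages of one huge virtual texture onto a
    small pool of physical cache slots."""

    def __init__(self, virtual_size: int, physical_slots: int) -> None:
        self.pages_per_side = virtual_size // PAGE
        self.free_slots = list(range(physical_slots))
        self.page_table = {}  # (page_x, page_y) -> physical slot

    def map_page(self, px: int, py: int) -> int:
        """Assign a physical slot to a virtual page (no eviction in this sketch)."""
        if (px, py) not in self.page_table:
            if not self.free_slots:
                raise MemoryError("physical cache full; a real engine would evict")
            self.page_table[(px, py)] = self.free_slots.pop(0)
        return self.page_table[(px, py)]

    def translate(self, u: float, v: float):
        """Map normalized (u, v) to (slot, texel_x, texel_y), or None if unmapped."""
        x = int(u * self.pages_per_side * PAGE)
        y = int(v * self.pages_per_side * PAGE)
        slot = self.page_table.get((x // PAGE, y // PAGE))
        if slot is None:
            return None  # page not resident: fall back to a coarser mip
        return slot, x % PAGE, y % PAGE
```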

But to reiterate: PS4 games have access to 5GB of GDDR5 RAM. If developers are having to use PRT to make their textures fit then, well, I'd like to know what the heck they are doing with their asset management.

Cjail · Mar 4, 2014

Was this posted?
http://gamingbolt.com/ps4-ice-team-...lingdetiling-on-the-cpu-is-10-100x-faster-now

Deleted member 11852 · Mar 4, 2014

Cjail said:
Was this posted?
http://gamingbolt.com/ps4-ice-team-...lingdetiling-on-the-cpu-is-10-100x-faster-now

It being IGN, I assumed they'd mistyped his name, presumably Cory, but no, his personal blog reveals his full name as Cort Danger William Folberth Stratton. Somebody with a name like that could accomplish anything!

However, in truth, I don't have the foggiest what this means for the average PS4 game.

Anybody? Preferably somebody who has actually written code for PS4.

JPT · Mar 4, 2014

DSoup said:
It being IGN, I assumed they'd mistyped his name, presumably Cory, but no, his personal blog reveals his full name as Cort Danger William Folberth Stratton. Somebody with a name like that could accomplish anything!

However, in truth, I don't have the foggiest what this means for the average PS4 game.

Anybody? Preferably somebody who has actually written code for PS4.

According to his own tweet, it will have basically no effect on running code, but it will help developers and their GPU tools.

https://twitter.com/postgoodism/status/440918044739530752

Rangers · Mar 5, 2014

Off topic, but I like this new gamingbolt site. I've noticed they tend to ask the questions B3Ders would be interested in, with a focus on tech and comparing PS4/X1. Sometimes they get smaller developers to answer, who probably aren't as media-wary, and that provides interesting fodder. They have been showing up a lot lately.

3dilettante · Mar 7, 2014

I figured I'd link to this post concerning a Naughty Dog presentation that contains some discussion of the PS4 CPU and job system.

http://forum.beyond3d.com/showpost.php?p=1832042&postcount=134

The job system and CPU descriptions assume 6 cores are available.
There's some weirdness with the memory hierarchy description, for which I think there is some missing context.
There's discussion of the 2MB L2 for Jaguar, but then the presentation goes on to use a split that suggests 1MB chunks, which I don't know the reason for.

That aside, the latency numbers from L1 to L2 to DRAM are 3 to 26/190 to 220+ cycles.
A comparison with the Vgleaks numbers for Durango is 3 to 17/120 to 140-160 cycles.

I suspect there is a discrepancy in what is being measured for the L2 numbers, such as the ND presentation going with general figures for Jaguar while the Durango numbers are best-case.
Other measurements agree with 26, and Microsoft has stated they didn't mess with the clusters themselves.

The L2 sharing scenario may also be a worst-case number in Naughty Dog's presentation, as Durango has something between 100 and 120 depending on how far up the other cache hierarchy you need to go.
Microsoft did claim they did more work on the interface used to share data between the L2s, which might explain some of the disparity.
Since we don't know exactly what is being measured, and reporting latencies in cycles leaves clock speed as an uncontrolled variable, a definitive comparison isn't quite doable.

One thing I do think is notable is what this says about the efficacy of AMD's on-die interconnect and the alleged burden of external GDDR5 or DDR3 memory.

We can sort of get a ballpark figure for the DRAM subsystem's latency contributions by looking at the difference between the remote L2 hit scenario and a miss to memory.
For Durango, it is 120 for a remote hit, and 140-160 for DRAM.
For Orbis, it is 190 for a remote hit and 220+ for DRAM.
DRAM has worst-case scenarios that likely fall outside the numbers indicated for both platforms, but the less pathological cases may be what is being reported.

Both get 30 or more cycles for DRAM, or 14-19% of total memory latency.
The in-cluster hierarchy takes another chunk of roughly the same size.
Something over half is taken up by on-chip broadcasts and interconnect traversal.
GDDR5 vs DDR3 is very much in the noise.
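The subtraction above as a tiny helper, using the cycle counts quoted in the post. It only reproduces the arithmetic (miss to DRAM minus remote L2 hit, as a share of the full miss); for Durango the result moves with whichever ends of the quoted ranges you pair up. The ns conversion is there because cycle counts alone hide the clock speed; plug in whatever clock you believe each CPU runs at.

```python
def dram_share(remote_l2_hit: float, dram_miss: float) -> float:
    """Share of a miss-to-memory latency attributed to DRAM: (miss - remote hit) / miss."""
    return (dram_miss - remote_l2_hit) / dram_miss

def cycles_to_ns(cycles: float, clock_ghz: float) -> float:
    """Convert a latency in CPU cycles to nanoseconds at a given clock."""
    return cycles / clock_ghz

orbis = dram_share(190, 220)  # 30 cycles out of 220
print(f"Orbis DRAM share: {orbis:.1%}")
```

For Orbis that's 30/220, about 13.6%, which rounds to the 14% end of the range above.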

The heterogeneous memory subsystem of AMD's APUs is not breaking the overall trend from Llano, Trinity, Bobcat, Kabini, Kaveri, etc. in terms of latency.

Grall · Mar 7, 2014

3dilettante said:
Both get 30 or more cycles for DRAM, or 14-19% of total memory latency.
The in-cluster hierarchy takes another chunk of roughly the same size.
Something over half is taken up by on-chip broadcasts and interconnect traversal.
GDDR5 vs DDR3 is very much in the noise.

The heterogeneous memory subsystem of AMD's APUs is not breaking the overall trend from Llano, Trinity, Bobcat, Kabini, Kaveri, etc. in terms of latency.

Interesting. Thank you!
