PlayStation 4 (codename Orbis) technical hardware investigation (news and rumours)

We've had AMD employees on this forum confirm that technologies and enhancements made for these consoles have worked their way into the PC product line. Think of it this way: Sony has very talented hardware designers and the most technologically accomplished group of developers at WWS. It's all but inevitable that their input, beginning back in 2008 or 2009, would have helped to shape the way forward for both GCN and HSA.

There's a distinct difference between "we made the customizations" and "we helped shape the future of these architectures", and the latter should itself be split into "we told AMD what we'd like to see in there" and "we made the necessary changes and additions ourselves".
 
I think where the special ALUs are coming back up is where he said it's intentionally not 100% round, where "100% round" is taken to mean all ALUs are exactly the same. It could be he is implying something different, but he does seem to be implying that every execution unit is the same.
 
Well, a 7970 is 100% round with 32 CUs and Hawaii will be with more, so this is PR talk. The ALUs will be used mainly for graphics, and if it's true that a percentage is underused then we will see some compute effects, above all in first-party titles.
 
Yeah, which suggests there is some bottleneck in the PS4 that keeps more than 14 CUs from being effective for rendering.

It could be the CPU, the bandwidth (I doubt this), or, most likely IMO, some complex internal combination of registers, caches, and other things.
 
He could just be talking about GCN's ALU-count scaling properties. No GPU architecture or configuration is perfectly balanced; this is not unique to the PS4.

The 7870, for example, has 20 available CUs compared to the 7850's 16, a 25% advantage, but the scaling in games at the same clock speed almost never reaches that figure, averaging around 10%:
http://www.techpowerup.com/reviews/Powercolor/HD_7850_PCS_Plus/28.html

Similar imperfect scaling can be observed between Tahiti and Cape Verde configurations.

There will always be bottlenecks and imperfect scaling, but sometimes it is more cost/die area/power effective to just add on more (ALUs in this case) than to eliminate the bottleneck.
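
Rough arithmetic on that (my own back-of-envelope numbers taken from the review linked above, nothing official):

```cpp
#include <cstdio>

int main() {
    // Numbers quoted above: HD 7870 has 20 CUs vs the HD 7850's 16, and the
    // average in-game gain at equal clocks is roughly 10%.
    const double cus_7850 = 16.0, cus_7870 = 20.0;
    const double cu_advantage  = cus_7870 / cus_7850 - 1.0;     // 0.25 -> 25% more CUs
    const double observed_gain = 0.10;                          // ~10% average gain
    const double realised      = observed_gain / cu_advantage;  // fraction of the CU advantage realised

    std::printf("CU advantage: %.0f%%, observed gain: %.0f%%, realised scaling: %.0f%%\n",
                cu_advantage * 100.0, observed_gain * 100.0, realised * 100.0);
    return 0;
}
```

So in that example only around 40% of the extra ALU actually shows up as frame rate, which is exactly the kind of imperfect scaling being described.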
 
It's a fixed architecture. Developers will optimize more/differently compared to PC SKUs. The % utilization there may not be comparable.
 
There's a distinct difference between "we made the customizations" and "we helped shape the future of these architectures", and the latter should itself be split into "we told AMD what we'd like to see in there" and "we made the necessary changes and additions ourselves".

We all know Sony is buying finished chips from AMD in the end. No one has ever been under any illusion that Sony went into AMD's designs themselves and started redoing the layout. Even Cerny has always couched his statements as feature requests made to AMD. I don't know why you want to engage in some weird semantic debate when there's no real controversy here.
 
Could you link to Cerny talking about that? There's no end (on other forums anyway) to "Sony did this and that themselves" claims because of Cerny's statements about the customizations.
 
http://www.gamasutra.com/view/feature/191007/inside_the_playstation_4_with_mark_.php

The three "major modifications" Sony did to the architecture to support this vision are as follows, in Cerny's words:

  • "First, we added another bus to the GPU that allows it to read directly from system memory or write directly to system memory, bypassing its own L1 and L2 caches. As a result, if the data that's being passed back and forth between CPU and GPU is small, you don't have issues with synchronization between them anymore. And by small, I just mean small in next-gen terms. We can pass almost 20 gigabytes a second down that bus. That's not very small in today’s terms -- it’s larger than the PCIe on most PCs!
  • "Next, to support the case where you want to use the GPU L2 cache simultaneously for both graphics processing and asynchronous compute, we have added a bit in the tags of the cache lines, we call it the 'volatile' bit. You can then selectively mark all accesses by compute as 'volatile,' and when it's time for compute to read from system memory, it can invalidate, selectively, the lines it uses in the L2. When it comes time to write back the results, it can write back selectively the lines that it uses. This innovation allows compute to use the GPU L2 cache and perform the required operations without significantly impacting the graphics operations going on at the same time -- in other words, it radically reduces the overhead of running compute and graphics together on the GPU."
  • Thirdly, said Cerny, "The original AMD GCN architecture allowed for one source of graphics commands, and two sources of compute commands. For PS4, we’ve worked with AMD to increase the limit to 64 sources of compute commands -- the idea is if you have some asynchronous compute you want to perform, you put commands in one of these 64 queues, and then there are multiple levels of arbitration in the hardware to determine what runs, how it runs, and when it runs, alongside the graphics that's in the system."

"The reason so many sources of compute work are needed is that it isn’t just game systems that will be using compute -- middleware will have a need for compute as well. And the middleware requests for work on the GPU will need to be properly blended with game requests, and then finally properly prioritized relative to the graphics on a moment-by-moment basis."

This concept grew out of the software Sony created, called SPURS, to help programmers juggle tasks on the CELL's SPUs -- but on the PS4, it's being accomplished in hardware.

The team, to put it mildly, had to think ahead. "The time frame when we were designing these features was 2009, 2010. And the timeframe in which people will use these features fully is 2015? 2017?" said Cerny.

I think part of the problem is that the interviewer represents these sections on the customization as changes "Sony made", but in the actual Cerny quotes, when he says "we" I think he means that to include AMD, which you can see explicitly in the third point above. But he establishes a timeframe (2009) and a model they were using (SPURS) to establish hardware features in cooperation with AMD. This would have been at a time when AMD's GPGPU features were well behind nVidia, and Sony was pushing for capabilities far beyond what even GCN initially provided.
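
For what it's worth, here's how I picture the 'volatile' bit in the second bullet working, as a little C++ toy model. This is not actual PS4 hardware or SDK code and all the names are made up; it's just to illustrate the idea of selectively invalidating and writing back only the compute-owned lines so the graphics working set in the L2 is left alone.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <vector>

// Toy software model of the "volatile bit" idea -- NOT PS4 hardware or SDK code.
struct CacheLine {
    std::uint64_t tag;
    bool valid;
    bool dirty;
    bool volatile_line;  // set when the line was filled by an access marked 'volatile' (i.e. by compute)
};

struct ToyL2 {
    std::vector<CacheLine> lines;
    explicit ToyL2(std::size_t n) : lines(n, CacheLine{0, false, false, false}) {}

    // Compute is about to read fresh data from memory: throw away only the
    // lines it marked volatile, leaving graphics-owned lines untouched.
    void invalidate_volatile() {
        for (auto& l : lines)
            if (l.valid && l.volatile_line) l.valid = false;
    }

    // Compute has produced results: write back only its own dirty lines
    // (clearing the dirty flag here stands in for the actual memory traffic).
    std::size_t writeback_volatile() {
        std::size_t written = 0;
        for (auto& l : lines)
            if (l.valid && l.volatile_line && l.dirty) { l.dirty = false; ++written; }
        return written;
    }
};

int main() {
    ToyL2 l2(8);
    l2.lines[0] = {0x100, true, true,  false};  // graphics-owned line, dirty
    l2.lines[1] = {0x200, true, true,  true};   // compute-owned line, dirty
    l2.lines[2] = {0x300, true, false, true};   // compute-owned line, clean

    std::printf("compute lines written back: %zu\n", l2.writeback_volatile());
    l2.invalidate_volatile();
    std::printf("graphics line still valid: %s\n", l2.lines[0].valid ? "yes" : "no");
    return 0;
}
```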
 
Yeah, which suggests there is some bottleneck in the PS4 that keeps more than 14 CUs from being effective for rendering.

It could be the CPU, the bandwidth (I doubt this), or, most likely IMO, some complex internal combination of registers, caches, and other things.

This is total rubbish; practically anything that would be used in rendering would also be used in GPGPU. The only parts that wouldn't be are the fixed-function parts of the 3D pipeline, and I really doubt that they are the bottleneck.
 
Doesn't anyone else think 64 queues is excessive? Why is that better than, say, 16 queues, filling them with new work once they've been completed? 64 queues only makes sense to me if there's a possibility of the GPU processing all outstanding jobs faster than you can fill them, so it needs a huge pool of work in readiness.
 
They have 16 CUs; they would probably want a multiple of that so that each CU has a number of different queues to pick from.
 
You mean more than 16 queues to pick from?

Because 64/18 = 3.555..., which isn't a whole number.

Yeah, my bad, it's 18 CUs, which gives them a bit more than 3 queues per CU. Which suggests to me that they want to do a lot of different things on the GPU at the same time.

In fact, the difference between the `Graphics` and `GPGPU` CUs might simply be the number of queues they get.
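
Here's a rough software sketch of how I picture those 64 queues being serviced (purely illustrative; the 64 and 18 figures come from this thread, and everything else, including the priority scheme, is made up rather than how the real hardware arbitration works):

```cpp
#include <array>
#include <cstdio>
#include <queue>

// Toy picture of the "64 sources of compute commands" idea -- a software
// analogy only, not the actual PS4 arbitration hardware.
struct ComputeJob { int id; int priority; };  // higher priority wins

int main() {
    constexpr int kQueues = 64;
    constexpr int kCUs    = 18;
    std::printf("%d queues / %d CUs = %.2f queues per CU\n",
                kQueues, kCUs, static_cast<double>(kQueues) / kCUs);

    std::array<std::queue<ComputeJob>, kQueues> queues;
    queues[0].push({1, 10});   // e.g. a game job
    queues[37].push({2, 50});  // e.g. a middleware job with higher priority
    queues[63].push({3, 10});

    // One naive arbitration pass: dispatch the highest-priority job sitting at
    // the head of any queue (the real hardware has multiple arbitration levels).
    int best_q = -1, best_prio = -1;
    for (int q = 0; q < kQueues; ++q) {
        if (!queues[q].empty() && queues[q].front().priority > best_prio) {
            best_prio = queues[q].front().priority;
            best_q = q;
        }
    }
    if (best_q >= 0)
        std::printf("dispatch job %d from queue %d\n", queues[best_q].front().id, best_q);
    return 0;
}
```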
 
http://www.gamasutra.com/view/feature/191007/inside_the_playstation_4_with_mark_.php

I think part of the problem is that the interviewer represents these sections on the customization as changes "Sony made", but in the actual Cerny quotes, when he says "we" I think he means that to include AMD, which you can see explicitly in the third point above. But he establishes a timeframe (2009) and a model they were using (SPURS) to establish hardware features in cooperation with AMD. This would have been at a time when AMD's GPGPU features were well behind nVidia, and Sony was pushing for capabilities far beyond what even GCN initially provided.

That's the problem there: it's only an assumption which, depending on who is reading it, gets read differently. It could be read as "we told AMD what we'd like to see there" or "we did the modifications to the architecture ourselves".

Wasn't the 14+4 CU split just something suggested to developers?

To my understanding it was nothing but an example of how much of the GPU's resources might be doing graphics while the rest does GPGPU, not anything that would be "set in stone", or that those 4 CUs would somehow be different, or anything else.
In the end it's the software (game) you write for the PS4 that decides what happens and when it happens.
 
You'll have to excuse me; I've been a longtime lurker and had never signed up before now, obviously.

There seems to be a certain amount of secrecy regarding some aspects of the PS4 hardware, and we haven't seen any pictures of the motherboard either.

During the February reveal, Sony mentioned a 'custom secondary processor' that has never been elaborated on since (that I've seen). It's a bit of an out-there question, but do you think it's possible this secondary processor could hold the operating system in its own memory pool and run the entire system? I'd think it'd be rather easy for a cheap ARM CPU to do this without much hassle.

It's been discussed that such a processor may just be used for the compression/decompression of the shared videos, but that would mean it's only in use for a tiny percentage of the time, which would be a terrible use of the resource.

The Shadow Fall slides/presentation might possibly put a stop to this idea, though it could all be intentional on Sony's part, as the Killzone devs must have had to run each aspect of the presentation past Sony, who could stop certain things from being known.

I'd really like to know about that secondary processor more than any other aspect right now. It was used as a selling point, remember, just like the 8 GB of GDDR5.

Sorry if this is a bit rambly; I'm writing on my phone.
 
Sony hasn't elaborated much beyond its existence and its ability to manage traffic between the network and storage as part of the background download and update functionality.

Its physical and logical location in the system has not been disclosed, and no further details of its design have been released.
This may be partly for competitive reasons, and possibly partly because it's not meant to be interacted with directly by developer software.

Because it can perform or initiate background system updates, it must at some level be capable of interacting with a privileged domain at or above the level of the OS used by the CPU cores, since this is a sensitive part of the platform's DRM. That doesn't mean that the processor runs the OS. I would argue against the idea of using a background processor for running the OS in its entirety, because there are system calls and functions that need to be invoked by the CPU cores, and keeping them isolated on a separate, most likely weaker, and possibly physically distant processor would compromise performance.

If there is some interaction with the main CPU cores, it might happen at the same level that the many IO devices and offload engines do in any standard PC. There are dozens of RISC cores in any PC that do a lot of system and controller work operating behind some layer of abstraction and asynchronously from the main system.
 