Onion+ and the associated cache management stuff may be Sony IP.
The hardware world will always move forward, but it's the software optimization that delivers the final experience.
I'm not saying that Cell was a CPU + GPU; I'm saying that the PS4 SoC, with the CPU & GPGPU together, is creating a Cell-like processor.
Jaguar Cores as the PPU & the GPGPU CU's as the SPE's.
People are looking right at it but still can't see what's happening.
mmm, yes, but did anybody here really think Sony wouldn't put any of its research and development IP into something really transcendental inside the PS4?
At least, if you don't make a better GPU than what has been on the market for more than a year, give us some virtual reality stuff!
I mean... isn't the PS4 somehow a little "generic"? Well, it will be very developer friendly, but still, Samsung could launch a similar console in three months with little R&D investment and Android OS taped inside.
1) The CPU cores can bypass cache and go straight to main memory (or straight to the GPU, which is going to be sitting on the same memory bus). Nothing new here: SSE has had instructions to do this since the beginning, and CPUs have routinely supported it using write-coalescing buffers.
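For anyone who wants to see what that looks like in practice, here's a minimal sketch of the cache-bypass idea using SSE2 non-temporal stores; the buffer size and fill value are just made up for illustration:

```c
/* Minimal sketch: SSE2 non-temporal (streaming) stores go through the
 * write-combining buffers straight to memory instead of filling the cache. */
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>
#include <stdlib.h>

void stream_fill(uint32_t *dst, uint32_t value, size_t count)
{
    /* dst is assumed 16-byte aligned, count a multiple of 4 */
    __m128i v = _mm_set1_epi32((int)value);
    for (size_t i = 0; i < count; i += 4)
        _mm_stream_si128((__m128i *)(dst + i), v);  /* NT store, bypasses cache */
    _mm_sfence();  /* make the streaming stores globally visible before anyone else reads */
}

int main(void)
{
    enum { N = 1 << 20 };
    uint32_t *buf = aligned_alloc(16, N * sizeof(uint32_t));
    if (!buf) return 1;
    stream_fill(buf, 0xDEADBEEF, N);
    free(buf);
    return 0;
}
```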
2) The L2 cache has some kind of "volatile" setting - my guess is this basically makes a line write-through instead of write-back, so that the cache doesn't have to be manually flushed to be accessible via the non-coherent link. But who really knows...
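And the flip side, if the line stays normal write-back: the CPU has to flush it by hand before anything reading over the non-coherent link can see the data, which is exactly the step a write-through / "volatile" tag would save. Rough sketch of that manual flush (the 64-byte line size is an assumption, typical for x86):

```c
/* Manually write back / evict a range of cache lines so a non-coherent
 * reader sees up-to-date memory. Sketch only; names are illustrative. */
#include <emmintrin.h>
#include <stddef.h>

#define CACHE_LINE 64  /* assumption: 64-byte cache lines */

void flush_range(const void *addr, size_t bytes)
{
    const char *p   = (const char *)addr;
    const char *end = p + bytes;
    for (; p < end; p += CACHE_LINE)
        _mm_clflush(p);   /* write this line back to memory */
    _mm_mfence();         /* order the flushes against later accesses */
}
```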
Without the SPEs, Cell would have been, without qualification, more than a little terrible.

Without the SPEs the PS3 was a little "generic", but when devs put the SPEs to good use the PS3 shined, & I think that will be the case with the PS4 when devs use the CUs for compute tasks.
But if you use CUs for computing... where is the 7850 for rendering?
Of course it can do compute and graphics at the same time, like every other desktop GPU that's been on the market for the past few years. Love_In_Rio is saying that you don't get the full rendering power of a 7850-level GPU if you're using some of the CUs for something else.
& Cerny is saying that it can do both at the same time efficiently.
meaning that you don't have to take away the CUs because the CUs can do Graphics & Computing at the same time efficiently.
So you're suggesting the power of the CU's magically doubles if you're doing both graphics and compute work on them simultaneously since one type of workload has no negative impact on the other?
Without confirmation, unfortunately, I can only point to speculation that Liverpool's CUs are capable of doing double instruction for rendering AND compute simultaneously, through the fact that they're unified CUs for both, combined with the compute rings doing simultaneous asynchronous compute.
That's called GCN.
The difference for the upcoming designs is that their ability to receive commands has been broadened and made more flexible, but it is functionally the same.
Or more likely, that's where the whole 14+4 thing originates: you can only reliably use 14 CUs for graphics because 4 CUs are somewhat reserved for compute tasks. Hence using those 4 CUs doesn't impact graphics rendering, which relies on the 14 CUs.
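Just to put rough numbers on that 14+4 idea. This is back-of-the-envelope, assuming the usual GCN figures of 64 ALUs per CU, 2 FLOPs per ALU per clock, and the rumoured 800 MHz clock; the 14/4 split itself is pure speculation:

```c
/* Back-of-the-envelope throughput for an 18-CU GCN part split 14+4. */
#include <stdio.h>

int main(void)
{
    const double alus_per_cu = 64.0;
    const double flops_per_alu_per_clk = 2.0;  /* fused multiply-add */
    const double clock_ghz = 0.8;              /* assumed 800 MHz */

    double per_cu = alus_per_cu * flops_per_alu_per_clk * clock_ghz;  /* GFLOPS per CU */
    printf("18 CUs total   : %.1f GFLOPS\n", 18 * per_cu);  /* ~1843 */
    printf("14 CUs graphics: %.1f GFLOPS\n", 14 * per_cu);  /* ~1434 */
    printf(" 4 CUs compute : %.1f GFLOPS\n",  4 * per_cu);  /* ~410  */
    return 0;
}
```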
Can't a regular AMD GCN GPU do that? I mean performing different jobs simultaneously, scheduled by different ACEs.
So you are saying they doubled up (duplicated) everything in the CU and then made it so you could only use half of it for graphics and half of it for compute? Because that's the only way what you say works. Otherwise, any time a CU is being used for a compute workload it can't be used for a graphics workload.
Liverpool has multiple compute rings and pipes that provide fine-grained control over how the system resources are divided amongst all of the application's GPU workloads. These pipelines can perform simultaneous asynchronous compute.
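If you want something concrete to picture, a rough host-side analogue of those multiple rings is creating two independent OpenCL command queues on one device and feeding each its own dispatch, so the hardware scheduler is free to run them asynchronously side by side. Whether the driver actually maps the two queues to separate ACE pipes is outside our control, and the kernel here is just a placeholder:

```c
/* Two independent command queues on one GPU, each with its own work in
 * flight at the same time. Sketch only, no error handling. */
#define CL_TARGET_OPENCL_VERSION 120
#include <CL/cl.h>
#include <stdio.h>

static const char *src =
    "__kernel void busy(__global float *d) {   \n"
    "    size_t i = get_global_id(0);          \n"
    "    d[i] = d[i] * 2.0f + 1.0f;            \n"
    "}                                         \n";

int main(void)
{
    cl_platform_id plat; cl_device_id dev;
    clGetPlatformIDs(1, &plat, NULL);
    clGetDeviceIDs(plat, CL_DEVICE_TYPE_GPU, 1, &dev, NULL);
    cl_context ctx = clCreateContext(NULL, 1, &dev, NULL, NULL, NULL);

    /* Two independent queues: think of them as two rings of work */
    cl_command_queue q0 = clCreateCommandQueue(ctx, dev, 0, NULL);
    cl_command_queue q1 = clCreateCommandQueue(ctx, dev, 0, NULL);

    cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, NULL);
    clBuildProgram(prog, 1, &dev, NULL, NULL, NULL);
    cl_kernel k0 = clCreateKernel(prog, "busy", NULL);
    cl_kernel k1 = clCreateKernel(prog, "busy", NULL);

    size_t n = 1 << 20;
    cl_mem b0 = clCreateBuffer(ctx, CL_MEM_READ_WRITE, n * sizeof(float), NULL, NULL);
    cl_mem b1 = clCreateBuffer(ctx, CL_MEM_READ_WRITE, n * sizeof(float), NULL, NULL);
    clSetKernelArg(k0, 0, sizeof(cl_mem), &b0);
    clSetKernelArg(k1, 0, sizeof(cl_mem), &b1);

    /* Both dispatches are in flight at once; neither waits on the other */
    clEnqueueNDRangeKernel(q0, k0, 1, NULL, &n, NULL, 0, NULL, NULL);
    clEnqueueNDRangeKernel(q1, k1, 1, NULL, &n, NULL, 0, NULL, NULL);

    clFinish(q0);
    clFinish(q1);
    printf("both queues drained\n");
    return 0;
}
```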