PlayStation 4 (codename Orbis) technical hardware investigation (news and rumours)

I'm just trying to figure out how it's getting maximum graphics of 1.843 TFLOPS and computing at the same time.
IT ISN'T. ...As people have been repeatedly telling you for at least a week now.

Both types of jobs (gfx, computing) draw from the same pool of computing resources. If you consume X resource for one type of job, there's X resource less left for the other type of job.
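
A minimal sketch of that budget arithmetic (the 1.843 TFLOPS peak is from the published specs; the example split is invented purely for illustration):

```python
# One shared ALU budget: graphics and compute draw from the same pool.
GPU_PEAK_TFLOPS = 1.843

def remaining_for_compute(graphics_tflops):
    """FLOPS left over for compute after graphics takes its share."""
    if graphics_tflops > GPU_PEAK_TFLOPS:
        raise ValueError("graphics alone cannot exceed the GPU peak")
    return GPU_PEAK_TFLOPS - graphics_tflops

print(remaining_for_compute(1.843))  # 0.0   -> max graphics leaves nothing
print(remaining_for_compute(1.4))    # ~0.44 -> compute only gets what's left
```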

Now stop this nonsense please.
 
It can't, unless you count the CPU.

If you include the CPU, it would be higher than 1.84TFLOPS, almost 2 TFLOPS peak.

Exactly. The GPU cannot do more than 1.84 TFLOPS regardless of the workload. Any additional compute would have to happen on the CPU.
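
As a rough sanity check on the "almost 2 TFLOPS" figure, assuming the widely reported 8 Jaguar cores at the rumoured 1.6 GHz (the CPU clock has not been officially confirmed), each issuing 8 single-precision FLOPs per cycle:

```python
# Back-of-envelope system peak. Clock speed and per-cycle rate are
# assumptions: 8 Jaguar cores at 1.6 GHz, with 128-bit ADD + MUL
# pipes giving 8 single-precision FLOPs per cycle per core.
cores, clock_ghz, flops_per_cycle = 8, 1.6, 8
cpu_tflops = cores * clock_ghz * flops_per_cycle / 1000  # 0.1024 TFLOPS
gpu_tflops = 1.843
print(cpu_tflops + gpu_tflops)  # ~1.95 -> "almost 2 TFLOPS peak"
```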

But they are talking about running Compute on the GPU.


What was intriguing was new data on how the PlayStation 4's 18-compute-unit AMD graphics core is utilised. Norden talked about "extremely carefully balanced" Compute architecture that allows GPU processing for tasks that usually run on the CPU. Sometimes, employing the massive parallelisation of the graphics hardware better suits specific processing tasks.

"The point of Compute is to be able to take non-graphics code, run it on the GPU and get that data back," he said. "So DSP algorithms... post-processing, anything that's not necessarily graphics-based you can really accelerate with Compute. Compute also has access to the full amount of unified memory."

"The cool thing about Compute on PlayStation 4 is that it runs completely simultaneous with graphics," Norden enthused. "So traditionally with OpenCL or other languages you have to suspend graphics to get good Compute performance. On PS4 you don't, it runs simultaneous with graphics. We've architected the system to take full advantage of Compute at the same time as graphics because we know that everyone wants maximum graphics performance."

Leaked developer documentation suggests that 14 of the PS4's compute units are dedicated to rendering, with four allocated to Compute functions. The reveal of the hardware last month suggested otherwise, with all 18 operating in an apparently "unified" manner. However, running Compute and rendering simultaneously does suggest that each area has its own bespoke resources. It'll be interesting to see what solution Sony eventually takes here.
 
Compute can run on the GPU; it just can't exceed the GPU's total throughput when combined with graphics tasks. You have ALUs performing the work, and if they are busy with a graphics task they cannot perform a compute task. The maximum throughput is 1.84TF. Some of that can go to compute and some to graphics, or all of it to one or the other, but at no time will the GPU's total throughput exceed 1.84TF. 1.84TF is full capacity.
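
For anyone wondering where the 1.84TF ceiling comes from, it's the standard GCN peak-rate arithmetic applied to the announced specs:

```python
# 18 CUs x 64 ALU lanes x 2 FLOPs per fused multiply-add x 0.8 GHz.
cus, lanes, fma_flops, clock_ghz = 18, 64, 2, 0.8
print(cus * lanes * fma_flops * clock_ghz / 1000)  # 1.8432 TFLOPS
# One pool: any mix of graphics and compute still sums to this ceiling.
```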
 
But they are talking about running Compute on the GPU.
Where in that quote does it say they are getting 1.8 TF of graphics work AND compute on top?! It's talking about task switching without a penalty from compute to graphics and back again, or even running compute concurrently on some CUs and graphics on the rest of the CUs. No-one has said that it's possible to get 1.8 TF of graphics rendering and more compute (requiring more logic silicon than 18 CUs) on top of that.

Repeating the same quotes isn't going to change anything. You've been told another way to understand the comments. At this point if you don't see it the way everyone's explaining it, you should either agree to disagree or just take their word for it. ;)
 
I'm just trying to figure out how it's getting maximum graphics of 1.843 TFLOPS and computing at the same time.

The problem is that it sounds like you're trying to get an understanding of CS concepts and elements from PR blurbs, articles written by unsophisticated authors, and rumor posts.
Their goal isn't accuracy or necessarily educating the reader.
News blurbs and PR speak are imprecise in their word choice even if they aren't deliberately misleading, and they abuse terminology horribly.

I'm not claiming deep expertise in this architecture beyond the documentation and presentations that have come out over the years, and even that is no guarantee: these new, as-yet-unreleased chips could still change things in totally unexpected ways.

However, reading up on the current architecture and knowing some of the concepts before reading PR speak or rumors is a very useful sanity check. It's why I keep asking people to show their math: a little preliminary effort would turn up red flags for a lot of this stuff.

When we've reached the point where we seem to be using different meanings of the word "blocking", some more preliminary research is needed, because that word underpins the concepts we're discussing.
There are a lot of underlying assumptions and possible misdirections in the words being used, ones that all but the most careful tech authors miss, and that PR guys hope you never notice until after you've bought their product.
 
But they are talking about running Compute on the GPU.

Yep! As in you can have chicken or lamb for dinner. If you want both, you'll have to give up some chicken and some lamb to fill your stomach to the same level.

The interesting thing is: now that they've managed to run both compute and graphics jobs on the APU, can they combine the jobs? Some of these jobs are related, so they're not always logically separate. Some data and computation may be applicable to both compute and graphics.

The other thing is that the traditional CPU-GPU arrangement is not optimal. For next gen, AMD and Sony (and MS) designed their systems to avoid the known issues that hampered running compute and graphics jobs simultaneously.

EDIT:
For efficiency discussions, *sometimes* it's a choice between chicken and coffee. You may be able to find space for coffee even after you have finished the chicken. :devilish:
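
A toy model of that chicken-and-coffee case, with an invented stall rate, just to show compute soaking up cycles that graphics would have left idle anyway:

```python
# Toy simulation: graphics wavefronts stall on memory some fraction of
# cycles; independent compute wavefronts can occupy those idle slots.
# The 30% stall rate is invented for illustration.
import random

random.seed(0)
CYCLES = 10_000
STALL_RATE = 0.3  # fraction of cycles graphics is waiting on memory

graphics_busy = compute_filled = 0
for _ in range(CYCLES):
    if random.random() < STALL_RATE:
        compute_filled += 1  # a ready compute wavefront takes the idle slot
    else:
        graphics_busy += 1

print(f"cycles doing graphics: {graphics_busy / CYCLES:.0%}")
print(f"idle cycles recovered by compute: {compute_filled / CYCLES:.0%}")
# Graphics finishes no later than before; compute rides in the gaps,
# but graphics + compute still can't exceed 100% of the ALU cycles.
```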
 
Maybe behind the PR speak it simply means that compute and graphics operations will be efficiently organised. They also seemed to emphasize the ACE/DME components, perhaps so the two kinds of work don't interfere with one another as much. It sounds like compute on the GPU hits performance harder than expected on PCs today, maybe due to software limits.
 
I don't really see any outrageous claims. It's just a general description of what they have. But the topic is complex; it would take hours or days to fully flesh out the details. Likewise, it takes some knowledge and insight on our part to understand the nuances.
 
The interesting thing is: now that they've managed to run both compute and graphics jobs on the APU, can they combine the jobs? Some of these jobs are related, so they're not always logically separate. Some data and computation may be applicable to both compute and graphics.
The graphics front end already supports compute shaders. The compute pipes are what you get if you strip the graphics-specific functions out. I had at least partly assumed that compute shaders used to enhance graphics count as part of the graphics load when those commands are issued for graphics work, and that the compute side handles the GPGPU and non-pixel work.

Maybe behind the PR speak it simply means that compute and graphics operations will be efficiently organised. They also seemed to emphasize the ACE/DME components, perhaps so the two kinds of work don't interfere with one another as much. It sounds like compute on the GPU hits performance harder than expected on PCs today, maybe due to software limits.
The context switch comment sounds like a reference to older GPUs (generation before last or earlier), which literally had to drop everything and initialize the compute code, run it, then drop it and reload the graphics context.
This is why, after Nvidia started touting GPU PhysX but before the more modern concurrent architectures, one of the few ways to turn it on without killing performance was an SLI rig with PhysX running on the second card. Until more flexible GPUs were developed, there was a massive penalty for running compute and graphics on the same card.

To use a previous example, if graphics took 10ms and compute took 6ms, running them on the same GPU looked like this:
graphics + context-switch penalty + compute + context-switch penalty > 16ms.

Running the two together took much longer than the sum of their run times.
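
Putting that example into numbers (the 1.5ms switch cost is an invented placeholder; real penalties varied by GPU and driver):

```python
GRAPHICS_MS, COMPUTE_MS, SWITCH_MS = 10.0, 6.0, 1.5  # switch cost invented

# Old serialized model: full context switch into compute and back out.
serialized = GRAPHICS_MS + SWITCH_MS + COMPUTE_MS + SWITCH_MS
print(serialized)  # 19.0 ms of wall time for 16.0 ms of actual work

# Concurrent model: no switches. Best case the compute hides entirely
# in graphics' idle slots; worst case they contend for the same ALUs.
print(max(GRAPHICS_MS, COMPUTE_MS))  # 10.0 ms best case
print(GRAPHICS_MS + COMPUTE_MS)      # 16.0 ms worst case
```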
 
The graphics front end already supports compute shaders. The compute pipes are what you get if you strip the graphics-specific functions out. I had at least partly assumed that compute shaders used to enhance graphics count as part of the graphics load when those commands are issued for graphics work, and that the compute side handles the GPGPU and non-pixel work.

Yeah, I was asking about the fixed function units a few pages back.
 
No. 1.84TF is full throughput. 100% efficiency. That's the limit.

But GPUs on PC never get 100% of that.

GPUs on PC are not 100% efficient, and that is a fact, so if the PS4 can actually get close to its 1.84TF it would probably do much better than the 7850, and probably even better than the 7870.
 
But GPUs on PC never get 100% of that.

GPUs on PC are not 100% efficient, and that is a fact, so if the PS4 can actually get close to its 1.84TF it would probably do much better than the 7850, and probably even better than the 7870.

Which has nothing to do with what OnQ was trying to suggest. The PC-versus-console comparison belongs in another thread.
 
Which has nothing to do with what OnQ was trying to suggest. The PC-versus-console comparison belongs in another thread.

Wait! What was I trying to suggest? The only thing I've been pointing out is that Sony said the PS4 will be able to run compute while getting the maximum amount of graphics out of the 1.84 TFLOPS, without the compute taking away from the graphics.
 
Wait! What was I trying to suggest? The only thing I've been pointing out is that Sony said the PS4 will be able to run compute while getting the maximum amount of graphics out of the 1.84 TFLOPS, without the compute taking away from the graphics.

I think a good analogy is a truck that can haul 1.8 tons of cargo in its bed. If I have bricks in the bed, I can't haul oil, because oil requires modifying the bed before filling up. With the changes Sony has come up with, they can have 1 ton of bricks and 0.8 tons of oil in the bed at the same time. In other words, they reworked the bed so it can haul liquids as well as solids without having to swap anything out, but that doesn't change the truck's maximum hauling capacity. I'm still bound by the physical limits of the bed; it's just that now I can haul two different materials without having to unload or modify it.
 
Wait! What was I trying to suggest? The only thing I've been pointing out is that Sony said the PS4 will be able to run compute while getting the maximum amount of graphics out of the 1.84 TFLOPS, without the compute taking away from the graphics.

And what is your theory? An unannounced external GPU dedicated to compute tasks (like the N64 Expansion Pak RAM cartridge)? Many people have explained to you that graphics and compute share the same 1.84 Tflop GPU. There is no 1.84 Tflops exclusively for graphics plus extra compute resources unless there's additional hardware (there isn't).
 
Wait! What was I trying to suggest? The only thing I've been pointing out is that Sony said the PS4 will be able to run compute while getting the maximum amount of graphics out of the 1.84 TFLOPS, without the compute taking away from the graphics.

I hope Sony didn't actually say that because it's ridiculous.

You're acting like there's some predefined limit as to how good the ALU utilization can be in graphics tasks. If graphics tasks are using the ALUs maximally then there will be zero time left for other compute tasks. This is not an impossible scenario. There will probably be coarse-grain schedulers that prevent a single type of thread from dominating all the ALU time and starving the rest, but that just means that in this scenario compute tasks would be taking away from graphics.

I don't think you understand that to the shader cores all of these threads look the same, compute or graphics, and that graphics tasks are already highly parallel and can be fairly diverse. That means there's nothing intrinsic about compute tasks that makes them more likely to be able to run in a case where graphics tasks are all waiting on some blocked resource. If everything intrinsically has a low TMU:ALU ratio then eventually all the threads will be waiting on the TMUs or the caches or whatever; it doesn't matter whether they're compute or graphics. Likewise, if you're running a bunch of ALU-heavy graphics shaders you won't have a hard time keeping the ALUs loaded, even if they're not necessarily always performing useful work.

Another thing you need to understand is that just because ALUs are not utilized with perfect efficiency doesn't mean that they could have been if there were other threads available to do so. For example, if execution gets predicated away that counts as inefficiency but it's not recoverable.
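
A concrete case of that unrecoverable inefficiency, assuming GCN's 64-lane wavefronts: a divergent branch where both paths take equally long wastes half the issued lane-slots, and no other thread can claim them.

```python
WAVEFRONT = 64  # GCN wavefront width

def divergent_utilization(lanes_taking_if):
    """ALU utilization for one if/else where both paths take equally long."""
    lanes_taking_else = WAVEFRONT - lanes_taking_if
    useful = lanes_taking_if + lanes_taking_else  # 64 useful lane-ops total
    issued = 2 * WAVEFRONT                        # both paths issue all lanes
    return useful / issued

print(divergent_utilization(48))  # 0.5 -- half the issued slots are masked
# Those masked slots are spent, not freed; no other thread can use them.
```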

Having more ACEs should allow more long-running compute tasks to be loaded simultaneously. That could give more diversity to the type of threads running and maybe help loading. Whether or not this level of task width is something developers will really take advantage of remains to be seen. But it isn't magic that ensures they'll do more work on average, it's more giving them additional flexibility in how they write compute algorithms and encourages them to try to offload more simultaneously.
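
A toy illustration of that flexibility argument (this is not AMD's or Sony's actual scheduling logic, and the ready probability is invented): more independent queues mean more chances of finding runnable work each cycle, but never more than 100% of the cycles.

```python
import random

random.seed(1)

def alu_busy_fraction(num_queues, ready_prob=0.4, cycles=10_000):
    """Fraction of cycles where at least one queue has a ready wavefront."""
    busy = sum(
        any(random.random() < ready_prob for _ in range(num_queues))
        for _ in range(cycles)
    )
    return busy / cycles

for queues in (1, 2, 8):  # e.g. 2 ACEs on stock GCN vs. the rumoured 8
    print(queues, f"{alu_busy_fraction(queues):.0%}")
# More queues cut idle cycles, but the ceiling is still 100% -- more
# flexibility to fill the machine, not extra FLOPS on top.
```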
 
I hope Sony didn't actually say that because it's ridiculous.


Well they did.


The system is also set up to run graphics and computational code synchronously, without suspending one to run the other. Norden says that Sony has worked to carefully balance the two processors to provide maximum graphics power of 1.843 teraFLOPS at an 800Mhz clock speed while still leaving enough room for computational tasks. The GPU will also be able to run arbitrary code, allowing developers to run hundreds or thousands of parallelized tasks with full access to the system's 8GB of unified memory.

http://arstechnica.com/gaming/2013/...4s-hardware-power-controller-features-at-gdc/


"The cool thing about Compute on PlayStation 4 is that it runs completely simultaneous with graphics," Norden enthused. "So traditionally with OpenCL or other languages you have to suspend graphics to get good Compute performance. On PS4 you don't, it runs simultaneous with graphics. We've architected the system to take full advantage of Compute at the same time as graphics because we know that everyone wants maximum graphics performance."

http://www.eurogamer.net/articles/digitalfoundry-inside-playstation-4
 
What a number of posters are saying is that those statements are not really saying what you think they are.

The arstechnica text isn't a direct quote, and the eurogamer text doesn't say anything specific.
These aren't articles or press statements intended to educate readers or to really explain anything. They aren't being held to any standard of precision, and the more buzz and hype they create, the better it is for them.

There's a lot of stuff we can learn about these systems and computation in general, but I strongly encourage you to not use press blurbs as the starting point. That's like learning about cars from a salesman on the dealership lot.
 
onQ, they are talking about the total system (both processors), not merely about the GPU. They are not GPU vendors.

It's actually great to probe for info, and even better if we take the time to digest the discussion. If you need clarification, ask questions instead of just restating your view.
 
What a number of posters are saying is that those statements are not really saying what you think they are.

The arstechnica text isn't a direct quote, and the eurogamer text doesn't say anything specific.
These aren't articles or press statements intended to educate readers or to really explain anything. They aren't being held to any standard of precision, and the more buzz and hype they create, the better it is for them.

There's a lot of stuff we can learn about these systems and computation in general, but I strongly encourage you to not use press blurbs as the starting point. That's like learning about cars from a salesman on the dealership lot.


onQ, they are talking about the total system (both processors), not merely about the GPU. They are not GPU vendors.

It's actually great to probe for info, and even better if we take the time to digest the discussion. If you need clarification, ask questions instead of just restating your view.

I know they said they carefully balanced the two processors to make that happen, but think about it: why would they have to modify the APU for the CPU to have enough room for computational tasks? And why would they have to architect the CPU to take full advantage of Compute at the same time as the GPU does graphics? Isn't that the way things already are?
 