PlayStation 4 (codename Orbis) technical hardware investigation (news and rumours)

patsu · Apr 25, 2013

You may be right. Someone should ping the PS Blog folks to clarify.

onQ · Apr 26, 2013

Shifty Geezer said:
Okay. With this and 3dcgi's post, I think I understand. I was thinking of a wavefront occupying the ALUs for 100% of the time during its resolution, and then another wavefront following behind, so there was no idle time. I hadn't made the connection with delays in the other pipes. Although I'd want to hear how much of a frame can really be lost on a GPU by such stalls. Is there really a lot of spare GPU power going to waste that can be repurposed to compute?

Thanks for the explanation.

How come when I tried to talk about this last month you & other members tried to make it seem as if I was crazy & it wasn't possible?

3dilettante · Apr 26, 2013

You'll have to specify which posts you are referencing.

The ones that I can recall did not say the same thing as your quote.

onQ · Apr 26, 2013

3dilettante said:
You'll have to specify which posts you are referencing.

The ones that I can recall did not say the same thing as your quote.

I was talking about the PS4 being able to run graphics & compute at the same time without taking away from the graphics, & I was posting links to prove that's what the people at Sony were saying, & I even got a warning for talking about it because everyone else said I was wrong.

AlphaWolf · Apr 26, 2013

Using idle cycles for compute is not the same as using engaged ones.

patsu · Apr 26, 2013

onQ said:
I was talking about the PS4 being able to run graphics & compute at the same time without taking away from the graphics, & I was posting links to prove that's what the people at Sony were saying, & I even got a warning for talking about it because everyone else said I was wrong.

You’re looking at the 30,000-foot level. At that level, you couldn’t give any details.

They were looking at the 3,000-foot level. Someone posted a specific scenario where it could happen.

Note that the magic is still with the developer although the architecture and tools may make it easier to take advantage of these “gaps”.

3dilettante · Apr 26, 2013

onQ said:
I was talking about the PS4 being able to run graphics & compute at the same time without taking away from the graphics, & I was posting links to prove that's what the people at Sony were saying, & I even got a warning for talking about it because everyone else said I was wrong.

Clarifying this will require going over the specific posts and the wording used.
The point I remember being debated was the claim that the full floating point throughput of the GPU could be used for graphics even if compute was ongoing.

Shifty Geezer's post is not saying the same thing.

patsu · Apr 26, 2013

The high level view got mixed up with the FLOP hunt, which evolves into the Stasha hunt later.

onQ · Apr 26, 2013

3dilettante said:
Clarifying this will require going over the specific posts and the wording used.
The point I remember being debated was the claim that the full floating point throughput of the GPU could be used for graphics even if compute was ongoing.

Shifty Geezer's post is not saying the same thing.

No, that's the conclusion that other people jumped to when I was talking about the quotes from Sony, & I even pointed out what I was talking about.

onQ said:
I don't think anyone has said anything about exceeding the theoretical peak of 1.84 TFlops.

What I'm getting from it is that graphics rendering doesn't use the GPGPU to its full potential, so Sony made the APU so it can run compute without taking away from the graphics rendering.

http://forum.beyond3d.com/showthread.php?p=1723390#post1723390

Sonic · Apr 26, 2013

There seems to be some confusion here. I recall Shifty saying that a CU is unable to work on graphics and compute work at the same time. I recall patsu correcting him on that and providing a lot of us here with some education as to the way GCN works with regard to wavefronts.

http://forum.beyond3d.com/showpost.php?p=1724729&postcount=1259

patsu · Apr 26, 2013

They were talking about maintaining or exceeding the 1.84 peak FLOP count at that point. Cerny is more talking about a specific case of achieving/improving efficiency (by giving developers stats and control). It also entails how they see the CPU and GPU work together.

If we only look at the GPU attributes, we may miss the CPU story.

3dilettante · Apr 26, 2013

onQ said:
No, that's the conclusion that other people jumped to when I was talking about the quotes from Sony, & I even pointed out what I was talking about.

The post I thought of was earlier in the thread:

onQ said:
Wait! What was I trying to suggest? The only thing that I've been pointing out is that Sony said the PS4 will be able to compute while getting the maximal amount of graphics out of the 1.84 TFLOPS without the computing taking away from the graphics.

There were also a number of awkwardly bolded press lines that were later used as argumentation.

Ignoring that run of posts, the Gamasutra article quotes Cerny making clear the goals of the various tweaks: reducing overhead and allowing compute and graphics loads to run well concurrently without excessively interfering with one another.
In any complex system with varying workloads, the impact each type of kernel has on the other is non-zero. The goal, as Cerny points out, is to make the interference as low as possible so that the concurrency is a net gain.

It's pretty straightforward to generate a scenario where either the compute kernel or graphics pipeline could have shaved a few cycles off of their execution times if they could use all of the GPU's resources. In the concurrent execution case, this is almost always not going to be the case, and any scenario where they do not impact each other at all is likely to be trivial or contrived. The PS4's stated goal is to make the sacrifice as small as possible, not zero.

rockaman · Apr 26, 2013

I thought he explained it well also. It seems when a graphics task is occupying a certain subset of X pipeline and where no other graphics tasks can fit in very well, he is giving a way to access that remainder for compute.

So it's like a complicated way of saying we can use some of the excess unused portion of the GPU to get some other work done without disrupting anything too much.

So yeah, if you interpret it one way, the compute doesn't take away from the GPU rendering process in X situation. But at the same time the max throughput for the GPU as a whole is still 1.84 TFLOPS no matter what you do. So compute still isn't necessarily being done for free, only that there are some neat ways to fit it around the graphics processing sometimes. Sharing is caring and all that.

Sounds like a resolution of this whole discussion to me.

Shifty Geezer · Apr 26, 2013

onQ said:
How come when I tried to talk about this last month you & other members tried to make it seem as if I was crazy & it wasn't possible?

That whole thread was a nonsense rather than a technical discussion. If that wasn't your intention then you have only your posting style to blame. Your conversational technique didn't include specific technical questions, but just spammed repetitious quotes and muddled one-liners and a constant reference to rumours with a disregard for their relevance when the hardware was explained to us. Rather than trying to understand what was happening and connect the dots, you were just pushing a view and not knowing when to let it drop as you weren't making any headway.

Technically, what Cerny describes is no different to GCN. The existing GCN platform can still find holes in the graphics work to fit in some compute workloads as I understand it. Sony have just added a lot more threads to allow for a more flexible pool of compute resources, with perhaps a few improvements to the existing GCN memory systems to save cache pollution.

Shifty Geezer · Apr 26, 2013

3dcgi said:
That would really depend on the game and how well compute work complements graphics work. If compute needs whatever is bottlenecking the graphics shader it will slow down graphics work. The more ALU heavy the compute work is the more effective this will be as it's easier to become fetch bound than ALU bound.

Ignoring the efficiency of utilising these spaces (how much compute can gain from these GPU stalls), can you put an estimate figure (or rather, a range depending on game) on how much idle time there can be on the ALUs in a frame? Like 1%, or 10%, or 50%? That's important to understand how much compute can be obtained from the GPU without impacting graphics, and indeed, how much wastage there is on a GPU not running compute.
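
For anyone who wants to play with the idea, here is a minimal sketch of a toy latency-hiding model. It is not a description of how GCN actually schedules work. It assumes every wavefront alternates a fixed burst of ALU cycles with a fixed stall (e.g. waiting on a fetch), and that the scheduler switches between wavefronts at no cost. The function names and the example numbers are made up for illustration.

```python
def alu_utilization(wavefronts: int, alu_cycles: int, stall_cycles: int) -> float:
    """Fraction of cycles the ALU is busy in a toy latency-hiding model.

    Each wavefront repeats: `alu_cycles` of ALU work, then `stall_cycles`
    waiting on something else (hypothetical fixed latency). With several
    wavefronts in flight, one can run while the others are stalled.
    """
    if wavefronts <= 0 or alu_cycles <= 0 or stall_cycles < 0:
        raise ValueError("need wavefronts > 0, alu_cycles > 0, stall_cycles >= 0")
    # One wavefront keeps the ALU busy alu_cycles out of every
    # (alu_cycles + stall_cycles); N wavefronts scale that up, capped at 100%.
    return min(1.0, wavefronts * alu_cycles / (alu_cycles + stall_cycles))


def idle_fraction(wavefronts: int, alu_cycles: int, stall_cycles: int) -> float:
    """Share of ALU cycles left over for other work in the same toy model."""
    return 1.0 - alu_utilization(wavefronts, alu_cycles, stall_cycles)


if __name__ == "__main__":
    # Fetch-heavy shader: 10 ALU cycles per 90 stall cycles, 4 wavefronts in flight.
    print(idle_fraction(4, 10, 90))
    # ALU-heavy shader with the same occupancy: little or nothing left over.
    print(idle_fraction(4, 40, 60))
```

In this model a fetch-bound graphics shader leaves a large share of ALU cycles unused, which is the gap ALU-heavy compute could fill, while an ALU-bound shader leaves none. Real hardware also shares caches, bandwidth and registers, so the actual figure depends on the game.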

nigelhere · Apr 26, 2013

How much bandwidth do you think the CPU will take? 20GB/s?

itsmydamnation · Apr 26, 2013

nigelhere said:
How much bandwidth do you think the CPU will take? 20GB/s?

Sounds good to me and fits other AMD designs pretty well.

Shifty Geezer · Apr 26, 2013

nigelhere said:
How much bandwidth do you think the CPU will take? 20GB/s?

According to a leak, I think that's all the CPU has available anyway.

patsu · Apr 26, 2013

They may have to refactor the code and reorganize the data to maximize compute usage. Current designs are probably architected around something else since PC graphics doesn't really do tightly coupled compute tasks like this.

3dcgi · Apr 26, 2013

Shifty Geezer said:
Ignoring the efficiency of utilising these spaces (how much compute can gain from these GPU stalls), can you put an estimate figure (or rather, a range depending on game) on how much idle time there can be on the ALUs in a frame? Like 1%, or 10%, or 50%? That's important to understand how much compute can be obtained from the GPU without impacting graphics, and indeed, how much wastage there is on a GPU not running compute.

If you're rendering a bunch of simple triangles into shadow maps it's possible 50% of compute is idle, but if there's heavy pixel shading the number will be much less. I can't give a number for that situation that would be any more than a guess. Note one of the customizations Cerny mentioned addresses compute + geometry heavy workloads.
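
To put rough numbers on that, here is a back-of-the-envelope sketch. The 1.84 TFLOPS peak is the figure quoted in this thread; the breakdown into 18 CUs, 64 lanes per CU and an 800 MHz clock comes from publicly reported PS4 specs rather than from anything posted here, and the 20% share of the frame spent on shadow maps is purely a made-up example value.

```python
def peak_flops(compute_units: int, lanes_per_cu: int, clock_hz: float) -> float:
    """Theoretical peak FLOPS, counting a fused multiply-add as 2 operations."""
    return compute_units * lanes_per_cu * 2 * clock_hz


def reclaimable_flops(peak: float, idle_fraction: float, frame_share: float) -> float:
    """Upper bound on compute that could fit into idle ALU time.

    idle_fraction: share of ALU capacity idle during a pass (e.g. 0.5 for the
                   simple shadow-map case described above).
    frame_share:   share of the frame spent in that pass (hypothetical value).
    Assumes the compute work is purely ALU bound and fills every idle slot,
    which real workloads will not manage.
    """
    if not (0.0 <= idle_fraction <= 1.0 and 0.0 <= frame_share <= 1.0):
        raise ValueError("fractions must be between 0 and 1")
    return peak * idle_fraction * frame_share


if __name__ == "__main__":
    peak = peak_flops(compute_units=18, lanes_per_cu=64, clock_hz=800e6)
    print(f"peak: {peak / 1e12:.4f} TFLOPS")
    spare = reclaimable_flops(peak, idle_fraction=0.5, frame_share=0.2)
    print(f"best-case spare during shadow pass: {spare / 1e9:.1f} GFLOPS")
```

Whatever the inputs, the result can never exceed the peak, which matches the point made earlier in the thread: compute fits into the gaps, it does not add throughput on top of the 1.84 TFLOPS.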
