PlayStation 4 (codename Orbis) technical hardware investigation (news and rumours)

Status
Not open for further replies.
Looking at that picture of the CU, and reading the leak info, it seems there is some mangling of the details, probably from English not being the writer's first language.

Who is sweetvar, and does any of the sweetvar info match up with this VGLeaks info? The "Scalar ALU's 320" doesn't make a lot of sense either (looking at the CU diagram), and is either a case of misunderstanding specs, or another confusion with English.


Edit:
Reading this link on Anandtech that describes what the scalar ALUs do, it seems like adding an extra one to the CU could be pretty beneficial if the CU is used for compute tasks.

So what does a scalar unit do? First and foremost it executes “one-off” mathematical operations. Whole groups of pixels/values go through the vector units together, but independent operations go to the scalar unit as to not waste valuable SIMD time. This includes everything from simple integer operations to control flow operations like conditional branches (if/else) and jumps, and in certain cases read-only memory operations from a dedicated scalar L1 cache. Overall the scalar unit can execute one instruction per cycle, which means it can complete 4 instructions over the period of time it takes for one wavefront to be completed on a SIMD.

Conceptually this blurs a bit more of the remaining line between a scalar GPU and a vector GPU, but by having both types of units it means that each unit type can work on the operations best suited for it. Besides avoiding feeding SIMDs non-vectorized datasets, this will also improve the latency for control flow operations, where Cayman had a rather nasty 44 cycle latency.
- http://www.anandtech.com/show/4455/amds-graphics-core-next-preview-amd-architects-for-compute/4
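A quick back-of-envelope check of the "one instruction per cycle ... 4 instructions per wavefront" figure in the quote above — a sketch, assuming the standard GCN arrangement of 64-thread wavefronts executed on 16-lane SIMDs:

```python
# Standard GCN figures (assumed): a wavefront is 64 threads, a SIMD is
# 16 lanes wide, and the scalar unit issues 1 instruction per cycle.
WAVEFRONT_SIZE = 64
SIMD_LANES = 16
SCALAR_INSTRS_PER_CYCLE = 1

# A 64-thread wavefront occupies a 16-lane SIMD for 4 cycles...
cycles_per_wavefront = WAVEFRONT_SIZE // SIMD_LANES
# ...during which the scalar unit can retire 4 instructions.
scalar_instrs_per_wavefront = cycles_per_wavefront * SCALAR_INSTRS_PER_CYCLE

print(cycles_per_wavefront)         # 4
print(scalar_instrs_per_wavefront)  # 4
```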
 
I made some tweaks to that previous post.

I don't know if we cracked the case, but it seems to be the most logical explanation.


Sweetvar has been leaking hardware details for almost a year now. Sony and MS both using Jaguar cores for their CPUs came from him back in March or April of 2012.

I'm not ruling that notion out, but I think the other is more plausible because of Sweetvar.
 
If each CU can do 128 floating point operations per clock (per the AMD GCN whitepaper), then at 800 MHz each CU can do 102.4 GFLOPs (800 MHz x 128). For those 4 CUs, that's basically 410 GFLOPs, which is what VGLeaks says is the theoretical performance of the 4 CUs. Someone else can check my math, but I believe that's correct. Unless VGLeaks is wrong about 410 GFLOPs for those 4 CUs, I'd say that each of those CUs has an extra scalar ALU, and not an extra SIMD.

Looks like they have 4 logically separate CUs that have extra Scalar ALUs to improve branching performance, which might be handy with different types of non-rendering computation.
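The arithmetic above can be sketched out directly — a minimal check, assuming the standard GCN figures of 64 vector ALUs per CU and 2 FP ops per ALU per clock (one fused multiply-add):

```python
# Assumed GCN figures: 64 vector ALUs per CU, each doing one FMA
# (counted as 2 FLOPs) per clock, at an 800 MHz clock.
ALUS_PER_CU = 64
FLOPS_PER_ALU_PER_CLOCK = 2
CLOCK_GHZ = 0.8

gflops_per_cu = ALUS_PER_CU * FLOPS_PER_ALU_PER_CLOCK * CLOCK_GHZ
print(gflops_per_cu)      # 102.4 GFLOPs per CU
print(4 * gflops_per_cu)  # 409.6 GFLOPs for the 4 CUs (~410)
```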
 
Your math is fine Scott. What I'm saying is in that scenario, the fifth SIMD in these 4 CUs is not factored into the 410 GFLOPs because it's "dedicated" to compute. If the dev decides to use the fifth SIMDs for rendering instead of compute then it gives the minor boost referred to in the second bullet point.
 
Can you use a part of a CU for a different task than another part of a CU? I thought a CU could only run one shader at a time?

Edit: That can't be right.
 

Sweetvar26 also said that AMD felt the Xbox was a super computer and was "more" powerful. In hindsight, I think he mistakenly switched the platforms.

From the looks of it, Orbis fits the super computer descriptor better.
 
I like this setup and it makes a lot more sense. 1 SIMD on each of those 4 CUs should be enough to ensure some level of basic physics simulation (hair, rag dolls, smoke, fluid, etc.) while not bothering the rest of the GPU, since doing compute tasks normally would apparently bog down the rest of the CUs. Now if a dev wants to go all out on physics he still can, but anything doable in 102.4 GFLOPs is guaranteed. What they can do with that number, I don't know though.
 
Why would you put 1 extra SIMD on each of 4 CUs and then reserve those SIMDs, rather than just add an extra CU and reserve that?
 
Are 4 compute units such a big win for gameplay physics and animations? I remember ERP saying GPU compute isn't all that... and paired with the 1.6 GHz Jaguar octo-core it's laughable... something like a dual-core/4-thread Haswell would have been better, I would think... how soon will Apple/Samsung mobile products feature a faster CPU than Orbis/Durango? :oops:

I can't be the only one disappointed with the PS4 specs? What could be the reasons that Sony is lowballing their new hardware? Coming 4 years after PS3, one could forgive... but it will be 8 years after PS3... and the new PS4 specs revealed are weak sauce.
 
4 CUs is 410 GFLOPs, which is something like (not totally sure) double the flops of the entire Cell processor in the PS3, which probably never used even half of its processing power for physics and animation. If they dedicate that much power to pure simulation-type computing, it will be a vast improvement over last gen. Honestly, I'd rather see them do that than put the power into graphics rendering.
 
I would believe a CU can do that, since these are separate SIMDs in the CU. I don't disagree with the extra scalar ALU idea since it's virtually the same as what I was proposing at first, though that would be ~6 GFLOPs and would not seem worth mentioning as a "minor boost". Likewise, 410 GFLOPs seems rather large to be considered a minor boost.

I couldn't answer why they'd reserve SIMDs rather than a whole CU. This is speculation factoring in Sweetvar's info and assuming that was a 5-SIMD CU.

Yeah, but with comments like the "super computer" one I try to focus more on the tangible numbers and not the opinion. That comment could have been because of the components added to Xbox 3 to improve efficiency. But I can see that possibly being crossed up.

AMD has promoted GPU compute for lighting.

http://www.youtube.com/watch?v=zYweEn6DFcU
 
Based on the leak, does anyone think software-based BC for PS3 is possible with this hardware? What about PS2? Could the 4 dedicated CUs emulate the SPUs from Cell?
 
I would say we're looking at these two scenarios as most likely.

1. 4 CUs have an extra (scalar or modified) ALU, and these CUs are dedicated to compute but can also be used for rendering. "Minor boost" would be incorrect IMO since they would make up 22% of the ~1.8 TFLOPs. This would mean Sweetvar's info relates to something else.

2. 4 CUs each have an extra SIMD, and these SIMDs are dedicated to compute but could also be used for rendering. "Extra ALU" would be wrong in this case.
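To put rough numbers on the two scenarios — a sketch, assuming standard GCN building blocks (a 16-lane SIMD, 2 FP ops per lane per clock via FMA, 800 MHz):

```python
CLOCK_GHZ = 0.8
FLOPS_PER_LANE_PER_CLOCK = 2  # one FMA counted as 2 FLOPs

# Scenario 1: one extra scalar ALU on each of the 4 CUs.
extra_alu_gflops = 4 * 1 * FLOPS_PER_LANE_PER_CLOCK * CLOCK_GHZ
# Scenario 2: one extra 16-lane SIMD on each of the 4 CUs.
extra_simd_gflops = 4 * 16 * FLOPS_PER_LANE_PER_CLOCK * CLOCK_GHZ

print(extra_alu_gflops)   # 6.4 GFLOPs total -- tiny, hard to call a boost
print(extra_simd_gflops)  # 102.4 GFLOPs total -- a quarter of the 410
```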
 
PS2 should be easy. You can emulate it on any modern PC; I was doing it on an HD 4850.

The PS3 might be possible, but I dunno.
 
3. Reading this again, I kind of wonder if it doesn't mean that 4 CUs have an extra scalar/modified/custom ALU but are otherwise normal CUs that can be used for rendering. If you use them for rendering, the extra ALU offers a "minor boost" to rendering performance; "minor boost if used for rendering" was meant to refer to the extra ALU in each CU, not the CUs themselves. If you were to do GPGPU computing, those extra ALUs would help in some way, e.g. branching.

That puts us back at an 18 CU GPU with 1.8 TFLOPs, with 4 CUs (410 GFLOPs) that have a little customization to improve "compute" performance. I imagine shaders tend to be written without branching, because the data is highly parallelized and the shaders themselves are small and efficient. In the case of general computing, the more branch-heavy or "one-off" operations that the scalar ALUs are good at would benefit more. The statement "Hardware balanced at 14 CUs" is still a little confusing to me.
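The totals in that reading can be sanity-checked — a sketch, assuming 102.4 GFLOPs per GCN CU at 800 MHz:

```python
GFLOPS_PER_CU = 102.4  # 64 ALUs x 2 FLOPs/clock x 0.8 GHz (assumed)

total_18 = 18 * GFLOPS_PER_CU     # whole GPU
balanced_14 = 14 * GFLOPS_PER_CU  # the "hardware balanced" portion
compute_4 = 4 * GFLOPS_PER_CU     # the 4 compute-flavoured CUs

print(round(total_18, 1))     # 1843.2 -> the ~1.8 TFLOPs figure
print(round(balanced_14, 1))  # 1433.6
print(round(compute_4, 1))    # 409.6 -> the ~410 GFLOPs figure
print(round(compute_4 / total_18 * 100, 1))  # 22.2 -> the "22%" share
```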
 
I was going to make that a #3 as well, but backed off since ~6 GFLOPs doesn't seem worth even acknowledging as minor, though that's my opinion, as I'm not sure Sony would need to mention it.

I think the hardware balanced comment makes #3 or #1 the most logical as the mindset would be that the console is designed as if it would only have 14 CUs. The other 4 CUs are "floaters" for lack of a better word since they could possibly be used either for compute or rendering.

This definitely doesn't seem to be coming from someone who has English as a first language.
 
Reading the whitepaper, the scalar ALU seemed to be included to improve latency from the previous architecture, especially in cases of GPGPU.

"The new scalar pipelines in the GCN compute unit are essential for performance and power efficiency. Control flow processing is distributed throughout the shader cores, which reduces latency and also avoids the power overhead for communicating with a centralized scheduler. This is particularly beneficial for general purpose applications, which tend to have more complex control flow than graphics."
- http://www.amd.com/jp/Documents/GCN_Architecture_whitepaper.pdf

So it isn't really the computational power of the scalar ALU that's important. Their importance is in improving the efficiency of the SIMD units, by reducing stalls from branching etc. I'm not sure how important they are for rendering, which I guess is why they'd be a "minor boost," but in the case of GPGPU, which might do more complex algorithms, it would be more helpful.
 
"Balanced" in engrish = "Standard" imo

For some perspective to those interested, 1 CU = 128 FP ops/clock cycle, while 1 Cell core (SPE or PPU) = 8 FP ops/clock cycle. But Cell is at 3.2 GHz while Liverpool would be at 800 MHz.

So can we say 1 CU = 4 SPEs?
With 4 dedicated CUs we would have the equivalent of "16 SPE power" to do our GPGPU tasks.
Educate me if my simplistic extrapolation is wrong.
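Checking that extrapolation — a sketch, using the per-clock figures above (128 FP ops/clock for a CU at 800 MHz, 8 FP ops/clock for an SPE at 3.2 GHz):

```python
cu_gflops = 128 * 0.8   # one GCN CU at 800 MHz -> 102.4 GFLOPs
spe_gflops = 8 * 3.2    # one Cell SPE at 3.2 GHz -> 25.6 GFLOPs

print(cu_gflops / spe_gflops)      # 4.0 -> 1 CU is worth ~4 SPEs
print(4 * cu_gflops / spe_gflops)  # 16.0 -> 4 CUs are worth ~16 SPEs
```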
 
Giving my take on the 14+4 and the wording, how it sounds to me is that the 4 CUs have 64+1 ALUs as opposed to the normal 64 in a CU. The +1 is dedicated to compute, but if used for rendering it would give a minor boost. For me, I would wonder if this "extra" ALU is modified in some manner. Is it the scalar ALU that's already on a CU, but modified? Is it another ALU in addition to that? I would assume that the scalar ALU on a CU is a given, and this is some kind of additional ALU.

Forgot about Sweetvar's info. That would make more sense to me than what I was trying to speculate from a performance perspective as my original idea wasn't suggesting much of a boost. That would mean it should be labeled as an "'extra' SIMD", not ALU. In turn it's the extra SIMD that is dedicated towards compute, but could be used for rendering for a minor boost.

64 ALUs = 102.4 GFLOPs

This could be used for compute or rendering.

This is also what Proelite was saying on GAF about DF being wrong about the extra CUs, them just being extra scalar ALUs for some of the compute units (i.e. the 4):

http://www.neogaf.com/forum/showpost.php?p=46748385&postcount=3513

So that nicely resolves the inconsistencies between DF/Vgleaks on Orbis.
 
^ OK. Then I'll go with that.

Right. This is why I went with either that scalar ALU being modified or an additional ALU being modified. I guess an additional ALU of some kind (which is apparently clarified to be additional scalar ALUs) would be more plausible than just modifying the current scalar ALU and now "counting" it as extra.
 