Betanumerical
Veteran
I can see you have not done your homework, so I will help you, but where will I start? Here is the short way. Here is GCN 1.0 in detail.
http://www.tomshardware.com/reviews/...cn,3104-2.html
After that read this one. You will find that all this is FACT.
http://www.amd.com/la/Documents/GCN_Architecture_whitepaper.pdf
1. AMD uses CUs (not SCs, but we will call them that for now). Inside each are what are called Vector Units, which are 16-wide SIMDs in sets of 4; we see this in VGLeaks as the SCs to 4 SIMDs. This is GCN101.
2. Each CU has four Vector Units (VUs), each with 16 ALUs, for a total of 64 ALUs per CU. We see this in VGLeaks: Compute: SC to 4 SIMDs.
3. Inside each CU there is 1 SCALAR processor for all 4 SIMDs.
Everything I just told you was a FACT. This is in a CU of GCN1.0.
Now here is what SuperDae missed. NOWHERE in the documents, or anywhere on the internet, does anything tell us WHY there are 4 VSPs in each SIMD and not just 1 in the CU, but we see this in VGLeaks: Compute: SC to 4 SIMDs to 16 VSPs, not 1 for the CU but 16. You cannot dismiss this. "It's bog standard GCN"? Not with 16 VSPs (vector scalar processors).
If it had only one VSP, you're right, it's GCN101. It's not.
Do you want me to do the math? It's ......
Because you're misreading, and also misunderstanding, what the numbers in the vgleaks articles actually represent.
From the GCN whitepaper:
"In GCN, each CU includes 4 separate SIMD units for vector processing. Each of these SIMD units simultaneously executes a single operation across 16 (16 * 4, shockingly, is 64) work items, but each can be working on a separate wavefront. This places emphasis on finding many wavefronts to be processed in parallel, rather than relying on the compiler to find independent operations within a single wavefront."
From vgleaks:
"Each of the four SIMDs in the shader core is a vector processor in the sense of operating on vectors of threads. A SIMD executes a vector instruction on 64 threads (64 / 4, shockingly, is 16) at once in lockstep. Per thread, however, the SIMDs are scalar processors, in the sense of using float operands rather than float4 operands. Because the instruction set is scalar in this sense, shaders no longer waste processing power when they operate on fewer than four components at a time. Analysis of Xbox 360 shaders suggests that of the five available lanes (a float4 operation, co-issued with a float operation), only three are used on average."
The underlined part is where vgleaks made their mistake, but it's obvious it's a mistake because of the maths used on the first page of the very article:
"12 SCs * 4 SIMDs * 16 threads/clock = 768 ops/clock"
Let's break this down even further.
12 Shader Cores (yep, CUs) * 4 SIMDs (per CU) * 16 (vector width per SIMD) = 768 ops/clock.
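If anyone wants to check the arithmetic themselves, here is a quick Python sketch. The variable names are mine, not from either document; the values are the ones quoted above. The cycles-per-wavefront figure is just 64 work items spread over a 16-wide SIMD.

```python
# Rough sanity check of the lane arithmetic quoted above.
# Names are illustrative only; values come from the whitepaper/vgleaks quotes.
SHADER_CORES = 12      # SCs (i.e. CUs) in the vgleaks figure
SIMDS_PER_CU = 4       # SIMD units per CU
LANES_PER_SIMD = 16    # vector width of each SIMD
WAVEFRONT_SIZE = 64    # work items (threads) per wavefront

# 4 SIMDs * 16 lanes = ALUs in one CU
alus_per_cu = SIMDS_PER_CU * LANES_PER_SIMD

# 12 CUs * 64 ALUs = vector ops issued per clock across the GPU
ops_per_clock = SHADER_CORES * alus_per_cu

# A 64-wide wavefront on a 16-wide SIMD needs this many clocks per instruction
cycles_per_wavefront = WAVEFRONT_SIZE // LANES_PER_SIMD

print(alus_per_cu, ops_per_clock, cycles_per_wavefront)  # 64 768 4
```

So the 16 in the vgleaks numbers falls out as the per-SIMD vector width, and the 768 matches a plain 12-CU GCN part.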
As you can see from the maths directly above, this is 100% stock standard GCN. If it wasn't, I would be very worried about the obviously poor decision Microsoft had made to increase the processing power by 4x yet decrease the cache sizes by 4x. It makes no sense.
You are also ignoring the fact that if it wasn't GCN1.0 then the numbers used at the start of the article would all be incorrect. Also, I have yet to see anyone explain how it can have 4x the SIMDs/CUs/whatever and the same power as standard GCN. Those are some weak-arse modifications if they don't do anything.
GCN1.0.
It is clear here who has done their homework. The maths disagrees with you.