News & Rumors: Xbox One (codename Durango)

Betanumerical · May 17, 2013

marcberry said:
I can see you have not done your homework, so I will help you, but where will I start? Here is the short way. Here is GCN 1.0 in detail.
http://www.tomshardware.com/reviews/...cn,3104-2.html
After that read this one. You will find that all this is FACT.
http://www.amd.com/la/Documents/GCN_Architecture_whitepaper.pdf
1. AMD uses CUs (not SCs, but we will call them that for now). Inside each are Vector Units, which are 16-wide SIMDs in sets of 4; we see this in VGLeaks as the SC to 4 SIMDs. This is GCN101.
2. Each CU has four Vector Units (VUs), each with 16 ALUs, for a total of 64 ALUs per CU. We see this in VGLeaks: Compute: SC to 4 SIMDs.
3. Inside each CU there is 1 SCALAR processor for all 4 SIMDs.
Everything I just told you was a FACT..... This is in a CU of GCN1.0.
Now here is what SuperDae missed. NOWHERE in the documents, anywhere on the internet, can anyone tell us WHY there are 4 VSPs in each SIMD and not just 1 in the CU, but we see this in VGLeaks: Compute: SC to 4 SIMDs to 16 VSPs, not 1 for the CU but 16. You cannot dismiss this. "It's bog standard GCN", not with 16 VSPs (vector scalar processors).
If it had only one VSP you're right, it's GCN101. It's not.

Do you want me to do the math? it's ......

Because you're reading, and also understanding, what the numbers in the VGLeaks articles actually represent incorrectly.

From the GCN whitepaper

In GCN, each CU includes 4 separate SIMD units for vector processing. Each of these SIMD units simultaneously executes a single operation across 16 (16 * 4, shockingly is 64) work
items, but each can be working on a separate wavefront. This places emphasis on finding many wavefronts to be processed in parallel, rather than relying on
the compiler to find independent operations within a single wavefront.
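To make the whitepaper numbers concrete, here is a minimal Python sketch of the lane arithmetic. The 16 lanes per SIMD and 4 SIMDs per CU come from the quote above; the 64-work-item wavefront and the idea that one wavefront instruction is spread over successive cycles on a 16-lane SIMD are my assumptions about how GCN is usually described, and the function names are made up for illustration.

```python
SIMD_LANES = 16        # lanes per SIMD (from the whitepaper quote)
SIMDS_PER_CU = 4       # SIMDs per CU (from the whitepaper quote)
WAVEFRONT_SIZE = 64    # work items per wavefront (assumed, standard GCN figure)


def cycles_per_wavefront_instruction(wavefront_size=WAVEFRONT_SIZE, lanes=SIMD_LANES):
    """Cycles one SIMD needs to push a whole wavefront through its lanes."""
    return -(-wavefront_size // lanes)  # ceiling division


def work_items_per_clock_per_cu(simds=SIMDS_PER_CU, lanes=SIMD_LANES):
    """Work items a CU processes each clock when all of its SIMDs are busy."""
    return simds * lanes


print(cycles_per_wavefront_instruction())  # 4 cycles per wavefront instruction
print(work_items_per_clock_per_cu())       # 64 work items per clock per CU
```

So "64 threads" and "16 lanes" are the same hardware seen at two different granularities, under those assumptions.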

From vgleaks

Each of the four SIMDs in the shader core is a vector processor in the sense of operating on vectors of threads. A SIMD executes a vector instruction on 64 threads (64 / 4, shockingly, is 16) at once in lockstep. Per thread, however, the SIMDs are scalar processors, in the sense of using float operands rather than float4 operands. Because the instruction set is scalar in this sense, shaders no longer waste processing power when they operate on fewer than four components at a time. Analysis of Xbox 360 shaders suggests that of the five available lanes (a float4 operation, co-issued with a float operation), only three are used on average.

The underlined part is where VGLeaks made their mistake, but it's obvious it's a mistake because of the maths used on the first page of the very same article.

12 SCs * 4 SIMDs * 16 threads/clock = 768 ops/clock

Let's break this down even further.

12 shader cores (yep, CUs) * 4 SIMDs (per CU) * 16 (vector width per SIMD) = 768 ops/clock.

As you can see from the maths directly above, this is 100% stock standard GCN. If it wasn't, I would be very worried about the obviously poor decisions Microsoft had made to increase the processing power by 4x yet decrease the cache sizes by 4x. It makes no sense.
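The same multiplication as a small Python sketch, extended to a peak throughput figure. The 12 x 4 x 16 numbers are the ones from the article; the 800 MHz clock and the 2 FLOPs per op (one fused multiply-add per lane per clock) are assumptions I am adding for illustration and are not stated anywhere in this thread.

```python
def ops_per_clock(shader_cores, simds_per_core, lanes_per_simd):
    """Vector lanes that can each issue one op per clock."""
    return shader_cores * simds_per_core * lanes_per_simd


def peak_gflops(ops_per_clk, clock_mhz, flops_per_op=2):
    """Peak GFLOPS, counting a fused multiply-add as 2 FLOPs (assumption)."""
    return ops_per_clk * flops_per_op * clock_mhz / 1000.0


ASSUMED_CLOCK_MHZ = 800  # hypothetical clock, not taken from this thread

ops = ops_per_clock(shader_cores=12, simds_per_core=4, lanes_per_simd=16)
print(ops)                                  # 768 ops/clock, matching the article
print(peak_gflops(ops, ASSUMED_CLOCK_MHZ))  # about 1228.8 GFLOPS under these assumptions
```

If the per-SIMD width were anything other than 16, the 768 figure on the first page of the article would not come out.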

You are also ignoring the fact that if it wasn't GCN1.0 then the numbers used at the start of the article would all be incorrect. Also, I have yet to see anyone explain how it can have 4x the SIMDs/CUs/whatever and the same power as standard GCN. Those are some weak-arse modifications if they don't do anything.

GCN1.0.

It is clear here who has done their homework. The maths disagrees with you.

Pete · May 17, 2013

marcberry said:
Do you want me to do the math? it's ......

I feel like we've been through this before....

Shifty Geezer · May 17, 2013

marcberry said:
but we see this in VGLeaks: Compute: SC to 4 SIMDs to 16 VSPs, not 1 for the CU but 16. You cannot dismiss this. "It's bog standard GCN", not with 16 VSPs (vector scalar processors).

It all comes down to this, and the accuracy of VGLeaks reporting. Either one believes the architecture diagram and every SIMD has an extra scalar unit (which serves what purpose? The scalar unit is for SIMD control, not computation), or one believes the numbers and corroborated leaks and sees the diagram as at fault.

marcberry · May 17, 2013

Last thing before I go to bed: there is a PATENT from AMD for VSPs. It's big.

Shifty Geezer · May 17, 2013

Please provide a link to the AMD VSP patent.

Deleted member 7537 · May 17, 2013

Xenio said:
It's 68 GB/s DDR3, how can you label it "slow"?
And you're forgetting about eSRAM too. Why should they redesign the whole console, memory system included, for a 12-13-14-15-X CU GPU when the memory system was already designed the way they want it to be?

It's "slow" for a GPU. If not, why bother putting GDDR5 in graphics cards? Or why did MS include ESRAM in their console? What you are saying doesn't make sense.

You can't just add as many CUs as you want and not expect memory to become a bottleneck.

It's funny because the actual design (according to VGLeaks AND other sources) has been praised for being so well balanced and efficient, even by some actual game developers in this forum, but these new leaks just point to a "MOOAARR POWERRR" strategy by MS trying in a hurry to compete with PS4 specs. I just can't believe this because it's just not coherent. Anyway, we'll find out soon.

Xenio · May 17, 2013

jayco said:
It's "slow" for a GPU. If not, why bother putting GDDR5 in graphics cards? Or why did MS include ESRAM in their console? What you are saying doesn't make sense.

You can't just add as many CUs as you want and not expect memory to become a bottleneck.

It's funny because the actual design (according to VGLeaks AND other sources) has been praised for being so well balanced and efficient, even by some actual game developers in this forum, but these new leaks just point to a "MOOAARR POWERRR" strategy by MS trying in a hurry to compete with PS4 specs. I just can't believe this because it's just not coherent. Anyway, we'll find out soon.

OMG, graphics cards are a different thing. Welcome to 2013 from the pre-2005 era.
Coding for a console means exploiting the hardware. What is so complicated to understand? No coder in the world will write different code for a graphics card that has eDRAM or eSRAM inside; in the console realm it's completely different.

AMD cards have had tessellator hardware since ancient times, but no developer used it because it was NOT standard.

And no, more CUs doesn't mean more BW needed; maybe it's lower latency that's needed, depending on what those CUs are used for. It's what YOU say that doesn't make sense, and we've already talked about this. Use the search function please.

Betanumerical · May 17, 2013

Xenio said:
And no, more CUs doesn't mean more BW needed; maybe it's lower latency that's needed, depending on what those CUs are used for. It's what YOU say that doesn't make sense, and we've already talked about this. Use the search function please.

It depends on what you're doing, but the point he is trying to make (and a correct one at that) is that there's no point having 100000000 CUs when you don't have the bandwidth to use them. There is going to be a point eventually where you need to add more bandwidth to start getting a decent increase in performance; just adding more execution units doesn't do anything. This is why all the high-end video cards have super-high-clocked GDDR5: they need the bandwidth to keep the execution units sufficiently busy. If it was all about latency, then the higher-end cards wouldn't even use GDDR5.
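A rough way to put numbers on this is a roofline-style estimate: attainable throughput is the smaller of peak compute and bandwidth times arithmetic intensity. This is only a sketch. The 68 GB/s figure is the DDR3 number quoted earlier in the thread; the 1228.8 GFLOPS baseline peak and the two FLOPs-per-byte workloads are hypothetical values chosen for illustration, and ESRAM is ignored entirely.

```python
def attainable_gflops(peak_gflops, bandwidth_gb_s, flops_per_byte):
    """Roofline estimate: limited by compute or by memory traffic, whichever is lower."""
    return min(peak_gflops, bandwidth_gb_s * flops_per_byte)


BANDWIDTH_GB_S = 68.0        # DDR3 figure quoted in the thread
BASE_PEAK_GFLOPS = 1228.8    # assumed baseline peak (hypothetical)
DOUBLED_PEAK_GFLOPS = 2 * BASE_PEAK_GFLOPS  # same memory, twice the CUs

# (label, FLOPs per byte of memory traffic); both intensities are made up
workloads = [("bandwidth-heavy", 4.0), ("compute-heavy", 64.0)]

for label, intensity in workloads:
    base = attainable_gflops(BASE_PEAK_GFLOPS, BANDWIDTH_GB_S, intensity)
    doubled = attainable_gflops(DOUBLED_PEAK_GFLOPS, BANDWIDTH_GB_S, intensity)
    print(f"{label}: {base:.1f} -> {doubled:.1f} GFLOPS")
```

In this model the bandwidth-heavy case gains nothing from doubling the CUs, while the compute-heavy case scales with them, which is really the whole disagreement here in two lines of output.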

Grall · May 17, 2013

jayco said:
but these new leaks just point to a "MOOAARR POWERRR" strategy by MS trying in a hurry to compete with PS4 specs. I just can't believe this because it's just not coherent.

It's not a "leak", it's just irate fanboys in a frenzy, feeding off of each other in a bizarre mass-ejaculation of fannishness.

Anyway we'll find out soon.

Yeah, and what we find will, with a high degree of probability, tell us what we already know, because you don't just go making radical last-minute changes to an advanced piece of hardware and expect timetables, production targets, price levels etc. to remain intact. It's not particularly realistic.

I could say that yes, stranger things have happened (because I'm sure they have), but what rumors have told us is that MS aimed rather conservatively from the outset with Durango, so why should they go off now and change direction at the very last second? That is, as you say, not coherent.

infamous · May 17, 2013

Xenio said:
OMG, graphics cards are a different thing. Welcome to 2013 from the pre-2005 era.
Coding for a console means exploiting the hardware. What is so complicated to understand? No coder in the world will write different code for a graphics card that has eDRAM or eSRAM inside; in the console realm it's completely different.

AMD cards have had tessellator hardware since ancient times, but no developer used it because it was NOT standard.

And no, more CUs doesn't mean more BW needed; maybe it's lower latency that's needed, depending on what those CUs are used for. It's what YOU say that doesn't make sense, and we've already talked about this. Use the search function please.

You know what Timothy Lottes said about the 720 specs leaked by VGLeaks (Eurogamer article), right? He didn't sound super excited about the BW limitation.

Xenio · May 17, 2013

Betanumerical said:
It depends on what you're doing, but the point he is trying to make (and a correct one at that) is that there's no point having 100000000 CUs when you don't have the bandwidth to use them. There is going to be a point eventually where you need to add more bandwidth to start getting a decent increase in performance; just adding more execution units doesn't do anything. This is why all the high-end video cards have super-high-clocked GDDR5: they need the bandwidth to keep the execution units sufficiently busy. If it was all about latency, then the higher-end cards wouldn't even use GDDR5.

As I wrote, it depends on what the units are used for. Not all tasks involve a big movement of data the way texturing and some FB operations do.
Years and years of benchmarks have taught us that a lot of bandwidth is needed only at very high resolutions and AA settings (such as supersampling or high-level multisampling). In fact, it's very common that at 1080p or lower resolutions with 2x AA or no AA (the realm of consoles), cards often perform very similarly even if one is 3x as expensive as the other.

Grall · May 17, 2013

Xenio said:
OMG, graphics cards are a different thing. Welcome to 2013 from the pre-2005 era.
Coding for a console means exploiting the hardware. What is so complicated to understand?

Graphics cards are NOT a different thing, and in fact a console having CPU cores consuming data from the same memory pool as the GPU makes the data capacity of said memory pool even more critical.

And no, more CUs doesn't mean more BW needed

Actually it does mean that, because you have to feed the CUs with instructions and data from somewhere, and deliver results somewhere. That "somewhere" would be main RAM.

Betanumerical · May 17, 2013

Xenio said:
As I wrote, it depends on what the units are used for. Not all tasks involve a big movement of data the way texturing and some FB operations do.
Years and years of benchmarks have taught us that a lot of bandwidth is needed only at very high resolutions and AA settings (such as supersampling or high-level multisampling). In fact, it's very common that at 1080p or lower resolutions with 2x AA or no AA (the realm of consoles), cards often perform very similarly even if one is 3x as expensive as the other.

What FB operations and what texture operations would benefit from more compute power without having any extra bandwidth? And please be specific, I want the exact operation.

Xenio · May 17, 2013

Betanumerical said:
What FB operations and what texture operations would benefit from more compute power? And please be specific, I want the exact operation.

I'm not surprised that you understood the opposite of what I wrote. In fact, texture and FB operations are the ones that depend heavily on BW, not vice versa.

Betanumerical · May 17, 2013

Xenio said:
I'm not surprised that you understood the opposite of what I wrote. In fact, texture and FB operations are the ones that depend heavily on BW, not vice versa.

I read it wrong, like how you generally make wrong assumptions. But anyway, give us an example of something that would be better off with more compute but the same amount of bandwidth.

Shifty Geezer · May 17, 2013

Anything procedural. You could calculate noise textures on the fly, for example. As long as the working data fits in cache, you can perform multiple 'passes' over the same data, and we can be sure that if devs have access to a completely imbalanced design, they'll still make the most of it. If a platform is RAM+BW heavy, devs will prefer to prebake everything and stream. If it's compute heavy, they'll take to using funky algorithms to calculate as much as possible. Well, in the old days they would. I think in the modern era, games will be designed principally to budget, and any system imbalances will be seen as aggravating complexities that get in the way of porting games between platforms.
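As a toy illustration of "calculate noise textures on the fly", here is a minimal value-noise sketch in plain Python. It is not taken from any engine or shipped game; the hash constants, function names, and the `scale` parameter are arbitrary choices for the example. The point is only that each texel is computed from its coordinates with arithmetic, so nothing has to be fetched from a stored texture.

```python
import math


def hash01(ix, iy, seed=0):
    """Hash integer lattice coordinates to a repeatable float in [0, 1]."""
    h = (ix * 374761393 + iy * 668265263 + seed * 144665) & 0xFFFFFFFF
    h = ((h ^ (h >> 13)) * 1274126177) & 0xFFFFFFFF
    h ^= h >> 16
    return h / 0xFFFFFFFF


def smoothstep(t):
    """Ease curve so the noise has no visible creases at lattice lines."""
    return t * t * (3.0 - 2.0 * t)


def lerp(a, b, t):
    return a + (b - a) * t


def value_noise(x, y, seed=0):
    """Smoothly interpolate hashed values at the four surrounding lattice points."""
    ix, iy = math.floor(x), math.floor(y)
    fx, fy = smoothstep(x - ix), smoothstep(y - iy)
    top = lerp(hash01(ix, iy, seed), hash01(ix + 1, iy, seed), fx)
    bottom = lerp(hash01(ix, iy + 1, seed), hash01(ix + 1, iy + 1, seed), fx)
    return lerp(top, bottom, fy)


def noise_texture(width, height, scale=0.25, seed=0):
    """Generate a width x height grid of noise values purely from arithmetic."""
    return [[value_noise(u * scale, v * scale, seed) for u in range(width)]
            for v in range(height)]


texture = noise_texture(8, 8)
print(len(texture), len(texture[0]))
```

On a GPU the same idea would run per pixel in a shader, trading ALU work for memory traffic.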

Grall · May 17, 2013

What game has ever managed to successfully incorporate procedural assets on a large level? Beyond something like, clouds in the sky, or an infinite runner iphone game with an ever-twisting track generated in realtime or such.

There's SpeedTree, which only does trees, and... Aaaannnddd....... (I'm thinking here... Not coming up with anything.)

Procedural stuff isn't as useful as human-designed resources because it often tends to look bland, repetitive, lifeless, or just not very useful. It's tough to invent algos that are sufficiently realistic, especially without the performance going down the toilet, I'd imagine. Besides, with 8 gigs (or 5, or whatever) of RAM, do you really NEED procedural anything? There's plenty of space to just store ready-made assets...

Shifty Geezer · May 17, 2013

No-one's had an abundance of processing power available to 'waste' on procedural content. If MS went crazy and stuck a second 1.2 TF GPU in Durango without changing anything else, devs (at least those daring enough to invest in Durango specific engine optimisations) would start to use the compute power on non-data-dependent tasks. They wouldn't leave 1.2 TF idle lamenting, "we haven't enough BW to do anything with all that potential, so don't use it."

Scott_Arm · May 17, 2013

marcberry said:
Last thing before I go to bed: there is a PATENT from AMD for VSPs. It's big.

I search AMD VSP and I get "validated server program." Not that I believe that there's any merit to what you're writing. Just curious.

DJ12 · May 17, 2013

I googled it and it brought up the source for this: none other than mistercteam. Don't waste any further time on it. I already feel annoyed that I read 2 pages of crap to see the patent.
