NGGP: NextGen Garbage Pile (aka: No one reads the topics or stays on topic) *spawn*

Ok, but doesn't your analysis here also totally ignore the other hardware MS is surrounding the GPU with? Can any of those extra parts (eSRAM, display planes, DMEs) make up for the 2 CU difference or ~200 GFLOPS gap simply by being significantly more efficient? :?:

ESRAM will help with the bandwidth problem, but it will not double the fill rate, add extra CUs, or anything of that sort.

Going by theory, ESRAM will give Durango a boost in bandwidth that puts it close to Orbis, not one that surpasses it.

Weren't display planes already shot down as something other GPUs are already doing? Even on the PS3, or even the PS2?

Didn't the DMEs suffer the same fate?

Wasn't all that secret sauce debunked already?

Let me ask you this: will the DMEs add an extra 400 GFLOPS? Will the ESRAM add 400 GFLOPS as well?

Most of these things will just help Durango with its bandwidth problems, not add extra power.
 
Is he trying to say that Durango has half the ROPs yet the same bandwidth as Orbis (going with the additive split-pools meme: DDR3 + eSRAM ~= 170 GB/s), hence "twice" the "read/write rate per pixel"? It still doesn't make sense to me, but what do I know.

In what way would he be incorrect technically? I ask out of ignorance.

If someone could educate me..
 
Sometimes it surprises me that people waste their and my time defending an inanimate object in a technical thread in a decidedly nontechnical way. This is a forum for discovery, not winning vicariously through something. (That's what sports are for.)
 
They built a console that was a contemporary of the PS3, performed as well as (and in some cases outperformed) the PS3, and was cheaper to manufacture over its lifecycle. At the time, most people, including Sony, believed the PS3 was far more powerful. That's the precedent I'm talking about, for which MS deserves a nod this go 'round.




To anyone, what is the difference between 100 GFLOPS on a CPU and 100 GFLOPS on a GPU? Are they 'equal' in terms of accessibility, flexibility, 'power', etc.?


That is not what I remember, and not what MS's PowerPoint about the PS3 vs. the 360 showed. MS even went to the extreme of calling the Cell's SPEs useless DSPs. The whole notion that Sony over-promised and MS did not is silly, since they both over-hyped their consoles in more than one way.

In this industry there is no shame in what platform holders will claim their systems can do. Sony over-promised and MS did the same. This reminds me of the time Sony was accused of saying the PS2 would do Toy Story graphics, while that whole charade was actually done by MS at the Xbox unveil, not by Sony.
 
To anyone, what is the difference between 100 GFLOPS on a CPU and 100 GFLOPS on a GPU? Are they 'equal' in terms of accessibility, flexibility, 'power', etc.?

It's certainly easier to exploit quickly on the CPU than on the GPU.

The thing is, you're not doing floating point operations on the CPU side as much as on the GPU side, so it's harder to maximize.

So while the GFLOP figure would be double, the performance of other operations between Durango and Orbis would be the same, so you truly wouldn't see double (or anywhere near it, IMO) the performance.
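For a sense of where those numbers come from, here is a back-of-envelope peak-FLOPS sketch using the rumored clocks and unit counts (8 Jaguar cores at ~1.6 GHz, GCN CUs at 800 MHz); every figure below is an assumption from the leaks, not a confirmed spec:

```python
# Back-of-envelope peak FLOPS from the rumored specs; treat all numbers as assumptions.
def cpu_peak_gflops(cores=8, ghz=1.6, flops_per_cycle=8):
    # Jaguar: 128-bit FP units, roughly 4 adds + 4 muls per core per cycle.
    return cores * ghz * flops_per_cycle

def gpu_peak_gflops(cus, ghz=0.8, lanes_per_cu=64, flops_per_lane=2):
    # GCN: 64 lanes per CU; a fused multiply-add counts as 2 FLOPs per lane.
    return cus * lanes_per_cu * flops_per_lane * ghz

print(cpu_peak_gflops())         # ~102 GFLOPS for a vanilla 8-core Jaguar
print(gpu_peak_gflops(cus=12))   # ~1229 GFLOPS (Durango, 12 CUs)
print(gpu_peak_gflops(cus=18))   # ~1843 GFLOPS (Orbis, 18 CUs)
```

These peaks only count back-to-back multiply-adds; as said above, the CPU's share is much harder to keep fully busy.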

In what way would he be incorrect technically? I ask out of ignorance.

If someone could educate me..

There'd be data redundancy... the data in the ESRAM would have to come from the RAM before it goes to the GPU. Either way, you couldn't get 170 GB/s unless you filled up the ESRAM and then read from it and the RAM over and over and over; the second you start writing to it, that bandwidth tanks.

But then again, the Orbis bandwidth figure is going to be BS too, because you have random access reads from the CPUs... and GDDR5, AFAIK, isn't low latency.

People are throwing around theoretical peak performance like it's true real-time performance. That irritates me.
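To put numbers on the additivity complaint, here is a toy sketch using the leaked pool bandwidths (all assumptions), which optimistically lets both pools transfer simultaneously and ignores the redundant copies mentioned above:

```python
def effective_bw(frac_esram, esram=102.4, ddr3=68.0):
    # frac_esram of the bytes go through the ESRAM, the rest through DDR3;
    # with both pools busy at once, whichever pool carries the bigger share
    # relative to its own speed sets the pace.
    return 1.0 / max(frac_esram / esram, (1 - frac_esram) / ddr3)

print(effective_bw(0.60))  # ~170 GB/s, but only at the ideal ~60/40 split
print(effective_bw(0.80))  # ~128 GB/s if 80% of the traffic wants the 32 MB pool
print(effective_bw(1.00))  # ~102 GB/s if everything lives in the ESRAM
```

So the "~170 GB/s" sum is a best case that only holds when the traffic happens to divide across the pools in just the right ratio.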
 
So there are rumblings that the Durango CPU is far more powerful than the Jaguar cores in Orbis?


And this is one of the things that makes me laugh.

So the rumors now say Durango has a CPU with double the flops of a vanilla Jaguar. The thing is, the Jaguar in Orbis is not vanilla either; a vanilla Jaguar is not 8 cores, it's 2 to 4 if I'm not mistaken.

But even granting that advantage to Durango, let's say it's true: would that really be considered far more powerful?

So Orbis having 600 more GFLOPS in CUs is nothing, but Durango having 102 more GFLOPS in the CPU is far more powerful?

:cool:
 
Let me ask you this: will the DMEs add an extra 400 GFLOPS?

Let me ask you this: How many shader flops are exercised in moving memory in AMD's GCN architecture in a typical game scenario? Typical compute scenario?

Will the ESRAM add 400 GFLOPS as well?

Let me ask you this: How many shader flops (cycles) are un-utilized due to stalls, cache misses, and general round trip latency to system memory in a typical game scenario? Typical compute scenario?

Most of these things will just help Durango with its bandwidth problems, not add extra power.

Let me ask you this: Earlier you were quite frustrated that people were overlooking the raw spec differences (18/14 CUs vs. 12 CUs, system bandwidth, higher pixel fillrate, assumed higher texel fillrate, etc.), so I need to know how you determined that ESRAM only helps with bandwidth. Is the latency irrelevant? Do you have architectural insights that have allowed you to arrive at the position that bandwidth, and bandwidth alone, is the only beneficiary of ESRAM and that it has no impact on actual compute utilization? (i.e. compare various AMD and NV architectures; configuration and architecture play huge roles in compute utilization.)

Specifically, I am quite interested in your understanding of ESRAM locality/render pipeline access and latency, and of why there would be no benefit, considering that, to my knowledge, we don't even know the latency of the ESRAM.
 
In what way would he be incorrect technically? I ask out of ignorance.

I'm just assuming he's incorrect because of that odd 64b vs 32b format reasoning, but I'm as ignorant as you, so I'd also like someone who knows what they're talking about to weigh in.

If the ROPs divide their time b/w color and depth buffers (CBs and DBs), and the CBs are meant to be in DDR3 per rumors, then maybe the DBs can be in ESRAM and so the ROPs would use the bandwidth of each memory type simultaneously? I should really stop speculating about something I'm clueless about.
 
Is he trying to say that Durango has half the ROPs yet the same bandwidth as Orbis (going with the additive split-pools meme: DDR3 + eSRAM ~= 170 GB/s), hence "twice" the "read/write rate per pixel"? It still doesn't make sense to me, but what do I know.
I remember there was a discussion about the amount of bandwidth available per ROP. I think it started with the idea that the PS4 wouldn't have enough bandwidth available to feed all the ROPs at max. But I don't know how to calculate that; it depends on the pixel format, I guess.
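For what it's worth, the per-pixel arithmetic is simple once you pick a format; a rough sketch with the rumored 800 MHz clock and ROP counts (the RGBA8 and blending assumptions here are mine, purely illustrative, and Z traffic is ignored):

```python
def rop_bandwidth_gbs(rops, ghz=0.8, bytes_per_pixel=4, blend=False):
    # One pixel per ROP per clock; blending reads the destination back,
    # roughly doubling the bytes moved per pixel.
    traffic = bytes_per_pixel * (2 if blend else 1)
    return rops * ghz * traffic

print(rop_bandwidth_gbs(32))               # ~102 GB/s: 32 ROPs, RGBA8, write only
print(rop_bandwidth_gbs(32, blend=True))   # ~205 GB/s: more than ~176 GB/s of GDDR5
print(rop_bandwidth_gbs(16, blend=True))   # ~102 GB/s: 16 ROPs, about the ESRAM figure
```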
 
Let me ask you this: How many shader flops are exercised in moving memory in AMD's GCN architecture in a typical game scenario? Typical compute scenario?



Let me ask you this: How many shader flops (cycles) are un-utilized due to stalls, cache misses, and general round trip latency to system memory in a typical game scenario? Typical compute scenario?



Let me ask you this: Earlier you were quite frustrated that people were overlooking the raw spec differences (18/14 CUs vs. 12 CUs, system bandwidth, higher pixel fillrate, assumed higher texel fillrate, etc.), so I need to know how you determined that ESRAM only helps with bandwidth. Is the latency irrelevant? Do you have architectural insights that have allowed you to arrive at the position that bandwidth, and bandwidth alone, is the only beneficiary of ESRAM and that it has no impact on actual compute utilization? (i.e. compare various AMD and NV architectures; configuration and architecture play huge roles in compute utilization.)

Specifically, I am quite interested in your understanding of ESRAM locality/render pipeline access and latency, and of why there would be no benefit, considering that, to my knowledge, we don't even know the latency of the ESRAM.


Exactly, we don't know anything.

So how is Durango ultra-efficient while Orbis is not?

Then I read a blog from an Nvidia employee, one who created an AA algorithm, who analyzed the specs and basically threw the whole magic sauce to the floor, even the ESRAM.

He was going by the known specs.
 
Sometimes it surprises me that people waste their and my time defending an inanimate object in a technical thread in a decidedly nontechnical way. This is a forum for discovery, not winning vicariously through something. (That's what sports are for.)

What gets me is, even if the VG leaks are accurate, are they accurate for the released hardware? (How old are they, how much devkit info is mixed in with final targets, etc.?)

Assuming they are accurate, there are huge missing chunks of information. E.g. on Durango we can infer 16 ROPs and 48 TMUs based on the frequency (800 MHz) and the pixel/texel fill rates, but we don't have equally concrete information on Orbis. We don't know what the 18/14 CU deal is (it could be that those 4 CUs are primarily for the OS?). We have no figures on the ESRAM latency. We don't have final figures on reserved CPU cores OR memory. We don't know if Kinect/PSEye will have dedicated silicon, or how much the console APUs will need to process. We don't have a good idea of all the DSPs and other logic (e.g. sound chips and how robust they are), or if/how many ARM chips are present--and whether they are for security and/or the OS. As for those Jaguar cores, are they the same in both consoles? How will they work in a GDDR5 environment, and how will that sort of memory contention work out in practice in the consoles? Do either have extra-large last-level caches for the CPUs?
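For reference, the ROP/TMU inference above is just the leaked fill rates divided by the leaked 800 MHz clock; a minimal sketch, assuming those leaked figures are right:

```python
clock_ghz = 0.8
print(12.8 / clock_ghz)   # ~12.8 Gpixels/s -> 16 ROPs
print(38.4 / clock_ghz)   # ~38.4 Gtexels/s -> 48 TMUs
```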

Basically we don't know much about the architectures.

Even if the leaked specs are accurate there are still important specs missing and we don't have enough architectural details and actual real-world uses to say much.

Flipping things around, let's say it was Durango with more CUs + 32 MB ESRAM. I personally, after Xenos, would want to hear how it works IN THE REAL WORLD before really drawing a conclusion. I have heard that for 720p it allows some great AA possibilities, and that for virtual texturing the speed and IQ are very good. But how will it hold up in finished software where you are moving parts of the framebuffer in and out, plus compute tasks, all your GPU cache, your virtual texture budget, etc.? It could be amazing--or it could have some critical flaws in addition to being a total pain in the @$$ to work with.

Basically all those fortifying their positions already are just waving their fanboy flags early and often. It was no different with the PS3/Xbox 360, where raw numbers ruled the roost **and it took years and years after the PS3 released for people to come to grips with the architectural differences and utilization that impacted real game performance.** Some still haven't come to grips with it.

Really not much point debating until we know something concrete and substantial. Right now the situation looks interesting and, as I mentioned before, like Orbis and Durango have pretty similar budgets for silicon, TDP, and general costs, with Orbis more focused on core gamer wants (some extra visual performance) and MS trying to stay in the ballpark while living out their living room dream (hands-free Kinect, DVR, media streaming/center). The wild card really comes down to architecture (is ESRAM really just there to blunt DDR3 bandwidth issues + help with cost reduction long term, or are there some real computational benefits? If the ESRAM were 500 GB/s I think the debate would be tilted; but latency is not a frequently discussed/understood concept, and we have no numbers, so it isn't factored in. Which isn't a problem, unless you have already devoted yourself to a position).
 
Exactly, we don't know anything.

So how is Durango ultra-efficient while Orbis is not?

Then I read a blog from an Nvidia employee, one who created an AA algorithm, who analyzed the specs and basically threw the whole magic sauce to the floor, even the ESRAM.

He was going by the known specs.

He was assuming it was there to store the framebuffer (which would be kinda dumb, as that's not really a latency-sensitive operation). As we understand it, the recommendation is that the framebuffer goes to DDR3 RAM and shader/compute ops go through the ESRAM.

He wasn't thinking of the big picture.
 
So how is Durango ultra-efficient while Orbis is not?

You are jumping too many questions to draw up your straw man.

How many compute cycles are lost in AMD's GCN to latency, or to moving memory around, given GDDR5 latency? Would this be impacted if 8 CPU cores had access to the VRAM?

Only then could you move to Durango, get the ESRAM latency figures, and begin to weigh such.

To answer your question, assuming no other bottlenecks, an on-die memory pool with broader access and low latencies is going to result in a more efficient (faster) system (same units/speed) than one with memory off chip.

The fact the DMEs may open up some compute resources by picking up tasks the shaders usually must coordinate is also an efficiency gain.

So (a) memory latency and (b) extra silicon for memory management argue Durango *is* more efficient in theory.

That is the theory; we don't have any specific data to QUANTIFY or experience to know "GOTCHA"s that qualify such. (Tiered memory pools are an obvious candidate for "oh crap, our performance just tanked!")

And yet, assuming the ESRAM is genuinely low latency, there will be workloads (especially ones where everything can fit into the ESRAM) where Durango could be more efficient.

Do you disagree, i.e. are latency and shader memory management corner cases (less than 1% relevant)? If so, you may educate us :cool:
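To make the latency argument concrete, here is a toy occupancy model; it is not GCN-specific, and every number below (including both latencies) is hypothetical since we have no ESRAM figures:

```python
def alu_utilization(wavefronts, alu_cycles, latency_cycles):
    # Each wavefront does alu_cycles of independent math, then waits
    # latency_cycles on memory; the SIMD round-robins between wavefronts.
    busy = wavefronts * alu_cycles
    period = alu_cycles + latency_cycles
    return min(1.0, busy / period)

for latency in (400, 100):            # "GDDR5-ish" vs "on-die-ish" guesses
    for n in (2, 4, 8):
        print(latency, n, round(alu_utilization(n, 40, latency), 2))
# With 40 ALU cycles per memory request, ~11 wavefronts are needed to hide
# 400 cycles of latency but only ~4 for 100, so at low occupancy the
# low-latency pool keeps the SIMD noticeably busier.
```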
 
ESRAM will help with the bandwidth problem, but it will not double the fill rate, add extra CUs, or anything of that sort.

Going by theory, ESRAM will give Durango a boost in bandwidth that puts it close to Orbis, not one that surpasses it.

I'm not talking about surpassing Orbis. And can't the eSRAM be involved heavily in handling virtualized assets? I thought I read something about that here before, but I might be wrong. Those kinds of assets don't require much bandwidth for the same visual output, IIRC.

Weren't display planes already shot down as something other GPUs are already doing? Even on the PS3, or even the PS2?
So the contention is that MS/AMD engineered in hardware display planes (and patented them) that have been commonplace in gaming for the past 13 years? Am I interpreting your comment right?

Didn't the DMEs suffer the same fate?

I'm not as up on the details as others here are, but my impression was that the DMEs are not to be dismissed merely as run-of-the-mill DMAs, due to their extra functionality and/or how they work with the eSRAM. Not sure on that though. Be careful not to lean too heavily on confirmation bias with this stuff. You seem to lean on conclusions made by ppl who were openly mocking the notion that extra hardware even existed on Durango in the first place, by labeling it as 'secret sauce' just to undermine it.

Wasn't all that secret sauce debunked already?

Dismissed by ppl with an agenda who set out to mock its (now confirmed) existence in the first place? Sure. 'Debunked'? I dunno about that. I do seem to get the impression that of the two consoles, Durango is the one looking better from day to day relative to the initial leaked specs. I think the expectation of what Orbis can do has been relatively static in contrast.

If AMD/MS already had a stock GPU with most of this stuff in it, as is supposedly commonplace, they wouldn't add extra kit around the GPU to do these things. Or are we to assume they removed those functions to push them outside the GPU? If so, wouldn't this suggest more room freed up for Durango's GPU to put its 1.2 TFLOPS to work on other stuff?

I think some ppl rely on the notion that MS added this stuff to remove performance issues relative to what Orbis was doing. A simple look at the timelines proves this can't possibly be true. MS had a major advantage, based on what insiders are saying, circa summer 2012 when devs revolted and forced Sony to up their specs. There is a VGLeaks article from June that bears that out to a tee. Yet we have specs for Durango that still hold true from 5 months prior to that. So clearly MS wasn't ever designing this stuff as a reaction to Orbis. It also doesn't make sense that MS and AMD would take the off-the-shelf 'stock solution', consider it, and then opt for a much more complex and elaborate (aka expensive to design) setup just so they could end up with weaker hardware than the stock solution offered. The whole off-the-shelf stock approach would surely have been the very first thing they considered, after all.

Let me ask you this: will the DMEs add an extra 400 GFLOPS? Will the ESRAM add 400 GFLOPS as well?

I dunno. I'm asking. I don't see anyone arguing those two add 800 GFLOPS to the performance. If MS is trying to leverage virtualized assets (virtual textures, virtual geometry) in the hardware design, maybe they don't need as much processing? They certainly wouldn't need as much bandwidth, based on what I've read, either.
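A very rough illustration of the bandwidth point, assuming roughly one unique texel per screen pixel at the right mip level; all numbers are illustrative, nothing here is from the leaks, and it ignores filtering, overdraw, and page granularity:

```python
width, height   = 1920, 1080
bytes_per_texel = 4
fps             = 60

unique_bytes = width * height * bytes_per_texel   # unique texels visible per frame
print(unique_bytes / 2**20)                       # ~7.9 MB per frame
print(unique_bytes * fps / 2**30)                 # ~0.46 GB/s even if the whole
                                                  # working set is re-streamed every frame
```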
 
You are jumping too many questions to draw up your straw man.

I think the real question is: did MS back into all this complementary hardware because they needed 8 GB of RAM, or were they able to provide developers and the OS 8 GB of RAM because their design philosophy for this console (efficiency, virtualized textures, stuff I really don't understand, etc.) afforded them that much?

At first I thought the primary requirement was the 8 GB, but as more and more of the layout comes into focus, I'm not so sure.
 
How many compute cycles are lost in AMD's GCN to latency, or to moving memory around, given GDDR5 latency? Would this be impacted if 8 CPU cores had access to the VRAM?

Not sure if this is the same thing or not, but earlier today I did randomly come across this note in the June 2012 VGLeaks for Orbis. Much of that info still holds for today so it may be helpful in guesswork.

“>50% longer memory latency for CPU/ GPU compared to PC!!!”

http://www.vgleaks.com/world-exclusive-ps4-in-deep-first-specs/#comment-5026
 
Not sure if this is the same thing or not, but earlier today I did randomly come across this note in the June 2012 VGLeaks for Orbis. Much of that info still holds for today so it may be helpful in guesswork.

“>50% longer memory latency for CPU/ GPU compared to PC!!!”

http://www.vgleaks.com/world-exclusive-ps4-in-deep-first-specs/#comment-5026

Without having Jaguar cores to test, that is difficult to quantify, especially as Jaguar cores will probably first appear on LP DDR platforms.

On the GPU side this could go a number of ways. The pro side would argue that GPUs are designed to hide latency, that a unified memory helps, and that if Orbis hits 192 GB/s that is more bandwidth than Pitcairn while having fewer CUs and a lower frequency (something like a 600-700 GFLOPS difference, IIRC), so while latency may be increased, bandwidth per CU is much higher. The negative side may be that the 50% is after that is taken into consideration, that Orbis will also come in at ~170 GB/s with the CPU really needing a solid ~40 GB/s, which would indicate Orbis CUs are tuned fairly in line with AMD's GCN; the latency being a byproduct of a shared memory bus with the CPU, and while GPUs are fairly latency tolerant, a 50% spike will affect performance, especially compute.
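For scale, the arithmetic behind that Pitcairn comparison looks something like this (the Orbis figures are leak-based assumptions; Pitcairn is the retail HD 7870):

```python
def gflops(cus, ghz):
    # GCN: 64 lanes per CU, 2 FLOPs (FMA) per lane per clock.
    return cus * 64 * 2 * ghz

pitcairn = gflops(20, 1.0)   # ~2560 GFLOPS; retail HD 7870 has 153.6 GB/s
orbis    = gflops(18, 0.8)   # ~1843 GFLOPS; 176-192 GB/s rumored

print(pitcairn - orbis)      # ~717 GFLOPS, the "600-700" gap mentioned above
print(153.6 / pitcairn)      # ~0.06 bytes of bandwidth per FLOP (Pitcairn)
print(192.0 / orbis)         # ~0.10 bytes per FLOP if the 192 GB/s holds
```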

In the end we don't know if the 50% figure is accurate, where it fits into the equation, and how relevant it will be for performance.

But yeah, it goes back to people picking the rumors they like/dislike and disregarding the rest.
 
DMEs, from what we know, do R/W so the SIMDs don't remain idle.

On Xbox 360... they sat idle for 40% of the time... mainly due to things like this.

The question we have to ask is: was that fixed architecturally between Xenos and now?

If not: how far has that inefficiency been reduced by higher-bandwidth memory, or is it primarily latency sensitive?

And lastly: how much are GPUs typically losing now, and how much would the DMEs offset that inefficiency?
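Putting rough numbers on why that last question matters: with the leaked peaks fixed, the utilization assumption dominates the comparison (the utilization values below are invented purely for illustration; only the 40% Xenos idle figure comes from the post above):

```python
durango_peak, orbis_peak = 1229, 1843    # leaked peak GFLOPS

for d_util, o_util in [(0.6, 0.6), (0.9, 0.6), (0.75, 0.65)]:
    print(d_util, o_util, durango_peak * d_util, orbis_peak * o_util)
# Equal utilization keeps the ~1.5x raw gap; a large efficiency edge
# (0.9 vs 0.6) nearly closes it, which is why the idle-time question matters.
```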
 
It's also important to note that the DMEs in Durango, apart from the fact that there are 4 of them, also perform additional tasks beyond normal DMAs, e.g. each one can tile/untile, one can decode JPEG and LZ, while another can encode LZ. These are functions that would traditionally be performed on a CPU or GPU, so it does save some compute resources.
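As an illustration of the 'tile/untile' part, this is the class of memory shuffle a move engine could take off the CPU/GPU's hands; the 4x4 block layout below is a made-up example, since Durango's actual tiling format isn't in the leaks:

```python
def tile_4x4(linear, width, height):
    # Reorder a row-major pixel list into 4x4 blocks (toy swizzle, not the
    # real hardware format).
    tiled = []
    for ty in range(0, height, 4):            # walk tile rows
        for tx in range(0, width, 4):         # walk tiles within the row
            for y in range(ty, ty + 4):       # copy one 4x4 block
                for x in range(tx, tx + 4):
                    tiled.append(linear[y * width + x])
    return tiled

# 8x8 test image where each pixel's value is its linear index:
print(tile_4x4(list(range(64)), 8, 8)[:16])   # first tile: rows 0-3, cols 0-3
```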
 