NGGP: NextGen Garbage Pile (aka: No one reads the topics or stays on topic) *spawn*

Expect what from Durango game teasers ? :cry:

Not really platform specific teasers (unless its 1st party of course) but next gen games teasers in general. We may start seeing some games starting to include next gen versions too. GTA5 comes to mind as a good candidate.
 
I've been following this as closely as I can.

To just cut to the chase, I don't understand, and hopefully someone can enlighten me, how Durango makes up for the fact that its GPU is outclassed by Orbis on paper, how it makes up for its RAM configuration being outclassed on paper (talking bandwidth), and how, while the CPUs are supposedly identical, Orbis has more cores dedicated to gaming. What is it, 12 vs. 18 compute units? How does all of this somehow add up to a "wash"?

The reality seems to be that Orbis is superior on paper. These rumored specs also, I think, make the case that MS is going for an OS-heavy, dedicated set-top box while Sony is going for a more traditional, gaming-power-focused box.

Yet we hear over and over from 'reliable' (???) sources that "it's a wash! It's a wash! Ignore the numbers, it's a wash!" And the only valid argument I see for that is "Durango is so efficient in its SoC design that it's even with Orbis", from people who then cite that "Xbox 360 was only 60% efficient", implying that Orbis won't be efficient and that this cancels out its on-paper advantages.

So I guess I have a few questions about that. Isn't Orbis also a system-on-chip design? What makes people think it will be so inefficient compared to Durango? Neither console has final dev kits yet, so aren't people just coding for 'ballpark' PC hardware right now? Without final hardware, how does anyone know how efficient either one will be?

Thanks, and I hope I don't come off as completely out of touch. I am a fast and interpretive reader, but I am by no means an expert on any of this.
 
There is no wash. Those guys backpedaled on that manifesto recently anyways.

The reality is let's just close the boards and wait until E3.
 
There is no wash. Those guys backpedaled on that manifesto recently anyways.

The reality is let's just close the boards and wait until E3.

Now wait, just hold on, let's not get crazy and do that! I just joined! lol.

I am one day late to the discussion so I missed that.

I know it's early to call any of this, but going by the 'rumored' specs, which at this point we don't really have any reason to doubt, it seems very clear to me what both companies are up to.

And I know where I stand as a gamer: I'm more interested in the dedicated gaming box than a set-top media center, but that's just me. And by the way, in case anyone wants my 'credentials', I've been a 360-only gamer this generation; I never owned a PS3.
 
Your question is easy.
There's no wash. Those specs suggest that Orbis is way ahead; there's almost no area in which it doesn't excel.
I believe it's quite shocking to see Durango so far behind.
Obviously we should wait for the official reveal, but something tells me there won't be substantial changes.
 
There are ways the rumored Durango design could offset the difference in CUs against the currently rumored Orbis design.

We know very little about the SRAM pool; AFAICS it's there because of the low-latency access from the GPU, but it could also be multi-ported, which would increase the total effective bandwidth of the system.

It's my understanding that the ESRAM in Durango is not intended to be used like the EDRAM in the 360; rather, in most cases the primary render target will be in DDR3 memory, and the data reads will come from the ESRAM.

Depending on how much better the latency of the ESRAM is, that could lead to the CUs getting better utilization.

All of that depends on being able to stream textures and other input data EFFICIENTLY through the ESRAM. I don't believe this is easy to do, and it's probably why the MS docs apparently devote a lot of time to the Data Move Engines.

I wouldn't even like to posit a guess as to how much sourcing data from SRAM would actually help efficiency; a lot depends on how memory- vs. ALU-limited shaders are, both now and in the future, and on whether you can schedule the data moves efficiently.

There are other pieces to this. I'd imagine you render shadows to ESRAM directly, but if modern engines are predominantly deferred, what do you do with the first pass? Can you send some of the MRTs to DDR3 and some to ESRAM?

In the end, if the leak is accurate, and I believe the Durango leak at least probably is, then optimizing the usage of the ESRAM is going to be a lot of rope for developers to hang themselves with.
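To make the streaming constraint concrete, here's a back-of-envelope model of double-buffering texture tiles through the ESRAM with asynchronous copies. The bandwidth and size figures are the rumored leak numbers (68 GB/s DDR3, 102 GB/s ESRAM, 32 MB pool); the tile sizes are invented purely for illustration.

```python
# Toy model: can a DME copy of the *next* tile hide behind the GPU
# reading the *current* one? All hardware figures are rumored, not official.

DDR3_BW = 68e9           # bytes/s, rumored main-memory bandwidth
ESRAM_BW = 102e9         # bytes/s, rumored ESRAM bandwidth
ESRAM_SIZE = 32 * 2**20  # rumored 32 MB pool

def prefetch_hidden(tile_bytes, gpu_read_bytes_per_tile):
    """With double buffering (two tiles resident at once), the async
    copy is hidden if it finishes before the GPU drains the current tile."""
    copy_time = tile_bytes / DDR3_BW                    # DME pulls tile from DDR3
    consume_time = gpu_read_bytes_per_tile / ESRAM_BW   # GPU reads it from ESRAM
    return tile_bytes * 2 <= ESRAM_SIZE and copy_time <= consume_time

# A 4 MB tile that shaders re-read about 2x (e.g. filtering) stays fed:
print(prefetch_hidden(4 * 2**20, 8 * 2**20))   # True
# A tile the GPU only touches briefly outruns the copy:
print(prefetch_hidden(4 * 2**20, 2 * 2**20))   # False
```

The point of the sketch is just that the scheme only pays off when the GPU does enough work per tile to cover the transfer, which is exactly the scheduling problem described above.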
 
ERP said:
It's my understanding that the ESRAM in Durango is not intended to be used like the EDRAM in the 360; rather, in most cases the primary render target will be in DDR3 memory, and the data reads will come from the ESRAM.

All of that depends on being able to stream textures and other input data efficiently through the ESRAM. I don't believe this is easy to do, and it's probably why the MS docs apparently devote a lot of time to the Data Move Engines.

To be clear, are you saying that the frame buffer has been moved to main memory and the ESRAM will be used for general-purpose tasks? Can you elaborate? I was also wondering: could the DMEs help share data between CPU and GPU in a way that would emulate a pseudo-HSA paradigm?
 
OKAY THEN! Doesn't the RSX in the PS3 have a theoretical 400 GFLOPS while the Xenos in the X360 has a theoretical 240 GFLOPS? So do people also believe the PS3 has a more powerful GPU than the X360? David Shippy, who designed Cell, says that the Xenos is more sophisticated than the RSX. As for Durango, it has more memory, better memory latency, and possibly more efficient FLOPS due to the ESRAM and whatever these DME modules do. Where did the rumour come from that they simply help with bandwidth? VGleaks does not explain what the DME modules do.
 
Doesn't the RSX in the PS3 have a theoretical 400 GFLOPS while the Xenos in the X360 has a theoretical 240 GFLOPS, so do people also believe the PS3 has a more powerful GPU than the X360?

RSX was downgraded to 500 MHz before launch; your figure assumes it is 550 MHz. Besides, those 400 GFLOPS include the texture ALU flops, which are not counted for Xenos, as its texture units are decoupled from the shaders.
 
The Nvidia GTX 680 has 3.1 TFLOPS and the AMD HD 7970 3.8 TFLOPS, but FLOPS do not solely determine the power of a GPU.
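For reference, those headline numbers fall straight out of shader count × 2 flops (one fused multiply-add per lane per clock) × clock, using the published specs of each card:

```python
# Peak single-precision throughput: lanes x 2 flops (FMA) x clock.
gtx680 = 1536 * 2 * 1.006e9 / 1e12   # 1536 CUDA cores at 1006 MHz
hd7970 = 2048 * 2 * 0.925e9 / 1e12   # 2048 stream processors at 925 MHz
print(gtx680, hd7970)                # ~3.09 and ~3.79 TFLOPS
```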
 
To be clear, are you saying that the frame buffer has been moved to main memory and the ESRAM will be used for general-purpose tasks? Can you elaborate? I was also wondering: could the DMEs help share data between CPU and GPU in a way that would emulate a pseudo-HSA paradigm?

As far as I've been told, you can render to either memory pool, but in most cases you will want to render to the DDR3. I don't think it's quite that simple, though: if you are rendering simple polygons with few texture/data reads, you'd want to render those to the ESRAM.
If your shader is using a lot of input data, you'd want the target in DDR3 memory and the data sources in ESRAM, because this reduces the impact of GPU cache misses, which can be significant.

Because the ESRAM is small, the DMEs are there to asynchronously transfer data to the ESRAM before it's needed; this still eats bandwidth, and there is an issue with timing the transfers so as not to stall the GPU.

The DMEs really are just data movers with some additional functionality, and, I would assume, some way to synchronize with the GPU.
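A toy model shows why shorter miss latency can translate into better CU utilization even at equal bandwidth. All the numbers here are invented for illustration; nothing below is a leaked figure.

```python
# Illustrative latency-hiding model: a CU stalls on cache misses, and
# other resident wavefronts hide part of that stall. All inputs made up.

def utilization(alu_cycles, misses, miss_latency, waves_in_flight):
    """Fraction of time the CU spends on ALU work, assuming each miss's
    latency is divided across the wavefronts available to switch to."""
    exposed_stall = misses * miss_latency / waves_in_flight
    return alu_cycles / (alu_cycles + exposed_stall)

# Same shader, same miss count; only the read latency differs:
far_pool = utilization(alu_cycles=1000, misses=50, miss_latency=400, waves_in_flight=8)
near_pool = utilization(alu_cycles=1000, misses=50, miss_latency=100, waves_in_flight=8)
print(far_pool, near_pool)   # the lower-latency pool keeps the CU busier
```

Note the model also shows the flip side: with enough wavefronts in flight, the latency advantage shrinks, which is exactly the open question about how much the CUs actually benefit.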
 
X360 Xenos 240 GFLOPS / PS3 RSX 400 GFLOPS = 0.6, which implies that the Xenos is only 60% the power of the RSX. Yes or no?


ERP said:
As far as I've been told, you can render to either memory pool, but in most cases you will want to render to the DDR3. I don't think it's quite that simple, though: if you are rendering simple polygons with few texture/data reads, you'd want to render those to the ESRAM.
If your shader is using a lot of input data, you'd want the target in DDR3 memory and the data sources in ESRAM, because this reduces the impact of GPU cache misses, which can be significant.

Cheers, that sounds interesting and sort of innovative; effectively you've got an L3 cache for your GPU, so I wonder what sort of FLOP improvements could be gained from this. Furthermore, the CPU could also help with GPU workloads, couldn't it?
 
X360 Xenos 240 GFLOPS / PS3 RSX 400 GFLOPS = 0.6, which implies that the Xenos is only 60% the power of the RSX. Yes or no?

No. RSX has 16 programmable flops per pixel shader, since you can't count the texture units, as mentioned above. Also as mentioned above, it runs at 500 MHz, not 550 MHz. Thus RSX sports 232 GFLOPS compared with 240 in Xenos. And Xenos's flops are more flexible thanks to being unified. Xenos was a better GPU than RSX in quite a few ways.
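The accounting behind those two figures, using the commonly cited pipe counts (24 pixel pipes and 8 vertex pipes on RSX, 48 unified ALUs on Xenos, both at 500 MHz), works out like this:

```python
# RSX: 24 pixel pipes x 16 programmable flops, plus 8 vertex pipes x 10 flops
# (vec4+scalar MADD), at the final 500 MHz clock:
rsx = (24 * 16 + 8 * 10) * 0.5e9 / 1e9

# Xenos: 48 unified ALUs, each a vec4+scalar MADD (5 components x 2 ops), 500 MHz:
xenos = 48 * 5 * 2 * 0.5e9 / 1e9

print(rsx, xenos)   # 232.0 240.0 GFLOPS
```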
 
ERP said:
We know very little about the SRAM pool; AFAICS it's there because of the low-latency access from the GPU, but it could also be multi-ported, which would increase the total effective bandwidth of the system.
There are stages where I can see the latency aspect being helpful, particularly outside of the latency-tolerant CU stages in the rendering pipeline.
There are various points of stream-out and buffering in the fixed-function/specialized parts of the GPU that might not generally need as much bandwidth, but which cannot hide latency as readily.

I'm curious whether that's as important to the CUs, which are latency-tolerant in ways that the fixed-function domain is not. They are quite capable of burning bandwidth even before taking advantage of latency improvements.

That there are multiple data move engines makes me wonder if there are multiple banks. If they are on-die, the aggregate width of their data buses could also be a source of differentiation.
 
X360 Xenos 240 GFLOPS / PS3 RSX 400 GFLOPS = 0.6, which implies that the Xenos is only 60% the power of the RSX. Yes or no?

They are totally different architectures, so the flops are not really comparable. The Orbis vs. Durango flops comparisons are being made in the context that the two GPUs share the same or very similar architecture from AMD.
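That shared architecture is what makes the 12-vs-18 CU comparison meaningful. Using the rumored 800 MHz GPU clock for both machines (a leak figure, not an official one), the GCN peak-flop arithmetic is:

```python
# GCN peak flops: each CU has 64 lanes, each doing one fused
# multiply-add (2 flops) per clock. Clock is the *rumored* 800 MHz.
def gcn_gflops(cus, clock_ghz):
    return cus * 64 * 2 * clock_ghz

durango = gcn_gflops(12, 0.8)   # ~1228.8 GFLOPS
orbis = gcn_gflops(18, 0.8)     # ~1843.2 GFLOPS, a straight 1.5x
```

Because the per-CU math is identical on both chips, the ratio reduces to the CU counts themselves, which is why this particular comparison is treated as apples-to-apples.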
 