How Important are FLOPS to Gaming Performance?

onanie · Jul 18, 2005

Tap In said:
For instance, this is probably the first time that general purpose physics calculation would be achievable, with a reasonable degree of success, on a graphics processor and is a big step towards the graphics processor becoming much more like a vector co-processor to the CPU.

Is that physics calculation something that "MEMEXPORT" enabled, or something that GPUs can inherently do already (vector calculations)?
Is dave referring more to how results can be written/read to system RAM?

Tap In · Jul 18, 2005

onanie said:
Tap In said:

For instance, this is probably the first time that general purpose physics calculation would be achievable, with a reasonable degree of success, on a graphics processor and is a big step towards the graphics processor becoming much more like a vector co-processor to the CPU.

Is that physics calculation something that "MEMEXPORT" enabled, or something that GPUs can inherently do already (vector calculations)?
Is dave referring more to how results can be written/read to system RAM?

well that's a good question but since Dave says "probably the first time that general purpose physics calculation would be achievable, with a reasonable degree of success, on a graphics processor" I'm gonna have to go with MEMEXPORT makes it possible.

Also... don't overlook this:

With the capability to fetch from anywhere in memory, perform arbitrary ALU operations and write the results back to memory, in conjunction with the raw floating point performance of the large shader ALU array, the MEMEXPORT facility does have the capability to achieve a wide range of fairly complex and general purpose operations; basically any operation that can be mapped to a wide SIMD array can be fairly efficiently achieved and in comparison to previous graphics pipelines it is achieved in fewer cycles and with lower latencies.

onanie · Jul 18, 2005

Tap In said:
onanie said:

Tap In said:

For instance, this is probably the first time that general purpose physics calculation would be achievable, with a reasonable degree of success, on a graphics processor and is a big step towards the graphics processor becoming much more like a vector co-processor to the CPU.

Is that physics calculation something that "MEMEXPORT" enabled, or something that GPUs can inherently do already (vector calculations)?
Is dave referring more to how results can be written/read to system RAM?

well that's a good question but since Dave says "probably the first time that general purpose physics calculation would be achievable, with a reasonable degree of success, on a graphics processor" I'm gonna have to go with MEMEXPORT makes it possible.

Also... don't overlook this:

With the capability to fetch from anywhere in memory, perform arbitrary ALU operations and write the results back to memory, in conjunction with the raw floating point performance of the large shader ALU array, the MEMEXPORT facility does have the capability to achieve a wide range of fairly complex and general purpose operations; basically any operation that can be mapped to a wide SIMD array can be fairly efficiently achieved and in comparison to previous graphics pipelines it is achieved in fewer cycles and with lower latencies.

The important question - how is this going to be different to the RSX functionality that we know about already? If it is the "first time", then it is only because there are currently no PC GPU cards that have direct access to system RAM.

Tap In · Jul 18, 2005

onanie said:
Tap In said:

onanie said:

Tap In said:

For instance, this is probably the first time that general purpose physics calculation would be achievable, with a reasonable degree of success, on a graphics processor and is a big step towards the graphics processor becoming much more like a vector co-processor to the CPU.

Is that physics calculation something that "MEMEXPORT" enabled, or something that GPUs can inherently do already (vector calculations)?
Is dave referring more to how results can be written/read to system RAM?

well that's a good question but since Dave says "probably the first time that general purpose physics calculation would be achievable, with a reasonable degree of success, on a graphics processor" I'm gonna have to go with MEMEXPORT makes it possible.

Also... don't overlook this:

With the capability to fetch from anywhere in memory, perform arbitrary ALU operations and write the results back to memory, in conjunction with the raw floating point performance of the large shader ALU array, the MEMEXPORT facility does have the capability to achieve a wide range of fairly complex and general purpose operations; basically any operation that can be mapped to a wide SIMD array can be fairly efficiently achieved and in comparison to previous graphics pipelines it is achieved in fewer cycles and with lower latencies.

The important question - how is this going to be different to the RSX functionality that we know about already? If it is the "first time", then it is only because there are currently no PC GPU cards that have direct access to system RAM.

well does anyone know if RSX supports MEMEXPORT?

I have not heard about it yet so I guess we will find out sooner or later.

For now we know Xenos does do it and does it well according to Dave's article.

Several people here undoubtedly understand how MEMEXPORT differs from typical GPU functions better than I so perhaps they could chime in here.

onanie · Jul 18, 2005

Tap In said:
well does anyone know if RSX supports MEMEXPORT?

Yes, it just doesn't have a fancy name.

Tap In · Jul 18, 2005

onanie said:
Tap In said:

well does anyone know if RSX supports MEMEXPORT?

Yes, it just doesn't have a fancy name.

So you have details on RSX and how it functions with a MEMEXPORT feature?

please share with the group.

onanie · Jul 18, 2005

Tap In said:
onanie said:

Tap In said:

well does anyone know if RSX supports MEMEXPORT?

Yes, it just doesn't have a fancy name.

So you have details on RSX and how it functions with a MEMEXPORT feature?

please share with the group.

Glad to share, although this is widely known.

"The RSX can render pixels to any part of memory, giving it access to the full 512MB of memory of the PS3", from press conference.

Tap In · Jul 18, 2005

onanie said:
Glad to share, although this is widely known.

"The RSX can render pixels to any part of memory, giving it access to the full 512MB of memory of the PS3", from press conference.

okay.

I'm not convinced yet that it will be as functional (considering Dave's article) but until we know more details on RSX (or someone more knowledgeable here can clarify) I'll take your word for it.

thx

onanie · Jul 18, 2005

Tap In said:
onanie said:

Glad to share, although this is widely known.

"The RSX can render pixels to any part of memory, giving it access to the full 512MB of memory of the PS3", from press conference.

okay.

I'm not convinced yet that it will be as functional (considering Dave's article) but until we know more details on RSX (or someone more knowledgeable here can clarify) I'll take your word for it.

thx

Thank you for considering my points.

(if you're wondering if it will be as "functional" in general terms - well, they've even used an Nvidia GPU to perform database calculations)

Bobbler · Jul 18, 2005

Isn't MEMEXPORT similar to TurboCache or ATI's PC equivalent (can't remember the name now)? I'm not really sure how TurboCache and the like work, but it sounds similar.

MEMEXPORT is just a fancy name for functionality without which either console would (probably) be heavily crippled. It seems to me that MEMEXPORT and the PS3 equivalent are more a side effect of the bus architecture that they both chose than some new innovative way of doing things: with huge bandwidth between RSX and Cell, and between Xenos and XCPU, it would make sense that they could do the things they do.

aeriic · Jul 18, 2005

onanie said:
Glad to share, although this is widely known.

"The RSX can render pixels to any part of memory, giving it access to the full 512MB of memory of the PS3", from press conference.

This is not what MEMEXPORT will be doing...

In simple terms the MEMEXPORT function is a method by which Xenos can push and pull vectorised data directly to and from system RAM. This becomes very useful with vertex shader programs as with the capabilities to scatter and gather to and from system RAM the graphics processor suddenly becomes a very wide processor for general purpose floating point operations.

Current GPUs can access main RAM through AGP/PCIe. The difference is in the type of data it can read/write and what it can do with it...

...basically any operation that can be mapped to a wide SIMD array can be fairly efficiently achieved and in comparison to previous graphics pipelines it is achieved in fewer cycles and with lower latencies.

It's true that current GPUs have been used to do some GP work, but why do you think only academics/researchers have done this? I don't think they'll be doing any database ops with either RSX/C1 anyway, but in the example you mentioned, what happens when the database doesn't fit in the frame buffer (ie: 20M records)? How about implementing random writes? Integer math ops..., etc... A proof-of-concept doesn't always mean it's a practical solution applicable elsewhere... and (in this example) really doesn't pertain to what X360/PS3 will be able to do graphics/physics/AI -wise.

onanie · Jul 18, 2005

aeriic said:
This is not what MEMEXPORT will be doing...

Current GPUs can access main RAM through AGP/PCIe. The difference is in the type of data it can read/write and what it can do with it...

There is no formal restriction on the type of data that RSX can transfer to/from XDR. Additionally, there is nothing peculiar about the Xenos ALUs that will allow it to perform more general purpose operations than any other GPU.

MEMEXPORT, as Bobbler quite succinctly put it, is a "side effect" (which in my personal belief is being used as a marketing point, but I wouldn't expect you to agree).

Inane_Dork · Jul 18, 2005

onanie said:
"The RSX can render pixels to any part of memory, giving it access to the full 512MB of memory of the PS3", from press conference.

So... that makes it as capable as basically every GPU to come out since the GF2. Rendering pixels is not what MEMEXPORT is about. Similar, but vastly different. Arbitrary write within a shader is totally unheard of (prior to Xenos, of course). Rendering pixels is the antithesis of an arbitrary write.

onanie · Jul 18, 2005

Inane_Dork said:
onanie said:

"The RSX can render pixels to any part of memory, giving it access to the full 512MB of memory of the PS3", from press conference.

So... that makes it as capable as basically every GPU to come out since the GF2. Rendering pixels is not what MEMEXPORT is about. Similar, but vastly different. Arbitrary write within a shader is totally unheard of (prior to Xenos, of course). Rendering pixels is the antithesis of an arbitrary write.

Let me illustrate "arbitrary write", inane_dork.

"we need to share the data between the CPU and GPU as much as possible. That's why we adopted this architecture. We want to make all the floating-point calculations including their rounded numbers the same, and we've been able to make it almost identical. So as a result, the CPU and GPU can use their calculated figures bidirectionally"

"RSX can directly refer to a result simulated by CELL and CELL can directly refer to a shape of a thing RSX added shading to"

Inane_Dork · Jul 18, 2005

onanie said:
Let me illustrate "arbitrary write", inane_dork.

"we need to share the data between the CPU and GPU as much as possible. That's why we adopted this architecture. We want to make all the floating-point calculations including their rounded numbers the same, and we've been able to make it almost identical. So as a result, the CPU and GPU can use their calculated figures bidirectionally"

"RSX can directly refer to a result simulated by CELL and CELL can directly refer to a shape of a thing RSX added shading to"

Except that that's not arbitrary whatsoever. That's fetching textures and locking render targets after rendering to them. I can do that on my PC right now. That's still locked into outputting solely pixels to solely the render target(s).

onanie · Jul 18, 2005

Inane_Dork said:
Except that that's not arbitrary whatsoever. That's fetching textures and locking render targets after rendering to them. I can do that on my PC right now. That's still locked into outputting solely pixels to solely the render target(s).

Not arbitrary? Just fetching textures and locking render targets? It is amazing that you can infer that from the two statements that I quoted.

aeriic · Jul 18, 2005

onanie said:
Let me illustrate "arbitrary write", inane_dork.

"we need to share the data between the CPU and GPU as much as possible. That's why we adopted this architecture. We want to make all the floating-point calculations including their rounded numbers the same, and we've been able to make it almost identical. So as a result, the CPU and GPU can use their calculated figures bidirectionally"

"RSX can directly refer to a result simulated by CELL and CELL can directly refer to a shape of a thing RSX added shading to"

I don't think it was ever an issue whether Cell+RSX could do something similar. To the contrary, I stated this already. The way you are describing what you believe MEMEXPORT to be is what inane_dork is trying to address. It is not: AGP/PCIe sidebanding/GART/DIME/TC, etc. If that were the case, I don't think Dave would have described it the way he did in the article since practically all current GPUs have those abilities...

If this feature is just a side-effect or accident, I only wish ATI/nVidia stumbled upon these side-effects more frequently...

This comment you quoted by KK is pretty vague. He can be referring to framebuffer ops as he mentioned earlier in the same interview...

onanie · Jul 18, 2005

aeriic said:
I don't think it was ever an issue whether Cell+RSX could do something similar.

Well then, perhaps it was trivial to point it out as a named feature in the xenon/c1 combination. If there is no issue then the discussion is settled.

DeanoC · Jul 18, 2005

onanie said:
aeriic said:

I don't think it was ever an issue whether Cell+RSX could do something similar.

Well then, perhaps it was trivial to point it out as a named feature in the xenon/c1 combination. If there is no issue then the discussion is settled.

The difference between render target writes and MEMEXPORT is that MEMEXPORT uses no coherency patterns to dictate the data output. For example, a vertex shader on Xenon can do this:

MEMEXPORT TO Address(0), Val0
MEMEXPORT TO Address(10000), Val1
MEMEXPORT TO Address(2344), Val2
MEMEXPORT TO Address(9990), Val3
And still write fragments via the pixel shader to EDRAM

To do the same thing using a conventional rasteriser would involve 5 separate triangles (one for each memory write). That's a vast difference for many GPGPU operations; any GPU can do MEMEXPORT-like functions, but only by using lots and lots of triangles...

It basically allows full scatter/gather memory functions, the major difference between CPU and GPUs.
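To make the contrast concrete, here is a minimal Python sketch. It is a toy model only, not Xenos or any real shader API: memory is a plain list, and the function names `render_target_pass`, `scatter_with_render_targets` and `scatter_with_memexport` are made up for illustration. In the render-target model the output address of each invocation is fixed from outside, so scattered writes need one draw per target address; in the MEMEXPORT-style model a single invocation picks its own addresses.

```python
# Toy model: "memory" is a flat list; nothing here is a real GPU API.

def render_target_pass(memory, base, values):
    """Render-target style: invocation i may only write slot base + i.

    The output address is fixed by the caller (standing in for the
    rasteriser), not chosen by the shader itself.
    """
    for i, value in enumerate(values):
        memory[base + i] = value
    return 1  # one draw covers one contiguous run of addresses


def scatter_with_render_targets(memory, writes):
    """Emulate scattered writes with one draw per target address."""
    draws = 0
    for address, value in writes:
        draws += render_target_pass(memory, address, [value])
    return draws


def scatter_with_memexport(memory, writes):
    """MEMEXPORT style: a single invocation writes to arbitrary addresses."""
    for address, value in writes:
        memory[address] = value
    return 1  # all writes issued from one shader invocation


# The four addresses from the example above.
writes = [(0, "Val0"), (10000, "Val1"), (2344, "Val2"), (9990, "Val3")]

mem_a = [None] * 10001
mem_b = [None] * 10001
draws_needed = scatter_with_render_targets(mem_a, writes)
invocations_needed = scatter_with_memexport(mem_b, writes)
print(draws_needed, invocations_needed)
```

Both paths leave memory in the same state; only the number of draws differs. The sketch counts the four scattered writes only, and leaves out the ordinary fragment write to EDRAM, which is presumably the fifth triangle in the count above.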

Shifty Geezer · Jul 18, 2005

I think the key difference between XB360 and PS3 integration is that, while we know both have communication lines direct to CPU storage, we don't know if RSX can export data to RAM during a shader.

A conventional GPU can write to RAM, but it can only export pixel data and the like. Xenos can apparently send any data from within its shader array. This means you can send vectorised data to Xenos, have it process that data, and return the still-vectorised data for use elsewhere, without having to bunch that data up into packets that fit triangle data structures as is currently done. This is what opens Xenos up as a vector processor.
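As a rough illustration of the kind of work being described, here is a sketch of a physics step written as a pure per-element operation over flat arrays, which is the shape of problem that maps onto a wide SIMD array. The function name `integrate`, the explicit Euler scheme and the numbers are all assumptions for illustration; nothing in this thread describes a specific physics workload.

```python
# Illustrative only: explicit Euler integration over flat arrays.
# Every element is updated independently of the others, so the work
# could in principle be spread one particle per SIMD lane.

def integrate(positions, velocities, accel, dt):
    """Return new (positions, velocities) after one explicit Euler step."""
    new_velocities = [v + accel * dt for v in velocities]
    new_positions = [p + v * dt for p, v in zip(positions, velocities)]
    return new_positions, new_velocities


positions = [0.0, 10.0, 20.0]
velocities = [1.0, 0.0, -2.0]
new_pos, new_vel = integrate(positions, velocities, accel=-10.0, dt=0.5)
print(new_pos)
print(new_vel)
```

The inputs and outputs stay as plain arrays of numbers throughout, with no step that packs them into pixels or triangles.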

We don't know what degree of data manipulation RSX has. On the one hand, it doesn't really need to be a vector processor, as that's what Cell is for; on the other, nVidia's talk suggests that they and Sony have a similar vision of the future, which might well see a more free-form data structure appear in RSX.
