R600 nugget...

Could be completely talking out of his ass, of course, but:

http://features.teamxbox.com/xbox/1145/The-Xbox-360-Dissected/p5/

But a graphics chip from ATI that incorporates all the technologies seen in the Xbox 360 graphics subsystem may not arrive until 2006 or early 2007. That graphics processor, a so-called R600, will be the generation after the R520 and is supposed to be the first GPU to take full advantage of Longhorn's WGF 2.0 features and finally incorporate the other fundamental aspect of the Xbox 360 GPU: the embedded DRAM.
 
Why would you add DRAM to the core? It's slower than SRAM and adding a lot of it certainly wouldn't be good for yields.
 
SecretFire said:
Why would you add DRAM to the core? It's slower than SRAM and adding a lot of it certainly wouldn't be good for yields.

DRAM is cheaper than SRAM, I believe. And they could modularize it so that it's more of an L2 cache rather than a true embedded memory solution. Then again, the lessons they learn from the R500 could certainly give them the experience they need to successfully embed DRAM in the R600.

Meh. It's too far off anyways.
 
SecretFire said:
Why would you add DRAM to the core? It's slower than SRAM and adding a lot of it certainly wouldn't be good for yields.
Bandwidth, coupled with RAM density.
For CPUs, the purpose of on-chip memory has typically been to provide a pool of low-latency memory. For a GPU, however, the purpose would predominantly be to increase available bandwidth, or conversely to reduce the need for extremely high off-chip bandwidth.
There can still be pools of cache on the chip for low-latency access.
The reason they use embedded DRAM instead of SRAM is that DRAM density is much higher, and they can fit more of it at any given time and process technology.
The reason it has been used on consoles for two generations now, and not PC GPUs is at least partly that the expected resolution for PCs has been higher. The Nintendo GC supports 640x480. The XBox 360 will support 720p. By 2007 TSMC will presumably be using 65nm lithography, which might allow sufficient amounts of eDRAM to be integrated to suit PC purposes. Or not.
Will it decrease yields and drive up chip costs? Definitely. But it will also decrease dependence on the comparatively very expensive external memory, so overall system costs may actually be reduced.
 
The chief stumbling block for a desktop GPU is to put in enough eDRAM to handle the full range of resolutions that desktop users expect. With consoles, you can design for one resolution, like 720p. But a desktop GPU should support resolutions well into the 2-4 megapixel range, like 24" and 30" widescreen LCDs (Apple Cinema, etc). You can buy a 24" widescreen as cheap as $800-900 now (Dell 24-incher with rebate). 1920x1200 preferred, but 1600x1200 a must. That takes at least 23 MB, and we're not even including HDR, MRT, and AA formats.
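Quick back-of-envelope on the framebuffer sizes involved, assuming 32-bit colour and 32-bit depth/stencil per pixel, single-sampled, no HDR (the exact figure obviously shifts with the formats used):

Code:
# Back-of-envelope framebuffer (colour + Z) sizes, assuming 4 bytes of colour
# and 4 bytes of depth/stencil per pixel, single-sampled. Figures in MiB.
bytes_per_pixel = 4 + 4

for w, h in [(1280, 720), (1600, 1200), (1920, 1200), (2560, 1600)]:
    mib = w * h * bytes_per_pixel / 2**20
    print(f"{w}x{h}: {mib:.1f} MiB")

# 1280x720:  7.0 MiB    1600x1200: 14.6 MiB
# 1920x1200: 17.6 MiB   2560x1600: 31.2 MiB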
 
PC-Engine said:
Console games running in 1920x1080 HDTV resolution will require the same fillrate as a PC part...
What if the console up-scales 1280x720 to 1920x1080?

Jawed
 
DemoCoder said:
The chief stumbling block for a desktop GPU is to put in enough eDRAM to handle the full range of resolutions that desktop users expect. With consoles, you can design for one resolution, like 720p. But a desktop GPU should support resolutions well into the 2-4 megapixel range, like 24" and 30" widescreen LCDs (Apple Cinema, etc). You can buy a 24" widescreen as cheap as $800-900 now (Dell 24-incher with rebate). 1920x1200 preferred, but 1600x1200 a must. That takes at least 23 MB, and we're not even including HDR, MRT, and AA formats.
In R500 the AA samples don't live in the EDRAM. The framebuffer is purely for the frame itself. AA samples are held in local memory until they can be resolved into a completed pixel.

Jawed
 
Jawed said:
AA samples are held in local memory until they can be resolved into a completed pixel.
What if local (on chip?) mem isn't enough?
 
I imagine the best implementation of embedded RAM would be to use two cores and link them over a wide memory bus; that way you can grow the silicon separately and not worry about yields. You will have extra packaging costs and complications to deal with instead, though.


As for having lots of it... I suspect it is just enough to fit one part of the rendering process, such as the z-buffer (my first guess), which would free up a lot of the memory bus and reduce the granularity of memory accesses, because z-buffer access is the most granular. Saying that, though, it would require quite a change on ATI's part, because the z-buffer writes are almost certainly tied in with the way the framebuffer is written into RAM. It could simply be a large L2 cache for textures, allowing the driver to pull textures into this RAM pool in advance of rendering a certain player model or something.

Anything would be pure speculation though.
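To put a rough number on the z-buffer idea above, here is the kind of external-bus traffic an on-chip z-buffer could soak up. Every input (resolution, overdraw, frame rate, accesses per test) is a guess, not anything ATI has said:

Code:
# Rough external-bus Z traffic that on-chip Z storage could absorb.
# All inputs are guesses: 1280x1024 at 60 fps, average overdraw of 3,
# 4-byte Z/stencil, one read plus one write per depth test.
width, height, fps = 1280, 1024, 60
overdraw = 3
z_bytes = 4
accesses = 2  # read for the test, write on pass

z_traffic = width * height * overdraw * z_bytes * accesses * fps
print(f"~{z_traffic / 1e9:.1f} GB/s of raw Z traffic")
# ~1.9 GB/s, before any AA multiplier or Z compression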
 
nAo said:
Jawed said:
AA samples are held in local memory until they can be resolved into a completed pixel.
What if local (on chip?) mem isn't enough?

Local means non EDRAM. EDRAM is not used to hold AA samples. (Well I think a portion of EDRAM might be used to cache AA samples that are being pipelined for filtering). AA samples consume far too much memory to fit within EDRAM.

On a graphics card Local Memory is the graphics card's memory. In Xbox 360 it's the area of system RAM usable by the GPU. Apparently 256MB of XBox 360 RAM is dedicated to the GPU.

Jawed
 
Dave B(TotalVR) said:
I imagine the best implementation of embedded RAM would be to use two cores and link them over a wide memory bus; that way you can grow the silicon separately and not worry about yields. You will have extra packaging costs and complications to deal with instead, though.

From what I read, ATI is developing something like that for Nintendo's Revolution thingy. It's supposed to have a dual-core GPU.
 
Jawed said:
Local means non EDRAM. EDRAM is not used to hold AA samples. (Well I think a portion of EDRAM might be used to cache AA samples that are being pipelined for filtering). AA samples consume far too much memory to fit within EDRAM.

What is your definition of AA samples?
If these AA samples are going off chip, how much bandwidth do you need to read/write them?

On a graphics card Local Memory is the graphics card's memory. In Xbox 360 it's the area of system RAM usable by the GPU. Apparently 256MB of XBox 360 RAM is dedicated to the GPU.
Jawed
It should be a UMA architecture, just like the first XBOX, AFAIK.
 
DemoCoder said:
The chief stumbling block for a desktop GPU is to put in enough eDRAM to handle the full range of resolutions that desktop users expect. With consoles, you can design for one resolution, like 720p. But a desktop GPU should support resolutions well into the 2-4 megapixel range, like 24" and 30" widescreen LCDs (Apple Cinema, etc). You can buy a 24" widescreen as cheap as $800-900 now (Dell 24-incher with rebate). 1920x1200 preferred, but 1600x1200 a must. That takes at least 23 MB, and we're not even including HDR, MRT, and AA formats.
R500 does support partitioned rendering as well as storing incompressible framebuffer tiles in another location, though I'm not as convinced as Jawed that this other location is required to be in external memory. 720p is slightly more than 7 MiB for color and Z, which leaves 3 MiB for incompressible tiles. So that's enough if one in ten tiles is incompressible (assuming 4xAA).

Partitioned rendering is especially easy, as WGF2.0 requires the GPU to be able to write out the geometry stream after the GS, so no shader work has to be done twice, just the triangle setup.
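A quick sanity check on the one-in-ten figure above, assuming 4 bytes of colour and 4 bytes of Z per sample and 10 MiB of eDRAM:

Code:
# Sanity check on the 10 MiB eDRAM budget at 720p with 4xAA.
EDRAM_MIB = 10
w, h = 1280, 720
base_fb = w * h * (4 + 4) / 2**20               # 1x colour + Z, ~7.0 MiB
leftover = EDRAM_MIB - base_fb                  # ~3.0 MiB for overflow tiles
uncompressed_4x = w * h * (4 + 4) * 4 / 2**20   # fully uncompressed 4xAA, ~28.1 MiB

print(f"base framebuffer: {base_fb:.1f} MiB, leftover: {leftover:.1f} MiB")
print(f"leftover covers ~{leftover / uncompressed_4x:.0%} of an uncompressed 4xAA buffer")
# -> roughly one tile in ten could be stored uncompressed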
 
Don't you mean write out the post-transform vertex stream? GS->VS->PS, it's the VS that needs to be performed twice. Once to get post-transformed vertices, and again later to setup all the per-vertex inputs to the PS.

Having to do tiling yourself seems very annoying, unless Microsoft has some XNA tools to make it easier. It's annoying enough that I think dev's won't like it, since it will fracture the desktop market. On GPU X, I can just render the whole scene, but on GPU Y, I must do manual tiling? Wha?!?

On consoles this mentality works, but a desktop GPU? It better be able to handle IMR-style code and make such tiling automatic with no developer code needed, otherwise competing IHVs will make a big deal out of how difficult this GPU is to program.
 
PC-Engine said:
Console games running in 1920x1080 HDTV resolution will require the same fillrate as a PC part...
Don't forget that there are few displays out there capable of 1080p, and 1080p is half the framerate anyway (30 fps) from what I know, so 1080i should give a better image in motion.

In other words, you only need 1920x540. That's only a few more pixels than 1152x864, and about half as many as 1600x1200.

I wonder if we'll see a re-emergence of interlaced display modes on computer monitors?
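Pixel counts for the modes being compared, just to put numbers on it (one 1080i field versus common PC resolutions; deinterlacing and flicker ignored):

Code:
# Pixels per refresh for the modes being compared.
modes = {
    "1080i field (1920x540)":  1920 * 540,
    "1152x864":                1152 * 864,
    "1600x1200":               1600 * 1200,
    "1080p frame (1920x1080)": 1920 * 1080,
}
for name, px in modes.items():
    print(f"{name}: {px / 1e6:.2f} Mpix")
# 1.04 / 1.00 / 1.92 / 2.07 Mpix respectively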
 
Mintmaster said:
PC-Engine said:
Console games running in 1920x1080 HDTV resolution will require the same fillrate as a PC part...
Don't forget that there are few displays out there capable of 1080p, and 1080p is half the framerate anyway (30 fps) from what I know, so 1080i should give a better image in motion.

No, 1080p @ 60Hz was added to ATSC, and upcoming HD-DVD formats (Blu-Ray) are slated to support it. Like I told other people, CES 2005 was the year of 1080p. Everyone was showing off 1080p displays and claiming 10000:1 CR. Panasonic, Samsung, LG, Sony, Fujitsu, Phillips, you name it, they had 1080p on view. 1080p sets are available now, and the first Samsung 1080p DLP sets will arrive in June for a lower introductory MSRP than the previous 720p models (HLP5065W, etc). Within the lifespan of the R500, it is reasonable to assume a few million 1080p sets are going to be sold.

Because of the way that HD displays (primarily DLP, but LCD too, not PDP) are produced via semiconductor-like processes, resolution is being bumped up as the process shrinks. In two years, it is unlikely that TI will even manufacture 720p DMD chips anymore, as they will move most of their production over to xHD3 eventually. If, two years from now, TI is still producing the HD2, it would be like Intel and AMD still selling chips on .25 micron. That means 1080p displays will only get cheaper, and it will get harder and harder to find new sets with 720p, just like 480p PDP and LCD sets are on their way out after only 3 years on the market. Samsung's 1080p DLP is coming out at $3299, whereas their flagship 720p DLP last year came out at $4199.


Now, that said, I don't think we need to worry about 1080p on consoles, but my point stands about a desktop GPU - it has different design requirements than a console. You just can't take the R500 out of the Xbox360 and sell it as a PCI card. The fixed nature of consoles is what makes eDRAM at this juncture feasible IMHO.
 
nAo said:
What is your definition of AA samples?
If these AA samples are going off chip, how much bandwidth do you need to read/write them?
Colour + Z + coverage mask, 9 bytes.

Method and apparatus for video graphics antialiasing using a single sample frame buffer and associated sample memory

I can't tell you how much bandwidth, because that's where the lossless compression comes in.

Jawed said:
On a graphics card Local Memory is the graphics card's memory. In Xbox 360 it's the area of system RAM usable by the GPU. Apparently 256MB of XBox 360 RAM is dedicated to the GPU.
Jawed
It should be a UMA architecture, just like the first XBOX, AFAIK.

Well I've seen two figures for the memory in XB360, 512MB total, and 256MB (as a portion of the 512MB of UMA) for the GPU. Typically, AA samples are going to consume way more memory than the EDRAM can hold. The only place for them is off-GPU memory.

Can't wait till we get some real detail on XB360.

Jawed
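Putting the 9-bytes-per-sample figure together with a 720p, 4xAA target shows why the samples can't all live in 10 MiB of eDRAM. Worst case, with nothing compressed (an assumption; the real footprint depends on the lossless compression):

Code:
# Size of the AA sample store at 9 bytes per sample, assuming (worst case)
# every pixel keeps all four samples uncompressed.
bytes_per_sample = 9        # colour + Z + coverage mask, per the patent
w, h, samples = 1280, 720, 4

total_mib = w * h * samples * bytes_per_sample / 2**20
print(f"720p, 4xAA: ~{total_mib:.0f} MiB of sample data vs 10 MiB of eDRAM")
# ~32 MiB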
 
DemoCoder said:
Don't you mean write out the post-transform vertex stream? GS->VS->PS, it's the VS that needs to be performed twice. Once to get post-transformed vertices, and again later to setup all the per-vertex inputs to the PS.
Erm, no, actually it's VS->GS->PS. Tessellation got dropped.
The post-transform/GS output stream contains all the necessary data, it's the triangle setup that needs to be done as many times as you have screen partitions (and triangle setup is usually never the bottleneck).

Having to do tiling yourself seems very annoying, unless Microsoft has some XNA tools to make it easier. It's annoying enough that I think dev's won't like it, since it will fracture the desktop market. On GPU X, I can just render the whole scene, but on GPU Y, I must do manual tiling? Wha?!?

On consoles this mentality works, but a desktop GPU? It better be able to handle IMR-style code and make such tiling automatic with no developer code needed, otherwise competing IHVs will make a big deal out of how difficult this GPU is to program.
Who said manual tiling? Even today a driver could do automatic tiling, though there would be no point without eDRAM except allowing ridiculously large render targets. What I meant to say is that tiling will be particularly easy/cheap with WGF2.0 chips, because they need to be able to store the post-transform vertex stream anyway.
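To put a number on how many passes automatic tiling would actually mean, here's a crude partition count for a 10 MiB eDRAM framebuffer, assuming 4 bytes of colour and 4 bytes of Z per sample and AA samples held uncompressed per partition (both simplifications):

Code:
# Crude count of screen partitions needed to fit a render target in 10 MiB of eDRAM.
def partitions_needed(width, height, aa_samples, edram_bytes=10 * 2**20):
    bytes_per_pixel = (4 + 4) * aa_samples      # colour + Z, per sample
    total = width * height * bytes_per_pixel
    return -(-total // edram_bytes)             # ceiling division

for w, h, aa in [(1280, 720, 1), (1280, 720, 4), (1600, 1200, 1), (1600, 1200, 4)]:
    print(f"{w}x{h} {aa}xAA: {partitions_needed(w, h, aa)} partition(s)")
# 1280x720 1x: 1, 1280x720 4x: 3, 1600x1200 1x: 2, 1600x1200 4x: 6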
 
It is interesting to me that, given a 65nm process, I would estimate it should be possible to include enough eDRAM on a PC chip to finally cover the "standard" 1600x1200 resolution. This would have similar framebuffer characteristics/abilities to Xenon's 90nm chips at 1280x720 resolution.
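For what it's worth, a back-of-envelope version of that estimate, assuming eDRAM density scales with the square of the linear shrink and the same die area is spent on it (both idealisations):

Code:
# 90nm -> 65nm eDRAM scaling vs the 1600x1200 colour + Z requirement.
edram_90nm_mib = 10
density_gain = (90 / 65) ** 2                      # ~1.9x
edram_65nm_mib = edram_90nm_mib * density_gain     # ~19 MiB in the same area

needed_1600x1200 = 1600 * 1200 * (4 + 4) / 2**20   # ~14.6 MiB, no AA
print(f"~{edram_65nm_mib:.0f} MiB of eDRAM at 65nm vs ~{needed_1600x1200:.1f} MiB needed")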
 