R600 nugget...

Xmas said:
R500 does support partitioned rendering

I interpret this as splitting the full frame into two half-frame render targets. The two halves are joined when the copy to the front buffer is performed. Does that make sense?

But I wonder if XB360 simply scales 1280x720 up to 1920x1080 (we know it down-scales for SDTV) rather than rendering to a larger frame.

as well as storing incompressible framebuffer tiles in another location, though I'm not as convinced as Jawed that this other location is required to be in external memory.

Method and apparatus for video graphics antialiasing using a single sample frame buffer and associated sample memory

Because the sample memory 38 is typically slower than the frame buffer 36 and more memory operations are required to store a multi-sample data set than compressed sample set, the system can be slowed if it is forced to wait for read and write operations to and from the sample memory 94. In order to offload some of these operations, a First In First Out (FIFO) buffer may be included in the sample memory controller 33. The FIFO buffers the stream of memory operations provided to the sample memory controller 33 from the fragment controller 31. This allows the fragment controller 31 to perform other functions in the system while these memory operations are performed.

As it happens, I think the "missing" 3MB of EDRAM may be used to implement this FIFO. This isn't the only time the FIFO is called upon to supply data.

720p is slightly more than 7MiB for color and Z, that leaves 3MiB for incompressible tiles. So that's enough if one in ten tiles is incompressible (assuming 4xAA).
Bottom of the page:

http://www.beyond3d.com/reviews/sapphire/512/index.php?p=01

The EDRAM is obviously only supporting the back buffer, and at only 1280x720, so that's 7MB without AA. In Dave's formula, the back buffer requires an extra 14MB for 4xAA. Dave's formula doesn't take account of AA sample compression though, nor does it account for the fact that not every pixel will have AA samples at any arbitrary instant during rendering... In other words the amount of memory consumed by AA samples will expand and contract over the duration of rendering a frame.
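As a rough sanity check of these figures (with assumed, not official, parameters: 32-bit colour + 32-bit Z/stencil, i.e. 8 bytes per pixel, and an uncompressed 4xAA sample set costing 4 x 8 = 32 bytes per pixel):

```python
# Sanity check of the EDRAM figures above. Assumptions (not from any
# spec): 32-bit colour + 32-bit Z/stencil = 8 bytes per pixel, and an
# uncompressed 4xAA sample set costs 4 * 8 = 32 bytes per pixel.
MIB = 1024 * 1024

def base_buffer_mib(width, height, bytes_per_pixel=8):
    return width * height * bytes_per_pixel / MIB

pixels_720p = 1280 * 720
base = base_buffer_mib(1280, 720)  # 7.03 MiB: "slightly more than 7MiB"

# With the base buffer resident in 10 MiB of EDRAM, the leftover space
# determines how many pixels could hold full uncompressed 4xAA sets.
leftover = 10 * MIB - pixels_720p * 8
incompressible_pixels = leftover // 32
fraction = incompressible_pixels / pixels_720p  # ~0.106, i.e. one in ten

print(base, incompressible_pixels, fraction)
```

which reproduces the "enough if one in ten is incompressible" estimate.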

Jawed
 
Jawed said:
I can't tell you how much bandwidth, because that's where the lossless compression comes in.
I read somewhere that current lossless compression schemes on GPUs achieve 2x or 4x compression ratios most of the time.

Jawed said:
Well I've seen two figures for the memory in XB360, 512MB total, and 256MB (as a portion of the 512MB of UMA) for the GPU.
Official site specs:
Code:
    * 512 MB of 700 MHz GDDR3 RAM
    * Unified memory architecture

Typically, AA samples are going to consume way more memory than the EDRAM can hold. The only place for them is off-GPU memory.
Are you telling me X360 has just 22 GB/s to share with 3 cores/6 threads, textures and AA samples? I really hope it doesn't work that way,
it would be badly designed/unbalanced, imho ;)


Can't wait till we get some real detail on XB360.
Click here
 
Joe DeFuria said:
It is interesting to me that given a 65nm process, I would estimate it should be possible to include enough eDRAM on a PC chip to finally cover the "standard" 1600x1200 resolution. This would have similar frame buffer characteristics / abilities as Xenon's 90nm chips at 1280x720 resolution.
At the same time another IHV use the edram transistors budget to vastly increase their next GPU 'shading power'..who is going to win? :)
(I'm not being sarcastic, it's a legitimate question..)
 
Mintmaster said:
PC-Engine said:
Console games running in 1920x1080 HDTV resolution will require the same fillrate as a PC part...
Don't forget that there are few displays out there capable of 1080p, and as far as I know 1080p is half the framerate anyway (30 fps), so 1080i should give a better image in motion.

In other words, you only need 1920x540. That's only a few more pixels than 1152x864, and about half as many as 1600x1200.

I wonder if we'll see a re-emergence of interlaced display modes on computer monitors?

Woah! Of course, that means 10MB is enough for 1920x540. Good thinking!
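Mintmaster's counts check out (assuming 8 bytes per pixel for colour + Z):

```python
# Mintmaster's 1080i field arithmetic, assuming 32-bit colour +
# 32-bit Z, i.e. 8 bytes per pixel.
field_1080i = 1920 * 540            # 1,036,800 pixels per field
print(field_1080i - 1152 * 864)     # ~41k more pixels than 1152x864
print(field_1080i / (1600 * 1200))  # ~0.54: about half of 1600x1200
print(field_1080i * 8 / (1024 * 1024))  # ~7.91 MiB: fits in 10MB EDRAM
```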

Jawed
 
nAo said:
At the same time another IHV use the edram transistors budget to vastly increase their next GPU 'shading power'..who is going to win? :)
(I'm not being sarcastic, it's a legitimate question..)

Good question indeed.
 
Jawed said:
nAo said:
What is your definition of AA samples?
If these AA samples are going off chip, how much bw do you need to read/write them?
Colour + Z + coverage mask, 9 bytes.

Method and apparatus for video graphics antialiasing using a single sample frame buffer and associated sample memory

I can't tell you how much bandwidth, because that's where the lossless compression comes in.
Actually, in this patent there is no coverage mask stored along with the samples. Either all the samples of a pixel can be compressed to a single 8-byte data block (32-bit color, 31-bit Z/stencil, 1-bit flag), or they can't, in which case they're stored uncompressed (32 bytes for 4xAA) in sample memory while pointer information is stored in the framebuffer (32-bit pointer, 31-bit front-most Z(/stencil?), 1-bit flag (*)).

Once the selected address has been determined at step 14, the method proceeds to step 22. At step 22, the pixel sample set is stored at the selected address in the sample memory. Preferably, the pixel sample set includes a number of color and Z value pairs equal to the number of samples per pixel that are generated via the oversampling scheme. Ordering of the samples as stored determines which color/Z pair describes each sample. In a system with 8 samples per pixel and 32-bit color and Z values, each sample set will require 64 bytes.


(*) This is how it's described in the patent. However, I wonder whether you'd really need a 32-bit pointer. With 19 bits, you could index half a million uncompressed pixels, which is 57% of 720p, certainly more than enough. Then you could store both the highest and the lowest Z as 22-bit values (rounded as necessary), allowing both trivial accept and trivial reject. Stencil would only be stored in sample memory.

There might be a better way, but I somehow feel that 32 bits for such a pointer is a waste of space.
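A sketch of the two 8-byte entry layouts might look like the following. Note the bit positions (flag in bit 63, Z in bits 62..32, colour-or-pointer in bits 31..0) are my assumption for illustration; the patent only gives the field widths:

```python
# Hypothetical sketch of the two framebuffer entry layouts described
# above. Field positions are assumed for illustration, not taken from
# the patent: bit 63 = compressed/indirect flag, bits 62..32 = 31-bit
# Z, bits 31..0 = colour (compressed) or sample-memory pointer.
FLAG_BIT = 1 << 63

def pack_compressed(color32, z31):
    # flag = 0: all of the pixel's samples share one colour/Z pair
    return (z31 & 0x7FFFFFFF) << 32 | (color32 & 0xFFFFFFFF)

def pack_indirect(pointer32, front_z31):
    # flag = 1: samples live uncompressed in sample memory
    return FLAG_BIT | (front_z31 & 0x7FFFFFFF) << 32 | (pointer32 & 0xFFFFFFFF)

def unpack(entry):
    indirect = bool(entry & FLAG_BIT)
    z = (entry >> 32) & 0x7FFFFFFF
    low = entry & 0xFFFFFFFF  # colour or pointer, depending on the flag
    return indirect, z, low

entry = pack_indirect(0x1234, 42)
print(unpack(entry))  # (True, 42, 4660), i.e. pointer 0x1234
```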
 
nAo said:
Joe DeFuria said:
It is interesting to me that given a 65nm process, I would estimate it should be possible to include enough eDRAM on a PC chip to finally cover the "standard" 1600x1200 resolution. This would have similar frame buffer characteristics / abilities as Xenon's 90nm chips at 1280x720 resolution.
At the same time another IHV use the edram transistors budget to vastly increase their next GPU 'shading power'..who is going to win? :)
(I'm not being sarcastic, it's a legitimate question..)

Until 3DMark turns on AA/AF by default, I think we can be certain Nvidia will "win".

Jawed
 
Don't be so hasty reading the patent:

FIG. 4 illustrates a flow diagram corresponding to an antialiasing method that requires fewer memory resources than conventional oversampling schemes. At step 50, a pixel fragment is received that corresponds to a stored pixel in a pixel array. The pixel fragment is characterized by pixel fragment data that may modify the currently stored pixel data that characterizes the stored pixel. The fragment may include a color value, a Z value, and a coverage mask, where the coverage mask indicates to which of the samples for the pixel the color and Z values correspond. For example, in an 8-sample/pixel oversampling scheme, a mask of 00000001 might indicate that the color and Z values for the fragment should only be applied to the final sample (sample 7 if the samples are numbered 0-7). A fragment that has complete pixel coverage may be represented by 11111111.

This is the basis for AA sample compression that ATI already uses in R420, and prolly earlier GPUs.
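The mask convention in the patent's example (leftmost digit = sample 0, so "00000001" covers only sample 7) can be shown with a trivial sketch:

```python
# Minimal illustration of the coverage mask in the patent quote above:
# an 8-sample mask selects which of a pixel's samples receive the
# fragment's colour/Z. Per the patent's example, the leftmost digit is
# sample 0, so "00000001" touches only sample 7.
def covered_samples(mask_str):
    return [i for i, bit in enumerate(mask_str) if bit == "1"]

print(covered_samples("00000001"))  # [7]
print(covered_samples("11111111"))  # [0, 1, 2, 3, 4, 5, 6, 7]
```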

Jawed
 
nAo said:
Are you telling me X360 has just 22 GB/s to share with 3 cores/6 threads, textures and AA samples? I really hope it doesn't work that way,
it would be badly designed/unbalanced, imho ;)

Yes.

Can't wait till we get some real detail on XB360.
Click here

No, I want something insightful, in the same manner as the R420 Architecture White Paper that I keep linking to:

http://www.ati.com/products/radeonx800/RADEONX800ArchitectureWhitePaper.pdf

Jawed
 
Jawed said:
Xmas said:
R500 does support partitioned rendering

I interpret this as splitting the full frame into two half-frame render targets. The two halves are joined when the copy to the front buffer is performed. Does that make sense?
I think that's how it works.

But I wonder if XB360 simply scales 1280x720 up to 1920x1080 (we know it down-scales for SDTV) rather than rendering to a larger frame.
Scaling might be an option, but don't 1080 HDTV sets usually have their own scaling?

Because the sample memory 38 is typically slower than the frame buffer 36 and more memory operations are required to store a multi-sample data set than compressed sample set, the system can be slowed if it is forced to wait for read and write operations to and from the sample memory 94. In order to offload some of these operations, a First In First Out (FIFO) buffer may be included in the sample memory controller 33. The FIFO buffers the stream of memory operations provided to the sample memory controller 33 from the fragment controller 31. This allows the fragment controller 31 to perform other functions in the system while these memory operations are performed.

As it happens, I think the "missing" 3MB of EDRAM may be used to implement this FIFO. This isn't the only time the FIFO is called upon to supply data.
This is a very interesting possibility indeed. But I think such a FIFO doesn't need to be particularly large, maybe 1 KiB would be enough.

(*)720p is slightly more than 7MiB for color and Z, that leaves 3MiB for incompressible tiles. So that's enough if one in ten tiles is incompressible (assuming 4xAA).
Bottom of the page:

http://www.beyond3d.com/reviews/sapphire/512/index.php?p=01

The EDRAM is obviously only supporting the back buffer, and at only 1280x720, so that's 7MB without AA. In Dave's formula, the back buffer requires an extra 14MB for 4xAA. Dave's formula doesn't take account of AA sample compression though, nor does it account for the fact that not every pixel will have AA samples at any arbitrary instant during rendering... In other words the amount of memory consumed by AA samples will expand and contract over the duration of rendering a frame.
For R500, yes. But that kind of AA is not implemented in current chips. Current chips don't use pointer information and indirection, and therefore they have to allocate the space for all samples to allow random access to every pixel location. All sample information is stored in the same buffer, but if a tile can be compressed that means less bandwidth is required. You need a bit-mask that indicates which tiles are compressed, and if it is compressed you only read the first X bytes starting from the tile start address, otherwise you read N*X bytes.




(*) btw, I was talking about tiles above because current chips use tiles for framebuffer compression. But the patent is about compressing single pixels, so if R500 indeed implements this patent, my paragraph above should read "pixels" instead of "tiles".
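A minimal sketch of that read path, with made-up sizes (X = 32 bytes for a compressed tile, N = 4 samples for an uncompressed one):

```python
# Sketch of the bandwidth saving described above: a per-tile bit says
# whether the tile is compressed; a compressed tile costs only the
# first X bytes from its start address, an uncompressed one N*X bytes.
# The sizes here are illustrative assumptions, not hardware values.
def bytes_read(compressed_flags, x_bytes=32, n_samples=4):
    total = 0
    for is_compressed in compressed_flags:
        total += x_bytes if is_compressed else n_samples * x_bytes
    return total

# 10 tiles, one incompressible: 9*32 + 1*128 = 416 bytes instead of
# the 1280 a naive full read would cost.
flags = [True] * 9 + [False]
print(bytes_read(flags))  # 416
```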
 
Jawed said:
Don't be so hasty reading the patent:

FIG. 4 illustrates a flow diagram corresponding to an antialiasing method that requires fewer memory resources than conventional oversampling schemes. At step 50, a pixel fragment is received that corresponds to a stored pixel in a pixel array. The pixel fragment is characterized by pixel fragment data that may modify the currently stored pixel data that characterizes the stored pixel. The fragment may include a color value, a Z value, and a coverage mask, where the coverage mask indicates to which of the samples for the pixel the color and Z values correspond. For example, in an 8-sample/pixel oversampling scheme, a mask of 00000001 might indicate that the color and Z values for the fragment should only be applied to the final sample (sample 7 if the samples are numbered 0-7). A fragment that has complete pixel coverage may be represented by 11111111.

This is the basis for AA sample compression that ATI already uses in R420, and prolly earlier GPUs.

Jawed
To me this reads as if the "incoming coverage mask" is the one generated by the rasterizer, i.e. the one that belongs to the fragment that is just being rendered/finished. There is no indication that such coverage mask is actually stored anywhere, nor read back from memory.
 
I believe coverage mask AA compression is in use in R420:

The RADEON X800 multi-sample anti-aliasing unit can perform Z tests at 2, 4, or 6 different locations per pixel to determine what proportion is covered by the current triangle. The sample locations are read from a programmable lookup table, and can be varied from frame to frame. Color values are calculated only once per pixel, but up to 6 different colors can be stored for each pixel to handle cases where multiple triangles intersect and overlap. To accommodate the varying number of possible colors stored for each pixel, a special compressed frame buffer format is used. This allows color compression of up to 6:1 in the typical case where most pixels require only a single color value to be stored.

As regards the FIFO, Mintmaster's point about 1920x540 being the frame buffer size for 1080i means that 10MB is just enough for 1080i. But that's a rather peculiar frame buffer in which the horizontal resolution is effectively twice the vertical. That should really fuck up texturing, apart from anything else. :(

So, I dunno about the FIFO. Another reason for the FIFO not being EDRAM is that the FIFO lives on the GPU, and further reading of the patent suggests to me that it is quite specific that the EDRAM/blend/filter unit is not part of the GPU die at all.

In fact it seems to me, now, that the ROP in R500 is effectively split in two between the GPU and the EDRAM unit. The EDRAM unit, when it "rejects" fragments/AA samples as un-resolvable, passes them back to the GPU for it to generate/blend AA samples, as needed, and put them into Sample Memory.

There are two different kinds of compression under discussion in these patents:

1. compression of fragment/AA sample data, to reduce bus bandwidth consumption twixt GPU and EDRAM

2. compression of AA sample data to reduce the consumption of sample memory

Jawed
 
Jawed said:
But I wonder if XB360 simply scales 1280x720 up to 1920x1080 (we know it down-scales for SDTV) rather than rendering to a larger frame.

noobie Question: what's the point of upscaling? :?
 
Xmas said:
To me this reads as if the "incoming coverage mask" is the one generated by the rasterizer, i.e. the one that belongs to the fragment that is just being rendered/finished. There is no indication that such coverage mask is actually stored anywhere, nor read back from memory.
The AA sample set for a pixel is in flux for the duration of a frame render. Each time a new visible fragment for a given pixel appears in the ROP, it has to decide how the new fragment's AA samples interact with the existing AA samples for that pixel (which might derive from 2 or more fragments). This can result in the destruction of some AA samples (no longer visible fragment) or the creation of AA samples (the new fragment partially covers an existing fragment's AA sample(s)).

The result is a new set of AA samples. A maximum of 6 in the case of R420. The ROP simply juggles bits in the coverage mask (6-bits wide) to take account of the samples - it all depends on how many visible fragments currently contribute to the pixel's final colour.

Each fragment that's visible in a pixel has its own coverage mask defining which of the 6 AA geometry sample positions it colours.
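A hedged sketch of that per-pixel bookkeeping (per-sample depth testing omitted for brevity; this assumes the new fragment wins every sample its mask covers, and uses 6-bit masks as on R420):

```python
# Sketch of the ROP bookkeeping described above: each stored fragment
# owns a coverage mask over the pixel's AA sample positions; a new
# visible fragment steals the samples its mask covers, and fragments
# left with an empty mask are destroyed. Depth testing per sample is
# omitted -- this assumes the new fragment wins every masked sample.
def merge_fragment(fragments, new_color, new_mask):
    updated = []
    for color, mask in fragments:
        remaining = mask & ~new_mask  # samples this fragment still owns
        if remaining:
            updated.append((color, remaining))
    updated.append((new_color, new_mask))
    return updated

pixel = [("red", 0b111111)]  # one fragment covers all 6 samples
pixel = merge_fragment(pixel, "blue", 0b000011)
print(pixel)  # [('red', 60), ('blue', 3)], i.e. 0b111100 and 0b000011
```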

Jawed
 
Alstrong said:
Jawed said:
But I wonder if XB360 simply scales 1280x720 up to 1920x1080 (we know it down-scales for SDTV) rather than rendering to a larger frame.

noobie Question: what's the point of upscaling? :?

You want to convert the resolution of the picture to the resolution of the display device. To provide the maximum image quality you want to do some filtering while you up-scale.

As someone mentioned earlier, the display can up-scale too.

In the leak diagram for XB360 there's a Video Scaler.

Jawed
 
Depends on the display device too, of course. With most of the new fixed-pixel HDTVs, you really don't have a choice. Are they even making 1080p CRT TVs? I think all of the 1080p TVs I've heard of are DLP/LCD. On those, whatever you throw at it is going to be upscaled to 1080p anyway by the TV itself.

Though I would still think it would have to look better to show "real" 1080p on a 1080p set. I'd be a little disappointed if XB360 is just upscaling 720p to 1080p instead of doing real 1080p. And I really wouldn't see much point in it (since the overwhelming majority of TVs that could take advantage of it can do that for themselves, and probably at better quality), other than as a misleading checkbox.

Edit: Maybe the scaler is for 720p to 1080i? Most (all?) of the first-gen HDTVs couldn't do 720p, but did do 1080i. At least our previous Mitsubishi CRT was that way.
 
Jawed said:
I believe coverage mask AA compression is in use in R420:

The RADEON X800 multi-sample anti-aliasing unit can perform Z tests at 2, 4, or 6 different locations per pixel to determine what proportion is covered by the current triangle. The sample locations are read from a programmable lookup table, and can be varied from frame to frame. Color values are calculated only once per pixel, but up to 6 different colors can be stored for each pixel to handle cases where multiple triangles intersect and overlap. To accommodate the varying number of possible colors stored for each pixel, a special compressed frame buffer format is used. This allows color compression of up to 6:1 in the typical case where most pixels require only a single color value to be stored.
There's nothing in this paragraph that hints at coverage masks being stored.

Do you have any information that indicates that R420 or R500 store coverage mask information along with the color information? Storing a coverage mask only makes sense to me in lossy compression algorithms.

As regards the FIFO, Mintmaster's point about 1920x540 being the frame buffer size for 1080i means that 10MB is just enough for 1920x1080i. But that's a rather peculiar frame buffer in which the vertical resolution is twice the horizontal resolution. That should really fuck up texturing, apart from anything else. :(
That can be fixed by dividing the vertical gradients by two.
 
DemoCoder said:
No, 1080p @ 60Hz was added to ATSC, and upcoming HD-DVD formats (Blu-Ray) are slated to support it. As I told other people, CES 2005 was the year of 1080p. Everyone was showing off 1080p displays and claiming 10000:1 CR. Panasonic, Samsung, LG, Sony, Fujitsu, Philips, you name it, they had 1080p on view. 1080p sets are available now, and the first Samsung 1080p DLP sets will arrive in June for a lower introductory MSRP than the previous 720p (HLP5065W, etc). Within the lifespan of the R500, it is reasonable to assume a few million 1080p sets are going to be sold.

Because of the way that HD displays (primarily DLP, but LCD too, not PDP) are produced via semiconductor-like processes, resolution is being bumped up as the process shrinks. In 2 years, it is unlikely that TI will even manufacture 720p DMD chips anymore as they will move most of their production over to xHD3 eventually. If, two years from now, TI is still producing the HD2, it would be like Intel and AMD selling chips on .25 micron. That means 1080p displays will only get cheaper, and it will get harder and harder to find new sets with 720p, just like 480p PDP and LCD sets are on their way out, after only 3 years on the market. Samsung's 1080p DLP is coming out at $3299, whereas their flagship 720p DLP last year came out at $4199.
Cool. I always thought it was strange 1080p wasn't 60 fps, and in fact didn't believe it at first when someone here told me. Seemed kind of pointless to me.

I'm looking forward to the 1080 sets coming out. We almost bought the Panny HD2+ set last christmas, but now that I have an idea about the xHD3 prices I'm glad we didn't. Looks like the pixel shifting (or whatever you call it) is working great for cost reduction in the HD3 and xHD3. I hope the rainbow effect doesn't bother me down the road, though.

BTW, you don't need to tell me how DLP chips are made. I'll be studying MEMS at Caltech in the fall :) Actually, I was hoping you'd give some advice around a month ago in my thread when I was deciding between there and MIT, but oh well.

Now, that said, I don't think we need to worry about 1080p on consoles, but my point stands about a desktop GPU - it has different design requirements than a console. You just can't take the R500 out of the Xbox360 and sell it as a PCI card. The fixed nature of consoles is what makes eDRAM at this juncture feasible IMHO.
Yeah, I know. I sort of meant that last line in my previous post jokingly, but forgot to put an emoticon. The fixed nature is also the reason the unified shader architecture is more feasible for the console market right now, IMHO, though for different reasons.
 