What to expect from Xenos' smart eDRAM besides 4xAA

Have we yet been told exactly what the logic can do? Such as which blend modes and instructions are supported? Without that knowledge we can't really guess at exotic uses. If it's limited to basic LERPs and ADDs, there won't be much more you can do other than rendering, particles, and AA.
 
I think that's all we expect it to do, but it's a tad more involved than the way you're describing. It first has to interpolate the z values for each sample, though I'm not sure how the parent die communicates the required data to the daughter. Sample coverage is another question, but that might be handled on the parent die. Then it must perform a per-sample z test and stencil test, and if those pass, either directly write or read-blend-write the samples.

The point of dividing the dies this way is to find the spot in the graphics pipeline where minimal data transmission is needed, and where minimal logic is needed on the daughter die after receiving the data. The logic will be just enough to perform the basic rendering functions at 32 samples per clock (or 64 samples per clock for z-only).

So you have a bunch of compare units, interpolators, incrementers/decrementers, LERP units, and data-routing logic. There might be some scanline rasterization hardware too, depending on how the z, coverage, and position info is communicated. Nothing fancy.
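As a rough sketch of what that per-sample logic might amount to (the function names, the blend formula, and the test order here are my assumptions, not documented Xenos behaviour):

```python
# Hypothetical sketch of the per-sample work the daughter die is thought to do:
# z test, then either reject or read-blend-write. Names and ordering are assumed.

def lerp(a, b, t):
    """Basic linear interpolation, the kind of unit the post expects on-die."""
    return a + (b - a) * t

def rop_sample(dst_color, dst_z, src_color, src_z, src_alpha,
               z_func=lambda s, d: s <= d):
    """Process one sample: z test, then read-blend-write on pass."""
    if not z_func(src_z, dst_z):
        return dst_color, dst_z          # sample rejected, buffers untouched
    blended = lerp(dst_color, src_color, src_alpha)  # classic alpha blend
    return blended, src_z                # write new colour and depth

# A covered 4xAA pixel just runs this four times, once per sample.
samples = [(0.2, 1.0), (0.5, 1.0), (0.9, 1.0), (0.1, 1.0)]  # (colour, z) pairs
out = [rop_sample(c, z, src_color=1.0, src_z=0.5, src_alpha=0.5)
       for c, z in samples]
```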
 
Mintmaster said:
Say you disable z-buffering and have no textures, and are simply doing alpha blending with the colour buffer. 22.4GB/s is only enough for a maximum of 2.8GPix/s. Add in the z-test, a texture for smoke or fire or whatever, and a less than perfect memory controller, then you drop to well below that. The B3D review of the 7600GT gets ~2.3GPix/s in the single texture alpha blend test without z-test (7600GT has 22.4GB/s BW, and is BW bound in this particular test).

I'm a little confused here. Are these numbers coming from Dave's G73 article, or somewhere else? Some of them are present in the article, but not all as far as I can tell for what you're presenting them as (I see where the 2.3Gpixels figure is coming from, but the only 2.8Gpixels figure I can find in the article relates to color+z fillrate as opposed to color-only).
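For what it's worth, the 2.8 GPix/s figure does fall out of simple arithmetic (my own back-of-envelope, assuming a 32-bit colour buffer), whatever its source in the article:

```python
# Back-of-envelope check of the 2.8 GPix/s figure (my arithmetic, not the article's):
# pure alpha blending touches the colour buffer twice per pixel at 4 bytes each.
BANDWIDTH = 22.4e9            # bytes/s on a 7600GT-class 128-bit bus
bytes_per_pixel = 4 + 4       # 32-bit colour read + 32-bit colour write
peak_fill = BANDWIDTH / bytes_per_pixel
print(peak_fill / 1e9)        # -> 2.8 (GPix/s)

# Add a per-pixel z read (+4B) and a 32-bit texture fetch (+4B) and it halves:
print(BANDWIDTH / (bytes_per_pixel + 4 + 4) / 1e9)  # -> 1.4 (GPix/s)
```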
 
Murakami said:
Maybe the EDRAM can be used for some fancy motion blur effects, like the PS2 ones... :oops:
I don't think so. I don't know how programmable the logic is, and whether it can do pixel-shader-like arbitrary taps, but I doubt it. I'm guessing it works with tile-sized buffers and, for each pixel, blends between corresponding buffers. If you could blend between a pixel and its neighbours, or other pixels from the same buffer, you could potentially do other effects like blurs. I'm not getting that vibe from Xenos though.
 
Yeah, "free 4xAA" sounds great, but then why do so many games ship with 2xAA, and sometimes not even at 720p... damn devs ;)
 
I saw in the Xenos article that a 1280x720 screen (without AA) weighs in at about 7 MB.
Could the 3 MB left in the eDRAM act as an L3 cache for Xenon?
Or are the latencies too high, with no improvement to expect over tapping into the UMA pool?
Could it be "caching" some data that Xenon generates?
I have no idea of the sizes needed for physics calculations, R2VB, etc.
But what I want to figure out is whether the 3 MB of eDRAM left over could be used for some MEMEXPORT stuff.
 
Shifty Geezer said:
I don't think so. I don't know how programmable the logic is, and whether it can do pixel-shader-like arbitrary taps, but I doubt it. I'm guessing it works with tile-sized buffers and, for each pixel, blends between corresponding buffers. If you could blend between a pixel and its neighbours, or other pixels from the same buffer, you could potentially do other effects like blurs. I'm not getting that vibe from Xenos though.

I was under the impression that the EDRAM could perform at least AA, DOF, Motion Blur, Particle Effects and HDR blending.
 
In regards to DOF/motion blur, wouldn't you:

Copy a region from your backbuffer (in main memory) to fill half the eDRAM, then apply your pixel shader logic using however many texture lookups you need (I'd presume a mix of point and filtered lookups), writing the results to the second half of the eDRAM. Then take the interior region of the newly written half and copy it back to the backbuffer in main memory.

There would be a bit of juggling to get the edges of each region correct (depending on your filter kernel), provided you only split the backbuffer into regions horizontally or vertically (not both), but I don't see it requiring a duplicate buffer, and you effectively get to apply a post-processing filter with very little use of the chip's external bandwidth. You'd probably end up using about 2-3x the size of the buffer in combined read/write bandwidth? Wouldn't that be quite insignificant compared to the total bandwidth used otherwise?
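A rough estimate of the external traffic for the ping-pong scheme described above (my own numbers, assuming a 720p 32-bit buffer split into two horizontal bands with a 16-row overlap strip per seam for the filter kernel):

```python
# Rough cost of the ping-pong post-process scheme (assumed 720p, 32-bit colour):
# one copy from main memory into eDRAM, one resolve back out, per region.
W, H, BPP = 1280, 720, 4
buffer_bytes = W * H * BPP                 # ~3.7 MB backbuffer
external_traffic = 2 * buffer_bytes        # read in + write out, ignoring overlap

# Assume a 16-row overlap strip at each region seam for the filter kernel:
regions = 2
overlap = 16 * W * BPP * (regions - 1)     # duplicated rows at the seam
total = external_traffic + 2 * overlap     # overlap is read in and written out too
print(total / buffer_bytes)                # ~2x the buffer size, as guessed
```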
 
We have so many PS3 devs who post great information here; I wish we had more 360 devs to shed light on things.
 
Xenos' eDRAM does color test, depth test, blending, AA downsampling, etc. Its purpose is to remove a bandwidth bottleneck from main memory, so it helps even when AA is off. However, to utilize it to the fullest, 4x AA should be on.
 
Your note about dual-ported RAM is interesting, but you're wrong to think it necessarily affects blending speed. Whether it's dual-ported or not, I don't know, but the internal bandwidth available is enough for full-speed blending.
As long as the blended fillrate is up to a certain point. Bear in mind that while framebuffer accumulation and blend passes were bread and butter on the PS2, you have more bandwidth between rasterizer and eDRAM on the GS than on Xenos (48 GB/sec vs. 32), and it was rendering at much lower resolution, so the fillrate demands per pass were comparatively much lower.

No question, though, that having eDRAM is that much better for blending as bandwidth and latency pretty much rule the day. I just think people assume too much taking the PS2 as an example of eDRAM utilization.

I was under the impression that the EDRAM could perform at least AA, DOF, Motion Blur, Particle Effects and HDR blending.
You had the wrong impression, then. I have no idea where you got *all* of that. AA, of course... fine... Motion blur... eh... the thing about PS2 motion blur was that it was a simple weighted framebuffer blend with a front-buffer copy, which, of course, sits in eDRAM, but you dealt with the blend process yourself anyway (it's not as if the eDRAM there had any sort of functional logic). In the modern world of pixel shaders, people aren't going to be satisfied with something so simple, and several samples from the frontbuffer and/or backbuffer copies are in order, so that pretty much rests on texture memory. The same applies to DOF, regarding the post-processing of the backbuffer.

I'm not sure what you mean by eDRAM "performing" particle effects, but if you're simply referring to the blending side of it, then... whatever. HDR blending... mmmm... as long as you're using a natively supported color format, blending is blending, even though some things might run slower or faster and some things will or won't demand tiling. It's not going to do much of anything for you if you use alternative bitpacked color spaces like the 16:8:8 Luv(?) format HS is using.
 
ShootMyMonkey said:
As long as the blended fillrate is up to a certain point. Bear in mind that while framebuffer accumulation and blend passes were bread and butter on the PS2, you have more bandwidth between rasterizer and eDRAM on the GS than on Xenos (48 GB/sec vs. 32), and it was rendering at much lower resolution, so the fillrate demands per pass were comparatively much lower.
Yeah, but the PS2 eDRAM didn't have logic within it. Since it wasn't on a separate die, there were no bus speed restrictions. PS2's GS read the pixels and z values, did the blending and z testing, then wrote values back. For XBox360, one transfer of 4 bytes* can potentially result in a read and write in both the z and colour buffers for each of 4 samples, i.e. 64 bytes of internal traffic. That's a factor of 16 there that can't be ignored in the comparison you're making.

*(plus some positional and z-info, and though I'm not sure how that's compressed and what the average BW becomes, it should be tiny)
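Spelled out, the factor-of-16 arithmetic in that post looks like this (assuming 32-bit colour and 32-bit z per sample, as the post implies):

```python
# The "factor of 16" worst case: 4 bytes over the interconnect can fan out
# to a z read+write and a colour read+write for each of 4 samples.
transfer_bytes = 4                       # one source colour per pixel over the bus
per_sample = (4 + 4) + (4 + 4)           # z read+write plus colour read+write
samples = 4                              # 4x MSAA
internal_bytes = per_sample * samples
print(internal_bytes)                    # -> 64
print(internal_bytes // transfer_bytes)  # -> 16
```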
 
Mintmaster said:
For XBox360, one transfer of 4 bytes* can potentially result in a read and write in both the z and colour buffers for each of 4 samples, i.e. 64 bytes of internal traffic. That's a factor of 16 there that can't be ignored in the comparison you're making.

*(plus some positional and z-info, and though I'm not sure how that's compressed and what the average BW becomes, it should be tiny)

IIRC it's not compressed. the 'compression' comes from the fact that you don't multi-sample in your fragment pipeline. but you still have to pass down a full fragment (color, depth, stencil), so the factor is not 16 but 4, i.e. exactly your AA factor, if we don't count the z/rop read-write multiplicity.
 
darkblu said:
IIRC it's not compressed. the 'compression' comes from the fact that you don't multi-sample in your fragment pipeline. but you still have to pass down a full fragment (color, depth, stencil), so the factor is not 16 but 4, i.e. exactly your AA factor, if we don't count the z/rop read-write multiplicity.
Well I got the info from the B3D article:
Dave Baumann said:
All of this processing is performed on the parent die and the pixels are then transferred to the daughter die in the form of source colour per pixel and loss-less compressed Z, per 2x2 pixel quad. The interconnect bandwidth between the parent and daughter die is only an eighth of the eDRAM bandwidth because the source colour data value is common to all samples of a pixel here, and the Z is compressed.
Given that 32GB/s is enough for 8 bytes per pixel at 8 pix/clk, I can see where you're coming from. However, AA needs to determine the Z value for each sample. If I were to guess, I'd say it's probably a depth value plus two slopes for the quad, because slopes are needed anyway to interpolate the Z values. You also need an AA coverage mask and position information for the quad, which is likely 36 bits minimum.

Also, not counting the z/rop read-write multiplicity is not really fair. For any pixel actually written, you must do at least a z read and write per pixel. Moreover, the only reason PS2 had that much bandwidth was so that the worst case was handled fast enough, so that's the case we should look at if we want to compare the figures. The PS2 eDRAM, AFAIK, was 19.2 GB/s read and 19.2 GB/s write for the framebuffer, exactly enough to read and write both z and colour values at 2.4 GPix/s. The remaining 9.6 GB/s was for texture bandwidth.

Anyway, it looks like Xenos needs 8 bytes of data transfer per pixel (which is what the interconnect has available at 4GPix/s), not 4. So the factor is 8, not 16. My mistake. Looking at these PS2 numbers is rather interesting, though. Framebuffer bandwidth for PS3 is less than half of PS2, yet 1080p has seven times the pixels of 640x448. I think PS2's GS had slightly misplaced priorities. :)

BTW, stencil data transfer isn't needed, since you just need to send the stencil renderstate info before drawing the batch.
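The figures in this post can be checked with straight arithmetic (no new data, just the numbers quoted above):

```python
# Xenos: interconnect budget per pixel at peak 4xAA fillrate.
interconnect = 32e9               # parent -> daughter die, bytes/s (1/8 of eDRAM BW)
fillrate = 4e9                    # 4 GPix/s
print(interconnect / fillrate)    # -> 8.0 bytes of transfer budget per pixel

# PS2 GS: read and write both z and colour (4 bytes each) at 2.4 GPix/s.
ps2_fill = 2.4e9
read_bw = ps2_fill * (4 + 4)      # 19.2 GB/s of framebuffer reads
write_bw = ps2_fill * (4 + 4)     # 19.2 GB/s of framebuffer writes
texture = 48e9 - read_bw - write_bw
print(texture / 1e9)              # -> 9.6 GB/s left over for textures
```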
 
Yes, I remember them saying compressed Z, and thought it was in the article. I think the Z is already at the multisample coverage level, since the hierarchical Z buffer is on the parent die and operates at multisample coverage.
 