The Most Detailed Tech Information on the Xbox 360 Yet

Do you know where the vertex data for vertices that are sidelined by the current tile gets stored? In an on-GPU cache, or in system memory?

Presumably this is actually a huge amount of data, so it would have to be system memory?

Is the API available publicly?

Jawed
 
The advantage of the system in R500 is consistency.

Bandwidth can never hamper fillrate: it doesn't matter whether your Z-buffering, alpha blending, or AA is on or off, you get 8 pixels per clock and the bandwidth is available to sustain exactly that.

The chip supports tiling to allow backbuffers to exceed the EDRAM size limitation. There is obviously a cost incurred copying from the EDRAM to system RAM; this, coupled with some redundant geometry work (and there is logic on the chip to reduce this), makes up the 5% or so overhead ATI is claiming for AA.
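To put rough numbers on when tiling kicks in, here's a back-of-the-envelope sketch (my own figures, assuming 4 bytes of colour and 4 bytes of depth/stencil per sample, which is how these footprints are usually counted):

Code:
#include <math.h>
#include <stdio.h>

/* eDRAM footprint = width * height * AA samples * bytes per sample.
   Assumes 4-byte colour (8-byte for FP16) plus 4-byte Z/stencil. */
static void footprint(const char *label, int w, int h, int aa, int cbytes)
{
    double mb = (double)w * h * aa * (cbytes + 4) / (1024.0 * 1024.0);
    printf("%-22s %6.2f MB -> %d tile(s) in 10 MB\n",
           label, mb, (int)ceil(mb / 10.0));
}

int main(void)
{
    footprint("640x480 4xAA",        640, 480, 4, 4); /*  9.38 MB, 1 tile  */
    footprint("1280x720 no AA",     1280, 720, 1, 4); /*  7.03 MB, 1 tile  */
    footprint("1280x720 2xAA",      1280, 720, 2, 4); /* 14.06 MB, 2 tiles */
    footprint("1280x720 4xAA",      1280, 720, 4, 4); /* 28.12 MB, 3 tiles */
    footprint("1280x720 2xAA FP16", 1280, 720, 2, 8); /* 21.09 MB, 3 tiles */
    return 0;
}

Note that 640x480 with 4xAA squeaks in under 10 MB with a single tile, while any of the HD modes with AA forces two or three tiles.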

It's a really interesting design in a number of respects. IMO the way the EDRAM is used is similar to Flipper, although obviously more advanced. The chip has a number of other interesting features that ATI/MS haven't discussed yet.
 
ERP said:
The advantage of the system in R500 is consistency.

Bandwidth can never hamper fillrate: it doesn't matter whether your Z-buffering, alpha blending, or AA is on or off, you get 8 pixels per clock and the bandwidth is available to sustain exactly that.

If and only if your backbuffer fits totally in eDRAM, you aren't using render-to-texture, and you aren't using HDR. If not, you not only take a bandwidth hit from having to copy tiles around, but you also take a CPU/vertex shader hit from needing to tile the scene, and you lose coherency with respect to texture fetches and cache, since tiling is a huge state change.

So I doubt even the R500 will reach its peak fillrate given most workloads. If they are truly set on running at 720p with 2x-4x FSAA by default, then they missed their design point and should have included enough RAM to target that resolution. Shouldn't the design point have been enough RAM for 720p with 2x FSAA and HDR?
 
DemoCoder said:
ERP said:
The advantage of the system in R500 is consistency.

Bandwidth can never hamper fillrate: it doesn't matter whether your Z-buffering, alpha blending, or AA is on or off, you get 8 pixels per clock and the bandwidth is available to sustain exactly that.

If and only if your backbuffer fits totally in eDRAM, you aren't using render-to-texture, and you aren't using HDR. If not, you not only take a bandwidth hit from having to copy tiles around, but you also take a CPU/vertex shader hit from needing to tile the scene, and you lose coherency with respect to texture fetches and cache, since tiling is a huge state change.

So I doubt even the R500 will reach its peak fillrate given most workloads. If they are truly set on running at 720p with 2x-4x FSAA by default, then they missed their design point and should have included enough RAM to target that resolution. Shouldn't the design point have been enough RAM for 720p with 2x FSAA and HDR?


Obviously there is a cost for the copy, but it's independent of scene complexity. One thing I'm not clear about is whether you can render to the memory while a resolve is in progress.

The texture cache impact is minimal because in general the tiles are large and it only affects polygons that span tiles.

There is logic to support the tiling, which significantly reduces wasted VS work.
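Presumably that logic amounts to something like comparing each draw's post-transform screen bound against the current tile; a purely hypothetical sketch of the idea:

Code:
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical illustration of coarse per-tile geometry rejection: a
   draw whose screen-space bounding rectangle misses the current tile
   can be skipped outright for that tile, avoiding the wasted VS work. */
typedef struct { int x0, y0, x1, y1; } Rect;

static bool overlaps(Rect a, Rect b)
{
    return a.x0 < b.x1 && b.x0 < a.x1 && a.y0 < b.y1 && b.y0 < a.y1;
}

int main(void)
{
    Rect tile = { 0, 0, 1280, 256 };   /* top strip of the frame */
    Rect draws[] = {
        { 100,  50, 300, 200 },        /* touches the tile       */
        { 400, 500, 900, 700 },        /* entirely below it      */
    };
    for (int i = 0; i < 2; ++i)
        printf("draw %d: %s\n", i,
               overlaps(tile, draws[i]) ? "submit" : "skip for this tile");
    return 0;
}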

You can use the 32-bit HDR mode, which I think most developers will go for (unless there are significant visual artifacts), or take half the fill rate with FP16.

So no, it won't hit its peak rate, but it will get a lot closer than something fighting for external memory bandwidth.

IMO the original target was 640x480 with 4xAA, until HD became a marketing bullet point. I'd also rather all the manufacturers left the decision to support HD up to the developers. Personally I'd rather have 3x the FLOPs per pixel than 3x the pixels, but I don't really see myself playing games on an HDTV anytime soon (and I own two).
 
DemoCoder said:
If not, you not only take a bandwidth hit from having to copy tiles around,

I'm not seeing that myself - as far as I can see there is no "copying tiles around"; a tile is rendered to the EDRAM, downsampled, and posted to system RAM. This only happens once whether there is a full frame in the EDRAM or multiple tiles; in the case of multiple tiles it just happens more times per frame.

but you take a CPU/Vertex Shader hit from needing to tile the scene

Not sure that is the case either, since AFAIK the geometry that isn't in the current frame tile is saved.
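To make the flow concrete, this is how I read it (function names made up, purely illustrative):

Code:
#include <stdio.h>

/* Sketch of the tiled frame flow described above; each step just
   prints so the loop structure is visible. Not a real API. */
static void render_tile_to_edram(int t)  { printf("render tile %d into EDRAM\n", t); }
static void resolve_to_system_ram(int t) { printf("downsample/resolve tile %d to system RAM\n", t); }

static void render_frame(int num_tiles)
{
    /* With num_tiles == 1 this degenerates to the ordinary case: one
       render and one resolve, the same single copy a full-frame
       backbuffer would have needed anyway. */
    for (int t = 0; t < num_tiles; ++t) {
        render_tile_to_edram(t);
        resolve_to_system_ram(t);
    }
}

int main(void) { render_frame(3); return 0; }  /* e.g. 720p 4xAA -> 3 tiles */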
 
ERP said:
IMO the original target was 640x480 with 4xAA, until HD became a marketing bullet point. I'd also rather all the manufacturers left the decision to support HD up to the developers. Personally I'd rather have 3x the FLOPs per pixel than 3x the pixels, but I don't really see myself playing games on an HDTV anytime soon (and I own two).

Agreed, totally.
 
10:10:10:2 is not an HDR format IMHO. HDR means "HIGH dynamic range"; 10:10:10:2's dynamic range is only 4x that of 8:8:8:8. And 2-bit alpha? So you get the ability to blend slightly overbright samples, much like Nvidia and ATI used to advertise slightly increased DX8 pixel shader ranges ([-1,1] or [-8,8]). With FP16, you get 10 bits of precision and a huge range. The minute you take an HDR light probe and render it into an "FX10" framebuffer you've lost a huge amount of information, and that means either highlight detail or shadow detail is lost, thus defeating much of the purpose of HDR framebuffers in the first place. At best, FX10 can represent a scene with a -1 to +1 f-stop exposure on each side, but real-world scenes span a much wider range than that.
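To put numbers on that, a quick sketch (my own; "FX10" here is read as a 10-bit fixed-point channel covering [0,4), which is where the 4x figure comes from):

Code:
#include <stdio.h>

/* Quantise a linear HDR intensity into a hypothetical 10-bit
   fixed-point channel spanning [0,4): step = 1/256, everything
   above ~4.0 clamps. FP16 by contrast holds values up to ~65504. */
static float fx10(float v)
{
    if (v < 0.0f) v = 0.0f;
    if (v > 4.0f - 4.0f / 1024.0f) v = 4.0f - 4.0f / 1024.0f; /* clamp */
    int q = (int)(v * 256.0f);  /* 10-bit code */
    return q / 256.0f;
}

int main(void)
{
    /* A light probe easily spans this range; note both the clamped
       highlights and the shadow value that quantises to zero. */
    float samples[] = { 0.001f, 0.5f, 1.0f, 3.9f, 16.0f, 400.0f };
    for (int i = 0; i < 6; ++i)
        printf("in %8.3f -> FX10 %6.3f\n", samples[i], fx10(samples[i]));
    return 0;
}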

BTW, what will you do with 3x the FLOPs per pixel? Let's say you have triple the R500's ALUs per pipe. What would you use all that pixel shading power for? Procedural textures? Those are not going to look good. There is a limit to what the pixel shader model can do for lighting, and realistically, if you want to simulate more advanced models, you need more fillrate, not just shader ops. Pixel shaders cannot communicate with one another nor compute interactions between surfaces unless the results have already been written out to a texture.

Sorry, but I'll take 3x the pixels at this point. Not just because it helps with global shadowing and lighting algorithms, but because I prefer higher resolution. I don't run my PC games in 640x480, even if that would guarantee me a more stable framerate, because frankly, I've had my fill of NTSC resolution.
 
Well, it's true that the costs are the same; the question is not whether it is better, but whether it enables the developer to do something useful with it. Having DX8 pipelines that supported extra precision ([-1,1] or [-8,8]) was "better", but it was not "better enough" to support a new range of algorithms that need the slightly better precision.

HDR enables tone mapping techniques; I fail to see what "FX10" buys me in this regard. As far as I can tell, it gives me one extra bit of overflow and underflow. It means that if I go the Debevec route and make light probes for use throughout my map, most of the data will be thrown away as soon as it hits the framebuffer.
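For context, the kind of operator HDR is meant to feed (a minimal Reinhard-style curve; the exposure constant is arbitrary):

Code:
/* Minimal Reinhard-style tone map: compresses an arbitrarily large
   linear intensity into [0,1) for display. It only has something to
   work with if the framebuffer preserved values well above 1.0. */
static float tonemap(float hdr, float exposure)
{
    float v = hdr * exposure;
    return v / (1.0f + v);   /* 0 -> 0, large values -> ~1 */
}

If the buffer has already clamped everything into a [0,4) range, there's almost nothing left for the operator to compress.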

Don't you feel that calling this buffer format "High Dynamic Range" is a little bit disingenuous?
 
DaveBaumann said:
Not sure that is the case either, since AFAIK the geometry that isn't in the current frame tile is saved.

Well then, look at it this way: the CPU has to partition the tiles, which eats cycles and bandwidth. Either way, it's a hit that you don't take if you have enough RAM to hold the whole frame.
 
DaveBaumann said:
I'm not seeing that myself - as far as I can see there is no "copying tiles around"; a tile is rendered to the EDRAM, downsampled, and posted to system RAM. This only happens once whether there is a full frame in the EDRAM or multiple tiles; in the case of multiple tiles it just happens more times per frame.
What about rendering to textures and the like? I envisage post-processing and more advanced rendering techniques being somewhat restricted by XENOS sharing the system-wide bandwidth. Or is this just not seen as a heavy-use requirement, one that can be absorbed into the existing rendering pipeline?
 
I already mentioned RTT, which is what I was getting at. With traditional GPUs, RTT doesn't necessarily require an extra copy from the backbuffer to the texture. With the XB, RTT requires an extra copy from the BB to main RAM, where it will be used as a texture. Now Dave is saying this would have happened anyway, but it is the lack of memory that is forcing the copy in the first place. Yes, maybe the R500 can leave the rendered texture in eDRAM, but then the GPU is doing texture loads from eDRAM and the eDRAM has even less available memory.
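Spelled out (stand-in operations only, so the two command sequences can be compared):

Code:
#include <stdio.h>

static void op(const char *s) { printf("  %s\n", s); }

/* Traditional GPU: the render target lives in the same video memory
   the texture units sample from, so no extra copy is required. */
static void rtt_traditional(void)
{
    puts("traditional GPU:");
    op("set render target = texture A (video memory)");
    op("draw pass into texture A");
    op("bind texture A, draw final pass");       /* sampled directly */
}

/* Xenos, on the reading above: the render target lives in EDRAM, so
   the result is resolved out to main RAM before it can be sampled. */
static void rtt_xenos(void)
{
    puts("Xenos:");
    op("set render target = EDRAM");
    op("draw pass into EDRAM");
    op("resolve EDRAM -> texture A (main RAM)"); /* the extra copy */
    op("bind texture A, draw final pass");
}

int main(void) { rtt_traditional(); rtt_xenos(); return 0; }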

Ever since the first leak months ago of only 10 MB of eDRAM I've been disappointed. Only 2.5x the PS2's VRAM, and that's using a much more advanced process and an exotic architecture to boot.

I guess its like not having 4 CELLs for the PS3. Disappointing.
 
DemoCoder said:
Don't you feel that calling this buffer format "High Dynamic Range" is a little bit disingenuous?

At the moment I haven't heard it described to me as a "High Dynamic Range" buffer format; it has been said to me that it has a higher range than 32-bit, which is the case. This is evidently not the only format supported in the frame buffer, but as best I can tell it is probably the default format.

Well then, look at it this way: the CPU has to partition the tiles, which eats cycles and bandwidth.

Why does the CPU partition the tiles? This appears to be a function of the graphics processor.
 
ERP called it a 32-bit HDR mode.

So you're saying the R500 is a true TBR (I assume not TBDR) and can sort post-transformed geometry in HW?
 
DemoCoder said:
DaveBaumann said:
Not sure that is the case either, since AFAIK the geometry that isn't in the current frame tile is saved.

Well then, look at it this way: the CPU has to partition the tiles, which eats cycles and bandwidth. Either way, it's a hit that you don't take if you have enough RAM to hold the whole frame.

i think HD is biting somebody in the butt.. all of a sudden instead of 'better pixels' we're back to the 'more pixels' agenda, as if the history of computer graphics to date has taught us squat. well, consider it another free lesson in 'why marketing should not be allowed to influence product designs'. ..damn, and that just when i was starting to like that machine for the flipper-like approach..
 
DemoCoder said:
Now Dave is saying this would have happened anyway, but it is the lack of memory that is forcing the copy in the first place.

No, I was talking about the operation from the point of view of the frame buffer requirements being greater than the EDRAM.
 
DemoCoder said:
BTW, what will you do with 3x the FLOPs per pixel? Let's say you have triple the R500's ALUs per pipe. What would you use all that pixel shading power for? Procedural textures? Those are not going to look good. There is a limit to what the pixel shader model can do for lighting, and realistically, if you want to simulate more advanced models, you need more fillrate, not just shader ops. Pixel shaders cannot communicate with one another nor compute interactions between surfaces unless the results have already been written out to a texture.

High-quality shadow map filtering seems like a good use.
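For example, a basic percentage-closer filter, where every extra tap is per-pixel ALU and fetch work (a sketch in C rather than any particular shader language; sample_depth is a stand-in for a shadow-map fetch):

Code:
/* Percentage-closer filtering (PCF): average a grid of depth
   comparisons around the projected shadow-map coordinate. Wider
   kernels mean softer, less aliased shadow edges, and the cost is
   almost entirely per-pixel shader work - a natural sink for FLOPs. */
float pcf_shadow(float u, float v, float receiver_depth,
                 float texel, int radius,
                 float (*sample_depth)(float u, float v))
{
    float lit = 0.0f;
    int taps = 0;
    for (int y = -radius; y <= radius; ++y)
        for (int x = -radius; x <= radius; ++x) {
            float d = sample_depth(u + x * texel, v + y * texel);
            lit += (receiver_depth <= d) ? 1.0f : 0.0f;
            ++taps;
        }
    return lit / (float)taps;  /* fraction of the kernel that is lit */
}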
 