The Most Detailed Tech Information on the Xbox 360 Yet

DemoCoder said:
Gubbi, if tiling is so cheap, why didn't ATI use *less* eDRAM, boost yields, lower costs, increase margins, and possibly even put so little that it could fit on the main R500 core?

Clearly, ATI analyzed the issue and tried to put enough to hold 640x480x4xFSAA. There must have been a reason for their decision.

10MB was probably just a sweet spot, economically and performance-wise.

It appears that they needed 192 ROP processors to meet their performance target (which was a given fillrate with 4xFSAA). Considering that the transistor count for the eDRAM die hovers around 150 million, the ROP part is about *half* the chip (80 Mbit of DRAM = 80 million transistors, plus some line amps and repeaters).

So cutting the DRAM part from 10MB to 2MB would only cut cost modestly. But you're right, they probably considered 640x480 the sweet spot, otherwise they could have put more DRAM in.

The entire thing probably has extra ROPs+DRAM columns for redundancy, giving it yields in the high 90s with just a few percent of extra die area used.

And considering that going from 2xFSAA to 4xFSAA at 1280x720 leaves you with 92-95% of the performance, flushes are cheap.
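As a rough sanity check on the tile counts behind that comparison (the byte sizes are my assumption: 4 bytes of color plus 4 bytes of Z/stencil per sample, not confirmed figures), a minimal sketch:

```c
#include <math.h>
#include <stdio.h>

/* Rough framebuffer-size math. Bytes per sample (4 color + 4 Z/stencil)
 * are assumed for illustration, not confirmed hardware figures. */
int tiles_needed(int w, int h, int samples) {
    double mb = (double)w * h * samples * (4 + 4) / (1024.0 * 1024.0);
    return (int)ceil(mb / 10.0);           /* 10 MB of eDRAM per tile */
}

int main(void) {
    printf("720p 2xAA: %d tiles\n", tiles_needed(1280, 720, 2)); /* 2 */
    printf("720p 4xAA: %d tiles\n", tiles_needed(1280, 720, 4)); /* 3 */
    return 0;
}
```

Going from 2xAA to 4xAA at 720p only adds one extra tile, which fits with a performance delta that small.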

Cheers
Gubbi
 
DemoCoder said:
Jawed said:
In HL-2 X850XTPE is bandwidth limited at 800x600 when you turn on AA.

In B3D's test for AA performance:

http://www.beyond3d.com/reviews/ati/r480/index.php?p=12

X850XTPE is losing 45% with 4XAA.

Aren't you forgetting that every 2 samples require an extra clock, so going from 2x to 4x cuts fillrate in half, regardless of bandwidth?

To be honest I don't know. The fillrate here is 78.6 MP/s (no AA) versus 46.6 MP/s. The test is shader and texture intensive with a stencil-shadow pass. I find it difficult in all these tests to differentiate between the effects of fillrate and bandwidth when discussing AA. There seem to be too many other unknowns (overdraw, triangle count, etc.).

The X850XTPE could have infinite bandwidth, and 4xFSAA would still lose ~50%. The RSX could win if its ROPs have been upgraded to write more AA samples per clock (doubtful), or if it has more ROPs than the X850XTPE. Even though they'd be bandwidth limited, more ROPs would mean less of a fillrate hit for 4xFSAA.
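A toy model of that claim; the ROP behavior (2 AA samples written per clock) comes from this thread, and the function itself is purely illustrative:

```c
/* Toy fillrate model: each ROP is assumed to write up to 2 AA samples
 * per clock, so the cost per pixel is ceil(samples / 2), independent
 * of memory bandwidth. Illustrative only, not a measured spec. */
double fillrate_mpixels(int rops, double clock_mhz, int aa_samples) {
    int clocks_per_pixel = (aa_samples + 1) / 2;  /* 1x/2x: 1 clock, 4x: 2 */
    return rops * clock_mhz / clocks_per_pixel;
}
```

With this model, 2x to 4x halves the result no matter how much bandwidth you add, while doubling the ROP count restores it.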

If you can provide some solid numbers for fill-rate hit then please go ahead.

Well 1024x768 with FP16 HDR no AA is the playable limit (60fps) for NV40, and that's 85% of the pixels in 1280x720. 1080p 4xAA HDR looks doomed to me.

Playable limit in what game? I can run Counter-Strike: Source at 1280x1024 with 2xFSAA on a GeForce 6600 above 60fps on most maps.

We're talking about HDR here. HDR is a big performance hit on NV4x.

Jawed
 
It is supposedly less of a hit on the RSX/G70. HDR is a fillrate hit on the NV4x, just like 4xFSAA. It's not purely the bandwidth; it's the fact that the ROP takes 2 cycles to write 64 bits, just like it needs 2 cycles to write more than 2 AA samples.
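A sketch of the same cycle accounting applied to pixel size, under the stated assumption that a ROP writes 32 bits per clock:

```c
/* Sketch: if a ROP writes 32 bits per clock, a 64-bit FP16 pixel
 * (4 x 16-bit channels) takes 2 clocks, the same penalty as writing
 * 4 AA samples at 2 per clock. Assumption from the thread, not a spec. */
int clocks_to_write_pixel(int bits_per_pixel) {
    return (bits_per_pixel + 31) / 32;   /* FX8 (32-bit): 1, FP16: 2 */
}
```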
 
Gubbi said:
The entire thing probably has extra ROPs+DRAM columns for redundancy, giving it yields in the high 90s with just a few percent of extra die area used.
Yields above 95% seem a little optimistic for any part with a decent degree of complexity.
 
Jawed said:
We're talking about HDR here. HDR is a big performance hit on NV4x.
Probably for other reasons than bandwidth. Using two 2-channel FP16 textures is much faster on NV40, and some games seem to use quite inefficient ways of doing HDR rendering.
 
Next question...

Since the eDRAM offers so much bandwidth, what other things may we expect? It seems it would be great at doing stencil shadows like in Doom 3; will it be fast enough to do enough samples to make them appear soft? Will the geometry limitations in D3 be less of an issue?

As for HDR, since this seems to be bandwidth intensive and the eDRAM has a lot of bandwidth, why are we not hearing about FP32 blending? If I have understood correctly it can do FP10 and FP16 (FP10 being the default); you would think with all that bandwidth you would want to put it to good use. While it would take extra transistors, it seems the image quality would improve greatly.
 
Acert93 said:
Next question...
Since the eDRAM offers so much bandwidth, what other things may we expect? It seems it would be great at doing stencil shadows like in Doom 3; will it be fast enough to do enough samples to make them appear soft? Will the geometry limitations in D3 be less of an issue?

Stencil shadows are geometry and fillrate (stencil/Z) bound, not bandwidth bound. Even the "bandwidth constrained" NV4x architecture ends up with over 17 Giga zixels (Z/stencil ops) per second in fillrate tests. For example, see http://www.xbitlabs.com/articles/video/display/geforce6600gt-theory_7.html That's why I said the R500 fillrate is too low for these algorithms, unless it supports double-pumped Z (which I hear it does).
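For clarity, "double-pumped Z" means the ROPs retire two Z/stencil operations per clock when no color is written, doubling fillrate in exactly the passes stencil shadows are bound by. A hypothetical helper (the ROP count and clock you plug in are assumptions, not confirmed specs):

```c
/* "Double-pumped Z": during depth/stencil-only passes (e.g. a
 * stencil-shadow pass), each ROP is assumed to handle 2 Z ops per
 * clock instead of 1. Inputs are illustrative, not confirmed specs. */
double zixel_rate_gsamples(int rops, double clock_ghz, int double_pumped) {
    return rops * clock_ghz * (double_pumped ? 2 : 1);
}
```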


The biggest gains from eDRAM bandwidth come if you want to do an insane amount of blending. For example, lots of HDR + blend will kill any architecture today.
 
DemoCoder said:
The biggest gains from eDRAM bandwidth come if you want to do an insane amount of blending. For example, lots of HDR + blend will kill any architecture today.

Thanks DemoCoder!
 
DemoCoder said:
The biggest gains from eDRAM bandwidth come if you want to do an insane amount of blending. For example, lots of HDR + blend will kill any architecture today.

What's double-pumped Z?

And didn't you say too much HDR is no good for the R500 just a few pages back?
 
Well, the assumption is that HDR on R500 yields more tile swapping. I am reserving judgement as to how "cheap" it is for developers to use tiling when they run out of memory. Also, FP16 exceeds the interconnect bandwidth, so R500 won't reach peak fillrate with FP16. If it had enough RAM to hold a 720p HDR frame, it would reach peak HDR fillrate.
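A quick back-of-the-envelope on why a 720p HDR frame doesn't fit, assuming 8 bytes of FP16 color plus 4 bytes of Z per pixel (my assumption, not a confirmed layout):

```c
/* A 720p FP16 target at an assumed 8 bytes color + 4 bytes Z per pixel:
 * 1280 * 720 * 12 bytes = ~10.5 MB, just over the 10 MB of eDRAM,
 * which is why HDR would force tiling even without AA. */
double fp16_720p_mb(void) {
    return 1280.0 * 720.0 * 12.0 / (1024.0 * 1024.0);   /* ~10.55 MB */
}
```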

On the other hand, it will still be much faster than the RSX at FP16 HDR blending, since the blending can be done "on-chip" on the eDRAM daughter-chip.
 
DemoCoder said:
On the other hand, it will still be much faster than the RSX at FP16 HDR blending, since the blending can be done "on-chip" on the eDRAM daughter-chip.

Well that is a good relative comparison. If it is faster at FP16 and can go for an even faster FP10 with tradeoffs (speed for quality), well, that sounds like good options to me.

Always good to give devs options and let them choose what is best for their title.

It will be interesting to see how well the RSX does at 128-bit (FP32?) pixel precision for HDR; i.e., will it be fast enough to actually make it a feasible feature, or is this a check-box type item developers would be NUTS to use? Features are nice, but they do need to be fast enough to be usable.

Sony made a big deal out of HDR and 1080p, it will be interesting to see how both of those play out.
 
I don't even think the RSX supports true 128-bit framebuffers. I think it is emulated. There is no good reason to use one except maybe some GPGPU algorithms.
 
aaaaa00 said:
How's that?

Say that FP10 is 7e3 per component (7-bit mantissa, 3-bit exponent).

1e0 = 1
127e7 = 127 * 2^7 = 16,256

That's roughly 64x the dynamic range of 8:8:8:8, at the same speed. The tradeoff is loss of accuracy, but it's up to the developer what they want to use. Yes, it supports blending.

True, it's not as good as FP16, but it's twice as fast. Depending on what you're doing, that's good enough.

The tile size can be set by the developer; it's up to them how they want to tune it.
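To make the 7e3 layout concrete, a minimal sketch assuming an unbiased integer mantissa and exponent; the real hardware encoding (bias, denormals, rounding) may well differ:

```c
#include <stdint.h>

/* Hypothetical 7e3 component: value = mantissa * 2^exponent, with a
 * 7-bit mantissa and a 3-bit exponent, no bias. This only illustrates
 * the range; the actual hardware format is not confirmed here. */
uint16_t pack_7e3(unsigned mantissa, unsigned exponent) {
    return (uint16_t)(((exponent & 0x7u) << 7) | (mantissa & 0x7Fu));
}

unsigned unpack_7e3(uint16_t bits) {
    unsigned m = bits & 0x7Fu;
    unsigned e = (bits >> 7) & 0x7u;
    return m << e;              /* tops out at 127 << 7 = 16,256 */
}
```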

But since lighting terms are additive, isn't accuracy almost as important as dynamic range? HDR is also about preserving detail in bright areas, not just preserving the fact that they're bright.
 
But since lighting terms are additive, isn't accuracy almost as important as dynamic range? HDR is also about preserving detail in bright areas, not just preserving the fact that they're bright.
This is 3D graphics. You need to strike a balance.

Think of FP10 like ATI's move to FP24 for shaders. Nvidia had FP16, which was the same speed as ATI's FP24 but didn't look as good, and FP32, which was better than FP24 but ran much slower. So FP24 was a good middle ground.

On the current hardware on the market, FP16 HDR takes a huge hit, and while it would be more accurate than FP10, the speed tradeoff may be large enough to warrant the drop in quality.

I dunno, I don't have an RSX or an R500 to judge.
 
But in this case, FP10 is not, IMHO, the "middle ground" like FP24 was, since, like FP16 in pixel shaders, it will add significant artifacts; in fact, worse than FP16 in pixel shaders.

Thinking about it a little more, a more reasonable "middle ground" might have been something like 9:9:9:3:2. All three color components share the same 3-bit exponent ("intensity"), since in general, if you have a color where the R component dominates the B component by a huge ratio, you won't perceive the difference anyway. Of course, blending would have to be modified to support choosing the right exponent (the ADD operation won't be as simple anymore).
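A sketch of how that shared-exponent packing could work; the bit layout and exponent selection here are hypothetical (the idea is similar in spirit to shared-exponent texture formats like RGB9E5):

```c
#include <stdint.h>

/* Hypothetical 9:9:9:3:2 packing: three 9-bit mantissas, one shared
 * 3-bit exponent chosen from the largest channel, and 2 bits of alpha.
 * Small channels lose low bits when shifted, which is the stated
 * perceptual tradeoff. Layout and rounding are assumptions. */
uint32_t pack_999e3a2(unsigned r, unsigned g, unsigned b, unsigned a2) {
    unsigned m = r > g ? r : g;
    if (b > m) m = b;                          /* largest channel */
    unsigned e = 0;
    while ((m >> e) > 0x1FFu && e < 7) e++;    /* fit it into 9 bits */
    return ((a2 & 0x3u) << 30) | (e << 27)
         | (((b >> e) & 0x1FFu) << 18)
         | (((g >> e) & 0x1FFu) << 9)
         |  ((r >> e) & 0x1FFu);
}
```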
 
But in this case, FP10 is not, IMHO, the "middle ground" like FP24 was, since, like FP16 in pixel shaders, it will add significant artifacts; in fact, worse than FP16 in pixel shaders.
Have we seen this to be the case? This is what Nvidia was claiming with FP24, to show how much better FP32 was and how games would quickly hit the limits of FP24, and well, it's been what, three years, and we haven't seen any games look like crap because of FP24. Heck, when Nvidia stops putting hacks into their drivers, I can't say I've seen a game look bad in FP16 either.


However, at this point in time FP16 on current hardware takes a massive hit (I believe Far Cry loses something like 40%, and Half-Life 2 is going to require $400 cards for HDR).

Thinking about it a little more, a more reasonable "middle ground" might have been something like 9:9:9:3:2. All three color components share the same 3-bit exponent ("intensity"), since in general, if you have a color where the R component dominates the B component by a huge ratio, you won't perceive the difference anyway. Of course, blending would have to be modified to support choosing the right exponent (the ADD operation won't be as simple anymore).
Maybe, but it seems like, from what you're saying, it would be much slower than FP10, and maybe so slow it wouldn't show an improvement over FP16.


Like I said, we don't have the hardware and we haven't seen games running on it. For most games it may be fine, and for other games FP16 may be needed. I believe the R500 (Xenos, or whatever) supports FP16 also.
 
jvd said:
Have we seen this to be the case? This is what Nvidia was claiming with FP24, to show how much better FP32 was and how games would quickly hit the limits of FP24, and well, it's been what, three years, and we haven't seen any games look like crap because of FP24.

FP24 -> FP32 isn't a big increase in perceptible precision, whereas 7 bits -> 10 bits is. Think of it this way: can you tell the difference between a framebuffer that can only store 16 intensities of color and one that can store 128 intensities? The answer is yes. As you go up in precision, the perceived-quality improvements diminish. You won't notice an error in the 24th bit of an FP32 value in the majority of circumstances, but you will notice an error in the 7th bit. Increasing precision has diminishing returns at the high end.

Look, if you buy ATI's argument that FP16 is bad in pixel shaders compared to FP24 because it introduces artifacts, then doing lots of FP10 blends is going to be substantially worse.

The best analogy I can think of: remember 16-bit color? Remember how Voodoo's "22-bit color" (16-bit framebuffer dithering) was supposed to be a good tradeoff of quality vs speed? That was a 5:6:5 or 5:5:5 RGB format. Blending on that card could cause really bad artifacts. Noticeable artifacts. The difference between 8 bits per color and 5 bits per color was quite noticeable, just as the difference between 7 bits and 10 bits in HDR will be.
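For reference, the 5:6:5 packing in question; the coarse quantization (R and B keep only 32 levels) is where the banding came from:

```c
#include <stdint.h>

/* 5:6:5 packing: R and B drop to 5 bits (32 levels, steps of ~8 in
 * 8-bit terms), G keeps 6 bits. Those coarse steps are what made
 * blending artifacts on 16-bit framebuffers so visible. */
uint16_t pack_565(uint8_t r, uint8_t g, uint8_t b) {
    return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}
```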

However, at this point in time FP16 on current hardware takes a massive hit (I believe Far Cry loses something like 40%, and Half-Life 2 is going to require $400 cards for HDR).

We are talking consoles, not desktop cards. HDR on the RSX won't take as much of a hit. HDR, like 4xFSAA, causes a fillrate hit; it is unrelated to bandwidth. It will be fixed in future desktop cards, even low-end ones, simply because FP framebuffers were an "afterthought" in these architectures, not a first-class format.

Maybe, but it seems like, from what you're saying, it would be much slower than FP10, and maybe so slow it wouldn't show an improvement over FP16.

No, it wouldn't be slower, it would just require slightly different blend arithmetic, just like the FP10 format requires different blend arithmetic compared to FX8/12.

The other thing is: is only 2 bits of alpha sufficient? That means you can only blend with 0, 1/3, 2/3, and 3/3.
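To make "different blend arithmetic" concrete, a hypothetical additive blend built on the 7e3 pack/unpack sketch earlier in the thread; real hardware would also define rounding behavior:

```c
/* Hypothetical FP10 additive blend using the earlier 7e3 helpers:
 * unpack to integers, add, saturate at the format max, renormalize
 * the mantissa back into 7 bits, repack. Truncation stands in for
 * whatever rounding the real hardware would do. */
uint16_t blend_add_7e3(uint16_t dst, uint16_t src) {
    unsigned sum = unpack_7e3(dst) + unpack_7e3(src);
    if (sum > 16256u) sum = 16256u;     /* saturate at 127 * 2^7 */
    unsigned e = 0;
    while ((sum >> e) > 0x7Fu) e++;     /* renormalize to 7 bits */
    return pack_7e3(sum >> e, e);
}
```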
 
jvd said:
But in this case, FP10 is not, IMHO, the "middle ground" like FP24 was, since, like FP16 in pixel shaders, it will add significant artifacts; in fact, worse than FP16 in pixel shaders.
Have we seen this to be the case?

DeanoC has mentioned that they're getting artifacts even with FP16.
 
Bjorn said:
DeanoC has mentioned that they're getting artifacts even with FP16.
But only in really extreme cases (alpha-blending clouds in front of the sun); we will modify the artwork/engine before we change the format.

FP10 gives you much lower dynamic range, but probably enough if you're careful with art. FP16 is a lot better but costs more bandwidth, and you can still run out of precision. FP32 is going to kill bandwidth just to fix a few corner cases; I can't see anybody who has a choice between FP16 and FP32 frame buffers using FP32....

INT16 is also a pretty good choice if available (an 8.8 fixed-point format would be almost as good as FP16).
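A quick sketch of why 8.8 fixed point is attractive: an additive blend stays a plain saturating integer add (hypothetical helper, assuming the 8.8 layout mentioned above):

```c
#include <stdint.h>

/* 8.8 fixed point: integer part in the high byte, fraction in the low
 * byte, range 0..255.996. Additive blending is just a saturating
 * integer add, with no float decode/encode step. */
uint16_t fx88_add_sat(uint16_t a, uint16_t b) {
    uint32_t s = (uint32_t)a + b;
    return (uint16_t)(s > 0xFFFFu ? 0xFFFFu : s);
}
```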
 