HD problems in Xbox 360 and PS3 (Zenji Nishikawa article @ Game Watch)

ERP said:
I personally have a soft spot for Xenos because it's architecturally interesting, and in some ways extremely clever.
Couldn't you say the same for the PS2? That didn't make it more powerful...
 
predicate said:
Couldn't you say the same for the PS2? That didn't make it more powerful...

The PS2 is pure brute force, there is some elegance in that, but Xenos is actually clever.
 
ERP said:
MS is attempting to address developers biggest complaints about Xbox.
Sony's doing the same by taking a PC graphics part and putting it in PS3.
Nintendo did the same thing with the GameCube's memory performance vs the N64's.

In almost all cases the manufacturers seem to overreact.

Some day I would like to see what would be a really good console from a dev POV (not given unlimited performance/RAM/BW....).
 
Mintmaster said:
-"Lens effect, refraction, HDR effects such as bloom and glare" are all things which are unaffected by tiling. You have to resolve your scene and write it to main memory before you can do any of these effects. It sits as a whole image there.
To check the original text I browsed the article again, and it seems something went wrong on the writer's side. Basically, these bullets in the tiled rendering demerits
# Lens effect, refraction, HDR effects such as bloom and glare, and other frame buffer filtering cause overlapped drawing near tile boundaries.
# Objects that cross boundaries can't use the cache efficiently.
# CPU L2 cache locking is practically unusable.
were replaced with a different paragraph for some unknown reason. Anyway, the new paragraph literally reads:
Besides, for special effects that reuse the contents of the Z-buffer of a rendered frame, such as depth-of-field simulation, pseudo-subsurface scattering and pseudo-light-scattering simulation, the Z-buffer used to render the previous frame is cleared when rendering each tile. To prevent this, you have to save the depth values to a separate buffer in shared memory using MRT. That is equivalent to writing each Z-value twice, which means extra memory-bandwidth consumption. Since this double writing is unnecessary when rendering all at once, it can be regarded as one of the demerits of tiled rendering.
Do you think this is more valid than the old points or not?
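For a sense of scale, the extra write the paragraph describes can be estimated with a quick back-of-envelope calculation (the 720p resolution, 32-bit depth format, and 60 fps target are illustrative assumptions, not figures from the article):

```python
# Rough cost of exporting a resolved depth buffer via MRT each frame
# (assumed: 720p target, 32-bit depth values, 60 fps).
WIDTH, HEIGHT = 1280, 720
BYTES_PER_DEPTH = 4
FPS = 60

extra_bytes_per_frame = WIDTH * HEIGHT * BYTES_PER_DEPTH   # the "second" Z write
extra_mb_per_frame = extra_bytes_per_frame / 2**20
extra_gb_per_sec = extra_bytes_per_frame * FPS / 2**30

print(f"{extra_mb_per_frame:.2f} MB/frame, {extra_gb_per_sec:.2f} GB/s at {FPS} fps")
```

Under those assumptions the double write costs on the order of 3.5 MB per frame, or roughly 0.2 GB/s at 60 fps, which puts the "demerit" in perspective against the 360's total GDDR3 bandwidth.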

Mintmaster said:
-I don't know what the heck the "Developer C" quote is talking about.
Even when I was translating it I had no idea why he says that, since they are supposed to use the upscaler, so I hope someone can solve it.

Mintmaster said:
I don't see how geometry can be that big of a deal, either.
This image in Ninety-Nine Nights (N3) is in the article as an example.
http://www.watch.impress.co.jp/game/docs/20060426/3dhd10.htm
Since I have no idea of specific numbers of geometry these massive crowd games require, I'd like to ask developers about them.
 
Besides, for special effects that reuse the contents of the Z-buffer of a rendered frame, such as depth-of-field simulation, pseudo-subsurface scattering and pseudo-light-scattering simulation, the Z-buffer used to render the previous frame is cleared when rendering each tile. To prevent this, you have to save the depth values to a separate buffer in shared memory using MRT. That is equivalent to writing each Z-value twice, which means extra memory-bandwidth consumption. Since this double writing is unnecessary when rendering all at once, it can be regarded as one of the demerits of tiled rendering.

What's the big deal with exporting a completed z-buffer when a frame is finished for use in the next one? I can only assume I know so little about what they're doing, that I'm missing 99.99% of what needs to be done there.

Regarding geometry, shouldn't only geometry lying across two tiles (which, in a scene such as that, might be several dozen, but is a relatively minor fraction of the total anyway) need to be resubmitted and transformed? (Unless your engine isn't built for tiling and you end up doing more work due to that)
 
one said:
This image in Ninety-Nine Nights (N3) is in the article as an example.
http://www.watch.impress.co.jp/game/docs/20060426/3dhd10.htm
Since I have no idea of specific numbers of geometry these massive crowd games require, I'd like to ask developers about them.
Funny thing is that I looked up N3 images and found that very same pic before posting in this thread. In fact, that sort of scene is better suited for tiling than just one super-high poly object in the screen with one shader. For something like this scene, you just need bounding boxes or spheres on each character, and then you find their distance from the planes separating the tiles. Very simple test.

A good 3D engine will already do something similar for the four planes bounding the viewing frustum. If an object is guaranteed completely outside, don't flag it to be sent to the GPU. For 720p w/ 4xAA (3 tiles), you simply add two more planes, and instead of one boolean flag, there will be three marking possible visibility in each tile. Now just repeat your normal rendering procedure for each tile.
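The per-tile flagging described above can be sketched very simply. This is a minimal illustration, assuming 720p split into 3 horizontal tiles of 240 rows each and doing the overlap test in screen space for clarity (a real engine would test bounding spheres against view-space planes before projection):

```python
# Sketch: per-tile visibility flags for an object's bounding circle,
# assuming 3 horizontal tiles of 240 rows each (720p w/ 4xAA, as above).
def tile_flags(center_y, radius, tile_height=240, num_tiles=3):
    """Return one possible-visibility flag per tile, like the frustum-plane
    test a good engine already does, extended with the tile-split planes."""
    flags = []
    for t in range(num_tiles):
        top = t * tile_height
        bottom = top + tile_height
        # Possibly visible in this tile if the circle overlaps [top, bottom).
        visible = (center_y + radius > top) and (center_y - radius < bottom)
        flags.append(visible)
    return flags

# A soldier centred at y=250 with radius 20 straddles the first split plane,
# so it gets submitted to tiles 0 and 1 but skipped for tile 2.
print(tile_flags(250, 20))
```

As the post says, this is just two extra plane tests per object on top of the frustum culling the engine already performs; only objects straddling a split plane get submitted twice.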

I'm not even talking about using predication here. Some geometry will be sent twice, but that should only be a problem if you have huge batches of polys. If you do, then you can afford to split them, and do the same thing as above. Or you can do predicated tiling to figure out where they go.

The reason this is not so simple to just tack onto existing game engines is that they may assume you're only going to send everything down once. They may not get everything ready for rendering beforehand, and might do some things on the fly without storing the sequence of draw commands and renderstate changes. It's the same reason that a Z-only pass or front to back object sorting isn't employed all the time (I think). If you know about tiling beforehand, though, it's not so hard to put into your engine.
one said:
Basically these bullets in tiled rendering demerits were replaced with a different paragraph for some unknown reason.
Interesting. Maybe they realized their mistake, but unfortunately only to replace it with another one.
one said:
Do you think this is more valid than the old points or not?
Nope, not at all.

First of all, some of these effects are somewhat incompatible with AA anyway. Secondly, even if you needed the full resolution of the unresolved Z-buffer, you're talking about the transfer of 15MB per frame. That's 4% of your memory bandwidth at 60fps. Finally, a traditional architecture also needs the Z-buffer data for these effects, and so it would have three options. One is to disable z compression when rendering the scene, so that the z-buffer can be read by the pixel shader for these effects. This would massively increase bandwidth usage while rendering the scene. The second option is to decompress the z-buffer into a texture, which may not even be possible, and moreover it requires more bandwidth than the copying they're mentioning for Xenos. The third option is using MRT, in which case you merely need more tiles, and bandwidth is again the same as on a non-tiled architecture.
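The "15MB per frame" and "4%" figures can be checked directly (assuming an unresolved 4xAA 32-bit Z-buffer at 720p, 60 fps, and the 360's 22.4 GB/s GDDR3 bandwidth; these parameters are my assumptions, not stated in the post):

```python
# Checking the figures above: unresolved 4xAA Z-buffer at 720p, transferred
# once per frame at 60 fps, against an assumed 22.4 GB/s of GDDR3 bandwidth.
WIDTH, HEIGHT, SAMPLES, BYTES_PER_Z = 1280, 720, 4, 4
FPS, GDDR3_BYTES_PER_SEC = 60, 22.4e9

zbuffer_bytes = WIDTH * HEIGHT * SAMPLES * BYTES_PER_Z
zbuffer_mb = zbuffer_bytes / 2**20                     # ~14 MB, i.e. "15MB" ballpark
fraction = zbuffer_bytes * FPS / GDDR3_BYTES_PER_SEC   # ~0.04, i.e. "4%"

print(f"{zbuffer_mb:.1f} MB/frame, {fraction:.1%} of bandwidth")
```

The numbers come out at roughly 14 MB per frame and about 4% of bandwidth, consistent with the post.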

Summary: This isn't a disadvantage at all, and may even be an advantage.
 
one said:
This image in Ninety-Nine Nights (N3) is in the article as an example.
http://www.watch.impress.co.jp/game/docs/20060426/3dhd10.htm
Since I have no idea of specific numbers of geometry these massive crowd games require, I'd like to ask developers about them.

There is one PS2 game with 100,000 polygons just for the animated soldiers, but I do not know the name of this game, I am sorry. This information is in Gamasutra. So I think the Xbox 360 can have much more.
 
I'd always thought a game like N3 would be mostly using impostors for the characters... In which case tiling would have practically zero geometry hit...?
Is this what the cache locking is referring to? Rendering impostor quads dynamically?

sigh I need sleep.
 
Mintmaster said:
B) PS2's eDRAM was for both the framebuffer and textures. XB360 has up to 512 MB for texture memory.
Last I checked, PS2 games mostly used main memory for textures. People keep trying to portray the fact that the GS textured directly from eDRAM as a limitation - yet in reality it was its greatest strength - free render-to-texture instead of costly resolves to external memory. Aside from free alpha blending, that was the main graphical highlight of the vast majority of PS2 games.

If, for example, you redesigned only the PS2 memory architecture to work like the 360's, the end result would come off noticeably weaker (before anyone asks, I am arguing the 360 memory layout doesn't fit PS2 hardware, nothing to do with the 360 itself).
For me, the 'ideal' memory architecture for PS2 would have been what we got in the PSP 5 years later (ironically, though, the PSP also introduced a lot of changes to the graphics subsystem that IMO negate certain advantages of such a configuration).

"Lens effect, refraction, HDR effects such as bloom and glare" are all things which are unaffected by tiling. You have to resolve your scene and write it to main memory before you can do any of these effects. It sits as a whole image there.
You should rephrase that - they ARE affected by tiling(see the argument about SPEs doing these a few weeks ago), but with 10MB eDram there's a good chance you can avoid having to tile when performing them on Xenos.

Acert93 said:
On the other hand the area it makes 'more' sense is bandwidth in general.
What nAo was pointing at is that the bandwidth bottleneck is a somewhat smaller consideration now than it was when the PS2 came out - hence why he feels it makes less sense.
And I tend to agree with him - the PS2 generation was all about rendering cheap & fast pixels, as well as a drastic jump in polygon density (but not vertex shading complexity) from the standards of that era - the ratio of math ops before hitting memory was comparatively tiny.
 
ERP said:
If they didn't have eDRAM they'd need a 256-bit bus or two buses to make any sort of high-def rendering practical.
I'm not sure I agree with this..;)
Opaque pixels are likely to be processed with long shaders, hence most of the available per pixel bandwidth would be used to fetch textures, not color + z.
Tiling is interesting, but I've yet to see real evidence that it actually reduces transistor counts at a given performance level.
Just to make it clear, I was not advocating any kind of tiled rendering GPU, I was thinking that a small on chip tile cache would use much less transistors than n MBytes of edram.
I'd like to use those transistors for more ALUs..
I also believe that next gen consoles CPUs might be very good at tiling geometry ;)
Basically one would render all the opaque pixels as on current GPUs (the tile cache I'm talking about would not even be used in this case) and all the transparent pixels in tile order.
A relatively cheap way to achieve near-to-theoretical fillrate without devoting tons of transistors to edram.

I think it's way too early to be declaring MS's or Sony's graphics chip choices bad or good; we have to wait and see.
I'm not advocating Xenos nor RSX, I was just expressing an opinion on a hypothetical GPU :)

Marco
 
pc999 said:
Some day I would like what would be a really good console from a dev POV (not given unlimited performance/Ram/BW....).
Then you would get as many consoles designs as there are developers out there (unless senior devs "convince" their underlings that their design "is best for them as well"... ;)).
 
nAo said:
Just to make it clear, I was not advocating any kind of tiled rendering GPU, I was thinking that a small on chip tile cache would use much less transistors than n MBytes of edram.
I'd like to use those transistors for more ALUs..
Ken Kutaragi: "If we tried to fit enough volume of eDRAM onto a 200-by-300-millimeter chip, there won't be enough room for the logics, and we'd have had to cut down on the number of shaders. It's better to use the logics in full, and to add on a lot of shaders."

nAo said:
...I was just expressing an opinion on a hypothetical GPU

Hmm....
 
Fafalada said:
Last I checked PS2 games mostly used mainmemory for textures.
Interesting. I always thought 2MB seemed like too little, but this is what I kept reading about PS2. The original 3DFX Voodoo gave you decent graphics with only this much memory so I figured it was possible.
You should rephrase that - they ARE affected by tiling(see the argument about SPEs doing these a few weeks ago), but with 10MB eDram there's a good chance you can avoid having to tile when performing them on Xenos.
Well from what I've read in the B3D article, Xenos can only texture from the GDDR3, and not from the eDRAM. Personally I think this would have been a useful feature, just as you're saying about PS2, but Xenos needs to transfer the framebuffer to main memory for random access in the pixel shaders.

That's why I said tiling doesn't affect those effects. You have to assemble the full scene first before post-processing, and whether you do that tile by tile or all at once doesn't matter. But yeah, the final resolved 720p image is only 3.7MB, so there shouldn't be any problem in applying the effect itself.
 
nAo said:
I believe edram makes less sense this generation compared to the previous one.
Most of the time we need to render slow opaque pixels and fast relatively simple transparent pixels.
I'm not really convinced about that.

Empirically, if that was the case, I'd expect to see a much lower hit for antialiasing than what we see in games on the PC (where cards have twice the BW per ROP/shader unit), especially in scenes where there aren't many transparent pixels on the screen.

Not only that, but this presentation from Sweeney mentions drawing 10M pixels per frame in Gears of War due to "multiple rendering passes per object and per-light". That obviously means alpha blending.

I'm not sure why they're multipassing, but my guess is they can better control the number of pixels that each light affects this way. I have some ideas on neighbourhood transfer using PRT that work in this way to avoid N^2 complexity.
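Sweeney's 10M-pixels-per-frame figure implies a substantial overdraw factor at 720p, which is easy to check (the 720p resolution is an assumption for the sake of the arithmetic):

```python
# Overdraw implied by "10M pixels per frame" from the Gears of War
# presentation, assuming a 720p (1280x720) target.
PIXELS_DRAWN = 10_000_000
SCREEN_PIXELS = 1280 * 720            # ~0.92M visible pixels at 720p

overdraw = PIXELS_DRAWN / SCREEN_PIXELS
print(f"~{overdraw:.1f}x the screen's pixel count per frame")
```

That works out to shading roughly 10-11x the screen's pixel count per frame, which is why the multipass-per-light approach weighs so heavily on bandwidth.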

Fafalada said:
What nAo was pointing at is that bandwith bottleneck is somewhat smaller consideration now then it was when PS2 came out - hence why he feels it makes less sense.
And I tend to agree with him - PS2 generation was all about rendering cheap&fast pixels, as well as a drastic jump in polygon density(but not vertex shading complexity) from standards of that era - ratio of math ops before hitting memory was respectively tiny.
I dunno, I just don't see how the bandwidth usage per pixel will stay the same if we want to advance graphics. Indeed, math ops will continue to go up, but to say the number of pixels drawn and textures accessed (for a given resolution) will decrease or stay constant is dreaming, IMHO. Smoke/fog/fire/dust/fur/grass will always look better with more pixels. Post processing is used more nowadays, and it needs plenty of bandwidth. AA needs bandwidth. HDR needs bandwidth. I'm not saying we need PS2 levels of BW per screen pixel, as the original XBox has proven with one sixth the BW. But the latter isn't as good a figure as we could use.

Maybe I'm wrong. I've always been bewildered how devs can make graphics cards chug so much even though they have such incredible power. I kept thinking they just wrote horrible code, but seeing it again and again makes me think they must be doing something useful that gobbles the power. Regarding bandwidth, IHV's obviously make their decisions for a reason, and stripped down value cards with half the bus width show notable performance drops too.
 
Mintmaster said:
Empirically, if that was the case, I'd expect to see a much lower hit for antialiasing than what we see in games on the PC (where cards have twice the BW per ROP/shader unit), especially in scenes where there aren't many transparent pixels on the screen.
I have completely different empirical data..but you know, closed platforms are different ;)
Not only that, but this presentation from Sweeney mentions drawing 10M pixels per frame in Gears of War due to "multiple rendering passes per object and per-light". That obviously means alpha blending.
Not just alpha blending, z-pre passes and shadow maps..
I dunno, I just don't see how the bandwidth usage per pixel will stay the same if we want to advance graphics. Indeed, math ops will continue to go up, but to say the number of pixels drawn and textures accessed (for a given resolution) will decrease or stay constant is dreaming, IMHO. Smoke/fog/fire/dust/fur/grass will always look better with more pixels.
That's definitely true, but I'm not saying eDRAM is not useful; I'm saying I can live without it and I'd prefer to spend the same amount of transistors on features I consider more useful.
Post processing is used more nowadays, and it needs plenty of bandwidth.
though on Xenos edram is not helpful here..
AA needs bandwidth. HDR needs bandwidth.
Gimme more compression..:)
I kept thinking they just wrote horrible code, but seeing it again and again makes me think they must be doing something useful that gobbles the power.
Regarding bandwidth, IHV's obviously make their decisions for a reason, and stripped down value cards with half the bus width show notable performance drops too.
I was speaking from a console dev perspective; in the next 2 or 3 years you will be surprised by what these half-bus GPUs can do.. no doubt about it :)

Marco
 