Xbox 360: First Look at Final Hardware

Titanio said:
(and I do believe motion blur, DOF etc. are implemented in shaders?).
Perhaps it could be done that way, but the fastest and most straightforward way of doing it is just plain old multipass rendering. "Motion blur" (or as I like to call it: motion crap) is really just blending the current frame with the previous frame. It's not really blur at all, particularly if the movement in between frames is large. True motion blur is notoriously difficult to implement, particularly in the way it happens in the human eye.
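A minimal sketch of the frame-blending "motion blur" described above, assuming greyscale frames stored as nested Python lists and a fixed 50/50 blend weight (both illustrative choices; a game would do this as an alpha-blended full-screen pass on the GPU). The example shows why it is "not really blur": an object that jumps between frames leaves two half-bright ghosts with nothing in between.

```python
def blend_frames(current, previous, weight=0.5):
    """Blend two frames pixel by pixel: out = weight*current + (1-weight)*previous."""
    if len(current) != len(previous) or any(
        len(a) != len(b) for a, b in zip(current, previous)
    ):
        raise ValueError("frames must have the same dimensions")
    return [
        [weight * c + (1.0 - weight) * p for c, p in zip(row_c, row_p)]
        for row_c, row_p in zip(current, previous)
    ]


# A bright one-pixel object that jumps 3 pixels between frames.
prev = [[1.0, 0.0, 0.0, 0.0]]
curr = [[0.0, 0.0, 0.0, 1.0]]
print(blend_frames(curr, prev))  # [[0.5, 0.0, 0.0, 0.5]]: two ghosts, no smear
```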

Depth of field was done on the old 3dfx T-buffer by rendering the same scene from several slightly skewed angles and then blending the resulting frames (in that case, 4 frames were used). The effect was that where the difference between the frames was small, the image stayed sharp; where the difference was large, it became blurred. I assume that is the way modern games do it as well. Not really sure how this would be done in a shader...
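A rough sketch of why the multi-angle approach works, using the generic accumulation-buffer form of the idea rather than 3dfx's actual T-buffer implementation (which this is not claimed to reproduce). The camera is shifted sideways for each sub-frame and the frustum is sheared so points on the focal plane always project to the same spot; the four offsets and the focal depth are made-up illustrative values.

```python
def project(x, z, cam_offset, focal_dist):
    """Screen x of point (x, z) seen from a camera shifted sideways by cam_offset.
    The cam_offset / focal_dist term shears the frustum so that any point at
    depth focal_dist lands in the same place regardless of the shift."""
    return (x - cam_offset) / z + cam_offset / focal_dist


def screen_spread(x, z, offsets, focal_dist):
    """Spread of one point's screen positions across all jittered sub-frames.
    Averaging the sub-frames smears the point over roughly this width."""
    positions = [project(x, z, o, focal_dist) for o in offsets]
    return max(positions) - min(positions)


OFFSETS = [-0.15, -0.05, 0.05, 0.15]  # four sub-frames (illustrative values)
FOCAL = 10.0                          # focal plane depth (illustrative)

print(screen_spread(1.0, 10.0, OFFSETS, FOCAL))  # ~0: on the focal plane, stays sharp
print(screen_spread(1.0, 2.0, OFFSETS, FOCAL))   # ~0.12: near the camera, gets blurred
```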

One may want to take note that there's nothing in particular about these rendering techniques that helps to hide jaggies. Obviously jaggies may be less pronounced where several frames overlay and fuzz up the image, but that doesn't fix the fundamental issue that causes jaggies to appear... On the other hand, there should be nothing on a technical level that prevents xenos from doing these effects WITH AA enabled either. There may be performance issues (particularly with DoF), but that's another issue. :p

I'm a little wary, however, of the assumption that xenos eDRAM automatically should have "enough" bandwidth to do 4xAA with no performance hit. DRAM is not 100% efficient: depending on how it is accessed, it can either run very fast or run like shit. Random access speed is typically terrible, while linear bursts can run very quickly. Unless there are some kind of SRAM write buffers to hide first-access latencies, refresh cycles, page-break penalties and such, chances are the effective on-die bandwidth could be significantly less than the theoretical maximum, leading to a much higher performance hit than 5% when doing 4xAA...
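To put the burst-versus-random-access point into numbers, here is a deliberately crude DRAM efficiency model. The burst length, miss penalty and page-hit rates are invented for illustration and are not Xenos eDRAM figures; refresh cycles are ignored. It only shows the shape of the argument: the more accesses miss the open page, the further effective bandwidth drops below peak.

```python
def effective_bandwidth(peak, burst_len, miss_penalty, page_hit_rate):
    """Crude DRAM efficiency model (illustrative, not measured hardware numbers).

    Every access moves burst_len cycles' worth of data at the peak rate. An
    access that misses the currently open page also pays miss_penalty idle
    cycles (precharge + row activate) during which no data moves.
    Returns the effective bandwidth in the same units as `peak`.
    """
    if not 0.0 <= page_hit_rate <= 1.0:
        raise ValueError("page_hit_rate must be between 0 and 1")
    avg_cycles = burst_len + (1.0 - page_hit_rate) * miss_penalty
    return peak * burst_len / avg_cycles


# Peak expressed as 100 so results read as "% of theoretical maximum".
for hit_rate in (1.0, 0.9, 0.5, 0.0):
    print(hit_rate, round(effective_bandwidth(100.0, 4, 8, hit_rate), 1))
```

This is why write buffering and access patterns that stay within open pages matter as much as the headline bandwidth figure.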

Also, as eDRAM appears to be single-ported (or else MS and ATi would have stated higher bandwidth figures), it means xenos will be locked out when doing framebuffer read/writes to main memory. This will happen after every completed frame/tile, or several times per frame/tile when doing render to texture ops and/or blur, DoF etc... Hopefully that won't stall the rendering pipeline (as it likely won't have to access eDRAM on every cycle when pixel shading and such), but it could be another slight performance issue early on in the console's life. 3rd generation x360 titles probably won't be hindered much at all by stuff like this I'm sure. :D
 
Guden Oden said:
I'm a little wary however of the assumption that xenos eDRAM automatically should have "enough" bandwidth to do 4xAA with no performance hit. DRAM is not 100% efficient, depending on how accesses are done to it, it can either run very fast, or it can run like shit.
How do you think current PC GPUs with one eighth or sixteenth of the bandwidth of EDRAM do 4xAA?

With compression you can save bandwidth.

With pipelining and access to pixels in groups of 4 (or more) you reduce the latency hit per pixel.
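A toy model of the two savings just mentioned. The compression scheme here ("all samples equal, store one colour") is a simplified stand-in, since real GPU colour compression formats are not documented in this thread, and the 40-cycle latency is a hypothetical number. Interior pixels of a triangle have identical MSAA samples, so only edge pixels cost full storage; and a fixed latency paid once per 2x2 quad is shared by four pixels.

```python
def msaa_colour_bytes(pixels, bytes_per_sample=4):
    """Compare raw vs. simply-compressed storage for MSAA colour samples.

    pixels: one list of sample colours per pixel (4 entries for 4xAA).
    Toy scheme: a pixel whose samples are all identical (fully inside one
    triangle) stores a single colour; an edge pixel stores every sample.
    Flag/metadata bits are ignored. Returns (raw_bytes, compressed_bytes).
    """
    raw = sum(len(samples) for samples in pixels) * bytes_per_sample
    compressed = sum(
        bytes_per_sample if len(set(samples)) == 1 else len(samples) * bytes_per_sample
        for samples in pixels
    )
    return raw, compressed


def latency_per_pixel(access_latency_cycles, pixels_per_access):
    """A fixed access latency paid once per group of pixels is amortised
    across the whole group."""
    return access_latency_cycles / pixels_per_access


RED, BLUE = 0xFF0000, 0x0000FF
quad = [[RED] * 4, [RED] * 4, [RED] * 4, [RED, RED, BLUE, BLUE]]  # one edge pixel
print(msaa_colour_bytes(quad))   # (64, 28)
print(latency_per_pixel(40, 4))  # 10.0
```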

Random access speed is typically terrible, while linear bursts can run very quickly. Unless there are some kind of SRAM write buffers to hide first-access latencies, refresh cycles and page break penalties and such, chances are on-die bandwidth could be significantly less than theoretical maximum, leading to much higher performance hit than 5% when doing 4xAA...
The performance hit is due to tiling (framebuffer copy + tiled-geometry re-processing overhead).
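A back-of-the-envelope sketch of when tiling kicks in at 720p, assuming 10 MiB of eDRAM and 32-bit colour plus 32-bit depth/stencil stored per sample (assumptions, not figures from this thread). It gives the minimum tile count only; real tile shapes may be constrained further.

```python
import math

EDRAM_BYTES = 10 * 1024 * 1024  # assumed: 10 MiB of eDRAM on the daughter die


def framebuffer_bytes(width, height, samples, colour_bytes=4, depth_bytes=4):
    """Bytes for colour + depth/stencil, stored per sample (assumed 32 bits each)."""
    return width * height * samples * (colour_bytes + depth_bytes)


def tiles_needed(width, height, samples):
    """Minimum number of screen tiles so each tile's samples fit in eDRAM."""
    return math.ceil(framebuffer_bytes(width, height, samples) / EDRAM_BYTES)


for aa in (1, 2, 4):
    size = framebuffer_bytes(1280, 720, aa)
    print(f"720p {aa}xAA: {size / 2**20:.2f} MiB -> {tiles_needed(1280, 720, aa)} tile(s)")
```

Under these assumptions, 720p fits in one tile without AA but needs two tiles at 2xAA and three at 4xAA. Each extra tile adds a framebuffer copy out of eDRAM plus re-processing of any geometry that crosses a tile boundary, the two costs named above.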

Also, as eDRAM appears to be single-ported (or else MS and ATi would have stated higher bandwidth figures), it means xenos will be locked out when doing framebuffer read/writes to main memory. This will happen after every completed frame/tile, or several times per frame/tile when doing render to texture ops and/or blur, DoF etc... Hopefully that won't stall the rendering pipeline (as it likely won't have to access eDRAM on every cycle when pixel shading and such), but it could be another slight performance issue early on in the console's life. 3rd generation x360 titles probably won't be hindered much at all by stuff like this I'm sure. :D
This particular "stall" is for about 2-3% of frame render time, total.

Supposedly Xenos can render to Z even while it is reading out the backbuffer/render-target colour data. So when tiling, which requires a Z-pre-pass, Z can be written while colour is being read.

Jawed
 
Animation

blakjedi said:
Forget tiling blah blah blah... why does the X360 animation in nearly every title except Kameo suck? As a day one purchaser, I am concerned.


I am not sure about the final frame rate, but some aspects of Elder Scrolls 4 have good animation.
 
Titanio said:
On Xenos? I got the impression from Dave's article that it wasn't using colour/z compression.
The connection twixt parent and daughter utilises compressed data. The daughter die, itself, works entirely with uncompressed data.

Jawed
 
Jawed said:
The performance hit is due to tiling (framebuffer copy + tiled-geometry re-processing overhead).

Jawed

Do you have a sense of the performance penalty going from "no tiles" to "two tiles"?

What are your thoughts on using AA with these other graphic effects like DoF/motion blur? Are they vying for the same rendering resources or are they each happening on "different parts of the chip"?

J
 
expletive said:
Do you have a sense of the performance penalty going from "no tiles" to "two tiles"?
No, not in any specific way I'm afraid. All we have is ATI's word about the "effectively free AA". And I'm no dev, so no useful insider info...

As discussed already in this thread, there's the effect of triangle size/count on the number of triangles affected by tile boundaries. It's really hard to predict what'll happen over the lifetime of XB360.

For example, with in-GPU tessellation (e.g. using viewport LOD to determine the degree of tessellation, so that near objects are more finely drawn than distant objects), any back-of-the-envelope calculations about the count of triangles straddling tiles go right out the window. Do you count un-tessellated triangles or post-tessellated triangles?

I guess that z-pre-pass is best performed on post-tessellated triangles. If so, that basically means that distant objects (say there's a thousand trolls, for the sake of argument) will be low poly, so even if a few hundred trolls straddle the two tiles (say) the ability of the GPU to perform adaptive tessellation makes the tiling overhead on those distant trolls pretty minor. If a troll is 10,000 polys close-up but 500 in the distance you can see that the GPU isn't going to struggle.
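The troll example can be made concrete with a small counting sketch. Everything here is an illustrative assumption: 720p split into two horizontal tiles at y = 360, a troll modelled as a simple grid mesh, and a distant troll only 40 pixels tall. It counts the triangles that fall into both tiles and so get processed twice.

```python
def grid_mesh(y0, height, rows, cols):
    """Vertical extents (ymin, ymax) of the triangles of a simple grid mesh:
    `rows` bands of `cols` quads, each quad split into two triangles."""
    spans = []
    for r in range(rows):
        ymin = y0 + r * height / rows
        ymax = y0 + (r + 1) * height / rows
        spans.extend([(ymin, ymax)] * (2 * cols))
    return spans


def straddling(spans, tile_height):
    """Count triangles whose vertical extent falls in more than one
    horizontal tile (these get processed once per tile they touch)."""
    return sum(1 for ymin, ymax in spans
               if int(ymin // tile_height) != int(ymax // tile_height))


TILE_HEIGHT = 360  # assumed: 720p split into two horizontal tiles

# A distant troll 40 pixels tall, sitting across the tile boundary at y = 360.
full_detail = grid_mesh(341, 40, 50, 100)  # 10,000 triangles
low_lod = grid_mesh(341, 40, 10, 25)       # 500 triangles
print(len(full_detail), straddling(full_detail, TILE_HEIGHT))  # 10000 200
print(len(low_lod), straddling(low_lod, TILE_HEIGHT))          # 500 50
```

With a few hundred trolls sitting on the boundary, the difference between 200 and 50 re-processed triangles per troll adds up, which is the sense in which LOD keeps the tiling overhead on distant objects minor.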

It's also worth remembering that when Xenos performs a z-pre-pass, it's doing so with all 48 pipelines, running at least 5x faster than a conventional GPU can render the z-pre-pass; that time saved is traded off against the overhead of multiple tiling. Not to mention that the fully-populated z buffer means all pixel shading runs way faster due to a huge reduction in overdraw.
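A minimal sketch of the overdraw saving from a z-pre-pass, looking at the fragments that cover a single pixel in draw order (the depth values are illustrative). Without a pre-pass, every fragment nearer than what has been drawn so far runs the expensive pixel shader; with a depth-only pre-pass followed by an "equal" depth test, only the final visible fragment is shaded.

```python
def shaded_without_prepass(depths):
    """Fragments covering one pixel, in draw order. The expensive shader runs
    for every fragment that is nearer than anything drawn so far."""
    nearest = float("inf")
    shaded = 0
    for d in depths:
        if d < nearest:
            nearest = d
            shaded += 1
    return shaded


def shaded_with_prepass(depths):
    """Pass 1 writes depth only (cheap, no pixel shading). Pass 2 uses an
    'equal' depth test, so only fragments at the final nearest depth shade."""
    if not depths:
        return 0
    nearest = min(depths)
    return sum(1 for d in depths if d == nearest)


back_to_front = [5.0, 4.0, 3.0, 2.0, 1.0]  # worst case: every layer overwrites
print(shaded_without_prepass(back_to_front), shaded_with_prepass(back_to_front))  # 5 1
```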

What are your thoughts on using AA with these other graphic effects like DoF/motion blur? Are they vying for the same rendering resources or are they each happening on "different parts of the chip"?
My understanding of both DoF and motion-blur techniques is that they're performed solely as pixel-shader routines.

DoF seems to be done by calculating a low-res distance-from-focal-plane map: based on the frame's z-data, you save each pixel's distance from the focal plane as a texture. Then you blur the entire frame into another texture. Finally, you blend the blurred texture with the original frame using the "blur factor" stored in the first texture.
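The three steps above can be sketched on a single row of pixels. This is a simplification under stated assumptions: the blur factor is computed per pixel at full resolution rather than in a low-res texture, it ramps linearly from 0 at the focal plane to 1 at `focal_range` away (a made-up falloff), and the blur is a plain 3-tap box filter.

```python
def box_blur(row, radius=1):
    """Blur a row of pixel intensities by averaging each pixel's neighbourhood."""
    out = []
    for i in range(len(row)):
        window = row[max(0, i - radius): i + radius + 1]
        out.append(sum(window) / len(window))
    return out


def blur_factor(depth, focal_depth, focal_range):
    """0 on the focal plane, rising linearly to 1 at focal_range away from it."""
    return min(1.0, abs(depth - focal_depth) / focal_range)


def depth_of_field(colour, depth, focal_depth, focal_range):
    """Blend the sharp frame with a blurred copy, per pixel, by blur factor."""
    blurred = box_blur(colour)
    out = []
    for c, b, z in zip(colour, blurred, depth):
        k = blur_factor(z, focal_depth, focal_range)
        out.append((1.0 - k) * c + k * b)
    return out


colour = [0.0, 1.0, 0.0, 1.0, 0.0]
depth = [10.0, 10.0, 10.0, 30.0, 30.0]  # last two pixels are far behind the focal plane
print(depth_of_field(colour, depth, focal_depth=10.0, focal_range=10.0))
```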

Motion blur takes one or more frames (saved as textures) and blends them with the frame being drawn - I'm sure there'll be howls of protest at that simplification.

So, both techniques are pretty much independent of tiling - since tiling doesn't really make any difference to the number of pixels that are drawn.

In terms of AA, purely in its own right, it's always a "free" aspect of the frame-render (or render to a texture). As long as it's technically possible (i.e. the render uses a data format that's compatible with AA) then it's available.

The issue that ERP was alluding to earlier in the thread appears to be a basic "engine" issue. If the developer designed an engine capable of DoF and motion blur but did so before the implications of "tiled AA" were apparent (i.e. the requirement to perform a z-pre-pass), then the work involved in "re-sequencing" the phases required to render a frame may be too late in the day for early games.
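One plausible ordering of a tiled, anti-aliased frame, sketched to show why the phases have to be "re-sequenced". The pass names are hypothetical labels, not any real API: the z-pre-pass comes first because tiling requires it, each tile is rendered and then copied (resolved) out of eDRAM, and post effects such as DoF and motion blur can only start once the whole resolved frame is available as a texture.

```python
def frame_passes(num_tiles, post_effects=("depth_of_field", "motion_blur")):
    """Hypothetical order of passes for one tiled, anti-aliased frame."""
    passes = ["z_prepass"]  # depth only; needed up front when tiling
    for t in range(num_tiles):
        passes.append(f"render_tile_{t}")   # colour + AA samples for this tile in eDRAM
        passes.append(f"resolve_tile_{t}")  # copy the resolved tile out to main memory
    # Post effects sample the fully resolved frame as a texture, so they can
    # only start once every tile has been copied out.
    passes.extend(post_effects)
    return passes


print(frame_passes(2))
```

An engine built around a single untiled pass with post effects interleaved would need its frame restructured into this kind of order, which is the kind of late change that may not fit a launch schedule.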

Well, that's how I read it, anyway.

Jawed
 

That was a great answer, thanks!

J
 
Nemo80 said:
Well, in a dev interview Bizarre said they will be using 2xAA at 720p only, and 4xAA at the lower 640x480 resolution. So I guess there is some serious performance hit (both at 30 fps!).
You are assuming that Xenos is the bottleneck at both resolutions. It is quite likely that the penalty for 4xMSAA at 720p could be an issue and thus 2xMSAA was used; yet at 480p 4xMSAA is used, but 60fps is not reached due to bottlenecks elsewhere in the engine. PGR3 does extensive modeling of car damage, physics, and other tasks. As we have heard from many developers, a lot of launch games are using 1 CPU. Further, the delayed beta kits (which were not received until Aug.) have left minimal time to optimize the code for the unique closed system, especially when transitioning from two large OOO CPUs to a tricore in-order CPU.

The fact that they are running at 2xMSAA at 720p indicates that the hit cannot be too bad at 480p, which has 1/3 of the total pixels. So your assumption that the framerate issue must be Xenos/tile related is premature.
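The pixel arithmetic behind that argument, as a quick check. Counting samples rather than pixels makes the point sharper: 480p at 4xAA produces only two-thirds as many samples as 720p at 2xAA. Sample count is only one part of the cost (per-pixel shading and tiling overheads also matter), so this supports the argument rather than proving it.

```python
def samples(width, height, aa):
    """Total MSAA samples the GPU must produce for one frame."""
    return width * height * aa


p720 = 1280 * 720  # 921,600 pixels
p480 = 640 * 480   # 307,200 pixels
print(p720 / p480)                                  # 3.0
print(samples(1280, 720, 2), samples(640, 480, 4))  # 1843200 1228800
```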
 
Acert93 said:
a lot of launch games are using 1 CPU.
I would be surprised if any of them are using more than one thread. It seems like Microsoft is cutting it really close with their launch.
 
Dr. Nick said:
I would be surprised if any of them are using more than one thread.
Infinity Ward claims to be using all three cores with Call of Duty 2, and that game will probably make it for launch. Itagaki has made references to multithreading DOA4, though I don't know for sure if that's in the final game. And Visual Concepts has talked about the power of multithreading on the X360 with NBA 2k6, though I'm not sure if they flat-out said they were running on more than one thread.

There may be more, but those are the ones that come to mind.
 
Titanio said:
The logic can do a certain amount obviously, but I'm sure you could push it to the point of degrading performance - obviously there are limits. What we're wondering about here doesn't directly relate to the processing or bandwidth on the daughter die, though, but what happens when you need to start tiling. Or perhaps a combination of all of the above? (Although I can't see bw ever being an issue at least).

The reason for the suspicion about the lower resolution is that the res we've seen some shots at would allow the framebuffer to fit very neatly into the eDram without any tiling, and with 2xAA as required by MS (it comes to 9MB). It may all just be coincidence, but it seems a little odd. I'm trying to find out if 720p is the minimum native res, but to little avail so far...
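The post doesn't say which resolution the shots were at, so rather than guess, here is a sketch of checking whether a given resolution fits in eDRAM without tiling at 2xAA. It assumes 10 MiB of eDRAM and 8 bytes per sample (32-bit colour plus 32-bit depth/stencil), and searches exact 16:9 sizes of the form 16k x 9k.

```python
EDRAM_BYTES = 10 * 1024 * 1024  # assumed: 10 MiB eDRAM
BYTES_PER_SAMPLE = 4 + 4        # assumed: 32-bit colour + 32-bit depth/stencil


def fits_in_one_tile(width, height, aa):
    """True if the whole AA framebuffer fits in eDRAM without tiling."""
    return width * height * aa * BYTES_PER_SAMPLE <= EDRAM_BYTES


def largest_16x9_single_tile(aa):
    """Largest exact 16:9 resolution (16k x 9k) whose AA framebuffer fits in eDRAM."""
    k = 1
    while fits_in_one_tile(16 * (k + 1), 9 * (k + 1), aa):
        k += 1
    return 16 * k, 9 * k


print(fits_in_one_tile(1280, 720, 2))  # False: 720p with 2xAA needs tiling
print(largest_16x9_single_tile(2))     # (1072, 603)
```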

From an MGS rep on the TeamXbox forums, when asked about X360 being able to upconvert 720p to 1080i:


"Yes, yes it does. The breakdown goes like this: all Xbox 360 games are natively rendered for 720p and/or 1080i resolution. On top of that, Xbox 360 will run those games in HD on TVs supporting 480p, 720p and/or 1080i—you just set up your Xbox 360 for the setting which you prefer for your TV.

So there are lots of HDTVs out there which run in 480p and 1080i but not 720p, like mine at home, gar. The good news for those of us with these kinds of TVs is that Xbox 360 games can use either 480p or 1080i to look as awesome as possible on those TVs."


J
 
Someone on some presentation said to someone other who said on some forum( :D ) that 30 fpr PGR3 works on 1 core only. But do not take my word as a truth.
 
Lysander said:
Someone on some presentation said to someone other who said on some forum( :D ) that 30 fpr PGR3 works on 1 core only. But do not take my word as a truth.

I don't think you will have to worry about anyone taking your word as truth. :)
 
Lysander said:
Someone on some presentation said to someone other who said on some forum( :D ) that 30 fpr PGR3 works on 1 core only. But do not take my word as a truth.

Only 30 frames per race? Damn.








/coat
 
Wait for alpha kits, wait for beta kits, wait for final hardware, wait, no, it uses only 1 core,
wait for E3... why all this nonsense?
Call a pig a pig.
PGR will be 30fps and it will probably look gorgeous.
 
The funny thing is, I didn't see a single site that dared to put down a guess about the current (X05) framerate of PGR... So either it is 60 now, or the blur did a fine job after all.

Or did I miss anything?
 
Lysander said:
Someone on some presentation said to someone other who said on some forum( :D ) that 30 fpr PGR3 works on 1 core only. But do not take my word as a truth.

I'm sensing some sarcasm here so let me clarify. :)

There is a thread open on the TXB forums where 3 reps from MGS are fielding questions from users. That being said, I have no idea how knowledgeable they are or how accurate their information is, so take it all with a grain of salt. I hoped such a disclaimer was implied when I stated the source the first time around. :)

Here's the thread if any of you are interested:

http://forum.teamxbox.com/showthread.php?t=377341

J
 