Balancing Work between Xenon & Xenos

Highly unlikely.

Honestly, we are not sure if Bungie will use it or not. Let's see...

Has he said it's in-game? As far as I'm aware, he's said it's not CG, which is not the same thing!

Geoff has said that it is in-game, since Bungie is going to show off the whole opening sequence of the game...

I'm not suggesting it. Devs are (Barbarian, joker454, nAo, Carmack, et al).

Carmack also stated that he doesn't like the Cell processor either. Oh well, it doesn't stop us from enjoying the games on the Xbox 360 and PS3, though.
 
Honestly, we are not sure if Bungie will use it or not. Let's see...

There are a lot more performance considerations compared to a static tessellated mesh, let alone the complexity introduced during content creation...

Carmack also stated that he doesn't like the Cell processor either.

Regardless, the fact of the matter is that per-thread performance is a joke, with lots of "gotchas" around Load-Hit-Store and the lack of out-of-order execution. Xenon's strength was its 6 threads in a reasonable die size, circa 2005.

Oh well, it doesn't stop us from enjoying the games on the Xbox 360 and PS3, though.
This is the tech side of the forum, so please...
 
How the Xbox GPU can help the Xbox CPU is a much more interesting question :)
Well, we used it to do the reverse DCT in H.264 decoding. Of course, if we hadn't, the shaders would have been sitting there idle.
MEMEXPORT is your friend.
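For anyone curious what that looks like in practice, here is a rough, hypothetical sketch of the idea (not the actual shipped code): one pass of the separable 4x4 H.264 inverse core transform written as an ordinary pixel shader over a texture of dequantized coefficients. The coefficient layout and all names are my assumptions, and a real implementation like the one described above would stream results out with MEMEXPORT rather than writing to a render target.

```hlsl
// Hypothetical sketch: row pass of the separable 4x4 H.264 inverse
// core transform as a full-screen pixel shader. Texture layout, names,
// and the render-target output are illustrative assumptions.

sampler2D CoeffTex;   // dequantized transform coefficients, one per texel
float2    TexelSize;  // 1.0 / coefficient texture dimensions

float4 IDCTRowPassPS(float2 uv : TEXCOORD0) : COLOR0
{
    // This texel's column index within its 4-wide block row.
    float col = fmod(floor(uv.x / TexelSize.x), 4.0);

    // UV of the first coefficient of this block row.
    float2 base = uv - float2(col * TexelSize.x, 0.0);

    float d0 = tex2D(CoeffTex, base).r;
    float d1 = tex2D(CoeffTex, base + float2(TexelSize.x, 0)).r;
    float d2 = tex2D(CoeffTex, base + float2(2 * TexelSize.x, 0)).r;
    float d3 = tex2D(CoeffTex, base + float2(3 * TexelSize.x, 0)).r;

    // H.264 inverse-transform butterfly; the spec's >>1 becomes *0.5.
    float e0 = d0 + d2;
    float e1 = d0 - d2;
    float e2 = d1 * 0.5 - d3;
    float e3 = d1 + d3 * 0.5;
    float4 f = float4(e0 + e3, e1 + e2, e1 - e2, e0 - e3);

    // Pick this texel's output by its position in the row; a column
    // pass over the result completes the 2D transform.
    float r = (col < 0.5) ? f.x : (col < 1.5) ? f.y : (col < 2.5) ? f.z : f.w;
    return float4(r, 0, 0, 0);
}
```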
 
There are a lot more performance considerations compared to a static tessellated mesh, let alone the complexity introduced during content creation...



Regardless, the fact of the matter is that per-thread performance is a joke, with lots of "gotchas" around Load-Hit-Store and the lack of out-of-order execution. Xenon's strength was its 6 threads in a reasonable die size, circa 2005.

This is the tech side of the forum, so please...

Very good point indeed....
 
With post-processing, you're sampling the image, which takes some number of cycles (texture fetch/lookup/sample). The ALUs are fully orthogonal to the texture processors/addressing units and can do some operations in parallel (especially if the post-process is light on ALU usage).

e.g. Gears 2 does some fog calculations during SSAO IIRC.

edit: when they do the down-sample pass of AO.
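As a concrete illustration of the idea (a minimal sketch, not Gears 2's actual shader; the fog model and all names are assumptions): a half-resolution downsample pass whose otherwise idle ALU slots compute a cheap depth fog factor while the texture units are busy with the filter taps.

```hlsl
// Hypothetical sketch: a 4-tap downsample whose ALU slack is spent on
// an exponential depth-fog factor, overlapping the texture fetches.

sampler2D SceneTex;    // full-res scene colour
sampler2D DepthTex;    // linear depth, resolved to a texture
float2    TexelSize;   // one full-res texel
float     FogDensity;  // assumed fog parameter

float4 DownsampleWithFogPS(float2 uv : TEXCOORD0) : COLOR0
{
    // Four fetches keep the texture units busy...
    float3 c = tex2D(SceneTex, uv + float2(-0.5, -0.5) * TexelSize).rgb
             + tex2D(SceneTex, uv + float2( 0.5, -0.5) * TexelSize).rgb
             + tex2D(SceneTex, uv + float2(-0.5,  0.5) * TexelSize).rgb
             + tex2D(SceneTex, uv + float2( 0.5,  0.5) * TexelSize).rgb;
    c *= 0.25;

    // ...while the ALUs compute the fog factor "for free".
    float depth = tex2D(DepthTex, uv).r;
    float fog   = exp2(-FogDensity * depth);

    return float4(c, fog); // fog factor packed into alpha (an assumption)
}
```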
First of all, thanks for your answer. I've been unexpectedly busy at work this past week, so my reply is a bit late.

Please don't read the text above (it's not finished, and it's not clear it ever will be...).
The issue is still a bit unclear to me, but my question was most likely not precise enough.
I've tried to come up with something other than a string of questions, but it wasn't any clearer...
*First, I think I have a misunderstanding about when post-processing happens. One would answer "after the processing is done" :LOL: but what do we call processing, though?
My understanding is that post-processing happens after blending and the MSAA resolve have happened, is that right?
*** About the data: after resolve and blending, the previous data (Z-buffer, colours, alpha) present in EDRAM is lost, right? Is only the resolved frame buffer left, or does the data stay in EDRAM with the results going straight to main RAM?
*If you don't do post-processing you copy the frame buffer to RAM, and actually you do the same if you want to do post-processing, as Xenos can't read from EDRAM. How long does that take?
*During post-processing the GPU has to access the frame buffer, which is in RAM. The only way for the GPU to touch RAM is through texture fetches, right? That takes some time, so until the operation is complete we can consider ALU cycles as free; that's the idea, right?
*At what rate can Xenos fetch textures? So usually, how long does it take to fetch the whole framebuffer into Xenos? I remember reading a few ms.

*** About the calculations: that's where I really get into trouble / get lost, as I don't understand what is happening. Below are random ideas about what could be happening (they might not make any sense...)
1) You sample the frame buffer as a texture; what effects would be achieved this way?

2) You want to make some change through shaders. Say you use your framebuffer to texture 1280x720 pixels, then run your calculations; various questions here.
Can the GPU swallow that without an alpha value, or do you have to inject/set an alpha value?
Can the GPU swallow that without a Z value, or do you have to create a matching "flat" Z-buffer?
Could you memexport the colour buffer, or do you have to resolve/blend first (which would make it useless)?

3) You send the whole frame buffer to EDRAM as a colour buffer, set a matching alpha accordingly, compute something else during the time it takes to move the framebuffer back, then the GPU blends it all together and sends it back to RAM. (Like in KZ2, where a render target produced on the SPUs is blended with the framebuffer.)

I'm lost about what is really happening and the flexibility Xenos provides, but as I try to figure it out there is one thing I do understand: not being able to read from your render target(s) is a pain in the ass.
Actually, I've come to the conclusion that post-processing as a name is misleading. It's not "after processing/rendering" but more "do what you could not afford to do at an earlier point of rendering".
As I try to figure this out it becomes clear that developers' hands are more tied than I thought.
I don't know how depth of field is handled/faked in video games, but to do it properly you would need per-pixel Z values.
 
liolio said:
*First, I think I have a misunderstanding about when post-processing happens. One would answer "after the processing is done" :LOL: but what do we call processing, though?
My understanding is that post-processing happens after blending and the MSAA resolve have happened, is that right?

Yes, that is correct. You render the scene in some manner, where the results are eventually written to a render target (directly or not). That is resolved to main memory, which includes the MSAA resolve if needed.

You then render (usually) full-screen effects to the back buffer (or to other intermediate buffers). These effects may be simple things like colour filters, or more complex effects like bloom (which is multipass, requiring extra render targets and passes). Some of these effects may still require depth information from the rendering pass, e.g. an explosion warping effect.

After that, you then usually render the UI on top of everything else.

liolio said:
*** About the data: after resolve and blending, the previous data (Z-buffer, colours, alpha) present in EDRAM is lost, right? Is only the resolved frame buffer left, or does the data stay in EDRAM with the results going straight to main RAM?

Yes, the data in EDRAM either gets resolved or it doesn't. You can only access it as a texture if it gets resolved.

liolio said:
*If you don't do post-processing you copy the frame buffer to RAM, and actually you do the same if you want to do post-processing, as Xenos can't read from EDRAM. How long does that take?

As it's basically a memory copy, it's subject to the limitations of system bandwidth. You can get a rough idea by running the numbers.
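For a rough sense of scale (my numbers, not from the post above): a 1280x720 target at 32 bits per pixel is about 3.7 MB, so against the 360's 22.4 GB/s of main memory bandwidth, moving it once is on the order of 0.15-0.2 ms, before overheads and before anything else competes for the bus.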

liolio said:
*During post-processing the GPU has to access the frame buffer, which is in RAM. The only way for the GPU to touch RAM is through texture fetches, right? That takes some time, so until the operation is complete we can consider ALU cycles as free; that's the idea, right?

Yes, and this is why a GPU has a scheduler. It will keep as many 'threads' (which might be pixel quads) in flight as it can. The number of threads is usually limited by the number of temporary registers a shader uses (there's only so much space to store them). When a texture fetch stalls because the data isn't in the texture cache, the GPU will put that thread aside and try to replace it with one which hopefully needs data that is in the cache.
When you become fetch-limited, it basically means these threads are all stalled, waiting on fetches from main memory. At that point, you could be doing ALU work for 'free', as the ALUs would otherwise be idle.
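To put toy numbers on that (purely illustrative, not real Xenos figures): if the register file can hold 1024 temporaries and a shader needs 8 of them, roughly 128 threads can be resident to hide fetch latency; a shader needing 32 temporaries cuts that to 32 threads, giving the scheduler far fewer candidates to swap in while fetches are outstanding.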

This is one of the gotchas with the Xbox: its texture cache isn't huge, and there is a *big* penalty for missing the cache (which is hopefully hidden by other threads). So if your shader is jumping around like mad, sampling all over the place, performance can simply implode. The Xbox has a few tricks to help out, such as the ability to specify that a texture is tiled.

liolio said:
*At what rate can Xenos fetch textures? So usually, how long does it take to fetch the whole framebuffer into Xenos? I remember reading a few ms.

Once again, it's a bandwidth issue. A simple operation (such as drawing a texture to the entire screen) should get very close to the theoretical performance limits.

liolio said:
1) You sample the frame buffer as a texture; what effects would be achieved this way?

2) You want to make some change through shaders. Say you use your framebuffer to texture 1280x720 pixels, then run your calculations; various questions here.
Can the GPU swallow that without an alpha value, or do you have to inject/set an alpha value?
Can the GPU swallow that without a Z value, or do you have to create a matching "flat" Z-buffer?
Could you memexport the colour buffer, or do you have to resolve/blend first (which would make it useless)?

3) You send the whole frame buffer to EDRAM as a colour buffer, set a matching alpha accordingly, compute something else during the time it takes to move the framebuffer back, then the GPU blends it all together and sends it back to RAM. (Like in KZ2, where a render target produced on the SPUs is blended with the framebuffer.)

1) I think you misunderstand. The frame buffer should be considered a special-case texture. Otherwise, all rendering is done to a render target (i.e. a texture that can be drawn to). So it's more a case of 'render to a texture, draw to the back buffer with FX' than 'render to the back buffer, copy the back buffer to a texture, draw the texture to the back buffer with FX'. There is no difference in resources; the latter just has a redundant copy. (Or perhaps I misunderstand the question.) There's a small sketch of this flow after (3) below.

2) Pretty much all texture fetches will fetch the entire value for the given texel in the texture, except for some bizarro formats which no one uses. So if you wrote an RGBA texture, you will sample an RGBA value; whether you use the alpha value is up to you. The hardware is set up to fetch in certain sizes, such as 32 bits (8-bit RGBA), etc.
Z-buffers are separate from colour buffers. However, some chips can read Z-buffers as if they were textures (most Z-buffers are 24-bit FP with the remaining 8 bits for stencil masking).

3) I'm sure you could memexport the colour buffer, and in fact I believe Deano does exactly this in Brink :) But I'd expect you'd only do it in very special cases with very special requirements; say you had some extra info that you only wanted for every 4th pixel (or something like that), on top of the normal colour output.
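To make (1) concrete, here is a minimal sketch of the 'render to a texture, draw to the back buffer with FX' flow, assuming the scene has already been resolved to a texture; the names and the particular filter are illustrative, not from any shipped game.

```hlsl
// Hypothetical full-screen pass: sample the resolved scene and apply
// a cheap luminance-based tint as the post-process "FX".

sampler2D SceneTex;  // the resolved render target, bound as a texture
float3    Tint;      // e.g. a sepia or night-vision colour

float4 ColourFilterPS(float2 uv : TEXCOORD0) : COLOR0
{
    float3 c = tex2D(SceneTex, uv).rgb;

    // Desaturate via a standard luma weighting, then re-colour.
    float luma = dot(c, float3(0.299, 0.587, 0.114));
    return float4(luma * Tint, 1.0);
}
```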

liolio said:
I'm lost about what is really happening and the flexibility Xenos provides, but as I try to figure it out there is one thing I do understand: not being able to read from your render target(s) is a pain in the ass.
Actually, I've come to the conclusion that post-processing as a name is misleading. It's not "after processing/rendering" but more "do what you could not afford to do at an earlier point of rendering".
As I try to figure this out it becomes clear that developers' hands are more tied than I thought.
I don't know how depth of field is handled/faked in video games, but to do it properly you would need per-pixel Z values.

It's pretty much not possible to read from the render target currently being written. It's one of those things that would be quite useful, but GPU architectures make it very impractical (this is my understanding, at least); a random read would be undefined behaviour.

Post processing can usually best be thought of as operations that occur in screen space. Applying bloom, for instance, without post processing would be close to impossible (or at least impractically slow).
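For instance, the first step of a typical bloom chain is just a screen-space threshold pass into a smaller render target, followed by blur passes and an additive composite. A hedged sketch (the threshold scheme and names are assumptions):

```hlsl
// Hypothetical bright-pass: keep only the energy above a threshold;
// subsequent passes blur this and add it back over the scene.

sampler2D SceneTex;
float     Threshold;  // e.g. ~0.8 for an LDR scene

float4 BrightPassPS(float2 uv : TEXCOORD0) : COLOR0
{
    float3 c = tex2D(SceneTex, uv).rgb;
    return float4(max(c - Threshold, 0.0), 1.0);
}
```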

Yes, basically. Developers use what they have, and they cheat the system as best they can to get away with what they can. Limitations are everywhere, it's a battle.

And yes, for DOF you need a depth value. The common way to do this is to take your scene output, copy it to a smaller texture (say, 4x smaller) and blur it once or twice (which takes two passes each time, with an intermediate render target). You then have the original, sharp rendered image and a smaller blurred version. The DOF is then simulated by interpolating between the two based on depth.
This is fast, but it has some pretty nasty accuracy issues, such as fringing. For most games, though, it's 'good enough'.
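A minimal sketch of that composite step (the focus parameters and names are my assumptions, not a shipped implementation):

```hlsl
// Hypothetical DOF composite: lerp between the sharp full-res scene
// and a pre-blurred, upsampled copy, driven by per-pixel depth.

sampler2D SharpTex;    // full-res scene
sampler2D BlurTex;     // downsampled + blurred copy
sampler2D DepthTex;    // linear depth
float     FocusDepth;  // depth that stays sharp
float     FocusRange;  // how fast blur ramps in around it

float4 DofCompositePS(float2 uv : TEXCOORD0) : COLOR0
{
    float3 sharp = tex2D(SharpTex, uv).rgb;
    float3 blur  = tex2D(BlurTex,  uv).rgb;
    float  depth = tex2D(DepthTex, uv).r;

    // 0 = in focus, 1 = fully blurred. This simple ramp is exactly
    // where the fringing mentioned above comes from: at depth edges,
    // sharp and blurred samples bleed into each other.
    float blurAmount = saturate(abs(depth - FocusDepth) / FocusRange);

    return float4(lerp(sharp, blur, blurAmount), 1.0);
}
```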
 
Graham, thanks a lot for a highly informative answer; I'm quite happy you took a peek at my post.
Things are a lot clearer to me now. Once again, many thanks :)
 
You then have the original, sharp rendered image and a smaller blurred version. The DOF is then simulated by interpolating between the two based on depth.
This is fast, but it has some pretty nasty accuracy issues, such as fringing. For most games, though, it's 'good enough'.
Interesting aside:

I once did a size reduction on a picture I'd previously saved away on my hard drive, for some reason I can't remember. Then I was flipping through these pictures one day using the built-in image viewer in Windows XP and I hit the miniature version first. The viewer automatically blew up the miniature, creating a bilinearly filtered mess of a picture. I then flipped to the next image, which was the original. I then experienced what looked like a crossfade between the two images; it wasn't a sharp transition! It actually looked like a proper wipe effect... Very strange.

I flipped back and forth between the two pics a number of times and the effect remained every time. Must have been some image-persistence effect of the brain that created it. :D

Anyway, great post man. Thanks a million for your explanations. Very interesting and intriguing stuff you're posting!
 
Physics on GPU/Xenos

Hi all,

I did a little search on the matter, only to find that I'd better pass on resurrecting threads more than 3 years old :)

R. Huddy, in a recent interview, mentioned that some of the Bullet Physics libraries should be ATI/GPU accelerated this year.

Do you think this outcome could have some impact on the 360 development in some years?

EDIT
Actually, I wonder if this could simply be added to the thread "Using Xcpu to help graphics?"
And possibly a renaming of the aforementioned thread could be an option; how about:
"Xenon, Xenos: who's to help the other?"
 
GPU-accelerated Bullet Physics would be good news, if the GPU usage is low enough and if it doesn't cause much additional latency. Currently in Trials HD we are using a whole CPU core (one of three) to run the Bullet physics simulation. However, we plan on moving other stuff to the GPU in our next project, so I doubt we could afford to use more than 2 ms of GPU time per frame for the physics simulation (the frame total is 16.6 ms when rendering at a locked 60 fps). And I doubt the physics simulation runs so well on the GPU that 2 ms would be enough to do work equal to 16.6 ms of a full CPU core (2 hardware threads).

Beyond the physics engine, Trials HD does not tax the CPU cores that much, so we would not gain anything from having one core free. For games that are CPU limited, GPU-accelerated physics could be highly beneficial. But for most games (with a balanced CPU & GPU workload) the GPU is best used on graphics-related processing (where latency is not an issue). It's the same on PC and consoles: you should always use the processing unit that does the work best, and always optimize the data synchronization and data movement overheads to be as small as possible.
 
Regarding all the talk about tessellation, what do you think of this article?

Evergreen/DX11 tessellators vs XBox 360's
Similar but different

by Charlie Demerjian

September 15, 2009

ONE OF THE FEATURES of the upcoming ATI Evergreen family, also known as the 5-series, is a tessellator. While this might be old news to graphics card enthusiasts, this time it really is different, mainly because Microsoft is finally backing the technology.
ATI has been putting tessellators in its hardware for generations. The 2000 series was one of the first, but there have been bits and pieces like 3D normal map compression long before. Depending on how you count, Evergreen is either the fourth or fifth generation of the technology.
[Image: Tessellators make lots of triangles. Picture from AMD.]
Most people count ATI's 2xxx, 3xxx and 4xxx as its first three generations of graphics cards with tessellators, but they forget the most important one, the Xenos/R500 in the Xbox 360. Tessellation was roundly ignored on the PC side of game development because it wasn't ubiquitous and lacked standards. Now with the Xbox 360 and DX11 both having the technology, it should mean an easy port from the Xbox 360 to the PC and back, right?
Yes and no. We are told that the tessellator in the R770/4870 cards is a strict subset of the ones in the Evergreen cards. Given that the technology - going back to the R600 family, and even the Xbox 360's R500 - was done by the same company, it is likely that even those older versions are also fairly strict subsets of the succeeding generations. So on the functionality side, the answer is a clear yes, you can port the code with little change.
On the no side there is the small problem of how you access the tessellators. With DX11, Microsoft pulled, well, a Microsoft, and changed how things are done. The DX11 way of calling the tessellators is different from the non-DX11 way of doing things. In Microsoft's defense, the 'old way' was not a standard because it only existed in ATI cards and the 360, but you would think Microsoft would just use what was there.
So in the end, the functionality of Evergreen's tessellators is a superset of that of the older series graphics cards, but how you get to them is different. The code underneath should be very similar if not the same, and more critically, optimizations should be very similar too. That means the differences are very likely able to be patched over with a smart compiler.
With progress in gaming being gated, or more aptly stymied, by console capabilities, this is a welcome bit of progress. As long as Microsoft can convince developers to use the tessellator on the Xbox 360, it looks like the ones in DX11 will be used as well. That might possibly lift PC gaming from the stone age of "next gen" consoles to the bronze age. Raise your stone axe and howl at the moon for progress. S|A


http://www.semiaccurate.com/2009/09/15/evergreendx11-tessellators-vs-xbox-360s/
 
Richard Huddy is probably referring to more recent, and more flexible ATI GPUs.
Indeed, but I just think that some code should be portable ;)

Thanks, Sebbi, for the reply. So would one solution be to keep GPU physics to dealing with "cosmetic" effects?
 
Indeed, but I just think that some code should be portable ;)

It would be more likely if GPU-based Bullet calls were done through DirectCompute rather than OpenCL, as then it would depend on whether MS felt like implementing some subset of DirectCompute on the X360.

I'm not sure there'd be much work in porting a subset of OpenCL to the X360, though.

And this is in reference to possible GPU and CPU ports of OpenCL/DirectCompute, which I wouldn't exactly think too likely.

Although there would be some possible benefits: the OpenCL CPU version, for example, can automatically take advantage of multiple cores/threads.

Regards,
SB
 
I can think of over 30 million reasons to continue the model http://www.microsoft.com/presspass/press/2009/may09/05-28XboxGrowthPR.mspx rather than chasing the current vocal minority http://www.xbitlabs.com/news/video/..._Millionth_DirectX_11_Graphics_Processor.html (Nvidia has yet to release a DX11 part) ;)

Are you suggesting that MS should have gimped the spec in order to maintain parity with the Xbox 360? In that case they should have never bothered introducing the geometry shader either, since the 360 doesn't support that....
 
Are you suggesting that MS should have gimped the spec in order to maintain parity with the Xbox 360? In that case they should have never bothered introducing the geometry shader either, since the 360 doesn't support that....

They should have left an option to support those features for devices that have them.
 
There are no optional features in DX10/DX11.

And that is a problem. I think tessellation would have taken off faster if developers could tap into the large amount of DX9 hardware that supports it. I believe it's everything from the Radeon HD 2900 and up that supported it, along with the low-end versions of the 2000 series? That's a lot of cards, many more than DX11 has out right now.

They could have simply left a profile for DX9 tessellation that may have been a step down, but it would have gotten the ball rolling. Right now, and most likely for the next year or two, you're leaving out more hardware with some form of tessellation than there are DX11 parts in the market.

By having DX9 tessellation as a path you could have gotten more developers on the 360 using it, and thus more PC games using it, even if DX11's version is much better.
 