Xenon System Block Diagram

Panajev said:
1920x1080p back-buffer at 24 bpp ( no destination alpha ) and a full-screen Z-buffer at 16 bpp would mean 10,368,000 bytes or 9.88 MB, so I think 10 MB would suffice for them.
Except that no chip with native 32bit color from recent history could store 24bpp at a different alignment than 32bit, which means it eats up the same space as 32bpp. Even the GS works that way - and it had a lot of consideration put into this due to the same eDRAM limitations.
16bpp Z might be a bit tight for a resolution that high and with the sizes of games in the future, but I guess one could manage there. Though that means no stencil shadows :(
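For reference, a quick sanity check of those numbers in plain C++ (note the packed 3-bytes-per-pixel color is exactly the assumption questioned above):

#include <cstdio>

int main() {
    const long pixels = 1920L * 1080L;   // 2,073,600 pixels
    const long color  = pixels * 3;      // 24bpp packed into 3 bytes/pixel
    const long z      = pixels * 2;      // 16bpp Z
    const long total  = color + z;       // 10,368,000 bytes
    printf("%ld bytes = %.2f MB\n", total, total / (1024.0 * 1024.0));
    // Prints: 10368000 bytes = 9.89 MB -- the figure above, but only if the
    // hardware really packs 24bpp color; see the alignment objection.
    return 0;
}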
 
Fafalada said:
Cybamerc said:
It's clearly a GC inspired design, but I think what Dave, ERP and Pana are suggesting is that the back buffer is stored in main RAM as well, but that it is rendered into the eFB. That way you can have frames that exceed 10 MB in size. But it comes at a cost... and well, it'll be interesting to see how it works.
That sounds like going around your ass into front pocket :? It sounds like regular cache with a larger buffer just for the size's sake... that can't be right, can it?

Anyway, according to the diagram, it's not going to store any texture data, so it's not gonna be like PS2 either way.
At first I was assuming it would be more like GC, then (where you manually control copying data from the eFB to main mem), but apparently some people think otherwise?

Yeah and you have three of them. One 3.5 GHz CPU would be pretty nice for a 2005 system going by previous standards.
Indeed - right now even my dream rig (the fastest dual opterons you can buy) would get smoked by this *L*

Fafalada,

what Cybamerc meant was the "usual": store the back-buffer and the Z-buffer in the e-DRAM and the front-buffer in main RAM.

Dave and his "what fits in e-DRAM goes at the fastest speed and what does not fit goes at normal ( still high :) ) speed" comment seems to make sense given that kind of GCN-esque philosophy and some freedom in the allocation of the e-DRAM space.

Do you think all render-to-texture operations will go to main RAM ?

I have to say that the connection with main RAM looks very fast ( >24 GB/s ) so I am not exactly worrying too much about it: the NV40 does not have much more bandwidth to its VRAM compared to what the XGPU 2 has to main RAM.

Where do you read 48 Shader units ? 1 ALU op/cycle = 1 Shader unit ?

Yeah, I can kinda see it... I'd rather they say "48 Shader ALUs, and each can do N amount of FP ops and N + M FP ops while dual-issuing Scalar and Vector instructions".
 
Panajev2001a said:
Do you think all render-to-texture operations will go to main RAM ?

Just out of interest Pana, when you refer here to render-to-texture, is it similar to the render-to-texture done in 3D apps, ie texture baking?
 
Ug Lee said:
Panajev2001a said:
Do you think all render-to-texture operations will go to main RAM ?

Just out of interest Pana, when you refer here to render-to-texture, is it similar to the render-to-texture done in 3D apps, ie texture baking?

No, what he means is basically a DX rendertarget. You render your scene to a texture, then use that texture for something else.

You can do lots of effects with this.

A simple example would be a blur effect. You render your scene to a texture, then you render that texture into a full-screen quad with a pixel shader to apply the blur effect.

Reflections can be done with a rendertarget. You render the scene from the point of view of the object, then use that rendering as a texture on the object itself.

This is not texture baking (if I understand what you're talking about) which is precomputing something like lighting in a 3D rendering program, and baking it statically into the object's texture so you don't need to calculate it at runtime.
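To make that concrete, here is a minimal Direct3D 9 sketch of the rendertarget flow (the device is assumed to already exist, DrawScene/DrawFullScreenQuad are placeholders, and the texture size is arbitrary):

#include <d3d9.h>

void RenderBlurred(IDirect3DDevice9* device) {
    IDirect3DTexture9* rtTex   = NULL;
    IDirect3DSurface9* rtSurf  = NULL;
    IDirect3DSurface9* backBuf = NULL;

    // A texture we are allowed to render into.
    device->CreateTexture(512, 512, 1, D3DUSAGE_RENDERTARGET,
                          D3DFMT_A8R8G8B8, D3DPOOL_DEFAULT, &rtTex, NULL);
    rtTex->GetSurfaceLevel(0, &rtSurf);

    device->GetRenderTarget(0, &backBuf); // remember the real back buffer
    device->SetRenderTarget(0, rtSurf);   // pass 1: the scene goes into the texture
    // DrawScene();

    device->SetRenderTarget(0, backBuf);  // pass 2: back to the back buffer
    device->SetTexture(0, rtTex);         // pass 1's result is now just a texture
    // DrawFullScreenQuad();              // e.g. with a blur pixel shader bound

    rtSurf->Release(); rtTex->Release(); backBuf->Release();
}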
 
Everyone seems to assume that there will be three separate CPUs. But the block diagram seems to indicate that Xbox2 will use a tri-core CPU. Notice the square around the three cores and the caches. So it couldn't be a PowerPC 976.
 
aaaaa00 said:
Ug Lee said:
Panajev2001a said:
Do you think all render-to-texture operations will go to main RAM ?

Just out of interest Pana, when you refer here to render-to-texture, is it similar to the render-to-texture done in 3D apps, ie texture baking?

No, what he means is basically a DX rendertarget. You render your scene to a texture, then use that texture for something else.

You can do lots of effects with this.

A simple example would be a blur effect. You render your scene to a texture, then you render that texture into a full-screen quad with a pixel shader to apply the blur effect.

Reflections can be done with a rendertarget. You render the scene from the point of view of the object, then use that rendering as a texture on the object itself.

This is not texture baking (if I understand what you're talking about) which is precomputing something like lighting in a 3D rendering program, and baking it statically into the object's texture so you don't need to calculate it at runtime.

Beautiful, thanks for the clarification. Makes perfect sense. :)

Something that might interest you though, this is out of the Max 6 guide:

The Render To Texture tool in 3ds max lets you render, or "bake," various scene elements into your textures, including lighting and shadows. You can use these special textures in real-time 3D applications such as games to reduce the burden on the renderer, thus improving the frame rate.

Would probably be a lot simpler for it to be referred to within Max as simply 'texture baking'. But enough of straying off topic. Thanks again.
 
Well, it didn't exactly reach my expectations, but it fits the rumour. The diagram even has the cache timings like ERP said in another thread.

I guess MS really was bleeding money with Xbox, and will go with a cheaper system for Xbox 2.
 
Fafalada said:
Panajev said:
1920x1080p back-buffer at 24 bpp ( no destination alpha ) and a full-screen Z-buffer at 16 bpp would mean 10,368,000 bytes or 9.88 MB, so I think 10 MB would suffice for them.
Except that no chip with native 32bit color from recent history could store 24bpp at a different alignment than 32bit, which means it eats up the same space as 32bpp. Even the GS works that way - and it had a lot of consideration put into this due to the same eDRAM limitations.
16bpp Z might be a bit tight for a resolution that high and with the sizes of games in the future, but I guess one could manage there. Though that means no stencil shadows :(
Do you mean to say that having 24bit buffers doesn't save any space?!
Then why would anyone ever use it, let alone implement it in hardware?
 
With three possibly dual-threaded CPU cores sharing the same 1MB L2, won't there be an awful amount of cache thrashing going on?

That tends to happen a lot on the Northwood P4 after all, and it just offers 2 threads on 512 KB of cache... Maybe cache lines can be locked in L2, or there's a way to partition the cache so different threads don't bump each other out of it. Guess we'll learn eventually. :)

Then again, maybe this diagram is just a well-made fake... :devilish::LOL:
 
Is it possible that specs have already been locked down by Sony, MS and N? Considering that 2006 is the launch year??
 
Three G5 class CPUs running at 3.5 GHz, with 1 MB of shared L2, ultra fast FSB and very fast main RAM will smoke Desktop PCs that are out at the same time Xbox 2 launches ( mid 2005 ).

Tough to say; the CPU race on the desktop could speed up, but I think you're basically on the money here. I just feel like taking some potency out of this statement. Price/performance-wise, this will likely make many things its bitch.

With three possibly dual-threaded CPU cores sharing the same 1MB L2, won't there be an awful amount of cache thrashing going on?

Possibly.

That tends to happen a lot on the Northwood P4 after all, and it just offers 2 threads on 512 KB of cache... Maybe cache lines can be locked in L2, or there's a way to partition the cache so different threads don't bump each other out of it. Guess we'll learn eventually.

The cache lines on the P4 are quite large, so eviction rates are ugly, which is one significant reason the L2 cache increase going from Willamette to Northwood was so dramatic.
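Rough numbers behind that, as a sketch (128-byte L2 lines is the Northwood figure; assuming the same line size for the Xenon CPU is pure guesswork):

#include <cstdio>

int main() {
    const int p4Lines = (512 * 1024) / 128;   // Northwood: 512 KB L2 = 4096 lines
    const int xeLines = (1024 * 1024) / 128;  // rumored Xenon: 1 MB L2 = 8192 lines
    printf("P4:    %d lines per thread\n", p4Lines / 2);  // 2 threads -> 2048
    printf("Xenon: %d lines per thread\n", xeLines / 6);  // 6 threads -> 1365
    return 0;
}

So each Xenon thread would actually have fewer lines to itself than a Northwood thread, unless partitioning or line locking helps.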
 
The problem I see is that HDTV users suffer a doubly bad penalty. Not only do they suffer the normal fillrate and bandwidth hits you'd expect running at higher resolutions, but they also get booted to "normal RAM", which means an even bigger loss of performance running at HDTV resolutions. I hope the PS3 is more impressive. I honestly was expecting a little more for a next-gen console.
 
Squeak said:
Do you mean to say that having 24bit buffers doesn't save any space?!
Then why would anyone ever use it, let alone implement it in hardware?
As long as the chip only has native 32bit alignment and no 24bit one, then it won't save you any address space. What you end up with is a bunch of 8bit 'holes' next to every 24bit entry.
While those holes can be filled if you have a format that fits that kind of alignment (GS for instance has extra 8/4bit texture formats added just for that), the address area occupied by the 24bit buffer stays the same as with 32bit.
In the 1080p case that's 8 MB, with not enough space left over for Z, even a 16bit one.
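Redoing the earlier arithmetic with that 32-bit alignment constraint (same back-of-the-envelope C++ as before):

#include <cstdio>

int main() {
    const long pixels  = 1920L * 1080L;
    const long color32 = pixels * 4;  // 24bpp color padded to 32-bit alignment
    const long z16     = pixels * 2;  // 16bpp Z
    printf("color %.2f MB, color+Z %.2f MB\n",
           color32 / (1024.0 * 1024.0),
           (color32 + z16) / (1024.0 * 1024.0));
    // color 7.91 MB (the "8MB" above), color+Z 11.87 MB -- over the 10 MB eDRAM.
    return 0;
}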
 
DemoCoder:

> but they also get booted to "normal ram", which means an ever bigger
> loss of performance running in HDTV resolutions.

Not if you "tile" the back buffer. Render into the eFB and copy to main RAM when it's full. I think this is what Dave has been suggesting. How well something like that would work depends on the implementation though, I guess.
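Something like this, maybe (pure pseudocode in C++ form; the tile height and all three helpers are hypothetical):

struct Rect { int x0, y0, x1, y1; };

// Hypothetical helpers, standing in for whatever the real hardware exposes.
void SetScissor(const Rect&) { /* restrict rasterization to the tile */ }
void DrawScene()             { /* submit the frame's geometry */ }
void ResolveToMainRAM(const Rect&) { /* copy the finished tile out of eDRAM */ }

const int kScreenW = 1920, kScreenH = 1080;
const int kTileH   = 360; // 1920x360 @ 32bpp color + 32bpp Z is ~5.3 MB, under 10 MB

void RenderFrameTiled() {
    for (int y = 0; y < kScreenH; y += kTileH) {
        Rect tile = { 0, y, kScreenW, y + kTileH };
        SetScissor(tile);        // keep this pass inside the eDRAM-resident tile
        DrawScene();             // note the cost: geometry is re-submitted per tile
        ResolveToMainRAM(tile);  // "copy to main RAM when it's full"
    }
}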

> I hope the PS3 is more impressive. I honestly was expecting a little
> more for a next-gen console.

It's an interesting design though. So much CPU power but a fairly modest GPU by 2005 standards (?) and little RAM. It's the Anti-Xbox :p
 
Well, like I said in another thread, it has to be more sophisticated than tiling, it has to be a deferred renderer like DreamCast. Otherwise, you wouldn't be able to flush and re-read tiles fast enough, and eventually you'd be stalled flushing the tile to system RAM or reading it back.

All current GPUs are tile based.
 
If you use the on-die RAM as a big cache for the Z and frame buffers, you'd need to gang geometry up based on screen u,v coordinates to get good locality, which is almost impossible with an immediate mode renderer.

However with front-to-back rendering and hierarchical Z test (or 2 pass rendering) you should get decent usage of a 10MB cache with a demand loaded tiling scheme.

Cheers
Gubbi
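A rough sketch of the binning Gubbi describes (the grid size and the Tri struct are made up for illustration):

#include <algorithm>
#include <vector>

struct Tri { float minX, minY, maxX, maxY; int id; }; // screen-space bounds

const int kScreenW = 1920, kScreenH = 1080;
const int kTilesX = 6, kTilesY = 3; // 18 tiles of 320x360

std::vector<Tri> bins[kTilesY][kTilesX];

void BinTriangle(const Tri& t) {
    // A triangle lands in every bin its screen-space bounding box overlaps.
    int x0 = std::max(0, (int)t.minX * kTilesX / kScreenW);
    int x1 = std::min(kTilesX - 1, (int)t.maxX * kTilesX / kScreenW);
    int y0 = std::max(0, (int)t.minY * kTilesY / kScreenH);
    int y1 = std::min(kTilesY - 1, (int)t.maxY * kTilesY / kScreenH);
    for (int ty = y0; ty <= y1; ++ty)
        for (int tx = x0; tx <= x1; ++tx)
            bins[ty][tx].push_back(t);
}

// Each bin is then drawn front-to-back while its tile sits in the 10 MB
// eDRAM, letting a hierarchical Z test reject occluded pixels early.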
 