Complete Details on Xenos from E3 private showing!

compres said:
What's wrong with adding the bandwidth if you would have needed it anyway if it wasn't for the embedded memory?

1. Because the entire system memory is not accessible at that bandwidth; a mere 10MB with a very specialized task has the majority of the bandwidth.

2. Just because the eDRAM has 256GB/s of bandwidth does NOT mean the system will see 256GB/s of savings on the framebuffer!!

If a game uses HDR, 4x AA, and whatever at 1080p @ 60fps, and uses, let's say, 30GB/s of framebuffer bandwidth (made-up number), the eDRAM is only effectively giving the system 30GB/s of savings.

The 256GB/s is just a huge number that will never be fully utilized, not even close, not in our dreams, not in tomorrow's dreams (maybe next gen, though). It is overkill because of its *application*. It needs to be quick and it needs to move stuff IMMEDIATELY; but the 256GB/s logic-to-eDRAM connection on the daughter die will never be fully saturated, unlike the 512MB memory pool(s) on the PS3 and Xbox 360.

Basically, since the bandwidth the eDRAM isolates is for only one task, it is only fair to count the savings in that task. Whether it is 1GB/s or 900GB/s, if the framebuffer only ever uses X amount of bandwidth, everything else is irrelevant.
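As a crude back-of-envelope sketch of where a "framebuffer bandwidth" guess like that could come from (every figure below is an assumption picked purely for illustration, not a number from this thread):

```c
/* Back-of-envelope framebuffer traffic estimate.
   All parameters are illustrative assumptions, not measured figures. */
#include <stdio.h>

int main(void)
{
    const double pixels   = 1280.0 * 720.0; /* assumed render resolution          */
    const double samples  = 4.0;            /* 4x MSAA                             */
    const double bytes    = 16.0;           /* assumed Z read+write and colour
                                               read+write per sample touched      */
    const double overdraw = 4.0;            /* assumed average overdraw            */
    const double fps      = 60.0;

    double gbps = pixels * samples * bytes * overdraw * fps / 1e9;
    printf("~%.1f GB/s of back-buffer traffic under these assumptions\n", gbps);
    return 0;
}
```

Plug in whatever assumptions you like; the result lands in the tens of GB/s, which is the point: the framebuffer only ever needs a fraction of the 256GB/s on offer.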
 
Acert93 said:
compres said:
What's wrong with adding the bandwidth if you would have needed it anyway if it wasn't for the embedded memory?

1. Because the entire system memory is not accessible at that bandwidth; a mere 10MB with a very specialized task has the majority of the bandwidth.

2. Just because the eDRAM has 256GB/s of bandwidth does NOT mean the system will see 256GB/s of savings on the framebuffer!!

If a game uses HDR, 4x AA, and whatever at 1080p @ 60fps, and uses, let's say, 30GB/s of framebuffer bandwidth (made-up number), the eDRAM is only effectively giving the system 30GB/s of savings.

The 256GB/s is just a huge number that will never be fully utilized, not even close, not in our dreams, not in tomorrow's dreams (maybe next gen, though). It is overkill because of its *application*. It needs to be quick and it needs to move stuff IMMEDIATELY; but the 256GB/s logic-to-eDRAM connection on the daughter die will never be fully saturated, unlike the 512MB memory pool(s) on the PS3 and Xbox 360.

Basically, since the bandwidth the eDRAM isolates is for only one task, it is only fair to count the savings in that task. Whether it is 1GB/s or 900GB/s, if the framebuffer only ever uses X amount of bandwidth, everything else is irrelevant.

I could be wrong but I thought the output chip is from MS and can't do 1080p? There is so much info flying around it is easy to get confused.
 
Acert93 said:
compres said:
What's wrong with adding the bandwidth if you would have needed it anyway if it wasn't for the embedded memory?

1. Because the entire system memory is not accessible at that bandwidth; a mere 10MB with a very specialized task has the majority of the bandwidth.

2. Just because the eDRAM has 256GB/s of bandwidth does NOT mean the system will see 256GB/s of savings on the framebuffer!!

If a game uses HDR, 4x AA, and whatever at 1080p @ 60fps, and uses, let's say, 30GB/s of framebuffer bandwidth (made-up number), the eDRAM is only effectively giving the system 30GB/s of savings.

The 256GB/s is just a huge number that will never be fully utilized, not even close, not in our dreams, not in tomorrow's dreams (maybe next gen, though). It is overkill because of its *application*. It needs to be quick and it needs to move stuff IMMEDIATELY; but the 256GB/s logic-to-eDRAM connection on the daughter die will never be fully saturated, unlike the 512MB memory pool(s) on the PS3 and Xbox 360.

Basically, since the bandwidth the eDRAM isolates is for only one task, it is only fair to count the savings in that task. Whether it is 1GB/s or 900GB/s, if the framebuffer only ever uses X amount of bandwidth, everything else is irrelevant.

It is a very specialized task that you will use in 100% of the games, 100% of the time. It might not save you the whole 256GB/s, but even if it's less than 100GB/s, it enables every single game to have far better image quality. Again, Sony hypes many features that are much less significant than this one and no one complains.

The whole system does not have access to it because only the video chip needs it, but in a NUMA system that bandwidth would have been used 100% of the time, so in fact it can be seen as extra bandwidth, though I agree maybe not 256GB/s.

And also you should not say "only 30GB/s", because it's quite a lot. And I don't agree it will never be used, because since it's there, and it's a console, it will be used close to its maximum quite often.
 
quest55720 said:
I could be wrong but I thought the output chip is from MS and can't do 1080p? There is so much info flying around it is easy to get confused.

The R500 can render 1080p, but the MS chip that does the scaling and output only supports 1080i "currently". I had the 1080p number in my head because I was thinking of it as a comparison to the RSX and the bandwidth savings between the two, but you are right, I should have just used 1080i to make it clear (although I wonder if 1080i is rendered at 1080p internally or if it can alternate lines... I am not sure!)
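For what it's worth, here is a quick pixel-rate comparison of the two cases, assuming 1080i means rendering 60 discrete 1920x540 fields per second rather than full progressive frames internally (which is exactly the open question above):

```c
/* Pixel throughput of 1080p60 vs 1080i60, assuming 1080i is rendered
   one 540-line field at a time rather than as full progressive frames. */
#include <stdio.h>

int main(void)
{
    double p60 = 1920.0 * 1080.0 * 60.0; /* full progressive frames */
    double i60 = 1920.0 * 540.0  * 60.0; /* interlaced fields       */

    printf("1080p60: %.1f Mpixels/s\n", p60 / 1e6); /* ~124.4 */
    printf("1080i60: %.1f Mpixels/s\n", i60 / 1e6); /* ~62.2  */
    return 0;
}
```

If it really does render alternate lines per field, the pixel load is roughly half that of true 1080p60.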
 
compres said:
It is a very specialized task that you will use in 100% of the games, 100% of the time. It might not save you the whole 256GB/s, but even if it's less than 100GB/s, it enables every single game to have far better image quality.

It won't be saving 100GB/s either. It takes a balance of bandwidth AND processing power. Disproportionate bandwidth relative to processing power is like trying to fill a riverbed with an eye dropper.

And let's not forget that the RSX has 38GB/s of bandwidth accessible to it. Just as a comparison, the PS2 had a lot more memory bandwidth this gen than the Xbox--but you already noted in another thread that the Xbox looked better. How is that possible? Because there is a balance. Some huge bandwidth number in and of itself does nothing.

The whole system does not have access to it because only the video chip needs it, but in a NUMA system that bandwidth would have been used 100% of the time, so in fact it can be seen as extra bandwidth, though I agree maybe not 256GB/s.

The point is it is not an apples-to-apples comparison. Just adding up bandwidth does not tell the entire story. Looking at the bandwidth of the 512MB of memory on each system is lacking because the Xbox 360 does not use the 512MB for backbuffer.

Adding the backbuffer bandwidth to the system bandwidth is not fair because the system does NOT have access to it since it is a specialized task.

And also you should not say "only 30GB/s", because it's quite a lot.

"Only" is relative; I used "only" because it is a fraction of the 256GB/s total available.

And I don't agree it will never be used, because since it's there, and it's a console, it will be used close to its maximum quite often.

The Xbox 360 does not have the power to max out that bandwidth. The bandwidth is huge because eDRAM inherently offers huge bandwidth, and to ensure it is never a bottleneck. eDRAM, being made of DRAM, has more latency than, say, the 1T-SRAM the Flipper used. One way around the latency problem is to pump up the bandwidth.
 
Acert93 said:
compres said:
It is a very specialized task that you will use in 100% of the games, 100% of the time. It might not save you the whole 256GB/s, but even if it's less than 100GB/s, it enables every single game to have far better image quality.

It won't be saving 100GB/s either. It takes a balance of bandwidth AND processing power. Disproportionate bandwidth relative to processing power is like trying to fill a riverbed with an eye dropper.

And let's not forget that the RSX has 38GB/s of bandwidth accessible to it. Just as a comparison, the PS2 had a lot more memory bandwidth this gen than the Xbox--but you already noted in another thread that the Xbox looked better. How is that possible? Because there is a balance. Some huge bandwidth number in and of itself does nothing.

The whole system does not have access to it because only the video chip needs it, but in a NUMA system that bandwidth would have been used 100% of the time, so in fact it can be seen as extra bandwidth, though I agree maybe not 256GB/s.

The point is it is not an apples-to-apples comparison. Just adding up bandwidth does not tell the entire story. Looking at the bandwidth of the 512MB of memory on each system is lacking because the Xbox 360 does not use the 512MB for backbuffer.

Adding the backbuffer bandwidth to the system bandwidth is not fair because the system does NOT have access to it since it is a specialized task.

And also you should not say "only 30GB/s", because it's quite a lot.

"Only" is relative; I used "only" because it is a fraction of the 256GB/s total available.

And I don't agree it will never be used, because since it's there, and it's a console, it will be used close to its maximum quite often.

The Xbox 360 does not have the power to max out that bandwidth. The bandwidth is huge because eDRAM inherently offers huge bandwidth, and to ensure it is never a bottleneck. eDRAM, being made of DRAM, has more latency than, say, the 1T-SRAM the Flipper used. One way around the latency problem is to pump up the bandwidth.

Damn, long post. I am a lazy poster.

Well, if there is no fair way to compare the numbers, then why is it fine when it is Sony that is coming up with irrational numbers? I think this is no worse than what m$ did here.

From what I know of digital circuits and microprocessors, this R500 design looks very elegant so far. The thing is, Sony claims the PS3 is twice as powerful, and I very much doubt it is in real-world performance, since the Cell processor looks like a nightmare to code for and I have yet to see any significant bottleneck in the m$/ATI design.
 
Quite a few people have commented that the 360 processor (XeCPU) is going to be as hard as or harder to program for... Why does CELL always get targeted as "being hard to program for"? You still have to manage multiple threads with the XeCPU, so it shouldn't be any easier.
 
Some numbers are "more irrational" than others. ;) None of them are actually "rational" for comparing complete system performance in games whose first generations are still under construction now. They're just marketing bullet points.

You'd think people--especially "smart" people like us geeks, who are the only ones who actually CARE to compare these numbers--would realize that by now.
 
Mordecaii said:
Quite a few people have commented that the 360 processor (XeCPU) is going to be as hard as or harder to program for... Why does CELL always get targeted as "being hard to program for"? You still have to manage multiple threads with the XeCPU, so it shouldn't be any easier.

How do you know it won't be any easier? Are you saying that because they both require multi-threading your app? There is a LOT more to it than that. What if one has great compilers and the other requires a lot of care and special attention?

I am not a game programmer, but just today a programmer on B3D gave their opinion that the CELL would be harder to work with. Another programmer who has worked with the XeCPU and PS3 has commented on the flexibility of the XeCPU compared to the SPEs.

Obviously experience, time/budget, and the result you are going for are going to impact your impression of any HW. Again, I am not a programmer, but from what I have read: 1. the PS3 is easier than the PS2, and 2. the PS3 has so much power that it is worth the time to work with a streaming environment and some of the limitations of the SPEs. They are just too powerful to ignore.
 
Well, in answer to your question... someone else had stated that CELL would be harder to target, so I decided to take the opposite approach to balance things out. But my point was, obviously compilers and development tools play a HUGE part in how easy it is to program for each CPU (which is why MS has an advantage over Sony in this department, and why Sony is getting help from people like nVidia and open source standards). My point was simply that saying CELL will always be harder to program for because it is a new architecture is completely false... Honestly, it wouldn't surprise me to hear that either of them was the easier to program for (between the two of them, not counting Ninty).
 
The problem with multi-threading is one of synchronization, of data protection (isolating critical data that is potentially accessed by multiple threads, which we want to force to be accessed by one thread at a time, helping us synchronize the threads without them stepping on each other's feet), and of thread communication.

A compiler won't do it all for you: knowledge of a good API such as OpenMP or MPI is necessary (C/C++ is not the happiest choice for parallel processing and multi-threaded programs unless used with good partner APIs), plus lots of attention from the programmers.
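A minimal sketch of the data-protection point in C with OpenMP (a toy example, not code from any console SDK): the shared counter either has to be guarded so only one thread touches it at a time, or expressed as a reduction so the runtime can give each thread a private copy.

```c
/* Toy illustration of protecting shared data across threads with OpenMP. */
#include <stdio.h>

int main(void)
{
    long total = 0;

    #pragma omp parallel for
    for (long i = 0; i < 1000000; i++) {
        /* Unprotected, this read-modify-write would race between threads. */
        #pragma omp critical
        total += i;                     /* one thread at a time in here */
    }

    /* The same work as a reduction: each thread accumulates privately
       and the partial sums are merged at the end. */
    long total2 = 0;
    #pragma omp parallel for reduction(+:total2)
    for (long i = 0; i < 1000000; i++)
        total2 += i;

    printf("%ld %ld\n", total, total2);
    return 0;
}
```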

Still, there are some who see a reason why the CELL set-up might not be too bad for the task at hand: http://www.beyond3d.com/forum/viewtopic.php?p=528655#528655

Contrary to popular opinion here, SCE/IBM/Toshiba tested the idea of more general-purpose cores (more PPEs, and thus fewer SPEs attached to each PPE) to augment performance, but decided that for the kinds of tasks the CELL-based Broadband Engine targets, this design would best achieve maximal performance while also allowing programmers to extract it efficiently. I think the PlayStation 3 will be helped by powerful compilers and well-designed high-level APIs and tools.
 
Shifty Geezer:
Have I finally cracked it?!
Embedding the framebuffer is more of a device-level implementation, like tile-accelerated rendering, for keeping accesses on chip when doing bandwidth-intensive operations like blending, sampling, z checking, stencil, etc. A step further toward the device level could be taken with deferred rendering acceleration, using display lists to pre-sort the image so calculation isn't wasted on pixels that never reach the framebuffer.

On a scale of how various processors overcome the limitations of data calculation (mainly overdraw expense) and of data transfer (mainly off-chip bandwidth expense), the RSX sits closer to application-level handling, the X360 GPU sits somewhere in the middle, and TBDLR is at the other end, nearer the device level.
Could the PS3 manage this too? RSX outputs data into a tiny tile that fits into an SPE's LS; the SPE does the fast, mundane work and chucks it back out?
Some kind of tile acceleration might be needed to be able to use the SPEs like X360's Smart 3D Memory. Plus, doing z check, stencil, blending, sampling, etc. is probably faster with hardware that's been specialized for it.
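A quick sanity check on whether such a tile would even fit in an SPE's 256KB local store (the bytes-per-pixel figure and the space reserved for code are assumptions of mine):

```c
/* How large a screen tile fits in a 256KB SPE local store?
   Bytes per pixel and the code/data reservation are illustrative assumptions. */
#include <stdio.h>

int main(void)
{
    const int ls_bytes   = 256 * 1024; /* SPE local store size                 */
    const int code_bytes = 64 * 1024;  /* assumed space reserved for code/data */
    const int px_bytes   = 8;          /* assumed 32-bit colour + 32-bit Z     */

    int pixels = (ls_bytes - code_bytes) / px_bytes;
    printf("Room for %d pixels, e.g. a 128x%d tile\n", pixels, pixels / 128);
    return 0;
}
```

So a tile on the order of 128x192 pixels could fit under those assumptions, which at least suggests the idea isn't impossible from a capacity standpoint, whatever the latency and throughput story might be.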
 
If the Xenos GPU is using tile rendering, it will seriously kill the RSX in overall bandwidth.

Plus, how can anyone just shrug off the 256GB/s? The processing that the Smart RAM does (specifically anti-aliasing) won't eat up the main RAM bandwidth.
 
Why does the EDRAM unit have 256GB/s of internal bandwidth?

Why not 128GB/s?

Or 64GB/s?

Is 256GB/s over-engineering?

Jawed
 
Panajev2001a said:
Contrary to popular opinion here, SCE/IBM/Toshiba tested the idea of more general-purpose cores (more PPEs, and thus fewer SPEs attached to each PPE) to augment performance, but decided that for the kinds of tasks the CELL-based Broadband Engine targets, this design would best achieve maximal performance while also allowing programmers to extract it efficiently. I think the PlayStation 3 will be helped by powerful compilers and well-designed high-level APIs and tools.

Cell is not designed for gaming in PS3. That just happens to be one of the things it can do.

"Broadband Engine" should be the clue. The SPEs are a general purpose, ultra-high performance digital signal processor implemented as a multi-threaded, multi-configurable stream processor.

Though I have to admit, I'm not sure what you are going to do with 12 decoded HDTV streams coming out of Cell.

Jawed
 
Jawed said:
Panajev2001a said:
Contrary to popular opinion here, SCE/IBM/Toshiba tested the idea of more general-purpose cores (more PPEs, and thus fewer SPEs attached to each PPE) to augment performance, but decided that for the kinds of tasks the CELL-based Broadband Engine targets, this design would best achieve maximal performance while also allowing programmers to extract it efficiently. I think the PlayStation 3 will be helped by powerful compilers and well-designed high-level APIs and tools.

Cell is not designed for gaming in PS3. That just happens to be one of the things it can do.

"Broadband Engine" should be the clue. The SPEs are a general purpose, ultra-high performance digital signal processor implemented as a multi-threaded, multi-configurable stream processor.

Though I have to admit, I'm not sure what you are going to do with 12 decoded HDTV streams coming out of Cell.

Jawed

Well, the fact that we don't have 10 brains and 40 eyes is not Sony's problem. Someone has to advance technology, right? That's our own limitation for not being able to watch 12 movies at once!! ;) j/k
 
@ Acert : Thanks for that explanation. I do agree with you. I appreciate now the approach ATI have taken and it makes sense. My main problem is the use of the bandwidth figure, which is misrepresented, as you agree. I think you're right in that it should be described in terms of bandwidth saved rather than actual bandwidth.

@ ecliptic : The 256 GB/s bandwidth figure should not count as bandwidth because it's the transfer rate between logic and local storage internal to a processor, not between separate parts.

Think about it this way. On the RSX, the Xenos GPU, or any CPU, there are registers. These are local storage that the logic writes to and reads from. If these registers didn't exist, if there were no option to store data locally on the processor, it would have to fetch all that data from main RAM and write it back to main RAM. That would need bandwidth. By having these local stores, the bandwidth can be considered 'infinite' because there's no delay in fetching the data for the logic. Likewise with L1 cache: we don't talk about bandwidth between CPU logic and level 1 cache, even though it's storing and fetching data. Local stores are amazingly fast but expensive, so two types of memory are used.

This eDRAM is exactly the same thing. Although its manufacturing technology is called eDRAM, it is in reality a large cache or area of direct storage for the logic. It is NOT a part of the Xenos main processing. That is, the unified shaders DO NOT access this external pool of memory at 256 GB/s. The eDRAM and logic should be considered a separate processing unit, a back-buffer processor unit if you will, and as such the internal data transfer rates inside this 'BBPU' shouldn't feature in any bandwidth ratings, just as internal bandwidth in CPUs doesn't. MS, it seems to me, have falsely presented the architecture of their graphics rendering to manufacture a mind-numbing marketing figure. This is why I've had so much trouble understanding how this pipeline and bandwidth fits together. Now that I've cracked it, I see how misplaced that figure is.

In short, just because it's called eDRAM, it's no different to a cache, and you shouldn't add cache access bandwidth to a 'total system aggregate' bandwidth without doing the same for ALL processors, including CPUs and their caches.
 
Shifty, your argument is like saying that the bandwidth savings due to hierarchical Z or z-compression should be ignored when comparing one GPU with another.

Why do you think ALL GPUs use these techniques for speeding up Z? These aren't marketing tick boxes, this is real performance.

I'm sorry but dismissing a bandwidth-free back buffer is just the most ridiculous thing going. It is the same as dismissing all forms of data compression or compiler optimisations.

It's like saying that a 256-bit bus is overkill, let's stick with a 128-bit bus.

It's ridiculous beyond belief.

Jawed
 
No, I'm saying tout the technology, but don't give it a false representation. Don't call the data transfer rate between on-chip local storage and on-chip logic "bandwidth". Instead, say this technology saves xxx bandwidth from the system, or whatever.
 