WTH IGN?! They posted the spec analysis from Major Nelson!!

360 vs the PS3

A couple of clarifications (from my own perspective).

1. The transistor counts for both GPUs are fairly compared. The number Major Nelson gave out does not include the eDRAM; it includes only logic.

2. If you think the memory bandwidth comparison is unfair, you're missing his point. The point the article clearly makes to me is that the PS3 is severely limited in bandwidth with respect to framebuffer transfers. At 1920x1080 resolution the PS3 will require practically all of the bandwidth available simply for the frame buffer, before even considering the depth buffer or multisampling. The point of the paper is that the 360 has a dedicated 256 GB/sec so that it can handle a huge frame buffer and depth buffer, along with any other buffers (HDR, etc.), with 4x multisampling. If you think this is not a considerable difference, or an unfair comparison, you do not understand memory architectures.
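To make the arithmetic concrete, here's a rough sketch of the framebuffer traffic (my own assumptions, not the article's: 4 bytes colour + 4 bytes z/stencil per sample, each sample read and written for blending/z-testing, no compression). Depending on how much AA and overdraw you assume, it goes from a couple of GB/s to more than the whole 22.4 GB/s bus:

```python
# Rough framebuffer traffic at 1920x1080, 60 fps. These assumptions are mine,
# not the article's: 4 bytes colour + 4 bytes z/stencil per sample, each sample
# read and written (blending/z-test), no compression.

def fb_traffic_gb_s(aa_samples=1, overdraw=1.0, width=1920, height=1080, fps=60):
    bytes_per_sample = (4 + 4) * 2      # colour + z/stencil, read and write
    return width * height * aa_samples * overdraw * bytes_per_sample * fps / 1e9

for aa, od in [(1, 1), (4, 1), (4, 3)]:
    print(f"{aa} sample(s)/pixel, overdraw {od}: "
          f"{fb_traffic_gb_s(aa, od):.1f} GB/s vs. a 22.4 GB/s bus")
```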
 
Jawed said:
Frankly, I believe M$ on this.

Jawed

:oops:

As Neo would say: "Whoa!" ;) Your assessment of both sides' technologies has been pretty fair here on the forum and I know you have looked at R500 pretty closely... I know they are different designs and not readily comparable, but your comments are interesting.

Personally I was kind of thinking that Sony and MS were just dealing with the same problem in two different ways. Sony was throwing around almost 50GB/s of bandwidth to help feed the monster (and just assumed progressive scan at 1920x1080 was not going to be possible with 4x AA, heavy FP32 HDR effects, motion blur, and other bandwidth intensive stuff) and that MS's design was a nice way to deal with the 720p/1080i issue with limited system bandwidth by isolating the framebuffer/back buffers.

I was just looking at them as different... but the eDRAM may really open the door for a lot of bandwidth-intensive effects that the more general-purpose memory pools can't support.

It will be interesting to see... I would enjoy more comments in this area, Jawed. This is one of the biggest differences between the two systems, and exploring it is insightful.
 
Shifty Geezer said:
jvd said:
if ms did release this article (which i haven't heard proof of yet )
IGN posted the news story + article, saying it was sent them from MS in an email. The bitching does go way back and is expected to a degree, but it shouldn't be escalating to this level of FUD.
well ms didn't write it, just showed it to ign. Is this any worse than phil telling g4tv / ign that killzone was real time even though the developer of the title told g4tv / ign that it was a representation the day before?
 
jvd said:
Shifty Geezer said:
jvd said:
if ms did release this article (which i haven't heard proof of yet )
IGN posted the news story + article, saying it was sent them from MS in an email. The bitching does go way back and is expected to a degree, but it shouldn't be escalating to this level of FUD.
well ms didn't write it, just showed it to ign. Is this any worse than phil telling g4tv / ign that killzone was real time even though the developer of the title told g4tv / ign that it was a representation the day before?

I don't think Phil ever said it was real time; he said that it could have been, but because of time issues they had to make some videos too. He also said that it's just scratching the surface of what we are going to see...
Well that isn't any better though. :)
 
Gee, it must be some kind of shocking news to people that Sony likes talking about GFLOPS. :p Considering last round, the nerd hubbub that ensued, the people still licking their wounds to this day... anyone who considers it surprising or supremely offensive for them to do the same thing again doesn't have a good memory. Nor do they really remember people touting "speed" in other ways, like MHz last time. Single-metric comparisons are just plain silly at this point. (At least for whole-system comparisons.)

Nelson's post makes me roll my eyes for different reasons (BS in the form of in-depth analysis), but mostly because it was set up to be sent to news orgs directly instead of playing itself out like any old post in the blogosphere. IGN gave cautionary comments about it, but it would still set an unfortunate trend.

When we have FULL system info, and the likes of Hannibal working his way through it all, then we'll actually have reasonable system analysis. Of late, though, glurge has just tried to take more feasible shapes.

If you think this is not a considerable difference, or an unfair comparison, you do not understand memory architectures.
Please, sir, to be listing the aggregate memory bandwidth of the PS2 versus the Xbox and grace us with how telling that number is.
 
Just my small take on it: given that the eDRAM unit and the GPU are treated as a single entity in terms of the graphics architecture, even if they're on separate pieces of silicon, doesn't that mean it's an internal bus speed and not a traditional memory bus to/from the CPU or GPU? My take is that it was done that way to reduce individual part complexity and improve yields. That kind of fuzzy math would be like Sony adding the Element Interconnect Bus bandwidth inside Cell into the aggregate for the whole system. Apples and oranges even in my comparison, but that's about as fair as the FUD I've seen elsewhere.
 
Tacitblue said:
Just my small take on it: given that the eDRAM unit and the GPU are treated as a single entity in terms of the graphics architecture, even if they're on separate pieces of silicon, doesn't that mean it's an internal bus speed and not a traditional memory bus to/from the CPU or GPU? My take is that it was done that way to reduce individual part complexity and improve yields. That kind of fuzzy math would be like Sony adding the Element Interconnect Bus bandwidth inside Cell into the aggregate for the whole system. Apples and oranges even in my comparison, but that's about as fair as the FUD I've seen elsewhere.

You're forgetting that GPUs in current gaming systems consume an order of magnitude more bandwidth than the system CPU.

X850XTPE is generally acknowledged to be bandwidth limited at 37.6GB/s. RSX has 22.4+15=37.4GB/s available to it.

Cell could be programmed to perform vertex processing to help RSX, preparing vertex data for the next frame while RSX renders the current one, for example. Unfortunately RSX is around 10x more powerful than the combined capability of the 7 SPEs in Cell (1.8TFlops versus 180GFlops, according to Sony).

So Cell isn't going to help very much with graphics...

Jawed
 
Tacitblue said:
Just my small take on it: given that the eDRAM unit and the GPU are treated as a single entity in terms of the graphics architecture, even if they're on separate pieces of silicon, doesn't that mean it's an internal bus speed and not a traditional memory bus to/from the CPU or GPU? My take is that it was done that way to reduce individual part complexity and improve yields. That kind of fuzzy math would be like Sony adding the Element Interconnect Bus bandwidth inside Cell into the aggregate for the whole system. Apples and oranges even in my comparison, but that's about as fair as the FUD I've seen elsewhere.

The eDRAM is there to isolate the frame buffer bandwidth from the rest of the system. The frame buffer does not need to be big; it just needs to be fast.

So instead of paying $$$ for 512MB of super fast memory to compensate for the frame buffer, MS spent $$ for 512MB of relatively fast memory and isolated the frame buffer with a small memory pool that is really fast.

Just make-believe numbers: what if you have 512MB of memory with 50GB/s of bandwidth? If the frame buffer and all the effects (AA, HDR, alphas, stencils, etc...) at 1080i took up 40GB/s of bandwidth, that leaves a mere 10GB/s for texture fetches, geometry, and the CPU pulling information from memory.

On the reverse, let's say you had 512MB of memory that only had 25GB/s of bandwidth. Now what if you could do all 40GB/s of that framebuffer traffic in a 10MB space (which has 256GB/s of bandwidth)? That leaves you the full 25GB/s for texture fetches, geometry, and the CPU pulling information from memory.

Those are made-up numbers, but you can see how an isolated framebuffer can save you system bandwidth.
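Here's the same budget in a few lines, using those same made-up numbers, just to make the arithmetic explicit:

```python
# The made-up numbers from above: bandwidth left for textures/geometry/CPU.

framebuffer_traffic = 40        # GB/s of colour/z/AA/HDR traffic (hypothetical)

unified_pool = 50               # one 50 GB/s pool; framebuffer traffic comes out of it
left_unified = unified_pool - framebuffer_traffic

main_pool = 25                  # 25 GB/s main pool plus a 10MB / 256 GB/s framebuffer pool
left_split = main_pool          # framebuffer traffic never touches the main pool

print(f"unified pool: {left_unified} GB/s left over")   # 10 GB/s
print(f"split pools:  {left_split} GB/s left over")     # 25 GB/s
```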

That said, I do not expect the PS3 to do 1080p with 4x AA, HDR, and the like at 60fps. The CELL has 256MB of XDR, the GPU has 256MB of GDDR3 at 23GB/s, and they can share. The PS3 gets close to 50GB/s of total system bandwidth, so it is no wuss.

The real questions I have are

1. Is the 10MB of eDRAM a fair tradeoff for 23GB/s of memory? Do we expect the frame buffer to save 23GB/s of bandwidth?

2. The IQ difference of 4x AA: specifically, is the PS3 going to be able to match high levels of AA plus other features? 4x AA at 1600x1200 is a big hit to an SLI 6800U on today's games; what about tomorrow's games with 10x the geometry and the even higher resolution of 1080p?
 
Re: 360 vs the PS3

plat22433 said:
2. If you think the memory bandwidth comparison is unfair, you're missing his point. The point the article clearly makes to me is that the PS3 is severely limited in bandwidth
1) The article talks about aggregate bandwidth. What's that supposed to show?! Say I have a chip A and a chip D that need to share data, and there are two memory controller chips B and C between them with separate buses, like this...

A ... 20 GB/s ... B ... 20 GB/s ... C ... 20 GB/s ... D

That's an aggregate bandwidth of 60 GB/s, but that figure tells you nothing about how quickly A and D can communicate, which is still 20 GB/s, with added latency from the extra hops. A straight A ... D bus at 20 GB/s would be faster than the 60 GB/s aggregate above.
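Putting that chain into numbers (purely illustrative; the link speeds and the 1 GB transfer are arbitrary):

```python
# Aggregate vs. end-to-end: three 20 GB/s links in series.

links = [20, 20, 20]            # A-B, B-C, C-D, in GB/s
aggregate = sum(links)          # the headline number: 60 GB/s
end_to_end = min(links)         # what A-to-D traffic is actually limited to: 20 GB/s

data_gb = 1.0                   # move 1 GB from A to D
print(f"aggregate: {aggregate} GB/s, end-to-end: {end_to_end} GB/s")
print(f"time to move {data_gb} GB A->D: {data_gb / end_to_end * 1000:.0f} ms, plus the hop latencies")
```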


2) On this following matter I'm confused and may not understand properly, so someone please correct me if I'm wrong, but going RIGHT back to basics:

There's a front buffer, in RAM, which is shown to the screen...
There's a back-buffer on which drawing occurs...
There's a GPU that takes vertex and texture data and renders the image to the backbuffer...
There's vertex and texture data in RAM to be sent to the GPU.

The difference between RSX and XENOS is the backbuffer, in DDR for RSX and in eDRAM in XENOS.

Now both GPUs need to be supplied with data to render anything. This is many megabytes of data with multiple high-res textures, complex models etc., more data than will fit in XENOS's eDRAM even without the backbuffer taking up its share. Both GPUs, when rendering a scene, will still be receiving data over 20-25 GB/s supplies. Surely that's the limiting factor for both then? The eDRAM has its advantages (which I don't really understand, especially as DaveB says it renders in tiles, so maybe it's got some caching going on?) but it doesn't provide 256 GB/s of total bandwidth for use with textures and other data. That's apparently only for writing to the back-buffer. (Also, isn't that 256 GB/s actually 64 GB/s real, 256 GB/s equivalent? And isn't there talk that it's something like 32 GB/s, and that the 256 GB/s is only in the internal logic of the eDRAM?)
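On the tiling point, a rough footprint check (my assumptions: 4 bytes colour + 4 bytes z/stencil per sample, 4xAA, uncompressed) shows why the backbuffer has to be split into tiles to fit in 10MB at HD resolutions, which would line up with DaveB's comment:

```python
import math

# Rough backbuffer footprint vs. 10MB of eDRAM. My assumptions: 4 bytes colour
# + 4 bytes z/stencil per sample, 4xAA, uncompressed.

def backbuffer_mib(width, height, aa_samples=4):
    return width * height * aa_samples * (4 + 4) / 2**20

EDRAM_MIB = 10
for w, h in [(1280, 720), (1920, 1080)]:
    size = backbuffer_mib(w, h)
    tiles = math.ceil(size / EDRAM_MIB)
    print(f"{w}x{h} 4xAA: {size:.1f} MiB -> at least {tiles} tiles")
```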

nVIDIA have said they are constantly evaluating other rendering options like unified shaders, eDRAM, etc. and this time around felt they didn't benefit them. ATI aren't including eDRAM on PC GPUs either, despite the same bandwidth limitations there. eDRAM can't magically eliminate all bandwidth limitations or they'd all be doing it. Also, the XB's performance was better than the PS2's, despite having no eDRAM and lacking its large aggregate bandwidth.

Surely both GPUs are going to be restricted by main RAM bandwidth, and if so, RSX has the advantage in having maybe 2x the bandwidth available (22.5 GB/s DDR + a share of 25 GB/s XDR vs. a share of 22.5 GB/s DDR)?

Regardless, the comparisons MS makes are 'if Sony run the same code as we do, doing things the same way, they'll be slower', without considering 'what if Sony's games do things differently to take advantage of their particular hardware design's strengths', which is why it's a very bad technical document to base any opinions on. I'm sure Sony could produce similar lop-sided breakdowns (God forbid!!!) to show how lame and crippled the 360 is. Both consoles are capable, and trying to make out that either one has serious design flaws is a damning indictment of the skilled engineers that worked on them, IMO.
 
RSX's input bandwidth is 22.4GB/s + 20GB/s. R500's is 22.4GB/s + 10.8GB/s - textures and geometry respectively, in simple terms.

So on the face of it, RSX has 10GB/s more input bandwidth.

But RSX's 22.4GB/s is eaten up by raster output: blending, AA filtering, and z-testing.

RSX has 37.4GB/s of output bandwidth (22.4GB/s + 15GB/s). There's no doubt that the 15GB/s to Cell are real and usable, e.g. for post-processing (depth of field was an example, which would consist of multiple render targets processed against the final frame).

The basic problem is that raster output chews up a lot of bandwidth. 37.6GB/s isn't enough for the X850XTPE, and slightly less isn't any better for the 6800 Ultra.

The 6800 Ultra's key advantage is its ability to generate a stencil shadow twice as efficiently as the X850XTPE, i.e. making better use of available bandwidth. It's about the only time (in heavy graphics at high resolution) when bandwidth isn't such a constraint (there's no texturing to be done, no blending and no AA either).

Jawed
 
360 vs the PS3

The entire point here is that the RSX's bandwidth to the back buffer is insufficient. In order to pull off HD resolutions with effects it would require 52 GB/sec, which is far beyond what it is capable of. The 360 on the other hand has 256 GB/sec of bandwidth.

It is important to remember that things like 4x AA, depth/stencil buffers, HDR surfaces, etc., plus OVERDRAW, all cut away at this bandwidth. As such, the PS3 will require a VERY efficient renderer, and VERY low resolution/effect quality, in order to avoid a bottleneck here.

The 360 on the other hand can afford tremendous overdraw, effects, resolutions, and antialiasing, and still never be bandwidth bound.

This is such an important fact, and clearly in my mind was the reason for showing the bandwidth as aggregate. The single most important bandwidth in these two specific systems is that between the GPU and the backbuffer, since that tends to be consumed by large, time-critical data transfers rather than many small transfers. As a result, I feel it is a fair comparison to highlight the severe deficiency of the PS3.
 
I think it's nonsense to claim the RSX can't handle HD resolutions when a GeForce 6600 GT can. There is a difference between not being able to hit your peak throughput and not having a good *effective throughput*. The effective throughput of a GeForce 6800 Ultra, for example, is 4.5 Gigapixels/s, which is higher than the Xbox GPU's. The RSX, with less bandwidth, is likely to hit 3 Gigapixels/s effective, which is more than enough to handle HD resolutions. 2xFSAA consumes very little extra bandwidth due to compression; it's effectively *free* on the NV4x. 4xFSAA consumes an extra clock cycle. This may or may not be fixed on the RSX. We'll see.

But your comments are a little bit disingenuous, or just wrong.

Remember, it is effective throughput that matters, not whether you can hit your peak. A card could have a theoretical fillrate of 100 Gigapixels/s but only hit 8 Gigapixels/s effective, which is terrible efficiency, yet it would still be twice the Xbox GPU.

So don't confuse a preference for elegance and efficiency with real performance.
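For what it's worth, here's the kind of arithmetic behind that effective-throughput point (the overdraw factor is my own assumption, not a measured number; the ~3 Gpixels/s effective figure is from above):

```python
# How much fill-rate does 1280x720 at 60 fps actually demand? The overdraw
# factor is a hypothetical assumption, not a measurement.

width, height, fps = 1280, 720, 60
overdraw = 4                                    # hypothetical average depth complexity

pixels_per_second = width * height * fps * overdraw
effective_fill = 3_000_000_000                  # ~3 Gpixels/s effective (figure from the post)

print(f"needed:    {pixels_per_second / 1e6:.0f} Mpixels/s")      # ~221 Mpixels/s
print(f"available: {effective_fill / 1e6:.0f} Mpixels/s")
print(f"headroom:  {effective_fill / pixels_per_second:.1f}x")
```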
 
Re: 360 vs the PS3

plat22433 said:
The entire point here is that the RSX's bandwidth to the back buffer is insufficient. In order to pull off HD resolutions with effects it would require 52 GB/sec, which is far beyond what it is capable of. The 360 on the other hand has 256 GB/sec of bandwidth.

I haven't had time recently to read up on the latest gossip, but where does your 52GB/s number come from? Are you really saying (and I really am asking) that the data for each frame to the backbuffer would take ~866MB (assuming 60fps)?
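(That ~866MB is just the straight division, for anyone checking:)

```python
# 52 GB/s spread over 60 frames per second:
print(f"{52e9 / 60 / 1e6:.0f} MB per frame")    # ~867 MB, i.e. the ~866MB above
```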
 
Democoder - you're forgetting that output fragments are but a small proportion of the bandwidth consumption of Raster Output. Blending (particularly HDR with 8 bytes of colour data per pixel), Z-testing and AA filtering all require extra memory accesses (read and write) on top of the fragment data being output by the GPU's pixel shader pipelines.

It is the overhead of Raster Output, not to mention overdraw, that kills effective fill-rate.

If you're getting a fill-rate of 100MP/s at 1600x1200 on a 6800Ultra with 16xAF/4xAA that doesn't mean you're consuming 3.1GB/s of bandwidth (59MB per frame at 58 fps - uncompressed, for the sake of argument).

You can't ignore the overheads of overdraw, blending, z-testing and AA filtering.

Jawed
 
Jawed said:
Democoder - you're forgetting that output fragments are but a small proportion of the bandwidth consumption of Raster Output. Blending (particularly HDR with 8 bytes of colour data per pixel), Z-testing and AA filtering all require extra memory accesses (read and write) on top of the fragment data being output by the GPU's pixel shader pipelines.

It is the overhead of Raster Output, not to mention overdraw, that kills effective fill-rate.

If you're getting a fill-rate of 100MP/s at 1600x1200 on a 6800Ultra with 16xAF/4xAA that doesn't mean you're consuming 3.1GB/s of bandwidth (59MB per frame at 58 fps - uncompressed, for the sake of argument).

You can't ignore the overheads of overdraw, blending, z-testing and AA filtering.

Jawed

Hi Jawed,

Are you including anything other than the raw image, or have you folded other information into the 59MB/frame number?

Just the raw output pixels will be 7.68MB/frame at 1600pixels*1200pixels*4bytes/pixel. Of course, as you say, there is much more to it than that (overdraw, AA samples, etc.), but how did you arrive at 59MB/frame?

Thanks,
Nite_Hawk
 
Re: 360 vs the PS3

plat22433 said:
The 360 on the other hand has 256 GB/sec of bandwidth.
No. From what I understand it HASN'T got 256 Gbytes a second bandwidth.

This is such an important fact - and clearly in my mind was the reason for showing the bandwidth as aggregate.
The aggregate figure is pure baloney though. The bandwidth from Xenos to the backbuffer, held in eDRAM, is something like 32 GB/s. That glorious 256 GB/s bandwidth figure is on-chip communication between the eDRAM storage and logic circuits on the same chip. By that same measurement, MS should have included all the on-chip bandwidth for XeCPU and Cell. Imagine the figures for the bandwidth of 7 SPEs to their local storage + level 2 cache, etc.!!

I have never seen (or at least noticed) the communications rate between on-chip logic and local storage ever called 'bandwidth'.

The advantage Xenos has is that often-used, demanding functions are executed by separate logic on the eDRAM die. The eDRAM is basically used as a large cache for part of the graphics process. This is a fair design feature of Xenos and worth describing, but claiming it as bandwidth for the GPU is total nonsense.
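For what it's worth, here is one plausible reconstruction of where those two figures could come from (my assumptions, not confirmed specs: ~500 MHz GPU clock, 8 pixels per clock across the link, 4xAA, read-modify-write by the logic on the daughter die):

```python
# One way the 32 GB/s and 256 GB/s figures can be reconstructed. These are
# assumptions, not official numbers: ~500 MHz clock, 8 pixels per clock,
# 4 bytes colour + 4 bytes z per sample, 4xAA.

clock = 500e6                   # Hz
pixels_per_clock = 8
bytes_per_pixel = 4 + 4         # colour + z

# GPU die -> eDRAM daughter die: just the shaded pixels.
link = pixels_per_clock * bytes_per_pixel * clock
print(f"GPU -> eDRAM link:      {link / 1e9:.0f} GB/s")      # 32 GB/s

# Inside the eDRAM die: 4 AA samples per pixel, each read and written by the
# blend/z logic that sits next to the memory.
samples, read_write = 4, 2
internal = pixels_per_clock * samples * bytes_per_pixel * read_write * clock
print(f"internal eDRAM traffic: {internal / 1e9:.0f} GB/s")  # 256 GB/s
```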
 
And you're forgetting that the AA downfilter step doesn't occur until the final frame is done, requires minimal bandwidth, and can be handled on "scanout". Moreover, z-testing and overdraw are not as simple as you make out. Z-tests are accelerated by hierarchical-Z, and overdraw is helped by early-Z rejection.

The only really serious bottleneck for the RSX I'll grant you is alpha blending, but that totally depends on the workload, and all it does is influence the types of workloads developers throw at the GPU. For example, the PS2 excelled at alpha blending, hence alpha-blend effects were used a lot more.

The idea that an RSX-level GPU can't handle 1280x720 is sheer nonsense. Like I said, previous-generation GPUs have already demonstrated nearly 3 Gigapixels/s at HD resolutions with 2xFSAA. Have you been playing all your PC titles at 640x480 all these years? Battlefield 2 is coming out soon, has next-gen-level polygon counts, and I assure you it will run at 720p @ 60fps on top-end GPUs.
 
Honda & Ken Kutaragi interview

For example, RSX is not a variant of nVIDIA's PC chip. CELL and RSX have a close relationship, and both can access the main memory and the VRAM transparently. CELL can access the VRAM just like the main memory, and RSX can use the main memory as a frame buffer. They are just separated by their main usage; there is no real distinction.

This architecture was designed to eliminate wasteful data copying and calculation between CELL and RSX. RSX can directly refer to a result simulated by CELL, and CELL can directly refer to the shape of an object RSX has shaded (note: CELL and RSX have independent bidirectional bandwidths, so there is no contention). That's impossible with shared memory, no matter how beautiful the rendering or how complicated the shading a shared-memory design can manage.
 
Nite_Hawk said:
Just the raw output pixels will be 7.68MB/frame at 1600pixels*1200pixels*4bytes/pixel.
You have to count Z/stencil, which is normally another 4 bytes per pixel.

Of course, as you say, there is much more to it than that (overdraw, AA samples, etc.), but how did you arrive at 59MB/frame?
Instead of ~15MB per frame you have to account for the 4xAA samples, which in uncompressed form means 4x (colour + z/stencil = 8 bytes) per pixel, i.e. 32 bytes per pixel, or ~59MB.
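Spelling that calculation out (uncompressed, 4 bytes colour + 4 bytes z/stencil per sample, as above):

```python
# The back-buffer arithmetic spelled out: 1600x1200, 4 bytes colour + 4 bytes
# z/stencil per sample, uncompressed.

width, height = 1600, 1200
bytes_per_sample = 4 + 4

no_aa = width * height * bytes_per_sample          # 1 sample per pixel
aa_4x = width * height * 4 * bytes_per_sample      # 4 samples per pixel

print(f"no AA: {no_aa / 2**20:.1f} MB per frame")   # ~14.6 MB (the '~15MB' above)
print(f"4xAA:  {aa_4x / 2**20:.1f} MB per frame")   # ~58.6 MB (the '~59MB' figure)
```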

Check the back-buffer calculation here:

http://www.beyond3d.com/reviews/sapphire/512/index.php?p=01#why

Jawed
 
The size in memory has nothing to do with the bandwidth consumed. Compression is always on (on the NV architecture, color-buffer compression is enabled even when AA is disabled), but in general, with MSAA, compression works *very well* and achieves a bandwidth reduction near what you'd expect. Of course, you can always posit a pathological scenario where every pixel on the screen is a polygon edge, but we know in reality that this rarely happens, even in so-called next-gen titles.

The only way you get to use your uncompressed numbers is if FP16 can't support MSAA and compression, in which case it's worse than that, since supersampling would have to be used.
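Purely to illustrate the point, here's how compression changes that ~59MB figure under some hypothetical ratios (the edge fraction and 4:1 ratio are made up, just to show the shape of it):

```python
# Illustration only: the effect of MSAA colour/z compression on the traffic.
# The edge fraction and compression ratio are hypothetical.

width, height, samples, bytes_per_sample = 1600, 1200, 4, 8
uncompressed_mb = width * height * samples * bytes_per_sample / 2**20   # ~58.6 MB/frame

edge_fraction = 0.1     # assume ~10% of pixels sit on polygon edges
# Non-edge pixels have identical samples, so their traffic compresses ~4:1.
compressed_mb = uncompressed_mb * (edge_fraction + (1 - edge_fraction) / 4)

print(f"uncompressed: {uncompressed_mb:.1f} MB per frame")
print(f"compressed:   {compressed_mb:.1f} MB per frame (with these assumptions)")
```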
 