An 8600 GTS RSX instead of a 7800 GTX

I tried to do the same for the 360.
If MS were to build a 360 now with the same transistor budget, they would likely use an HD4550-derived part.
Here are two links to reviews of this card:
http://www.pcper.com/article.php?aid=628&type=expert
http://www.anandtech.com/video/showdoc.aspx?i=3420
The comparison is interesting as both parts are close in transistor budget (I don't count memory).
The HD4550 is made of 242 million transistors.
Xenos is made of 232 million transistors + 70 million on the daughter die.
Xenos is slightly bigger (~15%); the HD4550 is clocked higher (~20%).
On top of that, the HD4550 likely spends some transistors on its fairly advanced media accelerator (I don't remember the exact name).
Overall I think it's fair to estimate that the only difference between the two parts is design, and that design is what would decide the performance behaviour.

We should focus on games available on both 360 and PC:
Bioshock, Quake Wars and Oblivion.
It's clear from these results that the HD4550 is more than a match for Xenos (and in regard to texture filtering... well, it's in another league).
And it's only granted 12.8GB/s of bandwidth.

Could such a GPU have made devs' lives easier? I think so.
The render back-ends make so much better use of the available bandwidth that I'm not sure MS would have chosen to use eDRAM.
No eDRAM => no tiling.
More freedom for the developers.
And a deferred renderer would have done marvels on the chip.
DirectX 10.1 compliance.
The 360 would be even cheaper and more potent.

My post is not that much on topic (though not a 'vs' in any way), but it's just cool to make clear that manufacturers don't only throw extra clock cycles and extra transistors at the problem.
It's clear that the HD4550 is better by design (so per transistor) than Xenos, and less demanding system-wise at the same time :)
 
We should be a bit more specific when we say 7900, because RSX is definitely not on par with a 7900GTX in terms of shader power. The 7900GTX is a full 30% faster in that measure.
Yeah, the 7900GTX comes in at 650 MHz, whereas RSX is at 550 MHz (according to da onlinez, anyway). That's the same as that special G70 "7800GTX" 512MB, actually. The more typical 7800GTX 256MB comes in at 430MHz.
 
I tried to do the same for the 360.
If MS were to build a 360 now with the same transistor budget, they would likely use an HD4550-derived part.
Maybe in some twisted version of the past that has modern GPU technology but is limited to 90nm... ;)

For the future, I'm sure the next ATI-designed console GPU will be faster than even a 4850.
 
Maybe in some twisted version of the past that has modern GPU technology but is limited to 90nm... ;)

For the future, I'm sure the next ATI-designed console GPU will be faster than even a 4850.

If the Xbox Next is 2011, we will see something well beyond that. It will most likely be based on DX11 and sit somewhere in the Radeon 6850 lineup, or whatever it will be called. After all, the R800 is predicted to come out later this year.
 
It's clear from these results that the HD4550 is more than a match for Xenos (and in regard to texture filtering... well, it's in another league).
And it's only granted 12.8GB/s of bandwidth.
I don't think you're right about that at all. The HD4550 doesn't have near the shader ability of Xenos (Xenos basically has 240 less-efficient stream processors vs. the 4550's 80), and in 2005 they couldn't get near 750 MHz, either. Alpha blending - a big source of framerate slowdown during effect sequences - would be 1/3 as fast on the 4550, or even less. Minimum framerate is very important to a console. Finally, your reviews ignore AA, too.

For the purpose of a 2005 console, Xenos was probably even better than the HD 4000 series architecture. It doesn't have the extra baggage of DX10, and the EDRAM gives more BW to the CPU.
 
Maybe in some twisted version of the past that has modern GPU technology but is limited to 90nm... ;)
The HD4550 is made of 242 million transistors.
Xenos is made of 232 million transistors + 70 million on the daughter die.

So the extra 10 million trannies (not including daughter die) would not have been possible on 90nm? Or the theoretical 90nm 4550 wouldn't be able to reach a sufficient clockspeed?

Edit: It seems Mintmaster has answered my question :)
 
I don't think you're right about that at all. The HD4550 doesn't have near the shader ability of Xenos (Xenos basically has 240 less-efficient stream processors vs. the 4550's 80), and in 2005 they couldn't get near 750 MHz, either. Alpha blending - a big source of framerate slowdown during effect sequences - would be 1/3 as fast on the 4550, or even less. Minimum framerate is very important to a console. Finally, your reviews ignore AA, too.

For the purpose of a 2005 console, Xenos was probably even better than the HD 4000 series architecture. It doesn't have the extra baggage of DX10, and the EDRAM gives more BW to the CPU.
I'm not sure I have the knowledge to understand all your points, so I'd rather ask than stay dumb.
OK, Xenos is made of 48 ALUs, each of them a 5-wide SIMD (4+1 in fact).
So I get where your "basically 240 less efficient stream processors" comes from.
Following the same idea, the HD4550 could actually be considered as made of 16 ALUs.
So the chip has around a third of Xenos's shader power (leaving aside efficiency and clock).
I don't know what the bottleneck for alpha blending is; from your comment, compute power is the culprit (1/3 ratio). Do I understand properly?

In regard to your 750 MHz, I don't get where it comes from.
The HD4550 & 4350 both run @600MHz, while the VRAM is clocked @800MHz and @500MHz respectively. So bandwidth seems the main culprit in the performance differences.
The difference in performance is smaller than the difference in bandwidth.
At the time the 360 came out, 700 MHz was a high frequency for GDDR3; is your comment related to that value?
But see my comment above: actually the HD4550 may run fine with 700 MHz VRAM (it seems there are bottlenecks elsewhere, and I pass on AA, and the CPU could have a little less bandwidth to play with).

For AA, so many games pass on it that, as reviews go, I don't think it's an important factor.
Really few games offer AA on 360... sadly.
In regard to the reviews I linked, the only game tested with 4xAA is Quake Wars.
At highest quality with 4xAA @1024x768, it runs @36fps (average).

In regard to your comment about DirectX10 baggage: do you think it's responsible for the lack of shading power per transistor? (I mean Xenos is ~300 million transistors, the HD4550 240 million; that's 1/6 less, yet the chip ends up with a third of the shading power even at a higher clock.)
Could the amount of transistors devoted to media acceleration be a culprit here?

Does somebody know which PC settings match 360 quality for Bioshock, Quake Wars and Oblivion? That would make the comparison more relevant :)

So if I understand your comments properly, the main concern would be the lack of shading power relative to Xenos, such that even fine tuning/optimizations wouldn't allow sustained 30fps in most cases.

I still think that the tech behind the HD4550 is impressive; MS could have dumped unwanted logic and packed in more ALUs. They could have passed on eDRAM too (and on AA in most cases). The 360 might be dirt cheap to produce.
 
So the extra 10 million trannies (not including daughter die) would not have been possible on 90nm? Or the theoretical 90nm 4550 wouldn't be able to reach a sufficient clockspeed?

Edit: It seems Mintmaster has answered my question :)
Mintmaster didn't really answer your question (and I'm not questioning the validity of his comment; he is a dev ;) )
Yes, ten million transistors (or more) would have been possible @90nm.
The RSX is >300 million transistors and it started as a 90nm part ;)
For the clock speed, it would have been possible, but it might not have met MS's goals in regard to heat dissipation/power consumption ;)
 
Mintmaster didn't really answer your question (and I'm not questioning the validity of his comment; he is a dev ;) )
Yes, ten million transistors (or more) would have been possible @90nm.
The RSX is >300 million transistors and it started as a 90nm part ;)
For the clock speed, it would have been possible, but it might not have met MS's goals in regard to heat dissipation/power consumption ;)

I think Mintmaster did answer my question indirectly; RV710 isn't as fast/well suited for a 2005 console as Xenos ;)
 
Mintmaster didn't really answer your question (and I'm not questioning the validity of his comment; he is a dev ;) )
Yes, ten million transistors (or more) would have been possible @90nm.
The RSX is >300 million transistors and it started as a 90nm part ;)
For the clock speed, it would have been possible, but it might not have met MS's goals in regard to heat dissipation/power consumption ;)

I always thought Mintmaster was an armchair expert or maybe a PC software developer and not a console developer. Correct me if I'm wrong.
 
I always thought Mintmaster was an armchair expert or maybe a PC software developer and not a console developer. Correct me if I'm wrong.
Your comment makes me question my memory. I guess he can clarify the situation ;)
 
In regard to your 750 MHz, I don't get where it comes from.
The HD4550 & 4350 both run @600MHz, while the VRAM is clocked @800MHz and @500MHz respectively.
Oh, my mistake. I must have been thinking of RV730.

Still, when you take RV710 and downclock it to 500MHz, it'll probably be half the speed of Xenos.

For AA, so many games pass on it that, as reviews go, I don't think it's an important factor.
Really few games offer AA on 360... sadly.
It's not that few. Look at the resolution list. 2xAA is still AA, and that wasn't enabled in any of the 4550 reviews you showed.

In regard to your comment about DirectX10 baggage: do you think it's responsible for the lack of shading power per transistor? (I mean Xenos is ~300 million transistors, the HD4550 240 million; that's 1/6 less, yet the chip ends up with a third of the shading power even at a higher clock.)
Could the amount of transistors devoted to media acceleration be a culprit here?
Both will be significant. Maybe it's possible to take RV730 and trim it down from 500M to 350M transistors by taking out everything that doesn't need to be there for a console, but it's tough to say. Then, when you reduce the clock down to 500MHz, the overall performance advantage over Xenos may be 20%.

We can talk all day about a 4000-series-based GPU vs. Xenos, but we just don't have the data to make any conclusions. Besides, even ATI didn't have that technology back then. R600 and RV630 were terrible in terms of perf/mm2, and while RV670 was better, it still wasn't good enough, and definitely worse than RSX.

I always thought Mintmaster was an armchair expert or maybe a PC software developer and not a console developer. Correct me if I'm wrong.
Mostly an armchair expert. Former hardware engineer (a couple summers at ATI) and PC engine coder (a startup that didn't have enough money to keep going).
 
I'm not sure I have the knowledge to understand all your points, so I'd rather ask than stay dumb.
OK, Xenos is made of 48 ALUs, each of them a 5-wide SIMD (4+1 in fact).
So I get where your "basically 240 less efficient stream processors" comes from.
Following the same idea, the HD4550 could actually be considered as made of 16 ALUs.
So the chip has around a third of Xenos's shader power (leaving aside efficiency and clock).

It actually has 44% the raw shader power of Xenos, but greater efficiency would easily push it over the 50 or perhaps 60% mark.

The 4550 is simply too cut down to compete with Xenos. It might achieve 60% of its performance when memory bandwidth isn't too much of an issue, but that's the best you can expect.
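For what it's worth, the raw-throughput side of that comparison is easy to sketch. This is only a back-of-the-envelope calculation assuming the commonly quoted clocks of 500MHz for Xenos and 600MHz for the HD4550, and counting pure MADD throughput; with those exact clocks the ratio comes out closer to 40%, so the 44% figure above presumably assumes slightly different numbers:

```python
# Raw MADD throughput: stream processors x 2 flops (multiply-add) x clock.
# Efficiency differences between the architectures are deliberately ignored.

def raw_gflops(stream_processors: int, core_clock_mhz: int) -> float:
    return stream_processors * 2 * core_clock_mhz / 1000

xenos = raw_gflops(240, 500)    # 48 ALUs x 5-wide SIMD = 240 SPs
hd4550 = raw_gflops(80, 600)    # 16 ALUs x 5-wide SIMD = 80 SPs
print(xenos, hd4550, hd4550 / xenos)  # 240.0 96.0 0.4
```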

Now the 4650, on the other hand, would be a much more interesting comparison. It runs circles around Xenos in all respects apart from bandwidth, since it has only 16GB/sec. This seems incredibly unbalanced IMO, so it would depend entirely on the game's requirements as to whether the 4650 or Xenos would be faster.

The 4670 greatly reduced the bandwidth issue by doubling the memory speed, so I expect that GPU would have been an excellent alternative to Xenos had the 360 been launching today rather than 3 years ago!
 
It would have had a little less theoretical power, but I can imagine the system would be an awful lot more flexible overall in terms of what you can do graphically.
 
Some data about the HD4550 & 4650:

HD4650:
Transistor count: 514M
320 stream processors
32 texture units
(It's made of 8 arrays; each array contains 8 ALUs (5-wide) and 4 texture units.)
8 render back-ends
128-bit bus, 500MHz DDR3 => 16 GB/s

HD4550:
Transistor count: 242M
80 stream processors
8 texture units (two of the same arrays as in the 4650, I guess)
4 render back-ends
64-bit bus, 800MHz DDR3 => 12.8 GB/s
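As a sanity check on those two bandwidth figures, here is the usual peak-bandwidth formula as a small sketch (assuming the listed clocks are the base DDR clocks, doubled for the effective transfer rate):

```python
# Peak memory bandwidth: bus width in bytes x effective transfer rate.
# DDR memory transfers twice per clock, hence the factor of 2.

def peak_bandwidth_gb_s(bus_width_bits: int, mem_clock_mhz: int) -> float:
    bytes_per_transfer = bus_width_bits / 8
    transfers_per_sec = mem_clock_mhz * 1e6 * 2
    return bytes_per_transfer * transfers_per_sec / 1e9

print(peak_bandwidth_gb_s(128, 500))  # HD4650: 16.0
print(peak_bandwidth_gb_s(64, 800))   # HD4550: 12.8
```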

It's even clearer that ALUs are cheap: basically the HD46xx packs 4 times the ALU count and twice the ROP count with ~twice the number of transistors.

A chip including 6 arrays of stream processors/texture units, with a fine-tuned number of render back-ends and without the UVD2 engine, might have ended up a reasonable alternative to Xenos, and tinier.
The whole system would be quite a bit cheaper to produce: no eDRAM, simpler mobo.

It would have had a little less theoretical power, but I can imagine the system would be an awful lot more flexible overall in terms of what you can do graphically.
I'm not sure. I guess you're speaking about the constraints tiling induces.
Basically the GPU and the CPU would fight for the 22.8GB/s of available bandwidth.
The GPU may have ended up with less than that (in the worst case the CPU can use up to 10GB/s).
To make the most of the system, I feel that MS should have pushed deferred renderers à la KZ2. I think they would have succeeded in delivering proper tools/etc., but if we speak about flexibility... well, I don't know.
I remember reading that tiling somewhat limits the usefulness of a function like memexport; would deferred renderers have the same effect? (No clue...)
And it would force a design choice on dev teams:
now it's: you want AA => tile
and it would be: want good perfs? => deferred renderer

In between, it would be fun, as leading on the 360 would make the PS3 renditions of multiplatform games better.
 
I don't know what the bottleneck for alpha blending is; from your comment, compute power is the culprit (1/3 ratio). Do I understand properly?

Alpha blending and antialiasing on Xenos are done completely by the eDRAM daughter die, and do not stress the GPU at all (basically blending on Xenos is free). On other hardware, blending requires a lot of extra bandwidth (read the previous pixel from memory, blend, write the blended pixel back to memory). This is important when you are rendering lots of alpha-blended particles, for example.
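To put a rough number on that extra traffic, here is a small sketch (the resolution, overdraw and frame-rate figures below are purely illustrative assumptions, not from this thread):

```python
# Blending without eDRAM is a read-modify-write: each blended pixel costs
# one framebuffer read plus one write, i.e. 8 bytes of traffic for a
# 32-bit colour buffer instead of 4 for a plain write.

def blend_traffic_gb(width: int, height: int, overdraw: int,
                     bytes_per_pixel: int = 4) -> float:
    pixels = width * height * overdraw
    return pixels * bytes_per_pixel * 2 / 1e9  # read + write per pixel

# Example: 8 full-screen layers of particles at 720p, 60 fps.
per_frame = blend_traffic_gb(1280, 720, overdraw=8)
print(per_frame * 60)  # ~3.5 GB/s of blend traffic alone
```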
 
Alpha blending and antialiasing on Xenos are done completely by the eDRAM daughter die, and do not stress the GPU at all (basically blending on Xenos is free). On other hardware, blending requires a lot of extra bandwidth (read the previous pixel from memory, blend, write the blended pixel back to memory). This is important when you are rendering lots of alpha-blended particles, for example.
thanks for your response ;)
 
I'm not sure. I guess you're speaking about the constraints tiling induces.
Basically the GPU and the CPU would fight for the 22.8GB/s of available bandwidth.
The GPU may have ended up with less than that (in the worst case the CPU can use up to 10GB/s).
Talking PS3 and RSX, unless I'm mistaken, I thought the 8600GTS would offer higher read/write bandwidth than the RSX?
To make the most of the system, I feel that MS should have pushed deferred renderers à la KZ2. I think they would have succeeded in delivering proper tools/etc., but if we speak about flexibility... well, I don't know.
I remember reading that tiling somewhat limits the usefulness of a function like memexport; would deferred renderers have the same effect? (No clue...)
And it would force a design choice on dev teams:
now it's: you want AA => tile
and it would be: want good perfs? => deferred renderer

In between, it would be fun, as leading on the 360 would make the PS3 renditions of multiplatform games better.
Well, from a pure architecture standpoint, the 8600 chipset is far more flexible in its functionality. While the RSX is in no way a bad GPU or necessarily a limiting factor for the system, the 8-series chip, with its unified architecture and SM4-based credentials, could offer a lot more interesting implementations of visual techniques, perhaps making for even more exciting applications using Cell, as the GPU could already tackle several graphical challenges on its own.

AFAIK, issues such as deferred rendering with MSAA and transparencies could actually be addressed thanks to the programmability of the 8 series (i.e. not requiring two separate renderers) - having checked out demos and such of this in DX10 - leaving the CPU free to compute other complex algorithms. Then consider the efficiency, the possibilities for geometry-shader applications (tessellation, shadows, reflections, lighting) on the GPU, and many other things related to the better architecture.

As for the 360, I heard that one issue (although not a complete barrier) when it comes to deferred rendering & tiling is the physical size of the eDRAM unit. Sorry if I'm wrong :)
 