A thought on X-Box 2 and Creative Labs.

You make the exact same mistake that Sony does, assuming that an impressive set of technical specifications in the abstract sense makes for a better console.

Heh.. that was MS marketing and their comparison graphs in 2001.

As far as using a "whopping" TFLOP for a rasterizer, I have to assume you've never worked with software rasterizers at any length (high-end packages). Using a title like MDK2, which features a pure software rasterizer, although extremely primitive by comparison, a GHz x86 CPU pushes 1%-2% of the framerate that a GeForce1 SDR does, and even then (when using hardware) the limit is still CPU based, as the processor can't push the game code fast enough.

Umm.. no. That's perhaps the most ridiculous comparison possible. MDK2 wasn't designed for a software rasterization pipeline and it shows. It [in software] is just emulating the hardware (OGL) spec they designed to and the calls that use it...

Remember 'Termite Games' - I think a programmer from there used to post here. Their DVA engine, back a year or 2 ago... may have changed, comes to mind because it used a basic CPU and some aggressive visibility and culling to show onscreen polygon counts and dynamic lighting way ahead of its time. Nobody's designing for a fully software-driven 3D pipe, yet... So comparing games with 'software' support is just wrong.

Do you not read my posts? I don't even think there will be fragment shading in 5 years, so where's the problem? Sony's backing the Stanford-based Real-Time High Level Programmable Shading project... my guess is they're doing that for a reason.

Besides, if you're drawing an 80,000-polygon mesh/character, what's wrong with shading at a per-vertex level? When the size of the average polygon nears that of a pixel, your almighty hardware rasterizer breaks down [without massive design philosophy changes] and you can achieve sub-pixel accuracy with vertex shading.

http://www.itworld.com/Comp/1437/itwnws_01-03-12_supercomp/

PS. CELL looks to be low-k dielectric, Cu interconnects, SOI on a sub-0.1um process. The 0.10um SOI that was licensed was later stated to be the process used in the combined Emotion Engine-GS chip.
 
Since this is 3D Hardware and not just Consoles, on the PC side I think the AGP bus is fast becoming the next hurdle.

Whatever happened to HOS? It was all the rage, then slowly died out.... Need more modern-generation cards out there, I guess.
 
You make the exact same mistake that Sony does, assuming that an impressive set of technical specifications in the abstract sense makes for a better console.

Do they make that mistake? The 'numbers' delivered are rather solid numbers based on specific cases of what the hardware 'will' do. MS has done no different really, and Nintendo's numbers don't mean a whole lot because it's just a guess what they believe developers will attempt to accomplish. In the end, advertised numbers are just there for the consumption of platform fanboys.

Let's look at the EE vs the P3 used in the XBox: absolute and utter obliteration for Sony, yet they still get whipped in the graphics department and cost more to develop for.

This is a bit of a reach, because you're basing the judgment of a discrete piece of silicon on the product of the whole?

PS3 should amplify this greatly.

"Should" or "could?"

This is a much worse scenario than the PS2; there they were using a modified MIPS processor, which is something developers had been working with for many years, and even then it is taking them years to get the hang of it.

I think you have a bit of a misunderstanding of where the difficulties of the PS2 lie. Even with the funky register layout and MMI instructions, the MIPS-based EEcore is hardly something developers are having difficulties with (if they are, then they have other serious issues to work out).

If MS were going to try and build a high-level API from scratch for a chip that wasn't even built yet for XB2, I'd be saying the same thing about them.

Well, to some extent they are with DX9 (as is OpenGL 2.0).

As far as using a "whopping" TFLOP for a rasterizer, I have to assume you've never worked with software rasterizers at any length (high-end packages). Using a title like MDK2, which features a pure software rasterizer, although extremely primitive by comparison, a GHz x86 CPU pushes 1%-2% of the framerate that a GeForce1 SDR does, and even then (when using hardware) the limit is still CPU based, as the processor can't push the game code fast enough.

Well, the problem with this assumption is that you're using an x86 processor (well, actually the PC architecture as a whole + software environment) as the basis for an argument against designing hardware for a target that poses almost none of the design constraints faced by those designing hardware for the PC market, or products that are going to share architectural aspects across various markets.

You're also contradicting yourself a bit by extolling the virtues of a "fully" programmable GPU vs. a software rasterizer (assuming you're talking about total rasterization and not just setup), since code utilizing a "fully programmable" GPU is essentially a software rasterizer in itself (excepting certain fixed functions like setup).

Trying to emulate pixel shaders on a CPU, you will be closer to 0.1% of the speed of comparable-timeframe dedicated hardware, which means you would be slower than dedicated hardware several generations old.

Could you elaborate more on what you're trying to point out there? It seems a bit too broad a generalization.

Looking at the TFLOPS/GFLOPS numbers for a CPU vs. a rasterizer is useless, as dedicated hardware needs far fewer operations to complete the same task than a CPU does.

I think you really need to realize just how much computation a teraflop is... Neither the GScube (the 16 that I've used, and the 64 that I've seen, but not used), nor the SX-4 and 5 that I got to mess with in school, were achieving even half that much computation. And I have yet to see anything done in real-time on any current GPU that comes close (the P10 will be interesting though).

Of course I won't get into whether Sony/Toshiba/IBM (STI? ;) ) can actually create a 1TFLOP part or not.

EDIT: I see Vince responded in better detail...
 
High-end five years ago was Voodoo2 with three 64-bit 100MHz EDO DRAM channels... that's 2.4GB/sec.

Well, first of all, Voodoo2 was 4 years ago (February 1998). Second... to be exact, it was three 64-bit 90 MHz EDO DRAM channels. ;)

Third, if you are going to say that, we might as well double that to six 64-bit 90 MHz channels (4.3 GB/sec), due to V2 SLI.

Most importantly though, I'll repeat with emphasis: ;)

At the time, the best single-chip solution, bandwidth-wise, was the Riva 128: 128-bit, 100 MHz SDRAM.

I certainly agree that multi-chip and multi-board configurations are one way to attack the problem. But I'm purposely limiting the comparison here to "bandwidth per single chip" to have a somewhat apples-to-apples comparison.
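
For anyone who wants to check the arithmetic, here's a rough sketch in Python (using only the clocks and widths quoted above, and decimal GB):

    # Peak bandwidth = bytes per transfer x transfers per second, summed over channels.
    def bandwidth_gb_s(bus_bits, clock_mhz, channels=1, pumping=1):
        return (bus_bits / 8) * (clock_mhz * 1e6) * pumping * channels / 1e9

    print(bandwidth_gb_s(64, 90, channels=3))   # Voodoo2 board: ~2.16 GB/s
    print(bandwidth_gb_s(64, 90, channels=6))   # Voodoo2 SLI:   ~4.32 GB/s
    print(bandwidth_gb_s(128, 100))             # Riva 128:      ~1.6 GB/s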
 
My point was that you already take it as fact that the potential performance differential for alternative architectures has stayed the same over the years ... the fact that the few companies left are sticking to them makes the assumption reasonable, but it doesn't prove anything.
 
Speaking in pure MHz terms... the fastest accelerators now use 300MHz DDR (equivalent of 600MHz)... and GPUs have increased clock speed from xx amount to 300MHz+

Memory tech has improved dramatically; people are just saying they want/need more bandwidth right now and probably forever.... Dave Perry said something about developers being a lot like gas... gas fills whatever volume it is in [or something like that]... and still wants to push out

8)
 
Speaking in pure MHz terms... the fastest accelerators now use 300MHz DDR (equivalent of 600MHz)...

I wish people wouldn't say "600MHz" equivalent for something like 300MHz DDR-RAM (because it's not)...

Memory tech has improved dramatically; people are just saying they want/need more bandwidth right now and probably forever.... Dave Perry said something about developers being a lot like gas... gas fills whatever volume it is in [or something like that]... and still wants to push out

Well, bandwidth by itself is useless. Useful bandwidth is a byproduct of several factors (clock, bus width, latency), and it's usually those factors' influence on memory performance that delivers the bandwidth developers seek.

All things being equal, given the choice of 300MHz DDR-RAM and 600MHz SDR-RAM, I'd take the latter in a heartbeat even though they both provide the same theoretical bandwidth.
 
What about HOS?

The reason you don't hear about HOS any more is that once they were available, everyone tried them, and decided they weren't that useful in video games.

Higher Order Surfaces turned out to be difficult to author, difficult to control, slow to render on current hardware, not very bandwidth efficient at low levels of detail, and not implemented uniformly across different hardware architectures.

On the other hand, triangle meshes are easy to author and use, only take up twice as much space at low levels of detail, and render really quickly on all modern hardware.

HOS aren't even used very much as a non-real-time data compression technique. Quake 3 uses them a little, and I think Jak and Daxter has an ellipsoid primitive, but most games don't bother. This is because just about all you can do with HOS is make smoothly curving surfaces, which aren't that common in video games. (Most landscapes and architecture aren't smooth, and most parts of most monsters / players aren't smooth either.)

I think per-pixel displacement maps on top of normal polygons will become common in the future, but I'm not sure when we'll see traditional HOS (either NVIDIA's NURBS or ATI's smoothed meshes) used very much.
 
Vince said:
Remember 'Termite Games' - I think a programmer from there used to post here. Their DVA engine, back a year or 2 ago... may have changed, comes to mind because it used a basic CPU and some aggressive visibility and culling to show onscreen polygon counts and dynamic lighting way ahead of its time. Nobody's designing for a fully software-driven 3D pipe, yet... So comparing games with 'software' support is just wrong.

The programmer's name is Jim Malmros, from Swedish-based Termite Games. They are close to finishing the DVA engine and a multiplayer test should be out soon. I don't know if the DVA engine will support software rendering anymore.


EE? GS?
Somebody?

Emotion Engine and Graphics Synthesizer, the chips found in the PlayStation 2.
 
Ah, well I still say he's off his rocker ... the main problems to be solved with large-scale distributed systems are algorithmic, and it's way too early to limit yourself by trying to pour it into hardware. A waste of time, since the networks themselves don't exist, and can't even exist without cheap fibre-speed photonic switching.

Mr. Kutaragi still sees Cell as a computation node though ... not a pure communication one, so the EE/Cell separation does not make a whole lot of sense if you want to take his interview at face value. Personally I think it's a bit of hocus pocus to confuse the competition; I doubt they are spending all that money to make an architecture which only makes sense for supercomputers which can lay a fibre backbone.
 
DaveB-

Well Ben, a 128-bit bus would have to be operating at 3GHz for 50GB/s of bandwidth. That ain't gonna happen; you need at least a 256-bit wide bus, but then you are still talking 750MHz/1.5GHz DDR memory. 325MHz QDR with a 256-bit wide bus would manage that, but the expense would be ridiculous.

'96 100MHz SDR 1.49GB/sec

'01 300MHz DDR 8.94GB/sec

'06 900MHz QDR 53.64GB/sec

That's if we stick to a 128-bit bus.
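
For anyone checking those figures, a quick sketch (assuming a 128-bit bus and binary GB, which is what the table appears to use):

    # Peak bandwidth of a 128-bit bus at the clocks/pumping in the table, in binary GB/s.
    def bandwidth_gib_s(bus_bits, clock_mhz, pumping):
        return (bus_bits / 8) * (clock_mhz * 1e6) * pumping / 2**30

    print(bandwidth_gib_s(128, 100, 1))  # '96 SDR: ~1.49 GB/s
    print(bandwidth_gib_s(128, 300, 2))  # '01 DDR: ~8.94 GB/s
    print(bandwidth_gib_s(128, 900, 4))  # '06 QDR: ~53.64 GB/s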

Darren-

Maybe five years ago, yeah, but XBox will need to be ready to go in about 4 years from now.. its specs will certainly need to be finalized in less than 4 years.

Four and a half. Let's see where bandwidth sits in November if you want to shave a year off :)

Yeah, I realise that; still, at 1920x1080x64 4xFSAA with a 32-bit compressed Z-buffer and around 5x overdraw at 60fps, we're looking at around 28GB/s just for the framebuffer and Z-buffer. Now add CPU bandwidth, texture bandwidth, geometry bandwidth etc. If you're right about 50GB/s for high-end video RAM in 2006, then XBox 2 is going to need that high-end RAM for those settings (also, what if people want to move to 100fps for consoles?).

1080i is interlaced; your bandwidth numbers are double what they should be. Moving over 60FPS on a console? TVs don't do that well when you disable VSync; until the standards are revised, sometime around thirty years from now, you're talking closer to 14GB/sec. Bandwidth to spare. And of course, you are assuming that they won't be moving to an embedded framebuffer. Given Moore's law, that should be fairly simple (the N64 had 4MB total RAM, the GC has 75% embedded).
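
Rough sketch of where both figures come from (a crude model that just counts one 64-bit colour sample plus one 32-bit Z access per FSAA sample per overdrawn layer; compression and read-modify-write traffic ignored):

    # Crude framebuffer + Z traffic estimate for 1920x1080x64, 4xFSAA, ~5x overdraw, 60fps.
    width, height    = 1920, 1080
    samples_per_px   = 4        # 4x FSAA
    bytes_per_sample = 8 + 4    # 64-bit colour + 32-bit Z
    overdraw         = 5
    fps              = 60

    progressive = width * height * samples_per_px * bytes_per_sample * overdraw * fps / 1e9
    interlaced  = progressive / 2   # 1080i draws only half the lines per field
    print(progressive, interlaced)  # ~29.9 GB/s and ~14.9 GB/s, close to the 28 and 14 quoted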

Even if I did, the cost factor is still there, as we're talking about XBox 2 using high-end, cutting-edge 50GB/s RAM (which I don't even think will exist).

50GB/sec, I think, is very conservative; wait and see where we are sitting at the end of this year.

With a PowerVR design it could do 1920x1080x64 4xFSAA at 60fps (outputting at 32-bit) with only 500MB/s bandwidth for the framebuffer and no Z-buffer! Which means it could use extremely low-end RAM, or normal low- to mid-end RAM, which means a large price difference as well as having loads more bandwidth available for textures, CPU etc.

That sounds real nice. What about the fact that you will likely be dealing with more polygons than pixels? That changes the bandwidth factoring quite a bit.

Opinions I've heard say that Xbox looks best, GameCube second and PS2 third.

Massive model corruption detracts a bit more from a game than it does for those people, I guess :)

I was talking about 640x480x32 4xFSAA, not just 640x480x32. Also you're not factoring in overdraw. With the limited shared main memory bandwidth available to XBox, the Kyro III would push a lot more fps and leave more texture bandwidth left over.

100 million polygons per second. Rely on fillrate on a console and any vanilla PC is going to throttle you.

I'm not sure what you mean here; whether Nvidia do XBox 2 or IMG/VIA, there's always going to need to be a north/south bridge as well as a CPU and graphics and sound chips.. so what's your point here?

nV supplies all of this in their cost. They have combined functionality.

Why would they be pushing for strong OpenGL support in their next console? As for IMGTEC not being known for OpenGL support, until recently they were known for poor OpenGL support, but Kyro changed that; Kyro II's OpenGL drivers are very impressive.

Development costs are skyrocketing; dev houses are laying people off and closing up because of this. Backing a high-level API with a decade of refinement behind it, along with widespread developer knowledge of it, makes it the only logical choice outside of DirectX ;)

KyroII has good OpenGL drivers? They must have improved a staggering amount since I had one.

We don't have 10GB/s though; theoretically we do, but in effect, compared to SDR RAM, DDR is not twice as fast.

Compare the GF2 MX 400 to the GF4 MX. Crossbar makes a big difference.

Joe-

Simply put, my own extrapolation is that bandwidth will be "no more of a problem in 5 years, than it is a problem now."

What are the big bandwidth problems left? Currently we have moved from 640x480x16 @30FPS five years ago to 10x7x32x4x @60FPS+ today, with increasing bandwidth needs for rasterization on top of the massive increase we have seen in resolution. We get to 1600x1200x64x4x and then what is left? The increases in bandwidth needs due to increasing resolution demands have significantly outpaced those on the rasterization side; not too much longer and that end of bandwidth will not be an issue.
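
To put some rough numbers on that growth (a crude sketch that counts only one colour + Z write per sample per frame; overdraw and texture traffic ignored):

    # Raw sample traffic at the three resolution/AA/fps points mentioned above.
    def sample_traffic_gb_s(w, h, bytes_per_sample, aa_samples, fps):
        return w * h * bytes_per_sample * aa_samples * fps / 1e9

    print(sample_traffic_gb_s(640, 480, 2 + 2, 1, 30))    # '97: 16-bit colour + 16-bit Z   ~0.04 GB/s
    print(sample_traffic_gb_s(1024, 768, 4 + 4, 4, 60))   # now: 32-bit colour + Z, 4x AA   ~1.5 GB/s
    print(sample_traffic_gb_s(1600, 1200, 8 + 4, 4, 60))  # 1600x1200x64, 4x AA target      ~5.5 GB/s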

Vince-

Umm.. no. That's perhaps the most ridiculous comparison possible. MDK2 wasn't designed for a software rasterization pipeline and it shows. It [in software] is just emulating the hardware (OGL) spec they designed to and the calls that use it...

Could have used the Lightworks render engine, MentalRay or Renderman as examples also, although they are significantly slower than the one used in MDK2, and they are designed to run in software.

Do you not read my posts? I don't even think there will be fragment shading in 5 years, so where's the problem? Sony's backing the Stanford-based Real-Time High Level Programmable Shading project... my guess is they're doing that for a reason.

How many years in development? How many years left? That will be real great for game development, don't you think? Moving to a spline-based/HOS/geometric LOD system is something that works for hardware too.

Nobody's designing for a fully software-driven 3D pipe, yet

That's how 3D started; it actually hasn't been that long that hardware support has been around. Gaming didn't create 3D.

Besides, if you're drawing an 80,000-polygon mesh/character, what's wrong with shading at a per-vertex level?

Where should I start? Filtering is the most serious issue. Rely on per-vertex shading and you will need to revert to a rather heavy geometric LOD system with differing vertex shading characteristics at each level of tessellation, or run into massive aliasing issues. Then you have the difficulty of dealing with multiple environmental effects on every vertex, assuming you want to drop pixel shader support due to the difficulty of emulating it using non-dedicated hardware. If you don't, then you're screwed with the weak rasterizer support anyway, so you are better off trying to force it through using some sort of vertex shading, I would assume.

So you chew up a load of processing power on geometry, then chew up more on a geometric LOD system, then chew up some bandwidth along with more CPU overhead to utilize an alternating vertex shader scheme based on distance, tied in with your LOD system, then amplify your T&L load significantly by relying on extremely complex vertex shader routines, of which you will need at least six or more to avoid serious aliasing issues. Building the code for all that will be real simple though, right? Particularly using a completely new architecture with a new instruction set to learn, a new register architecture and primitive compilers, on top of having massive multithreading issues to work around.
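
To make that concrete, here's a hypothetical sketch of the sort of bookkeeping such a scheme implies; the level count, distance thresholds and shader names are made up purely for illustration:

    # Hypothetical distance-based LOD table, one vertex-shading routine per tessellation level.
    LOD_LEVELS = [
        # (max_distance, tessellation_factor, vertex_shader_variant)
        (10.0,  1.0,     "vs_full_detail"),
        (25.0,  0.5,     "vs_half_detail"),
        (50.0,  0.25,    "vs_quarter_detail"),
        (100.0, 0.125,   "vs_eighth_detail"),
        (200.0, 0.0625,  "vs_far"),
        (1e9,   0.03125, "vs_very_far"),
    ]

    def select_lod(distance):
        """Pick the tessellation level and matching vertex-shading routine for a mesh."""
        for max_dist, tess, shader in LOD_LEVELS:
            if distance <= max_dist:
                return tess, shader

    print(select_lod(18.0))  # -> (0.5, 'vs_half_detail')

And that's before you write the six-plus shading routines themselves, or tune the thresholds so the transitions don't pop.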

When the size of the average polygon nears that of a pixel, your almighty hardware rasterizer breaks down

Why in the world do you think that? I can only assume that you have never worked with sub-pixel-sized polys on a hardware rasterizer.

Archie-

Do they make that mistake? The 'numbers' delivered are rather solid numbers based on specific cases of what the hardware 'will' do.

I'm not talking about the specifications. I'm talking about something being more impressive from an overall engineering standpoint versus being a better gaming platform. The XBox is easily the most plebeian and boring design out of all the consoles, with the PS2 clearly being the most impressive from an engineering standpoint. Doesn't help it.

This is a bit of a reach, because you're basing the judgment of a discrete piece of silicon on the product of the whole?

That is my part of the discussion. Having "Cell" by itself will not make the PS3 better than the others, even if the others do have a significantly weaker processor. Add in increased development costs, and does it make for a better platform?

"Should" or "could?"

Should. The natural trend is for development costs to increase. MS and Nintendo have gone to great lengths to reduce this; Sony is clearly more interested in engineering accolades.

I think you have a bit of a misunderstanding of where the difficulties of the PS2 lie. Even with the funky register layout and MMI instructions, the MIPS-based EEcore is hardly something developers are having difficulties with (if they are, then they have other serious issues to work out).

That IS the point.

Well, to some extent they are with DX9 (as is OpenGL 2.0).

To some extent, with backwards compatibility built on top of work already done ;)

Well, the problem with this assumption is that you're using an x86 processor (well, actually the PC architecture as a whole + software environment) as the basis for an argument against designing hardware for a target that poses almost none of the design constraints faced by those designing hardware for the PC market, or products that are going to share architectural aspects across various markets.

Which shows itself off real well with the Athlon throttling the comparable current MIPS processors in SGI workstations running render tests under Maya. With the exception of the IR-class machines and the like, x86 PCs have closed the gap with non-PC hardware designed natively for 3D.

You're also contradicting yourself a bit by extolling the virtues of a "fully" programmable GPU vs. a software rasterizer (assuming you're talking about total rasterization and not just setup), since code utilizing a "fully programmable" GPU is essentially a software rasterizer in itself (excepting certain fixed functions like setup).

The difference is in the level of load it will place on dedicated hardware that is designed explicitly around a limited set of functions, those dedicated to graphics, versus a 'general-purpose' CPU which is designed for protein folding.

Could you elaborate more on what you're trying to point out there? It seems a bit too broad a generalization.

Pulling off the same effects that pixel shaders handle on a CPU using software rasterization is roughly 0.1% of the speed. You can test this using the MS DX software rasterizer, or compare visualization render engines and time the impact that applying certain effects has on a render.

I think you really need to realize just how much computation a teraflop is... Neither the GScube (the 16 that I've used, and the 64 that I've seen, but not used), nor the SX-4 and 5 that I got to mess with in school, were achieving even half that much computation. And I have yet to see anything done in real-time on any current GPU that comes close (the P10 will be interesting though).

Extrapolating out a TFLOP from a GFLOP, based on the dozens of software render engines I've used, it works out to not even close to being competitive with hardware rasterizers in three years, following the current trends. TNT vs NV2A. On a GFLOP CPU (actually, a GHz Athlon with a max GFLOP rating of 4) I've seen what kind of frames I can render out in 3 minutes, 1/10,800 of real time. When the Sony die-hards were saying 6TFLOPS in 2003, which they were, I still wasn't impressed in terms of what it could do compared to hardware rasterizers (my three-minute test would cover that also; in theoretical terms it would cover up to 40TFLOPS). Of course, I'm using render engines that have only had roughly a decade of refinement and tweaks to perform at their maximum on x86 hardware. You get better anti-aliasing and filtering than we currently have by a sizeable margin, but lack the level of model complexity and effects unless you want to push the render times over the three-minute mark.
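
For anyone checking the arithmetic behind that (a rough sketch, using the 4 GFLOP figure and a 60fps real-time target from above):

    # Gap between a 3-minute offline frame and real time, and what it implies in raw FLOPS.
    frame_time_offline  = 180.0     # seconds per frame on the ~4 GFLOP Athlon
    frame_time_realtime = 1 / 60.0  # 60fps target
    gap = frame_time_offline / frame_time_realtime
    print(gap)                      # 10,800x short of real time

    cpu_gflops = 4.0
    print(cpu_gflops * gap / 1000)  # ~43 TFLOPS to brute-force that frame in real time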
 
We get to 1600x1200x64x4x and then what is left? The increases in bandwidth needs due to increasing resolution demands have significantly outpaced those on the rasterization side; not too much longer and that end of bandwidth will not be an issue.

Heh, generally I agree. I said nearly the same exact thing in some other thread. ;) Though for me, the "point of diminishing returns" is more like this:

1600x1200x32 x 16X AA + advanced filtering. 60 FPS. I think there is a clear difference between 4X and 16X FSAA... though beyond that it may get tough to justify for real-time gaming. I may end up agreeing with you on the 64-bit color issue though, at least for the back-buffer. ;) I also might increase the FPS to 85, to coincide with syncing at a reasonable monitor refresh rate.

Once we reach that point, any additional bandwidth required will be for increased rasterization demands... which I agree tend to increase at a much slower pace.

I'm interested in doing some extrapolation to find out what kind of absolute bandwidth we're talking about here. I still haven't found any benchmarks for a game like Quake3, at 1600x1200x32, 4X FSAA, 8X anisotropic filtering, on a GeForce4 Ti.
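
As a starting point, here's the kind of crude framebuffer-only estimate I have in mind (assumptions: 32-bit colour plus 32-bit Z per sample, no compression, and an overdraw factor that is just a guess):

    # Back-of-the-envelope framebuffer + Z traffic for the 1600x1200, 16x AA target.
    def fb_traffic_gb_s(w, h, aa_samples, bytes_per_sample, overdraw, fps):
        return w * h * aa_samples * bytes_per_sample * overdraw * fps / 1e9

    print(fb_traffic_gb_s(1600, 1200, 16, 4 + 4, 3, 60))  # ~44 GB/s at 60fps
    print(fb_traffic_gb_s(1600, 1200, 16, 4 + 4, 3, 85))  # ~63 GB/s at 85fps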

Anyone care to help out?
 
It's all moot IMO; it seems very unlikely Sony will not use a dedicated graphics unit for the PS3 ... what with the research into the high-speed pipelined eDRAM macro for a 3D graphics architecture they reported on last year and all.

I doubt being harder to develop for will be a hurdle to adoption if the performance ratio is large enough ... if it's 10x faster than what you can get on a desktop, for instance, I doubt it will be. Developers will have to learn how to do parallel programming (and not just separating the GUI into threads :) sooner or later, and it's unlikely Sony's tools and architecture will be as hostile to automation of the process as they were with the EE.
 
"'96 100MHZ SDR 1.49GB/sec"

Yeah

"'01 300MHZ DDR 8.49GB/sec"

More like '02 325MHz DDR 10.4GB/s


"'06 900MHZ QDR 53.64GB/sec"

Hmm, memory operating at a clock frequency of 900MHz? There is a reason that things like QDR and DDR have come about, you realise. Something to do with the vast difficulty of clocking memory high.

Besides, doubling the amount of data sent per clock (SDR->DDR->QDR) is less efficient than doubling the memory bus width. You will not achieve your maximum memory bandwidth. Remember, as you increase the amount of data sent per clock (via QDR or a wider bus) you greatly increase the problems associated with memory granularity.


"That's if we stick to a 128bit bus."

Not possible for many many years, unless some radical new memory type is invented.
 
Something to do with the vast difficulty of clocking memory high.

That's my point. We've been told of the "vast difficulty" in increasing memory clocks for the past 5 years, yet they will still have increased 4X over the past 5 years. (Assuming 400 MHz DDR is available this summer / fall.) Has anything changed? Is it not reasonable to expect a 2-3X clock increase in the next 5 years?

Remember, as you increase the amount of data sent per clock (via QDR or a wider bus) you greatly increase the problems associated with memory granularity.

Well, I don't know about "greatly", but sure, there is not a 100% gain in effective bandwidth. However, at the same time, hardware architects are getting more and more efficient with memory bandwidth with each generation of technology.... as far as I can tell, outpacing the rate of "lost efficiency" due to wider and multi-pumped busses.

I don't understand your last statement... are you saying that going higher than a 128-bit bus is not possible for many years?
 
That's my point. We've been told of the "vast difficulty" in increasing memory clocks for the past 5 years, yet they will still have increased 4X over the past 5 years. (Assuming 400 MHz DDR is available this summer / fall.) Has anything changed? Is it not reasonable to expect a 2-3X clock increase in the next 5 years?

Don't confuse double data rate with a doubling in clock frequency. It is a *damn* sight easier to double the data rate than it is to double the clock speed (in the case of going SDR->DDR). Increasing the clock speed gives a linear increase in speed; this is not true with DDR and QDR. The more data you transfer per clock in that way, the less efficient the RAM will be, because all of those transfers have to be to the same page in memory. If you are writing out data to a Z-buffer for pixel-sized polygons, then each 32 bits of Z data is going to be written pretty much randomly into the Z-buffer from the RAM controller's point of view.
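
A crude illustration of the granularity point (the numbers are worst-case and purely illustrative; real controllers batch, cache and reorder writes):

    # Efficiency of an isolated 32-bit Z write versus the minimum chunk the bus moves per access.
    def isolated_write_efficiency(bus_bits, transfers_per_access, write_bytes=4):
        min_burst_bytes = (bus_bits / 8) * transfers_per_access
        return write_bytes / min_burst_bytes

    print(isolated_write_efficiency(128, 1))  # SDR, one transfer per access:   25% useful
    print(isolated_write_efficiency(128, 2))  # DDR, two transfers per access:  12.5%
    print(isolated_write_efficiency(128, 4))  # QDR, four transfers per access: 6.25%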


"Well, I don't know about "greatly", but sure, there is not a 100% gain in effecive bandwidth. However, at the same time, hardware architects are getting more and more efficient with memory bandwidth with each generation of technology....as far as I can tell outpacing the rate of "lost efficiency" due to wider and multi-pumped busses."

More and more efficient with memory bandwidth? Like the crossbar memory architecture? How much can that be expanded, really? You move to a 256-bit bus and now you have four 64-bit wide DDR busses. Now, if that is even possible, it will only be as efficient as a 128-bit wide SDR interface because of the aforementioned limitation. Even then, look how complex the GF3's memory controller is already; is it really practical?

IMGTEC still haven't moved away from a synchronous memory controller; it's all they need. I wonder how much cheaper that controller is than the one on the GF3.


"I don't understand your last statment...are you saying that going higher than 128 bit bus is not possible for many years?"

No, I'm saying that true 900MHz SDRAM isn't possible for many years.
 