If PS3 can really do 1Tflops

Panajev-

Fact is, you are basically saying "well if they had a basically purely software Rasterizer and T&L engine 1 TFLOPS would be e-machines level"...

T&L power will be pretty damn near irrelevant on any of the next gen consoles. Try having a TFLOP general purpose CPU handle a 1000 instruction shader op with conditional branches running FP32. The programmability of the GPU is a given; the problems a CPU will have trying to emulate one are also.

Take any of the DX9 shader demos floating around (that actually use PS2.0/VS2.0) and try to find a processor that can run it at ~1% the speed of the GPUs after optimizing and recompiling the code to run in software. Considering basic desktop processors are pushing out ~20GFLOPS right now, that would cover up to ~2TFLOPS using current hardware.
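To put rough numbers on that, here is the back-of-the-envelope arithmetic behind the ~2TFLOPS figure; the 20 GFLOPS desktop peak and the 1% ratio are the post's ballpark numbers, not measurements:

```python
# Back-of-the-envelope check of the "~1% of GPU speed" claim above.
cpu_gflops = 20.0             # assumed peak of a current desktop CPU
software_vs_gpu_ratio = 0.01  # CPU running the shader demo at ~1% GPU speed

# FLOPS a general purpose CPU would need just to match the GPU in software
required_gflops = cpu_gflops / software_vs_gpu_ratio
print(f"~{required_gflops / 1000:.1f} TFLOPS to match the GPU in software")
# -> ~2.0 TFLOPS, which is where the "up to ~2TFLOPS" figure comes from
```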

I don't think it's particularly relevant as Sony wouldn't be stupid enough to ship another souped up Voodoo1 with the PS3, I'm just talking about the fact that a general purpose processor pushing out 1TFLOP isn't going to be able to compete with a dedicated chip at rasterization. IF they built a CPU that could, the architecture would be horrible for general code.

Vince-

Going by that patent, I'd say the TFlop Broadband Engine, or whatever name marketing throws on it, would be better compared to competing platforms when viewed either as: (a) just the MPUs, or (b) part of a complete system. Not comparing an MPU/CPU-like device against a rasterizer.

Of course the comparison is unfair, that is the entire reason I brought it up. This thread is about what 1TFLOP of CPU power is going to do for graphics, it isn't going to do much if it doesn't have some solid rasterizing power behind it when facing the competition.

Why would pursuing a micro-polygon route be inferior to the more PC-centric system, which faces namely storage and bandwidth constraints?

PC centric? The way PC 3D graphics function is by attempting to best emulate CGI, something that is becoming increasingly apparent.

Thus, I'd tend to feel that the computational capabilities are significant and programmable enough to compete in a relative shader implementation, and if it has a respectable sampling rate in addition, would it be out of line to say it's the closer praxis of the PRman ideology, as opposed to the PC's current design?

Maybe with a PFLOP of power, not a TFLOP. Without a decent rasterizer offloading a significant portion of the rendering the PS3 wouldn't stand a chance on the visual front. It isn't a question of design philosophy, it is a matter of a TFLOP being significantly short of what they would need to have, not even their 6.6TFLOPS they first claimed would get it done, not without a decent rasterizer.

Let's say for the hell of it that they had enough power to run raw shader ops at twice the speed of a GPU straight through the CPU. What happens when they need to read back data from the frame buffer? Are they going to process the information, hand it off to the rasterizer for basic rasterization tasks (set up, tri+ani filtering), write it out to the framebuffer, then read it back to the CPU, process the information, hand it back over to the rasterizer for final rasterization? Even if they had the raw CPU power to run the shader ops twice as fast as a GPU they would still end up significantly slower under anything less than ideal conditions. GPUs have a whole bunch of transistors sitting around doing nothing a great deal of the time for the small percentage of the time when they are needed. Adding this to a general purpose processor would leave you with significantly lower levels of programmability when comparing like transistor counts. If they came up with a general purpose processor that was truly as good as a GPU at rasterizing tasks it would be a lousy 'general purpose' processor.
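To make the readback cost a bit more concrete, here is a toy model of that CPU-to-rasterizer round trip; every number (resolution, bus bandwidth, number of trips) is an illustrative assumption, not a spec of any real console:

```python
# Toy model of the CPU <-> rasterizer round trip described above.
FRAME_W, FRAME_H = 640, 480
BYTES_PER_PIXEL = 4                    # 32-bit colour, assumed
BUS_GB_PER_S = 2.0                     # assumed CPU<->rasterizer bandwidth
ROUND_TRIPS = 2                        # write out, read back, twice per frame

framebuffer_bytes = FRAME_W * FRAME_H * BYTES_PER_PIXEL
traffic_bytes = framebuffer_bytes * 2 * ROUND_TRIPS   # each trip = write + read
time_ms = traffic_bytes / (BUS_GB_PER_S * 1e9) * 1000

print(f"{traffic_bytes / 1e6:.1f} MB of framebuffer traffic per frame")
print(f"~{time_ms:.2f} ms per frame just moving pixels, before any shading")
```

And that only counts bandwidth; the stalls while the CPU waits on each readback are the part that really hurts.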

We need several orders of magnitude more power before software rasterization can compete with current GPUs, let alone what will be available by the time those chips get here. At some point it is certainly possible that the CPU could replace the GPU, but that is decades away, at least.
 
If the PS3 has a 1TFLOP chip and a rasterizer equal to that of the GS then it would get its ass kicked in the visual department

The PS3 pixel engine is probably like the GS, except it will most likely be faster. You have all those APUs to deal with shaders. I don't think it will have that reg comb hardware thingy. My speculation anyway.

As far as the visual department goes, I think it's more on the developers' side of things.
 
BoddoZerg said:
london-boy said:
The thing is, the console market and the High-end video cards market are TOTALLY different.....

remember that someone buying an NV30 KNOWS why he's buying it and why he's choosing either that or an R300.

In the console business it does not work like that, and I think we've seen that already with PS1 and PS2....

In any case, I just feel that the sheer scale of pre-hype we've been seeing for PS3 is completely unwarranted and can only lead to trouble.

You don't see MS and Nintendo hawking Teraflops and Gigapixels for the Xbox2 and GC2, do you? The common answer is - "Well they haven't settled on a final design". My reply is this - The XBX2 and NGC2 are set on the same timescale as the PS3. If the first two don't have a completely finalized design, what makes you believe the third does? Having a generalized outline for multiprocessor cells is very different from having everything set in silicon with defined functionality and known clockspeeds.

The thing is, it's not just nVidia and other 3d graphics people. You also don't see Intel and AMD quoting Teraflops for a processor to be released in 2005. You'll see them talking about advances like 64-bit CPUs and 60 nm processes... but you don't see them quoting a 1000x increase in CPU power, or using phrases like "as powerful as all the computers in the world combined". That kind of performance-quoting pre-hype is generally reserved for the likes of Bitboys, pre-product launch Transmeta, or Dean Kamen's mysterious "Ginger". It is simply not a respectable business practice and I despise it.



You are just confirming my point further...

The PC industry (and PR) work differently from the console industry.

When a new 3D accelerator comes out, you know exactly what you are going to get:
1) because the wait is 6 months
2) because whoever buys a new 3D accelerator knows what he's talking about, knows about pipelines, bandwidth and all that crap.

When *joe smith* buys a new PS3-4-5-6-7 for his son, he does not know what is inside the box. All he knows is that *ooooohhhhh it's 10000000000x more powerful than the previous PS!!!*

Sony knows that and they are using it. Whether it's good or bad is another argument. I think they know their shit and should go for it.
 
We need several orders of magnitude more power before software rasterization can compete with current GPUs, let alone what will be available by the time those chips get here.

Those pixel engines are not software rasterizers, though. And those APUs aren't general purpose CPUs.

What happens when they need to read back data from the frame buffer? Are they going to process the information, hand it off to the rasterizer for basic rasterization tasks(set up, tri+ani filtering), write it out to the framebuffer, then read it back to the CPU, process the information, hand it back over to the rasterizer for final rasterization?

Isn't the PS2 already doing something like this?
 
BenSkywalker said:
T&L power will be pretty damn near irrelevant on any of the next gen consoles.

I disagree; and we'll leave it at that.

Let's say for the hell of it that they had enough power to run raw shader ops at twice the speed of a GPU straight through the CPU. What happens when they need to read back data from the frame buffer? Are they going to process the information, hand it off to the rasterizer for basic rasterization tasks (set up, tri+ani filtering), write it out to the framebuffer, then read it back to the CPU, process the information, hand it back over to the rasterizer for final rasterization? Even if they had the raw CPU power to run the shader ops twice as fast as a GPU they would still end up significantly slower under anything less than ideal conditions. GPUs have a whole bunch of transistors sitting around doing nothing a great deal of the time for the small percentage of the time when they are needed. Adding this to a general purpose processor would leave you with significantly lower levels of programmability when comparing like transistor counts. If they came up with a general purpose processor that was truly as good as a GPU at rasterizing tasks it would be a lousy 'general purpose' processor.

Ben, what's the difference between a Fragment/Pixel Shader and a Vertex Shader at a fundamental level? Or the difference between a VU and a VS?

Also, I'll state again, looking back on that patent: the 'visualizer' chip had roughly 1/4 the computational functionality of the MPU in what would traditionally be considered the 'front-end' of a 3D pipeline, as they had labeled it. Perhaps if Panajev would post the relevant diagram; I'd much appreciate it.



Correct me if I'm wrong, but in the simplest terms running a 'shader' is merely executing a program on computing resources. If you have a pool of computing resources (a methodology the PC industry is moving to), why can't shader programs (e.g. vertex, to start) be run on them? What prevents me from drawing 2 polygons for every pixel on a CPU and then running shaders on them before handing them off to a highly clocked, but simpler, rasterizer which then kicks it out to the framebuffer?

Or, what prevents me from running a more Brazil-like system (as opposed to PRman's REYES) and running a ray-tracing routine that's divided between APUs, each of which traces independent rays, on a Cell-like MPU? Do this at a speed comparable to a fragment/vertex shader (which should be combined by DX10, IIRC).
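The programming model being described here is basically an embarrassingly parallel map: independent work items (shader invocations, rays) farmed out to a pool of APU-like workers, each needing only its own inputs. A minimal sketch, where trace_ray is a made-up stand-in for a real intersection/shading routine and the worker count is purely illustrative:

```python
# Independent rays distributed across a pool of APU-like workers.
from concurrent.futures import ProcessPoolExecutor
import math

def trace_ray(ray_id):
    # Stand-in for real intersection/shading work: each ray is independent,
    # so a worker only needs its own inputs ("local memory").
    direction = (math.cos(ray_id * 0.01), math.sin(ray_id * 0.01))
    hit_t = 1.0 / (1.0 + abs(direction[0]))      # fake intersection distance
    return ray_id, hit_t

if __name__ == "__main__":
    NUM_RAYS = 10_000
    NUM_APUS = 8                                  # pretend pool of APUs
    with ProcessPoolExecutor(max_workers=NUM_APUS) as pool:
        results = list(pool.map(trace_ray, range(NUM_RAYS), chunksize=256))
    print(f"traced {len(results)} rays across {NUM_APUS} workers")
```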

We need several orders of magnitude more power before software rasterization can compete with current GPUs, let alone what will be available by the time those chips get here. At some point it is certainly possible that the CPU could replace the GPU, but that is decades away, at least.

Nobody is talking about software rasterization. In case you don't get it, and until someone proves or explains different: a design like Cell isn't that much different from a P10 or other advanced architecture. Take a VU, multiply it by 72, add onboard eDRAM, and allow them to form virtual 'pipelines' as arbitrated by PUs based on the task.

What's computed in 'software'? The actual traditional 'rasterization' is done in hardware, even if, for argument's sake, it's nothing more than a Graphics Synthesizer times 4 clocked at, say, 1GHz. You're getting sub-pixel accuracy, as well as the other benefits of having each pixel = polygon or whatever it is.

I fail to see this whole 'executed in software' deficit... especially when running more-true global illumination models.
 
Take any of the DX9 shader demos floating around (that actually use PS2.0/VS2.0) and try to find a processor that can run it at ~1% the speed of the GPUs after optimizing and recompiling the code to run in software.
Comparing physical shader speed (none of the sampling and filtering crap)... you are underestimating the CPU by quite a bit. And if I also "remove" all memory reads/writes from that CPU code, to level the ground to actual math processing, wanna take bets on that %? :p

T&L power will be pretty damn near irrelevant on any of the next gen consoles. Try having a TFLOP general purpose CPU handle a 1000 instruction shader op with conditional branches running FP32.
I don't quite understand what you're trying to say here. A 1000 instruction shader sounds like it would use a lot of T&L power to me. (Whether those transforms are per pixel, per vertex, per freaking voxel, or whatever else is really quite irrelevant to their math complexity.)
At any rate, looking at that patent, the APUs pretty much seem to only physically 'see' their local memory, so it's pretty much a given you would be getting near 100% execution efficiency... outside that you are really left with flops to flops in that comparison of yours - if overall performance of the various basic math ops is balanced, performance will be too. (Do note that that particular patent also leaves dumb operations like sampling etc. to dedicated silicon.)
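For whatever it's worth, the flops-to-flops arithmetic looks like this. Everything here is an assumption for illustration only (resolution, frame rate, overdraw, perfect utilization), and it counts nothing but the raw shader math, so it leans on the "near 100% execution" premise above and ignores T&L, geometry, texturing and everything else the machine would also have to do:

```python
# Pure flops-to-flops arithmetic for the 1000-instruction shader case.
BUDGET_FLOPS = 1e12            # the hypothetical 1 TFLOPS machine
SHADER_OPS = 1000              # the 1000-instruction FP32 shader
WIDTH, HEIGHT, FPS = 640, 480, 60
OVERDRAW = 3                   # assumed average shaded samples per pixel

invocations_per_sec = BUDGET_FLOPS / SHADER_OPS
needed_per_sec = WIDTH * HEIGHT * FPS * OVERDRAW
print(f"budget: {invocations_per_sec:.2e} shader runs/s")
print(f"needed: {needed_per_sec:.2e} shader runs/s "
      f"({needed_per_sec / invocations_per_sec:.1%} of budget)")
```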
 
Yeah, like what he said... :LOL:

Fafalada said:
do note that that particular patent also leaves dumb operations like sampling etc. to dedicated silicon

Seriously though, this is what I meant to say in the second part of my reply concerning the GS*(4) part. If you look at the patent, there is dedicated hardware doing the 'back-end' jobs (Faf's dumb ops), with all the Cell-based aspects found on the MPU and the front-end of the Visualizer.
 
[Attachment: BE.PNG]
 
I want to say something quickly: why would a DX10 GPU be faster at executing very long shaders with conditional branches than a modern, well designed CPU?

Thinking about both processors executing shaders from fast local memory to eliminate CPU-to-Main Memory bottleneck...

It sounds to me that with modern compilers (let's even consider x86, with maybe better SIMD extensions) and the branch prediction HW that CPUs do have, conditional branching IS less of a problem, as you process branches, most of the time, much more efficiently...

Predication is a nice word for "execute both paths of the branch until the condition is evaluated"... it sounds nice, but it also means wasting silicon area... Yes, the units are kept busy, but not all of them are doing a meaningful job...
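In code terms, predication boils down to something like this minimal sketch (the then/else bodies are just placeholders): both sides run for every element, a mask selects the result afterward, and the work done on the losing side is the wasted effort being discussed.

```python
# Predication illustrated: both branch paths evaluated, then a mask selects.
values = [0.3, -1.2, 2.5, -0.7]

def then_path(x):   # placeholder for one side of the branch
    return x * x

def else_path(x):   # placeholder for the other side
    return -x * 0.5

mask = [x > 0.0 for x in values]                          # the condition
both = [(then_path(x), else_path(x)) for x in values]     # both paths run
result = [t if keep else e for keep, (t, e) in zip(mask, both)]
print(result)
```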

GPUs are moving to conditional branching with predication, following IPF's route... how efficient are their compilers and drivers going to be at harnessing the next generation GPUs' predication and data speculation? (Once you introduce conditional branching you introduce a whole new set of problems too, problems CPU makers have already had to deal with)...

You have seen how many years Intel has spent optimizing their compilers (kudos to them...)... I do not know how easy the jump to full programmability will be for GPU makers, and whether that is more difficult than, or as difficult as, the specialization...

Ok, I am off-track as usual...

It is true that Cell doesn't seem to be heavy on branch prediction either (I do not know if the PU's PPC core was stripped of it or not)... probably they will use threads to do data and control speculation (spawning new threads when a branch is encountered; the right result from one of the threads is saved when the condition is evaluated [transforming, as happens with predication, a conditional branch into a data dependency issue])...
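A rough sketch of that thread-speculation idea, with made-up function names and workloads: both paths are started speculatively, the condition is evaluated alongside them, and the result from the wrong path is simply thrown away, so the branch really does become a data dependency.

```python
# Speculative execution of both branch paths; the condition picks the winner.
from concurrent.futures import ThreadPoolExecutor

def slow_condition(x):
    return x % 2 == 0              # stand-in for a condition that resolves late

def taken_path(x):
    return x * 3                   # speculative work for the taken side

def not_taken_path(x):
    return x - 7                   # speculative work for the other side

def speculative_branch(x):
    with ThreadPoolExecutor(max_workers=3) as pool:
        cond = pool.submit(slow_condition, x)
        taken = pool.submit(taken_path, x)
        not_taken = pool.submit(not_taken_path, x)
        # The control dependency is now a data dependency: choose a result
        # once the condition value arrives; the other result is discarded.
        return taken.result() if cond.result() else not_taken.result()

print(speculative_branch(10), speculative_branch(11))
```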



My main issues with your argument are: 1) The Cell patent does not suggest the elimination of HW-based functions like triangle set-up, texture filtering, and sampling (triangle set-up might be bypassed... who knows, maybe you are not even using triangles :) ), which would be left to dedicated silicon...

2) What is the problem with executing those long shaders with APUs, under direction of the PUs, and then sending the result to a very fast Pixel Engine? When are we executing a shader in HW and when are we not?

T&L wise, all in all the EE's VU1 is better than either one of NV2A's twin Vertex Shaders (if we had two VU1s we could do a direct comparison... still, VU0+VU1 get very close to the twin Shaders' throughput while packing much better flexibility)...

BTW, you say T&L will not be a factor in the next generation... well, I disagree, as others did... fully dynamic and global lighting models (introducing things like radiosity or ray-tracing) will be a huge tax on any next-generation system and you need RAW horsepower to do those... and IMHO the end result would be pretty good looking :)


Sorry for the messiness of this post of mine :(
 
Vince said:
Panajev2001a said:
GeForce FX is already over 250 GFLOPS, isn't it?

I believe the NV30 has roughly 250GFlops aggregate, with around 50 of that in the fragment shader, IIRC. If this means anything tangible.

Yes, that is very tangible... with the current FLOPS rating for the patent's Visualizer we could have GeForce FX performance purely with software rasterization and T&L done all on the Visualizer... but would it be a good thing? I'd like more Radeon 9800 PRO level of performance :p :p :p :LOL: (jab to nVIDIA hehe)...

Moot point anyways as PS3 will not do all the rasterization in software...
 
Maybe with a PFLOP of power, not a TFLOP. Without a decent rasterizer offloading a significant portion of the rendering the PS3 wouldn't stand a chance on the visual front. It isn't a question of design philosophy, it is a matter of a TFLOP being significantly short of what they would need to have, not even their 6.6TFLOPS they first claimed would get it done, not without a decent rasterizer.


So what this sounds like to me is: don't expect CGI movie level visuals in PS3 games, because 1 TFLOPs of power isn't anywhere near enough, and Sony's previous claim of multiple teraflops of power wouldn't be enough either. We won't be seeing movie quality CGI in realtime until, perhaps, PS4, if it had 1 PFLOP of performance or so. More importantly, no matter how much computational power a machine has, you would still need enough rasterizing power and visual quality to reach CGI quality in realtime.
 
I'm not entirely convinced that pre-rendered CG quality in realtime gaming won't be possible. How many FLOPs do you think a Pixar renderfarm entails? Now scale that rating down from movie resolution to TV-ish resolution. Is it comparable to a TFLOP?
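As a rough way to frame that question, here is the scaling arithmetic with every number an assumption for illustration: the film and TV resolutions are ballparks, and the hours-per-frame figure is just a commonly cited order of magnitude for offline film rendering, not Pixar data.

```python
# Rough scaling from offline film rendering to realtime TV-resolution rendering.
FILM_PIXELS = 2048 * 1556              # assumed film-resolution frame
TV_PIXELS = 640 * 480                  # TV-ish frame
OFFLINE_SECONDS_PER_FRAME = 2 * 3600   # assume ~2 hours per frame on the farm
REALTIME_SECONDS_PER_FRAME = 1 / 30

resolution_factor = FILM_PIXELS / TV_PIXELS
time_factor = OFFLINE_SECONDS_PER_FRAME / REALTIME_SECONDS_PER_FRAME

print(f"resolution buys you  ~{resolution_factor:.0f}x")
print(f"going realtime costs ~{time_factor:,.0f}x")
```

Dropping the resolution only buys a factor of ten or so; the gap between hours per frame and a thirtieth of a second is the much bigger term.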
 
Without a doubt, that will be a paramount issue for all next generation consoles. It's very easy to throw together half a dozen effects indiscriminately to prove out a hardware design. It's not easy to build an intricate artistic depiction that looks integrated, deliberate, and inherently artistically motivated, even if the hardware that could effortlessly pull it off is sitting right in front of you. Aside from whether or not the talent is present, will the time and budget be present? Suffice it to say, there are a good deal of factors suggesting that a game from a next generation console might not look that much different from what we have already (by no fault of available processing resources), with maybe an exception of more polys, nicer textures, and some fancier filtering.
 
"I'm not entirely convinced the potential for pre-rendered CG-quality in realtime gaming won't be possible. How many FLOPs do you think a Pixar renderfarm entails? Now scale that rating down from movie resolution to TV-ish resolution. Is it comparable to a TFLOP?"

I doubt that even 100 TFLOPs would be enough. Plus, it's not just about floating point operations per second. It's not JUST computational power, as I said. It's also about rendering power: the number and quality of pixels & vertices you can draw/render/rasterize, what can be pushed out and displayed on-screen in realtime, the type of shading FX you can do. It's many things. It's bandwidth. It's memory. It's image quality. The PS3 might have a lot of computational power (still not enough tho) but it won't have enough rendering power or image quality to do CGI movies in realtime.
We are a long way from that. Perhaps PS4 will start to bridge the gap. But CGI is always a moving target; it's always getting better.

Those CGI movies are made with thousands of CPUs running for months or years in non-realtime. Even if you had a thousand CPUs, you still couldn't render those movies in realtime, because it takes them so much time (non-realtime) to begin with. And even the more efficient and hardwired GPUs couldn't do it in realtime either, although they could help speed up the non-realtime creation of those movies.

The PS3 isn't going to have the power to render CGI movies in realtime, by any stretch of the imagination. However, what it might be able to do is render things that start to approach the other kinds of CGI that you see, in television shows for instance. Probably by the 3rd generation of PS3 software, you will begin to see visuals that are like, say, Voltron3D or Transformers Beast Wars (just as 2 examples).

An even better example is the CGI cut scenes in games, which are far below the level of CGI films; that is what you might expect from PS3.
A lot of people have said the CGI in FFX might be possible on PS3 in realtime.
 
I dunno... Doom3 at max settings looks like low budget CG to me, and this is the non-finished alpha version.

IMO PS3 games will look like the FMVs in FFX. I mean, if they don't hit that bar then where will they hit? They will obviously blow DOOM3 away.
 
When you say CGI... I assume you're talking about the Final Fantasy movie?

IMO PS3 could render the characters in real time, although NOT in game. Sony's GScube, which was 16 PS2s in one box, did this fine, and I'm sure PS3 will be a lot more powerful than just 16x.
 
I doubt that even 100 TFLOPs would be enough

Do you guys grasp the concept of what one trillion operations per second means in processing power?

This is the range of super computers just a few years ago.

Take this one, ranked 70th in the world's fastest computers, Nov 2002:

The new supercomputer system of IBM's eServer p690 series deploys a total of 260 Power4 processors working at a clock speed of 1.3 GHz and performing four floating-point operations (flops*) per cycle. This leads to a theoretical peak performance of 5.2 Gflops per processor, or a system top speed of 1.35 Tflops.
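A quick check of the figures quoted above, using only the numbers in the quote:

```python
# 260 Power4 processors, 1.3 GHz, 4 flops per cycle.
processors = 260
clock_hz = 1.3e9
flops_per_cycle = 4

per_processor = clock_hz * flops_per_cycle      # 5.2 GFLOPS
system_peak = per_processor * processors        # ~1.35 TFLOPS
print(f"{per_processor / 1e9:.1f} GFLOPS per processor, "
      f"{system_peak / 1e12:.2f} TFLOPS system peak")
```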

Speng.
 
I don't think people do grasp what huge numbers we are talking about here. I'm somewhat at a loss when someone posts that 1 TFLOPs isn't anywhere near enough to accomplish realtime CG or somewhere near that w/o hairsplitting the details at TV-ish resolutions.

1 TFLOP

That would be informally equivalent to a renderfarm containing over one thousand 400 MHz PowerMac G4s. Those aren't exactly pokey machines, even using just one of them. Has Pixar used anything remotely as extensive as that in the past to make a movie? ...Or how about a hundred 3 GHz P4s? (Can you imagine the electric bill for that farm??? Guy at CompUSA says, "...and you are looking for a UPS for wha???" :) ) It's rather amazing this level of CPU horsepower can even exist in a little box sitting on your desk.
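The per-chip figures implied by that informal equivalence, taken as assumptions rather than benchmarks:

```python
# Implied sustained throughput: ~1 GFLOPS per 400 MHz G4, ~10 GFLOPS per 3 GHz P4.
TARGET_FLOPS = 1e12
G4_FLOPS = 1e9        # assumed for a 400 MHz PowerMac G4
P4_FLOPS = 10e9       # assumed for a 3 GHz Pentium 4

print(f"{TARGET_FLOPS / G4_FLOPS:.0f} G4s or "
      f"{TARGET_FLOPS / P4_FLOPS:.0f} P4s to reach 1 TFLOP")
```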
 