Tim's thoughts

For the same amount of logic, there's no way any other architecture can possibly outperform dedicated hardware.
 
The hard problems in CPUs are AFAICS cache misses and branch mispredicts. In a CPU, when a cache miss hits you, you can keep executing instructions until your reorder window is full, and then you are stuck for a few hundred clock cycles, unable to do anything meaningful. Branch mispredicts flush your pipeline, so you lose 10-30 clock cycles and thus 20-90 execution slots.

In a GPU, the inherent parallelism of the tasks it handles means that both issues can be worked around. For texture cache misses, you simply shift execution over to another pixel; for both cache misses and branch mispredicts, you can guess when they will happen and fill up the vertex/pixel pipeline with other vertices/pixels than the one you anticipate the miss or mispredict for. This way, near-100% utilization of the execution units is possible even if you get a cache miss and a branch mispredict every 3 or so shader instructions, whereas a CPU in the same situation will be lucky to sustain 1% execution-unit utilization.

Also, cranking up the number of execution units and the number of pipeline stages (for higher clock speeds) gives the GPU a nearly linear speedup (barring bandwidth issues), whereas the CPU, struggling with cache misses and mispredicts (which get relatively worse as clock speeds increase) and with the limited parallelism available in a serial instruction stream, will see much smaller speedups.
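To make those numbers concrete, here is a toy simulation of the pixel-swapping idea. It's only a sketch under made-up assumptions (a fixed 200-cycle miss latency, a cache miss every third shader instruction, one execution slot per cycle), not a model of any real GPU or CPU:

```c
/* Toy model of latency hiding by switching between pixels (a sketch,
 * not how any real GPU is implemented). Each "pixel" runs a shader of
 * SHADER_LEN instructions; every FETCH_EVERY-th instruction is a texture
 * fetch that misses the cache and stalls that pixel for MISS_LATENCY
 * cycles. A single execution unit picks any ready pixel each cycle.
 * With enough pixels in flight, the unit almost never idles.            */
#include <stdio.h>

#define SHADER_LEN   64
#define FETCH_EVERY  3     /* a miss every ~3 shader instructions  */
#define MISS_LATENCY 200   /* cycles until the fetch result returns */

static double utilization(int pixels_in_flight)
{
    int pc[1024] = {0};        /* next instruction per pixel         */
    int ready_at[1024] = {0};  /* cycle when the pixel can run again */
    long cycles = 0, busy = 0, done = 0;

    while (done < pixels_in_flight) {
        int issued = 0;
        for (int p = 0; p < pixels_in_flight; p++) {
            if (pc[p] >= SHADER_LEN || ready_at[p] > cycles)
                continue;               /* finished or still waiting */
            pc[p]++;                    /* execute one instruction   */
            if (pc[p] % FETCH_EVERY == 0)
                ready_at[p] = cycles + MISS_LATENCY;  /* cache miss  */
            if (pc[p] >= SHADER_LEN)
                done++;
            issued = 1;
            break;                      /* one execution slot/cycle  */
        }
        busy += issued;
        cycles++;
    }
    return (double)busy / (double)cycles;
}

int main(void)
{
    int counts[] = {1, 8, 64, 256};
    for (int i = 0; i < 4; i++)
        printf("%3d pixels in flight -> %5.1f%% utilization\n",
               counts[i], 100.0 * utilization(counts[i]));
    return 0;
}
```

With a single pixel in flight it sustains roughly 1-2% utilization; with a couple of hundred it sits near 100%, which is the argument above in miniature.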

Also, note the address bandwidths of CPU and GPU architectures: a Pentium 4, when doing cache traffic, has an address bandwidth of 50 million addresses per second (800 MT/s FSB divided by a burst length of 16), whereas a GeForce FX 5950 has an address bandwidth of 1.9 billion addresses per second (950 MT/s memory bus, divided by a burst length of 2, multiplied by the 4-way crossbar). That's a factor of 38 difference, which I don't see CPUs closing anytime soon.
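Those two figures are just arithmetic on the quoted transfer rates and burst lengths; the back-of-the-envelope check, taking the numbers above at face value:

```c
/* Back-of-the-envelope check of the two address-bandwidth figures above,
 * taking the quoted transfer rates and burst lengths at face value.      */
#include <stdio.h>

int main(void)
{
    /* Pentium 4: 800 MT/s front-side bus, one new address per 16-transfer burst. */
    double p4_addr_per_s  = 800e6 / 16.0;

    /* GeForce FX 5950: 950 MT/s memory, burst length 2, 4 independent
     * crossbar channels each issuing its own addresses.                  */
    double gfx_addr_per_s = 950e6 / 2.0 * 4.0;

    printf("P4:        %.0f M addresses/s\n", p4_addr_per_s  / 1e6);
    printf("GFFX 5950: %.2f G addresses/s\n", gfx_addr_per_s / 1e9);
    printf("ratio:     %.0fx\n",              gfx_addr_per_s / p4_addr_per_s);
    return 0;
}
```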
 
In an SMT-capable CPU (like the Hyper-Threading versions of the P4), one thread can continue executing while the other is stalled on a cache miss or branch misprediction. I also doubt that GPUs can keep near-100% utilization of a pipeline once you get pixel shaders with a lot of branches. I really doubt that.

The memory bandwidth and latency difference is certainly a good point. However, I think that as we concentrate more and more on shaders, the GPU itself becomes more important than its memory bandwidth and latency. That is where CPUs might get closer.
 
The point about swapping execution between pixels applies just as much even if you run complicated shaders on every pixel. In CPUs, branch mispredicts are expensive because you need to discard a lot of instructions that you have fetched/decoded/executed for the current thread of execution. In a GPU, you will typically fetch and execute instructions belonging to different threads (1 thread will typically be 1 vertex or 1 pixel) every clock cycle, so the pipeline will be filled with instructions belonging to N different threads, none of which need to be discarded in case of a branch mispredict. So the penalty of the mispredict may well be zero in the GPU.

As for SMT CPUs, the basic GPU pipeline most closely resembles the Cray MTA processor, a sorta-RISC processor which runs at several hundred MHz and, just by juggling 100+ threads in hardware, exhibits >98% of theoretical efficiency even though it has neither a branch predictor nor a data cache (!).
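For anyone unfamiliar with the MTA, the idea it relies on is the barrel-processor style interleave described above. The sketch below only illustrates the scheduling concept (round-robin issue from more threads than pipeline stages), not the actual MTA microarchitecture:

```c
/* Sketch of the barrel-processor idea mentioned above (Cray MTA style);
 * an illustration of the concept, not the real machine. With more
 * hardware threads than pipeline stages, two instructions from the same
 * thread are never in flight together: by the time a thread issues again,
 * its previous branch has already resolved, so nothing speculative ever
 * needs flushing and no branch predictor is needed.                      */
#include <stdio.h>

#define PIPE_DEPTH  8   /* cycles an instruction spends in the pipe */
#define THREADS    16   /* hardware threads interleaved round-robin */

int main(void)
{
    for (int cycle = 0; cycle < 48; cycle++) {
        int thread      = cycle % THREADS;   /* strict round-robin issue */
        int prev_issue  = cycle - THREADS;   /* this thread's last issue */
        int prev_retire = prev_issue + PIPE_DEPTH;
        if (prev_issue >= 0)
            printf("cycle %2d: thread %2d issues; its previous instruction "
                   "retired back at cycle %2d\n", cycle, thread, prev_retire);
        else
            printf("cycle %2d: thread %2d issues its first instruction\n",
                   cycle, thread);
    }
    return 0;
}
```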
 
zidane1strife said:
For the same amount of logic, there's no way any other architecture can possibly outperform dedicated hardware.
Hmmm, what if the former(other) has far better memory, and b/w...
The dedicated hardware is likely to make more efficient use of its bandwidth. Remember the guiding principle in texturing: 'assume you cache miss'. This is at the core of the design of all modern VPUs.

Expanding available bandwidth when you can't make use of it is a waste of money.

Of course, having all-SRAM, or other low/zero-latency memory, would greatly reduce miss latencies. But bandwidth isn't the key there; latency is. PC2100 DDR at CAS2 can be faster than PC2700 at CAS3, and that's on a CPU bus where the hit rate is targeted at 95%+, not 0%.
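The PC2100-vs-PC2700 point is easy to check on paper, assuming the standard 133 MHz and 166 MHz bus clocks for those modules:

```c
/* Quick check of the PC2100 CAS2 vs PC2700 CAS3 point above: the module
 * with less bandwidth can still win on access latency. Bus clocks assumed
 * to be the standard 133 MHz (PC2100/DDR266) and 166 MHz (PC2700/DDR333). */
#include <stdio.h>

int main(void)
{
    double pc2100_ns = 2.0 / 133e6 * 1e9;  /* CAS 2 at 133 MHz */
    double pc2700_ns = 3.0 / 166e6 * 1e9;  /* CAS 3 at 166 MHz */

    printf("PC2100 CAS2: %.1f ns to first data, ~2.1 GB/s peak\n", pc2100_ns);
    printf("PC2700 CAS3: %.1f ns to first data, ~2.7 GB/s peak\n", pc2700_ns);
    return 0;
}
```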
 
Gubbi said:
MfA said:
Cell, Bluegene, Merrimac ... are these more like GPUs or modern CPUs? Personally I find them closer to GPUs.

I question the flexibility of Intel and AMD in making a 180 and leaving the safe path of concentrating on legacy applications. To compete with the likes of Sony (and GPUs) they will need to severely compromise serial performance, and I don't think they are ready for that.

I think they'll be ready if there's a market for it. Streaming architectures like Merrimac or CELL have yet to prove themselves in the marketplace. They have a limited set of applications to run, but those applications will fly like sh*t off a silver shovel.

Whether these architectures can compete against dedicated GPUs in the graphics market or against traditional CPUs in the general-purpose market has yet to be seen. The optimist thinks they will outperform both. The pessimist thinks they will have, at best, mediocre performance in those fields.

Cheers
Gubbi

Optimist, pessimist... so that leaves me, right ? ;)

hehe

not really huh ? :(

Oh well, my opinion is that in serial tasks the efficiency of processors like CELL will not be high at all... pretty pathetic compared to what they can do in terms of theoretical peak performance.

Compared with today's top-of-the-line Pentium 4 processors, you would find the CELL-like processor slower in single serial tasks (more about the multi-tasking later), but that is not necessarily a death blow to CELL-like processors on desktop PCs, as I doubt we need such powerful Pentium 4 chips for what everyday users do (aside from 3D gaming and other computationally intensive tasks).

The point is that for most of the serial work desktop PC users do (web browsing, word processing, e-mail, etc.), even such an inefficient solution (as far as this kind of serial work is concerned) would be able to run those single applications quite well.

Then you have to take into consideration how most users work on their desktop PCs: they often like to run several applications at the same time (most of them do even without knowing it, because they leave all sorts of junk running in the background), and the growing degree of multi-tasking (e-mail client, two or more web browser windows open, Word or Excel, etc.) works towards highly parallel solutions rather than against them, mitigating their inherent inefficiency in serial work by emphasizing the parallel execution aspect.

3D graphics would fly on it quite well IMHO, yet the comparison against normal GPUs is a bit unfair because it assumes that CELL would be used as a 100% software rasterizer and ignores that you might have some dedicated logic (in the form of a streamlined, but always present, rasterizer) to help with some of the most basic functions.

Texture filtering will not be done in software, as the cost in dedicated logic is very small (compared to the speed hit you would take by running it in software).

That can be said about some other things.

After all, Suzuoki did bother to specify an alternate Processor Element with a Pixel Engine and specific Image Cache that takes the place of 4 APUs.

I would not mind seeing something CELL-like on the desktop, maybe even paired with a nice GPU (the two are not mutually exclusive, even if cheaper systems might find it cheaper but still acceptable to work with just the embedded rasterizer and the CELL-like processor).
 
zidane1strife said:
For the same amount of logic, there's no way any other architecture can possibly outperform dedicated hardware.
Hmmm, what if the former(other) has far better memory, and b/w...
Not going to happen. Not realistically, anyway. CPUs have modular memory. Modular memory will never be as fast as fixed memory (i.e. the fact that you can't upgrade the memory on a graphics board means that there can be tighter tolerances on the memory, and therefore you get faster memory), and a general-purpose CPU typically needs much more memory than a GPU, so the memory is typically much slower anyway.
 
the more i think about this the more i can't believe tim would believe this. given tim's timeline, he's expecting the world to move back to software rendering around the same time microsoft is releasing an os that requires hardware acceleration. if i was a developer, and i knew for a fact that everyone who is running operating system L had a certain level of hardware, i'd better have a pretty good reason for not supporting it.
c:
 
see colon said:
the more i think about this the more i can't believe tim would believe this. given tim's timeline, he's expecting the world to move back to software rendering around the same time microsoft is releasing an os that requires hardware acceleration.
When did he state his timeline on CPU's taking over graphics?
 
http://www.gamespy.com/legacy/interviews/sweeney.shtm

"Gamespy - Do you ever think you'll tinker with a voxel engine, or combining a voxel and a polygon engine?

Tim - I don't think voxels are going to be applicable for a while. My thinking on the evolution of realtime computer graphics is as follows:

1999: Large triangles as rendering primitives, software T&L.

2000: Large triangles, with widespread use software-tesselated curved surfaces, limited hardware T&L.

2001: Small triangles, with hardware continuous tesselation of displacement-mapped surfaces, massive hardware T&L.

2002-3: Tiny triangles, full hardware tesselation of curved and displacement-mapped surfaces, limited hardware pixel shaders a la RenderMan.

2004-5: Hardware tesselation of everything down to anti-aliased sub-pixel triangles, fully general hardware pixel shaders. Though the performance will be staggering, the pipeline is still fairly traditional at this point, with straightforward extensions for displacement map tesselation and pixel shading, which fit into the OpenGL/Direct3D schema in a clean and modular way.

2006-7: CPU's become so fast and powerful that 3D hardware will be only marginally benfical for rendering relative to the limits of the human visual system, therefore 3D chips will likely be deemed a waste of silicon (and more expensive bus plumbing), so the world will transition back to software-driven rendering. And, at this point, there will be a new renaissance in non-traditional architectures such as voxel rendering and REYES-style microfacets, enabled by the generality of CPU's driving the rendering process. If this is a case, then the 3D hardware revolution sparked by 3dfx in 1997 will prove to only be a 10-year hiatus from the natural evolution of CPU-driven rendering."
------------------------------------------------------------------------------------

i posted this a few pages back, but i'll post it again in case anyone missed it. if longhorn comes out in 2005/6 as is expected, it would be fairly close (about a year) away from when tim expects video hardware to become "a waste of silicon", except it'll be a requirement for the world's most popular os.

he also states that non-traditional rendering primitives (voxels, splines, etc) will be commonplace, and i do see this happening, but not on the cpu. as gpu's become more programmable i'd imagine that they would begin supporting non-polygon primitives as well. in fact, doesn't dx10 (or maybe it was just ps3.0, don't remember exactly) have support for hardware acceleration of voxels?
c:
 
see colon said:
i posted this a few pages back, but i'll post it again in case anyone missed it. if longhorn comes out in 2005/6 as is expected, it would be fairly close (about a year) away from when tim expects video hardware to become "a waste of silicon", except it'll be a requirement for the world's most popular os.

he also states that non-traditional rendering primitives (voxels, splines, etc) will be commonplace, and i do see this happening, but not on the cpu. as gpu's become more programmable i'd imagine that they would begin supporting non-polygon primitives as well. in fact, doesn't dx10 (or maybe it was just ps3.0, don't remember exactly) have support for hardware acceleration of voxels?
c:
Well, this is very old stuff, nearly five years old. Obviously Tim Sweeney misread the market. He believed that triangle processing would accelerate more quickly than pixel processing, which is not the case. He quite possibly also believed that CPUs would advance faster than they have. CPU evolution is slowing down, and Moore's Law will be obviously broken within the next few years.

Edit: Btw, by "obviously broken" I mean it will be obvious within a few years time that Moore's Law no longer applies.
 
"Well, this is very old stuff, nearly five years old."
-----------------------------------------------------------------------------------
the quote is old, but tim sticks to his guns about the industry going back to software rendering. and as far as i know he hasn't changed his outlook on when it will happen.

there were newer quotes on voodooextreme's "ask tim sweeney" page, but since that page no longer exists, this is the only other timeline i've seen. and if memory serves, the information was pretty much the same.

"Obviously Tim Sweeney misread the market"
--------------------------------------------------------------------------------
he actually did a pretty decent job predicting shader use, t&l, etc. everything seemed to fall in line (within reason) with approximately when he predicted. so as far as gpu's went his predictions were ok, but he misread the cpu market. perhaps he thought performance/clock would be better (like it had been with previous generation chips compared to their older counterparts) than it is with the current generation. *cough*p4*cough*
c:
 
see colon said:
he actually did a pretty decent job predicting shader use, t&l, etc. everything seemed to fall in line (within reason) with approximately when he predicted. so as far as gpu's went his predictions were ok, but he misread the cpu market.
I don't think so. Here's what you quoted:

1999: Large triangles as rendering primitives, software T&L.

2000: Large triangles, with widespread use software-tesselated curved surfaces, limited hardware T&L.

2001: Small triangles, with hardware continuous tesselation of displacement-mapped surfaces, massive hardware T&L.

2002-3: Tiny triangles, full hardware tesselation of curved and displacement-mapped surfaces, limited hardware pixel shaders a la RenderMan.

2004-5: Hardware tesselation of everything down to anti-aliased sub-pixel triangles, fully general hardware pixel shaders. Though the performance will be staggering, the pipeline is still fairly traditional at this point, with straightforward extensions for displacement map tesselation and pixel shading, which fit into the OpenGL/Direct3D schema in a clean and modular way.

2006-7: CPU's become so fast and powerful that 3D hardware will be only marginally benfical for rendering relative to the limits of the human visual system, therefore 3D chips will likely be deemed a waste of silicon (and more expensive bus plumbing), so the world will transition back to software-driven rendering. And, at this point, there will be a new renaissance in non-traditional architectures such as voxel rendering and REYES-style microfacets, enabled by the generality of CPU's driving the rendering process. If this is a case, then the 3D hardware revolution sparked by 3dfx in 1997 will prove to only be a 10-year hiatus from the natural evolution of CPU-driven rendering."

1999: Mostly correct, but this was right after he made the quote.

2000: Software-tessellated curved surfaces pretty much never appear, except in Quake3 and Q3-engine games. The first (very primitive) pixel shaders become available for the GeForce/GeForce2 cards under OpenGL.

2001: This was when the GeForce3 and Radeon 8500 were released. There is no significant hardware tessellation of curved surfaces (the GeForce3's is never used due to performance, the R200's isn't flexible enough to be of much use). Primitive pixel and vertex shaders are available.

2002-3: Still no displacement mapping available to any significant degree (Parhelia does it, but has no market share). Very good pixel shaders become available. Triangles are still not close to "tiny" in games.

2004: We expect fully general pixel shaders to be available next year, but subpixel triangles just aren't going to happen.

Basically, Tim really misread how the 3D market would progress. He thought pixel filling would advance more slowly, and triangle processing more quickly. Now, I do have to admit that it is highly disappointing just how slowly triangle processing has advanced. It is really far past time we had some robust higher-order surfaces support.
 
Chalnoth said:
Now, I do have to admit that it is highly disappointing just how slowly triangle processing has advanced. It is really far past time we had some robust higher-order surfaces support.
agreed
 
"Basically, Tim really misread how the 3D market would progress. He thought pixel filling would advance more slowly, and triangle processing more quickly"
---------------------------------------------------------------------------------
ok, i can see your point now. the only thing i don't really agree with is....

"2002-3... Very good pixel shaders become available. Triangles are still not close to "tiny" in games. "
--------------------------------------------------------------------------------
i don't really consider the pixel shaders we have today "very good". overall, they are not bad, and much better than dx8, but they are still rather limited in use for games. the few games etc we have available that use a decent amount of shaders (halo, 3dmark's mother nature and tr:aod come to mind) run pretty slow in the grand scheme of things. and they use only a few shaders (and even fewer "dx9" shaders, prompting many people to complain about the lack of "true" dx9 games).

in my view, tim overestimated the progress that was made in vertex processing, but was pretty on target with pixel processing. many of the things he predicted were implemented in hardware around when he predicted they would be, but never used, either for software/hardware/political reasons.
c:
 
...many of the things he predicted were implemented in hardware around when he predicted they would be, but never used, either for software/hardware/political reasons.

Scratch the political reasons and we have an agreement here, and we could be even closer to reality. It's not a rare occurrence that, as things evolve, overestimations are made of what can actually fit into the hardware manufacturers want or have planned.

I actually agree with Chalnoth and Althornin as far as HOS, for instance, are concerned. I expected to see true hardware support for those as early as dx9.0, and I'm slowly starting to doubt that we'll see the necessary support even in the next API iteration after all.

Now look again here:

2002-3: Tiny triangles, full hardware tesselation of curved and displacement-mapped surfaces, limited hardware pixel shaders a la RenderMan.

ROFL :D Just where the heck is it? It should have been here already and all we got is some senseless pre-sampled blah implementation.

IMHO predictions that go that far into the future are more than just unsafe, because it's nearly impossible to account for or foresee all the factors and all the possible changes that might occur and play a significant role. Last but not least, one IHV introducing a specific feature does not guarantee market/developer adoption.

I gave all the points in this thread some more thought and I could start seeing some valuable points from the pro-CPU side of things, all up to the point where texture filtering (as Dio pointed out repeatedly) would come into play.

<snip>

The major problem with floating point textures and pixel pipelines at the moment is that they do not implement what has become the essential features of the integer pipeline (texture filtering, antialiasing, etc.). This substantially limits their usefulness as a general pipeline for realtime 3d graphics. They are, however, very useful for offline work where filtering can be done in (shader) software and performance is not an issue. They are also currently useful for some specialized realtime pixel shaders.
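As an aside, "filtering in (shader) software" on a float surface means writing the sample out by hand, along these lines. This is only a sketch assuming a single-channel, row-major float texture with clamp-to-edge addressing; the integer pipeline gets the equivalent for free in fixed-function hardware:

```c
/* What "filtering in (shader) software" amounts to for a float surface:
 * a bilinear sample written out by hand. A minimal sketch assuming a
 * single-channel, row-major float texture with clamp-to-edge addressing. */
#include <stdio.h>

static float clampf(float v, float lo, float hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* u,v in [0,1]; returns the bilinearly filtered texel value. */
static float sample_bilinear(const float *tex, int w, int h, float u, float v)
{
    float x = clampf(u * w - 0.5f, 0.0f, (float)(w - 1));
    float y = clampf(v * h - 0.5f, 0.0f, (float)(h - 1));
    int   x0 = (int)x,            y0 = (int)y;
    int   x1 = x0 + 1 < w ? x0 + 1 : w - 1;   /* clamp to edge */
    int   y1 = y0 + 1 < h ? y0 + 1 : h - 1;
    float fx = x - x0,            fy = y - y0;

    float top = tex[y0 * w + x0] * (1 - fx) + tex[y0 * w + x1] * fx;
    float bot = tex[y1 * w + x0] * (1 - fx) + tex[y1 * w + x1] * fx;
    return top * (1 - fy) + bot * fy;
}

int main(void)
{
    float tex[2 * 2] = { 0.0f, 1.0f,
                         2.0f, 3.0f };
    printf("center sample: %f\n", sample_bilinear(tex, 2, 2, 0.5f, 0.5f));
    return 0;
}
```

Doing that per fetch, per pixel, is exactly the kind of speed hit the "filtering stays in dedicated logic" argument earlier in the thread is about.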

With integers it is easy to implement a large amount of computation directly in hardware in parallel. Floating point requires far more transistors to implement. As a result, it makes no sense to dedicate all those transistors to fixed functions. Allocating all those transistors to floating point only makes sense if you make the pipeline programmable.

The problem is that while you might do dozens of operations in parallel in a single clock per pipe for integer vectors, you are generally limited to as little as one (or a few) for floating point.

This was a lesson learned long ago for other types of hardware. As soon as you increase the sophistication of the data types and the computations, direct hardware implementation is no longer feasible. You must rely on software. As soon as you do this the entire hardware design picture changes dramatically. It becomes extremely important to maximize frequency, data availability, and software computation parallelism to squeeze the most out of all those transistors in each pipe. Since you can perform far fewer operations per pipe per cycle, you must increase the number of cycles and the number of pipes dramatically.

In the future, I expect to see much more emphasis on frequency and the number of pipelines than in the past.

http://www.beyond3d.com/forum/viewtopic.php?p=61943&highlight=floating+point#61943

Here's another interesting thread attempting predictions in mid 2002 for the foreseeable future:

http://www.beyond3d.com/forum/viewt...=asc&highlight=floating point&start=0

2nd page:

CPUs, however, only improve performance approximately linearly based on the same metric. This is because CPUs get most of their performance improvement from increased frequency and get very little additional benefit from increased transistor count. Additional transistors are usually allocated to more cache, larger buffers, etc., all of which offer only a few percent improvement.
3d graphics chips therefore improve in performance far faster than CPUs, and most of this is due to increased transistor counts.
The move from .15µm to .13µm should therefore provide (.15/.13)^3 or ~1.5 times the overall performance. However, the next generation of hardware will use higher-precision computations which require a significant number of extra transistors. To keep up with performance improvement expectations, 3d vendors will need to rely on techniques other than just process improvements. Those vendors that do a better job of this will fare better this generation. There are of course ways other than process improvements to increase transistor count, and there are ways to improve performance other than just increasing transistor count, such as memory bandwidth improvements and reducing the amount of computation needed.
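The ~1.5x figure in that quote follows directly from its stated (idealized) assumption: transistor budget scaling with the inverse square of the feature size and clock roughly with the inverse:

```c
/* Checking the quoted scaling figure: if transistor count scales with the
 * inverse square of the feature size and clock roughly with the inverse,
 * a full shrink from 0.15 um to 0.13 um gives about (0.15/0.13)^3 times
 * the performance under that idealized assumption.                       */
#include <stdio.h>
#include <math.h>

int main(void)
{
    double shrink = 0.15 / 0.13;
    printf("transistor budget: %.2fx\n", shrink * shrink);
    printf("clock:             %.2fx\n", shrink);
    printf("combined:          %.2fx\n", pow(shrink, 3.0));
    return 0;
}
```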
 
Ailuros said:
IMHO predictions that go that far into the future are more than just unsafe, because it's fairly impossible to encount/foresee all factors and all possible changes that might occur and play a significant role.
True, but such predictions must be made by game engine developers. In this sense, Tim Sweeney didn't do as good a job of predicting the direction of hardware development as John Carmack did.

see colon said:
i don't really consider the pixel shaders we have today "very good".
Current hardware can render extremely complex pixel shaders. But you're right, the software is lagging, as it always has.
 
"see colon wrote:
i don't really consider the pixel shaders we have today "very good".

Current hardware can render extremely complex pixel shaders. But you're right, the software is lagging, as it always has."
------------------------------------------------------------------------------------
right, software does always lag behind hardware. but in this case, games with just a few shaders bring even the fastest hardware down from screamingly fast to just acceptable levels. take halo, for example. it's probably the most shader-intensive fps available today, and performance is an issue on pretty much any hardware. read any review or gaming hardware forum and you'll find mention of it.

basically what i'm saying is that anything that uses a lot of shaders is only using a few dx9 shaders, performance drops and the visual effect isn't that impressive. the effects aren't really that complex; most shader effects basically look like EMBM.

in this generation, we basically have a choice between one piece of hardware that has an instruction limit (and is limited to fp24), forcing you into multi-pass sooner (causing performance to drop), or one that has single-pass support for longer shaders (and fp32) but whose performance is so low it's unusable anyway. not exactly something i'm impressed with.

could it be a lack of properly coded software? sure it could. i've checked out several tech demos and benchmarks (usually the bleeding edge of graphic effects) that use ps/vs2.0 and nothing seems to be a huge step over the shaders from last generation. i'd define complex shaders to be like those used for cinema-quality cg, and this generation of hardware just doesn't seem capable of running them at playable speeds in any sort of real gaming environment. but if you don't count hardware tessellation as being available (since no software uses it, it's technically available in some hardware), you can't count complex shaders either.

c:
 
see colon said:
right, software does always lag behind hardware. but in this case, games with just a few shaders bring even the fastest hardware down from screamingly fast to just acceptable levels. take halo, for example. it's probably the most shader-intensive fps available today, and performance is an issue on pretty much any hardware. read any review or gaming hardware forum and you'll find mention of it.

Halo runs very well on a GeForce 3 with a 733MHz Intel Celeron processor ;)
People will always take for granted that XBox is just a PC in a box, when it's actually very far from that.

Rendering into textures and vertex buffers, and pre/post-processing vertex buffers, textures, and rendertargets with the CPU, is cheap. You can cast surface formats. There's no AGP bottleneck, no runtime->driver overhead.

From what I heard, porting Halo to the PC was hell.
It's no surprise the hardware requirements went way up for doing the same amount of work.

The largest bottleneck in PC graphics is the PC architecture itself.
 