Will PC graphics hit a dead end?

Each achieves the same ends (more or less), but MS + AF is an extraordinary efficiency gain over SS. But MS + AF also takes more logic to implement, particularly when you include the color compression that makes MS + AF significantly more bandwidth efficient than SS. Other examples are hierarchical Z, early Z reject, and Z compression. Yet another example is tile-based deferred rendering. Another is texture compression.

I know I'm nitpicking, but I have a few minor disagreements here.

a) The biggest gain of MS/AF vs. SSAA is IMO fillrate to begin with. I don't see current accelerators dropping to 1/4th of their fillrate with 4xMSAA/2xAF (assuming 2xRGMS is virtually fillrate-free on the NV35, that's 3600 MTexels/sec of fillrate; with 8xS and its 4xOGSS part, the fillrate drops to ~900 MTexels/sec). Rough arithmetic is sketched after point b.

b) TBDRs (albeit admittedly rare and once in a blue moon) have been around since the dawn of 3D; it simply doesn't fit as an example in the context you've put it in - see "One time algorithmic increases".
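To put rough numbers on point a (this is just the simple arithmetic behind the figures above, assuming the NV35 numbers quoted are accurate):

\text{fillrate}_{4\times\mathrm{OGSS}} \approx \frac{3600\ \text{MTexels/s}}{4\ \text{samples/pixel}} = 900\ \text{MTexels/s}

\text{fillrate}_{2\times\mathrm{RGMS}} \approx 3600\ \text{MTexels/s} \quad\text{(texturing done once per pixel)}

So ordered-grid supersampling pays for every extra sample in texturing fillrate, while multisampling mostly pays in bandwidth and memory footprint instead.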

As for exotic AA algorithms, whether they'll be pure fragment AA or similar to Z3 techniques, I'm pretty sure that if IHVs decide to implement either one, or a clever combination of the two, they will not carry the same shortcomings as, for example, Parhelia and its FAA.

Finally, concerning that big vs. small issue in terms of silicon, IMHO the best example of the trend of things to come is the small SoCs aimed at the wireless gaming market, where forecasts show quite significant growth in the foreseeable future. The high-end MBX is virtually a better-than-Dreamcast system on a chip; it's small, seems efficient enough, has low power requirements, and runs at just a 120MHz clockrate on a 0.13µm process. Jumping to smaller processes in the future, I don't see why higher clockrates and the inclusion of advanced shader capabilities would be impossible.

If that still isn't good enough, have a closer look at how "huge" notebooks have become compared to the past and what their current specifications are; size hasn't really changed, while specifications have multiplied ever since. I personally don't see where these imaginary walls or limits that people keep seeing are; there will always be solutions for everything, and evolution takes place from the silicon up to the algorithms.
 
And can you still tell the difference between The Hulk and a film of a theoretical real-life Hulk?

Comparing the CG Hulk and Big Lou painted green, I think the CG Hulk looks more believable. Real-time CG still has a long way to go before it gets to something like the CG Hulk.

I read that the CG Hulk has 100 layers just for the skin. It gives a good result, but I think even offline renderers can still progress.

GPUs haven't reached 1 billion transistors yet, so it's still a while before we even need to think about limits.
 
GPUs haven't reached 1 billion transistors yet, so it's still a while before we even need to think about limits.

True. The other question then is whether those amounts of transistors are really going to be needed by the time some seem to predict.

The more efficiency increases within existing transistor volumes (relatively speaking), the longer it will take to reach a 1B-transistor "necessity".
 
Ailuros said:
there will always be solutions for everything.

:) Thank god for optimism.

For anyone who does not balk at slightly heavier reading that is still accessible to the semi-layman, this text might be of interest.
It's a pretty short overview, but if equations turn you off, skip directly to the Conclusion section at the end.
http://www.research.ibm.com/journal/rd/462/frank.html

Its title says it all - "Power-constrained CMOS scaling limits"

Entropy
 
Dave H made some very insightful comments about the state and the future of the industry. I want to add some comments about real-time graphics ASICs.

Graphics ASICs in the near future will have to make a major shift in the algorithms they use. Today's graphics processors are based on the same design principles (the same graphics pipeline) as the first SGI graphics solutions, with the addition of some features like programmable transformations and shading. This pipeline wasn't designed for efficient rendering of photorealistic scenes.
The problems will begin to arise in the next few years:

-First, with the introduction of flow control at the fragment level, each pipeline will not execute the same number of instructions, making the old concept of the pipeline obsolete and impractical. The obvious solution to that problem is to use an array of computational resources and schedule instructions onto them. So the graphics companies will have to face completely new (for them) problems like instruction scheduling, branch prediction and general compiler optimization. (A toy sketch of the divergence problem follows the second point below.)

-When polygons become smaller than the size of a pixel (to implement subpixel-precision displacement mapping, essential for realism), the whole separation of vertices and fragments will no longer make any sense.
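To illustrate the first point: in a lockstep pipeline, a group of fragments marches through the same instruction stream; once fragments can branch independently, the group has to execute both sides of a branch with the inactive lanes masked off, so realized throughput depends on how well a scheduler can repack divergent work. Below is a toy model of that effect (the group size, branch costs and branch probability are made-up illustrative numbers, not any vendor's hardware):

Code:
# Toy model: cost of a branching fragment shader on a lockstep group
# versus an idealised scheduler that repacks divergent fragments.
# All names and numbers are illustrative assumptions.
import random

GROUP_SIZE = 4        # fragments that execute in lockstep
COST_TAKEN = 20       # instructions on the expensive branch path
COST_NOT_TAKEN = 4    # instructions on the cheap branch path

def lockstep_slots(taken_flags):
    """Lockstep: every lane occupies an issue slot for every instruction the
    group executes, including instructions masked off for that lane."""
    per_lane = 0
    if any(taken_flags):
        per_lane += COST_TAKEN
    if not all(taken_flags):
        per_lane += COST_NOT_TAKEN
    return per_lane * len(taken_flags)

def repacked_slots(taken_flags):
    """Idealised scheduling: each fragment pays only for its own path."""
    return sum(COST_TAKEN if t else COST_NOT_TAKEN for t in taken_flags)

random.seed(0)
fragments = [random.random() < 0.3 for _ in range(4096)]   # 30% take the branch
groups = [fragments[i:i + GROUP_SIZE] for i in range(0, len(fragments), GROUP_SIZE)]

print("lockstep issue slots:", sum(lockstep_slots(g) for g in groups))
print("repacked issue slots:", sum(repacked_slots(g) for g in groups))

The gap between the two totals is the work a real scheduler has to win back, which is exactly the sort of problem CPU designers and compiler writers have been attacking for years.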

The first problem is already partially addressed in the NV30 architecture, so I don't think we will see any further stalls in this domain. But note the amount of effort it took nVidia, and the performance is still not on par with the older pipelined designs, like the R3xx.
That makes me wonder whether the NV30 is really faster than a top-of-the-line P4 in complex shading, and, in the long term, how future graphics chips will compete with future CPUs using fast vector processing units (like SSE3/AltiVec), considering that Intel and the other CPU manufacturers have much more experience in instruction scheduling and compiler optimization, and Intel will introduce new vector instructions with every major CPU release (Prescott, Tejas).

The second problem (small polygons) will, I think, be much more difficult to address, as it requires a complete paradigm shift to micro-polygon architectures.

PS. Matrox FAA (and I think also Z3; I can't find the paper) is not an analytical antialiasing technique. With analytical techniques the integrals in the antialiasing equations are solved without sampling, using pure mathematics (analytical integration).
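For what it's worth, the distinction can be written down directly (my notation, just for illustration, with h the pixel filter and L the colour of the visible surface at a point):

\text{analytical:}\quad C = \iint_{\text{pixel}} h(x,y)\, L(x,y)\; dx\, dy \quad\text{(solved in closed form)}

\text{sampled (SS/MS):}\quad C \approx \frac{1}{N} \sum_{i=1}^{N} h(x_i, y_i)\, L(x_i, y_i)

Techniques like FAA and Z3 fall on the sampled/fragment side of this split.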
 
Pavlos said:
PS. Matrox FAA (and I think also Z3; I can't find the paper) is not an analytical antialiasing technique. With analytical techniques the integrals in the antialiasing equations are solved without sampling, using pure mathematics (analytical integration).
...requiring per-pixel clipping of polygon against polygon. That's unlikely to 'fly' any time soon.
 
Dave H said:
And not to undersell K8--Opteron in particular is a world-beating part for the low-end x86 server space.

If you consider Opteron to be a "low end" x86 server-chip, then what is a hi-end x86 server-chip? Xeon? And since they are building supercomputers that use the Opteron, can it be considered a "low-end server-chip"?
 
arjan de lumens said:
AFAIK, Gallium Arsenide is not really useful except for niche products - it draws something like ten times as much power as silicon and has awful yields once you get more than about a million transistors per chip. Something similar applies to most of the alternative semiconductors as well.
Power consumption: I find that exceedingly hard to believe. With about five times the electron mobility, it should be capable of about one fifth the power consumption.

Yields: This is likely, but only due to the fact that the fabs haven't dealt much with the substance. Silicon has had decades for the manufacturing processes to be refined. It will take time for another substance to become as good, but there are others (such as GaAs) that show much more promise than Silicon, if the appropriate levels of R&D are applied.
 
Nemesis77 said:
Dave H said:
And not to undersell K8--Opteron in particular is a world-beating part for the low-end x86 server space.

If you consider Opteron to be a "low end" x86 server-chip, then what is a hi-end x86 server-chip? Xeon? And since they are building supercomputers that use the Opteron, can it be considered a "low-end server-chip"?

Well here is the way I see it.

The low end of the server market space in general is the x86 type. I mean, they simply cost less than other systems. By saying that the Opteron "is a world-beating part for the low-end x86 server space", he only means that it is the best in that space. You named the only other contender, the Xeon. Xeon used to have the "low end x86 server space" to itself; it was kind of a waste to even call it a space until the Opteron actually arrived. So he is really only saying good things. On the Athlon64 thing he may be right though. With a 64-bit memory bus until later next year, the Athlon64 maybe should have been named Duron64.

*Note: you could say the AMD Athlon MP was in the low-end x86 server space as well, but you didn't see very many in reality.
 
BlackAngus said:
Well here is the way I see it.

The low end of the server market space in general is the x86 type. I mean, they simply cost less than other systems. By saying that the Opteron "is a world-beating part for the low-end x86 server space", he only means that it is the best in that space. You named the only other contender, the Xeon. Xeon used to have the "low end x86 server space" to itself; it was kind of a waste to even call it a space until the Opteron actually arrived. So he is really only saying good things. On the Athlon64 thing he may be right though. With a 64-bit memory bus until later next year, the Athlon64 maybe should have been named Duron64.

*Note: you could say the AMD Athlon MP was in the low-end x86 server space as well, but you didn't see very many in reality.

Well, yes, low-end servers are usually x86 machines. But I read your post as "Opteron is in the low end of the x86 server market", not that it's part of a market (x86 servers) that as a whole is low-end when compared to the likes of POWER4, SPARC, Itanium, etc.

I'm not sure if I'm making any sense here :LOL:

I do think that the Opteron has the potential to move to the high end, as demonstrated by the Cray deal. The CPU has the needed features and performance to do that. Of course, there might be CPUs that are faster, but they are also a lot more expensive, and you could get 2 or more Opterons for the price of one truly high-end CPU.
 
Chalnoth said:
arjan de lumens said:
AFAIK, Gallium Arsenide is not really useful except for niche products - it draws something like ten times as much power as silicon and has awful yields once you get more than about a million transistors per chip. Something similar applies to most of the alternative semiconductors as well.
Power consumption: I find that exceedingly hard to believe. With about five times the electron mobility, it should be capable of about one fifth the power consumption.
GaAs components rarely match the theoretical electron mobility value due to defects - the practical value is around half the theoretical one, so the electron mobility of GaAs is really less than twice that of Si. The intrinsic resistivity of GaAs is generally much higher than that of Si, and coupled with the fact that its thermal conductivity is worse, you haven't got a good recipe for a high-frequency processor running nice and cool.
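For reference, the standard first-order hand formulas show where mobility actually enters (these are textbook approximations, not numbers for either material in particular):

I_{D,\mathrm{sat}} \approx \frac{\mu\, C_{ox}}{2}\,\frac{W}{L}\,(V_{GS}-V_T)^2, \qquad t_{\mathrm{gate}} \approx \frac{C_{\mathrm{load}}\, V_{DD}}{I_{D,\mathrm{sat}}}, \qquad P_{\mathrm{dyn}} \approx \alpha\, C\, V_{DD}^2\, f

Drive current (and thus switching speed) scales roughly linearly with the realized mobility, while dynamic power depends on capacitance, supply voltage and frequency; mobility only helps power indirectly, to the extent it lets you hit the same speed at a lower voltage.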
 
Nemesis77 said:
Dave H said:
And not to undersell K8--Opteron in particular is a world-beating part for the low-end x86 server space.

If you consider Opteron to be a "low end" x86 server-chip, then what is a hi-end x86 server-chip? Xeon? And since they are building supercomputers that use the Opteron, can it be considered a "low-end server-chip"?

I misspoke. I should have said something like "low-end/x86", or just left it at "low-end". x86 does not extend into the high-end of the server market, although in terms of delivered systems from top-tier OEMs, Xeon is currently ahead of Opteron on that measure. (But that has nothing to do with chip-level differences between the two.)

Supercomputers are a very different market from high-end servers, with much less stringent RAS requirements and often very different performance characteristics. The suitability of Opteron for some supercomputing tasks doesn't necessarily have much to do with its usefulness for commercial big iron.
 
V3 said:
And can you still tell the difference between The Hulk and a film of a theoretical real-life Hulk?

Comparing the CG Hulk and Big Lou painted green, I think the CG Hulk looks more believable.

Sure: I meant compared to a situation where there actually were such a thing as a Hulk existing in real life, which could then be shot on film. Then again, Lou painted green still has some advantages in terms of correct lighting, realistic interaction with his environment, and realistic-looking "animation".

The point is, we still have a long way to go before CG of a complex object in a complex environment will be mistaken for real life.
 
Ailuros said:
I know I'm nitpicking, but I have a few minor disagreements here.

a) The biggest gain of MS/AF vs. SSAA is IMO fillrate to begin with. I don't see current accelerators dropping to 1/4th of their fillrate with 4xMSAA/2xAF (assuming 2xRGMS is virtually fillrate-free on the NV35, that's 3600 MTexels/sec of fillrate; with 8xS and its 4xOGSS part, the fillrate drops to ~900 MTexels/sec).

Clearly, yes. My point was just that MS + AF also can have a nice bandwidth improvement over SS when color compression is in use. But the main improvement is certainly fillrate.

b) TBDRs (albeit admittedly rare and once in a blue moon) have been around since the dawn of 3D; it simply doesn't fit as an example in the context you've put it in - see "One time algorithmic increases".

TBDR is still a one-time algorithmic change that leads to significant efficiency improvements (in most cases), and it still takes more logic and design complexity to implement compared to IMR. It is somewhat of an outlier with respect to my overall point, because as you note it hasn't been a lack of a transistor budget that has kept TBDRs out of the mainstream until now. For one thing, the efficiency gains are usually large enough that you can achieve similar performance with an overall smaller transistor count. But given a naive IMR and a TBDR with otherwise identical execution resources, the TBDR will require more transistors.

As for exotic AA algorithms, whether they'll be pure fragment AA or similar to Z3 techniques, I'm pretty sure that if IHVs decide to implement either one, or a clever combination of the two, they will not carry the same shortcomings as, for example, Parhelia and its FAA.

Agreed. And I think the main factor in postponing their implementation is that avoiding the artifacts of Matrox's FAA requires a more complex algorithm and thus greater hardware resources than designers want to spare with current transistor budgets. As Moore's Law works its magic, designers' evaluation of this tradeoff will change.
 
TBDR is still a one-time algorithmic change that leads to significant efficiency improvements (in most cases), and it still takes more logic and design complexity to implement compared to IMR. It is somewhat of an outlier with respect to my overall point, because as you note it hasn't been a lack of a transistor budget that has kept TBDRs out of the mainstream until now. For one thing, the efficiency gains are usually large enough that you can achieve similar performance with an overall smaller transistor count. But given a naive IMR and a TBDR with otherwise identical execution resources, the TBDR will require more transistors.

I doubt that a high-end TBDR today can get away with significantly fewer transistors than a high-end IMR; my bet would be that, with shaders taking up a lot of transistors, they should be damn close.

As far as the past goes, NV11 + T&L = 19M vs. STG4500 = 15M. Efficiency was higher for the latter back then, yet I don't see any significant difference in transistor counts, especially since it lacked a T&L unit.
 
Ailuros said:
I doubt that a high-end TBDR today can get away with significantly fewer transistors than a high-end IMR; my bet would be that, with shaders taking up a lot of transistors, they should be damn close.

A TBDR won't shade fragments that end up being occluded. An IMR often will, unless a "software deferred" rendering style is being used, ala Doom3. So in fragment shader-heavy scenes, a TBDR can achieve the same realized fillrate as an IMR that has greater fragment shader execution resources.
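As a back-of-the-envelope illustration of that point (the overdraw factor and shader cost below are my own made-up numbers, not measurements of any actual chip): with an average depth complexity of d, a naive IMR shades roughly d fragments per covered pixel, a deferred renderer roughly one.

Code:
# Toy estimate of fragment-shading work for a naive IMR vs a TBDR.
# Overdraw factor and per-fragment shader cost are illustrative assumptions.

def shaded_fragments(pixels, overdraw, deferred):
    """Number of fragments that actually run the fragment shader."""
    if deferred:
        return pixels              # TBDR: only the visible fragment per pixel
    return pixels * overdraw       # naive IMR: every submitted fragment

PIXELS = 1280 * 1024
OVERDRAW = 3.0                     # assumed average depth complexity
SHADER_CYCLES = 40                 # assumed cycles per shaded fragment

imr_work = shaded_fragments(PIXELS, OVERDRAW, deferred=False) * SHADER_CYCLES
tbdr_work = shaded_fragments(PIXELS, OVERDRAW, deferred=True) * SHADER_CYCLES

print(f"naive IMR : {imr_work / 1e6:.0f} Mcycles of fragment shading per frame")
print(f"TBDR      : {tbdr_work / 1e6:.0f} Mcycles of fragment shading per frame")

An IMR with early-Z and rough front-to-back sorting (or the Doom3-style "software deferred" pass mentioned above) lands somewhere between the two totals.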
 
Neeyik said:
GaAs components rarely match the theoretical electron mobility value due to defects - the practical value is around half the theoretical one, so the electron mobility of GaAs is really less than twice that of Si. The intrinsic resistivity of GaAs is generally much higher than that of Si, and coupled with the fact that its thermal conductivity is worse, you haven't got a good recipe for a high-frequency processor running nice and cool.
There are two types of defects possible:
1. Crystal lattice defects: the lattice isn't exactly regular. I can imagine how this would be somewhat more of a problem than it is for silicon, but that doesn't mean it's a problem that can't be solved. Refining the annealing process should solve the issue (annealing: heating and cooling the surface after both substances are "sprayed" on the same substrate).

2. Impurities: a stray atom that wasn't meant to end up in the final substance makes its way in. This is obviously a manufacturing defect that is even more solvable than it is for silicon.

Anyway, these are just issues to be overcome. I think companies are going to be forced to go for GaAs (or another high-mobility semiconductor) regardless of the added challenge (and therefore cost) of going for the substance.

Just remember that silicon has had decades of process refinement. Attempting to use the same machinery and processes for something like GaAs is bound to produce issues (one easy way to see this: using silicon machinery will undoubtedly contaminate the semiconductor with silicon, and manufacturers will be hesitant to do this because that machinery, after working with GaAs, can no longer be used for silicon due to contaminants), so the costs of changing substances are daunting. But a wall is coming, and it is coming fast. The semiconductor industry is going to have to find ways around this wall, no matter how unpleasant they may be.
 
A TBDR won't shade fragments that end up being occluded. An IMR often will, unless a "software deferred" rendering style is being used, ala Doom3. So in fragment shader-heavy scenes, a TBDR can achieve the same realized fillrate as an IMR that has greater fragment shader execution resources.

Let's make it more specific then: how big would you predict the difference in transistor count to be between a PS/VS 3.0 TBDR and an equivalent IMR?
 
Dave H said:
And with respect to realtime graphics ASICs:

First off, the notion that we are anywhere close to "good enough" is silly...

And, for that matter, can you still tell the difference between a film of actors (projected in a theater) and people in real life?

... We have a long, long way to go.

Silly old me. Yes I can tell the difference. Does that difference matter to me? Not really. I seem to be perfectly happy to accept various arbitrary shortcomings in depictions of settings, without them actually impinging on the experience at all. There's a low poly system called T.E.X.T. you might have seen. Another one is called T.H.E.A.T.R.E. I can sit through sessions of both without thinking, "Ooooh, just needs a few more triangles..."

I know this is a thread for card-carrying triangle fetishists; I'm simply saying that the big step was taken by Quake 1, and constantly polishing this way of depicting a world is not getting us significantly further.

A.
 
I know this is a thread for card-carrying triangle fetishists; I'm simply saying that the big step was taken by Quake 1, and constantly polishing this way of depicting a world is not getting us significantly further.

Albeit realism is by far not exclusively a triangle or polygon issue, it isn't this thread alone that aims in that direction, but rather the entire board.

Fetishism is a pathological case last time I checked, but I don't see anything wrong with wanting improved quality. On the other side of the riverbank are, of course, the nostalgic "relic-fetishists"; it would be quite laughable to see someone getting apologetic about why he doesn't get a kick out of playing Q1 on an ancient V2, in 800*600*16, with pixels as big as my finger, completely aliased and blurry as hell. :oops:

**edit: typos and syntax arghhh
 