When will 512-bit memory buses come?

512-bit buses on standard PCBs aren't practically doable, as far as I can see, unless you are willing to do boards with 20+ layers, which will have serious board yield issues and trouble fitting into standard expansion slots. Doing a 512-bit bus within a multichip module shouldn't be too hard, though, but it costs a bit extra (not sure just how expensive MCMs are these days; you may get a pad-limited GPU as well, which further drives up cost) and makes it essentially impossible to distribute memory separately from the GPU.
 
Hyp-X said:
Entropy said:
Exactly the same argument was made as little as one year ago re:128 vs. 256 bit paths to memory.

And what does this show?
That it didn't happen as soon as some expected.
Some might have thought that games using 10+ instruction shaders would pop up overnight, but that didn't happen.

I didn't say that rendering mode will be common in 2004; I'm sure it won't be, since software tech doesn't move very fast. (For various reasons that are often discussed here, so I don't feel like repeating them.)

But think about it: 128-bit arrived in 1998, and 256-bit arrived in 2002, four years later. By that trend, 512-bit is due in 2006. And I do think that those techniques will be significant by then.

The assumption you are making (and which I believe is fallacious) is that in 2006, programmers will do what we envision today, and in such a way as to make bandwidth a less important factor.

I'll try to give an example. Consider the 9700 vs the 9500Pro.

As long as we render according to 1998 expectations, the 9500Pro performs just as well as the 9700. The difference comes when we perform higher-level filtering with anti-aliasing. Now, that wasn't high on the agenda back in 1998, and possibly wouldn't be seen as such a basic requirement today either, had the R300 not made it so by making it available. In a way, providing "excess" bandwidth created the opportunity to make something practical that otherwise wouldn't be, which proved to be popular and desirable, and thus marketable and profitable.

What you are assuming is that methods that are currently too bandwidth-intensive won't become popular, nor will the techniques currently available be used in such a way as to strain the memory subsystem (example: DOOM3 requiring massive fillrate). I don't think such assumptions are likely to be true.

Furthermore, I'll repeat my two main evolutionary points:
a) memory clocks generally evolve more slowly than logic;
b) graphics readily benefits from parallelization.

So in 3-4 years, when GPUs are made on 65nm processes instead of 150nm, not only will their clocks have scaled, but the amount of parallelism is also likely to have increased. Overall, the data processing power of GPUs will grow much faster than the memory subsystem clock.
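As a rough illustration of that ratio (the figures below are assumptions made purely for the sake of the example: transistor density scaling with the square of the linear shrink, GPU clocks doubling, memory clocks gaining perhaps 75%):

```python
# Rough scaling sketch: 150 nm -> 65 nm over ~3-4 years (illustrative assumptions only).
old_process_nm = 150
new_process_nm = 65

# Transistor density scales roughly with the square of the linear shrink.
density_gain = (old_process_nm / new_process_nm) ** 2   # ~5.3x more transistors per mm^2

# Assumed clock gains over the same span (not predictions, just example numbers).
gpu_clock_gain = 2.0
mem_clock_gain = 1.75

gpu_throughput_gain = density_gain * gpu_clock_gain     # parallelism x clock, ~10x
print(f"GPU processing growth ~{gpu_throughput_gain:.1f}x "
      f"vs memory clock growth ~{mem_clock_gain:.1f}x")
```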

The final example is CPUs. Note how their bandwidth needs have outstripped memory technology development, in spite of massive increases in the amount of cache and being fundamentally single-threaded, as opposed to the parallel processing of graphics ASICs. So even if we assume that GPUs will develop in the general programmability direction of CPUs, this does not necessarily imply reduced bandwidth needs; quite the contrary, particularly since we add massive parallelism.

Imagine we could ask a GPU designer in 2005 if he could squeeze more out of his allotted silicon given twice the bandwidth to feed it.
You are implying that he'll answer "no, not really".
I'm saying he'd be more likely to answer "hell, YES!".


(As to the specific means to achieve that bandwidth: what paths technology will take in the future is basically about cost/benefit. Manufacturing is a major part of that, but so is tradition. Up until a year ago, 128-bit memory subsystems on motherboards were regarded as horribly expensive high-end. The nForce changed that, and a year later we have multiple sources of dirt cheap 4-layer boards supporting 128 bits' worth of DDR400. Just over a year ago, 256-bit memory subsystems on gfx-cards were a theoretical possibility that most dismissed in favour of DDR2 at high clocks. Today ATI, nVidia, 3Dlabs and Matrox all use 256-bit memory subsystems, and it seems inevitable that they will migrate to low-cost cards, just as 128-bit buses did. Will 512-bit memory subsystems become a reality? Damned if I know. It sure seems ugly at this point in time. But I wouldn't discount it out of hand. Going wider has two big points in favour of it. One is that there are no licensing fees involved; the other is that it is a devil we know. There are alternative memory technologies that offer high bandwidth, but for which neither of these important points is true.)

Entropy
 
Even a layman like me can figure out the following:

I'd say that current architectures need above all to increase in efficiency with the existing bus widths before doubling them becomes a necessity.

Speaking of which (and it might be a dumb question): is a 256-bit bus today really a blessing under all conditions? (Please don't hurt me, it's just a general question. :) )
 
It's an unstated assumption that I'm sure everyone in the discussion is aware of, but just to bring it out into the open:

There is an upper limit to the number of pixels that can be displayed every second. With today's monitors, any ability to push pixels beyond, say, 1600*1200 @ 100 fps is essentially useless; and it is unlikely that in three or four years monitors will have advanced much on this measure. (Maybe up to 2048*1536 with a refresh rate high enough to justify up to, say, 120fps, which works out to twice the number of pixels per second. But that's something of an upper limit.) A more realistic measure--the point at which it generally makes sense to turn up every other graphical detail rather than go for higher resolution or framerate--will probably be around 1280*1024 @ 80fps for most 2006 gamers.
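For reference, the raw pixel rates those figures imply, worked out explicitly:

```python
# Pixel-rate ceilings implied by the display figures quoted above.
def pixel_rate(width, height, fps):
    """Pixels per second for a given resolution and framerate."""
    return width * height * fps

today   = pixel_rate(1600, 1200, 100)   # ~192 Mpixels/s
ceiling = pixel_rate(2048, 1536, 120)   # ~377 Mpixels/s, roughly twice 'today'
typical = pixel_rate(1280, 1024, 80)    # ~105 Mpixels/s, the "realistic" 2006 target

print(f"{today/1e6:.0f} / {ceiling/1e6:.0f} / {typical/1e6:.0f} Mpixels/s")
```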

And, of course, for many games we're already easily capable of 1600*1200 at 100fps with a top-end GPU today. In other words, not more pixels, but better ones.

The thing is, most known techniques for achieving that--and considering the hardware of 2006 is probably just hitting the drawing boards at Nvidia and ATI today, there's little reason to believe we'll see unthought-of techniques in that time frame--rely on a balance tilted much more in the direction of processing power as compared to bandwidth than do today's rendering techniques.

The only obvious bandwidth-hungry technique I can think of is higher levels of multisampling. And it's compounded somewhat by the fact that, as geometry gets more complex, and thus more screen pixels are edges, color compression becomes less effective and multisampling takes an even higher bandwidth hit.

But I doubt multisampling will be the AA technique of choice three years from now. The bandwidth issues are obvious. But the footprint issues may be nearly as bad. At a high resolution, and with increasingly detailed textures, you can't expect to get much beyond 8xMSAA before you're maxing out a 256MB card. And I don't expect 512MB cards terribly soon.
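A rough footprint estimate illustrates the point, assuming 32 bits of color and 32 bits of Z/stencil per sample, an uncompressed multisampled back buffer, and a resolved front buffer (all assumed figures, ignoring any framebuffer compression):

```python
# Rough framebuffer footprint for multisampling (assumptions: 32-bit color,
# 32-bit Z/stencil per sample, uncompressed multisampled back buffer).
def msaa_footprint_mb(width, height, samples):
    bytes_per_sample = 4 + 4                      # color + Z/stencil
    back_buffer  = width * height * samples * bytes_per_sample
    front_buffer = width * height * 4             # resolved, single-sampled
    return (back_buffer + front_buffer) / (1024 ** 2)

for samples in (4, 8):
    print(f"1600x1200 {samples}xMSAA: "
          f"~{msaa_footprint_mb(1600, 1200, samples):.0f} MB before textures")
```

At 8x that is already around 125 MB before a single texture is stored, which is why a 256MB card starts to look cramped.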

There are analytical techniques that are just smarter ways to do AA. (Matrox's FAA is an early example; Z3 a more complicated version that should fix almost all of the problems with Matrox's implementation.) Unfortunately, they're not used in current cards because they take up too much silicon and still have some implementation kinks to be worked out. In that, they're similar to where today's multisampling + anisotropic filtering combination was a couple years ago--a much smarter alternative to the current brute-force way of doing things (supersampling), that wasn't quite ready for prime-time using the prevailing technology.

Analytical AA techniques shift the cost of doing AA from bandwidth and memory footprint to processing power and silicon area. Similarly, as more surfaces are colored by procedural shaders rather than texture mapping, anisotropic texture filtering (which takes a balance of bandwidth and fillrate) will be called upon less, and compute-intensive analytical methods of shader antialiasing (built into the shaders themselves) will handle more of the non-edge antialiasing.

Think for a moment of the main sources of memory traffic:

1. Z reads/writes
2. Color writes (and reads in the case of alpha blends)
3. Texture reads

All of them scale (obviously) with framerate; however, as I've mentioned, framerates today are generally high enough that we don't need any further improvements there.

#1 and #2 scale with increasing resolution and multisampling; #3 scales somewhat with increasing resolution, but not with multisampling. Again, we're hitting the resolution limits of today's monitors (and probably tomorrow's) already. We'll probably see an increase in multisampling in the short term, but I doubt much beyond 8x; and if we drop 8x MSAA for an analytical AA technique, that's a huge gift of bandwidth dropped into our laps.

#3 scales somewhat with increased texture size, especially at high resolutions. And textures will continue to get more detailed, although there are space requirements to worry about, particularly with high levels of multisampling. (Presumably more detailed textures would quickly fill the void left if and when we switch from multisampling AA to analytical AA.)

#3 scales with increasing amounts of AF, but 16x is already enough for all but the most extreme angles, which in turn occur so rarely that they barely cause any performance hit. (Look at how small the hit is going from 8x to 16x on an R3xx; that's because extremely few pixels actually require more than 8x.)

#3 scales as we use more textures per fragment. And we will continue to use more textures per fragment...but at a slowing rate, as more and more fragments are colored by complex shaders of which fetching a few textures is only a small part.

All three scale with true overdraw. We'll surely see higher levels of scene overdraw, but by the same token we'll have a lot more silicon area to donate to overdraw reducing techniques.

Of course, some of them scale with multiple passes, depending on what's done on the passes. (A Doom3-style first pass only involves #1, of course.) This fits in with the use of MRTs, which promise to become pretty prevalent.

And that's all I can think of off the top of my head. It's a reasonably long list, but all of the items are relatively small in scope, or relatively far along towards their likely limits. Meanwhile, all the big important new areas that realtime rendering looks set to move into are processor heavy and bandwidth light.
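As a rough back-of-envelope sketch of how those three sources combine per frame (all figures below are illustrative assumptions: 32-bit color and Z, no compression, a flat overdraw factor and a flat texture cost per fragment; real hardware with compression, caching and early-Z rejection would do considerably better):

```python
# Very crude per-frame bandwidth model for the three traffic sources above.
# All numbers are illustrative assumptions, not measurements.
def frame_traffic_gb_s(width, height, fps, samples, overdraw, tex_bytes_per_fragment):
    fragments   = width * height * overdraw
    z_bytes     = fragments * samples * (4 + 4)          # 1. Z read + write per sample
    color_bytes = fragments * samples * 4                # 2. color write per sample
    tex_bytes   = fragments * tex_bytes_per_fragment     # 3. texture reads (not per sample)
    return (z_bytes + color_bytes + tex_bytes) * fps / 1e9

# Example: 1600x1200 @ 80 fps, 4xMSAA, overdraw of 2, ~16 bytes of texture data per fragment.
print(f"~{frame_traffic_gb_s(1600, 1200, 80, 4, 2.0, 16):.0f} GB/s")
```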

Does that mean that the rise in required bandwidth will be adequately taken care of simply by rising DRAM clock speeds? It's hard to say. One important thing to note is that, while the techniques used in today's rendering haven't become substantially less bandwidth-intensive than they used to be, there has been a rash of bandwidth-saving optimizations incorporated into GPUs in the last couple years, which arguably put off the need for 256-bit buses by perhaps a year or so. I wouldn't say that more such techniques aren't on their way, but it would seem like the low-hanging fruit (or at least the API-transparent low-hanging fruit) has been picked.

And then, as has been noted, there is the perennial possibility of eDRAM to consider. The feasibility of that depends mostly on progress with mixed logic/DRAM processes. It's an open question whether we'll ever see an eDRAM part for the PC; if we do, it obviously argues for a narrower external bus.

Eh; all in all, I'll go out on a limb and say it all depends on the timing of multisampling being replaced by a much more bandwidth-friendly edge AA method. But I wouldn't be surprised if 256 bits is with us for a long time, or forever.
 
Personally, I was surprised when ATI announced the Radeon 9700 Pro with its 256-bit bus. I didn't think that manufacturers would go for the added cost of a 256-bit bus. In particular, there are other ways to increase effective memory bandwidth.

I think that my mistake with the 256-bit bus was that I had thought that other, more cost-effective technologies would come into play before a 256-bit bus became viable. I guess I was wrong.

But I do contend that it will take quite a long time for a 512-bit bus to become viable. The 128-bit bus was prevalent in the high end for about 5 years before the Radeon 9700. Within five years, the silicon microprocessor industry itself will be in jeopardy, and companies will need to seek out competing technologies in order to further improve computational performance. To attempt to predict anything about the computing industry in five years is a fool's errand.
 
I'm sorry this is such a late response; I've been busy.
Dave H did a good job of sorting out the main contributors to short-term increases in bandwidth needs.
Dave H said:
And that's all I can think of off the top of my head. It's a reasonably long list, but all of the items are relatively small in scope, or relatively far along towards their likely limits. Meanwhile, all the big important new areas that realtime rendering looks set to move into are processor heavy and bandwidth light.

The problem I have with this is twofold. The first is that these factors multiply. So a factor of two here, a factor of three there, another factor of two... the compound factor becomes quite large! Particularly in light of the kinds of memory speed improvements we are likely to see over the same time; it is that ratio which will determine just how high the pressure will become to increase bandwidth by means other than simply waiting for higher-clocked RAM technology.
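A minimal illustration of that compounding, with the individual factors picked purely for the sake of the example:

```python
# Individual demand factors multiply (illustrative numbers only).
factors = {
    "resolution / framerate": 2,
    "more AA samples":        3,
    "more textures / passes": 2,
}

compound = 1
for name, f in factors.items():
    compound *= f

memory_clock_gain = 2   # assumed memory-clock improvement over the same period
print(f"compound demand: {compound}x vs memory clock: {memory_clock_gain}x "
      f"-> {compound / memory_clock_gain:.0f}x pressure on bus width or other means")
```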

The other major disagreement I have with Dave's post is the implication that the current situation as regards fillrate and framerates is just fine.

Dave H said:
....however, as I've mentioned, framerates today are generally high enough that we don't need any further improvements there.
This, in my book, just isn't true.

If we use Dave's rather conservative 80 fps target, we note that the 9600Pro, arguably the best non-top-of-the-line card right now, coupled with a high-end host system, manages 17.9 fps in Splinter Cell using Beyond3D's demo, and the 9800Pro manages 31 fps at 1600x1200. That's average fps without AA or AF. Since it's average fps, we can typically assume that minimum fps is less than half those numbers. Thus the 9600Pro is roughly a factor of 10 below Dave's target fps, and the 9800Pro is roughly a factor of 5. Now, lots of people feel that AA and AF are desirable and worth paying for. Unfortunately, Beyond3D hasn't tested FSAA+AF performance with the 9800Pro using Splinter Cell, presumably because the performance would be utterly unplayable at high resolutions. However, using the mostly CPU-limited Serious Sam test, we find that even the 9800Pro drops from 84 to 35 fps when 6xAA and 16x adaptive AF are enabled. The situation is likely to be worse still with the more fillrate-limited Splinter Cell, but let's be very generous and say that with AA and AF enabled, we are at least a factor of 10 removed from where we would like to be in terms of fillrate.
And that's with a game that has been available for some time, and arguably the fastest graphics card in existence!
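Working those shortfall factors out explicitly from the figures above (taking minimum fps as roughly half the average, as stated):

```python
# Shortfall factors versus an 80 fps target, from the numbers quoted above.
target_fps = 80

def shortfall(avg_fps):
    minimum_fps = avg_fps / 2          # assume minimum is about half the average
    return target_fps / minimum_fps

print(f"9600Pro, Splinter Cell 1600x1200: ~{shortfall(17.9):.0f}x short")   # ~9x
print(f"9800Pro, Splinter Cell 1600x1200: ~{shortfall(31):.0f}x short")     # ~5x

# Serious Sam on the 9800Pro: 84 -> 35 fps with 6xAA + 16xAF enabled.
print(f"AA+AF cost factor (Serious Sam): ~{84/35:.1f}x")                    # ~2.4x
```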

DOOMIII is soon upon us. In itself, it is probably even more demanding than Splinter Cell, and in its wake will follow games that will use the same basic engine but add more demanding lighting, more polygons for the models, and areas that will incur more substantial overdraw penalties... in short, we will run smack into Dave's list of compounding factors above. There is no question that game designers will be severely constrained by lack of fillrate. Nor can there be much doubt that having more fillrate available will bring gamers direct and tangible benefits in terms of enjoyable resolutions and framerates; straightforward benefits people are prepared to pay for.

We are not talking about a situation several years hence; this is the situation right now, and it will only become more acute in the coming year. Short term, from where we stand today, the overall fillrate demand of games looks likely to grow much faster than memory clock speeds. How the graphics hardware manufacturers will deal with that pressure remains to be seen.

Long term - well, we'll see about that as well. :)

Entropy

PS. The most interesting piece of technology news lately; I haven't seen it mentioned here. It has implications:
http://www.digitimes.com/NewsShow/N...000000000000000000000000001258&query=OLED
 
Entropy said:
DOOMIII is soon upon us. In itself, it is probably even more demanding than Splinter Cell, and in its wake will follow games that will use the same basic engine but add more demanding lighting, more polygons for the models, and areas that will incur more substantial overdraw penalties... in short, we will run smack into Dave's list of compounding factors above.
I'd just like to point out one minor point:
In DOOM3, adding more polygons to the models won't increase overdraw. The most significant hit should be from the calculation of the shadow hulls, which means more CPU power required.
 
Entropy said:
DOOMIII is soon upon us. In itself, it is probably even more demanding than Splinter Cell, and in its wake will follow games that will use the same basic engine but add more demanding lighting, more polygons for the models, and areas that will incur more substantial overdraw penalties... in short, we will run smack into Dave's list of compounding factors above. There is no question that game designers will be severely constrained by lack of fillrate. Nor can there be much doubt that having more fillrate available will bring gamers direct and tangible benefits in terms of enjoyable resolutions and framerates; straightforward benefits people are prepared to pay for.

It's funny you bring up D3, as I was planning to use it as an example...

Think about the R300 running the R200 path. It does 7 texture samples (worst case), 2 of which are normalization cubemaps, used to normalize the light and halfway vectors for the lighting calculations.

Now think about the R300 running the ARB2 path. It samples all the same textures, except it doesn't need the cubemaps. So it needs less bandwidth.
Is it faster? No, it's a little bit slower. (And on the NV30 it's much slower.)

So D3 is not bandwidth limited, at least not with the ARB2 path, which is the highest-quality mode.
 
Actually, I was careful in my wording and used "fillrate". I simply don't have any real-life data that would reliably let me evaluate the effect of bandwidth alone; the preliminary tests only show the severe framerate hits incurred by increasing the resolution, and that the picture is even more grim when AA is enabled.

That's a problem with drawing firm conclusions from future games.
We can only judge from the preliminary data on hand.

(I should have been more careful about D3 because as Chalnoth pointed out, increasing the polygon load will have a greater impact on the CPU. )

Overall though, I feel quite confident about the overall gist of the post: D3/HL2 are likely to shift the goalposts in such a way as to require substantially higher fillrate than what we see in the market today. Coupled with increasing expectations for usable AA and AF, the overall market demand for bandwidth isn't likely to decrease.

At least not from where I stand.
Others may stand higher and see further though. It has happened before. ;)

Entropy
 
Dave H said:
It's an unstated assumption that I'm sure everyone in the discussion is aware of, but just to bring it out into the open:

There is an upper limit to the number of pixels that can be displayed every second. With today's monitors, any ability to push pixels beyond, say, 1600*1200 @ 100 fps is essentially useless; and it is unlikely that in three or four years monitors will have advanced much on this measure.

I'm not as sure of that as I once was.
http://www.adtx.co.jp/en/pdf/products/lcd-md22292b_win_e.pdf
http://www.creativepro.com/story/news/16974.html

9.2 Million pixels at 85Hz is 782MPixels/sec.
It's $8K at the moment, but I see LCD displays getting cheaper and cheaper.

Anybody know what the decay rate is for LCD prices? The last time I saw a price on this display it was ~10K, a year or so ago.
 
Entropy said:
... and that the picture is even more grim when AA is enabled.

That can sound like a bandwidth limitation at first, since we know that MSAA is fillrate-free, right?
Wrong.
MSAA does 1 color calculation per pixel, but needs multiple Z/stencil operations. So it can cost fillrate, and on shadow passes this is the direct limiting factor.

That also means that the shadow calculations are fillrate and not CPU limited (at least with AA enabled).
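A small sketch of that point, using assumed per-sample costs (one Z read and one stencil write per sample, 4 bytes each, no compression): the color work stays constant while the Z/stencil work scales with the sample count.

```python
# Per-pixel work on a stencil shadow pass, with and without multisampling.
# Assumptions: 1 Z read + 1 stencil write per sample, 4 bytes each, uncompressed.
def shadow_pass_cost(samples):
    z_stencil_ops = samples * 2            # operations per screen pixel
    bytes_moved   = samples * (4 + 4)      # bytes per screen pixel
    return z_stencil_ops, bytes_moved

for s in (1, 4, 6):
    ops, bts = shadow_pass_cost(s)
    print(f"{s}x samples: {ops} Z/stencil ops, ~{bts} bytes per pixel "
          f"(color work stays at 1 calculation either way)")
```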

Entropy said:
(I should have been more careful about D3 because as Chalnoth pointed out, increasing the polygon load will have a greater impact on the CPU.)

Well, yes and no.
It depends on where those polygons are spent.

If you tessellate the existing models, it will mostly come out as a CPU hit.

But if you increase the complexity of the surface (like adding "bumps"), the fillrate impact will be much more severe.
 
LOL!

Why do I get the distinct impression you're henpecking me to death on the minutiae of D3 rendering (where, I might add, neither of us has truly solid data to go by as far as bandwidth is concerned), rather than tackling the main issue: will we face increasing demands on bandwidth as we go forward?

I see trends, and the current market, and when I extrapolate from these the answer I get is: Yes, we face increasing demands on bandwidth.

The argument that in the future we will use shader code, which in principle should be less bandwidth-dependent, and that the current status quo should therefore be sufficient in the future, fails to convince me. It doesn't connect to the realities of the hardware shader capabilities in the market today or in the near future (which is what determines the directions taken; you've got to have a market to sell to), and it doesn't relate to current methods for AA+AF, which is what people are learning to appreciate and which is a driving force in the market. Thus the shader argument projects too far into the future: it assumes hardware that isn't here, running applications that aren't around even as known projects; it doesn't address the problem of inertia from supporting a continuity of evolutionary hardware; and it assumes that when shader code is finally the established standard, bandwidth will no longer be a limiting resource, something that flies in the face of pretty much all computing experience known to man.

It's not that I question that shader use will be a growing trend in games, but rather the conclusions regarding bandwidth needs for future hardware/software. Projecting into the future is a tricky business. A lot of technical decisions and directions will be decided by what is marketable with reasonable profit margins at a given point in time, which are tricky calls to make even for ATI and nVidia, obviously.

But as far as bandwidth is concerned, to paraphrase the immortal words of ex-agent Smith - "We need more!".

It is inevitable.

Entropy
 
Entropy said:
Overall though, I feel quite confident about the overall gist of the post: D3/HL2 are likely to shift the goalposts in such a way as to require substantially higher fillrate than what we see in the market today. Coupled with increasing expectations for usable AA and AF, the overall market demand for bandwidth isn't likely to decrease.
Granted, but I think we will see a trend in software toward a higher ratio of graphics processing power to memory bandwidth.

Take DOOM3 as an example.
I believe the rendering passes are as follows:
1. Render once, with color and stencil writes off, creating depth information for the entire scene. One z read and one z write per pixel.
2. Calculate and render shadow hulls, requiring a single z read and stencil write per pixel.
3. Render surfaces. One z read, one color write, and multiple texture reads per pixel (plus, sometimes, one color read from multipass).
4. Repeat steps 2-3 for each light.

Now, steps one and two are going to require very little bandwidth per pixel filled. For step three, it depends on how many texture reads are needed compared to the shader length. I would contend that now that shaders are much more flexible, game developers are going to move away from using mostly textures for the information, toward generating that information in the shader (as in the specular-highlight cubemap lookup example from DOOM 3).

So, part of the decrease in memory bandwidth needs (relative to processing power needs) is a given based on the stencil shadowing algorithm. I believe most other shadowing algorithms will have similarly low memory bandwidth needs. But the shading still needs to be done, and that is both the unknown and the deciding factor. I contend that game developers are going to go for fewer and fewer texture reads (with respect to the number of shader instructions) simply because they can, because the shaders are that much more flexible.
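To put rough numbers on that pass breakdown, here is a small sketch using assumed byte counts (32-bit Z, 8-bit stencil, 32-bit color, 32-bit texel fetches, no compression or texture caching); the light count and per-surface texture count are made up purely for illustration:

```python
# Rough per-pixel bytes for the DOOM3-style pass structure listed above.
# Purely illustrative assumptions; real hardware compresses Z and caches textures.
Z, STENCIL, COLOR, TEXEL = 4, 1, 4, 4

depth_pass  = Z + Z                           # 1. z read + z write
shadow_pass = Z + STENCIL                     # 2. z read + stencil write
def surface_pass(texture_reads):
    return Z + COLOR + texture_reads * TEXEL  # 3. z read + color write + texture fetches

lights = 3
textures_per_surface = 5
per_pixel = depth_pass + lights * (shadow_pass + surface_pass(textures_per_surface))
print(f"~{per_pixel} bytes per pixel for {lights} lights "
      f"({textures_per_surface} texture reads per surface pass)")
```

Note how little of that total comes from the depth and stencil passes; most of the traffic, and nearly all of the arithmetic, sits in the surface passes.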

While bandwidth needs will increase over time, pure processing power needs of the GPU will increase more quickly.
 
I think it is plain that there will be a multitude of factors dictating how we progress from here, some of them not related to gfx-tech per se.

A major one that I surprisingly haven't seen discussed is how the X-Box will affect upcoming games. The X-Box will continue to be sold at least through 2006, and it is quite reasonable to assume that many development houses will target their titles for release on both the PC and the X-Box, given the comparatively large volume of active gamers in the X-Box market as the installed base grows. After all, that was Microsoft's intention, and so far it seems to have worked out to some extent.

But the X-Box is a double-edged sword, and what will it mean for the uptake of DX9 features?
For the life of me, I can't see it any other way: as we progress from here, the X-Box will shift from being a device that so far has generally helped lift the baseline for PC games to being something that slows the uptake of higher-level functionality, with such features tending to be limited to optional "effects" (much as DX8-level shaders have largely been used for pretty water rendering). Using an engine that requires DX9-level functionality would in effect cut you off from the installed base of X-Box gamers.

Another market-related issue is the lack of capable DX9 hardware. It doesn't look as if nVidia, still the largest gfx supplier, will produce anything that reaches R300 capabilities this year. Knowing that, and given the X-Box, would you produce a game for 2004 that required R300-level shaders? And knowing that, will the hardware providers focus on bringing high-level shader performance up to snuff, or will they focus on features that allow PC gamers to play X-Box-level games at higher quality? I wouldn't underestimate the market inertia, or the X-Box's influence on it for years to come.

Entropy
 
While I don't disagree that the importance of arithmetic/computational efficiency will increase by a higher degree than bandwidth requirements themselves, bandwidth will still have to increase inevitably over time.

Unless I'm missing something, I can't imagine an accelerator based on today's standards coming along with 5 Gigapixels/sec of fillrate and just 8 GB/sec of memory bandwidth either.
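Put in per-pixel terms, the mismatch is obvious; assuming each rendered pixel touches at least a Z read, a Z write and a color write (about 12 uncompressed bytes) before any texturing:

```python
# Bytes of bandwidth available per pixel at the quoted rates.
fillrate_pix_per_s = 5e9      # 5 Gigapixels/s
bandwidth_bytes    = 8e9      # 8 GB/s

available = bandwidth_bytes / fillrate_pix_per_s   # 1.6 bytes per pixel
needed    = 4 + 4 + 4                              # Z read + Z write + color write (assumed, uncompressed)
print(f"{available:.1f} bytes/pixel available vs ~{needed} bytes/pixel needed -> "
      f"only ~{available/needed:.0%} of the fillrate could actually be sustained")
```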
 
Most of the best PC games in the coming few years will not also be simultaneously developed for the X-Box.
 
Chalnoth said:
Most of the best PC games in the coming few years will not also be simultaneously developed for the X-Box.

No idea what the bigger development houses intend to do in the long run, but I also have a hard time believing that we'll be moving away from the lowest possible common denominator of 1.1 shaders that soon.

Unless someone wants to make my day and try to convince me that today's budget dx9.0 accelerators can run version 2.0 shaders at adequate speeds ;)
 
Me said:
Dave H said:
It's an unstated assumption that I'm sure everyone in the discussion is aware of, but just to bring it out into the open:

There is an upper limit to the number of pixels that can be displayed every second. With today's monitors, any ability to push pixels beyond, say, 1600*1200 @ 100 fps is essentially useless; and it is unlikely that in three or four years monitors will have advanced much on this measure.

I'm not as sure of that as I once was.
http://www.adtx.co.jp/en/pdf/products/lcd-md22292b_win_e.pdf
http://www.creativepro.com/story/news/16974.html

9.2 Million pixels at 85Hz is 782MPixels/sec.
It's $8K at the moment, but I see LCD displays getting cheaper and cheaper.

Anybody know what the decay rate is for LCD prices? The last time I saw a price on this display it was ~10K, a year or so ago.

Hi Me

I see on that CreativePro site that the Viewsonic was $8000 two years ago. You're saying it's still $8000 now??

Just asking.

US
 
On pricewatch that monitor is currently sitting at about $6300. We've still got a while to go, but damn it would be nice to have such a huge, high-resolution display....

Anyway, you really resurrected an old thread, didn't you?
 
Chalnoth said:
Personally, I was surprised when ATI announced the Radeon 9700 Pro with its 256-bit bus. I didn't think that manufacturers would go for the added cost of a 256-bit bus. In particular, there are other ways to increase effective memory bandwidth.

I think that my mistake with the 256-bit bus was that I had thought that other, more cost-effective technologies would come into play before a 256-bit bus became viable. I guess I was wrong.

But I do contend that it will take quite a long time for a 512-bit bus to become viable. The 128-bit bus was prevalent in the high end for about 5 years before the Radeon 9700. Within five years, the silicon microprocessor industry itself will be in jeopardy, and companies will need to seek out competing technologies in order to further improve computational performance. To attempt to predict anything about the computing industry in five years is a fool's errand.
In hindsight, looking at the NV30 (GeForce FX 5800 Ultra), it seems like NVIDIA had a similar idea about the 256-bit bus...
 