First, the not-insanely-technical-and-long part...
wolf2 said:
Assuming TSMC's process is of similar size (but optimized for low-power), why can't a 5 or 8MB chunk of eDRAM be embedded in a notebook chip for use as the VRAM? It seems the power savings in this topology would be fairly dramatic as you would eliminate close to 250MB/sec of bandwidth fetch from the shared memory for servicing the display.
Based on the cell sizes given by TSMC and IBM, I would indeed assume that the density of their EDRAM is very similar - I already noted this in the news post.
As for using EDRAM to save frontbuffer bandwidth on laptops, this is an interesting approach, but it seems a bit expensive to me for a rather small benefit. Remember that 5-8MB of EDRAM is still quite expensive today. I do not have the insider knowledge to judge this properly, but I would tend to believe that 16MB of stacked low-power DRAM would be a better design choice for that purpose. This even allows you to have a full 2560x1600 frontbuffer there. Sadly, as I said, I cannot estimate the exact cost and power implications of this approach without insider data.
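To put rough numbers on the figures above - this is just back-of-the-envelope arithmetic with assumed panel resolutions, 4 bytes per pixel and a 60Hz refresh, and the helper names are purely illustrative:

```python
# Front-buffer size and scanout bandwidth, under the assumptions stated above.

def frontbuffer_bytes(width, height, bytes_per_pixel=4):
    """Memory needed to hold one uncompressed front buffer."""
    return width * height * bytes_per_pixel

def refresh_bandwidth(width, height, bytes_per_pixel=4, refresh_hz=60):
    """Bandwidth spent scanning the front buffer out to the display."""
    return frontbuffer_bytes(width, height, bytes_per_pixel) * refresh_hz

# A typical laptop panel (assumed 1280x800): ~245 MB/s, i.e. roughly the
# "close to 250MB/sec" figure quoted above.
print(refresh_bandwidth(1280, 800) / 1e6, "MB/s")

# A 2560x1600 front buffer needs ~15.6 MiB, which is why 16MB of stacked
# low-power DRAM is enough to hold it entirely.
print(frontbuffer_bytes(2560, 1600) / 2**20, "MiB")
```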
And now, here comes the long part of the post! For those who don't have the time or desire to read it, the basic idea is to store the compressed version of the framebuffer in EDRAM. Because the compressed data occupies a subset of the memory areas that the uncompressed or less-compressed versions would use, keeping that subset in EDRAM saves substantial bandwidth even in the worst-case scenario.
---
My idea for this is fairly simple. Consider how framebuffer compression likely works given memory burst lengths, and that you likely have intermediate compression stages between "fully compressed" and "fully uncompressed".
Consider a GPU where Z has X compression levels and colour has Y compression levels - so X-1 and Y-1 levels if you exclude "uncompressed". Now, consider how that works on current GPUs due to memory burst lengths: the most aggressive compression level would likely only require writing and/or reading one burst of data, while the uncompressed level would require writing and/or reading several bursts of data.
However, the memory area used for the uncompressed data includes the area that would be used for the various levels of compressed data. So, given this, let us consider a maximum compression ratio of 4:1 and what happens if only one memory area out of four is exclusively present in EDRAM, while the three others are exclusively present in VRAM. In the worst case, you only save 25% of the bandwidth; in the best case, if all the data in the framebuffer is fully compressed (very unlikely except for test scenes with only one big triangle!), no read/write access to VRAM is required at all. The average savings should be very high for a reasonable amount of EDRAM.
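Here is a quick sketch of that arithmetic; the per-block layout (four burst-sized areas, one of them resident in EDRAM) and the level-to-burst mapping are assumptions chosen to match the 4:1 example, not a description of any real GPU:

```python
# Each pixel block is stored in up to four burst-sized areas; one lives in
# EDRAM, the other three in VRAM. The compression level of a block determines
# how many areas are touched (1 burst when fully compressed, 4 when
# uncompressed). All numbers here are illustrative.

EDRAM_AREAS = 1   # burst-sized areas resident in EDRAM per block

def vram_bursts(bursts_touched):
    """Bursts that still have to go out to VRAM for one block access."""
    return max(0, bursts_touched - EDRAM_AREAS)

def bandwidth_saving(block_burst_counts):
    """Fraction of framebuffer traffic kept on-chip, for a list of blocks."""
    total = sum(block_burst_counts)
    to_vram = sum(vram_bursts(b) for b in block_burst_counts)
    return 1.0 - to_vram / total

# Worst case: every block uncompressed -> only 25% of the traffic stays on-chip.
print(bandwidth_saving([4] * 100))                        # 0.25
# Best case: everything compresses 4:1 -> no VRAM traffic at all.
print(bandwidth_saving([1] * 100))                        # 1.0
# A more plausible mix of compression levels lands in between.
print(bandwidth_saving([1] * 60 + [2] * 25 + [4] * 15))   # ~0.59
```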
So, let us also consider an extension of this scheme to support arbitrary resolutions and higher utilization rates of the EDRAM. Instead of reserving the same number of burst-sized memory areas for every block of pixels, a variable number of areas could be reserved. The driver would inform the GPU that burst-sized memory areas 1 and 2 are always in EDRAM for every block, while area 3 is in EDRAM for one block out of N, where N is either an integer or a floating-point number. Areas 4, 5 and 6 would always be in VRAM. This could also be tuned separately for Z, Colour and Stencil if need be. In the extreme case, even the most aggressive level of compression is not guaranteed to be in EDRAM.
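A minimal sketch of what such a per-block reservation rule could look like, using the same area numbering as above; the function name, the defaults and the way the "one block out of N" placement is spread across the framebuffer are my own assumptions:

```python
def area_in_edram(block_index, area_index, one_block_out_of=2.5):
    """Decide where one burst-sized area of one pixel block lives."""
    if area_index <= 2:
        return True                    # areas 1 and 2: always in EDRAM
    if area_index == 3:
        # Area 3 is in EDRAM for roughly one block out of N (N may be
        # fractional), spread evenly across the block indices.
        return (block_index % one_block_out_of) < 1
    return False                       # areas 4, 5, 6, ...: always in VRAM

# e.g. with N = 2.5, two blocks out of every five get area 3 in EDRAM:
print([area_in_edram(b, 3) for b in range(5)])   # [True, False, False, True, False]
```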
From my point of view, and as I have already said in the past, I believe TBDR rendering to also be usable as an efficient form of memory footprint compression. An interesting hybrid implementation I can think of is a natural evolution of Zhu's (now at NVIDIA; previously CTO of GigaPixel) patent on dynamic allocation of memory blocks for TBDR rendering.
The biggest problem with TBDR rendering for memory footprint compression is that in certain areas, for example those with pixel-sized triangles, the footprint is going to be higher than that of even a naive IMR. The solution to this, assuming you are willing to dedicate enough hardware to the problem, is to allow a tile of pixels to revert to IMR rendering if the footprint required would exceed that of the IMR implementation. If the API exposes order-independent transparency, then tiles which need that information could be forced to never switch to IMR mode, and the allocator would then spill to VRAM, system RAM, etc.
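To make the fallback a bit more concrete, here is a very rough sketch of that per-tile decision; the tile size, the class layout and the spill hook are all invented for illustration, and this is purely my speculation rather than a description of the GigaPixel patent:

```python
# Per-tile bookkeeping: a tile starts in TBDR mode with dynamically allocated
# storage, but reverts to IMR mode if its footprint would exceed what a plain
# IMR needs for the same tile. Tiles that must keep their per-sample data
# (e.g. for order-independent transparency) stay in TBDR mode and spill
# off-chip instead.

IMR_TILE_FOOTPRINT = 16 * 16 * 4      # assumed IMR colour footprint of a 16x16 tile

class Tile:
    def __init__(self, needs_oit=False):
        self.needs_oit = needs_oit    # order-independent transparency requested
        self.mode = "TBDR"
        self.allocated_bytes = 0

    def allocate(self, nbytes, spill_to_vram):
        """Account for more per-tile data; maybe revert to IMR or spill."""
        self.allocated_bytes += nbytes
        if self.allocated_bytes <= IMR_TILE_FOOTPRINT:
            return                        # still no worse than a naive IMR
        if self.needs_oit:
            spill_to_vram(self, nbytes)   # cannot drop the data: spill off-chip
        else:
            self.mode = "IMR"             # footprint exceeded: revert this tile
            self.allocated_bytes = IMR_TILE_FOOTPRINT
```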
It is difficult to be confident of how memory footprint would be affected by such a hybrid implementation without the kind of data that only those working at Imagination Technologies (or at NVIDIA, arguably, given they still have GigaPixel employees!) have access to. I would tend to believe the results could be extremely interesting, however.
Anyway, I am getting carried away. Even without use of TBDR-like technology, I hope this post clearly shows how usable EDRAM is for PC solutions. A number of optimizations could be done to further minimize the logic costs of this implementation, such as deciding whether something is in EDRAM based on the upper bits of the memory address (although this would waste some VRAM!). I would tend to believe this should be cheap enough to implement as it is, however, at least relative to the size of the EDRAM macro.
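For that address-based variant, a minimal sketch of what the check could look like, assuming the EDRAM simply shadows the top of the VRAM address range (which is exactly where the wasted VRAM would come from); the sizes and the placement are my own assumptions:

```python
# The memory controller looks only at the upper bits of an address to decide
# whether an access goes to the EDRAM macro or to external VRAM.

VRAM_SIZE  = 256 * 2**20               # 256 MiB of external VRAM (assumed)
EDRAM_SIZE = 8 * 2**20                 # 8 MiB EDRAM macro (assumed)
EDRAM_BASE = VRAM_SIZE - EDRAM_SIZE    # EDRAM shadows the top 8 MiB of VRAM

def is_in_edram(address):
    """Single compare on the upper address bits: EDRAM or VRAM?"""
    return (address >> 23) == (EDRAM_BASE >> 23)   # 2**23 bytes = 8 MiB granularity
```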
Other aspects of the GPU could obviously benefit, to a lesser extent, from using EDRAM or similar techniques instead of SRAM. For example, on an IGP or a handheld GPU, you could significantly increase the size of the L2 texture cache without increasing costs, if you already need EDRAM somewhere else in the design. I would tend to believe a much larger texture cache wouldn't do miracles, but every bit of bandwidth you can save counts in these market segments. As long, of course, as your costs remain reasonable.
EDIT: Please note that this is purely speculation, and not based on any insider information.