#1 | |
|
Member
Join Date: Nov 2007
Posts: 945
|
The next gen speculation thread started to have an interesting debate about memory bandwidth vs memory amount. I don't personally want to contribute to the next gen speculation, but the "memory bandwidth vs memory amount" topic is very interesting in its own right. So I decided to make a thread for this topic, as I have personally been doing a lot of memory access and bandwidth analysis lately for our console technology, and I have programmed our virtual texturing system (and many other JIT data streaming components).
Historically memory performance has improved linearly (very slowly) compared to the exponential (Moore's law) growth of CPU performance. Relative memory access times (latencies) have grown to be over 400x higher (in clock cycles) compared to the first PCs, and there are no signs that this development will slow down in the future, unless we invent some radically new ways of storing data. None of the currently known future technologies is going to solve the problem; they just provide a band-aid. So we need to adapt.

Some links to background information first:

1. Presentation by Sony R&D. Targeted at game technology programmers. Has a very good real life example of how improving your memory access pattern can improve your performance by almost 10x. Also has nice charts (slides 17 and 18) showing how memory speed has increased historically compared to ALU: http://harmful.cat-v.org/software/OO...ng_GCAP_09.pdf

2. Benchmark results of a brand new x86 chip with unified memory architecture (CPU & GPU share the same memory & memory controller). The benchmark shows system performance with all available DDR3 speeds from DDR3-800 to DDR3-1866. All other system settings are identical; only memory bus bandwidth is scaled up/down. We can see an 80% performance (fps) improvement in the gaming benchmark just by increasing the DDR3 memory clock: http://www.tomshardware.com/reviews/...0k,3224-5.html

3. A GPU benchmark comparing the old Geforce GTS 450 (1 GB, GDDR5) card to a brand new Kepler based Geforce GT 640 (2 GB, DDR3). The new Kepler based card has twice the memory amount and twice the ALU performance, but only half of the memory bandwidth (because of DDR3). Despite the much faster theoretical shader performance and twice the memory amount, it loses pretty badly in the benchmarks because of its slower memory bus: http://www.anandtech.com/show/5969/z...gt-640-review-

Quote:
--- ---
I will use the x86 based Trinity APU [link 2] as my example system, as its performance and memory bandwidth are close enough to current generation consoles (it's only around 2x-4x faster overall) and it has unified memory (a single memory bus shared between CPU & GPU). It's much easier to talk about a well known system with lots of public benchmark results around the net.

Let's assume we are developing a vsync locked 60 fps game, so each frame must complete in 16.6 ms. Let's assume our Trinity system is equipped with the fastest DDR3 it supports (DDR3-1866). According to the Tom's Hardware synthetic bandwidth benchmark, this configuration gives us 14 GB/s of bandwidth. Divide that by 60, and we get 233 MB of bandwidth per frame. Let's round that down to an even 200 MB per frame to ease our calculations. A real game never utilizes memory bandwidth as well as a synthetic benchmark, so even the 200 MB per frame figure is optimistic.

Now I know that my game should never access more than 200 MB of unique memory per frame if I want to reach my vsync locked 60 fps. If I access more memory, my frame rate dips as the memory subsystem cannot give me enough data, and my CPU & GPU start stalling.

How about CPU & GPU caches? Caches only help with repeated accesses to the same data. Caches do not allow us to access any more unique data per frame. It's also worth noting that if you access the same memory, for example, at the beginning, middle and end of your frame, you will pay as much as if you did three unique memory accesses. Caches are very small, and old data gets replaced very fast. Our Trinity CPU has 4 MB of L2 cache and we move 200 MB of data to the cache every frame. Our cache gets fully replaced by new data (200/4 =) 50 times every frame. Data only stays in cache for 0.33 ms. If we access it again after this period, we must fetch it from memory again (wasting our valuable 200 MB per frame bandwidth).
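The per-frame arithmetic above can be sketched in a few lines of Python. This is only an illustration of the numbers in this post (14 GB/s, 60 fps, 4 MB of L2, the rounded 200 MB budget); the function names are made up for the example, and 1 GB is treated as 1000 MB, which is an assumption.

```python
# Rough sketch of the per-frame bandwidth budget described in the post.
# All function names are hypothetical; the constants come from the post.

FRAME_RATE_FPS = 60.0           # vsync locked target
MEASURED_BANDWIDTH_GB_S = 14.0  # synthetic benchmark figure quoted in the post
ROUNDED_BUDGET_MB = 200.0       # the post rounds 233 MB down to 200 MB
L2_CACHE_MB = 4.0               # Trinity L2 size quoted in the post


def per_frame_budget_mb(bandwidth_gb_s: float, fps: float) -> float:
    """Memory traffic available in one frame, in MB (assuming 1 GB = 1000 MB)."""
    return bandwidth_gb_s * 1000.0 / fps


def cache_refills_per_frame(budget_mb: float, cache_mb: float) -> float:
    """How many times the whole cache is replaced by new data in one frame."""
    return budget_mb / cache_mb


def cache_residency_ms(budget_mb: float, cache_mb: float, fps: float) -> float:
    """Average time a cache line survives before being replaced, in ms."""
    frame_ms = 1000.0 / fps
    return frame_ms / cache_refills_per_frame(budget_mb, cache_mb)


if __name__ == "__main__":
    print(per_frame_budget_mb(MEASURED_BANDWIDTH_GB_S, FRAME_RATE_FPS))
    print(cache_refills_per_frame(ROUNDED_BUDGET_MB, L2_CACHE_MB))
    print(cache_residency_ms(ROUNDED_BUDGET_MB, L2_CACHE_MB, FRAME_RATE_FPS))
```

Plugging in a 30 fps target instead simply doubles the per-frame budget, which is the same scaling other posters use later in the thread.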
It's not uncommon for a real game to access every piece of data in the current working set (on average) twice per frame, leaving us with 100 MB per frame of unique accessible memory. Examples: Shadowmaps are first rendered (to textures in memory) and sampled later during the lighting pass. Physics simulation moves objects (positions & rotations) and later in the frame those same objects are rendered (accessing the same position and rotation data again).

However let's keep the theoretical 200 MB per frame number, as engines differ and access patterns differ (and we do not really want to go that far in the analysis). In a real game you can likely access only around 100 MB - 150 MB of unique memory per frame, so the forthcoming analysis is optimistic. A real game could likely access less memory and thus have a smaller working set.

So far we know that the processing and rendering of a single frame never requires more than 200 MB of memory (we can't reach 60 fps otherwise). If your game has a static scene, you will not need more memory than that. However static scenes are not much fun, and thus this scenario is highly unlikely in real games (except for maybe a chess game with a fixed camera). So the billion dollar question becomes: how much does the working set (memory accesses) change from frame to frame in a 60 fps game?

In a computer game, objects and cameras do not really "move" around; they get repositioned every frame. In order for this repositioning to look like smooth movement we can only change the positions very slightly from frame to frame. This basically means that our working set can only change slightly from frame to frame. According to my analysis (for our game), our working set changes around 1%-2% per frame in the general case, and peaks at around 10%. An especially notable fact is that our virtual texturing system working set never changes more than 2% per frame (textures are the biggest memory user in most games).
We assume that a game with a similar memory access pattern (similarly changing working set from frame to frame) is running on our Trinity example platform. Basically this means that in the average case our working set changes by 2 MB to 4 MB per frame, and the change peaks at around 20 MB per frame. We can stream this much data from a standard HDD. However HDDs have long latencies and long seek times, so we must stream data in advance and bundle data in slightly bigger chunks than we would like to combat the slow seek time. Both streaming in advance (prefetching) and loading in bigger chunks (loading a slightly wider working set) require extra memory. The question becomes: how much larger does the memory cache need to be than our working set?

The working set is 200 MB (if we want to reach that 60 fps in the imaginary game on our Trinity platform). How much more memory do we need for the cache? Is working set x2.5 enough (512 MB)? How about 5x (1 GB) or 10x (2 GB)?

Our virtual texture system has a static 1024 page cache (128x128 pixel pages, two DXT5 compressed layers per page). Our average working set per frame is around 200-400 pages, and it peaks as high as 600 pages. The cache is so small that it has to reload all textures if you spin the camera around 360 degrees, but this doesn't matter, as the HDD streaming speed is enough to push new data in at a steady pace. You never see any texture popping when rotating or moving the camera. The only occasion where you see texture popping is when the camera suddenly teleports to a completely different location (the working set changes almost completely). In our game this only happens if you restart from a checkpoint or restart the level completely, so it's not a big deal (and we can predict it).

If the game behaves similarly to our existing console game, we need a cache size of around 3x the working set for texture data. A big percentage of the memory accessed per frame (or stored to memory) goes to the textures.
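As a rough sanity check on the numbers above, here is a small Python sketch of the frame-to-frame working set change and of the virtual texture page cache size. It assumes DXT5 at 1 byte per pixel (16 bytes per 4x4 block) and ignores mip chains and any page borders, which the post does not describe; the function names are hypothetical.

```python
# Sketch of the working set delta and texture page cache arithmetic.
# Assumptions (not from the post): no mip chain or border padding per page.

DXT5_BYTES_PER_PIXEL = 1   # DXT5/BC3 stores a 4x4 block in 16 bytes
PAGE_SIDE_PX = 128         # 128x128 pixel pages (from the post)
LAYERS_PER_PAGE = 2        # two DXT5 layers per page (from the post)
CACHE_PAGES = 1024         # static page cache size (from the post)


def working_set_delta_mb(working_set_mb: float, change_fraction: float) -> float:
    """Amount of new data entering the working set in one frame."""
    return working_set_mb * change_fraction


def page_size_bytes() -> int:
    """Size of one virtual texture page under the assumptions above."""
    return PAGE_SIDE_PX * PAGE_SIDE_PX * DXT5_BYTES_PER_PIXEL * LAYERS_PER_PAGE


def pages_to_mib(pages: int) -> float:
    """Convert a page count to MiB."""
    return pages * page_size_bytes() / (1024 * 1024)


if __name__ == "__main__":
    for fraction in (0.01, 0.02, 0.10):
        print(fraction, working_set_delta_mb(200.0, fraction), "MB per frame")
    print("cache:", pages_to_mib(CACHE_PAGES), "MiB")
    for pages in (200, 400, 600):
        print(pages, "pages:", pages_to_mib(pages), "MiB,",
              "cache multiplier", CACHE_PAGES / pages)
```

Under these assumptions the whole page cache is only about 32 MiB, and the 200-400 page working set gives a cache multiplier of roughly 2.5x to 5x, which is in line with the "around 3x" figure in the post.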
If we assume for a moment that all other memory accesses are as stable as texture accesses (cache multiplier of 3x) we only need 600 MB of memory for a fully working game. For some memory bandwidth hungry parts of the game this actually is true. And things are even better for some parts: shadow maps, post processing buffers, back buffer, etc. are fully regenerated every frame, so we need no extra memory storage to hold caches of these (cache multiplier is 1x).

Game logic streaming is a harder thing to analyze and generalize. For example our console game has a large free roaming outdoor world. It's nowhere near as big as the worlds in Skyrim, for example, but the key point here is that we only keep a slice of the world in memory at once, so the world size could theoretically be limitless (with no extra memory cost). Our view distance is 2 kilometers, so we do not need to keep a full representation of the game world in memory beyond that. Data quality required for a given distance pretty much follows a logarithmic scale (texture mip mapping, object geometry quality, heightfield quality, vegetation map quality, etc.). The data required shrinks dramatically as distance grows. This is of course only true for easy cases such as graphics processing, heightfields, etc. Game logic doesn't automatically scale. However you must scale it manually to stay within that 200 MB per frame memory access limit. Your game would slow down to a halt if you simply tried to read the full AI data of every single individual NPC in the large scale world, no matter how simple your processing would be.

Our heightmap cache (used in physics, raycasts and terrain visualization) keeps around 4x the working set. We do physics simulation (and exact collision) only for things near the player (100 meters max). When an object enters this area, we add the corresponding physics objects to our physics world.
It's hard to estimate exactly how big a percentage of our physics world structures is accessed per frame, but I would estimate around 10%. So we basically have a 10x working set "cache" for physics.

Basically no component in our game required more than 10x memory compared to its working set. The average requirement was around 3x. So theoretically a game with similar memory access patterns would only need 600 MB of memory on our example Trinity platform. And this includes as much texture resolution as you ever want (virtual texturing works that way). And it includes as much other (physics, game logic, etc.) data as you can process per frame (given the limited bandwidth). Of course another game might need, for example, an average of 10x the working set for caches, but that's still only 2 GB. Assuming the game is properly optimized (predictable memory accesses are a must-have for good performance) and utilizes JIT streaming well, it will not benefit much if we add more main memory to our Trinity platform beyond that 2 GB.

More memory of course makes developers' lives easier. Predicting data access patterns can be very hard for some styles of games and structures. But mindlessly increasing the cache sizes much beyond working set sizes doesn't help either (as we all know, increasing cache size beyond working set size gives on average only a logarithmic improvement in cache hit rate = diminishing returns very quickly).

My conclusion: Usable memory amount is very much tied to available memory bandwidth. More bandwidth allows games to access more memory. So it's kind of counterintuitive to swap a faster, smaller memory for a slower, larger one. More available memory means that I want to access more memory, but in reality the slower bandwidth allows me to access less. So the percentage of accessible memory drops radically. |
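The sizing rule in this post boils down to "memory needed = per-frame working set x cache multiplier". A minimal Python sketch follows; the function names are hypothetical, and the 2 GB and 8 GB capacities in the second part are just the sizes debated in this thread, used here as illustrative inputs.

```python
# Sketch of the "working set x cache multiplier" sizing rule from the post.
# Function names are made up for the example.

def required_memory_mb(working_set_mb: float, cache_multiplier: float) -> float:
    """Total memory needed to hold the working set plus its streaming cache."""
    return working_set_mb * cache_multiplier


def accessible_fraction(bandwidth_gb_s: float, fps: float, capacity_gb: float) -> float:
    """Share of total memory that the bus can touch uniquely in one frame."""
    per_frame_gb = bandwidth_gb_s / fps
    return per_frame_gb / capacity_gb


if __name__ == "__main__":
    # Multipliers quoted in the post: 1x regenerated buffers, ~3x textures
    # (also the overall average), 4x heightmaps, ~10x physics (the maximum).
    for multiplier in (1, 3, 4, 10):
        print(multiplier, "x ->", required_memory_mb(200.0, multiplier), "MB")

    # Same 14 GB/s bus at 60 fps with two different capacities (illustrative):
    for capacity_gb in (2.0, 8.0):
        share = accessible_fraction(14.0, 60.0, capacity_gb)
        print(capacity_gb, "GB ->", round(share * 100, 1), "% touched per frame")
```

With the same bus, going from 2 GB to 8 GB drops the share of memory that can be touched per frame from roughly 12% to roughly 3%, which is the "percentage of accessible memory drops" point made in the conclusion.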
|
|
|
|
|
|
#2 |
|
Member
Join Date: Apr 2004
Location: Australia
Posts: 2,352
|
Very impressive analysis, Sebbi, and thanks for sharing your thoughts. I am curious, though, about the Sparse Voxel Octree technique used in UE4. It was mentioned to be memory intensive, so do you think the same logic applies there as well?
|
|
|
|
|
|
#3 |
|
Senior Member
Join Date: Jan 2012
Location: Leicestershire - England
Posts: 1,464
|
Fantastic write up, and a very interesting topic. On the other thread there is talk of 8 GB of RAM, but as you state, having loads of RAM like that can unbalance the system and not be a great deal of use if there is not enough bandwidth.
Nvidia have balanced their Kepler 670-680 very well in regards to both parameters, whereas AMD GCN has massive amounts of RAM and also more bandwidth, but at least in the comparison I read all those extra resources only gave a slight advantage at high resolutions. So was that a waste on AMD's part? Or is it because PC games are not made to take advantage of all those resources in the same way a console would? 8 GB of DDR3 looks to me to be a complete waste of time, because you would have to use a mammoth wide (384-512 bit?) bus to achieve the required bandwidth to make the most of that RAM. Of course you could have another eDRAM setup, but then that wouldn't change the bandwidth to main RAM, so again having 8 GB is not as advantageous as it looks at first glance. For me you are better off having 4 GB of lightning fast unified RAM, something like GDDR5. That would be much more useful imo. What about the latency and read speeds of an HDD vs a fast SSD? Would going for a smaller 2 GB of ultra fast GDDR5 mated to a very fast SSD be more beneficial than 8 GB of DDR3, 64 MB of eDRAM and an HDD? |
|
|
|
|
|
#4 |
|
a.k.a. Ingenu
Join Date: Feb 2002
Location: Apsley, U.K.
Posts: 2,738
|
There are still a number of games/engines that just load levels. Those would benefit from more memory as it would mean bigger levels, but I expect them to be a dying breed.
(Using memory as a giant I/O cache, subdividing the world at creation to have a perfectly smooth game experience, is quite understandable when you have to read from optical drives.)
On a second note, I'd like to emphasize the need for ECC as memory amount and bandwidth increase. It's not a problem we can just continue to ignore; all modern CPUs should already have ECC. (Google published results of 3-10×10^-9 errors/bit·h.)
On a third note, I'd urge gamers not to purchase those CPUs with an embedded (immediate mode) GPU, hoping that if enough people do that Intel/AMD will improve their offer for "just" CPUs. (Plus it's really a waste to have something you won't use.)
__________________
So many things to do, and yet so little time to spend... |
|
|
|
|
|
#5 |
|
Senior Member
Join Date: Jun 2003
Posts: 2,570
|
If in a STREAM-like benchmark AMD is only getting ~50% of memory bandwidth, then they either have a bottleneck at some point between the CPU and the memory controller or have a sub-optimal memory controller.
In the case in question, we are talking about a max memory bandwidth in the range of 75-100 GB/s and a realistic bandwidth on the order of 70-80% of peak, for a range of 56-75 GB/s. This gives a per frame bandwidth of roughly 1 GB or a bit higher at 60 FPS. For the ranges you gave, this works out to an optimal memory size of around 3-10 GB.

But we haven't even factored in the issue of HDD bandwidth and access times vs optical bandwidth and access times. BR has an order of magnitude higher access latency than HDD and at its peak roughly half the bandwidth of a modern HDD. This further pushes up the pre-buffering/streaming requirement for an actual game engine. Then if the design has a high speed temporary buffer of reasonable size (32 MB+), this also reduces the amount of non-static texture data that must be stored and read, further increasing the relative size of the texture bandwidth and therefore the streaming texture cache space required.

So while the original post was informative, I don't believe that it captures the real scope or details of this as it relates to what might be seen in next gen hardware. It is also an example of a single game design that may or may not have levels and load times between levels. Part of my comment on the 2 GB as a load cache specifically relates to games that do have levels and load times between levels. Using 2 GB to stream in the next level's data while on the current level would without a doubt improve the user experience.
__________________
Aaron Spink speaking for myself inc. |
|
|
|
|
|
#6 | |||
|
Senior Member
Join Date: Jun 2003
Posts: 2,570
|
Quote:
Quote:
Quote:
|
|||
|
|
|
|
|
#7 | |||
|
Member
Join Date: Nov 2007
Posts: 945
|
Quote:
On 5400 RPM hard drives (used in laptops and gaming consoles) the loading times would be double of that: 176 seconds for 8 GB and 266 seconds for 12 GB. Anything over 20 seconds, and the game experience gets degraded. I have stopped playing some games (for example Gran Turismo 5) because of too long loading times. I personally prefer my games (the games I develop) to have zero loading times if at all possible. We had 3 second loading times in Trials Evolution (because we streamed almost everything). Quote:
Quote:
Of course if you have a completely linear game and you have an extra 2 GB to burn, you can load your next level at the same time as you play the current one. But I don't see this as a good use of resources. Instead of this extra 2 GB I would choose higher clocked memory to improve my frame rate, increase my view distance and add more dynamic physics driven stuff to my levels (all of these require bandwidth). |
|||
|
|
|
|
|
#8 | |
|
Regular
Join Date: Aug 2006
Posts: 6,853
|
Quote:
So all the people crying for 4 GB in PS4 are nuts? You also build your argument around 60 FPS, which is nice and all, but the vast majority of games are 30. We even saw devs like Insomniac publicly announce a switch from 60 to 30 this gen (for Ratchet). So, using Aaronspink's analysis of 1 GB per frame at 60 FPS, a typical 30 FPS game could access 2 GB per frame. I guess this does put a hard limit on some things, but what if the content changes rapidly frame to frame (as it would seem to in any video game...)? Wouldn't the 8 GB system be at an advantage over the 2 GB one that now has to go to an HDD or something orders of magnitude slower to get new data? In essence the 8 GB system could "buffer" 3 additional frames, vs 0 for the 2 GB one. Or am I totally not getting it? |
|
|
|
|
|
|
#9 |
|
Beyond3d isn't defined yet
Join Date: Jan 2008
Location: New Zealand
Posts: 3,042
|
This explains why PC games let you save and reload from anywhere whereas console games use checkpoints, I get it now.
25 GB/s = ~2.5-8.3 GB working set.
50 GB/s = 5-16.6 GB
100 GB/s = 10-33.2 GB
Does that make sense? 25 GB / 30 * 3:1-10:1? Effectively that means that current consoles could have used a lot more memory even at the same bandwidth. Am I right?
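The rule of thumb in this post (bandwidth divided by 30 fps, times a 3:1 to 10:1 cache multiplier) can be written out as a short Python sketch. The function name and defaults are made up for illustration; the multipliers are the 3x and 10x figures from the first post.

```python
# Sketch of the capacity rule of thumb: per-frame bandwidth x cache multiplier.
# The function name and default arguments are hypothetical.

def capacity_range_gb(bandwidth_gb_s: float, fps: float = 30.0,
                      low_multiplier: float = 3.0,
                      high_multiplier: float = 10.0) -> tuple:
    """Return (low, high) useful memory capacity in GB for a given bandwidth."""
    per_frame_gb = bandwidth_gb_s / fps
    return per_frame_gb * low_multiplier, per_frame_gb * high_multiplier


if __name__ == "__main__":
    for bandwidth in (25.0, 50.0, 100.0):
        low, high = capacity_range_gb(bandwidth)
        print(f"{bandwidth:.0f} GB/s -> {low:.1f} to {high:.1f} GB")
```

Passing `fps=60.0` halves every figure, which is why the 60 fps analysis in the first post arrives at much smaller capacities for the same bus.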
__________________
It all makes sense now: Gay marriage legalized on the same day as marijuana makes perfect biblical sense. Leviticus 20:13 "A man who lays with another man should be stoned". Our interpretation has been wrong all these years! |
|
|
|
|
|
#10 | ||
|
Senior Member
Join Date: Jun 2003
Posts: 2,570
|
Quote:
Intermediate data is better served by a high speed buffer memory rather than the main pool. But given enough thread level parallelism, which in a graphics workload you will have in spades, you should be able to sustain on the order of 80-90% effective bandwidth with narrow channel DDR4 given an appropriately designed memory controller. The reality of the situation is you effectively end up paying a ~4x capacity cost for around 50% higher bandwidth between GDDR and DDR. This will likely get worse over time as the types of device that need/can afford GDDR get smaller further pushing up the cost of GDDR relative to DDR. The trend is pretty clear, in that conserving large pool bandwidth is going to be more of a factor going forward with some relief coming from relatively small high bandwidth intermediate buffer memories (eDRAM and stack/wide io dram). Quote:
Another interesting conundrum: say you have 2-3 GB per frame of bandwidth but only 2 GB of memory. In that case, you simply cannot use the bandwidth, as you will never be able to stream in assets at a high enough rate.
|
||
|
|
|
|
|
#11 | |
|
Senior Member
Join Date: May 2008
Posts: 1,136
|
Quote:
Can you please provide a link to this 100 GB/s DDR3 memory? Thanks. |
|
|
|
|
|
|
#12 | |
|
Senior Member
Join Date: May 2008
Posts: 1,136
|
Quote:
|
|
|
|
|
|
|
#13 |
|
Grumpy Mod
Join Date: Dec 2004
Location: In a pretty pink padded cell
Posts: 26,051
|
Most of the discussion about the choice of RAM for next gen machines wasn't addressing BW versus capacity, so it has been moved to its own discussion here.
__________________
Shifty Geezer ... Tolerance for internet moronism is exhausted. Anyone talking about people's attitudes in the Console fora, rather than games and technology, will feel my wrath. Read the FAQ to remind yourself how to behave and avoid unsightly incidents. |
|
|
|
|
|
#14 | ||
|
Grumpy Mod
Join Date: Dec 2004
Location: In a pretty pink padded cell
Posts: 26,051
|
Quote:
Quote:
Most Economical Capacity = BW (GB/s) / 25 or something similar?
|
||
|
|
|
|
|
#15 |
|
Member
Join Date: Aug 2005
Posts: 318
|
Seems like an SSD would solve some of the problems with having 2 GB, but given the cost it's unlikely.
So as DICE/Epic have stated that 2 GB would not be enough, wouldn't that be really telling for this debate where we have 2 GB vs 8 GB? For them to make a public comment, that means either Sony would not listen to direct feedback or what? I remember Epic doing the same last gen with the X360 but don't remember them doing it in public. I feel the debate would change almost 100% if we were comparing 4 GB of GDDR5 vs 8 GB of DDR3/4. |
|
|
|
|
|
#16 | |
|
Senior Member
Join Date: May 2008
Posts: 1,136
|
Quote:
|
|
|
|
|
|
|
#17 |
|
Beyond3d isn't defined yet
Join Date: Jan 2008
Location: New Zealand
Posts: 3,042
|
One thing perhaps to keep in mind is that the OS may use a lot of memory as the needs of the device expand between generations. It may have a very high residence to use ratio, i.e. it may use a lot of memory, but it likely won't use a lot of bandwidth.
|
|
|
|
|
|
#18 | |
|
Senior Member
Join Date: May 2008
Posts: 1,136
|
Quote:
You bring up a good point that the memory amount covers both CPU and GPU. That said, I currently have Win7 Pro, 8 browser windows, 2 Adobe PDFs, 1 VPN connection, 2 RDP sessions, and a full virus scan running (which hammers the piss out of my CPU) and I'm still only utilizing 1.3 GB of memory. At 2 GB that wouldn't leave much for a game running too, but at 4 GB I'd still have 2.7 GB available (but no CPU). |
|
|
|
|
|
|
#19 |
|
Moderator
Join Date: Feb 2002
Location: Redmond, WA
Posts: 3,177
|
While I agree in principle with the OP, I wanted to point some things out.
The available memory bandwidth per frame is only interesting as it applies to total memory if you can predict with some degree of certainty which 200 MB or so you're going to touch, and you can actually get it off disk before you need it. So while above some threshold more memory doesn't help you with higher res textures, that doesn't make it useless.

The big issue with a lot of memory, as has been pointed out, is filling it. I think it's interesting for parametric data that expands to many times its read size and can't be computed on use. This is what most would refer to as procedural content; at some level you can consider parametric content to be data compression with extreme compression ratios. Really the memory is still just a cache, but it's a cache for computation rather than disk reads. It's one area I'd be seriously looking at going forwards.

Voxelization of data is interesting, and slight changes in orientation can radically change the way structures like this are walked, requiring vastly more memory than the actual amount walked. You also don't want to have to keep writing the data, wasting your bandwidth, when you're repeatedly reading it. Again, a computation cache.

As an aside, the Sony paper is interesting but doesn't age well. You can still kill yourself with virtual function calls, but it's nothing like it was circa PS2, or worse PS1, with direct mapped single digit kilobyte caches. |
|
|
|
|
|
#20 | |
|
Grumpy Mod
Join Date: Dec 2004
Location: In a pretty pink padded cell
Posts: 26,051
|
Quote:
|
|
|
|
|
|
|
#21 |
|
Member
Join Date: Aug 2011
Posts: 370
|
No, it doesn't. Everyone keeps ignoring that in the Google report, they found bit errors to scale linearly with the physical volume of the RAM, and not the number of bits. This agrees with the theory that nearly all bit errors are caused by radiation, either from the materials the chips are made from, or from outside sources. As you scale the RAM to smaller processes, the bit error rates go down. 8 chips of RAM always have roughly the same number of errors, regardless of whether they are 2 GB of DDR4 or 512 MB of GDDR3.
|
|
|
|
|
|
#22 |
|
a.k.a. Ingenu
Join Date: Feb 2002
Location: Apsley, U.K.
Posts: 2,738
|
I meant it was related to the amount of data going through the chips, and having more DIMM/chips.
(We usually have more DIMM when we get more memory, but you are correct.) I'm not sure whether the google report talks about what I meant though... Also we may be going off-topic.
|
|
|
|
|
|
#23 | |||||
|
Member
Join Date: Nov 2007
Posts: 945
|
Quote:
Please don't tell me you think a 70-100 GB/s unified memory architecture is considered "slow" by today's standards. Not even Intel's highest end 12 thread Sandy Bridge E and the fully enabled 16 thread Xeon server CPU versions are equipped with a memory system that fast. Quad channel DDR3-1600 is the fastest officially supported, and it provides 51 GB/s of theoretical bandwidth (37 GB/s in benchmarks, not far from AMD's utilization percentages: http://www.anandtech.com/show/5091/i...gh-end-alive/4). These chips cost $1000+ and the motherboards supporting quad channel memory aren't cheap either.

Let's look at the highest end desktop APUs available with unified memory. Dual channel DDR3-1600 is the maximum officially supported memory for Intel's flagship desktop APU (Ivy Bridge). Dual channel DDR3-1866 is the maximum officially supported memory for AMD's flagship desktop APU (Trinity). Theoretical memory bandwidths are 25.6 GB/s and 29.9 GB/s respectively. These figures match perfectly with my calculations for the "slow" memory system (common DDR3 memory at the highest commonly available clocks).

Of course you can find memory kits designed for CPU overclockers. I actually bought this kind of premium memory sticks for my old Q6600 based desktop. The problem with these kinds of enthusiast kits is that they are produced in very low quantities (cherry picked parts), and thus the price is very high. For example the cheapest DDR3-2400 kit (2 x 4 GB) I found on newegg.com was the G.SKILL Ripjaws Z series at $96.99. In comparison you will find standard DDR3-1600 kits (2 x 4 GB) for $40.99. As DDR3-1600 is the highest officially supported on Intel platforms, it is commonly used in brand new high end gaming desktops, and thus is the most relevant high volume product that we can still somehow qualify as "slow and cheap". Quote:
However no matter how excellent EDRAM is, it cannot increase the maximum total accessible unique memory per frame. It can "only" (drastically) reduce the waste from double (or even higher) access counts to the same memory regions, and thus get us closer to the theoretical maximum (= 200 MB of unique memory per frame, assuming we still use the current highest end desktop APU unified memory systems as our "system of choice"). I have already stated in many threads how much I like the EDRAM in the Xbox 360, so I won't do that again. Quote:
Quote:
Quote:
However if the parametric generation consumes more bandwidth than the access of the generated data, then I am a huge supporter of caching it. For example in our virtual texturing system, the terrain texture is generated (blended with a complex formula) from a huge number of artist placed decals. In the worst case areas there are almost 10 layers of decals on top of each other, but we burn that data once to the virtual texture cache, and during terrain rendering a single texture lookup is enough (generated data gets repeatedly reused 60 times per second just like data loaded from HDD).

That's not the main point of the paper. Yes, it's nice that you can evade some branches and virtual calls, but the main point (and main performance gain) is the improved memory access pattern. The component model is a good approach, and many developers are using it in their newest engines.

Last edited by sebbbi; 02-Jul-2012 at 00:23. |
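For reference, the theoretical DDR3 bandwidth figures discussed earlier in this post follow from a simple formula: transfers per second x 8 bytes per 64-bit channel x number of channels. The Python sketch below is an illustration only; the function names are made up, and the nominal rate of DDR3-1866 is taken as 1866 MT/s (an approximation of 1866.67). The measured values are the 37 GB/s benchmark figure from this post and the 14 GB/s Trinity DDR3-1866 figure from the first post.

```python
# Sketch of peak DDR3 bandwidth and benchmark efficiency.
# Function names are hypothetical; a 64-bit (8 byte) channel is standard DDR3.

BYTES_PER_TRANSFER = 8  # one 64-bit channel moves 8 bytes per transfer


def peak_bandwidth_gb_s(mega_transfers_per_s: float, channels: int) -> float:
    """Theoretical peak bandwidth in GB/s (1 GB = 1000 MB)."""
    return mega_transfers_per_s * BYTES_PER_TRANSFER * channels / 1000.0


def efficiency(measured_gb_s: float, peak_gb_s: float) -> float:
    """Fraction of the theoretical peak reached in a benchmark."""
    return measured_gb_s / peak_gb_s


if __name__ == "__main__":
    dual_1600 = peak_bandwidth_gb_s(1600, 2)   # Ivy Bridge, dual channel
    quad_1600 = peak_bandwidth_gb_s(1600, 4)   # Sandy Bridge E, quad channel
    dual_1866 = peak_bandwidth_gb_s(1866, 2)   # Trinity, dual channel
    print(dual_1600, quad_1600, dual_1866)
    print("Sandy Bridge E efficiency:", efficiency(37.0, quad_1600))
    print("Trinity efficiency:", efficiency(14.0, dual_1866))
```

Under these assumptions the Trinity benchmark lands a little under half of its theoretical peak, which matches the "~50%" remark made earlier in the thread.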
|||||
|
|
|
|
|
#24 | |
|
Moderator
Join Date: Feb 2002
Location: Redmond, WA
Posts: 3,177
|
Quote:
Most of the win in the streaming case is not polluting the cache with data you will never read. As an aside, one of the things that irritates me about new college grads is the lack of understanding of basic memory architecture and behavior. None of this stuff is rocket science. |
|
|
|
|
|
|
#25 |
|
B3D Shockwave Rider
Join Date: Feb 2002
Posts: 1,810
|
Wasn't the Jon Olick demo of the ID Tech6 stuff over 1 gig with just a single model on screen?
__________________
When God plays an online shooter he plays Shadowrun. He buys resurrection first round and selects Dwarf. www.shadowrunshow.com |
|
|
|
|