Is UE4 indicative of the sacrifices devs will have to make on consoles next gen?

Am I wrong in understanding that their "general purpose reflection method" is the star of the show and what gives the Infiltrator demo that CGI look?
 
I think it's more because of the physically based materials and rendering (everything being mostly dark and metallic also helped here, I suppose). That and the wonderful and copious amounts of blur to trick your eyes even more.
 
I think it's more because of the physically based materials and rendering (everything being mostly dark and metallic also helped here, I suppose). That and the wonderful and copious amounts of blur to trick your eyes even more.

I thought the materials rendering was part of the new reflection algorithm.
 
I just meant that everything looking cgi-ish was due to physically correct shading, lighting, and reflectivity as a whole picture. So yep it's part of the reason. The parts where you could see the floor and its puddles reflecting what's around helped a lot there.
 
We say Draw Call overhead but we mean State Changes and Validation overhead. (and a few other things, like copies/shadow copies...)
We simply call it draw call overhead, as we cannot measure the state change overhead directly. The draw calls submit the changed state to the GPU. Without draw calls, the state changes are very cheap indeed (as pretty much nothing happens) :)
This is all easily testable yourselves, guys, so stop making crap up.
My DX10/11 test results were already posted here in the forums a few years ago. Granted, my old 5870 isn't exactly the same as my new 7970. AMD's multithreaded rendering has been fixed since. At that time it was slightly slower to use multithreaded rendering than a single thread (no matter how many cores you used). Multithreaded rendering was fixed in AMD drivers after Civ5 was launched (it was the first AAA title that used DX11 multithreaded rendering). So the situation is now better.
you simply don't need to change state as much since GPUs these days can dynamically index/branch a lot better than they used to.
Yes, you can go all the way down to a single (indirect) draw call per viewport (plus some indirect dispatch calls) if you keep the whole scene / mesh / texture data on GPU and do the whole scene setup / culling on GPU (negating the draw call cost completely). That's possible (and also saves lots of data traffic from CPU->GPU), but I don't think the majority of devs are going to implement systems like that any time soon (as long as they have to also support current gen DX9+ consoles and DX10 low end PCs with the same code base).
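To make the idea concrete for anyone following along, here is a rough D3D11-flavoured sketch of what GPU-driven submission can look like. This is only an illustration, not sebbbi's actual implementation: the struct, buffer names and compute shader are made up, and the argument buffer layout simply mirrors the DrawIndexedInstanced parameters.

```cpp
// Sketch only: GPU-driven submission in D3D11 terms. A compute shader culls the
// scene data that already lives in GPU memory and writes draw arguments into an
// argument buffer; the CPU then issues a single indirect draw for the viewport.
#include <d3d11.h>

// One entry in the argument buffer: the five values match the parameters
// of DrawIndexedInstanced.
struct DrawIndexedArgs
{
    UINT IndexCountPerInstance;
    UINT InstanceCount;
    UINT StartIndexLocation;
    INT  BaseVertexLocation;
    UINT StartInstanceLocation;
};

void SubmitScene(ID3D11DeviceContext* ctx,
                 ID3D11ComputeShader* cullingCS,
                 ID3D11UnorderedAccessView* argsUAV, // UAV over a buffer created with
                                                     // D3D11_RESOURCE_MISC_DRAWINDIRECT_ARGS
                 ID3D11Buffer* argsBuffer)
{
    // 1) Cull on the GPU: the compute shader writes visible draw arguments to argsUAV.
    ctx->CSSetShader(cullingCS, nullptr, 0);
    ctx->CSSetUnorderedAccessViews(0, 1, &argsUAV, nullptr);
    ctx->Dispatch(/*thread groups*/ 256, 1, 1);

    // Unbind the UAV before the buffer is consumed as indirect arguments.
    ID3D11UnorderedAccessView* nullUAV = nullptr;
    ctx->CSSetUnorderedAccessViews(0, 1, &nullUAV, nullptr);

    // 2) One indirect draw consumes the arguments the GPU wrote. There is no
    //    per-object CPU work, so the per-draw CPU overhead largely disappears.
    ctx->DrawIndexedInstancedIndirect(argsBuffer, 0);
}
```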
With a texture change in between, I can do 30k draws at 60fps. Changing a constant buffer reference, 60k. Updating and changing a constant buffer (map_discard, set), 35k.
That's pretty much the same as a single x360 thread can do (when the low level API is used instead of DirectX). I also stated this a few years ago in my DX11 performance analysis post.
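For reference, the kind of micro-benchmark being discussed looks roughly like this. It is a sketch only: device creation, shaders, geometry and the actual timing are assumed to be set up elsewhere, and the names are made up.

```cpp
// Sketch of a draw call throughput test: N draws per frame, each preceded by a
// constant buffer update via map/discard (the "map_discard, set" case above).
#include <cstring>
#include <d3d11.h>

void DrawCallBenchmark(ID3D11DeviceContext* ctx,
                       ID3D11Buffer* constantBuffer,   // D3D11_USAGE_DYNAMIC, CPU-writable
                       UINT indexCount,
                       UINT drawsPerFrame)
{
    for (UINT i = 0; i < drawsPerFrame; ++i)
    {
        float perDrawData[16] = {};   // e.g. a per-object transform

        // Update and rebind the constant buffer for this draw.
        D3D11_MAPPED_SUBRESOURCE mapped = {};
        if (SUCCEEDED(ctx->Map(constantBuffer, 0, D3D11_MAP_WRITE_DISCARD, 0, &mapped)))
        {
            std::memcpy(mapped.pData, perDrawData, sizeof(perDrawData));
            ctx->Unmap(constantBuffer, 0);
        }
        ctx->VSSetConstantBuffers(0, 1, &constantBuffer);

        ctx->DrawIndexed(indexCount, 0, 0);
    }
}
```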

Constant updates are much more efficient on consoles, since the GPU can directly fetch the constant data from a memory address. No transfers (or map/unmap synchronization) are needed at all. Also on consoles you don't need to apply a whole constant buffer, and thus waste a lot of GPU memory because of resource alignment (if you want to keep lots of small buffers in GPU memory). DX11.1 improves this situation with the new constant buffer APIs that allow partial binding and partial update (you can basically use a single big memory chunk for all your constants). Unfortunately DX11.1 needs a service pack for Windows 7 (and Vista support is not yet announced at all). Source: http://blogs.msdn.com/b/chuckw/archive/2012/11/14/directx-11-1-and-windows-7.aspx.
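A minimal sketch of the DX11.1 API in question, assuming you have obtained an ID3D11DeviceContext1 (e.g. via QueryInterface) and packed all per-object constants into one big buffer; the per-object size here is an arbitrary example:

```cpp
// Sketch of D3D11.1 partial constant buffer binding: one big buffer holds many
// objects' constants, and each draw binds a 256-byte-aligned window into it
// instead of using a separate small buffer per object.
#include <d3d11_1.h>

void BindConstantWindow(ID3D11DeviceContext1* ctx1,      // requires an 11.1 context
                        ID3D11Buffer* bigConstantBuffer,
                        UINT objectIndex)
{
    // Offsets and sizes are expressed in 16-byte shader constants and must be
    // multiples of 16 constants (i.e. aligned to 256 bytes).
    const UINT constantsPerObject = 16;                  // 256 bytes per object (example)
    UINT firstConstant = objectIndex * constantsPerObject;
    UINT numConstants  = constantsPerObject;

    ctx1->VSSetConstantBuffers1(0, 1, &bigConstantBuffer,
                                &firstConstant, &numConstants);
}
```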

In our engine we do not change textures at all, as we are using virtual texturing (both on PC and on x360), so I haven't needed to profile texture change performance on any platform. Similar virtualization techniques are possible for meshes and constants (and are performing very well). So I will expect the "draw call" overhead to be a solved issue in a few years (when games do not need to support DX9 and DX10 anymore).
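For readers who haven't seen virtual texturing: texture changes disappear because every material samples the same physical cache texture through an indirection (page table) lookup. Below is a deliberately simplified, invented CPU-side illustration of that translation; in a real implementation the lookup happens per pixel in the shader and also handles mips, page borders and fallback pages.

```cpp
// Simplified illustration of the indirection in virtual texturing: all sampling
// goes through one shared cache atlas, so no per-material texture binds are needed.
// All names and sizes are invented for this example.
#include <algorithm>

struct PageEntry { float physU, physV, scale; };   // where this page sits in the cache atlas

struct VirtualTexture
{
    static const int kPagesPerSide = 256;          // virtual space is 256x256 pages
    PageEntry pageTable[kPagesPerSide][kPagesPerSide];

    // Translate a virtual UV (0..1 over the whole virtual texture) into a UV
    // inside the physical cache atlas.
    void Translate(float u, float v, float& outU, float& outV) const
    {
        int px = std::min(static_cast<int>(u * kPagesPerSide), kPagesPerSide - 1);
        int py = std::min(static_cast<int>(v * kPagesPerSide), kPagesPerSide - 1);
        const PageEntry& page = pageTable[py][px];

        float fracU = u * kPagesPerSide - px;      // position inside the page
        float fracV = v * kPagesPerSide - py;
        outU = page.physU + fracU * page.scale;    // scale = page size in atlas UV units
        outV = page.physV + fracV * page.scale;
    }
};
```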

[UPDATE]: Just wanted to add that 30k+ draw calls aren't exactly efficient for the GPU either. It's more efficient to launch fewer big tasks (good latency hiding because of lots of parallelism + all pipelines always have work to do; GPUs cannot execute lots of small tasks simultaneously, we need lots of active threads to fully utilize them, etc). The amount of draw calls is not just a CPU problem, it affects the GPU as well.
So yeah, I'm not buying the argument that you can't do pretty much whatever you want already on PC. Like I said, some of the best looking stuff (Frostbite, etc) uses a few thousand draw calls, and you could use 10x more and still be fine. Sure the overhead will be lower on consoles, but prove to me that it matters.
Agreed, it doesn't matter for current generation games that are designed for eight year old CPUs. Those CPUs are less powerful than a single Sandy Bridge core. Current gaming PCs have at least four cores and games don't utilize them all. So the render thread could take a whole core, and the GPU driver could take another without any problems. The driver overhead was a much bigger problem when high end gaming PCs still used Core 2 Duos (and other dual core CPUs). A similar problem will be visible when next generation game ports for PC start to appear. Next gen game logic/physics will likely tax a quad core CPU pretty heavily already, so there will no longer be two free CPU cores that can just be dedicated to draw call / driver overhead.
 
We simply call it draw call overhead, as we cannot measure the state change overhead directly.
Sure, I'm just pointing out that the overhead depends on what state has changed between draw calls (and to some extent the total number of resources referenced by a command buffer), so I don't like lumping it all into one basket :)

At that time it was slightly slower to use multithreaded rendering than a single thread (no matter how many cores you used). Multithreaded rendering was fixed in AMD drivers after Civ5 was launched (it was the first AAA title that used DX11 multithreaded rendering). So the situation is now better.
I still don't heavily recommend multithreaded rendering for a number of reasons. My results were pure single-threaded.

That's possible (and also saves lots of data traffic from CPU->GPU), but I don't think the majority of devs are going to implement systems like that any time soon (as long as they have to also support current gen DX9+ consoles and DX10 low end PCs with the same code base).
True, but you don't have to go that far to get the majority of the benefit. Some basic culling, shadowing and instancing support usually gets you down below 5000 draw calls in any reasonable scene.
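As a sketch of the basic instancing part of that (illustrative structures only, not any particular engine's code): objects that share a mesh and material collapse into one DrawIndexedInstanced call, with per-object transforms supplied from a second vertex buffer.

```cpp
// Sketch: one draw per mesh+material group instead of one per object.
#include <d3d11.h>
#include <vector>

struct InstanceData { float world[16]; };   // per-instance transform, bound on slot 1

struct Batch
{
    ID3D11Buffer* meshVB;
    ID3D11Buffer* meshIB;
    ID3D11Buffer* instanceVB;               // filled with InstanceData for this group
    UINT          vertexStride;
    UINT          indexCount;
    UINT          instanceCount;
};

void DrawBatches(ID3D11DeviceContext* ctx, const std::vector<Batch>& batches)
{
    for (const Batch& b : batches)
    {
        ID3D11Buffer* vbs[2]     = { b.meshVB, b.instanceVB };
        const UINT    strides[2] = { b.vertexStride, sizeof(InstanceData) };
        const UINT    offsets[2] = { 0, 0 };

        ctx->IASetVertexBuffers(0, 2, vbs, strides, offsets);
        ctx->IASetIndexBuffer(b.meshIB, DXGI_FORMAT_R32_UINT, 0);
        ctx->DrawIndexedInstanced(b.indexCount, b.instanceCount, 0, 0, 0);
    }
}
```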

Constant updates are much more efficient on consoles, since the GPU can directly fetch the constant data from a memory address. No transfers (or map/unmap synchronization) are needed at all.
To be fair, on PCs with shared memory/integrated graphics you can avoid the copies as well (map/discard semantics are sufficient). There's still a fairly high overhead, but honestly constants are definitely one place where you can start just fetching them from memory (buffers) if you become CPU-limited. Frankly I'm not really sure that we need something as specific as constant buffers in the future (caches should be sufficient) anyways, but obviously devs will have to deal with them for a couple more years.
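One possible shape for the "fetch constants from a buffer" approach on the C++ side, assuming a structured buffer bound as a shader resource that the vertex shader indexes (names, layout and slot numbers are made up; error handling is omitted):

```cpp
// Sketch: per-object data lives in one structured buffer bound as an SRV; the
// vertex shader indexes into it (e.g. by instance ID) instead of relying on a
// constant buffer rebound per draw.
#include <d3d11.h>

struct ObjectConstants { float world[16]; float color[4]; };

ID3D11ShaderResourceView* CreateObjectConstantsSRV(ID3D11Device* device,
                                                   UINT maxObjects,
                                                   ID3D11Buffer** outBuffer)
{
    D3D11_BUFFER_DESC desc = {};
    desc.ByteWidth           = maxObjects * sizeof(ObjectConstants);
    desc.Usage               = D3D11_USAGE_DYNAMIC;
    desc.BindFlags           = D3D11_BIND_SHADER_RESOURCE;
    desc.CPUAccessFlags      = D3D11_CPU_ACCESS_WRITE;
    desc.MiscFlags           = D3D11_RESOURCE_MISC_BUFFER_STRUCTURED;
    desc.StructureByteStride = sizeof(ObjectConstants);
    device->CreateBuffer(&desc, nullptr, outBuffer);

    D3D11_SHADER_RESOURCE_VIEW_DESC srvDesc = {};
    srvDesc.Format              = DXGI_FORMAT_UNKNOWN;          // structured buffer
    srvDesc.ViewDimension       = D3D11_SRV_DIMENSION_BUFFER;
    srvDesc.Buffer.FirstElement = 0;
    srvDesc.Buffer.NumElements  = maxObjects;

    ID3D11ShaderResourceView* srv = nullptr;
    device->CreateShaderResourceView(*outBuffer, &srvDesc, &srv);
    return srv;
}

// At draw time the whole array is bound once; on the HLSL side a
// StructuredBuffer<ObjectConstants> would be indexed, e.g. gObjects[instanceId].
void BindObjectConstants(ID3D11DeviceContext* ctx, ID3D11ShaderResourceView* srv)
{
    ctx->VSSetShaderResources(0, 1, &srv);
}
```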

In our engine we do not change textures at all, as we are using virtual texturing (both on PC and on x360), so I haven't needed to profile texture change performance on any platform. Similar virtualization techniques are possible for meshes and constants (and are performing very well). So I will expect the "draw call" overhead to be a solved issue in a few years (when games do not need to support DX9 and DX10 anymore).
Agreed, and this is a good approach. The whole spoon feeding of every minor change into a serial command buffer has to stop, even on the GPU side. We'll of course keep working to reduce the overhead as much as possible, but it's not a good long term strategy anyways.

A similar problem will be visible when next generation game ports for PC start to appear. Next gen game logic/physics will likely tax a quad core CPU pretty heavily already, so there will no longer be two free CPU cores that can just be dedicated to draw call / driver overhead.
Possibly, but I hope developers will mostly add complexity to the shaders/GPU work itself rather than to the command interface between the two (which as mentioned is a bad long term plan). Several of the big PC developers have already started to realize this direction and I imagine even on consoles it will still be a win.

Regarding increased physics/logic workload, I'm looking forward to the CPU cores not sitting idle most of the time in games :) That said, you'll still have significantly more FLOPS available on PC quad cores than next gen console CPUs, and higher IPCs as well, so I'm not overly concerned with them being out-muscled even with a thread eaten by the graphics driver. Dual cores will have a bit more trouble, specifically if games don't bother to target AVX[2] (since IIRC 256-bit ops are double-pumped on Jaguar, so not a huge win).

I also feel the need to mention, as always, that part of the overhead on PC is because it has to actually do more stuff (i.e. share the GPU). As consoles get more complicated and more generally useful I expect them to add some level of overhead/tax for that as well.
 
Also, the one area PS4 really excels in is RAM. I rather think of it almost as an 8GB video card (of course not exactly, but I'm guessing the majority of that RAM will be used for graphics). In that case, it will be two generations before PC cards really even catch up. They are at 2GB now as standard, and I'm not even sure 4GB will become the standard for the next gen of cards. I'm not sure how a (fairly standard) 8GB DDR3 + 2GB video card PC stacks up to the PS4 in terms of quickly GPU-accessible data, but I'm guessing not really well.

The current high end standard is 2-3GB with 4-6GB options being available at the ultra high end.

The reason for this limitation is 2Gb memory chips being the maximum density currently available. PS4 is able to achieve 8GB of memory because 4Gb memory chips will be available when it launches; those chips will also be available to PC GPUs. Thus we can expect at minimum a doubling of memory capacity for PC GPUs around the time of PS4's launch, i.e. 4-6GB standard with 8-12GB high end options. Of course the requirement to maintain parity with the console may make those high end options more common than they are today (since there's no technical limitation stopping their implementation).
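To put rough numbers on that: GDDR5 chips have 32-bit interfaces, so a 384-bit card carries twelve of them (twenty-four in clamshell mode). With today's 2Gb (256MB) chips that gives 3GB, or 6GB in clamshell as on Titan; with 4Gb (512MB) chips the same board layouts give 6GB and 12GB. PS4's 8GB is likewise sixteen 4Gb chips in clamshell on a 256-bit bus.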

Thus when PS4 launches, it's entirely possible the top end PC GPUs may be sporting between 8 and 12GB of local memory.
 
A PC doesn't have to have data either in video memory or on a slow laptop HDD. You can cache data in main ram and transfer it to video memory as required.

For a game like Rage you wouldn't get any benefit to having more than about 1GB of video ram (if even that) and it can easily achieve 1:1 pixel to unique texel ratio at 60 fps. Even PCI-E 1 is plenty fast enough to transfer all the texture data you'll need for upcoming frames, just so long as you don't try and swap in 500 MB of textures between one frame and the next.
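For a rough sense of scale: PCI-E 1.x x16 is about 4 GB/s in each direction, which at 60 fps is roughly 66 MB per frame, while a virtual texturing scheme like Rage's typically only needs to stream on the order of a few MB of new texture pages per frame. The bus has plenty of headroom as long as the working set changes gradually.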

It's unfortunate that PCs are expected to load vast swathes of data that won't be needed for aeons into small amounts of video ram, while the vast majority of main ram goes unused.
 
I'd be disappointed if a next gen game had the same graphics Rage does. As soon as you throw in some dynamic lighting, less static environments, better shaders, better post, more transparencies and whatnot, the needed memory will rapidly increase.
 
Thus when PS4 launches, it's entirely possible the top end PC GPUs may be sporting between 8 and 12GB of local memory.

Currently the top is the GeForce Titan at $999 with 6GB, 2GB less than the PS4.

Then I see on Newegg there are some 670s and 680s with 4GB, though they are at a significant price premium. Then of course the 7970/7950 at 3GB.

I think it will be a while before the standard enthusiast GPU will be sporting 8GB. Again that "standard" right now is 2GB (7850, 7870, 660TI, 660, 650 TI Boost, most 680, 670, etc).

I'm not convinced the standard mid-class enthusiast GPU will move to 4GB next gen, and there's no clear direction on when the "next gen" of GPUs comes either (it may not even be 2013, according to rumors). But even if it does, that's still half of PS4.

Though ironically, next gen consoles will likely speed up GPU VRAM increases anyway, because of ports.
 
There are 7970s already at 6GB as well (under $600), but I doubt that even 3GB cards will wind up at any disadvantage to next gen consoles. If the need isn't there they might not make the move, but OEMs will certainly offer the option as the higher density chips will exist.
 
The most common video card VRAM amount is 1024 MB, and in the top 10 of the Steam GPU stats there isn't any GPU significantly better than what's in Orbis (1st is even an Intel HD 3000...). That's PC gaming right now, not some limited edition $500 GPUs with double the amount of memory.
Sony's choice of GDDR5 will certainly fuel some evolution for the standard. 8Gb chips by 2015-2016 seem likely now.
 
The most common video card VRAM amount is 1024 MB, and in the top 10 of the Steam GPU stats there isn't any GPU significantly better than what's in Orbis (1st is even an Intel HD 3000...). That's PC gaming right now, not some limited edition $500 GPUs with double the amount of memory.
Sony's choice of GDDR5 will certainly fuel some evolution for the standard. 8Gb chips by 2015-2016 seem likely now.

That is very interesting to read. For everyone else interested in steam hardware survey:

http://store.steampowered.com/hwsurvey

Only 10% have 2GB VRAM. Higher amounts are even rarer, with 3GB at 0.6%.

I wonder how much RAM PS4 will use/reserve for non-gaming tasks. All this social gaming stuff will eat lots of additional RAM (e.g. uploading the last 15 min of gameplay video, friends list, voice chat, ...). It will be interesting to see if we get a number for the actual RAM available to the game.
 
The most common video card VRAM amount is 1024 MB, and in the top 10 of the Steam GPU stats there isn't any GPU significantly better than what's in Orbis (1st is even an Intel HD 3000...). That's PC gaming right now, not some limited edition $500 GPUs with double the amount of memory.
Sony's choice of GDDR5 will certainly fuel some evolution for the standard. 8Gb chips by 2015-2016 seem likely now.

The thing is that, based on the Steam survey, around 5 million PC gamers are next-gen ready today, and many are still waiting to upgrade until next-gen games release or new GPU series arrive.

The next GPU series in the mid range ($200-250) will have 3GB+ RAM.

PS: Also, PS4 won't have 8GB of VRAM, but 8GB of total RAM. I think 5GB of VRAM will be the max they'll use, but mostly around 3.5-4GB.
 
As soon as you throw in some dynamic lighting, less static environments, better shaders, better post, more transparencies and whatnot, the needed memory will rapidly increase.
Fully dynamic lighting actually uses less memory than static. Static lighting for big levels needs baked lightmap data for every surface and/or large grids of precomputed light probes. Fully dynamic lighting on the other hand only calculates the lighting for the currently visible viewport. The goal here is to only process things that affect the surfaces seen in the single image (since lighting is recalculated every frame). A good fully dynamic lighting solution thus doesn't want to generate/use huge data structures (and couldn't even afford spending clock cycles and memory bandwidth generating them over and over again every frame).
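A quick back-of-envelope (with made-up but plausible numbers) shows why: baking a unique lightmap texel every 10 cm over a 1 km x 1 km level is 10,000 x 10,000 texels, i.e. roughly 400 MB at just 4 bytes per texel, before mips or any directional data. A fully dynamic approach instead pays in screen-sized buffers, and a handful of 1080p render targets is only tens of MB.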

Better post effects and more transparencies are mainly consuming clock cycles and memory bandwidth. You don't need more memory for those. You need better performance.

I agree that less static environments would consume more memory. Especially if the environment changes are permanent, and especially if we have more fine grained control over the environment (can break/modify terrain/objects in various ways). Current gen games tend to remove all marks of destruction/bodies/shrapnel very fast. But the biggest thing that more memory brings is ease of development. With small memory you need to stream in/out everything rapidly (just in time). It's hard to create algorithms that predict things properly in all scenarios (since HDD data seeking is slow, and popping isn't desirable). Of course the bigger memory allows more variety in the game world (less copies of the same model/texture are required, more unique models/textures are possible).
 
I agree that less static environments would consume more memory. Especially if the environment changes are permanent, and especially if we have more fine grained control over the environment (can break/modify terrain/objects in various ways). Current gen games tend to remove all marks of destruction/bodies/shrapnel very fast. But the biggest thing that more memory brings is ease of development. With small memory you need to stream in/out everything rapidly (just in time). It's hard to create algorithms that predict things properly in all scenarios (since HDD data seeking is slow, and popping isn't desirable). Of course the bigger memory allows more variety in the game world (less copies of the same model/texture are required, more unique models/textures are possible).

On the PC many of these things could be stored in main memory though.

For a multiplayer map you could have several GB of unique textures that you baked bullet decals to, and you could store level geometry in main memory too, modifying both based on game events and caching to GPU memory for drawing. If done out of the viewport you wouldn't even need to do it in real time and so PCI-E latency wouldn't even be much of a problem. PCI-E bus goes both ways. :eek:
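A sketch of the upload half of that idea in D3D11 terms (illustrative names only; the CPU-side copy, decal baking, dirty-rect tracking and error handling are assumed to exist elsewhere):

```cpp
// Sketch: the authoritative texture lives in main memory; decals are baked into
// it on the CPU, and only the dirty rectangle is pushed across PCI-E to the GPU
// copy when convenient.
#include <cstdint>
#include <d3d11.h>

struct CpuTexture
{
    uint8_t* pixels;      // RGBA8 copy living in main memory
    UINT     rowPitch;    // bytes per row
};

void UploadDirtyRect(ID3D11DeviceContext* ctx,
                     ID3D11Texture2D* gpuTexture,
                     const CpuTexture& cpu,
                     UINT left, UINT top, UINT right, UINT bottom)
{
    // Describe the modified region of mip 0 (left, top, front, right, bottom, back).
    D3D11_BOX box = { left, top, 0, right, bottom, 1 };

    // Point at the same region inside the CPU-side copy.
    const uint8_t* src = cpu.pixels + top * cpu.rowPitch + left * 4;

    // One copy across the bus, sized to the dirty region only. Done outside the
    // viewport and spread over frames, the latency isn't critical.
    ctx->UpdateSubresource(gpuTexture, 0, &box, src, cpu.rowPitch, 0);
}
```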

"8 GB GDDR5" seems to have taken over the internet, with hordes of hardware junkies only able to compare GDDR5 with GDDR5, as if memory with less than 176 GB/s of bandwidth can't actually store anything.
 
Having worked on both the embedded side and the "open system" side of a real-time system project, I can see two distinct approaches.

On the embedded side, we *micro-managed* everything. We wrote our own drivers, database-like file system, real-time scheduler, etc. We prioritized everything and made the absolute and final call in dropping any low priority job. There was nothing that stood between us and getting the job done "on time". We didn't really need to code in assembly; C would do. The entire vertical stack, from app logic down to the file system and OS calls, was scheduled by an all-powerful custom scheduler we wrote.

On the open system side, we used abstraction, APIs and hardware that were an order of magnitude or two more powerful than the embedded side. We ran impressive benchmarks. As one of the posters pointed out above, no micro-management was needed. The hardware could handle lots of computation (the highest FLOP count in the industry at that time).

When the rubber hit the road, the embedded side handled the heavy lifting, because it could deliver the important jobs at the exact time we needed things done, even under extreme, unpredicted load conditions in the field. The open system side was slightly more unpredictable due to its complexity and "free running" nature. But the open system side could be upgraded quickly/yearly.

I think it depends on what you want. Both approaches are there for a reason, and they may not be solving the same problem. Performance has many aspects. UE4 is written in an open system environment, so perhaps it will be more forgiving of a thicker run-time layer (because it's designed for an "open" environment). But that doesn't mean the embedded approach is wrong, inferior, bad or unnecessary.
 
Currently the top is the GeForce Titan at $999 with 6GB, 2GB less than the PS4.

Yes, 2GB less than the PS4, because it uses the largest memory chips currently available. PS4 and GPUs from around the time it launches will use chips with twice the memory density. Titan (and the other 6 and 4GB GPUs) already sport the maximum possible memory configurations with today's technology. Technology when the PS4 launches will allow twice as much memory, so it's not unreasonable to expect a Titan successor in early 2014 to sport 12GB.

Then I see on Newegg there are some 670s and 680s with 4GB, though they are at a significant price premium. Then of course the 7970/7950 at 3GB.

I think it will be a while before the standard enthusiast GPU will be sporting 8GB. Again that "standard" right now is 2GB (7850, 7870, 660TI, 660, 650 TI Boost, most 680, 670, etc).

As I said, the standard high end being 2GB now means it will almost certainly be 4GB when chips with twice the density are available at the end of this year. So you have your regular high end chips with 4GB and ultra high end options with 8GB (and 12GB). That's no different to now, using the highest density chips available.

I'm not convinced the standard mid-class enthusiast GPU will move to 4GB next gen, and there's no clear direction on when the "next gen" of GPUs comes either (it may not even be 2013, according to rumors). But even if it does, that's still half of PS4.

Most rumours point to the end of 2013 at the moment, but even if that doesn't happen it will still be early 2014, within 6 months of the consoles' launch. That's close enough to consider them contemporary with those consoles. You said it will be two generations before PC cards even catch up; that's clearly not true if generation zero will already have ultra high end options with more memory.

Though ironically, next gen consoles will likely speed up GPU VRAM increases anyway, because of ports.

That's exactly what I was saying. Today we have 4 and 6GB GPU options when they are, frankly, completely unnecessary. Next year PC GPUs will be scrambling for all the memory they can get to achieve (marketing) parity with consoles, and since the technology will be available to allow that, what makes you think there won't be options that push the limits of technology when it's actually needed, when there are similar options now when it's not?

If your argument is the price premium of Titan and 4GB 670s/680s, then consider the price of the 3GB 7950 compared with those 4GB parts (or 2GB 680s). The cost of a GPU often bears little resemblance to the amount of memory on board.
 
"8 GB GDDR5" seems to have taken over the internet, with hordes of hardware junkies only able to compare GDDR5 with GDDR5, as if memory with less than 176 GB/s of bandwidth can't actually store anything.
Yep. 8 GBs of GDDR5 isn't as impressive as laypeople think. Sony wanted 8 GBs, but they were going to release with 4 GBs and that'd have been plenty. Pushing the memory amount up required another 4 GBs from somewhere, but that doesn't mean the performance of that extra 4 GBs for its intended purpose necessitated high BW GDDR (and let's be honest, the BW of PS4 effectively halved per GB when the RAM increased, so the amount of time you can read and write to each GB of RAM dropped considerably). But adding DDR3 would have added complications (a new mobo design and developer issues), so Sony took an expensive but convenient approach. It's not comparable to 8 GBs of GDDR5 on someone's GPU though.

I seriously doubt PS4's unified 8 GBs will give a remarkable improvement over a fast GPU with plenty of system RAM to preload resources from. Potentially a game could instantly switch from one set of assets to another in that 8 GBs of RAM where a PC would have to copy them over from DDR3 across the slow PCI-E bus, but that's not going to be a likely scenario, especially when playing cross-platform titles designed around Durango and PC limits. And a faster GPU (MUCH faster) will handle computed effects better anywhere.

8 gigs is nice (7 GBs accessible, maybe; we've been told the OS is in fenced-off memory, so that's RAM the game can't touch). It'll give the platform legs. It won't make a crazy difference versus PCs with 3+ GBs of VRAM and 16+ GBs of system RAM, especially when that VRAM is higher BW and the GPU is faster. In truth the nature of the APU and compute might deliver better results than the additional 4 GBs of RAM, as developers can really design for that hardware synergy now.
 
Why did the devs ask for 8GB of RAM? I remember reading the headline of some developers championing/requesting 8GB.

As I understand, Sony didn't dream up the specs this time round. They consulted all the key devs.

Why include something that's expensive and marginal (assuming it won't make much of a difference compared to less RAM, like 4 or 6GB of GDDR5)?
 