Old 02-Jan-2013, 18:42   #26
Andrew Lauritzen
AndyTX
 

Quote:
Originally Posted by OpenGL guy View Post
More general hardware implies more recompilation, not less. It's the removal of fixed-function bits that triggers some of these things.
What I mean by "more general" here is that, for instance, texture units should be able to handle all of the formats and modes exposed by the APIs completely dynamically (ideally per-lane once you go bindless). It's allowed to slow down on more complex requests, of course, but it shouldn't require software to statically inspect the bound state and patch little tweaks into the shader whenever that state changes.

Having been part of several API and hardware iterations, I'm well aware of the trade-offs here, but ultimately there really are just two solutions... either APIs go lower level and expose implementation details, or the hardware actually obeys the higher-level commands. The current status quo is not great.

Quote:
Originally Posted by OpenGL guy View Post
This is one issue with benchmarking. Getting a hitch on the first instance of a new effect might be annoying, but if you're using those same effects for the next 30 minutes, who will remember the hitch that happened 29 minutes ago?
But "new effects" come up all the time in games, so it's not really okay to hitch when you see something you don't recognize. When I walk around a corner and oh my gosh there's a new shader, it's not okay to take 100ms to work it out. We only accept it because that's how it has always been, not because it really needs to be that way.

And like I said, I'm well versed in the problems as both a gamer and a developer, and I'm not trying to trivialize them; rather, I'm trying to get us focused on metrics and hardware/software improvements that actually better model the gamer experience, and to stop mindlessly cramming more ALUs onto GPUs so that I can render with 6 30" monitors instead of 5 (slightly kidding here, but you get my point). And I'm preaching to myself and my own employer as much as anyone else.
__________________
The content of this message is my personal opinion only.

Old 02-Jan-2013, 21:07   #27
CNCAddict
Member
 

It's great to see this taken so seriously. Maybe it will help get PC games closer to the smoothness of a console experience... but anyhow, Scott has a new write-up that mentions this thread.

http://techreport.com/blog/24133/as-...esting-methods
Old 02-Jan-2013, 22:52   #28
I.S.T.
Senior Member
 

And http://techreport.com/news/24136/dri...ies-of-updates

BTW, I'm glad I wound up causing a bit of a stir here by posting that link back in the Radeon thread. It's led to some incredibly interesting discussion.
Old 02-Jan-2013, 23:17   #29
Davros
Darlek ******
 

Can't the game pre-compile shaders during level load (a bit like UT pre-caches textures in the D3D renderer, though it doesn't pre-cache in OGL)?
__________________
Guardian of the Bodacious Three Terabytes of Gaming Goodness™
Old 02-Jan-2013, 23:23   #30
OpenGL guy
Senior Member
 

Quote:
Originally Posted by Davros View Post
Can't the game pre-compile shaders during level load (a bit like UT pre-caches textures; an option in the OpenGL renderer)?
Yes, they can. But some games only do this for benchmark runs. Here's how some games (and benchmarks) "warm" the system for benchmarking:
- Load critical assets
- Draw frame
- Prior to calling Present(), clear screen to black
- Draw "Loading..." or some other progress meter
- Call Present()
- etc.
- Once all assets are "warm", run normal benchmark

Obviously, you don't want to double the benchmark time by running through all the frames, so shortcuts are taken ("time" can be sped up to reduce the number of frames generated, for example).
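A minimal C++ sketch of that warm-up pass; DrawSceneChunk, ClearBackBufferToBlack, and the other helpers are hypothetical stand-ins for real engine/D3D calls, not any actual game's API:

Code:
#include <cstddef>

// Hypothetical helpers -- stubs standing in for real engine/D3D calls.
void DrawSceneChunk(std::size_t)  {}  // draws asset group i, forcing lazy compiles
void ClearBackBufferToBlack()     {}  // full-screen clear before Present()
void DrawLoadingScreen(float)     {}  // "Loading..." progress meter
void PresentFrame()               {}  // e.g. IDXGISwapChain::Present(0, 0)

const std::size_t kNumChunks = 64;    // assumed number of asset/shader groups

void WarmUp()
{
    for (std::size_t i = 0; i < kNumChunks; ++i) {
        DrawSceneChunk(i);         // render so states/shaders get compiled now
        ClearBackBufferToBlack();  // the user never sees the warm-up frame
        DrawLoadingScreen(float(i + 1) / float(kNumChunks));
        PresentFrame();            // flush the frame through the pipeline
    }
    // Everything is "warm" now; start the timed benchmark run.
}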
__________________
I speak only for myself.
Old 02-Jan-2013, 23:41   #31
Andrew Lauritzen
AndyTX
 

Quote:
Originally Posted by OpenGL guy View Post
Yes, they can. But some games only do this for benchmark runs.
Right, but the issue is that which state gets "compiled in" with a shader and which is dynamic is opaque. It's unreasonable for the application to assume that *all* state is compiled in and cycle through all of it (hey, maybe an implementation has to recompile a shader when a bound constant buffer's size changes, who knows!), so there's not really a reasonable solution on the PC.

Pretty much all games do create shaders at level-load time, but many drivers compile them lazily because they pull in additional pipeline state at draw-call time. The idea behind DX10's state structures was to eliminate some of that by grouping related state together and declaring it immutable up front, but of course it doesn't map perfectly onto any one implementation, so it ends up being fairly useless to that end as well.
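For reference, this is what those grouped, immutable state objects look like (shown with the D3D11 descendants of the DX10 structures; the wrapping function is just a sketch, assuming an existing device):

Code:
#include <d3d11.h>

// DX10/11-style immutable state object: all rasterizer state is
// grouped into one struct and baked into an object at creation time.
ID3D11RasterizerState* MakeRasterState(ID3D11Device* device)
{
    D3D11_RASTERIZER_DESC rd = {};   // the whole state bundle, declared once
    rd.FillMode        = D3D11_FILL_SOLID;
    rd.CullMode        = D3D11_CULL_BACK;
    rd.DepthClipEnable = TRUE;

    ID3D11RasterizerState* state = nullptr;
    device->CreateRasterizerState(&rd, &state);  // immutable from here on
    // The driver sees the complete bundle up front -- yet nothing forces
    // it to bake this into the shader here rather than at draw time.
    return state;
}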

One interesting question, though, is why graphics drivers don't at least cache compiled shaders across runs (i.e. to disk or similar). Is it purely concern over reverse engineering (which would seem odd these days)?
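A sketch of the kind of cross-run cache being asked about, keyed by a hash of the shader bytecode plus whatever state the driver bakes in. The FNV-1a hash, the file layout, and CompileToIsa are illustrative assumptions, not any vendor's actual scheme; it also assumes the shadercache/ directory exists:

Code:
#include <cstdint>
#include <fstream>
#include <iterator>
#include <sstream>
#include <vector>

// Stand-in for the driver's (slow) backend compiler.
std::vector<uint8_t> CompileToIsa(const std::vector<uint8_t>&,
                                  const std::vector<uint8_t>&) { return {}; }

// FNV-1a over a blob; chained so bytecode and baked state share one key.
uint64_t HashBlob(const std::vector<uint8_t>& blob,
                  uint64_t h = 1469598103934665603ull)
{
    for (uint8_t b : blob) { h ^= b; h *= 1099511628211ull; }
    return h;
}

std::vector<uint8_t> GetIsa(const std::vector<uint8_t>& bytecode,
                            const std::vector<uint8_t>& bakedState)
{
    uint64_t key = HashBlob(bakedState, HashBlob(bytecode));
    std::ostringstream path;
    path << "shadercache/" << std::hex << key << ".bin";

    std::ifstream in(path.str(), std::ios::binary);
    if (in) {   // cache hit: skip the expensive recompile entirely
        return std::vector<uint8_t>(std::istreambuf_iterator<char>(in),
                                    std::istreambuf_iterator<char>());
    }
    std::vector<uint8_t> isa = CompileToIsa(bytecode, bakedState); // slow path
    std::ofstream out(path.str(), std::ios::binary);
    out.write(reinterpret_cast<const char*>(isa.data()),
              static_cast<std::streamsize>(isa.size()));
    return isa;
}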
__________________
The content of this message is my personal opinion only.

Old 03-Jan-2013, 02:51   #32
Dave Baumann
Gamerscore Wh...
 

Quote:
Originally Posted by Davros View Post
cant the game pre-compile shaders during level load (a bit like ut pre-caches textures (in d3d renderer, doesnt pre-cache in ogl) ) ?
And note, this has been done for a long time - remember people complaining about BF3 load times? This was why...
__________________
Radeon is Gaming
Tweet Tweet!
Old 03-Jan-2013, 03:03   #33
OpenGL guy
Senior Member
 

Quote:
Originally Posted by Andrew Lauritzen View Post
One interesting question though is why do graphics drivers not at least cache compiled shaders across runs (i.e. to disk or similar). Purely concerns over reverse engineering (that would seem odd these days)?
Nvidia does cache compute kernels, but not Direct3D shaders. I presume the issue is the thousands of shaders a single game creates.
__________________
I speak only for myself.
Old 03-Jan-2013, 03:08   #34
Homeles
Member
 

Quote:
Originally Posted by Bludd View Post
I think this is a very interesting metric and it coincides with Anand taking an interest in performance consistency in SSDs.
Good catch.
Old 03-Jan-2013, 04:23   #35
Andrew Lauritzen
AndyTX
 

Quote:
Originally Posted by OpenGL guy View Post
I presume the issue is the thousands of shaders a single game creates.
But I mean, they already ship the bytecode for those shaders. Certainly it can get out of hand with permutations, as in the original Far Cry (IIRC), where the patches were massive because small changes forced the full cross product of shaders to be recompiled/redistributed, but I can't imagine the overhead being all that high for the number of shaders typical games use. Also, with the recent popularity of deferred shading and "ubershaders", the issue isn't as bad as it was a few years ago. Of course the "cache" can have a maximum size and an eviction policy as well.

Anyways you may be right about size concerns, but for some games I could see it being a benefit.

And if the driver is adding too large a cross product of its own on top of the shaders the application is requesting, that's a problem too.
__________________
The content of this message is my personal opinion only.

Old 03-Jan-2013, 04:34   #36
swaaye
Entirely Suboptimal
 

Quote:
Originally Posted by Dave Baumann View Post
And note, this has been done for a long time - remember people complaining about BF3 load times? This was why...
Going way back - Far Cry has a big shader cache directory AFAIR.
Old 03-Jan-2013, 05:27   #37
3dilettante
Regular
 

Quote:
Originally Posted by Andrew Lauritzen View Post
Yes totally agreed in this case and didn't mean to imply that. But trust me, such things have been done and are routinely done on separate threads (which is far less harmful thankfully, although graphics drivers eating a whole thread for themselves is another topic...).
Just for clarity's sake, do you mean it is objectionable that a driver has its own thread, or that its thread takes a disproportionate share of active CPU cycles?
It seems reasonable to give a program tasked with arbitrating between two systems of arbitrary composition, running at user-interactive rates over a long(ish)-latency interface, at least some freedom from the blockage it could suffer if it shared room in the same loop with other functionality.

Quote:
Originally Posted by Andrew Lauritzen View Post
One interesting question though is why do graphics drivers not at least cache compiled shaders across runs (i.e. to disk or similar). Purely concerns over reverse engineering (that would seem odd these days)?
Copyright?
Hasn't this come up with earlier attempts at binary translators for different CPU ISAs?
Copyright lawsuits have been brought over in-memory copies of copyrighted data, let alone copies of software stored on disk in translated form.
If there's a possible advantage of a console versus an open PC here, it's that the locked-down nature of the console, and the licensing to the console company, may imply consent for such an action, whereas one unknown party storing a transformed copy of a work for the use of another, similarly unknown party does nothing to assuage content creators.
__________________
Dreaming of a .065 micron etch-a-sketch.
Old 03-Jan-2013, 07:48   #38
Ethatron
Member
 

Quote:
Originally Posted by Andrew Lauritzen View Post
Anyways you may be right about size concerns, but for some games I could see it being a benefit.
I wouldn't think the size (in bytes) is critical here so much as the number of permutations. You have to implement a lookup structure that guarantees you can fetch cached code faster than you could compile it, and hash-keying the multidimensional space of shader code + externals isn't exactly free.
Graphics IHVs are not Oracle; they don't have DB-performance groups, and they shouldn't need them.

As far as I can see, no one has suggested a possible issue that can't be coped with on the developer side. You did say, though, "maybe we should start caring about the worst case instead of throughput" (freely paraphrased), and I think the same; but that has to sink into the programming patterns of engine programmers. How do you do that? Get more embedded/realtime-systems programmers into the games industry?
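To make the keying cost concrete, here is roughly what such a key has to cover. The struct layout and the use of a flat hash map (rather than the search tree mentioned above) are illustrative choices only:

Code:
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Illustrative key: everything that can force a recompile has to be
// folded in -- the shader code itself plus the "external" state bits.
struct ShaderKey {
    uint64_t bytecodeHash;   // hash of the shader bytecode
    uint64_t stateHash;      // formats/blend/raster bits baked into the ISA
    bool operator==(const ShaderKey& o) const {
        return bytecodeHash == o.bytecodeHash && stateHash == o.stateHash;
    }
};

struct ShaderKeyHasher {
    std::size_t operator()(const ShaderKey& k) const {
        return std::size_t(k.bytecodeHash ^ (k.stateHash * 0x9E3779B97F4A7C15ull));
    }
};

// Expected O(1) probes: the whole exercise is only worthwhile if a
// lookup here is orders of magnitude cheaper than a recompile.
using ShaderCache =
    std::unordered_map<ShaderKey, std::vector<uint8_t>, ShaderKeyHasher>;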
Old 03-Jan-2013, 08:27   #39
Bludd
Eric the Half-a-bee
 

Quote:
Originally Posted by Dave Baumann View Post
And note, this has been done for a long time - remember people complaining about BF3 load times? This was why...
BF2, no?
Old 03-Jan-2013, 08:30   #40
Billy Idol
Senior Member
 

Thanks Andrew, great post, great topic. It happened to me in Far Cry 3 that FRAPS showed good fps, yet it all felt jittery (in the village), and I scratched my head and could not understand what was going on. You gave me good insight and a possible explanation. So in short, I fully agree with you that (averaged) fps should not be the only measurement/benchmark. Digital Foundry's analysis, for example, includes controller input latency as well for certain games.

Would a framerate smoothing-by-interpolation approach, as presented by the Force Unleashed developers, help decrease the impact of such performance spikes (or maybe make it even worse) and reduce the jittery feeling?
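For readers who missed that talk, the general idea is to feed the simulation a filtered timestep so a single slow frame doesn't lurch the animation. This sketch is not the Force Unleashed implementation, just a common filtered-timestep pattern with an assumed window size:

Code:
#include <algorithm>
#include <cstddef>
#include <deque>
#include <numeric>

class TimestepSmoother {
    std::deque<double> history_;           // last few raw frame deltas (seconds)
    static const std::size_t kWindow = 8;  // assumed window size
public:
    double Smooth(double rawDt)
    {
        history_.push_back(rawDt);
        if (history_.size() > kWindow) history_.pop_front();
        double avg = std::accumulate(history_.begin(), history_.end(), 0.0)
                   / double(history_.size());
        // Clamp the delta near the recent average so a 100 ms spike
        // can't yank the simulation, at the cost of a little drift.
        return std::min(std::max(rawDt, 0.5 * avg), 1.5 * avg);
    }
};

Whether that helps or hurts is exactly the trade-off: the display still hitches, but the animation no longer amplifies the spike.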
__________________
I bid farewell with a rebel yell...
Old 03-Jan-2013, 10:30   #41
lanek
Senior Member
 

Quote:
Originally Posted by Bludd View Post
BF2 no?
Same problem in the early days of BF2 (before a patch or driver release reduced it)... and to some extent BC2 and BF3 have shown similar things (which were quickly fixed). It's because their in-house engine has used this method from the start.

Quote:
Originally Posted by Billy Idol View Post
Thanks Andrew, great post, great topic. It happened to me that in Far Cry 3 FRAPS showed good fps, but it all felt jittery (in the village) and I scratched my head and could not understand whats going on. You gave me good insight and a possible explanation. So in short, I fully agree with you that (averaged) fps should not be the only measurement/benchmark. Digital Foundry analysis e.g. includes the controller input latency as well for certain games.

Would a framerate smoothing by interpolation approach as presented by the force unleashed developers help out to decrease the impact of such performance spikes (or maybe make it even worse) and reduce the jittery feeling?
It's another problem in FC3, related mainly to the engine/driver, and notably it affects every brand. In this case it is really "stutter" and not microstutter (even if high frame times don't necessarily show up as visible microstutter).

Old 03-Jan-2013, 11:02   #42
Gipsel
Senior Member
 

Quote:
Originally Posted by Andrew Lauritzen View Post
Then, assuming that pipeline is running in a relatively steady state, the game can time how long it takes from the point that it fills the queue to the point where another slot opens up to get the rate of output of the pipeline. This is what pretty much every game does, and this is the same timing that FRAPS is measuring.
I always thought this is NOT what FRAPS is actually doing. AFAIR it just measures the time difference between subsequent Present() calls. So it does not measure frame latency, just inverted throughput at single-frame granularity.
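Which, at its core, amounts to this (a portable std::chrono sketch of the Present-to-Present timing; FRAPS itself hooks the real Present call rather than being called by the app):

Code:
#include <chrono>
#include <cstdio>

// Timestamp every Present() and log the delta to the previous one:
// per-frame inverted throughput, not the latency of any given frame.
void OnPresent()
{
    using clock = std::chrono::steady_clock;
    static clock::time_point last;
    static bool first = true;

    clock::time_point now = clock::now();
    if (!first) {
        double ms = std::chrono::duration<double, std::milli>(now - last).count();
        std::printf("frametime: %.2f ms\n", ms);  // spikes stand out immediately
    }
    first = false;
    last = now;
}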
Old 03-Jan-2013, 11:57   #43
Novum
Member
 

Quote:
Originally Posted by swaaye View Post
Going way back - Far Cry has a big shader cache directory AFAIR.
Still the same with every CryEngine game.
Old 03-Jan-2013, 14:36   #44
sonen
Junior Member
 

Props to Mr. Dave Baumann for being open about the issue; that's rare these days.

Hope he doesn't catch any heat from above because of it.
Old 03-Jan-2013, 15:00   #45
Andrew Lauritzen
AndyTX
 

Quote:
Originally Posted by 3dilettante View Post
Just for clarity's sake, do you mean it is objectionable that a driver have its own thread, or that its thread has a disproportionate share of active CPU cycles?
The latter - I meant that it "eats an entire thread" and runs active code on it a large amount of the time. Just try pinning a game to thread 0 on an NVIDIA machine sometime and see what happens to your performance in D3D, even for a single-threaded game.

Quote:
Originally Posted by 3dilettante View Post
Copyright?
Interesting point; perhaps that is indeed part of the reason.

Quote:
Originally Posted by Ethatron View Post
You have to implement a search-tree solution which guarantees that you access cached code faster than you can compile it. Hash-keying the multidimension space of shader-code+externals isn't exactly for free.
True, but I feel like most of this is already done - at runtime - in drivers anyway, so it already has to be "fairly" optimized. Certainly there'd be cases where spinning up a slow disk isn't worthwhile, but going forward most people will have SSDs... I dunno, food for thought.

Quote:
Originally Posted by Ethatron View Post
As far as I can see there has been no possible issue guessed which can not be coped with at the developer side.
I agree a lot of stuff can. Take a look at TechReport's results for Battlefield 3, for instance... nice smooth frames on almost all implementations. That's no accident... repi knows his stuff. But still, like I said, some of it is guesswork, depending on your level of relationship with the major IHVs. Certainly people are usually willing to tell you - under NDA - which state you should avoid changing at runtime or should precache, but it's not exactly constant, even for a single vendor, so it's a bit brittle.

Quote:
Originally Posted by Gipsel View Post
I always thought this is NOT was Fraps is actually doing. AFAIR it just measures the time difference between subsequent Present() calls. So it does not measure frame latency, but just inverted throughput on a single frame granularity.
Indeed, FRAPS does measure the delta time between Present calls. My point was that this number is basically the same as measuring the delta time elsewhere in the game loop. It might be a frame or so "off", but you'll see the same patterns and spikes. (That is, assuming the game isn't doing some sort of smoothing/filtering on the raw time deltas, but as I mentioned, I know of no games that currently do that.)

Quote:
Originally Posted by sonen View Post
Props to Mr. Dave Baumann for being open about the issue, as it's rare these days.

Hope he doesn't catch any heat from above because of it.
Agreed. I felt a bit sorry for him after reading some of the TR comments, but thanks for responding and clarifying in any case.

And again, sorry for any implication that my post here had anything to do with antagonizing AMD or anyone else... it was meant as an industry-wide call to action, and Scott's recent data was simply a convenient example.
__________________
The content of this message is my personal opinion only.
Old 03-Jan-2013, 15:08   #46
pcchen
Moderator
 

Quote:
Originally Posted by Andrew Lauritzen View Post
One interesting question though is why do graphics drivers not at least cache compiled shaders across runs (i.e. to disk or similar). Purely concerns over reverse engineering (that would seem odd these days)?
Caching to disk is probably going to be even slower than a recompile (if you use an HDD). Caching in RAM may have memory-usage issues. Reverse engineering is probably not a big issue, and you can always encrypt your cached data.
Old 03-Jan-2013, 15:15   #47
Dave Baumann
Gamerscore Wh...
 

Quote:
Originally Posted by Andrew Lauritzen View Post
Agreed. I felt a bit sorry for him after reading some of the TR comments, but thanks for responding and clarifying in any case.
Ehh, don't mind me. I've been around the internets long enough.

Quote:
And again, sorry for any implication that my post here had anything to do with antagonizing AMD or anyone else... it was meant to be an industry-wide call to action and Scott's recent data was simply a convenient example.
While I like the scientific application of the analysis, some work does need to be done to understand the thresholds. While I cannot deny there are spikes (the driver team has spent a lot of time analyzing where they come from and smoothing them out!), I can't say that they have been noticeable in the games I've played; likewise, I don't see end-user feedback for this type of issue (except maybe for a few known problem titles).
__________________
Radeon is Gaming
Tweet Tweet!
Old 03-Jan-2013, 19:09   #48
almighty
Naughty Boy!
 

I dread to think what latency I'm getting with quad 7970s, but I don't really notice anything.....
Old 03-Jan-2013, 19:22   #49
Gambler FEX online
Junior Member
 

Isn't this called a paradigm change? Instead of 60 fps (or 120 fps for gamers with a 120 Hz monitor), I'd love to see the individual frame times held very close to 16.67 ms, or 8.33 ms for 120 Hz. Does Nvidia's adaptive v-sync help, or complicate this issue?
Old 03-Jan-2013, 19:23   #50
Alexko
Senior Member
 

Quote:
Originally Posted by Dave Baumann View Post
Ehh, don't mind me. I've been around the internets long enough.


While I like the scientific application of the analysis, some work does need to be done to understand the thresholds. While I cannot deny there are spikes (the driver team has spent a lot of time analyzing where they come from and smoothing them out!), I can't say that they have been noticeable in the games I've played; likewise, I don't see end-user feedback for this type of issue (except maybe for a few known problem titles).
Do you expect the optimisation of whatever causes these spikes to translate into a measurable increase in frame rate as well?
__________________
"Well, you mentioned Disneyland, I thought of this porn site, and then bam! A blue Hulk." —The Creature
My (currently dormant) blog: Teχlog