AMD GPU14 Tech Day Event - Sept 25th

Won't it be easier to implement new features, or even evolve a GCN1-based console, when you have the advantage of low-level API access? And couldn't Sony and MS upgrade the APU/GPU in three years and release a PS4 v2 and an XB One v2 with a better GPU? They have already done it in the past... (not to mention that with a process advance they could get some real gains in TDP and performance).
What happens when a game coded against the low-level details of the new revision runs on the old hardware?
Exposing low-level details means there's no intermediary to intercept your mistakes. CPU architectures are rife with low-level details that stick around because you can't wantonly crash software on your platform just because you suckered people into using feature X in a certain way.

Maintaining a higher level of abstraction means the software doesn't have details on how the hardware achieves an end result, so if it's a new vector ALU, a re-ordered data path, or turtles, it's none of the software's business.

Consoles already have low-level access, and they barely change the core architecture because they need consistency. The 360 shrink that put the GPU and CPU on the same die actually spent silicon on a fake bus unit that pretended to be a slow external bus so that the chip behaved identically to the old ones.
 
It falls back to DX11 (or whatever), I guess.

The hypothetical is a new revision of a console, which wouldn't have that fallback.
I don't think it would be acceptable even if there were one. Unless Mantle brings nothing to the table (and in that case, why bother?), a fallback would leave older consoles with an inferior experience.
 
I don't know how much this applies to real world performance in games, but it is interesting nonetheless.
These numbers are fairly meaningless without CPU usage numbers. Submitting from a single hardware thread, at some point you'll become CPU limited, which is what these numbers indicate. If you calculate the time per draw call, you'll see that somewhere between 300 and 2100 draw calls dude became CPU-bound.

[Image: ZQXgNHq.png — per-draw-call time graph]


There's a mostly fixed CPU cost for each draw call, which is indicated by the almost flat line at the bottom. The reason it's not completely flat is that as you push more and more draw calls per frame, you decrease the number of "housekeeping" operations pushed down the pipe (clear, present, etc.), which also take time. The time saved on, say, fewer presents gives you some extra time to perform draw calls, which shows up on the graph as a decreased per-draw-call time.

The bottom line is: yes, draws take time. That time is consumed by pushing data from the UMD to the KMD and by memory operations (allocations, mapping, building paging buffers and whatnot), and it can't be avoided if you want a multitasking operating system that's responsive and works with more than one GPU type.
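
To make the arithmetic behind "calculate time per draw call" concrete, here is a minimal C++ sketch (not from the thread): it divides measured CPU frame time by the number of draw calls submitted. The sample values are placeholders, not measured data; only the draw-call counts echo numbers mentioned in the discussion.

```cpp
// Sketch: apparent CPU cost per draw call from frame-time measurements.
// The frame times below are placeholders -- substitute your own measurements.
#include <cstdio>
#include <vector>

struct Sample {
    int    draw_calls;     // draw calls submitted per frame
    double frame_time_ms;  // measured CPU frame time in milliseconds
};

int main() {
    // Placeholder samples, not real benchmark data.
    std::vector<Sample> samples = {
        {300, 2.0}, {2100, 6.0}, {23000, 40.0}, {100000, 170.0}
    };

    for (const Sample& s : samples) {
        // Apparent cost of one draw call if the whole frame were submission work.
        double us_per_call = (s.frame_time_ms * 1000.0) / s.draw_calls;
        std::printf("%7d calls: %7.2f ms/frame -> %6.2f us per draw call\n",
                    s.draw_calls, s.frame_time_ms, us_per_call);
    }
    // Interpretation, per the post above: once the per-call value stops falling
    // and levels off, frame time is dominated by per-draw-call CPU overhead,
    // i.e. the test has become CPU-bound.
    return 0;
}
```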
 
It says "1 DisplayPort with MST".
Does that mean they supply an MST hub (they're $100),
or that MST is built in and you just need some sort of dumb splitter?
It also says to use "any of the outputs"; that was not the case with the 6950, where you had to use the DisplayPort
with an "active" adapter if you didn't have a DP monitor, which caused a lot of confusion for some people.

PS: are the two DVI ports DVI-I, or 1x DVI-I + 1x DVI-D as on the 6950 (bloody penny pinchers)?
 
It says "1 DisplayPort with MST".
Does that mean they supply an MST hub (they're $100),
or that MST is built in and you just need some sort of dumb splitter?
It also says to use "any of the outputs"; that was not the case with the 6950, where you had to use the DisplayPort
with an "active" adapter if you didn't have a DP monitor, which caused a lot of confusion for some people.

PS: are the two DVI ports DVI-I, or 1x DVI-I + 1x DVI-D as on the 6950 (bloody penny pinchers)?

The side says 2x dual-link DVI, so that would surely mean DVI-D.
 
It says "1 DisplayPort with MST".
Does that mean they supply an MST hub (they're $100),
or that MST is built in and you just need some sort of dumb splitter?

Looking at the number of adapters, including active ones, that you get in the 7900 boxes, I think it's not a problem for them to include a multi-stream hub or DP splitters; it should cost even less.
 
These numbers are fairly meaningless without CPU usage numbers. Submitting from a single hardware thread, at some point you'll become CPU limited, which is what these numbers indicate. If you calculate the time per draw call, you'll see that somewhere between 300 and 2100 draw calls dude became CPU-bound.

[Image: ZQXgNHq.png — per-draw-call time graph]


There's a mostly fixed CPU cost for each draw call, which is indicated by the almost flat line at the bottom. The reason it's not completely flat is that as you push more and more draw calls per frame, you decrease the number of "housekeeping" operations pushed down the pipe (clear, present, etc.), which also take time. The time saved on, say, fewer presents gives you some extra time to perform draw calls, which shows up on the graph as a decreased per-draw-call time.

The bottom line is: yes, draws take time. That time is consumed by pushing data from the UMD to the KMD and by memory operations (allocations, mapping, building paging buffers and whatnot), and it can't be avoided if you want a multitasking operating system that's responsive and works with more than one GPU type.
I was not CPU bound.

This is my CPU utilization with less than 100 draw calls
http://i2.minus.com/iOPjPdyK3T9a5.jpg

With 23k draw calls (that spike before the end of the graph)
http://i4.minus.com/iJq4S1WZS18z6.jpg

With 100k draw calls
http://i1.minus.com/ibaWNyioP6DyO5.jpg
 
You could change the core count to 1 in msconfig and retest with the process set to high priority to get better results.

You mean setting the Editor to only use one core? The Editor is maxing out one core from the get-go. With two cores enabled there is no performance increase.
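
For reference, the same isolation the msconfig suggestion aims for can be done programmatically on Windows; a minimal Win32 sketch (an alternative to msconfig, not something from the thread):

```cpp
// Sketch: pin the current process to one core and raise its priority,
// as an alternative to changing the core count in msconfig. Windows-only;
// call this early, before the draw-call test runs.
#include <windows.h>
#include <cstdio>

int main() {
    HANDLE self = GetCurrentProcess();

    // Restrict the process to logical CPU 0 only (affinity mask bit 0).
    if (!SetProcessAffinityMask(self, 1)) {
        std::printf("SetProcessAffinityMask failed: %lu\n", GetLastError());
    }

    // Raise scheduling priority so background tasks interfere less.
    if (!SetPriorityClass(self, HIGH_PRIORITY_CLASS)) {
        std::printf("SetPriorityClass failed: %lu\n", GetLastError());
    }

    // ... run the test here ...
    return 0;
}
```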
 
If you weren't CPU bound on draw calls/setup (which I agree has yet to be proven), what is the point of the test? What are you even measuring that has any relevance to Mantle?

Also this obviously isn't a great way to measure API overhead, since there's a lot more going on in an engine that relates to objects/instancing than just draw calls/3D API interaction. Typically you want to set up a microbenchmark that changes a specific set of state between each draw call (and the cost of the draw call will vary depending on which state that is!) and ensure that everything is offscreen/culled on the GPU.
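
To illustrate the shape of such a microbenchmark, here is a CPU-side harness skeleton in C++ (a sketch, not a real renderer): set_state() and draw() are hypothetical stubs standing in for the actual API calls (e.g. binding a texture or shader, then issuing a draw of geometry that is kept offscreen or culled on the GPU), and the loop times only the submission cost.

```cpp
// Skeleton of a draw-call microbenchmark: change one specific piece of state
// between consecutive draw calls and measure the CPU cost of submission.
#include <chrono>
#include <cstdio>
#include <ratio>

// Hypothetical stand-ins for graphics API calls.
static void set_state(int variant) { (void)variant; /* e.g. bind texture #variant */ }
static void draw()                 { /* e.g. issue an offscreen/culled draw */ }

int main() {
    constexpr int kDrawCalls = 100000;  // draw calls per timed run
    constexpr int kVariants  = 2;       // alternate between two states

    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < kDrawCalls; ++i) {
        set_state(i % kVariants);  // the state change whose cost we want to isolate
        draw();
    }
    auto stop = std::chrono::steady_clock::now();

    double total_us =
        std::chrono::duration<double, std::micro>(stop - start).count();
    std::printf("%d draws: %.1f us total, %.3f us per draw + state change\n",
                kDrawCalls, total_us, total_us / kDrawCalls);
    // Repeat with different kinds of state changes (shader, texture, constant
    // buffer, render target) to see how the per-call CPU cost varies.
    return 0;
}
```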
 
If you weren't CPU bound on draw calls/setup (which I agree has yet to be proven), what is the point of the test? What are you even measuring that has any relevance to Mantle?

Also this obviously isn't a great way to measure API overhead, since there's a lot more going on in an engine that relates to objects/instancing than just draw calls/3D API interaction. Typically you want to set up a microbenchmark that changes a specific set of state between each draw call (and the cost of the draw call will vary depending on which state that is!) and ensure that everything is offscreen/culled on the GPU.

That was only a test of how CryEngine 3 reacts to increasing draw calls when nothing else is affected, other than creating some sprites.
 