Old 14-Jan-2003, 12:07   #1
BoardBonobo
Senior Moment
 
Join Date: May 2002
Location: SurfMonkey's Cluster...
Posts: 1,813
Getting the most out of current GPU design.

I remember that somebody made a post saying that there is a lot more low-hanging fruit on the 3D optimisation tree to be picked before a big change has to occur in GPU/VPU design. This seems especially pertinent now that complexity and speed issues are effectively starting to cap PCB design.

What are these further optimisations, and how would they be implemented to best effect?
__________________
"We're a virus with shoes" - Bill Hicks
"The fact that a believer is happier than a sceptic is no more to the point than the fact that a drunken man is happier than a sober one. " — George Bernard Shaw
"The Tree of Life is Self-Pruning" - The Darwin Awards
Old 14-Jan-2003, 13:04   #2
Fuz
Member
 
Join Date: Apr 2002
Location: Sydney, Australia
Posts: 373

Good question. I believe it was SA who posted the info before.
I too would like to know which "low hanging fruit" have not been picked yet.
Old 14-Jan-2003, 13:22   #3
mboeller
Member
 
Join Date: Feb 2002
Location: Germany
Posts: 852

IMO;

he talked about the 2-pass deferred rendering now used in the DeltaChrome. So maybe he works for S3.
Old 14-Jan-2003, 15:04   #4
nAo
Nutella Nutellae
 
Join Date: Feb 2002
Location: San Francisco
Posts: 4,308

Quote:
Originally Posted by mboeller
he talked about the 2-pass deferred rendering now used in the DeltaChrome. So maybe he works for S3.
3DLabs..I believe.
Old 14-Jan-2003, 17:11   #5
Arun
Unknown.
 
Join Date: Aug 2002
Location: UK
Posts: 4,883

AFAIK:
2 Pass rendering can increase FPS, so it can be nice.
But it also increases latency. So it's futile and annoying for Q3 or UT2K3, since latency is the most important factor there.
As for RPGs, where mostly smoothness is important, this could be really useful.

To know what "low hanging fruit" is left, it's a good idea to look at what the main problems in the GPU pipeline are.

1. Memory bandwidth. Where does most of the bandwidth go today with 4x AA enabled?
First, Z reads and writes. Z compression and hierarchical Z are already used, so little more can be done here.
Second, static geometry. Vertices that aren't transferred over AGP every frame are still read from memory every frame, and very little is done about that. One solution is to store the vertices compressed and decompress them in the VS. This is already possible, but few programmers use it. It may become more common in the future if memory bandwidth becomes even more of an issue.
Third, color writes. These probably take little memory bandwidth thanks to color compression.
2. Bottlenecks. Current architectures are either transform-bound or fillrate-bound. As I said in another thread, one idea might be to use shared calculators, so that these bottlenecks don't really exist anymore. I'm not saying everything should be shared, but a good part.
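
For the vertex compression idea in point 1, here is a rough sketch of one way it could work. It is only an illustration: the 16-bit quantization scheme, the function names and the sample mesh are all assumptions, not something described in this thread. Positions are stored as 16-bit integers with a per-mesh scale and offset, and the decode step stands in for what the vertex shader would do.

```python
import struct

def quantize_positions(positions, bits=16):
    """Quantize float (x, y, z) positions to unsigned ints plus a per-mesh scale and offset."""
    max_q = (1 << bits) - 1
    lo = [min(p[i] for p in positions) for i in range(3)]
    hi = [max(p[i] for p in positions) for i in range(3)]
    # A flat axis would give a zero scale; fall back to 1.0 so the divide is safe.
    scale = [((hi[i] - lo[i]) / max_q) or 1.0 for i in range(3)]
    packed = [tuple(round((p[i] - lo[i]) / scale[i]) for i in range(3))
              for p in positions]
    return packed, scale, lo

def decode_position(q, scale, offset):
    """The per-vertex work a vertex shader would do: one multiply-add per component."""
    return tuple(q[i] * scale[i] + offset[i] for i in range(3))

# Hypothetical three-vertex mesh, purely for illustration.
mesh = [(0.0, 0.0, 0.0), (1.0, 2.0, 4.0), (0.5, 1.0, 3.0)]
packed, scale, offset = quantize_positions(mesh)
decoded = [decode_position(q, scale, offset) for q in packed]

bytes_float = struct.calcsize("<3f")   # three 32-bit floats per position
bytes_packed = struct.calcsize("<3H")  # three 16-bit integers per position
print(bytes_float, bytes_packed)
```

Under these assumptions the position data read from memory is halved, at the cost of a quantization error of at most half a step per component and a few extra VS instructions.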


Uttar
Old 14-Jan-2003, 18:04   #6
Basic
Senior Member
 
Join Date: Feb 2002
Location: Linköping, Sweden
Posts: 846

2 pass rendering for IMRs (z-pass first) can increase framerate. And if it does so, it will also decrease latency.
There wouldn't be any interleaving of the passes. So if you get higher framerate, that means that the sum of rendering times for the two passes is decreased, which in turn means lower latency.
Old 14-Jan-2003, 18:41   #7
Chalnoth
 
Join Date: May 2002
Location: New York, NY
Posts: 12,679

Quote:
Originally Posted by Basic
2 pass rendering for IMRs (z-pass first) can increase framerate. And if it does so, it will also decrease latency.
There wouldn't be any interleaving of the passes. So if you get higher framerate, that means that the sum of rendering times for the two passes is decreased, which in turn means lower latency.
Correct, but if done in the driver, then it will require a significant amount of caching by the driver, increasing overall CPU and system memory bandwidth requirements. Depending on the game, this may or may not be a good thing. This will also double the power needed in the geometry stage of the pipeline, which could be problematic for performance in certain situations.

Personally, I'd just like to see games do this themselves. Doing it in the driver is challenging at the very least, and a potential performance problem at the worst (exchanging one bottleneck for another).
Old 15-Jan-2003, 06:20   #8
Arun
Unknown.
 
Join Date: Aug 2002
Location: UK
Posts: 4,883

Quote:
Originally Posted by Basic
2 pass rendering for IMRs (z-pass first) can increase framerate. And if it does so, it will also decrease latency.
There wouldn't be any interleaving of the passes. So if you get higher framerate, that means that the sum of rendering times for the two passes is decreased, which in turn means lower latency.
Actually, I was assuming IMR 2 pass meant that you first do Z and cache everything. Then, you simultaneously prepare the next frame's Z and render the current scene from the cache.
That would seem to exploit more parallelism, IMO...
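
To make the difference between the two readings concrete, here is a toy timing model. All the numbers are made up, and the pipelined case assumes the Z pass and the shading pass don't compete for the same hardware and advance in lockstep, which is an idealization. In the serial reading (no interleaving, as in the quote above), a higher framerate always means lower latency. In the pipelined reading, framerate can go up while latency gets worse.

```python
def serial_two_pass(t_z, t_shade):
    """Z pass, then shading pass, back to back for the same frame."""
    frame_time = t_z + t_shade
    return frame_time, frame_time  # (frame time, latency)

def pipelined_two_pass(t_z, t_shade):
    """Z pass of frame N+1 overlaps the shading pass of frame N (idealized lockstep)."""
    frame_time = max(t_z, t_shade)  # throughput is set by the slower stage
    latency = 2 * frame_time        # each frame spends one slot in each stage
    return frame_time, latency

# Made-up timings in milliseconds, for illustration only.
T_SINGLE = 20.0            # hypothetical single-pass frame time with overdraw
T_Z, T_SHADE = 4.0, 12.0   # hypothetical Z-only pass and shading pass

serial = serial_two_pass(T_Z, T_SHADE)
pipelined = pipelined_two_pass(T_Z, T_SHADE)
print("single pass :", (T_SINGLE, T_SINGLE))
print("serial      :", serial)
print("pipelined   :", pipelined)
```

With these numbers the serial version beats the single pass on both framerate and latency, while the pipelined version has the best framerate but a latency above the single-pass figure.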

And yes, it could decrease latency. But then fillrate would have to be the bottleneck. If you're memory-bound, it's completely useless. And since you have to read static vertices twice and so on, it might indeed make you memory-bound slightly more easily. But there's also only one color write. None of that is too important, however. But if the game is geometry-limited, for example (not that such games exist...), you'd better not hope for lower latency!


Uttar
Old 15-Jan-2003, 07:31   #9
DemoCoder
Regular
 
Join Date: Feb 2002
Location: California
Posts: 4,732

Deferred shading can be done to increase speed on DX9 GPUs. First pass: render Z, and write pixel shader parameters to an FP frame buffer/MRT. Second pass, effectively a 2D video post-processing pass: render one full-screen quad and set up your huge 128+ instruction pixel shader. Voila: no overdraw, no wasted expensive pixel shading, no wasted recomputation of T&L on the second pass.

If you had true dynamic branching, you could even pack multiple shaders into one pixel shader and branch based on an object ID value written in the frame buffer.
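
A small software sketch of the two-pass scheme above, just to count shader invocations. This is not GPU code: the fragment list, the `shade` callback and the buffer layout are assumptions made for illustration. Pass 1 keeps the shader inputs of the nearest fragment per pixel, and pass 2 runs the expensive shader once per covered pixel.

```python
def forward_shade_count(fragments, width, height):
    """Single pass: every fragment that passes the Z test runs the expensive shader."""
    depth = [[float("inf")] * width for _ in range(height)]
    shaded = 0
    for x, y, z, _params in fragments:
        if z < depth[y][x]:
            depth[y][x] = z
            shaded += 1  # result may be overwritten later by a nearer fragment
    return shaded

def deferred_render(fragments, width, height, shade):
    """Pass 1 stores shader inputs per pixel; pass 2 shades each covered pixel once."""
    depth = [[float("inf")] * width for _ in range(height)]
    gbuffer = [[None] * width for _ in range(height)]
    for x, y, z, params in fragments:
        if z < depth[y][x]:
            depth[y][x] = z
            gbuffer[y][x] = params  # cheap write of shader parameters, no shading yet
    image = [[None] * width for _ in range(height)]
    shaded = 0
    for y in range(height):       # stands in for the full-screen quad
        for x in range(width):
            if gbuffer[y][x] is not None:
                image[y][x] = shade(gbuffer[y][x])
                shaded += 1
    return image, shaded

# Three layers drawn back to front over the same 2x2 area: worst case for overdraw.
fragments = [(x, y, z, {"albedo": z / 3.0})
             for z in (3.0, 2.0, 1.0) for y in range(2) for x in range(2)]
forward_count = forward_shade_count(fragments, 2, 2)
image, deferred_count = deferred_render(fragments, 2, 2, shade=lambda p: p["albedo"] * 0.5)
print(forward_count, deferred_count)
```

In this worst-case ordering the single pass shades every fragment of every layer, while the deferred version shades each pixel exactly once, and the geometry is only submitted once.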
Old 15-Jan-2003, 08:40   #10
Chalnoth
 
Join Date: May 2002
Location: New York, NY
Posts: 12,679

Quote:
Originally Posted by DemoCoder
If you had true dynamic branching, you could even pack multiple shaders into one pixel shader and branch based on an object ID value written in the frame buffer.
Speaking of which, we may well see even larger supported packed framebuffer types (256 bits per pixel and up) in order to store varied information for multipass rendering.

I remember seeing one technique already that uses a packed 128-bit framebuffer to do all of the lighting in a DOOM3-style technique by just rendering the one screenspace quad.

Out of curiosity, I wonder if there will ever be an incentive to move to 64-bit floating-point precision in the pipelines? If we move to full 32-bit z-buffers soon (which I really want to see), and z-buffer errors are not yet eliminated, we may need to for optimal precision.
Old 15-Jan-2003, 09:23   #11
Dave Baumann
Gamerscore Wh...
 
Join Date: Jan 2002
Posts: 12,989

Quote:
Originally Posted by Uttar
But then fillrate would have to be the bottleneck. If you're memory-bound, it's completely useless.
Why would you have to be fillrate limited? That's the entire point of rendering Z first: to optimise your early Z rejection routines and save a lot of fillrate (well, texel/shader).
__________________
Expand. Accelerate. Dominate.
Old 15-Jan-2003, 10:02   #12
Nagorak
Member
 
Join Date: Jun 2002
Posts: 854

Quote:
Originally Posted by Uttar
AFAIK:
2 Pass rendering can increase FPS, so it can be nice.
But it also increases latency. So it's futile and annoying for Q3 or UT2K3, since latency is the most important factor there.
As for RPGs, where mostly smoothness is important, this could be really useful.
What RPGs need is competent programmers who actually have a clue what they are doing. As it stands now, ATi and Nvidia could specifically tailor their drivers for RPG performance and the games would still run like total crap because of the horrible coding behind those games.
Old 15-Jan-2003, 18:19   #13
Arun
Unknown.
 
Join Date: Aug 2002
Location: UK
Posts: 4,883

Quote:
Originally Posted by DaveBaumann
Quote:
Originally Posted by Uttar
But then fillrate would have to be the bottleneck. If you're memory-bound, it's completely useless.
Why would you have to be fillrate limited? That's the entire point of rendering Z first: to optimise your early Z rejection routines and save a lot of fillrate (well, texel/shader).
That's exactly what I meant.
Here's the full quote:
Quote:
And yes, it could decrease latency. But then fillrate would have to be the bottleneck.
My point is that for it to decrease latency, fillrate has to be the bottleneck and not memory.


Uttar
Old 16-Jan-2003, 02:36   #14
pcchen
Moderator
 
Join Date: Feb 2002
Location: Taiwan
Posts: 2,358

Why? You can still be memory bound and 2-pass can still reduce latency. For example, if most of your memory bandwidth goes to texture fetches, 2-pass can eliminate most of those fetches and therefore reduce latency.

However, if most of your bandwidth goes to frame buffer access, 2-pass won't buy you much, and may even slow you down.
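
A back-of-the-envelope model of both cases, per pixel. Everything here is an assumption for illustration: the byte counts are invented, compression and caches are ignored, and so is the geometry that has to be read twice. `shaded` is the average number of fragments per pixel that pass the Z test in a single pass, which depends on draw order.

```python
def bytes_per_pixel(overdraw, shaded, z_bytes, tex_bytes, color_bytes):
    """Return (single_pass, two_pass) memory traffic per pixel in bytes (toy model)."""
    # Single pass: every fragment reads Z; each passing fragment writes Z,
    # fetches textures and writes color.
    single = overdraw * z_bytes + shaded * (z_bytes + tex_bytes + color_bytes)
    # Two-pass: the Z-only pass has the same Z traffic; the second pass reads Z
    # for every fragment again but textures and writes only the visible one.
    z_pass = overdraw * z_bytes + shaded * z_bytes
    shade_pass = overdraw * z_bytes + 1 * (tex_bytes + color_bytes)
    return single, z_pass + shade_pass

# Texture-heavy case: most of the traffic is texture fetches.
texture_heavy = bytes_per_pixel(overdraw=3, shaded=2, z_bytes=4, tex_bytes=32, color_bytes=4)
# Frame-buffer-heavy case: well-sorted scene, no texture traffic to save.
framebuffer_heavy = bytes_per_pixel(overdraw=3, shaded=1, z_bytes=4, tex_bytes=0, color_bytes=4)
print(texture_heavy, framebuffer_heavy)
```

In this model 2-pass only wins when the texture and color traffic it saves, `(shaded - 1) * (tex_bytes + color_bytes)`, is larger than the extra round of Z reads, `overdraw * z_bytes`.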