#1 |
|
Senior Moment
Join Date: May 2002
Location: SurfMonkey's Cluster...
Posts: 1,813
|
I remember that somebody made a post saying that there is a lot more low-hanging fruit on the 3D optimisation tree to be picked before a big change has to occur in GPU/VPU design. This seems especially pertinent now that complexity and speed issues are effectively starting to cap PCB design.
What are these further optimisations, and how would they be implemented to the best effect?
__________________
"We're a virus with shoes" - Bill Hicks "The fact that a believer is happier than a sceptic is no more to the point than the fact that a drunken man is happier than a sober one. " — George Bernard Shaw "The Tree of Life is Self-Pruning" - The Darwin Awards |
|
|
|
|
|
#2 |
|
Member
|
Good question. I believe it was SA who posted the info before.
I too would like to know what the "low hanging fruit" that have not been picked yet are. |
|
|
|
|
|
#3 |
|
Member
Join Date: Feb 2002
Location: Germany
Posts: 852
|
IMO:
he talked about the 2-pass deferred rendering now used in the DeltaChrome. So maybe he works for S3. |
|
|
|
|
|
#4 | |
|
Nutella Nutellae
Join Date: Feb 2002
Location: San Francisco
Posts: 4,308
|
Quote:
|
|
|
|
|
|
|
#5 |
|
Unknown.
Join Date: Aug 2002
Location: UK
Posts: 4,883
|
AFAIK:
2-pass rendering can increase FPS, so it can be nice. But it also increases latency. So it's futile and annoying for Q3 or UT2K3, since latency is the most important factor there. As for RPGs, where smoothness is what mostly matters, this could be really useful.
To know what "low hanging fruit" is left, it's a good idea to look at the main problems in the GPU pipeline.
1. Memory Bandwidth. Where is most bandwidth used today, if we activate 4x AA?
First, Z Reads & Writes. Z Compression and Hierarchical Z are already used, and little more can be done here.
Second, Static Geometry. Vertices which aren't transferred over AGP every frame are still read from memory every frame. Very little is done about that. A solution is to store the vertices compressed and decompress them in the VS. This is currently possible, but few programmers use it. In the future, maybe it'll become more common if memory bandwidth becomes even more of an issue.
Third, Color Writes. This probably takes little memory bandwidth thanks to Color Compression.
2. Bottlenecks. Current architectures are either transform-bound or fillrate-bound. As I said in another thread, an idea might be to use shared calculators, so that such bottlenecks don't really exist anymore. I'm not saying everything should be shared, but a good part.
Uttar |
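To make the "compression done in the VS" idea concrete, here is a minimal sketch in plain Python, assuming one common scheme: positions are quantized to 16-bit integers relative to the mesh bounding box, and the vertex shader would undo that with one scale and one bias per axis. The function names and the 16-bit choice are illustrative assumptions, not something stated in the thread.

```python
import struct

RAW_BYTES = struct.calcsize("<3f")     # three 32-bit floats per position
PACKED_BYTES = struct.calcsize("<3H")  # three 16-bit unsigned ints per position


def quantize_positions(positions, bits=16):
    """Quantize (x, y, z) floats to integers relative to the bounding box."""
    levels = (1 << bits) - 1
    mins = [min(p[i] for p in positions) for i in range(3)]
    maxs = [max(p[i] for p in positions) for i in range(3)]
    # Fall back to 1.0 on a degenerate axis to avoid dividing by zero.
    scales = [(maxs[i] - mins[i]) or 1.0 for i in range(3)]
    packed = [
        tuple(round((p[i] - mins[i]) / scales[i] * levels) for i in range(3))
        for p in positions
    ]
    return packed, mins, scales


def decompress_position(q, mins, scales, bits=16):
    """What the vertex shader would do: one multiply-add per component."""
    levels = (1 << bits) - 1
    return tuple(q[i] / levels * scales[i] + mins[i] for i in range(3))


positions = [(0.0, 0.0, 0.0), (1.0, 2.0, 4.0), (0.5, 1.0, 3.0)]
packed, mins, scales = quantize_positions(positions)
restored = [decompress_position(q, mins, scales) for q in packed]
```

Under these assumptions each position shrinks from 12 bytes to 6, at the cost of a small quantization error bounded by the bounding-box size divided by 65535 on each axis.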
|
|
|
|
|
#6 |
|
Senior Member
Join Date: Feb 2002
Location: Linköping, Sweden
Posts: 846
|
2-pass rendering for IMRs (z-pass first) can increase framerate. And if it does so, it will also decrease latency.
There wouldn't be any interleaving of the passes. So if you get higher framerate, that means that the sum of rendering times for the two passes is decreased, which in turn means lower latency. |
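The argument above can be checked with a toy timing model. This is only a sketch, assuming the passes of a frame run back to back with no interleaving and that latency is simply the time to finish one frame; the millisecond figures are made up for illustration.

```python
def frame_stats(pass_times_ms):
    """Non-interleaved passes: frame time is the sum of the pass times."""
    frame_time = sum(pass_times_ms)
    return {"fps": 1000.0 / frame_time, "latency_ms": frame_time}


# Hypothetical numbers: one 20 ms pass with heavy overdraw, versus a cheap
# 4 ms z-only pass followed by a 12 ms shading pass with no overdraw.
single_pass = frame_stats([20.0])
two_pass = frame_stats([4.0, 12.0])
```

In this model framerate and latency are two views of the same sum, so two passes cannot raise one without lowering the other.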
|
|
|
|
|
#7 | |
|
Join Date: May 2002
Location: New York, NY
Posts: 12,679
|
Quote:
Personally, I'd just like to see games do this themselves. Doing it in the driver is challenging at the very least, and a potential performance problem at the worst (exchanging one bottleneck for another). |
|
|
|
|
|
|
#8 | |
|
Unknown.
Join Date: Aug 2002
Location: UK
Posts: 4,883
|
Quote:
That would seem to exploit more parallelism, IMO... And yes, it could decrease latency. But then fillrate would have to be the bottleneck. If you're memory-bound, it's completely useless. And since you have to read static vertices twice and so on, it might indeed make you memory-bound slightly more easily. But there's also only one Color Write. None of that is too important, however. But if the game is geometry-limited, for example (not that that exists...), you'd better not hope for lower latency! Uttar |
|
|
|
|
|
|
#9 |
|
Regular
Join Date: Feb 2002
Location: California
Posts: 4,732
|
Deferred shading can be done to increase speed on DX9 GPUs. First pass: render Z, and write pixel shader parameters to an FP frame buffer/MRT. Second pass, effectively a 2D video post-processing pass: render one full-screen quad and set up your huge 128+ instruction pixel shader. Voila: no overdraw, no wasted expensive pixel shading, no wasted recomputation of T&L on second pass.
If you had true dynamic branching, you could even pack multiple shaders into one pixel shader and branch based on an object ID value written in the frame buffer. |
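A small software sketch of the two passes described above, in plain Python rather than real DX9 code. The fragment layout `(x, y, depth, params)`, the `material_id` branch and the toy shading formula are all assumptions made for illustration; the point is only to count how often the expensive shader runs.

```python
def shade(params, counter):
    """Stand-in for the expensive pixel shader; counts its own invocations."""
    counter[0] += 1
    material_id, albedo, n_dot_l = params
    if material_id == 0:          # branch on the ID stored in the buffer
        return albedo * max(n_dot_l, 0.0)   # simple diffuse term
    return albedo                 # hypothetical unlit material


def forward_render(fragments):
    """Shade every fragment that passes the depth test when it arrives."""
    depth, color, count = {}, {}, [0]
    for x, y, z, params in fragments:
        if z < depth.get((x, y), float("inf")):
            depth[(x, y)] = z
            color[(x, y)] = shade(params, count)
    return color, count[0]


def deferred_render(fragments):
    """Pass 1 stores shader parameters only; pass 2 shades each pixel once."""
    depth, gbuffer, count = {}, {}, [0]
    for x, y, z, params in fragments:
        if z < depth.get((x, y), float("inf")):
            depth[(x, y)] = z
            gbuffer[(x, y)] = params
    color = {pixel: shade(params, count) for pixel, params in gbuffer.items()}
    return color, count[0]


# Worst case for forward rendering: pixel (0, 0) is drawn back to front.
fragments = [
    (0, 0, 0.9, (0, 1.0, 0.5)),
    (0, 0, 0.5, (1, 0.8, 1.0)),
    (0, 0, 0.1, (0, 0.5, 0.2)),
    (1, 0, 0.3, (1, 0.7, 0.0)),
]
forward_color, forward_count = forward_render(fragments)
deferred_color, deferred_count = deferred_render(fragments)
```

Both paths produce the same image here, but the forward path runs the shader four times and the deferred path only twice, once per covered pixel.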
|
|
|
|
|
#10 | |
|
Join Date: May 2002
Location: New York, NY
Posts: 12,679
|
Quote:
I remember seeing one technique already that uses a packed 128-bit framebuffer to do all of the lighting in a DOOM3-style technique by just rendering the one screenspace quad. Out of curiosity, I wonder if there will ever be an incentive to move to 64-bit floating-point precision in the pipelines? If we move to full 32-bit z-buffers soon (which I really want to see), and z-buffer errors are not yet eliminated, we may need to for optimal precision. |
|
|
|
|
|
|
#11 | |
|
Gamerscore Wh...
Join Date: Jan 2002
Posts: 12,989
|
Quote:
|
|
|
|
|
|
|
#12 | |
|
Member
Join Date: Jun 2002
Posts: 854
|
Quote:
|
|
|
|
|
|
|
#13 | |||
|
Unknown.
Join Date: Aug 2002
Location: UK
Posts: 4,883
|
Quote:
Here's the full quote: Quote:
Uttar |
|||
|
|
|
|
|
#14 |
|
Moderator
Join Date: Feb 2002
Location: Taiwan
Posts: 2,358
|
Why? You can still be memory-bound and have 2-pass reduce latency. For example, if most of your memory bandwidth goes to texture fetches, 2-pass can eliminate most of them and therefore reduce latency.
However, if most of your bandwidth goes to frame buffer access, 2-pass won't buy you much, perhaps even slow you down. |
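A rough bandwidth model illustrates both cases. It is a sketch under loud assumptions: worst-case draw order so every fragment passes the depth test in the single-pass path, no Z or color compression, no texture cache, and the geometry re-read of the second pass is ignored. The byte counts are hypothetical.

```python
def single_pass_bytes(pixels, overdraw, tex_bytes, z_bytes=4, color_bytes=4):
    """Every fragment reads and writes Z, fetches textures and writes color."""
    fragments = pixels * overdraw
    return fragments * (2 * z_bytes + tex_bytes + color_bytes)


def two_pass_bytes(pixels, overdraw, tex_bytes, z_bytes=4, color_bytes=4):
    """Z-only pass first; textures and color are touched once per pixel."""
    fragments = pixels * overdraw
    z_pass = fragments * 2 * z_bytes               # Z read + Z write
    shade_pass = fragments * z_bytes               # Z read for the equal test
    shade_pass += pixels * (tex_bytes + color_bytes)
    return z_pass + shade_pass


PIXELS = 1024 * 768
OVERDRAW = 3

# Texture-heavy frame: 64 bytes of texture fetches per shaded fragment.
texture_heavy = (single_pass_bytes(PIXELS, OVERDRAW, 64),
                 two_pass_bytes(PIXELS, OVERDRAW, 64))

# Frame-buffer-dominated frame: no texture traffic at all.
framebuffer_heavy = (single_pass_bytes(PIXELS, OVERDRAW, 0),
                     two_pass_bytes(PIXELS, OVERDRAW, 0))
```

With these numbers the texture-heavy frame moves less than half the data in two passes, while the frame-buffer-dominated frame moves slightly more, which matches the "perhaps even slow you down" caveat.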
|
|
|