#1 |
|
Junior Member
Join Date: Sep 2006
Posts: 15
|
When (briefly) scanning through rumours about the coming 8800-series from Nvidia (as well as R600) I have not yet read anything on granularity and tex:alu ratios.
For some applications (varieties of volume rendering in my case) we have noticed some interesting things lately:
* Granularity and internal cache strategies are important. So important that in my case a 16-pipe X1800 beats a 24-pipe 7900 GTX by a *large* margin.
* Multiple ALUs do not seem to be of help in my case. So the X1800 is more or less the same as the new X1950 XTX, clock differences aside.
So, any news about the coming HW on these aspects? I seem to remember ATI hinting that in the long run we should expect even more asymmetrical tex:alu ratios.
|
|
|
|
|
#2 | |
|
Dangerously Mirthful
Join Date: Feb 2002
Location: Winfield, IN USA
Posts: 15,314
|
Quote:
__________________
Elite Bastards - Adminish “Be polite, be professional, but have a plan to kill everybody you meet.” - General James N. Mattis |
|
|
|
|
|
|
#3 |
|
Unknown.
Join Date: Aug 2002
Location: UK
Posts: 4,883
|
Out of curiosity, what kind of workload is that exactly? What are your ALU ops like (MUL/ADD, or something else?), and your TEX ops? (single channel? FP16? Trilinear? Anything?) - furthermore, is the problem the actual granularity, or is it the systematic costs of 2+ cycles (iirc) for any branching operation on G7x? Also, is it strictly impossible to offload the branching instructions to the VS on G7x-like architectures, since they're basically free there? (I'd assume not, but heh)
Anyway, this specific thing on G80 hasn't been discussed much yet that I can see - so it seems like a good idea to do so here.
Uttar
P.S.: It's worth noting that unless you're using "exotic" branching methods, the granularity in a unified architecture will be the same for vertex and pixel shading. At least, that's the case on Xenos (granularity of 32) - G965 proves that you can do things a bit differently by using scalar for the PS and Vec4 for the VS, thus dividing the VS's granularity by 4!
__________________
Focusing on non-graphics projects in 2013 (but I still love triangles) "[...]; the kind of variation which ensues depending in most cases in a far higher degree on the nature or constitution of the being, than on the nature of the changed conditions." |
|
|
|
|
|
#4 |
|
Junior Member
Join Date: Sep 2006
Posts: 15
|
The application in question is a form of volume rendering based on single-pass raycasting with on-the-fly gradient computations.
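For readers unfamiliar with the technique, here is a minimal CPU-side sketch of single-pass raycasting with on-the-fly gradients. It is not the poster's shader: the procedural `density` function stands in for a 3D texture fetch, and the central-difference gradient, the thresholds, and the opacity mapping are all assumptions chosen for illustration. The point is only to show where the texture fetches and the data-dependent branches come from.

```python
import math

def density(x, y, z):
    """Hypothetical stand-in for a 3D texture fetch: a soft sphere at the volume centre."""
    r = math.sqrt((x - 0.5) ** 2 + (y - 0.5) ** 2 + (z - 0.5) ** 2)
    return max(0.0, 1.0 - r / 0.4)

def gradient(x, y, z, h):
    """Unscaled central differences: six extra fetches per shaded sample."""
    gx = density(x + h, y, z) - density(x - h, y, z)
    gy = density(x, y + h, z) - density(x, y - h, z)
    gz = density(x, y, z + h) - density(x, y, z - h)
    return gx, gy, gz

def cast_ray(origin, direction, steps=64, h=1.0 / 64):
    """March one ray front to back; return (color, alpha, fetch count)."""
    step = 1.0 / steps
    color, alpha, fetches = 0.0, 0.0, 0
    for i in range(steps):
        x, y, z = (origin[k] + direction[k] * step * i for k in range(3))
        d = density(x, y, z)
        fetches += 1
        if d <= 0.05:
            continue  # data-dependent branch: skip near-empty samples
        gx, gy, gz = gradient(x, y, z, h)
        fetches += 6
        length = math.sqrt(gx * gx + gy * gy + gz * gz)
        # Headlight-style shading from the normalized gradient (an assumed lighting model).
        dot = gx * direction[0] + gy * direction[1] + gz * direction[2]
        lit = abs(dot) / length if length > 0.0 else 0.0
        a = 0.2 * d  # assumed opacity transfer function
        color += (1.0 - alpha) * a * lit
        alpha += (1.0 - alpha) * a
        if alpha >= 0.95:
            break  # early ray termination: another data-dependent branch
    return color, alpha, fetches
```

A ray that misses the sphere costs one fetch per step, while a ray through it costs seven fetches per shaded sample and exits early, so neighbouring pixels can do very different amounts of work.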
|
|
|
|
|
|
#5 |
|
Junior Member
Join Date: Sep 2006
Posts: 15
|
Uttar, apparently I was way too fuzzy when explaining my situation.
What should I do to find the answers to your questions, short of posting the actual shader code? Some of the questions I can answer on my own, but I would be most grateful for any hints on how to properly "diagnose" my case.
|
|
|
|
|
#6 |
|
Regular
|
|
|
|
|
|
|
#7 |
|
AndyTX
Join Date: May 2004
Location: British Columbia, Canada
Posts: 1,885
|
Doesn't surprise me - we've had similar results with raytracing algorithms. When traversing hierarchical data structures, thread granularity matters a lot!
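To illustrate why granularity matters for loops of varying length, here is a toy model, assuming threads are grouped into fixed-size batches that run in lockstep so every thread in a batch waits for the slowest one. The function names and the iteration counts are made up for the example; real hardware scheduling is more involved.

```python
def batch_cost(iterations_per_thread, batch_size):
    """Total lockstep cost: each batch runs as long as its slowest thread."""
    total = 0
    for start in range(0, len(iterations_per_thread), batch_size):
        batch = iterations_per_thread[start:start + batch_size]
        total += max(batch) * len(batch)
    return total

def efficiency(iterations_per_thread, batch_size):
    """Fraction of the paid-for iterations that did useful work."""
    useful = sum(iterations_per_thread)
    return useful / batch_cost(iterations_per_thread, batch_size)

# Hypothetical traversal depths for eight rays: one ray goes much deeper than the rest.
depths = [1, 1, 1, 8, 1, 1, 1, 1]
for size in (1, 4, 8):
    print(size, batch_cost(depths, size), round(efficiency(depths, size), 2))
```

In this model a single deep ray drags down every other thread in its batch, so the smaller the batch, the less work is wasted.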
|
|
|
|
|
|
#8 |
|
Unknown.
Join Date: Aug 2002
Location: UK
Posts: 4,883
|
|
|
|
|
|
|
#9 |
|
Crazy coder
|
Most shaders that use dynamic branching will perform significantly better on X1x00 cards, unless the branching is trivial and extremely coherent. I've seen the same thing in some of my demos, for instance the Selective Supersampling demo, where the branching provided a significant performance boost on ATI and very little on Nvidia (it may even have dropped, I can't remember).
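A rough way to see how a branch can help on one chip and hurt on another is the sketch below. It assumes a batch pays for the expensive path whenever any of its pixels needs it, plus a fixed per-pixel branch overhead. The cost numbers (`expensive=20`, `cheap=1`, `overhead=2`) and the batch sizes are purely illustrative, not measurements of any real card.

```python
def branched_cost(needs_work, batch_size, expensive=20, cheap=1, overhead=2):
    """Cost of 'if needs_work: expensive path' when pixels run in lockstep batches."""
    total = 0
    for start in range(0, len(needs_work), batch_size):
        batch = needs_work[start:start + batch_size]
        # The whole batch executes the expensive path if any pixel takes the branch.
        per_pixel = cheap + overhead + (expensive if any(batch) else 0)
        total += per_pixel * len(batch)
    return total

def unbranched_cost(needs_work, expensive=20, cheap=1):
    """Cost when every pixel simply runs the expensive path."""
    return (cheap + expensive) * len(needs_work)

# 64 pixels, 8 of which need the expensive path.
coherent = [i < 8 for i in range(64)]        # the 8 pixels are neighbours
scattered = [i % 8 == 0 for i in range(64)]  # the 8 pixels are spread out

for size in (4, 16):
    print(size, branched_cost(coherent, size), branched_cost(scattered, size),
          unbranched_cost(coherent))
```

With these assumed numbers, the scattered case at a batch size of 16 costs more with the branch than without it, which matches the pattern described above: coarse granularity plus branch overhead can erase the benefit entirely.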
|
|
|
|
|
|
#10 |
|
Junior Member
Join Date: Sep 2006
Posts: 15
|
So, interesting observations on current HW aside...
Any hints on where things are going for mainly Nvidia but also ATI? It seems natural that efficiency for branched shader code should only get better, especially in Nvidia's case. But perhaps changes in architecture could have big costs in this area? What about the ratio between tex and alu units? I think I remember reading somewhere about 128 shader units and 32 tex for the high-end 8800. Since I am not 100% updated on the inner workings I might have over-simplified things greatly... But if this is the case the texture units would only increase from 24 to 32?
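As a back-of-the-envelope check on why the tex:alu ratio matters for a fetch-heavy shader, here is a small sketch. The unit counts are the rumoured figures quoted above, not confirmed specifications, and the per-sample operation counts are hypothetical (seven fetches assumes one density fetch plus six for a central-difference gradient). It also ignores caches, latency hiding, and the fact that "shader units" are counted differently across architectures.

```python
def bound_by(tex_ops, alu_ops, tex_units, alu_units):
    """Return which unit type limits throughput in a naive issue-rate model."""
    tex_time = tex_ops / tex_units
    alu_time = alu_ops / alu_units
    if tex_time > alu_time:
        return "tex"
    if alu_time > tex_time:
        return "alu"
    return "balanced"

# Rumoured unit counts from the post above; op counts are assumptions.
print(bound_by(tex_ops=7, alu_ops=20, tex_units=32, alu_units=128))
print(bound_by(tex_ops=7, alu_ops=40, tex_units=32, alu_units=128))
```

In this naive model a shader needs at least four ALU operations per texture fetch to keep a 128:32 design balanced, so a shader with fewer ALU operations per fetch would be limited by the texture units.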
|
|
|