Old 23-Oct-2006, 07:15   #1
freka586
Junior Member
 
Join Date: Sep 2006
Posts: 15
8800-series, granularity and tex:alu ratios

When (briefly) scanning through rumours about the coming 8800-series from Nvidia (as well as R600) I have not yet read anything on granularity and tex:alu ratios.

For some applications (varieties of volume rendering in my case) we have noticed some interesting things lately:
* Granularity and internal cache strategies are important. So important that in my case a 16-pipe X1800 beats a 24-pipe 7900 GTX by a *large* margin.
* Multiple ALUs do not seem to be of help in my case. So X1800 is more or less the same as the new X1950 XTX, clock differences aside.

So, any news about the coming HW on these aspects?
I seem to remember ATI hinting that in the long run we should expect even more asymmetrical tex:alu ratios.
Old 23-Oct-2006, 07:17   #2
digitalwanderer
Dangerously Mirthful
 
Join Date: Feb 2002
Location: Winfield, IN USA
Posts: 15,314

Quote:
Originally Posted by freka586 View Post
Granularity and internal cache strategies are important. So important that in my case a 16-pipe X1800 beats a 24-pipe 7900 GTX by a *large* margin.
What application causes that to happen?!?
Old 23-Oct-2006, 07:48   #3
Arun
Unknown.
 
Join Date: Aug 2002
Location: UK
Posts: 4,883

Out of curiosity, what kind of workload is that exactly? What are your ALU ops like (MUL/ADD, or something else?), and your TEX ops? (single channel? FP16? Trilinear? Anything?) - furthermore, is the problem the actual granularity, or is it the systematic costs of 2+ cycles (iirc) for any branching operation on G7x? Also, is it strictly impossible to offload the branching instructions to the VS on G7x-like architectures, since they're basically free there? (I'd assume not, but heh)

Anyway, this specific thing on G80 wasn't discussed much yet that I can see - so it seems like a good idea to do so here


Uttar
P.S.: It's worth noting that unless you're using "exotic" branching methods, the granularity in a unified architecture will be the same for vertex and pixel shading. At least, that's the case on Xenos (granularity of 32) - G965 proves that you can do things a bit differently by using scalar for the PS and Vec4 for the VS, thus dividing the VS's granularity by 4!
__________________
Focusing on non-graphics projects in 2013 (but I still love triangles)
"[...]; the kind of variation which ensues depending in most cases in a far higher degree on the nature or constitution of the being, than on the nature of the changed conditions."
Old 23-Oct-2006, 08:04   #4
freka586
Junior Member
 
Join Date: Sep 2006
Posts: 15

The application in question is a form of volume rendering based on single-pass raycasting with on-the-fly gradient computations.
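For readers unfamiliar with the technique, here is a minimal CPU-side sketch in Python of what a single-pass raycaster with on-the-fly gradients does per ray. It is not freka586's shader: the `Volume` class, the spherical test data, the nearest-neighbour `fetch`, and the `threshold` and `opaque` constants are all assumptions made for illustration. The point is the shape of the work: one fetch per step, six more for a central-difference gradient whenever a sample is dense enough, and a data-dependent early exit.

```python
import math


class Volume:
    """Hypothetical n*n*n scalar volume; density falls off linearly from the centre."""

    def __init__(self, n):
        self.n = n
        self.fetches = 0  # counts every texture-style lookup
        c = (n - 1) / 2
        self.data = [[[max(0.0, 1.0 - math.sqrt((x - c) ** 2 + (y - c) ** 2 + (z - c) ** 2) / c)
                       for x in range(n)] for y in range(n)] for z in range(n)]

    def fetch(self, x, y, z):
        """Nearest-neighbour lookup with clamp-to-edge addressing."""
        self.fetches += 1
        hi = self.n - 1
        return self.data[min(max(z, 0), hi)][min(max(y, 0), hi)][min(max(x, 0), hi)]


def gradient(vol, x, y, z):
    """Central differences: six extra fetches per shaded sample."""
    return ((vol.fetch(x + 1, y, z) - vol.fetch(x - 1, y, z)) / 2,
            (vol.fetch(x, y + 1, z) - vol.fetch(x, y - 1, z)) / 2,
            (vol.fetch(x, y, z + 1) - vol.fetch(x, y, z - 1)) / 2)


def cast_ray(vol, x, y, threshold=0.3, opaque=0.95):
    """March front to back along +z; return (colour, alpha, steps taken)."""
    colour = alpha = 0.0
    steps = 0
    for z in range(vol.n):
        steps += 1
        density = vol.fetch(x, y, z)
        if density > threshold:                      # data-dependent branch
            gx, gy, gz = gradient(vol, x, y, z)
            length = math.sqrt(gx * gx + gy * gy + gz * gz) or 1.0
            diffuse = abs(gz) / length               # light shining along the view axis
            colour += (1.0 - alpha) * density * diffuse
            alpha += (1.0 - alpha) * density
        if alpha > opaque:                           # data-dependent early exit
            break
    return colour, alpha, steps


vol = Volume(16)
for x, y in ((8, 8), (0, 0)):
    before = vol.fetches
    colour, alpha, steps = cast_ray(vol, x, y)
    print((x, y), "steps:", steps, "fetches:", vol.fetches - before)
```

In this toy volume the centre ray exits early and spends most of its fetches on gradients, while the corner ray runs the full depth with one fetch per step. Neighbouring pixels that behave this differently are what make batch size and texture throughput matter for this kind of workload.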
Old 23-Oct-2006, 08:29   #5
freka586
Junior Member
 
Join Date: Sep 2006
Posts: 15

Uttar, apparently I was way too fuzzy when explaining my situation.
What should I do to find the answers to your questions, short of posting the actual shader code?
Some of the questions I can answer on my own, but I would be most grateful for any hints on how to properly "diagnose" my case.
Old 23-Oct-2006, 13:43   #6
Jawed
Regular
 
Join Date: Oct 2004
Location: London
Posts: 9,867

Quote:
Originally Posted by Uttar View Post
At least, that's the case on Xenos (granularity of 32)
64 in Xenos - 4 clocks of 16.

Jawed
Old 23-Oct-2006, 14:53   #7
Andrew Lauritzen
AndyTX
 
Join Date: May 2004
Location: British Columbia, Canada
Posts: 1,885

Quote:
Originally Posted by freka586 View Post
The application in question is a form of volume rendering based on single-pass raycasting with on-the-fly gradient computations.
Doesn't surprise me - we've had similar results with raytracing algorithms. When traversing hierarchical data structures, thread granularity matters a lot!
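A rough way to see why: in a toy lockstep model, every thread in a batch keeps occupying its lane until the slowest thread in that batch has finished its loop. The sketch below is only that model, in Python. The function name `simd_efficiency`, the random per-thread iteration counts (deliberately incoherent) and the batch sizes are assumptions for illustration, not measured figures for any particular GPU.

```python
import random


def simd_efficiency(iterations, batch_size):
    """Fraction of issued lane-iterations that do useful work.

    Each batch runs for as many iterations as its longest-running thread,
    so shorter threads in the same batch sit idle for the difference.
    """
    useful = sum(iterations)
    issued = 0
    for start in range(0, len(iterations), batch_size):
        batch = iterations[start:start + batch_size]
        issued += len(batch) * max(batch)
    return useful / issued


rng = random.Random(0)  # fixed seed so the run is repeatable
ray_steps = [rng.randint(1, 200) for _ in range(4096)]  # hypothetical traversal lengths

for batch_size in (1, 16, 64, 1024):
    print(batch_size, round(simd_efficiency(ray_steps, batch_size), 3))
```

With a batch size of 1 nothing is wasted. Because these batch sizes nest inside one another, efficiency can only stay the same or fall as the batch grows. Real traversal lengths are more coherent than random numbers, so actual losses should be smaller than this model suggests.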
Old 23-Oct-2006, 18:20   #8
Arun
Unknown.
 
Join Date: Aug 2002
Location: UK
Posts: 4,883

Quote:
Originally Posted by Jawed View Post
64 in Xenos - 4 clocks of 16.
Oopsy, that'll teach me to write that kind of stuff before I think twice

Uttar
Old 24-Oct-2006, 00:37   #9
Humus
Crazy coder
 
Join Date: Feb 2002
Location: Stockholm, Sweden
Posts: 3,216

Quote:
Originally Posted by digitalwanderer View Post
What application causes that to happen?!?
Most shaders that use dynamic branching will perform significantly better on X1x00 cards, unless the branching is trivial and extremely coherent. I've seen the same thing in some of my demos, for instance the Selective Supersampling demo, where the branching provided a significant performance boost on ATI and very little on Nvidia (it may even have dropped there; I can't remember).
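The coherence point can be sketched with a simple cost model in Python: a batch whose threads disagree on an if/else pays for both sides, while a batch that agrees pays for only one. The function `branch_cost`, the cycle costs and the two masks are hypothetical, and the model ignores any fixed per-branch overhead, so treat it as an illustration rather than a description of either vendor's hardware.

```python
def branch_cost(mask, batch_size, cheap=1, expensive=10):
    """Total lane-cycles for an if/else under lockstep execution.

    mask[i] is True when thread i takes the expensive side.
    """
    total = 0
    for start in range(0, len(mask), batch_size):
        batch = mask[start:start + batch_size]
        cycles = 0
        if any(batch):       # at least one thread needs the expensive side
            cycles += expensive
        if not all(batch):   # at least one thread needs the cheap side
            cycles += cheap
        total += cycles * len(batch)
    return total


n = 1024
coherent = [i < n // 8 for i in range(n)]    # expensive pixels clustered together
scattered = [i % 8 == 0 for i in range(n)]   # same number of pixels, spread evenly
no_branch = n * 10                           # every pixel runs the expensive path

for batch_size in (16, 64):
    print(batch_size,
          branch_cost(coherent, batch_size),
          branch_cost(scattered, batch_size),
          no_branch)
```

In this model the clustered mask costs far less than running the expensive path everywhere, while the scattered mask costs slightly more than not branching at all, because every batch ends up executing both sides.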
__________________
[ Visit my site ]
I speak for myself and only myself.
Old 24-Oct-2006, 10:06   #10
freka586
Junior Member
 
Join Date: Sep 2006
Posts: 15

So, interesting observations on current HW aside...
Any hints on where things are going for mainly Nvidia but also ATI?

It seems natural that efficiency for branched shader code should only get better, especially in Nvidia's case. But perhaps changes in architecture could have big costs in this area?

What about the ratio between tex and alu units?
I think I remember reading somewhere about 128 shader units and 32 tex units for the high-end 8800.
Since I am not 100% updated on the inner workings I might have over-simplified things greatly...
But if this is the case the texture units would only increase from 24 to 32?
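Taking the rumoured numbers in this post at face value, the arithmetic works out as in the Python sketch below. The variable names are made up, and the unit counts are only the figures quoted above, not confirmed specifications.

```python
from fractions import Fraction

# Rumoured figures quoted in the post above; not confirmed specifications.
rumoured_alu_units = 128
rumoured_tex_units = 32
current_tex_units = 24   # the "24" in the post, i.e. a 24-pipe part

alu_per_tex = Fraction(rumoured_alu_units, rumoured_tex_units)
tex_growth = Fraction(rumoured_tex_units, current_tex_units)

print(f"ALU:TEX = {alu_per_tex.numerator}:{alu_per_tex.denominator}")
print(f"texture units grow by {float(tex_growth):.2f}x")
```

Raw unit counts ignore clock speeds and whether an ALU is scalar or vector, so ratios computed this way are not directly comparable across architectures.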