|
|
#3501 |
|
hardware monkey
Join Date: Mar 2007
Posts: 3,900
|
Re: triangle setup rate
Are we 100% sure Fermi sets up 4 tri/clk (assuming 4 GPCs) in all situations? I'm curious because this could finally be the answer to increased performance in MSFS. |
|
|
|
|
|
#3502 |
|
Regular
|
Meh, I'm no journalist ... I wouldn't even know who to ask for official statements on this, and even if I did I don't think "to satisfy my personal curiosity" will convince them to spend time on me. So I just threw the conspiracy theory out there and hoped someone else would find out whether this is just a public document error or if it runs deeper. Someone who does run a site perhaps. It would be fine and dandy too if some of the AMD/IMG lurkers on this board could just say something like "Microsoft told us about this, the public documents are just a mess" (nudge nudge).
|
|
|
|
|
|
#3503 |
|
Senior Member
|
Official docs say "up to" - as always. My question wrt exactly this hasn't been answered yet. Makes me wonder…
__________________
English is not my native tongue. Before flaming please consider the possibility that I did not mean to say what you might have read from my posts. Work | Recreation. Warning! This posting may contain unhealthy doses of gross humor, sarcastic remarks and exaggeration! |
|
|
|
|
|
#3504 |
|
Join Date: May 2002
Location: New York, NY
Posts: 12,678
|
My understanding of this is that they have four parallel units, but those units may at times be stalled waiting for the results of other units. They've put a lot of work into making sure that these four parallel geometry units are used as efficiently as possible, but in reality we can't expect a 4x increase in almost any aspect of geometry performance.
__________________
April 20, 1979 - America must never forget. |
|
|
|
|
|
#3505 |
|
Epsilon plus three
Join Date: Feb 2002
Location: Chania
Posts: 7,762
|
Of course claimed values are usually peak values, but I can't figure out at the moment (since I'm way too tired) why you couldn't process at least 2 tris/clock to feed four raster units.
__________________
People are more violently opposed to fur than leather; because it's easier to harass rich ladies than motorcycle gangs. |
|
|
|
|
|
#3506 | |
|
Member
Join Date: Dec 2009
Posts: 581
|
Quote:
Add that to the not-so-huge texture improvements and possibly low clock speeds, and you get the picture of GF100 making about 80% BEST CASE of GTX285's performance, and hence an even lower advantage over HD5870 (possibly 20%). However, in tessellation it will trounce GTX 285 by a huge margin. Unless that is wrong, and there are 4 tri/clk in normal rendering. |
|
|
|
|
|
|
#3507 |
|
hardware monkey
Join Date: Mar 2007
Posts: 3,900
|
Hmm, perhaps I should rephrase my question then.
Can Fermi set up more than 1 non-tessellated tri/clk? Edit: I have a feeling this discussion should be in the other thread. |
|
|
|
|
|
#3508 | |
|
Join Date: May 2002
Location: New York, NY
Posts: 12,678
|
Quote:
That said, other limitations, such as bandwidth or ability to parallelize non-tessellated triangles between the units, may prevent much performance improvement from having the additional geometry units.
|
|
|
|
|
|
#3509 | |
|
Member
Join Date: Jan 2010
Posts: 114
|
Quote:
a) it was an honest omission and b) the omission could hinder its use and actually hurt GF100's potential edge? |
|
|
|
|
|
|
#3510 |
|
Member
Join Date: Dec 2009
Posts: 581
|
Hey guys, could someone explain to me whether GF100 could output more than 1 tri/clk in non-tessellated situations or not, and why?
This question is in the other thread too. |
|
|
|
|
|
#3511 |
|
Tiled
Join Date: Oct 2003
Location: Kings Langley, UK
Posts: 2,675
|
For GF100, yes, but the aggregate rasterisation area is no bigger than prior hardware could rasterise in a clock.
__________________
A major redesign of the core ALU pineapple boomerang fortress. |
|
|
|
|
|
#3512 | |
|
Senior Member
Join Date: Feb 2002
Posts: 2,019
|
Quote:
Obviously more (u,v)'s per clock is better, but determining how much better is the tricky part. It's unclear whether GF100 can achieve full rate without tessellation, but it should be easy to test with a custom app. The potential limitation is that there is a single index buffer per draw call, so they need to parallelize processing of the index buffer to make a single draw command run faster than 1x. This is non-trivial. |
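To make the index-buffer point concrete, here is a minimal Python sketch of one way a single draw call could be spread over several setup units. It is only an illustration: the batch size, the round-robin policy and the names `split_index_buffer` and `clocks_needed` are assumptions made up for this example, not anything Nvidia has described.

```python
from typing import List, Tuple

Triangle = Tuple[int, ...]


def split_index_buffer(indices: List[int], num_units: int,
                       batch_tris: int = 32) -> List[List[Triangle]]:
    """Deal fixed-size batches of triangles from ONE triangle-list index
    buffer round-robin to `num_units` hypothetical setup units."""
    if num_units < 1 or batch_tris < 1:
        raise ValueError("num_units and batch_tris must be positive")
    if len(indices) % 3 != 0:
        raise ValueError("a triangle list needs a multiple of 3 indices")
    # Group the flat index stream into triangles (3 indices each).
    tris = [tuple(indices[i:i + 3]) for i in range(0, len(indices), 3)]
    work: List[List[Triangle]] = [[] for _ in range(num_units)]
    # Hand out consecutive batches to the units in turn.
    for batch_no, start in enumerate(range(0, len(tris), batch_tris)):
        work[batch_no % num_units].extend(tris[start:start + batch_tris])
    return work


def clocks_needed(work: List[List[Triangle]]) -> int:
    """Assume each unit sets up 1 triangle per clock; the draw call is
    done when the busiest unit is done."""
    return max((len(w) for w in work), default=0)


indices = list(range(24))  # 8 triangles in one draw call
print(clocks_needed(split_index_buffer(indices, num_units=1, batch_tris=2)))
print(clocks_needed(split_index_buffer(indices, num_units=4, batch_tris=2)))
```

The sketch deliberately ignores the hard parts: triangles still have to reach the ROPs in API order, and shared vertices across batch boundaries get fetched and shaded more than once. Those are the reasons the parallelization is non-trivial in real hardware.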
|
|
|
|
|
|
#3513 |
|
Tiled
Join Date: Oct 2003
Location: Kings Langley, UK
Posts: 2,675
|
It doesn't matter if it's tessellated by hardware or not, since you can draw tiny triangles all by yourself if you so wish.
|
|
|
|
|
#3514 | ||
|
Regular
|
Quote:
This is what I basically said in the first post about this ... still as valid as then. Stop making me repeat myself ... are you people just tag teaming to make me dig myself in ever deeper or what? Quote:
|
||
|
|
|
|
|
#3515 |
|
Member
Join Date: Dec 2009
Posts: 581
|
Thanks Mr. Rys, but I have to wonder: you guys said that the reason nobody cared to double the number of hardware rasterizers is that you have to figure out what to do when triangles overlap or share vertices. How is that different in GF100's situation? How did Nvidia overcome this seemingly difficult obstacle?
|
|
|
|
|
|
#3516 | |
|
Join Date: May 2002
Location: New York, NY
Posts: 12,678
|
Quote:
http://www.anandtech.com/video/showdoc.aspx?i=3721&p=2 Though I must say that I was mistaken. The GF100 has 16 geometry units, not 4. So I think we can definitely expect faster geometry throughput all around. That said, the triangle setup is in the raster engine, of which there are four, so we should expect, in ideal conditions, that the GF100 can do 4 triangles/clock (I don't think the raster engine has the same out-of-order execution problems as the PolyMorph engine).
|
|
|
|
|
|
#3517 |
|
hardware monkey
Join Date: Mar 2007
Posts: 3,900
|
|
|
|
|
|
|
#3518 |
|
Senior Member
|
|
|
|
|
|
|
#3519 |
|
Senior Member
|
It's currently 8 pixels per clock (ppc) per raster unit. If triangles are larger than 32 pixels you don't necessarily benefit; you only move the bottleneck from the triangle setup to the rasterisers. Mainstream parts will be affected based on Nvidia's choice of implementation, i.e. their number of GPCs.
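A rough way to see where the 32-pixel figure comes from is to model throughput as the smaller of the setup rate and the raster rate. The Python sketch below does that under stated assumptions: four raster units at 8 pixels per clock each (32 aggregate), and a hypothetical 1 tri/clk design with the same 32 pixels per clock of raster area for comparison. The function name `tris_per_clock` is made up for the example.

```python
def tris_per_clock(tri_area_px: float, setup_rate: float,
                   raster_units: int = 4, px_per_unit: int = 8) -> float:
    """Triangles per clock when limited by either setup or rasterisation.

    Assumes perfect load balancing and that a triangle covering N pixels
    costs N pixels of raster throughput (no quad or tile overhead).
    """
    if tri_area_px <= 0:
        raise ValueError("triangle area must be positive")
    raster_px_per_clock = raster_units * px_per_unit  # 32 by default
    return min(setup_rate, raster_px_per_clock / tri_area_px)


for area in (4, 8, 16, 32, 64):
    wide = tris_per_clock(area, setup_rate=4)    # 4 tri/clk setup
    narrow = tris_per_clock(area, setup_rate=1)  # 1 tri/clk setup
    print(f"{area:>3} px: {wide:.2f} vs {narrow:.2f} tri/clk, "
          f"speedup {wide / narrow:.2f}x")
```

In this simple model the full 4x only shows up for triangles of 8 pixels or fewer, and the advantage is gone entirely once triangles reach 32 pixels, because both designs are then raster-bound.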
|
|
|
|
|
#3520 |
|
Member
Join Date: Jun 2008
Location: Copenhagen
Posts: 554
|
|
|
|
|
|
|
#3521 |
|
Tiled
Join Date: Oct 2003
Location: Kings Langley, UK
Posts: 2,675
|
We don't know yet; raster area may be variable (I'd expect so). If I'm honest, my biggest wonder in the last few days is whether there's really more than one unit at all. Parallelisable setup into one fixed-area rasteriser makes some sense, where it can work on up to four input triangles in a clock.
|
|
|
|
|
#3522 | |
|
hardware monkey
Join Date: Mar 2007
Posts: 3,900
|
Quote:
Thanks! |
|
|
|
|
|
|
#3523 | |
|
Join Date: May 2002
Location: New York, NY
Posts: 12,678
|
Quote:
|
|
|
|
|
|
#3524 |
|
Senior Member
Join Date: Sep 2003
Location: Well within 3d
Posts: 4,071
|
I interpreted it to mean that the aggregate raster output rate is 32 pixels per 1/2 shader clock. Depending on where the shader clocks end up and what the L2/ROP domain is set to, the bottleneck could shift between raster and ROP for a given derivative.
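As a back-of-the-envelope check, the comparison can be sketched in Python. Everything numeric here is a placeholder: the clocks and ROP counts are hypothetical, "32 pixels per 1/2 shader clock" is read as 32 pixels per clock in a domain running at half the shader clock, and each ROP is assumed to write one pixel per ROP clock. The name `pixel_bottleneck` is invented for the example.

```python
from typing import Tuple


def pixel_bottleneck(shader_mhz: float, rop_mhz: float, rops: int,
                     raster_px_per_clk: int = 32) -> Tuple[str, float, float]:
    """Return (limiter, raster_rate, rop_rate) with rates in Mpixels/s.

    Assumption: raster emits `raster_px_per_clk` pixels per clock at half
    the shader clock; each ROP writes 1 pixel per ROP-domain clock.
    """
    raster_rate = raster_px_per_clk * shader_mhz / 2
    rop_rate = rops * rop_mhz
    if raster_rate < rop_rate:
        limiter = "raster"
    elif rop_rate < raster_rate:
        limiter = "ROP"
    else:
        limiter = "balanced"
    return limiter, raster_rate, rop_rate


# Two made-up configurations, NOT real product specifications.
print(pixel_bottleneck(shader_mhz=1400, rop_mhz=700, rops=48))
print(pixel_bottleneck(shader_mhz=1400, rop_mhz=600, rops=24))
```

With the first set of placeholder numbers the rasterisers are the limit, and with the second the ROPs are, which is the kind of flip between derivatives the post describes.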
__________________
Dreaming of a .065 micron etch-a-sketch. |
|
|
|
|
|
#3525 |
|
Regular
Join Date: Jun 2003
Posts: 6,160
|
|
|
|
|
| Tags |
| delay, fermi, geforce, gf100 |
|
|