#1
Unknown.
Join Date: Aug 2002
Location: UK
Posts: 4,883
Hey everyone,
Just thought I should start a new thread about this, since it might become a fairly big subject. I didn't see one yet.

The NV30 has 1 FP/TEX unit and 2 FX units per pipe. Since nVidia kept 4 pipes, there are three possible things they could have done:
- 1 FP/TEX unit, 1 true FP unit and 1 FX unit per pipe
- 2 FP/TEX units and 1 FX unit per pipe, with the FP/TEX units only able to do 1 independent fetch per clock instead of 2
- 1 FP unit, 2 TEX units and 2 FX units per pipe, with a ridiculous amount of cheating

My guess is actually number two. With the old configuration there was some sharing between FP and TEX, but TEX could do 8 ops/clock, so I guess it needed quite a few additional transistors too. With this layout you wouldn't need as many extra transistors for texturing, so the whole design becomes possible at 130M transistors along with other overall optimizations.

Any feedback, comments, ideas?
Uttar
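To make the candidate layouts easier to compare, here is a small Python sketch that turns each per-pipe unit list into the (FP, TEX) ops per clock the whole chip could issue. The mode tables are an assumed reading of the options above (an NV30-style FP/TEX unit doing 1 FP or 2 TEX ops per clock, the option-two FP/TEX unit doing 1 FP or 1 TEX op), it assumes all 4 pipes issue the same mix every clock, and it ignores FX units, registers and dependencies. Treat it as a back-of-the-envelope model, not nVidia's actual design.

```python
from itertools import product

PIPES = 4  # the thread treats both NV30 and NV35 as 4-pipe designs

# Each unit is a list of alternative modes; a mode is (fp_ops, tex_ops) per clock.
FP_TEX_DUAL = [(1, 0), (0, 2)]    # assumed NV30-style shared unit: 1 FP OR 2 TEX
FP_TEX_SINGLE = [(1, 0), (0, 1)]  # assumed option-two shared unit: 1 FP OR 1 TEX
FP_ONLY = [(1, 0)]                # "true FP" unit
TEX_ONLY = [(0, 1)]               # dedicated texture unit

CONFIGS = {
    "NV30": [FP_TEX_DUAL],
    "option 1": [FP_TEX_DUAL, FP_ONLY],
    "option 2": [FP_TEX_SINGLE, FP_TEX_SINGLE],
    "option 3": [FP_ONLY, TEX_ONLY, TEX_ONLY],
}


def chip_mixes(units, pipes=PIPES):
    """Pareto-maximal (FP, TEX) ops per clock for the whole chip,
    assuming every pipe issues the same mix each clock."""
    # Every way of picking one mode per unit gives one per-pipe mix.
    per_pipe = {
        (sum(m[0] for m in combo), sum(m[1] for m in combo))
        for combo in product(*units)
    }
    # Drop mixes that another mix beats or equals on both FP and TEX.
    best = {
        (f, t)
        for f, t in per_pipe
        if not any(f2 >= f and t2 >= t and (f2, t2) != (f, t) for f2, t2 in per_pipe)
    }
    return sorted((f * pipes, t * pipes) for f, t in best)


for name, units in CONFIGS.items():
    print(name, chip_mixes(units))
```

Under these assumptions option one comes out as 8 FP, or 4 FP + 8 TEX, per clock, and option two as 8 FP, 4 FP + 4 TEX, or 8 TEX; both have the same peaks and differ only in which mixes they can issue together.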

#2
Senior Member
Join Date: Feb 2002
Location: Somewhere not *that* rotten in Denmark
Posts: 1,197
I'm thinking about the same thing, but I haven't seen any info or benchmark that hints at what has been changed. Option two does look promising, but right now I feel clueless. Sorry.
__________________
Best regards, LeStoffer

#3
Regular
Join Date: Feb 2002
Posts: 5,951
Uttar,
Heh... could you clarify your definitions of FP/TEX, FX and "true FP" units?

#4
Unknown.
Join Date: Aug 2002
Location: UK
Posts: 4,883
Yeah, we have very, very little info about it (still more than about the NV40, though, hehe!).
The most we have is: http://www.hardocp.com/article.html?art=NDcyLDEy

It seems 100% obvious that nVidia is capable of 2 FP ops/clock per pipe, but with lower efficiency in most cases (I guess the register-usage performance hits remain, although they might have been reduced, who knows). The cases where it wins would likely be those that benefit from its bigger native instruction set. This would also slightly increase efficiency with FX, because you could do 1 FX/FP op and 1 TEX op in parallel, instead of always having to do 2 TEX ops to get maximum efficiency. So now the NV35 is a lot closer to an 8x1 than the NV30, even though it's still practically a 4-pipeline architecture. Funny, eh?
Uttar

EDIT: For Joe:
- FP/TEX: a unit that can do either FP or TEX ops, not both in parallel. In the case of the NV30, you could do either 4 FP ops or 8 TEX ops per clock.
- True FP: a unit that can do FP ops in 1 clock, with no sharing with texturing.
- FX: a unit that can do FX ops in 1 clock.
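Using the definitions above, a rough per-pipe cycle count shows why the NV30 wants TEX ops in pairs, and how a pipe with two 1-op FP/TEX units avoids that. This Python sketch assumes independent instructions, perfect scheduling, all units working in parallel, no register penalties, and the unit counts discussed in this thread (NV30: one FP/TEX unit plus two FX units; the hypothesised NV35 option two: two FP/TEX units plus one FX unit). The function names are illustrative only.

```python
def nv30_cycles(fp, tex, fx):
    """Clocks per pixel for one NV30 pipe (assumed model).

    The shared FP/TEX unit does 1 FP op OR 2 TEX ops per clock;
    two FX units run alongside it, 1 FX op per clock each.
    """
    shared = fp + (tex + 1) // 2  # an odd TEX op still occupies a full clock
    fx_units = (fx + 1) // 2
    return max(shared, fx_units)


def nv35_option2_cycles(fp, tex, fx):
    """Clocks per pixel for one pipe of the hypothesised NV35 option two.

    Two FP/TEX units each do 1 FP op OR 1 TEX op per clock, so any two
    independent FP/TEX ops can pair up; one FX unit does 1 FX op per clock.
    """
    shared = (fp + tex + 1) // 2
    return max(shared, fx)


for mix in [(1, 1, 0), (0, 2, 0), (4, 0, 0), (0, 0, 4)]:
    print(mix, nv30_cycles(*mix), nv35_option2_cycles(*mix))
```

In this model a 1 FP + 1 TEX pixel takes 2 clocks on the NV30 but 1 on option two, while an FX-heavy pixel goes the other way, since option two keeps only one FX unit.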

#5
Regular
Join Date: Feb 2002
Posts: 5,951
Quote:

#6
Unknown.
Join Date: Aug 2002
Location: UK
Posts: 4,883
Well, based on the NV30 pipeline threads, I think it was finally agreed that there was a unit which could do either 1 FP op/clock/pipe or 2 TEX ops/clock/pipe.
Or at least, that's the practical point of view. There are obviously some dedicated transistors for each type of operation, but much of the logic is probably shared. My idea is that the NV35 has 2 FP/TEX units per pipe, each doing either 1 FP op/clock or 1 TEX op/clock. The FX unit is obviously the fixed-point unit, for 12-bit FX12 operations.
Uttar

#7
Regular
Join Date: Feb 2002
Posts: 5,951
OK, I think we're on the same wavelength now.
Options 2 and 3 really seem like the only feasible possibilities to me. It might actually be something of a combination of the two. I think the only way to really ascertain what's going on is to have both the 5800 and the 5900 side by side and run several pixel shading tests... with several sets of drivers.

#8
Off-season
Join Date: Feb 2002
Location: On the pursuit of happiness
Posts: 3,019
I think it's either the first or the second variant, but I tend to believe the first. That would mean either 8 FP ops or 8 TEX + 4 FP ops per clock, which IMO best explains why the FX 5900 is close to the Radeon 9800 Pro but rarely surpasses it in PS 2.0 shaders, even though the FX is clocked higher.
__________________
Binary prefixes for bits and bytes

#9
Unknown.
Join Date: Aug 2002
Location: UK
Posts: 4,883
Quote:
There are two serious variants; the third is really much more of a paranoid dream.
Uttar

#10
Member
Join Date: May 2002
Location: Slovenia
Posts: 420
I actually got a reply on this from NVidia 2 hours ago.

#11
Regular
Join Date: Feb 2002
Posts: 5,951
Quote:

#12
Off-season
Join Date: Feb 2002
Location: On the pursuit of happiness
Posts: 3,019
Quote:
The second variant would be 8 FP, or 4 TEX + 4 FP, or 8 TEX ops per clock.

MDolenc, interesting information. If that's true, it should be significantly faster than the R300 in shaders that use few registers.
__________________
Binary prefixes for bits and bytes

#13
Unknown.
Join Date: Aug 2002
Location: UK
Posts: 4,883
No, it's not.
The difference all lies in parallelism. It's easier to get parallelism with (4 FP or 4 TEX) x 2 than with (4 FP or 8 TEX) + 4 FP.

MDolenc: VERY interesting info! That would most definitely justify the "Force FP16" flag nVidia has got MS to put into a future revision of DX9! It also most certainly explains the "12 ops/clock" number from the outdated PR docs I leaked a while back. Anyway, very nice info. I guess nVidia is gonna have a fair bit of trouble with the new FP16/FP32 switching, though; I guess the hit comes when there's switching within the same pass. Funny performance hit, hehe.
Uttar

#14
Off-season
Join Date: Feb 2002
Location: On the pursuit of happiness
Posts: 3,019
Quote:
__________________
Binary prefixes for bits and bytes

#15
Off-season
Join Date: Feb 2002
Location: On the pursuit of happiness
Posts: 3,019
Quote:
__________________
Binary prefixes for bits and bytes

#16
Senior Member
Join Date: Feb 2002
Location: Somewhere not *that* rotten in Denmark
Posts: 1,197
Quote:
__________________
Best regards, LeStoffer

#17
Regular
Join Date: Apr 2003
Location: Louvain-la-Neuve, Belgium
Posts: 523
Nice thread.
I don't have any numbers that would let me talk about the NV35 pipeline organization without (too much) doubt. My guess was actually that NVIDIA kept the same pipeline as the NV30 (including the FX units) with one more unit per pipeline: a floating-point one or an FP/TEX one (or FP/address processor).

Judging by the [H]OCP ShaderMark results, it seems there is another change that increases FP shader power. I thought NVIDIA had doubled the number of registers usable without a performance hit. But MDolenc's information makes sense too (though isn't that too big a change from NV30/31/34?). If it's true, I think it's a pretty nice design. This way, the NV35 has the same theoretical throughput as the Radeon 9800/9700 for 2 texture lookups + 2 FP ops. The NV35 has an advantage when there are more FP ops than texture lookups, but on the other hand it needs better-optimised shaders with fewer dependencies. If it's true, the only drawback compared to the NV30 would be the loss of the doubled FX multiplication power in the fixed-point units (5 FX multiplication ops per cycle possible). Everything else should be faster, or a lot faster.

One possible question: are the new FP units able to do every operation? Maybe they can only do simple operations, and only the FP/TEX unit can do every complex operation? (It's just a question I'm asking myself.)

The FP16/FP32 question remains. If NVIDIA has kept the same register access organization, FP16 remains very worthwhile, as it allows access to 4 registers instead of 2 with no performance drop. Using FP16 and FP32 in the same pipeline could be a problem when optimising register usage, so it should be better to use only FP32 or only FP16.
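The equal-throughput claim for 2 texture lookups + 2 FP ops can be checked with a steady-state Python sketch. The NV35 layout in it (one shared FP/TEX unit doing 1 FP or 2 TEX ops per clock, plus two FP-only units per pipe, i.e. 12 FP-capable units over 4 pipes) is an assumption pieced together from the "12 ops/clock" figure mentioned in this thread, not a confirmed spec. The R300 model is also simplified to one TEX unit and one FP ALU per pipe, with vector/scalar co-issue ignored. Both assume perfect scheduling and no register penalties; fractional clocks are averages over many pixels.

```python
from fractions import Fraction


def nv35_pixels_per_clock(tex, fp, pipes=4):
    """Hypothetical NV35: per pipe, one FP/TEX unit (1 FP or 2 TEX per clock)
    plus two FP-only units. Needs tex + fp > 0."""
    tex, fp = Fraction(tex), Fraction(fp)
    tex_clocks = tex / 2  # shared-unit clocks spent texturing
    # FP ops fill the two FP-only units plus whatever time the shared unit has left.
    clocks = max(tex_clocks, (fp + tex_clocks) / 3)
    return pipes / clocks


def r300_pixels_per_clock(tex, fp, pipes=8):
    """Simplified R300: per pipe, one TEX unit and one FP ALU in parallel.
    Needs tex + fp > 0."""
    clocks = max(Fraction(tex), Fraction(fp))
    return pipes / clocks


for tex, fp in [(2, 2), (1, 3), (4, 1)]:
    print(tex, fp, float(nv35_pixels_per_clock(tex, fp)), float(r300_pixels_per_clock(tex, fp)))
```

With 2 TEX + 2 FP per pixel both models give 4 pixels per clock; with 1 TEX + 3 FP the assumed NV35 layout pulls ahead (24/7 vs 8/3 pixels per clock). Core clock is left out, so multiply by each chip's clock to compare pixels per second.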

#18
Chief Spastic Baboon
Join Date: Jun 2002
Location: Location, Location with Kirstie Allsopp
Posts: 2,258
Quote:
Quote:

#19
Senior Member
Join Date: Feb 2002
Location: CT
Posts: 2,024
Whoa, I hadn't expected that until NV40. I had no idea the NV30 was that broken. Well, I did, but it appears I dismissed the possibility too soon.

#20
Regular
Join Date: Apr 2003
Location: Louvain-la-Neuve, Belgium
Posts: 523
Quote:
It's great if the NV35 can run properly at full speed with the ARB2 path.

#21
Senior Member
Join Date: Nov 2002
Location: Edmonton, Alberta, Canada
Posts: 1,765
Eck... definitely a case of driver optimization. If we go into "conspiracy theory mode", we can speculate that NVidia purposely broke ARB_fragment_program support so that hardware sites would have no choice but to use the NV30 path for the benchmark...
EDIT: misread a post...

#22
Senior Member
Join Date: Aug 2002
Location: Miami, Fl
Posts: 1,036
But are those 12 FP units just the shader units, the shader and texture units combined, or can they all function as either?
__________________
"Friendship is unnecessary, like philosophy, like art... It has no survival value; rather it is one of those things that give value to survival." -C.S. Lewis

#23
Chief Spastic Baboon
Join Date: Jun 2002
Location: Location, Location with Kirstie Allsopp
Posts: 2,258
Quote:
MuFu.

#24
Senior Member
Join Date: Feb 2002
Location: CT
Posts: 2,024
Any increase in actual floating-point performance is a very good thing, as long as testing bears out that it is real... None of the other performance issues were nearly as significant as this and its impact on DX9 moving forward. Thinking back to the GDC slides and what they propose for the HLSL ps_2_a target, this evolution seems natural and in line with nVidia's original plan for NV3x (hmm... and also in line with some speculation I had intended for the forums, but restricted to some PMs due to a disappearing thread).
I don't see nVidia blatantly lying about this, and it makes sense within the assumptions about the NV30 transistor count that I abandoned a while ago as unrealistic. The good news is that Wavey has an NV35 to put through its paces. The bad news is that he won't have as much to tease us about with regard to surprises in the results until he finishes. Oh, wait, that's only bad news for him :P.

#25
Senior Member
Join Date: Aug 2002
Location: Miami, Fl
Posts: 1,036
I hope he puts the NV35 through various shader benchmarks, including the ones developed by our very own forum members.
__________________
"Friendship is unnecessary, like philosophy, like art... It has no survival value; rather it is one of those things that give value to survival." -C.S. Lewis