Since no one on nVnews knows...

K.I.L.E.R

Retarded moron
Veteran
TOMCAT from Rage3D:


quote:
--------------------------------------------------------------------------------
READ THAT :

http://www.tt-hardware.com/article.php?sid=4148
[ http://translate.google.com/transla...Flanguage_tools ]

OK, it is in French, but it is incredible, and everybody should try to read it!

Well, the GF FX needs special software support because it cannot do floating-point operations and texturing at the same time: the hardware is not able to do both at once the way the R9700/R9800 can.
--------------------------------------------------------------------------------



Is this true?
This is kind of a drawback if true.
 
DaveBaumann said:
http://www.beyond3d.com/forum/viewtopic.php?t=5150&postdays=0&postorder=asc&start=0

(Wouldn't surprise me if that thread was the original source)

Hi,

It's not very nice to say that. I wrote this article. I'm not a reader of your board (but this could change ;) ), which obviously seems to be the reference in 3D graphics. Like you, two of my readers asked me whether my inspiration wasn't in fact B3D. Evildeus (who seems to be a regular member at B3D) says so too on another board. It's a little hurtful.

In fact, I have had these details about this architecture since the middle of March. I wanted to keep them for the review, but unfortunately it's difficult to get an NV30 from NVIDIA. The dev guy who helped me find these details is pretty reliable but doesn't want to be known. Maybe the article would have lost some of its credibility with the name of his company attached (it's not ATI…), and he's not authorised to help reviewers find details about graphics chip architecture and business.

I'm not really pleased to be accused of "inspiration theft" but on the other hand, I was really pleased to discover some nice threads on this board. I found some really nice details, and Thepkrl's results about registers bring some answers to the big question I have about the GeFFX architecture: how can NVIDIA optimise the utilisation of so many registers in such a deep pipeline? Judging by Thepkrl's results, the answer could be simple: they don't.

( Sorry for my poor English ;) )

Kind regards,

Damien Triolet
Editor / Developer
TT-hardware



PS: Another version of the architecture diagram: http://www.tt-hardware.com/img/divers01/geffx_5800b.gif . This one says exactly the same thing but is a little more complex. I did not use it in the article because many readers would not have understood it very well and because it's more influenced by my interpretation.
 
I'm not really pleased to be accused of "inspiration theft" but on the other hand, I was really pleased to discover some nice threads on this board.

Well, I didn't accuse, I just said "it wouldn't surprise me..." since I'm rather jaded by how frequently it has been happening of late.

If you've discovered this from your own testing, then good - we'd welcome you to participate more here.

Cheers. :)
 
Hey Tridam,

Since I *did* accuse you ( by saying "Most of it probably comes from B3D" at www.notforidiots.com/GPURW.php ) , I guess I'd have to apologize...
Sorry for supposing things without proof like that.

Anyway, if your source is credible - VERY interesting info for the 5200. There wasn't any analysis of the 5200 anywhere, or at least nothing reliable. This figure certainly makes sense.

BTW, a little correction: I don't think the FP unit natively supports INT12. I think they simply use FP16 ( which also has a 10-bit mantissa ) to emulate INT12. So it sometimes has higher precision than requested, but nobody is ever gonna complain about that.

I also saw in your article ( which I read in the original language, BTW, since my primary language is French too ) that you couldn't comment on the NV35. Oh, come on, I won't say it was you who told me ;)
Oh well, this trick doesn't even seem to work anymore...

Although it does seem rather obvious the NV35 is the following:
8 pipes ( instead of 4 )
1 FP unit per pipe, which can also do 8 independent texture fetches or 8 dependent texture fetches ( I'd be surprised if it could do 16 independent texture fetches, but who knows... )
1 FX12 unit per pipe

That gives an 8FP/8FX design, compared to the NV30's 4FP/8FX design. I'd guess the Vertex Shader part of the NV30 is mostly unchanged, though ( a few optimizations here and there are likely )

Ohhh, that reminds me I'll have to post it on GPU:RW. It's just speculation, but darn, the news is so slow these days... And I can't afford to let it die! :)


Uttar
 
I'll try to find time to participate more here :)

DaveB and Uttar: there is no problem. Actually, I get upset easily, but not for long :D


Yeah, the analysis of the 5200 is interesting. In fact, in the first version of the article, I didn't talk about it because I was not sure about the single full pipeline structure. After I published the article, I had many phone calls from different manufacturers who wanted to get or give details. One of them clearly showed me the one-pipeline structure. As NVIDIA has not denied it, I added the GeForce FX 5200 analysis. I'm still thinking about the possibility of 1 or 2 FX12 units in another pipeline, but I'm not sure.

The FP unit emulates FX12, but the extra precision, if it exists, comes from the extra bits of the FP32 mantissa. The number that comes out of the FP unit in an FX12 calculation is still an FX12 number; the extra precision is only available inside the FP unit. This extra precision in calculation exists in FP16 too. The unit is FP32 and every calculation is an FP32 one. In FP16, the first bits of the exponent are "0" and the last bits of the mantissa are "0". FP16 is faster because of its lower memory requirement and the larger number of registers available.
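To make the precision discussion a bit more concrete, here is a minimal Python sketch (not from the article or from NVIDIA) that rounds a value to FP16 and to a 12-bit fixed-point format. The FX12 layout used here, two's complement with 10 fractional bits and a range of [-2, 2), is an assumption for illustration, and the helper names `to_fp16` and `to_fx12` are made up for this example.

```python
import struct


def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision (FP16) value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fx12(x: float, frac_bits: int = 10) -> float:
    """Quantize x to a 12-bit signed fixed-point value.

    Assumed layout (not confirmed in this thread): two's complement with
    10 fractional bits, i.e. range [-2, 2) in steps of 1/1024.
    """
    scale = 1 << frac_bits
    raw = round(x * scale)
    raw = max(-(1 << 11), min((1 << 11) - 1, raw))  # clamp to 12 bits
    return raw / scale


if __name__ == "__main__":
    for value in (0.3, 1.7):
        print(value, to_fp16(value), to_fx12(value))
```

Under these assumptions, FP16 and FX12 have the same step size (1/1024) for values between 1 and 2, while FP16 is finer for smaller magnitudes, which is one way to read the "higher precision than requested" remark.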


About NV35, I can't say too much :( I promised to keep to myself the information NVIDIA told me ;)
But I don't know everything. For example, I don't know if the FP unit and the texturing units will be able to work at the same time. I hope this will be the case. But I think that, strategically, it might not be in NVIDIA's interest to change that. So I don't know what they'll do with the FP unit.

8 pipelines or not? That is the question! :D Think about it and you should find the answer ;)
 
Tridam,

Amid recent speculation on NV35, a certain website has rumored it will sport ~130 million transistors (~5 million more than NV30). How could 8 full pipelines/FP fragment units be implemented in so few transistors? If NV30 only sports 4 FP pipelines, then NV35 either:
1. Has a greater transistor count than rumored.
2. Fixes major bugs present in the NV30 that prevented it from using all its transistors effectively.
3. Does not implement 8 full FP32 fragment shader pipelines and gains performance from other enhancements.

Which of these do you think is the case? (Anybody?)
 
How about if they only have one fixed point 12 bit unit per pipe, rather than the two in the NV30?

This would mean that they have the same fixed point performance (still 8 fixed point units), but they now have twice the floating point performance (8 rather than 4).
 
McElvis said:
How about if they only have one fixed point 12 bit unit per pipe, rather than the two in the NV30?

This would mean that they have the same fixed point performance (still 8 fixed point units), but they now have twice the floating point performance (8 rather than 4).

That's precisely what I'm suggesting, actually :)
The current NV30 is:

4FP or 8TEX
8FX

The NV35 probably is:
8FP or 8TEX
8FX
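The two configurations above can be compared with a rough per-clock cost model. This is only a sketch: the unit counts come from this post, the NV35 row is speculation, and the cost function (FP and TEX competing for one issue slot while FX12 runs alongside) is a simplifying assumption, not a description of the real scheduler. `PixelConfig` and `clocks_per_pixel` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PixelConfig:
    name: str
    fp_per_clock: int   # FP ops issued per clock across all pipes
    tex_per_clock: int  # texture fetches per clock (same issue slot as FP)
    fx_per_clock: int   # FX12 ops issued per clock across all pipes


# Figures from the post above; the NV35 entry is a guess, not a spec.
NV30 = PixelConfig("NV30", fp_per_clock=4, tex_per_clock=8, fx_per_clock=8)
NV35_GUESS = PixelConfig("NV35 (guess)", fp_per_clock=8, tex_per_clock=8, fx_per_clock=8)


def clocks_per_pixel(cfg: PixelConfig, fp_ops: int, tex_ops: int, fx_ops: int) -> float:
    """Average clocks the chip spends per pixel under the assumed model.

    FP and TEX share a slot, so their costs add; FX12 work is assumed to
    overlap fully with that slot, so the slower of the two paths wins.
    """
    shared = fp_ops / cfg.fp_per_clock + tex_ops / cfg.tex_per_clock
    fixed = fx_ops / cfg.fx_per_clock
    return max(shared, fixed)


if __name__ == "__main__":
    for cfg in (NV30, NV35_GUESS):
        print(cfg.name, clocks_per_pixel(cfg, fp_ops=8, tex_ops=4, fx_ops=4))
```

With an FP-heavy shader the speculated NV35 comes out ahead in this model, while a shader dominated by FX12 work sees no change, since the FX count is the same in both rows.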

Luminescent: The problem is that the NV30 is:
A) Buggy. Some transistors are wasted.
B) Unoptimized ( in the sense of using fewer transistors to do the same thing ) - Even though there were a lot of delays, it's still nowhere near as optimized as our good ole NV25.

Remember the NV20->NV25 jump? They added a VS, made LMA more efficient, and they barely increased transistor count because they optimized the whole design a lot more. You can expect something similar with the NV30->NV35 jump.
Also, as I said above, the NV35 will not sport more per-clock FX power.

And the NV35 is really 8 pipelines. With a 256-bit bus, nVidia would be moronic to have 4 pipelines. A *lot* of reliable people have already confirmed this :)

The only things we still need to know about the NV35 are:
1. Heat / Noise?
2. Are there changes beside the 8 pipelines? For example, Triangle/s changes? More efficient branching? ( The NV30's branching is slower than the R300, compared to a no-branching case! :( ) And what are the minor new features rumors have been talking about? ( the things Carmack was asking them for, according to the same rumors )
3. Is register performance improved? And what about FP/Tex sharing?
4. What's the final frequency of the different models?

Of course, confirming / denying some of the things I say can always be fun - I've been treating unreliable rumors as plausible more and more often these days, because there are way too few of them... So I've got to compensate for it one way or another.


Uttar
 
In fact, NVIDIA needs to keep the same pipeline structure for the whole GeForce FX family, for a simple reason: the same code has to run optimally on every GeForce FX: 5200, 5600, 5800 and NV35. So they have to keep 2 FX12 units in every pipeline.

Are you sure that sources claiming 8 pipelines for NV35 are reliable ?


NV30 -> NV35 jump will be similar to NV20 -> NV25.

I think that register performance will be improved.

In fact, I think that in NV30, the FP unit and the address unit share the same transistors, and that's why they can't do FP calculation and texturing at the same time. I hope they'll split them up. The pipeline would just be deeper, and code written for the other GeForce FX chips would run on it without any performance drop. This small change could improve FP shader performance by a factor of 2!
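The "factor of 2" can be checked with a toy cycle count. This is a sketch under a strong assumption: every FP op and every texture fetch costs one slot, with no latency, register or bandwidth effects. If FP and texturing share one unit, their costs add; if they are split into two units that run in parallel, the longer of the two dominates. The function names are made up for illustration.

```python
def shared_cycles(fp_ops: int, tex_ops: int) -> int:
    """FP and texture addressing share one unit: the work is serialized."""
    return fp_ops + tex_ops


def decoupled_cycles(fp_ops: int, tex_ops: int) -> int:
    """Separate FP and texture units: the two streams overlap fully."""
    return max(fp_ops, tex_ops)


def speedup(fp_ops: int, tex_ops: int) -> float:
    """Idealized gain from splitting the shared unit in two."""
    if fp_ops == 0 and tex_ops == 0:
        raise ValueError("shader must contain at least one operation")
    return shared_cycles(fp_ops, tex_ops) / decoupled_cycles(fp_ops, tex_ops)


if __name__ == "__main__":
    for fp, tex in ((4, 4), (8, 4), (8, 0)):
        print(f"{fp} FP + {tex} TEX: {speedup(fp, tex):.2f}x")
```

In this model the gain reaches 2x only when a shader has as many texture fetches as FP ops; a shader with twice as many FP ops as fetches gains 1.5x, and a shader with no fetches gains nothing.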
 
Tridam said:
In fact, NVIDIA needs to keep the same pipeline structure for the whole GeForce FX family, for a simple reason: the same code has to run optimally on every GeForce FX: 5200, 5600, 5800 and NV35. So they have to keep 2 FX12 units in every pipeline.

Not really. All they need to do is keep some Integer support. Heck, they didn't even publicly disclose the "2 FX units for 1 FP unit" thing. Few developers probably know about it. And if something runs optimally on 2 FX 1 FP, it'll also run optimally on 1 FP 1 FX - you'll just waste some FP power to do integer stuff more often. And considering how much you can gain in cases where there's not so much integer, it's truly worth it.

Are you sure that sources claiming 8 pipelines for NV35 are reliable ?

I'm pretty sure they're reliable, yes. The idea is that the NV30 is a "real 8 pipeline architecture" - that is, some people could consider a type of 4 pipeline architecture 8 pipeline. So, the sources are reliable, but they could have confused some info or seen things in a way we don't.
But anyway, not having 8 pipelines with a 256-bit memory bus is just plain dumb. And I mean it. With a 128-bit memory bus, a 8 pipeline architecture is dumb. But with a 256 bit one...


I think that register performance will be improved.
Let's hope so :) But just how much? Because the register situation is quite dramatic, really... Even more so considering MS insists that registers got to be FP32.

In fact, I think that in NV30, the FP unit and the address unit share the same transistors, and that's why they can't do FP calculation and texturing at the same time. I hope they'll split them up. The pipeline would just be deeper, and code written for the other GeForce FX chips would run on it without any performance drop. This small change could improve FP shader performance by a factor of 2!

It couldn't increase shader performance in a real-world situation by a factor of 2, though. More like 1.5x. Although, considering Wavey's "confirmation", I'd guess it's already confirmed it's part of the equation :)

And even if all we had was that and increased register performance, it wouldn't be enough to beat the R350. And then there's the R390 coming...

There are a lot of possible configurations for the NV35 - the question is how much they've been able to fit in 130M transistors. Something like 8 FP units, 8 FX units, and 8 decoupled TEX units could be amazing - but in 130M transistors? Unlikely.


Remember too that nVidia wouldn't be able to build its marketing around a true 8 pipeline architecture, because they still claim the NV30 is 8 pipeline - and they can't contradict themselves! :)

So, even if you actually received some information from nVidia, it wouldn't surprise me if that information really didn't tell everything.


Uttar
 
Uttar said:
Tridam said:
In fact, NVIDIA needs to keep the same pipeline structure for the whole GeForce FX family, for a simple reason: the same code has to run optimally on every GeForce FX: 5200, 5600, 5800 and NV35. So they have to keep 2 FX12 units in every pipeline.

Not really. All they need to do is keep some Integer support. Heck, they didn't even publicly disclose the "2 FX units for 1 FP unit" thing. Few developers probably know about it. And if something runs optimally on 2 FX 1 FP, it'll also run optimally on 1 FP 1 FX - you'll just waste some FP power to do integer stuff more often. And considering how much you can gain in cases where there's not so much integer, it's truly worth it.

Yes, 1 FX unit and 1 FP unit could be fine but you must have 8 pipelines in this case.

Uttar said:
Are you sure that sources claiming 8 pipelines for NV35 are reliable ?

I'm pretty sure they're reliable, yes. The idea is that the NV30 is a "real 8 pipeline architecture" - that is, some people could consider a type of 4 pipeline architecture 8 pipeline. So, the sources are reliable, but they could have confused some info or seen things in a way we don't.
But anyway, not having 8 pipelines with a 256-bit memory bus is just plain dumb. And I mean it. With a 128-bit memory bus, a 8 pipeline architecture is dumb. But with a 256 bit one...

You don't need to have 8 pipelines with a 256-bit memory bus. Improving texture samples per cycle could do the same, and FSAA needs a 256-bit bus for high performance. Actually, the GeForce FX 5800 is pretty well balanced with regard to its memory bandwidth. If you improve the throughput of its pipelines, you need more bandwidth.
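A back-of-the-envelope calculation shows why pipeline count and bus width go together. The sketch below is illustrative only: the 500 MHz core clock and 1000 MT/s effective memory rate are assumed round numbers chosen for the example, not figures quoted in this thread, and the helper names are hypothetical.

```python
def bandwidth_gb_s(bus_bits: int, effective_mt_s: float) -> float:
    """Peak memory bandwidth in GB/s for a given bus width and transfer rate."""
    bytes_per_transfer = bus_bits / 8
    return bytes_per_transfer * effective_mt_s * 1e6 / 1e9


def bytes_per_pixel(bus_bits: int, effective_mt_s: float,
                    core_mhz: float, pixels_per_clock: int) -> float:
    """Peak bytes of memory traffic available per pixel written at full rate."""
    pixels_per_second = core_mhz * 1e6 * pixels_per_clock
    return bandwidth_gb_s(bus_bits, effective_mt_s) * 1e9 / pixels_per_second


if __name__ == "__main__":
    CORE_MHZ = 500       # assumed core clock, for illustration only
    MEM_MT_S = 1000      # assumed effective memory transfer rate
    for bus_bits, pipes in ((128, 4), (128, 8), (256, 8)):
        budget = bytes_per_pixel(bus_bits, MEM_MT_S, CORE_MHZ, pipes)
        print(f"{bus_bits}-bit bus, {pipes} pixels/clock: {budget:.1f} bytes/pixel")
```

With these assumed clocks, doubling the pixel rate on a 128-bit bus halves the bandwidth budget per pixel, and a 256-bit bus restores it, which is the balance argument made above.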


Uttar said:
I think that register performance will be improved.
Let's hope so :) But just how much? Because the register situation is quite dramatic, really... Even more so considering MS insists that registers got to be FP32.
Some accesses to registers could be broken in NV30. Maybe NVIDIA could correct this or widen some internal datapaths to the registers?

Uttar said:
In fact, I think that in NV30, the FP unit and the address unit share the same transistors, and that's why they can't do FP calculation and texturing at the same time. I hope they'll split them up. The pipeline would just be deeper, and code written for the other GeForce FX chips would run on it without any performance drop. This small change could improve FP shader performance by a factor of 2!

It couldn't increase shader performance in a real-world situation by a factor of 2, though. More like 1.5x. Although, considering Wavey's "confirmation", I'd guess it's already confirmed it's part of the equation :)

Of course, 2 is the theoretical maximum.

Uttar said:
And even if all we had was that and increased register performance, it wouldn't be enough to beat the R350. And then there's the R390 coming...

There are a lot of possible configurations for the NV35 - the question is how much they've been able to fit in 130M transistors. Something like 8 FP units, 8 FX units, and 8 decoupled TEX units could be amazing - but in 130M transistors? Unlikely.


Remember too that nVidia wouldn't be able to build its marketing around a true 8 pipeline architecture, because they still claim the NV30 is 8 pipeline - and they can't contradict themselves! :)

So, even if you actually received some information from nVidia, it wouldn't surprise me if that information really didn't tell everything.


Uttar

Unfortunately, I can't comment on this information, but I'm pretty sure it's fully reliable.
 
Going to do a french & english version, since I know both and it'll simplify Tridam's task of understanding

Tridam said:
You don't need to have 8 pipelines with a 256-bit memory bus. Improving texture samples per cycle could do the same, and FSAA needs a 256-bit bus for high performance. Actually, the GeForce FX 5800 is pretty well balanced with regard to its memory bandwidth. If you improve the throughput of its pipelines, you need more bandwidth.

French: Bien sûr qu'il ne faut pas avoir 8 pipelines avec un bus 256-bit. Mais c'est mieux, et on en tire beaucoup plus de profit vu que ca donne de meilleurs performances - la compression Z & Color risquent de faire que dans certains cas, ce sera tout de même le fillrate qui ralenti le tout, tellement la NV35 a de bandwidth.

On pourrait garder 4 pipelines. Mais alors, il faudrait plus de performance Floating Point - et comment est-ce qu'on l'obtiendrait, alors? Même en divisant texture et floating point, c'est toujours beaucoup plus lent que sur la R3xx!
Je pense tout de même que 8 pipelines avec 1 FP, 1 FX et une unitée de Texturing non dépendante, ca tiendrait beaucoup plus debout. nVidia ne l'avouerait jamais, de toute manière... On ne le saura probablement même pas à L'E3.

English: Of course, you don't *need* to have a 256-bit bus with 8 pipelines. But it's better, and you can squeeze more performance out of it that way - Z & Color compression could even make fillrate the bottleneck in some cases, that is for some pixels, because the NV35 has so much bandwidth!

You could keep 4 pipelines. But then you'd need more FP performance - and from where? You could divide texture & floating point, but that wouldn't be sufficient.
I really think 8 pipelines with 1 FP, 1 FX & 1 non-dependent Texturing unit makes more sense. Of course, nVidia wouldn't give you their silicon anyway, so we'll have to test it for ourselves - we probably won't even know it at the NV35's launch during E3!


Uttar
 
Uttar said:
Going to do a french & english version, since I know both and it'll simplify Tridam's task of understanding

Tridam said:
You don't need to have 8 pipelines with a 256-bit memory bus. Improving texture samples per cycle could do the same, and FSAA needs a 256-bit bus for high performance. Actually, the GeForce FX 5800 is pretty well balanced with regard to its memory bandwidth. If you improve the throughput of its pipelines, you need more bandwidth.

French: Bien sûr qu'il ne faut pas avoir 8 pipelines avec un bus 256-bit. Mais c'est mieux, et on en tire beaucoup plus de profit vu que ca donne de meilleurs performances - la compression Z & Color risquent de faire que dans certains cas, ce sera tout de même le fillrate qui ralenti le tout, tellement la NV35 a de bandwidth.

On pourrait garder 4 pipelines. Mais alors, il faudrait plus de performance Floating Point - et comment est-ce qu'on l'obtiendrait, alors? Même en divisant texture et floating point, c'est toujours beaucoup plus lent que sur la R3xx!
Je pense tout de même que 8 pipelines avec 1 FP, 1 FX et une unitée de Texturing non dépendante, ca tiendrait beaucoup plus debout. nVidia ne l'avouerait jamais, de toute manière... On ne le saura probablement même pas à L'E3.

English: Of course, you don't *need* to have a 256-bit bus with 8 pipelines. But it's better, and you can squeeze more performance out of it that way - Z & Color compression could even make fillrate the bottleneck in some cases, that is for some pixels, because the NV35 has so much bandwidth!

You could keep 4 pipelines. But then you'd need more FP performance - and from where? You could divide texture & floating point, but that wouldn't be sufficient.
I really think 8 pipelines with 1 FP, 1 FX & 1 non-dependent Texturing unit makes more sense. Of course, nVidia wouldn't give you their silicon anyway, so we'll have to test it for ourselves - we probably won't even know it at the NV35's launch during E3!


Uttar

lol

No problem reading English. Writing is more difficult. But it's fine... it's good practice :D

I can't really talk about it :( The dark side of NDAs... :cry:

I think that the NV35 you describe could be a good chip, but that an NV35 with 4 pipes could also be good. Of course, the R3x0 would remain more powerful in FP. I personally think that NVIDIA won't present FP power as the main strength of the NV35. They'll still ask developers to use FX12 because it's necessary for the GeForce FX line, and they need games to run well on every GeForce FX, not just on the NV35.

One example: NVIDIA could just add an address processor and keep the "old" FP/address unit. With both, they could do 1 FP and 1 non-paired texture, 1 FP and 2 paired textures, or 2 non-paired textures. (I use "paired/non-paired" because I think it's more general than "dependent/non-dependent".)
 
I'll just point out that there'd be less repetition of discussion after reading the aforementioned original thread and the also aforementioned continuation, and then contrasting with what was already stated.

I point this out not because I'm criticizing the usefulness of the discussion, but because the original topic of this thread was served and the discussion fits in with the discussion in those other threads rather closely.
 