Xmas said:Some in-depth views of the CineFX architecture can be found over at 3DCenter. I think this might be interesting for some of you. Currently only a German version is available, but an English translation is coming soon (maybe tomorrow... maybe).
Uttar said:After reading the whole article, regarding the NV40:
The NV40 is considered an "8-pipeline design", so the first question is whether that means:
8x1 pixels, 2x4 pixels (2 quads) or 1x8 pixels.
Uttar said:Looking at the goal of the NV50 being a full ILDP, you might suppose more things will slowly move toward that ideal. 2 quads would make sense here, with the functional units of both quads sharing resources and running a few instructions apart to expose more parallelism.
So, for example, if the RSQ units of one pipeline are idle and you have to do two RSQs in a row in the other, you'd pay no performance penalty; or you could have slightly different units in the two paths and use the first path's units for all RSQ work.
Uttar said:Eventually, since it seems the ADD and MUL units are not really combined and have a cache between them, you could even do 2 ADDs and 2 MULs in parallel.
Uttar said:Or you could do that in a completely different way, moving away from the CineFX architecture, and that might in fact be much more sensible, because this approach seems fairly icky.
mboeller said:No.
Fundamentally it's a one-pipeline design, able to output 4 pixels/cycle thanks to its SIMD architecture.
------------ ------------
|          | |          |
|          |-|   Tex    |
|  FP32/   | |          |
|   Tex    | ------------
|   addr   | ------------
|          | |          |
|          |-|   Tex    |   \  /  |
|          | |          |    \/   | |
------------ ------------    /\   |-|-
     |                      /  \  |
-------------------------
|                       |
|         FX12          |
|                       |
-------------------------
            |
-------------------------
|                       |
|         FX12          |
|                       |
-------------------------
   |     |     |     |
-----------------------------------------
|                                       |
|             Reg Combiner              |
|                                       |
-----------------------------------------
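A minimal Python sketch of the "one pipeline, 4 pixels/cycle" reading above, under loudly stated assumptions: one instruction is issued per cycle, and each issued instruction is applied to all four pixels of a quad in lockstep (the 4-pixels-per-unit behaviour attributed to nVidia's patent in this thread). The names `mul`, `add`, `run_quad` and `pixels_per_cycle` are hypothetical and come from neither the patent nor nVidia's drivers; texture latency and FX12/combiner co-issue are ignored entirely.

```python
from typing import Callable, List, Tuple

# Assumption: each SIMD unit works on a 2x2 quad, i.e. 4 pixels at once.
QUAD_SIZE = 4

Vec4 = Tuple[float, ...]        # one RGBA value per pixel
Instr = Callable[[Vec4], Vec4]  # hypothetical per-pixel operation


def mul(k: float) -> Instr:
    """Hypothetical MUL by a constant, applied per component."""
    return lambda v: tuple(c * k for c in v)


def add(k: float) -> Instr:
    """Hypothetical ADD of a constant, applied per component."""
    return lambda v: tuple(c + k for c in v)


def run_quad(program: List[Instr], quad: List[Vec4]) -> Tuple[List[Vec4], int]:
    """One instruction stream ("one pipeline"): every issued instruction
    is applied to all pixels of the quad in lockstep."""
    assert len(quad) == QUAD_SIZE
    cycles = 0
    for instr in program:
        quad = [instr(px) for px in quad]  # same op on 4 data lanes
        cycles += 1                        # assume one issue per cycle
    return quad, cycles


def pixels_per_cycle(program_len: int) -> float:
    """Peak output under this model: 4 pixels finish every program_len cycles."""
    return QUAD_SIZE / program_len


quad = [(0.25, 0.5, 0.75, 1.0)] * QUAD_SIZE
out, cycles = run_quad([mul(2.0), add(0.5)], quad)
print(out[0], cycles)       # (1.0, 1.5, 2.0, 2.5) 2
print(pixels_per_cycle(1))  # 4.0
```

In this toy model a single instruction stream still retires 4 pixels per clock for a one-instruction shader, which is the sense in which it can be "one pipeline" and "4 pixels/cycle" at the same time; longer shaders divide that rate.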
DaveBaumann said:mboeller said:No.
Fundamentally it's a one-pipeline design, able to output 4 pixels/cycle thanks to its SIMD architecture.
I've not read the article yet, but what is this being based on?
On an nVidia patent describing a "programmable architecture", dated December 2002, where each unit works on 4 pixels at once.
So while the overall idea makes sense and is almost certainly right, it's hard to be sure of the details, since it seems obvious the drivers are not exposing everything...
DaveBaumann said:On an nVidia patent describing a "programmable architecture", dated December 2002, where each unit works on 4 pixels at once.
Is that the file date or issue date?
So while the overall idea makes sense and is almost certainly right, it's hard to be sure of the details, since it seems obvious the drivers are not exposing everything...
It's not a question of the drivers not exposing everything for the shaders; it's a question of having a compiler/optimiser tuned to the architecture, something they haven't got right yet (and may not get wholly right before NV40, which will be different).
BTW (not sure how far I can go here, since some of it might be under NDA) when asking about some of the fundamental differences between NV30's and R300's architectures, the description we came up with was "thin and deep" for NV30 and "shallow and wide" for R300, which is accurate enough; NV40 will be more along the lines of the "shallow and wide" approach.
DaveBaumann said:Personally I'd expect patents that pertained directly to NV30 to have been filed in the 1999-2001 period, but I guess they could be late (why they would be I don't know).
|->-Shadercore (FP32)-<-|
|       |       |       |------->--|
|      TMU    Bypass
|       |       |                  |
|   Shader Back End <---< Registerfile
|          |                       |
|   Combiner (FX12)                |
|          |                       |
--<-Combiner (FX12) ---->---->-----|
Demirug said:I believe NV40 will look more like this:
Code:
-->--Shadercore (2*FP32) <----> Registerfile
|         |    |
|          TMU
|         |    |
--<--------
Uttar said:BTW, I think you're making a mistake talking about TMUs here: the correct term would be a texture lookup unit.
And either they changed things at the last minute, or the texture lookup unit isn't in the pipeline; it's outside, so what you have are units calling the TMUs. The idea behind that was to allow texturing in the VS too.
So you'd have something more like:
-->--Shadercore (2*FP32) <----> Registerfile
|         |    |
          lookup <----->
|         |    |
--<--------
Uttar said:Also, regarding the register file, it seems to me the NV3x had a performance penalty from the beginning: even if you use only 4 registers for the first 100 instructions and then 16 for the last 50, you pay the penalty of 16 registers for all 150 instructions. That looks like a potential optimization for the NV4x, along with the obvious idea of increasing the size of the register file (I think it was doubled in the NV35, but since the number of FP32 units was doubled too, I guess that didn't have much impact).
Uttar
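A back-of-the-envelope Python sketch of the register-file point above, assuming (hypothetically) a fixed pool of `REGFILE_SLOTS` register slots shared by all pixels in flight, so that more live registers per pixel means fewer pixels in flight to hide texture latency. The slot count of 256 is made up for illustration and is not a known NV3x figure; `static_allocation` models the "peak registers charged for the whole shader" behaviour described above, and `per_instruction_allocation` models the suggested NV4x optimization, not a confirmed feature.

```python
from typing import List

# Hypothetical register-file size in FP32 register slots; NOT a published
# NV3x figure, it only sets the scale of the example.
REGFILE_SLOTS = 256


def pixels_in_flight(regs_per_pixel: int, slots: int = REGFILE_SLOTS) -> int:
    """Pixels that fit in the register file when each needs regs_per_pixel registers."""
    return slots // regs_per_pixel


def static_allocation(live_regs: List[int]) -> List[int]:
    """NV3x-style as described: the shader's peak register count is
    reserved for every instruction, so occupancy never recovers."""
    peak = max(live_regs)
    return [pixels_in_flight(peak)] * len(live_regs)


def per_instruction_allocation(live_regs: List[int]) -> List[int]:
    """Hypothetical improvement: occupancy follows the registers
    actually live at each instruction."""
    return [pixels_in_flight(r) for r in live_regs]


def average(xs: List[int]) -> float:
    return sum(xs) / len(xs)


# 100 instructions using 4 registers, then 50 instructions using 16.
shader = [4] * 100 + [16] * 50
print(average(static_allocation(shader)))           # 16.0
print(average(per_instruction_allocation(shader)))  # 48.0
```

With these made-up numbers, charging the 16-register peak for all 150 instructions keeps only 16 pixels in flight throughout, while tracking the live count would average 48.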
DaveBaumann said:FYI, according to John Spitzer the FP units that replaced the FX12 units in NV35 are only capable of arithmetic ops such as MUL, ADD, SUB, DP3 and DP4.
LeStoffer said:So the rest is still handled by the 'combined' FP/FP texture unit we know from NV30?
(Sorry to ask, I have been neglecting Beyond3D this summer!)
DaveBaumann said:Why?