tEd said: that's where the hybrid vertex textures might fit in

I suppose if the TMU pipeline pool just has a queue on the front (which seems pretty likely anyway), then either the vertex or fragment shader pipelines could push a texture address and filter-type command into the queue.
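Purely as an illustration of that idea - the struct, field names and queue below are invented, not anything from actual R520/R600 documentation - a shared front-end queue for the TMU pool could look something like this:

```cpp
#include <cstdint>
#include <cstdio>
#include <queue>

// Hypothetical request format pushed by either a vertex or a fragment
// shader pipe into a queue sitting in front of the shared TMU pool.
enum class FilterType : uint8_t { Point, Bilinear, Trilinear, Aniso };

struct TextureRequest {
    uint32_t   textureAddress;   // address of the texture data to fetch
    FilterType filter;           // requested filter type
    bool       fromVertexPipe;   // hybrid vertex texturing would set this
};

// The TMU pool just services whatever is at the head of the queue,
// without caring which kind of shader pipe produced the request.
std::queue<TextureRequest> tmuRequestQueue;

int main() {
    tmuRequestQueue.push({0x1000, FilterType::Bilinear, false}); // from a fragment pipe
    tmuRequestQueue.push({0x2000, FilterType::Point,    true});  // from a vertex pipe
    while (!tmuRequestQueue.empty()) {
        const TextureRequest& r = tmuRequestQueue.front();
        std::printf("fetch @0x%04X, filter %d, vertex pipe: %d\n",
                    (unsigned)r.textureAddress, (int)r.filter, (int)r.fromVertexPipe);
        tmuRequestQueue.pop();
    }
    return 0;
}
```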
Dave Baumann said: WRT to the texture address processing - their past does one thing and their future does (more or less the same); it's likely their present would do the same as well.

k, thanks for that. I don't have access to all the R600 info though, so I didn't really consider that as a certainty for it. Now that I think about it, I don't see how it would work better otherwise.
Quote: Uttar, you are reading too much into the "designed at NV30 time"; it could mean a multitude of things, for instance it could mean they have paid particular attention to FP32 register performance....

Heh, I see - tbh, that's not really interesting though. The R3xx architecture is quite different from NV3x's, and has no register performance penalty that I'm aware of. By going from FP24 to FP32, you increase the cost of each register by 50%, and you potentially increase the pipeline length, and thus the number of stages required. I've got no data on how long typical single-cycle FP24/FP32 ALU pipelines are, but let's assume FP32 is twice the length to take a worst-case scenario, so the register "costs" would be 3x higher. That's hardly unmanageable.
Quote: Hmm, well you're not taking account of fragment shader pipeline count, which is, frankly, pointless.

I don't think you really understood what I was trying to measure - we don't know exactly what the 2 and 3 of the RV530 stand for, and as such, we don't know how bandwidth-intensive those would be. I'll agree, however, that based on that I shouldn't have posted R520's numbers.
Uttar said: The R3xx architecture is quite different from NV3x's, and has no register performance penalty that I'm aware of.

No, it's just that its performance characteristics are somewhat more predictable.
Uttar said: By going from FP24 to FP32, you increase the cost of each register by 50%

33%.
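To make the arithmetic behind the two figures explicit (on the shared assumption that per-register cost simply scales with the stored bit width):

```latex
\[
\frac{32}{24} - 1 = \frac{1}{3} \approx 33\%,
\qquad
\frac{32}{24} \times 2 = \frac{8}{3} \approx 2.67\times
\quad \text{(vs.\ the } 1.5 \times 2 = 3\times \text{ worst case in the earlier post).}
\]
```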
Uttar said: and you potentially increase the pipeline length, and thus the number of stages required

There's no requirement to do anything of the sort.
As far as I understand, the NV3x pipeline length was significantly greater than R3xx's to cope with texture latency; the R3xx, on the other hand, had decoupled texture operations (per-quad; in the R520, it would be a single pool).
Uttar said: and their texture caches can store filtered texels.

Why would anyone want to store filtered values in a cache?
"Register combiner" is just the name NVidia used for those FX units (FX9 in NV10 to 28, FX12 in NV3x).Dave Baumann said:NV30's pipeline consided of one FP32 ALU and Tex address unti, two FX untis and a register combiner
Dave Baumann said: No, it's just that its performance characteristics are somewhat more predictable.

So you're saying there IS a register limit on the R3xx/R4xx that halves performance before the API limit? If so, I'd love to have some tests of it - I do distinctly remember some ATI guys talking about the opposite on this very forum, but my memory could be at fault here.
Quote: 33%.

Good point; that's why I rarely post after 10PM.
Dave Baumann said: NV30's pipeline consisted of [...]

Of course, but that hardly explains the register problems of the NV3x. Actually, now that I look back at my info, I cannot seem to explain it anymore... The NV4x has higher absolute quad engine length/latency (but lower than a traditional increase would be) and four times the number of quad engines.
Quote: Why would anyone want to store filtered values in a cache?

It's a system in the NV4x, afaik, that allows NVIDIA to save registers and reduce the "TMU" latency. I'll be blunt and say I don't know all the details, but it does exist; in that case, it isn't used as a cache, but rather as temporary storage.
Uttar said: So you're saying there IS a register limit on the R3xx/R4xx that halves performance before the API limit? If so, I'd love to have some tests of it - I do distinctly remember some ATI guys talking about the opposite on this very forum, but my memory could be at fault here.
But anyway, with the theoretical data above, what do you conclude?
ANova said: How long ago now did I claim it would be 16 pipes at 600-700/1400?
dzulkeply said:
16-1-1-1 R520 -> 16 x 1 = 16 pipelines
16-1-3-1 R580 -> 16 x 3 = 48 pipelines
4-1-3-2 RV530 -> 4 x 3 = 12 pipelines
4-1-1-1 RV515 -> 4 x 1 = 4 pipelines
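For what it's worth, here is that reading of the A-B-C-D notation expressed as a tiny calculation. The thread itself notes nobody is sure what each digit actually stands for, so this just encodes dzulkeply's guess that the first number is fragment pipes and the third is ALUs per pipe; the other two fields are left uninterpreted:

```cpp
#include <cstdio>

// Hypothetical interpretation of the leaked "A-B-C-D" config strings:
// A = fragment pipes, C = ALUs per pipe; B and D are left uninterpreted.
struct ChipConfig {
    const char* name;
    int fragmentPipes;  // first digit
    int second;         // unknown meaning
    int alusPerPipe;    // third digit, per dzulkeply's reading
    int fourth;         // unknown meaning
};

int main() {
    const ChipConfig chips[] = {
        {"R520",  16, 1, 1, 1},
        {"R580",  16, 1, 3, 1},
        {"RV530",  4, 1, 3, 2},
        {"RV515",  4, 1, 1, 1},
    };
    for (const ChipConfig& c : chips) {
        // "pipelines" here means fragment pipes * ALUs per pipe
        std::printf("%s: %d x %d = %d pipelines\n",
                    c.name, c.fragmentPipes, c.alusPerPipe,
                    c.fragmentPipes * c.alusPerPipe);
    }
    return 0;
}
```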
Sunday said: I find it somehow nonsense to have a special CrossFire card in the R(V)5xx series! I mean, why would you buy a non-CF card? Maybe right now you don't want CF ('cos of the lack of mobos, or the lack of extra $ that goes for a CF-capable model), but some day you'll wish to have a CF setup, and it would be very convenient to be able to use your existing card. The second DVI output shouldn't be a problem with an adequate adapter... In my opinion each R(V)5xx card should be a CF card; that is the only way to popularize the CrossFire idea...

I think you're missing the point. You only need one CrossFire card to use two cards in a CrossFire setup. Buy the normal card first and then buy the CrossFire card later.
Uttar said: The R3xx architecture is quite different from NV3x's, and has no register performance penalty that I'm aware of.

All architectures have a register performance penalty. The number of registers that must be used to hit a penalty will be different, of course. Using too many registers in a shader program will make it more difficult for the hardware to hide memory latencies.
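A back-of-the-envelope way to see why heavy register use hurts latency hiding, with made-up numbers rather than anything specific to R3xx or NV3x: a pipe has a fixed pool of temporary registers shared by all pixels in flight, so every extra register a shader needs reduces how many pixels can be kept in flight, and fewer pixels in flight means less independent work available to cover a texture fetch.

```cpp
#include <cstdio>

// Illustrative only: all numbers are invented, not taken from any real GPU.
// More registers per pixel => fewer pixels in flight => less independent
// work available to cover memory/texture latency.
int main() {
    const int registerFileEntries  = 256;  // hypothetical per-pipe register pool
    const int textureLatencyCycles = 100;  // hypothetical memory latency
    const int aluOpsPerPixelPerFetch = 4;  // independent ALU work per texture fetch

    for (int regsPerPixel = 2; regsPerPixel <= 16; regsPerPixel *= 2) {
        int pixelsInFlight  = registerFileEntries / regsPerPixel;
        // Cycles of independent work the pipe can issue while one pixel waits:
        int coverableCycles = pixelsInFlight * aluOpsPerPixelPerFetch;
        std::printf("%2d regs/pixel -> %3d pixels in flight -> can hide %3d of %d cycles%s\n",
                    regsPerPixel, pixelsInFlight, coverableCycles, textureLatencyCycles,
                    coverableCycles >= textureLatencyCycles ? "" : "  (stalls)");
    }
    return 0;
}
```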
Ailuros said: That sounds more like 8 TMUs to me, and the mysterious "3" might rather be for OPs than physical units.

Confused? Do you mean 3x ALUs (vec3+scalar) per fragment pipeline? Making R580 a triple-issue architecture?
Jawed said: Confused? Do you mean 3x ALUs (vec3+scalar) per fragment pipeline? Making R580 a triple-issue architecture?
Jawed