With 8 TU/C x 10 C => 80 TU, the options would be 80TA/80TF or 80TA/160TF, not 40TA/80TF - surely...
4/8/16 address/clk? Is that a return to G80 style TMUs? 1 TA and 2 TFs?
Jawed
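To put rough numbers on those options, here's a quick back-of-the-envelope sketch; the 600MHz texture clock and the 1-TA/2-TF trilinear costing are placeholder assumptions, not known GT200 specs:

```python
# Back-of-the-envelope texel rates for the TA/TF configurations above.
# The 600 MHz texture clock is a placeholder, not a known GT200 spec.
CORE_CLOCK_MHZ = 600

configs = {
    "40TA/80TF": (40, 80),
    "80TA/80TF": (80, 80),
    "80TA/160TF": (80, 160),
}

for name, (ta, tf) in configs.items():
    # Costing assumption: bilinear = 1 address + 1 filter op per texel,
    # trilinear = 1 address + 2 filter ops (the "1 TA and 2 TFs" framing).
    bilinear = min(ta, tf) * CORE_CLOCK_MHZ / 1000        # Gtexels/s
    trilinear = min(ta, tf / 2) * CORE_CLOCK_MHZ / 1000   # Gtexels/s
    print(f"{name}: ~{bilinear:.0f} Gtex/s bilinear, ~{trilinear:.0f} Gtex/s trilinear")
```

With those assumptions, only the 1:2 configuration keeps trilinear at the same rate as bilinear, which is where the "free trilinear" question below comes from.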
Free trilinear?
There is a rumor that the jump to GT200 is like the one from TNT to GeForce.
G92 64TA/64TF bugs me. GT200 with a lower TA rate doesn't gel. Just too greedy, I guess. Clock domains could be the solution.
Why not? G80 was 32TA/64TF, so 40/80 isn't out of the question.
I've said this again and again: my understanding is that the TA and TF units are actually one and the same, and the exact same ALUs are used for both tasks. Right now they've got enough units for full-speed trilinear but not full-speed bilinear (which requires as many units being used for TF but more for TA), which explains their lower bilinear rates; in FP32 mode, other units are used for TF (which are shut down most of the time to save power), so there are enough ALUs for TA.
How closely are TA and TF in G80-style texture units interconnected? If that's at all possible - maybe they've moved the TA into the shader-clock domain and gone back to a 1:2 ratio. That'd hurt bilinear performance not that much, if at all. Plus, you'd get data pretty fast into the ALUs in CUDA.
Again, I've no idea what I am talking about.
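For what it's worth, a rough sanity check of that shader-clock-domain idea; all the unit counts and clocks below are hypothetical placeholders, not leaked specs:

```python
# Rough sanity check of the "TA in the shader clock domain" speculation.
# Unit counts and clocks are hypothetical placeholders.
CORE_CLOCK_MHZ = 600      # texture/filter clock domain (assumed)
SHADER_CLOCK_MHZ = 1500   # shader clock domain (assumed)

ta_units, tf_units = 40, 80   # a 1:2 physical TA:TF ratio

addr_rate = ta_units * SHADER_CLOCK_MHZ / 1000    # G addresses/s -> 60
filter_rate = tf_units * CORE_CLOCK_MHZ / 1000    # G filter ops/s -> 48

# Bilinear needs one address and one filter op per texel, so throughput is
# bound by the slower side; here the faster-clocked TAs still outrun the TFs,
# which is why halving the TA count need not hurt bilinear rate much.
print(min(addr_rate, filter_rate), "Gtexels/s bilinear (rough)")
```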
A cut-down GT200? Well, it can't have half the units, since its dual setup would be slower than GT200 (rough numbers sketched below). It could have 3/4 of the units + a 384-bit bus, maybe? But then it would perhaps make more sense to use faulty GT200s and disable 1/4 of the units and two memory channels. Still, I think the power requirements of such a hypothetical card would make its existence impossible.
It's much easier to make a dual-chip card if the memory bus width is only 256 bits, which is not the case with anything that could potentially be faster than GT200 and still be based on the same, GDDR5-unfriendly architecture.
Still, with G92, nVidia ended up with an awkward sandwich that is expensive to make and whose cooling system is noisy and difficult to replace.
I think the next-generation high end won't be here until at least a year from now. Basically the same as with G80.
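The rough numbers behind the "half the units" objection above, with the multi-GPU scaling factor being a pure assumption for illustration:

```python
# Why a dual card built from half-size chips is a hard sell, roughly.
# The SLI scaling factor is an assumption for illustration only.
HALF_CHIP_PERF = 0.5   # half the units at the same clocks (assumed)
SLI_SCALING = 0.8      # typical-ish multi-GPU efficiency (assumed)

dual_half = 2 * HALF_CHIP_PERF * SLI_SCALING
print(f"Dual half-chip card: ~{dual_half:.2f}x a single GT200")  # ~0.80x, i.e. slower
```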
Uhuh.
Is that a return to G80 style TMUs? 1 TA and 2 TFs?
But even G92 is limited by memory bandwidth. On the GX2, it's especially noticeable in high resolutions (1920x1200 and above) with AA. In 2560x1600 with AA enabled, GX2 is even slower than a single G80 GTX/Ultra.
I was thinking of something with 3/4th the units and 4 ROP partitions.
In that case, it would be technically possible. (Although I still believe that GT200 will be superseded by a completely new architecture, not a GX2.)
Even if GT200 doesn't support GDDR5, it theoretically wouldn't take any significant resources to lay out a future chip for GDDR5. For such a theoretical thing as above, even 1.8GHz GDDR5 on a 4*64bit MC sounds sufficient to me.
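Rough bandwidth figures for that hypothetical configuration; reading "1.8GHz GDDR5" as 3.6 Gbps per pin is my assumption (the usual DDR naming ambiguity applies), and the GDDR3 comparison point is equally hypothetical:

```python
# Rough peak bandwidth for the hypothetical 4*64-bit GDDR5 setup above.
def bandwidth_gb_s(bus_width_bits: int, gbps_per_pin: float) -> float:
    """Peak memory bandwidth in GB/s for a given bus width and per-pin data rate."""
    return bus_width_bits * gbps_per_pin / 8

# Assumption: "1.8GHz GDDR5" = 3.6 Gbps per pin.
print(bandwidth_gb_s(256, 3.6))   # ~115 GB/s on a 256-bit (4*64-bit) bus
# For comparison, a hypothetical 512-bit GDDR3 bus at 1.1 GHz (2.2 Gbps/pin):
print(bandwidth_gb_s(512, 2.2))   # ~141 GB/s
```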
Well it seems we're back to G80 style "ditch stuff over the side, the die's too big" mode...
Uhuh.
I thought that theory was debunked?
Well it seems we're back to G80 style "ditch stuff over the side, the die's too big" mode...
...most likely running out of available VRAM most of the time.
In 2560x1600 with AA enabled, GX2 is [...]
Very interesting read, thanks Arun!
I've said this again and again: my understanding is that the TA and TF units are actually one and the same, and the exact same ALUs are used for both tasks. Right now they've got enough units for full-speed trilinear but not full-speed bilinear (which requires as many units being used for TF but more for TA), which explains their lower bilinear rates; in FP32 mode, other units are used for TF (which are shut down most of the time to save power), so there are enough ALUs for TA.
Sooner or later, FP32 TF will move to the shader core to get rid of that mostly idling silicon. Also, it may be desirable to increase the number of units in the shared TA-TF unit so that bilinear is full rate but some are idling under trilinear; after all, this is still more efficient than a traditional design with discrete TA and TF units, and is desirable both in the ultra-high-end with very high resolutions (->more bilinear) and in the low-end where there is no AF and texture settings are lower. In the mid-range, it might be less desirable, but whether you want to bother tuning it from chip to chip is very debatable.
EDIT: For all we know, maybe the majority of the shared TA/TF units are already double-pumped (but not in the shader clock domain; i.e. it'd be 2x600MHz on an 8800GT, not 1500MHz). This would obviously save die space. Actually, maybe they didn't have the time to do that on G9x, which would help explain the transistor count increases... And also why they can scale G98/MCP78/etc. to a 4-wide TMU, while G86 was stuck with an 8-wide TMU even for the 8300GS. However, this is obviously VERY speculative.
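A quick illustration of what double-pumping would buy; the per-cluster unit count follows the commonly cited G92 figures, everything else is assumption:

```python
# What "double-pumped" TA/TF units would buy in die area, roughly.
CORE_CLOCK_MHZ = 600            # 8800 GT core/texture clock
LOGICAL_UNITS_PER_CLUSTER = 8   # TA/TF throughput exposed per cluster per core clock

# If each physical unit ran at 2x the core clock (1200 MHz, not the 1500 MHz
# shader clock), half as many physical units would deliver the same rate:
physical_units = LOGICAL_UNITS_PER_CLUSTER // 2
double_pumped_rate = physical_units * 2 * CORE_CLOCK_MHZ   # Mops/s per cluster
baseline_rate = LOGICAL_UNITS_PER_CLUSTER * CORE_CLOCK_MHZ

print(double_pumped_rate == baseline_rate)   # True: same throughput, half the units
```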
Just smothered in marketing if you ask me. Why wasn't G92 treated the same way?
I thought that theory was debunked?
Bilinear filtering can be implemented in just 4 DP4s if you got the data from the TAs in the right form. It's nothing magical - see the sketch below. Similarly, TA is also full of ADDs and MULs; it also requires some other operations, however, which are not sharable with TF, but these are presumably noticeably cheaper.
Very interesting read, thanks Arun!
But I was under the impression that, especially for TF, you'd need some highly specialized circuitry to run it (bilinear) single-cycle after the necessary data has been fetched. So I cannot see both units sharing a majority of transistors - unless someone might take the time and explain further.
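Here's a minimal sketch of the "4 DP4s" point: once the TA stage hands over the four texels and the fractional offsets, filtering is one 4-component dot product per colour channel (4 DP4s for RGBA). Purely illustrative, not how any particular chip wires it up:

```python
# Bilinear filtering as four 4-component dot products (one per RGBA channel).
def bilinear(texels_rgba, fx, fy):
    """texels_rgba: the four texels (t00, t10, t01, t11), each an RGBA 4-tuple.
    fx, fy: fractional offsets within the texel quad, supplied by the TA stage."""
    # Turning the offsets into weights is the "data in the right form" part.
    w = ((1 - fx) * (1 - fy), fx * (1 - fy), (1 - fx) * fy, fx * fy)
    # One DP4 per channel: dot(weights, that channel of the four texels).
    return tuple(sum(wi * t[c] for wi, t in zip(w, texels_rgba))
                 for c in range(4))

# fx = fy = 0.5 simply averages the four texels:
texels = ((1, 0, 0, 1), (0, 1, 0, 1), (0, 0, 1, 1), (1, 1, 1, 1))
print(bilinear(texels, 0.5, 0.5))   # (0.5, 0.5, 0.5, 1.0)
```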
Before G80 was launched, I heard some rumours about game devs emulating DX10 on R580 shaders. I suppose it's possible since the ALUs are highly versatile, but the performance sucks, so it can't be used for real. Current nVidia chips can also do almost anything through CUDA, but not everything might be really usable. Besides, DX10.1 isn't about new technologies, but about speeding up the current ones, so emulation wouldn't make any sense. nVidia says they don't care about DX10.1 because the game devs have enough problems, but in my opinion it's just a smoke-and-mirrors maneuver for G80 being built primarily for DX9 and not for all those new DX10.1 features. The R600 was clearly designed with these in mind, though the architectural flexibility cost ATi more transistors, forcing them to use the 80nm process... and you know the rest.
Two years? No, I don't think so. GT200 is a slightly or heavily modified G80, but its principles won't last another two years. Three years on the market is long enough for even the best architectures to grow obsolete. R300 was launched in August 2002, R520 came in September 2005 (with a three-month delay or so), and it was just about time the old architecture was replaced. So, just as Megadrive1988 says, Q4'09 could be the right time to release a new, DX11-based product.
That is, forgive me, total bullshit. Every manufacturing process, even from the same company, has its specifics, and chips must be designed from scratch in order to use it. In the past, AMD transitioned from a traditional bulk process to SOI with no trouble, and nVidia fabbed some of its chips (notably the famous NV30) in IBM fabs before settling at TSMC for good.
Something tells me you don't mean knowing as in Arun's (was it Arun?) definition. Four GPUs on a card, if we're talking about RV670 or RV770, that is - sorry - also total bullshit. The purpose of CrossFireX is to allow for X2 cards to work together, or with a single card. Just as you can't put two chips of the G80/R600/GT200 calibre on one card, you can't put four RV670/RV770s on a card.
By the way, Quad CrossFire scaling sucks so badly ATi would be only shooting itself in the foot by marketing it as a usable graphics solution.
so 2 3870x2s on a single PCB? i dunno .. cut down a lot... heh ...
I'm serious when I say I'll be testing something like that very soon (a week or two).
no I'm not kidding, even though it is april 1st ...
Of course it is an ES, but you know AMD is thinking about it.
Asus dropped down to bit-tech offices yesterday afternoon to show off its new HD 3850 X3 graphics card - yes, that's right, THREE 3850 GPUs on a single PCB. How does it achieve this? Using MXM modules and some clever use of heatpipes and watercooling - the cores all face towards the board and the memory on the back is heatsinked.
If G92 was supposed to be a summer 2007 GPU, then surely it would have benefitted from NVIO too? If GT200 was following ~4 months later, then time to market for such logic on 65nm shouldn't have been an issue if it wasn't an issue for G92. So, why pair GT200 with NVIO but not G92?
Jawed: Because the die size is smaller, yes; but there are multiple reasons why you'd want to do the I/O separately on a very large die, AFAICT... The process variant you need if you want I/O is also different from the one if you don't; I wouldn't be surprised if that affected cost somehow, but I'm not completely sure.
That document relates specifically to the G8x architecture, not to anything prior to it, as far as I can tell.
As for filtering being a fixed-function block, those are 2004 patents. Duh, of course that was the case in that timeframe - there was no evidence whatsoever to the contrary.
If there's some documentation that relates to this then it would be nice to see. I certainly won't deny the possibility - after all the whole lot can be done programmably - and idling sub-units are bad for overall utilisation.
What I'm saying is that as of G84/G86, there's now a substantial amount of sharing between TA and TF.