On the other hand, ATI could focus on improving what was there, since they already did DP and scatter before ... let's just wait for some benchmarks.

NVidia appears to have spent a lot on CUDA-related functionality
The patent document seems to imply that the output from these disparate units can be returned to the "cluster" (register file, presumably) - i.e. the staging is "real".

That's called marketing...
Haven't a clue who he is.

Just like the 'Integer' path in the GT200 diagrams is pure marketing spin, at least according to John Nickolls.
Even G70 doesn't use the same hardware for vertex data fetches as for texture filtering, does it?

Exact same thing as in, uhhh, Voodoo Graphics, Riva 128, Rage, GeForce 256, Radeon, [...], R300, R520, NV30, NV40, G70, ...
Kinda disappointed you guys haven't wheedled this stuff out...

If anything, as I said many times already, I very heavily suspect that there is a significant amount of sharing between the addressing & filtering parts on all DX10 NVIDIA chips, except maybe G80.
Until it's benchmarked we won't know the practicalities - clearly total throughput is up for both point and filtered sampling.

Yes, it seems they finally decided they could no longer afford to have an incredibly inefficient TMU architecture that made neither theoretical nor practical sense. Shock, horror?
Yeah, sadly at least some of the 9800GTX 16xAF/4xAA benchmarks are blighted by the card apparently running out of memory, so the comparisons with the HD 4850 are unreliable.

Hopefully the ROPs are sufficiently improved too, so they can get nearer to, or even catch, NVIDIA in these two respects.
Double-precision was already there in RV670, so that particular aspect shouldn't have changed... but given the die size, what seems really impressive IMO are the ALUs: there's a ridiculous number of them, and they presumably aren't any weaker (if anything, the FP64 numbers would imply they're stronger). They're definitely back in the game.
I was just assuming the same "horizontal" arrangement we saw in R6xx.
If each SIMD gets a dedicated TU ("vertical" - though it's horizontal on this diagram if the SIMDs are now read as rotated 90 degrees), then that's a bit of a change. In terms of cache, though, I guess the effect of the change is very limited, because L1 is basically only big enough for a small region of texels, and any one texel will find itself in multiple L1s anyway, whether the mapping from SIMDs to TUs is horizontal or vertical.
Jawed
It was reported on this site that TSMC is skipping 45nm and heading directly towards 40nm (presumably in H2 2009).

And by that time comes RV770 in 40nm, and then we will see who catches whom.
40nm is the half node of 45nm, which is a new full process node. It dramatically decreases die size.
I wasn't implying anything so elaborate. Just talking about the strategy of making units much smaller at the cost of some efficiency/functionality.

Hmm, I've never seen any sign that NVidia uses dedicated point-sampling units in G80 onwards. NVidia seems to separate out the stages that make up "texturing", with a separate LOD/bias unit, then an addressing unit, then sampling, then filtering (I suspect I've forgotten something). I've been assuming that point samples (vertex data fetches) are taken simply by issuing commands to the address and sampling units, as sketched below.
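Very roughly, a toy model of that staged arrangement might look like the sketch below. The stage split and the point-sampling shortcut are just this post's speculation, and the stage names and stubs are placeholders of my own, not real hardware behaviour:

```c
#include <stdio.h>

typedef struct { float r, g, b, a; } Texel;

/* Placeholder "units"; real hardware obviously does far more. */
static float    lod_stage(float u, float v)             { (void)u; (void)v; return 0.0f; }
static unsigned addr_stage(float u, float v, float lod) { (void)u; (void)v; (void)lod; return 0u; }
static Texel    sample_stage(unsigned addr)             { (void)addr; Texel t = {0}; return t; }
static Texel    filter_stage(const Texel q[4])          { return q[0]; }

/* Filtered lookup walks all four stages in order. */
static Texel tex_filtered(float u, float v)
{
    float lod = lod_stage(u, v);
    Texel quad[4];
    for (int i = 0; i < 4; i++)          /* fetch a 2x2 footprint */
        quad[i] = sample_stage(addr_stage(u, v, lod));
    return filter_stage(quad);
}

/* Point sample (vertex data fetch): per the guess above, just issue
 * commands to the address and sample units, skipping LOD and filtering. */
static Texel tex_point(float u, float v)
{
    return sample_stage(addr_stage(u, v, 0.0f));
}

int main(void)
{
    Texel f = tex_filtered(0.5f, 0.5f);
    Texel p = tex_point(0.5f, 0.5f);
    printf("filtered r=%f, point r=%f\n", f.r, p.r);
    return 0;
}
```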
6. Clamshell mode (x16 mode)
Graphics system designers expect the GDDR5 standard to offer high flexibility in terms of framebuffer and bandwidth variation. GDDR5 supports this need for flexibility in an outstanding way with its clamshell mode. The clamshell mode allows 32 controller I/Os to be shared between two GDDR5 components. In clamshell mode each GDDR5 DRAM's interface is reduced to 16 I/Os. 32 controller I/Os can, therefore, be populated with two GDDR5 DRAMs, while DQs are single-loaded and the address and command bus is shared between the two components. Operation in clamshell mode has no impact on system bandwidth.
Example: system configurations with 512 Mbit GDDR5 devices using a controller with a 256-bit interface:
A) 8 pcs of 512 Mbit GDDR5 in standard mode - framebuffer: 512 MB
B) 16 pcs of 512 Mbit GDDR5 in clamshell mode - framebuffer: 1 GB
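To make the arithmetic explicit, here's a small sketch of my own (an illustration of the quoted example, not JEDEC text; the function name and parameters are mine) deriving those two framebuffer sizes from the bus width, device density, and per-device I/O width:

```c
#include <stdio.h>

/* Assumes 512 Mbit devices on a 256-bit controller; each device presents
 * 32 I/Os in standard mode and 16 I/Os in clamshell mode. */
static unsigned framebuffer_mb(unsigned bus_bits, unsigned device_mbit,
                               unsigned ios_per_device)
{
    unsigned devices = bus_bits / ios_per_device;  /* devices the bus can host */
    return devices * (device_mbit / 8);            /* Mbit -> MB, summed */
}

int main(void)
{
    /* A) standard mode:  256/32 = 8 devices  * 64 MB = 512 MB  */
    printf("standard:  %u MB\n", framebuffer_mb(256, 512, 32));
    /* B) clamshell mode: 256/16 = 16 devices * 64 MB = 1024 MB */
    printf("clamshell: %u MB\n", framebuffer_mb(256, 512, 16));
    return 0;
}
```

Note that halving the per-device I/O width doubles the device count (and hence capacity) on the same bus, while total bandwidth stays the same, which is exactly the trade-off the spec text describes.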
GDDR5 SGRAM will be operated in both ODT Enable (terminated) and ODT Disable (unterminated) modes. For the highest data rates it is recommended to operate in ODT Enable mode. ODT Disable mode is designed to reduce power and may operate at reduced data rates. There exist situations where ODT Enable mode cannot be guaranteed for a short period of time, i.e. during power-up.
http://www.extremetech.com/article2/0,2845,2320134,00.asp

Jason Cross said:
Then there's the question of what ATI has up its sleeve, given that they're on the verge of releasing their new graphics cards based on the new RV770 chip. ATI tells us they're not going to compete in this really high-end segment of the market with those products.
Rather, they promise we'll get close to the performance of the GTX 200 cards (say, within 20%) at dramatically lower prices and power. Certainly that targets a much larger segment of the market, but the worth of that strategy all hinges on its real relative performance. For the high end, ATI is still a couple of months away from the release of their card containing two RV770 GPUs.
We don't know what shape that will take, only that it will combine two GPUs on a card in a substantially different fashion than the Radeon HD 3870 X2, and that ATI tells us that with not-yet-fully-optimized drivers it already scores over 6,000 in 3DMark Vantage on Extreme settings. We've heard promises of future greatness before, and we reserve judgment until we get the cards in our own hands to run our own tests.
If we knew what the Local Data Share is, maybe we could make a decent guess.

So what is the "Global Data Share" that the supposed crossbar leads to actually for? I just realized it wasn't in the R600 drawing...
Does that picture come from AMD?
It lists 40 TMUs, but from the tests so far the HD 4850 looks to have 32 TMUs.
So: true or fake?
Perlin Noise

What's that test?