Significant IP blocks are developed in TO and contribute to just about every ASIC.
[...]
=>w0mbat: The RV770 doesn't have a ring-bus? Are you sure?
http://eetimes.com/news/latest/show...XITDCIQSNDLRSKH0CJUNN2JVN?articleID=208404063
Seems to me this was a decision made before AMD came on the scene.
AMD hasn't chosen to stop making halo graphics cards - it's decided to make them with 2 GPU chips.
Jawed
It's quite possible that ATI can make a 4870x2 to take the performance crown (and do it cheaper, cooler and with less power than its monolithic competition), while Nvidia simply can't make a G280x2. Right there you can see the advantages of going multicore, demonstrated in the form of a product that's viable versus one that isn't.
ATI decided to get out of that dead end early (and paid for it over the last couple of chip generations), while Nvidia tried to stick it out with monolithic chips for as long as possible, and that has put them behind on their next steps towards multi-GPU.
Any guesses where the CF port should be, looking at the die shot?
No more ringbus plus fixed performance (by way of lots more units while keeping the transistor count fairly low) seems to indicate that the ringbus naysayers were right and the ringbus was a waste of transistors?
Do we know which bit of ATI/AMD designed the RV770?
Team A: R300 -> Xenos -> RV770?
Team B: R420 -> R520 -> R600
Is it possible that a single RV770 has only "2 ringbus stops"?
RV770 has no ringbus.
Radeon™ HD 4800 Series Hotfix
The information in this article applies to the following configuration(s):
* Radeon™ HD 4870 series
* Radeon™ HD 4850 series
This Hotfix improves overall performance and stability. The Hotfix includes the Display Driver and Catalyst Control Center.
http://support.ati.com/ics/support/default.asp?deptID=894
Strange - until very recently I was (supposed to be...) under the impression that a ring bus MC was going to deliver a much more efficient memory architecture than an old-fashioned crossbar controller.
Better performance, less space used, and now you know how R700 will work. The two RV770s will connect via this new MC.
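One way to frame the ring bus versus crossbar trade-off is hop count: a crossbar links every client to every memory channel in one hop, at the cost of wiring that grows with clients times channels, while a ring needs only neighbour links but requests may pass several stops. A toy Python sketch of average hop count on a bidirectional ring, assuming uniform traffic between distinct stops; it is an illustration only and says nothing about how R600 or RV770 were actually built:

```python
def ring_hops(src, dst, stops):
    """Shortest hop count between two stops on a bidirectional ring."""
    d = abs(src - dst)
    return min(d, stops - d)


def average_ring_hops(stops):
    """Mean hop count over all ordered pairs of distinct stops (uniform traffic)."""
    pairs = [(s, t) for s in range(stops) for t in range(stops) if s != t]
    return sum(ring_hops(s, t, stops) for s, t in pairs) / len(pairs)


CROSSBAR_HOPS = 1  # a crossbar links every client to every channel directly

for stops in (2, 4, 8):
    print(f"{stops} stops: ring {average_ring_hops(stops):.2f} hops, crossbar {CROSSBAR_HOPS}")
```

With only two stops the ring is no worse than a crossbar on hop count; the gap widens as stops are added.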
Remember that most of the time, caches in GPUs are there for very different reasons than caches in CPUs. It's more about input/output streaming, latency compensation and bandwidth optimisation. Classic % hit rates just aren't relevant: miss with every access and it will still run at full rate. Reuse tends to be intensely localised (e.g. bilinear filtering reading 4 values and the next texture fetch happening to need 2 of those again) rather than 'now and in 1000 cycles' time'. And the cache is sized 'just right' because for a given scenario there is an optimum size above which more makes no difference...
effectiveness of the caches
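The bilinear reuse example can be made concrete with a toy model. A minimal Python sketch, assuming a single row of bilinear samples spaced exactly one texel apart, a fixed texel read order within each 2x2 footprint, and a fully associative LRU cache (real texture caches are organised quite differently):

```python
from collections import OrderedDict


def bilinear_footprint(x, y):
    """Return the 2x2 texel block read by a bilinear sample anchored at texel (x, y)."""
    return [(x, y), (x + 1, y), (x, y + 1), (x + 1, y + 1)]


def simulate(num_samples, capacity):
    """Scan one row of bilinear samples, one texel apart, through an LRU cache.

    Returns (hits, misses).
    """
    cache = OrderedDict()
    hits = misses = 0
    for x in range(num_samples):
        for texel in bilinear_footprint(x, 0):
            if texel in cache:
                hits += 1
                cache.move_to_end(texel)  # mark as most recently used
            else:
                misses += 1
                cache[texel] = True
                if len(cache) > capacity:
                    cache.popitem(last=False)  # evict the least recently used texel
    return hits, misses


for capacity in (2, 4, 64):
    print(capacity, simulate(100, capacity))
```

For this access pattern a 2-entry cache never hits, a 4-entry cache already captures all the reuse (2 of every 4 texels after the first sample), and a 64-entry cache changes nothing.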
Now that we have better die shots, I found that the ALUs on RV770 occupy either 28.0% or 25.3% of the die, depending on whether that similar-looking sliver immediately to the left of the 4x10 array is redundancy or not. Feel free to update your numbers.
So, RV770's 1200 GFLOPs take 104mm2, while GT200's 1000 GFLOPs take 153mm2:
- RV770 is 11.5 GFLOPs/mm2 on 55nm
- GT200 is 6.6 GFLOPs/mm2 on 65nm
For double-precision:
- RV770 is 2.3 GFLOPs/mm2 on 55nm
- GT200 is 0.5 GFLOPs/mm2 on 65nm
Jawed
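These densities are just peak throughput divided by ALU area. A minimal Python sketch of that arithmetic, using the 104mm2 and 153mm2 area estimates and the 1200/1000 GFLOPs single-precision peaks from the post; the double-precision peaks (240 GFLOPs for RV770, about 78 GFLOPs for GT200) are assumed published values, not figures given in the thread:

```python
# Peak throughput (GFLOPs) and estimated ALU area (mm2) per chip.
# SP peaks and areas are the figures quoted in the post;
# DP peaks are ASSUMED published values, not taken from the thread.
CHIPS = {
    "RV770 (55nm)": {"sp": 1200.0, "dp": 240.0, "alu_mm2": 104.0},
    "GT200 (65nm)": {"sp": 1000.0, "dp": 78.0, "alu_mm2": 153.0},
}


def density(gflops, area_mm2):
    """Compute density in GFLOPs per mm2 of ALU area."""
    return gflops / area_mm2


for name, c in CHIPS.items():
    sp = density(c["sp"], c["alu_mm2"])
    dp = density(c["dp"], c["alu_mm2"])
    print(f"{name}: SP {sp:.1f} GFLOPs/mm2, DP {dp:.1f} GFLOPs/mm2")
```

The GT200 single-precision figure comes out at 6.5 rather than 6.6, presumably a rounding difference in the inputs; the other three values match the list.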
Doesn't look like a redundant part to me. It's similar to, but clearly different from, half a quad of 5x1D units (for a start, it's less than half the size).
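Applying the revised 28.0% and 25.3% ALU shares to the 1200 GFLOPs peak gives a noticeably higher RV770 density than the 104mm2 estimate implies. A minimal Python sketch, assuming an RV770 die area of about 256 mm2 (an assumption; the thread does not state it):

```python
# ASSUMPTION: RV770 die area of about 256 mm2 (not stated in the thread).
DIE_MM2 = 256.0
SP_GFLOPS = 1200.0  # RV770 single-precision peak, as quoted in the thread


def alu_area(share):
    """ALU area in mm2, given the ALUs' fractional share of the die."""
    return DIE_MM2 * share


for share in (0.280, 0.253):
    area = alu_area(share)
    print(f"{share:.1%} of die -> {area:.1f} mm2, {SP_GFLOPS / area:.1f} GFLOPs/mm2")
```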
Shouldn't that be 20 MAD/cycle?
Anyway, a quad of 5x1D units, capable of 40 MAD/cycle, is 1.8 mm2.
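The correction checks out arithmetically: a quad of four 5-wide units has 20 lanes, so it issues 20 MADs (40 FLOPs) per clock. A minimal Python sketch of that arithmetic, which also scales the quoted 1.8 mm2 per quad to the whole array; reading the 4x10 array as 40 quads (4 per SIMD, 10 SIMDs) and the 750 MHz clock are assumptions, not figures from the thread:

```python
# A "5x1D" unit has 5 scalar ALU lanes; a quad is 4 such units.
LANES_PER_UNIT = 5
UNITS_PER_QUAD = 4
FLOPS_PER_MAD = 2  # a multiply-add counts as two floating-point operations

mads_per_quad = LANES_PER_UNIT * UNITS_PER_QUAD   # MADs issued per clock
flops_per_quad = mads_per_quad * FLOPS_PER_MAD    # FLOPs per clock

# ASSUMPTIONS (not stated in the thread): 40 quads in the 4x10 array,
# and a 750 MHz core clock.
QUADS = 40
CLOCK_GHZ = 0.75
QUAD_MM2 = 1.8  # quad area quoted in the thread

print(mads_per_quad, flops_per_quad)  # 20 40
print(f"array area: {QUADS * QUAD_MM2:.1f} mm2")
print(f"peak: {QUADS * flops_per_quad * CLOCK_GHZ:.0f} GFLOPs")
```

Under those assumptions the array is 72 mm2 and the peak is 1200 GFLOPs, consistent with the 1200 GFLOPs figure quoted earlier.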
Well yes, but designing the ALUs for twice the clock would surely make them bigger, maybe not quite twice as big, but by a significant amount.
If ATI can figure out how to go asynchronous and double the clock without doubling the size for the next generation, we'll be looking at insane computational density.
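The trade-off in this exchange reduces to one ratio: throughput scales with clock, so compute density scales with clock divided by ALU area. A minimal Python sketch; the area overheads below are purely hypothetical placeholders, not estimates from the thread:

```python
def density_gain(clock_ratio, area_ratio):
    """Relative compute density after a redesign.

    clock_ratio: new clock / old clock
    area_ratio:  new ALU area / old ALU area (HYPOTHETICAL overhead of
                 designing the ALUs for the higher clock)
    Throughput scales with clock, so density scales with clock / area.
    """
    return clock_ratio / area_ratio


# HYPOTHETICAL area overheads for running the ALUs at twice the clock.
for area_ratio in (1.0, 1.3, 1.6, 2.0):
    print(f"area x{area_ratio}: density x{density_gain(2.0, area_ratio):.2f}")
```

Doubling the clock only pays off in density if the ALUs grow by less than 2x; at exactly 2x the gain vanishes.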