HD 5000 series: New architecture or more a major refresh?

Gotta agree with what everyone else has said, refresh.
Moreover, in terms of architecture, I actually consider rv7xx an even more significant refresh than rv8xx, even though it didn't really get us anything new in terms of features. But the reshuffling of blocks (tmus no longer shared across simds, each simd now has its own tmu) and the ditching of the ring bus are imho very major architectural changes. Not to mention the simds got much smaller, though that's probably more optimization than architectural change (though, I suppose, part of it also has to do with the reorganization of the tmus, which themselves are also much simpler and more effective). RV7xx also brought changes in the ROPs.
 
I don't have one to measure throughput/latency of read after write vs. read from memory (would be interested in the results though) but in the Siggraph Asia PDF they said the global memory cache was R/W with atomics.

Thanks for pointing it out. It is indeed an r/w cache. However, it is so small that the global mem is as good as uncached.
 
Creating a suboptimal schedule for your main market for a contract you might never get is rather risky ... so no, I don't think the consoles play a huge part in it.

Not necessarily. Let's imagine a worst case situation where ATI (for argument's sake) got the contract for the next XBOX while NV didn't. Let's also imagine Sony isn't going with discrete graphics, opting for Cell2 and that Nintendo doesn't like graphics and sticks with ASCII art.

Even without the contract, NV knows ATI's desktop parts will follow the next XBOX, so NV will follow it (the XBOX) as well.

PS. if Microsoft/Sony had promised some major contracts already would that show up in financial filings?

Probably in a generic R&D listing. I'm sure LRB getting downsized is a symptom that *something* is happening on this front.
 
GT200 added double precision and doubled register file capacity per SM. Nvidia claims TMUs were tweaked as well. Memory coalescing was also greatly improved over G80. I don't see Cypress being much more of a change.
DP is quite insignificant given the way they added it.

The architecture remains exactly the same with one more SIMD and these ridiculous DP units, which is less than Cypress vs RV790, even if almost everything new from a functional point of view is due to DX11 compliance.
 
Those monkeys would agree with us too: http://www.youtube.com/watch?v=1DPQW0e9ufM

Lol, that's hilarious.

The architecture remains exactly the same with one more SIMD and these ridiculous DP units, which is less than Cypress vs RV790, even if almost everything new from a functional point of view is due to DX11 compliance.

What about Cypress do you consider to be a significant change? Just because you don't like how they implemented DP in GT200 it doesn't mean it wasn't a big change architecturally.
 
DP support for GT200 was a side-strapped addition to the same multiprocessor configuration. It was no more usable than the FP32 processing capacity in NV30/35 -- just something for developers to play with until the real thing (GF100), but by no means usable for real applications.
 
What about Cypress do you consider to be a significant change? Just because you don't like how they implemented DP in GT200 it doesn't mean it wasn't a big change architecturally.
How can you say with a straight face this was a big change architecturally? Regardless of performance, the way it was added isn't a big change in my book. I'd only count it as a big change if, for instance, instruction issue had been revamped for it, but that doesn't seem to be the case.
I'd consider the increased register files, buffers as well as the increase in SMs per TPC probably more significant (and heck other GT2xx members don't even have that DP block), not forgetting the more easily found missing mul, and things like the shared memory atomics. All things considered, it's still a somewhat major refresh (not a hugely successful one mind you, but still).
Cypress doesn't have any huge changes either (as I said, I consider rv770 a more significant change from an architecture point of view); maybe I'd consider the removal of the fixed function interpolators the biggest. Still, it has quite a few changes, from tweaked alus to different LDS/GDS and the obvious new dx11 tessellation stuff.
 
GT200 added double precision and doubled register file capacity per SM. Nvidia claims TMUs were tweaked as well. Memory coalescing was also greatly improved over G80. I don't see Cypress being much more of a change.

Yep I am in total agreement with Trini on this one.. more of an upgrade/tweak/refresh than "new architecture" ..


**Edit: IMO, this would be an interesting question for a poll. !! **
 
Big relative to the changes in Cypress that it's being compared to.
There were some additions to the base architecture introduced with G84, but they were... additions.

Cypress has improved ALUs in addition to DX11 prerequisites, and even if they are not that significant from a performance perspective, ALUs were redesigned to fit the 32kB memory blocks. FMA is probably the biggest change, but it's not the only one.

TMUs and ROPs also got tweaked to accommodate new data formats.
 
I would have to call it a refresh rather than a new architecture. How many major architectures have they had?

R100? R200? R300? R600? and rumoured R900? I figure it to be around 4 released but I could be wrong about the earlier architectures.
 
Dot product ops being able to co-issue in a single VLIW is also a significant upgrade. ;)
Yes, that's what I wanted to say. And the co-issue of dependent operations works for adds, muls and muladds too (but just in pairs)! So in some sense, for the basic operations Cypress doubles the effective frequency of the shader units (by halving the VLIW width). Cypress has 320 VLIW units, but can process 640 dependent madds, adds or muls per clock. That has the potential to double the ALU utilization for low ILP workloads. You can consider this some kind of side effect of moving interpolation to the ALUs, but I think this feature is pretty cool.

@CarstenS: Looks like you were right about the ALU reorder stuff ;)
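
The 320 vs. 640 arithmetic above can be put into a toy model. This is only a sketch under loud assumptions: the function names are made up for illustration, a "chain" is a purely serial run of madds where each one needs the previous result, and the only thing modelled is how many dependent ops fit into one VLIW bundle (1 without co-issue, 2 with the pairing described in the post). It says nothing about real scheduling, latencies or clocks.

```python
# Toy model, not a hardware simulator: it only counts issue cycles for a
# serial chain of dependent ops (each op needs the result of the previous one).

VLIW_UNITS = 320  # Cypress figure quoted in the post above


def cycles_for_dependent_chain(chain_length: int, dependent_ops_per_bundle: int) -> int:
    """Cycles one VLIW unit needs to issue a serial chain of dependent ops."""
    # Ceiling division: a partly filled bundle still costs a full cycle.
    return -(-chain_length // dependent_ops_per_bundle)


def dependent_ops_per_clock(units: int, dependent_ops_per_bundle: int) -> int:
    """Chip-wide dependent ops per clock when every unit runs such a chain."""
    return units * dependent_ops_per_bundle


if __name__ == "__main__":
    # Assumption: 1 dependent op per bundle without co-issue, 2 with pairing.
    for label, per_bundle in (("no co-issue", 1), ("paired co-issue", 2)):
        print(label,
              dependent_ops_per_clock(VLIW_UNITS, per_bundle), "dependent ops/clock,",
              cycles_for_dependent_chain(8, per_bundle), "cycles for an 8-op chain")
```

In this model the paired case halves the cycle count of a fully serial chain, which is where the "doubles the ALU utilization for low ILP workloads" estimate comes from; code that already fills the VLIW slots with independent work would see no such gain.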
 
The texture samplers actually got a step backward with halved L1 cache of 8KB, down from 16KB found in RV770.
 
The texture samplers actually got a step backward with halved L1 cache of 8KB, down from 16KB found in RV770.
There is quite a bit of contradictory information about the L1 size of RV770. Some people appear to claim it also had only 8kB per SIMD. The measurements hinting at 16kB for RV770 are a bit difficult to interpret without knowing the oddities of the specific implementation (like the replacement policy of the caches). I think I've seen such a test somewhere on a French site which showed roughly the same behaviour for Cypress as for RV770, i.e. a drop occurs at 16kB. If the latency hiding works well enough, it may be possible to feed the TMUs partly from L1 and partly from L2, resulting in the same aggregate bandwidth as feeding them from L1 only, in some intermediate regime stretching quite a bit over 8kB. But that is just a guess, can't find that test right now.
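
To illustrate why the replacement policy matters when reading a cache size off such a test, here is a toy simulation. Everything in it is an assumption for illustration only: a fully associative cache, a 64-byte line, a plain linear sweep as the access pattern, and the two policies compared (LRU and random). None of it is taken from RV770 or Cypress documentation.

```python
import random
from collections import OrderedDict

LINE_BYTES = 64  # assumed line size, not a documented RV770/Cypress value


def hit_rate(cache_bytes, footprint_bytes, policy="lru", passes=20, seed=0):
    """Hit rate of repeated linear sweeps over a footprint, after one warm-up pass."""
    capacity = cache_bytes // LINE_BYTES
    lines = footprint_bytes // LINE_BYTES
    rng = random.Random(seed)
    cache = OrderedDict()  # keys are line indices, oldest entry first
    hits = accesses = 0
    for sweep in range(passes + 1):
        for line in range(lines):
            hit = line in cache
            if hit:
                if policy == "lru":
                    cache.move_to_end(line)  # mark as most recently used
            else:
                if len(cache) >= capacity:
                    if policy == "lru":
                        cache.popitem(last=False)  # evict least recently used
                    else:
                        del cache[rng.choice(list(cache))]  # evict a random line
                cache[line] = True
            if sweep > 0:  # sweep 0 is warm-up and is not counted
                accesses += 1
                hits += hit
    return hits / accesses


if __name__ == "__main__":
    for kb in (4, 8, 12, 16, 24):
        print(kb, "kB footprint:",
              "lru", round(hit_rate(8 * 1024, kb * 1024, "lru"), 2),
              "random", round(hit_rate(8 * 1024, kb * 1024, "random"), 2))
```

With LRU and a cyclic sweep the hit rate in this model falls straight to zero once the footprint exceeds the cache, while random replacement degrades gradually, so the "knee" in a measured curve need not sit exactly at the cache size. It does not model the L1-plus-L2 feeding guessed at above.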
 
I would have to call it a refresh rather than a new architecture. How many major architectures have they had?

R100? R200? R300? R600? and rumoured R900? I figure it to be around 4 released but I could be wrong about the earlier architectures.
Cypress is a G70+G71 style refresh. It's not heavy, but it's more than average.
RV770 was an NV40 style refresh. It's heavy.
R600 was an NV30 style new architecture. It's ugly :eek:

Nothing prior to R300/NV30 can really be included since all these GPUs were almost 100% FF.

R300 and NV30 were 2 new architectures.
R600 and G80 were the "second generation" programmable GPU architectures.
Fermi/GF100 at least seems to be the first "third generation" programmable GPU architecture.

The only other real "new architecture" is what gave birth to R520 and R580. Hell, I still wonder why R600 was that late since R500 was almost identical and was introduced before R520...
 
process woes on a huge chip. I have trouble going back and realizing it's a 512bit bus, as it had the performance of a 256bit part :)

it's not unlike Phenom or Fermi.
 
Cypress is a G70+G71 style refresh. It's not heavy, but it's more than average.
RV770 was an NV40 style refresh. It's heavy.
R600 was an NV30 style new architecture. It's ugly :eek:

Nothing prior to R300/NV30 can really be included since all these GPUs were almost 100% FF.

R300 and NV30 were 2 new architectures.
R600 and G80 were the "second generation" programmable GPU architectures.
Fermi/GF100 at least seems to be the first "third generation" programmable GPU architecture.

The only other real "new architecture" is what gave birth to R520 and R580. Hell, I still wonder why R600 was that late since R500 was almost identical and was introduced before R520...

Yeah, I guess everything prior to R300 would have to be considered a new architecture per generation. But it is hard to distinguish, as even a new architecture still shares some commonalities with previous architectures, for example the R600 with the R5xx series.

R600 was late because of process problems for ATI. That also explains why RV670 was 'early', if the latter was being worked on before the former was properly released.

Unfortunately there's no real firm definition of what constitutes a refresh or a new architecture, so it's really up to the judgement of the individual. That makes reaching an accord difficult, however.
 