Hmm, this is kinda interesting. Remember how R600 was intended to be only a moderate performance improvement over R580? This "rumour" seems to suggest that the D3D10.1->11 transition (if this is truly D3D11) is going to cause the same kind of mediocre increase in performance.
dx9->dx10 was an incomparable transition, wasn't it? We can't even have both on the same system.
On the other hand, DX11 requirements like the 32kB local cache (L1) per SP cluster are simply doubled up from the DX10 specs. Not to mention the real-time tessellation and FP64 requirements. With all that, it shouldn't be just a minor update. Big wishes.
And yes, dx8 (R200)->dx9 (R300), dx9 (R350)->dx9b (R420) and dx9b (R420)->dx9c (R580) each doubled performance. And TSMC has now joined the latest-lithography club, which it wasn't part of only two years back, so from that we can expect more flexibility in ATi's designs when it comes to easily doubling performance. All we need now is enough SPUs to double performance again (DX11).
In addition, chips destined for X2 boards would have been culled from the part of the pool of chips that had better than average power characteristics, leaving the thermally less desirable chips for the single-chip cards.
So yes, one could say that the chips on X2 boards burn less power, because more power hungry chips aren't allowed on X2 boards.
Exactly, it's all in the chip binning and sorting. It's time-consuming, and the same principles couldn't be applied to an MCM solution: once it's packaged, an MCM that doesn't meet its TDP can only be sold as a lower-grade part, so if they used MCMs they would still have to produce better and worse products, and even weirder performance-per-watt configurations, to get close to the 100% yield that is the reason for small-die MCM solutions in the first place.
So in fact the whole palette would have to be based on the MCM, which would consume even more power than some beefier, fully functional single chip whenever a big part of the MCM is disabled, all so they have little to nothing to waste. It's devious marketing from my point of view: premium products would still be premium, while the mainstream-level buyer pays the price for extra power consumption that could have been avoided.
Somebody beat me to my speculations:
http://www.hardware-infos.com/news.php?news=2908
But I can hardly see why it needs a damn 32 ROPs when that trash is supposed to work @900MHz with only 50% more shaders, as many are expecting (only 2.16 TFlops, not even double RV770's computational power). It would be total crap, and I hope Intel's Larrabee easily beats it into the ground. It's just too many RBEs and too few shaders for the proposed clock. This chip should go @1.2GHz or somewhere near that to have respectable computational power, and at that speed even 4 RBEs would be more than enough. Or maybe it's a sub-90W TDP part; then it's a pretty remarkable chip, I must add. Still waiting for a 32nm power-conscious RV840.
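For what it's worth, here's a minimal back-of-the-envelope sketch of where that 2.16 TFlops figure comes from, assuming the usual 2 FLOPs per SP per clock (one MAD) and RV770's 800 SPs @750MHz as the baseline:

```python
# Peak shader throughput: SPs x 2 FLOPs (MAD) x clock.
def peak_tflops(sp_count, clock_ghz):
    return sp_count * 2 * clock_ghz / 1000.0  # GFLOPs -> TFLOPs

rv770 = peak_tflops(800, 0.750)    # HD4870 baseline: 1.20 TFlops
rumour = peak_tflops(1200, 0.900)  # 50% more shaders @900MHz: 2.16 TFlops

print(f"RV770:  {rv770:.2f} TFlops")
print(f"Rumour: {rumour:.2f} TFlops ({rumour / rv770:.2f}x RV770, not even doubled)")
```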
$649 cards are a sweet spot? Everyone knows that sales at those prices were a small percentage versus the rest of the market. The sweet spot has always been below the high-price/high-performance parts. Placing a part at 80-90% of that performance for 40% of the price seems more like 'finding a sweet spot' than marketing your new product as 'previous high-end price + $50'.
You beat me to my idea. The HD5770 for me should also be $299, but only if they reach the 3-4 TFlops scale as I presume. Anything below that could sell at $199 in my opinion. Not that it will ever happen.
What makes you so sure that it won't be AMD (again!) who misses the "sweet spot" this time? How many times did they miss that spot over the last few years? How many times did NV? RV770 is a great chip, but let's not forget that it's great in comparison to its competition. And since you don't know what that competition will be, you can't know the "sweet spot" we're talking about. You may hope, but that doesn't mean your hopes will become reality. RV770 is just one chip; you can't assume that RV870 will repeat its success just because RV770 was great. G80 was great, GT200 was pants. Voodoo Graphics was great, Voodoo 3 was pants. The market always changes.
That's basically not true, because nV needs to revise their architecture, which has seen basically only small upgrades since the pain-in-the-ass NV30 series, followed by the great NV40 and then totally revamped in G70. For now nV first needs to learn how to walk again, and let's hope they don't end up learning how to crawl like with the NV30 inbred. They are in fact on life support, since they present GT200 as the prodigy of a dying architecture. They've had glory days since G70, and those are now fading away. They realised it a year ago when they trash-talked AMD & Intel, presenting x86 as a dinosaur encumbered by its age and their glorious CUDA as its successor.
Why is that bothering you? 48 TUs with 4 TUs per cluster means 12 clusters. 1200/12 = 100 SPs per cluster, which is 4 superscalar units more than in an RV770 cluster.
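That arithmetic spelled out, assuming RV770's 5-wide VLIW "superscalar" units (16 per cluster in RV770) carry over:

```python
# Cluster breakdown for the rumoured 48-TU / 1200-SP configuration,
# assuming RV770-style 5-wide VLIW ("superscalar") units.
tus, tus_per_cluster, total_sps = 48, 4, 1200

clusters = tus // tus_per_cluster        # 12 clusters
sps_per_cluster = total_sps // clusters  # 100 SPs per cluster
vliw_units = sps_per_cluster // 5        # 20 VLIW units per cluster
rv770_vliw_units = 80 // 5               # 16 VLIW units per RV770 cluster

print(clusters, sps_per_cluster, vliw_units - rv770_vliw_units)  # 12 100 4
```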
DX11, DX11... and that would put it at 190% of RV770's computational power, hopefully at 60% of RV770's TDP.
Doesn't fit with 48 TMUs.
Everything fits, especially now with these every-part-of-the-GPU-for-itself specifications. And as has already been said:
Because in speculation land, anything is possible.
An increased SIMD width would be a serious miscalculation IMHO. NV is at 32 (and attempting to reduce it), LRB is at 16 (vector masking), and AMD would be going for 100 (assuming the 48 TUs are tied to 1200 SPs, so 1 TU serving a 100-wide SIMD in packets of 4).
I also thought they should split their 16 SIMDs (80 SPs) per cluster in two, but that's still not the way to go given all the shading needs (L1 doubled from DX10), and it's not really easy to have too many independent pipelines/clusters. So for now, even building it up to 16 SIMDs x 8 SPs per cluster = 128 SPs would be a way to build up processing power, in my opinion. nV & Intel have pretty obscure architectures, I might add: one has been with us for 6.5 years, and one is making a 'pretty inventive' comeback after 15 years, not to mention how flaky it was in its prime.
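For scale, the per-cluster layouts being tossed around here; only the first line is RV770's real organisation, the other two are this post's speculation:

```python
# Per-cluster SP totals: (SIMDs per cluster, SPs per SIMD).
layouts = {
    "RV770 (actual)":    (16, 5),  # 16 x 5 = 80 SPs per cluster
    "split in two":      (8, 5),   # halves the cluster, doubles the cluster count
    "speculated 8-wide": (16, 8),  # 16 x 8 = 128 SPs per cluster
}
for name, (simds, width) in layouts.items():
    print(f"{name}: {simds} x {width} = {simds * width} SPs per cluster")
```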
It's not that they need to do more. I think an RV770 or 740 or whatever with a new tessellation unit and maybe bigger local cache in the SIMDs might even be enough to be DX11 compliant.
I just don't expect all this time to have passed and for there not to be other improvements. Not "major features" but some architectural bits and bobs and tuning and re-thinking ratios to squeeze greater perf per watt, mm2, and transistor than before.
That's what we should be afraid of: just a shiny facelift on an old engine. DX11 compliance without any real boost to the internal SIMD structure design needed for real FP64 power in DX11. Just a prototype, like the ill-born NV30. But why do that when they can easily scale the whole thing up? Probably they're thinking of the crisis and lower cash inflow, so they don't want to give us the revamped architecture we could get on a sub-300mm2 40nm chip. Not just yet. It's too sad to even think about.
It's like comparing the "transistor count" of RV630 and RV635. RV635 gained D3D10.1 functionality and supposedly lost 12M transistors. Transistor count isn't much use though.
They might have lost them in the R6xx architecture's shared L1, or through some advancement in L2 production tech, or even by trimming some weight there.
For a more significant performance increase, the die has to be bigger, too. A ~400mm2 die is sufficient for a 512-bit bus.
What would you like to put under the hood of your RV870 that needs 400mm2? 80 TMUs and 32 ROPs? We don't need that much if we're transitioning to Intel's proposed sick ray tracing instead of the abundant texture mapping we're used to.
I'd prefer more parallax occlusion mapping (has any game ever used that??) over Intel's ill-begotten ray tracing, which they need to compensate for the traditionally poor TMU performance of their so-called GPU.
If GT300 is radical then maybe they've fixed all the wastefulness in GT200 and don't need such huge amounts of TMUs/ROPs.
If rumours of ALUs being more sophisticated are true then the implication is that they'll take yet more area per FLOP. NVidia may only be able to afford that kind of extravagance if the TMUs/ROPs go on an extreme diet. Though the wastefulness is only in the region of 30-50% I reckon, so there isn't a monster saving to be made there.
Of course GT300 should be a radical change, because GT200 is on the extinction list and really can't gain much performance with the kind of self-replicating complex clusters they inherited from G70. Luckily for them, they jumped from 192 SPs/6 clusters to 240 SPs/10 clusters in the past, because otherwise, as we saw, they would have sucked badly with a 192-SP top product that consumed as much as an HD4870 with 10-15% less performance. Luckily for them, they did that even without the insight that ATi would beef up its clusters from 8 to 10. And while ATi did that for a penny, for Nvidia it was a pretty expensive thing to go from a 400mm2 to an over-500mm2 die.
And finally, in case somebody missed it, my ultimate RV870 architecture would be:
2048 SPUs (16 clusters x 16 SIMDs x 8 32-bit SPUs, 2 of them being the big ALUs), easily scalable for the FP64 operations needed in DX11
48 TMUs (or maybe 64 TMUs for better texturing capability in older, texture-heavy games)
24 ROPs (6 RBEs), or some 32 simpler DX11 ROPs (4 RBEs), since I don't know what DX11 specifies
Ring bus with 256-bit 1GB @1.25GHz for the HD5750 part ($249) and 384-bit 1.5GB @1.25GHz for the HD5770 part (or 256-bit 1GB @1.40GHz) ($299)
6MB+ L2 cache (up from RV770's 4MB+)
Core clock: @700-750MHz (2.87-3.07 TFlops) for the HD5750 and @900-950MHz (3.7-3.9 TFlops) for the HD5770
Considering some issues with a large number of SPUs per cluster, we might instead see a design reuse with only 12 SIMDs x 8 SPUs per cluster, but that would total only 1536 SPUs (2.9 TFlops @950MHz), with wasted die space and wasted memory bandwidth, which could then be cut down to 1.25GHz 256-bit for the HD5770 top model, and it would need crazy yields to stay in the performance dome. And it's futile: if 320 DX10 SPs cost only about 22mm2, this wouldn't save more than 34mm2, and it's better to have a beefier core that you can partially disable, or cripple with slower GDDR5, than something that simply doesn't reach its peak.
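A quick sanity check on my own numbers above, with the same 2 FLOPs per SP per clock assumption as before:

```python
# Peak-FLOPS check for the speculated RV870 configs.
def tflops(sps, ghz):
    return sps * 2 * ghz / 1000.0  # 2 FLOPs per SP per clock (MAD)

print(f"HD5750, 2048 SPs @0.70-0.75GHz: {tflops(2048, 0.70):.2f}-{tflops(2048, 0.75):.2f} TFlops")
print(f"HD5770, 2048 SPs @0.90-0.95GHz: {tflops(2048, 0.90):.2f}-{tflops(2048, 0.95):.2f} TFlops")
print(f"Reuse design, 1536 SPs @0.95GHz: {tflops(1536, 0.95):.2f} TFlops")
```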