AMD: R9xx Speculation

Folks, may I remind you all that this is a speculation thread about the future ATI architecture R9xx, not a "general talk about recent ATI cards."

These threads are already hard to follow as is; let's not spend pages arguing about the current generation.


It's now rumored for a before-the-end-of-the-year release? I'm still hearing early Q1 2011, for what it's worth. Then again, a few cards finding their way onto shelves in December isn't a far-fetched hypothesis.

And 1920 SPs would represent a ~20% increase over RV870's 1600 SPs. Then again, not all SPs are made equal, so that figure in itself isn't very telling.
I don't want to boldly contradict, but isn't the increase bigger than it looks?
I mean, RV870 consists of 320 5-wide processors (sorry, I forgot the ATI nomenclature...).
1920 would equal 480 4-wide processors.
That's in fact a 50% increase in unit count. If AMD keeps the same "organisation" as now, that would be 30 vs. 20 SIMD arrays.
 
I don't want to boldly contradict, but isn't the increase bigger than it looks?[...]
That's part of the reason I don't buy into that, yet. We would very quickly be in 400+mm² territory.
I think a somewhat milder update would be enough to beat a GTX 480: 2x12 SIMDs (384 VLIW units × 5 = 1920 SPs, or 1536 if you don't count the t-unit, should it lose its MADD), two parallel setup engines to better feed the two rasterizers Cypress already has when triangles are smaller than 32 pixels, and finally a fix for the tessellation weakness Cypress shows (its triangle rate with tessellation is only a third of its triangle throughput without).
Or, if AMD really wants to beat Fermi's geometry performance in all the theoretical tests too, they could go 4x6 SIMDs with 4 parallel setup engines and 4 rasterizers reduced to 8 pixels/clock like Fermi's. But I think that would already be overkill for SI. One can save that for 28nm.
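For anyone checking the arithmetic, a minimal sketch of the unit counting behind those configurations (the layouts are the speculated ones from this discussion, not confirmed specs; only the 16 VLIW units per SIMD array is known from Cypress):

Code:
#include <cstdio>

// SPs = SIMD arrays x VLIW units per array x lanes per unit.
// Cypress uses 16 VLIW units per SIMD array; the other layouts
// are the speculated ones from the posts above.
static int sps(int simds, int unitsPerSimd, int lanes) {
    return simds * unitsPerSimd * lanes;
}

int main() {
    printf("Cypress (20 SIMDs, VLIW5):  %d SPs\n", sps(20, 16, 5)); // 1600
    printf("VLIW4 rumor (30 SIMDs):     %d SPs\n", sps(30, 16, 4)); // 1920
    printf("2x12 SIMDs, VLIW5:          %d SPs\n", sps(24, 16, 5)); // 1920
    printf("  ...not counting t-units:  %d SPs\n", sps(24, 16, 4)); // 1536
    return 0;
}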
Could AMD go with two t-units per 4-wide processor? I mean, would it make sense?
No, it makes no sense. GF100 has an 8:1 ALU:SFU ratio, GF106 6:1. Going 2:1 would waste a lot of die space without a corresponding performance gain.

Btw., does anyone know why SI is now supposed to contain the shader updates rumored for NI? Is it the same architecture after all, with SI produced on TSMC 40nm and NI planned for 28nm GF (with doubled shader count for the top model)?
 
I believe flight simulators sometimes use DP…
If that's true, then they only need it for the simulation model of the plane on the CPU. Rendering in DP is just pointless.

And I would even say that they should really invest more time in the numerical stability of their algorithms if they need DP for a game's simulation model.

Why not?
Let's say I use Jacket with my Matlab programs (MathWorks is already working on OpenCL support of their own); I need solid DP performance, yet don't have the money for a professional Tesla card.

With more applications taking advantage of the GPU, that might become a factor: much smaller than gaming performance, but a factor (another name that comes to mind is Pixel Bender).
I'm talking about consumer chips. I'm entirely with you on the point that a GTX 480 should be allowed to use its half-speed DP capability. But for chips only sold in the consumer market, it's just a waste of transistors.
 
Flight simulators need to use double precision because the size of the earth is large enough that single precision is only accurate to ±1 meter or so at one earth radius from the origin. This is fine for mountains and such, but what about simulating a country airstrip, where the runway is almost-but-not-quite level? A bump that's merely an inch high is quite noticeable when you're rolling over it at 100 mph during takeoff...
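You can sanity-check that figure directly; a minimal sketch, where the only input is the ~6,371 km earth radius:

Code:
#include <cmath>
#include <cstdio>

int main() {
    // Distance to the next representable float at one earth radius
    // from the origin, i.e. the best elevation resolution you can get
    // if positions are stored as single-precision world coordinates.
    float earthRadius = 6371000.0f; // meters
    float resolution = std::nextafter(earthRadius, INFINITY) - earthRadius;
    printf("float resolution at earth radius: %g m\n", resolution); // ~0.5 m
    return 0;
}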

In practice, you could cheat by placing a relative origin somewhere near the aircraft and moving it as needed every once in a while, allowing you to use single precision for elevations based on something nearby, such as sea level. This of course means you have to re-translate the entire terrain mesh once every few dozen miles, but is that really any more expensive than streaming it from the hard drive in the first place?
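A minimal sketch of that relative-origin trick (often called a "floating origin"; the names and the 20 km rebase threshold here are made up for illustration):

Code:
#include <cmath>

// Double-precision world positions, rebased so rendering can work
// in float relative to a local origin.
struct Vec3d { double x, y, z; };
struct Vec3f { float x, y, z; };

struct FloatingOrigin {
    Vec3d origin{0, 0, 0};  // current local origin in world space
    static constexpr double kRebaseDistance = 20000.0;  // ~20 km, arbitrary

    // Express a world-space position as a float offset from the origin.
    Vec3f toLocal(const Vec3d& p) const {
        return { float(p.x - origin.x),
                 float(p.y - origin.y),
                 float(p.z - origin.z) };
    }

    // Move the origin under the aircraft once it wanders too far;
    // everything handed to the GPU is then re-expressed relative to it
    // (this is the "re-translate the terrain mesh" step).
    bool maybeRebase(const Vec3d& aircraft) {
        double dx = aircraft.x - origin.x;
        double dy = aircraft.y - origin.y;
        double dz = aircraft.z - origin.z;
        if (std::sqrt(dx * dx + dy * dy + dz * dz) > kRebaseDistance) {
            origin = aircraft;
            return true;
        }
        return false;
    }
};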

Obviously, you'd want to load a coarse mesh (or set of meshes...) from a reasonably sized globe database, which should provide resolution to within a few dozen meters at the cost of a few GB of hard drive space. Finer detail would then be generated via tessellation and a fractal displacement map... You could also use a compute shader to analyze the terrain and place objects such as trees, bushes, houses, and so on with realistic placement. These would then go back through the tessellation pipeline, which would expand them from a few million points (one point per object) into a few million meshes, ranging from billboard complexity to thousands of polygons depending on distance/size.
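A sketch of the fractal-displacement idea, written CPU-side for clarity (in a real title this would live in the domain shader after tessellation; the hash-based value noise here is a generic placeholder, not any particular engine's):

Code:
#include <cmath>
#include <cstdint>

// Cheap, repeatable value noise from an integer lattice hash.
static float hashNoise(int x, int y) {
    uint32_t h = uint32_t(x) * 374761393u + uint32_t(y) * 668265263u;
    h = (h ^ (h >> 13)) * 1274126177u;
    return float(h & 0xFFFFFF) / float(0xFFFFFF) * 2.0f - 1.0f; // [-1, 1]
}

static float lerp(float a, float b, float t) { return a + (b - a) * t; }

// Bilinearly interpolated value noise at a point.
static float valueNoise(float x, float y) {
    int xi = int(std::floor(x)), yi = int(std::floor(y));
    float xf = x - xi, yf = y - yi;
    float n00 = hashNoise(xi, yi),     n10 = hashNoise(xi + 1, yi);
    float n01 = hashNoise(xi, yi + 1), n11 = hashNoise(xi + 1, yi + 1);
    return lerp(lerp(n00, n10, xf), lerp(n01, n11, xf), yf);
}

// Fractal (fBm) displacement: each octave adds finer, weaker detail,
// which is what lets tessellation refine a coarse terrain mesh.
float fractalHeight(float x, float y, int octaves = 6) {
    float sum = 0.0f, amplitude = 1.0f, frequency = 1.0f;
    for (int i = 0; i < octaves; ++i) {
        sum += amplitude * valueNoise(x * frequency, y * frequency);
        amplitude *= 0.5f;
        frequency *= 2.0f;
    }
    return sum;
}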

You could literally make a flight sim that would look like Crysis up close, yet allow you to fly around the world at 30,000 feet seamlessly... I wish someone would do this!
 
Flight simulators need to use double precision because[...]

Thanks, I remembered it was something like this, but couldn't have provided such a detailed explanation.

Arguably, this is kind of a niche, but it goes to show that DP isn't entirely useless for gaming.
 
Thanks, I remembered it was something like this[...]

I would like to see someone actually build this on the GPU, when you can write it in C++ like the rest of the complicated flight-sim code and have it run without compatibility or performance problems on both AMD and Intel CPUs.
Even today it's hard to find a game that uses more than 2 cores.
And if the code isn't really parallel, or needs to wait on (or feed) other threads running on the CPU, then the GPU+CPU combo is not the best choice. Maybe CPU+GPU performance (not just compatibility with other hardware) is also one of the reasons why no one makes "real" GPU physics, just stand-alone eye candy.
 
Well, according to AMD's conference call, ATI's next-gen GPUs will be out at the end of this year. I think that pretty much puts to rest the idea that they won't be out until next year.

Also, they expect to be supply constrained (wafers) through at least the end of this year. That's going to be a large part of what's preventing ATI from gaining even more market share from Nvidia. Nvidia has to be breathing a huge sigh of relief; things couldn't possibly be going any better for them, considering the problems with Fermi's delayed launch and low yields.

With that said, it's looking like a very real possibility that the 5870 will end up being EOL'd at a higher price than it launched at.

Regards,
SB
 
But, WHY???
Are they unable to get enough room at TSMC?
Is Nvidia blocking them, or is it simply low yields?

TSMC still appears to have problems producing enough 40nm wafers to meet demand. And rumor has it that Nvidia secured a large allotment of 40nm capacity sometime around Q4 2009, I believe.

Basically, ATI misjudged both how strong demand would be for the Radeon 5870 (they expected better competition from Fermi sooner) and TSMC's ability to produce 40nm wafers.

Nvidia basically took a gamble that they would not have to do any respins of Fermi. Had TSMC been able to ramp up 40nm production quickly, their move to secure a large allotment of wafers would have looked silly in the extreme. However, due to TSMC's problems, that potential blunder has turned into a very effective strategic block on AMD's ability to capitalize on having had no competition for half a year. TSMC's continued problems will also make it difficult for AMD to do much in Q3/Q4.

Hence the lack of price cuts (the price actually went up in order to damp demand). AMD has basically done the only thing they could, which is maintain good margins.

Regards,
SB
 
I'm still hearing an oct/nov launch.
Also those 1920SPs wouldn't be 4D+1D anymore but 4D, if I'm reading it correctly that would be a big boost to the base performance of Caymann.. ehrm.. if I got that name right. CH also suggest a 1/2 SP/DP (2Fat+2Thin SP) rate instead of the current 1/5 rate of Cypress.

(I have no idea if that last sentence made any sense)

That's not all that's in that thread; cncfc's post #68:
南岛肯定得A12了
一般上市最快都是最终版本芯片Tapeout6个月后
所以圣诞纸面的可能性非常大

HD6870现在的die size不超过400

(Southern Islands will definitely need an A12 revision. Generally the fastest a chip can launch is six months after the final revision tapes out, so a paper launch at Christmas is very likely.

HD6870's die size currently doesn't exceed 400mm².)

Previously cncfc said GF104's die size was 10% bigger than Cypress's (i.e. GF104 ~370mm²), so it looks like that chip and this new one, which is probably bigger again, will compete in the marketplace.

Napoleon's post #73 says this product was originally designed for 32nm and was backported when that node died, which it also has in common with GF104.

(Sidenote: I saw in the Anandtech review that they guessed GF104's die size at only 320mm², which somehow subsequently turned into "fact". This is the second time they have done this; Anand previously said GF100 was 480mm² when physically it is 530mm². If the site wants to keep its reputation, it should in future publish die-size guesses as ranges when it doesn't know, and not state absolute figures the reviewer has pulled out of thin air.)

(Sidenote 2: Congratulations to Nvidia on the glue holding down the metal cap; not a single reviewer has managed to remove it. They should sell it on their website, for things that you really want to never come apart again.)
 
http://seekingalpha.com/article/214...c-q2-2010-earnings-conference-call?part=qanda

Yes, good question. What we said in the opening comments is that we see the supply constraints [on TSMC 40nm] diminishing through the back half of the year, and we made that statement in the context of the accelerated Ontario ramp.

So we're capacity constrained and therefore had to make choices about which customers to serve to the utmost. We generated a lot of notebook design wins over the past couple of quarters from major OEMs, and once those design wins are done, the GPUs are actually designed down onto the motherboard; therefore, for the OEMs to ship, they need supply from AMD. So we leaned more towards the OEM notebooks in terms of supply.

What that means is we shipped a lower mix than what demand called for, because the notebook OEMs tend to use the lower two products in our four-product stack. In effect we didn't supply the AIB channel with all the volume we could have, and hence didn't supply the upper two products in the stack to the degree we could have, which brought our mix lower.

Remember, 16M chips have shipped on 40nm already.
 
Ballpark, how many wafers is that?

Assuming an average die-size mix of ~250mm²:
90% average yield: 69k wafers
80%: 78k
70%: 90k
60%: 104k
50%: 125k
40%: 156k
30%: 207k

So pick a yield, and scale by what you assume the average die size is. I've already factored in a 10% wafer-trim factor (basically unusable space on the wafer, which is a combination of geometric issues and minimum-spacing issues).
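A minimal sketch of the arithmetic behind that list (assumes 300mm wafers; the exact dies-per-wafer figure used above isn't stated, so this lands within a percent or two of those numbers):

Code:
#include <cstdio>

int main() {
    const double chips  = 16e6;   // 16M chips shipped on 40nm
    const double dieMm2 = 250.0;  // assumed average die size mix
    const double kPi    = 3.14159265358979;
    const double waferMm2 = kPi * 150.0 * 150.0;  // 300mm wafer area
    const double trim   = 0.10;   // unusable edge/spacing area

    // Gross die candidates per wafer after the trim factor.
    int diesPerWafer = int(waferMm2 * (1.0 - trim) / dieMm2); // ~254

    for (int pct = 90; pct >= 30; pct -= 10) {
        double wafers = chips / (diesPerWafer * (pct / 100.0));
        printf("%d%% yield: %.0fk wafers\n", pct, wafers / 1000.0);
    }
    return 0;
}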
 
What I've been pondering on lately is the following:
Everyone seems to be assuming that AMD is going to rectify their geometry performance and beat, match, or at least come close to what Fermi is offering.

There are a few arguments that make me wonder whether this is really a priority at AMD.

First, they seemed quite content with their tessellation performance when they designed their top three chips with the same fixed-function hardware. And arguably it doesn't seem to be a major bottleneck in currently shipping DX11 titles (yes, actual games, that is).

Second, Nvidia was long rumored to be doing "soft tessellation", implying rather lackluster performance. Now, obviously I don't know whether AMD itself was misled by that too, but since even SemiAccurate, and before that The Inquirer, kept trumpeting how abysmally GF100 would perform when faced with DX11 workloads, I'd at least consider the possibility.

Third, Nvidia's geometry performance wasn't really something to boast about before Fermi. IIRC, before the new architecture Nvidia's chips were capable of 0.5 drawn triangles per clock, whereas at least the higher-end Radeons could achieve a theoretical rate of 1.0. That doesn't really point in the direction of the Santa Clarans being about to invest really heavily in this area.

Fourth, according to Nvidia, the distributed tri-setup and raster grew the whole GF100 chip by 10 percent. That's probably marketing, but I tend to believe it wouldn't be as cheap as single-digit square millimeters to incorporate that feature. Come to think of it, they seemed quite proud of having succeeded at all, so it's probably no minor task you can throw into a largely finished design.

The question I'm asking is how likely it is that AMD was willing to invest major resources into a feature mainly used today in Unigine, Stone Giant, and some SDK samples. I really can't assess how much upcoming games are going to stress tessellation performance, but of the few currently available DX11 games, I think BattleForge and BF: Bad Company 2 don't use tessellation at all.
 
The question I'm asking is how likely it is that AMD was willing to invest major resources into a feature[...]

But on the other side, maybe games like AvP and Dirt 2 would have much more tessellation if tessellation performance were at GF100's level. I think it's much easier for developers to implement and play with a feature that's less limited (i.e. something that's nearly free, or a 10 percent performance hit at most).
And if AMD doesn't improve tessellation, it leaves the 6000-series cards exposed to Nvidia's tessellation bashing and "DX11 done right" slogans for the whole year. AMD knows this very well, so in my opinion they will improve tessellation.
 
I highly doubt much is going to be done about geometry performance for SI. I think it's far more likely that NI will do something about it, as NI is supposed to feature more radical changes (when both are compared to Evergreen).

I'm hoping tessellation and geometry performance continue to go up; I would really love to have games using them more, and for more things. I truly think this has the greatest potential to improve 3D graphics since programmable shaders were introduced.

I long for the day when the illusion of a 3D wall, floor, heck, any surface isn't completely and totally ruined any time you look at it from an angle or get too close, because it won't be just texture tricks anymore.

Regards,
SB
 