ATI following in NVIDIA's footsteps?

Jawed said:
I think the new memory "system" in R520 is ATI's secret weapon. What if it turns out to perform like a 512-bit bus, but is in fact only a 256-bit bus? :D

And what if R520 runs at 700MHz core? :)

And what if we get R580, not R520?... :LOL:

Has anyone run Ruby 2 on 7800GTX, by the way? What's the performance like?

Jawed

And then we woke up. :LOL:

Oh, memory bus; oh, memory bus. . . :D

Respinning for clocks (Wavey's "later" spins observation) is a little worrisome tho.
 
geo said:
Tho, generally, I agree with you in the larger point and have said so many times. . .NV30 wasn't so much bad as R300 was really just exceptional; "discontinuity" exceptional even, IMHO. Tho that statement is liable to summon Uttar to explain why NV30 really was a disappointment on its own terms too. :LOL:
Can I, can I? :) Seriously though, you gotta be kidding yourself if you think the R300 was so good that NV30 could be considered okay. The sheer number of last-second changes speaks for itself (even disregarding bad long-term design decisions). There were also failed respins and risk production runs, but that's another story entirely. That isn't to say the R300 isn't a wonderful design, though.

If anything, the NV30 is what made NVIDIA realize that "6 month product cycles" were a thing of the past. As for the memory bus thing - may I remind you all that the NV30 was supposed to have such efficient compression that its huge bandwidth disadvantage wouldn't truly matter. Should you still believe it had better compression today (and I actually believe the contrary, because ATI had a few very nice Z-related tricks that NVIDIA didn't have in that timeframe, afaik), then you'd have to argue that GDDR2 was the cause of all their problems because their memory controller wasn't too good at it yet. Which would be even more ironic considering how much they bragged about how good it was for about 6 months, criticizing ATI for making a "cheap" implementation (compatibility mode), and ultimately that was proved untrue.
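For a rough sense of how that compression argument works, here's a toy sketch (Python, with made-up numbers that are NOT actual NV30/R300 specs) of how much lossless framebuffer compression a narrow bus would need to close a wide-bus gap:

Code:
# Illustrative only: how framebuffer compression trades against raw bus width.
# All numbers are invented for the example; they are not actual NV30/R300 specs.

def raw_bandwidth_gbs(bus_width_bits, effective_mhz):
    # bytes per transfer * transfers per second, in GB/s
    return bus_width_bits / 8 * effective_mhz * 1e6 / 1e9

def effective_bandwidth_gbs(raw_gbs, compression_ratio, compressible_fraction):
    # only part of the traffic (colour/Z) compresses; the rest (textures etc.) doesn't
    compressed = raw_gbs * compressible_fraction * compression_ratio
    uncompressed = raw_gbs * (1 - compressible_fraction)
    return compressed + uncompressed

narrow = raw_bandwidth_gbs(128, 1000)   # hypothetical narrow, fast bus: 16.0 GB/s
wide   = raw_bandwidth_gbs(256, 600)    # hypothetical wide, slower bus: 19.2 GB/s

# 2:1 compression on half the traffic lifts the narrow bus to 24 GB/s effective,
# so the whole argument hinges on how good the compression really is in practice.
print(narrow, wide, effective_bandwidth_gbs(narrow, 2.0, 0.5))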

Anyhow, none of this is the core problem here. ATI thought they weren't making the same mistake NVIDIA did with the NV2A. From my point of view, they completely misunderstood the problems NVIDIA had and created even bigger ones. They are lucky to be in a better position than NVIDIA was, and to have many more engineers than NVIDIA did, as well as to have a more "extendable" architecture than NVIDIA had in the NV2x->NV3x timeframe. Because otherwise, the RISK of it becoming as big of a debacle for ATI would have been huge, and the R520 could indeed have become a new NV3x. Because of these many factors, however, I'm confident in ATI's ability to bring out a perfectly good chip at "acceptable" yields, one that would most likely beat the G70U in at least several ways. Three months later, though.

So, what is it that I'm saying ATI didn't understand properly? Fundamentally, the kind of agreement they got with Microsoft seems to require fewer resources, because they "only" design the chip. So, from an expense pov, it's a lot better than the deal NVIDIA had. And the "licensing" system means price disagreements are very unlikely.
The problem is that what hurt NVIDIA in the NV2x->NV3x timeframe wasn't expenses or price disagreements. It was chip design. Simply put, they put too much effort into NV2x/NV2A + chipset design, and thus had few engineers on NV3x in the early stages, while still wanting to deliver in a very aggressive timeframe (Spring 2002 part, anyone?). This, among other factors, created a lot of problems, which I have neither the motivation nor the time to get into here. Simply put, it wasn't pretty.

The reason I'm saying it's worse for the R500/R520 is simple: they aren't the same chips at all, and there's a Nintendo team running at the same time. On the plus side of things, this means that the R600 will definitely benefit from all this, but I'd estimate the "best engineers" currently aren't working on the R520. And even though quantity can be a quality, in engineering, it tends not to be.


Uttar
 
Uttar said:
. . .then you'd have to argue that GDDR2 was the cause of all their problems because their memory controller wasn't too good at it yet. Which would be even more ironic considering how much they bragged about how good it was for about 6 months, criticizing ATI for making a "cheap" implementation (compatibility mode), and ultimately that was proved untrue.

Is that you, Marley? :LOL:

Oh, whoa, shades of furballs past! I'd totally forgotten about how intense and testy that particular discussion got. Kinda like the internal/external pcie/agp bridge discussion.
 
Uttar said:
. . . but I'd estimate the "best engineers" currently aren't working on the R520. And even though quantity can be a quality, in engineering, it tends not to be.

I think more of their best engineers are working on R520. . .now. Particularly since the "silo" rules Wavey (and ATI) have described strongly suggest they couldn't have shifted any of the C1 team to Revolution. It would be interesting to know (tho I doubt we'll ever find out) if it were some shiftees coming off C1 that bailed R520 out in April/May.
 
Heh, call me stupid but I can't even really figure out if you're agreeing or disagreeing with me here, geo :)
As for:
I think more of their best engineers are working on R520. . .now.
Well, I don't disagree. Obviously the design is done now, so if anyone can help on the R520, they might be relocated to that (especially engineers specialized in post-tapeout steps). But the problems with the R520, if any real ones actually exist (which does seem quite obvious to me at this point), must have been created before the R500 was finished. They can help fix problems, but there might have been fewer of them in the first place had they always been working on the R520.
The keyword here, of course, is "might".

Uttar
 
Uttar said:
Heh, call me stupid but I can't even really figure out if you're agreeing or disagreeing with me here, geo :)

Hmm, I "grew up" in a highly politicized company environment. . .this makes me a bit elliptical sometimes. :)

I do agree with what you're saying re ATI & R520 --particularly pointing out how much more ambitious C1 is re their existing arch than NV2A was. The mitigating factor being, as you also pointed out, that R520 is apparently *not* as ambitious re R420 compared to NV30 vs NV25. And even so, apparently the spot where pretty much everyone agrees they did make a significant change --the memory bus-- is where it jumped up and bit 'em on the ass. Having the "A" team on that almost certainly would have made a difference to some degree. . .or we might as well all be widgets instead of people, and as much as companies sometimes like to act like that is true, it just isn't.

I like to point at Rialto as where the snowball started rolling downhill, resource-wise. Plus all indications are that they originally didn't intend to follow NV on SLI. . .and then market forces pretty much forced their hand, stretching already extended resources that much further. Then, on top of that, Wavey's CrossFire preview suggests that they originally thot they were going to do it without the compositing chip (and connector?), and it wasn't until they got deep into it that they realized they needed to do so. . .requiring even more engineering resources/time.
 
geo said:
And even so, apparently the spot where pretty much everyone agrees they did make a significant change --the memory bus-- is where it jumped up and bit 'em on the ass.

But this is the thing that is worth delaying for, worth getting right. It's a silver bullet for the memory choking we are seeing in G70. One of the reasons why R300 was so successful was that ATI could actually use the power of the VPU through the better memory bus. Maybe ATI are trying the same trick again. They probably realise that without a well-balanced memory system, they run into the law of diminishing returns just by adding more processing/speed/pipe fragments to the core.
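To put the diminishing-returns point in toy-model form (all numbers invented, just to show the shape of the problem): achievable work per second is capped by whichever of core throughput and memory supply runs out first.

Code:
# Toy balance model (assumed numbers, nobody's actual specs): achievable work per
# second is min(what the core can issue, what the memory can feed).

def achievable_rate(core_ops_per_s, mem_bytes_per_s, bytes_per_op):
    mem_limited = mem_bytes_per_s / bytes_per_op
    return min(core_ops_per_s, mem_limited)

bandwidth = 38.4e9        # bytes/s, a stand-in figure
bytes_per_op = 8          # e.g. a colour read + write per pixel (pure assumption)

for core in (6e9, 8e9, 10e9, 12e9):   # keep piling on core throughput...
    print(core, achievable_rate(core, bandwidth, bytes_per_op))
# ...and the result is stuck at 4.8e9 every time: past the memory ceiling,
# extra pipes/clocks buy nothing unless bandwidth (or its efficiency) improves too.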
 
Bouncing Zabaglione Bros. said:
But this is the thing that is worth delaying for, worth getting right. It's a silver bullet for the memory choking we are seeing in G70. One of the reasons why R300 was so successful was that ATI could actually use the power of the VPU through the better memory bus. Maybe ATI are trying the same trick again. They probably realise that without a well-balanced memory system, they run into the law of diminishing returns just by adding more processing/speed/pipe fragments to the core.

But of course. I just used that as evidence in the whole "impact of having your best engineers elsewhere" discussion.
 
Bouncing Zabaglione Bros. said:
Let's wait to see what R520 looks like and when it arrives. There's no point comparing G70 to R420 and then proclaiming that Nvidia is way out in front.
The surprising thing is that R480 holds up quite well to the G70. Most of the time, even in GPU-limited scenarios, R480 is less than one resolution step away from the G70, often finding itself around the midpoint between NV40 and G70.

I certainly hope ATI can get its yields in check.
 
It quite depends on the title benched, wouldn't you think? FEAR, which seems to be very shader-heavy, puts the G70 fairly far ahead from what I've seen.
 
geo said:
Jawed said:
I think the new memory "system" in R520 is ATI's secret weapon. What if it turns out to perform like a 512-bit bus, but is in fact only a 256-bit bus? :D

And what if R520 runs at 700MHz core? :)

And what if we get R580, not R520?... :LOL:

Has anyone run Ruby 2 on 7800GTX, by the way? What's the performance like?

Jawed

And then we woke up. :LOL:

Oh, memory bus; oh, memory bus. . . :D

Respinning for clocks (Wavey's "later" spins observation) is a little worrisome tho.

How about a less complicated, but equally painful scenario: ATI planned on having faster memory available with the R520 but this has failed to materialize and is not under their direct control.
 
If that were the case they would just release it on the fastest memory available. Sometimes a discussion about a memory bus isn't necessarily just about the memory interface.
 
DaveBaumann said:
If that were the case they would just release it on the fastest memory available. Sometimes a discussion about a memory bus isn't necessarily just about the memory interface.
Did anyone else happen to notice Wavey's italicized bits? That's usually one of his clever little ways of dropping a biiiiig clue.... ;)
 
What could be so special about the new memory bus? Memory bandwidth is not increasing fast enough, so the alternative is to make a bus which is more efficient. I don't know anything about memory buses inside a GPU, or how good the efficiency currently is.

What I could imagine is that, because it is a shared bus reading and writing different kinds of data, more intelligent management of that traffic could maybe increase efficiency or something.

It's safe to assume that it will still be 256-bit and probably GDDR3, and even if the memory controller already supports GDDR4, the hints are more related to the bus inside the GPU and how it's accessed.
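Purely to illustrate what "more intelligent management" of a shared bus could buy, here's a toy Python sketch (row counts and cycle costs are invented): a controller that batches requests hitting the same DRAM row wastes far fewer cycles than one serving them strictly in arrival order.

Code:
import random

# Toy model of shared-bus management (all costs invented): each request targets a
# DRAM "row"; switching rows costs extra cycles, so batching same-row requests
# wastes less of the raw bandwidth on overhead.

random.seed(1)
requests = [random.randrange(8) for _ in range(64)]   # row id per request
TRANSFER, ROW_SWITCH = 4, 12                          # made-up cycle costs

def cycles(order):
    total, open_row = 0, None
    for row in order:
        if row != open_row:
            total += ROW_SWITCH
            open_row = row
        total += TRANSFER
    return total

naive     = cycles(requests)          # serve strictly in arrival order
reordered = cycles(sorted(requests))  # batch requests that hit the same row

print("naive:", naive, "cycles   reordered:", reordered, "cycles")
print("efficiency gain: %.0f%%" % (100 * (naive / reordered - 1)))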
 
geo said:
The wheels on the bus go round and round. . .

You are alluding to the previously discussed "token-ring" style of bus? Wasn't this more related to job scheduling than to the actual memory interface? Or are you agreeing with the speculation at the time that this was a way of having extremely efficient memory access almost the whole time?
 
The job scheduling was my stupid idea, I believe. Forget it!

I'm not qualified: but it's nice to speculate that a ring bus architecture within R520 could be part of the efficiency improvements.

Jawed
 
Not that I have any special double-secret info I'm holding out on [he added quickly]. But Wavey has been on a path since last October, and he hasn't deviated thru half a dozen or more allusions more or less evenly spaced out over that time. I have to assume his info has only gotten better in the interim. And he did it again above with that "isn't necessarily just about the memory interface". Now, I suppose it is entirely possible that the token-ring part was speculation on *how* they were going to accomplish some combination of better efficiency, granularity, coherency, sharing between units, compression(?), etc at the time he put it forward in the Oct-Dec timeframe last year. . .and that he's since gotten better info on the details that may or may not look like the token-ring.

But I've got my horse, and I've gone to the whip hand with the finish line in sight. :LOL:

Edit: Btw, Jawed's "act like a 512-bit bus" above, while a lovely dream, strikes me as a bit aggressive. Given how constrained we are on bandwidth, it seems to me that it could be quite a bit less of an improvement than that and still be well worth doing. From my pov, as little as a 25% practical improvement would make it worthwhile; making 1400 memory give you results of 1750 memory (tho of course I hope it does better than that).
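For anyone who wants that 25% figure spelled out (assuming a 256-bit bus and reading 1400/1750 as effective memory data rates in MHz; my assumption, not a stated spec):

Code:
# geo's arithmetic above, spelled out; 256-bit bus assumed.
bus_bytes = 256 / 8

def bandwidth_gbs(effective_mhz):
    return bus_bytes * effective_mhz * 1e6 / 1e9

print(bandwidth_gbs(1400))           # 44.8 GB/s raw
print(bandwidth_gbs(1400) * 1.25)    # 56.0 GB/s with a 25% efficiency gain
print(bandwidth_gbs(1750))           # 56.0 GB/s -- i.e. 1400 "acting like" 1750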

Edit2: Btw, they got a marketing name for whatever-it-is, Wavey? Or do they figure it is too deep in the internals for that? At any rate, I like "WhirlyGig" if it is the token-ring. :LOL:
 
Granularity and coherence have got to be the key concepts for the outward memory interface (i.e. to the 256MB or more of main memory).

It's notable that Xenos's EDRAM unit supposedly has optimisations for coherence but at the same time does not use compressed data formats in order to improve effective bandwidth. So it seems Xenos isn't a good model for possible design changes in R520. That plus the fact the bus is 1024-bits wide, I think :!:

Within the GPU, memory bandwidth/latency is limited by clockspeeds, bus-width (die-space consumption and utilisation, e.g. diagonal routing) and bus-architecture versus the number of clients.

Cell's EIB (a four-ring bus) should be a big clue. Cell's SPEs are nothing more than multi-core stream processors. A GPU is primarily a group of multi-core stream processors.

In PS3's Cell's case we're looking at 7-way cores (ignoring the PPE). In a GPU we're looking at one set of 8-way cores (or more, for vertex shaders) and another set of 16-way cores (or more, for fragment shaders) - though arguably arranged into four-way quad-cores. Arranged around and between these cores are various caches and set-up engines.
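Purely as speculation fuel, a minimal ring-bus sketch (nothing here reflects actual R520 or EIB internals; the stop count is invented) showing why a ring scales nicely with many clients: latency grows only with ring length, while every stop keeps its own link instead of all clients contending for one central crossbar point.

Code:
# Minimal ring-bus sketch (pure speculation -- not R520's or Cell's actual design):
# clients sit on ring "stops"; a packet hops stop-to-stop each cycle, so worst-case
# latency is bounded by half the ring length (if bidirectional).

STOPS = 8   # invented stop count (quads, ROPs, texture caches, memory channels...)

def ring_hops(src, dst, stops=STOPS, bidirectional=True):
    # hops needed to get from one stop to another
    forward = (dst - src) % stops
    return min(forward, stops - forward) if bidirectional else forward

hops = [ring_hops(s, d) for s in range(STOPS) for d in range(STOPS) if s != d]
print("max hops:", max(hops), " avg hops: %.2f" % (sum(hops) / len(hops)))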

Jawed
 
And the 100% practical improvement implied by your 512-bit comment? Do you think that's within reach, or is that getting greedy? :) And where does most of that come from? Being able to share better between units without having to shuttle data to/from main memory?
 