G70 memory controller

Zvekan

Newcomer
How did nVidia manage to clock the memory controller in G70 to such high speeds as ~1.8 GHz effective without a major redesign?

ATI trumpeted their Ring Bus controller and its ability to support faster memory, since the traces aren't all connected to the same spot, which allows for more efficient cooling. How did nVidia manage it with a traditional design?

I always thought that the original GeForce 7800 GTX (the 256 MB one) would ship with faster memory, and when it finally launched with memory clocks comparable to the 6800 Ultra I thought that the memory controller was unable to work with faster memory, but they proved me wrong.

Does ATI's solution bring any significant advantages or was it mostly marketing?
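
(For reference, here is my back-of-envelope on what "~1.8 GHz effective" means in bandwidth terms. A minimal sketch, assuming a 256-bit bus and plain DDR arithmetic; the 600 MHz clock for the original GTX is quoted from memory.)

```python
# Back-of-envelope: "effective" GDDR3 clock and peak bandwidth.
# GDDR3 is double data rate, so the effective transfer rate is
# twice the actual memory clock.

def peak_bandwidth_gbs(mem_clock_mhz: float, bus_width_bits: int) -> float:
    """Peak theoretical bandwidth in GB/s for a DDR memory interface."""
    effective_mtps = mem_clock_mhz * 2                  # million transfers/s
    return effective_mtps * bus_width_bits / 8 / 1000   # MB/s -> GB/s

print(peak_bandwidth_gbs(900, 256))  # 57.6 GB/s at ~1.8 GHz effective
print(peak_bandwidth_gbs(600, 256))  # 38.4 GB/s for the original GTX (1.2 GHz effective)
```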

Zvekan
 
Nvidia had to redesign their PCB layout for the 7800 Ultra. ATI's memory controller does indeed have benefits and a higher bandwidth.
 
There's often a crossover point when something "new" is underused in its first implementation and something "known" can be pushed.

One of ATI’s points about their controller is that the data paths are moved away from the memory controller itself, and instead the ring bus is placed around the edge of the chip decreasing the wire density over the memory controller, potentially increasing core clock speeds – but, this is probably not really that important for this generation, and will become so on future generations (at this point in time I’m expecting the mainstay of this controller to last through at least the R5xx and R6xx generations).

The benefits right now are more centred around the fact that the implementation allows for 8 32-bit channels, as opposed to 4 64-bit channels, and the programmability they are touting. If you look at the FSAA performance hits in the 7800 GTX 512MB reviews I think the X1800 XT is faring reasonably given that it has 100MHz slower memory.
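
To put a rough illustration on the channel-granularity point (purely illustrative numbers with a generic GDDR3-style burst assumption, not ATI's actual arbitration): with a fixed burst length, a 64-bit channel moves twice as many bytes per access as a 32-bit one, so small scattered accesses waste more of each burst.

```python
# Illustrative only: how channel width interacts with DRAM burst length.
# BURST_LENGTH = 4 is a generic GDDR3-style assumption, not a vendor figure.

BURST_LENGTH = 4  # transfers per access

def bytes_per_access(channel_width_bits: int) -> int:
    """Minimum bytes moved by one burst on a single channel."""
    return channel_width_bits // 8 * BURST_LENGTH

def wasted_bytes(request_bytes: int, channel_width_bits: int) -> int:
    """Bytes fetched but not needed when a request is smaller than one burst."""
    granule = bytes_per_access(channel_width_bits)
    fetched = -(-request_bytes // granule) * granule  # round up to whole bursts
    return fetched - request_bytes

# A 16-byte request (e.g. a partially covered cache line or small texture fetch):
print(wasted_bytes(16, 64))  # 16 bytes wasted on a 64-bit channel (32-byte granule)
print(wasted_bytes(16, 32))  # 0 bytes wasted on a 32-bit channel (16-byte granule)
```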
 
Dave Baumann said:
There's often a crossover point when something "new" is underused in its first implementation and something "known" can be pushed.

One of ATI’s points about their controller is that the data paths are moved away from the memory controller itself, and instead the ring bus is placed around the edge of the chip decreasing the wire density over the memory controller, potentially increasing core clock speeds – but, this is probably not really that important for this generation, and will become so on future generations (at this point in time I’m expecting the mainstay of this controller to last through at least the R5xx and R6xx generations).

The benefits right now are more centred around the fact that the implementation allows for 8 32-bit channels, as opposed to 4 64-bit channels, and the programmability they are touting. If you look at the FSAA performance hits in the 7800 GTX 512MB reviews I think the X1800 XT is faring reasonably given that it has 100MHz slower memory.

I'm definitely not one of the many very technology-literate people here, but I seem to remember reading somewhere that one of the reasons a 512-bit bus was infeasible was that the trace density needed to support it would be too great. Could the ring bus help with this by spreading the traces around the entire perimeter of the chip?
 
OICA, the issue with traces is just as severe for the external memory. You'd have to route double the traces to the external memory on the board, which is already very tricky with 256-bit.
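
Just to put rough numbers on that (the per-device control/address count below is a ballpark assumption, and in practice address/command lines can be shared between devices, so treat it as an upper bound):

```python
# Rough trace-count comparison for the board-level memory interface.
# ctrl_per_chip is a ballpark guess at command/address/strobe signals,
# not an exact GDDR3 pinout.

def board_traces(bus_width_bits: int, chip_width_bits: int = 32,
                 ctrl_per_chip: int = 30) -> int:
    """Approximate signal traces: data lines plus per-device control/address."""
    chips = bus_width_bits // chip_width_bits
    return bus_width_bits + chips * ctrl_per_chip

print(board_traces(256))  # ~496 traces for a 256-bit bus (8 x32 devices)
print(board_traces(512))  # ~992 traces for a 512-bit bus (16 x32 devices)
```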
 
Zvekan said:
How did nVidia manage to clock the memory controller in G70 to such high speeds as ~1.8 GHz effective without a major redesign?

ATI trumpeted their Ring Bus controller and its ability to support faster memory, since the traces aren't all connected to the same spot, which allows for more efficient cooling. How did nVidia manage it with a traditional design?

Oh, could it be that the problem doesn't exist (yet)? What exactly made you think it would be a problem in the first place? Would you have thought it a problem if ATI had never said anything to the effect that their new memory interface allowed for higher frequencies?

I always thought that the original GeForce 7800 GTX (the 256 MB one) would ship with faster memory, and when it finally launched with memory clocks comparable to the 6800 Ultra I thought that the memory controller was unable to work with faster memory, but they proved me wrong.

I think they are at the mercy of the memory suppliers. Using the fastest memory around will drive up prices. I don't think they would be too happy with the price of memory being the prime cost driver. That might be a nice business for a memory manufacturer, make some GPU just to bundle it with your RAM, but not the other way around. What Nvidia and ATI dream about is probably a memory manufacturer doubling performance within 18 months, much like they did with the 6800 and X800. Unfortunately, memory is a very mature and boring market.
 
Dave Baumann said:
There's often a crossover point when something "new" is underused in its first implementation and something "known" can be pushed.

Are you trying to say that GDDR memories which are fast enough to show the true advantages don't exist yet? That would seem to say that a substantial amount of silicon exists for something which isn't needed at this point. We were all under this great impression that the Ringbus would allow GDDR4 and the G70 wouldn't be able to handle it. Seems the GTX512 nullified that idea.

but, this is probably not really that important for this generation, and will become so on future generations (at this point in time I’m expecting the mainstay of this controller to last through at least the R5xx and R6xx generations).

Well, certainly if the R520 is merely the "prototype" test to get their feet wet for future R6xx generations, but it seems like an awful lot of silicon and I have to wonder what was given up in the design if they had a simpler controller. I guess it is, in a way, sorta like how the NV3x, despite being a fsckup, gave NVidia key experience in design and in driver compilers that probably helped a lot when designing the NV4x. The R520 isn't a fsckup, but it had problems along the way, and perhaps if it had a simpler controller it would have a lot fewer transistors, or better yet, more ALUs.
 
DemoCoder said:
We were all under this great impression that the Ringbus would allow GDDR4 and the G70 wouldn't be able to handle it. Seems the GTX512 nullified that idea.
Um, how? It still doesn't use GDDR4 memory. :!:
 
Dave Baumann said:
The benefits right now are more centred around the fact that the implementation allows for 8 32-bit channels, as opposed to 4 64-bit channels, and the programmability they are touting. If you look at the FSAA performance hits in the 7800 GTX 512MB reviews I think the X1800 XT is faring reasonably given that it has 100MHz slower memory.

Do you think that higher programmability is directly tied to the ring organization? Somehow I think that even standard memory controllers could be that programmable if you decide to add additional logic and storage transistors.

Zvekan
 
Are you trying to say that GDDR memories which are fast enough to show the true advantages don't exist yet? That would seem to say that a substantial amount of silicon exists for something which isn't needed at this point. We were all under this great impression that the Ringbus would allow GDDR4 and the G70 wouldn't be able to handle it. Seems the GTX512 nullified that idea.

This is all very confused here.

For one, the GTX 512MB doesn't use GDDR4, and there are no indications that it supports GDDR4 - the memory it's using is GDDR3. The ring bus itself doesn't allow for GDDR4, but the memory controller and DRAM channels do; the ring bus is an implementation detail that removes much of the wiring density that crossbar switches with data channels create over the memory controller. The ring bus also makes it cheap to support independent channels down to the width of the memory - I think this is why you haven't seen 8x32-bit controllers before, because the crossbar wiring required would be even more complex. AFAIA the mainstay of the silicon use is actually the memory controller itself, and I assume its size is due to its programmability.

Anyway, the comment stems from the fact that ATI are explaining what the memory controller does, how it works and why they have done it, but not all of these factors necessarily come into play in the first generation - for example, one of the reasons cited for moving to the ring bus was increased clock speeds, but that doesn't mean their older memory architecture couldn't scale to the speeds R520 is operating at right now; it's whether it would be able to operate at the speeds it's going to scale to over the course of its lifetime.

Well, certainly if the R520 is merely the "prototype" test to get their feet wet for future R6xx generations, but it seems like an awful lot of silicon and I have to wonder what was given up in the design if they had a simpler controller. I guess it is, in a way, sorta like how the NV3x, despite being a fsckup, gave NVidia key experience in design and in driver compilers that probably helped a lot when designing the NV4x. The R520 isn't a fsckup, but it had problems along the way, and perhaps if it had a simpler controller it would have a lot fewer transistors, or better yet, more ALUs.

http://www.beyond3d.com/reviews/ati/r520/index.php?p=32
R520 Review said:
With this in mind, and the fact that R520 doesn't increase the number of math processors in the fragment pipelines, it does appear that R520 is leaning in its clockspeed advances, at least in the form of the XT, in order to significantly increase the pixel shader throughput; perhaps this is one area that could have been beneficial to have received more focus....

...This memory controller is designed to last for some time, with capabilities for GDDR4 support already present, and so for this first incarnation in R520 it does seem a little over architected for its use.

However, we know how these companies operate and they are generally looking for as much reuse as possible; if they are looking to make a change for, say, GDDR4, then changing the previous controller now and then changing it again next generation probably didn't make sense, and given the ancillary benefits ATI obviously felt that it could be beneficial this generation.
 
Zvekan said:
Do you think that higher programmability is directly tied to the ring organization? Somehow I think that even standard memory controllers could be that programmable if you decide to add additional logic and storage transistors.
As I just said in the previous post, no I don’t. I think the increased “programmability” is a factor of the controls the memory controller has over the DRAM controllers. Apart from the discussion around clock speeds, I think the primary factor the ring brings is a cheap way of supporting more memory channels (at least, cheap in relation to supporting 8 with a crossbar data controller directly connected to the memory controller).
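
As a purely hypothetical sketch of what "programmable" arbitration could mean in practice (nothing here reflects ATI's actual controller; the request types and weights are made up), the idea is simply a driver-updatable priority table deciding which client a DRAM channel services next:

```python
# Hypothetical illustration only: a weight table, updatable by the driver,
# that picks which client a DRAM channel services next. Not ATI's scheme.

from dataclasses import dataclass, field

@dataclass
class ChannelArbiter:
    # Relative priorities per request type; a driver update could retune
    # these per application without any hardware change.
    weights: dict = field(default_factory=lambda: {
        "colour": 4, "z": 4, "texture": 2, "vertex": 1})
    credits: dict = field(default_factory=dict)

    def next_client(self, pending: list) -> str:
        """Pick the pending client with the most accumulated credit."""
        for c in pending:
            self.credits[c] = self.credits.get(c, 0) + self.weights.get(c, 1)
        winner = max(pending, key=lambda c: self.credits[c])
        self.credits[winner] = 0
        return winner

arb = ChannelArbiter()
print(arb.next_client(["texture", "z", "vertex"]))  # 'z' wins under these weights
```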
 
DemoCoder said:
Well, certainly if the R520 is merely the "prototype" test to get their feet wet for future R6xx generations, but it seems like an awful lot of silicon and I have to wonder what was given up in the design if they had a simpler controller.
Do we know how much space is taken by the memory controller? I've heard some qualitative descriptions, but that's it.

I think NVidia did a lot of memory controller research back while developing GF3/Nforce/XBoxGPU, and maybe improved on it a bit for NV40/G70 (not sure what they did for NV30, as that wasn't very bandwidth efficient). Now it's ATI's turn to figure everything out properly, and judging by the MSAA results so far, they were successful. It's likely that in future generations they won't need so much programmability once they acquire enough knowledge to hardwire it and still be efficient.

My guess is dynamic branching (DB) is taking up a lot of space. The scheduling, the quick switching of which instruction is executed, and having each quad working on different instructions (from what I've heard) require multiplying a lot of logic compared to previous generations.
 
I dunno, my takeaway from Dave's and others' reviews was that the MC took up a lot of space on the die.

As for the GDDR4 remark, I meant to imply clock speeds. There was past commentary that somehow the G70 would be limited in how fast the memory bus could run vis-a-vis the R520. I think if you search the forums, you'll find some people saying that they didn't think the G70 would handle 800+ MHz memory. NVidia's MC is more of a black box than ATI's, so it's hard to make comparisons. We have nice little crossbar/LMA diagrams but that's about it.
 
I'd have thought that the primary reason for GDDR4 is the scaling of clockspeeds beyond that of GDDR3.
 
GDDR4 is currently being sampled at 1.4 GHz. That's a pretty significant leap over the highest-speed GDDR3 at 900 MHz.
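
Worth noting what that clock jump works out to at the same 256-bit bus width (same plain DDR arithmetic as the earlier sketch in this thread):

```python
def peak_gbs(clock_mhz, bus_bits=256):
    # double data rate: two transfers per clock
    return clock_mhz * 2 * bus_bits / 8 / 1000

print(peak_gbs(900))   # 57.6 GB/s with 900 MHz GDDR3
print(peak_gbs(1400))  # 89.6 GB/s with 1.4 GHz GDDR4 samples
```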
 
AlphaWolf said:
GDDR4 is currently being sampled at 1.4 GHz. That's a pretty significant leap over the highest-speed GDDR3 at 900 MHz.

Are there any indications of when we can expect that kind of memory in sufficient quantities?

What about production costs? Do they rise compared to high-speed GDDR3?

Zvekan
 
Zvekan said:
Are there any indications of when we can expect that kind of memory in sufficient quantities?

What about production costs? Do they rise compared to high-speed GDDR3?

Zvekan

Volume production starts next year. No idea on production costs relative to GDDR3, but I wouldn't expect it to be cheaper.
 