Next NV High-end

Dave Baumann said:
Ail, you appear to forget the ADDs as well.

If I took more or even all aspects into account, it would take me more than an hour per post ;)
 
Ailuros said:
If I took more or even all aspects into account, it would take me more than an hour per post ;)

Heh. Hence the communication struggle. Not to mention that as you drill down into the level of detail, the architecture above you becomes ever more important, so you run the risk of being more technically correct and yet less indicative of actual performance. And the further up in generality you go, the less useful the constructs become (at least over the last year or so). Oh what a bother. :LOL:
 
Chalnoth said:
I do have to mention that, similar to a 16-tex, 48-ALU architecture, I think a 32-tex, 64-ALU architecture would be similarly unbalanced (given memory bandwidth constraints).

If nVidia were to keep 24 texture units with the G75 and somehow beef up the ALUs, it might make for a better use of transistors, though I do question how well they could make use of additional ALU power without decoupling the pipelines, so it may make the most sense to just add a couple more quads to the G70 for the 90nm incarnation.

I thought I'd never see the day you'd be talking about memory bandwidth constraints.

Jokes aside, I saw a 61% increase in fillrate between G70 and NV40, yet only a 9% increase in memory bandwidth. Using either 750 or even 800MHz RAM would increase memory bandwidth compared to G70 by ~25-30%.
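For anyone who wants to check those percentages, here's a quick back-of-the-envelope sketch. The reference specs in it (NV40 at 400MHz with 16 pipes and 1.1GHz effective memory, G70 at 430MHz with 24 pipes and 1.2GHz effective, both on a 256-bit bus) are my own assumptions of the commonly quoted figures, not anything confirmed in this thread:

```python
# Back-of-the-envelope check of the fillrate/bandwidth percentages.
# Assumed reference specs (my assumption, not from this thread):
#   NV40 (6800 Ultra): 400MHz core, 16 pipes, 1100MHz effective GDDR3
#   G70  (7800 GTX):   430MHz core, 24 pipes, 1200MHz effective GDDR3
# Both on a 256-bit (32 bytes/transfer) memory bus.

def texel_fill(core_mhz, pipes):
    """Peak texel fillrate in MTexels/s (one TMU per pipe)."""
    return core_mhz * pipes

def bandwidth(mem_mhz_effective, bus_bytes=32):
    """Peak memory bandwidth in GB/s."""
    return mem_mhz_effective * bus_bytes / 1000

nv40_fill, g70_fill = texel_fill(400, 16), texel_fill(430, 24)
nv40_bw, g70_bw = bandwidth(1100), bandwidth(1200)

print(f"Fillrate:  {nv40_fill} -> {g70_fill} MTexels/s, "
      f"+{g70_fill / nv40_fill - 1:.0%}")        # +61%
print(f"Bandwidth: {nv40_bw:.1f} -> {g70_bw:.1f} GB/s, "
      f"+{g70_bw / nv40_bw - 1:.0%}")            # +9%

# Hypothetical 750MHz and 800MHz (1.5/1.6GHz effective) RAM:
for eff in (1500, 1600):
    gain = bandwidth(eff) / g70_bw - 1
    print(f"{eff}MHz effective: {bandwidth(eff):.1f} GB/s, +{gain:.0%} over G70")
```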
 
Chalnoth said:
I do have to mention that, similar to a 16-tex, 48-ALU architecture, I think a 32-tex, 64-ALU architecture would be similarly unbalanced (given memory bandwidth constraints).

If nVidia were to keep 24 texture units with the G75 and somehow beef up the ALUs, it might make for a better use of transistors, though I do question how well they could make use of additional ALU power without decoupling the pipelines, so it may make the most sense to just add a couple more quads to the G70 for the 90nm incarnation.


So we'll be looking at what, 380M transistors at least for a modified G70? I wouldn't put it past Nvidia to leave it a 48-unit config and further modify the pipelines. They most certainly wouldn't develop it into a 3-TMU core, would they? Should we expect to see further additions to the vertex pipes as well?

Seems popular that it's going to be 32 pipes with 2 TMUs each, simply adding 2 more quads. I'll personally be very surprised if that's the case. It's going to be Nvidia's first high-end, PC-dedicated graphics core in mass production on 90nm. Wouldn't it make sense to leave it at 48 units and modify the core's guts further, rather than risk bad yields? My only cause for concern is their output of a high-clock, high-pipe-count core on 90nm. We think the G70 is working out fantastically at 24 pipes, but how long has Nvidia been binning them? And how many are in fact failing tests, I wonder? And most importantly, how would a 90nm fab of the current G70 core affect yields, if that's any indication of a G7X on 90nm with more pipes and higher clocks, which I don't see happening, at least not in good supply.

In my opinion, further modification to the core and an attempt at significantly increased clocks seems far more reasonable than a brute bump in pipeline count. Otherwise Nvidia may risk serious yield problems, something they consciously avoided with the G70, and I don't see them doing it with the G7X on 90nm.
 
geo said:
Heh. Hence the communication struggle. Not to mention that as you drill down into the level of detail, the architecture above you becomes ever more important, so you run the risk of being more technically correct and yet less indicative of actual performance. And the further up in generality you go, the less useful the constructs become (at least over the last year or so). Oh what a bother. :LOL:

Most of the MADD, ADD, MUL or whatnot stuff can be replicated in synthetic applications, but that's beside the point. It should be blindingly obvious that you need more than one factor to evaluate a GPU's performance, and there you need both its advantages and its disadvantages.

If you take the G70's higher ALU performance for granted (compared to R520), then you need to start with the "buts". However, ATI is aware that for all intents and purposes they do need more ALU performance, otherwise the ALU unit count wouldn't suddenly triple in R580; and it's not like they've only noted the 3:1 ALU/TMU relation recently. I've been hearing about it for quite some time.

The point I was trying to make in this thread is that G70 has that specific advantage over R520, and I doubt NV is going to allow itself to lose that advantage in a follow-up part; when you're running neck and neck, you can't afford to lose valuable advantages.
 
no-X said:
Is G70's memory controller able to handle such high-speed RAM?

I've no idea where exactly the threshold is, but considering that R5xx can handle up to 1400MHz GDDR4, from what I've read here in these forums, I'd be very surprised if G70 could only handle significantly lower-specced RAM.
 
no-X said:
Is G70's memory controller able to handle such high-speed RAM?
I know people who have OC'd their GTX memory up to 1750MHz effective, or 875MHz actual.
Of course, that took quite a bit of volt-modding and huge ramsinks, but it worked somewhat. The image was a heap of artefacts, but I think that shows the memory controller is capable of high frequencies.
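(Side note on the DDR naming, since it trips people up: the "effective" figure is double the actual clock, and on the GTX's 256-bit bus the resulting bandwidth is a one-line multiply. A minimal sketch:)

```python
# DDR naming and resulting bandwidth on a 256-bit bus.
actual_mhz = 875                # the OC'd clock mentioned above
effective_mhz = actual_mhz * 2  # DDR transfers twice per clock -> 1750MHz
bus_bytes = 256 // 8            # 256-bit bus = 32 bytes per transfer

bw_gbs = effective_mhz * bus_bytes / 1000
print(f"{actual_mhz}MHz actual = {effective_mhz}MHz effective -> {bw_gbs:.0f} GB/s")
# 875MHz actual -> 1750MHz effective -> 56 GB/s
```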
 
SugarCoat said:
So we'll be looking at what, 380M transistors at least for a modified G70? I wouldn't put it past Nvidia to leave it a 48-unit config and further modify the pipelines. They most certainly wouldn't develop it into a 3-TMU core, would they? Should we expect to see further additions to the vertex pipes as well?

≥380M is the transistor count I'd expect, more or less, for both future contenders.

Seems popular that it's going to be 32 pipes with 2 TMUs each, simply adding 2 more quads. I'll personally be very surprised if that's the case. It's going to be Nvidia's first high-end, PC-dedicated graphics core in mass production on 90nm. Wouldn't it make sense to leave it at 48 units and modify the core's guts further, rather than risk bad yields? My only cause for concern is their output of a high-clock, high-pipe-count core on 90nm. We think the G70 is working out fantastically at 24 pipes, but how long has Nvidia been binning them? And how many are in fact failing tests, I wonder? And most importantly, how would a 90nm fab of the current G70 core affect yields, if that's any indication of a G7X on 90nm with more pipes and higher clocks, which I don't see happening, at least not in good supply.

Sounds like the flip side of the coin to Chalnoth's doubts that ATI will be able to reach R520 frequencies with the R580.

In my opinion, further modification to the core and an attempt at significantly increased clocks seems far more reasonable than a brute bump in pipeline count. Otherwise Nvidia may risk serious yield problems, something they consciously avoided with the G70, and I don't see them doing it with the G7X on 90nm.

Because higher frequencies have so far been a proven method of guaranteeing higher yields? I've seen the exact opposite over the past two years.

a) The jump from 130nm to 110nm is smaller (always in a relative sense) than from 110nm to low-k 90nm.

b) The G70 can, under specific conditions, reach twice the performance of NV40, and that with merely a 30MHz increase in frequency. The average in today's applications might be a lot smaller than that, depending on a number of factors, but that still doesn't mean they'd need an extremely high clock speed as the alternative, should they again choose to increase the number of quads.
 
G70 is pretty bandwidth limited as it is. Are you insinuating Nvidia would rather chance a 64-unit core to meet demands than shoot for a further-modified but significantly higher-clocked core on the same base as the G70? If they do in fact go the route of adding another 2 quads, I see the problem of yields, and of lacking the speed to show much improvement over their 48-unit part or even ATI's R520 and R580 parts. I think they need to look elsewhere. ATI has stuck with a 16-pipe part, and no doubt Nvidia has known this for a long time; do we believe the R580 to be a 48- or 72-unit part? I'm not sure what the common consensus is on that right now. But I just don't see Nvidia developing a higher-pipe-count core without the bandwidth (and it will need a lot) to feed it. That seems like 2 goals, needing high clocks and hoping the yields of the 32-pipe part are good, so 2 goals and 2 serious risks for a jump to 90nm on the high end to me. Unnecessary for the time frame as well.

I'd expect something that advanced to be more along the lines of the G80, maybe a 72-unit (24 pipes, 3 TMUs each) or 64-unit (32 pipes, 2 TMUs) core, or maybe even a 96-unit (32 pipes, 3 TMUs) one. Perhaps that's pushing it, but I think not. I just don't see Nvidia going for a 32-pipe core just yet. Especially for a core they want to yield well from the get-go.
 
It would be interesting to overclock the GTX's RAM (~memory controller) without additional core volt-modding... If I understand it correctly, the memory controller is just another piece of the 110nm die, which must run at the same clock as the video RAM... but 110nm @ 800MHz? How could this be possible? Graphics cards are rarely equipped with video RAM clocked more than 20-25% above the GPU clock... I always thought this was because of memory controller limitations.
 
SugarCoat said:
Wouldn't it make sense to leave it at 48 units and modify the core's guts further, rather than risk bad yields?
I would suspect that playing with the underlying architecture would leave nVidia more at risk of bad yields than increasing the number of pipelines (assuming the same transistor budget).

It's also rather uncharacteristic of nVidia to release more than one "core modification" before the onset of a new architecture.
 
Chalnoth said:
I would suspect that playing with the underlying architecture would leave nVidia more at risk of bad yields than increasing the number of pipelines (assuming the same transistor budget).

It's also rather uncharacteristic of nVidia to release more than one "core modification" before the onset of a new architecture.


What makes you think they haven't already been playing with the architecture further?
 
SugarCoat said:
G70 is pretty bandwidth limited as it is. Are you insinuating Nvidia would rather chance a 64-unit core to meet demands than shoot for a further-modified but significantly higher-clocked core on the same base as the G70?

I'm merely speculating here like everybody else.

G70 is inevitably bandwidth limited, since it has only about 9% more memory bandwidth than NV40, yet 61% more fillrate. If, however, the choice were between, say, ≥700MHz or more units at a much lower frequency, then given the trend they followed with G70, the latter makes more sense for yields IMO.
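To put rough numbers on that clocks-versus-units trade, here's a sketch; both configurations are purely illustrative round figures of mine, not rumored specs:

```python
# Hypothetical clocks-vs-quads trade at equal peak texel fillrate.
# Illustrative round numbers, not rumored specs.
configs = [
    ("6 quads @ 700MHz", 6 * 4 * 700),  # fewer units, aggressive clock
    ("8 quads @ 525MHz", 8 * 4 * 525),  # more units, modest clock
]
for name, mtexels in configs:
    print(f"{name}: {mtexels} MTexels/s")
# Both land on 16800 MTexels/s; the real question is which of the
# two the 90nm process yields better at volume, not the paper number.
```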


If they do in fact go the route of adding another 2 quads, I see the problem of yields, and of lacking the speed to show much improvement over their 48-unit part or even ATI's R520 and R580 parts. I think they need to look elsewhere. ATI has stuck with a 16-pipe part, and no doubt Nvidia has known this for a long time; do we believe the R580 to be a 48- or 72-unit part? I'm not sure what the common consensus is on that right now. But I just don't see Nvidia developing a higher-pipe-count core without the bandwidth (and it will need a lot) to feed it. That seems like 2 goals and 2 serious risks for a jump to 90nm on the high end to me.


Can we keep that whole unit discussion in terms of quads for a moment, to keep things easier?

R520 = 4 quads (16/1/1/1)
G70 = 6 quads
R580 = 4 quads (16/1/3/1)

Let's have a look at fillrates and bandwidth per chip:

R520 = 10000 MTexels/s and MPixels/s, 48GB/s bandwidth
G70 = 10320 MTexels/s, 6880 MPixels/s, 38.4GB/s bandwidth

While there's a 25% difference in memory bandwidth (and a much better fillrate-to-bandwidth relation), is the performance difference on average really at that percentage, and how can one conclude that any higher performance from the R520 is due to memory bandwidth alone in any case? What about ultra-high resolutions? (Which I still take with some reservations, since future Catalysts may further increase the R520's performance.)
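To make the fillrate-to-bandwidth relation concrete, a quick ratio from the figures quoted above:

```python
# Texels per byte of bandwidth, using the figures quoted above.
chips = {
    "R520": (10000, 48.0),  # MTexels/s, GB/s
    "G70":  (10320, 38.4),
}
for name, (mtexels, gbs) in chips.items():
    # MTexels/s divided by MB/s gives texels per byte transferred
    print(f"{name}: {mtexels / (gbs * 1000):.3f} texels per byte")
# R520: ~0.208, G70: ~0.269; G70 has to stretch each byte of
# bandwidth across roughly 29% more texels than R520 does.
```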
 
If they add more quads then yields might go down, but they could sell parts with defective quads as 7800 GTX, 7800 GT and so forth, so it would not be such a crushing blow. It seems an easier gamble than increasing frequency significantly; however, I do not think they will actually do it right now.
 
Sxotty said:
If they add more quads then yields might go down, but they could sell parts with defective quads as 7800 GTX, 7800 GT and so forth, so it would not be such a crushing blow. It seems an easier gamble than increasing frequency significantly; however, I do not think they will actually do it right now.
The move to 90nm may not lead to an overall increase in die size for a 32-pipeline G7x, though.
 
Ailuros said:
While there's a 25% difference in memory bandwidth (and a much better fillrate-to-bandwidth relation), is the performance difference on average really at that percentage, and how can one conclude that any higher performance from the R520 is due to memory bandwidth alone in any case? What about ultra-high resolutions? (Which I still take with some reservations, since future Catalysts may further increase the R520's performance.)

Have you taken a look at Ratchet's X1800 preview on Rage3D? I think that is pretty conclusive evidence that R520's advantage in the games tested shows up primarily in bandwidth-bound situations (high res + AA).

The one standout is Chaos Theory with all SM3.0 features turned on - the XT really struts its stuff there, even without AA.

So the way I see it, the XT has a lot of potential, but any "wins" so far in last-generation titles are down to the bandwidth advantage IMO. Hopefully we'll see it pull away more in shader-limited titles - FEAR, Oblivion, etc. - and put the matter to rest.
 
trinibwoy said:
Have you taken a look at Ratchet's X1800 preview on Rage3D? I think that is pretty conclusive evidence that R520's advantage in the games tested shows up primarily in bandwidth-bound situations (high res + AA).

The really interesting part tho, given the recent conversation with sireric, is that this does not necessarily mean that NV can catch up just by providing an equal amount of raw spec bandwidth to G70.

We'll see in a bit, I guess, tho it sounds like it may be a while before ATI's new toy is tweaked to the max for a really good judgement on that point.
 
geo said:
The really interesting part tho, given the recent conversation with sireric, is that this does not necessarily mean that NV can catch up just by providing an equal amount of raw spec bandwidth to G70.

We'll see in a bit, I guess, tho it sounds like it may be a while before ATI's new toy is tweaked to the max for a really good judgement on that point.


It still makes me wonder if the R580 is really going to boast an absolutely huge improvement over the R520. It's basically shader power that's going to get the increase, someplace where the R520 already beats the GTX badly in advanced titles. I'll be very interested to see the need for something that seems as powerful as the R580 on paper until Nvidia catches up (but of course before R600 either way).

Nvidia has to work with what they currently have until the G7X, which I'm almost positive is going to contain architectural changes.

Why isn't the idea of simply adding one more quad on the table as well? Something that just popped into my head: if 32 pipes is so popular, which I think seems a little ridiculous, how about a 90nm G7X at 28 pipes, higher clocks and architectural tweaks? It would make sense, considering the G70's changes (relative to the NV40) span back to almost the NV40 launch itself, if not before. Nvidia made a decision a long time ago about where the G70 would sit in their core line; it would only make sense that they have made further advancements since then.

Neither company goes without a backup this year, it seems. I would think Nvidia MUST have something else to toy with other than a simple up-clock or 1 or 2 quads added to the same core they have. I'd hope the changes go deeper than that.
 
geo said:
The really interesting part tho, given the recent conversation with sireric, is that this does not necessarily mean that NV can catch up just by providing an equal amount of raw spec bandwidth to G70.

We'll see in a bit, I guess, tho it sounds like it may be a while before ATI's new toy is tweaked to the max for a really good judgement on that point.

Well, I'm going with the numbers. I see an equally spec'd XL being on par with a GT even with AA enabled, and even in DirectX titles. If the 25% bandwidth advantage isn't the reason for the XT beating out the GTX with AA enabled, then the XL should be trampling all over the GT. It's so very simple when I look at it that way.
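For what it's worth, the "equally spec'd" part checks out on paper. Here's a minimal sketch, assuming the commonly quoted retail clocks (XL at 500MHz core with 1.0GHz effective memory and 16 pipes, GT at 400MHz with 1.0GHz memory and 20 pipes; those figures are my assumption, not thread-confirmed):

```python
# Paper specs, X1800 XL vs 7800 GT, assuming commonly quoted
# retail clocks (an assumption, not confirmed in this thread).
cards = {
    "X1800 XL": {"core": 500, "pipes": 16, "mem_eff": 1000},
    "7800 GT":  {"core": 400, "pipes": 20, "mem_eff": 1000},
}
for name, c in cards.items():
    fill = c["core"] * c["pipes"]   # peak MTexels/s
    bw = c["mem_eff"] * 32 / 1000   # GB/s on a 256-bit bus
    print(f"{name}: {fill} MTexels/s, {bw:.0f} GB/s")
# Both come out at 8000 MTexels/s and 32 GB/s, which is why the
# XL-vs-GT comparison is such a clean control for the bandwidth theory.
```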

Unless ATi's future driver releases expose some untapped potential in the XL which enables it to surpass the GT, I'm going to hold to my opinion that the XT's 25% bandwidth advantage is its saving grace, and not the architecture itself.

The only other possibility I see, given current numbers, is that ATi was able to tweak the XT more than the XL, which I really doubt since they're essentially the same chip.

Another question that not many have asked - why is there such a large bandwidth disparity between ATi's first- and second-string parts in the first place?
 