The G92 Architecture Rumours & Speculation Thread

I agree. The GX2 style sure does seem unappealing, but I think it may be true. Remember the GX2-style waterblocks at CeBIT back in March?

But something is fishy. We are obviously not seeing the "missing link". November is coming soon, and not a single G92 die, PCB, or anything else concrete has been leaked yet.

Could we be in for another big surprise, as in G100 hitting us earlier than we thought (ahead of schedule)? Or a mistake on nVIDIA's part, which I doubt, because why would they want to loosen their grip on ATi/AMD?

If they are still sticking with the 8-series moniker (8700 = G92), then could it be that the next-gen architecture is just around the corner for nVIDIA, i.e. that they are scrapping plans for G9x refreshes in favor of G100, all things considered?

AFAIK G100 has yet to tape out, so it would be kind of hard to launch it in only a month and a half...
 
I'd argue that in your example, you'd be looking for 1 out of 2 packets in 100 apartments, which is just as hard as 1 in 50, except maybe for some extra hide-outs in the interconnecting hallways.

Seriously, I have worked on chips with very similar arrangements: a large basic building block duplicated 12 times, with a central crossbar going to a memory controller. Adding building blocks can increase the chance of unexpected performance loss here or there due to some freak interactions or due to bugs in the interconnect, but the vast majority of debugging work happens on the individual cores. It's the same divide-and-conquer approach as initial pre-silicon verification: debug individual cores first, then move on to interaction cases. You'll see most problems when going from 1 to 2 cores, but past that, going from 2 to 12 is painless.
I don't see why it would be different for a GPU.


I wouldn't be surprised if the GTS-to-GTX ratio on a wafer is close to 10 to 1, if not higher, so the GTS will have the most influence on overall margins. $280 isn't that unreasonable, right? And given the high expected yields (due to massive redundancy), margins on that should be really good.
Why couldn't you use the same story for even larger monolithic chips?
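For what it's worth, here's the rough back-of-the-envelope behind the margin point. Everything in it is a guess on my part except the 10:1 bin split and the $280 price point mentioned above:

```python
# Back-of-the-envelope wafer economics sketch. All numbers are guesses
# except the 10:1 GTS:GTX bin ratio discussed above.

wafer_cost = 5000.0   # assumed 90nm wafer cost in USD, pure guess
good_dies  = 80       # assumed good (sellable) dies per wafer, pure guess
gts_share  = 10 / 11  # 10:1 GTS:GTX split from the post above
gtx_share  = 1 / 11

asp_gts = 120.0       # assumed chip ASP to the board partner, not the $280 board price
asp_gtx = 250.0       # assumed chip ASP for the full-fat bin, pure guess

blended_asp  = gts_share * asp_gts + gtx_share * asp_gtx
revenue      = good_dies * blended_asp
gross_margin = (revenue - wafer_cost) / revenue

print(f"blended ASP per good die: ${blended_asp:.0f}")
print(f"revenue per wafer:        ${revenue:.0f}")
print(f"gross margin:             {gross_margin:.0%}")
```

With these made-up inputs, even a heavily GTS-weighted mix comfortably clears the assumed wafer cost, which is the shape of the margin argument.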

Well, seems you have the upper hand, as I haven't worked on chips at all:D. Anyhow, I guess both of us will have to wait and see what choice the IHVs make, as they're in a tad bit better situation to evaluate the benefits and weaknesses of either approach, eh?:)
 
Well, seems you have the upper hand, as I haven't worked on chips at all:D. Anyhow, I guess both of us will have to wait and see what choice the IHVs make, as they're in a tad bit better situation to evaluate the benefits and weaknesses of either approach, eh?
Even after years of seeing proof of accelerating technology miracles, it's surprisingly hard to envision what will be technologically possible even just a few years ahead. 30" LCD screens with 4M working pixels? Ultra-high-DPI color screens in a cell phone with high-quality video playback? Hell, just a year ago, I wouldn't have thought it possible to profitably produce larger-than-400mm2 dies for a consumer product!

It's really the same with development: looking back just 5 years ago, it's a miracle we got chips working with the primitive validation techniques we were using back then. Yet verification methodologies are advancing so quickly that most design companies have a hard time staying up to date with the latest and greatest. These days, the holy grail is formal methods, where you try to prove mathematically that your logic is correct under all states and conditions. It's pretty hard to get right and requires a very specific kind of expertise that's not commonly available, but it's incredibly powerful when it works.
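As a toy illustration of what that buys you: for a block small enough, you can literally enumerate every input state and prove two implementations equivalent by brute force; real formal equivalence checkers do the same thing symbolically on designs far too big to enumerate. A throwaway Python sketch, my own example rather than anyone's actual flow:

```python
from itertools import product

# Two implementations of a 2-bit adder: a "golden" reference model and a
# gate-level-style version. For a block this small we can prove equivalence
# by checking every possible input state, the brute-force analogue of
# formal equivalence checking.

def ref_adder(a: int, b: int) -> int:
    """Golden model: plain integer addition, truncated to 3 bits."""
    return (a + b) & 0b111

def gate_adder(a: int, b: int) -> int:
    """Ripple-carry implementation built from individual gates."""
    a0, a1 = a & 1, (a >> 1) & 1
    b0, b1 = b & 1, (b >> 1) & 1
    s0 = a0 ^ b0
    c0 = a0 & b0
    s1 = a1 ^ b1 ^ c0
    c1 = (a1 & b1) | (c0 & (a1 ^ b1))
    return (c1 << 2) | (s1 << 1) | s0

# Exhaustively cover all 16 input states (all "states and conditions"
# for a purely combinational block).
for a, b in product(range(4), repeat=2):
    assert ref_adder(a, b) == gate_adder(a, b), (a, b)

print("equivalence proven over all input states")
```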

I can't back this up with numbers, but my feeling is that despite the increasing number of transistors per chip, the number of bugs per transistor, and even the number of bugs per chip, has been going down. (The correctness of the software needed to control them is a whole other issue!)

All this to say: don't bet against 1 billion and larger chips. 2 or 3 years from now, the discussion will probably look so very quaint. ;)
 
I'd bet on billion-transistor GPUs, within 2 process nodes.
It's already been done for Itanium, but the markup for chips of that size is very high.
Whenever the die size for a billion transistor GPU falls at or below G80, I'd figure it would be a good bet.

The primary obstacles are manufacturing ones.

The unknown is how bad process variation is going to get in that time frame. Defect rates are likely to be worked down to the point that they will match current defect densities.

Completely validated circuit and logic design means nothing if a pair of chips coming off the line can have maximum clocks varying by a factor of two or more and can have leakage variance of an order of magnitude.

The worse those get, the more likely it is that die sizes will need to be smaller and that chip counts will rise.
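To illustrate with made-up numbers (not real process data): if each building block on a die gets a slightly different maximum clock, the die ships at the speed of its slowest block, so the expected worst case degrades as you integrate more blocks on one die. A quick Monte Carlo sketch:

```python
import random
import statistics

# Toy Monte Carlo: each building block on a die gets a random max clock,
# and the die as a whole can only ship at the speed of its slowest block.
# All numbers here are invented for illustration, not real process data.

NOMINAL_GHZ = 2.0   # assumed nominal block clock
SIGMA_GHZ   = 0.15  # assumed within-die variation (std dev), pure guess
TRIALS      = 5000

def die_fmax(num_blocks: int) -> float:
    """Max clock of one die = min over its blocks' max clocks."""
    return min(random.gauss(NOMINAL_GHZ, SIGMA_GHZ) for _ in range(num_blocks))

random.seed(0)
for blocks in (4, 16, 64):
    mean_fmax = statistics.mean(die_fmax(blocks) for _ in range(TRIALS))
    print(f"{blocks:3d} blocks per die -> mean shipping clock ~{mean_fmax:.2f} GHz")
```

The bigger the die (more blocks), the lower the expected shipping clock, which is the pressure toward smaller dies and higher chip counts.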
 
I think G92 being the 8700 is almost certainly right; there is just too big a gap between 32 and 96 not to be filled by something which has 64! My aesthetic number sense demands it.

As for G90, or whatever the top-end part will be, I can easily imagine a faster, shrunk version of G80, perhaps with an increase up to 160 stream processors. I also find a high-end GX2-type chip unappealing.

Who knows though......


The reason no one knows the exact details of the G92 board is that Foxconn is now responsible for NV board production; the relevant production is no longer being done by Flextronics. Except for the 8800 Ultra, the rest of the 8800 series is not in production.
 
Whenever the die size for a billion transistor GPU falls at or below G80, I'd figure it would be a good bet.
That would be against the trend of the last years: die sizes have been going up despite the increase of transistors per mm2.

The unknown is how bad process variation is going to get in that time frame.
Is there a problem with current process variations? If not, why do you expect this to happen now?

Defect rates are likely to be worked down to the point that they will match current defect densities.
No, they have to be even better. (And they are...)

Completely validated circuit and logic design means nothing if a pair of chips coming off the line can have maximum clocks varying by a factor of two ...
I haven't noticed major breakdowns in the correlation of timing models vs. reality, but you seem to have access to different data points. Please share.

... or more and can have leakage variance of an order of magnitude.
That's just a matter of choosing the right process. These things are very predictable.
 
That would be against the trend of the last years: die sizes have been going up despite the increase of transistors per mm2.

Just on a quick tangent... My company is currently designing a chip that is 6.5cm x 6.5cm square (a giant multiplexer for hybrid imaging applications). We can fit 4 of them on an 8" wafer, and expect our yield to be <5% with cost at >$100k each. So...it can be done, for a price.
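For the curious, here's the simple Poisson yield model I'd use to put those numbers in perspective; the model choice is my own simplification, and only the die size and the <5% yield figure come from the paragraph above:

```python
import math

# Rough sanity check of the giant-die numbers above using a simple
# Poisson yield model Y = exp(-D * A). The Poisson model itself is an
# assumption; the die size and <5% yield come from the post above.

die_area_cm2 = 6.5 * 6.5          # 42.25 cm^2
quoted_yield = 0.05               # "<5%" from the post

# Invert the model to get the implied effective defect density.
implied_D = -math.log(quoted_yield) / die_area_cm2
print(f"implied defect density: {implied_D:.3f} defects/cm^2")

# For comparison, the same defect density applied to a ~5 cm^2
# (~500 mm^2) GPU-sized die:
gpu_area_cm2 = 5.0
gpu_yield = math.exp(-implied_D * gpu_area_cm2)
print(f"same process, ~500 mm^2 die: yield ~{gpu_yield:.0%}")
```

In other words, even a few percent yield on a 42 cm^2 die implies a very clean process, one that would yield a GPU-sized die quite comfortably.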
:cool:
ERK
 
That would be against the trend of the last years: die sizes have been going up despite the increase of transistors per mm2.
So has the talk of going multi-die.

Wafer diameters haven't grown beyond 300mm, so there are manufacturing parameters that have not scaled at all in recent years.

Is there a problem with current process variations? If not, why do you expect this to happen now?

AMD's had known issues with its 65nm A64s.

http://investorshub.advfn.com/boards/read_msg.asp?message_id=21353453

I know it's a post in the intel section, but the numbers are indicative of process issues at 65nm.
The clock differential between the lowest rung of the released 65nm parts and the highest is only 600 MHz, though it is likely the lower-qualifying parts would have been culled.

The power consumption gap between those bins, if voltages and clocks were equalized, would be much, much higher.

AMD's Barcelona is said to have circuit tuning that helps mitigate some of this variation, but it obviously is not out yet in quantity, and it has clocking issues.
Repeated steppings and process tuning are likely to push Barcelona up to and over 3 GHz.
The amount of work this will likely require is part of the increasing troubles with combating variation.

There is no expectation for this to be any easier at 45nm and below.

No, they have to be even better. (And they are...)
Most of the published yield curves by various CPU manufacturers show the defect rates on well-tuned processes tend to level off pretty close to one another.
The overall rate of physical defects in the wafers is something that seems to put a floor on the defect rates.

I haven't noticed major breakdowns in the correlation of timing models vs. reality, but you seem to have access to different data points. Please share.
I don't have the numbers directly, but there were a number of interesting discussions on Ace's Hardware about this.
Fabs don't normally release the distribution data, but the rough idea was that worst-case variation was something close to linear for clock timing, and much worse when it came to leakage between devices, on the order of 10x (all else being equal).

Sadly, that forum has been shuttered.
This was debated by those who would have very good data, but I admit I may very well have misremembered the final amounts.

Device variation was characterized as having a linear factor between devices, something like a factor of two between the extremes, while leakage varied much more.

That's just a matter of choosing the right process. These things are very predictable.
Most predictions in IEEE and ISSCC papers (and by Gordon Moore) are that variation is going to suck past 65nm.
There are loads of papers on techniques to make designs more tolerant of variation, which I hope are implemented for GPUs sooner rather than later.

Intel's success at getting quad core CPUs out over a year before AMD's single-die solution shows that there are benefits to multichip at 65nm.
Intel does not intend to go single-chip until the next process node, when the die size falls in line with current dual cores.
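The idealized area arithmetic behind that (real shrinks never reach the ideal factor, and the Conroe die size below is an approximate figure from memory):

```python
# Idealized area scaling from 65nm to 45nm. Real shrinks never hit the
# ideal factor, so treat this as a best-case bound on the area you get back.
scale = (45 / 65) ** 2
print(f"ideal 65nm -> 45nm area factor: {scale:.2f}")

# So a monolithic quad-core at 45nm occupies roughly the same area as a
# 65nm dual-core die, which is why single-die quads only become
# attractive at the next node.
dual_core_65nm_mm2 = 143   # approx. Conroe die size, rough figure from memory
quad_core_45nm_mm2 = 2 * dual_core_65nm_mm2 * scale
print(f"rough monolithic 45nm quad: ~{quad_core_45nm_mm2:.0f} mm^2")
```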

With the expectation that design, wafer, and fab costs for future nodes are going to rise, I'm betting conservatively that this constraint would be the same or similar for GPUs.
 
So has the talk of going multi-die.

Wafer diameters haven't grown beyond 300mm, so there are manufacturing parameters that have not scaled at all in recent years.



AMD's had known issues with its 65nm A64s.

http://investorshub.advfn.com/boards/read_msg.asp?message_id=21353453

I know it's a post in the intel section, but the numbers are indicative of process issues at 65nm.
The clock differential between the lowest rung of the released 65nm parts and the highest is only 600 MHz, though it is likely the lower-qualifying parts would have been culled.

It's a shame this response never got the attention it deserved.
 
I'd bet on billion-transistor GPUs, within 2 process nodes.
It's already been done for Itanium, but the markup for chips of that size is very high.
Whenever the die size for a billion transistor GPU falls at or below G80, I'd figure it would be a good bet.

Just do a little math and realize that a billion transistors can easily be done on 65nm with a decent die-size reduction over G80 if you are NVIDIA. After all, the move from 90nm to 65nm should allow you to double the transistors or reduce the die size by about 50%. Or a combination in between.

Perhaps a more interesting question is will there be a 2 billion transistor part at the 55nm half step?
 
Just do a little math and realize that a billion transistors can easily be done on 65nm with a decent die-size reduction over G80 if you are NVIDIA. After all, the move from 90nm to 65nm should allow you to double the transistors or reduce the die size by about 50%. Or a combination in between.

Nothing is "easily done" in MPU design & manufacturing.

Perhaps a more interesting question is will there be a 2 billion transistor part at the 55nm half step?

:LOL: keep dreaming ;)
 
One thing to keep in mind is the percentage of die space that things such as ROPs, TMUs, the memory controller, and video acceleration take up, and how that is going to change going forward.
 
Just do a little math and realize that a billion transistors can easily be done on 65nm with a decent die-size reduction over G80 if you are NVIDIA. After all, the move from 90nm to 65nm should allow you to double the transistors or reduce the die size by about 50%. Or a combination in between.

I recalled the transistor count for G80 incorrectly, so my guess was overly conservative.
The die size would be in line with G80 as it is now, if Nvidia goes for the density route.

Unless the density scaling is unusually bad, the die would be over 400mm2, but less than 480mm2.
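The arithmetic, for reference; the G80 figures (~681M transistors in ~484 mm2 at 90nm) are the widely reported ones, and the scaling factors are assumptions:

```python
# Rough die-size estimate for a 1B-transistor part at 65nm.
# G80 figures (~681M transistors, ~484 mm^2 at 90nm) are the widely
# reported ones; the area-scaling factors are assumptions.

g80_transistors = 681e6
g80_area_mm2    = 484.0
density_90nm    = g80_transistors / g80_area_mm2   # transistors per mm^2

ideal_area_factor = (65 / 90) ** 2                 # ~0.52, best case
for label, factor in (("ideal scaling", ideal_area_factor),
                      ("~0.60 area factor (less ideal)", 0.60),
                      ("~0.65 area factor (pessimistic)", 0.65)):
    area = 1e9 / density_90nm * factor
    print(f"1B transistors at 65nm, {label}: ~{area:.0f} mm^2")
```

Ideal scaling lands around 370 mm^2; with the less-than-ideal factors you end up in the 400 to 480 mm^2 band quoted above.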
 
Nothing is "easily done" in MPU design & manufacturing.



:LOL: keep dreaming ;)

If NVIDIA were to build a chip comparable in size to G80 on 55nm it would likely be on the order of 2 billion transistors, no?

And as for these parts not being easy to design, that is exactly why they will keep doing so: so they can keep ahead of the competition. Intel may be a node ahead in process, but that doesn't mean it will be easy (technologically or economically) to compete at the high end with NVIDIA.
 
If NVIDIA were to build a chip comparable in size to G80 on 55nm it would likely be on the order of 2 billion transistors, no?

No.

And as for these parts not being easy to design, that is exactly why they will keep doing so: so they can keep ahead of the competition. Intel may be a node ahead in process, but that doesn't mean it will be easy (technologically or economically) to compete at the high end with NVIDIA.

I don't expect Intel to compete with GPUs in the gaming market until at least their 2nd generation of products. The first gen is more of a "foot-in-the-door" product, intended to garner mindshare for the next release by creating brand awareness.

Which post number?
There are several on that message url.

Oops, last one.
 
Which post number?
There are several on that message url.

I imagine the last one, which highlights alleged mistakes in assessing the power consumption / leakage of AMD's 65nm process and claims that the reason Barcelona isn't clocking up yet is SpeedPath problems, not process issues.
 
That message had its own set of replies that I found convincing enough.

http://investorshub.advfn.com/boards/replies.asp?msg=21524484

and then a later reply to a reply

http://investorshub.advfn.com/boards/replies.asp?msg=21530969

The comparison between the bins within the 65nm process at that time showed a very wide disparity.

The lowest bin had double the power draw at its maximum clock, even with a lower voltage, and voltage has a known quadratic relationship to power draw.

At the same 1.1V, the lowest bin drew almost three times the power.

It is no stretch to say that this disparity in binning is due to process variation, and that it was problematic, even after months of tuning.
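To spell out the normalization I'm relying on: dynamic power scales roughly with f*V^2, so you rescale each bin's measured power to a common clock and voltage before comparing. The numbers in this sketch are hypothetical placeholders, not the actual figures from the linked posts:

```python
# Sketch of the normalization argued above: dynamic power scales roughly
# as f * V^2, so to compare two bins "all else equal" you rescale each
# part's measured power to a common clock and voltage. This ignores
# leakage, which scales differently. The wattage, clock, and voltage
# numbers below are hypothetical placeholders, not figures from the
# linked posts.

def normalize_power(p_watts, f_ghz, v_volts, f_ref=2.6, v_ref=1.1):
    """Rescale measured power to a reference clock and voltage."""
    return p_watts * (f_ref / f_ghz) * (v_ref / v_volts) ** 2

best_bin  = normalize_power(p_watts=65.0,  f_ghz=2.6, v_volts=1.10)  # hypothetical
worst_bin = normalize_power(p_watts=130.0, f_ghz=2.4, v_volts=1.05)  # hypothetical

print(f"best bin  at 2.6 GHz / 1.1 V: ~{best_bin:.0f} W")
print(f"worst bin at 2.6 GHz / 1.1 V: ~{worst_bin:.0f} W")
print(f"ratio: ~{worst_bin / best_bin:.1f}x")
```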


The recent 3.0 GHz K10 demo would actually strengthen the arguments made months ago.
Apparently, a stepping that tanks at 2 GHz can also be made to reach 3.0 GHz (with power restrictions eased, probably).
That is something I'd call variation.
 
I'm hearing things about a die-size of 17x17 for G92, which would seem to be a fairly large chip for the $199 bracket. And we all know that nV loves good margins... A chip this size doesn't fit in nV's margin model at all to say the least. Any thoughts?
 
I'm hearing things about a die-size of 17x17 for G92, which would seem to be a fairly large chip for the $199 bracket. And we all know that nV loves good margins... A chip this size doesn't fit in nV's margin model at all to say the least. Any thoughts?

Nvidia may want to place G92 into the ~500 USD and ~200 USD product lines, respectively.

Is there any URL you might want to share with us?
 