So has the talk of going multi-die.
Fair enough. We'll really have to see what happens. My belief is not that GX2-like solutions won't happen, but that they don't necessarily mean larger dies than R600 and G80 are impossible. I think we'll see both kinds of solution for quite some time.
Wafer diameters haven't grown beyond 300mm, so there are manufacturing parameters that have not scaled at all in recent years.
I'm walking a bit out of my comfort zone here, but there aren't really that many technology parameters that are tied to wafer diameter. There are mainly two reasons to go to larger wafers, both economic:
- increase fab capacity: higher die throughput per handled wafer
- reduce the amount of unusable wafer real estate: this becomes important as dies grow larger. Even with current large die sizes, it's still not that much of a factor, though it's definitely part of some equation in some cost calculation spreadsheet (the sketch below puts rough numbers on it).
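To put rough numbers on that second point, here's a minimal back-of-the-envelope sketch in Python. It uses the textbook dies-per-wafer approximation; the ~480 mm² die area is simply assumed as a G80-class size, and none of the numbers come from a fab.

```python
import math

def dies_per_wafer(wafer_diameter_mm, die_area_mm2):
    """First-order approximation: gross die count minus an edge-loss term
    for the partial dies lost along the wafer's circumference."""
    radius = wafer_diameter_mm / 2.0
    gross = math.pi * radius ** 2 / die_area_mm2
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2.0 * die_area_mm2)
    return gross - edge_loss

# Compare a ~480 mm^2 (G80-class) die on 200 mm vs 300 mm wafers.
for d in (200, 300):
    dpw = dies_per_wafer(d, 480)
    usable = dpw * 480 / (math.pi * (d / 2) ** 2)
    print(f"{d} mm wafer: ~{dpw:.0f} dies, ~{usable:.0%} of wafer area usable")
```

The edge-loss term grows with the circumference while the gross count grows with the area, so the usable fraction improves on the larger wafer (roughly 69% vs 79% in this toy example) and the per-wafer die count more than doubles.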
AMD's had known issues with its 65nm A64s.
I tend to conveniently ignore these kinds of issues.
They are really not much of a concern for vendors who use external fabs and standard-cell design. So let me clarify: worst-case electrical characteristics (which are always used to calculate the critical timing path of a chip) are really quite reliable, even at 65nm. Going forward, I don't expect major changes here.
I suspect this is because fabs, for one, tend to add a certain margin of error precisely to make sure customers get what they expect.
This is less of a concern for the CPU fabs of Intel and AMD, so they'll try to get closer to the limits of their process. As long as AMD continues to produce GPUs externally (which is obviously not a given), I'd like to stick with that model, and there I believe my argument still holds.
If there are multiple speed bins for R670/G92/..., it will be interesting to see how much the clocks differ from each other. They are really close for e.g. the 8800 Ultra/GTX/GTS, so there is clearly not yet a problem.
There is no expectation for this to be any easier at 45nm and below.
If wires continue to play a larger role in the overall delay, I expect speed variance actually to go down (just as we're already seeing now): wire delay is dominated by RC, which varies far less across process corners than transistor drive strength, and you can't significantly reduce wire delays by increasing voltages.
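A toy model makes the point; the corner spreads here (±15% for gate delay, ±5% for wire delay) are invented for illustration, not process data:

```python
# Assumed spreads: transistor drive strength varies a lot across process
# corners, interconnect RC much less.
GATE_SPREAD = 0.15   # assumed +/-15% gate-delay swing across corners
WIRE_SPREAD = 0.05   # assumed +/-5% wire-delay swing across corners

def slow_corner_penalty(wire_fraction):
    """Worst-case path delay relative to nominal, for a path whose nominal
    delay is split between gates and wires."""
    gate = (1.0 - wire_fraction) * (1.0 + GATE_SPREAD)
    wire = wire_fraction * (1.0 + WIRE_SPREAD)
    return gate + wire

for wf in (0.2, 0.5, 0.8):
    print(f"wire fraction {wf:.0%}: slow corner {slow_corner_penalty(wf) - 1:+.1%} vs nominal")
```

As the wire fraction climbs from 20% to 80%, the slow-corner penalty drops from +13% to +7%: the more a path is dominated by wires, the less its total delay spreads across corners.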
Fabs don't normally release the distribution data, but the rough idea was that worst-case variation was something close to linear for clock timing, and much worse for leakage between devices, on the order of 10x (all else being equal).
I agree that leakage variation can be quite high within the same process. Speed variation is much less so, once again keeping my more restricted rules in mind. Unlike the GPU world, there are a lot of silicon products where all chips have to run at the same speed (think cell phones, modems, TV chips, ...).
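A quick Monte Carlo sketch shows why those two spreads differ so much. The physical form is standard (subthreshold leakage scales exponentially with threshold voltage, delay only roughly as 1/(Vdd - Vt)), but every parameter value below is an assumption:

```python
import math
import random

random.seed(0)

VT_NOM, VT_SIGMA = 0.30, 0.02   # assumed threshold voltage and spread, volts
SLOPE = 0.038                   # ~n*kT/q, exponential slope of subthreshold leakage
VDD = 1.0                       # assumed supply voltage, volts

vts = [random.gauss(VT_NOM, VT_SIGMA) for _ in range(100_000)]
# Normalized to the nominal device:
leak = sorted(math.exp(-(vt - VT_NOM) / SLOPE) for vt in vts)
delay = sorted((VDD - VT_NOM) / (VDD - vt) for vt in vts)

lo, hi = len(vts) // 100, len(vts) - len(vts) // 100 - 1  # p1 and p99 indices
print(f"leakage spread (p99/p1): {leak[hi] / leak[lo]:.1f}x")
print(f"delay spread   (p99/p1): {delay[hi] / delay[lo]:.2f}x")
```

The same modest threshold-voltage spread yields a leakage spread on the order of 10x but a delay spread of barely 15%, which matches the asymmetry described above.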
Device variation was characterized as linear between devices, something like a factor of two between the extremes, ...
I assume you mean speed variation. That sounds about right. In reality, nobody really cares about the fast corner, except for hold-time violation checks. Everybody simply uses the slow corner, because fabs demand it.
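A minimal sketch of that sign-off logic, with invented numbers: setup slack is tightest at the slow corner and hold slack at the fast corner, so each check only needs its own worst-case corner.

```python
# Delay multipliers per corner (invented values).
CORNERS = {"slow": 1.25, "typical": 1.00, "fast": 0.80}

def slacks(clock_period_ns, long_path_ns, short_path_ns,
           t_setup=0.05, t_hold=0.05):
    """Print setup/hold slack for a longest and a shortest path at each corner."""
    for name, k in CORNERS.items():
        setup = clock_period_ns - long_path_ns * k - t_setup  # worst when slow
        hold = short_path_ns * k - t_hold                     # worst when fast
        print(f"{name:8s} setup slack {setup:+.3f} ns   hold slack {hold:+.3f} ns")

# ~525 MHz target, 1.4 ns nominal critical path, 0.1 ns nominal shortest path:
slacks(clock_period_ns=1.9, long_path_ns=1.4, short_path_ns=0.1)
```

Setup slack bottoms out at the slow corner (+0.100 ns here) and hold slack at the fast corner (+0.030 ns), which is exactly why signing off setup at the slow corner and hold at the fast one covers the extremes.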
Intel's success at getting quad core CPUs out over a year before AMD's single-die solution shows that there are benefits to multichip at 65nm.
I'd argue that this is more a matter of getting to market faster, just like the 7950GX2 was a nice way to crash the R580 party while the next big thing (in the same process!) was getting ready backstage.
Anyway, my main initial argument was that debugging chips with 1B transistors doesn't have to be a major burden. We've deviated quite a bit from that.