Next NV High-end

I don't have any webspace to host images and I don't have permission to post attachments. What I'm proposing is that "US1" on your diagram takes care of parameter interpolation, and that the blocks of cache immediately to the right are the vertex cache.

If you zoom in on one of the US units you have marked, you'll see two types of black rectangles: the darker, larger, "fatter" ones (just like the blocks in the area I think might be the vertex cache), and slightly thinner rectangles, which appear to be split in half. There are 9 of these per highlighted "US" block in your diagram (i.e. 18 tiny blocks in total, since each of the nine is split in 2). So my theory is that these 18 blocks are register files holding state for the currently active thread(s), and that some of the larger rectangular blocks are caches holding state for stalled threads - which leads me to speculate that each US unit actually has 18 pipes (2 redundant pipes for yield).

Anyway, this means that each US block could tolerate 2 "fatal" defects. Given the area of each of these US blocks, this seems much more reliable than requiring 3 out of 4 US blocks to contain no "fatal" defects whatsoever.
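
For what it's worth, here's a rough sketch of why the finer-grained scheme should win. Treat it as a toy model only - the mean defect rate per block and the "2 spares per block" structure are my assumptions, not anything read off the die shot:

```python
from math import comb, exp, factorial

def poisson(k, lam):
    """Probability of exactly k fatal defects, given a mean of lam."""
    return exp(-lam) * lam**k / factorial(k)

lam = 0.2  # assumed mean number of fatal defects per US block

# Coarse redundancy: 4 US blocks, at least 3 must be entirely defect-free.
p_clean = poisson(0, lam)
yield_coarse = sum(comb(4, k) * p_clean**k * (1 - p_clean)**(4 - k)
                   for k in (3, 4))

# Fine redundancy: 3 US blocks, each with 2 spare pipes,
# so each block survives up to 2 fatal defects.
p_block_ok = sum(poisson(k, lam) for k in range(3))
yield_fine = p_block_ok**3

print(f"3-of-4 blocks defect-free: {yield_coarse:.1%}")  # ~84.7%
print(f"2 spare pipes per block:   {yield_fine:.1%}")    # ~99.7%
```

Under those made-up numbers the per-pipe sparing wins comfortably, which is all my point amounts to.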

I'm not an EE though, so I wouldn't put too much stock into what I'm saying.
 
If you look closely at each of the left-hand US blocks, as I've marked them, you can see how each is divided into three down its left-hand side: 1 and 3 are oriented one way, 2 is upside down.

[Image: b3d35.jpg]


I'm afraid I can't say I find your argument about interpolation very convincing. It's a fixed function which wouldn't need the vast resources of a programmable pipeline to execute.

You appear to be describing the Scan Converter in:

[Image: 012l.jpg]


As to the 18-pipes-per-US-for-yield idea - I dare say that's just way too fine-grained. It's normal for whole fragment quads to be deactivated, but I don't think we've ever seen a smaller unit be deactivated.

If you're trying to decode the organisation of a US, I suppose it's worth remembering that it's Vec4+scalar (i.e. 5D) co-issue capable. It may also be capable of Vec3+scalar+scalar, etc.
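
If it helps, the co-issue constraint is easy to play with as a toy packing problem. Everything below is assumption - the US is treated as 5 anonymous lanes with an issue limit that may or may not match the real scheduler:

```python
LANES = 5  # Vec4 + scalar, i.e. the 5D co-issue width mentioned above

def can_coissue(widths, max_coissue=2):
    """Toy check: do instructions of these widths fit one issue slot?"""
    return sum(widths) <= LANES and len(widths) <= max_coissue

print(can_coissue([4, 1]))                    # Vec4 + scalar    -> True
print(can_coissue([3, 1, 1], max_coissue=3))  # Vec3 + 2 scalars -> True
print(can_coissue([4, 2]))                    # Vec4 + Vec2      -> False, 6 > 5 lanes
```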

Jawed
 
Well, for redundancy to make sense, a large portion of the die would have to be capable of redundancy.

That is to say, if you take the assumption that a fatal defect can occur anywhere on the surface of the die, it only makes sense to have redundancies if those redundancies will save a large percentage of the dies that have one defect, two defects, etc.

For an example of what I'm talking about: the idea that each of the three ALU blocks actually contains 18 pipelines would allow for one error in each of those three blocks. If we claim that the total area covered by these ALUs is 30% of the die (10% per shader unit), then approximately 30% of the dies that have one error would be saved by this technique, about 6% of the dies that have two errors (both errors must land in the ALU area, and in different blocks), and 0.6% of those with three errors.
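
Those percentages can be checked mechanically. A minimal sketch, assuming defects land uniformly at random over the die and each of the three blocks (10% of the die apiece) can absorb exactly one defect:

```python
from itertools import product

BLOCK_AREA = 0.10  # each of the 3 ALU blocks covers ~10% of the die, as assumed above
BLOCKS = range(3)

def p_saved(n_defects):
    """Probability a die with n defects is saved: every defect must land
    in an ALU block, and no block may take more than one defect."""
    return sum(BLOCK_AREA ** n_defects
               for landing in product(BLOCKS, repeat=n_defects)
               if len(set(landing)) == n_defects)

for n in (1, 2, 3):
    print(f"{n} defect(s): {p_saved(n):.1%} of dies saved")
# 1 defect: 30.0%, 2 defects: 6.0%, 3 defects: 0.6%
```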

So, whether or not it would be worth it would depend on how much extra space the redundant units and the circuitry to disable them would take, as well as on how common defects are.

Personally, I suppose I don't care all that much. Hence my aforementioned lack of contribution. But there's something.
 
Crikey, I'm all for idle speculation, but trying to flush out the transistor count for R580 from a die shot of Xenos, now that's keraaaazeeeee ;)

That said, if 302 million (or whatever the precise number is) works for NV on 110nm, then 400 million doesn't look like a crazy number for 90nm. I should say, however, that I have no direct info on the count for R580.
 
Chalnoth said:
Well, we heard that as a possibility some time ago, but I don't buy it. I don't think ~300 million transistors is enough for a 32-pipe chip. What's more, given the incredible overclocking potential some are seeing (people are clocking G70s up to 600-700MHz on the Futuremark ORB), on the surface it would seem that G70's yields are pretty damned good right now.

Transistor count could probably be higher...
 
geo said:
The elliptical, unstated purpose of the exercise was to wonder whether the R580 some of us are wish-fulfilling on might be too big for 90nm at this stage in its life cycle as a process. But I seem to remember you're a proponent of that point, so hopefully you got that.

Did you expect 302M on 110nm? If the transistor count is, in a relative sense, "too high", ATI's headache would be more clockspeed than anything else, IMHO.

By the way, I've wondered in the past whether 90nm will be enough for DX10 cores, and not really for any DX9.0 core.
 
Ailuros said:
Did you expect 302M on 110nm?

Nope. But until caboosemoose used "400m" a few posts upstream, I hadn't seen anyone use a number bigger than 350m for 90nm. It seems useful to me to get a sense of what is doable. . . even when you allow for the possibility you'll be surprised in the end anyway.
 
The question for DX10 parts on 90nm would be how far "upstream" of 400M they really go.
 
geo said:
Nope. But until caboosemoose used "400m" a few posts upstream, I hadn't seen anyone use a number bigger than 350m for 90nm. It seems useful to me to get a sense of what is doable. . . even when you allow for the possibility you'll be surprised in the end anyway.

Well, I'd point out the following (numbers from memory, so they might be slightly off, but the point I am making remains):

NV40: 130nm, 220 million
G70: 110nm, 302 million
A.N. Other GPU on 90nm: ...

Of course, R580 is not an NV chip, but a 400 million transistor 90nm GPU would be reasonably consistent with the above.
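
To make the extrapolation explicit, here's a back-of-envelope using ideal (feature size)² density scaling - a ceiling that real shrinks never quite hit, so treat the outputs as upper bounds rather than predictions:

```python
nv40, g70 = 220e6, 302e6  # transistor counts from the list above

ideal_110 = nv40 * (130 / 110) ** 2  # what 130nm -> 110nm could give
ideal_90  = g70  * (110 / 90) ** 2   # what 110nm -> 90nm could give

print(f"110nm ideal: {ideal_110 / 1e6:.0f}M (G70 actual: 302M)")
print(f"90nm  ideal: {ideal_90 / 1e6:.0f}M")
# ~307M and ~451M, so 400 million on 90nm sits comfortably inside the trend.
```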

Also, I'd be VERY surprised if R580 wasn't more than 350 million transistors.
 
geo said:
Nope. But until caboosemoose used "400m" a few posts upstream, I hadn't seen anyone use a number bigger than 350m for 90nm. It seems useful to me to get a sense of what is doable. . . even when you allow for the possibility you'll be surprised in the end anyway.

Intel is doing a 1.7 billion transistor chip (Montecito) on 90nm. OK, most of that chip is cache, but it is still a lot of transistors.

I think a 450 million transistor GPU is plausible on a mature 90nm process (H2 2006).
 
I just don't see that part being a "G71". G75 would be more like it.

Unless, of course, Wavey was right about hidden quads. :LOL:
 
I think what will be most impressive is if Nvidia can get their first LARGE 90nm part out without any major setbacks. I have to say that they have set themselves up for success: with the release of the G70, they have a full product cycle to test and tweak.
 
Yeah, well I suspect ATI thought the same thing, if you look at when the X850 came out.

But I wish NV the best with 90nm on the high-end. . .(everyone seems to agree that mid-rangy 90nm is going just swimmingly for all concerned). Process struggles might create competitive advantage in the short term for one company or the other, but I can't see that they ever do anything good for us plebes.
 
Yep, that info is BS. Here's some slightly more reliable, if less precise, info at least - but don't kill me if it ain't right.
G71: 16PS, 500-600MHz, 128-bit memory bus. Fewer than 16 ROPs.
G72: 8PS, 450-550MHz. No info about the ROPs.
G7x: 32PS, over 600MHz at least in "some ways", or so I heard.

When I say G7x, it's because I'm unsure whether the proper codename is G73, G74 or G75. I have yet to hear or read anything from anyone that's more than guesswork on this subject. Don't expect G7x before Q1 2006 - most likely "late Q1 2006", or even early Q2 in fact; mid-February is the absolute minimum. Short-term, NVIDIA's answer will be a 24PS/8VS G70 with clockrate tricks a la Quadro (there was a respin afaik, since it'll remain part of their product line until G80, but I'm unsure about this).
Something I'm surprised there's no info on, actually, is whether the G71/G72 have a chip-part-dependent clockspeed a la G70GL.

Anyhow, here's what I know regarding NVIDIA's standalone desktop product range for Christmas 2005 - I'd say it's quite solid:
$49 to $79: GeForce 6200(TC)
$99: GeForce 7200 (8PS)
$149: GeForce 7200 Ultra (8PS)
$199: GeForce 7600 (12PS)
$249: GeForce 6800 GT + GeForce 7800 non-GT (16PS)
$299: GeForce 7600 Ultra (16PS)
$349-399: GeForce 7800 GT (20PS)
$449+: GeForce 7800 GTX (24PS)
$549+: GeForce 7800 Ultra (24PS)

The GeForce 6600s are being phased out, as should have been expected. The 6200s will be kept around for a fair bit longer as NVIDIA's lower-end solution. The 6800s will only be fully replaced when the G7x hits (and the G70 thus drops in price), although they really aren't a very, let us say, significant part of NVIDIA's roadmap anymore.
6200/7200/7600/7800/7900 will be NVIDIA's product line until Vista/G80, Q3/Q4 2006.

Anyhow, that's what I know so far - just don't kill me if I'm wrong :smile: But yes, people wondering whether the X1300 will get competition before Christmas will be pleasantly surprised, I think - at least unless it got delayed, which isn't entirely out of the question sadly, considering I haven't heard tape-out confirmations yet, although I suspect it did tape out a few days/weeks ago. The "GeForce" 6100 and the 7600s were the 90nm chips that taped out.


Uttar
EDIT: In case that wasn't obvious, there might be other G71/G72/G7x models than the ones I listed. However, afaik, there won't be a 4PS G72 at launch. G7x-wise, I'd expect one model at launch a la G70, followed by two more a bit later.
 