So... what will G80/R600 be like?

SugarCoat said:
well i said this before but i guess i can do it again.

Nvidia is against a unified architecture, saying they won't use it until it shows a benefit, and will probably continue to operate on highly programmable separate pipelines instead. This can benefit them both in cost and difficulty to produce, as well as overall speed (perhaps). I do not think they will launch their DX10-compatible products without a unified part, however. I expect them to launch either medium or low-end parts based on a unified architecture, both to get a feel for production and to aid driver maturation for a flagship unified part, which may or may not come sometime in 2007, so that they aren't releasing products that prove to be immature due to drivers. Let's face it, a retail product release helps both companies mature drivers far more than in-house driver work. People can say company X has had so much time to do drivers that when it launches they will already be top notch, but I have never seen that be the case. The best performance drivers seem to come 3-6 months after a launch, and driver performance improvements, both insignificant and significant, continue through the product's cycle.

The NV50 core has been in development for Vista and DX10 for quite a long while - almost two years by my judgment. Both companies have had access to the ever-changing DX10 API for well over a year. I see the "G" code-named cores as the stopgap between the NV40 and NV50. Think of it as: hey look, we've had this core on the burner for a while, but Microsoft keeps changing things as well as pushing release dates; we need to do something about our product line between then and now or we'll be infringing on codenames. Obviously Nvidia won't change their timetable or core succession, so enter the G70 and the departure, for the time being, of the NV codename. If there is in fact a G80, I very much suspect it will be launched early or mid next year, and most certainly prior to their first DX10 part. And as soon as that DX10 part is introduced, I think we'll see Nvidia go back to NV codenames. This is literally the best and most logical reason I can come up with for their departure from the codenames they have been using for the last 5+ years.

They will keep riding this tech, the NV40 derivative, modifying it throughout and keeping essentially the same SM3.0 technologies, until Vista launches (now late 2006/early 2007). Once that happens, we should see a very mature and substantially impressive/complex core, technology-wise, from them, as I think they have been working on it (NV50) for quite a long time.

R600, I'm sure, will be its own wonder. Although even in its launch timetable I don't think it will use embedded DRAM; it costs too much and will cause problems in games. I believe it takes specific coding to use it.

That's my theory in a nutshell.

That is a very interesting theory indeed. The G70 isn't much different from the NV40, is it? It's just 8 more pixel pipes, 2 more vertex shaders, higher clocks, and faster memory?

We have heard about dual core solutions from Nvidia that sound like they are based on the G70 and of course SLI.

Nvidia could just be positioning themselves for the interim until Vista is near release by adding more pipelines, higher clocks, more memory, or even dual-core solutions.

Once DX10 shows up they will have their unified GPU ready to go.
 
Maintank said:
Doesn't DX10 require a unified design?
No, it doesn't require that. AFAIK it requires the same feature set on fragment and vertex shaders, but you can have that even with a non-unified design.
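To illustrate the distinction, here's a toy sketch (purely hypothetical, not real API or driver code): the spec only demands that both stages expose one common instruction/feature set, and that can just as well be implemented on two separate pools of units.

```python
# Toy model: the same instruction set exposed by both shader stages,
# implemented on *separate* vertex and pixel units (non-unified hardware).
# The ISA list and unit counts below are made up for illustration.
COMMON_ISA = {"add", "mul", "mad", "dp4", "texld", "if", "loop"}

class ShaderUnit:
    def __init__(self, stage):
        self.stage = stage          # "vertex" or "pixel"
        self.isa = COMMON_ISA       # identical feature set either way

    def run(self, program):
        assert all(op in self.isa for op in program), "op not supported"
        return f"{self.stage} unit executed {len(program)} ops"

# Non-unified chip: dedicated pools, but the API-visible capabilities match.
vertex_units = [ShaderUnit("vertex") for _ in range(8)]
pixel_units  = [ShaderUnit("pixel")  for _ in range(24)]

prog = ["texld", "mad", "dp4"]          # legal in either stage
print(vertex_units[0].run(prog))
print(pixel_units[0].run(prog))
```

The point being that feature parity is an API-level requirement, while unification is a hardware implementation choice.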
 
G70 is because there were no more NV4x names left -- NV40, 41, 42, 43, 44, 45, 48. So they have only 46, 47 and 49. 47 is G70. We know that there are G71, 72, 73 -- that's already one code number more than is available in NV4x. Then there may be die shrinks of NV4x/G70 to 90nm and new NV4x cores before NV50, which I believe is G80 now.
 
I'm sure I heard a while back that Vista wasn't going to be launched with WGF2 but with WGF1.

If that were true then WGF2 would be released in 2007,

so G80 and R600 may be further away than expected, and all we're going to get in 2006 is upgrades of the current architecture;

this would give R520 more time in the market before R580 is released.
 
DegustatoR said:
G70 is because there were no more NV4x names left -- NV40,41,42,43,44,45,48. So they have only 46,47 and 49.
Did NV48 ever show up? I was under the impression it was morphed into G72..
 
OK, here is my completely far-fetched theory on R600.

Please feel free to rip it to shreds and prove it's utterly impossible. I would appreciate this actually :) - as I have little knowledge of the technical production side of graphics chips, after all.


Anywho.

Looking at what ATI have done with both R500 and R520, then applying these together (but far more so) to R600, this is what I come up with.

What interested me the most about R500 was the separation of cache/low-level fixed pixel functions from the memory controller and shaders. This could be taken further.
What interested me the most about R520, of course, was the new memory controller.

Combine the two. But go a bit further.

Currently, the memory controller on R520 circles the internals of the chip (from what I've read at least - although the die shots suggest otherwise). Now I was thinking: if you add pipelines, that should increase the die size. Therefore the memory bus gets bigger, therefore slower and requiring more die... So. Why not actually separate the memory controller from the shader logic? The memory controller + scheduler + ring bus can sit on its own (far smaller) chip... Ideally this will speed up the ring.
At each node on the ring, use an interconnect similar to the one found on the R500 (between chips), but instead connect it to a smaller daughter die. This small chip could, say, contain 8 ALUs, 8 general-purpose memory read/write units, 8 texture lookup units, 8 Z-compare units, a small bit of cache, etc. etc. etc.
If the ring bus then has 8 nodes, there can be 8 of these smaller chips placed around the outside of the controller (forming a 3x3 square of chips). Physically they could be closer than the R520 daughter die, and they have the advantage of not sharing the same piece of silicon, therefore hopefully significantly reducing yield problems (and cost). Plus, of course, creating scaled-back versions is far easier, and doesn't require using the full die from a more expensive product. Etc. etc. I know it probably wouldn't be as efficient, but surely producing 9 working 50-million-transistor chips is *far* easier than producing one working 450-million-transistor chip?
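To put a rough number on that last point, here's a back-of-the-envelope sketch using a simple Poisson defect-density yield model with made-up figures (illustrative only, not real foundry data):

```python
# Back-of-the-envelope yield comparison: one big die vs. nine small dies.
# Assumes a simple Poisson model, yield = exp(-defect_density * area),
# and hypothetical defect/area numbers; only the trend is meaningful.
from math import exp

defect_density = 0.5                  # defects per cm^2 (hypothetical)
big_area       = 3.0                  # cm^2 for one ~450M-transistor die (hypothetical)
small_area     = big_area / 9.0       # each ~50M-transistor die

yield_big   = exp(-defect_density * big_area)
yield_small = exp(-defect_density * small_area)

# Silicon area paid per good chip (set), ignoring packaging/test overhead:
cost_big   = big_area / yield_big                 # per good big die
cost_small = 9 * (small_area / yield_small)       # per set of 9 good small dies

print(f"big-die yield:    {yield_big:.1%}")       # ~22%
print(f"small-die yield:  {yield_small:.1%}")     # ~85%
print(f"relative silicon cost (small set / big die): {cost_small / cost_big:.2f}")
```

Under those made-up numbers the nine-chip set costs roughly a quarter of the silicon per good product, which is the intuition behind the idea (before counting packaging and interconnect overhead, of course).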

Yeah. Sure it won't happen, but that's what I see as being a good idea.

feel free to shoot me down.

etc.
 
The problem with that idea is that the memory bus needs to be able to get data to/from the pixel processing units.
 
_xxx_ said:
Threw it away since it's 5 year old tech now? Everything from 3dfx IP is pretty outdated nowadays.

There are quite a few tidbits and pieces of past experience that have been used in their products since the NV25, which came from the former 3dfx and GP patent portfolio.
 
Chalnoth said:
Not necessarily. As David Kirk noted, it is an implementation detail. It's performance in real games that is going to be important.

Part of me really hopes that nVidia will go for a unified architecture, just because the idea of a unified architecture is so simple and beautiful. Part of me doesn't just so that we can see a good showdown between a more traditional architecture and a unified one.

The benefits IMO stretch way beyond normal 3D needs but could expand even more for GPGPU functions and many other markets NV is addressing.

ATI seems to also want to address the PDA mobile market (2nd generation) with a USC as just one example.
 
I rather like the idea, or the way, 3Dlabs have gone: simple scalar units. Then you don't have to worry about anything else than how to improve your dispatcher and how to cram the largest possible number of them anywhere on the chip.
(Well, not that easy, but a start)

AFAIK the US of the Xbox360 GPU are Vec4+scalar, right? Does anyone know about their "splittability"?
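For what it's worth, here's a toy co-issue packer showing why the Vec4+scalar split matters for utilization (purely illustrative, not how Xenos actually schedules; the op mix is made up):

```python
# Toy co-issue packer for a hypothetical Vec4+scalar ALU: each cycle can take
# one vector op (up to 4 components) plus one independent scalar op.
ops = [("mul", 4), ("add", 1), ("dp4", 4), ("rsq", 1), ("mad", 3), ("mov", 1)]

def pack(ops):
    cycles = []
    pending = list(ops)
    while pending:
        vec = next((o for o in pending if o[1] > 1), None)   # pick a vector op
        if vec: pending.remove(vec)
        sca = next((o for o in pending if o[1] == 1), None)  # pair it with a scalar op
        if sca: pending.remove(sca)
        cycles.append((vec, sca))
    return cycles

for i, (vec, sca) in enumerate(pack(ops)):
    print(f"cycle {i}: vector={vec}, scalar={sca}")
# With this op mix the six instructions fit in three issue slots; a purely
# scalar machine (the 3Dlabs approach above) instead leans entirely on the
# dispatcher to keep many narrow units busy.
```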
 
Ailuros said:
There are quite a few tidbits and pieces of past experience that have been used in their products since the NV25, which came from the former 3dfx and GP patent portfolio.

Sure, some of it is there in the GFFX but today, I doubt there is anything useful left. But then who knows? ;)
 
Graham said:
Currently, the memory controller on R520 circles the internals of the chip (from what I've read at least - although the die shots suggest otherwise). Now I was thinking: if you add pipelines, that should increase the die size. Therefore the memory bus gets bigger, therefore slower and requiring more die... So. Why not actually separate the memory controller from the shader logic? The memory controller + scheduler + ring bus can sit on its own (far smaller) chip... Ideally this will speed up the ring.
At each node on the ring, use an interconnect similar to the one found on the R500 (between chips), but instead connect it to a smaller daughter die. This small chip could, say, contain 8 ALUs, 8 general-purpose memory read/write units, 8 texture lookup units, 8 Z-compare units, a small bit of cache, etc. etc. etc.

The point of the memory "ring bus" is to speed up the communication between different parts of the chip. Implementing that in another, external chip would rather make it slower; that really makes no sense. No matter how you do it, the internal version will always end up MUCH faster than the same thing implemented on an external die.
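A crude way to see the gap, with placeholder figures (every number below is hypothetical, just to show the order of magnitude of crossing a die boundary twice per transaction):

```python
# Rough intuition for why an off-die ring stop costs more than an on-die one.
# All figures are hypothetical placeholders, not measured values.
clock_ghz      = 0.6                 # assumed ~600 MHz core clock of that era
cycle_ns       = 1.0 / clock_ghz

on_die_hop_ns  = 1 * cycle_ns        # assume one pipeline cycle per on-die ring stop
off_die_hop_ns = 5.0                 # assume SerDes + pad + PCB trace per die crossing
crossings      = 2                   # request out, data back

print(f"on-die hop:              {on_die_hop_ns:.1f} ns")
print(f"off-die round trip adds: {crossings * off_die_hop_ns:.1f} ns "
      f"(~{crossings * off_die_hop_ns / cycle_ns:.0f} extra cycles per transaction)")
```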
 
Ailuros said:
The benefits IMO stretch way beyond normal 3D needs but could expand even more for GPGPU functions and many other markets NV is addressing.

ATI seems to also want to address the PDA mobile market (2nd generation) with a USC as just one example.
Perhaps. But if the instruction sets are identical, one could just use nothing but the pixel shaders, of which there are many more, and still get most of the performance. Unless you have an algorithm that requires a lot of linear interpolation (so that you could make use of the interpolated registers for efficiency purposes), I don't see why this wouldn't work.
 
Chalnoth said:
Perhaps. But if the instruction sets are identical, one could just use nothing but the pixel shaders, of which there are many more, and still get most of the performance. Unless you have an algorithm that requires a lot of linear interpolation (so that you could make use of the interpolated registers for efficiency purposes), I don't see why this wouldn't work.

My mind was directed more in the power consumption/die size direction, and that's why I also mentioned the PDA/mobile market. In that market those two aspects are at the top of the list of priorities. Having a base platform that you can then scale downwards makes more sense to me.
 
_xxx_ said:
Sure, some of it is there in the GFFX but today, I doubt there is anything useful left. But then who knows? ;)

This is way off topic, but I often have the feeling that NVIDIA wanted 3dfx more to gain a contract with Quantum3D than anything else. Quantum contracted NV immediately after the buyout and they have been building multi-GPU configs for their Independence systems ever since (2001?).

As for today:

Indeed, such a texturing mechanism sounded similar to 3dfx's fabled "Rampage" chip, and it didn't come as much of a surprise to find that Rampage's chief architect was in charge of NV40's texture and shader engine. We asked Emmett if this was the case with NV40 and he replied that "some in the NV4x range would feature this as it has benefits and drawbacks".

http://www.beyond3d.com/previews/nvidia/nv40/index.php?p=9

Kilgariff used to be one of the key engineers of the Rampage project, and we're still in the NV4x/G7x era, aren't we?
 
Graham said:
Currently, the memory controller on R520 circles the internals of the chip (from what I've read at least - although the die shots suggest otherwise). Now I was thinking: if you add pipelines, that should increase the die size. Therefore the memory bus gets bigger, therefore slower and requiring more die...
Although it wasn't spoken about at the launch, I'd heard references to the Ring Bus last year, and one of the reasons given for its implementation then was die size issues.

R520's memory controller itself is not a ring; the data return path is a ring around the chip. On other chips the memory controller is the same, but both the client request paths and the data paths point to that controller. If you increase the number of transistors then the connections between the clients and the memory system will increase in both cases - I think the point was that because the data return ring spans all the way around the chip, the data return paths to the clients are likely to be shorter, so although it uses more space initially, the larger the chip grows the more savings are made in relation to their previous memory architecture.

(The size of the chip was one of the reasons I was given for the Ring Bus not existing on Xenos and for these reasons I was rather surprised to learn that it was included in RV530)
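The wire-length point can be sanity-checked with a toy geometric sketch - uniformly placed clients on a square die, one central controller versus four ring stops at the quadrant centres (purely illustrative, not any real floorplan):

```python
# Toy check of the wire-length argument: average Manhattan distance from a
# client placed anywhere on a square die to (a) one central controller vs.
# (b) the nearest of four ring stops at the quadrant centres.
import random

def avg_distance(die_side, stops, samples=100_000):
    total = 0.0
    for _ in range(samples):
        x, y = random.uniform(0, die_side), random.uniform(0, die_side)
        total += min(abs(x - sx) + abs(y - sy) for sx, sy in stops)
    return total / samples

side = 16.0   # arbitrary units; only the ratio matters
central    = [(side / 2, side / 2)]
ring_stops = [(side / 4, side / 4), (3 * side / 4, side / 4),
              (side / 4, 3 * side / 4), (3 * side / 4, 3 * side / 4)]

print(f"to central controller: {avg_distance(side, central):.2f}")
print(f"to nearest ring stop:  {avg_distance(side, ring_stops):.2f}")
# The client-to-memory-system wiring roughly halves, and the absolute saving
# grows with die size -- matching the "the larger the chip grows" point above.
```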
 
SugarCoat said:
Considering the age of the actual G70 chip, I tend to believe that their next flagship product with significant improvements will be the G80. Anything relating to the G7x series should be minorly modified or on a new process, as well as the mid/low-range series. However, combining my theory with the popular theory creates a strange launch schedule unlike what we've seen before, if Vista launches on time: three high-end products between January 06 and January 07.

We "know" that the G70 is what was formerly known as "NV47" and was supposed to be cancelled. Taking into account that logically anything NV4x = DX9.0 and probably anything "NV5x" = DX10, the G80 might as well be what was formerly expected to carry the NV50 codename.

Codenames though are less relevant; what I consider the most certain speculation is another SM3.0 high-end part at 90nm. That's two high-end parts for 2006 from NVIDIA, including a DX10 part.

G7x die shrink with speed increases, which I don't think will ever exist. They'll shrink the die if they feel they have enough time to benefit cost-wise from switching production to 90nm. Although I must point to the harsh fact of how fast they shut down NV40 production due to cost and the G70 supersession. So I don't expect to see this core on 90nm unless it launches before Christmas time. But architecture differences as well as 30-40% higher clocks on top of that? I think that's pushing the realistic spectrum. Don't forget both companies are more than willing to delay launching in order to stock product for a hard launch in quantity.

Possibly early 2006 = G7x@90nm and late 2006 DX10 GPU. Where's the third one?

That means both the R580 and the 90nm G7x (assuming it's 90nm, has had its architecture changed significantly, and has much higher clocks) have to be in full swing now or very soon if they're going to launch as early as people keep saying (early next year?). Something which I just don't see. I think we'll know a lot more about plans in February.

Timeframe for either/or often depends on uncontrollable factors, but yes, early 2006 sounds reasonable for either/or. I personally don't expect any significant architectural changes for either/or parts. One sounds like triple the ALU count, the other like more quads.

G80 with significant improvements to the architecture, 90nm, what everyone expects from the G7x. Still remaining SM3.0. I'd expect this to be the real R580 competitor for spring/early summer. I'd also expect significant advanced shader enhancements from Nvidia here as well, although perhaps not quite up to par with the R580.

A "G80" would be an oxymoron, since it would signify a new generation in my mind. Granted, "generation" has become a tad fluid these days, but since it will most likely be another SM3.0 part, it would make more sense to call it, say, G75 or something similar. Both are refreshes in a relative sense compared to today's GPUs and not a real new generation.

As for being up to par or not in terms of ALU throughput, doing some speculative math I could still figure G-whatever to be slightly ahead.

NV50: speculate what you will; perhaps 80nm, a real monster, fully DX10-compatible. Winter launch timeframe just before the holidays, IF MS can stick to a schedule. I'd expect many improvements to come, like free AA/AF from both ATI and Nvidia, by the time the NV50 and R600 are launched.

We have already had, in relative terms, "free" 2xMSAA for quite some time now. Since I don't expect IHVs to move to single-cycle 4xMSAA with the next generation either, I don't expect any significant changes.

As for the supposed "free AF", the only thing that is still free on today's GPUs is garden-variety bilinear; you don't even get trilinear for free yet, let alone any AF.

There's a good chance we'll see more sophisticated and higher-quality algorithms for both AA and AF, but nothing that comes for free.
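To make the "nothing comes for free" point concrete, here's a simple count of bilinear lookups per texture fetch, assuming the usual one bilinear sample per TMU per clock (worst-case figures; real AF is adaptive):

```python
# How many bilinear lookups (and thus TMU cycles, assuming one bilinear
# sample per TMU per clock -- the common design of that era) a single
# texture fetch can cost. Worst-case figures; real AF is adaptive.

def bilinear_lookups(trilinear=False, max_aniso=1):
    taps = max_aniso                            # up to N line samples for NxAF
    return taps * (2 if trilinear else 1)       # trilinear doubles it (two mip levels)

cases = [("bilinear",          False,  1),
         ("trilinear",         True,   1),
         ("2xAF + trilinear",  True,   2),
         ("8xAF + trilinear",  True,   8),
         ("16xAF + trilinear", True,  16)]

for name, tri, aniso in cases:
    print(f"{name:>18}: up to {bilinear_lookups(tri, aniso):2d} cycles per fetch")
# Only plain bilinear stays at one cycle -- exactly the "only thing that is
# still free" point above.
```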
 
Ailuros said:
As for the supposed "free AF", the only thing that is still free on today's GPUs is garden-variety bilinear; you don't even get trilinear for free yet, let alone any AF.

There's a good chance we'll see more sophisticated and higher-quality algorithms for both AA and AF, but nothing that comes for free.

I want an interview with "Tony the texture God". Somebody go make that happen. Thanks ever so much.

;)
 
*BUMP*

Now let's see our predictions from a year ago.

I rather like the idea, or the way, 3Dlabs have gone: simple scalar units. Then you don't have to worry about anything else than how to improve your dispatcher and how to cram the largest possible number of them anywhere on the chip.
(Well, not that easy, but a start)

AFAIK the US of the Xbox360 GPU are Vec4+scalar, right? Does anyone know about their "splittability"?

:yep2:
 