First NV31 Announcement Rumors...

Well, you're right, having the pixel shaders fully functional is an absolute necessity. Still, removing the vertex shader entirely will really hurt performance in new games that are primarily CPU-limited. That, and as a low-end card, performance with slower CPUs should be a primary concern.
 
Strange...nVidia hasn't done this yet.

And they also haven't ever produced a value part with any programmability at all (DX8, minimally). So perhaps nVidia is just having a rougher time than others getting such an architecture to fit into the target transistor count for a budget part?

Still, removing the vertex shader entirely will really hurt performance in new games that are primarily CPU-limited.

Yes. However, with synthetic benchmarks, software vertex shading might not hurt too much. nVidia may be counting on this.

I'd rather they remove it entirely if it saves the pixel shaders from getting castrated.

Agreed.
 
Agreed. Since most games are fill-rate limited right now, cutting back the vertex power doesn't really hurt much. Cutting it out entirely has more of an impact, though, so I'm not sure it would ever be cut out completely (even the Radeon 9000 still has 1 vertex engine).
 
Sorry, I got my wires crossed a bit and have had to correct my last post. NV31 is the dodgy one right now, not NV34. That makes things a little more interesting, huh? :LOL:

If they get all the bugs ironed out then it should still be very competitive with 9500Pro/9700.

NV34 just came back from the fab. It's a 0.15u ASIC. That might also surprise a few people, I guess.

MuFu.
 
R9500 Pro (8 pipelines, 1 TMU per pipeline) vs. NV31 (4 pipelines, 1 TMU per pipeline): I think the NV31 needs considerably higher core/memory clocks to achieve performance between the R9500 Pro and the R9700.
 
RussSchultz said:
snk said:
MuFu said:
NV34 just came back from the fab. It's a 0.15u ASIC. That might also surprise a few people, I guess.
So, NV34 could be the revamped NV25?

Why would that follow?
Well, there has been some talk about a revamped Ti4600 being introduced by nVidia at CeBit (http://www.theinquirer.net/?article=6965). If NV34 is .15µm and is only DX9 "compatible", then it could be an NV28 with some GeForce FX technology. Just like the GF4 MX, which is a GF2 MX with some GF4 tech (e.g. memory controller, some shader functionality).

Considering NV34's model number, it should have all the DX9 functionality. As such, at least the pixel shaders should be done in hardware. The vertex shader could be emulated in software.
 
The process technology has nothing to do with the functional capabilities of an ASIC. Being .15u does not indicate or counter indicate that it is derivative of the NV2x.
 
RussSchultz said:
The process technology has nothing to do with the functional capabilities of an ASIC. Being .15u does not indicate or counter indicate that it is derivative of the NV2x.
Of course not. I'm merely speculating here ;)
 
Being .15u does not indicate or counter indicate that it is derivative of the NV2x.

I disagree.

In this case, we have nVidia making 0.13 cores which, as far as we know, are "true" DX9 parts...NV30 and NV31.

We also have nVidia making DX8 cores on 0.15. (GeForce4 Ti series.)

If you were designing a "DX8/DX9 hybrid" product of sorts (NV34) targeted at 0.15, would it be cheaper to start with the already-existing 0.15 NV2x core, or "port" the NV3x design from 0.13 to 0.15?

I would say the former. We know it is far from "trivial" to move a product from one fab process to another; it's almost a complete layout redesign. Why wouldn't nVidia just keep the NV34 on 0.13 if it is based more on NV30 than NV2x?
 
You make it sound like chips are bred, instead of designed.

The parts that make a NV3x different from an NV2x are almost certainly not process specific. Choosing one or the other in a different process technology won't affect your layout time one bit.
 
MuFu said:
It's a 0.15u ASIC. That might also surprise a few people, I guess.
MuFu.

That would be very surprising indeed because the CEO stated many times they will move their entire line of products to 0.13um by the end of 2003.
 
Lessard said:
That would be very surprising indeed because the CEO stated many times they will move their entire line of products to 0.13um by the end of 2003.

The same CEO also said many times that the NV30 would be revealed earlier than it was, and that it would arrive in time for Christmas. Why believe what he says now? He's only giving best-case, forward-looking scenarios. I think they've been unfortunate and haven't been able to hit any of their best cases. It almost seems like they've had to go with Plan B. Then again, he did say by the end of 2003, and 2003 has only just begun.
 
You make it sound like chips are bred, instead of designed.

You make it sound like each chip is designed in a vacuum, instead of some chips being derived using other completed designs as a basis.

The parts that make a NV3x different from an NV2x are almost certainly not process specific.

Now you're starting to sound like Chalnoth with your "almost certainty" type statements.

If the parts that make NV3x and NV2x "different" are not process specific, then what's the big deal with moving one chip to a new fab? I could've sworn you were one of the proponents of "it's not trivial to simply move a design from one process to another." Now you're telling us that certain things are trivial, and those things also happen to be the difference between the NV2x and NV3x designs?

Choosing one or the other in a different process technology won't affect your layout time one bit.

That makes zero sense to me. You have several years of experience and several designs of DX8 based parts on 0.15. All NV2x architecture.

You also have experience bringing an entirely new architecture ("their biggest contribution to the 3D market") to market on 0.13.

And if you're going to bring a new chip to market on 0.15, you're seriously telling us that it doesn't make one bit of difference in terms of time/investment whether that product is based on the NV2x or the NV30 architecture? I don't buy it.

I'm NOT saying that I can't see nVidia doing an NV2x/NV3x hybrid part on 0.15. In fact, I'd give it maybe a 60/40 chance that the part is NV2x-based vs. NV3x-based.

I just think it's "almost certainly" wrong to say "Choosing one or the other in a different process technology won't affect your layout time one bit."
 
I'm saying that unless you're re-using very large sections of your design, sticking with the same process won't affect your schedule much. But, yes, you got me. I did say "wouldn't affect it one bit," which is obviously not a correct statement. More to the point: it wouldn't affect it in any particularly meaningful or predictable way.

First, let me do a little bit of exposition:

In mixed-signal designs, there are two aspects: analog and digital. I'll concentrate on the digital design, because that is where the pieces we're most interested in (DX8 vs. DX9) are differentiated.

In the overwhelming majority of cases, digital design at fabless semiconductor houses these days uses a methodology called standard cell design. The standard cell methodology takes a high-level language that describes the logic flow and eventually comes up with the LOGICAL gates necessary to bring that design to life. The fab and/or a 3rd party provides a standard cell library that tells the layout programs how to convert those logical gates into transistors.

At the high-level language stage, the logical design can be complete and completely portable between fabs. You can write test and verification suites, etc., to prove out the logical correctness of your design. These steps (by far) dominate the schedule.

See http://wooster.hut.fi/kurssit/s88133/slides/lecture_06.pdf
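To make the portability argument concrete, here's a toy sketch (mine, not from the post): the same logical netlist gets bound to two hypothetical cell libraries for two processes. All cell names, areas, and delays are invented for illustration; real libraries characterize far more than this.

```python
# Sketch of the standard-cell idea: the LOGICAL netlist is fab-neutral;
# only the library binding ties it to a specific process.

netlist = ["NAND2", "NAND2", "INV", "DFF"]  # gates produced by synthesis

# Two made-up vendor libraries for two different processes.
lib_013u = {"NAND2": {"area_um2": 4.1, "delay_ns": 0.06},
            "INV":   {"area_um2": 2.0, "delay_ns": 0.03},
            "DFF":   {"area_um2": 18.5, "delay_ns": 0.12}}
lib_015u = {"NAND2": {"area_um2": 6.3, "delay_ns": 0.09},
            "INV":   {"area_um2": 3.1, "delay_ns": 0.05},
            "DFF":   {"area_um2": 27.0, "delay_ns": 0.17}}

def map_to_library(netlist, lib):
    """Bind the same logical gates to a process-specific cell library."""
    return [{"gate": gate, **lib[gate]} for gate in netlist]

# Same logic mapped twice: only the area/delay numbers differ.
impl_013 = map_to_library(netlist, lib_013u)
impl_015 = map_to_library(netlist, lib_015u)
```

The point the sketch is making: the design up to this stage is process-neutral; the process-specific numbers only enter through the library.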

Getting a working physical design adds another dimension to the whole thing. In order for the chip to work at your design goal, timing "must be met"--essentially, as data flows through the design, any path between two "registers" must complete within a clock period. This timing information for the standard cells is provided by the vendor and fed into the timing analysis and layout programs. They'll come back and tell you which areas meet timing and which don't. Then the tedious process begins of hitting the tallest peg only to find the next tallest peg, until you've met your requirements.

This process depends on both your logic and your high-level block placement, in addition to the standard cell library, the process, etc. Simply re-using a block doesn't guarantee the design will meet timing, especially if your timing requirements have changed. On the other hand, going to another process with a different standard cell library doesn't guarantee the design won't meet timing.

In general, if you take a design at .25 and move it to .18 exactly, you probably won't have to do much work fixing things, and it will probably run about 50% faster--and the inverse is the same. Moving between other processes is much the same, whether shrinking or growing. That being said, things get worse going further sub-micron as you get new effects that start affecting timing, and going to an unknown process certainly adds risk and uncertainty.
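The "any path between two registers must complete in a clock period" rule can be sketched in a few lines. This is a toy, not real static timing analysis: the path names and gate delays are invented, and real tools also model wire delay, slew, setup/hold margins, and so on.

```python
# Toy static timing check: does every register-to-register path fit
# within one clock period? All numbers are invented for illustration.

paths = {
    "fetch -> decode":  [0.4, 0.3, 0.5],        # gate delays (ns) on the path
    "decode -> alu":    [0.6, 0.7, 0.4, 0.5],
    "alu -> writeback": [0.9, 0.8, 0.6],
}

def timing_report(paths, clock_period_ns):
    """Return (path, total delay, slack); negative slack means a failure."""
    return [(name, sum(delays), clock_period_ns - sum(delays))
            for name, delays in paths.items()]

# At a 300 MHz target the clock period is ~3.33 ns.
for name, delay, slack in timing_report(paths, 1000 / 300):
    print(f"{name}: {delay:.2f} ns, slack {slack:+.2f} ns")

# "About 50% faster" on a shrink, modeled crudely as every delay
# scaling by 1/1.5, so the same netlist closes timing at a higher clock.
shrunk = {name: [d / 1.5 for d in delays] for name, delays in paths.items()}
```

Fixing the worst path (the "tallest peg") just exposes the next-worst one, which is the iterative grind the post describes.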

This physical synthesis phase has to be done for any design and lasts about 10-20% of the total project schedule. This aspect of the digital design marries the design to a process and fab.

The analog design, on the other hand, is nearly completely targeted at a specific process and even a specific fab. This is the piece of a design which prevents easy portability between fabs and processes.

Back to the digital section, which is what we care about: the NV3x blocks that have been successfully synthesized at .13u may run just dandy at .15u given your target speed, or you may need to make wholesale changes to the NV2x blocks to get them to meet the target speeds (which we can only assume will be faster than the current design). The bottom line is you don't know. Beyond that, they'll likely be changing enough elsewhere to make the synthesis savings negligible. (For example, if previously you needed A+B = C to happen in a clock cycle, and now you have A+D = E, you might not be able to reduce the time of D, so you have to go redesign A.)

And that is just the design aspect of it. .15 may be cheaper, or perceived as safer for them, and as such the business or risk factors outweigh whatever hit in schedule they factor in.

Given all that info, I'm still saying that deriving the functionality from the process is an exercise in futility. There are so many other dominating factors on the design and business side that the possible savings in time between using NV2x or NV3x technology mean diddly squat.
 
Ha. I thought that myself. No good excuses that I can use now that I won't use later (the whole not-enough-time thing). Eventually I'll just have to bite the bullet and write the article. The honest answer is I'm a little bit scared I'm getting the details wrong. It's OK to be a crackpot in the forums, but publishing an article...

If it means anything:
a) The house is about to be finished--a week or two at the most.
b) I'll be auditing VLSI-II in the spring with my wife
c) We're deep in the design of a new product (meaning I'll see the process again and become more familiar with it).

So, my "i'm busy" excuses have just about run out, I'll be formally learning more about it, and I'll be able to see it in action as we go through it at work. Anyways, I'll write a TOC/outline and get it to you in a couple of weeks.
 
RussSchultz said:
...

At the high level language stage, the logical design can be complete and completely portable between fabs. You can write test and verification suites, etc to prove out the logical correctness of your design. These steps (by far) dominate the schedule.

See http://wooster.hut.fi/kurssit/s88133/slides/lecture_06.pdf

...

This physical synthesis phase of the design is something that has to be done for any design and lasts for about 10-20% of the total project schedule. This aspect of the digital design marries the design to a process and fab.

I think you're underestimating the total amount of time consumed between when synthesis is first possible and when a chip actually tapes out. What you describe is possible for a low-complexity, relatively low-speed chip, but if you were to try to make a 100+ million transistor chip with clock speeds in the 300-400 MHz range, there is no way you'd finish all the stuff that binds you to a fab in 1-2 months (assuming a 12-18 month project schedule, 10-20% is 1.2-3.6 months).

Once the logic is in good enough shape to be synthesizable, synthesis starts immediately. It can take several to many iterations to even come close to meeting timing, and that can also require (sometimes major) changes in the logical layout of the chip. I'd probably put the number in the 30-40% range for how long the physical aspects of the design (the binding parts) are being worked on. It might be less if you're doing a more traditional model where all of the physical design work is done by the fab, but neither ATI nor Nvidia is likely to be doing that - it would likely produce a very non-competitive chip.
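Spelling out the arithmetic in that parenthetical (the 12-18 month project length and the two percentage estimates are the posters' numbers; the helper function is just illustration):

```python
# Phase length in months for a given project length and schedule fraction.

def phase_range(months_lo, months_hi, frac_lo, frac_hi):
    """Min/max months a phase takes, given project length and its fraction."""
    return months_lo * frac_lo, months_hi * frac_hi

# Russ's 10-20% estimate for the physical synthesis phase:
print(phase_range(12, 18, 0.10, 0.20))   # roughly 1.2 to 3.6 months

# The 30-40% counter-estimate for all fab-binding physical work:
print(phase_range(12, 18, 0.30, 0.40))   # roughly 3.6 to 7.2 months
```

Even at the low estimate, the fab-binding work is a multi-month commitment, which is the crux of the disagreement.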

Now a lot of these things overlap with each other, so it isn't as neatly broken down as 20% of time is logic design, 20% is layout, etc... But you do get bound to a particular fab/process much sooner than 80-90% into the project. If you were to wait for all verification to be done before any synthesis or physical design work was done (as an example), your schedule would blow up to an unacceptable time frame.

RussSchultz said:
The analog design, on the other hand, is nearly completely targetted toward a specific process and even fab. This is the piece of a design which prevents easy portability between fabs and processes.

However, it is also the analog parts that are most likely to get passed along between individual products, as long as they stick with a specific fab/process. All of the 0.15 Nvidia (or ATI) chips generally use the same process. When that is the case, the analog stuff is designed once for that process and then left mostly as-is, unless it is broken or deficient in some way. So ATI's R200/RV250/R300 likely use many of the same analog parts (PLLs, DLLs, DACs, I/Os, etc.) because there is no reason to waste resources redesigning them. So if NV34 were a 0.15 version of some cut-down NV30/31 logic, it's likely that many of the analog parts are pulled from the NV25/NV28. And future 0.13 designs from Nvidia and ATI will probably share many of the analog parts with the NV30 and RV350.
 