NV35 confirmed to have 8 pipes or not?

I've been completely out of the loop lately and haven't been keeping up with Beyond3D much at all. Is the NV35 confirmed to have 8 full pipelines, in the same sense that ATI defines a pipeline, and the way Nvidia did before NV30?

Also, are there going to be more vertex engines so that NV30 can push more geometry per clock? (as R300 pushes more, clock for clock)
 
What test would (dis)prove this?

Fillrate tests. Run them in 16-bit (to eliminate memory bandwidth limitations), factor clock rate against pixel pipes, and see what you end up with. If the board is running at 400 MHz and you hit over 2.4 GPixels/s, odds are fairly certain that you have an 8 pipe board. If you hit 1.6 GPixels/s or lower, it is a four pipe board (in the traditional definition).
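
To make that rule of thumb concrete, here is a minimal sketch in Python. The function names are made up for illustration, and the cut-offs are just the 2.4 and 1.6 GPixels/s figures above divided by 400 MHz (6 and 4 pixels per clock), so treat it as a reading of the post rather than a real benchmark tool.

```python
def pixels_per_clock(fillrate_mpixels: float, core_clock_mhz: float) -> float:
    """Measured fill rate expressed as pixels written per core clock."""
    return fillrate_mpixels / core_clock_mhz


def classify_pipes(fillrate_mpixels: float, core_clock_mhz: float) -> str:
    """Apply the post's rule of thumb, scaled to any core clock.

    At 400 MHz the cut-offs are 2400 MPixels/s (6 pixels/clock) and
    1600 MPixels/s (4 pixels/clock). Anything in between is left undecided.
    """
    per_clock = pixels_per_clock(fillrate_mpixels, core_clock_mhz)
    if per_clock > 6.0:
        return "8 pipes"
    if per_clock <= 4.0:
        return "4 pipes"
    return "inconclusive"
```

For example, `classify_pipes(2500, 400)` falls on the 8 pipe side, while a 16-bit result of 1600 MPixels/s at the same clock reads as 4 pipes.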
 
I'm expecting 8 pipe fixed function at a minimum, and also expecting the possibility of 8 pipe PS 1.3 behavior (same peak op throughput as NV30, but occurring more frequently). These are already very good and significant improvements, though people who believed the nv30 hype might expect an unreasonable increase in performance across the board (things already optimized for the nv30 might not increase in performance at all, depending on their bandwidth limitations).

I also expect PS 2.0 or PS "1.4+" execution to remain 4 pipe, with the possible important change that 4 * 4 component fp32 output might now be possible as well.
I thought of the possibility of it also going to 8 pipe for fp16, based on the assumption of broken functionality in the nv30. But I've come to think the extra chip real estate is simply dedicated to their approach for higher instruction count and dynamic branching in vertex shader hardware for the nv30, and the "rumors" and optimization discussions I know of seem to indicate this will remain 4 pixel.

Here is some more reasoning.
 
I thought nVidia all but confirmed it would be a "true eight pipe" card? All they'd need would be to allow eight color + Z pixels per clock, right?* I mean, would a 256-bit bus even be worth it for a 4 pixel/clock card?

*Ah, forgot about shaders. Well, we've only got a few weeks more before they show us, so I'm going to pass on the frantic speculating this time. :)
 
Hey, let's just ask nVidia!

I mean, they've really been extremely forthcoming with direct, straight-up responses to all technical questions over the past 12 months.
 
I wouldn't bet on 8 pipes. The way I see it, NV35 is just a fixed and improved NV30, i.e. probably better shader performance, 256-bit bus, no dustbuster, but still crappy AA, 4 pipes and so on.
 
Why do you so bloody want 8 pipelines (as 8 pixels/clock)? :rolleyes:
Let's do a little math, shall we:
At 1600x1200 there are 1.92 million pixels on screen, and if we are to draw this at 60 frames per second we need 115.2 million pixels per second of fill rate. Now take the fill rate of the Radeon 9700 Pro for example: at 2600 million pixels per second, its fill rate would suffice to overdraw each frame 22.57 times (now which game has or will have that kind of overdraw?). That's 60 fps at 22.57x overdraw at a resolution of 1600x1200!!
Of course you would run out of memory bandwidth even on Radeon 9700 Pro to do this. Let's say we have to draw 2600 million pixels back to front (worst possible case). For every pixel we have to: read Z (4 bytes), write Z (4 bytes), write color (4 bytes), and read a texel (4 bytes, simple point filtering so as not to complicate things further). That is 16 bytes per pixel, so you'd need roughly 41.6 GB per second of memory bandwidth.
On the other hand: how many games today use simple single texturing or "one instruction pixel shaders" (multiplying a texture with a constant or something)? How many games will do this in the future? There are very few situations in modern games where pure fill rate is the problem, and in these cases GeForce FX will act like an 8 pipe solution (stencil shadows for example).
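
The arithmetic above is easy to replay. This is a small Python sketch with illustrative function names; the 16 bytes per pixel is the sum of the four 4-byte accesses listed in the post, and the 2600 million pixels per second is the Radeon 9700 Pro figure quoted there.

```python
def pixels_per_second(width: int, height: int, fps: int) -> int:
    """Pixels that must be drawn per second with no overdraw."""
    return width * height * fps


def overdraw_budget(fillrate_pixels_per_s: int, width: int, height: int, fps: int) -> float:
    """How many times each frame could be overdrawn at the given fill rate."""
    return fillrate_pixels_per_s / pixels_per_second(width, height, fps)


def bandwidth_bytes_per_s(fillrate_pixels_per_s: int, bytes_per_pixel: int) -> int:
    """Memory traffic needed to sustain the fill rate."""
    return fillrate_pixels_per_s * bytes_per_pixel


# Z read + Z write + color write + one point-sampled texel, 4 bytes each.
BYTES_PER_PIXEL = 4 + 4 + 4 + 4
RADEON_9700_PRO_FILLRATE = 2_600_000_000  # pixels per second, from the post
```

Calling `overdraw_budget(RADEON_9700_PRO_FILLRATE, 1600, 1200, 60)` gives about 22.57, and `bandwidth_bytes_per_s(RADEON_9700_PRO_FILLRATE, BYTES_PER_PIXEL)` gives 41.6 billion bytes per second, which is the "around 40 GB per second" in the post.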

But all in all, Radeon 9700 Pro still rules due to its 8 pipe architecture, because these 8 pipes also come with 8 floating point arithmetic units and 8 texturing units that can operate in parallel. On the other hand, NV30 is a total mess when it comes to its arithmetic capabilities. From tests we have done here on this forum, it seems like NV30 is just an overclocked GeForce 3 or 4 when it comes to integer arithmetic speed, and even slower on floating point arithmetic. And that's why NV30 sucks.
If you look at it from a slightly different angle (better angle ;)): Radeon 9700 can do 2600 million floating point operations per second, while NV30 can do 2000 million fixed point operations per second (GeForce 4 Ti 4600 can do 1200 million fixed point operations per second).
If NV30 were capable of 8 floating point operations per clock, it would be able to do 4000 million floating point operations per second and it would leave Radeon 9700 Pro far behind. But then again NVIDIA probably wouldn't target 500MHz and would need a dust buster cooling solution... :rolleyes:
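
Those peak numbers are just operations per clock times core clock. A quick Python sketch follows; note that the per-clock counts are back-derived from the figures in the post (for example 2000 / 500 = 4 for NV30) and are assumptions, not confirmed specs, and the dictionary keys are labels made up for this example.

```python
def peak_mops(ops_per_clock: int, core_clock_mhz: int) -> int:
    """Peak throughput in millions of operations per second."""
    return ops_per_clock * core_clock_mhz


# (ops per clock, core clock in MHz); per-clock counts inferred from the post.
CHIPS = {
    "Radeon 9700 Pro (FP)": (8, 325),
    "NV30 (fixed point)": (4, 500),
    "GeForce 4 Ti 4600 (fixed point)": (4, 300),
    "hypothetical NV30 with 8 FP ops/clock": (8, 500),
}


def peak_table() -> dict:
    """Peak millions of ops per second for every entry in CHIPS."""
    return {name: peak_mops(*spec) for name, spec in CHIPS.items()}
```

`peak_table()` reproduces the 2600, 2000, 1200 and 4000 million operations per second quoted above.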
In the end, with fill rates as high as they are today, you don't really need to push them even higher. You should just make sure that pixel shaders run fast enough. To do this you don't need to output 8 pixels per clock instead of 4 pixels per clock, as 99% of stuff will take a couple of clocks anyway.
No problem if NV35 or even NV40 are still just 4 pipe solutions. But they need to push 8+ floating point arithmetic instructions per clock in parallel with 8+ texturing instructions, just like Radeon 9700 and Radeon 9800, as NV30 doesn't do that.
 
Typedef Enum said:
Hey, let's just ask nVidia!

I mean, they've really been extremely forthcoming with direct, straight-up responses to all technical questions over the past 12 months.

LOL :) I agree, NVIDIA has a credibility problem, but I haven't found them to be credible for several years now. :)
 
The NV35 is as much of a 8 pipeline solution as the NV31, AFAIK ( 90% sure )
I've already explained this so darn many times, I *PROMISE* it's the last one I'm gonna explain it.

Basically, it's 8 pipelines, but only 4 shader pipelines, like the NV30. It does have slightly better per-clock shading performance though. I've got no real details on how personally, but my guess is decoupled texturing/FP units.


Uttar
 
Uttar said:
The NV35 is as much of a 8 pipeline solution as the NV31, AFAIK ( 90% sure )
I've already explained this so darn many times, I *PROMISE* it's the last one I'm gonna explain it.

Basically, it's 8 pipelines, but only 4 shader pipelines, like the NV30. It does have slightly better per-clock shading performance though. I've got no real details on how personally, but my guess is decoupled texturing/FP units.

Point is that NV30 has 8 shader pipelines, but only 4 ROPs. My guess is NV35 will have 8 ROPs and 12 shader pipes.
 
Mephisto said:
Uttar said:
The NV35 is as much of a 8 pipeline solution as the NV31, AFAIK ( 90% sure )
I've already explained this so darn many times, I *PROMISE* it's the last one I'm gonna explain it.

Basically, it's 8 pipelines, but only 4 shader pipelines, like the NV30. It does have slightly better per-clock shading performance though. I've got no real details on how personally, but my guess is decoupled texturing/FP units.

Point is that NV30 has 8 shader pipelines, but only 4 ROPs. My guess is NV35 will have 8 ROPs and 12 shader pipes.

No, no! Please see it in a more precise way.

The NV30 is:
4FP ops/clock OR 8TEX ops/clock
8FX ops/clock
4 color outputs/clock

I think the NV35 is:
4FP ops/clock
8TEX ops/clock
8FX ops/clock
8 color outputs/clock
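
One way to read the two lists is as a toy throughput model. The Python sketch below is only that: it assumes the "OR" in the NV30 list means FP and TEX ops share an issue slot (so their clocks add up), that the NV35 guess has them decoupled (so they overlap), and it ignores FX ops and latency entirely. `ChipModel`, `clocks_per_pixel` and the per-pixel op counts are hypothetical names and inputs, not anything from nVidia.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChipModel:
    fp_per_clock: int
    tex_per_clock: int
    color_per_clock: int
    shared_fp_tex: bool  # True when FP and TEX ops compete for the same slot


def clocks_per_pixel(chip: ChipModel, fp_ops: int, tex_ops: int) -> float:
    """Average clocks spent per pixel for a shader with the given op counts."""
    fp_clocks = fp_ops / chip.fp_per_clock
    tex_clocks = tex_ops / chip.tex_per_clock
    out_clocks = 1 / chip.color_per_clock
    if chip.shared_fp_tex:
        shader_clocks = fp_clocks + tex_clocks      # FP OR TEX each clock
    else:
        shader_clocks = max(fp_clocks, tex_clocks)  # FP and TEX overlap
    # A pixel can never finish faster than the color output rate allows.
    return max(shader_clocks, out_clocks)


NV30 = ChipModel(fp_per_clock=4, tex_per_clock=8, color_per_clock=4, shared_fp_tex=True)
NV35_GUESS = ChipModel(fp_per_clock=4, tex_per_clock=8, color_per_clock=8, shared_fp_tex=False)
```

Under this model a single-texture pixel with no FP math costs 0.25 clocks on NV30 (4 pixels per clock) and 0.125 on the NV35 guess (8 pixels per clock), while a shader with 4 FP ops and 2 texture ops goes from 1.25 to 1.0 clocks per pixel, i.e. only slightly better per-clock shading.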


Uttar
 
Lezmaka said:
I don't think we'll really know until someone can run tests on an actual card.

::sigh:: And this is where nVidia fucked up the most with NV30.

It'll take at least 2-3 generations of full honesty before anyone here believes anything said about nV's cores without heavy testing to be sure about it...
 
Even taking into account that instruction rate is what matters when using pixel shaders, an 8 pipe architecture could still be better than a 4 pipe architecture. Say the first has 1 FP unit per pipe and the second has 2 FP units per pipe, so both architectures have 8 FP units, with comparable clock and FP performance per unit. If the 4 pipe architecture can only fetch instructions for those two units (in the same cycle) from the same pixel/thread, data dependences would make the 8 pipe architecture faster and more efficient. The 8 pipe architecture would exploit more of the data parallelism between pixels than the 4 pipe architecture.

Of course someone could say that a '4 pipe' architecture could fetch two instructions from different pixels/threads per cycle, but in my opinion that is the definition of an 8 pipe architecture, not a 4 pipe architecture (working on 4 or 8 pixels concurrently in the same cycle). So in my opinion, and taking into account that FP operations don't usually have single cycle latency (which is what allows dependent integer operations to be chained), even less so with the increasing clock rates we will see in the future, an 8 pipe architecture should always be better than a 4 pipe architecture.
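
Here is a rough Python sketch of that argument, under simplifying assumptions that are mine and not a description of any real chip: every instruction has single cycle latency, issue is in order, and the 4 pipe design can pair two instructions from the same pixel only when the second does not depend on the first. The function names are made up for the example.

```python
import math


def dual_issue_cycles(depends_on_prev: list) -> int:
    """Cycles one pixel needs on a pipe that issues up to 2 instructions per cycle.

    depends_on_prev[i] is True when instruction i needs the result of
    instruction i - 1, which prevents the two from issuing together.
    """
    cycles = 0
    i = 0
    n = len(depends_on_prev)
    while i < n:
        if i + 1 < n and not depends_on_prev[i + 1]:
            i += 2  # next instruction is independent: issue both this cycle
        else:
            i += 1  # next instruction must wait for this one
        cycles += 1
    return cycles


def total_cycles_4x2(depends_on_prev: list, pixels: int) -> int:
    """4 pipes, 2 FP units each, both units fed from the same pixel."""
    return math.ceil(pixels / 4) * dual_issue_cycles(depends_on_prev)


def total_cycles_8x1(depends_on_prev: list, pixels: int) -> int:
    """8 pipes, 1 FP unit each, one instruction per pixel per cycle."""
    return math.ceil(pixels / 8) * len(depends_on_prev)
```

With a fully dependent 4 instruction chain (`[False, True, True, True]`) over 64 pixels, the 4x2 model takes 64 cycles and the 8x1 model takes 32. With four independent instructions (`[False] * 4`) both take 32, so in this model the 8 pipe layout is never slower.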
 
So is it eight pipes or not?.....c'mon guys, I need to know!.....I have people I need to go look smart in front of with the correct answer! :devilish:
 
micron said:
So is it eight pipes or not?.....c'mon guys, I need to know!.....I have people I need to go look smart in front of with the correct answer! :devilish:

Uttar is saying that NV35 is still 4x2, like NV30. The only difference seems to be that the FP unit and texture address calculation unit are now separate units (per pipe).

Other than that I don't have a clue about what NV35 is. However, I guess that converting a 4 pipe design into an 8 pipe design isn't as easy as it would seem, and I would be very surprised if it is 8 pipes. Those things take a lot of time. If it were so easy, why would they already be working on NV50/R500?
 
Thanx RoOobo ;) Eight pipe or not, I'm scared for ATi... 256-bit... 2.2ns memory at 900MHz... man! How do you suppose the fillrate of this card will compare to ATi's best?
 