NV35 - already working, over twice as fast as NV30?

Overall interesting performance from the NV35.

So both the R350 and NV35 will run with a true 8 pipelines (8 pixels output per clock) and have only a minimal difference in theoretical single-pixel fillrate (R350 = 380 MHz, NV35 = 400 MHz). That's a difference of 5 percent.

In terms of memory bandwidth they both have a 256-bit bus and DDR-I, with the R350 running at 340 (680) MHz and the NV35 at 400 (800) MHz. That's a difference of 18 percent.
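(Quick sanity check on those two percentages -- the only inputs are the 8 pixels/clock figure and the clocks listed above:)

```python
# Rough fillrate/bandwidth comparison from the clocks quoted above.
# Both chips: 8 pixels per clock, 256-bit (32-byte) DDR bus.

def fillrate_mpix(core_mhz, pipes=8):
    return core_mhz * pipes                      # Mpixels/s

def bandwidth_gbs(mem_mhz, bus_bytes=32):
    return mem_mhz * 2 * bus_bytes / 1000.0      # DDR: two transfers per clock

r350_fill, nv35_fill = fillrate_mpix(380), fillrate_mpix(400)
r350_bw, nv35_bw = bandwidth_gbs(340), bandwidth_gbs(400)

print(f"Fillrate:  {r350_fill} vs {nv35_fill} Mpix/s "
      f"(+{(nv35_fill / r350_fill - 1) * 100:.0f}%)")    # ~5%
print(f"Bandwidth: {r350_bw:.2f} vs {nv35_bw:.2f} GB/s "
      f"(+{(nv35_bw / r350_bw - 1) * 100:.0f}%)")        # ~18%
```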

Even with nVidia having a more efficient memory controller, I can't really see how the NV35 will leave the R9800 Pro totally in the dust. Or maybe we have to focus on a benchmark that uses multitexturing and is fillrate limited rather than memory bandwidth limited? ;)
 
NVIDIA have been working on a new type of technique called Synergy... this is where two + two = eight. I hear that it was slightly broken in the NV30 but they have it licked in the NV35! ;)

Edit: added the smiley
 
Also take into account that the Radeon 9800 Pro has a more advanced memory controller (tweaked) and an improved core, so the AA and AF combination should be faster than on the Radeon 9700 Pro at the same clock speed. HardOCP comparisons did show this, but it wasn't really that much better and may have been more down to the two different paths that the 3.2 Cats take between the two cards. I am using the Radeon 9800 Pro path in the drivers on my Radeon 9700 Pro.
 
What drivers were used on the NV30 and NV35? It seems the latest NVIDIA drivers aren't just tuned (read: hack/cheat) for 3DMark03, but also for Quake3, as has been posted here -- http://www.beyond3d.com/forum/viewtopic.php?p=99673#99673

I wouldn't put any stock in these numbers meaning anything when trying to guestimate/estimate final performance and how it'll compare to video cards with significantly better IQ.
 
Uttar said:
Thanks. I'd be surprised if the NV35 truly had double the pipelines. I'd guess they're using some clever tricks, and that in practice each of their 8 pipelines is less efficient than each of the 4 pipelines on the NV30.

Any chance the pipes were just really screwed up in NV30, and so it could only act like it had 4 pipes when using color, and they fixed it in NV35? Cuz if it really only has 5 million more transistors, that couldn't be enough to double the number of pipes, right?
 
More seriously though, my bet is still that nVidia's first attempt at a DDR-II memory controller/interface was a gigantic failure.
And I also believe ATI's Color Compression is less advanced than nVidia's: my bet is that ATI can only compress when all samples are the same, while nVidia can compress when only some of the samples are identical.
nVidia probably also implemented better Z Compression - the current leader in Z Compression is the RV350, and that's kinda lame IMO.
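(To make that distinction concrete, here's a toy sketch of the two compression policies being contrasted -- purely illustrative pseudocode, not how either chip actually works; real hardware compresses whole framebuffer tiles, not individual pixels:)

```python
# Toy illustration of two lossless MSAA colour-compression policies.
# Speculative example only -- it just shows the difference between
# "compress only when all samples match" and a partial scheme.

def compress_all_identical(samples):
    """Compress only when every sample in the pixel is the same colour."""
    if all(s == samples[0] for s in samples):
        return ("1:N", samples[0])            # store one colour + a flag
    return ("raw", list(samples))             # otherwise store everything

def compress_partial(samples):
    """Also exploit pixels where only *some* samples repeat: store each
    unique colour once plus a small per-sample index."""
    palette, indices = [], []
    for s in samples:
        if s not in palette:
            palette.append(s)
        indices.append(palette.index(s))
    if len(palette) < len(samples):
        return ("palette", palette, indices)  # fewer colours than samples
    return ("raw", list(samples))

# Edge pixel: 4x MSAA, two triangles covering 3 + 1 samples.
edge = [0xFF0000, 0xFF0000, 0xFF0000, 0x0000FF]
print(compress_all_identical(edge))  # falls back to raw storage
print(compress_partial(edge))        # still saves space
```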
Well, I bet that ATi's Color Compression is more advanced (or the same). And I bet that you are dead wrong about it *only* compressing when all samples are the same... And I bet that.. well a lot of things. because basically i can *bet* just as much as anyone else..

The problem is it does not make you right...
 
Hellbinder[CE] said:
Well, I bet that ATi's Color Compression is more advanced (or the same). And I bet that you are dead wrong about it *only* compressing when all samples are the same... And I bet that.. well a lot of things. because basically i can *bet* just as much as anyone else..

The problem is it does not make you right...

Did I *ever* pretend it made me right? It's called speculation, you know :)
It would seem though, looking at performance, that in memory-limited cases using AA (and no AF), the R300 isn't 25% faster than the NV30 (but it is faster), even though the memory clocks would suggest that.

Thus, I'd suppose that nVidia's memory-saving techniques are more advanced than ATI's. Of course, I could be dead wrong on that. It could simply be that ATI is wasting slightly more of their 256-bit memory bus than nVidia is of their 128-bit memory bus.
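(For reference, the ~25% figure is roughly what the raw clocks alone would suggest, assuming stock 9700 Pro and 5800 Ultra memory clocks -- the efficiency of either memory controller is exactly the unknown being argued about here:)

```python
# Raw bandwidth the memory clocks alone would suggest (stock boards).
r300_bw = 310e6 * 2 * (256 // 8) / 1e9   # 9700 Pro: 310 MHz DDR, 256-bit bus
nv30_bw = 500e6 * 2 * (128 // 8) / 1e9   # 5800 Ultra: 500 MHz DDR-II, 128-bit bus

print(f"R300 {r300_bw:.1f} GB/s vs NV30 {nv30_bw:.1f} GB/s "
      f"-> R300 ahead by {(r300_bw / nv30_bw - 1) * 100:.0f}%")   # ~24%
```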

And even if it isn't that, nVidia's implementation could be more expensive, and maybe ATI's approach is more cost-effective. I didn't pretend nVidia ruled the free world thanks to that.


Uttar
 
Reverend said:
Hellbinder[CE] said:
because basically i can *bet* just as much as anyone else..
:?: You're admitting you're a rich sob or am I missing something here?
I think he's "admitting" that talk is cheap. :)

Still, it's all we've got right now, and much of it is interesting. Please continue.
 
Hellbinder[CE] said:
Well, I bet that ATi's Color Compression is more advanced (or the same). And I bet that you are dead wrong about it *only* compressing when all samples are the same... And I bet that.. well a lot of things. because basically i can *bet* just as much as anyone else..

The problem is it does not make you right...

Just like it didn't make you right when you said everyone was underestimating the R350, that it'd have several new and important features, be more than an overclocked R300, etc. Everybody speculates and it should be no surprise that people are wrong once in a while.
 
It may have been overestimated in some people's eyes.. but there were tweaks and architectural enhancements in the R350.

I mean how many people went on and on and on about how the NV30's longer shaders were a _really_ important feature and then ATI just blows this supposed 'killer feature' out into the stratosphere with the F-Buffer?
 
Tahir said:
I mean how many people went on and on and on about how the NV30's longer shaders were a _really_ important feature and then ATI just blows this supposed 'killer feature' out into the stratosphere with the F-Buffer?
I don't know if it's that great. The longer instruction count and the f-buffer both have the same purpose: to reduce the cost of multipassing.

Of course, the f-buffer is a big step forward, but ATI will still have a performance hit from going above the internal instruction limits.

I think that the main thing that we need right now is an auto-multipass HLSL.

I would like to stress, however, that we don't yet know whether the f-buffer is a truly new feature, or whether it has just taken this long to get it right within the drivers.
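(As a rough illustration of what an "auto-multipass" compiler plus an F-buffer-style spill could look like -- the chunking, the per-fragment record, and all the names below are invented for the example; only the idea of a fixed per-pass instruction limit and an in-order spill buffer comes from this thread:)

```python
# Illustrative-only sketch of automatic multipass splitting with an
# F-buffer-style spill between passes. Nothing here reflects ATI's
# actual driver or hardware.

MAX_INSTRUCTIONS_PER_PASS = 160

def split_into_passes(instructions):
    """Chop an over-long shader into chunks the hardware can run in one pass."""
    return [instructions[i:i + MAX_INSTRUCTIONS_PER_PASS]
            for i in range(0, len(instructions), MAX_INSTRUCTIONS_PER_PASS)]

def run_long_shader(fragments, instructions, execute_chunk):
    """Run each chunk over every fragment; between chunks the live
    temporaries are written to (and later read back from) an in-order,
    per-fragment store -- the role the F-buffer plays in hardware."""
    fifo = [{} for _ in fragments]            # live temporaries per fragment
    for chunk in split_into_passes(instructions):
        fifo = [execute_chunk(frag, chunk, temps)
                for frag, temps in zip(fragments, fifo)]
    return fifo                               # final per-fragment results
```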
 
Well... tweaking their HyperZ implementation in R350 so that it works better with stencil ops should be something you'd be impressed by then Chalnoth... since you harped so strongly on that with the 9700..... of which the NV30 has the same problem.
 
Ichneumon said:
Well... tweaking their HyperZ implementation in R350 so that it works better with stencil ops should be something you'd be impressed by then Chalnoth... since you harped so strongly on that with the 9700..... of which the NV30 has the same problem.
That's nice, but I still need to see the results.

Remember that the main problem that I experienced was that in NWN, I had to run at the same resolution/FSAA level on the Radeon 9700 Pro (which shouldn't have happened in any case, compression or no).
 
Chalnoth said:
Tahir said:
I mean how many people went on and on and on about how the NV30's longer shaders were a _really_ important feature and then ATI just blows this supposed 'killer feature' out into the stratosphere with the F-Buffer?
I don't know if it's that great. The longer instruction count and the f-buffer both have the same purpose: to reduce the cost of multipassing.

Of course, the f-buffer is a big step forward, but ATI will still have a performance hit from going above the internal instruction limits.
*resists urge to pull out flamethrower*
I would like to stress, however, that we don't yet know whether the f-buffer is a truly new feature, or whether it has just taken this long to get it right within the drivers.
It is a new feature, end of story.
 
Do you need an auto-multipass HLSL if you have an F Buffer type solution?

I had thought it was a solution to the same problem, but from the hardware/driver side...is there something missing that I'm not aware of?
 
Chalnoth said:
I would like to stress, however, that we don't yet know whether the f-buffer is a truly new feature, or whether it has just taken this long to get it right within the drivers.

Where does this come from?

Steve Spence suggested the same thing to me and I had Sireric nearly spitting feathers in his reply when I asked whether it was software and about the viability of a software implementation. Given this I'm fairly positive that it is hardware - at the very least the hardware is needed for the translation from (x, y) screen coordinates to the FIFO, AFAIK.

During my chat with NV's Andrew and Adam at CeBIT, they said that the guys who came up with this offered it to NVIDIA first, but NVIDIA turned it down, so they went to ATI and implemented it there.

Chalnoth said:
Ichneumon said:
Well... tweaking their HyperZ implementation in R350 so that it works better with stencil ops should be something you'd be impressed by then Chalnoth... since you harped so strongly on that with the 9700..... of which the NV30 has the same problem.
That's nice, but I still need to see the results.

http://www.beyond3d.com/reviews/ati/r350/index.php?p=21#stencil
:D
 
Chalnoth said:
Of course, the f-buffer is a big step forward, but ATI will still have a performance hit from going above the internal instruction limits.
How about this, Chalnoth:
Sireric:
In our implementation of the F-Buffer, we can completely hide the latency of accessing the buffer. Writes from the fragment shader to the F-Buffer are similar to other outputs, and have no effect on the fragment execution. F-Buffer reads are similar to texture reads and we already, by architecture, hide that latency from the shader execution.

The only issue that is left is BW. The thing to note is that the F-Buffer will be invoked when the instruction count exceeds the 160 instruction limit. That means that the F-Buffer reads/writes only occur a few times every 160 instruction pass (which is at most 64 cycles). That means that F-Buffer BW is very low. Texture reads from the shader program would still dominate the BW.

In general, real-time applications will not take advantage of the F-Buffer, since real-time applications will limit their shader count to, at most, one to two dozen instructions (e.g. 3DMark03 or D3). Of course, they could use it for small high-complexity objects. That being said, using our F-Buffer we were able to execute a compiled RenderMan shader (~500 instructions) at 50 FPS (our Quadro FX board executed it at 2.7 FPS -- must be a driver bug?). In the same way, but at lower fps, we can execute much more complex shaders (tens of thousands of instructions).
What about these facts hints at an increased performance penalty for accessing the f-buffer? The latency is effectively hidden. Where did you obtain your information?
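(A rough back-of-the-envelope on the "very low" bandwidth claim -- the 64-cycle pass length comes from Sireric's quote above, while the amount of state carried between passes is a pure guess, say one fp32 vec4 of live temporaries per fragment:)

```python
# Rough, back-of-the-envelope F-buffer bandwidth estimate.
# The spill size per fragment is an assumption, not an ATI figure.

CORE_MHZ        = 380               # R350 core clock
PIPES           = 8
SPILL_BYTES     = 16                # one fp32 vec4 written at the pass boundary...
ROUND_TRIP      = 2 * SPILL_BYTES   # ...and read back in the next pass
CYCLES_PER_PASS = 64                # per Sireric's quote above

frags_per_sec = CORE_MHZ * 1e6 * PIPES / CYCLES_PER_PASS
fbuffer_gbs = frags_per_sec * ROUND_TRIP / 1e9
print(f"F-buffer traffic: ~{fbuffer_gbs:.1f} GB/s "
      f"of the ~21.8 GB/s available")        # ~1.5 GB/s, a small slice
```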
 