Two new (conflicting) Rumors

BobbleHead said:
..it sounds like, "the nv30 is designed to do 64-bit, and has to do 2 passes to handle 128-bit, whereas r300 is designed to do 128-bit."...

I actually read it like that the first time through. But what that paragraph is effectively saying is that at 128-bit the color calls are equal, being 1.
But in 64-bit color the NV-30 can do 2 calls to the 9700's 1.

So going from 128bpp to 64bpp may increase performance on the NV-30, whereas on the 9700 it will not.
Or am I wrong again?
 
BobbleHead said:
But pass that quote through the marketing->technical translator, and it sounds like, "the nv30 is designed to do 64-bit, and has to do 2 passes to handle 128-bit, whereas r300 is designed to do 128-bit." Which might also have been part of the reason why Nvidia wasn't happy about the way DX9 was being speced, and wasn't as big of a driving force in it as they had hoped.

128-bit memory interface? Designed for 64-bit color? So far nv30 isn't sounding too revolutionary...

Exactly what I was thinking...
Market translation: "Our performance sucks with 128bit color"
 
Joe DeFuria said:
However, I am merely looking at two products on the shelf, and evaluating them SOLELY on what value they give to the gamer when they were being sold. From that perspective ("which product do I buy")....considering features that won't see much (any?) use during the card's life span adds ZERO value to that purchase....again...from a product perspective.

evaluating something solely on what value it has for the time being is not particularly prudent. when i buy a $300-worth product i consider it an investment. if i had compared a gf1 to a v5500 at the time, and evaluated those solely on their present value i, too, would have picked a voodoo. and i would have erred. as purely featurewise, the gf1 would be running next-spring doom3 in 320x240 at (IIRC jc's words) 20fps, whereas the voodoo would not be running it at all (hard fact). so which would be better bang for the buck?

Back in the day....I don't know how many times I've "congratulated" nVidia for introducing T&L, etc. HOWEVER, at the same time I never once recommended an nVidia card for purchase because it had T&L.

ok, as i originally stated i might have misunderstood you.
so i take it we both agree that 'hw t&l' was just one feature of the gf1's featureset and, since it was considered the 'most advanced' one, was taken by nvidia marketing as their selling banner - a fact which does not make the gf1 an overall worse bang-for-the-buck compared to the v5500. correct?

Understand?

i guess so. it's always good to have your point cleared, nevertheless.
 
flf said:
Why not? You certainly seem to need coddling.
flf said:
But, no. It's this same wonderful competition that you keep droning on about being so wonderful that nailed their coffin shut.

My, you have a high opinion of yourself, don't you?

I have no idea why you have decided to prove your self-worth by attacking me, but you know, if it makes you feel important to post patronizing drivel, that's up to you.

edit: self censorship :)
 
ripvanwinkle said:
BobbleHead said:
..it sounds like, "the nv30 is designed to do 64-bit, and has to do 2 passes to handle 128-bit, whereas r300 is designed to do 128-bit."...

I actually read it like that the first time through. But what that paragraph is effectively saying is that at 128-bit the color calls are equal, being 1.
But in 64-bit color the NV-30 can do 2 calls to the 9700's 1.

So going from 128bpp to 64bpp may increase performance on the NV-30, whereas on the 9700 it will not.
Or am I wrong again?

That's one way to interpret it. But think of it from a marketing perspective. If the nv30 could really do 2x the work in the same amount of time, I'd have expected them to trumpet it quite loudly. Instead it's just that 'a 128-bit call takes the same time as a 64-bit call on the 9700'. Nothing linking the nv30's 64 (or 128) bit perf to the r300's. Just two independent statements, both of which are probably true, but which, when said back-to-back, can give the misleading impression that the nv30 is 2x the perf of the r300. And to think it came out of the mouth of someone in marketing....
 
Althornin said:
Exactly what I was thinking...
Market translation: "Our performance sucks with 128bit color"
Maybe you should read that interview one more time...
It says the nv30 can issue 2 ops (per pipe?) with 64-bit fp data. That does make sense according to the CineFX papers, which claim full 128-bit fp color processing, whereas the 9700 has 'only' 96-bit fp calculation (except for texture addressing..) and obviously doesn't have the silicon to theoretically issue two 64-bit ops per clock.
We don't know if that stuff quoted is true or not...but for sure the quote doesn't mean what you believed it meant.

ciao,
Marco
 
DaveBaumann said:

So combining this, the comments from SA about dual chips for DX9, and the rumours about the 128-bit bus, I come to the conclusion that the NV30 is IMHO only a 4x2-pipeline/128-bit chip optimised for 64-bit rendering and used, like the Voodoo5 5500, in a dual-chip layout. So performance with 64-bit rendering would be normal, but only 50% with 128-bit rendering. I hope they optimised the memory interface so the textures have to be stored only once. Maybe they use a fast chip-to-chip connection.

Disadvantage: higher prices for high-end boards due to the two chips, and maybe higher memory demand.

Advantage: a DX9-ready mainstream product from the start, so the comment about 90% of DX9 products shipping at the end of 2003 would make sense.

But why do they have 120 million transistors for the NV30 then (eDRAM?).
 
Hmm... You may recall a few years ago, when Ati's 32bpp performance was very good & competed well with Nvidia products of the time. Unfortunately, switching to 16bpp resulted in no performance increase for Ati, while Nvidia (& 3dfx) substantially outperformed Ati in this bit depth. Perhaps that's what we'll be seeing here. You may be prepared to trade 32-bit fp quality for 16-bit fp speed...
 
I don't think it's "optimized" for 64-bit rendering. It's just that 64-bit rendering uses less bandwidth, so it will be faster. In the same way that I can fall back to 16-bit on a GF1/2/3 or Radeon 7/8/9, I can do so as well for floating point. You wouldn't say that the NVidia and ATI architectures are specifically optimized for 16-bit, would you? That's for external framebuffer precision.


Now for internal pixel shader precision, we have a similar situation. Imagine that you have a 128-bit SIMD FP unit that can do a 128-bit add, mul, dot3, etc in a single cycle.

Well, if you are using 32-bit scalars, then you can pack only 4 scalars into a register and issue 1 op per cycle. But if you used 16-bit scalars, you could pack 8 into a 128-bit register, or 2 64-bit vectors. Now you can effectively issue 2 per cycle.

And if you fall back to 8-bit integer components, you could fit 4 vectors per unit and issue 4 ops per cycle.
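Just to make that packing arithmetic concrete, here is a rough Python sketch (a single 128-bit unit retiring one full-width op per cycle is purely a hypothetical for illustration, not a known NV30 spec):

# Hypothetical: a single 128-bit SIMD unit that retires one full-width op per cycle.
REGISTER_WIDTH_BITS = 128

def vector_ops_per_cycle(component_bits, components=4):
    """How many 4-component vector ops fit into the unit each cycle at a given precision."""
    return REGISTER_WIDTH_BITS // (component_bits * components)

print(vector_ops_per_cycle(32))  # 1 -> one 128-bit FP vector op per cycle
print(vector_ops_per_cycle(16))  # 2 -> two 64-bit FP vector ops per cycle
print(vector_ops_per_cycle(8))   # 4 -> four 32-bit integer vector ops per cycle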

You can view this as "64-bit runs 2x as fast as 128-bit, and 32-bit runs 4x as fast" or "128-bit runs 50% slower and it is optimized for 64-bit". Basically, it's the sucky implementation whose 128-bit ops execute at the same speed as its 32-bit integer ops. Remember 3dfx's famous comment, "anyone whose 32-bit performance is the same as their 16-bit performance has a crappy 16-bit implementation"? This was back when some 32-bit cards on the market were running 32-bit at the same speed as 16-bit, and some people were saying this proved that "32-bit is now FREE". In reality, it meant those cards had bad 16-bit implementations.


I think (speculation) NVidia's architecture is designed to retire 1 (or more) 128-bit SIMD ops per cycle, but if you choose less precision, more vectors can be packed into the same unit (they all have to be the same operation, though), yielding higher performance.


This is also sorta how the 3dlabs P10 works. Depending on the precision needed, it can tie together as many execution units as necessary for a given vector size (2, 3, or 4 components).


Remember, the external buffer bandwidth needs to be increased no matter what. Whether NVidia has extra optimizations like 64-bit FP precision is irrelevant. High-quality FSAA, anisotropic filtering, large textures, and stencil operations require more fillrate and bandwidth for existing 32-bit integer framebuffers.

NVidia would need to increase effective bandwidth even if they had no floating point support at all. The extra bandwidth helps with old games.


128-bit FP and 128-bit precision are the least of their worries. If they can't match the R300's effective bandwidth, their 32-bit and 16-bit performance will suck as well.
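To put rough numbers on that, a quick Python sketch (the 300 MHz DDR clock below is just a placeholder for the sake of the comparison, not a known spec for either card):

# Raw DDR bandwidth: bus_width/8 bytes per transfer, two transfers per clock.
def ddr_bandwidth_gb_per_s(bus_width_bits, mem_clock_mhz):
    return (bus_width_bits / 8) * 2 * mem_clock_mhz * 1e6 / 1e9

# Same (hypothetical) 300 MHz DDR memory clock on both bus widths:
print(ddr_bandwidth_gb_per_s(256, 300))  # ~19.2 GB/s on a 256-bit bus
print(ddr_bandwidth_gb_per_s(128, 300))  # ~9.6 GB/s on a 128-bit bus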
 
darkblu said:
evaluating something solely on what value it has for the time being is not particularly prudent. when i buy a $300-worth product i consider it an investment. if i had compared a gf1 to a v5500 at the time, and evaluated those solely on their present value i, too, would have picked a voodoo. and i would have erred. as purely featurewise, the gf1 would be running next-spring doom3 in 320x240 at (IIRC jc's words) 20fps, whereas the voodoo would not be running it at all (hard fact). so which would be better bang for the buck?

Hmm, purchasing a $300 card is not an investment for future use in 3D graphics at the current rate of change, is it? Surely it's only prudent to consider what you get now? No one, no one, is going to be playing Doom 3 at 320x240@20fps on a GF1 DDR and think, wow, wasn't I lucky I bought this instead of a V5? Anybody I know who would invest $300 in a video card (i.e. already fairly serious about gaming) does not have anything less than a GF2 Pro in their system now, and most have at least a GF3 Ti200.
 
You can view this as "64-bit runs 2x as fast as 128-bit, and 32-bit runs 4x as fast" or "128-bit runs 50% slower and it is optimized for 64-bit". Basically, it's the sucky implementation that executes 128-bit ops at the same speed as 32-bit integer ops. Remember 3dfx's famous comment "anyone whose 32-bit performance is the same as their 16-bit performance has a crappy 16-bit implementation?" This was back when some 32-bit cards on the market were running 32-bit at the same speed as 16-bit and some people were saying this proved that "32-bit is now FREE". In reality, it meant those cards had bad 16-bit implementations.

I think it's more than just this if you look at the wording of the quoted interview. To me it sounds as though the NV30's pixel pipes are either 128-bit and capable of spitting out two 64-bit pixels per pipe, or 64-bit and use pipeline combining/bit stacking over cycles to achieve 128 bits.
 
I disagree. The quote says

Where the NV30 differs from the 9700 is that instead of a single 128bit colour call, it can perform two 64bit operations in the same time, providing what Adam called a 'sweet spot' between performance and colour.

This says nothing about pipeline combining. I could write the same quote for 16 vs 32-bit color or single precision vs double precision math on a CPU. All it says is that in the same span of time it takes to execute a 128-bit op, it can issue two 64-bit ops, which is exactly how I explained it in my original post. This same reasoning works for vectorized instructions on CPUs as well. If you drop back in precision, you can pack more ops per cycle.

You are proceeding from the theory that each pipeline cannot do 128-bit ops, but must steal an execution unit from another pipeline. You have no evidence for this theory, and it doesn't jibe with the following quote


The 9700, however, can only call one 64 bit operation in the same time as a 128 bit, so decreasing the colour level will not enhance performance."

Let's proceed on the theory that each R300 pipeline can do a 128-bit color op in 1 cycle, for 8 pixels total per clock. It can also do a 64-bit color op in 1 cycle for all 8 pixels (NVidia's claim: 128-bit speed = 64-bit speed for the R300). According to your theory, NVidia would take 2 cycles to do the same 128-bit color op for 8 pixels due to half the pipelines being shut down, but could output 8 64-bit pixels in 1 cycle.

So in reality, the R300 would be no slower than the NV30 at 64-bit math, both outputting 8 pixels @ 64-bit in 1 cycle. So why would NVidia claim an execution advantage if the ATI card can still equal it at 64-bit? Let's assume the marketing guy is not talking about stuff he is clueless about (otherwise, we can't have this discussion), so NV30 must be faster than R300 at 64-bit.


We'd have to revise these conjectures in one of the following ways to make it rational:

Theory #1: NVidia can do 8 128-bit ops in 1 cycle, and therefore 16 64-bit ops in 1 cycle. Therefore at 64-bit the NV30 is 2x the R300, and equal to the R300 @ 128-bit. This contradicts the pipeline-combining "50% slower" theory.

Theory #2: NVidia takes 2 pipelines/cycles to do 128-bit, but the R300 takes 2 cycles as well for both 128-bit and 64-bit. The NV30 only takes 1 cycle for 64-bit, therefore 2x the speed at 64-bit, but R300 == NV30 @ 128-bit (2 cycles).

Theory #3: The marketing guy doesn't know what he is talking about. R300 64-bit speed == NV30 64-bit speed, but NV30 128-bit speed is 50% slower than R300 128-bit speed.
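To make the comparison concrete, here's a quick Python sketch that just plugs in the per-pipeline cycle counts each theory implies (no numbers beyond those stated above):

# Cycles per op (per pipeline) each theory implies; lower is faster.
theories = {
    "Theory #1": {"nv30_128": 1.0, "nv30_64": 0.5, "r300_128": 1.0, "r300_64": 1.0},
    "Theory #2": {"nv30_128": 2.0, "nv30_64": 1.0, "r300_128": 2.0, "r300_64": 2.0},
    "Theory #3": {"nv30_128": 2.0, "nv30_64": 1.0, "r300_128": 1.0, "r300_64": 1.0},
}

for name, c in theories.items():
    adv_64 = c["r300_64"] / c["nv30_64"]     # NV30 advantage at 64-bit
    adv_128 = c["r300_128"] / c["nv30_128"]  # NV30 advantage at 128-bit
    print(f"{name}: NV30 is {adv_64:.1f}x the R300 at 64-bit, {adv_128:.1f}x at 128-bit")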


No matter what you decide, can you really rationalize a 120M-transistor NV30 that is "economical" with its transistors and, like an XP4, steals units from sibling pixel pipelines? If, for example, they only included 50% of the floating point logic of the R300 in their pipelines, then what the hell are all those other transistors doing? Exotic deferred rendering architecture? I doubt it.


I bet in the end the NV30 will be a straightforward IMR design like the R300, and the performance of the two will be mostly equivalent in many areas, while in some areas each core will have its strengths.

If the NV30 really has a 128-bit bus, and it doesn't have some truly exotic HSR, then Nvidia made an unfathomable error in their design, hitching up 120M transistors (for what?!?) and bandwidth hungry 128-bit/64-bit FP pipelines to a tiny memory bandwidth.

Would they be that incompetent, that idiotic?
 
You are proceeding from the theory that each pipeline cannot do 128-bit ops, but must steal an execution unit from another pipeline. You have no evidence for this theory, and it doesn't jibe with the following quote

If you read my post you'd note that I presented two options, this was just one.
 
If the NV30 really has a 128-bit bus, and it doesn't have some truly exotic HSR, then Nvidia made an unfathomable error in their design, hitching up 120M transistors (for what?!?) and bandwidth hungry 128-bit/64-bit FP pipelines to a tiny memory bandwidth.

Would they be that incompetent, that idiotic?

Possibly neither...just greedy. ;) (Looking for lower cost = more profit.) GeForce SDR anyone?

You have to consider nVidia's "better, not faster" pixels rhetoric. It may very well be that for the "cinematic rendering" they are pushing, with all kinds of layered shader programs, a high-speed 128-bit bus may be enough. Such scenes may be "VPU computation" limited on these first-gen cores, rather than bandwidth limited.

So 256-bit buses may not help the NV30 (or R300) when doing heavy-duty shading. For games it's another matter...they still need more bandwidth.

So if an 8-pipeline chip on a 128-bit bus without any exotic HSR is true...that may be idiotic from a gamer perspective, but not necessarily from the "Cinematic Renderer" perspective.

It will be interesting to see what kind of "tech demo / benchmarks" ATI and nVidia come up with to show how their product is "superior."
 
I doubt the first option: spit out 2x the number of pixels. Why? Because going from 128-bit to 64-bit buys you more bits to do more parallel pixel shading ops, but it doesn't buy you more texture units, more early Z units, stencil, alpha test, etc and everything else in the pipeline.

If it only has, say, 8 128-bit pipelines, how could it write 16 64-bit pixels? Using packed rendertargets doesn't count. I'm talking 16 real, honest-to-god, multisampled, z-buffered, stenciled, alpha'ed pixels.


Instead, I would bet that the NV30's basic fillrate is the same as the R300's: 8 written pixels per clock max. What 64-bit buys you is more pixel shader ops executed per cycle, not more pixels per cycle. E.g., a 10-instruction shader using 128-bit math executes in 10 cycles, but the same shader in 64-bit mode eats up only 5 cycles.
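The same arithmetic in a couple of lines of Python (assuming, hypothetically, that a pipe retires 128 bits of shader math per clock and that the instructions pair perfectly at 64-bit):

# Cycles for an N-instruction shader if a pipe retires 128 bits of math per clock.
def shader_cycles(instructions, precision_bits):
    ops_per_cycle = 128 / precision_bits  # 1 op/clock at 128-bit, 2 at 64-bit
    return instructions / ops_per_cycle

print(shader_cycles(10, 128))  # 10.0 cycles at full precision
print(shader_cycles(10, 64))   # 5.0 cycles at half precision, same pixels written per clock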


My bet is NVidia is going for raw shader execution speed, not max pixel fillrate. Their whole CineFX angle depends on doing more per clock per pixel, not doing more pixels. "Smarter pixels".
 
Joe,
Even the "Smarter pixels" need more bandwidth. The new cards (R300/NV30) effectively double/quadruple the frame buffer and texture read bandwidth even on the simplest shaders. FP texture's aren't mipmapped or compressed. And a lot of "cinematic" multipass algorithms will eat up FP framebuffer bandwidth.

NVidia may be counting on the fact that their 1024-instruction long shaders with predicates won't need multipass as often, hence they can get away with a 32-bit framebuffer, but I still feel that ATI will chew them up if people start using 64/128-bit textures for storing high dynamic range maps.


But where NVidia will really lose is with gamers. Gamers won't have DX9 titles starting out, but will bench the NV30 against the R300 using 4x-6x FSAA, 16x aniso, etc., and here the NV30's bandwidth disadvantage will be obvious.

It may be the case that future games will be "compute" limited, but older games aren't, and that will make NVidia look bad. It will be a rough sell for NVidia, selling the promise of "future compute-limited cinematic quality games" when their card is getting its ass handed to it by ATI on older games.

T&L situation all over again.
 
DemoCoder,

Basically, we're in agreement. Certainly, "smarter pixels" will require more bandwidth, but it is not clear if those pixels will be bandwidth limited. A lot depends on the efficiency of the shader hardware itself and likely the shader code as well.

But where NVidia will really lose is with gamers.

Agreed. The question is: would this be "by design"?...Is nVidia willing to sacrifice some gamer support / mind share and market share in order to penetrate a relatively new market?

Maybe nVidia feels that there is more of a market for "Cinematic Shading Hardware" customers, vs. Hard Core gamer customers?
 
Perhaps I missed something in your argument DemoCoder, but I just don't understand your logic.

Where in this quote:
The 9700, however, can only call one 64 bit operation in the same time as a 128 bit, so decreasing the colour level will not enhance performance."

Do you get any relative performance comparison between the NV30 and the 9700? The "3 possible theories" you listed are all based on the assumption that the PR quote said the NV30 has a performance advantage over the 9700 by using two 64-bit ops instead of a single one on the 9700.

All the PR quote did was state that the NV30 picks up performance going from 128-bit to 64-bit, and the 9700 doesn't. It doesn't give any inkling of what the relative performance is in 128-bit or 64-bit mode. The NV30 could have equal capability in 128-bit mode, or half, or twice... it's simply not mentioned in that quote.

Aside from that, I think you're probably right (especially with that last post) in that the NV30 might have the ability to handle more instructions per given number of cycles in 64-bit mode than in 128-bit mode, where the 9700 should handle the same number in a given number of cycles in either mode. But again, without some additional information about the relative shader performance at some baseline point (128-bit, for example), we still don't really know where we stand in 64-bit mode... and in the grand scheme of things it may not matter that much anyway.

I'm not sure I can see NVIDIA making a GPU that is extremely powerful for DX9 and heavy shader titles, while being bandwidth limited and somewhat behind the performance curve for all games out now and in the next year or two. This just isn't their historical strategy... they have always gone for brute force speed while ATI and others have gone for the more feature packed, more forward looking design (with the exception of T&L).



Oh, and I'm still laughing at darkblu about the GF1 SDR playing DoomIII being a valid concern at the time of purchase... if I bought a 9700 today and it was able to even play a game five years from now, I'd consider that a bonus, but it would never enter into my purchasing decision today (probably because there's no way in hell I'd keep a video card for five years).
 
I'd like to interject that the text you've been quoting about this was the words of the person making the report, not the actual words of an nVidia representative. Also, he was basing it on his talk with nVidia's European PR Manager and Northern European Technical Marketing Manager.

I think a lot is being read into what is a second-hand phrasing by Marketing and Public Relations people. Then there is also BD's point, which was also made earlier in the thread, about the lack of connection between the two statements, even if they had been made by an nVidia representative.
 
darkblu said:
evaluating something solely on what value it has for the time being is not particularly prudent. when i buy a $300-worth product i consider it an investment. if i had compared a gf1 to a v5500 at the time, and evaluated those solely on their present value i, too, would have picked a voodoo. and i would have erred. as purely featurewise, the gf1 would be running next-spring doom3 in 320x240 at (IIRC jc's words) 20fps, whereas the voodoo would not be running it at all (hard fact). so which would be better bang for the buck?

I'm with Bigus Dickus...I'm laughing at that as well. :LOL:

You are saying that consideration of how well each of these cards will play Doom3....STILL probably a year away at this time...gives the GeForce1 SDR more bang for the buck in the long run?!

What about during the, what, 4 years between buying your card and Doom3 becoming available? Sorry, but your logic just doesn't seem logical at all.

When I say "present value", I don't mean "only the value for the games available at that very exact moment." As I said, you do have to consider / speculate how well each card will stack up "over the life of the card." (The time between when you buy it, and when you upgrade next.).

Having said that...extrapolating 3+ years away, and reaching any conclusion OTHER than "I'm going to want to upgrade to a new card to play the latest games" is a bit foolish.

Do you REALLY think buying a GeForce SDR, because it might play Doom3 at some absurdly low resolution with absurdly low frame rates 4 years away, is a justifiable reason for purchase?
 