NVIDIA Fermi: Architecture discussion

Except they kinda DIDN'T say Fermi IS going to be the fastest chip in every segment, they said they EXPECT it to be the fastest chip in every segment.

There's a world of difference in that word choice.

And why do you ignore the second sentence? :D

“We expect [Fermi] to be the fastest GPU in every single segment. The performance numbers that we have [obtained] internally just [confirms] that."
 
And why do you ignore the second sentence? :D

Since the second sentence was based on a faulty premise, why bother?

The faulty premise is that Nvidia has any real clue (or actual expectation) they will have the fastest card in every SEGMENT.

Segment = price tier. A price drop puts a faster card into a lower segment. For "We expect [Fermi] to be the fastest GPU in every single segment" to be realized, Nvidia will have to beat AMD on price/performance when every known applicable metric favors AMD.

Ergo the 'we expect' qualifier and the superfluousness of the second sentence. There is scant credibility in either sentence.
 
I don't think they believe HPC will replace graphics; rather, it will complement it, much as the Professional segment provides profit while graphics keeps the volumes high, i.e. an example of cross-subsidization in economics. The problem with graphics currently is that profitability appears to be progressively declining; rather than sit back, they are trying to find new markets to compensate for this.

This Chinese site a few months ago had some figures for the total HPC market, I think from Nvidia's mid-year presentation to analysts on Tegra and Tesla (about halfway down, under the S1070 picture):
http://news.mydrivers.com/1/145/145916_1.htm

So larger than $1bn in total. Now you just need to work out what percentage of the above figures Nvidia can get ;)

Well, I believe the oil industry ("seismic"), the military-industrial complex and the banks have a similar ethical framework to Nvidia, so they are likely to jump over. That leaves supercomputing and the universities to the rest.
 
This thread has spawned a lot of new posters. Where has this thread been posted to collect such an influx of new people?
 
This thread has spawned a lot of new posters. Where has this thread been posted to collect such an influx of new people?

No doubt. We even have people tossing about the fb term which, once upon a time, was a pb offense.
 
No doubt. We even have people tossing about the fb term which, once upon a time, was a pb offense.
Indeed. I think the term "lurk more" is applicable.

Net etiquette tip: learn the culture of a board before posting; that way one can avoid the otherwise inevitable culture clash that will end in tears.
 
This thread has spawned a lot of new posters. Where has this thread been posted to collect such an influx of new people?

Long-time lurker (for many years). But this thread is so interesting, and about such a promising new GPU, that it can make us de-lurk sometimes.
 
A Happy New Year to everyone. For a forum like B3D I wish for more substance and less agenda ;)
 
I don't see them as being killed now. Sure, they don't have the fastest thing out, but the performance lead isn't that great single GPU vs. single GPU.
Exactly, they're not being killed... they simply don't have anything competitive at all.

GT200-based boards are hard to come by, except the GTX 260, which only competes with Juniper.
GT21x-based boards simply can't outperform RV710- and RV730-based boards, which are going to be replaced soon.
G9x-based boards are their last acceptable products, although power draw is way too high for the GTS 250.

They're not being killed, they're dominated in every market segment.

GF100 has to be substantially faster than Cypress, but the absence of a performance preview reminds us of the famous NV30 story, which Huang recently rewrote. If they're really confident in their product, their marketing strategy doesn't make sense.
 
If they're really confident in their product, their marketing strategy doesn't make sense.


I don't really see that the marketing department is actually doing anything wrong per se. But I do believe that, since they were gearing up for a launch earlier than March, they had been preparing to build hype, and now it's gone off the boil.

And it's a bit weird for NV: since they are now trying to establish other markets, we are now getting marketing that doesn't relate to the wider gaming and graphics community.

To me NV feels like an awkward teenager in transition.
 
That wasn't PR that stated that, it was a product manager :rolleyes:. And it's a very fast card; you wouldn't see them stating that if it wasn't, they would go the route of value for the money or something else. That's marketing and PR for ya. There is a fine line between a lie and spin. If that statement is false, they are just lying.

A leaked, off the record vague statement by some unnamed product manager before the chip is even finalised? Oh, well, that proves it's true because Nvidia would never fib in that way. :rolleyes:

I'm sure Fermi would be a very fast card if it lives up to all its design aims, but we already know that development has not been smooth and there have been many problems. If there were power and clock problems, it wouldn't be the first time that a GPU has been released at less than what was initially envisioned.

Yes, so you can't wait a week and a half?

We have been waiting since October you know.
 
Area efficiency. Since all of these things are in-order cores, instructions per clock is not really a useful metric. It is useful for CPUs, no doubt, which have massive amounts of area given to OoOE and the scaffolding to support it.

Even with in-order cores, the IPC can vary significantly from one design to another.
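To make that concrete, here is a toy in-order issue model; the instruction mix and latencies are invented for illustration and not tied to any real CPU or GPU, but they show how the same dependent code can land at very different IPC on two otherwise similar in-order designs:

```python
# Toy model: an in-order, single-issue pipeline running the same
# dependent instruction stream under two different load-use latencies.
# The program and the latencies are invented, purely for illustration.

def run_in_order(program, load_latency):
    """program: list of (dest, srcs, is_load); returns total cycles."""
    ready_at = {}  # register -> cycle at which its result is available
    cycle = 0
    for dest, srcs, is_load in program:
        # In-order issue: stall until every source operand is ready.
        issue = max([cycle] + [ready_at.get(r, 0) for r in srcs])
        ready_at[dest] = issue + (load_latency if is_load else 1)
        cycle = issue + 1  # single-issue: next instruction no earlier than issue + 1
    return max([cycle] + list(ready_at.values()))

# A short dependent chain: a load followed by three ALU ops using its result.
prog = [("r1", [], True),        # load  r1
        ("r2", ["r1"], False),   # add   r2 <- r1
        ("r3", ["r2"], False),   # mul   r3 <- r2
        ("r4", ["r3"], False)]   # add   r4 <- r3

for lat in (4, 20):
    cycles = run_in_order(prog, load_latency=lat)
    print(f"load latency {lat:2d}: {cycles} cycles, IPC = {len(prog) / cycles:.2f}")
```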
 
If, as Jawed points out, it's CPU limited, then the benchmarks are going to have limited use, as each of the architectures you are comparing is going to be limited by the CPU. Games are fairly CPU bound (even ones that people often think of as GPU killers, like Crysis, are very CPU sensitive).

I thought the problem with Crysis was very poor batching of draw calls, resulting in significantly increased communication overhead which, due to the type of communication being used, leaves the CPU effectively idle but unusable.
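As a rough sketch of the batching half of that claim (the per-call overhead and per-triangle costs below are made-up numbers, and this says nothing about the synchronization that leaves the CPU "idle but unusable"):

```python
# Back-of-the-envelope cost model for draw-call batching. The per-call
# overhead and per-triangle CPU cost are invented numbers; the point is
# only the shape of the curve: the same triangle count submitted as many
# tiny batches burns far more CPU time per frame than a few large ones.

CPU_US_PER_DRAW_CALL = 30.0  # hypothetical driver/API overhead per call (microseconds)
CPU_US_PER_1K_TRIS = 1.0     # hypothetical per-triangle CPU cost (microseconds per 1000)

def cpu_time_ms(total_tris, tris_per_call):
    calls = total_tris / tris_per_call
    us = calls * CPU_US_PER_DRAW_CALL + (total_tris / 1000.0) * CPU_US_PER_1K_TRIS
    return us / 1000.0

total = 1_000_000  # triangles per frame
for batch in (100, 1_000, 10_000):
    print(f"{batch:>6} tris/call -> {total // batch:>6} calls, "
          f"~{cpu_time_ms(total, batch):.0f} ms of CPU per frame")
```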
 
That's fine but that raises more questions than it answers. I'm gonna assume that you guys do lots of profiling of existing and future game workloads and use that analysis to determine where to focus with future hardware. So now you're saying that doubling texture units and doubling ALUs did not result in doubled performance because the bottleneck is elsewhere. So why didn't you guys address those bottlenecks instead of doubling up stuff unnecessarily? Honest question.

Because in all likelihood the bottlenecks are systemic. To give an example, it took quite a while for GigE to actually deliver on its bandwidth, due to systemic issues in the communication stacks between the driver and the GigE chips. The system/software infrastructure that worked fine with 100Mbit just couldn't handle GigE, and it honestly wasn't an easy problem to solve. Throwing more CPU at it barely made a difference; it wasn't until things were re-architected systemically that the problems were solved. That re-architecture touched pretty much every piece of the network system, both software and hardware.

Honestly the way the graphics stack from application to video out works hasn't really changed that much from the 3dfx days...
 
and rummage for others. The register dependency scoreboarding is what determines which instructions can issue. So instructions can issue out of order.

Many an in-order design has used register scoreboarding. It is but a small part of what makes an OoO design.
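For reference, a minimal sketch of what register dependency scoreboarding amounts to, in the generic textbook sense rather than as a description of any specific NVIDIA or ATI unit; whether a ready younger instruction actually overtakes a stalled older one is up to the issue logic around the scoreboard, not the scoreboard itself:

```python
# Minimal register scoreboard sketch (generic textbook version, not any
# specific GPU). Each in-flight instruction marks its destination register
# busy; a candidate instruction may issue only when none of its sources
# (RAW) and not its destination (WAW) are still pending.

class Scoreboard:
    def __init__(self):
        self.busy = set()  # registers with a write still in flight

    def can_issue(self, srcs, dest):
        return not (set(srcs) & self.busy) and dest not in self.busy

    def issue(self, dest):
        self.busy.add(dest)

    def writeback(self, dest):
        self.busy.discard(dest)

sb = Scoreboard()
sb.issue("r1")                      # e.g. a long-latency load writing r1
print(sb.can_issue(["r1"], "r2"))   # False: r2 <- r1 must wait (RAW hazard)
print(sb.can_issue(["r5"], "r6"))   # True: an independent instruction is ready
sb.writeback("r1")                  # load completes
print(sb.can_issue(["r1"], "r2"))   # True: the dependent instruction can now go
```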
 
Yes, Dave, but I'm not buying the "system limited" argument, because multi-GPU setups continue to scale higher. If we were system limited that would not be possible; therefore there is room to improve performance on the GPU side of things.

Do they scale higher or merely give better numbers? Personally, I'm in the give better numbers camp. I can display the same frame an infinite number of times, does that mean I have infinite FPS?
 
From the GPU perspective you have two of everything there; however, you rarely see perfect scaling. Why? Usually system limitations. Despite having more engine than an HD 5850, the 5970's average advantage at 25x16 is ~70%, so there is a lot lost to system dependencies.

And I'm sure you'll be willing to admit, if pressed, Dave, that AFR is the best possible case for scaling because you effectively get 2x throughput in the comm stack. It's like the difference between TPC-D and TPC-C: one is throughput oriented and one is latency oriented.
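For concreteness, the scaling arithmetic being argued over looks like this; the frame rates are placeholders rather than measurements:

```python
# Scaling arithmetic for a dual-GPU (AFR) board vs. a single-GPU board.
# The frame rates below are placeholders, not benchmark results, and raw
# FPS says nothing about AFR frame pacing.

single_fps = 40.0  # hypothetical single-GPU average
dual_fps = 68.0    # hypothetical dual-GPU (AFR) average

speedup = dual_fps / single_fps  # 1.70x, i.e. "~70% faster"
efficiency = speedup / 2.0       # fraction of the ideal 2x scaling

print(f"speedup:    {speedup:.2f}x over a single GPU")
print(f"efficiency: {efficiency:.0%} of perfect 2-way scaling")
```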
 
Well I can't help you then.

Regardless (of whether instruction issue is out of order), scoreboarding every instruction and every operand is considerably more expensive than the approach seen in ATI, where scoreboarding is at the hardware-thread level, tracking control-flow instructions (rather than ALU or TEX instructions), which are issued in order. Waterfalling constants, LDS writes/reads and indexed register writes (within an ALU clause) create hazards for ALU instruction issue - and in that case the ALUs stall (though LDS and indexed-register operations don't necessarily stall) - so that's a pipeline state, not the finely-grained scoreboarding that NVidia indulges in.

Jawed

Isn't that the whole point of VLIW?
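Roughly, yes: with VLIW the compiler does the dependence checking up front and packs independent operations into a bundle, so the hardware can track issue at a much coarser granularity. A toy sketch of that compile-time packing, using a generic greedy scheme rather than ATI's actual shader compiler:

```python
# Toy VLIW bundling: the compiler groups independent operations into one
# bundle (up to 5 slots here, loosely echoing a 5-wide design), so at run
# time the hardware never needs per-operand scoreboarding within a bundle.
# Generic greedy illustration, not ATI's actual shader compiler.

def bundle(ops, width=5):
    """ops: list of (dest, srcs) in program order; returns a list of bundles."""
    bundles, current, written = [], [], set()
    for dest, srcs in ops:
        depends = any(s in written for s in srcs)  # RAW against the open bundle
        if depends or len(current) == width:
            bundles.append(current)                # close the bundle, start a new one
            current, written = [], set()
        current.append((dest, srcs))
        written.add(dest)
    if current:
        bundles.append(current)
    return bundles

ops = [("r1", ["a", "b"]), ("r2", ["c", "d"]), ("r3", ["e", "f"]),
       ("r4", ["r1", "r2"]),   # depends on results from the first bundle
       ("r5", ["r3", "g"])]    # likewise, so it can share the second bundle

for i, b in enumerate(bundle(ops)):
    print(f"bundle {i}: " + ", ".join(dest for dest, _ in b))
```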
 
A leaked, off the record vague statement by some unnamed product manager before the chip is even finalised? Oh, well, that proves it's true because Nvidia would never fib in that way. :rolleyes:
Remember: Our $129 part is faster than their upcoming flagship at $400

Yes; going by that statement, they could very well be announcing GF100 as the fastest GPU ever even if performance is only 1% above a GTX 285, and only in severely biased scenarios such as Batman AA...
 
I thought the problem with Crysis was very poor batching of draw calls, resulting in significantly increased communication overhead which, due to the type of communication being used, leaves the CPU effectively idle but unusable.

Could you please oversimplify that statement?
All I could clearly understand is that the CPU isn't being utilized well enough in Crysis, which is something I noticed a while back.
Also, is that a well-documented fact, or just a guess?
 