NVIDIA Fermi: Architecture discussion

Like this?

"We will soon have 6 cores ina CPU and a graphic card that can power up to 6 monitors, so why should you waste precious GPU Power to do Physics, when you have such a powerful CPU? "
 
Yea, so they have the choice:
1) Get slaughtered on CPU-physics by Intel
2) Get slaughtered on GPU-physics by nVidia

:)

However... a while ago I said that faster GPUs are a dead end for graphics, since we're getting to the point of being able to run full HD resolutions at 60+ fps with the highest detail settings (shadows, per-pixel lighting, SSAO and all that eye candy). Looks like both AMD and nVidia are aware of that... AMD is trying to make raw graphics performance relevant again with Eyefinity, while nVidia is trying to make its GPU useful for more than just graphics.
 
Yea, so they have the choice:
1) Get slaughtered on CPU-physics by Intel
2) Get slaughtered on GPU-physics by nVidia

:)

However... a while ago I said that faster GPUs are a dead end for graphics, since we're getting to the point of being able to run full HD resolutions at 60+ fps with the highest detail settings (shadows, per-pixel lighting, SSAO and all that eye candy). Looks like both AMD and nVidia are aware of that... AMD is trying to make raw graphics performance relevant again with Eyefinity, while nVidia is trying to make its GPU useful for more than just graphics.

Exactly!
 
Ohh, is it me, or are there a lot of people here happy about a paper launch (if this can even be called that)? Even going as far as claiming performance leadership with absolutely no real benchmark, or even a demo, or even an ETA!

WOW
 
In short, there is no way Fermi is built on some insane business model. Well, there is, but then Jensen should be shot.
I think the largest chip is going to be rather poor on a perf/mm² basis, and possibly worse than GT200. For DP, ATI is just chaining much of the logic together, AFAIK, so the cost is minimal, and it's there so that ATI can make demos and show technical competency (much like they did in the past with physics). GF100, OTOH, is doing half-speed DP. Then you have things like a more general cache, concurrent kernels, etc., which all add significantly to the already costly problem of moving data around on that beast of a chip. When you look at how the various GPGPU-centric half-measures in GT200 made it substantially less competitive than G92, I can't see how GF100 will be anywhere near the efficiency of G92.

Fortunately for NVidia, the high end market can get away with huge margins due to the insanity of high end gamers. People are idiots for spending so much more on a GTX 285 for an incremental performance increase over a 4870. Moreover, I think whichever GPU maker has the better design goes for higher margins rather than trying to force their opponent into the red, so if NVidia's strategy is for long term growth in HPC, then short term uncompetitiveness shouldn't kill them. Of course, you may be right in thinking that they are vastly overestimating that future potential.

The big question is how efficient NVidia can make GF100 for the midrange and low end. RV740 is ridiculously fast for a 128-bit card with a 140 mm² die. Remember how terrible ATI's competitiveness was at the low end with RV5xx and RV6xx, because ATI's architecture was less efficient than NVidia's, even though the high end part was probably somewhat profitable.
 
Else, if it truly has something like 256 TMUs or equivalents, I doubt real-time fillrate could even peak to such heights.

Drunken Monkeys(tm)? AFAIK it's not publicly known what the 16 LS (load/store) units are capable of, or what the full range of their functionality is.

I.e., what about their attached filtering units: do they filter 16x4 bilinearly interpolated values, or only 16 values (i.e. a traditional quad-TMU's worth)? Would they be the only units responsible for transactions to/from memory, i.e. are they replacing the traditional ROPs, or are separate ROPs (or parts of them, such as Z-compare units) still present?

I think Nvidia did not reveal very much of Fermi; there's a lot of guesswork left.
 
I think the largest chip is going to be rather poor on a perf/mm² basis, and possibly worse than GT200. For DP, ATI is just chaining much of the logic together, AFAIK, so the cost is minimal, and it's there so that ATI can make demos and show technical competency (much like they did in the past with physics).
DPFP perf/w is going to be an interesting one as well...
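A quick, heavily hedged back-of-envelope version of that comparison in Python; the GF100 shader clock below is purely an assumption (no clocks have been announced), and the real perf/W question of course depends on power figures nobody has yet:

# Peak-FLOPS sketch. Cypress numbers use the published HD5870 specs; the GF100
# hot clock is an assumed, roughly GT200-like figure, not an announced one.

def peak_gflops(alus, flops_per_alu_per_clock, clock_ghz):
    return alus * flops_per_alu_per_clock * clock_ghz

# GF100: 512 cores, FMA = 2 flops/clock, DP at half the SP rate per the whitepaper
gf100_sp = peak_gflops(512, 2, 1.5)       # ~1536 GFLOPS at an assumed 1.5 GHz
gf100_dp = gf100_sp / 2                   # ~768 GFLOPS

# Cypress: 1600 ALUs, FMA = 2 flops/clock at 850 MHz, DP FMA at 1/5 the SP rate
cypress_sp = peak_gflops(1600, 2, 0.85)   # 2720 GFLOPS
cypress_dp = cypress_sp / 5               # 544 GFLOPS

print(f"GF100 (assumed clock): SP {gf100_sp:.0f}, DP {gf100_dp:.0f} GFLOPS")
print(f"Cypress (850 MHz):     SP {cypress_sp:.0f}, DP {cypress_dp:.0f} GFLOPS")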
 
Ohh, is it me, or are there a lot of people here happy about a paper launch (if this can even be called that)? Even going as far as claiming performance leadership with absolutely no real benchmark, or even a demo, or even an ETA!
I'm happy to have an interesting architecture to talk about.
It might not amount to a winning combination, but I do give them kudos for changing things up.
 
Ohh, is it me, or are there a lot of people here happy about a paper launch (if this can even be called that)? Even going as far as claiming performance leadership with absolutely no real benchmark, or even a demo, or even an ETA!

WOW

I always find a new GPU architecture very interesting, and I care only a little about its performance. So it was a good tech day, nothing more, nothing less.


ATI has released a splendid gaming card. NV has shown interesting technology and ideas, but it might suck for gaming and it won't be on the shelves for 4-6 months. So who cares about its gaming performance right now?
 
Ohh, is it me, or are there a lot of people here happy about a paper launch (if this can even be called that)? Even going as far as claiming performance leadership with absolutely no real benchmark, or even a demo, or even an ETA!

WOW
Isn't the issue with paper launches that they mislead the consumer into believing a product will be available to purchase imminently? I can't really see how that's the case here.

I mean they haven't even announced a product; just an architecture.
 
Isn't the issue with paper launches that they mislead the consumer into believing a product will be available to purchase imminently? I can't really see how that's the case here.

I mean they haven't even announced a product; just an architecture.

Well it's obviously a spoiler against the 58xx launch. It's designed to put people off buying ATI and make them wait for the first Fermi product, whatever that may be off the back of this technology.

It can be viewed as misleading the customer in that what we've been told is Nvidia's aspirations for the next product. Even they don't know if they can make it and if it will live up to all the pre-launch tech hyping. Nvidia did the same with NV30 because they were behind the curve, but had to get the message out for people to pass over the ATI products and get the next Nvidia product instead. Obviously it's the lesser of two evils, as people will also not buy the current Nvidia products if they think there is better Nvidia technology on the horizon.
 
Well it's obviously a spoiler against the 58xx launch. It's designed to put people off buying ATI and make them wait for the first Fermi product, whatever that may be off the back of this technology.

I don't see anything in this presentation that would have that effect. The only people excited about what was shown are the guys interested in the technology. The people buying 5800's to run Crysis don't seem very impressed at all.

During the PhysX fluid demo, did anyone else notice how that guy mentioned it was using 64,000 particles, but that the next-generation chip should/could/might do up to 1 million? It's like they don't even know how fast the thing is :rolleyes:
 
I don't see anything in this presentation that would have that effect. The only people excited about what was shown are the guys interested in the technology. The people buying 5800's to run Crysis don't seem very impressed at all.

So what was the reason for Nvidia to release all this info instead of the usual "we never comment on unreleased products"? Right on the 58xx release and six months ahead of product? They've only ever done this when they were behind an ATI release and needed to put out a big spoiler.

During the PhysX fluid demo, did anyone else notice how that guy mentioned it was using 64,000 particles, but that the next-generation chip should/could/might do up to 1 million? It's like they don't even know how fast the thing is :rolleyes:

It kind of implies that Fermi is still quite a way off.
 
By the time it launches? Don't hold your hat.
You don't think so? Frankly, by all paper specs it should be faster. It should have more TMUs (128?), and while it still has a peak arithmetic deficiency compared to AMD's part, the gap should be a bit smaller (if Nvidia reaches similar clock speeds to GT200), plus there are the improvements which should in theory help it achieve higher ALU utilization. And it also has a ~50% bandwidth advantage.

Granted, there are some open questions (as far as I can tell, the total special-function peak rate hardly moved at all compared to GT200: 64 vs 60 units, for instance, but I'd guess it's still mostly enough). But if this thing can't beat a HD5870 in games, Nvidia shouldn't have bothered putting TMUs/setup etc. on there at all and should just sell it as a Tesla card only...

About the only reason I can think of why a HD5870 would beat it in games is that it could be severely setup/rasterization limited. If that hasn't improved (especially given the seemingly compute-centric nature), then the HD5870 might beat it in some situations simply because it runs at a higher (core) clock and hence achieves higher triangle throughput.
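For what it's worth, here's that paper-spec argument as a rough Python sketch; every GF100-side number (TMU count, core clock, memory speed) is an assumption, while the HD5870 figures are the published ones:

# Paper-spec comparison sketch; GF100 values are guesses, HD5870 values are published.

def bandwidth_gbs(bus_bits, gbps_per_pin):
    return bus_bits * gbps_per_pin / 8

hd5870_bw = bandwidth_gbs(256, 4.8)   # 153.6 GB/s (published)
gf100_bw  = bandwidth_gbs(384, 4.8)   # 230.4 GB/s if memory clocks match: the ~50% advantage

hd5870_fill = 80 * 0.850              # 68 Gtexels/s (80 TMUs at 850 MHz)
gf100_fill  = 128 * 0.650             # ~83 Gtexels/s *if* it really has 128 TMUs at ~650 MHz core

# Setup/rasterization at the traditional 1 triangle per clock tracks core clock,
# which is the one place a lower-clocked GF100 could still lose:
hd5870_tris = 1 * 0.850               # ~850 Mtris/s
gf100_tris  = 1 * 0.650               # ~650 Mtris/s at the assumed core clock

print(f"bandwidth:  {gf100_bw:.1f} vs {hd5870_bw:.1f} GB/s")
print(f"texel fill: {gf100_fill:.1f} vs {hd5870_fill:.1f} Gtexels/s")
print(f"triangles:  {gf100_tris * 1000:.0f} vs {hd5870_tris * 1000:.0f} Mtris/s")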
 
Spoiler?

Well it's obviously a spoiler against the 58xx launch.

Really?!
Well, then it's an utter failure.
They introduced an architecture. They made not a single demo, have a card that obviously isn't working yet (when they have working cards, they tend to get demo'd), and no mention of what the gfx capabilities would be. Shots of recent A1 dies, no mention of clocks -- either GPU *or* memory. I mean, if I were looking to buy a card right now, I haven't been given a real choice, have I?

That's a terrible spoiler!

If I were AMD, I'd be feeling pretty secure right now, at least over the next quarter, maybe two.

I've been wondering most of the morning exactly why they chose now to release slideware. Very odd. Very *interesting* slideware, mind you. Some cool stuff, some disappointment. Non-coherent L1 -- well, bummer, but I don't blame them as coherence among 16 L1s would be a pain. Non-dual issue mul+add -- that's a bit of a utilization bummer. IEEE exception handling and indirect addressing are both nice.

Why choose now to tell me that, though? They have no h/w to demo, they can't possibly believe that the slideware buzz would last longer than, say, 48 hours. Why now?

Weird. It's almost like they're trying to convince, I dunno, some random manufacturer that their architecture is worth buying into. It doesn't seem at all to me that the gaming market was their target. And a number of gamers on here seem to have expressed a similar dissatisfaction.

YMMV,
-Dave
 
While you can argue about value vs cost, they do clearly have value in a variety of algorithms, and not just for "lazy" programmers.
They have value ... but per-cacheline MOESI is an extreme: in cost, in the amount of effort necessary to scale it, and in the fragility of scaled-up implementations.
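To put even a very rough number on the cost side, here is a minimal sketch of the bookkeeping a simple full-bit-vector directory would need per cacheline; the cache and line sizes are illustrative assumptions, not anything Nvidia has disclosed:

# Per-cacheline coherence bookkeeping under a simple full-bit-vector directory.
# All sizes are illustrative assumptions.

MOESI_STATES = ("Modified", "Owned", "Exclusive", "Shared", "Invalid")

def directory_bits(cache_bytes, line_bytes, n_caches):
    lines = cache_bytes // line_bytes
    state_bits = 3            # enough to encode the five MOESI states
    sharer_bits = n_caches    # one presence bit per private cache
    return lines * (state_bits + sharer_bits)

# e.g. a 768 KB shared cache with 128-byte lines tracking 16 L1s
bits = directory_bits(768 * 1024, 128, 16)
print(bits / 8 / 1024, "KB of directory state")   # ~14 KB, growing with every extra cache

# And the storage is the easy part; the protocol traffic, the transient states
# and the verification effort are what really scale badly.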
 
Well it's obviously a spoiler against the 58xx launch. It's designed to put people off buying ATI and make them wait for the first Fermi product, whatever that may be off the back of this technology.

It can be viewed as misleading the customer in that what we've been told is Nvidia's aspirations for the next product. Even they don't know if they can make it and if it will live up to all the pre-launch tech hyping.
And if those aspirations are real (i.e. Nvidia is planning to launch their new GPU in Q1 2010, with the architecture they describe), I don't see what the problem is.

Nvidia are being honest about their plans and people can respond as they wish.
 
mczak said:
It is noteworthy though that for other instructions, e.g. mul or add, the rate is 2/5 of the single precision rate, so still 544GFlops when using only adds or muls as long as the compiler can extract pairwise independent muls or adds (GF100 will drop to half the gflops with muls or adds).
Cypress is 1/5 for FP64 MUL (272GFlops), 2/5 for ADD (544GFlops).
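For anyone wondering where those fractions come from: they fall straight out of the VLIW5 issue rates, assuming the usual HD5870 figures of 320 five-wide units at 850 MHz and the commonly reported DP rates of one MUL, two ADDs or one FMA per unit per clock:

# Deriving the quoted Cypress FP64 numbers from assumed HD5870 parameters.

vliw_units = 320
clock_ghz  = 0.85

sp_rate = vliw_units * 5 * clock_ghz       # 1360 G single-precision MULs or ADDs per second

dp_mul = vliw_units * 1 * clock_ghz        # 272 GFlops  (1/5 of the SP rate)
dp_add = vliw_units * 2 * clock_ghz        # 544 GFlops  (2/5 of the SP rate)
dp_fma = vliw_units * 1 * 2 * clock_ghz    # 544 GFlops  (one FMA per clock = two flops)

print(dp_mul, dp_add, dp_fma)              # 272.0 544.0 544.0
print(dp_mul / sp_rate, dp_add / sp_rate)  # 0.2 and 0.4, i.e. the 1/5 and 2/5 above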
 
Really?!
Well, then it's an utter failure.
They introduced an architecture. They made not a single demo, have a card that obviously isn't working yet (when they have working cards, they tend to get demo'd), and no mention of what the gfx capabilities would be. Shots of recent A1 dies, no mention of clocks -- either GPU *or* memory. I mean, if I were looking to buy a card right now, I haven't been given a real choice, have I?

That's a terrible spoiler!

You work with what you've got. The 58xx launch is now, the Nvidia spoiler has to be now. Given what little Nvidia have got to show beyond slides, simulated demos and mocked-up cards, I think they did a stellar job of stealing ATI's thunder as much as they could.

You only have to look at the enthusiast sites to see that for all the people who are dismissing Fermi, there are just as many who want to sell their 5870s in favour of GF100, even though there's nothing to buy yet, and who knows what the situation will be in another six months.
 