NVIDIA Fermi: Architecture discussion

http://www.brightsideofnews.com/new...h-globalfoundries-gpu-production-in-2011.aspx

The roadmap here and the Globalfoundries process roadmap I've seen speak against it.

Published roadmaps are broad stroke and conservative by nature.

Thuban is, literally, a 'throw whatever we got into the gaping breach' stopgap measure.

Logically, one would assume AMD is acutely focused on getting Bulldozer into the marketplace as soon as humanly possible. If things break their way, that is likely to be in Q4 2010.
 
Google's cluster handles both I/O-bound and compute-bound tasks. There are lots of very CPU-expensive internal batch jobs they run.
Being "compute bound", however, in no way, shape, or form implies "maps well to GPUs".
Efficient GPGPU computing is awfully limited in the kinds of problems it can address. Which is why, even in most of the propaganda cases, the scaling you see vs CPUs largely follows bandwidth rather than teraflops (for working sets that fit in local memory, obviously).
NVidia PR would love to point to Google as a major Fermi customer. Have they done so?
 
I have doubts it will surpass Hemlock, except in cases where Crossfire scaling just sucks. But I also can't see it not coming very close to it. Let's say exactly like the HD 5870 is with the GTX 295: slower on average, but not by much, and winning in some cases, while consuming less power. If they priced this GPU (under these performance speculations) @ $550, the fact that they are late will be almost completely ignored, given the performance/$ they are offering.

Logically, the target of Fermi on the pure GPU end would, at best, be the GTX295, particularly so considering the amount of chip resources focused on GPGPU.

The 5870 is a highly optimized, extremely efficient and very focused pure GPU chip, and at that barely achieved rough parity with the GTX295. It is unlikely Fermi will be more than marginally better, much less make a move on the 5970.

"“We expect [Fermi] to be the fastest GPU in every single segment. The performance numbers that we have [obtained] internally just [confirms] that."

We 'EXPECT' ~= 'we are desperately trying to achieve' ~= 'it's not CURRENTLY the fastest' ...

Eight or ten months of optimization/respins down the road, Fermi likely could trounce the 5870. But the REPUTATION of Fermi in the graphics space will be set in stone when it is released, not eight or ten months from now. And a 5890 is a near certainty, and after all, how much more card does one need?

Keep in mind the purpose of Eyefinity was to expand the market for current high end ATI cards as those cards are now running ahead of gaming demands at 24" monitors and below. When one can max out every game setting on every current game and achieve nicely playable framerates on their 24" monitor, one doesn't NEED a better card, and the 5850 already provides that capability. Very QUIETLY and EFFICIENTLY.

Personally, I think it totally SUX because I WANT keen competitiveness in the market. That 5850 card is all I need with a 1080p bigscreen, and only Nvidia being on their game will drive that price into the sweet, sweet sub-$200 range.

But I post based on what I perceive as the hard realities of a situation, not the intracranial land of fantasyville ~= Fermi ain't gonna 'git er done'.
 
Logically, the target of Fermi on the pure GPU end would, at best, be the GTX295, particularly so considering the amount of chip resources focused on GPGPU.

Logically? I don't think so. A GTX295 is at best 40% faster than a GTX 285. You think 40% faster than GTX 285 is a logical Fermi target? Sorry but that's :LOL: worthy. I love computerbase by the way, they make these comparisons easy and constantly update their numbers (these are as of 12/31).

The 5870 is a highly optimized, extremely efficient and very focused pure GPU chip, and at that barely achieved rough parity with the GTX295. It is unlikely Fermi will be more than marginally better, much less make a move on the 5970.

Extremely efficient? Not in the least when compared to RV770. It seems like you have an opinion and are trying to convince yourself of its veracity :)
 
Calling for somebody to be banned isn't a particularly useful first post. Besides, he's politely expressing an opinion; everybody has a right to do so.
 
The posts are not abusive and are related to the topic.
The opinions are strong and highly polar, but that isn't bannable.

They appear to echo a monoculture of articles and analysis, but that's not bannable either.
 
The 5870 is a highly optimized, extremely efficient and very focused pure GPU chip, and at that barely achieved rough parity with the GTX295. It is unlikely Fermi will be more than marginally better, much less make a move on the 5970.
That is because ATi's architecture is superscalar VLIW; it needs many cores to achieve maximum throughput, to compensate for the deficiencies in optimizing compiler code.

On the other hand, Nvidia's architecture is scalar; given the appropriate number of shaders and appropriate clocks, it can give explosive results, and following Rys' expectations, I truly think it will compete with Hemlock.
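To make the ILP point concrete, here's a toy sketch (written as a CUDA kernel purely for illustration; the kernel and names are made up, not anyone's real shader code). The first part has plenty of independent operations a VLIW5 compiler could pack into wide instruction words; the second is a dependency chain where, at worst, only one of the five slots per unit does useful work each cycle:

Code:
// Toy illustration only -- how ILP (or the lack of it) maps to VLIW5 vs scalar.
__global__ void ilp_example(float4 *out, const float4 *a, const float4 *b, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    float4 va = a[i], vb = b[i];

    // Plenty of ILP: four independent multiplies (plus the add below) can be
    // packed into one VLIW5 instruction word, keeping all five slots busy.
    float x = va.x * vb.x;
    float y = va.y * vb.y;
    float z = va.z * vb.z;
    float w = va.w * vb.w;
    float s = x + y;

    // No ILP: a serial dependency chain -- each op needs the previous result,
    // so only one of the five slots does useful work per cycle. This is why
    // minimum utilisation tracks VLIW-unit count rather than raw ALU count,
    // while a scalar design issues one op per lane regardless of ILP.
    float acc = va.x;
    acc = acc * vb.x + 1.0f;
    acc = acc * vb.y + 1.0f;
    acc = acc * vb.z + 1.0f;

    out[i] = make_float4(s, z, w, acc);
}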

After looking at the Tesla specs, I think 1600MHz for the cores is doable, but power consumption is going to be awful (probably 260W).

In a recent article at PCPer commenting on the rumor of Fermi being castrated:

As for the part of our story about the slightly less powerful Fermi GPU that was announced and discussed at this past week's supercomputing convention, NVIDIA's reps wanted to clearly point out that there is NOT a direct correlation between the consumer GeForce products and either the Quadro or Tesla lines of professional cards. That is clearly the case where the Quadro line seemed to take forever to integrate the GT200 products to any successful degree. The Tesla SKUs are much more complicated and NVIDIA points out that all they have done is disclose one or two upcoming SKUs, not the whole lineup. I was "guaranteed" that there would be more SKUs "with 512 shaders."

NVIDIA also specifically stated that the GF100 products that are due out next year are not going to be the same as the Tesla products discussed at the supercomputing conference and that "there will be 512 (shader) parts on both sides." What would be different between the two products would be WHEN the 512 options were introduced. It sure seemed like NVIDIA was trying to say that they would have a consumer-based 512 shader GF100 part when the GeForce lineup is revealed without just telling us.
 
Let's speculate on the relative costs/benefits of the GPGPU features of Fermi for the graphics world.

Some things like DP and ECC wouldn't help too much.
There was a paper on the possible benefits of on-chip ECC for certain persistent data structures, like long-lived graphics state, where an upset could do more than just temporarily flip a pixel. I wouldn't bet on Nvidia adding ECC to the fixed-function pipeline, but if it uses its caches to hold some state data, they could have a slight ancillary benefit, though Nvidia could just as easily disable it for the sake of product segmentation.

The streamlined memory model may not help users, but could it help or hinder Nvidia's driver and compiler developers?
It may be conceptually easier to write a driver to manage resources that are arranged in an orderly memory space, though the mechanism the hardware uses to map addresses to resources may potentially be a bottleneck.

The revamped memory hierarchy may bring some benefits from the larger L2, and possibly better atomic operation performance.
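As a concrete (if simplistic) illustration of that last point, here's a minimal histogram kernel; it's generic CUDA, not Fermi-specific, and the names are just made up for the example. Many threads doing contended read-modify-writes on a small set of counters is exactly the pattern where faster global atomics, backed by a unified read/write L2, should show up:

Code:
// Minimal histogram: lots of contended atomicAdd traffic on 256 counters.
__global__ void histogram256(const unsigned char *data, int n, unsigned int *bins)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int stride = gridDim.x * blockDim.x;
    for (; i < n; i += stride)
        atomicAdd(&bins[data[i]], 1u);  // contended read-modify-write on global memory
}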
 
Logically, the target of Fermi on the pure GPU end would, at best, be the GTX295, particularly so considering the amount of chip resources focused on GPGPU.

The 5870 is a highly optimized, extremely efficient and very focused pure GPU chip, and at that barely achieved rough parity with the GTX295. It is unlikely Fermi will be more than marginally better, much less make a move on the 5970.

Maybe you missed it, but the HD 5870 is barely 50% faster than the HD 4870. Even less compared to the HD 4890. If that's "efficient" for you, then maybe you were one of those who thought R600 wasn't so bad...

And if a Fermi-based GeForce does achieve GTX 285 SLI levels, it will very much reach the HD 5970 on many occasions, even if losing by a hair. Having said that, GTX 295 performance levels are the worst-case scenario.

spigzone said:
"“We expect [Fermi] to be the fastest GPU in every single segment. The performance numbers that we have [obtained] internally just [confirms] that."

We 'EXPECT' ~= 'we are desperately trying to achieve' ~= 'it's not CURRENTLY the fastest' ...

So you rationalize Fermi's performance based on what some PR guy said to someone?...
PR is just... well... PR. Wait until the product is released before drawing any conclusions on whether the guy was right or not.

spigzone said:
Eight or ten months of optimization/respins down the road, Fermi likely could trounce the 5870. But the REPUTATION of Fermi in the graphics space will be set in stone when it is released, not eight or ten months from now. And a 5890 is a near certainty, and after all, how much more card does one need?

So you expect that NVIDIA, which has been designing GPUs for ages (and successful ones at that), will fail to outclass the HD 5870's performance with a single-GPU card?!
On top of that, you also seem to "expect" that a HD 5890 will be something phenomenal. Have you seen the HD 4890? Barely 10% faster than the HD 4870 across the board. Assume the same for the HD 5890 over the HD 5870, and your standards for ATI are as low as your standards are for NVIDIA.

spigzone said:
Keep in mind the purpose of Eyefinity was to expand the market for current high end ATI cards as those cards are now running ahead of gaming demands at 24" monitors and below. When one can max out every game setting on every current game and achieve nicely playable framerates on their 24" monitor, one doesn't NEED a better card, and the 5850 already provides that capability. Very QUIETLY and EFFICIENTLY.

All games maxed on a HD 5850? Sorry, quite far from it...
Also, most games nowadays are console-based: the main SKU is the console version, which is eventually ported to the PC. Requirements are low, which makes it possible for even the last generation of cards to max the game, even at the highest resolutions.

As for your other "point", you seem to be saying that since a HD 5850 can offer decent performance, there's no need for a better card. I couldn't disagree more. An 8800 GTX is still enough for most games nowadays, mostly because of the reasons I mentioned above, but what about the other games? And the upcoming ones? Even consoles will be more powerful in the future, and PC graphics need to be up to par with any of those more demanding games - like Crysis or STALKER were - so no, tech must evolve and more powerful cards need to appear, so that software pushes the limits of the new hardware, so that even more powerful hardware appears in the future.

spigzone said:
Personally, I think it totally SUX because I WANT keen competitiveness in the market. That 5850 card is all I need with a 1080p bigscreen, and only Nvidia being on their game will drive that price into the sweet, sweet sub-$200 range.

So you are openly expecting NVIDIA to fail with Fermi, since you can't seem to imagine that they can release a card that competes with the HD 5850. Even the GTX 285 does...

spigzone said:
But I post based on what I perceive as the hard realities of a situation, not the intracranial land of fantasyville ~= Fermi ain't gonna 'git er done'.

Yes, I'll surely take your word for it and not use what we know about Fermi's specs to reach a more valid conclusion :rolleyes:
 
Fermi derivatives all have DP support(?)

And I so feel at home with my Transformers hobby right now.
 
That is because ATi's architecture is superscalar VLIW; it needs many cores to achieve maximum throughput, to compensate for the deficiencies in optimizing compiler code.

OTOH, I'd say that ATi needs lots of ILP in the code to achieve close to maximum throughput. As for their compiler, it is very good at extracting ILP, at least for common shader code.

On the other hand, Nvidia's architecture is scalar; given the appropriate number of shaders and appropriate clocks, it can give explosive results, and following Rys' expectations, I truly think it will compete with Hemlock.

Actually, ATI has more efficient ALUs compared to NV, even if you ignore the 5x VLIW advantage they have. :)
 
Published roadmaps are broad stroke and conservative by nature.

Thuban is, literally, a 'throw whatever we got into the gaping breach' stopgap measure.

Logically, one would assume AMD is acutely focused on getting Bulldozer into the marketplace as soon as humanly possible. If things break their way, that is likely to be in Q4 2010.

There's a difference between some article claiming early 2010, your gut feeling going for Q4, and official roadmaps, which are usually in the 'if all goes according to plan' realm and are either on time or slightly delayed. The cases where something arrives earlier are rare.

As for the rest of the GF100 rambling you put up there, it sounds almost as if you wouldn't want something faster on the market, so as to have a better justification for your investment.
 
OTOH, I'd say that ATi needs lots of ILP in the code to achieve close to maximum throughput. As for their compiler, it is very good at extracting ILP, at least for common shader code.
I actually meant that increasing the total core count raises their minimum core utilization, which helps drive performance up. For example, with zero ILP only one of the five slots in each VLIW unit is busy, so the HD4870 has a minimum utilization of 160 ALUs (800/5), while the HD5870 has a minimum of 320 (1600/5).

Actually, ATI has more efficient ALUs compared to NV, even if you ignore the 5x VLIW advantage they have. :)
By that, do you mean areal efficiency, design efficiency, or instructions-per-clock efficiency?
 
Captain, she's gonna blow!

Logically? I don't think so. A GTX295 is at best 40% faster than a GTX 285. You think 40% faster than GTX 285 is a logical Fermi target? Sorry but that's :LOL: worthy. I love computerbase by the way, they make these comparisons easy and constantly update their numbers (these are as of 12/31).

More likely ~60/70% faster, as a target - in an ideal scenario ... after all, Fermi is pulling double duty. But the scenario is not ideal, quite the opposite in fact, and then there's the 5890 lurking in the wings, which AMD will have had 9+ months to optimize (they were showing working boards and final silicon in June) by the time Fermi is released. One might expect a 20-25% (maybe more) increase in performance over the 5870, with AMD patiently waiting until Nvidia sets a hard date for Fermi's introduction and then releasing the 5890 the week before. In the charts you referenced, the 5870 clearly bests the GTX 295 at maxed out AA/AF settings at 1920 and above. The 5890 will add substantially to that lead. So even a 60/70% increase over the 285 isn't likely to get Fermi to where it needs to be.

Extremely efficient? Not in the least when compared to RV770. It seems like you have an opinion and are trying to convince yourself of its veracity :)

A recurring observation across the websites doing in-depth analyses of Cypress was the surprising density of transistors per square mm achieved, seemingly a new benchmark for graphics chips. This was borne out in the watts/performance benchmarks, which also set new highs. I posit that qualifies for the appellation 'extremely efficient'. :oops:
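For a rough sense of scale, here are back-of-the-envelope densities from the commonly quoted figures (approximate and from memory, so treat them as ballpark; much of the gap is of course the 40nm vs 55/65nm node difference):
Cypress: ~2.15B transistors / ~334 mm^2 ≈ 6.4M transistors per mm^2
RV770: ~956M transistors / ~256 mm^2 ≈ 3.7M per mm^2
GT200 (65nm): ~1.4B transistors / ~576 mm^2 ≈ 2.4M per mm^2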
 
I have doubts it will surpass Hemlock, except in cases where Crossfire scaling just sucks. But I also can't see it not coming very close to it. Let's say exactly like the HD 5870 is with the GTX 295: slower on average, but not by much, and winning in some cases, while consuming less power. If they priced this GPU (under these performance speculations) @ $550, the fact that they are late will be almost completely ignored, given the performance/$ they are offering.

I don't expect it to surpass Hemlock. $550 is too much IMHO for a 1.5GB RAM SKU.
 
In the charts you referenced, the 5870 clearly bests the GTX 295 at maxed out AA/AF settings at 1920 and above.
That is because the GTX 295 is memory limited; it has 896MB per GPU, whereas the HD5870 has 1GB.
The 5890 will add substantially to that lead. So even a 60/70% increase over the 285 isn't likely to get Fermi to where it needs to be
Have you ever seen benchmarks of the HD5870 overclocked to 1.0GHz? The difference is barely 10%.
 
A recurring observation across the websites doing in-depth analyses of Cypress was the surprising density of transistors per square mm achieved, seemingly a new benchmark for graphics chips. This was borne out in the watts/performance benchmarks, which also set new highs. I posit that qualifies for the appellation 'extremely efficient'. :oops:


Hmm, just an observation: are you talking about mm/performance, mm/watts, or watts/performance? You managed to talk about all of those in that statement... :oops:
 
More likely ~60/70% faster, as a target - in an ideal scenario ... after all, Fermi is pulling double duty. But the scenario is not ideal, quite the opposite in fact, and then there's the 5890 lurking in the wings, which AMD will have had 9+ months to optimize (they were showing working boards and final silicon in June) by the time Fermi is released. One might expect a 20-25% (maybe more) increase in performance over the 5870, with AMD patiently waiting until Nvidia sets a hard date for Fermi's introduction and then releasing the 5890 the week before. In the charts you referenced, the 5870 clearly bests the GTX 295 at maxed out AA/AF settings at 1920 and above. The 5890 will add substantially to that lead. So even a 60/70% increase over the 285 isn't likely to get Fermi to where it needs to be.

Well, all you're really doing is low-balling Fermi and high-balling the 5890 without substantiating either. What leads you to believe that an HD5890 will be a bigger jump than the HD4890 was? GT200 did pretty well in the efficiency stakes vs G92, given measured vs theoretical numbers. So why lowball Fermi's advantage?

A recurring observation across the websites doing in-depth analyses of Cypress was the surprising density of transistors per square mm achieved, seemingly a new benchmark for graphics chips. This was borne out in the watts/performance benchmarks, which also set new highs. I posit that qualifies for the appellation 'extremely efficient'. :oops:

Efficiency based on die size? Wouldn't something more relevant like performance per unit be a more useful metric when discussing architectural efficiency? :D
 
All games maxed on a HD 5850? Sorry, quite far from it...
Also, most games nowadays are console-based: the main SKU is the console version, which is eventually ported to the PC. Requirements are low, which makes it possible for even the last generation of cards to max the game, even at the highest resolutions.

:?::oops::?:
 