NVIDIA confirms Next-Gen close to 1TFlop in 4Q07

B3D News

In recent analyst conferences that were publicly webcast on NVIDIA's website, Michael Hara (VP of Investor Relations) has claimed that their next-generation chip, also known as G92 in the rumour mill, will deliver close to one teraflop of performance. In a separate answer to an analyst's question, he also noted that they have no intention of diverging from the cycle they adopted with the G80, which is to have the high-end part ready at the end of the year and release the lower-end derivatives in the spring.

Read the full news item
 
and a short mention of Intel's upcoming GPU efforts through their Larrabee project. Michael Hara seemed far from certain about Intel's exact strategy there,

I feel ya, Michael. I'm about ready to print up "WTFIIUTWG?"* buttons and rubber wristbands. Who else wants one? :p Tho slightly more seriously, it's not just Larrabee, it's how Larrabee fits into the whole gpu picture they seem to be stitching together over there at Intel.


*"What The F*** Is Intel Up To With GPUs?"
 
Btw, re the scheduling: if we now take G80 as the "norm" going forward, then we're assuming they've just permanently slipped their schedule 6 months, in two three-month add-ons, over the course of 2005 and 2006. So NV40 is Spring 2004, G70 is Summer 2005, G80 is Fall 2006. But now they're going to stick firm at Fall, rather than go, say, Winter 2007/8? Of course, the other problem with that telling is that it's pretty clear G80 was intended to be a Summer 2006 product originally... Marv said so in late 2005.
 
Hmmm, let me see.

AMD readies Barcelona and Intel pulls out the Nehalem-hammer.

AMD readies the R6xx-refresh and nVidia pulls out the G92-hammer.

:runaway:
 
I feel ya, Michael. I'm about ready to print up "WTFIIUTWG?"* buttons and rubber wristbands. Who else wants one? :p Tho slightly more seriously, it's not just Larrabee, it's how Larrabee fits into the whole gpu picture they seem to be stitching together over there at Intel.


*"What The F*** Is Intel Up To With GPUs?"
Don't misunderstand me: these days NV cares much more about Larrabee than about AMD. Larrabee is dangerous because nobody at NV knows what Intel is preparing. It puts strong pressure on G100, and that's why the best NV team (led by Erik Lindholm for the computation macro-architecture) is working on it with so many resources right now (G100 will cost around 1 billion USD in R&D). NV knows that the company's survival will depend on how good G100 is. With Fusion and Larrabee, 2008 will be the most interesting year in GPUs for a long time...
 
$1B? Where did that number come from? I doubt it will be that much. I'd have to look around for the G8x family number, but it's not in that neighborhood, and that was a four year project with unified shaders and DX10.
 
I feel ya, Michael. I'm about ready to print up "WTFIIUTWG?"* buttons and rubber wristbands. Who else wants one? :p Tho slightly more seriously, it's not just Larrabee, it's how Larrabee fits into the whole gpu picture they seem to be stitching together over there at Intel.


*"What The F*** Is Intel Up To With GPUs?"



You know IMHO, I think Intel is playing everyone like fools. I feel they have something up their sleeve.
 
You know IMHO, I think Intel is playing everyone like fools. I feel they have something up their sleeve.
So you think Larrabee isn't "it"? I don't disagree, I don't think it's what they're thinking of for the "traditional GPU market" either, but we'll see.
 
You know IMHO, I think Intel is playing everyone like fools. I feel they have something up their sleeve.

Well, my issue is they have *too many things up their sleeve* to figure out how they piece together. Do they even piece together, or are timelines different enough that some bits don't actually overlap with other bits? Larrabee, X3000, PVR license, discrete gpu boards, Ray Tracing. . . . :cry:
 
Is that 1 teraflop including the missing MUL or not? And why just "almost"?

Computationally, it sounds like G92 is <= 2xR600. 256 units at 1800MHz works without the missing MUL. If they're counting the extra FLOP (the MUL), though, even 192 units at 1800 is over a TFlop.
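
For reference, here's a quick back-of-the-envelope sketch (Python) of the peak-shader-FLOPS arithmetic being tossed around, under the usual assumption of 2 FLOPs per SP per clock for the MADD and 3 if the "missing" MUL is counted as well; the unit counts and clocks are the speculative ones from this post, not confirmed specs:

# Peak programmable-shader throughput in GFLOPS.
# flops_per_clock: 2 for MADD only, 3 if the extra MUL is counted as well.
def peak_gflops(num_sps, clock_mhz, flops_per_clock):
    return num_sps * clock_mhz * flops_per_clock / 1000.0

# Speculative G92 configurations from the post above:
print(peak_gflops(256, 1800, 2))  # 256 SPs @ 1.8GHz, MADD only  -> 921.6
print(peak_gflops(192, 1800, 3))  # 192 SPs @ 1.8GHz, MADD + MUL -> 1036.8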

I expect them to be late again as well. That would seem to leave the window open for ATI to play spoiler, except that ATI looks like it needs to carve a little transistor fat out and leave some room for some texture units. If G92 scales texture units as well as computational ability, then we're probably waiting for R700 for competition.

I'm going to refrain from commenting on the impending miracle of G100, but I will note that I bet it's gotten significantly more expensive to do R&D in the Valley since G80 days.
 
Is that 1 teraflop including the missing MUL or not? And why just "almost"?
If you look at the 'Beyond G80' thread, on one page, you'll see a discussion about 192 SPs at 2.5GHz. That would certainly line up nicely with "nearly 1TFlop" while excluding the MUL, wouldn't it?
 
Given that they claim 512 GFlops for G80, I would be surprised if that "almost 1TFlop" comment was with regard to only MADD flops. I'm thinking 160 G80-class shaders @ 2GHz. That'll put you at 0.96 TFlop if you count the MUL. Almost 1TFlop of MADDs would be insane given that the fastest G80 right now only manages about 410 GFlops.
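
As a sanity check on that reasoning, using the 8800 GTX's 1.35GHz shader clock and treating the 160 SPs @ 2GHz as a purely hypothetical configuration (same 2-vs-3 FLOPs-per-clock convention as the sketch above):

def peak_gflops(num_sps, clock_mhz, flops_per_clock):
    # 2 FLOPs/clock for the MADD, 3 if the extra MUL is counted too
    return num_sps * clock_mhz * flops_per_clock / 1000.0

print(peak_gflops(128, 1350, 3))  # G80 GTX, MUL counted -> 518.4, i.e. the ~512 GFlops marketing figure
print(peak_gflops(128, 1350, 2))  # G80 GTX, MADD only   -> 345.6
print(peak_gflops(160, 2000, 3))  # hypothetical 160 SPs @ 2GHz, MUL counted -> 960.0 (~0.96 TFlop)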
 
Given that they claim 512 GFlops for G80, I would be surprised if that "almost 1TFlop" comment was with regard to only MADD flops. I'm thinking 160 G80-class shaders @ 2GHz. That'll put you at 0.96 TFlop if you count the MUL. Almost 1TFlop of MADDs would be insane given that the fastest G80 right now only manages about 410 GFlops.

The truth is that that shader core architecture is pure magic!
 
Given that they claim 512 GFlops for G80, I would be surprised if that "almost 1TFlop" comment was with regard to only MADD flops. I'm thinking 160 G80-class shaders @ 2GHz. That'll put you at 0.96 TFlop if you count the MUL. Almost 1TFlop of MADDs would be insane given that the fastest G80 right now only manages about 410 GFlops.
Actually, he was speaking of the GPU Computing potential of the chip, and the CUDA docs don't consider the extra MUL... But yeah, it's not impossible they're bending the truth a bit, although 1 teraflop is hardly unimaginable if they lower the TEX-ALU ratio.
 
Well, my issue is they have *too many things up their sleeve* to figure out how they piece together. Do they even piece together, or are timelines different enough that some bits don't actually overlap with other bits? Larrabee, X3000, PVR license, discrete gpu boards, Ray Tracing. . . . :cry:
Take a look at this as I'm sure you know.

http://www.intel.com/research/platform/terascale/teraflops.htm


The research chip implements 80 simple cores, each containing two programmable floating point engines—the most ever to be integrated on a single chip. Floating point engines are used for accurate calculations, such as for graphics as well as financial and scientific modeling.

The die size of this chip is no bigger than your finger tip, consumes 63 watts, and performs 1 trillion floating point operations a second.
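
For what it's worth, the teraflop figure falls straight out of the published tile count, assuming the reported ~3.16GHz operating point for the ~1TFlop demo (each of the 80 tiles carries two single-precision FP multiply-accumulate units, and a MAC counts as 2 FLOPs):

tiles = 80            # cores on the Polaris research chip
fpmacs_per_tile = 2   # two FP multiply-accumulate units per tile
flops_per_mac = 2     # a multiply-accumulate counts as 2 FLOPs
clock_ghz = 3.16      # reported operating point for the ~1TFlop / ~63W demo

print(tiles * fpmacs_per_tile * flops_per_mac * clock_ghz / 1000.0)  # ~1.01 TFLOPS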
 
The die size of this chip is no bigger than your finger tip, consumes 63 watts, and performs 1 trillion floating point operations a second.
http://forum.beyond3d.com/showthread.php?t=38577
Purely IMO, it's little more than a toy. A nice toy, I'll admit, but it's still 275mm² and would be much larger if it were actually able to do something useful.

Contrary to what you may think, its efficiency is far from perfect, although in the grander scheme of things it could be argued that it's decent: http://realworldtech.com/page.cfm?ArticleID=RWT040307000414&p=6

It's interesting you bring up Polaris, though. What makes Intel magically able to get to 1TFlop on a 275mm² die, and NVIDIA/ATI unable to? I'm sure Intel's management must be hoping that there is indeed something that prevents their competitors from getting there, but sadly for them, there doesn't seem to be.
 
Intel stated that Polaris is a test vehicle for a number of techniques that might migrate to new architectures, and the most interesting of those techniques have little to do with FLOPS.

The clocking scheme, communications fabric, and the stacked memory are more important.

While getting 1 TF is not out of the reach of any of Intel's competitors, the ability to synthesize a lot of technology to make the FLOPS count usable is something Intel is getting a good start on.

GPUs already have some of the more complex clocking schemes. G80 is a nice example.

For more complex workloads, they could stand to have better communication between subunits. Some of AMD's patents with regard to array processor communications and the ring-bus might be indicators that the former ATI has some groundwork there.
G80's PDC is a start for Nvidia as well.

The stacked memory that supplies the bandwidth is something Intel can leverage far more readily, thanks to its manufacturing capability.
Until TSMC offers something similar, Nvidia and AMD won't match that.
At the very least, it opens up parts of the chip that would otherwise be dedicated to large caches or extra threads needed to hide latency.
Intel's future designs might be able to skate by with less latency tolerance if the nearby memory pool is sufficient.
 
http://forum.beyond3d.com/showthread.php?t=38577
Purely IMO, it's little more than a toy. A nice toy, I'll admit, but it's still 275mm² and would be much larger if it were actually able to do something useful.

Contrary to what you may think, its efficiency is far from perfect, although in the grander scheme of things it could be argued that it's decent: http://realworldtech.com/page.cfm?ArticleID=RWT040307000414&p=6

It's interesting you bring up Polaris, though. What makes Intel magically able to get to 1TFlop on a 275mm² die, and NVIDIA/ATI unable to? I'm sure Intel's management must be hoping that there is indeed something that prevents their competitors from getting there, but sadly for them, there doesn't seem to be.

I was merely throwing out Polaris as an example because I think Larrabee is just a little scratch on the surface. This "toy", I think, is their groundwork for things to come.

Intel has already hit the two teraflop barrier with Polaris, not just the one.
 
If you look at the 'Beyond G80' thread, on one page, you'll see a discussion about 192 SPs at 2.5GHz. That would certainly line up nicely with "nearly 1TFlop" while excluding the MUL, wouldn't it?

Yes, and it's easier to swallow 128 --> 192 units than 128 SPs --> 256 DP-capable units on a process with only 90% more room...

-Dave
 