Nvidia GT300 core: Speculation

trinibwoy · Sep 29, 2009

Hmmm in the course of a day we've gone from GT300 showing up in 6 months to GF100 being as fast as RV870X2. Can't wait to see where we end up tomorrow

AlNom · Sep 29, 2009

To nFinity and beyond of course.

KonKort · Sep 29, 2009

Kaotik said:
Worst case? For all we know, it could be another NV30-case, the worst case is much, much worse than that

Why are you so skeptical about the chip? If this is your opinion, you will be surprised in the coming months.

-The_Mask- · Sep 29, 2009

KonKort said:
Let's look at the worst case: HD 5870 X2 is 20% faster than Geforce 380. Then you have to ask at what price, because Geforce 380 is a single GPU (no multi-core profiles, no micro stuttering etc.) that will not consume more energy than GTX 280. As for HD 5870 X2, I hope it will consume under 275 watts.

Impossible if you ask me.

dnavas · Sep 29, 2009

MfA said:
They are probably both already VLIW+SIMD at this point (well NVIDIA is more LIW+SIMD but same difference). Whatever else happens VLIW is there to stay for a while yet IMO.

Sure. Those weren't necessarily either/or scenarios....

I like VLIW's chances better if it can build an "IW" using work from multiple threads. [My understanding of HP's foray into VLIW was that it was less successful than hoped for.] Mind you, there are probably easier ways of thinking of that kind of architecture than VLIW.... Simultaneous Asymmetric Dispatch or something with less of a, err, sad acronym.

I'm also wondering if the way that DP works affected the thinking of how MADD might work. Instead of single-cycle MADD, maybe it makes more sense to have two units working on the same piece of data, across two cycles, the results of one feeding the other. Across enough work, it's basically the same speed, but it seems like less work has to happen within a cycle (fewer gates, allowing for higher clocks), and it would seem easier to expand your LIW repertoire. Of course, maybe that's how MADD works now anyway :shrug:
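A rough sketch of the throughput argument above, assuming a hypothetical two-stage multiply-then-add pipeline. The function names and clock figures are made up for illustration and do not describe any real GPU:

```python
def cycles_single_cycle(n_ops):
    """One fused MADD unit: each op finishes in the cycle it issues."""
    return n_ops


def cycles_two_stage(n_ops):
    """Multiply in stage 1, add in stage 2; one op issues per cycle.

    The first result appears after 2 cycles, then one result per cycle.
    """
    return n_ops + 1 if n_ops > 0 else 0


def time_ns(cycles, clock_ghz):
    """Wall time in nanoseconds for a given cycle count and clock."""
    return cycles / clock_ghz


n = 1000
# Hypothetical clocks: the shorter per-cycle critical path is ASSUMED
# to allow a higher clock for the two-stage design.
fused = time_ns(cycles_single_cycle(n), clock_ghz=1.0)
staged = time_ns(cycles_two_stage(n), clock_ghz=1.5)
print(fused, staged)
```

Across enough independent work the one extra cycle of latency disappears into the total, so any clock gain from the split shows up almost directly as throughput. Whether such a clock gain exists is exactly the open question in the post.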

SirPauly · Sep 29, 2009

trinibwoy said:
Hmmm in the course of a day we've gone from GT300 showing up in 6 months to GF100 being as fast as RV870X2. Can't wait to see where we end up tomorrow

The beauty of forum conjecture at times!

Kaotik · Sep 29, 2009

KonKort said:
Why are you so skeptical about the chip? If this is your opinion, you will be surprised in the coming months.

Chips that are month(s) late haven't usually had a good track record

But I don't have any real opinions on how it will perform, just keeping that as a possibility too.
I'd say a good scenario from NV's point of view is that it's as much faster than Cypress as GT200 was than RV770

MfA · Sep 29, 2009

trinibwoy said:
Hmmm in the course of a day we've gone from GT300 showing up in 6 months to GF100 being as fast as RV870X2. Can't wait to see where we end up tomorrow

I still don't think it will ship this year ... hell, I don't think we will get a clear shipping date when the first official information is released.

KonKort · Sep 29, 2009

Kaotik said:
Chips that are month(s) late haven't usually had a good track record

Why is GF100 late? How can you make that judgment? I reported in January that Nvidia's next generation chip will come in Q4/2009. So where do you see a delay?

You cannot say that GF100 is delayed just because AMD got the first DirectX 11 chips out a few weeks earlier.
But I will not deny that Nvidia's chip had some problems in the summer and could otherwise already be on the market.

Slappi · Sep 29, 2009

MfA said:
I still don't think it will ship this year ... hell, I don't think we will get a clear shipping date when the first official information is released.

It will be ready for Christmas builds.

If you are lucky you will get one before Thanksgiving.

nutball · Sep 29, 2009

Anyone else getting the feeling that two parallel Universes have become entangled? With the different names, dates, specs, problems/non-problems it's like we're talking about two different parts from two different companies.

I need a lie down.

Ailuros · Sep 29, 2009

I'm not making any bets anymore. Last time I had to write a public apology to Rys LOL.

Chris123234 · Sep 29, 2009

nutball said:
Anyone else getting the feeling that two parallel Universes have become entangled? With the different names, dates, specs, problems/non-problems it's like we're talking about two different parts from two different companies.

I need a lie down.

It all makes sense now!

Slappi · Sep 29, 2009

Ailuros said:
I'm not making any bets anymore. Last time I had to write a public apology to Rys LOL.

What was the bet?

Ailuros · Sep 29, 2009

It is not always an advantage to try to be a man of your word, yet a promise is a promise. Days before the G80 launch while chatting in the B3D IRC channel I told Rys that if I managed to buy an 8800GTX before the start of December I'd owe him a public apology. G80 was officially announced on November 8 and I was holding a Gainward Bliss 8800GTX in my hands on the 14th. Thus, I have no other choice than to send him my apologies. Don't worry though, this one is among the cases where I simply love being wrong.

I started a humble little write up back then with that paragraph.

Jawed · Sep 29, 2009

dnavas said:
Wouldn't SFUs mostly scale by texture unit, rather than SP?
If textures remain fixed at 8 per TPC, I would think 4 SFUs would suffice (assuming, as you say, that these aren't indicating per-lane).

D3D10.1 requires 32 vec4 attributes per vertex to be supported, as opposed to 16 in D3D10. So that doubling in interpolation workload might steer the architects in the direction of increasing interpolation rate. Except, of course, that merely by adding ALUs, the increase occurs. So really what it comes down to is rasterisation:interpolation rate.

The other important question is, what the fuck is "pull model interpolation" in D3D11?

Jawed

Jawed · Sep 29, 2009

dnavas said:
Hmm, are you arguing for moving some of the "SF" instructions into the SPs? I would think log/rcp would be really useful there (hence my previous link). SF could be relegated to sin/cos approximations, or those blue dots might be something else entirely.

That's a really bad idea:

- the techniques for calculating all kinds of transcendentals have common hardware structures, so splitting these structures into distinct units is simply a waste
- the general trend should be for less acceleration of transcendentals, not more - in general computation transcendentals are much less commonly used (about 5% if I remember right) than the hardware provides for (~25%)

Jawed
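A back-of-the-envelope sketch of the 5% versus ~25% point, under a deliberately simple model: ALU and special-function work overlap perfectly, and the special-function units run at a fixed fraction of the ALU rate. Both the model and the function name are assumptions for illustration only:

```python
def sfu_utilization(transcendental_fraction, sfu_rate_vs_alu):
    """Fraction of time the special-function units are busy.

    transcendental_fraction: share of all instructions that are
        transcendentals (the post recalls about 0.05).
    sfu_rate_vs_alu: SFU throughput relative to ALU throughput
        (the post's ~25% is read here as 0.25, an assumption).
    """
    alu_time = 1.0 - transcendental_fraction
    sfu_time = transcendental_fraction / sfu_rate_vs_alu
    # The two kinds of work are assumed to run in parallel, so the
    # slower side sets the total time.
    total = max(alu_time, sfu_time)
    return sfu_time / total


print(sfu_utilization(0.05, 0.25))
```

Under these assumptions the special-function hardware would sit idle most of the time on general compute code, which is the sense in which the hardware provides more than the workload uses.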

Jawed · Sep 29, 2009

DegustatoR said:
Why would a chip with +50% complexity be "minimum on par" with dual HD5870?..

There are so many ways to make a chip "complex". Do it smartly and you can get way more performance - it's a question of how radical you're prepared to be.

For example it's like comparing the performance of two GPUs: one with early-Z rejection and one without. How do you "measure" complexity there? All you can do is talk about how the transistor/mm²/power budgets were spent.
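A toy model of the early-Z comparison, assuming a single pixel, a less-than depth test and a hypothetical `shaded_fragments` helper. It only counts shader invocations and says nothing about transistor cost:

```python
def shaded_fragments(depths, early_z):
    """Count fragments shaded at one pixel, drawn in the given order.

    Smaller depth means closer to the camera (less-than depth test).
    """
    nearest = float("inf")
    shaded = 0
    for d in depths:
        passes = d < nearest
        if passes:
            nearest = d
        # Without early-Z every fragment is shaded first and the depth
        # test only discards the result afterwards.
        if passes or not early_z:
            shaded += 1
    return shaded


front_to_back = [0.1, 0.4, 0.7, 0.9]
back_to_front = list(reversed(front_to_back))

print(shaded_fragments(front_to_back, early_z=True))
print(shaded_fragments(front_to_back, early_z=False))
print(shaded_fragments(back_to_front, early_z=True))
```

The same feature saves a lot of shading work in one draw order and none in the other, which is why its value is hard to express as a single "complexity" number.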

This is the Larrabee bet: let's say for 5 billion transistors on 28nm for 200W Larrabee overtakes the more traditional GPUs.

What I find disappointing about R800 is that apart from doubling the RBEs, deleting SPI and tweaking up the GDS size and implementing UAV/append/consume-specific buffers I don't get any real sense of the architecture taking a leap forwards. Of course, apart from the RBEs, it's hard to tell how effective the rest has been (or how well tessellation actually works). And my long standing argument is that the architecture (underlying design of units) is actually solid enough to work for a long long time. And there are games whose performance is ~1.8-1.9x HD4890. So it's not even as bad as it sometimes seems. So there's a degree of wait-and-see about it.

Anyway, I'm still expecting NVidia to be pretty radical. NVidia's RV670->RV770 as it were. Plus some, with a bit of luck.

Jawed

DegustatoR · Sep 29, 2009

Jawed said:
Anyway, I'm still expecting NVidia to be pretty radical. NVidia's RV670->RV770 as it were. Plus some, with a bit of luck.

We can certainly hope for something like this. But RV670->RV770 was accomplished by eliminating mostly obvious mistakes in the R600 design and by some magic which allowed them to pack 2.5 times more ALUs in almost the same complexity (transistors and die size). So while we can hope for GF100 to somewhat repeat that success, I'd say that counting on it as a "minimum" is highly unrealistic.

I expect GF100 to have a bigger performance advantage above Cypress than GT200 had above RV770 while the difference in complexity will be smaller. But I don't think that it's wise to expect Hemlock-level performance from one GF100 chip.

MfA · Sep 29, 2009

The arbitrary R/W in the LDS is important too.

I'm not convinced there are any big architectural leaps left to make, DWF seems something which can be handled in software ... the only important leap left to make IMO is to fold the pixel cache into L2 (making it read/write, with coherency being guaranteed by relatively simple fences ... doesn't give the low latency cross core coherency of Larrabee, but I don't think that's really necessary). After that I don't really see how it will be much more difficult to program than say Larrabee, if you want to use the option of using the LDS with their comparatively huge gather bandwidths it will be harder to program ... but it's good to have options.
