Nvidia GT300 core: Speculation

Hmmm in the course of a day we've gone from GT300 showing up in 6 months to GF100 being as fast as RV870X2. Can't wait to see where we end up tomorrow :LOL:
 
Let's look at the worst case: the HD 5870 X2 is 20% faster than the Geforce 380. Then you have to ask at what price, because the Geforce 380 is a single GPU (no multi-GPU profiles, no micro-stuttering, etc.) that will not consume more energy than the GTX 280. Looking at the HD 5870 X2, I hope it will stay under 275 watts.

Impossible if you ask me. :p
 
They are probably both already VLIW+SIMD at this point (well, NVIDIA is more LIW+SIMD, but same difference). Whatever else happens, VLIW is here to stay for a while yet IMO.

Sure. Those weren't necessarily either/or scenarios....

I like VLIW's chances better if it can build an "IW" using work from multiple threads. [My understanding of HP's foray into VLIW was that it was less successful than hoped for.] Mind you, there are probably easier ways of thinking of that kind of architecture than VLIW.... Simultaneous Asymmetric Dispatch or something with less of a, err, sad acronym.

I'm also wondering if the way that DP works affected the thinking of how MADD might work. Instead of single-cycle MADD, maybe it makes more sense to have two units working on the same piece of data, across two cycles, the results of one feeding the other. Across enough work, it's basically the same speed, but it seems like less work has to happen within a cycle (fewer gates, allowing for higher clocks), and it would seem easier to expand your LIW repertoire. Of course, maybe that's how MADD works now anyway :shrug:
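To put the two-cycle idea in concrete terms, here's a rough CUDA-flavoured sketch (kernel and names are mine, purely illustrative, not how any particular chip does it): a fused single-operation multiply-add next to a multiply whose result feeds a separate add, which is the numerical equivalent of one unit's output feeding the other a cycle later.

```cuda
// Illustrative sketch only: fused MADD vs a multiply feeding a separate add.
// Compiles with nvcc; __fmaf_rn/__fmul_rn/__fadd_rn are standard CUDA intrinsics.
__global__ void madd_variants(const float *a, const float *b, const float *c,
                              float *fused, float *split, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    // One operation, one rounding step: the "single-cycle MADD" picture.
    fused[i] = __fmaf_rn(a[i], b[i], c[i]);

    // Two chained operations, two rounding steps: the multiply result feeds
    // the add, analogous to one unit handing its output to another over two cycles.
    split[i] = __fadd_rn(__fmul_rn(a[i], b[i]), c[i]);
}
```

Over a long enough stream of work the throughput is the same either way; the differences are per-operation latency and a slight change in rounding behaviour.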
 
Why are you so skeptical about the chip? If that's your opinion, you will be surprised in the coming months.

Chips that are month(s) late usually haven't had a good track record ;)
But I don't have any real opinions on how it will perform, just keeping that as a possibility too.
I'd say that a good scenario from nV's point of view is that it's as much faster than Cypress as GT200 was faster than RV770.
 
Hmmm in the course of a day we've gone from GT300 showing up in 6 months to GF100 being as fast as RV870X2. Can't wait to see where we end up tomorrow :LOL:
I still don't think it will ship this year ... hell, I don't think we will get a clear shipping date when the first official information is released.
 
Chips that are month(s) late usually haven't had a good track record ;)
Why is GF100 late? How can you judge that? I reported back in January that Nvidia's next-generation chip would come in Q4/2009. So where do you see a delay?

You cannot say that GF100 is delayed just because AMD got its first DirectX 11 chips out a few weeks earlier.
But I will not deny that Nvidia's chip ran into some problems over the summer and could otherwise already be on the market.
 
I still don't think it will ship this year ... hell, I don't think we will get a clear shipping date when the first official information is released.

It will be ready for Christmas builds.

If you are lucky you will get one before Thanksgiving.
 
Anyone else getting the feeling that two parallel Universes have become entangled? With the different names, dates, specs, problems/non-problems it's like we're talking about two different parts from two different companies.

I need a lie down.
 
I'm not making any bets anymore. Last time I had to write a public apology to Rys LOL.
 
Anyone else getting the feeling that two parallel Universes have become entangled? With the different names, dates, specs, problems/non-problems it's like we're talking about two different parts from two different companies.

I need a lie down.

It all makes sense now!
 
It is not always an advantage to try to be a man of your word, yet a promise is a promise. Days before the G80 launch, while chatting in the B3D IRC channel, I told Rys that if I managed to buy an 8800GTX before the start of December I'd owe him a public apology. G80 was officially announced on November 8 and I was holding a Gainward Bliss 8800GTX in my hands on the 14th. Thus, I have no other choice than to send him my apologies. Don't worry though, this one is amongst the cases where I simply love being wrong.

I started a humble little write-up back then with that paragraph.
 
Wouldn't SFUs mostly scale by texture unit, rather than SP?
If textures remain fixed at 8 per TPC, I would think 4 SFUs would suffice (assuming, as you say, that these aren't indicating per-lane).
D3D10.1 requires 32 vec4 attributes per vertex to be supported, as opposed to 16 in D3D10. So that doubling in interpolation workload might steer the architects in the direction of increasing interpolation rate. Except, of course, that merely by adding ALUs, the increase occurs. So really what it comes down to is rasterisation:interpolation rate.
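Purely as a back-of-the-envelope illustration (CUDA-flavoured sketch, all names hypothetical): per-pixel interpolation cost scales linearly with the attribute count, so going from 16 to 32 vec4 attributes doubles the work in the inner loop below.

```cuda
// Hypothetical sketch: interpolating a single triangle's vertex attributes
// for a set of pixels from their barycentric weights.
#define NUM_ATTRS 32   // D3D10.1 minimum: 32 vec4 attributes per vertex

__global__ void interpolate_attrs(const float4 *v0, const float4 *v1,
                                  const float4 *v2,    // per-attribute vertex data
                                  const float3 *bary,  // per-pixel barycentrics
                                  float4 *out, int num_pixels)
{
    int p = blockIdx.x * blockDim.x + threadIdx.x;
    if (p >= num_pixels) return;
    float3 w = bary[p];
    for (int a = 0; a < NUM_ATTRS; ++a) {
        // One MAD-heavy evaluation per attribute: cost grows linearly with
        // the attribute count, hence the 10.1 doubling matters.
        out[p * NUM_ATTRS + a] = make_float4(
            w.x * v0[a].x + w.y * v1[a].x + w.z * v2[a].x,
            w.x * v0[a].y + w.y * v1[a].y + w.z * v2[a].y,
            w.x * v0[a].z + w.y * v1[a].z + w.z * v2[a].z,
            w.x * v0[a].w + w.y * v1[a].w + w.z * v2[a].w);
    }
}
```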

The other important question is, what the fuck is "pull model interpolation" in D3D11?

Jawed
 
Hmm, are you arguing for moving some of the "SF" instructions into the SPs? I would think log/rcp would be really useful there (hence my previous link). SF could be relegated to sin/cos approximations, or those blue dots might be something else entirely.

That's a really bad idea:
  • the techniques for calculating all kinds of transcendentals have common hardware structures, so splitting these structures into distinct units is simply a waste
  • the general trend should be for less acceleration of transcendentals, not more - in general computation transcendentals are much less commonly used (about 5% if I remember right) than the hardware provides for (~25%)
Jawed
 
Why would a chip with +50% complexity be "minimum on par" with dual HD5870?..
There are so many ways to make a chip "complex". Do it smartly and you can get way more performance - it's a question of how radical you're prepared to be.

For example it's like comparing the performance of two GPUs: one with early-Z rejection and one without. How do you "measure" complexity there? All you can do is talk about how the transistor/mm²/power budgets were spent.
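A toy CUDA sketch of that comparison (entirely hypothetical, just to make the point): the early-Z version rejects occluded fragments before the expensive shading work, while the late-Z version pays for shading on every fragment and only then discards, yet a raw transistor count would barely register the difference.

```cuda
// Toy illustration of early-Z vs late-Z, not any real pipeline.
__device__ float expensive_shade(int pixel)
{
    // Stand-in for a long pixel shader.
    float x = pixel * 0.001f;
    for (int k = 0; k < 256; ++k)
        x = x * x * 0.5f + 0.25f;
    return x;
}

__global__ void shade_early_z(const float *frag_z, float *depth_buf,
                              float *color_buf, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    if (frag_z[i] >= depth_buf[i]) return;   // reject before any shading work
    color_buf[i] = expensive_shade(i);
    depth_buf[i] = frag_z[i];
}

__global__ void shade_late_z(const float *frag_z, float *depth_buf,
                             float *color_buf, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float c = expensive_shade(i);            // every fragment pays for shading
    if (frag_z[i] < depth_buf[i]) {          // depth test only afterwards
        color_buf[i] = c;
        depth_buf[i] = frag_z[i];
    }
}
```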

This is the Larrabee bet: let's say that with 5 billion transistors on 28nm at 200W, Larrabee overtakes the more traditional GPUs.

What I find disappointing about R800 is that, apart from doubling the RBEs, deleting SPI, tweaking up the GDS size and implementing UAV/append/consume-specific buffers, I don't get any real sense of the architecture taking a leap forwards. Of course, apart from the RBEs, it's hard to tell how effective the rest has been (or how well tessellation actually works). And my long-standing argument is that the architecture (the underlying design of the units) is actually solid enough to work for a long, long time. And there are games whose performance is ~1.8-1.9x HD4890, so it's not even as bad as it sometimes seems. So there's a degree of wait-and-see about it.

Anyway, I'm still expecting NVidia to be pretty radical. NVidia's RV670->RV770, as it were. Plus some, with a bit of luck.

Jawed
 
Anyway, I'm still expecting NVidia to be pretty radical. NVidia's RV670->RV770, as it were. Plus some, with a bit of luck.
We can certainly hope for something like this. But RV670->RV770 was accomplished mostly by eliminating the obvious mistakes in the R600 design and by some magic which allowed them to pack 2.5 times more ALUs into almost the same complexity (transistors and die size). So while we can hope for GF100 to somewhat repeat that success, I'd say that counting on it as a "minimum" is highly unrealistic.

I expect GF100 to have a bigger performance advantage over Cypress than GT200 had over RV770, while the difference in complexity will be smaller. But I don't think it's wise to expect Hemlock-level performance from a single GF100 chip.
 
The arbitrary R/W in the LDS is important too.

I'm not convinced there are any big architectural leaps left to make; DWF seems like something which can be handled in software ... the only important leap left to make IMO is to fold the pixel cache into the L2 (making it read/write, with coherency being guaranteed by relatively simple fences ... that doesn't give the low-latency cross-core coherency of Larrabee, but I don't think that's really necessary). After that I don't really see how it will be much more difficult to program than, say, Larrabee. If you want to use the option of using the LDS with its comparatively huge gather bandwidth it will be harder to program ... but it's good to have options.
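For what it's worth, this is roughly what I picture by "coherency guaranteed by relatively simple fences", sketched in CUDA terms with made-up names (and assuming both blocks are co-resident so the spin can't deadlock): the producer fences between writing the data and raising a flag, so a consumer that sees the flag is guaranteed to see the data, with no cross-core hardware coherency protocol involved.

```cuda
// Sketch only: inter-block hand-off ordered by an explicit fence.
__device__ volatile int   ready = 0;
__device__ volatile float shared_result;

__global__ void fence_handoff(float value, float *out)
{
    if (blockIdx.x == 0 && threadIdx.x == 0) {
        shared_result = value;   // write the data
        __threadfence();         // make the write visible before the flag
        ready = 1;               // raise the flag
    }
    if (blockIdx.x == 1 && threadIdx.x == 0) {
        while (ready == 0) { }   // poll until the producer signals
        *out = shared_result;    // the fence guarantees this read sees the data
    }
}
```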
 