NVIDIA GF100 & Friends speculation

Time to crush the half-rate 32-bit integer nonsense. According to the CUDA 3.0 Programming Guide, Section 5.4.1, Table 5.1, 32-bit integer multiplication has the same throughput as floating-point multiplication on compute capability 2.0 devices, i.e. Fermi. The half-rate stuff is dead wrong.

As for 24-bit integer multiplication, the reason it's slower is that there's probably no dedicated instruction for it on 2.0 hardware (after all, if you have full-speed 32-bit, there's no real reason to waste opcode space on 24-bit...), meaning that it has to be done in software, using bit masking to ensure correctness. This adds at least one additional instruction, making 24-bit multiplication slower on 2.0 hardware.
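To make that concrete, here's a minimal sketch of what such a software path could look like (mul24_sw is a hypothetical name for illustration; the code the compiler actually emits for __mul24 on 2.0 hardware may well differ):

Code:
// Hypothetical software fallback for a signed 24-bit multiply on hardware
// that has a native 32-bit multiplier but no dedicated 24-bit instruction.
__device__ int mul24_sw(int a, int b)
{
    // Keep only the low 24 bits of each operand (sign-extended to 32 bits);
    // the extra shifts are the additional instructions mentioned above.
    int a24 = (a << 8) >> 8;
    int b24 = (b << 8) >> 8;
    return a24 * b24;  // low 32 bits of the 24x24-bit product
}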
 
Remember the old days when paper launches happened? Classically the reviews would go up and then you could buy cards only weeks later. Then NVIDIA got all enthused about "hard launches" and kicked sand in the faces of the companies that couldn't manage them.

So this is the ultimate "paper launch": they've not even got the card into the hands of testers this time. Unless reviewers have them before 26th March with reviews up, they might as well go the whole hog, not send any out to the review sites, and just let the websites buy them like other folk. That would save them a few dollars. Not sure if that would influence the reviews though ;)
 
Time to crush the half-rate 32-bit integer nonsense. According to the CUDA 3.0 Programming Guide, Section 5.4.1, Table 5.1, 32-bit integer multiplication has the same throughput as floating-point multiplication on compute capability 2.0 devices, i.e. Fermi. The half-rate stuff is dead wrong.

As for 24-bit integer multiplication, the reason it's slower is that there's probably no dedicated instruction for it on 2.0 hardware (after all, if you have full-speed 32-bit, there's no real reason to waste opcode space on 24-bit...), meaning that it has to be done in software, using bit masking to ensure correctness. This adds at least one additional instruction, making 24-bit multiplication slower on 2.0 hardware.
32-Bit Integer Multiplication
On devices of compute capability 1.x, 32-bit integer multiplication is implemented using multiple instructions as it is not natively supported. 24-bit integer multiplication is natively supported via the __mul24 intrinsic.
On devices of compute capability 2.0, however, 32-bit integer multiplication is natively supported, but 24-bit integer multiplication is not. __mul24 is therefore implemented using multiple instructions and should not be used (Section 5.4.1).
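In practice that means on Fermi you'd just write a plain 32-bit multiply and only reach for __mul24 on older parts. A rough sketch of how that choice could be expressed (mul_fast is a made-up wrapper for illustration; __mul24 and __CUDA_ARCH__ are the real intrinsic and macro):

Code:
// Pick the faster integer multiply for the architecture being compiled.
// Only valid when both operands are known to fit in 24 bits.
__device__ __forceinline__ int mul_fast(int a, int b)
{
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ < 200
    return __mul24(a, b);   // compute 1.x: 24-bit multiply is the native fast path
#else
    return a * b;           // compute 2.0+: 32-bit multiply is native and full speed
#endif
}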
 
Time to crush the half-rate 32-bit integer nonsense. According to the CUDA 3.0 Programming Guide, Section 5.4.1, Table 5.1, 32-bit integer multiplication has the same throughput as floating-point multiplication on compute capability 2.0 devices, i.e. Fermi. The half-rate stuff is dead wrong.

I got it from here
dkanter said:
Integer multiplies (and multiply accumulates) are native 32-bits now, instead of 24-bits in GT200, although they execute at half speed – each pipeline can execute 8 integer multiplies per cycle.
 
Perhaps you can tell us a more accurate number. ;)

Depends on the type of reticle used for the machines that are (presumably) supplied by ASML.

ASML allows designs up to 63.5 x 63.5 mm, though the final reticle size on 200 mm wafers is only 18 x 22.5 mm, for designs no finer than 45 nm, on the old PAS5500.

However, for anything sub-45 nm/300 mm you need ASML's Twinscan, which has a reticle limit of 26 x 33 mm starting at 45 nm. That kind of machine would be just fine for anything GF100-sized. As far as I know, TSMC uses ASML equipment.

Your 10-15 mm² variance/limit doesn't rhyme with GF100's die size or the practical limit of the equipment.
 
The point I was raising was that an fp32 MUL will normalise its output: e.g. if you multiply two integers that are encoded as subnormals, the ALU will return the most significant digits and normalise the result (if possible).

I've just checked it (sigh, should have done that earlier), and CUDA 1.x's mul24 returns the low 32 bits of the result. Emulation in 2.0 devices should be nothing more than a bit of masking before doing the multiplication.


Well, it appears really weird, unnecessarily complex and counterproductive from a naive math point of view. :)
I think when ATI was on their shrink-on-die-logic-at-all-costs route (with R700, I believe), they removed a huge amount of redundancy in the logic; I would guess a great deal of it was just slightly generalized so it could cover several other units.
A similar re-use approach is visible in Bulldozer; the ability to split the FPU in two is really great. I also saw somewhere a schematic of the FMA block where the single FMA unit could be split into an independent MUL and an ADD with very little logic overhead.

Though it's not that I don't understand sacrifice, consensus, and finding the middle ground in the context of GPUs ... ah, and getting rid of exactly those past decisions that accumulated year over year ...

Oh and I've just noticed that floating point exceptions are quiet in CUDA 2.0.

I understand this means FP "exceptions" only raise flags, and you can query those at your leisure, correct?
 
As an update, I am hearing, totally unconfirmed so far, that there may be one other tapeout either done or pending. I am far from 100% on this one though.

-Charlie

Again, many thanks for your very interesting insight into what is going on behind the scenes. I hope you will let us know quickly whether the tape-out has happened, whether the chip was dead or not, and how many re-spins they need. It is always nice to get first-hand, unbiased information.
 
Shane Baxtor, Jekyll & Hyde revisited

For starters, if the price we received on the GTX 470 is right, with it lining up with the HD 5850 yet offering performance better than an HD 5870, this will become a great buy. Throw in the fact that you have CUDA, PhysX, 3DVision and those other features, and you’ll probably find yourself extremely happy.

Even if the GTX 400 series does come out and spank the HD 5800 series, you’ve had an awesome run, and at the end of the day the release of the GTX 400 series isn’t going to make the HD 5800 series’ performance worse; it’ll simply make it look worse. Personally, while I think NV can make it look worse in the performance numbers, I think the ATI models will still retain the massive value-for-money perception.

Wait, Don’t Wait; it really doesn’t matter. If the GTX 470 performs the same as the HD 5850 and the GTX 480 performs the same as the HD 5870, they have to be cheaper; if they’re faster, they can only be slightly more expensive.

How can a GTX470 be faster than an HD5870, at the price of an HD5850, and the latter two still maintain their "value for money perception"?
 
A similar re-use approach is visible in Bulldozer; the ability to split the FPU in two is really great. I also saw somewhere a schematic of the FMA block where the single FMA unit could be split into an independent MUL and an ADD with very little logic overhead.

AFAICT, only the fetch and decode are being reused between the two cores. Each core had its own 128-bit FPU in Barcelona; they have just been put together in Bulldozer.
 
With all due respect to Mr. Baxtor (whom I had never heard of before today), at best I see him being used in the same way Fudo was used with the ever-shifting launch dates ...
 
How can a GTX470 be faster than an HD5870, at the price of an HD5850, and the latter two still maintain their "value for money perception"?
Well, obviously ATI would have to drop its prices. This would be fantastic for gamers, but I'm not sure it would make a whole lot of sense for nVidia, for two reasons:

1. I'm pretty sure that nVidia still has somewhat greater brand recognition on average.
2. The GF100 has some additional benefits besides its average performance.

Based upon these two points, it seems natural to expect that if one were to solely compare average gaming performance, the GF100 should clock in at a somewhat worse price/performance ratio than the HD5850/HD5870, just based upon demand alone.

Edit:
On the other hand, if these low prices were correct, then that would seem to indicate that nVidia has managed to get much higher volumes of working parts than ATI, which would seem to be rather contrary to most rumors so far.
 
Depends on the type of reticle used for the machines that are (presumably) supplied by ASML.

ASML allows designs up to 63.5 x 63.5 mm, though the final reticle size on 200 mm wafers is only 18 x 22.5 mm, for designs no finer than 45 nm, on the old PAS5500.

However, for anything sub-45 nm/300 mm you need ASML's Twinscan, which has a reticle limit of 26 x 33 mm starting at 45 nm. That kind of machine would be just fine for anything GF100-sized. As far as I know, TSMC uses ASML equipment.

Your 10-15 mm² variance/limit doesn't rhyme with GF100's die size or the practical limit of the equipment.

I dunno.

dkanter said:
Nvidia’s system architecture continues to focus on the highest performance for a monolithic GPU, and nearly filling the reticle for TSMC’s lithography systems.

May it be noted that LRB 1 and Nehalem EX are in the same ballpark as well.
 
Nearly filling one of the sides, or the complete reticle? It's a bit hard when you don't know what size you take as a starting point. I still stand by 24 x 24 mm for GF100, which would be within 10% of the 26 mm limit on one side of the mask. So yeah, it is "nearly" there on one axis, but not on the other.
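For what it's worth, a quick back-of-the-envelope on those figures, treating the ~24 x 24 mm estimate and the 26 x 33 mm Twinscan field as given:

\[
\frac{24\ \mathrm{mm}}{26\ \mathrm{mm}} \approx 0.92,
\qquad
\frac{24 \times 24\ \mathrm{mm}^2}{26 \times 33\ \mathrm{mm}^2}
 = \frac{576\ \mathrm{mm}^2}{858\ \mathrm{mm}^2} \approx 0.67
\]

So roughly 92% of the short axis, but only about two thirds of the full field area.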

The only public info from TSMC, on its 0.13-micron-and-up processes, is a reticle size of 21 x 21 mm. Unless anyone with actual knowledge of TSMC's reticle size steps up, we're a bit in the dark.
 
Shane Baxtor, Jekyll & Hyde revisited

How can a GTX470 be faster than an HD5870, at the price of an HD5850, and the latter two still maintain their "value for money perception"?

I wouldn't believe anything about Fermi until Anand or some other unbiased person has benchmarked it and has publishable pricing info.

Anything that we hear now about Fermi being either the best thing since sliced bread or, indeed, the end of mankind as we know it is largely going to be fluff. My guess is that the GTX470 will have similar performance to the 5850 but worse power and heat, and the GTX480 will be similar to the 5870 with the same power/heat problems. It won't be until Nv can get proper performance silicon out with higher yields and better heat characteristics that this will be worth owning.

The key for Nv is not to let the pain of Fermi go to waste. They should have been designing the successor for at least a year by now, and it should be ready to launch within a year at most (preferably less). Fermi looks like it will scale upwards quite well, so the next gen for Nv could be interesting. I will be sitting the gen out entirely; the 5870 is all too difficult to get my hands on at a decent-ish price, and I fear the same fate for the GTX480. I mean, I'm still running my venerable 8800GTX and it plays SupCom pretty well!
 
The key for Nv is not to let the pain of Fermi go to waste. They should have been designing the successor for at least a year by now, and it should be ready to launch within a year at most (preferably less).

A year sounds way too long to me; we should be hearing Fermi2 news a lot sooner.
 
Even when it's released I won't be able to buy one. $670 for a video card? It had better make me tea and do the dishes.

Also, these are MSRPs, so the inflation/gouging as you call it will only drive the price to greater heights. Look at the 5970: its MSRP is $600, but it has gone for upwards of $800 on Newegg and still commands $50-$100 over MSRP when it is in stock there.

So you are effectively seeing these prices as the actual MSRPs ??...:rolleyes:

eastmen said:
What proof do you want in a pre-release speculation post? You might want to reread what I said and why I said it.

You said Charlie was right about something, which is why I asked for proof. You know, Charlie's "articles" are self-contained, i.e. they have no proof to back them up, just rumor and speculation created by Charlie himself, which is why they are not proof of anything.

eastmen said:
The GTX 480 is priced $70 higher than the 5970. The 5970 is a dual-chip card, so the 5970 is cheaper than the GTX 480. If the cards perform similarly, or the 5970 (a dual-chip card) performs better, then the only reason for Nvidia to price the card so high is that it's very expensive to make and there will be few of them made; a high price with limited volume will keep demand down.

Nvidia has had about 6 months to see the prices of the 5870 and 4 or so for the 5970, so they know exactly where their cards fall performance-wise and price-wise.

Again I ask, are they up for sale? You are still assuming those prices are the real deal when the product hasn't even launched. How about you wait for the actual launch and the actual prices before making such bold claims...
 
I have the feeling that the GTX470 MSRP will be the most pleasant surprise of them all.
 
I have the feeling that the GTX470 MSRP will be the most pleasant surprise of them all.

Now where have I heard this line about pleasant surprises before? Another odd similarity: it came from Greece back then too. :p
 
I will be sitting the gen out entirely,
I have to agree, though for somewhat different reasons. I am, however, quite excited about the architecture, and would really like to see what game developers are able to do in the next year to make use of it (I suspect I'll get my next video card upgrade in about a year or so...).
 