NVIDIA GF100 & Friends speculation

But DRAM has a much bigger problem with writes than reads.
FWIW: no, this is not true. There is no speed difference between reads and writes for DRAM. A continuous 1 GB burst takes the same time whether it goes to or comes from the DRAM.

The thing that can hurt DRAM throughput (among other things) is quickly moving from reads to writes and back, such as read-modify-writes. OTOH, you can avoid most of this penalty by grouping a lot of reads and writes together, which is something a GPU should be good at.
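To put rough numbers on the read/write turnaround point, here is a toy cost model in Python. It is only a sketch: `burst_cycles` and `turnaround_cycles` are made-up illustrative values, not timings from any real DRAM datasheet, and the model ignores everything else that affects throughput (row misses, refresh, bank conflicts).

```python
def bus_cycles(ops, burst_cycles=4, turnaround_cycles=6):
    """Total data-bus cycles for a sequence of 'R'/'W' bursts.

    Every burst occupies the bus for burst_cycles. Each change of
    direction (R->W or W->R) adds turnaround_cycles of idle time.
    Both parameters are hypothetical values chosen for illustration.
    """
    total = 0
    previous = None
    for op in ops:
        if previous is not None and op != previous:
            total += turnaround_cycles  # bus sits idle while it turns around
        total += burst_cycles
        previous = op
    return total


def efficiency(ops, burst_cycles=4, turnaround_cycles=6):
    """Fraction of bus cycles that actually carry data."""
    busy = len(ops) * burst_cycles
    return busy / bus_cycles(ops, burst_cycles, turnaround_cycles)


all_reads = "R" * 128
all_writes = "W" * 128
interleaved = "RW" * 64              # read-modify-write style access
grouped = "R" * 64 + "W" * 64        # same 128 bursts, one direction switch

for name, ops in [("all reads", all_reads), ("all writes", all_writes),
                  ("interleaved", interleaved), ("grouped", grouped)]:
    print(f"{name:12s} cycles={bus_cycles(ops):5d} "
          f"efficiency={efficiency(ops):.2f}")
```

In this model pure reads and pure writes cost exactly the same, the interleaved pattern spends most of its time turning the bus around, and grouping the same accesses recovers nearly all of the lost throughput.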
 
Oh, and I must have missed those rumors about half the TMUs being disabled. Any credibility to that? That would be very, very strange imho.

1. Guess who started it..

2. I guess Arun saw some kind of roadmap or talked to someone who did. He wasn't the only one talking about "top to bottom Fermi" around that time. It basically coincided with NV's presentation of their 2010 products to partners (where I got Optimus from too).

3. Should a "fully functional" B1/B2 appear later this summer, we'll know the complete truth (or do a Jamie Hyneman on me). Me mentioning it was enough to tickle ChrisRay though.
 
In a slightly funny turn of events (BSoN ElcomSoft article), Theo puts some suggestion in the footnote of an image.

30wqnie.jpg


here
 
In a slightly funny turn of events (BSoN ElcomSoft article), Theo puts some suggestion in the footnote of an image.

I don't get it. So the claim is that 4 of NVIDIA's Fermi GPUs are faster than 2 Cypress chips (which aren't even running at their highest single-GPU configuration clock)? I've no trouble believing that, but it doesn't really say much :).

btw, has anyone else noticed that the benchmark seems to run terribly on the GTX285? Slower than the HD4870, and not even 4 of them reach the performance of a single HD5870, it seems...
 
He uses quotation marks in parts of this statement :

The very interesting bit and a bit of a slap for AMD's public stance on how "CUDA is a closed standard" and developers that go developing with CUDA will suffer when they decide to switch to open standards

What is he quoting? "CUDA is a closed standard" doesn't get any hits on Google.
 
btw, has anyone else noticed that the benchmark seems to run terribly on the GTX285? Slower than the HD4870, and not even 4 of them reach the performance of a single HD5870, it seems...
Raw throughput for these parts?

If I remember correctly, HD5870 has something like... 4 times the theoretical throughput of GTX285, there we are.

Note that the graph itself represents the entirety of the press release, everything else is Theo's dreams.
 
btw, has anyone else noticed that the benchmark seems to run terribly on the GTX285? Slower than the HD4870, and not even 4 of them reach the performance of a single HD5870, it seems...

4870 has higher ALU throughput than 285 (by ~2x) so the numbers make sense. In pure GPGPU number crunching the 4k/5k architecture has significantly more resources.
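For anyone who wants to check the "~2x" and "something like 4 times" figures, the usual back-of-the-envelope peak is ALUs x clock x flops per ALU per clock. The sketch below is in Python; the ALU counts and clocks are the commonly quoted reference specs as I recall them (an assumption, worth verifying against the vendor spec sheets), and it counts one multiply-add (2 flops) per ALU per clock. The GTX285 is sometimes quoted higher by also counting its extra MUL, which this deliberately leaves out.

```python
def peak_gflops(alus, shader_clock_mhz, flops_per_alu_per_clock=2):
    """Theoretical single-precision peak in GFLOPS.

    Assumes every ALU retires one multiply-add (2 flops) per clock,
    which is a best case that real code rarely reaches.
    """
    return alus * shader_clock_mhz * flops_per_alu_per_clock / 1000.0


# (ALU count, shader clock in MHz): reference specs recalled from memory.
PARTS = {
    "HD4870": (800, 750),
    "HD5870": (1600, 850),
    "GTX285": (240, 1476),
}

peak = {name: peak_gflops(*spec) for name, spec in PARTS.items()}

for name, gflops in peak.items():
    ratio = gflops / peak["GTX285"]
    print(f"{name}: {gflops:7.1f} GFLOPS ({ratio:.2f}x GTX285)")
```

Under those assumptions the HD4870 comes out somewhat under 2x the GTX285 and the HD5870 a bit under 4x, so both estimates are in the right ballpark for MAD-only throughput.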
 
What is he quoting? "CUDA is a closed standard" doesn't get any hits on Google.
That style of quotation is also used to make an allusion (e.g. Theo Valich is a "journalist"), or to precis something too tedious to describe at length (e.g. Theo Valich has a history of "silently updating published articles"). Or, at its basest, it's just laziness, cos he forgot AMD's exact words on the subject and can't be bothered to look them up... :p

Is Elcomsoft's software the fastest at password cracking?

That blog entry has redacted text at the request of NVidia.

Jawed
 
4870 has higher ALU throughput than 285 (by ~2x) so the numbers make sense. In pure GPGPU number crunching the 4k/5k architecture has significantly more resources.
This is true. However, it looks like HD5k has twice the performance of HD4k per flop, hence I figured it's not really raw alu throughput limited (not in HD4k at least), and whatever makes it faster on HD5k could also help GTX285, but that doesn't really seem to be the case... Not sure though what code it actually runs on these different chips...
 
This is true. However, it looks like HD5k has twice the performance of HD4k per flop, hence I figured it's not really raw alu throughput limited (not in HD4k at least), and whatever makes it faster on HD5k could also help GTX285, but that doesn't really seem to be the case... Not sure though what code it actually runs on these different chips...

I think we're dealing with integers and not FP, and I believe INT multiplies are faster on Evergreen than on RV770.
 
AFAIK, both Evergreen and RV770 do int32 multiplication in the t unit, so there's no per-core improvement in throughput.
The other units though gained 24-bit int mul capability. There are also things like SAD. I've no idea if that code uses any of that (or if that's even possible; what language is it written in?), or if the performance difference is due to differences outside the instruction set.
 
http://www.szgalaxy.com/Fermi/

GTX480 512 shaders
GTX470 448 shaders

Memory speed not specified for some reason.

Lists a bunch of events in China starting 26 March, giving away GT240s and Fermi t-shirts, and "priority Fermi purchasing opportunities" if I have the translation correct.
 