NVIDIA Maxwell Speculation Thread

Keep in mind that cudaMiner is reportedly not yet optimized for the new Maxwell GPUs, and even so GM107 already delivers very competitive Litecoin mining performance per watt.
 
I mine scrypt coins at 890 KH/s per card on 2x R9 290X, with system power consumption of 695 W. Mind you, this is my normal gaming PC, so it has an overclocked i5, 2x SSD, 2x HDD, a Blu-ray RW drive and lots of fans, card readers and USB devices. The cards on their own are probably around 220-240 W each, which gives an efficiency of around 3.7 KH/s/W.
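
(Quick sanity check on that figure, assuming roughly 240 W per card: 890 KH/s ÷ 240 W ≈ 3.7 KH/s/W, and at 220 W it would be closer to 4.0 KH/s/W, so the number holds up.)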

Cards with Hynix memory can mine at up to 990 KH/s while power consumption stays the same as on Elpida cards.

One more thing: has anyone measured NVIDIA cards using cgminer and OpenCL? Has the newest driver increased mining rates in the same way it sped up LuxRender?
NV miners use the cudaMiner client exclusively for scrypt/scrypt-jane-based currencies. OCL clients are simply a waste of energy for the green team. CUDA allows for much finer optimizations, specific to its hardware platform, that would otherwise be impossible with OCL, and NV hasn't bothered with that API for the last couple of years anyway.
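
To give a concrete flavour of the kind of warp-level trick CUDA exposes that the OpenCL 1.x clients can't use directly, here is a minimal sketch (my own illustration, not cudaMiner's actual code) that rotates a word between the lanes of a warp with the shuffle intrinsic instead of bouncing it through shared/local memory. Needs sm_30 or later; on pre-CUDA 9 toolkits the intrinsic is __shfl() rather than __shfl_sync().

Code:
// Minimal sketch: rotate one word per lane across a warp using warp shuffle.
// Not cudaMiner's code; just illustrates a CUDA-only data exchange that
// never touches shared memory.
#include <cstdio>

__global__ void warp_rotate(const unsigned int *in, unsigned int *out)
{
    int lane = threadIdx.x & 31;                  // lane index within the warp
    unsigned int v = in[threadIdx.x];             // each lane holds one word
    // Pull the word from the next lane directly via the register file.
    v = __shfl_sync(0xffffffffu, v, (lane + 1) & 31);
    out[threadIdx.x] = v;
}

int main()
{
    unsigned int h_in[32], h_out[32];
    for (int i = 0; i < 32; ++i) h_in[i] = i;

    unsigned int *d_in, *d_out;
    cudaMalloc(&d_in, sizeof(h_in));
    cudaMalloc(&d_out, sizeof(h_out));
    cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice);

    warp_rotate<<<1, 32>>>(d_in, d_out);
    cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);

    printf("lane 0 now holds %u (expected 1)\n", h_out[0]);

    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}

In OpenCL 1.x the equivalent exchange has to round-trip through local memory and a barrier, which is exactly the sort of per-architecture detail a CUDA-only miner can exploit.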
 
It's not that simple; ExtremeTech's results, for example, disagree ( http://www.extremetech.com/gaming/1...per-efficient-quiet-a-serious-threat-to-amd/3 )
[Image: LiteCoinEfficiency.png]


The big problem with mining benchmarks, including performance per watt, is that there are huge differences even between "identical" cards (10%+ is easily possible), and apparently BIOS versions make huge differences too (20-30%+).

The PCPerspective figures are perf/TDP. They are irrelevant.
 
Are there review sites that isolate the GPU power rails? That is: a PCIe extender to isolate the slot's power rail, plus measurements on the auxiliary power wires?
 
Nvidia didn’t breathe a word of Maxwell’s seriously improved hashing ability in their marketing copy or press briefings, but Tom’s Hardware discovered it, and I’ve been able to replicate their findings with multiple 750 Ti cards from both Nvidia and PNY.
What you’re looking at in the image above is a hashrate of about 242kh/s using Nvidia’s reference 750 Ti 1GB graphics card ($139). This is significant for several reasons. First, the 750 Ti is a 60 W card and doesn’t even require a PCI-E power connector. You could plug this card into a cheap box from HP or Dell with a 300 W power supply and have power to spare. Second, the temperature never seems to breach 65 degrees Celsius, and it runs considerably quieter and cooler than the AMD 260x ($119), which achieves a peak hashrate of 206kh/s and consumes nearly 130 W of power.
“Hold on a minute!” I can hear you saying. “AMD’s 260x is $20 cheaper than Nvidia’s entry-level 750 Ti!” That’s true, but the nominal price difference quickly evaporates when you consider how the 750 Ti sips power, which matters in the long run. Additionally, Tom’s Hardware ran the same mining environment test with AMD’s upcoming Radeon 265 ($149) and achieved a peak hashrate of 252kh/s, and remember that the Radeon 265 is a 150 W card.
For that same price of $149, here’s what I pulled off with PNY’s 750 Ti 2GB with a moderate (and stable) overclock:
[Image: A single PNY 750 Ti 2GB graphics card, overclocked and using cudaMiner to mine Dogecoin.]
That’s right, 284kh/s, and an even better temperature ceiling of about 56 degrees Celsius. This is consuming less than half the power of AMD’s Radeon 265.
This all leads to a conclusion that’s far from crazy: When Nvidia’s high-end Maxwell cards drop later this year (possibly by late March), they’re going to surpass the hashrates currently possible from AMD, consume less power, and do so while staying cooler and quieter.
http://www.guru3d.com/news_story/will_nvidia_steal_the_cryptocurrency_mining_crown_from_amd.html
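
To put a rough number on the "sips power" argument (my own back-of-envelope, assuming about $0.12 per kWh and the roughly 70 W gap to the 260x): 70 W x 24 h x 30 days ≈ 50 kWh a month, or about $6 in electricity, so the $20 price difference closes in roughly three months of 24/7 mining.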
 
Enough with the mining craze already; things like that only hurt the GPU manufacturers, they don't help them. NVIDIA would be stupid to repeat AMD's mistake with the mining fiasco. IMO it cost AMD significant market share at a dire time, just when they needed to establish Mantle as a significant game changer and capture some market/mind share from gamers, only to fall flat in that regard because AMD cards are nowhere to be found. The irony!

Mining is a soap bubble that will soon burst while having little to no effect on anything. Companies would be wiser to ignore it completely.
 
It's true that the Tom's Hardware latency graph doesn't explicitly say which hardware structure the accesses are hitting (how would Sandra know?). But the working-set size is indicated, and there are distinct jumps in latency at 12 KB and 2 MB, which match the L1 and L2 sizes that have been talked about in architecture previews.
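
For anyone who wants to reproduce that kind of working-set sweep, here is a rough pointer-chase sketch (my own, not whatever Sandra actually runs): time a chain of dependent loads over ring buffers of increasing size and watch the cycles per load step up once the working set spills out of a cache level. Which levels show up depends on how plain global loads are routed on a given architecture, but the jumps land at the cache sizes either way.

Code:
// Pointer-chase sketch: measures average cycles per dependent load for a
// range of working-set sizes. Latency steps up when the ring no longer fits
// in a given cache level.
#include <cstdio>
#include <vector>

__global__ void chase(const int *next, int steps, unsigned long long *cycles, int *sink)
{
    int idx = 0;
    unsigned long long t0 = clock64();
    for (int i = 0; i < steps; ++i)
        idx = next[idx];                 // serially dependent loads: pure latency
    unsigned long long t1 = clock64();
    *cycles = t1 - t0;
    *sink = idx;                         // keep the chain from being optimized away
}

int main()
{
    const int steps = 1 << 18;
    for (int kb = 4; kb <= 4096; kb *= 2) {
        int n = kb * 1024 / sizeof(int);
        std::vector<int> h(n);
        for (int i = 0; i < n; ++i)
            h[i] = (i + 32) % n;         // hop one 128-byte line per access

        int *d_next, *d_sink;
        unsigned long long *d_cycles;
        cudaMalloc(&d_next, n * sizeof(int));
        cudaMalloc(&d_sink, sizeof(int));
        cudaMalloc(&d_cycles, sizeof(unsigned long long));
        cudaMemcpy(d_next, h.data(), n * sizeof(int), cudaMemcpyHostToDevice);

        chase<<<1, 1>>>(d_next, steps, d_cycles, d_sink);
        unsigned long long c = 0;
        cudaMemcpy(&c, d_cycles, sizeof(c), cudaMemcpyDeviceToHost);
        printf("%5d KB : %6.1f cycles/load\n", kb, (double)c / steps);

        cudaFree(d_next);
        cudaFree(d_sink);
        cudaFree(d_cycles);
    }
    return 0;
}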

I do realize that latency is not as important for a GPU's L1 cache as it is for a CPU's. I was comparing the two because the (claimed) latency differences struck me as ridiculous in spite of that. Point taken regarding apples versus oranges though.

Of course cache played a role here, but you cannot determine cache latency this way, especially if your purpose is to compare it between architectures.

It depends on the cache algorithms, cache line size and the cache's addressability, as well as the access pattern, etc.

NVIDIA's programming guide indicates that shared memory latency is in the range of a few GPU cycles (they actually say it is as fast as registers), and in NVIDIA's previous architectures the L1/texture cache should have latency similar to shared memory. I doubt Maxwell's architecture will be significantly different in that respect.
 
The only vendor that is somewhat OK is EVGA, with one DisplayPort 1.2 port. The rest of them are terrible, with a VGA connector in 2014. :rolleyes:

http://www.evga.com/articles/00821/#3751

http://www.evga.com/products/images/gallery/02G-P4-3751-KR_XL_4.jpg

No forward-looking vendors at all with 3x DisplayPort 1.2 and 1x HDMI. NVIDIA really needs to lay down the law and move away from DVI-I/D.

The trouble with going DP-only is that active adapters aren't cheap; throwing a couple of them in the box isn't an option, since that would cost nearly as much as the card itself.
VGA is old monitors, low-end monitors and projectors; DL-DVI is the 120 Hz/144 Hz stuff and the 2560-wide stuff. Hundreds of millions of users rely on one or the other. People do use decade-old monitors, and they keep the nice ones for a long time too.
That EVGA card is not too bad unless you want to run triple 2560x1440 monitors (a couple barely fit on a desk already) or more than one big 4K 60 Hz monitor; well, too bad, I'm sure there will be GM206/GM204 cards or next-gen Radeons for that (or use multiple cards).
 

I don't know where Hilbert pulled his R7 260/265 numbers from but they are way off ...

It's almost like he just ran cgminer with no extra parameters and called the numbers he got peak performance!

For those who don't know, here is a quick comparison table for all mining hardware:
https://litecoin.info/Mining_hardware_comparison

It applies to all scrypt coins, not only Litecoin, and clearly shows that a Radeon HD 7850 (a slower R7 265) should get around 350 KH/s (versus the 252 KH/s in the article); for those willing to tweak, upwards of 400 KH/s is possible.
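
For reference, the "tweaking" is mostly a matter of cgminer's scrypt tuning flags; something along these lines is a typical starting point for a 7850-class card (pool/worker are placeholders, and the exact values vary per card and memory type, so treat this as an example rather than a recipe):

Code:
cgminer --scrypt -o stratum+tcp://pool:3333 -u worker -p x -I 13 -g 1 -w 256 --thread-concurrency 8192 --lookup-gap 2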

The new Maxwell chip is very good for mining and can stand on its own against AMD hardware, so there is no need for any of these publications to diminish Radeon mining performance.
 

The DVI on the EVGA card is DVI-I, not DVI-D; you get VGA from it with a normal passive adapter, which is included with the card.
 
On a GPU, the stack variables are in registers.
L1 is for spilled registers and other data.
Unlike Kepler, Maxwell's L1 is not even used for spilled registers or local arrays. Those also go to L2.

Instead, L1 is just a local reordering buffer, both for SHFL commands (shuffling, swizzling and copying words within a warp) and for memory reads (a 32-word line is pulled from L2, and different threads within the warp select the words they want from that line).

The Maxwell L1/texture cache can still be used as a read-only data cache. This is somewhat manual: either use the __ldg() intrinsic to read the data, or declare the data pointers with const __restrict__ decorators in the code. This is the same method introduced in Kepler sm_35 for using the texture cache for data reads.
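
A minimal sketch of those two routes (kernels and names are my own illustration, compiled for sm_35 or later, not code from any particular miner):

Code:
// Two ways to route reads through the read-only/texture cache path:
// an explicit __ldg() load, or const __restrict__ pointers that let the
// compiler emit the same kind of load on its own.
#include <cstdio>

__global__ void scale_ldg(const float *in, float *out, int n, float k)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = k * __ldg(&in[i]);      // explicit read-only cache load
}

__global__ void scale_restrict(const float * __restrict__ in,
                               float * __restrict__ out, int n, float k)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = k * in[i];              // compiler is free to use the same path
}

int main()
{
    const int n = 1024;
    float *d_in, *d_out;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMemset(d_in, 0, n * sizeof(float));

    scale_ldg<<<(n + 255) / 256, 256>>>(d_in, d_out, n, 2.0f);
    scale_restrict<<<(n + 255) / 256, 256>>>(d_in, d_out, n, 2.0f);
    cudaDeviceSynchronize();
    printf("kernels ran\n");

    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}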
 