NVIDIA Maxwell Speculation Thread

Tridam · Feb 26, 2014

mczak said:
Tridam's excellent gtx 750 ti review is finally ready: http://www.hardware.fr/articles/916-1/nvidia-geforce-gtx-750-ti-gtx-750-maxwell-fait-ses-debuts.html
Interestingly, the fillrate test there does not mirror the very high bandwidth efficiency of the 3DMark fillrate test (as seen by AnandTech). (Tridam's conclusions there are wrong, though, as he wrongly assumed the number of ROPs was doubled. fp32 blending in particular is definitely just very slow: completely ROP bound and not bandwidth limited.)

Thanks, I fixed that! Of course the GM107 has 16 ROPs as well, but it benefits from the bigger rasterizer and from 4-5 SMMs able to deliver 16-20 4-byte pixels per clock. And you're correct that FP32 throughput is not limited by memory bandwidth; I just double-checked that.

I'm using my own test. It shows the best case when it comes to data compression opportunities.

mczak · Feb 26, 2014

Tridam said:
Thanks, I fixed that! Of course the GM107 has 16 ROPs as well, but it benefits from the bigger rasterizer and from 4-5 SMMs able to deliver 16-20 4-byte pixels per clock.

Ok. I still heavily disagree with the fp32 blend conclusion, though.

The bandwidth needed for 4xfp32 would be 4 times that for 4xint8. Ok, compression could change that, but the result is way, way lower. I believe Kepler/Maxwell can do fp32 blending only at 1/16 rate (but can use all resources for just one channel, hence 1/4 rate for single-channel fp32). This matches the actual numbers coming out of the test MUCH better than assuming it's bandwidth limited...
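
To put rough numbers on that argument, here is a back-of-the-envelope sketch. The 1020 MHz core clock and 86.4 GB/s memory bandwidth are assumed GTX 750 Ti reference figures and are not taken from the test itself. The helper names are made up for illustration, and the model assumes one read plus one write per blended pixel with no compression or ROP cache hits.

```python
# Back-of-the-envelope check: is blending ROP-bound or bandwidth-bound?
# Assumed figures (not from the thread): 1020 MHz core clock, 86.4 GB/s.
CORE_CLOCK_HZ = 1.02e9
MEM_BANDWIDTH_GBPS = 86.4
ROPS = 16


def blend_traffic_gbps(bytes_per_pixel, rop_rate):
    """Bandwidth (GB/s) needed to sustain blending at the given ROP rate.

    Blending reads the destination pixel and writes the result, so each
    blended pixel moves 2 * bytes_per_pixel (no compression, no caching).
    """
    pixels_per_second = ROPS * rop_rate * CORE_CLOCK_HZ
    return pixels_per_second * 2 * bytes_per_pixel / 1e9


def limiter(bytes_per_pixel, rop_rate):
    """Return which resource runs out first under this simple model."""
    needed = blend_traffic_gbps(bytes_per_pixel, rop_rate)
    return "bandwidth" if needed > MEM_BANDWIDTH_GBPS else "ROP"


cases = [
    ("4xint8 at full rate", 4, 1.0),      # full rate is an assumption
    ("4xfp32 at full rate", 16, 1.0),     # hypothetical, for comparison
    ("4xfp32 at 1/16 rate", 16, 1 / 16),  # the rate suggested above
]
for name, bytes_per_pixel, rate in cases:
    needed = blend_traffic_gbps(bytes_per_pixel, rate)
    print(f"{name}: {needed:.1f} GB/s needed, {limiter(bytes_per_pixel, rate)}-limited")
```

Under these assumptions, 4xfp32 at 1/16 rate needs far less than the available bandwidth, which is consistent with the ROPs being the limit rather than memory.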

I'm using my own test. It shows the best case when it comes to data compression opportunities.

I'll assume no data locality though (the 3dmark result seems to indicate it could take advantage of large (ROP) cache to me).

Tridam · Feb 26, 2014

Looks like I answered this second part at the same time as you were posting, hehe.

I agree that everything points to the FP32 blending throughput being limited to 1/16. It's actually what I wrote previously.

mczak · Feb 26, 2014

Tridam said:
Looks like I answered this second part at the same time as you were posting, hehe.

I agree that everything points to the FP32 blending throughput being limited to 1/16. It's actually what I wrote previously.

Ok, I agree then.

GM107 definitely seems to make good use of its available resources. Now I realize the complexity (transistor count) is quite close to Bonaire (and performance is quite close too, though with 20% less memory bandwidth to boot), but all the raw numbers (peak fp rate, TMUs, rasterization) are nearly identical to Cape Verde instead.

DSC · Feb 27, 2014

http://www.anandtech.com/bench/product/1037?vs=1130

Is the GTX 750 Ti supposed to outperform the GTX 770 in Luxmark 2.0? That's impressive for a little GPU.

P.S. Never mind, I just looked at pcgameshardware and the GTX 770 gets about double the score with the latest drivers. Still, the GM107 is close to it.

DSC · Feb 27, 2014

http://www.tomshardware.com/reviews/geforce-gtx-750-ti-passive-cooling,3757.html

LiXiangyang · Feb 27, 2014

Looking at the Folding@home benchmark (which employs an embarrassingly parallel method), the computing efficiency of Maxwell has improved quite significantly.

Faster than GF100 despite roughly comparable GFLOPS and half the memory bandwidth:
http://www.anandtech.com/bench/product/1135?vs=1130

Roughly 1/2 of Titan's performance:
http://www.anandtech.com/bench/product/1060?vs=1130

3/4 the computing power of the AMD 290X here:
http://www.anandtech.com/bench/product/1056?vs=1130

Very efficient arch; it would save people a lot of time on optimization.

Deleted member 2197 · Mar 1, 2014

EVGA adds GeForce GTX 750 with 2GB and SC, Displayport connector

Bonus 2GB GDDR5 Memory on select EVGA GeForce GTX 750 cards.
NVIDIA G-SYNC Ready – the EVGA GeForce GTX 750 series have full support for NVIDIA G-SYNC Technology with included DisplayPort connector.
Copper Core Insert included on EVGA Superclocked range of 750 – lowers temperatures by 5 degrees Celsius

http://eu.evga.com/articles/00821/

DSC · Mar 1, 2014

http://blenderartists.org/forum/showthread.php?327909-Cycles-NVidia-MAXWELL-Benchmarks

homerdog · Mar 1, 2014

Could you please post some sort of description or provide some context for that?

DSC · Mar 1, 2014

Blender users are testing and benchmarking GM107. Early results seem to be close to the GTX 570 at much lower power usage. Still more testing to be done.

CarstenS · Mar 3, 2014

So it seems that results from Luxmark and SLG with higher-complexity models are valid performance indicators if done properly with the same driver revision.

edit:

Gipsel said:
As Warps are not tied to SIMD blocks but to schedulers (also in Kepler) it doesn't change register allocation at all, especially as the register file size and the maximum number of Warps per SMX/M didn't change. The additional ALUs of Kepler could only lead to a (most of the time) marginally faster execution, that's all.

Ah yes, right. Register contention occurs within the scheduler's domain. So it should even be a bit worse now, with the slightly longer lifetime of each Warp compared to Kepler ("The additional ALUs of Kepler could only lead to a (most of the time) marginally faster execution, that's all."), yes?
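
For reference, a minimal sketch of the register-budget arithmetic behind this. The figures are assumptions not stated in the post: 65536 32-bit registers and at most 64 resident Warps per SMX/SMM, 32 threads per Warp, 4 schedulers. It ignores register allocation granularity and every other occupancy limit (shared memory, block counts), and `resident_warps` is just an illustrative name.

```python
# Sketch: how many Warps fit in the register file of one SMX/SMM.
# Assumed figures (not stated in the thread): 65536 32-bit registers,
# at most 64 resident Warps, 32 threads per Warp, 4 schedulers per SM.
REGISTERS_PER_SM = 65536
MAX_WARPS_PER_SM = 64
THREADS_PER_WARP = 32
SCHEDULERS_PER_SM = 4


def resident_warps(regs_per_thread):
    """Warps that fit per SM, ignoring allocation granularity and other limits."""
    regs_per_warp = regs_per_thread * THREADS_PER_WARP
    return min(MAX_WARPS_PER_SM, REGISTERS_PER_SM // regs_per_warp)


for regs in (32, 64, 128):
    warps = resident_warps(regs)
    per_scheduler = warps // SCHEDULERS_PER_SM
    print(f"{regs} regs/thread: {warps} Warps per SM, {per_scheduler} per scheduler")
```

Since neither the register file size nor the Warp limit changed, this budget comes out the same for SMX and SMM; only how long each Warp holds its registers would differ.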

DSC · Mar 3, 2014

http://us.download.nvidia.com/XFree86/Linux-x86/334.21/README/supportedchips.html

GeForce GTX 750 Ti 0x1380 E
GeForce GTX 750 0x1381 E
GeForce GTX 745 0x1382 E

Nvidia has confirmed the Maxwell hardware decoder is Feature Set E; Kepler's decoder is D.

iMacmatician · Mar 3, 2014

What is the GTX 745? A desktop GM108?

AnarchX · Mar 3, 2014

Device-ID 138x is GM107. Maybe a DDR3 GTX 750, which would allow vendor-pleasing 4/8 GiB versions.

Blazkowicz · Mar 3, 2014

Maybe an OEM card with lower TDP.
DDR3 card would really not deserve an 'X' at all

DSC · Mar 3, 2014

http://forums.laptopvideo2go.com/topic/30757-hp-mobile-driver-33233/

NVIDIA_DEV.1340.2280.103C = "NVIDIA GeForce 830M"
NVIDIA_DEV.1340.2281.103C = "NVIDIA GeForce 830M "
NVIDIA_DEV.1340.2282.103C = "NVIDIA GeForce 830M "
NVIDIA_DEV.1341.21A0.103C = "NVIDIA GeForce 840M"
NVIDIA_DEV.1341.21DB.103C = "NVIDIA GeForce 840M "
NVIDIA_DEV.1341.21DC.103C = "NVIDIA GeForce 840M "
NVIDIA_DEV.1341.2280.103C = "NVIDIA GeForce 840M "
NVIDIA_DEV.1341.2281.103C = "NVIDIA GeForce 840M "
NVIDIA_DEV.1341.2282.103C = "NVIDIA GeForce 840M "
NVIDIA_DEV.1341.228C.103C = "NVIDIA GeForce 840M "
NVIDIA_DEV.1341.228D.103C = "NVIDIA GeForce 840M "
NVIDIA_DEV.1341.228E.103C = "NVIDIA GeForce 840M "

I believe Device-ID 134x is GM108.
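
A small sketch of how entries like these could be grouped by device ID. The field layout (device ID, subsystem ID, subsystem vendor) is my reading of the entries and not something the driver documents here. The chip names in the lookup table are the guesses from this thread, not confirmed mappings, and `group_by_device` is a made-up helper name.

```python
import re
from collections import defaultdict

# Chip guesses taken from this thread: 0x138x is GM107, 0x134x is
# believed to be GM108. Treat both as unconfirmed.
CHIP_BY_PREFIX = {"138": "GM107", "134": "GM108 (believed)"}

# Assumed layout: NVIDIA_DEV.<device>.<subsystem>.<subsystem vendor> = "name"
ENTRY = re.compile(
    r'NVIDIA_DEV\.([0-9A-F]{4})\.([0-9A-F]{4})\.([0-9A-F]{4})\s*=\s*"([^"]*)"'
)


def group_by_device(lines):
    """Map (device ID, chip guess, product name) to a list of subsystem IDs."""
    groups = defaultdict(list)
    for line in lines:
        match = ENTRY.search(line)
        if match is None:
            continue  # skip anything that is not a device entry
        device, subsystem, _vendor, name = match.groups()
        chip = CHIP_BY_PREFIX.get(device[:3], "unknown")
        # Some names carry a trailing space inside the quotes; strip it.
        groups[(device, chip, name.strip())].append(subsystem)
    return dict(groups)


sample = [
    'NVIDIA_DEV.1340.2280.103C = "NVIDIA GeForce 830M"',
    'NVIDIA_DEV.1341.21A0.103C = "NVIDIA GeForce 840M"',
    'NVIDIA_DEV.1341.21DB.103C = "NVIDIA GeForce 840M "',
]
for (device, chip, name), subsystems in group_by_device(sample).items():
    print(f"0x{device} {chip}: {name} ({len(subsystems)} subsystem IDs)")
```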

mczak · Mar 3, 2014

Blazkowicz said:
Maybe an OEM card with lower TDP.
DDR3 card would really not deserve an 'X' at all

Well, it could be both. If it's DDR3 and they do it "right" (that is, on a 4 SMM card, lower the core clock a bit because it won't matter one bit anyway), TDP would trivially get down to something like 30-40W.
I fully agree it wouldn't deserve a GTX designation, but I'm sceptical that would stop them...

LiXiangyang · Mar 3, 2014

The problem with Luxmark is that, like lots of other benchmarks, it only supports OpenCL, and NVIDIA's OpenCL support is very poor. So it's not a very good performance indicator for NVIDIA products and thus shouldn't be a benchmark for cross-platform comparison.

Folding@home, by contrast, supports both OpenCL and CUDA routines; that's why I picked it as a performance indicator for cross-platform comparisons.

Kaotik · Mar 3, 2014

LiXiangyang said:
The problem with Luxmark is that, like lots of other benchmarks, it only supports OpenCL, and NVIDIA's OpenCL support is very poor. So it's not a very good performance indicator for NVIDIA products and thus shouldn't be a benchmark for cross-platform comparison.

Folding@home, by contrast, supports both OpenCL and CUDA routines; that's why I picked it as a performance indicator for cross-platform comparisons.

It all depends on whether you want to test "theoretical performance" or real-life performance. If NVIDIA's OpenCL support is bad, it's bad, and reviews should point that out rather than testing only with software where the card performs well.
