Yeah, I think they will initially continue with GM200 even when they go into production with the Tesla GP102, as the performance gap should be large enough for one to be a competitively priced Tesla and the other to be about performance at a price.

Note that NVIDIA just updated the Quadro M6000 and Tesla M40 to 24GB barely two months ago. They may be winding down GeForce production, but NVIDIA is going to be minting GM200 for some time to come.
GM200 is on the roadmap (as the M40) for a long time still (far into 2017). I have no inside information on Nvidia's production schedules, but at the moment it is much easier to buy an M40 than a P100, and I expect that to be true at least until Q1 2017.

Not so sure about that. It rather looks like they might just use those to manage their inventory, and charge a premium at the same time. While I obviously cannot be sure, I would guess there is no new GM200/204 coming out of TSMC any more.
I thought the conclusion from the discussion/hypotheticals was that FP16 was reduced on the 1080 but full-rate on the Tesla P100; what was never concluded was how this is done on the 1080.

Whatever is supported natively on the GTX 1080: AotS is a DirectX title and as such has to make do with what's exposed there, and FP16 isn't exposed on Pascal. Nor is fine-grained preemption, I might add.
I am only stating what's exposed by the driver in DirectX (see DX Caps Viewer), not making any statements about the hardware.
https://www.reddit.com/r/pcgaming/c...ng_the_aots_image_quality_controversy/d3t9ml4

We (Stardock/Oxide) are looking into whether someone has reduced the precision on the new FP16 pipe.
Both AMD and NV have access to the Ashes source base. Once we obtain the new cards, we can evaluate whether someone is trying to modify the game behavior (i.e. reduce visual quality) to get a higher score.
In the meantime, taking side by side screenshots and videos will help ensure GPU developers are dissuaded from trying to boost numbers at the cost of visuals.
OK, here is another being-a-pain question: if FP16 calculations currently have no benefit on existing hardware, then can someone ask Oxide why they went ahead and used it?
Yeah, that is a good point. I wonder whether that is the sole reason they did this or whether their scope was broader.

Even though you don't have double-speed FP16 ALUs, 16-bit operands will ease register pressure and maybe bandwidth. At least GCN takes advantage of this, and I would assume it's the same for Maxwell.
Well, it would be amusing if they went this route solely with GCN3 in mind.

I'm sure Oxide did their very best to optimize that and keep losses in check.
Just to add here: this is CUDA and requires manual conversions. It's not exposed to DX. But then again, CarstenS says Pascal doesn't expose min precision either.

I've been using FP16 on Maxwell to reduce memory allocations and off-chip bandwidth. It works pretty well, but the conversion instructions run quite a bit slower than FP32 ops, which makes it hard to use to reduce register pressure.
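To make that concrete, here is a minimal CUDA sketch (not Oxide's or anyone's production code; the kernel and parameter names are invented) of FP16 used purely as a storage format on Maxwell-class hardware. The arithmetic stays FP32, so the win is memory footprint and bandwidth, while the explicit __half2float/__float2half conversions are the extra instructions mentioned above:

#include <cuda_fp16.h>

// y[i] = a * x[i] + y[i], with x and y stored as 16-bit floats.
__global__ void saxpy_fp16_storage(int n, float a,
                                   const __half* __restrict__ x,
                                   __half* __restrict__ y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // Promote the 16-bit operands to FP32 (extra cvt instructions)...
        float xf = __half2float(x[i]);
        float yf = __half2float(y[i]);
        // ...do the actual math at FP32 precision...
        float r = a * xf + yf;
        // ...and round back down to FP16 for the store.
        y[i] = __float2half(r);
    }
}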
I may not understand you fully, but are you talking about 32-bit operations on 16-bit operands producing differences in results beyond what would be expected from that change of precision?

There is a general advantage to using FP16 data (memory size, bandwidth, ...), but that has been around since floating point was first introduced to the GPU pipeline.
Of current architectures, only GCN3 can reduce register pressure by using FP16 (that is, via min precision hints in HLSL). This presumably applies to GCN4 (Polaris) and Pascal as well.
Now, it shouldn't need pointing out, but since I'm starting to have some serious doubts about all this I'll do it anyway: taking three FP16 registers and doing an actual FP16 multiply-add on them can produce significantly different results than taking the same three FP16 registers and doing an FP32 multiply-add on them (a quick sketch follows below).
P.S.: Forgot about Tegra X1, that's current too.
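Here is a small CUDA illustration of that point (again just a sketch, not code from the game; it assumes a GPU with native FP16 arithmetic, i.e. compute capability 5.3+ such as Tegra X1 or GP100, since plain Maxwell has no half-precision ALUs). The same three 16-bit operands go through a genuine FP16 fused multiply-add and through an FP32 one, and the two results can differ:

#include <cuda_fp16.h>

// Compares a * b + c evaluated natively in FP16 against the same operands
// promoted to FP32. out[0] and out[1] will generally not be bit-identical.
__global__ void fma_precision_demo(const __half* in, float* out)
{
    __half a = in[0], b = in[1], c = in[2];

    // (1) Real FP16 multiply-add: the result is rounded to half precision.
    float fp16_path = __half2float(__hfma(a, b, c));

    // (2) Same operands converted up and fused-multiply-added at FP32.
    float fp32_path = fmaf(__half2float(a), __half2float(b), __half2float(c));

    out[0] = fp16_path;
    out[1] = fp32_path;
}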
Correct. I was talking about DX, since AotS is a DX game and it has to use what the API exposes there. In CUDA, things are different (as they are in OpenCL and OpenGL, where FP16 doesn't seem to be exposed currently either).

Just to add here: this is CUDA and requires manual conversions. It's not exposed to DX. But then again, CarstenS says Pascal doesn't expose min precision either.
http://www.overclock.net/t/1601922/anybody-having-problems-with-gtx-1080

I got it figured out. You guys were both on the right track. It was a refresh rate problem. I ended up leaving G-Sync on and also enabled Fast Sync. Everything seems to be working perfectly now. It's really odd though, because I never had this happen at all with the same settings on my 980. Something must be a little different in the driver. Either way, thanks to both of you for the suggestions. I'm just glad the card doesn't have issues. +rep