Nvidia Volta Speculation Thread

Besides, it is easier to attack the messenger than the message ;) If I follow her, it is because she has good interactions with many people working in the game industry, she is a dev herself, and no other dev has called it bullshit... I verify before posting...

EDIT: Then again, it is partly guesswork, and even coming directly from her it may not be 100% correct in the details...
 
Why would any dev tell you it's bullshit? As I asked before: what's the news here? Or, if you would prefer: what looks bad for the NV architecture, and how is AMD better?

It is not new either; for example, when she speaks of AMD's advantage in async compute, that is not new, and it does not change the fact that in most games Nvidia cards perform better, and sometimes much better, than their AMD counterparts...
 
No one can become an Nvidia architecture expert in a few hours. All the more true when approached from a biased viewpoint ...
 
No one can become an Nvidia architecture expert in a few hours. All the more true when approached from a biased viewpoint ...

She is not a specialist, but a dev working in the game industry, and it is what she heard from a third party, if I understand her first tweet correctly...


yesterday i finally had the chance to get from the horse's mouth (i.e. not via mountains of extremely confused marketing) how nvidia's threading works
 
What started as some potentially interesting discussion points for a tech forum is devolving into biased opinion crap. Frustrating.

It's a horrible shame that the PC Forums community is unable to self-moderate and be better posters. The response from company defenders is exactly what is wrong with the entire PC Forums community: they're doing nothing but creating a toxic environment. Too much worthless and silly infighting. Set aside your differences and focus on better interactions with other posters. This stupid fighting between Nvidia/AMD/Intel needs to stop.
 
Why would an Nvidia dev respond? It's obvious she is biased and her background is primarily working with AMD.

Yeah, but is she wrong about what she heard/said? And saying "I prefer X over Y" doesn't make you anti-Y if Y does good things. Yeah, it seems she prefers how AMD's chips work. That doesn't mean she can't see good things on Nvidia's side when things are good...
 
Why would an Nvidia dev respond? It's obvious she is biased and her background is primarily working with AMD.

Why would anyone respond to or care about this post? It is obviously biased, and his/her background is primarily discrediting anyone who makes negative statements about nVidia ;)
 
She is not a specialist, but a dev working in the game industry, and it is what she heard from a third party, if I understand her first tweet correctly...

Are you sure she is a gamedev? My understanding is she is doing compiler stuff at Apple.
 
Women in Tech – An Overview of Deep Learning
Hear from speakers Renee Yao, Product Marketing Manager of Deep Learning and Analytics at NVIDIA, Kari Briski, Director of Deep Learning Software Product at NVIDIA, and Nazanin Zaker, Data Scientist at SAP, on different deep learning use cases, deep learning workflows, and the use of deep learning within the enterprise.


Volta Tensor Core accelerated training
The Training with Mixed-Precision User Guide introduces NVIDIA's latest architecture, Volta, and summarizes the ways a framework can be fine-tuned to gain additional speedups by leveraging Volta's architectural features.
http://docs.nvidia.com/deeplearning/sdk/mixed-precision-training/index.html
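As a rough sketch of what "leveraging the Volta architectural features" means at the CUDA level: the CUDA 9 WMMA intrinsics feed FP16 fragments to the tensor cores while accumulating in FP32, which is the core of the mixed-precision scheme the guide describes. The kernel below is illustrative only; the 16x16x16 tile shape, layouts, and names are my assumptions, not taken from the guide. Requires sm_70 (Volta).

// One warp computes a single 16x16x16 tile: D = A * B (FP16 inputs, FP32 accumulator).
#include <cuda_fp16.h>
#include <mma.h>
using namespace nvcuda;

__global__ void wmma_16x16x16(const half *a, const half *b, float *d) {
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b_frag;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> acc_frag;

    wmma::fill_fragment(acc_frag, 0.0f);            // start the FP32 accumulator at zero
    wmma::load_matrix_sync(a_frag, a, 16);          // leading dimension 16
    wmma::load_matrix_sync(b_frag, b, 16);
    wmma::mma_sync(acc_frag, a_frag, b_frag, acc_frag); // FP16 multiply, FP32 accumulate on tensor cores
    wmma::store_matrix_sync(d, acc_frag, 16, wmma::mem_row_major);
}

// Launched with exactly one warp per tile, e.g. wmma_16x16x16<<<1, 32>>>(dA, dB, dD);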
 
Ok, I'll ask again: what exactly is bad about NV architecture in that posting?

It isn't bad, but it might not be great:

the great irony here is that one of the main selling points: ~~~machine learning~~~ with the ~~~tensor units~~~: gains nothing from all this. it's the dumbest straight-line code you can imagine

absolutely none of this is meaningful for gaming performance in any modern engine sorry
 
In regards to the deadlock, to my understanding, a branch would run ahead and create an infinite loop. On GCN, for example, the scheduler would be selecting waves based on some metric; an infinite loop should bias the selection rather quickly. So it may be slow, but it shouldn't lock: an implicit syncthreads(). Excluding a condition where nothing can advance, which really is a software issue. The scalar unit might be able to detect and resolve those issues at runtime as well, with the parallel INT pipeline on Volta possibly doing the same. The per-thread PC seems more of a software solution than a hardware implementation, which goes back to the whole grouping debate.

Some detail is missing. It's hard to imagine why even explicit lane repacking isn't a thing yet. The only thing I can think of is that each lane has a hard-coded offset into the stack or the register file, which makes it impossible to just transfer execution to another lane without also copying a significant share of the state.
As MDolenc stated, the issue is going beyond the warp size, with the indirection becoming increasingly problematic for hardware. Grouping requires a significantly large pool to pull from for efficiency: taking a threadgroup of 1024 and getting 32 threads doing the same thing. Communication between lanes would be slower, as results would need to be written out to registers. Simply put, indexing into a 1024-wide array becomes problematic, but not impossible. I wouldn't be surprised if that's the direction Volta's successor takes.
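For what it's worth, a software analogue of that grouping already exists inside a single warp: the CUDA 9 ballot/popcount primitives let lanes that took the same branch compute a compacted index. A hypothetical sketch (assumes one 32-thread warp per block; names are made up), which shows why scaling this to a 1024-wide pool is the hard part:

// Software "repacking" within one 32-lane warp: lanes taking the branch
// compact their results to the front of the output. Hardware repacking
// across a 1024-thread group would need the same rank computation, but
// over 32x as many lanes.
__global__ void compact_positive(const int *in, int *out) {
    int lane = threadIdx.x & 31;                        // lane id; assumes blockDim.x == 32
    int v = in[threadIdx.x];
    bool taken = (v > 0);                               // the divergent predicate
    unsigned mask = __ballot_sync(0xffffffffu, taken);  // bitmask of lanes taking the branch
    int rank = __popc(mask & ((1u << lane) - 1));       // my index among lower "taken" lanes
    if (taken)
        out[rank] = v;                                  // compacted, coalesced write
}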
 
In regards to the deadlock, to my understanding, a branch would run ahead and create an infinite loop. On GCN, for example, the scheduler would be selecting waves based on some metric; an infinite loop should bias the selection rather quickly. So it may be slow, but it shouldn't lock: an implicit syncthreads(). Excluding a condition where nothing can advance, which really is a software issue. The scalar unit might be able to detect and resolve those issues at runtime as well, with the parallel INT pipeline on Volta possibly doing the same. The per-thread PC seems more of a software solution than a hardware implementation, which goes back to the whole grouping debate.
The deadlock isn't about the infinite loop specifically, though the infinite loop example also demonstrates it. Say you have a divergent branch in a wave where one path enters an infinite loop waiting for _something_, while one lane in that same wave, on the other path of the branch, does some calculations and sets _something_. Sure, you can execute different waves and even different kernels on that same CU (nothing is stopping NV from doing this as well). But since none of the other waves will set _something_, it all really depends on what happens in that one divergent wave.
If the GPU executes the path that will eventually set _something_ first, it will finish that path and then run the same wave through the other branch with the infinite loop. Since _something_ was set, the infinite loop will break.
If the GPU executes the path that enters the infinite loop waiting for _something_ first, then that wave will be stuck until the GPU gets a reboot. Sure, the GPU might continue on other waves and other kernels in the meantime, but current GPUs can't do anything to unwind from this scenario.
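That scenario is easy to write down as a minimal illustrative CUDA kernel (assuming both lanes are in the same warp; names are made up):

// A divergent producer/consumer handshake inside ONE warp. Pre-Volta, the
// warp executes one branch path at a time to completion: if the spin path
// is picked first, lane 1 never runs and the wave hangs exactly as above.
__global__ void divergent_handshake(volatile int *flag) {
    if (threadIdx.x == 0) {
        while (*flag == 0) { }      // consumer: spin until _something_ is set
    } else if (threadIdx.x == 1) {
        *flag = 1;                  // producer: sets _something_
    }
}

Volta's per-thread program counters let the scheduler interleave the two paths of the branch, so lane 1 can eventually run and break lane 0's loop instead of the wave being stuck until a reboot.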
 