NVIDIA Fermi: Architecture discussion

I know this. The latter part doesn't bother me, the former does. Maybe I'm just myopic but the concept of a compute layer that scales from mobile phones to petaflop clusters sounds like a one-size-fits-nobody solution.
 
PS: Using a Johnny-come-lately's definition over the established ones (AMD/NVIDIA agree on what threads are) makes little sense to me and will generally just cause confusion. Also, I don't think Intel's chosen definitions make a lot of semantic sense to begin with.

Logic fails to understand your statement!

The Johnny-come-lately definitions are those of AMD/Nvidia. The words they are using have been in use for 20+ years and have established definitions and wide understanding within the EE and CE communities.

So in essence, you need to rationalize your standpoint on the issue. Either it's good to just make up random definitions for words with long-established meanings, which leads to zero semantic content and sense, and Nvidia/AMD are doing a good job, OR it's bad and Nvidia/AMD's marketing departments should be slapped for confusing the definitions for their benefit.
 
Logic fails to understand your statement!

I am more than sympathetic. But, to be fair, the "only" thing a standard thread has that an NVthread lacks is that a standard thread is the unit of dispatch. Otherwise, both types of threads contain state of execution (registers) and a PC (even if it isn't the one the warp is running/dispatching). At least, if I understand properly. I am assuming a lot. For example, if a single thread in a warp gains access to an atomic, I'm assuming that doesn't allow every thread in the warp access to it.
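
To illustrate that last point, a minimal CUDA sketch (the kernel and variable names are made up): only the lanes for which the condition holds issue the atomic, so one thread touching an atomic doesn't grant the rest of its warp access to it.

Code:
// Hypothetical kernel: every thread has its own register state (tid below), and
// only the lanes that satisfy the predicate perform the atomic update.
__global__ void lone_atomic(int *counter, const int *data, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;   // per-thread register state
    if (tid < n && data[tid] == 0) {
        // Executed only by lanes where the condition is true; the other lanes of
        // the warp do not implicitly share this atomic access.
        atomicAdd(counter, 1);
    }
}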

Perhaps someone else wants to spool up a compelling yarn for the use of thread? ;)
 
The Johnny-come-lately definitions are those of AMD/Nvidia. The words they are using have been in use for 20+ years and have established definitions and wide understanding within the EE and CE communities.
In the context of GPUs (and this is still the 3D architectures and chips forum) Intel is the latecomer to this party ... language is defined in context. Regardless, strands & fibers are entirely new terms with no history and, as I said, counter-intuitive definitions ... and their highly Larrabee-specific use of the term threads is very much debatable.

What NVIDIA calls threads are threads even in the traditional sense. From the kernel program's point of view they execute independently, branch independently and share a memory space. Their scheduling works wildly differently than on traditional SMP machines, but meh. What Intel calls threads are also threads ... language is flexible. NVIDIA chose that in this context threads would only refer to the threads of execution of the kernel (and not the threads of the SIMD program, nor the different contexts of the SIMD programs the hardware can switch between with vertical multithreading) and Intel did it the other way ... in this context NVIDIA was first with the decision.
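
To make that concrete, a minimal sketch of the programmer's-eye view in CUDA (kernel and buffer names invented): each thread picks its own element, branches on its own data, and reads and writes the same global memory space.

Code:
// From the kernel program's point of view: per-thread index, per-thread control
// flow, one shared global address space.
__global__ void scale_positive(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n)
        return;                  // this thread simply finishes early
    if (data[i] > 0.0f)          // data-dependent branch, decided per thread
        data[i] *= 2.0f;
    else
        data[i] = 0.0f;
}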
 
In the context of GPUs (and this is still the 3D architectures and chips forum) Intel is the latecomer to this party ... language is defined in context. Regardless, strands & fibers are entirely new terms with no history and, as I said, counter-intuitive definitions ... and their highly Larrabee-specific use of the term threads is very much debatable.

And GPUs are the domain of the EE and CE fields, which actually pre-date them! You seem to be under the misunderstanding that GPUs are something entirely new when in fact they are just another application of CompArch knowledge to a slightly different problem. Of the three, only Intel has so far used the proper terminology for a thread. Instead, both Nvidia and ATI are trying to redefine the concept of a thread to effectively mean a data subset.

The truth is that neither Nvidia nor ATI supports even a small fraction of the number of "threads" they claim to support.

What NVIDIA calls threads are threads even in the traditional sense. From the kernel program's point of view they execute independently, branch independently and share a memory space.

They do not execute independently and they do not branch independently. They are merely running 16 datums in parallel and doing conditional updates of registers. If they were real threads, they could be running entirely different instruction streams, which they do not.
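
For reference, this is the behaviour being claimed: in a hypothetical CUDA kernel like the one below, a warp whose lanes disagree on the condition runs both branch bodies back to back, with the inactive lanes masked off, so the register writes amount to conditional (predicated) updates rather than independent instruction streams.

Code:
__global__ void divergent(int *out, const int *in)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (in[i] & 1)
        out[i] = in[i] * 3;      // pass 1: only lanes with odd inputs are active
    else
        out[i] = in[i] / 2;      // pass 2: only lanes with even inputs are active
}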

One of the big hype items for G300 is the support of more than 1 thread active per chip! They call it a kernel, but they really mean thread.


Their scheduling works wildly differently than on traditional SMP machines, but meh.

Sure it does: traditional SMP machines have multiple schedulers; these GPUs had one.

NVIDIA chose that in this context threads would only refer to the threads of execution of the kernel (and not the threads of the SIMD program, nor the different contexts of the SIMD programs the hardware can switch between with vertical multithreading) and Intel did it the other way ... in this context NVIDIA was first with the decision.

You realize that this also goes back to the whole "we have infinite billions of cores" thing Nvidia tried when their marketing thought it would benefit them too, right?

Thread has a fairly well defined meaning. Nvidia isn't using that meaning. Nvidia is wrong.
 
One of the big hype items for G300 is the support of more than 1 thread active per chip! They call it a kernel, but they really mean thread.

Hmm, I thought warps executed and branched independently; otherwise what's the point? It's not clear to me what NV is calling a kernel. I see support for C++ method invocation, so we don't seem stuck within a single routine. What leads you to suspect kernel = thread rather than warp = thread and kernel = program?
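
For what it's worth, the usual CUDA reading is kernel = program: one kernel is launched over a whole grid of thread blocks, and the hardware carves those blocks into warps. A hedged host-side sketch (some_kernel is a hypothetical kernel and d_data is assumed to be an already-allocated device buffer):

Code:
int n = 1 << 20;
int threadsPerBlock = 256;                                 // 8 warps per block
int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;  // enough blocks to cover n
some_kernel<<<blocks, threadsPerBlock>>>(d_data, n);       // one kernel, ~a million "threads"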

MfA said:

If you ignore the execution half of the definition of a term integrally wrapped around the idea of execution, expect negative feedback :)
 
In the context of GPUs (and this is still the 3D architectures and chips forum) Intel is the latecomer to this party ... language is defined in context.
I tend to agree with the Khronos stance on this point, which is that it's a bad idea to use terms that already have a defined meaning in another domain, particularly as the GPU domain tries to merge into the HPC domain. That's why they called them "work groups" and "work items" rather than warps and threads.
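
As a rough translation table (sketched in CUDA terms since that's what most of this thread is about; the kernel name is invented), an OpenCL work-item corresponds to what NVIDIA calls a thread and a work-group to a thread block:

Code:
// OpenCL term         ->  CUDA-side counterpart
// work-item           ->  "thread"
// work-group          ->  thread block
// local memory        ->  __shared__ memory
// get_global_id(0)    ->  blockIdx.x * blockDim.x + threadIdx.x
__global__ void work_item_view(float *out)
{
    int global_id = blockIdx.x * blockDim.x + threadIdx.x;  // one "work item"
    out[global_id] = (float)global_id;
}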

Regardless, strands & fibers are entirely new terms with no history and, as I said, counter-intuitive definitions ... and their highly Larrabee-specific use of the term threads is very much debatable.
That's the point though - it's important to create a new term for a new concept rather than confusing it with a pre-existing term. (FWIW though, "fiber" is a pre-existing term in some OSes including Win32, and it's a similar concept to the current usage, albeit not always exactly the same.)

Furthermore I'm not sure how you can complain about the use of the term "threads" with respect to Larrabee... the definition or usage has not changed at all... it's exactly the same as it has always been, so I'm not sure why you think it is being used in some "Larrabee-specific" way.

What NVIDIA calls threads are threads even in the traditional sense. From the kernel program's point of view they execute independently, branch independently and share a memory space.
Not true, there are much more complicated rules with respect to "warps" that are semantically important.

Furthermore I'd argue that the programming semantics here are far less important than the execution semantics, which is typically how "threads" are defined in my experience. In that sense there are perfectly good SIMD and SPMD language and concepts that have already existed for a long time before GPUs that apply perfectly... there's no need to create new terminology just to *seem* different.

Don't be fooled - the renaming of terminology is pure marketing here and has nothing to do with ease of understanding for programmers (who typically do just fine understanding the hardware concepts). It's just so they can say they run THIRTY THOUSAND threads while "high end multi-socket systems" can only run 16. Yeah NVIDIA's really taking the high road in terms of helping programmer understanding ;)
 
Half of whom (at the brand-name level at least) shouldn't really be involved in the development of any API related to high-performance computing.

The embedded market could have a dedicated API (like OpenGL_ES <-> OpenGL), that's true. As it happens, OpenGL_ES is much less of a mess than OpenGL, which actually indirectly supports DemoCoder's original point.
 
Charlie, hate on nVidia somewhere else/in another thread destined for that.
 
Thread has a fairly well defined meaning. Nvidia isn't using that meaning. Nvidia is wrong.
Except that Nvidia uses the correct meaning, and (despite what others on this thread are claiming) that meaning didn't come from the marketing department. No really, Nvidia knows how to architect chips. For real.
 
Furthermore I'm not sure how you can complain about the use of the term "threads" with respect to Larrabee... the definition or usage has not changed at all... it's exactly the same as it has always been
I didn't say it's changed ... but the fiber is also a thread in the classical sense (and what NVIDIA calls threads are threads too in the classical sense, the software sense where the term came from). Intel are calling hardware threads just threads and dropping the hardware bit ... that shorthand combined with the pre-existing use of the term in this context, and the fact that a fiber made of strands is prima facie ridiculous is just not conducive to proper understanding.

Just dumping everything and using the OpenCL terms without using shorthand for hardware threads in a way seemingly almost consciously designed to cause maximum confusion is good too.
 
I tend to agree with the Khronos stance on this point, which is that it's a bad idea to use terms that already have a defined meaning in another domain, particularly as the GPU domain tries to merge into the HPC domain. That's why they called them "work groups" and "work items" rather than warps and threads.
Yeah, OpenCL at least brings some sanity.

A work group is not equivalent to a warp though. A work group is a set of work items that can all share local memory. OpenCL doesn't have a concept like "warp".
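
A small CUDA sketch of that distinction (block size and names are arbitrary): the unit that shares local memory and can barrier together is the whole work group, i.e. the thread block, which on NVIDIA hardware typically spans several warps.

Code:
// A 256-thread block (work group) is 8 warps on current NVIDIA parts; all of them
// share the same block-local memory and synchronize with a block-wide barrier.
__global__ void blockwide_sharing(const float *in, float *out)
{
    __shared__ float tile[256];                // "local memory" for the whole group
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    tile[threadIdx.x] = in[i];
    __syncthreads();                           // barrier across the group, not just one warp
    out[i] = tile[255 - threadIdx.x];          // read a value written by another warp
}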

Jawed
 
Except that Nvidia uses the correct meaning, and (despite what others on this thread are claiming) that meaning didn't come from the marketing department. No really, Nvidia knows how to architect chips. For real.

Then perhaps you'd like to enlighten us on HOW Nvidia's meaning is correct?
 
I didn't say it's changed ... but the fiber is also a thread in the classical sense (and what NVIDIA calls threads are threads too in the classical sense, the software sense where the term came from).
You and I have different definitions of a "thread in the classical sense". And most of the definitions that I can find online - wikipedia in particular - tend to agree with my definition, but I'm not willing to argue the point. If anything it shows that there's already a lot of confusion surrounding the terms.

Furthermore fibers are not full "threads" in a typical OS and that's the point - they are coroutines that are cooperatively scheduled by the user application, which is precisely the context in which they are being used for discussions on Larrabee. There's no redefinition going on that I've seen and I don't think you can make a real case for it... these usages are consistent with the previous usage of the terms in all major OSes and CPUs that I know of.
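
A toy host-side sketch of that difference (everything below is invented for illustration and is not any OS's actual fiber API): these "fibers" only give up the processor when they return to the user-level scheduler, whereas an OS thread can be preempted at any instruction.

Code:
#include <cstdio>
#include <functional>
#include <vector>

// Toy illustration only: a "fiber" is just user-level state plus a step function.
struct ToyFiber {
    int step;
    bool done;
    std::function<void(ToyFiber &)> run_one_step;
};

int main()
{
    std::vector<ToyFiber> fibers;
    for (int id = 0; id < 3; ++id) {
        ToyFiber f{0, false, nullptr};
        f.run_one_step = [id](ToyFiber &self) {
            std::printf("fiber %d, step %d\n", id, self.step);
            if (++self.step == 4)
                self.done = true;              // a fiber "yields" simply by returning
        };
        fibers.push_back(f);
    }

    // Cooperative round-robin scheduler: ordinary application code, no preemption.
    for (bool alive = true; alive;) {
        alive = false;
        for (auto &f : fibers)
            if (!f.done) { f.run_one_step(f); alive = true; }
    }
    return 0;
}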

Intel are calling hardware threads just threads and dropping the hardware bit ... that shorthand combined with the pre-existing use of the term in this context, and the fact that a fiber made of strands is prima facie ridiculous is just not conducive to proper understanding.
I think Intel is being pretty clear about "hardware threads", but in the same way as with hyper-threading and other technologies. i.e. the threads are real, 100% OS-controlled, preempted, forkable, etc. POSIX "threads" that have real hardware resources dedicated to them. Nowhere do I see claims that these map 1:1 with "cores" (which is yet another awesome term being thrown around to mean "SIMD lane" in the GPU space, because it allows arbitrary inflation of marketing numbers).

Just dumping everything and using the OpenCL terms without using shorthand for hardware threads in a way seemingly almost consciously designed to cause maximum confusion is good too.
I don't see how it's confusing at all. It very clearly describes what you're telling the runtime semantically with work items and work groups, with the also-clear implication that these things map to different execution resources on different devices. It's extremely important to not confound the new concepts with existing ones that are *not the same thing*.

Bob said:
Except that Nvidia uses the correct meaning, and (despite what others on this thread are claiming) that meaning didn't come from the marketing department. No really, Nvidia knows how to architect chips. For real.
Their meaning is inconsistent with the meaning in the CPU and particularly HPC space that long predated them. Hence the confusion and questioning of why they would deliberately overload the term except to confuse people and inflate numbers.

Don't get me wrong, I do a lot of GPU computing and have a lot of respect for NVIDIA, but on this front I just can't cut them any slack. It was a bad call to name/rename concepts that already existed in other spaces as they did, and I'm glad to see Khronos taking the high road on this issue. Whether it was due to marketing, ignorance or it was simply misguided, I think it's pretty clear that it causes more confusion than necessary.
 
There is no way you can define "thread" generally to exclude the claimed "NVIDIA Threads" but also include what is commonly referred to as "thread" on many other architectures.

NVIDIA engineers might know about HPC too.
 
There is no way you can define "thread" generally to exclude the claimed "NVIDIA Threads" but also include what is commonly referred to as "thread" on many other architectures.
What?? NVIDIA's definition of "threads" doesn't even meet the POSIX "definition" and they certainly don't agree with the majority of the wikipedia article on threads. Conversely, they do agree precisely with a predicated SIMD lane, or more generally the SPMD model. That has been well known for a long time and the more technical reviewers called out NVIDIA for introducing the nonsensical "SIMT" nomenclature when a perfectly valid term already existed. To quote AnandTech:

AnandTech said:
NVIDIA wanted us to push some ridiculous acronym for their SM's architecture: SIMT (single instruction multiple thread). First off, this is a confusing descriptor based on the normal understanding of instructions and threads. But more to the point, there already exists a programming model that nicely fits what NVIDIA and AMD are both actually doing in hardware: SPMD, or single program multiple data. This description is most often attached to distributed memory systems and large scale clusters, but it really is actually what is going on here.

Given a definition of thread broad enough to fit what NVIDIA calls a "thread", I might as well start calling the separate bits in each of the ALUs "threads" and multiply the marketing numbers by another 32x and talk about all the fancy atomic *single-cycle* coherent shared memory operations I can do across "threads" like ADD, MUL, etc. It just gets ridiculous if you expand the term to mean "any program written in a scalar fashion that may or may not be run concurrently, predicated, given dedicated hardware resources, in a SIMD lane, ... but really guys you write it like it was an independent thread that gets launched a million times..."

Uhhh... yeah.
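
To spell out what "predicated SIMD lane" means here, a toy host-side emulation (lane count and names invented; it mirrors the kind of divergent kernel discussed above): one instruction stream per warp, an active mask, and both sides of the branch walked in turn.

Code:
#include <cstdio>

int main()
{
    const int WARP = 32;
    int in[WARP], out[WARP];
    bool active[WARP], taken[WARP];
    for (int l = 0; l < WARP; ++l) { in[l] = l; active[l] = true; }

    // Per-"thread" source:  if (in & 1) out = in * 3; else out = in / 2;
    // What the single warp-wide instruction stream actually does:
    for (int l = 0; l < WARP; ++l)
        taken[l] = active[l] && (in[l] & 1);   // evaluate the predicate per lane

    for (int l = 0; l < WARP; ++l)             // pass 1: "then" side, other lanes masked
        if (taken[l]) out[l] = in[l] * 3;
    for (int l = 0; l < WARP; ++l)             // pass 2: "else" side, other lanes masked
        if (active[l] && !taken[l]) out[l] = in[l] / 2;

    for (int l = 0; l < WARP; ++l)
        std::printf("%d ", out[l]);
    std::printf("\n");
    return 0;
}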

NVIDIA engineers might know about HPC too.
I know a lot of them and they definitely do (I have nothing but respect for them!), but that's entirely beside the point. By the same token I could just say that "Khronos might know something about standardization of terminology", which is actually much more relevant...
 
*shrug*. What's a thread then? Explain what is missing from the NVIDIA architecture that would make its threads mere "predicated SIMD lanes".
 