AMD: R8xx Speculation

How soon will Nvidia respond with GT300 to the upcoming ATI RV870 GPU lineup?

  • Within 1 or 2 weeks

    Votes: 1 0.6%
  • Within a month

    Votes: 5 3.2%
  • Within couple months

    Votes: 28 18.1%
  • Very late this year

    Votes: 52 33.5%
  • Not until next year

    Votes: 69 44.5%

  • Total voters
    155
  • Poll closed.
ATI seems to have uploaded the OpenCL beta drivers again, but without the 5900 driver IDs in the INFs. I'm guessing they weren't supposed to be letting us know about the 5900 series yet.

That's what marketing's all about, isn't it? And AMD has done a fantastic and highly efficient job with the marketing for HD5k.
 
To me, superscalar architecture is just one type of ILP extraction. VLIW architectures (AMD GPUs, Itanium) extract it at compile time. Dynamic superscalar designs (the Pentium and all modern CPUs) extract it at run time.
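As a toy illustration (not any real ISA or compiler), the static side of this can be sketched as a greedy pass that packs adjacent independent instructions into fixed-width bundles, roughly what a VLIW compiler does ahead of time, whereas a dynamic superscalar core performs the equivalent dependence checks in hardware every cycle:

```python
# Toy sketch, hypothetical three-operand instructions (dest, src1, src2).
# A greedy compile-time scheduler packs adjacent independent instructions
# into 2-wide "VLIW" bundles; the checks it does here once, statically,
# are the ones a dynamic superscalar core repeats in hardware at run time.

def independent(a, b):
    """b may issue alongside a if b reads nothing a writes
    and the two don't write the same register."""
    return a[0] not in (b[1], b[2]) and a[0] != b[0]

def bundle(instrs, width=2):
    bundles, i = [], 0
    while i < len(instrs):
        slot = [instrs[i]]
        # Greedy: only try to pair with the immediately following instruction.
        if i + 1 < len(instrs) and len(slot) < width \
                and independent(instrs[i], instrs[i + 1]):
            slot.append(instrs[i + 1])
            i += 2
        else:
            i += 1
        bundles.append(slot)
    return bundles

# r2 = r0+r1 and r3 = r0+r1 are independent; r4 = r2+r3 depends on both.
prog = [("r2", "r0", "r1"), ("r3", "r0", "r1"), ("r4", "r2", "r3")]
print(bundle(prog))  # first two pack into one bundle; the third issues alone
```

This is deliberately simplistic (adjacent pairs only, no WAR handling), but it shows where the ILP-extraction work lives in the static case.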

That's what that whole tiff boiled down to. Whether the definition of superscalar implies dynamic scheduling/issuing of instructions by the hardware to the various available execution units. As far as I know, it does.
 
Yeah, that's the problem in a nutshell. The term "dynamic superscalar" clearly implies dynamic extraction of ILP. Once you remove the word "dynamic" things get fuzzy; you see papers talking about "static superscalar" meaning VLIW.

I even found one paper that distinguished between "static superscalar" and VLIW depending on whether instruction N could use the results of instruction N-1. By that definition "VLIW" needed the equivalent of a delay slot while "static superscalar" did not. Perversely enough, since the 6xx/7xx shaders can always access the results from the immediately preceding ALU instruction via the PS/PV registers by that definition the 6xx+ shaders are superscalar and *not* VLIW :D

http://courses.ece.ubc.ca/476/www200.../Lecture29.pdf

I looked at maybe 50 links; roughly 2/3 seem to say that VLIW is *not* a type of superscalar architecture and the rest said that it was.
 
That's what that whole tiff boiled down to. Whether the definition of superscalar implies dynamic scheduling/issuing of instructions by the hardware to the various available execution units. As far as I know, it does.

\appeal to authority

Hennessy & Patterson, page 115.

They describe three kinds of superscalar processors:

  • static, aka in-order; e.g., ARM
  • dynamic, aka OoO but without speculation; no examples given
  • speculative, aka OoO with speculation; e.g., modern x86

So plain "superscalar" is an ambiguous term. When somebody uses just the word superscalar, I take it to mean static superscalar. Your definitions/conventions/tastes may vary...
 
I've not seen anyone turn up their nose at a design that can fetch, issue, and execute multiple instructions or the equivalent of multiple independent instructions at once, regardless of method, which is an implementation detail.

When I hear or read someone describe a core as being superscalar, I assume that the design can generally process more than one instruction at a time.
I say generally because designs typically are not set up to support full issue/decode/execution for every combination of instructions possible at their given width, and some are much more limited than others.


I am curious where people would put a design capable of fetching and issuing multiple instructions, with the caveat that the design eschews dependence checking by doing a scalar fetch from multiple threads.
 
I've not seen anyone turn up their nose at a design that can fetch, issue, and execute multiple instructions or the equivalent of multiple independent instructions at once, regardless of method, which is an implementation detail.

When I hear or read someone describe a core as being superscalar, I assume that the design can generally process more than one instruction at a time.
I say generally because designs typically are not set up to support full issue/decode/execution for every combination of instructions possible at their given width, and some are much more limited than others.

I am curious where people would put a design capable of fetching and issuing multiple instructions, with the caveat that the design eschews dependence checking by doing a scalar fetch from multiple threads.

That's not superscalar. The best example is Niagara 2, which is decidedly not superscalar.

Superscalar implies that you can (under most circumstances) fetch, issue, execute and retire multiple instructions in a single cycle from a single thread.
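The contrast being drawn here can be sketched with a toy model (purely illustrative, no real hardware behavior modeled): a superscalar front end fetches multiple instructions per cycle from one thread, while a Niagara-style barrel front end fetches one instruction per cycle from rotating threads, which is exactly why it never needs same-cycle dependence checking:

```python
# Hypothetical toy front ends. Each yielded list is "what was fetched
# this cycle". A 2-wide superscalar fetch pulls multiple instructions
# per cycle from ONE thread; a barrel fetch pulls ONE instruction per
# cycle, round-robin across threads, so no two instructions fetched in
# the same cycle can ever depend on each other.

def superscalar_fetch(thread, width=2):
    """Up to `width` instructions per cycle from a single thread."""
    for i in range(0, len(thread), width):
        yield thread[i:i + width]        # multiple instructions, one thread

def barrel_fetch(threads):
    """One instruction per cycle, rotating across threads."""
    idx = [0] * len(threads)
    remaining = sum(map(len, threads))
    t = 0
    while remaining:
        if idx[t] < len(threads[t]):
            yield [threads[t][idx[t]]]   # one instruction per cycle
            idx[t] += 1
            remaining -= 1
        t = (t + 1) % len(threads)

t0 = ["A0", "A1", "A2"]
t1 = ["B0", "B1"]
print(list(superscalar_fetch(t0)))   # [['A0', 'A1'], ['A2']]
print(list(barrel_fetch([t0, t1])))  # [['A0'], ['B0'], ['A1'], ['B1'], ['A2']]
```

Under the definition above, only the first front end is superscalar; the second gets its throughput from thread-level rather than instruction-level parallelism.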

David
 
I meant within a core, and at that level Niagara is single-issue.

edit: or is it? That's how I remember it being presented.

edit edit: Sorry, I read that too fast, you said Niagara 2.
 
Think of the first Pentium: an in-order, superscalar core. The same applies to the early UltraSPARCs, Alphas, and even IBM's POWER6: superscalar, albeit in-order, and not VLIW. VLIW machines, by contrast, rely on compile-time instruction scheduling for ILP extraction. I guess some of you would call that "static superscalar".
 
The operative question is whether anyone would be confused by just calling a superscalar core "superscalar".

If the chip exploits ILP (per the earlier clarification) by fetching and executing multiple instructions from a single thread at the same time, it is superscalar.
 
The operative question is whether anyone would be confused by just calling a superscalar core "superscalar".
After beating the "what makes a thread" topic to death, one can get really confused even when looking at relatively simple terms.
It just seems to me that plain "superscalar" is quite often mistakenly taken as "OoO superscalar" or - "dynamic superscalar".
 
After beating the "what makes a thread" topic to death, one can get really confused even when looking at relatively simple terms.
It just seems to me that plain "superscalar" is quite often mistakenly taken as "OoO superscalar" or - "dynamic superscalar".

OoO and dynamic are terms that are fully orthogonal to whether a design is superscalar.

The great "thread" debate centers on a weakening of language that I do not see a parallel for in the usage of superscalar.
That debate was a question over whether a given entity in a set implementation counted as a thread.

It has been accepted that any scheme that extracts ILP by fetching, issuing, and executing multiple instructions per cycle is superscalar, and this has been acceptable usage for in-order, VLIW, EPIC, and OoO designs for decades.
 
Mr Demers seems to disagree.

Also, by your own definition I don't see how VLIW qualifies. After all, the hardware is only fetching and decoding a single instruction, isn't it?
 
Also, by your own definition I don't see how VLIW qualifies. After all, the hardware is only fetching and decoding a single instruction, isn't it?
Some early VLIWs didn't even decode; the instruction word was the set of control signals that would have come out of a decoder, had one been present.

I'll leave the long-instruction word items off my list if they don't fit.
 
Yeah he makes the distinction here.



bridgman, do you work for AMD? I see you refer to them as "us" over at Phoronix.

To be fair, the quote "Eric: Actually, it's not really superscalar...more like VLIW" doesn't explicitly say otherwise; note in particular the "it's not really" and the "like VLIW".
 
Now that I've had a night to sleep on it and review that section of Patterson and Hennessy, I admit that my earlier VLIW/superscalar confusion was some kind of brain fart. The distinction has been made between the two methods of extracting parallelism from the instruction stream.
 