AMD: R8xx Speculation

fellix · Oct 17, 2009

Meh, it's too late. The genie is out of the [strike]whoopass can[/strike] bottle. :smile:

CarstenS · Oct 17, 2009

Broken Hope said:
ATI seems to have uploaded the OpenCL beta drivers again but without the 5900 driver ID's in the inf's. I'm guessing they weren't supposed to be letting us know about the 5900 series yet.

That's what marketing's all about, isn't it? And AMD has done a fantastic and highly efficient job wrt marketing for HD5k.

trinibwoy · Oct 18, 2009

rpg.314 said:
To me, superscalar architecture is just one type of ILP extraction. VLIW architectures (AMD GPUs, Itanium) extract this at compile time. Dynamic superscalar architectures (Pentium and all modern CPUs) extract this at run time.

That's what that whole tiff boiled down to. Whether the definition of superscalar implies dynamic scheduling/issuing of instructions by the hardware to the various available execution units. As far as I know, it does.
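
To make the compile-time side of that distinction concrete, here is a minimal Python sketch of a greedy packer that groups independent instructions into fixed-width bundles, roughly what a VLIW compiler does before the hardware ever sees the code. The instruction format `(dest, srcs)`, the default width of 5, and the name `pack_bundles` are all made up for illustration; this is not how any real shader compiler works, and it assumes reads within a bundle happen before writes.

```python
from typing import List, Set, Tuple

# Hypothetical instruction format: (destination register, source registers)
Instr = Tuple[str, Tuple[str, ...]]

def pack_bundles(program: List[Instr], width: int = 5) -> List[List[Instr]]:
    """Greedy, in-order packing of instructions into bundles.

    A new bundle is started when the next instruction reads or overwrites
    a register written earlier in the current bundle, or when the bundle
    already holds `width` instructions.
    """
    bundles: List[List[Instr]] = []
    current: List[Instr] = []
    written: Set[str] = set()  # registers written by the current bundle
    for dest, srcs in program:
        depends = dest in written or any(s in written for s in srcs)
        if current and (depends or len(current) == width):
            bundles.append(current)
            current, written = [], set()
        current.append((dest, srcs))
        written.add(dest)
    if current:
        bundles.append(current)
    return bundles

program = [
    ("r0", ("a", "b")),
    ("r1", ("c", "d")),
    ("r2", ("r0", "r1")),  # needs r0 and r1, so it cannot share their bundle
    ("r3", ("e",)),
    ("r4", ("r2", "r3")),  # needs r2 and r3
]
bundles = pack_bundles(program)
```

With this input the five instructions end up in three bundles. A dynamic design would make essentially the same dependence checks, only in hardware at issue time instead of in the compiler.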

bridgman · Oct 18, 2009

Yeah, that's the problem in a nutshell. The term "dynamic superscalar" clearly implies dynamic extraction of ILP. Once you remove the word "dynamic" things get fuzzy; you see papers talking about "static superscalar" meaning VLIW.

I even found one paper that distinguished between "static superscalar" and VLIW depending on whether instruction N could use the results of instruction N-1. By that definition "VLIW" needed the equivalent of a delay slot while "static superscalar" did not. Perversely enough, since the 6xx/7xx shaders can always access the results from the immediately preceding ALU instruction via the PS/PV registers, by that definition the 6xx+ shaders are superscalar and *not* VLIW.

http://courses.ece.ubc.ca/476/www200.../Lecture29.pdf

I looked at maybe 50 links; roughly 2/3 seem to say that VLIW is *not* a type of superscalar architecture and the rest said that it was.
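
A rough way to picture that paper's distinction, as I read it: if instruction N cannot see the result of instruction N-1, something has to fill the slot between them. The sketch below is only an illustration under that assumption; the `(dest, srcs)` instruction format, the `NOP` marker, and the function name `insert_delay_slots` are invented for the example and do not come from the paper or from any real ISA.

```python
from typing import List, Optional, Tuple

# Hypothetical instruction format: (destination register, source registers)
Instr = Tuple[Optional[str], Tuple[str, ...]]

NOP: Instr = (None, ())  # made-up filler instruction

def insert_delay_slots(program: List[Instr]) -> List[Instr]:
    """Insert a NOP wherever an instruction reads the result of the
    instruction immediately before it.

    This models a machine where instruction N cannot use the result of
    instruction N-1. A machine that can forward that result would run
    the program unchanged.
    """
    out: List[Instr] = []
    prev_dest: Optional[str] = None
    for dest, srcs in program:
        if prev_dest is not None and prev_dest in srcs:
            out.append(NOP)
        out.append((dest, srcs))
        prev_dest = dest
    return out

program = [
    ("r0", ("a",)),
    ("r1", ("r0",)),  # reads the result of the previous instruction
    ("r2", ("b",)),
]
padded = insert_delay_slots(program)
```

Here `padded` is one instruction longer than `program`, with the filler sitting between the `r0` and `r1` instructions. A design with something like PV/PS forwarding would not need the filler at all.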

rpg.314 · Oct 18, 2009

trinibwoy said:
That's what that whole tiff boiled down to. Whether the definition of superscalar implies dynamic scheduling/issuing of instructions by the hardware to the various available execution units. As far as I know, it does.

\appeal to authority

Hennessy & Patterson, page 115:

they describe 3 kinds of superscalar processors

static, aka in-order, e.g. ARM
dynamic, aka OoO but no speculation, no examples
speculative, aka OoO with speculation, e.g. modern x86

so plain superscalar is an ambiguous term. When somebody uses just the word superscalar, I take it to mean static superscalar. Your definitions/conventions/tastes may vary...

3dilettante · Oct 19, 2009

I've not seen anyone turn up their nose at a design that can fetch, issue, and execute multiple instructions or the equivalent of multiple independent instructions at once, regardless of method, which is an implementation detail.

When I hear or read someone describe a core as being superscalar, I assume that the design can generally process more than one instruction at a time.
I say generally because designs typically are not set up to support full issue/decode/execution for every combination of instructions possible at their given width, and some are much more limited than others.

I am curious where people would put a design capable of fetching and issuing multiple instructions, with the caveat that the design eschews dependence checking by doing a scalar fetch from multiple threads.

dkanter · Oct 19, 2009

3dilettante said:
I've not seen anyone turn up their nose at a design that can fetch, issue, and execute multiple instructions or the equivalent of multiple independent instructions at once, regardless of method, which is an implementation detail.

When I hear or read someone describe a core as being superscalar, I assume that the design can generally process more than one instruction at a time.
I say generally because designs typically are not set up to support full issue/decode/execution for every combination of instructions possible at their given width, and some are much more limited than others.

I am curious where people would put a design capable of fetching and issuing multiple instructions, with the caveat that the design eschews dependence checking by doing a scalar fetch from multiple threads.

That's not superscalar - the best example is Niagara 2 which is decidedly not superscalar.

Superscalar implies that you can (under most circumstances) fetch, issue, execute and retire multiple instructions in a single cycle from a single thread.

David
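
A toy Python sketch of why the "single thread" part matters, under heavy simplifying assumptions: every instruction takes one cycle, there are no stalls, and each cycle the core issues at most one instruction from each of up to `width` threads. The function name and the thread labels are invented for the example, and this is not a model of Niagara 2 or of any real core.

```python
from typing import Dict, List

def multithreaded_scalar_issue(threads: Dict[str, int], width: int = 2) -> List[List[str]]:
    """Return, per cycle, the names of the threads that issued.

    `threads` maps a thread name to its number of remaining instructions.
    Each cycle at most one instruction is taken from each thread, and at
    most `width` threads issue, so no dependence check between the issued
    instructions is ever needed.
    """
    remaining = dict(threads)
    schedule: List[List[str]] = []
    while any(n > 0 for n in remaining.values()):
        ready = [t for t, n in remaining.items() if n > 0]
        issued = ready[:width]  # simple fixed priority, not round-robin
        for t in issued:
            remaining[t] -= 1
        schedule.append(issued)
    return schedule

two_threads = multithreaded_scalar_issue({"t0": 3, "t1": 3})
one_thread = multithreaded_scalar_issue({"t0": 3})
```

With two threads the core issues two instructions per cycle, but each thread still advances by only one instruction per cycle, and a lone thread never gets more than one. That per-thread limit is the property that fails the single-thread criterion above.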

3dilettante · Oct 19, 2009

I meant within a core, and at that level Niagara is single-issue.

edit: or is it? That's how I remember it being presented.

edit edit: Sorry, I read that too fast, you said Niagara 2.

kresek · Oct 19, 2009

Think of the first Pentium - an in-order, superscalar core. The same applies to early UltraSPARCs, Alphas, or even IBM's POWER6 - superscalar, albeit in-order; not VLIW at the same time. But as for VLIW machines, ILP extraction relied on compile time instruction scheduling. Guess some of you would call this "static superscalar".

3dilettante · Oct 19, 2009

The operative question is whether anyone would be confused by just calling a superscalar core "superscalar".

If the chip exploits ILP (per the earlier clarification) by fetching and executing multiple instructions from a single thread at the same time, it is superscalar.

kresek · Oct 19, 2009

3dilettante said:
The operative question is whether anyone would be confused by just calling a superscalar core "superscalar".

After beating the "what makes a thread" topic to death, one can get really confused even when looking at relatively simple terms.
It just seems to me that plain "superscalar" is quite often mistakenly taken as "OoO superscalar" or - "dynamic superscalar".

3dilettante · Oct 19, 2009

kresek said:
After beating the "what makes a thread" topic to death, one can get really confused even when looking at relatively simple terms.
It just seems to me that plain "superscalar" is quite often mistakenly taken as "OoO superscalar" or - "dynamic superscalar".

OoO and dynamic are terms that are fully orthogonal to whether a design is superscalar.

The great "thread" debate centers on a weakening of language that I do not see a parallel for in the usage of superscalar.
That debate was a question over whether a given entity in a set implementation counted as a thread.

It has been accepted that any scheme that extracts ILP by fetching, issuing, and executing multiple instructions per cycle is superscalar, and this has been an acceptable usage for decades for designs that have been in-order, VLIW, EPIC, or OoO.

trinibwoy · Oct 19, 2009

Mr Demers seems to disagree.

Also, by your own definition I don't see how VLIW qualifies. After all the hardware is only fetching and decoding a single instruction isn't it?

3dilettante · Oct 19, 2009

Disagree on what? That AMD units are VLIW superscalar?

3dilettante · Oct 19, 2009

trinibwoy said:
Also, by your own definition I don't see how VLIW qualifies. After all the hardware is only fetching and decoding a single instruction isn't it?

Some early VLIWs didn't even decode, the instruction word was the set of command signals that would have come out of a decoder, if it were present.

I'll leave the long-instruction word items off my list if they don't fit.

trinibwoy · Oct 20, 2009

3dilettante said:
Disagree on what? That AMD units are VLIW superscalar?

Yeah he makes the distinction here.

Eric: Actually, it's not really superscalar...more like VLIW

bridgman, do you work for AMD? I see you refer to them as "us" over at Phoronix.

FrameBuffer · Oct 20, 2009

trinibwoy said:
Yeah he makes the distinction here.

bridgman, do you work for AMD? I see you refer to them as "us" over at Phoronix.

To be fair, the quote "Eric: Actually, it's not really superscalar...more like VLIW" doesn't explicitly say otherwise, in particular the "it's not really" and "like VLIW" parts.

bridgman · Oct 20, 2009

trinibwoy said:
bridgman, do you work for AMD? I see you refer to them as "us" over at Phoronix.

Yes I do. You might know me as "interview #43"

http://www.beyond3d.com/content/interviews/43

3dilettante · Oct 20, 2009

Now that I've had a night to sleep on it and review that section of Patterson and Hennessy, I admit that my earlier VLIW/superscalar confusion was some kind of brain fart. The distinction has been made between the two methods of extracting parallelism from the instruction stream.

trinibwoy · Oct 20, 2009

bridgman said:
Yes I do. You might know me as "interview #43"

http://www.beyond3d.com/content/interviews/43

Ooooh nice, bridgman is your real name...
