AMD: R8xx Speculation

ECH · Oct 16, 2009

Silent_Buddha said:
His question was the answer. AMD hasn't made an announcement. Therefore, it's up to you to determine whether you think the rumor is true or not.

All you're going to find out there are various rumors and speculations on rumors.

Regards,
SB

Had you properly understood my question, instead of replying to his and attributing it to my post(s), you would clearly see that he is only asking what I am asking, i.e., is there any truth to the rumors? This is why you don't answer a question with a question.

Therefore, if you don't know then you don't know. Keep in mind that such inquiries are common in a speculation thread such as this.

trinibwoy · Oct 16, 2009

Does anyone know what the L1 and LDS bandwidth is on Cypress? The slide deck says 960 dwords/cycle but I'm assuming that's combined L1+LDS.

Also, is it still true that all operands must first be fetched from the LDS into the register file before being made available to the shader core? Just wondering how the core is fed such a huge number of operands per cycle.

neliz · Oct 16, 2009

ECH said:
Therefore, if you don't know then you don't know. Keep in mind that such inquiries are common in a speculation thread such as this.

If you want numbers... well, it's 50-50 right now.

And regarding "FUD", it's actually anti-Fermi FUD: this card would be very close to projected Fermi performance while costing less.

seahawk · Oct 16, 2009

As a pure Fermi counter it will only be coming if

a) Fermi is really faster than the 5870
b) Fermi is not so much faster than the 5870 that a 5890 becomes pointless

and when ATI knows the final clockspeed of Fermi.

I personally think they will bring a 5980 using selected chips when Fermi goes retail.

Silent_Buddha · Oct 16, 2009

ECH said:
Had you properly understood my question, instead of replying to his and attributing it to my post(s), you would clearly see that he is only asking what I am asking, i.e., is there any truth to the rumors? This is why you don't answer a question with a question.
Therefore, if you don't know then you don't know. Keep in mind that such inquiries are common in a speculation thread such as this.

Well, in that case the short answer would be:

Anyone that knows whether that is true or not wouldn't be able to say it's true if it is true.

Everyone else doesn't know if it's true or not.

But they may have or claim to have a source that knows.

In which case, we're back to square one, whether you believe the rumor is true or not.

Regards,
SB

AlexV · Oct 16, 2009

trinibwoy said:
Does anyone know what the L1 and LDS bandwidth is on Cypress? The slide deck says 960 dwords/cycle but I'm assuming that's combined L1+LDS.

And for the chip overall. But if they've given you that, they've also given you LDS fetch rate, because you know how many 32-bit fetches (dword) can be done from L1, so the difference is LDS. Unless that slide is hyper creative.

Jawed · Oct 16, 2009

Which would seem to suggest that LDS fetch rate is 640 dwords per clock, which is twice L1 fetch rate - which is known to be 80 x 4 = 320 dwords per clock.

Which would also seem to suggest that LDS in R800 is two 16KB blocks accessed in parallel, since this rate is double the per-16KB-LDS rate of R700.
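
A quick sketch of that arithmetic, assuming the 960 dwords/clock slide figure really is L1 plus LDS for the whole chip and that the two are additive (which is not confirmed anywhere). The 80 x 4 L1 rate is the figure above; the 20-SIMD count for Cypress is used only to get a per-SIMD number, and the variable names are just for illustration.

```python
# Whole-chip figures, in dwords per clock.
TOTAL_DWORDS_PER_CLK = 960   # slide-deck figure, ASSUMED to be L1 + LDS combined
TEXTURE_UNITS = 80           # Cypress texture units
L1_DWORDS_PER_UNIT = 4       # L1 fetch rate per texture unit per clock
SIMD_COUNT = 20              # Cypress SIMDs

# L1 fetch rate for the whole chip: 80 x 4.
l1_rate = TEXTURE_UNITS * L1_DWORDS_PER_UNIT

# If the slide figure is additive, whatever is left over must be LDS.
lds_rate = TOTAL_DWORDS_PER_CLK - l1_rate

# Per-SIMD LDS rate implied by that remainder.
lds_per_simd = lds_rate // SIMD_COUNT

print(l1_rate, lds_rate, lds_per_simd)  # 320 640 32
```

If the slide also counts LDS writes, or if L1 and LDS accesses cannot overlap, the 640 figure does not hold and the split has to be redone.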

Jawed

3dilettante · Oct 16, 2009

Was the bandwidth for L1 and LDS additive in RV770?

CarstenS · Oct 16, 2009

Do writes not also add to L1+LDS-Bandwidth?

Jawed · Oct 16, 2009

3dilettante said:
Was the bandwidth for L1 and LDS additive in RV770?

Ooh, good point. Forgot about that. Erm, I doubt it.

I've always assumed they're not additive. But I suppose it's conceivable that they are.

LDS in R700 assembly appears as a TEX clause. I've taken this to imply that texture fetches and LDS fetches (or stores) are mutually exclusive.

Hmm...

Jawed

Jawed · Oct 16, 2009

CarstenS said:
Do writes not also add to L1+LDS-Bandwidth?

Perhaps - there's the concept of data exchange between threads that can run within a single LDS clause. So an LDS read and an LDS write could be going simultaneously.

Jawed

3dilettante · Oct 16, 2009

I haven't turned up something stating that TEX and LDS accesses can run simultaneously, though nothing stating they can't.

If they can in RV770, and AMD obfuscates its bandwidth math a little by counting write traffic to the LDS in the total, Cypress would have double the bandwidth of its predecessor by virtue of the doubled SIMD count only.
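
To put rough numbers on that hypothesis: a minimal sketch, assuming the per-SIMD L1+LDS rate is identical on both chips and only the SIMD count changes (10 on RV770, 20 on Cypress). The RV770 total it produces is derived from that assumption, not a published figure, and `chip_bandwidth` is a made-up helper name.

```python
CYPRESS_TOTAL = 960   # dwords/clock, slide-deck figure for Cypress
CYPRESS_SIMDS = 20
RV770_SIMDS = 10


def chip_bandwidth(per_simd: int, simds: int) -> int:
    """Whole-chip rate if every SIMD contributes the same per-clock rate."""
    return per_simd * simds


# Per-SIMD rate implied by the Cypress slide figure.
per_simd = CYPRESS_TOTAL // CYPRESS_SIMDS

# HYPOTHETICAL RV770 total, valid only if the per-SIMD rate is unchanged.
rv770_total = chip_bandwidth(per_simd, RV770_SIMDS)

print(per_simd, rv770_total, CYPRESS_TOTAL / rv770_total)  # 48 480 2.0
```

Under that assumption the doubling falls out of the SIMD count alone, with no change to the LDS itself.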

Jawed · Oct 16, 2009

That sounds like the simplest answer.

Jawed

Spyhawk · Oct 16, 2009

I've been reading several reviews that mention that ATI GPUs are "VLIW superscalar". Is this correct, even just a bit, or is it not correct at all and just marketing mumbo jumbo?

bridgman · Oct 16, 2009

There's a good description in the B3D Cypress article :

http://www.beyond3d.com/content/reviews/53/5

... and more detail in section 4 of the R700 Instruction Set Architecture doc at amd.com :

http://developer.amd.com/gpu_assets/R700-Family_Instruction_Set_Architecture.pdf

The shader core organizes the ALUs in sets of 5, and each ALU shader instruction (called an instruction group) includes up to 5 different opcodes (operation + inputs/outputs), one for each ALU. So... superscalar via very long instruction words.
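
To make the "decided at compile time" part concrete, here is a toy sketch of packing independent operations into instruction groups of up to 5 slots. It is not AMD's shader compiler and ignores the real slot restrictions described in the ISA doc; `Op` and `pack` are hypothetical names, and the in-order greedy strategy is only one simple way to do it.

```python
from typing import List, NamedTuple, Tuple


class Op(NamedTuple):
    dest: str               # register written by the operation
    srcs: Tuple[str, ...]   # registers read by the operation


SLOTS = 5  # ALUs per set, i.e. maximum opcodes per instruction group


def pack(ops: List[Op], slots: int = SLOTS) -> List[List[Op]]:
    """Greedily pack ops, in program order, into groups of independent ops."""
    groups: List[List[Op]] = []
    current: List[Op] = []
    written = set()  # registers written by ops already in the current group
    for op in ops:
        # An op cannot share a group with an op whose result it reads
        # or whose destination it overwrites.
        depends = op.dest in written or any(s in written for s in op.srcs)
        if current and (len(current) == slots or depends):
            groups.append(current)
            current, written = [], set()
        current.append(op)
        written.add(op.dest)
    if current:
        groups.append(current)
    return groups


ops = [
    Op("a", ("x", "y")),
    Op("b", ("x", "z")),
    Op("c", ("y", "z")),
    Op("d", ("a", "b")),  # needs a and b, so it starts a new group
    Op("e", ("c", "d")),  # needs d, so it starts another group
]
print([len(g) for g in pack(ops)])  # [3, 1, 1]
```

The group sizes show the catch: how many of the 5 slots get filled depends entirely on how much independent work the compiler can find in the shader.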

Mat3 · Oct 17, 2009

Spyhawk said:
I've been reading several reviews that mention that ATI GPUs are "VLIW superscalar". Is this correct, even just a bit, or is it not correct at all and just marketing mumbo jumbo?

I'm guessing you've been reading the Hardforums board lately?

I say the "superscalar" part is mostly marketing.

bridgman · Oct 17, 2009

I'm not sure I understand why that thread got so hot. [EDIT - now I do; it was like that before the superscalar discussion started.]

I don't think there is any debate about how the ATI chips work, just a lack of agreement on the definition of "superscalar". Some definitions of superscalar include VLIW while others do not.

I found references to "static superscalar" implementations (aka VLIW) where opportunities for superscalar execution are determined at compile time, and "dynamic superscalar" implementations where the hardware analyzes dependencies and identifies opportunities for superscalar execution at run time.

hoom · Oct 17, 2009

Could it be argued that the splitting of the load between SIMDs is Superscalar?
Or is that way off base?

rpg.314 · Oct 17, 2009

To me, superscalar architecture is just one type of ILP extraction. VLIW architectures (AMD GPUs, Itanium) extract this at compile time. Dynamic superscalar designs (Pentium and all modern CPUs) extract it at run time.

Broken Hope · Oct 17, 2009

ATI seems to have uploaded the OpenCL beta drivers again, but without the 5900 driver IDs in the INFs. I'm guessing they weren't supposed to be letting us know about the 5900 series yet.
