Trinity vs Ivy Bridge

swaaye · Jul 2, 2011

Only Cayman (6950) is VLIW4. Trinity is indeed VLIW4 as well according to reports/rumors. Maybe one day we'll actually have some use for VLIW4 (GPGPU). But then AMD did just show us how they want to leave it behind too.

LordEC911 · Jul 2, 2011

swaaye said:
Only Cayman (6950) is VLIW4. Trinity is indeed VLIW4 as well according to reports/rumors. Maybe one day we'll actually have some use for VLIW4 (GPGPU). But then AMD did just show us how they want to leave it behind too.

I was talking about the supposed Freudian slip from Mr. Houston...

PCPer said:
7:55 Trinity has a "6850" kind of thing....interesting....
7:55 I think that slipped!
7:56 But then he stated Trinity would be "VLIW4" so Cayman-based... interesting.

http://www.pcper.com/news/Graphics-Cards/AMD-Fusion-Developer-Summit-2011-Live-Blog

Dave Baumann · Jul 2, 2011

That was a confusion. He thought that the 6800 was based on VLIW4. He meant to say that the architecture is based on the 6900 series.

LordEC911 · Jul 2, 2011

Dave Baumann said:
That was a confusion. He thought that the 6800 was based on VLIW4. He meant to say that the architecture is based on the 6900 series.

Awww... well that's no fun for silly season.
So back to 8-10 SIMDs.

Kaotik · Jul 3, 2011

Dave Baumann said:
That was a confusion. He thought that the 6800 was based on VLIW4. He meant to say that the architecture is based on the 6900 series.

Since it's now "the past", was 6800-series meant to be VLIW4, but 32-40nm case forced it to be VLIW5?

mczak · Jul 3, 2011

ToTTenTranz said:
Desktop versions actually have decently-clocked DDR3 chips.

I've also heard that in some cases it's only a 16-bit bus, but I'm pretty sure the 780G in my Ferrari One is using a 32-bit Sideport with 384MB. The access to UMA is blocked through the BIOS, though.

OK, you made me curious, and this is what I found: sideport on RS7xx chipsets is always 16-bit, DDR2 or DDR3. It looks like earlier boards tended to use DDR2-1066 and later ones DDR3-1333 (but that's just a rough guideline). Still, the bandwidth is pathetic in any case.
And I've got very, very serious doubts about your sideport size. 384MB might be the total allocated framebuffer memory (UMA + sideport). If you've got DDR2 sideport then the sideport size is likely 64 or 128MB (one 512Mb or 1Gb chip); if it's DDR3 it's most likely 128MB (one 1Gb chip).
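
The figures above can be sanity-checked with a bit of arithmetic. This is only a rough sketch: it assumes the nominal DDR2-1066 / DDR3-1333 transfer rates and a single DRAM chip on the 16-bit sideport, and the function names are made up for illustration.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, transfers_mt_s: int) -> float:
    """Theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * transfers_mt_s * 1e6 / 1e9


def chip_capacity_mb(density_mbit: int) -> int:
    """Capacity in megabytes of one DRAM chip whose density is given in megabits."""
    return density_mbit // 8


if __name__ == "__main__":
    # 16-bit sideport at the two speeds mentioned in the post
    for name, rate in (("DDR2-1066", 1066), ("DDR3-1333", 1333)):
        print(f"16-bit {name}: {peak_bandwidth_gb_s(16, rate):.2f} GB/s")

    # For comparison (assumption): a 128-bit dual-channel DDR3-1333 UMA setup
    print(f"128-bit DDR3-1333: {peak_bandwidth_gb_s(128, 1333):.2f} GB/s")

    # One 512Mb or 1Gb chip, as guessed above
    for density in (512, 1024):
        print(f"{density}Mb chip: {chip_capacity_mb(density)} MB")
```

Under those assumptions the sideport peaks at a bit over 2 GB/s, which is a small fraction of what even ordinary dual-channel system memory offers.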

AnarchX · Jul 3, 2011

ToTTenTranz said:
Is it that much more?
Bloomfield (3-channel) has 200 more "pins" than Lynnfield (2-channel), and Lynnfield actually has 40M transistors more because of integrated PCI-Express and DMA.

mczak said:
It might not be that much more but it's still a budget cpu, after all. There is significantly more room for such things on the high end.

What speaks against two sockets: a budget one with 2x 64-bit and a high-end one with 3x 64-bit, which could be also used by 10 core BD Komodo.
The die could support 3x 64-bit, like the first K8 CPU supported 2x 64-bit and only 64-bit was used on S.754.

CarstenS · Jul 3, 2011

Kaotik said:
Since it's now "the past", was 6800-series meant to be VLIW4, but 32-40nm case forced it to be VLIW5?

I'd rather guess that 6900 was meant to be 6800 for quite some time...

Kaotik · Jul 3, 2011

CarstenS said:
I'd rather guess that 6900 was meant to be 6800 for quite some time...

Yeah, that's what I'm thinking too, as in, Barts wasn't supposed to be 6800.

Dave Baumann · Jul 3, 2011

Relatively speaking, the brand is something that happens exceedingly late in a lifecycle. Engineers deal in codenames, not brands.

Kaotik · Jul 3, 2011

Dave Baumann said:
Relatively speaking, the brand is something that happens exceedingly late in a lifecycle. Engineers deal in codenames, not brands.

And switching to these codenames instead of clearly numbered ones makes it difficult for us to follow which was supposed to be what.

Alexko · Jul 3, 2011

Kaotik said:
And switching to these codenames instead of clearly numbered ones makes it difficult for us to follow which was supposed to be what.

That's pretty much the point!

Well it's supposed to be difficult for NVIDIA, but it incidentally ends up being difficult for us as well.

Arun · Jul 4, 2011

It's not because it's the same VLIW4 shader core that it must necessarily be the same ALU-TEX ratio. Shaders with a higher ALU-TEX ratio require relatively less bandwidth per instruction and Llano is badly bandwidth limited already. If I had to guess, I'd go for 10 SIMDs (640 SPs) but with one Quad-TMU shared between two SIMDs resulting in the same number of TMUs as Llano (20 TMUs). With higher clocks you'd still have slightly higher TMU and ROP throughput but the die area saving should be worth it. I'd also expect a similar ALU ratio on the first GCN-based GPUs.

I could be wrong but I don't expect DDR3-2133 to ever be truly mainstream and it will be hard to find low-voltage DDR3-1866. DRAM price is still a significant part of the BOM so it'd be counter productive to force OEMs to pay even more for it.
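
To make the numbers in that guess concrete, here is a small sketch of the arithmetic. It assumes 16 lanes per SIMD and four TMUs per Quad-TMU, and it uses the commonly quoted Llano configuration (5 VLIW5 SIMDs, 400 SPs, 20 TMUs) for comparison; the `GpuConfig` class and its field names are purely illustrative, and the Trinity entry is the speculation from the post, not a confirmed spec.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuConfig:
    name: str
    simds: int
    vliw_width: int           # 5 for VLIW5, 4 for VLIW4
    quad_tmus: int
    lanes_per_simd: int = 16  # assumed SIMD width

    @property
    def stream_processors(self) -> int:
        return self.simds * self.lanes_per_simd * self.vliw_width

    @property
    def tmus(self) -> int:
        return self.quad_tmus * 4

    @property
    def alu_per_tmu(self) -> float:
        return self.stream_processors / self.tmus


# Commonly quoted Llano figures (VLIW5)
llano = GpuConfig("Llano", simds=5, vliw_width=5, quad_tmus=5)
# The guess from the post: 10 VLIW4 SIMDs, one Quad-TMU per two SIMDs
trinity_guess = GpuConfig("Trinity (guess)", simds=10, vliw_width=4, quad_tmus=5)

if __name__ == "__main__":
    for gpu in (llano, trinity_guess):
        print(f"{gpu.name}: {gpu.stream_processors} SPs, "
              f"{gpu.tmus} TMUs, ALU:TMU = {gpu.alu_per_tmu:.0f}:1")
```

With these assumptions the guessed part keeps Llano's 20 TMUs while moving from 20 to 32 ALUs per TMU.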

mczak · Jul 4, 2011

Arun said:
It's not because it's the same VLIW4 shader core that it must necessarily be the same ALU-TEX ratio. Shaders with a higher ALU-TEX ratio require relatively less bandwidth per instruction and Llano is badly bandwidth limited already. If I had to guess, I'd go for 10 SIMDs (640 SPs) but with one Quad-TMU shared between two SIMDs resulting in the same number of TMUs as Llano (20 TMUs). With higher clocks you'd still have slightly higher TMU and ROP throughput but the die area saving should be worth it. I'd also expect a similar ALU ratio on the first GCN-based GPUs.

Not that it wouldn't make sense, but I just don't see any such changes when there's already a brand new architecture.
Plus, 10 simds with 5 quad-tmus probably isn't smaller than 8 simds with 8 quad-tmus anyway. And since it's mostly bandwidth and even rop limited anyway, it probably doesn't really matter for performance either way, though you're right that in theory 10 simds (but half the tmus) might be a tiny bit faster.

Arun said:
I could be wrong but I don't expect DDR3-2133 to ever be truly mainstream and it will be hard to find low-voltage DDR3-1866. DRAM price is still a significant part of the BOM so it'd be counter productive to force OEMs to pay even more for it.

I don't know if ddr3-2133 will ever be mainstream, though if ddr4 is really only coming (barely) 2014 it could happen. But not for trinity timeframe.
Though unfortunately I bet OEMs will indeed save pennies with memory. Just look at hd5570 / hd6570 or similar cards to get an idea: most of them not only don't use gddr5 (which probably does add significant cost) but actually go for ddr3-667 instead of ddr3-800 (which is usually what the reference cards call for). I don't know how many pennies that saves (can you count them on one hand?), but there is NO WAY the relative performance deficit is worth it... But I guess people buy that stuff... So you can only hope the OEMs will at least use ddr3-1600 for trinity...

hkultala · Jul 5, 2011

Arun said:
It's not because it's the same VLIW4 shader core that it must necessarily be the same ALU-TEX ratio. Shaders with a higher ALU-TEX ratio require relatively less bandwidth per instruction and Llano is badly bandwidth limited already. If I had to guess, I'd go for 10 SIMDs (640 SPs) but with one Quad-TMU shared between two SIMDs resulting in the same number of TMUs as Llano (20 TMUs).

You cannot just "share" those TMUs. There is one "4-way" TMU in every 16-way SIMT processor, and the internal buses, command structure etc. won't allow "sharing" it.

The biggest reason why the R700 series was so much more efficient (performance/die size) than the R600 series was putting the TMUs inside the shader processors.

In some low-end integrated models the SIMT processors are 8-way, so those have a different "ALU-TMU ratio". The only reasonable way of "increasing" the "ALU-TMU ratio" would be a change to 32-way SIMT processors.
But that's not going to happen. It would mean increasing the wavefront size etc., which would result in many other changes.

hkultala · Jul 5, 2011

Arun said:
It's not because it's the same VLIW4 shader core that it must necessarily be the same ALU-TEX ratio. Shaders with a higher ALU-TEX ratio require relatively less bandwidth per instruction and Llano is badly bandwidth limited already.

TMUs don't require any bandwidth; it's the code that's using them.
If the code is bandwidth limited, having more TMUs won't make it run any slower.

And there are always those moments when a chip that's "usually" bandwidth limited is not bandwidth limited.

And as you cannot separate those TMUs from the shader processors (without a big change in architecture), it's much more reasonable to just keep those extra TMUs even when they will be bandwidth-starved most of the time.

And... I see no reason to go to 10 shader processors. That would just be overkill, and a waste of die size.
Most of the time they just would not have anything reasonable to do, because they would be waiting for textures (coming slowly from memory) or waiting for pixels to be drawn.

3dcgi · Jul 14, 2011

hkultala said:
The biggest reason why the R700 series was so much more efficient (performance/die size) than the R600 series was putting the TMUs inside the shader processors.

The 700 series was more efficient because multiple blocks were rewritten from scratch and others were heavily optimized.

CarstenS · Jul 14, 2011

hkultala said:
And as you cannot separate those TMUs from the shader processors (without a big change in architecture), it's much more reasonable to just keep those extra TMUs even when they will be bandwidth-starved most of the time.

HD 5450 was the last example I know of where AMD had 80 ALU lanes coupled to 8 TMUs instead of four; in RV730 they also used the same 1:10 ratio. So, despite this going in the opposite direction, scaling the ALU-TEX ratio seems not completely absurd.

mczak · Jul 14, 2011

CarstenS said:
HD 5450 was the last example I know of where AMD had 80 ALU lanes coupled to 8 TMUs instead of four; in RV730 they also used the same 1:10 ratio. So, despite this going in the opposite direction, scaling the ALU-TEX ratio seems not completely absurd.

But this changes the number of elements the chip is working on. The "normal" chips have simd width 16 and run an instruction for 4 clocks for granularity 64. Now granted you could probably increase that to 128 but I'm not sure it makes a lot of sense.

hkultala · Jul 14, 2011

CarstenS said:
HD 5450 was the last example I know of where AMD had 80 ALU lanes coupled to 8 TMUs instead of four; in RV730 they also used the same 1:10 ratio. So, despite this going in the opposite direction, scaling the ALU-TEX ratio seems not completely absurd.

No, it had an 8-way SIMT processor ("40 ALU lanes") coupled to a 4-way TMU.

And it had two of these processors.

But changing the "ALU-TMU ratio" in the other direction would mean widening the SIMT width from 16-way to 32-way. And that would increase the wavefront size from 64 to 128. And that would mean quite big changes to many things.

You can easily split the wavefront size into half, but not double the size of it.
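
The relationship between SIMT width, wavefront size and ALU-TMU ratio described in the last few posts can be sketched like this. It assumes each instruction is issued over 4 clocks (as mczak says), one 4-way TMU per SIMT processor, and VLIW5 lanes by default; the helper names are hypothetical.

```python
CLOCKS_PER_INSTRUCTION = 4  # each instruction runs for 4 clocks (per the thread)
TMUS_PER_SIMT = 4           # one "4-way" TMU per SIMT processor (per the thread)


def wavefront_size(simt_width: int) -> int:
    """Number of work-items processed together by one SIMT processor."""
    return simt_width * CLOCKS_PER_INSTRUCTION


def alus_per_tmu(simt_width: int, vliw_width: int = 5) -> float:
    """ALU lanes per TMU when the TMU count per SIMT processor is fixed."""
    return simt_width * vliw_width / TMUS_PER_SIMT


if __name__ == "__main__":
    # 8-way: low-end parts, 16-way: "normal" chips, 32-way: the hypothetical widening
    for width in (8, 16, 32):
        print(f"{width:2d}-way SIMT: wavefront {wavefront_size(width):3d}, "
              f"ALU:TMU = {alus_per_tmu(width):.0f}:1")
```

Under these assumptions, halving the width to 8-way gives a wavefront of 32 and a 10:1 ratio, while raising the ratio above 20:1 needs a 32-way processor and a 128-wide wavefront, which is the architectural change hkultala is arguing against.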
