Trinity vs Ivy Bridge

Only Cayman (6950) is VLIW4. Trinity is indeed VLIW4 as well according to reports/rumors. Maybe one day we'll actually have some use for VLIW4 (GPGPU). But then AMD did just show us how they want to leave it behind too.
 
Only Cayman (6950) is VLIW4. Trinity is indeed VLIW4 as well according to reports/rumors. Maybe one day we'll actually have some use for VLIW4 (GPGPU). But then AMD did just show us how they want to leave it behind too.

I was talking about the supposed Freudian slip from Mr. Houston...

PCPer said:
7:55 Trinity has a "6850" kind of thing....interesting....
7:55 I think that slipped!
7:56 But then he stated Trinity would be "VLIW4" so Cayman-based... interesting.
http://www.pcper.com/news/Graphics-Cards/AMD-Fusion-Developer-Summit-2011-Live-Blog
 
That was a confusion. He thought that the 6800 was based on VLIW4. He meant to say that the architecture is based on the 6900 series.
 
That was a confusion. He thought that the 6800 was based on VLIW4. He meant to say that the architecture is based on the 6900 series.

Since it's now "the past", was 6800-series meant to be VLIW4, but 32-40nm case forced it to be VLIW5?
 
Desktop versions actually have decently-clocked DDR3 chips.

I've also heard that in some cases it's only a 16-bit bus, but I'm pretty sure the 780G in my Ferrari One is using a 32bit Sideport with 384MB. The access to UMA is blocked through the bios, though :(
Ok you made me curious and that's what I found: sideport on rs7xx chipsets is always 16bit, ddr2/ddr3 - looks like earlier boards tended to use ddr2-1066, later ones ddr3-1333 (but that's just a rough guideline). Still the bandwidth is pathetic in any case.
And I've got very, very serious doubts about your sideport size. 384MB might be the total allocated fb memory (including UMA + sideport). If you got ddr2 sideport then sideport size is likely 64 or 128MB (1 512mb or 1gb chip) if it's ddr3 it's most likely 128MB (1 1gb chip).
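
To put rough numbers on the sideport figures above, here is a minimal sketch. It assumes the DDR2-1066 / DDR3-1333 labels are transfer rates in MT/s and that the sideport is a single 16-bit chip; the function names are just illustrative.

```python
def peak_bandwidth_gbs(bus_width_bits, transfers_mts):
    """Theoretical peak bandwidth in GB/s (1 GB = 1000 MB)."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * transfers_mts / 1000


def chip_capacity_mb(density_mbit):
    """Capacity in MB of one DRAM chip, given its density in Mbit."""
    return density_mbit // 8


if __name__ == "__main__":
    # 16-bit sideport, assuming the labels are MT/s figures
    print("DDR2-1066, 16-bit:", peak_bandwidth_gbs(16, 1066), "GB/s")
    print("DDR3-1333, 16-bit:", peak_bandwidth_gbs(16, 1333), "GB/s")
    # One 512 Mbit chip or one 1 Gbit chip
    print("512 Mbit chip:", chip_capacity_mb(512), "MB")
    print("1 Gbit chip:", chip_capacity_mb(1024), "MB")
```

Under those assumptions a single chip tops out somewhere between 2 and 3 GB/s, and a single 512 Mbit or 1 Gbit chip gives the 64 MB or 128 MB mentioned above.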
 
Is it that much more?
Bloomfield (3-channel) has 200 more "pins" than Lynnfield (2-channel), and Lynnfield actually has 40M transistors more because of integrated PCI-Express and DMA.
It might not be that much more but it's still a budget cpu, after all. There is significantly more room for such things on the high end.

What speaks against two sockets: a budget one with 2x 64-bit and a high-end one with 3x 64-bit, which could be also used by 10 core BD Komodo.
The die could support 3x 64-bit, like the first K8 CPU supported 2x 64-bit and only 64-bit was used on S.754.
 
Relatively speaking the brand is something that happens exceedingly late in a lifecycle. Engineers deal in codenames not brands.
 
Relatively speaking the brand is something that happens exceedingly late in a lifecycle. Engineers deal in codenames not brands.

And switching to these codenames instead of clear numbering codenames makes it difficult for us to follow on which was supposed to be what :D
 
And switching to these codenames instead of clear numbering codenames makes it difficult for us to follow on which was supposed to be what :D

That's pretty much the point! ;)

Well it's supposed to be difficult for NVIDIA, but it incidentally ends up being difficult for us as well.
 
It's not because it's the same VLIW4 shader core that it must necessarily be the same ALU-TEX ratio. Shaders with a higher ALU-TEX ratio require relatively less bandwidth per instruction and Llano is badly bandwidth limited already. If I had to guess, I'd go for 10 SIMDs (640 SPs) but with one Quad-TMU shared between two SIMDs resulting in the same number of TMUs as Llano (20 TMUs). With higher clocks you'd still have slightly higher TMU and ROP throughput but the die area saving should be worth it. I'd also expect a similar ALU ratio on the first GCN-based GPUs.

I could be wrong but I don't expect DDR3-2133 to ever be truly mainstream and it will be hard to find low-voltage DDR3-1866. DRAM price is still a significant part of the BOM so it'd be counter productive to force OEMs to pay even more for it.
 
It's not because it's the same VLIW4 shader core that it must necessarily be the same ALU-TEX ratio. Shaders with a higher ALU-TEX ratio require relatively less bandwidth per instruction and Llano is badly bandwidth limited already. If I had to guess, I'd go for 10 SIMDs (640 SPs) but with one Quad-TMU shared between two SIMDs resulting in the same number of TMUs as Llano (20 TMUs). With higher clocks you'd still have slightly higher TMU and ROP throughput but the die area saving should be worth it. I'd also expect a similar ALU ratio on the first GCN-based GPUs.
Not that it wouldn't make sense, but I just don't see any such changes when there's already a brand new architecture.
Plus, 10 simds but 5 quad-tmus probably isn't smaller than 8 simds with 8 quad-tmus anyway (and since it's mostly bandwidth and even rop limited anyway, it probably doesn't really matter for performance either way), though you're right that in theory 10 simds (but half the tmus) might be a tiny bit faster.
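
For anyone who wants the ALU-TEX ratios side by side, a quick sketch. It assumes 16 lanes per SIMD, 5 ALUs per lane for VLIW5 and 4 for VLIW4; the Llano row (5 SIMDs, 5 quad-TMUs) is my assumption, picked to match the 20 TMUs quoted above. The helper name is made up.

```python
def shader_config(simds, lanes_per_simd, vliw_width, quad_tmus):
    """Return (stream processors, TMUs, ALU:TEX ratio) for a configuration."""
    sps = simds * lanes_per_simd * vliw_width
    tmus = quad_tmus * 4  # a quad-TMU is four texture units
    return sps, tmus, sps / tmus


if __name__ == "__main__":
    configs = {
        # name: (SIMDs, lanes per SIMD, VLIW width, quad-TMUs)
        "Llano-like VLIW5 (assumed 5 SIMDs)": (5, 16, 5, 5),
        "10 SIMDs, quad-TMU shared by two": (10, 16, 4, 5),
        "8 SIMDs, one quad-TMU each": (8, 16, 4, 8),
    }
    for name, cfg in configs.items():
        sps, tmus, ratio = shader_config(*cfg)
        print(f"{name}: {sps} SPs, {tmus} TMUs, {ratio:.0f}:1")
```

So the shared-TMU guess would be 32:1 against 16:1 for a plain 8-SIMD VLIW4 part. Whether such sharing is buildable at all is a separate question that this arithmetic doesn't answer.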

I could be wrong but I don't expect DDR3-2133 to ever be truly mainstream and it will be hard to find low-voltage DDR3-1866. DRAM price is still a significant part of the BOM so it'd be counter productive to force OEMs to pay even more for it.
I don't know if ddr3-2133 will ever be mainstream, though if ddr4 is really only coming (barely) 2014 it could happen. But not for trinity timeframe.
Though unfortunately I bet OEMs will indeed save pennies with memory, just look at hd5570 / hd6570 or similar cards to get an idea, most of them not only don't use gddr5 (which probably indeed adds significant cost) but actually go for ddr3-667 instead of ddr3-800 (which is usually what the reference cards call for). I don't know how many pennies that saves (can count them on one hand?), but there is NO WAY the relative performance deficit is worth it... But I guess people buy that stuff... So you can only hope the OEMs will at least use ddr3-1600 for trinity...
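
A rough sketch of what those memory speed grades mean in bandwidth terms. The 128-bit (dual-channel) interface for Trinity is an assumption on my part, not something confirmed in the thread, and the function names are illustrative.

```python
def peak_bandwidth_gbs(bus_width_bits, transfers_mts):
    """Theoretical peak bandwidth in GB/s (1 GB = 1000 MB)."""
    return bus_width_bits / 8 * transfers_mts / 1000


def relative_deficit(slow, fast):
    """Fraction of bandwidth lost by running at `slow` instead of `fast`.

    Only the ratio matters, so it makes no difference whether the two
    figures are memory clocks or transfer rates, as long as both use
    the same unit.
    """
    return 1 - slow / fast


if __name__ == "__main__":
    # ddr3-667 instead of ddr3-800 on the same bus
    print(f"667 vs 800: {relative_deficit(667, 800):.1%} less bandwidth")
    # Assumed 128-bit interface at the speed grades discussed above
    for rate in (1333, 1600, 1866, 2133):
        print(f"DDR3-{rate}, 128-bit: {peak_bandwidth_gbs(128, rate):.1f} GB/s")
```

On paper the cheaper memory gives up about a sixth of the bandwidth, which on a bandwidth-limited part should translate fairly directly into lost performance.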
 
It's not because it's the same VLIW4 shader core that it must necessarily be the same ALU-TEX ratio. Shaders with a higher ALU-TEX ratio require relatively less bandwidth per instruction and Llano is badly bandwidth limited already. If I had to guess, I'd go for 10 SIMDs (640 SPs) but with one Quad-TMU shared between two SIMDs resulting in the same number of TMUs as Llano (20 TMUs).

You cannot just "share" those TMU's; There is one "4-way" TMU in every 16-way SIMT processor, and the internal busses and command structure etc won't allow "sharing it".

The biggest reason why R700-series was so much more efficient (performance/die size) than R600-series was putting the TMU's inside the shader processors.

In some low-end integrated models the SIMT processors are 8-way, so those have different "alu-tmu-ratio", the only way of reasonably "increasing" the "tmu-alu-ratio" would be a change to 32-way SIMT processors.
But that's not going to happen. It would mean increasing the wavefront size etc. which would result in many other changes.
 
It's not because it's the same VLIW4 shader core that it must necessarily be the same ALU-TEX ratio. Shaders with a higher ALU-TEX ratio require relatively less bandwidth per instruction and Llano is badly bandwidth limited already.

TMU's don't require any bandwidth, it's the code that's using them.
If the code is bandwidth limited, having more TMU's won't make it run any slower.

And there are always those moments when a chip that's "usually" bandwidth limited is not bandwidth limited.

And as you cannot separate those TMU's from the shader processors (without big change in architecture), it's much more reasonable to just keep those extra TMU's even when they will be bandwidth-starved most of the time.

And.. I see no reason to go to 10 shader processors. That would be just overkill, and waste of die size.
Most of the time those just would not have anything reasonable to do because those would be waiting for textures(that would be coming slowly from memory) or waiting for pixels being drawn.
 
The biggest reason why R700-series was so much more efficient (performance/die size) than R600-series was putting the TMU's inside the shader processors.
The 700 series was more efficient because multiple blocks were rewritten from scratch and others were heavily optimized.
 
And as you cannot separate those TMU's from the shader processors (without big change in architecture), it's much more reasonable to just keep those extra TMU's even when they will be bandwidth-starved most of the time.

HD 5450 was the last example I know of, where AMD had 80 ALU lanes coupled to 8 TMUs instead of four, also in RV730 they used the same 1:10 ratio. So, despite this going into the opposite direction, scaling of ALU-TEX ratio seems not completely absurd.
 
HD 5450 was the last example I know of, where AMD had 80 ALU lanes coupled to 8 TMUs instead of four, also in RV730 they used the same 1:10 ratio. So, despite this going into the opposite direction, scaling of ALU-TEX ratio seems not completely absurd.
But this changes the number of elements the chip is working on. The "normal" chips have simd width 16 and run an instruction for 4 clocks for granularity 64. Now granted you could probably increase that to 128 but I'm not sure it makes a lot of sense.
 
HD 5450 was the last example I know of, where AMD had 80 ALU lanes coupled to 8 TMUs instead of four, also in RV730 they used the same 1:10 ratio. So, despite this going into the opposite direction, scaling of ALU-TEX ratio seems not completely absurd.

No, it had 8-way SIMT processor("40 ALU lanes") coupled into a 4-way TMU.

And it had two of these processors.

But changing the "alu-tmu ratio" into another direction would mean widening the SIMT width from 16-way to 32-way. And that would increase wavefront size from 64 to 128. And that would mean quite big changes on many things.

You can easily split the wavefront size into half, but not double the size of it.
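
The wavefront arithmetic behind this can be sketched as follows, assuming (as stated earlier in the thread) that each instruction is issued over 4 clocks on a SIMD of the given width. The names are illustrative only.

```python
CLOCKS_PER_INSTRUCTION = 4  # per the thread: one instruction runs for 4 clocks


def wavefront_size(simd_width):
    """Work items processed together on one SIMD."""
    return simd_width * CLOCKS_PER_INSTRUCTION


def alu_lanes(simd_width, vliw_width):
    """ALU count of one SIMD (width times VLIW slots per lane)."""
    return simd_width * vliw_width


if __name__ == "__main__":
    for width in (8, 16, 32):
        print(f"{width}-way SIMD -> wavefront of {wavefront_size(width)}")
    # HD 5450 as described above: two 8-way VLIW5 SIMDs, one quad-TMU each
    total_alus = 2 * alu_lanes(8, 5)
    total_tmus = 2 * 4
    print(f"HD 5450: {total_alus} ALUs, {total_tmus} TMUs")
```

With one quad-TMU per SIMD, the 8-way parts end up at 10 ALUs per TMU, while going the other way (fewer TMUs per ALU) would need a 32-way SIMD and a 128-wide wavefront.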
 