Sandy Bridge

Who's being facetious? You made up a model in your head of how that thing should work and now you are complaining that it sucks!? Perhaps if you had a better idea in the first place you wouldn't need to come up with ridiculous arguments such as the RAMDAC having to trash the L3 cache just to display an image.
Who said it's ridiculous? Do you know the HW doesn't work that way? Do you understand that RAMDACs have line buffers and that you could save area by using the L3 cache as a line buffer? The trouble with that is L3 pollution, but that doesn't mean it's not a possible trade-off, especially when you are concerned about GPU die size.

Someone mentioned allowing compressed Z data to be stored in the L3. Note that a 1600x1200 32-bit Z buffer is already nearly 8 MB, so, again, it's quite easy for your L3 to get trashed by common GPU functions. Also, many games use multiple Z buffers, and DSTs are often much larger than 1600x1200.
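Just to put rough numbers on it, a back-of-the-envelope sketch in C (the resolutions and the 8MB L3 figure are assumptions for illustration, not a claim about the actual part):

Code:
#include <stdio.h>

/* Rough sizes of common GPU surfaces versus an assumed 8 MB L3.
   Resolutions and formats are illustrative examples only. */
int main(void)
{
    const double MB = 1024.0 * 1024.0;
    const double l3_bytes = 8.0 * MB;

    struct { const char *name; int w, h, bpp; } surf[] = {
        { "1600x1200 32-bit Z",     1600, 1200, 4 },
        { "1600x1200 32-bit color", 1600, 1200, 4 },
        { "2560x1600 32-bit Z",     2560, 1600, 4 },
    };

    for (int i = 0; i < 3; i++) {
        double bytes = (double)surf[i].w * surf[i].h * surf[i].bpp;
        printf("%-24s %5.2f MB (%4.1f%% of an 8 MB L3)\n",
               surf[i].name, bytes / MB, 100.0 * bytes / l3_bytes);
    }
    return 0;
}

A single 1600x1200 Z buffer alone is over 90% of such an L3, before you count color buffers or textures.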

So if you want to avoid polluting L3 because of the RAMDAC, then that also means the color buffer won't always be able to use L3 either since the RAMDAC will reference that memory next frame. Now if you conclude that Z data is going to pollute the L3 too much, what's left? Vertex and texture data? Most of that data is used once and thrown away, so not much use to have in L3. So where is the actual benefit to L3 for the GPU?
 
A suggestion for you: stop making up crappy models of how to use L3 for graphics.
 
Yeah, graphics caches work very differently from CPU caches. The IGP will likely be walled off from polluting/accessing L3 anyway; at most the L3 will end up accelerating the handoff of graphics data to the IGP. Textures, Z, stencil, render targets and the framebuffer will likely be streamed from memory without polluting the caches.
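For what it's worth, the CPU-side analogue of "streamed from memory without polluting the caches" is a non-temporal store. A minimal sketch with SSE2 intrinsics, just to illustrate the idea (it says nothing about what the IGP hardware actually does):

Code:
#include <emmintrin.h>  /* SSE2: _mm_load_si128, _mm_stream_si128 */
#include <stddef.h>

/* Copy a 16-byte-aligned buffer using non-temporal stores, so the
   destination lines go out to memory without being allocated in the
   cache hierarchy -- the same idea as streaming a frame/Z buffer past
   the L3 instead of through it. */
void copy_nontemporal(void *dst, const void *src, size_t bytes)
{
    __m128i *d = (__m128i *)dst;
    const __m128i *s = (const __m128i *)src;
    for (size_t i = 0; i < bytes / sizeof(__m128i); i++) {
        __m128i v = _mm_load_si128(&s[i]);  /* normal (cached) load */
        _mm_stream_si128(&d[i], v);         /* non-temporal store   */
    }
    _mm_sfence();  /* make the streamed stores globally visible */
}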
 
If the graphics portion is a tile-based deferred renderer, how much of a hit to L3 would that be?

Jawed
 
If the data is contiguous, maybe a good tile size would be one whose stride matches the L3's way size (total size divided by associativity), which would keep tile work from blotting out more than one way.

Nehalem's L3 is 16-way, for example.

8MB/16 = 0.5MB, which might be a good target for the total size of a given bin's set.
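Roughly, in C (the 8MB/16-way parameters are Nehalem-style examples, not a statement about what the actual IGP's L3 will look like):

Code:
#include <stdio.h>

/* Way size of a set-associative last-level cache, and how big a
   32-bit-per-pixel tile could be if its working set is to stay
   within a single way.  Parameters are illustrative only. */
int main(void)
{
    const unsigned cache_bytes = 8u * 1024 * 1024;   /* 8 MB L3     */
    const unsigned ways        = 16;                 /* 16-way      */
    const unsigned way_bytes   = cache_bytes / ways; /* 512 KB      */

    const unsigned bytes_per_pixel = 4;              /* 32-bit data */
    const unsigned pixels = way_bytes / bytes_per_pixel;

    printf("one way        : %u KB\n", way_bytes / 1024);
    printf("pixels per way : %u (e.g. a 512x%u tile)\n",
           pixels, pixels / 512u);                   /* 512x256     */
    return 0;
}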

On the other hand, not knowing the porting of the L3, it might not be a good idea to lean on a last-level cache too heavily.

There are a number of dark areas in the IGP section that may correspond to SRAM, which might keep more data within the IGP.
 
If the graphics portion is a tile-based deferred renderer, how much of a hit to L3 would that be?
I fail to see how tiling matters. If your depth buffer is larger than the tile size, as it can easily be, eventually it all goes through the cache. If your idea of a tile-based deferred renderer is to keep everything out of the L3, i.e. an on-GPU tile cache, then what's the point of linking the GPU to the L3 in the first place?
 
http://ati.amd.com/developer/gdc/2008/gdc2008_ribble_maurice_TileBasedGpus.pdf

L3 would be a stream-through resource (non-temporal "allocated" cache lines), I guess, e.g. for texture data and resolves - and to re-use the on-die memory controllers, rather than having a dedicated set in parallel with the CPU's or some bypass.

Binning may operate in a hierarchical manner, with "L1" caching of "micro-bins" within the on-die GPU coupled with infrequent flushes to memory through L3 which holds "macro-bins". If binning is reasonably multi-threaded then presumably it can hide the latencies involved.

Render target tiles don't need to be huge. Larrabee is going for tiles <256KB in size (substantially less, in fact).
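For scale, a sketch of what fits in a sub-256KB tile if each pixel carries color plus Z (the per-pixel formats here are assumptions, not anything published):

Code:
#include <stdio.h>

/* How many pixels fit in a 256 KB on-chip tile if each pixel needs a
   32-bit color value and a 32-bit Z/stencil value?  The formats are
   assumed for illustration. */
int main(void)
{
    const unsigned tile_bytes      = 256 * 1024;
    const unsigned bytes_per_pixel = 4 /* color */ + 4 /* Z/stencil */;
    const unsigned pixels          = tile_bytes / bytes_per_pixel;

    printf("%u pixels per tile, e.g. a 256x%u region\n",
           pixels, pixels / 256u);   /* 32768 pixels -> 256x128 */
    return 0;
}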

Dunno, I know too little about the real world numbers for a TBDR when used for PC games (e.g. D3D10) versus the mobile space alluded to in Ribble's presentation or the glory days of yesteryear.

Also, I'm not sure where the display hardware lives, i.e. reading the front buffer for display on monitor(s). If that lives on the mobo, then wouldn't that have a dedicated blob of (low performance) memory?

Intel has access to TBDR IP, doesn't it?

Jawed
 
Intel has access to TBDR IP, doesn't it?

Jawed
I have no idea how "free" the access is, but yes, they do at least use IMG's SGX in one chipset under the GMA moniker, and IIRC their "own" GMA chips use some sort of TBDR system too.
 
I have no idea how "free" the access is, but yes, they do at least use IMG's SGX in one chipset under the GMA moniker, and IIRC their "own" GMA chips use some sort of TBDR system too.
Intel's GMA used to do something called ZBR (zone-based rendering), which is similar to tile-based rendering. It wasn't deferred, though. Only the older-gen GMAs did that, however, not the i965-based ones.
 
I'm not sure how loosely we'd have to use the term "competitive" for much of the time span x86 competed with RISCs.

How about the majority of the time that x86 has been in existence? At least beginning with the P6 generation, x86 has been at the top or in the number 2 spot in performance among all designs available. That's 14 years now. Also, for most of that time it's had a much more constrained environment for power and system interfaces.

It could easily be argued that it was competitive during the Pentium era.
 
It could easily be argued that it was competitive during the Pentium era.

Especially if you qualify with price.

I waited and waited and waited for the PPC train to overtake x86. There were interesting machines from Be and the Apple cloners. When Apple killed the PPC clone market it doomed the PPC architecture, IMO.

Edit: And if you didn't need vast floating-point performance, x86 was competitive from the 486 onward.

Cheers
 
How about the majority of the time that x86 has been in existence? At least beginning with the P6 generation, x86 has been at the top or in the number 2 spot in performance among all designs available. That's 14 years now. Also, for most of that time it's had a much more constrained environment for power and system interfaces.
That's slightly earlier than where I put it, but I wouldn't use the 14-year number, since most of the RISCs in question that x86 competed with were EOL'ed, either in totality or as performance designs, more than half a decade ago.

The span between the PPro's introduction and the end of Alpha was about 5-6 years.
MIPS ceded the performance market in that time period.
HP gave up on PA-RISC in favor of Itanium in the same period.
These events occurred for a variety of reasons not related to CPU capability, and I won't credit competitiveness to x86 for the years when its competitors were already dead.

Sun was pretty lame CPU-wise throughout; that one I could give to x86.
PPC would be the one that hung on the longest, but it was a niche player whose failings stemmed in part from non-technical reasons.

The primary area I see where x86 did compete head-to-head against RISC incumbents before their manufacturers gave up would be the low-end server market.
There was also the workstation market, but x86 had help.

It could easily be argued that it was competitive during the Pentium era.
I'd argue for pushing it forward a generation.
 
That's slightly earlier than where I put it, but I wouldn't use the 14-year number, since most of the RISCs in question that x86 competed with were EOL'ed, either in totality or as performance designs, more than half a decade ago.

The span between the PPro's introduction and the end of Alpha was about 5-6 years.

When the PPro launched in 1995 it was the fastest CPU in the world, as measured by SPECint95.

Cheers
 
When the PPro launched in 1995 it was the fastest CPU in the world, as measured by SPECint95.

Cheers

DEC and HP (maybe HP, not sure) had systems beating it by almost 50% a few quarters later.
That's why I put the inflection point later than the PPro introduction, after Intel really got a consistent lead, not just peaking at the trough of other manufacturers' release schedules.
 
When the first Athlon was introduced you had two vendors of really fast x86 CPUs, and the game was over, unless you believed the dishonest PowerPC G3 benchmarks from Apple :).
The Pentium III had replaced MIPS CPUs in Silicon Graphics workstations.

So I set the point at one decade.
 
DEC and HP (maybe HP, not sure) had systems beating it by almost 50% a few quarters later.
That's why I put the inflection point later than the PPro introduction, after Intel really got a consistent lead, not just peaking at the trough of other manufacturers' release schedules.

I agree, also because the PPro beat the 21164 by just 0.1%.

But I can see where Aaron is coming from (being an Intel man). The 200MHz PPro made people go from "Workstations will always be RISC" to "Uhm, we're not so sure anymore".

Cheers
 
I agree, also because the PPro beat the 21164 by just 0.1%.

But I can see where Aaron is coming from (being an Intel man). The 200MHz PPro made people go from "Workstations will always be RISC" to "Uhm, we're not so sure anymore".

Cheers

I designed Alphas and StrongARM in the past...

But the PPro really was the turning point. I remember working for CAEN (which ran all the workstations and servers for the engineering school) at the University of Michigan. The PPro was the point where even the ultra-cheap deals we got out of the vendors for Sun, HP, IBM, and DEC workstations really weren't worth it anymore. We could get these Dell boxes even cheaper, slap Linux on them, and they ran everything just as well if not better.

A couple of years later you were hard pressed to find many traditional Unix workstations around. The only ones still around were there because of specific apps.

PPro put everyone on notice.
 
I designed Alphas and StrongARM in the past...

But the PPro really was the turning point. I remember working for CAEN (which ran all the workstations and servers for the engineering school) at the University of Michigan. The PPro was the point where even the ultra-cheap deals we got out of the vendors for Sun, HP, IBM, and DEC workstations really weren't worth it anymore. We could get these Dell boxes even cheaper, slap Linux on them, and they ran everything just as well if not better.

A couple of years later you were hard pressed to find many traditional Unix workstations around. The only ones still around were there because of specific apps.

PPro put everyone on notice.

The StrongARM was an amazing processor. A Quake engine I wrote for it ran faster than anything x86 at the time, and it didn't even have floating point or division. Try doing perspective-correct texture mapping with that.
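For anyone wondering how the per-pixel (or per-span) 1/w gets done without a divide instruction, the usual trick is a fixed-point Newton-Raphson reciprocal; a rough sketch of that general idea (this is not the actual engine code):

Code:
#include <stdint.h>
#include <stdio.h>

/* 16.16 fixed-point reciprocal via Newton-Raphson -- no divide, no FPU.
   A sketch of the general trick only; no overflow handling for tiny d. */
typedef int32_t fix16;               /* 16.16 fixed point */
#define FIX_ONE (1 << 16)

static fix16 fix_mul(fix16 a, fix16 b)
{
    return (fix16)(((int64_t)a * b) >> 16);
}

/* Reciprocal of a positive 16.16 value. */
static fix16 fix_recip(fix16 d)
{
    /* Normalize d into [0.5, 1.0] so a fixed seed works. */
    int shift = 0;
    fix16 dn = d;
    while (dn > FIX_ONE)     { dn >>= 1; shift++; }
    while (dn < FIX_ONE / 2) { dn <<= 1; shift--; }

    /* Seed x0 = 48/17 - (32/17)*dn, then two Newton steps x = x*(2 - dn*x). */
    fix16 x = 185043 - fix_mul(123362, dn);   /* 48/17 and 32/17 in 16.16 */
    for (int i = 0; i < 2; i++)
        x = fix_mul(x, 2 * FIX_ONE - fix_mul(dn, x));

    /* Undo the normalization: 1/d = (1/dn) * 2^(-shift). */
    return shift >= 0 ? x >> shift : x << -shift;
}

int main(void)
{
    fix16 w = 3 * FIX_ONE + FIX_ONE / 2;      /* w = 3.5 */
    printf("1/3.5 ~= %f\n", fix_recip(w) / 65536.0);
    return 0;
}

In practice you'd also typically only do the perspective correction every N pixels and interpolate linearly in between, which was the other standard way of hiding the cost.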

RISC processors competing at the high end may be mostly gone, but now the reverse is happening at the low end.
 