Prototype Primitives Guide
Overview
This .inl file provides a C++ implementation of the Larrabee New Instructions (LRBni). It lets developers experiment with writing Larrabee code without a Larrabee compiler or Larrabee hardware.
Some good stuff there, but I am somewhat disappointed at only 32 vector registers. But perhaps Intel knows better. *shrugs*
It's 32*16*4 float registers (excluding the FPU stack) per core, which is quite a lot really...
Kind of surprising there is a fixed amount at all (very much unlike GPUs). In theory, the task switch instructions might have a couple of bits to indicate the number of registers used, though.
I was just about to ask how big GPU caches are and what latency they have, but maybe it doesn't need them: it has caches to hide latency, so it doesn't need that many execution contexts.
AFAIK GPUs have caches for textures, vertices and constants, but register space is just that... a big register file/memory, not a cache.
On fiber switch the entire register set needs to get swapped ...
I know that, but is spilling register contents to L1 cache so bad? I think on most x86 CPUs it adds around 2-4 cycles of latency, which shouldn't be that bad.
Latency is bad only when you cannot hide it.
Hm, let me get this straight:
On current GPUs there is a huge register file where threads/fibers reserve as many registers as they need. When a thread gets swapped out, its registers are not flushed unless some other thread needs some and there are no free ones available.
On Larrabee the entire 2k register file gets flushed to cache on every fiber change no matter what.
Are there really no rename registers on Larrabee that could be used for kind of a double (quad?) buffer for fiber changes?
No. The number of threads launched on GPUs is determined by registers/thread, so there is no spill, and the shader will refuse to launch if you exceed the limits.
I believe R7xx GPUs can actually spill registers to external memory when you need too many of them. IIRC, SM4 hardware has to support up to 4096 temporaries per shader instance!