The simplest scenario is in pixel shading, where 4 quads of fragments are grouped to form what Intel calls a qquad.
Any clue how strands and fibers are actually represented?
Yep. If there's 4GB of memory then you have 4GB/64 bytes (64 bytes = 16 scalars packed) ~67M variables that you can store in memory. If you have 32 cores and 4 hardware threads per core, that's ~0.5M private variables (32MB) per hardware-scheduled context.
Can I take a single hardware thread and do anything I want?
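To make the arithmetic above easy to check, here is a minimal sketch. The 4GB, 64-byte vector, 32-core and 4-thread figures are the ones quoted in the post; the function name `memory_budget` is made up for illustration and says nothing about how Larrabee actually partitions memory.

```python
def memory_budget(total_bytes=4 * 2**30, vector_bytes=64,
                  cores=32, threads_per_core=4):
    """Split total memory evenly across hardware threads (hypothetical helper)."""
    vectors = total_bytes // vector_bytes       # 64-byte "variables" that fit in memory
    threads = cores * threads_per_core          # hardware-scheduled contexts
    vectors_per_thread = vectors // threads     # private variables per context
    bytes_per_thread = total_bytes // threads   # the same budget expressed in bytes
    return vectors, threads, vectors_per_thread, bytes_per_thread


if __name__ == "__main__":
    vectors, threads, per_thread, per_thread_bytes = memory_budget()
    print(vectors, threads, per_thread, per_thread_bytes // 2**20, "MiB")
```

With these inputs an even split gives about half a million 64-byte variables per context, which is 32MB of memory per context.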
As far as I can tell the 4 hardware threads (contexts) run symmetrically, by default. This may be how Intel hides read-after-write latency in the register file.
I'm also curious as to how hardware and software switching are going to work together.
Are you using a dies-on-wafer calculator that assumes square dies?
Nay -- 675 is too large for 85 pieces.
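For anyone who wants to redo this kind of estimate, here is a rough sketch using a common dies-per-wafer approximation (wafer area over die area, minus an edge-loss term). It assumes a 300mm wafer and reads "675" as a die area in mm²; neither is confirmed in the thread, and the formula ignores die aspect ratio, scribe lines and edge exclusion.

```python
import math


def dies_per_wafer(wafer_diameter_mm, die_area_mm2):
    """Approximate gross dies per wafer (a standard rule of thumb, not exact)."""
    radius = wafer_diameter_mm / 2.0
    area_term = math.pi * radius ** 2 / die_area_mm2
    # Edge-loss correction: partial dies around the circumference are discarded.
    edge_term = math.pi * wafer_diameter_mm / math.sqrt(2.0 * die_area_mm2)
    return int(math.floor(area_term - edge_term))


if __name__ == "__main__":
    # Assumed inputs: 300mm wafer, 675mm^2 die.
    print(dies_per_wafer(300, 675))
```

Under those assumptions the estimate comes out a little under 85 candidates, in line with the "too large for 85 pieces" remark.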
He only held up two wafers during the keynote so it's either one or the other. The one on the bench behind him does look like Jasper Forest so it's pretty sure in my book.
Is Jasper Forest the same as the Polaris proof of concept thingie? The wafer in the background sure looks a lot like Polaris.
"What you saw is the 'extreme' version, let me put it that way," said Otellini, adding that the GPU is in the debug stage now. "I would expect volume introduction of this product to be early next year."
I'd be very surprised if it was >32 cores.
128 LRB cores!
What say you?
I don't, but it has little use anyway: the size of the "real" CPU part of Larrabee cores should be insignificant compared to the vector ALU, so drawing any conclusions about the number of cores from it is pretty meaningless.
BTW, does anyone have any idea about the die size (and the process, of course) of the original Pentiums on which this is supposedly based?
Did that definition of "core" include caches and (part of) ringbus?
It stated 1/3 of the core.
So, a fibre corresponds with a qquad. "Fibre" is purely software-implemented multi-threading. "Strand" is then the number of elements that share a program counter.
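As a toy illustration of what "purely software-implemented multi-threading" can mean, here is a sketch of cooperative switching between fibres using Python generators. Where the real software would choose to switch (for example while waiting on a texture fetch) is an assumption on my part, and the names `fibre` and `run_round_robin` are made up for the example.

```python
from collections import deque


def fibre(name, steps, log):
    """A hypothetical fibre: does one unit of work, then yields control."""
    for step in range(steps):
        log.append((name, step))
        yield  # assumed switch point, e.g. a long-latency memory access


def run_round_robin(fibres):
    """Software scheduler: resume each fibre in turn until all have finished."""
    queue = deque(fibres)
    while queue:
        current = queue.popleft()
        try:
            next(current)          # run until the fibre's next yield
        except StopIteration:
            continue               # fibre finished, drop it
        queue.append(current)      # otherwise put it back in line


if __name__ == "__main__":
    log = []
    run_round_robin([fibre("A", 2, log), fibre("B", 2, log)])
    print(log)
```

No hardware scheduler is involved here: the switch happens only where the code itself yields, which is the point of the distinction between fibres and hardware threads.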
In Larrabee it appears that the normal way fibres will be constructed is from 16 strands. There's no reason not to use more (and for double-precision the minimum would be only 8) but this would prolly be a tweak for performance.
The program that actually runs on Larrabee will be a loop, "for each qquad: shade".
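A minimal sketch of that "for each qquad: shade" loop, assuming a qquad is simply 16 fragments (4 quads of 4). The `shade` function is a placeholder of mine, not anything Intel has described; the list comprehension stands in for a single 16-wide vector operation.

```python
STRANDS_PER_QQUAD = 16  # 4 quads x 4 fragments, as described in the thread


def make_qquads(fragments):
    """Group a flat list of fragments into chunks of 16 (the last may be short)."""
    return [fragments[i:i + STRANDS_PER_QQUAD]
            for i in range(0, len(fragments), STRANDS_PER_QQUAD)]


def shade(qquad):
    """Placeholder shader: the same operation applied to every strand at once."""
    return [0.5 * value for value in qquad]


def render(fragments):
    """The outer loop: for each qquad, shade."""
    shaded = []
    for qquad in make_qquads(fragments):
        shaded.extend(shade(qquad))
    return shaded


if __name__ == "__main__":
    print(render([float(i) for i in range(32)])[:4])
```

One call to `shade` computes 16 work items, which is the "one invocation computes multiple work items" idea in loop form.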
This is similar to the discussion we've been having recently about making a kernel produce more than one result: logically making one invocation compute multiple work items.
That should make for some fun/nerve-wracking performance tuning.
How you split up memory (and allocate some for shared storage, e.g. as textures usable by all threads) is up to you, depending on how coarse- or fine-grained you want to make your work-items.
So let me get this straight...
A fibre is a piece of SIMD code, where you have 16 scalar (or 8 in the case of DP) strands operating in parallel, all running on a logical 'x86' core?
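One consequence of 16 strands sharing a single program counter is that a per-strand branch cannot really jump: both sides are evaluated for all lanes and a mask picks the result per strand. Here is a sketch of that predication idea in plain Python; the helper names are made up, and it does not model Larrabee's actual mask registers or instructions.

```python
def predicated_select(mask, then_values, else_values):
    """Per-strand select: take then_values where the mask is set, else_values otherwise."""
    return [t if m else e for m, t, e in zip(mask, then_values, else_values)]


def abs_diff(a, b):
    """Computes |a - b| per strand without any per-strand branching."""
    mask = [x > y for x, y in zip(a, b)]       # the branch condition, one bit per strand
    then_side = [x - y for x, y in zip(a, b)]  # evaluated for ALL strands
    else_side = [y - x for x, y in zip(a, b)]  # also evaluated for ALL strands
    return predicated_select(mask, then_side, else_side)


if __name__ == "__main__":
    a = list(range(16))
    b = [8] * 16
    print(abs_diff(a, b))
```

Whether the compiler or the developer writes this masking is exactly the open question; the sketch only shows what has to happen either way.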
That sounds quite similar to what nVidia does.
What do you exactly mean by that?
I'm also not clear on whether the developer is also responsible for predication within a strand group.
On the surface, yes. But that's where the similarities end, it seems. Nvidia assumes more responsibility for latency hiding than LRB does.