22 nm Larrabee

Look at the SRAM size he is promising. 256 MB / chip

Hmm, I missed that -- where did you see that?
I see 256GB NVRAM (whatever that means) with 1.6TB/s and 16TF, and I see 1024 L2s, but not seeing onboard sram sizes -- you must be looking at a different slide?
 
The Xeon Phi 5110P only comes in at 1.01 Peak DP TFlops and burns 225 watts on the 22nm process.

http://techreport.com/news/23884/intel-joins-the-data-parallel-computing-fraternity-with-xeon-phi

The Tesla K20X has 1.31 Peak DP TFlops and burns 235 watts on the 28nm process.

http://techreport.com/news/23882/nvidia-intros-tesla-k20-series-as-titan-snags-top500-lead

The Xeon Phi 5110P is 23% slower than the Tesla K20X at peak DP, yet it burns about the same power and is on a newer process (22nm). It looks pretty underwhelming to me.

And GK110 has tons of non-compute stuff. Imagine how many more cores NV could have added instead, not to mention the full-node handicap.
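
For anyone who wants to check the quoted comparison, here is a rough sketch of the arithmetic. It uses only the peak DP and wattage figures from the post above; the helper names are made up for illustration, and peak numbers say nothing about real workloads.

```python
# Peak DP TFLOPS, board power and process node as quoted in the post above.
cards = {
    "Xeon Phi 5110P": {"tflops": 1.01, "watts": 225, "process_nm": 22},
    "Tesla K20X": {"tflops": 1.31, "watts": 235, "process_nm": 28},
}

def gflops_per_watt(card):
    """Peak DP GFLOPS per watt (hypothetical helper)."""
    return card["tflops"] * 1000 / card["watts"]

def percent_slower(slow, fast):
    """How much lower the peak DP rate of `slow` is, relative to `fast`."""
    return (1 - slow["tflops"] / fast["tflops"]) * 100

phi = cards["Xeon Phi 5110P"]
k20x = cards["Tesla K20X"]

for name, card in cards.items():
    print(f"{name}: {gflops_per_watt(card):.2f} peak DP GFLOPS/W")
print(f"Phi peak DP is {percent_slower(phi, k20x):.0f}% below K20X")
```

That works out to roughly 4.5 vs 5.6 peak DP GFLOPS/W, so the efficiency gap is about the same size as the raw throughput gap.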
 
Hmm, I missed that -- where did you see that?
I see 256GB NVRAM (whatever that means) with 1.6TB/s and 16TF, and I see 1024 L2s, but not seeing onboard sram sizes -- you must be looking at a different slide?

That slide has been going around for a while. It says each L2 is 256K.
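
If that is right, it may be where the 256 MB figure comes from. A quick sanity check, assuming "256K" means 256 KiB per L2 and that all 1024 L2s are simply added up (both are my reading of the slide, not something it states):

```python
L2_COUNT = 1024    # number of L2s mentioned for the slide
L2_SIZE_KIB = 256  # assumption: "256K" means 256 KiB per L2

def total_l2_mib(count, size_kib):
    """Aggregate L2 capacity in MiB if every L2 is counted once."""
    return count * size_kib / 1024

print(total_l2_mib(L2_COUNT, L2_SIZE_KIB))  # 256.0
```

So 256 MB of on-chip SRAM would just be the sum of the L2s, which is separate from the 256GB number.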
 
NVRAM can also be MRAM or whatever available (plain DRAM with battery or supercapacitor backup also qualifies; it's why you lose your BIOS settings if you remove the coin-shaped battery on your motherboard. You've got NVRAM there.)

But, I assumed these 256GB are not the NVRAM. Did you miss the multiple "DRAM cubes"? :). 256GB are on multiple stacks of DRAM on interposer, that's not too bad either.
 
NVRAM can also be MRAM or whatever available (plain DRAM with battery or supercapacitor backup also qualifies

Oh, non-volatile. Hah, I'll not embarrass myself and explain what I thought it stood for :>

But, I assumed these 256GB are not the NVRAM. Did you miss the multiple "DRAM cubes"? :). 256GB are on multiple stacks of DRAM on interposer, that's not too bad either.

Well, I'm going from memory here, but it seemed like that was on-chip, and I had trouble imagining 256GB on an interposer. Some serious potential bandwidth with 256GB onboard....
 
I now have trouble imagining 256GB on an interposer too, but it doesn't feel impossible in 2020 for the highest-end chip ever. They would kind of max out the tech.

I believe there was confusion with rpg.314 reading 256 MB and concluding it was SRAM, but I'm 100% sure 256 GB is written there.

For 256GB, that could be eight stacks of eight memory dies each, with 4GB (i.e. 32Gbit) per die, which is not too far from contemporary 4Gbit chips. Of course this doesn't really exist yet, and neither does a 10nm process. I hope 2020 is far enough away; these techs are maybe at the far end of current realistic R&D.

PS : well it can be 128GB "D-RAM cubes" and 128GB NVRAM or something.
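
The stack arithmetic above, spelled out as a small sketch. The 8 x 8 layout and the 32Gbit die are the guesses from this post, not anything the slide specifies:

```python
STACKS = 8           # guessed number of DRAM stacks on the interposer
DIES_PER_STACK = 8   # guessed number of dies per stack
GBIT_PER_DIE = 32    # guessed die density in Gbit
CURRENT_GBIT = 4     # contemporary 4Gbit chips, as mentioned in the post

def die_gbytes(gbit):
    """Convert a die density in Gbit to GB (8 bits per byte)."""
    return gbit // 8

def total_gbytes(stacks, dies_per_stack, gbit_per_die):
    """Total capacity in GB across all stacks."""
    return stacks * dies_per_stack * die_gbytes(gbit_per_die)

print(total_gbytes(STACKS, DIES_PER_STACK, GBIT_PER_DIE))  # 256
print(GBIT_PER_DIE // CURRENT_GBIT)  # 8, i.e. three density doublings
```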
 
Does anyone know anything about this (from many months ago)? Is this reliable in any way?

Intel MIC: 14nm Knights Landing to have both PCIe & socket versions (14-16 DP GFLOPS/Watt)
I would expect something like half of those numbers.
 
I have a doubt about Knights Corner specifications: I know that each core has a 512 bit vector processor, but is it a single unit capable of multiple operations at lower width (32 & 64 bit) in the same cycle or is it composed, like some table shows, by a 16-way 32 bit and an 8-way 64 bit vector unit? Wouldn't that make the processor actually 1024 bit wide?
 
I have a doubt about Knights Corner specifications: I know that each core has a 512 bit vector processor, but is it a single unit capable of multiple operations at lower width (32 & 64 bit) in the same cycle or is it composed, like some table shows, by a 16-way 32 bit and an 8-way 64 bit vector unit? Wouldn't that make the processor actually 1024 bit wide?
It's the former. It either processes 16 32bit float operations or 8 double operations. It basically works the same as the SSE or AVX units, it's just wider.
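
To make the lane count concrete, here is a small sketch. The 512-bit width is from the posts above; the 60 cores, 1.053 GHz clock and fused multiply-add counting as 2 flops per lane per cycle are my assumptions about the 5110P, not figures given in this thread:

```python
VECTOR_BITS = 512  # one 512-bit vector unit per core

def lanes(element_bits, vector_bits=VECTOR_BITS):
    """Number of elements one vector register holds at a given element width."""
    return vector_bits // element_bits

def peak_dp_gflops(cores, ghz, flops_per_lane_per_cycle=2):
    """Peak DP GFLOPS, assuming one FMA (2 flops) per 64-bit lane per cycle."""
    return cores * ghz * lanes(64) * flops_per_lane_per_cycle

print(lanes(32))  # 16 single-precision lanes
print(lanes(64))  # 8 double-precision lanes
# Assumed 5110P configuration: 60 cores at 1.053 GHz.
print(f"{peak_dp_gflops(60, 1.053):.0f} GFLOPS")
```

The same 512 bits are split either way, so the unit is not 1024 bits wide; under those assumptions the 8 DP lanes land on about the 1.01 TFlops peak quoted earlier in the thread.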
 
MIC's L1 cache is not programmable, and inter-thread communication on MIC works pretty much as it does on a CPU. Intel's developer forum is coming up soon; I will definitely go there to verify whether my experience with MIC is merely an exception.
 
Just returned from the IDC, according to the intel guys:

1) The LLC arrangement of Phi is not like that found in Intel CPUs. The LLC of Xeon Phi (which is its L2) is local to each core, so each core has only 512kB of L2 cache, rather than the 31MB number Intel promoted. Any cached data that a core needs to access but that is not available in its local L2 has to be transferred to the local L2 before it can be accessed.

For comparison, GK110 has 1.5MB of L2 cache, but it is a global cache like Intel's LLC on Ivy Bridge/Sandy Bridge CPUs, so its data is accessible to all GPU cores.

2) At least according to the Intel guys at IDC, Intel has no plan to introduce a programmable L1 cache in its future generations of MIC co-processors.

3) Xeon Phi's SIMD unit is more or less the same as Haswell's AVX-2, just wider.

4) Unlike HT on CPUs, hardware multi-threading on MIC is essential for MIC to achieve peak performance.

5) Intel's guys here are very willing to promote MIC's programmability compared to Nvidia's offerings, but remain tight-lipped regarding the performance comparison between the two products.

6) The card is likely to be cheaper than K20/K20X, but it is not for retail; it is only provided as part of a whole-system solution. Some company at IDC managed to pack 4 of these cards in one case with dual-socket CPUs.
 