2 IA cores share the same mesh node/routing logic; see Intel's press release material.
Sounds a bit like Bulldozer in that regard.
Snooping will always fuck things up ... software managed directories is the way to go, not hardware managed ones.

Because in certain cases hardware coherency is fantastically useful: think big shared data structures where reads/queries vastly outnumber updates.
Right, because a hw cache controller doesn't know (cannot know without sw help?) that a particular cacheline will mostly be read and only very infrequently written, and hence that it could safely be replicated across many cores' caches.
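To make the read-mostly case under discussion concrete, a minimal sketch (C++; Table/lookup/publish are made-up names, and safe reclamation of the old copy is hand-waved). The hot path never writes shared memory, so each core just keeps the table's lines in Shared state and queries stay in the local cache; only the rare publish() invalidates them.

#include <atomic>
#include <cstddef>
#include <vector>

// Hypothetical read-mostly shared table: queries vastly outnumber updates.
struct Table {
    std::vector<int> data;
};

// Assume an initial table is published before any reader runs.
std::atomic<const Table*> current{nullptr};

// Hot path: pure reads, no coherence traffic once each core's cache is warm.
int lookup(std::size_t i) {
    const Table* t = current.load(std::memory_order_acquire);
    return t->data[i];
}

// Cold path: build a fresh copy and swap the pointer in. Readers migrate to
// the new table; reclaiming the old one needs RCU/hazard pointers (omitted).
void publish(const Table* fresh) {
    current.store(fresh, std::memory_order_release);
}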
My understanding is that a cacheline in a cpu (since they are r/w there) can only be owned by one core. If many cores read the same data, then there will be a lot of traffic just to push stuff around.

This is the scenario where caches shine: frequent reads and infrequent writes. Only a line being written needs a single owner; read-only copies can sit in Shared state in many caches at once.
The problem is when cache coherency is used for IPC: producers writing to queues cause invalidate traffic, and the consumers' subsequent requests then trigger queries for the line. That's a lot of wasted traffic just to get a few bytes from core 0 to core 1.
The solution is either explicit message queues or virtual channels in the memory system. Both require more hardware support.
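For a picture of what software does today without that hardware support, a rough sketch (C++11 atomics; SpscQueue is a hypothetical name, 64-byte cachelines assumed) of a single-producer/single-consumer queue that at least keeps the two sides' indices on separate cachelines, so the producer's stores don't ping-pong the consumer's hot line and vice versa:

#include <atomic>
#include <cstddef>

template <typename T, std::size_t N>
struct SpscQueue {
    // head and tail live on separate 64-byte lines: each side writes only
    // its own line, so an enqueue invalidates one line in the consumer's
    // cache instead of bouncing a single shared line back and forth.
    alignas(64) std::atomic<std::size_t> head{0};  // written by consumer
    alignas(64) std::atomic<std::size_t> tail{0};  // written by producer
    alignas(64) T buf[N];

    bool push(const T& v) {                        // producer side
        std::size_t t = tail.load(std::memory_order_relaxed);
        if (t - head.load(std::memory_order_acquire) == N)
            return false;                          // full
        buf[t % N] = v;
        tail.store(t + 1, std::memory_order_release);
        return true;
    }

    bool pop(T& v) {                               // consumer side
        std::size_t h = head.load(std::memory_order_relaxed);
        if (h == tail.load(std::memory_order_acquire))
            return false;                          // empty
        v = buf[h % N];
        head.store(h + 1, std::memory_order_release);
        return true;
    }
};

Even so, every message still costs at least one line transfer from producer to consumer, which is exactly the traffic complained about above.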
Cheers
So anyone know how fast Larrabee was in the sparse matrix multiplication from that video?
Also, do you think Rattner was a bit embarrassed to be hyping 1 TFLOPS achieved in SGEMM when ASCI Red's teraflop was DGEMM?
Snooping will always fuck things up ... software managed directories is the way to go, not hardware managed ones.
Roll it into the page handling perhaps?
IIRC it's something like 8GFLOPS.

Really? That sounds way too low to even be worth demonstrating.
Would you then trap on a write to a page?
Not exactly what I meant. I meant that you could maintain a per-page subscriber list; it might trap if the list isn't cached in the hardware TLB at the moment, but it need not trap on every write.
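As a software-only analogy of that "trap once, not on every write" behaviour (hypothetical sketch using POSIX mprotect/SIGSEGV, not the TLB-assisted hardware being proposed): keep pages write-protected, let the first store fault, look up the page's subscriber list in the handler, then unprotect so later stores to the same page run at full speed.

#include <signal.h>
#include <stdint.h>
#include <sys/mman.h>

// Assume 4 KiB pages; real code would use sysconf(_SC_PAGESIZE).
static const uintptr_t kPage = 4096;

// First write to a protected page lands here; later writes don't trap.
static void on_write_fault(int, siginfo_t* si, void*) {
    void* page = (void*)((uintptr_t)si->si_addr & ~(kPage - 1));
    // ...consult this page's subscriber list and queue notifications...
    // (a real handler must also pass unrelated faults on to a default)
    mprotect(page, kPage, PROT_READ | PROT_WRITE);
}

void install_write_barrier() {
    struct sigaction sa = {};
    sa.sa_flags = SA_SIGINFO;
    sa.sa_sigaction = on_write_fault;
    sigaction(SIGSEGV, &sa, nullptr);
}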
So I guess nVidia was right to call it a bunch of PowerPoint slides. Maybe they're going to focus on the new 48-core design they've been showing off lately?