Indeed. After AMD's presentation of the Steamroller cores, some here hinted at mild improvements vs. Ivy Bridge. I may not get everything, but the thing looks like a monster.
Good Lord!
Haswell is indeed taking full advantage of the 22nm tech.
Yeah, that's really good.
The GPU in compute operation has 0.5TB/s of bandwidth to the last level of cache; the thing could literally fly.
Are Intel really telling us all this stuff eight months before launch? That seems a bit bonkers, or is there the possibility that they might surprise people with an early launch?
It's not like Intel are up against anything substantial from AMD.
Hopefully they spill the beans soon. I just hope that you don't have to code a loop for it (like you must for Knights Corner). The worst case is that it's just a long microcoded sequence, but that wouldn't make much sense. I am keeping my pessimistic view until Intel proves me otherwise. Efficient gather is almost too good to be true.
No mention of how gather is handled, though.
And "no changes to key pipelines" either.
Maybe a copy&paste error from SNB/IVB? Since there's now a dedicated store AGU, and only one store data port, it seems like there would be no reason at all to use a shared load/store AGU for calculating the store address. In some way that would be more like Nehalem/Westmere, which also had separate load and store AGUs (but of course just one of each).
It's kind of interesting to see a core that could for some reason generate 3 store addresses a cycle. Perhaps the extra port is to keep the store address out of the way of the load calculations as much as possible.
Well, how would it know this RAM is closer...?
Besides, from what I understand no desktop OS can optimize RAM during runtime; once something's loaded somewhere it pretty much stays there.
... Intel has added two extra ports, but none of them does load related things. And "no changes to key pipelines" either. No mention about other load related improvements either. So my conclusion is that gather likely takes several cycles to complete (even without cache misses).
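For anyone wondering why the load ports matter here: semantically, a gather is just a set of independent indexed loads, one per vector lane. Below is a minimal scalar sketch in C of an 8-lane, 32-bit gather. The name `gather8_i32` is made up for illustration and is not an Intel intrinsic, and this says nothing about how Haswell actually implements the instruction; it only shows the work that has to happen somewhere.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar reference for an 8-lane 32-bit gather: dst[i] = base[idx[i]].
   gather8_i32 is a hypothetical helper name used for illustration only. */
static void gather8_i32(int32_t dst[8], const int32_t *base,
                        const int32_t idx[8])
{
    for (size_t lane = 0; lane < 8; ++lane) {
        /* Each lane is its own load and may touch a different cache line. */
        dst[lane] = base[idx[lane]];
    }
}
```

With eight separate loads and a fixed number of load ports, it is at least plausible that the whole thing takes several cycles even when every access hits the cache, unless the hardware can combine lanes that fall in the same cache line.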
A microcoded sequence could still be faster.