Coherency isn't really an issue (and not really necessary at that fine a granularity). The raw bandwidth required is the problem.
So all of your memory management processes now have to go over this bus - global memory atomic ops, for example.
And like silent-guy said, Larrabee's architecture doesn't give it a free pass on inter-chip bandwidth.
Yeah, I think that scaling was for cores per chip, not the number of independent chips.
Coherency isn't really an issue (and not really necessary at that fine a granularity). The raw bandwidth required is the problem.
I'm not sure what kind of bandwidth you mean here, but wouldn't memory bandwidth in theory automatically be twice as high if such a config uses 2x256-bit buses as one real 512-bit wide bus?
If you want a 2-chip architecture that is uniform with respect to memory bandwidth (ignoring latency), where each chip has the same amount of bandwidth as a 1-chip solution, and if you assume that you can spread your accesses uniformly across the memory of both chips, then you'll have a 50% IO pin overhead compared to a single-chip solution.
In single-chip mode, you have a 2x256-bit bus. In dual-chip mode, you have one 256-bit bus going to local memory and one 256-bit bus going from chip A to chip B, but chip B also needs concurrent access to chip A, so you need an additional 256-bit bus for that.
In any case, your total bandwidth per IO pin (memory pins + interconnect pins) has gone down by 50%: you still only have 512 pins to memory with the same total bandwidth, but you also have 512 pins between the two chips. If your application is bandwidth limited, you didn't win anything.
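To make the pin accounting above concrete, here's a rough sketch in Python. The function and variable names are made up for illustration, and it counts data lines only, the same way the post does (each inter-chip link counted once, no address/control pins).

```python
# Rough pin accounting for the uniform 2-chip case described above.
# Assumption: data lines only, counted the way the post counts them.

def pins_per_chip(local_bits, link_out_bits, link_in_bits):
    """Data pins one chip needs: local memory bus plus both link directions."""
    return local_bits + link_out_bits + link_in_bits

# Single chip: 2x256-bit memory bus.
single_chip_pins = 2 * 256

# Dual chip: 256 bits to local memory, 256 bits out to the other chip,
# 256 bits in from the other chip.
dual_chip_pins = pins_per_chip(256, 256, 256)

# Extra pins per chip relative to the single-chip design.
pin_overhead = dual_chip_pins / single_chip_pins - 1

# System view: still 512 lines to memory in total, plus 512 lines
# between the two chips, for the same total memory bandwidth.
memory_lines = 2 * 256
interconnect_lines = 2 * 256
bandwidth_per_pin_ratio = memory_lines / (memory_lines + interconnect_lines)

print(dual_chip_pins, pin_overhead, bandwidth_per_pin_ratio)
```

Each chip ends up with 768 data pins instead of 512, and the memory bandwidth delivered per IO line is half of what the single chip gets.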
If you make the interconnect only 2x128 bits and use 2x384 bits to memory, then you're not uniform anymore and you have to start implementing application-specific allocation strategies.
In the case of a GPU, I assume you could do smart placement of textures and Z-buffers, duplicate some textures in both memories, make sure GPU-A renders one part of the screen and GPU-B renders the other...
Starts to sound a lot like SFR, doesn't it?
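A back-of-the-envelope model of the non-uniform split (384 bits to local memory, 128 bits each way between chips) shows why placement starts to matter. This is only a sketch under simplifying assumptions that are mine, not from the post: both chips behave symmetrically, latency is ignored, and `local_fraction` is a hypothetical knob for the share of a chip's accesses that hit its own memory.

```python
# Simplified model of the 2x384-bit memory / 2x128-bit interconnect split.
# Assumptions: symmetric chips, latency ignored, steady-state traffic.

def sustained_bits_per_clock(local_fraction, memory_bits=384, link_bits=128):
    """Bits per clock one chip can sustain for a given share of local accesses.

    With symmetric chips, each local memory bus carries the chip's own local
    traffic plus the other chip's remote traffic, so it caps demand at
    memory_bits. The link carries only the remote share of the demand.
    """
    remote_fraction = 1.0 - local_fraction
    if remote_fraction <= 0.0:
        return float(memory_bits)  # everything is local, link is idle
    return float(min(memory_bits, link_bits / remote_fraction))

for f in (0.0, 0.5, 0.75, 1.0):
    print(f, sustained_bits_per_clock(f))
```

In this model, spreading accesses uniformly (half local, half remote) leaves each chip limited by the 128-bit link to 256 bits per clock, and you need roughly two thirds of the accesses to stay local before the full 384-bit memory bus is usable. That is exactly the kind of allocation work that pushes you toward SFR-style partitioning.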
No matter what, you'll very soon run into issues that prevent perfect scaling.
I don't want to spoil the party, but I am still pretty sure that Hydra will never work as promised.
It could work to some degree with specially modified drivers and support from the GPU manufacturers, but not as a generic solution. As I don't expect support from NVIDIA or ATI, there is only Intel left.
In other words, nothing; why would Intel think of multi-GPU with a concept like Larrabee anyway?
MSI is working on a P55 board featuring Hydra:
http://www.iopanel.net/forum/thread31404.html
Based on what I saw last year as compared to this year, I am confident that it will achieve competitive gaming performance. (Sorry we can't say more, they are really tying our hands here.)