LucidLogix Hydra, madness?

MfA · Feb 21, 2009

trinibwoy said:
So all of your memory management processes now have to go over this bus - global memory atomic ops for example.

Coherency isn't really an issue (and not really necessary at that fine a granularity). The raw bandwidth required is the problem.

CarstenS · Feb 22, 2009

trinibwoy said:
And like silent-guy said, lrb's architecture doesn't give it a free pass on inter-chip bandwidth.

I think Intel's own scaling measurements with different numbers of cores have left quite an imprint on people's minds.

Though I am not sure, I assume these were done at simulator level and wrt a single LRB solution having this or that many cores, not multiple LRB add-on cards communicating via some kind of external interface.

trinibwoy · Feb 22, 2009

Yeah I think that scaling was cores per chip not number of independent chips.

AlexV · Feb 22, 2009

trinibwoy said:
Yeah I think that scaling was cores per chip not number of independent chips.

That is correct.

Ailuros · Feb 23, 2009

MfA said:
Coherency isn't really an issue (and not really necessary at that fine a granularity). The raw bandwidth required is the problem.

I'm not sure what kind of bandwidth you mean here, but wouldn't memory bandwidth in theory automatically be twice as high if such a config uses 2*256-bit buses, just like 1 real 512-bit wide bus?

MfA · Feb 23, 2009

Not memory bandwidth, interconnect bandwidth. Basically it takes extra pins and gates, which lie idle in single chip setups (which are the most common).

silent_guy · Feb 23, 2009

Ailuros said:
I'm not sure what kind of bandwidth you mean here, but wouldn't memory bandwidth in theory automatically be twice as high if such a config uses 2*256-bit buses, just like 1 real 512-bit wide bus?

If you want a 2 chip uniform memory architecture wrt bandwidth (ignoring latency), where each chip has the same amount of bandwidth as a 1 chip solution, and if you assume that you can spread your accesses uniformly across the memory of both chips, then you'll have a 50% IO pin overhead compared to a single chip solution.

In single chip mode, you have a 2x256 bit bus. In dual chip, you have 1 256-bit bus going to local memory and 1 256-bit bus going from chip a to chip b, but chip b also needs concurrent access to chip a, so you need an additional 256-bit bus for that.

In any case, your total bandwidth per IO pin (memory pins + interconnect pins) has gone down by 50%: you still only have 512 pins to memory with the same total bandwidth, but you also have 512 pins between the two chips. If your application is bandwidth limited, you didn't win anything.

If you make the interconnect only 2x128 bit and 2x384 bits to memory, then you're not uniform anymore and you have to start implementing application specific allocation strategies. In case of a GPU, I assume you could do smart placement of textures and Z-buffers, duplicate some textures in both memories, make sure GPU-A renders one part of the screen and GPU-B renders the other...

Starts to sound a lot like SFR, doesn't it?

No matter what, you'll very soon run into issues that prevent perfect scaling.
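A rough sketch of the pin arithmetic above, using the post's own simplifications: one pin per bit of bus width, every pin carrying the same bandwidth, and only memory pins delivering useful bandwidth. The function name and the normalised unit are illustrative assumptions, not taken from any real tool.

```python
def bandwidth_per_pin(memory_pins, interconnect_pins):
    """Useful memory bandwidth per IO pin, normalised so that a design
    with memory pins only scores 1.0. Assumes (hypothetically) that one
    pin equals one bit of bus width and all pins run at the same rate."""
    return memory_pins / (memory_pins + interconnect_pins)


# Single chip: a 2x256-bit bus to memory, no chip-to-chip link.
single_chip = bandwidth_per_pin(memory_pins=512, interconnect_pins=0)

# Two chips, uniform: 256 bits to local memory per chip (512 total),
# plus a 256-bit link in each direction between the chips (512 total).
dual_uniform = bandwidth_per_pin(memory_pins=512, interconnect_pins=512)

# Two chips, non-uniform: 2x384 bits to memory, 2x128 bits of interconnect.
dual_non_uniform = bandwidth_per_pin(memory_pins=768, interconnect_pins=256)

print(single_chip, dual_uniform, dual_non_uniform)  # 1.0 0.5 0.75
```

The non-uniform split recovers some bandwidth per pin, but only if most accesses stay in local memory, which is where the allocation strategies come in.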

trinibwoy · Feb 23, 2009

This might be a stupid question but how exactly would a third-party compositing solution get access to intermediate buffers on a GPU without assistance from the graphics driver? Not to mention how it sends a composed multi-sampled buffer back to a card for AA resolution. Actually how does any communication happen at all without AMD's and Nvidia's blessings?

Ailuros · Feb 24, 2009

silent_guy said:
If you want a 2 chip uniform memory architecture wrt bandwidth (ignoring latency), where each chip has the same amount of bandwidth as a 1 chip solution, and if you assume that you can spread your accesses uniformly across the memory of both chips, then you'll have a 50% IO pin overhead compared to a single chip solution.

In single chip mode, you have a 2x256 bit bus. In dual chip, you have 1 256-bit bus going to local memory and 1 256-bit bus going from chip a to chip b, but chip b also needs concurrent access to chip a, so you need an additional 256-bit bus for that.

In any case, your total bandwidth per IO pin (memory pins + interconnect pins) has gone down by 50%: you still only have 512 pins to memory with the same total bandwidth, but you also have 512 pins between the two chips. If your application is bandwidth limited, you didn't win anything.

If you make the interconnect only 2x128 bit and 2x384 bits to memory, then you're not uniform anymore and you have to start implementing application specific allocation strategies.

Damn...

In case of a GPU, I assume you could do smart placement of textures and Z-buffers, duplicate some texture in both memories, make sure GPU-A renders one part of the screen and GPU-B renders the other...

Starts to sound a lot like SFR, doesn't it?

SFR isn't necessarily a panacea, even if you could IMHLO manage to scale geometry as well as with AFR and manage to divide the workload between GPUs as well as possible.

rpg.314 · Feb 24, 2009

silent_guy said:
If you want a 2 chip uniform memory architecture wrt bandwidth (ignoring latency), where each chip has the same amount of bandwidth as a 1 chip solution, and if you assume that you can spread your accesses uniformly across the memory of both chips, then you'll have a 50% IO pin overhead compared to a single chip solution.

In single chip mode, you have a 2x256 bit bus. In dual chip, you have 1 256-bit bus going to local memory and 1 256-bit bus going from chip a to chip b, but chip b also needs concurrent access to chip a, so you need an additional 256-bit bus for that.

In any case, your total bandwidth per IO pin (memory pins + interconnect pins) has gone down by 50%: you still only have 512 pins to memory with the same total bandwidth, but you also have 512 pins between the two chips. If your application is bandwidth limited, you didn't win anything.

If you make the interconnect only 2x128 bit and 2x384 bits to memory, then you're not uniform anymore and you have to start implementing application specific allocation strategies. In case of a GPU, I assume you could do smart placement of textures and Z-buffers, duplicate some textures in both memories, make sure GPU-A renders one part of the screen and GPU-B renders the other...

Starts to sound a lot like SFR, doesn't it?

No matter what, you'll very soon run into issues that prevent perfect scaling.

Nice explanation. If the software model can guarantee that no two chips write to the same location, while reads are arbitrary, then perhaps you will not have these issues. Let us take the case of simple texturing here. In a 2 GPU system, only the pixels at the border will need to read from the other GPU, and since the ratio of border to bulk pixels goes down as the resolution increases, NUMA style hacks may not be necessary. For general texturing, it's difficult to say.
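To put a number on the border-to-bulk argument, here is a minimal sketch under assumptions of my own: the screen is split into a left and a right half, one per GPU, and only pixels within a fixed strip of `border_px` pixels on either side of the split ever read from the other GPU's memory. The strip width of 8 pixels and the resolutions are made-up illustration values.

```python
def border_fraction(width, height, border_px):
    """Fraction of screen pixels lying within border_px of a vertical
    split down the middle of the screen (hypothetical two-GPU split)."""
    border_pixels = 2 * border_px * height  # one strip on each side of the split
    return border_pixels / (width * height)


low_res = border_fraction(1280, 1024, border_px=8)
high_res = border_fraction(2560, 1600, border_px=8)

print(low_res, high_res)  # 0.0125 0.00625
```

With a fixed strip width the fraction is just 2 * border_px / width, so it halves when the horizontal resolution doubles. This only holds when the reads are local to the pixel position; it says nothing about general texturing, where any pixel may fetch from anywhere.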

AnarchX · Aug 11, 2009

MSI works on Hydra featuring P55 board:
http://www.iopanel.net/forum/thread31404.html

rpg.314 · Aug 12, 2009

Let's hope they exit their stealth mode soon. If the boards have been fabbed, then it shouldn't be long.

mboeller · Aug 12, 2009

In Jon Peddie's new report you can find a generic description of the Hydra system. It seems multi-GPU systems could take off with the help of the Hydra chipset:

Link: http://www.jonpeddie.com/special/WhitePapers/Multi-GPU-issues-and-opportunities.pdf

Demirug · Aug 12, 2009

I don’t want to spoil the party but I am still pretty sure that Hydra will never work as promised.

It could work to some degree with specially modified drivers and support from the GPU manufacturer, but not as a generic solution. As I don’t expect support from Nvidia or ATI, there is only Intel left.

Ailuros · Aug 12, 2009

Demirug said:
I don’t want to spoil the party but I am still pretty sure that Hydra will never work as promised.

It could work to some degree with specially modified drivers and support from the GPU manufacturer, but not as a generic solution. As I don’t expect support from Nvidia or ATI, there is only Intel left.

In other words nothing; why would Intel think of multi-GPU with a concept like Larrabee anyway?

Demirug · Aug 12, 2009

Ailuros said:
In other words nothing; why would Intel think of multi-GPU with a concept like Larrabee anyway?

I may be wrong, but so far I have never seen Intel announce any kind of inter-card connection.

dkanter · Aug 12, 2009

Intel likely used this as a bargaining chip with NV about SLI.

David

AlexV · Aug 13, 2009

AnarchX said:
MSI works on Hydra featuring P55 board:
http://www.iopanel.net/forum/thread31404.html

And it's only a glorified PCI-E bridge there. So instead of using nV's NF200 or some PLX bridge or another, they use the Hydra. Also, apparently that board is merely a prototype; whether or not it'll ever enter mass production is unknown.

Arty · Sep 23, 2009

Lucid had a new demo with MSI's Big Bang board, and Ryan (PCPer) had this to say about Hydra 200:

Based on what I saw last year as compared to this year, I am confident that it will achieve competitive gaming performance. (Sorry we can't say more, they are really tying our hands here.)

Silent_Buddha · Sep 23, 2009

Anandtech got a bit of hands-on time as well, and this is their impression...

http://www.anandtech.com/video/showdoc.aspx?i=3646

Multi-GPU with an Nvidia + ATI GPU in the same system.

Price is pretty steep. Over 70 USD for the high end version and over 40 USD for the low end version of the chip.

Regards,
SB
