building up a render farm

Discussion in 'PC Hardware, Software and Displays' started by rgbaguy, Jan 23, 2010.

  1. rgbaguy

    Newcomer

    Joined:
    Jan 23, 2010
    Messages:
    46
    Likes Received:
    0
  2. MfA

    MfA
    Legend

    Joined:
    Feb 6, 2002
    Messages:
    6,705
    Likes Received:
    458
    No I meant like the DELL PowerConnect™ 6248, it has two slots for 10 Gb modules ... so you connect the fileserver/central node to the 10 Gb port(s) and all the render slaves to the 1 Gb ports.

    PS. well that Juniper switch can have 10 Gb ports as well, but it's a little more expensive.
     
    #62 MfA, Jan 30, 2010
    Last edited by a moderator: Jan 30, 2010
  3. -tkf-

    Legend

    Joined:
    Sep 4, 2002
    Messages:
    5,632
    Likes Received:
    36
    These are the components that made up my 10 Gbit network:

    Intel 10 Gigabit CX4 Dual Port Server Adapter - NIC
    HP ProCurve 10GbE CX4 Copper Module
    HP ProCurve Switch 2910al-24G
    HP CX4 LOCAL CONNECTION CABLE 3M

    The HP 10 Gbit stuff was pretty cheap when we came up with this, at least compared to other solutions.

    The idea was that in theory the server could serve up to 10 hosts at full speed (100 MB/s each) without the network being the limiting factor. The chance of 10 hosts pulling max data at the exact same time was slim in our case, so it gave us plenty of headroom. We chose RAID 6; even if Barf(?) points out that hard drives "are" free, that isn't really the case :) With 16 disks and RAID 6 we have 12 TB capacity, 2 parity disks and 1 hot spare.
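    As a rough sanity check on that headroom figure, here is a tiny sketch (decimal units; the 100 MB/s per host is just the number quoted above):

    ```python
    # Headroom check: how many render hosts can pull data at full gigabit speed
    # before the server's 10 Gbit link becomes the limit? Decimal units throughout.
    server_link_mb_s = 10e9 / 8 / 1e6    # 10 Gbit/s is roughly 1250 MB/s
    per_host_mb_s = 100                  # one render slave saturating its 1 Gbit port

    print(f"hosts at full speed: {server_link_mb_s / per_host_mb_s:.1f}")  # ~12.5
    ```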

    Our backup solution was a mirror server just like the primary server, plus a tape backup connected to it. We would mirror the primary server during the night and back up to tape during the day. You would need some automatic mirroring (built into Microsoft's Server 2008 afaik), and then I guess you could back up your mirror server just like we do.

    Tape Backup: HP StorageWorks MSL2024 Ultrium 1760 with a native capacity of 20TB
    We planned on buying 2 extra magazines and 24 extra tapes, and would swap the magazines to get the data to a remote site (aka someone would take it home).
     
  4. rgbaguy

    Newcomer

    Joined:
    Jan 23, 2010
    Messages:
    46
    Likes Received:
    0
    Hey guys, thanks for the replies!

    Cool, that does make sense. So 40 render slaves connected to the 1 GB ports, the storage server (will act as central server) into one 10 GB port and our main workstation into another 10 GB port.

    Planning on only sending renders through one machine; it will be a small team on this job. I'll basically be the one setting up renders and so on, and the rest of the team are animators. We will be outsourcing modeling/texturing to freelancers. So we'll have a separate network for the workstations, and another for the farm/server.

    Had a look at the PowerConnect, looks cool. Will look around here for options etc. So basically I just need to look for a 48-port gigabit Ethernet switch with two 10 GB uplink modules, right?

    Cool :) Yeah, the chances of all 40 slaves trying to pull data off the server isn't very likely, but possible. I'm not even going to worry about that :p

    What do you think of the storage links I posted above? Good enough? Can someone please explain to me basically how the setup would work? I'm new to all this, but learning as I go!

    On tape backup, whoa, that's new too. Do you reckon that it's essential?

    Basically what we do now is work off shared drives, and then when the job is finished we just pull the drives out (to put in storage) and buy new ones. It's a little clumsy and old-fashioned, but it works just fine (we are a small team).

    That's my biggest issue, storage/backup. It would help a lot if someone could give me some basic ideas behind this. I've looked on Google for RAID, NAS, SAN etc., and it's all very confusing.

    Not sure if tape backup would be worth it in our case. RAID definitely is a good idea, and to persist our data we can just take out the hard drives and put them in storage. That could get pretty clumsy though if we have a lot of data to back up.

    Looking forward to more replies! Almost at the end of putting this together, wouldn't be able to do this without everyone's help :)
     
  5. -tkf-

    Legend

    Joined:
    Sep 4, 2002
    Messages:
    5,632
    Likes Received:
    36
    Again I went with simple and plain. I plan on building my own server in a rack cabinet with either 12 or 16 drives, an Adaptec controller and the above-mentioned network gear.

    I spent a lot of time messing around with brand-name solutions, but they were all too expensive and too slow. A 16-drive RAID 6 setup mounted directly in the server with a 10 Gbit card will really yield a lot of performance.

    Any storage solution that has to "hang" off the server via a classic Fibre Channel, SAS or SCSI link will be limited by the connection speed to the storage.
    Considering the price of most storage solutions with 16 drives, the added cost of a Xeon motherboard, plenty of RAM and an OS was imho a fair price to pay for the improved performance.
     
  6. rgbaguy

    Newcomer

    Joined:
    Jan 23, 2010
    Messages:
    46
    Likes Received:
    0
    Whoa, these switches with two 10 GB uplink slots aren't cheap.

    Perhaps standard 48-port gigabit will be fine?
     
  7. rgbaguy

    Newcomer

    Joined:
    Jan 23, 2010
    Messages:
    46
    Likes Received:
    0
    Sorry, didn't see your reply. Cool, ok. This is proving a difficult one for me.
     
  8. -tkf-

    Legend

    Joined:
    Sep 4, 2002
    Messages:
    5,632
    Likes Received:
    36
    A cluster the size of what you are planning will always be as fast as its weakest link; you NEED to do some tests on REAL data. Run some jobs and monitor network usage vs CPU usage: is the CPU idle because the network is bottlenecked? Is the server that is supplying the data the problem? Do you cache the data on each render node? (Afaik this can be done with XSI for example, but we had some big problems getting the render nodes stable on our setup... though it wasn't the same software.)
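    To make the "monitor network usage vs CPU usage" part concrete, here is a minimal per-node monitoring sketch; it assumes the third-party psutil package is installed, and the one-second interval and print format are arbitrary placeholders:

    ```python
    # Rough per-node monitor: is the CPU idle because the network (or the server
    # feeding it) is the bottleneck? Requires psutil (pip install psutil).
    import psutil

    prev = psutil.net_io_counters()
    while True:
        cpu = psutil.cpu_percent(interval=1.0)          # samples CPU load over ~1 s
        cur = psutil.net_io_counters()
        rx = (cur.bytes_recv - prev.bytes_recv) / 1e6    # MB received since last sample
        tx = (cur.bytes_sent - prev.bytes_sent) / 1e6    # MB sent since last sample
        prev = cur
        print(f"cpu {cpu:5.1f}%   rx {rx:7.1f} MB/s   tx {tx:7.1f} MB/s")
    ```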

    If you Cache the data on each Render Node you might be able to work with a slow network, however your network will be absolutely killed when the render jobs start.

    You might find that scrapping one node and putting that money into a fast network will yield more render speed in the end. But you are the only one who can measure that.

    My approach was to see how I could build a cheap network with lots of power, and the same goes for my cluster.

    I absolutely hate hard drives as backup devices... there, I said it :)
    Your approach to backup should be: what happens if there is a fire, or if a lion eats my hardware... imagine the building everything is in going away. What do you do?
    Hard drives at a remote location would at least help in that regard, but then I would want at least 2 hard-drive copies of my data. Why do I say this? I have on more than one occasion connected a hard drive and been prompted to "format this device".

    If you have absolute control over what you want to back up, you buy one tape drive without a loader and just back it up to tape like you would copy it to a hard drive... and save money :)
     
  9. rgbaguy

    Newcomer

    Joined:
    Jan 23, 2010
    Messages:
    46
    Likes Received:
    0
    Thanks for the reply -tkf-!

    What you have said is very true. On the last job we did, we had 3 machines rendering off a shared drive, and saving the renders to the shared drive took as long as the actual render in some situations. I think our onboard cards are 100Base, I can't be too sure.

    I have thought about caching, would just be pretty messy and hard to manage.

    I guess it does make sense to invest in a good network backbone.

    Ok, do you reckon that gigabit ethernet is good enough? On a switch, isn't the total bandwidth equal to N GB/s / N ports? i.e. 1000 / 48 = ~20 GB/s per machine?

    Also, certain RAID configs would result in a greater level of parallelism, thereby improving performance right?

    I should perhaps take another look at the network and storage. Perhaps I should actually consider SAS drives for the storage server. A little tired now, will get back to this tomorrow.

    Ok, will look into tape backup as well :) Hopefully there won't be any lions or fires :D
     
  10. MfA

    MfA
    Legend

    Joined:
    Feb 6, 2002
    Messages:
    6,705
    Likes Received:
    458
    The extra cost for using the more expensive switch is $50 per node for the 40-node setup ... meh.

    Why not start with a really, really cheap 8- or 16-port gigabit switch for now and then decide what to do when you scale up to 40 nodes? Those $1000 switches might be cheaper than the $2500 ones with 10 Gb uplinks, but if you have to replace them anyway in the end it's still $1000 down the drain.

    PS. it's Gb ... so divided by 48 it's 1e9/48/8 = 2.6 MB/s.
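    Spelled out as a tiny sketch (the 48- and 40-node counts are just the numbers discussed in this thread):

    ```python
    # A gigabit link is 10^9 bits per second, and there are 8 bits per byte.
    link_bits_per_s = 1e9
    for nodes in (48, 40):
        mb_per_s = link_bits_per_s / nodes / 8 / 1e6
        print(f"1 Gbit shared by {nodes} nodes: {mb_per_s:.3f} MB/s each")
    # 48 nodes -> ~2.604 MB/s, 40 nodes -> ~3.125 MB/s
    ```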
     
    #70 MfA, Jan 30, 2010
    Last edited by a moderator: Jan 30, 2010
  11. -tkf-

    Legend

    Joined:
    Sep 4, 2002
    Messages:
    5,632
    Likes Received:
    36
    In theory I would guesstimate a 16-disk system, with 15 drives in RAID 6 and one hot spare, to be able to sustain up to 1200 MB per second. A 10 Gb network card can roughly handle 1000 MB/s.

    So with 80 clients at 1 Gbit/100 MB per machine at max throughput, you are still hitting the ceiling on the network; it's just 10 times faster than an all-out 1 Gbit network.
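    The same ceiling check as a small sketch, using the guesstimates above rather than anything measured:

    ```python
    # Quick ceiling check with the rough numbers from above (not measurements).
    array_mb_s = 1200            # guess: 16-disk box, 15 drives in RAID 6 plus hot spare
    nic_mb_s = 1000              # what a 10 Gbit card roughly handles in practice
    clients = 80
    demand_mb_s = clients * 100  # every client pulling a full 1 Gbit port at once

    ceiling = min(array_mb_s, nic_mb_s)
    print(f"worst-case demand {demand_mb_s} MB/s vs ~{ceiling} MB/s available,")
    print(f"which is still about {ceiling // 100}x a single 1 Gbit server link")
    ```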

    All my drives are SATA btw, RAID certified; I prefer WD :)
     
  12. Blazkowicz

    Legend Veteran

    Joined:
    Dec 24, 2004
    Messages:
    5,607
    Likes Received:
    256
    can your controller sustain that?
     
  13. -tkf-

    Legend

    Joined:
    Sep 4, 2002
    Messages:
    5,632
    Likes Received:
    36
  14. rgbaguy

    Newcomer

    Joined:
    Jan 23, 2010
    Messages:
    46
    Likes Received:
    0
    Thanks for the replies guys! :)

    Will have to work out how we will scale up the farm, might be cheaper to just get a 48-port switch, maybe not. Will run some numbers once I have more information.

    Cool, 2.6 MBytes/s is enough for each blade? Seems on the low side; 10 sec to copy over a 26 MB file seems a little slow? Actually at full capacity it will be 3.125 MB/s per blade (10^9/40/8).

    True. Heh, well SATA should be fine; I mean, if each machine is writing out at 3.125 MB/s it's hardly going to hurt performance, as the drives have a bandwidth of 300 MB/s, and it's highly unlikely that all blades will write concurrently. Max bandwidth through the server would be 125 MB/s, so it won't hurt performance. In that case, perhaps RAID 1 will be fine? On 6 drives that gives 3 TB total capacity and pretty much zero chance of data loss.

    If I seem a little inconsistent in my replies above (i.e. on RAID 1), that's because I only just saw these posts. I often write out my questions in Notepad while I'm doing research on the net and then post.

    Switch:

    48-port gigabit switch with 1 or 2 10 GB uplink slots.

    Storage server:

    Dual-port 10 GB Ethernet (onboard or add-on card). Single-port would be cheaper; not sure if that is available though?

    This looks good (Intel Server SR2612 UR):

    http://www.intel.com/products/server/systems/SR2612UR/SR2612UR-overview.htm

    Up to 12 SAS/SATA hot-swap bays. 2U rackmount.

    And prices with markup here:

    http://www.wantitall.co.za/PC-Hardware/Intel-Server-SR2612UR__B002V3FDH4

    For the above: single-socket Xeon E5502, S5520UR server board, 4 GB DDR3 800/1066/1333 (RAM is so cheap, might just opt for 1333). It's not a bad price actually; we would get it at roughly 70-75% of the price quoted in the link directly above (we get it at cost).

    Will need to add a CPU, heatsink, drives, and a RAID controller (onboard RAID only supports levels 0/1/10, although level 10 might just be perfect).

    The following components are included:

    Server Board S5520UR
    Server Chassis SR2612
    12 hot swap drive bays
    2 x 760 W PSUs
    4 fans
    2 CPU heatsinks (weird, Xeons don't come with heatsinks anymore)

    Would need to add a dual-port 10 GB Ethernet card.

    Only one thing bugs me: the S5520UR specs say the board only has 6 SATA-300 ports? Strange. Might have to get a different board? Still, with RAID 10 we can get up to 4 TB of storage, which should be fine anyway. What do you think?

    RAID config:

    RAID 5 looks fine, min drives is 3, space efficiency is n-1, and fault tolerance is 1 drive. That should be safe enough?

    So with 6 x 1 TB SATA-300 drives, that gives 5 TB space. Not quite sure which is better, level 5 or 10?

    As I understand it, RAID10 on 6 1TB drives gives:

    Half of the drives mirrored for striping, and half the drives for parity. So with 6 drives, that gives 3 TB total storage, is that right?
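    For the capacity side, here is a small sketch of the usual rules of thumb (generic RAID arithmetic assuming equal-size drives, not anything controller-specific):

    ```python
    # Usable-capacity rules of thumb for equal-size drives.
    def usable_tb(n_drives, drive_tb, level):
        if level == 0:
            return n_drives * drive_tb         # striping only, no redundancy
        if level == 1:
            return drive_tb                    # every drive holds the same copy
        if level == 5:
            return (n_drives - 1) * drive_tb   # one drive's worth of parity
        if level == 6:
            return (n_drives - 2) * drive_tb   # two drives' worth of parity
        if level == 10:
            return (n_drives // 2) * drive_tb  # mirrored pairs, then striped
        raise ValueError(f"unhandled RAID level {level}")

    for level in (5, 6, 10):
        print(f"RAID {level} on 6 x 1 TB drives: {usable_tb(6, 1, level)} TB usable")
    # RAID 5 -> 5 TB, RAID 6 -> 4 TB, RAID 10 -> 3 TB
    ```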

    All that is needed next is a controller card (depending on which RAID level is chosen; if it's level 6, for example, there is no onboard support for it).

    RAID controller:

    RAID 0,1,5,6,10,50

    http://www.wantitall.co.za/PC-Hardw...RAID-0-1-5-6-10-50-PCI-Express-x8__B0017QZLVE

    RAID 0,1,5,6,10,50,60

    http://www.wantitall.co.za/PC-Hardw...le-300-MBps-RAID-0-1-5-6-10-50-60__B000W7PNW6

    First one seems good?

    So the above, that might just complete the storage issue in itself. Then need to find a switch.
     
  15. rgbaguy

    Newcomer

    Joined:
    Jan 23, 2010
    Messages:
    46
    Likes Received:
    0
    Actually thinking about it, the render-lighting/compositing workstation will be reading/writing data to the storage server. This workstation should have a 10Gb network card installed to maximize throughput.

    A 10Gb card has a bandwidth of 1250 MB/s, and judging by how big some render passes can be, it would make sense to maximize bandwidth here.

    So, that means a SATA-300 drive has a bandwidth of 300 MB/s, which is a little less than 1/4 of the bandwidth of a 10Gb card, so a RAID level that supports striping, to get the bandwidth as close as possible to the 1250 MB/s of the 10Gb card, would make sense. Anything over 1250 MB/s would be negated by the fact that the render blades are also reading/writing from/to the storage server.

    Perhaps to be on the safe side, at least 10 drives should be used, striped to give a bandwidth greater than the 1250 MB/s of the 10Gb link, so that would be 5 striped pairs? Does this make sense, or is it just plain stupid? That would mean a 12-drive setup would make sense.
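    The arithmetic behind that, as a sketch; the per-drive rate is the key assumption here (300 MB/s is the SATA-300 interface limit, while sustained mechanical rates are considerably lower):

    ```python
    import math

    # How many striped data drives before the drives, rather than the 10 Gbit NIC,
    # stop being the bottleneck? per_drive_mb_s is an assumption, not a measurement.
    nic_mb_s = 10e9 / 8 / 1e6            # ~1250 MB/s
    for per_drive_mb_s in (300, 100):    # interface limit vs a realistic sustained rate
        drives = math.ceil(nic_mb_s / per_drive_mb_s)
        print(f"at {per_drive_mb_s} MB/s per drive: {drives} data drives needed")
    # 300 MB/s -> 5 drives, 100 MB/s -> 13 drives (double those for RAID 10 mirrors)
    ```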

    OR adopt SATA-600 or SAS drives? That would mean for SATA-600, RAID 5,6,10 would make sense?
     
    #75 rgbaguy, Jan 31, 2010
    Last edited by a moderator: Jan 31, 2010
  16. Silent_Buddha

    Legend

    Joined:
    Mar 13, 2007
    Messages:
    15,926
    Likes Received:
    4,879
    Also remember that other than burst transfers, mechanical HDDs aren't going to be doing much more than 100 MB/s for SATA drives.

    Single SSDs can hit the 150-250 MB/s range, but the price of those is extremely high per MB.

    Your file server, if it's working right, should be caching often-accessed and recently-accessed files, as well as prefetching data when able, so that will help when serving files to multiple nodes. I'm not sure if your usage will need an ultra-fast disk subsystem rather than just a good file server.

    I'm with MfA in recommending RAID 10, for good data redundancy and speed.

    Regards,
    SB
     
  17. Blazkowicz

    Legend Veteran

    Joined:
    Dec 24, 2004
    Messages:
    5,607
    Likes Received:
    256
    you have many options, such as raid 50, and my favorite would be raid 100.
    because that's a really huge raid level number :lol:
     
  18. MfA

    MfA
    Legend

    Joined:
    Feb 6, 2002
    Messages:
    6,705
    Likes Received:
    458
    RAID5/6/50/60 arrays all share the same problem, you have a lot of identical drives with an identical usage history. The failure of one drive in such an array is one of the best predictors of future failures of the other drives ... or in other words, they are likely to all fail close together. Which can be rather nasty during a rebuild. If you have a good batch of drives the risk remains small ... if you happen to have a bad batch of drives (which can happen even with enterprise drives) you're up shit creek without a paddle.

    Reliability wise RAID10 with two sets of HDs from different manufacturers is the superior option.

    That said, if you are going to do RAID5/6 do it with a large array so you at least make most of the cost savings ... RAID50/60 with small individual arrays (before RAID0 striping) is the worst of both worlds, not really that much cheaper than RAID10 but still much more risky.
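    As a toy calculation of the rebuild-window risk: the sketch below assumes independent failures, which understates the correlated-failure problem described above, so real same-batch numbers would be worse (the 1% per-drive figure is made up purely for illustration):

    ```python
    # Toy model only: treats drive failures during a rebuild as independent.
    def p_another_failure(surviving_drives, p_fail_during_rebuild):
        """Chance that at least one more drive dies before the rebuild finishes."""
        return 1 - (1 - p_fail_during_rebuild) ** surviving_drives

    # e.g. a 16-drive array rebuilding after one failure, assuming each of the
    # 15 survivors has a (made-up) 1% chance of dying during the rebuild window
    print(f"{p_another_failure(15, 0.01):.1%}")   # ~14%
    ```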
     
  19. -tkf-

    Legend

    Joined:
    Sep 4, 2002
    Messages:
    5,632
    Likes Received:
    36
    If you are concerned about the chance of getting a bad batch, you just spread out among different brands.

    The advantage of RAID 6 is space: 16 drives with 3 providing security, versus having to go 13/13 and needing more hardware on the controller side. I guess the best solution would be RAID 5 or 6 with a hot spare running as a mirror of one of the parity disks. I never understood why hot spares weren't used for "something". In our RAID 5 boxes we use 2 hot spares.
     
  20. rgbaguy

    Newcomer

    Joined:
    Jan 23, 2010
    Messages:
    46
    Likes Received:
    0
    Thanks for the replies guys! :) Sorry for the late reply, got caught up with work.

    Ah ok, I thought transfers were done at a constant 300 MB/s for SATA-300's and so on... Hmmm

    True, SATA should be ok then. Can you explain a bit how to set up caching etc. on the file server?

    RAID 10, cool

    Woah, doesn't that require lots of drives/partitions! Seems a bit crazy for our purposes :D

    Cool, it does make sense to split the drives up between different manufacturers. Say half Seagate and half Samsung.

    I have to say, I have only experienced one drive failure in 12 years, and that drive had its data corrupted for some reason I never found out. True, I've probably only owned 20 or so drives in those years. I guess it's best to be on the safe side; I'm just saying I've never had problems like you guys are describing. Could it be a more frequent problem in the top-end enterprise sector though?

    RAID 10, cool

    Cool, will do.

    Hmmm, RAID 6. Ok, I have a setup with 6 1TB SATA-300 drives. RAID 6 or RAID 10? Obviously backup is more important than space. So RAID 6 makes more sense then? Speed would be nice, so RAID 10 would be good too. Hard choice.

    Have I got this right:

    6 drive RAID 6 config:
    3 parity drives each holding 1/3 of the data
    3 striped drives each holding 1/3 of the striped data

    6 drive RAID 10 config:
    3 mirrored - each drive holds 1/3 of the data
    3 striped - each drive holds 1/3 of the striped data

    Both options leave 3 TB free?

    The problem with RAID 6 is the motherboard doesn't have onboard support, so would need to get a controller card.
     