building up a render farm

rgbaguy

Newcomer
Hi there,

I'm new to this forum, so hello :)

I'm looking to build up a render farm for my company, and have been doing quite a bit of research. We have a really big job that may be coming up, and we need to beef up our infrastructure to handle it.

I initially contacted people like BOXX, Workstations Specialists, IBM and so on. After much back and forth between all these guys, IBM seemed the way to go on the cost front and available support.

The requirements we worked out are:

Dual socket Xeon E5520 (2.26 GHz, 8 MB L3 cache)
Server board with dual socket support for the 5500 series
12 GB DDR3 1333 MHz RAM
80 GB SAS or SATA drive

We will be needing 40 of these machines, as we have built the hardware taking our software licences into consideration.

The only problem is how expensive the hardware is; we will be building up the farm in multiples of 5, as our software plan allows.

The quote from IBM covers:

- a 42U rack (with 2 x Power Distribution Units with power cords, 1 x console switch and 1 x 1U monitor and keyboard)
- a BladeCenter E chassis (14 blade bays)
- 5 x HS22 blade servers of the above spec
- a DS3200 storage controller with 6 x 1 TB SATA drives
- 1 x TS2900 tape autoloader with 1 x LTO4 SAS tape drive and 10 x LTO4 tape cartridges

and it came to USD 73,878.31, which is like good god. That's for just 5 dual socket blades, and our end requirement is 40 dual socket blades.

Does this sound reasonable? I mean, the hardware is fantastic no doubt, and 3 years of support are included. It's certainly not a bad deal, at least compared to BOXX etc.

Still, the costs are quite steep. The quote we received hasn't been broken down, so I've requested an itemised version so I can see the cost implications of scaling up.

So I've gone back to my original research on building up our own rack, which will no doubt be much more cost effective, but a nightmare to maintain. I was hoping someone could help me out on this front. I have come across the following articles:

http://helmer.sfe.se/
http://www.tomshardware.com/reviews/server-part-1,775.html
http://blog.deadlycomputer.com/2006/09/17/458/

I'm not afraid of getting my hands dirty building one of these, and the cost savings will be well worth it. I have done research on hardware, and the 1U/2U servers that Intel offers look great, but I'm pretty sure they won't be cheap either. So I'm looking into building a custom rack that will hold all the components without any chassis; basically the components will just rest on an aluminium tray (think ATX case support) that can be slid in and out of the rack. We can get our hardware at cost, which is a good thing as well.

If we go the above route we won't be packing the machines very densely, in order to try to keep them as cool as possible.

On cooling, we will be putting the rack in a closed-off area with air con. I'm thinking of building an AC controller linked to a thermostat to save power.
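
Very rough idea of the controller logic, just a hysteresis loop; read_temp_c() and set_ac() are placeholders for whatever temperature probe and relay/IR interface we end up using:

import time

TURN_ON_C = 27.0   # switch the AC on above this intake temperature
TURN_OFF_C = 23.0  # switch it off again below this; the gap avoids rapid cycling

def read_temp_c():
    # placeholder: read from a temperature probe near the rack intake
    raise NotImplementedError

def set_ac(on):
    # placeholder: drive a relay / IR blaster / whatever controls the air con
    raise NotImplementedError

ac_on = False
while True:
    temp = read_temp_c()
    if not ac_on and temp >= TURN_ON_C:
        set_ac(True)
        ac_on = True
    elif ac_on and temp <= TURN_OFF_C:
        set_ac(False)
        ac_on = False
    time.sleep(30)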

Any suggestions?

Thanks!
 
Although you haven't described the exact application, I'm going to assume that it's CG animation.
I'm no expert but some of our experiences were:

- We have 1U machines with quad-core CPUs, probably two of them per machine, running Linux (less trouble, better resource usage, easy 64-bit) and mental ray; I don't know about the make. I think this maximizes the use of the software licences too, and reduces data traffic, having 8 cores working on the same batch.
- Make sure to extensively test one system before committing; the DDR3 memory in particular has proven to be problematic.
- Our older IBM blades had LOTS of hard drives dying on us... go for extra reliability there.
- AC is probably a bigger concern than you'd expect and will probably require custom solutions; also prepare for power outages!
- Also be mindful of weight, particularly if you're not on the first floor of the building; this stuff is heavy and office floors might not be designed to support it.
- Lots of render nodes will probably require a huge and fast central storage system, so make sure to do your research and testing if possible; we have something like 10 TB online, plus a fast backup/restore system. Local hard drives don't help much as most productions update content far too often.
 
40 high-end servers running for hours on end at full load with various other support equipment (storage and whatnot) will dissipate enormous amounts of waste heat. Hopefully you've planned for this already. :D
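
A quick back-of-envelope (the per-node wattage and the support figure are guesses, measure a real node before sizing anything):

nodes = 40
watts_per_node = 350      # guess for a loaded dual-socket box, including PSU losses
support_watts = 2000      # guess for storage, switches, UPS losses etc.

total_watts = nodes * watts_per_node + support_watts
btu_per_hour = total_watts * 3.412        # 1 W = 3.412 BTU/h
tons_of_ac = btu_per_hour / 12000         # 1 "ton" of cooling = 12,000 BTU/h
print(f"{total_watts / 1000:.1f} kW of heat, ~{tons_of_ac:.1f} tons of air conditioning")

That's around 16 kW of continuous heat with those guesses, which is a lot more than a typical office split unit is rated for.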
 
Hey Laa-Yosh, small world hey! I used to post on spiraloid a few years back! Good to hear from you again :) Loved your work on the Blur cinematics, great stuff! How's it going?

Yeah, our pipeline is built around Maya and mental ray; we will be taking advantage of the 5 mental ray batch licences that come with each licence of Maya 2010. The beauty of this is that those batch licences work per machine ID, as opposed to standalone mental ray licences, which are per socket. So that's a significant saving.
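
Just to spell out the licence maths behind the 40-node figure (nothing here beyond what's above):

nodes = 40
batch_per_seat = 5                      # mental ray batch licences bundled with each Maya 2010 seat
seats_needed = nodes // batch_per_seat  # -> 8 Maya seats would cover the whole farm
# and because the batch licences count per machine ID, a dual socket node still only consumes one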

Thanks for the advice, I'm doing as much research as I can. IBM has already committed to a test session so we can trial our software on the hardware. True, cooling will be a bit of a problem. Weight is not such an issue on the office front, that's all settled; it's the cooling that's the real worry.

On storage, I agree.

On the custom rack, have done some research on the cost/performance front, and it looks like this would be the best solution on the hardware front:

Dual socket Xeon E5520 (with STS100A heatsinks)
S5500WB server board
6 x 2 GB DDR3 1333 MHz
80 GB SAS/SATA (yet to get to this)
SUSE Linux Enterprise 11 64 bit

Obviously designing a rack to hold the hardware.

Still have to work out the smallest possible PSU for the above. Costing seems to be working out in our favour, though it still might not be a good idea to go this route. Hopefully IBM's scale-up pricing won't be too steep; in that case IBM might just be our safest bet.

Looking forward to more replies :)
 
40 high-end servers running for hours on end at full load with various other support equipment (storage and whatnot) will dissipate enormous amounts of waste heat. Hopefully you've planned for this already. :D

Hehe, sorry didn't see your message. Yeah, I can see that. And it's a big worry :p

Sorry guys, I made a typo: the blade spec from IBM (HS22) uses the L5520, not the E5520. Sorry!
 
I don't think the risk and work involved really justify not simply buying something like a Supermicro chassis ... a motherboard, chassis, PSU and heatsinks/fans for $812 seems like a good deal.

PS. their blades are even cheaper, but the enclosure will run you more than a 42U rack.
 
Just a thought:

If your goal is to use this setup for CGI rendering, it's pretty much impossible that your disk subsystem will be so loaded that it warrants going for SAS drives. These are just way, WAY more expensive, especially considering you'll need dedicated controllers (which also tend to be quite costly) just to be able to connect them...

A JBOD/RAID5 array of standard SATA drives should be quite sufficient for your needs methinks, no? :)
 
I agree about SATA vs SAS (although it makes sense as a host interface for higher end RAID solutions like with that DS3200). Not RAID-5 though :) (Seriously, I'd go for RAID 10 ... performs better, cost is close enough and the odds of a drive failure during a RAID rebuild are not insignificant).
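
Rough numbers for a 6 x 1 TB SATA array like the one in that quote (the 10^14 unrecoverable-read-error rate is just the typical consumer SATA spec sheet figure, so treat it as an assumption):

import math

drives, size_tb = 6, 1.0
ure_rate_bits = 1e14    # assumed: ~1 unrecoverable read error per 10^14 bits read

raid5_usable = (drives - 1) * size_tb    # 5 TB usable, survives one drive failure
raid10_usable = drives / 2 * size_tb     # 3 TB usable, survives one failure per mirror pair

# a RAID 5 rebuild has to read every surviving drive end to end
bits_read = (drives - 1) * size_tb * 1e12 * 8
p_ure = 1 - math.exp(-bits_read / ure_rate_bits)
print(f"RAID 5: {raid5_usable:.0f} TB usable, ~{p_ure:.0%} chance of hitting a URE during a rebuild")
print(f"RAID 10: {raid10_usable:.0f} TB usable, a rebuild only re-reads one surviving mirror")

With those assumptions the rebuild-risk difference alone makes RAID 10 worth the lost capacity, on top of the better random write performance.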
 
Thanks for the replies!

MfA, thanks for the link, looks great. Have contacted the guys.

Grall, yeah, that had occurred to me; there is no point in going for SAS drives, SATA will do just fine. The render nodes won't actually have project data on local disk, but will reference data off a shared drive. So yeah, that is unnecessary. Agree on a RAID array, I had that in mind already. But glad to hear someone suggest that approach. :)

MfA, glad to hear your input on the above.

Guys, I'm seriously considering building up the farm from scratch now. I mean, sure, the amount of maintenance work will be greater than with a pre-built setup like what IBM has to offer, but after running some rough numbers I can cut the costs by well over 60% (I could even build the full 40 nodes for a little more than what IBM have quoted for the above). I might even opt for i7 processors on a dual socket server board; the Xeons are rather pricey, and I don't believe there is really much justification in spending the extra cash on them. Besides, from my experience with hardware, I believe it won't be a problem at all.

The power/cooling worries me a lot more. On that front, I'll build the farm into clusters of 5 nodes that aren't packed too tightly; space isn't really a major issue to be honest. It will also make the hardware and cooling much easier to manage IMO.

After doing quite a bit of research on the Intel site, have put together a plan for the custom setup:

Intel S5500WB server board
Intel Xeon E5520 (uses 20 W more power per socket than the L5520, which costs ~1.4x the price of the E5520; this isn't really a problem as we won't be running the farm that heavily, and from our job forecasts the extra power cost would only catch up with the price difference after roughly 2 years, so it isn't really a factor to worry about; see the rough break-even maths below)
and obviously RAM, PSU etc.
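
Rough shape of that break-even maths, with the electricity price and duty cycle as placeholder assumptions rather than our real forecast numbers:

extra_watts_per_node = 2 * 20    # dual socket, E5520 (80 W TDP) vs L5520 (60 W TDP)
hours_per_year = 8760
duty_cycle = 0.5                 # assumption: fraction of the year the farm actually renders
kwh_price = 0.12                 # assumption: electricity price in USD per kWh

extra_kwh = extra_watts_per_node * hours_per_year * duty_cycle / 1000
extra_cost_per_year = extra_kwh * kwh_price
print(f"~{extra_kwh:.0f} kWh, ~${extra_cost_per_year:.0f} extra per node per year")
# compare that yearly figure against the L5520's ~1.4x price premium (times two sockets)
# to see where the break-even lands for a given job forecast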

Initially the above seems like a fairly cost-effective solution from a dual socket Xeon POV; however, I have yet to look into the possibility of building with i7s, and it really does make sense to invest in cheaper hardware as it's a depreciating asset. One thing is for certain: dual socket is a must, as we need to get as much as possible out of the Maya mental ray batch licences. Any thoughts?

Still need to do more research on this.

Thanks for the replies, looking forward to more input :)

Talk soon!
 
Besides, we won't have to worry about having to use virtual memory on the render nodes either; 12 GB is plenty for our needs.
 
When you have the thing set up, will you promise to post crysis benchmarks ;)

Lol yeah, I promise :p It won't be happening for a while though, still have to get the job approved (hopefully!)

Funny thing, I just looked on the Intel site, and it doesn't look like the i7s have dual socket support, which is strange. I'm pretty sure I've heard of dual socket support for those?

Besides, they're rated at 130 W against the 80 W of the E5520, which is over 60% more power. Not good.

Looks like Xeons are the way to go.
 
Hey Laa-Yosh, small world hey! I used to post on spiraloid a few years back! Good to hear from you again :) Loved your work on the Blur cinematics, great stuff! How's it going?

That's coooool :) Spiraloid was such a unique place. Did you know that Bay Raitt's now at Valve? He's probably the reason TF2 and L4D have such good facial animation...
I'm at Digic Pictures, things are going well - we've got a lot of work to do, clients are happy and returning... it's just that sometimes they make it so hard ;) We haven't yet received any references for a project that has to be done for E3, so I expect another nice crunch. At least summer's gonna be easier though ;)

The render nodes won't actually have project data on local disk, but will reference data off a shared drive. So yeah, that is unnecessary.

We've implemented local HDD texture caches and it has worked very well. We had to write a simple shell script to manually update the cache whenever someone finishes/modifies an asset, but the drop in network traffic makes it more than worth the effort. Make sure to include disks that are large enough to hold all of a project's data.
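
Something along these lines, roughly (just a sketch of the idea, the paths are made up):

import os, shutil

SHARE = "/mnt/projects/current_show/textures"   # made-up path to the asset server share
CACHE = "/var/cache/textures"                   # made-up local cache path on the render node

def sync_cache(src, dst):
    """Copy every texture that is new or newer on the share into the local cache."""
    for root, _dirs, files in os.walk(src):
        out_dir = os.path.join(dst, os.path.relpath(root, src))
        os.makedirs(out_dir, exist_ok=True)
        for name in files:
            s, d = os.path.join(root, name), os.path.join(out_dir, name)
            if not os.path.exists(d) or os.path.getmtime(s) > os.path.getmtime(d):
                shutil.copy2(s, d)   # copy2 keeps mtimes so the comparison works next run

if __name__ == "__main__":
    sync_cache(SHARE, CACHE)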

The power/cooling worries me a lot more. On that front, I'll build the farm into clusters of 5 nodes that aren't packed too tightly; space isn't really a major issue to be honest. It will also make the hardware and cooling much easier to manage IMO.

You really should contact a specialist there IMHO. It might be expensive but it's worth it: fewer hardware failures and less downtime...

Can't really comment on the hardware though, after 6 years with a proper sysadmin I'm not even sure what my own workstation has in it ;)
 
Funny thing, I just looked on the Intel site, and it doesn't look like the i7s have dual socket support, which is strange. I'm pretty sure I've heard of dual socket support for those?
Intel probably wants you to buy their identical but slightly differently wired and much more expensive Xeon counterparts of the i7 for multi-socket support...
 
Yo Laa-Yosh!

Yeah, I do know. He's such an awesome artist ;) Yeah, Digic seems like a really awesome studio, I really love the work you guys are doing. Also it's really cool that you guys put out a little more behind-the-scenes material than most others :p Always fun to look at. Glad to hear business is going well, though I'm sure you could do with a break ;)

Cool note on the HDD cache, especially with large textures. We could even move parts of the project temporarily onto the local drive for rendering; could be a good idea.

Sure, I should probably make a plan to find someone to help figure out the cooling.

Hehe, I'm sure you have some pretty beastly hardware down in your box, I mean with all the displacement you are pushing these days :D

Talk laters!
 
Intel probably wants you to buy their identical but slightly differently wired and much more expensive Xeon counterparts of the i7 for multi-socket support...

I guess that is a point. Unfortunately dual socket is most definitely our best option to get the most out of our software licences.
 
I'll build the farm into clusters of 5 nodes that aren't packed too tightly; space isn't really a major issue to be honest. It will also make the hardware and cooling much easier to manage IMO.
Even if you don't buy existing blades/1Us you can still get pretty cheap 2U/4U rackmount EATX cases (<$100 with PSU, although I'd be a bit hesitant about tying $1200 worth of hardware to a cheap PSU). A cheap 42U rack is hardly going to make an impact either.

BTW, is gigabit Ethernet really still enough nowadays? (InfiniBand would add ~$500 per node, 10 Gb/s Ethernet probably more, since the switches are much more expensive.)
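
Back of the envelope, with the per-node data volume as a pure guess:

nodes = 40
gbytes_per_node = 2.0   # guess: scene + textures each node pulls when a batch starts
link_mb_s = 110         # realistic throughput of a single GigE uplink into the file server

minutes = nodes * gbytes_per_node * 1024 / link_mb_s / 60
print(f"~{minutes:.0f} minutes for all {nodes} nodes to pull their data over one uplink")

Fine if that only happens once per batch, painful if the nodes have to re-pull data often.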
 
Even if you don't buy existing blades/1Us you can still get pretty cheap 2U/4U rackmount EATX cases (<$100 with PSU, although I'd be a bit hesitant about tying $1200 worth of hardware to a cheap PSU). A cheap 42U rack is hardly going to make an impact either.

BTW, is gigabit Ethernet really still enough nowadays? (InfiniBand would add ~$500 per node, 10 Gb/s Ethernet probably more, since the switches are much more expensive.)

Hey MfA, thanks for the reply :) Yeah, I will look into that, I do agree with you. I honestly think gigabit Ethernet should be fine; it won't really have that much of an impact for us.
 
Hi there,

Have looked into chassis for the hardware (we will probably build the farm up from scratch now, seeing that IBM's quote is a little out of our budget).

What do you think:

http://www.rackmount.com/Rackmt/ATXBladeS100.htm

http://www.rackmount.com/PrdGuide_System_RAID_NAS.htm

Have contacted Rackmount asking if they can help with 10 blade chassis for the hardware spec below:

Dual socket Xeon E5520
Intel S5500WB server board
12 GB DDR3 ECC 1333 RAM
2 x 160 GB mirrored SATA drives

The blade chassis fits into an 8U enclosure, so 40 blades would use up 40U, leaving 2U for a RAID array (see the last link above).

What do you think?

Look forward to any replies! Thanks!
 