We get asked a lot of questions about how to select Apache Hadoop worker node hardware. During my time at Yahoo!, we bought a lot of nodes with 6*2TB SATA drives, 24GB RAM and 8 cores in a dual socket configuration. This has proven to be a pretty good configuration. This year, I’ve seen systems with 12*2TB SATA drives, 48GB RAM and 8 cores in a dual socket configurations. We will see a move to 3TB drives this year.
What configuration makes sense for any given organization is driven by such ratios as the storage-to-compute ratio of your workload and other factors that cannot be answered in a generic way. Further, the hardware industry moves quickly. In this post I’ll try to outline the principles that have generally guided Hadoop hardware configuration selections over the last six years. All of these thoughts are aimed at designing medium to large Apache Hadoop clusters. Scott Carey made a good case for smaller machines for small clusters the other day on the Apache mailing list.
The key for Hadoop clusters is to buy quality commodity equipment. Most Hadoop purchasers are cost conscious and as your clusters grow, their cost can be significant. When thinking about cost, one needs to think about the whole system, including network, power and the extra components included in many high-end systems. Remember that Hadoop is built to handle component failure well and to scale out on low cost gear. RAID cards, redundant power supplies and other per-component reliability features are not needed. Buy error-correcting RAM and SATA drives with good MTBF numbers. Good RAM allows you to trust the quality of your computations. Hard drives are the largest source of failures, so buy decent ones.
On CPU: It helps to understand your workload, but for most systems I recommend sticking with medium clock speeds and no more than 2 sockets. Both your upfront costs and power costs rise quickly on the high-end. For many workloads, the extra performance per node is not cost-effective.
On Power: Power is a major concern when designing Hadoop clusters. It is worth understanding how much power the systems you are buying use and not buying the biggest and fastest nodes on the market. In years past we saw huge savings in pricing and significant power savings by avoiding the fastest CPUs, not buying redundant power supplies, etc. Nowadays, vendors are building machines for cloud data centers that are designed to reduce cost and power and that exclude a lot of the niceties that bulk up traditional servers. Supermicro, Dell and HP all have such product lines for cloud providers, so if you are buying in large volume, it is worth looking for stripped-down cloud servers.
On RAM: What you need to consider is the amount of RAM needed to keep the processors busy and where the knee in the cost curve resides. Right now 48GB seems like a pretty good number. You can get this much RAM at commodity prices on low-end server motherboards. This is enough to provide the Hadoop framework with lots of RAM (~4 GB) and still have plenty to run many processes. Don’t worry too much about RAM, you’ll find a use for it, often running more processes in parallel. If you don’t, the system will still use it to good effect, caching disk data and improving performance.
On Disk: Look to buy high-capacity SATA drives, usually 7200RPM. Hadoop is storage hungry and seek efficient but it does not require fast, expensive hard drives. Keep in mind that with 12-drive systems you are generally getting 24 or 36 TB/node. Until recently, putting this much storage in a node was not practical because, in large clusters, disk failures are a regular occurrence and replicating 24+TB could swamp the network for long enough to really disrupt work and cause jobs to miss SLAs. The most recent release of Hadoop 0.20.204 is engineered to handle the failure of drives more elegantly, allowing machines to continue serving from their remaining drives. With these changes, we expect to see a lot of 12+ drive systems. In general, add disks for storage and not seeks. If your workload does not require huge amounts of storage, dropping disk count to 6 or 4 per box is a reasonable way to economize.
On Network: This is the hardest variable to nail down. Hadoop workloads vary a lot. The key is to buy enough network capacity to allow all nodes in your cluster to communicate with each other at reasonable speeds and for reasonable cost. For smaller clusters, I’d recommend at least 1GB all-to-all bandwidth, which is easily achieved by just connecting all of your nodes to a good switch. With larger clusters this is still a good target although based on workload you can probably go lower. In the very large data centers the Yahoo! built, they are seeing 2*10GB per 20 node rack going up to a pair of central switches, with rack nodes connected with two 1GB links. As a rule of thumb, watch the ratio of network-to-computer cost and aim for network cost being somewhere around 20% of your total cost. Network costs should include your complete network, core switches, rack switches, any network cards needed, etc. We’ve been seeing InfiniBand and 10GB Ethernet networks to the node now. If you can build this cost effectively, that’s great. However, keep in mind that Hadoop grew up with commodity Ethernet, so understand your workload requirements before spending too much on the network.
I hope you find this outline of how we think about Apache Hadoop hardware helpful.