
2.6. Hardware Platforms

For large-scale web applications, software forms an important but incomplete piece of the puzzle. Hardware can be as significant as software, in the design as well as the implementation stages. The general architecture of a large application needs to be designed in terms of both the software components and the hardware platform they run on. The hardware platform, at least initially, tends to form a large portion of the overall cost of deploying a web application. The cost of software development, in the form of ongoing developer payroll, is usually larger in the end, but hardware costs come early and all at once. It's therefore important to design your hardware platform carefully, so that initial cost stays low and the track for expansion is clearly defined.

What the Heck Is . . . a Hardware Platform?

When we talk about hardware platforms, we're not talking about specific processors or bus architectures, but rather the collected jumble of machines, hardware components, and system software that make up the application as a whole. A hardware platform could be a single type of machine running a single OS, but more often than not is comprised of several classes of machines, perhaps running several OSes.


Donald Knuth said it best, in a quote that we'll be revisiting periodically:

We should forget about small efficiencies, say about 97 percent of the time: premature optimization is the root of all evil.

This applies directly to software development, but also works well as a rule for hardware platform design and the software process in general. By starting small and general, we can avoid wasting time on work that will ultimately be thrown away.

Out of this principle come a few good rules of thumb for the initial design of your hardware platform:


Buy commodity hardware

Unless you've built a very similar application of the same scale before, buying commodity hardware, at least initially, is almost always a good idea. By buying off-the-shelf servers and networking gear, you'll reduce cost and maximize repurposability. If your application fails before it takes off, you've wasted less money. If your application does well, you've spent less money upfront and have more for expansion. Overestimating hardware needs for startup applications can tie up a lot of money that would otherwise be available for more pressing needs, such as paying staff and expanding your hardware platform when the time comes.

Without the experience of running a very similar application, you won't initially know whether you'll need more database, disk, or web-serving capacity. You won't know the difference it makes to put 8 GB of RAM into a database server compared to 2 GB. Premature optimization at the hardware level can be very dangerous in terms of both money and time. Start with a conservative platform consisting of commodity hardware.
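The cash-flow tradeoff above can be made concrete with a rough back-of-envelope sketch. All prices here are invented purely for illustration; real hardware quotes will differ:

```python
# Rough cash-flow comparison: over-provisioning upfront vs. starting small.
# All prices below are illustrative assumptions, not real hardware quotes.

commodity_server = 3_000      # one off-the-shelf commodity box
high_end_server = 15_000      # one "just in case" high-end box

# Scenario A: buy high-end hardware for projected peak load on day one.
upfront_overprovisioned = 4 * high_end_server

# Scenario B: start with two commodity boxes and add two more only if
# and when traffic actually demands it.
upfront_commodity = 2 * commodity_server
later_expansion = 2 * commodity_server

freed_cash = upfront_overprovisioned - upfront_commodity
print(f"Over-provisioned upfront: ${upfront_overprovisioned:,}")  # $60,000
print(f"Commodity start:          ${upfront_commodity:,}")        # $6,000
print(f"Cash kept in hand:        ${freed_cash:,}")               # $54,000
```

Even if the expansion eventually happens, scenario B defers most of the spend until the need is proven, keeping cash free for payroll in the meantime.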


Use a pre-built OS

It seems as though this should go without saying, but there is generally no need for startup application developers to compile their own OS. In many cases in the past, startup applications have focused a lot of initial effort on shaving cycles off each operation by tuning kernel settings, removing kernel modules, and so on. If the gain you get is 10 percent and it takes an engineer a month, then, assuming your servers in that class cost less than 10 months of developer pay, you've wasted time and money. Only when you get to the scale where the time spent would save money (when you have enough servers that saving 10 percent of the capacity on each is worth more than the engineering time taken) is it sensible to start working at that level.
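The break-even argument above can be sketched as a quick calculation. The figures (engineer cost, server price) are invented assumptions, not real costs:

```python
# Back-of-envelope check: when does kernel tuning pay for itself?
# All figures below are illustrative assumptions, not real costs.

def break_even_servers(engineer_monthly_cost, months_spent,
                       capacity_gain, server_price):
    """Number of servers at which the capacity gained by tuning is
    worth the engineering time spent on it."""
    engineering_cost = engineer_monthly_cost * months_spent
    saving_per_server = capacity_gain * server_price
    return engineering_cost / saving_per_server

# One engineer-month for a 10 percent gain on $3,000 commodity servers.
n = break_even_servers(10_000, 1, 0.10, 3_000)
print(f"Break-even at roughly {n:.0f} servers")  # roughly 33 servers
```

Below that server count, the tuning costs more in payroll than it saves in hardware; above it, the work starts to make sense.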

For virtually all applications, default kernels work fine. There are a couple of special cases where you might need specific kernel support outside of mainline builds (such as using Physical Address Extension (PAE), or a kernel HTTP server), but as a good rule of thumb when starting out, don't build your own kernel.


Use pre-built software

By the same token, it's usually a waste of time to build your own software. The versions of Apache, PHP, and Perl that ship with your OS will almost always be perfectly fine (the exception to this rule is MySQL, which is better obtained from MySQL.com). If you absolutely must have a different version, using the supplied binaries is a good idea. There's no reason to think that compiling MySQL yourself will get you any kind of gain. In fact, in the case of MySQL, you'll usually end up with a slower version when you compile it yourself. Precompiled binaries have already been tested by a huge group of developers, so you can leverage that work and avoid having to do anything yourself. You don't want to get into the position where you hit a bug in your application and have to wonder whether it's a bug in your software or in the server applications you might have mis-compiled.

2.6.1. Shared Hardware

After an application goes past the point of residing solely on your local machine, the next logical step is to use shared hardware. Shared hardware is usually leased from large providers, such as ISPs and hosting services, where a box is shared with many other users. This kind of platform is great for prototyping, development, and even small-scale launches. Shared hosting is typically very cheap, and you usually get what you pay for. If your application uses a database, then your performance is at the mercy of the other database users. Web server configuration and custom modules are not usually possible. Larger providers offer upgrade paths to move onto dedicated hardware when the time comes, making the transition to a full setup easier.

2.6.2. Dedicated Hardware

The next step up from using shared hardware is moving to dedicated hardware. The phrase "dedicated hardware" tends to be a little misleading, in that although the hardware is dedicated to running your application, you're renting it from a provider who owns and maintains it. With dedicated hardware, your contact with the machines goes only as far as remotely logging in over SSH; you don't need to swap out disks, rack machines, and so on. Dedicated hosting comes in the full range from completely managed (you receive a user login and the host takes care of everything else) to completely unmanaged (you get a remote console and install an OS yourself).

Depending on the scale you want to grow to, a dedicated hardware platform is sometimes the most cost-effective. You don't need to have system administrators on your engineering team and you won't spend developer time on configuration tasks. However, the effectiveness of this setup very much relies on the working relationship between you and the host's network operations center (NOC) and staff. The level of service that hosts provide varies wildly, so it's definitely worth getting references from people you know who are doing similar things.

2.6.3. Co-Located Hardware

Dedicated hosting vendors tend not to last in the long term if you intend to create a really large application. The world's largest web applications require hundreds of thousands of servers, although you're probably not going to reach that scale. Beyond the dedicated server model, you have two options: co-location and self-hosting. Small companies and startups usually opt to start with co-location. A co-location facility (or "colo") provides space, power, and bandwidth, while you provide the hardware and support.

The services provided by a colo can vary quite a bit. Some will do virtually nothing, while some provide server and service monitoring and will diagnose server issues with you over the phone. All facilities provide network monitoring and basic services such as rebooting a crashed server, although depending on your contract, such services might incur per-incident costs.

Choosing a colo is a big task and should not be taken lightly. While changing colos is certainly possible, it's a big pain that you'll almost certainly want to avoid. If you get stuck in a bad colo further down the line, the effort and cost involved in moving can be enough to dissuade you from ever moving again (a fact that some colos appear to bank on). As with hosting vendors, gather the opinions of other people who host their platforms at the colos you're interested in. In particular, make sure you talk to developers of applications at the same scale as your proposed application. Some colos specialize in small platforms and provide bad support for larger platforms, while some will only provide good service to large customers.

2.6.4. Self-Hosting

When you get to the point of having a few thousand servers, it's usually beneficial to start running your own data centers (DCs). This is a huge task, which usually involves designing purpose-built facilities, hiring 24-hour NOC and site operations staff, and having multiple redundant power grid connections, a massive uninterruptible power supply (UPS), power filtering and generation equipment, fire suppression, and multiple peering contracts with backbone carriers.

It can sometimes be tempting to self-host hardware on a small scale; getting a leased line into your offices and running servers from there seems simple enough. This is usually not a good idea and should probably be avoided. You will usually end up spending more money and having more problems than you would with other solutions. If you don't have a colo near you, consider hosting in a managed environment, or hiring a systems administrator who lives near a colo. Self-hosting can work well up to the point where bandwidth gets too expensive (upstream bandwidth to a private location typically costs much more than downstream) or you suffer an outage. Being down for a few days because someone cut through a phone cable is annoying when it's your home connection, but crippling when it's your whole business.

Helping you to create your own DC is definitely outside the scope of this book, but hopefully one day your application will grow to the scale where it becomes a viable option.

