4.1. CryptographyCryptography is a mathematical science used to secure storage and transmission of data. The process involves two steps: encryption transforms information into unreadable data, and decryption converts unreadable data back into a readable form. When cryptography was first used, confidentiality was achieved by keeping the transformation algorithms secret, but people figured out those algorithms. Today, algorithms are kept public and well documented, but they require a secret piece of information; a key, to hide and reveal data. Here are three terms you need to know:
Cryptography aims to achieve four goals:
The point of cryptography is to make it easy to hide (encrypt) information yet make it difficult and time consuming for anyone without the decryption key to decrypt encrypted information. No one technique or algorithm can be used to achieve all the goals listed above. Instead, several concepts and techniques have to be combined to achieve the full effect. There are four important concepts to cover:
Do not be intimidated by the large number of encryption methods in use. Mathematicians are always looking for better and faster methods, making the number constantly grow. You certainly do not need to be aware of the inner details of these algorithms to use them. You do, however, have to be aware of legal issues that accompany them:
4.1.1. Symmetric EncryptionSymmetric encryption (also known as private-key encryption or secret-key encryption) is a fast encryption method that uses a single key to encrypt and decrypt data. On its own it offers data confidentiality (and to some extent, authentication) provided the parties involved in communication safely exchange the secret key in advance. An example of the use of symmetric encryption is shown in Figure 4-1. Figure 4-1. Symmetric encryption exampleHere are six commonly used symmetric encryption algorithms:
A best encryption algorithm does not exist. All algorithms from the list have been thoroughly researched and are considered to be technically secure. Other issues that need to be taken into consideration are the interoperability, key length, speed, and legal issues. The key-length argument renders DES and 3DES (for new implementations) obsolete. It is widely believed that the minimum secure key length for symmetric encryption today is 80 bits. Encryption of at least 128 bits is recommended for all new applications. Having been adopted as a standard by the U.S. government, AES is the closest to being the algorithm of choice. Symmetric encryption has inherent problems that show up as soon as the number of parties involved is increased to more than two:
In spite of these problems, a major advantage to symmetric encryption is its speed, which makes it the only choice when large amounts of data need to be encrypted (for storage or transmission). 4.1.2. Asymmetric EncryptionAsymmetric encryption (also known as public-key encryption) tries to solve the problems found in symmetric encryption algorithms. Instead of one secret key, public-key encryption requires two keys, one of which is called a public key and the other a private key. The two keys, the encryption algorithm, and the decryption algorithm are mathematically related: information encrypted with a public key can be decrypted (using the same algorithm) only if the private key is known. The reverse also holds: data encrypted using the private key can be decrypted only with the public key. The key names give away their intended usage. The public key can be distributed freely to everyone. Whoever is in the possession of the public key can use the key and the corresponding encryption algorithm to encrypt a message that can only be decrypted by the owner of the private key that corresponds to the public key. This is illustrated in Figure 4-2, in which Bob encrypts a message using Alice's public key and sends the result to Alice. (The names Alice and Bob are commonly used in explanations related to cryptography. For more information, read the corresponding Wikipedia entry: http://en.wikipedia.org/wiki/Alice_and_Bob.) Alice then decrypts the message using her private key. Figure 4-2. Asymmetric encryption exampleThere exists another use for the private key. When information is encrypted with a private key, anyone (anyone with access to the public key, that is) can decrypt it with the public key. This is not as useless as it may seem at first glance. Because no key other than the public key can unlock the message, the recipient is certain the encrypted message was sent from the private-key owner. This technique of encrypting with a private key, illustrated in Figure 4-3, is known as a digital signature because it is the equivalent of a real signature in everyday life. Figure 4-3. Alice sends Bob a message he can verify came from herHere are three asymmetric encryption methods in use today:
Public-key encryption does have a significant drawback: it is much slower than symmetric encryption, so even today's computers cannot use this type of encryption alone and achieve acceptably fast communication speeds. Because of this, it is mostly used to digitally sign small amounts of data. Public-key cryptography seems to solve the scalability problem we mentioned earlier. If every person has a two-key pair, anyone on the Internet will be able to communicate securely with anyone else. One problem remains, which is the problem of key distribution. How do you find someone's public key? And how do you know the key you have really belongs to them? I will address these issues in a moment. 4.1.3. One-Way EncryptionOne-way encryption is the process performed by certain mathematical functions that generate "random" output when given some data on input. These functions are called hash functions or message digest functions. The word hash is used to refer to the output produced by a hash function. Hash functions have the following attributes:
Hash functions have two common uses. One is to store some information without storing the data itself. For example, hash functions are frequently used for safe password storage. Instead of storing passwords in plaintextwhere they can be accessed by whoever has access to the systemit is better to store only password hashes. Since the same password always produces the same hash, the system can still perform its main functionpassword verificationbut the risk of user password database compromise is gone. The other common use is to quickly verify data integrity. (You may have done this, as shown in Chapter 2, when you verified the integrity of the downloaded Apache distribution.) If a hash output is provided for a file, the recipient can calculate the hash himself and compare the result with the provided value. A difference in values means the file was changed or corrupted. Hash functions are free of usage, export, or patent restrictions, and that led to their popularity and unrestricted usage growth. Here are three popular hash functions:
Today, it is believed a hash function should produce output at least 160 bits long. Therefore, the SHA-1 algorithm is recommended as the hash algorithm of choice for new applications. 4.1.4. Public-Key InfrastructureEncryption algorithms alone are insufficient to verify someone's identity in the digital world. This is especially true if you need to verify the identity of someone you have never met. Public-key infrastructure (PKI) is a concept that allows identities to be bound to certificates and provides a way to verify that certificates are genuine. It uses public-key encryption, digital certificates, and certificate authorities to do this. 4.1.4.1 Digital certificatesA digital certificate is an electronic document used to identify an organization, an individual, or a computer system. It is similar to documents issued by governments, which are designed to prove one thing or the other (such as your identity, or the fact that you have passed a driving test). Unlike hardcopy documents, however, digital certificates can have an additional function: they can be used to sign other digital certificates. Each certificate contains information about a subject (the person or organization whose identity is being certified), as well as the subject's public key and a digital signature made by the authority issuing the certificate. There are many standards developed for digital certificates, but X.509 v3 is almost universally used (the popular PGP encryption protocol being the only exception). A digital certificate is your ID in the digital world. Unlike the real world, no organization has exclusive rights to issue "official" certificates at this time (although governments will probably start issuing digital certificates in the future). Anyone with enough skill can create and sign digital certificates. But if everyone did, digital certificates would not be worth much. It is like me vouching for someone I know. Sure, my mother is probably going to trust me, but will someone who does not know me at all? For certificates to have value they must be trusted. You will see how this can be achieved in the next section. 4.1.4.2 Certificate authoritiesA certificate authority (CA) is an entity that signs certificates. If you trust a CA then you will probably trust the certificate it signed, too. Anyone can be a CA, and you can even sign your own certificate (we will do exactly that later). There are three kinds of certificates:
I have mentioned that digital certificates can be used to sign other digital certificates. This is what CAs do. They have one very important certificate, called the root certificate, which they use to sign other people's certificates. CAs sign their own root certificates and certificates from trusted authorities are accepted as valid. Such certificates are distributed with software that uses them (e.g., web browsers). A partial list of authorities accepted by my browser, Mozilla 1.7, is given in Figure 4-4. (I added the Apache Security CA, whose creation is shown later in this chapter, after importing into the browser the root certificate for it.) Figure 4-4. A list of certificate authorities accepted by Mozilla 1.74.1.4.3 Web of trustIdentity validation through certificate authorities represents a well-organized identity verification model. A small number of trusted certificate authorities have the last word in saying who is legitimate. Another approach to identity verification is to avoid the use of authorities, and base verification on a distributed, peer-to-peer operation where users' identities are confirmed by other users. This is how a web of trust is formed. It is a method commonly used among security-conscious computer users today. This is how the web of trust works:
A web of trust example is given in Figure 4-5. Figure 4-5. There are two trust paths from Alice to DaveThe web of trust is difficult but not impossible to achieve. As long as every person in the chain ensures the next person is who he claims he is, and as long as every member remains vigilant, there is a good chance of success. However, misuse is possible and likely. That is why the user of the web of trust must decide what trust means in each case. Having one path from one person to another is good, but having multiple independent paths is better. The web of trust concept is well suited for use by individuals and by programs like PGP (Pretty Good Privacy) or GnuPG. You can find out more about the web of trust concept in the GnuPG documentation:
4.1.5. How It All Falls into PlaceNow that we have the basic elements covered, let's examine how these pieces fall into place:
If you want to continue studying cryptography, read Applied Cryptography by Bruce Schneier (Wiley), considered to be a major work in the field. |