Basic Database Terminology

You may have noticed that you're already several pages into a database book and still haven't seen a whole bunch of jargon and technical terminology. In fact, I still haven't said anything at all about what "a database" actually looks like, even though we have a rough specification of how our sample database will be used. However, we're about to design that database, and then we'll begin implementing it, so we can't avoid terminology any longer. That's what this section is about. It describes some terms that come up throughout the book so that you'll be familiar with them. Fortunately, many relational database concepts are really quite simple. In fact, much of the appeal of relational databases stems from the simplicity of their foundational concepts.

Structural Terminology

Within the database world, MySQL is classified as a relational database management system (RDBMS). That phrase breaks down as follows:

- The database (the "DB" in RDBMS) is the repository for the information you want to store, structured in a simple, regular fashion:
  - The collection of data in a database is organized into tables.
  - Each table is organized into rows and columns.
  - Each row in a table is a record.
  - Records can contain several pieces of information; each column in a table corresponds to one of those pieces.
- The management system (the "MS") is the software that lets you use your data by allowing you to insert, retrieve, modify, or delete records.
- The word "relational" (the "R") indicates a particular kind of DBMS, one that is very good at relating (that is, matching up) information stored in one table to information stored in another by looking for elements common to each of them. The power of a relational DBMS lies in its capability to pull data from those tables conveniently and to join information from related tables to produce answers to questions that can't be answered from individual tables alone.

Here's an example that shows how a relational database organizes data into tables and relates the information from one table to another. Suppose that you run a Web site that includes a banner-advertisement service. You contract with companies that want their ads displayed when people visit the pages on your site. Each time a visitor hits one of your pages, you serve an ad embedded in the page that is sent to the visitor's browser and assess the company a small fee. To represent this information, you maintain three tables (see Figure 1.1). One table, company, has columns for company name, number, address, and telephone number. Another table, ad, lists ad numbers, the number for the company that "owns" the ad, and the amount you charge per hit. The third table, hit, logs each ad hit by ad number and the date on which the ad was served.

Figure 1.1. Banner advertisement tables.

Some questions can be answered using the information in a single table. To determine the number of companies you have contracts with, you need count only the rows in the company table. Similarly, to determine the number of hits during a given time period, only the hit table need be examined. Other questions are more complex, and it's necessary to consult multiple tables to determine the answers. For example, to determine how many times each of the ads for Pickles, Inc. was served on July 14, you'd use all three tables as follows:

1. Look up the company name (Pickles, Inc.) in the company table to find the company number (14).
2. Use the company number to find matching records in the ad table so that you can determine the associated ad numbers. There are two such ads, 48 and 101.
3. For each of the matched records in the ad table, use the ad number in the record to find matching records in the hit table that fall within the desired date range, and then count the number of matches. There are three matches for ad 48 and two matches for ad 101.
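
The three lookups above can be collapsed into a single query in SQL, the language introduced later in this section. What follows is a sketch rather than a definitive statement. It assumes the company table has the company_name and company_num columns shown later in this section, that the ad table has columns named ad_num and company_num, and that the hit table has columns named ad_num and hit_date. The ad and hit column names are assumptions, as is the year in the date, because the text gives only the month and day.

```sql
-- Count how many times each Pickles, Inc. ad was served on July 14.
-- Assumed columns: ad(ad_num, company_num), hit(ad_num, hit_date).
SELECT ad.ad_num, COUNT(*) AS hit_count
FROM company
  INNER JOIN ad  ON ad.company_num = company.company_num   -- step 2: company -> ads
  INNER JOIN hit ON hit.ad_num = ad.ad_num                 -- step 3: ads -> hits
WHERE company.company_name = 'Pickles, Inc.'               -- step 1: find the company
  AND hit.hit_date = '2004-07-14'                          -- year is an assumption
GROUP BY ad.ad_num;
```

Each join corresponds to one of the matching operations in the steps. With the data described in the text, the result should contain one row for ad 48 with a count of 3 and one row for ad 101 with a count of 2. An ad with no hits on that date does not appear at all, because an inner join keeps only rows that have a match.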

Sounds complicated! But that's just the kind of thing at which relational database systems excel. The complexity actually is somewhat illusory because each of the steps just described really amounts to little more than a simple matching operation: You relate one table to another by matching values from one table's rows to values in another table's rows. This same simple operation can be exploited in various ways to answer all kinds of questions: How many different ads does each company have? Which company's ads are most popular? How much revenue does each ad generate? What is the total fee for each company for the current billing period?
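
As a rough illustration, three of those questions might be written as the following queries. They rely on the same assumed column names as before (ad_num and company_num in the ad table, ad_num and hit_date in the hit table), plus a hypothetical hit_fee column in the ad table holding the amount charged per hit. The billing period dates are also made up.

```sql
-- How many different ads does each company have?
-- LEFT JOIN keeps companies that have no ads; COUNT(ad.ad_num) is 0 for them.
SELECT company.company_name, COUNT(ad.ad_num) AS ad_count
FROM company
  LEFT JOIN ad ON ad.company_num = company.company_num
GROUP BY company.company_num, company.company_name;

-- How much revenue does each ad generate?
-- Revenue is the per-hit fee multiplied by the number of logged hits.
SELECT ad.ad_num, ad.hit_fee * COUNT(hit.ad_num) AS revenue
FROM ad
  LEFT JOIN hit ON hit.ad_num = ad.ad_num
GROUP BY ad.ad_num, ad.hit_fee;

-- What is the total fee for each company for a billing period?
-- Each joined row is one hit, so summing hit_fee adds up the charges.
SELECT company.company_name, SUM(ad.hit_fee) AS total_fee
FROM company
  INNER JOIN ad  ON ad.company_num = company.company_num
  INNER JOIN hit ON hit.ad_num = ad.ad_num
WHERE hit.hit_date BETWEEN '2004-07-01' AND '2004-07-31'   -- hypothetical period
GROUP BY company.company_num, company.company_name;
```

All three use the same matching operation; only the grouping and the value being counted or summed change.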

Now you know enough relational database theory to understand the rest of this book, and we don't have to go into Third Normal Form, Entity-Relationship Diagrams, and all that kind of stuff. (If you want to read about such things, I suggest you begin with the works of C.J. Date or E.F. Codd.)

Query Language Terminology

To communicate with MySQL, you use a language called SQL (Structured Query Language). SQL is today's standard database language, and all major database systems understand it. SQL supports many different kinds of statements, all designed to make it possible to interact with your database in interesting and useful ways.

As with any language, SQL can seem strange while you're first learning it. For example, to create a table, you need to tell MySQL what the table's structure should be. You and I might think of the table in terms of a diagram or picture, but MySQL doesn't, so you create the table by telling MySQL something like this:

CREATE TABLE company
(
    company_name CHAR(30),
    company_num  INT,
    address      CHAR(30),
    phone        CHAR(12)
);

Statements like that can be somewhat imposing when you're new to SQL, but you need not be a programmer to learn how to use SQL effectively. As you gain familiarity with the language, you'll look at CREATE TABLE in a different light: as an ally that helps you describe your information, not as just a weird bit of gibberish.
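
To give a feel for a few other kinds of statements, here is a sketch that declares the other two banner-advertisement tables in the same style and then performs the four basic record operations (insert, retrieve, modify, delete) on the company table. The column names and types for ad and hit are assumptions chosen to match the earlier prose description, and the address and phone values are invented for illustration.

```sql
-- Hypothetical definitions for the other two tables in the example.
CREATE TABLE ad
(
    ad_num      INT,           -- ad number
    company_num INT,           -- number of the company that owns the ad
    hit_fee     DECIMAL(6,2)   -- amount charged per hit
);

CREATE TABLE hit
(
    ad_num   INT,              -- ad that was served
    hit_date DATE              -- date on which it was served
);

-- Insert a record (address and phone are made-up sample values).
INSERT INTO company (company_name, company_num, address, phone)
VALUES ('Pickles, Inc.', 14, '5 Sycamore Dr.', '555-0100');

-- Retrieve selected columns from matching records.
SELECT company_name, phone FROM company WHERE company_num = 14;

-- Modify a record.
UPDATE company SET phone = '555-0199' WHERE company_num = 14;

-- Delete a record.
DELETE FROM company WHERE company_num = 14;
```

The WHERE clause in the last three statements limits them to the records that match; without it, UPDATE and DELETE would apply to every row in the table.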

MySQL Architectural Terminology

When you use MySQL, you're actually using at least two programs, because MySQL operates using a client/server architecture:

- The first program is the MySQL server, mysqld. The server runs on the machine where your databases are stored. It listens for client requests coming in over the network and accesses database contents according to those requests to provide clients with the information they ask for.
- The other programs are client programs; they connect to the database server and issue queries to tell it what information they want.

Most MySQL distributions include the database server and several client programs. (If you use RPM packages on Linux, there are separate server and client RPM packages, so you should install both.) You use the clients according to the purposes you want to achieve. The one most commonly used is mysql, an interactive client that lets you issue queries and see the results. Two administrative clients are mysqldump, a backup program that dumps table contents into a file, and mysqladmin, which allows you to check on the status of the server and performs other administrative tasks such as telling the server to shut down. MySQL distributions include other clients as well. If you have application requirements for which none of the standard clients is suited, MySQL also provides a client-programming library so that you can write your own programs. The library is usable directly from C programs. If you prefer a language other than C, interfaces are available for several other languages: Perl, PHP, Python, Java, C++, and Ruby, to name a few.

Graphical MySQL Client Programs Are on the Way

The client programs I discuss in this book all are used from the command line. MySQL AB, the company behind MySQL, is busy creating a new set of client programs that have a graphical user interface (GUI). These provide point-and-click capabilities and should make MySQL even easier to use than it already is. Currently these programs are fairly new, which is why I don't cover them. However, early versions are available for download. You can get them by visiting http://dev.mysql.com/.

MySQL's client/server architecture has certain benefits:

- The server provides concurrency control so that two users cannot modify the same record at the same time. All client requests go through the server, so the server sorts out who gets to do what, and when. If multiple clients want to access the same table at the same time, they don't all have to find and negotiate with each other. They just send their requests to the server and let it take care of determining the order in which the requests are performed.
- You don't have to be logged in on the machine where your database is located. MySQL understands how to work in a networked environment, so you can run a client program from wherever you happen to be, and the client can connect to the server over the network. Distance isn't a factor; you can access the server from anywhere in the world. If the server is located on a computer in Australia, you can take your laptop computer on a trip to Iceland and still access your database. Does that mean anyone can get at your data, just by connecting to the Internet? No. MySQL includes a flexible security system, so you can allow access only to people who should have it. And you can make sure that those people are able to do only what they should. Perhaps Sally in the billing office should be able to read and update (modify) records, but Phil at the service desk should be able only to look at them. You can set each person's privileges accordingly. If you do want to run a self-contained system, just set the access privileges so that clients can connect only from the host on which the server is running.
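
As a hedged sketch of what those privileges might look like, the statements below set up accounts for Sally and Phil. The account names, host names, passwords, and the database name banner are all hypothetical. The sketch also assumes a MySQL version that supports the CREATE USER statement; older releases set up accounts with the GRANT statement alone.

```sql
-- Sally in billing may read and modify records in the (hypothetical) banner database.
CREATE USER 'sally'@'billing.example.com' IDENTIFIED BY 'choose-a-password';
GRANT SELECT, UPDATE ON banner.* TO 'sally'@'billing.example.com';

-- Phil at the service desk may only look at records.
CREATE USER 'phil'@'service.example.com' IDENTIFIED BY 'choose-another-password';
GRANT SELECT ON banner.* TO 'phil'@'service.example.com';

-- For a self-contained system, tie the account to localhost so that it can
-- connect only from the host on which the server is running.
CREATE USER 'local_user'@'localhost' IDENTIFIED BY 'a-third-password';
GRANT SELECT, INSERT, UPDATE, DELETE ON banner.* TO 'local_user'@'localhost';
```

The host part after the @ sign is what restricts where a client can connect from, and the privilege list after GRANT is what restricts what that client can do once connected.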

Beginning with MySQL 4.0, you have another option for running the server. In addition to the usual mysqld server that is used in a client/server setting, MySQL includes the server as a library, libmysqld, that you can link into programs to produce standalone MySQL-based applications. This is called the "embedded server library" because it's embedded into individual applications. Use of the embedded server contrasts with the client/server approach in that no network is required. This makes it easier to create and package applications that can be distributed on their own with fewer assumptions about their external operational environment. On the other hand, it should be used only in situations where the embedded application is the only one that needs access to the databases managed by the server.

The Difference Between "MySQL" and "mysql"

To avoid confusion, I should point out that "MySQL" refers to the entire MySQL RDBMS and "mysql" is the name of a particular client program. They sound the same if you pronounce them, but they're distinguished here by capitalization and typeface differences.

Speaking of pronunciation, MySQL is pronounced "my-ess-queue-ell." We know this because the MySQL Reference Manual says so. On the other hand, depending on who you ask, SQL is pronounced "ess-queue-ell" or "sequel." This book assumes the pronunciation "ess-queue-ell," which is why it uses constructs such as "an SQL query" rather than "a SQL query."