8.2. Tools

Used together, the following tools can greatly enhance the effectiveness and ease of use of the performance tools described in previous chapters.

8.2.1. bash

bash is the default Linux command-line shell, and you most likely use it every time you interact with the Linux command line. bash has a powerful scripting language that is typically used to create shell scripts. However, the scripting language can also be called from the command line and enables you to easily automate some of the more tedious tasks during a performance investigation.

8.2.1.1 Performance-Related Options

bash provides a series of commands that can be used together to periodically run a particular command. Most Linux users have bash as their default shell, so just logging in to a machine or opening a terminal brings up a bash prompt. If you are not using bash, you can invoke it by typing bash.

After you have a bash command prompt, you can enter a series of bash scripting commands to automate the continuous execution of a particular command. This feature proves most useful when you need to periodically extract performance statistics using a particular command. These scripting options are described in Table 8-1.

Table 8-1. bash Runtime Scripting Options
Option             Explanation
while condition    This executes a loop until the condition is false.
do                 This indicates the start of a loop.
done               This indicates the end of a loop.

bash is extremely flexible and is documented in detail in the bash man page. Although bash's complexity can be overwhelming, you do not need to master all of it to put bash to immediate use.

8.2.1.2 Example Usage

Although some performance tools, such as vmstat and sar, periodically display updated performance statistics, other commands, such as ps and ifconfig, do not. bash can call commands such as ps and ifconfig to periodically display their statistics. For example, in Listing 8.1, we ask bash to do something in a while loop based on the condition true. Because the true command always succeeds, the while loop will never exit on its own. Next, the commands that will be executed on each iteration start after the do command. These commands ask bash to sleep for one second and then run ifconfig to extract performance information about the eth0 controller. However, because we are only interested in the received packets, we grep the output of ifconfig for the string "RX packets". Finally, we issue the done command to tell bash we are done with the loop. Because the true command always returns true, this entire loop will run forever unless we interrupt it with a <Ctrl-C>.

Listing 8.1.


[ezolt@wintermute tmp]$ while true; do sleep 1; /sbin/ifconfig eth0 | grep "RX packets" ; done;
                         RX packets:2256178 errors:0 dropped:0 overruns:0 frame:0
                         RX packets:2256261 errors:0 dropped:0 overruns:0 frame:0
                         RX packets:2256329 errors:0 dropped:0 overruns:0 frame:0
                         RX packets:2256415 errors:0 dropped:0 overruns:0 frame:0
                         RX packets:2256459 errors:0 dropped:0 overruns:0 frame:0
...

With the bash script in Listing 8.1, you see network performance statistics updated every second. The same loop can be used to monitor other events by replacing the ifconfig command with some other command, and the time between updates can be varied by changing the argument to sleep. This simple loop is easy to type directly into the command line and enables you to automate the display of any performance statistics that interest you.
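The loop in Listing 8.1 runs until it is interrupted and does not record when each sample was taken. As a minimal sketch of one way to extend it (the function name sample and its argument order are illustrative assumptions, not part of bash), the same idea can be wrapped in a function that takes a fixed number of samples and prefixes each line of output with a timestamp:

```shell
#!/bin/bash
# sample INTERVAL COUNT COMMAND...
# Hypothetical helper: run COMMAND every INTERVAL seconds, COUNT times,
# prefixing every line of its output with the current time.
sample() {
    local interval="$1" count="$2" i=0
    shift 2
    while [ "$i" -lt "$count" ]; do
        "$@" | while IFS= read -r line; do
            printf '%s %s\n' "$(date +%H:%M:%S)" "$line"
        done
        i=$((i + 1))
        # Pause between samples, but not after the last one.
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
}

# Example: five samples of the load average, one second apart.
# sample 1 5 cat /proc/loadavg
```

Bounding the loop makes it usable inside longer scripts, and the timestamps make it easier to line the samples up with the output of other tools afterward.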

8.2.2. tee

tee is a simple command that enables you to simultaneously save the standard output of a command to a file and display it. It proves useful when you want to save a performance tool's output and view it at the same time, such as when you are monitoring the performance statistics of a live system but also storing them for later analysis.

8.2.2.1 Performance-Related Options

tee is invoked with the following command line:


<command> | tee [-a] [file]

tee takes the output provided by <command> and saves it to the specified file, but also prints it to standard output. If the -a option is specified, tee appends the output to the file instead of overwriting it.

8.2.2.2 Example Usage

Listing 8.2 shows tee being used to record the output of vmstat. As you can see, tee displays the output that vmstat has generated, but it also saves it in the file /tmp/vmstat_out. Saving the output of vmstat enables us to analyze or graph the performance data at a later date.

Listing 8.2.


[ezolt@localhost book]$ vmstat 1 5 | tee /tmp/vmstat_out
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 2  0 135832   3648  16112  95236    2    3    15    14   39   194  3  1 92  4
 0  0 135832   4480  16112  95236    0    0     0     0 1007  1014  7  2 91  0
 1  0 135832   4480  16112  95236    0    0     0     0 1002   783  6  2 92  0
 0  0 135832   4480  16112  95236    0    0     0     0 1005   828  5  2 93  0
 0  0 135832   4480  16112  95236    0    0     0     0 1056   920  7  3 90  0

tee is a simple command, but it is powerful because it makes it easy to record the output of a given performance tool.
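The difference between overwriting and appending matters when several runs should end up in the same file. The following is a small sketch of that behavior; the echo commands stand in for a real performance tool, and the scratch file created by mktemp is only an assumption made for the demonstration:

```shell
#!/bin/bash
# Scratch file for the demonstration; in practice this would be a path
# such as /tmp/vmstat_out.
log=$(mktemp)

# First run: tee creates (or overwrites) the file and still prints the
# data to the terminal.
echo "run 1: baseline" | tee "$log"

# Later run: -a appends, so the data from the first run is preserved.
echo "run 2: after tuning" | tee -a "$log"

# The file now holds the output of both runs.
cat "$log"
```

In a real investigation, each echo would be replaced by a command such as vmstat 1 5.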

8.2.3. script

The script command is used to save all the input and output generated during a shell session into a text file. This text file can be used later to both replay the executed commands and review the results. When investigating a performance problem, it is useful to have a record of the exact command lines executed so that you can later review the exact tests you performed. Having a record of the executed commands means that you also can easily cut and paste the command lines when investigating a different problem. In addition, it is useful to have a record of the performance results so that you can review them later when looking for new insights.

8.2.3.1 Performance-Related Options

script is a relatively simple command. When run, it starts a new shell and records all the input and output generated during the life of that shell into a text file. script is invoked with the following command line:


script [-a] [-t] [file]

By default, script places all the output into a file called typescript unless you specify a different one. Table 8-2 describes some of the command-line options of script.

Table 8-2. script Command-Line Options
Option    Explanation
-a        Appends the script output to the file instead of overwriting it.
-t        Adds timing information about the amount of time between each output/input. This prints the number of characters displayed and the amount of time elapsed between the display of each group of characters.
file      Name of the output file.

One word of warning: script captures literally every character that was sent to the screen. If you have colored or bold output, this shows up as escape characters within the output file. These characters can significantly clutter the output and are not usually useful. If you set the TERM environment variable to dumb (using setenv TERM dumb for csh-based shells and export TERM=dumb for sh-based shells), applications will not output the escape characters. This provides more readable output.

In addition, the timing information provided by script clutters the output. Although it can be useful to have automatically generated timing information, it may be easier to not use script's timing, and instead just time the important commands with the time command mentioned in the previous chapter.
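If a session has already been recorded with escape sequences in it, the file can also be cleaned up afterward. The following is a minimal sketch, assuming the clutter consists of the common ESC [ ... letter sequences (colors, bold, cursor movement) and carriage returns; strip_escapes is a hypothetical helper name, and other kinds of terminal control sequences are not handled:

```shell
#!/bin/bash
# strip_escapes: read a recorded session on stdin and write it to stdout
# without ESC [ ... <letter> sequences and without carriage returns.
strip_escapes() {
    local esc
    esc=$(printf '\033')    # the literal escape character
    sed "s/${esc}\[[0-9;?]*[A-Za-z]//g" | tr -d '\r'
}

# Example: strip_escapes < typescript > typescript.clean
```

Setting TERM to dumb before recording remains the simpler option; this filter is only useful for files that already exist.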

8.2.3.2 Example Usage

As stated previously, we will have more readable script output if we set the terminal to dumb. We can do that with the following command:


[ezolt@wintermute manuscript]$ export TERM=dumb

Next, we actually start the script command. Listing 8.3 shows script being started with an output file of ps_output. script continues to record the session until you exit the shell with the exit command or a <Ctrl-D>.

Listing 8.3.




[ezolt@wintermute manuscript]$ script ps_output
Script started, file is ps_output
[ezolt@wintermute manuscript]$ ps
  PID TTY          TIME CMD
 4285 pts/1    00:00:00 bash
 4413 pts/1    00:00:00 ps
[ezolt@wintermute manuscript]$ Script done, file is ps_output

Next, in Listing 8.4, we look at the output recorded by script. As you can see, it contains all the commands and output that we generated.

Listing 8.4.


[ezolt@wintermute manuscript]$ cat ps_output
Script started on Wed Jun 16 20:43:35 2004
[ezolt@wintermute manuscript]$ ps
  PID TTY          TIME CMD
 4285 pts/1    00:00:00 bash
 4413 pts/1    00:00:00 ps
[ezolt@wintermute manuscript]$

Script done on Wed Jun 16 20:43:41 2004

script is a great command to accurately record all interaction during a session. The files that script generates are tiny compared to the size of modern hard drives. Recording a performance investigation session and saving it for later review is always a good idea. At worst, it is a small amount of wasted effort and disk space to record the session. At best, the saved sessions can be looked at later and do not require you to rerun the commands recorded in that session.

8.2.4. watch

By default, the watch command runs a command every two seconds and displays its output on the screen. watch is useful when working with performance tools that do not periodically display updated results. For example, some tools, such as ifconfig and ps, display the current performance statistics and then exit. Because watch periodically runs these commands and displays their output, it is possible to see by glancing at the screen which statistics are changing and how fast they are changing.

8.2.4.1 Performance-Related Options

watch is invoked with the following command line:


watch [-d[=cumulative]] [-n sec] <command>

If called with no options, watch just displays the output of the given command every two seconds until you interrupt it. In the default output, it can often be difficult to see what has changed from screen to screen, so watch provides options that highlight the differences between each output. This can make it easier to spot the differences in output between each sample. Table 8-3 describes the command-line options that watch accepts.

Table 8-3. watch Command-Line Options
Option             Explanation
-d[=cumulative]    This option highlights the output that has changed between each sample. If the cumulative option is used, an area is highlighted if it has ever changed, not just if it has changed between samples.
-n sec             The number of seconds to wait between updates.

watch is a great tool to see how a performance statistic changes over time. It is not a complicated tool, but does its job well. It really fills a void when using performance tools that cannot periodically display updated output. When using these tools, you can run watch in a window and glance at it periodically to see how the statistic changes.

8.2.4.2 Example Usage

The first example, in Listing 8.5, shows watch being run with the ps command. We are asking ps to show us the number of minor faults that each process is generating. watch clears the screen and updates this information every 10 seconds. Note that it may be necessary to enclose the command that you want to run in quotation marks so that watch does not confuse the options of the command that you are trying to execute with its own options.

Listing 8.5.


[ezolt@wintermute ezolt]$ watch -n 10 "ps -o minflt,cmd"

Every 10s: ps -o minflt,cmd
Wed Jun 16 08:33:21 2004

MINFLT CMD
  1467 bash
    41 watch -n 1 ps -o minflt,cmd
    66 ps -o minflt,cmd

watch is a tool whose basic function could easily be written as a simple shell script. However, watch is easier than using a shell script because it is almost always available and just works. Remember that performance tools such as ifconfig or ps display statistics only once, whereas watch makes it easier to follow (with only a glance) how the statistics change.
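To make the comparison with a shell script concrete, here is a rough sketch of what such a script might look like. It is a hypothetical, stripped-down imitation (simple_watch is an invented name), it assumes an ANSI-compatible terminal for clearing the screen, and it has no equivalent of the -d highlighting:

```shell
#!/bin/bash
# simple_watch INTERVAL COUNT COMMAND...
# Clear the screen, print a header, run COMMAND, wait, and repeat.
# COUNT bounds the loop; the real watch runs until it is interrupted.
simple_watch() {
    local interval="$1" count="$2" i=0
    shift 2
    while [ "$i" -lt "$count" ]; do
        printf '\033[H\033[2J'      # ANSI: move cursor home, clear screen
        printf 'Every %ss: %s\n' "$interval" "$*"
        date
        echo
        "$@"
        i=$((i + 1))
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
}

# Example: simple_watch 10 6 ps -o minflt,cmd
```

Even this short version has to deal with screen clearing and argument handling, which is part of why the ready-made watch command is usually the more convenient choice.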

8.2.5. gnumeric

When investigating a performance problem, the performance tools often generate vast amounts of performance statistics, and it can be difficult to sort through this data to find the trends and patterns that show how the system is behaving. Spreadsheets in general, and gnumeric in particular, provide three features that make this task easier. First, gnumeric provides built-in functions, such as max, min, average, and standard deviation, which enable you to numerically analyze the performance data. Second, gnumeric provides a flexible way to import the tabular text data commonly output by many performance tools. Finally, gnumeric provides a powerful graphing utility that can visualize the performance data generated by the performance tools. This can prove invaluable when searching for data trends over long periods of time. It is also especially useful when looking for correlations between different types of data (such as the correlation between disk I/O and CPU usage). It is often hard to see patterns in text output, but in graphical form, the system's behavior can be much clearer. Other spreadsheets, such as OpenOffice's oocalc, could also be used, but gnumeric's powerful text importer and graphing tools make it the easiest to use.

8.2.5.1 Performance-Related Options

To use a spreadsheet to assist in performance analysis, just follow these steps:

1.	Save performance data into a text file.
2.	Import the text file into gnumeric.
3.	Analyze or graph the data.

gnumeric can generate many different types of graphs and has many different functions to analyze data. The best way to see gnumeric's power and flexibility is to load some data and experiment with it.
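As one possible way to prepare data for the import step, vmstat output can be converted to comma-separated values before it is loaded. This is only a sketch of an alternative to importing the raw text: vmstat_to_csv is a hypothetical helper name, and it assumes vmstat was run with -n so that the headers appear only once at the top of the output:

```shell
#!/bin/bash
# vmstat_to_csv: read vmstat output on stdin, write comma-separated values.
# The first line (the banner of dashes) is dropped so that the column
# names on the second line become the CSV header row.
vmstat_to_csv() {
    # Assigning $1 to itself makes awk rebuild the line using OFS.
    awk -v OFS=',' 'NR >= 2 { $1 = $1; print }'
}

# Example: vmstat -n 1 100 | vmstat_to_csv > vmstat_output.csv
```

A file in this form has one named column per statistic, which leaves fewer choices to make in the spreadsheet's import dialog.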

8.2.5.2 Example Usage

To demonstrate the usefulness of gnumeric, we first have to generate performance data to graph or analyze. Listing 8.6 asks vmstat to generate 100 seconds of output and saves that information in a text file called vmstat_output. This data will be loaded into gnumeric. The -n option tells vmstat to print the header information only once (rather than after every screenful of information).

Listing 8.6.


[ezolt@nohs ezolt]$ vmstat -n 1 100 > vmstat_output

Next, we start gnumeric using the following command:


[ezolt@nohs ezolt]$ gnumeric &

This opens a blank spreadsheet where we can import the vmstat data.

Selecting File > Open in gnumeric brings up a dialog (not shown) that enables you to select both the file to open and the type of file. We select Text Import (Configurable) for file type, and we are guided through a series of screens to select which columns of the vmstat_output file map to which columns of the spreadsheet. For vmstat, it is useful to start importing at the second line of text, because the second line contains the names and sizing appropriate for each column. It is also useful to select Fixed-Width for importing the data because that is how vmstat outputs its data. After successfully importing the data, we see the spreadsheet in Figure 8-1.

Figure 8-1.

Next, we graph the data that we have imported. In Figure 8-2, we create a stacked graph of the different CPU usages (us, sy, id, wa). Because these statistics should always total 100 percent (or close to it), we can see which state dominates at each time. In this case, the system is idle most of the time, but it has a large amount of wait time in the first quarter of the graph.

Figure 8-2.

Graphs can be a powerful way to see how the performance statistics of a single run of a test change over time. It can also prove useful to see how different runs compare to each other. When graphing data from different runs, be sure to use the same scale for each of the graphs. This allows you to compare and contrast the data more easily.

gnumeric is a lightweight application that enables you to quickly and easily import and graph/analyze vast amounts of performance data. It is a great tool to play around with performance data to see whether any interesting characteristics appear.

8.2.6. ldd

ldd can be used to display which libraries a particular binary relies on. ldd helps track down the location of a library function that an application may be using. By figuring out all the libraries that an application is using, it is possible to search through each of them for the library that contains a given function.

8.2.6.1 Performance-Related Options

ldd is invoked with the following command line:


ldd <binary>

ldd then displays a list of all the libraries that this binary requires and which files in the system are fulfilling those requirements.

8.2.6.2 Example Usage

Listing 8.7 shows ldd being used on the ls binary. In this particular case, we can see that ls relies on the following libraries: linux-gate.so.1, librt.so.1, libacl.so.1, libselinux.so.1, libc.so.6, libpthread.so.0, ld-linux.so.2, and libattr.so.1.

Listing 8.7.


[ezolt@localhost book]$ ldd /bin/ls
        linux-gate.so.1 =>  (0x00dfe000)
        librt.so.1 => /lib/tls/librt.so.1 (0x0205b000)
        libacl.so.1 => /lib/libacl.so.1 (0x04983000)
        libselinux.so.1 => /lib/libselinux.so.1 (0x020c0000)
        libc.so.6 => /lib/tls/libc.so.6 (0x0011a000)
        libpthread.so.0 => /lib/tls/libpthread.so.0 (0x00372000)
        /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x00101000)
        libattr.so.1 => /lib/libattr.so.1 (0x03fa4000)

ldd is a relatively simple tool, but it can be invaluable when trying to track down exactly which libraries an application is using and where they are located on the system.
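When the list of libraries is going to be fed to another tool, only the file paths are needed. The following sketch extracts them from output in the format of Listing 8.7; ldd_paths is a hypothetical helper name, and the second pattern is an assumption covering ldd versions that print some entries as a path followed by an address, without an arrow:

```shell
#!/bin/bash
# ldd_paths: read ldd output on stdin and print one library path per line.
# Entries with no backing file (such as linux-gate.so.1) are skipped.
ldd_paths() {
    awk '
        # Form: name => /path/to/lib (address)
        $2 == "=>" && $3 ~ /^\// { print $3; next }
        # Form: /path/to/lib (address)
        $1 ~ /^\// && $2 ~ /^\(/ { print $1 }
    '
}

# Example: ldd /bin/ls | ldd_paths
```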

8.2.7. objdump

objdump is a complicated and powerful tool for analyzing various aspects of binaries or libraries. Although it has many other capabilities, it can be used to determine which functions a given library provides.

8.2.7.1 Performance-Related Options

objdump is invoked with the following command line:


objdump -T <binary>

When objdump is invoked with the -T option, it displays all the symbols that this library/binary either relies on or provides. These symbols can be data structures or functions. Every line of the objdump output that contains .text is a function that this binary provides.

8.2.7.2 Example Usage

Listing 8.8 shows objdump used to analyze the gtk library. Because we are only interested in the symbols that libgtk.so provides, we use fgrep to prune the output to only those lines that contain .text. In this case, we can see that some of the functions that libgtk.so provides are gtk_arg_values_equal, gtk_tooltips_set_colors, and gtk_viewport_set_hadjustment.

Listing 8.8.


[ezolt@localhost book]$ objdump -T /usr/lib/libgtk.so | fgrep .text
0384eb60 l    d  .text  00000000
0394c580 g    DF .text  00000209  Base        gtk_arg_values_equal
0389b630 g    DF .text  000001b5  Base        gtk_signal_add_emission_hook_full
0385cdf0 g    DF .text  0000015a  Base        gtk_widget_restore_default_style
03865a20 g    DF .text  000002ae  Base        gtk_viewport_set_hadjustment
03929a20 g    DF .text  00000112  Base        gtk_clist_columns_autosize
0389d9a0 g    DF .text  000001bc  Base        gtk_selection_notify
03909840 g    DF .text  000001a4  Base        gtk_drag_set_icon_pixmap
03871a20 g    DF .text  00000080  Base        gtk_tooltips_set_colors
038e6b40 g    DF .text  00000028  Base        gtk_hseparator_new
038eb720 g    DF .text  0000007a  Base        gtk_hbutton_box_set_layout_default
038e08b0 g    DF .text  000003df  Base        gtk_item_factory_add_foreign
03899bc0 g    DF .text  000001d6  Base        gtk_signal_connect_object_while_alive
....

Performance tools such as ltrace display the library functions that an application calls, but not the libraries that contain them. objdump helps locate the shared library in which each function resides.
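The search for the library that provides a given function can be scripted. The following is a sketch under the assumption that the objdump -T output has the column layout shown in Listing 8.8 (a DF flag in the third field and .text in the fourth); exported_functions and provides_function are hypothetical helper names, and the library paths in the usage comment are simply two of those reported in Listing 8.7:

```shell
#!/bin/bash
# exported_functions: read "objdump -T" output on stdin and print the name
# of every function defined in the .text section, one per line.
exported_functions() {
    awk '$3 == "DF" && $4 == ".text" { print $NF }'
}

# provides_function LIBRARY FUNCTION
# Succeed if LIBRARY defines FUNCTION in its .text section.
provides_function() {
    objdump -T "$1" 2>/dev/null | exported_functions | grep -qxF -- "$2"
}

# Example: report which of these libraries defines acl_get_file.
# for lib in /lib/libacl.so.1 /lib/libattr.so.1; do
#     if provides_function "$lib" acl_get_file; then echo "$lib"; fi
# done
```

Matching on whole fields rather than grepping for .text anywhere in the line avoids counting the section symbol on the first line of the listing as a function.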

8.2.8. GNU Debugger (gdb)

gdb is a powerful application debugger that can help investigate many different aspects of a running application. gdb has three features that make it a valuable tool when diagnosing performance problems. First, gdb can attach to a currently running process. Second, gdb can display a backtrace for that process, which shows the current source line and the call tree. Attaching to a process and extracting a backtrace can be a quick way to find some of the more obvious performance problems. However, if the application is not stuck in a single location, it may be hard to diagnose the problem using gdb, and a system-wide profiler, such as oprofile, is a much better choice. Finally, gdb can map a virtual address back to a particular function. gdb may do a better job of figuring out the location of the virtual address than performance tools. For example, if oprofile gives information about where events occur in relation to a virtual address rather than a function name, gdb can be used to figure out the function for that address.

8.2.8.1 Performance-Related Options

gdb is invoked with the following command line, in which pid is the process that gdb will attach to:


gdb -p pid

After gdb has attached to the process, it enters an interactive mode in which you can examine the current execution location and runtime variables for the given process. Table 8-4 describes one of the commands that you can use to examine the running process.

Table 8-4. gdb Runtime Options
Option    Explanation
bt        This shows the backtrace for the currently executing process.

gdb has many more command-line options and runtime controls that are more appropriate for debugging than for a performance investigation. See the gdb man page or type help at the gdb prompt for more information.

8.2.8.2 Example Usage

To examine how gdb works, it is useful to demonstrate it on a simple test application. The program in Listing 8.9 just calls function a() from main and spins in an infinite loop. The program will never exit, so when we attach to it with gdb, it should always be executing the infinite loop in function a().

Listing 8.9.


void a(void)
{
  while(1);
}

int main(void)
{
  a();
}

Listing 8.10 launches the application and attaches to its pid with gdb. We ask gdb to generate a backtrace, which shows us exactly what code is currently executing and what set of function calls led to the current location. As expected, gdb shows us that we were executing the infinite loop in a(), and that this was called from main().

Listing 8.10.


[ezolt@wintermute examples]$ ./chew &
[2] 17389
[ezolt@wintermute examples]$ gdb -p 17389
GNU gdb Red Hat Linux (5.3.90-0.20030710.41rh)
Copyright 2003 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu".
Attaching to process 17389
Reading symbols from /usr/src/perf/utils/examples/chew...done.
Using host libthread_db library "/lib/tls/libthread_db.so.1".
Reading symbols from /lib/tls/libc.so.6...done.
Loaded symbols for /lib/tls/libc.so.6
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
a () at chew.c:3
3         while(1);
(gdb) bt
#0  a () at chew.c:3
#1  0x0804832f in main () at chew.c:8

Finally, in Listing 8.11, we ask gdb to show us where the virtual address 0x0804832F is located, and gdb shows that that address is part of the function main.

Listing 8.11.


(gdb) x 0x0804832f
0x804832f <main+21>: 0x9090c3c9

gdb is an extraordinarily powerful debugger and can be helpful during a performance investigation. It is even helpful after the performance problem has been identified, when you need to determine exactly why a particular code path was taken.
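A single backtrace shows only one moment in time. One way to get a rough picture of where a process spends its time is to take several backtraces and count how often each function appears at the top of the stack. The following is a sketch of that idea, not a replacement for a real profiler: top_frame and sample_stacks are hypothetical helper names, the parsing assumes frame lines in the format shown in Listing 8.10, and attaching requires permission to trace the target process. The -batch and -ex options make gdb run the given command and exit instead of entering interactive mode.

```shell
#!/bin/bash
# top_frame: read gdb backtraces on stdin and print the function name in
# frame #0 of each one. Handles both "#0  a () at chew.c:3" and
# "#0  0x0804834e in b ()".
top_frame() {
    awk '/^#0 / { if ($3 == "in") print $4; else print $2 }'
}

# sample_stacks PID COUNT INTERVAL
# Attach to PID COUNT times, INTERVAL seconds apart, printing a backtrace
# each time. The process is briefly stopped while gdb is attached.
sample_stacks() {
    local pid="$1" count="$2" interval="$3" i=0
    while [ "$i" -lt "$count" ]; do
        gdb -p "$pid" -batch -ex bt 2>/dev/null
        i=$((i + 1))
        sleep "$interval"
    done
}

# Example: ten samples of process 17389, summarized by top-of-stack function.
# sample_stacks 17389 10 1 | top_frame | sort | uniq -c | sort -rn
```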

8.2.9. gcc (GNU Compiler Collection)

gcc is the most popular compiler used by Linux systems. Like all compilers, gcc takes source code (such as C, C++, or Objective-C) and generates binaries. It provides many options to optimize the resultant binary, as well as options that make it easier to track the performance of an application. The details of gcc's performance optimization options are not covered in this book, but you should investigate them when trying to increase an application's performance. gcc provides performance optimization options that enable you to tune the performance of compiled binaries using architecture-generic optimizations (using -O1, -O2, -O3), architecture-specific optimizations (-march and -mcpu), and feedback-directed optimization (using -fprofile-arcs and -fbranch-probabilities). More details on each of the optimization options are provided in the gcc man page.

8.2.9.1 Performance-Related Options

gcc can be invoked in its most basic form as follows:


gcc [-g[level]] [-pg] -o prog_name source.c

gcc has an enormous number of options that influence how it compiles an application. If you feel brave, take a look at them in the gcc man page. The particular options that can help during a performance investigation are shown in Table 8-5.

Table 8-5. gcc Command-Line Options
Option           Explanation
-g[1 | 2 | 3]    The -g option adds debugging information to the binary with a default level of 2. If a level is specified, gcc adjusts the amount of debugging information stored in the binary. Level 1 provides only enough information to generate backtraces, but no information on the source-line mappings of particular lines of code. Level 3 provides more information than level 2, such as the macro definitions present in the source.
-pg              This turns on application profiling.

Many performance investigation tools, such as oprofile, require an application to be compiled with debugging information to map performance information back to a particular line of application source code. They will generally still work without the debugging information, but if debugging is enabled, they will provide richer information. Application profiling was described in more detail in a previous chapter.

8.2.9.2 Example Usage

Probably the best way to understand the type of debugging information that gcc can provide is to see a simple example. Listing 8.12 shows the source for the C application deep.c, which just calls a series of functions and then prints the string "hi" a number of times, depending on the number that was passed in. The application's main function calls function a(), which calls function b(), which in turn prints out "hi".

Listing 8.12.


#include <stdio.h>

void b(int count)
{
  int i;
  for (i=0; i<count;i++)
    {printf("hi\n");}
}

void a(int count)
{
  b(count);
}

int main()
{
  a(10);
}

First, as shown in Listing 8.13, we compile the application without any debugging information. We then start the application in the debugger and add a breakpoint to the b() function. When we run the application, it stops at function b(), and we ask for a backtrace. gdb can figure out the backtrace, but it does not know what values were passed between functions or where the function exists in the original source file.

Listing 8.13.


[ezolt@wintermute utils]$ gcc -o deep deep.c
[ezolt@wintermute utils]$ gdb ./deep
...
(gdb) break b
Breakpoint 1 at 0x804834e
(gdb) run
Starting program: /usr/src/perf/utils/deep
(no debugging symbols found)...(no debugging symbols found)...
Breakpoint 1, 0x0804834e in b ()
(gdb) bt
#0  0x0804834e in b ()
#1  0x08048389 in a ()
#2  0x080483a8 in main ()

In Listing 8.14, we compile the same application with debugging information turned on. Now when we run gdb and generate a backtrace, we can see which values were passed to each function call and the exact line of source where a particular line of code resides.

Listing 8.14.


[ezolt@wintermute utils]$ gcc -g -o deep deep.c
[ezolt@wintermute utils]$ gdb ./deep
...
(gdb) break b
Breakpoint 1 at 0x804834e: file deep.c, line 3.
(gdb) run
Starting program: /usr/src/perf/utils/deep

Breakpoint 1, b (count=10) at deep.c:3
3         for (i=0; i<count;i++)
(gdb) bt
#0  b (count=10) at deep.c:3
#1  0x08048389 in a (count=10) at deep.c:9
#2  0x080483a8 in main () at deep.c:14

Debugging information can significantly add to the size of the final executable that gcc generates. However, the information that it provides is invaluable when tracking a performance problem.
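The size cost is easy to measure for a particular program by building it both ways and comparing the resulting files. The following sketch shows one way to do that; size_delta is a hypothetical helper name, and the file names in the usage comment assume that deep.c from Listing 8.12 is in the current directory:

```shell
#!/bin/bash
# size_delta PLAIN DEBUG
# Print how many bytes larger the file DEBUG is than the file PLAIN.
size_delta() {
    local plain debug
    plain=$(wc -c < "$1")
    debug=$(wc -c < "$2")
    echo $((debug - plain))
}

# Example: build the same source with and without -g, then compare.
# gcc    -o deep       deep.c
# gcc -g -o deep_debug deep.c
# size_delta deep deep_debug
```

The actual difference depends on the compiler version, the debugging level, and the program itself, so it is worth measuring rather than assuming.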
