8.2. Tools

Used together, the following tools can greatly enhance the effectiveness and ease of use of the performance tools described in previous chapters.

8.2.1. bash

bash is the default Linux command-line shell, and you most likely use it every time you interact with the Linux command line. bash has a powerful scripting language that is typically used to create shell scripts. However, the scripting language can also be called from the command line and enables you to easily automate some of the more tedious tasks during a performance investigation.

8.2.1.1 Performance-Related Options

bash provides a series of commands that can be used together to periodically run a particular command. Most Linux users have bash as their default shell, so just logging in to a machine or opening a terminal brings up a bash prompt. If you are not using bash, you can invoke it by typing bash. After you have a bash command prompt, you can enter a series of bash scripting commands to automate the continuous execution of a particular command. This feature proves most useful when you need to periodically extract performance statistics using a particular command. These scripting options are described in Table 8-1.
bash is extremely flexible and is documented in the bash man page. Although bash's complexity can be overwhelming, it is not necessary to master it all to put bash immediately to use.

8.2.1.2 Example Usage

Although some performance tools, such as vmstat and sar, periodically display updated performance statistics, other commands, such as ps and ifconfig, do not. bash can call commands such as ps and ifconfig to periodically display their statistics. For example, in Listing 8.1, we ask bash to do something in a while loop based on the condition true. Because the true command always succeeds, the while loop will never exit. Next, the commands that will be executed on each iteration start after the do keyword. These commands ask bash to sleep for one second and then run ifconfig to extract performance information about the eth0 controller. However, because we are only interested in the received packets, we grep the output of ifconfig for the string "RX packets". Finally, we issue the done keyword to tell bash that this is the end of the loop. Because the true command always returns true, this entire loop will run forever unless we interrupt it with a <Ctrl-C>.

Listing 8.1.
[ezolt@wintermute tmp]$ while true; do sleep 1; /sbin/ifconfig eth0 | grep "RX packets" ; done;
          RX packets:2256178 errors:0 dropped:0 overruns:0 frame:0
          RX packets:2256261 errors:0 dropped:0 overruns:0 frame:0
          RX packets:2256329 errors:0 dropped:0 overruns:0 frame:0
          RX packets:2256415 errors:0 dropped:0 overruns:0 frame:0
          RX packets:2256459 errors:0 dropped:0 overruns:0 frame:0
...

With the bash script in Listing 8.1, you see network performance statistics updated every second. The same loop can be used to monitor other events by changing the ifconfig command to some other command, and the amount of time between updates can also be varied by changing the amount of sleep.
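The loop in Listing 8.1 runs until it is interrupted. When a fixed number of samples is wanted instead, the same idea can be wrapped in a small shell function. The following is only a sketch under stated assumptions: the function name sample_every and its argument order are hypothetical and do not come from bash itself.

```shell
# sample_every INTERVAL COUNT COMMAND [ARGS...]
# Hypothetical helper: run COMMAND exactly COUNT times, sleeping
# INTERVAL seconds before each run (the same order as Listing 8.1).
sample_every() {
    local interval=$1
    local count=$2
    shift 2
    local i=0
    while [ "$i" -lt "$count" ]; do
        sleep "$interval"
        "$@"                 # run the command with its arguments intact
        i=$((i + 1))
    done
}
```

For example, sample_every 1 5 /sbin/ifconfig eth0 | grep "RX packets" would collect five samples, one second apart, and then return to the prompt.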
This simple loop is easy to type directly into the command line and enables you to automate the display of any performance statistics that interest you.

8.2.2. tee

tee is a simple command that enables you to simultaneously save the standard output of a command to a file and display it. tee proves useful when you want to save a performance tool's output and view it at the same time, such as when you are monitoring the performance statistics of a live system but also storing them for later analysis.

8.2.2.1 Performance-Related Options

tee is invoked with the following command line:

<command> | tee [-a] [file]

tee takes the output provided by <command> and saves it to the specified file, but also prints it to standard output. If the -a option is specified, tee appends the output to the file instead of overwriting it.

8.2.2.2 Example Usage

Listing 8.2 shows tee being used to record the output of vmstat. As you can see, tee displays the output that vmstat has generated, but it also saves it in the file /tmp/vmstat_out. Saving the output of vmstat enables us to analyze or graph the performance data at a later date.

Listing 8.2.
[ezolt@localhost book]$ vmstat 1 5 | tee /tmp/vmstat_out
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 2  0 135832   3648  16112  95236    2    3    15    14   39   194  3  1 92  4
 0  0 135832   4480  16112  95236    0    0     0     0 1007  1014  7  2 91  0
 1  0 135832   4480  16112  95236    0    0     0     0 1002   783  6  2 92  0
 0  0 135832   4480  16112  95236    0    0     0     0 1005   828  5  2 93  0
 0  0 135832   4480  16112  95236    0    0     0     0 1056   920  7  3 90  0

tee is a simple command, but it is powerful because it makes it easy to record the output of a given performance tool.

8.2.3. script

The script command is used to save all the input and output generated during a shell session into a text file. This text file can be used later to both replay the executed commands and review the results.
When investigating a performance problem, it is useful to have a record of the exact command lines executed so that you can later review the exact tests you performed. Having a record of the executed commands means that you also can easily cut and paste the command lines when investigating a different problem. In addition, it is useful to have a record of the performance results so that you can review them later when looking for new insights.

8.2.3.1 Performance-Related Options

script is a relatively simple command. When run, it just starts a new shell and records all the keystrokes typed and all the output generated during the life of the shell into a text file. script is invoked with the following command line:

script [-a] [-t] [file]

By default, script places all the output into a file called typescript unless you specify a different one. Table 8-2 describes some of the command-line options of script.

One word of warning: script literally captures every type of output that was sent to the screen. If you have colored or bold output, this shows up as escape characters within the output file. These characters can significantly clutter the output and are not usually useful. If you set the TERM environment variable to dumb (using setenv TERM dumb for csh-based shells and export TERM=dumb for sh-based shells), applications will not output the escape characters. This provides a more readable output.

In addition, the timing information provided by script clutters the output. Although it can be useful to have automatically generated timing information, it may be easier to not use script's timing, and instead just time the important commands with the time command mentioned in the previous chapter.

8.2.3.2 Example Usage

As stated previously, we will have more readable script output if we set the terminal to dumb. We can do that with the following command:

[ezolt@wintermute manuscript]$ export TERM=dumb

Next, we actually start the script command.
Listing 8.3 shows script being started with an output file of ps_output. script continues to record the session until you exit the shell with the exit command or a <Ctrl-D>.

Listing 8.3.
[ezolt@wintermute manuscript]$ script ps_output
Script started, file is ps_output
[ezolt@wintermute manuscript]$ ps
  PID TTY          TIME CMD
 4285 pts/1    00:00:00 bash
 4413 pts/1    00:00:00 ps
[ezolt@wintermute manuscript]$
Script done, file is ps_output

Next, in Listing 8.4, we look at the output recorded by script. As you can see, it contains all the commands and output that we generated.

Listing 8.4.
[ezolt@wintermute manuscript]$ cat ps_output
Script started on Wed Jun 16 20:43:35 2004
[ezolt@wintermute manuscript]$ ps
  PID TTY          TIME CMD
 4285 pts/1    00:00:00 bash
 4413 pts/1    00:00:00 ps
[ezolt@wintermute manuscript]$
Script done on Wed Jun 16 20:43:41 2004

script is a great command for accurately recording all interaction during a session. The files that script generates are tiny compared to the size of modern hard drives. Recording a performance investigation session and saving it for later review is always a good idea. At worst, it is a small amount of wasted effort and disk space to record the session. At best, the saved sessions can be looked at later and do not require you to rerun the commands recorded in that session.

8.2.4. watch

By default, the watch command runs a command every two seconds and displays its output on the screen. watch is useful when working with performance tools that do not periodically display updated results. For example, some tools, such as ifconfig and ps, display the current performance statistics and then exit. Because watch periodically runs these commands and displays their output, it is possible to see by glancing at the screen which statistics are changing and how fast they are changing.
8.2.4.1 Performance-Related Options

watch is invoked with the following command line:

watch [-d[=cumulative]] [-n sec] <command>

If called with no options, watch just displays the output of the given command every two seconds until you interrupt it. In the default output, it can often be difficult to see what has changed from screen to screen, so watch provides options that highlight the differences between each output. This can make it easier to spot the differences in output between each sample. Table 8-3 describes the command-line options that watch accepts.

watch is a great tool to see how a performance statistic changes over time. It is not a complicated tool, but does its job well. It really fills a void when using performance tools that cannot periodically display updated output. When using these tools, you can run watch in a window and glance at it periodically to see how the statistic changes.

8.2.4.2 Example Usage

The first example, in Listing 8.5, shows watch being run with the ps command. We are asking ps to show us the number of minor faults that each process is generating. watch clears the screen and updates this information every 10 seconds. Note that it may be necessary to enclose the command that you want to run in quotation marks so that watch does not confuse the options of the command that you are trying to execute with its own options.

Listing 8.5.
[ezolt@wintermute ezolt]$ watch -n 10 "ps -o minflt,cmd"

Every 10s: ps -o minflt,cmd                    Wed Jun 16 08:33:21 2004

MINFLT CMD
  1467 bash
    41 watch -n 1 ps -o minflt,cmd
    66 ps -o minflt,cmd

watch is a tool whose basic function could easily be written as a simple shell script. However, watch is easier than using a shell script because it is almost always available and just works. Remember that performance tools such as ifconfig or ps display statistics only once, whereas watch makes it easier to follow (with only a glance) how the statistics change.

8.2.5. gnumeric

When investigating a performance problem, the performance tools often generate vast amounts of performance statistics. It can sometimes be problematic to sort through this data and find the trends and patterns that demonstrate how the system is behaving. Spreadsheets in general, and gnumeric in particular, offer three features that make this task easier. First, gnumeric provides built-in functions, such as max, min, average, and standard deviation, which enable you to numerically analyze the performance data. Second, gnumeric provides a flexible way to import the tabular text data commonly output by many performance tools. Finally, gnumeric provides a powerful graphing utility that can visualize the performance data generated by the performance tools. This can prove invaluable when searching for data trends over long periods of time. It is also especially useful when looking for correlations between different types of data (such as the correlation between disk I/O and CPU usage). It is often hard to see patterns in text output, but in graphical form, the system's behavior can be much clearer. Other spreadsheets, such as OpenOffice's oocalc, could also be used, but gnumeric's powerful text importer and graphing tools make it the easiest to use.

8.2.5.1 Performance-Related Options

To use a spreadsheet to assist in performance analysis, just follow these steps:
gnumeric can generate many different types of graphs and has many different functions to analyze data. The best way to see gnumeric's power and flexibility is to load some data and experiment with it.

8.2.5.2 Example Usage

To demonstrate the usefulness of gnumeric, we first have to generate performance data that we will graph or analyze. Listing 8.6 asks vmstat to generate 100 seconds of output and save that information in a text file called vmstat_output. This data will be loaded into gnumeric. The -n option tells vmstat to print the header information only once (rather than after every screenful of information).

Listing 8.6.
[ezolt@nohs ezolt]$ vmstat -n 1 100 > vmstat_output

Next, we start gnumeric using the following command:

[ezolt@nohs ezolt]$ gnumeric &

This opens a blank spreadsheet where we can import the vmstat data. Selecting File > Open in gnumeric brings up a dialog (not shown) that enables you to select both the file to open and the type of file. We select Text Import (Configurable) for file type, and we are guided through a series of screens to select which columns of the vmstat_output file map to which columns of the spreadsheet. For vmstat, it is useful to start importing at the second line of text, because the second line contains the names and sizing appropriate for each column. It is also useful to select Fixed-Width for importing the data because that is how vmstat outputs its data. After successfully importing the data, we see the spreadsheet in Figure 8-1.

Figure 8-1.

Next, we graph the data that we have imported. In Figure 8-2, we create a stacked graph of the different CPU usages (us, sy, id, wa). Because these statistics should always total 100 percent (or close to it), we can see which state dominates at each time. In this case, the system is idle most of the time, but it has a large amount of wait time in the first quarter of the graph.
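For a quick numeric check of the same vmstat_output file before (or without) opening a spreadsheet, the min, max, and average functions mentioned earlier can be approximated with awk. This is a minimal sketch, not a substitute for gnumeric: the function name column_stats is hypothetical, and the sketch assumes the two-line header and the column order shown in Listing 8.2 (where id is column 15 and wa is column 16).

```shell
# column_stats COLUMN < vmstat_output
# Hypothetical helper: print the minimum, maximum, and average of one
# whitespace-separated column, skipping the two vmstat header lines.
column_stats() {
    awk -v col="$1" '
        NR > 2 {
            v = $col
            if (n == 0 || v < min) min = v
            if (n == 0 || v > max) max = v
            sum += v
            n++
        }
        END {
            if (n > 0)
                printf "min=%d max=%d avg=%.1f\n", min, max, sum / n
        }
    '
}
```

With the layout assumed above, column_stats 15 < vmstat_output would summarize the idle percentage over the whole run.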
Figure 8-2.

Graphs can be a powerful way to see how the performance statistics of a single run of a test change over time. It can also prove useful to see how different runs compare to each other. When graphing data from different runs, be sure to use the same scale for each of the graphs. This allows you to compare and contrast the data more easily.

gnumeric is a lightweight application that enables you to quickly and easily import and graph/analyze vast amounts of performance data. It is a great tool to play around with performance data to see whether any interesting characteristics appear.

8.2.6. ldd

ldd can be used to display which libraries a particular binary relies on. ldd helps track down the location of a library function that an application may be using. By figuring out all the libraries that an application is using, it is possible to search through each of them for the library that contains a given function.

8.2.6.1 Performance-Related Options

ldd is invoked with the following command line:

ldd <binary>

ldd then displays a list of all the libraries that this binary requires and which files in the system are fulfilling those requirements.

8.2.6.2 Example Usage

Listing 8.7 shows ldd being used on the ls binary. In this particular case, we can see that ls relies on the following libraries: linux-gate.so.1, librt.so.1, libacl.so.1, libselinux.so.1, libc.so.6, libpthread.so.0, ld-linux.so.2, and libattr.so.1.
Listing 8.7.
[ezolt@localhost book]$ ldd /bin/ls
        linux-gate.so.1 => (0x00dfe000)
        librt.so.1 => /lib/tls/librt.so.1 (0x0205b000)
        libacl.so.1 => /lib/libacl.so.1 (0x04983000)
        libselinux.so.1 => /lib/libselinux.so.1 (0x020c0000)
        libc.so.6 => /lib/tls/libc.so.6 (0x0011a000)
        libpthread.so.0 => /lib/tls/libpthread.so.0 (0x00372000)
        /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x00101000)
        libattr.so.1 => /lib/libattr.so.1 (0x03fa4000)

ldd is a relatively simple tool, but it can be invaluable when trying to track down exactly which libraries an application is using and where they are located on the system.

8.2.7. objdump

objdump is a complicated and powerful tool for analyzing various aspects of binaries or libraries. Although it has many other capabilities, it can be used to determine which functions a given library provides.

8.2.7.1 Performance-Related Options

objdump is invoked with the following command line:

objdump -T <binary>

When objdump is invoked with the -T option, it displays all the symbols that this library/binary either relies on or provides. These symbols can be data structures or functions. Every line of the objdump output that contains .text is a function that this binary provides.

8.2.7.2 Example Usage

Listing 8.8 shows objdump used to analyze the gtk library. Because we are only interested in the symbols that libgtk.so provides, we use fgrep to prune the output to only those lines that contain .text. In this case, we can see that some of the functions that libgtk.so provides are gtk_arg_values_equal, gtk_tooltips_set_colors, and gtk_viewport_set_hadjustment.
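The search described in the ldd section (walk every library a binary uses and look for the one that provides a given function) can be sketched by combining ldd with objdump -T. The helper names ldd_paths, text_functions, and find_function are hypothetical, and the awk field positions assume the ldd and objdump -T output layouts shown in Listings 8.7 and 8.8; other versions of these tools may format their output differently.

```shell
# Print the resolved library paths from ldd output read on stdin.
# Entries with no file path (such as linux-gate.so.1) are skipped.
ldd_paths() {
    awk '$2 == "=>" && $3 ~ /^\// { print $3 }'
}

# Print the names of functions defined in .text, given objdump -T
# output on stdin. Assumes the flags split into two fields ("g" "DF").
text_functions() {
    awk '$4 == ".text" && $3 ~ /F/ { print $NF }'
}

# find_function BINARY FUNCTION
# Print every library used by BINARY that provides FUNCTION.
find_function() {
    local binary=$1 func=$2 lib
    ldd "$binary" | ldd_paths | while read -r lib; do
        if objdump -T "$lib" | text_functions | grep -qxF "$func"; then
            echo "$lib"
        fi
    done
}
```

For instance, find_function /bin/ls printf would be expected to name the C library on most systems, although the exact path depends on the distribution.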
Listing 8.8.
[ezolt@localhost book]$ objdump -T /usr/lib/libgtk.so | fgrep .text
0384eb60 l    d  .text  00000000
0394c580 g    DF .text  00000209  Base        gtk_arg_values_equal
0389b630 g    DF .text  000001b5  Base        gtk_signal_add_emission_hook_full
0385cdf0 g    DF .text  0000015a  Base        gtk_widget_restore_default_style
03865a20 g    DF .text  000002ae  Base        gtk_viewport_set_hadjustment
03929a20 g    DF .text  00000112  Base        gtk_clist_columns_autosize
0389d9a0 g    DF .text  000001bc  Base        gtk_selection_notify
03909840 g    DF .text  000001a4  Base        gtk_drag_set_icon_pixmap
03871a20 g    DF .text  00000080  Base        gtk_tooltips_set_colors
038e6b40 g    DF .text  00000028  Base        gtk_hseparator_new
038eb720 g    DF .text  0000007a  Base        gtk_hbutton_box_set_layout_default
038e08b0 g    DF .text  000003df  Base        gtk_item_factory_add_foreign
03899bc0 g    DF .text  000001d6  Base        gtk_signal_connect_object_while_alive
....

When you use performance tools such as ltrace, which display the library functions an application calls (but not the libraries themselves), objdump helps locate the shared library that contains each function.

8.2.8. GNU Debugger (gdb)

gdb is a powerful application debugger that can help investigate many different aspects of a running application. gdb has three features that make it a valuable tool when diagnosing performance problems. First, gdb can attach to a currently running process. Second, gdb can display a backtrace for that process, which shows the current source line and the call tree. Attaching to a process and extracting a backtrace can be a quick way to find some of the more obvious performance problems. However, if the application is not stuck in a single location, it may be hard to diagnose the problem using gdb, and a system-wide profiler, such as oprofile, is a much better choice.

Finally, gdb can map a virtual address back to a particular function. gdb may do a better job of figuring out the location of the virtual address than performance tools.
For example, if oprofile gives information about where events occur in relation to a virtual address rather than a function name, gdb can be used to figure out the function for that address.

8.2.8.1 Performance-Related Options

gdb is invoked with the following command line, in which pid is the ID of the process that gdb will attach to:

gdb -p pid

After gdb has attached to the process, it enters an interactive mode in which you can examine the current execution location and runtime variables for the given process. Table 8-4 describes one of the commands that you can use to examine the running process.
gdb has many more command-line options and runtime controls that are more appropriate for debugging than for a performance investigation. See the gdb man page or type help at the gdb prompt for more information.

8.2.8.2 Example Usage

To examine how gdb works, it is useful to demonstrate it on a simple test application. The program in Listing 8.9 just calls function a() from main and spins in an infinite loop. The program will never exit, so when we attach to it with gdb, it should always be executing the infinite loop in function a().

Listing 8.9.
void a(void)
{
  while(1);
}

int main(void)
{
  a();
}

Listing 8.10 launches the application and attaches to its pid with gdb. We ask gdb to generate a backtrace, which shows us exactly what code is currently executing and what set of function calls led to the current location. As expected, gdb shows us that we were executing the infinite loop in a(), and that this was called from main().

Listing 8.10.
[ezolt@wintermute examples]$ ./chew &
[2] 17389
[ezolt@wintermute examples]$ gdb -p 17389
GNU gdb Red Hat Linux (5.3.90-0.20030710.41rh)
Copyright 2003 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu".
Attaching to process 17389
Reading symbols from /usr/src/perf/utils/examples/chew...done.
Using host libthread_db library "/lib/tls/libthread_db.so.1".
Reading symbols from /lib/tls/libc.so.6...done.
Loaded symbols for /lib/tls/libc.so.6
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
a () at chew.c:3
3 while(1);
(gdb) bt
#0 a () at chew.c:3
#1 0x0804832f in main () at chew.c:8
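The attach-and-backtrace session in Listing 8.10 can also be run without the interactive prompt, which is convenient when a backtrace needs to be captured from a script. The following is a minimal sketch, assuming a gdb recent enough to support the -batch and -ex options (the 5.3 release shown above may not be) and assuming the user is allowed to attach to the target process. The function name backtrace_pid and the GDB override variable are hypothetical.

```shell
# backtrace_pid PID
# Hypothetical helper: attach to PID, print one backtrace, and detach.
# GDB is an assumed override variable so that a different debugger
# binary (or a stub) can be substituted; it defaults to "gdb".
backtrace_pid() {
    "${GDB:-gdb}" -p "$1" -batch -ex bt
}
```

Running backtrace_pid 17389 against the chew process from Listing 8.10 would be expected to print the same two frames, a() and main(), and then return to the shell.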
Finally, in Listing 8.11, we ask gdb to show us where the virtual address 0x0804832f is located, and gdb shows that the address is part of the function main.

Listing 8.11.
(gdb) x 0x0804832f
0x804832f <main+21>:    0x9090c3c9

gdb is an extraordinarily powerful debugger and can be helpful during a performance investigation. It is even helpful after the performance problem has been identified, when you need to determine exactly why a particular code path was taken.

8.2.9. gcc (GNU Compiler Collection)

gcc is the most popular compiler used by Linux systems. Like all compilers, gcc takes source code (such as C, C++, or Objective-C) and generates binaries. It provides many options to optimize the resultant binary, as well as options that make it easier to track the performance of an application. The details of gcc's performance optimization options are not covered in this book, but you should investigate them when trying to increase an application's performance. gcc provides performance optimization options that enable you to tune the performance of compiled binaries using architecture-generic optimizations (using -O1, -O2, -O3), architecture-specific optimizations (-march and -mcpu), and feedback-directed optimization (using -fprofile-arcs and -fbranch-probabilities). More details on each of the optimization options are provided in the gcc man page.

8.2.9.1 Performance-Related Options

gcc can be invoked in its most basic form as follows:

gcc [-g[level]] [-pg] -o prog_name source.c

gcc has an enormous number of options that influence how it compiles an application. If you feel brave, take a look at them in the gcc man page. The particular options that can help during a performance investigation are shown in Table 8-5.

Many performance investigation tools, such as oprofile, require an application to be compiled with debugging information to map performance information back to a particular line of application source code.
They will generally still work without the debugging information, but if debugging is enabled, they will provide richer information. Application profiling was described in more detail in a previous chapter.

8.2.9.2 Example Usage

Probably the best way to understand the type of debugging information that gcc can provide is to see a simple example. In Listing 8.12, we have the source for the C application, deep.c, which just calls a series of functions and then prints out the string "hi" a number of times depending on what number was passed in. The application's main function calls function a(), which calls function b(), which prints out "hi".

Listing 8.12.
#include <stdio.h>

void b(int count)
{
  int i;
  for (i=0; i<count;i++)
    {
      printf("hi\n");
    }
}

void a(int count)
{
  b(count);
}

int main()
{
  a(10);
}

First, as shown in Listing 8.13, we compile the application without any debugging information. We then start the application in the debugger and add a breakpoint to the b() function. When we run the application, it stops at function b(), and we ask for a backtrace. gdb can figure out the backtrace, but it does not know what values were passed between functions or where each function is located in the original source file.

Listing 8.13.
[ezolt@wintermute utils]$ gcc -o deep deep.c
[ezolt@wintermute utils]$ gdb ./deep
...
(gdb) break b
Breakpoint 1 at 0x804834e
(gdb) run
Starting program: /usr/src/perf/utils/deep
(no debugging symbols found)...(no debugging symbols found)...
Breakpoint 1, 0x0804834e in b ()
(gdb) bt
#0 0x0804834e in b ()
#1 0x08048389 in a ()
#2 0x080483a8 in main ()
In Listing 8.14, we compile the same application with debugging information turned on. Now when we run gdb and generate a backtrace, we can see which values were passed to each function call and the source file and line number of each call.

Listing 8.14.
[ezolt@wintermute utils]$ gcc -g -o deep deep.c
[ezolt@wintermute utils]$ gdb ./deep
..
(gdb) break b
Breakpoint 1 at 0x804834e: file deep.c, line 3.
(gdb) run
Starting program: /usr/src/perf/utils/deep
Breakpoint 1, b (count=10) at deep.c:3
3 for (i=0; i<count;i++)
(gdb) bt
#0 b (count=10) at deep.c:3
#1 0x08048389 in a (count=10) at deep.c:9
#2 0x080483a8 in main () at deep.c:14
Debugging information can significantly add to the size of the final executable that gcc generates. However, the information that it provides is invaluable when tracking a performance problem.
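One way to see how much debugging information adds is to build the program twice and compare the file sizes. The helper below is a minimal sketch of that comparison; the function name size_growth and the output name deep_g used in the usage note are hypothetical.

```shell
# size_growth PLAIN DEBUG
# Hypothetical helper: print how many bytes larger the file DEBUG is
# than the file PLAIN (negative if DEBUG is the smaller file).
size_growth() {
    local plain debug
    plain=$(wc -c < "$1")
    debug=$(wc -c < "$2")
    echo $((debug - plain))
}
```

After building with gcc -o deep deep.c and gcc -g -o deep_g deep.c, running size_growth deep deep_g reports the number of bytes attributable to the -g option for this program.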