Traditionally, the Bourne shell (the sh program) has been the de facto standard across UNIX systems. On Linux, however, the Bourne Again shell (the bash program) has largely replaced it, because bash is more advanced and remains backward compatible with scripts written for the Bourne shell. On many Linux distributions, /bin/sh is a symbolic link to /bin/bash, so the bash program is invoked whether sh or bash is executed. In this book, every reference to a shell script means a bash shell script. If other scripting languages are referenced, they will be explicitly mentioned.
Shell scripts can be executed in different ways. On Linux, every user needs a shell to interact with the operating system. If bash is the default login shell for a user, then every command (or program or script) the user runs in that session is launched by bash. If the user runs a script without specifying which shell should execute it, bash assumes that it is a bash script and interprets it according to the bash scripting syntax. If the script does not conform to that syntax, error messages are written to the standard error stream. For example, if your login shell is bash and you attempt to run a Korn shell script without specifying that the Korn shell should execute it, bash reports errors indicating the syntax violations. The same applies if the Korn shell is your login shell and you attempt to execute a bash script without specifying that bash should execute it.
For a program or script to execute, the user should have execute permission on the executable file, or should belong to the group that owns the file when that group has execute permission on it. In addition, the shell must be told where the file is located. This can be done in different ways. If the executable file name is specified with its qualifying pathname (i.e., the full name of the directory containing the file), the shell can execute the program directly. If only the file name is given, the shell searches for the file in the directories listed in the PATH environment variable, in the order in which they are specified. The search stops as soon as the file is found in one of the PATH directories, and that file is executed. If the file is located in the current directory (or in the parent directory of the current directory), it should be executed with commands similar to the following.
$ ./<execfile>
$ ../<execfile>
In the first example, the execfile is assumed to be in the current directory, and in the second example, it is assumed to be in the parent directory of the current directory. If files from a particular directory are frequently executed, then it may be a good idea to add that directory to the PATH variable. Before adding a directory to the PATH directory list, it is always good to check if it is already set in order to avoid duplicate entries. To see the directories set in the PATH variable, type the command shown below at the shell prompt.
$ echo $PATH
The echo command writes the value stored in the environment variable (the PATH variable in this case) to the standard output. To access the contents of a shell variable, the variable name should be prefixed with the $ symbol. The output of the above command might look something like the one shown below.
/usr/local/bin:/usr/bin:/usr/X11R6/bin:/sbin:/bin:/usr/sbin
The value set in the PATH variable may be very long; for the sake of presentation, only a few directories are shown here. To add a directory (say /usr/satya/bin) at the end of the PATH variable, the following command should be executed.
$ export PATH=$PATH:/usr/satya/bin
Notice that the new directory is separated from the current value of the PATH variable by the : delimiter. The right-hand side of the expression is evaluated first, and the result is then assigned to the PATH variable. The export command makes the new value of the variable available in the current shell session and in all the child processes created by that session. The export command can also be executed separately after setting the variable value, as shown below.
$ PATH=$PATH:/usr/satya/bin
$ export PATH
It should also be noted that spaces and tabs are not permitted around the = sign when setting the PATH variable. The new directory may also be added at the beginning of the PATH variable (preceding the list of all the current directories set in the variable), as shown below.
$ export PATH=/usr/satya/bin:$PATH
Usually there is no need to add custom directories before the system PATH; doing so may cause the system to malfunction or compromise its security, because a program in your directory could run in place of a system command of the same name. If it is an absolute requirement that your own directory precede the system PATH, ensure that the directory has very restricted access permissions. A value set for an environment variable in a particular shell session is available only in that session. To make the change permanent for a user, the changes must be made in the login shell script, as described in the following subsection.
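The advice about checking PATH before extending it can be sketched as a small shell function. This is only an illustration: the function name add_to_path is hypothetical (it is not a standard command), and /usr/satya/bin is the example directory used in the text.

```shell
# Hypothetical helper (not a standard command): append a directory to PATH
# only when it is not already listed there.
add_to_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                  # already present: leave PATH unchanged
        *) PATH="$PATH:$1" ;;         # otherwise append after the : delimiter
    esac
}

add_to_path /usr/satya/bin
add_to_path /usr/satya/bin            # the second call changes nothing
export PATH
echo "$PATH"
```

Wrapping PATH in extra colons before matching lets the same pattern detect the directory whether it appears first, last, or in the middle of the list.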
At this point, you need to understand an important feature of the shell: the precedence rule it follows to decide which program to run when it executes a command line. The shell supports built-in commands. In addition, there are commands that are individual programs (application executables) and shell scripts. These are all executable files, and files in different directories may share the same name without conflict, because each is a distinct file identified by its own inode. The bash shell also supports functions, which might have the same name as one of the executable files on the system. This raises the question of which precedence rule the shell follows when it encounters two or more executable entities (commands, programs, scripts, and functions) with the same name. The precedence rule is: first functions, then built-in commands, and finally scripts and executable application programs, which are located through the PATH search described earlier (the first match in PATH order wins). Aliases, discussed later, are expanded before any of these. However, the precedence can be overridden, as is explained later in the chapter.
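A small experiment, offered as a sketch rather than a definitive recipe, shows the rule in action. The function is deliberately named ls so that it clashes with the ls program; the command built-in used here is one way of bypassing function lookup.

```shell
# Define a function whose name clashes with the external ls program.
ls() {
    echo "function ls called"
}

first=$(ls)                        # the function takes precedence over the program
echo "$first"

# The 'command' built-in skips function lookup, so the external program runs.
second=$(command ls / > /dev/null && echo "external ls ran")
echo "$second"

type ls                            # reports that ls is currently a function
unset -f ls                        # remove the function again
```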
Terminal-based programs usually read user input from the standard input (the keyboard) and display output on the standard output (the terminal window). Errors are written to the standard error stream. The standard input stream is identified by file descriptor 0, the standard output stream by file descriptor 1, and the standard error stream by file descriptor 2. This can be seen from the entries in the /dev directory, where stdin (standard input), stdout (standard output), and stderr (standard error) refer to fd/0, fd/1, and fd/2, respectively.
However, it is likely that the input might be available in a file, or the output is required to be saved to a file. This is performed by the I/O redirection mechanism. The < redirection symbol represents input redirection from a file or another input stream. Similarly, the > redirection symbol represents output redirection to a file or another stream. Using pipes, the standard output of a command can be redirected to the standard input of another command. The following examples demonstrate these techniques.
$ ls -l | more
$ ls -l > listing.txt
The first line shows that the output of the ls -l command is piped to the more command, so that the output is paginated and can be scrolled using the standard more command features. The second command shows that the output is redirected to a file and hence is not displayed on the screen. Later, the file can be viewed using the more command or edited using a text editor. When the > redirection symbol is used, only the standard output is redirected to the specified file or stream. Typically, a program writes its output to the standard output (file descriptor 1) and its errors to the standard error (file descriptor 2). In order to redirect the standard error to the same stream as the standard output, one of the following forms of the command should be executed.
$ ls -l &> listing.txt
$ ls -l > listing.txt 2>&1
Here the ls -l command is shown as an example; however, the same concept can be used to capture the standard output and/or standard error messages of any command to files or streams. When the > redirection symbol is used, the corresponding file or stream is opened for writing, and its existing contents are erased automatically. However, the >> redirection symbol can be used to append to the file (if it already exists) or create it (if it does not exist). If the >> redirection symbol is used to capture both the standard output and standard error streams, then the second form of the syntax (from the two forms shown before) should be used; in older versions of bash the first form has no appending equivalent (bash 4.0 and later accept &>> for this purpose). Usually, applications that write message and error log files use the >> redirection symbol to append to the existing file; the file accumulates messages over a period of time (e.g., a week or month), and then it is closed and a new file is opened for the next collection period.
Similar to the output redirection, the input can also be redirected using command syntax similar to that shown here. In the example, it is assumed that process_listing is a program that reads from standard input and can be redirected to read from the file listing.txt.
$ process_listing < listing.txt
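The redirection forms discussed in this section can be tried out safely with the following sketch. The function emit is a hypothetical stand-in for any program that writes to both standard output and standard error, and the scratch directory created with mktemp is an assumption made so that no real files are touched.

```shell
# Scratch directory so the example does not touch real files.
workdir=$(mktemp -d)
log="$workdir/listing.txt"

# Hypothetical stand-in for a program that writes to both streams.
emit() {
    echo "to stdout"
    echo "to stderr" >&2
}

# > captures only standard output; standard error is discarded here
# simply to keep the terminal clean.
emit > "$log" 2> /dev/null
after_plain=$(grep -c '' "$log")   # number of lines now in the file

emit > "$log" 2>&1                 # truncate the file, then capture both streams
emit >> "$log" 2>&1                # append both streams to the same file
after_append=$(grep -c '' "$log")

# Input redirection: wc reads the file through its standard input.
wc -l < "$log"

echo "$after_plain $after_append"
rm -r "$workdir"
```

The order of the redirections matters in the 2>&1 form: standard output is pointed at the file first, and standard error is then made a copy of it.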
So far, the discussion has focused on how to execute a program or shell script from the shell prompt. However, a program or script may be invoked from another shell script. It is also possible to introduce controlled execution of the programs, check for error codes returned by called programs, and modularize the scripts by grouping code into functions. All these topics are discussed in the subsections that follow.
As noted previously, each user is associated with a default login shell, which is set in the user profile when the user login id is created. This information is stored in the /etc/passwd file, which can only be edited directly by the super user root. If you want a particular shell as your permanent default, you can ask the super user to set it; many systems also provide the chsh command, which lets users change their own login shell to one that the system permits. Otherwise, from one shell prompt, another shell can be invoked by just executing that program. It should also be noted that the Linux operating system has many system-level user ids that are created automatically during installation, and these user ids usually do not permit anyone to log in to the command-line mode. Their entries in the /etc/passwd file therefore carry no usable login shell (typically a placeholder program such as /sbin/nologin or /bin/false appears in its place).
The shell can be invoked in one of the three modes: as an interactive login shell, an interactive non-login shell, or a noninteractive shell. When users log in to their default shells, typically the shell is invoked in the first of the three modes, and a set of shell variables and options are preset and available to the user. Some of them are predefined in the shell, and most of them are configurable in the shell login scripts. The default name of the user-specific login script for bash is either .bash_profile or .bash_login or .profile, and is located in the user’s home directory, such as /home/Michael/.bash_profile or /home/Michael/.bash_login or /home/Michael/.profile for a user id named Michael. When the user logs into the system, the shell first executes the default shell profile script /etc/profile. This is the master file (to be executed by the shell every time a user logs into the operating system) and should only be edited carefully by the system administrator. Errors in this file could potentially bar every user from logging in to the system. The /etc/profile typically contains commands to set the default shell variables, including the default PATH setting. After the standard settings are done, the script may execute additional scripts to add some application-specific directories to the PATH value. The process of making these additional settings may be different across different Linux distributions. On Red Hat Linux 8.0/9.0 and SuSE Linux 8.1/8.2, these additional settings are described in individual scripts under the /etc/profile.d directory. For example, every Linux distribution usually comes with a Java2 SDK (or at least the Java Runtime Environment), and therefore includes a script that determines the Java/JRE bin directory and adds it to the standard PATH. 
If it is necessary to add a new directory to the PATH variable across the system (for all the users), just create a new script with ‘.sh’ extension, place it in the /etc/profile.d directory, and change the file permissions to include execute permission for everyone. However, if a setting is exclusive to a user and not intended to be valid for all the users system-wide, then such settings should not be placed in this directory; rather they should be placed in the user’s login/session configuration script as mentioned in the description that follows.
Once the default login script is executed, the shell attempts to locate the customized (user-specific) login file such as .bash_profile, .bash_login, or .profile in the home directory of the user, in the same order. If the shell is able to locate at least one of these files, then the commands contained in that script (the first of the three files found) are executed. If none of these files is present in the user’s home directory, then the login process ends with the default login script itself.
Every time a new session of the shell is invoked in the second mode (i.e., interactive non-login mode), the shell attempts to run the session configuration script .bashrc, if it exists in the user’s home directory. The .bashrc file contains additional or updated settings of environment variables, including PATH. The purpose of this file is to provide settings that should be applied whenever an instance of the shell is created for user interaction. If this file does not exist for a user, there is no harm, as the login script was already executed when the user logged in. The shell is invoked in the third mode when executing a script noninteractively. In this mode, the shell does not read the login or session configuration scripts; it only inherits the exported environment, including PATH, of the process that started it. Settings made in those scripts therefore cannot be relied on, and when a script runs outside a login session (for example, from a scheduler), it is safest to qualify executable files with their full pathnames.
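One way to see which mode a shell is running in is to inspect the special parameter $-, which contains the letter i when the shell is interactive. The sketch below is an illustration only; the variable name mode is arbitrary.

```shell
# $- holds the option flags of the current shell; 'i' means interactive.
case "$-" in
    *i*) mode="interactive" ;;
    *)   mode="noninteractive" ;;
esac
echo "This shell is $mode"
```

Saved in a script and executed, this reports the noninteractive mode; typed at a shell prompt, it reports the interactive mode.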
The shell maintains a history of previously used commands, which can be displayed by executing the history command, as shown in examples below.
$ history
$ history 25
When the history command is executed with no arguments, all the commands currently stored in memory are displayed. If a numeric argument is provided, as in the second example, only that many of the most recent commands are displayed. The purpose of maintaining the command history is to improve user productivity by providing a way to navigate through the command list and to move the cursor left and right within a single command, much as a text file is navigated; in this analogy, the history may be viewed as a text file in which each command is a line. To navigate the history productively, we can choose our favorite text editor as a template, so that its navigation and editing commands can be used to select a particular command and modify it before execution. By default, Emacs-style command-line editing is enabled. This can be changed to vi-style editing by typing the following command at the shell prompt.
$ set -o vi
The size of the history memory, the number of commands that should be saved for future sessions, and the name of the file that contains the saved history commands can be controlled by a set of predefined environment variables, as described in Table 4.1, along with several other shell variables. Linux application developers very commonly use the variables described in this table. As mentioned earlier, the shell variables must be accessed through the $ notation, as in $HOME to access the variable HOME. The best way to check values of these variables is to issue an echo command at the command prompt, as shown here.
$ echo $<variable>
Shell Variable | Description |
---|---|
HOME | This variable contains the home directory of the user currently logged in. For example, if a user id “Michael” is logged in, the HOME variable contains something like /home/Michael (assuming that the user home directories are set up under the /home directory). |
SECONDS | The value of this variable changes dynamically and represents the number of seconds passed since the shell was invoked. This value pertains to the specific shell session in which the variable is checked. |
BASH | This variable contains the fully qualified program name (directory and executable file name) that represents the shell. Typically it is /bin/bash. |
PWD | The value of this variable changes dynamically and represents the current directory. Every time you change to a new directory with the cd command, the value displayed by PWD is changed to contain the current directory. The letters PWD represent ‘present working directory.’ The command pwd executed at the shell prompt also gives the same result. |
OLDPWD | This variable represents the directory value stored in PWD before the most recent execution of the cd command. The value of this variable also changes dynamically, along with the PWD variable, every time the cd command is executed. |
EDITOR | This variable contains the fully qualified program name of the text editor you would prefer to use, such as /usr/bin/vim to represent the vi editor. It allows an application to identify the preferred text editor of the user. |
LINES | This variable contains an integer representing the number of lines that are displayed in one full screen of the current command-line window. The default value is usually 24. If the window is stretched or reduced, the value changes accordingly. |
COLUMNS | This variable contains an integer representing the number of columns that are displayed in one full screen of the current command-line window. The default value is usually 80. If the window is stretched or reduced, the value changes accordingly. |
TERM | This variable contains a value that represents the terminal type being used in the current command-line window. The terminal type influences the response of the function keys and certain key sequences, and this information is used by applications (developed using the curses library, which was originally developed on the UNIX/Linux systems to enable programmers to develop character-based user interfaces) that print formatted output to the terminal (or command-line window). Linux systems typically provide the ‘xterm’ terminal type, which works well with curses-based applications. However, since the inception of GUI-based applications, character-mode applications and data entry screens have been limited to specific output devices designed or required to work with textual screens. |
HISTFILE | The history command displays a list of previously used commands at the shell prompt so that we can easily recall (or retrieve and re-execute) complex commands used before. The HISTFILE variable stores the fully qualified name of the file that stores the history of commands. The default file is $HOME/.bash_history. When the current shell session is terminated, the commands stored in memory are saved to this file. |
HISTFILESIZE | This variable represents the size of the $HISTFILE, in terms of the number of commands to be saved. When the current shell session is terminated, only the most recent $HISTFILESIZE commands are saved to the $HISTFILE. |
HISTSIZE | This variable represents the number of commands that should be remembered in the current shell session so that they are retrieved when the history command is executed. Every time a new command is executed, it is added to this list. However, if adding the new command would make the list exceed the $HISTSIZE value, the oldest command is removed from the list. Also, the list is refreshed from the $HISTFILE when a new shell session is started. The $HISTSIZE value is different from the $HISTFILESIZE value in that the former controls the history list size in memory while the latter controls the size of the history list saved to the file. |
SHELL | This variable represents the fully qualified program name of the shell that is being executed in the current command-line window. It is typically /bin/bash for the bash shell. |
It should be noted that additional application-level variables could be defined and used by developers in their own shell scripts, which is the normal practice.
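The dynamic behavior of some of these variables can be observed with a short sketch such as the following. The variable names start_dir, now_pwd, and previous are arbitrary names introduced for the illustration, and the values printed depend on the system where it runs.

```shell
start_dir=$PWD                      # remember where we started
cd /                                # PWD becomes /, OLDPWD the previous directory
echo "HOME=$HOME"
echo "PWD=$PWD OLDPWD=$OLDPWD"
echo "pwd prints: $(pwd)"           # the pwd command agrees with $PWD
now_pwd=$PWD
previous=$OLDPWD
cd "$start_dir"                     # return to the starting directory

# History variables are normally set only in interactive sessions.
echo "HISTSIZE=${HISTSIZE:-<not set in this shell>}"
```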
The command prompt variables PS1 and PS2 provide a way to customize the command prompt string. PS1 is the primary prompt string, and PS2 is the secondary prompt string. When a shell session is initiated by invoking bash (graphically or at the command prompt), the primary prompt string is displayed. This is somewhat similar to the C:\> prompt in the command prompt window of the Windows operating system. By setting the PS1 variable to a desired setting and style, the command prompt can be customized. In order to make the new command prompt active automatically for every bash session, the setting should be made in the $HOME/.bashrc file or the $HOME/.bash_profile file, as the case may be. The shell interprets a set of predefined backslash-prefixed commands when it evaluates the prompt strings. The prompt string containing these commands should be enclosed in double quotes. The following examples will make the concept clearer.
$ export PS1="\d ==>"
$ export PS1="\t ==>"
$ export PS1="\u@\h \W ==>"
$ export PS1="\s \v ==>"
These settings would give command prompts as shown here respectively.
Sat Dec 28 ==>
21:46:17 ==>
satya@etslinux LinuxBook ==>
bash 2.05b ==>
From the examples, it is clear that the \d command prints the current date in ‘Day Mon dd’ format, the \t command prints the current time in ‘HH:MM:SS’ (24 hour) format, \u command prints the current user, \h command prints the hostname, \W command prints the current base directory name (not the complete path), the \s command prints the shell, and the \v command prints the current version of bash shell. It should be noted that the commands shown here are only a subset of a complete set of commands available, and the user is encouraged to refer to other documentation and manual pages for additional commands or more details. The default value of the secondary command prompt PS2 is > and can be set to a desired value. This is used to let the user know that the command line typed at the primary command prompt is incomplete before pressing the ENTER key. At the secondary command prompt string PS2, the user can continue typing the command that was started at the primary command prompt, complete the command, and finally press the ENTER key. If the command is too long, the system keeps showing the secondary prompt string as long as the command is not complete. The bash shell automatically determines when to display the secondary command prompt based on the command-line text being entered.
Command aliasing is a technique used to give alias names to commands. This is generally useful for (and is typically used by) people who are familiar with another operating system and are trying to familiarize themselves with Linux. Because Linux was very rich in command-line mode before it became a desktop operating system, and because Linux originates from UNIX, where the majority of commands are short-named, cryptic, and not easily remembered by a new user, providing an alias name for a command is very helpful. An alias name can be meaningful to the user and can also shorten a very lengthy compound command involving several pipes and arguments. The example shown below makes this concept easily understood.
$ alias MechPay="gen_paystub_summary.sh | grep 'Mechanic'"
In a scenario where the accounting department of a machine shop generates pay stubs for its entire staff every month, the gen_paystub_summary.sh script executes the necessary programs and finally generates the summary for the entire staff for the month. The output of the script is piped to the grep command, which extracts all the lines that represent payment to staff of the Mechanic type. As shown in the example, the entire command can be aliased with the short name MechPay, so that the user does not have to remember the command. Whenever the user needs to run this entire command, it is sufficient to type the alias alone as a command. Because aliases are usually meant to be specific to a single user or a group of users, they are usually set in the user login or session configuration script and are available to the user every time a shell session is initiated. Although aliases make the job very easy for users, the disadvantage of using too many aliases is that the user may develop the tendency not to master any of the commands. However, the user makes the ultimate decision of whether to attempt mastering the rich command set of the operating system or to depend on alias names to make the day-to-day job more comfortable. The unalias command removes an alias definition, as shown here.
$ unalias MechPay
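The life cycle of an alias can be sketched as follows. The alias name ll is a hypothetical example, not one taken from the text. Note that bash expands aliases only in interactive shells by default, so this sketch inspects the stored definition instead of running the alias.

```shell
alias ll='ls -l'                    # ll is a hypothetical alias name
defined=$(alias ll)                 # 'alias NAME' prints the stored definition
echo "$defined"

unalias ll                          # remove the alias definition
if alias ll > /dev/null 2>&1; then  # fails once the alias no longer exists
    removed="no"
else
    removed="yes"
fi
echo "alias removed: $removed"
```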
Earlier, we discussed the fact that files representing the shell scripts need to have execute permission set at least at one of the user categories (owner, group, or public). This can be done using the chmod command, as shown in the examples below.
$ chmod +x <script name>
$ chmod 755 <script name>
In the first example, the execute permission is added to every category of user. In the second example, the owner gets read, write, and execute permissions and everyone else gets read and execute permissions. More details on file permissions can be found in Chapter 2, “Linux for Windows Programmers.”
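As a sketch of the whole sequence, the following commands create a tiny script, grant execute permission, and run it. The file name hello.sh and the scratch directory are assumptions made for the illustration.

```shell
# Create a tiny script in a scratch directory (hello.sh is a made-up name).
workdir=$(mktemp -d)
script="$workdir/hello.sh"
printf '%s\n' '#!/bin/sh' 'echo "hello from script"' > "$script"

chmod 755 "$script"                 # owner: rwx, group and others: r-x
ls -l "$script"                     # the listing now shows the x bits

# -x tests whether the current user may execute the file.
if [ -x "$script" ]; then
    result=$("$script")
fi
echo "$result"
rm -r "$workdir"
```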
Shell scripts are nothing more than a group of statements of different types. These can be assignment statements that assign values to variables, execution commands that perform specific tasks, flow control constructs that introduce the control logic to the script, or functions that add the necessary modularity to the script. In addition, the scripts also contain commented lines to provide script level documentation. The commented lines start with the # character. Some of these concepts were discussed earlier, and the rest of the discussion in this section is going to expand them further.
Variables are used to store values. They form an important feature of any programming language. It has been discussed earlier that the shell provides a number of built-in variables having special meaning. In addition to the concepts of built-in variables and command execution as discussed in the previous section, the bash shell provides a very powerful programming environment that includes/permits user-defined variables and functions, flow-control and looping constructs, ability to handle error conditions and process signals, and so on. Using these features, we can build very useful (and powerful) scripts that are interpreted by the bash shell. In fact, the entire Linux operating system and its processes are heavily dependent on the shell scripts, whether the bash shell, the Korn shell, or another shell. For example, the installation programs used to install most of the popular software use the shell scripts internally. Usually, the user does not need to modify or view these installation scripts; however, if a particular script fails to execute properly due to inconsistencies between the script contents and the system environment, knowing the concepts of shell scripts and the ability to understand them could help the user to debug the installation and probably fix the problem instead of waiting to receive support from the vendor. If the particular software is freeware or shareware, most likely the users would not get any free support and might have to fix the problem themselves anyway. In a way, all Linux users are expected to understand Linux shell scripts to some extent, if not to the expert level. With the advent of GUI-based IDEs, users may not have to write scripts, but they need to be able to understand the scripts written by others.
Very similar to the predefined environment variables discussed so far, the shell permits user-defined variables to be used in simple user sessions or in scripts. Variables do not have to be declared before their values are set. If no value has been set for a variable, the shell treats the variable as nonexistent. By default, the values stored in variables are treated as character strings. However, when variables are used in arithmetic expressions and their values are in a format that arithmetic evaluation understands, they are treated as arithmetic variables. A running shell script is identified as a process. When a process invokes another shell script, a sub-process is initiated by the main process. Variables set in the parent process are not visible to the sub-process unless they are exported in the parent. This is the reason we execute the export command after setting a value to a variable. For example, if we set a variable at the command prompt without exporting it, a script or program started from that prompt will not be able to see the variable, because it runs as a sub-process of the shell session. Commands typed at the same prompt can still use the value through the $ notation (as in echo $VAR), because the shell itself substitutes the value before running the command. So variables set in the shell session are visible to sub-processes only if they are exported using the export command.
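The effect of export on sub-processes can be checked with a sketch like the one below. The variable names LOCAL_ONLY and SHARED are made up for the illustration, and sh -c is used simply as a convenient way to start a child process.

```shell
LOCAL_ONLY="not exported"           # set, but never exported
SHARED="exported"
export SHARED

# Each sh -c starts a child process. The single quotes make the child,
# not the current shell, look up the variables.
child_local=$(sh -c 'echo "${LOCAL_ONLY:-<unset>}"')
child_shared=$(sh -c 'echo "${SHARED:-<unset>}"')

echo "current shell sees LOCAL_ONLY: $LOCAL_ONLY"
echo "child sees LOCAL_ONLY:         $child_local"
echo "child sees SHARED:             $child_shared"
```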
As mentioned earlier, shell variable values are interpreted as strings, and they are accessed through the $ notation, as in $VAR to access the value stored in the VAR variable. However, a few operators can be used, together with a pair of curly braces {}, when accessing shell variables in order to provide additional functionality. To check whether a variable VAR is defined (and not empty), the ${VAR:+value} expression is used; if the variable is defined, then the ‘value’ is returned, otherwise null is returned. Similarly, when the ${VAR:-defaultvalue} expression is used, the shell checks whether the VAR variable is defined. If the variable is defined, then the value of the variable is returned by the expression. If the variable is not defined, the ‘defaultvalue’ is returned by the expression but not assigned to the variable. However, when the ${VAR:=defaultvalue} expression is used, the shell returns the variable value if it is defined; otherwise it sets the VAR variable to the ‘defaultvalue’ and returns that value. The expression ${VAR:<index>} returns the substring starting at the ‘index’ position from the beginning of the VAR variable, if it is defined. Similarly, the expression ${VAR:<index>:<len>} returns the substring starting at the ‘index’ position and of length ‘len’. In both cases, null is returned if VAR is not defined, and positions in the string are counted from zero. The expression ${#VAR} returns the length of the variable content, which is the number of characters in the string value of the variable.
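These expressions can be tried out with the following sketch, which assumes bash (the substring forms are not available in every Bourne-style shell). The single-letter variable names and the sample values are arbitrary.

```shell
unset VAR
a=${VAR:-fallback}      # VAR undefined: yields "fallback", VAR stays undefined
still_unset=${VAR+defined}   # empty, confirming VAR was not assigned
b=${VAR:+alt}           # VAR undefined: yields an empty string
c=${VAR:=assigned}      # yields "assigned" and also stores it in VAR
d=${VAR:+alt}           # VAR is now defined: yields "alt"

VAR="distinct"
e=${VAR:3}              # from zero-based index 3 to the end: "tinct"
f=${VAR:3:4}            # four characters starting at index 3: "tinc"
g=${#VAR}               # length of the value: 8

echo "$a|$b|$c|$d|$e|$f|$g"
```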
The expression ${VAR#pattern} checks whether the pattern matches at the beginning of the variable value; if it does, the shortest matching portion is removed and the resulting string is returned. On the other hand, the expression ${VAR##pattern} returns the string remaining after removing the longest matching portion from the beginning of the variable value. In both cases, the pattern may contain wildcard characters such as * and ?. The * wildcard character stands for any number of occurrences of any characters at the position where * appears within the pattern. For example, the pattern di*t might match a number of possibilities, including words such as ‘distinct’, ‘different’, ‘district’, ‘distract’, ‘distort’, ‘distant’, and so on. Therefore, the shortest match and the longest match differ only when the * wildcard character is used. In other cases, both variations of the expression give the same result. This can be demonstrated with the following example script saved as StringExpr.sh.
# StringExpr.sh VAR="distinct and different values" echo ${VAR#di*t} echo ${VAR##di*t}
When the script is executed using the ./StringExpr.sh command line (assuming that the script is located in the current directory), the following output is shown.
inct and different values
values
The first output line demonstrates that the shortest match to the di*t pattern from the beginning of the string is ‘dist’, which is removed from the displayed string. The second output line demonstrates that the longest match to the same di*t pattern is ‘distinct and different,’ which is removed from the displayed string.
Similar to the previous two pattern-matching expressions, the two expressions ${VAR%pattern} and ${VAR%%pattern} attempt to match the pattern at the end of the variable value and return the string remaining after removing the shortest and the longest match, respectively. There are two more pattern-matching expressions that help in substituting the longest string matching the specified pattern with alternate values. They are ${VAR/pattern/value} and ${VAR//pattern/value}. Between these two, the first expression replaces the first occurrence of the matching pattern, and the second expression replaces all the occurrences of the matching pattern.
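A minimal sketch of the suffix-removal and substitution expressions follows. The FILEPATH value is invented for the illustration, and the comments give the results expected under bash.

```bash
#!/usr/bin/env bash
# FILEPATH is a made-up value used only for illustration.
FILEPATH="/home/user/archive.tar.gz"

short_suffix="${FILEPATH%.*}"    # shortest match of .* at the end removed:
                                 #   /home/user/archive.tar
long_suffix="${FILEPATH%%.*}"    # longest match of .* at the end removed:
                                 #   /home/user/archive
base_name="${FILEPATH##*/}"      # longest match of */ at the beginning removed:
                                 #   archive.tar.gz
first_only="${FILEPATH/a/A}"     # only the first "a" is replaced
every_one="${FILEPATH//a/A}"     # every "a" is replaced

echo "$short_suffix"
echo "$long_suffix"
echo "$base_name"
echo "$first_only"
echo "$every_one"
```

None of these expressions changes FILEPATH itself; each one only returns a modified copy of the value.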
In addition to the different types of expressions discussed so far, it is possible to construct expressions that function like compound components. Expressions of the form ((expression)) are evaluated using the arithmetic evaluation principles and are equivalent to using the let built-in command. Variables used within this type of expression are directly accessed by their names without prefixing the $ symbol (i.e., without using the $ notation). If logical operators are used within this type of expression, the logical value (zero or non-zero) is computed and used in conditional statements. If simple arithmetic operators are used, then the computed value represents the result of arithmetic expression and can be stored in arithmetic variables and used in further arithmetic expressions.
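The arithmetic form can be sketched as follows. The variable names are hypothetical. The last statement uses the related $(( )) form, which is not described in the paragraph above: it substitutes the computed value into the command line instead of only evaluating it.

```bash
#!/usr/bin/env bash
# count, limit and the other names are hypothetical, for illustration only.
count=7
limit=10

(( total = count * 3 + 1 ))     # variables are referenced without the $ prefix
let "remainder = total % 5"     # the equivalent let built-in command

# A logical expression inside (( )) can be used directly as a condition.
if (( count < limit && total > 20 )); then
    verdict="within limit"
else
    verdict="over limit"
fi

doubled=$(( total * 2 ))        # $(( )) substitutes the computed value

echo "total=$total remainder=$remainder verdict=$verdict doubled=$doubled"
```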
While executing the shell scripts, any number of arguments can be passed, and within the script they are identified by their relative position in the command-line string. The general syntax and an example are provided to make the concept clearer.
$ <script name> [argument1] [argument2] [argument3] . . . .
$ ./Generate_Invoices.sh 2002 March NA
The first line is the general syntax indicating the arguments are optional. The second line shows an example command line executing the script Generate_Invoices.sh with three arguments 2002, March, and NA. The script may interpret these values such that the invoices need to be generated for March 2002 for the North American (NA) region. When the script is invoked with a command line such as this, the entire command line is parsed and stored by the shell in a number of built-in variables. The built-in variable $0 identifies the name of the script, while the arguments are identified relative to the order in which they are passed, as $1 for the first argument, $2 for the second argument, $3 for the third argument, and so on. The variable $* stores all the command-line arguments (separated by the first character of the internal field separator, IFS variable, which is usually the whitespace) as a single string, and the variable $@ stores all the command-line arguments individually separated by whitespace. The $@ notation is useful for working with the set of arguments as a list. The ‘$#’ variable specifies the number of arguments (3 in the previous example) passed in the command line. However, none of the three variables $*, $@, and $# considers the name of the script in its own interpretation.
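The difference between these variables is easiest to see with an argument that contains a space. The following sketch uses the set built-in with the -- option to simulate a command line, which is an assumption made only so that the example is self-contained; in a real script the values would come from the caller.

```bash
#!/usr/bin/env bash
# Simulate the command line: ./Generate_Invoices.sh 2002 "March April" NA
set -- 2002 "March April" NA

arg_count=$#            # number of arguments, not counting $0
first_arg=$1            # first positional argument

# "$*" joins all arguments into one string using the first character of IFS.
joined="$*"

# "$@" keeps every argument as a separate item, even if it contains spaces.
at_count=0
for arg in "$@"; do
    at_count=$(( at_count + 1 ))
done

# Unquoted $* is split on whitespace again, so "March April" becomes two items.
star_count=0
for arg in $*; do
    star_count=$(( star_count + 1 ))
done

echo "count=$arg_count first=$first_arg joined=$joined"
echo "items from quoted \$@: $at_count, items from unquoted \$*: $star_count"
```

Quoting "$@" is therefore the safer choice whenever the arguments are to be processed one by one.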
Any programming (or scripting) language may be considered incomplete if it does not support some of the necessary concepts as discussed in the beginning of this section. User-defined functions are one of them. Using functions, one can write modular code that will be easy to understand and maintain. The general syntax of a function in bash language is very simple and is shown below.
function <function name> {
statements
}
function <function name> () {
statements
}
<function name> () {
statements
}
The keyword function is optional. However, if it is present, the parentheses ‘()’ appearing after the function name are optional. If the keyword function is not present, then the pair of parentheses must be present. Then follows the body of the function enclosed in curly braces. The function accepts any number of arguments passed during its invocation, and there is no need to mention the arguments in the function heading line, as they are positional, just like the command-line arguments of the script. The $0 variable, which stands for the name of the script, is the same throughout the script and does not change its value even within internal functions defined in the script. However, the other positional arguments such as $1, $2, and so on change their meaning based on the context. They represent the script arguments if they are referenced outside of any of the functions and represent function arguments if they are referenced inside a function. Variables defined within the script are accessible from within the functions if the references are made after the variable is defined. Thus, a function will be able to change values of variables defined outside of its definition. Consider the following example script for a better understanding.
# CallMe.sh script
# function definition
function CallMe {
    echo Today is : $TODAY
    echo Hello $1 $2
}
#
TODAY=`date`
echo Number of script arguments: "$#"
echo Hello "$*"
#
# reverse the script arguments
# while calling the CallMe function
CallMe $2 $1
As the script shows, the script arguments are reversed and passed to the function. Execute the following command line, assuming that the script is saved as CallMe.sh in the current directory. The script is executed with the ./ command-line prefix because the script is saved in the current directory.
$ ./CallMe.sh John Smith
The output of the script, as shown below, explains some of the concepts discussed before.
Number of script arguments: 2
Hello John Smith
Today is : Tue Dec 31 06:57:42 CST 2002
Hello Smith John
The variable TODAY is defined outside the function and is still accessible within the function, whereas the function arguments are different from the script arguments. If the script arguments are not explicitly passed to the function as we did here, the function will not be able to recognize them. You may write a variation of this script to test this concept. When setting the value of the TODAY variable, a different notation is used. The date command is executed to produce the current date value, and then the value is captured into the variable on the left side of the statement (which is TODAY in this case) by surrounding the command on the right side with the reverse-quote character ` on both sides. This is not the same as the single-quote character. It is the character that shares a key with the tilde ~ symbol on the keyboard and is usually located toward the top left corner below the ESC key. The same result can be obtained by the $(command) notation. For example, both the forms $(date) and `date` would compute the current date and timestamp and assign this value to the variable on the left side of the assignment statement.
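The points made above (a function can change a variable defined outside it, the function has its own positional arguments, and the two command-substitution notations are equivalent) can be checked with a sketch like the one below. The names COUNTER and bump_counter are hypothetical, and the set built-in is used only to simulate a script argument.

```bash
#!/usr/bin/env bash
# COUNTER and bump_counter are hypothetical names used for illustration.
COUNTER=0

bump_counter () {
    # $1 here is the function's own first argument, not the script's.
    COUNTER=$(( COUNTER + $1 ))
}

set -- 100              # pretend the script was started with the argument 100
bump_counter 5
bump_counter 7

# Both notations capture the output of a command into a variable.
upper_dollar=$(echo "hello" | tr 'a-z' 'A-Z')
upper_backtick=`echo "hello" | tr 'a-z' 'A-Z'`

echo "COUNTER=$COUNTER script_arg=$1"
echo "$upper_dollar $upper_backtick"
```

After the two calls, COUNTER holds the sum of the function arguments, while $1 outside the function still refers to the script argument.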
Earlier, we learned that many Linux commands are individual programs. However, similar to the rich set of built-in variables, the shell also provides a rich set of built-in commands. As the name suggests, these are not individual programs; rather, they are built into the shell itself. The built-in commands return zero upon successful execution and a non-zero value if the command fails. Table 4.2 displays some of the most commonly used built-in commands of the bash shell; the list is not exhaustive. Some of these commands were already discussed in the previous sections, and the others are described in the table. The descriptions provided in the table represent typical use of the respective commands. However, there are more options and variations of the commands, which the reader is encouraged to explore independently.
Built-In Command |
Description |
---|---|
alias |
The alias command is used to define alias names for compound and complex command-line syntax. If the command is executed with no arguments, then it displays all currently defined alias names. |
bg [job] |
A process running in the foreground mode locks up the current shell session from doing anything else until the process completes its execution. However, a process started in the background mode frees up the terminal, and we can do other tasks while the process is running. The bg command resumes a suspended job in background mode so that the terminal is freed up for other tasks. A job that is currently running in the foreground is first suspended by pressing Ctrl+Z in the same terminal, and the bg command is then executed at the shell prompt. The argument to this command is a job specification such as %1 (job numbers are listed by the jobs built-in command) rather than a process id; if the argument is omitted, the current job is used. More than one job can be sent to background mode at once by passing several job specifications. |
break [n] |
The break command is useful for breaking a loop under certain conditions. The argument is optional, and if passed, then it should be numeric and >= 1. The argument indicates the number of levels of loops that the control should exit. |
builtin <built-in command> [args] |
If the name of a function in a shell script is the same as one of the shell’s built-in commands, then the function name gets precedence over the built-in command. This means that when we execute the command (having the same name for the function and the built-in command), the function is executed instead of the built-in command. If we need to override this default behavior and tell the shell that the built-in command should be executed instead of the function, then the command should be preceded by the word builtin. It is also useful in another situation. We may actually intend to invoke the function whose name resembles a built-in command, but from within the function we need to call the built-in command that has the same name. In this case, preceding the command with the word builtin avoids recursion of the function and a possible infinite loop. The return code of this command is the return code of the actual built-in command executed. If we attempt to execute a command that is not a shell built-in command, then the ‘builtin’ command fails and returns the false return code. |
cd |
The cd command is used to change the current directory. The argument to this command is a pathname, which may be absolute or relative and may contain any number of levels of directories separated by the slash ‘/’. If the pathname in the argument begins with a slash ‘/’, then the pathname is assumed to be the absolute path and is evaluated from the root directory. For example if the argument is /invoices/2002, then the invoices directory should exist in the root directory and should contain the 2002 directory within itself. If the pathname does not begin with a slash /, then the first directory in the path is assumed to be present in the current directory. |
command |
The command command is very similar to the builtin <command> [args] command. However, it extends its search criteria to the path list specified by the PATH variable and can execute commands that are individual programs, in addition to the built-in commands. The return code from this command is the return code of the actual command executed. |
declare |
The declare command may be used to declare variables explicitly or declare and export them simultaneously. When used with the -i option, declare treats the variables as integers. When declared with the -x option, variables are automatically exported. The following examples will make it clearer. $ declare -i counter=10 # declares integer variable $ declare -x hstr="Hello World" # declares & exports $ declare -f # display currently defined functions |
echo |
The echo command is used to write a message or the value of a shell variable (or a combination of both) to the standard output, which is usually the terminal window. |
eval [args] |
The eval command enables us to execute dynamically built command lines. The arguments passed to the ‘eval’ command are joined to construct a command line, and the ‘eval’ command makes the shell execute this command line. This concept can be extended to the extent that commands can be dynamically constructed within shell scripts, stored in shell variables, and passed to the eval command. For example, the command eval $VAR attempts to execute a string stored in the VAR variable, assuming that the string is a command line that could be executed at the shell prompt. The exit code returned by the command stored in the variable is returned by the eval command as its own return code. |
exit [rc] |
The exit command is used to exit the current shell script. The argument is a numeric value and is interpreted as the return code of the current script. Returning a return code to the calling script or program is the most common way of letting it check whether the called script or program executed successfully. |
export [args] |
The export command exports the variables (that are passed as the command arguments) within the current shell session, so that the succeeding commands will be able to recognize them, if referenced. This command has been discussed in the previous sections. |
fc |
The fc command may be used to view (and edit/execute) the commands from the command history. The history command discussed earlier is useful to navigate the commands either by specific command number or by backward and forward movement with the arrow keys or using the navigation keys of an editor such as vi or Emacs. On the other hand, the fc command can also retrieve history commands using the command names. For example, the command line fc -l cd cp displays the list of commands from the last cd command to the last cp command, which means from the point when the cd command was last used to the point when the cp command was last used. The -l (lowercase letter L) option is used to list the commands, while the -e <editor> option is used to edit the selected commands with the specified editor, as in fc -e vi cd cp, which opens the list in the vi editor for editing. When the editor exits, the edited commands are echoed and executed. |
fg [job] |
The fg command brings a background or suspended job to the foreground mode and therefore locks up the current terminal. The argument to the command is a job specification such as %1 (as listed by the jobs built-in command) that identifies the job to be brought to foreground mode; if it is omitted, the current job is used. |
help <command> |
The help command provides a brief description and usage information about the command that is passed as the argument. The help command works only for the shell built-in commands. For other commands, the manual pages should be referred using the man <command> syntax. |
history [n] |
The history command is used to retrieve a command from the command history for modification and/or re-execution without having the need to type the whole command. This is very useful for repeated execution of long command lines and has already been discussed in the previous sections. |
kill <process ids> |
The kill command is used to abnormally terminate a process. One or more process ids can be passed as arguments to the command. |
let |
The let command is used to assign arithmetic values or the result of arithmetic expressions to variables. The command format is ‘let <variable>=<arithmetic expression>’. |
pwd |
The pwd command is used to check the current directory name in the interactive shell session. It stands for print working directory. The PWD shell variable also contains the same value and may be used in shell scripts instead of the pwd command. |
read <args> |
The read command reads one line of input from the standard input, splits the input line into individual words, and stores each word in the variables named by the arguments passed to the command. If the number of words in the input line is more than the number of arguments, the last variable takes the rest of the words from the input line. |
return [rc] |
The return command is used to return a return code from a called function to the calling function. Also the return command can be used to conditionally return from the middle of a function if required, without waiting until the end. This is useful in effective error handling in shell scripts using one or more functions. If the return code is omitted, then the return code of the last command executed within the function body is considered as the return status of the command. |
set |
If the set command is executed without any arguments and options, the currently defined variables and functions are displayed. There are many options supported by the set command, some of which are seen here. $ set -o allexport # marks future variable definitions as exported $ set -o vi # uses vi-style key bindings for history navigation $ set -o emacs # uses Emacs-style key bindings for history navigation $ set -o noexec # used in non-interactive shell to read commands for syntax checking only and not for executing |
source <file> |
The source command reads the commands specified in the input file and executes them sequentially. Finally, it returns the exit status code of the last command. Hence, the input file is also another shell script. If the argument file name does not have a slash /, then the directories listed in the PATH variable are searched for the file and will execute the first file found. |
suspend |
The suspend command suspends the execution of the current shell session (from where the command is executed) until it receives the SIGCONT signal. |
test <expression> |
The test command tests the conditional expression passed as the argument and returns 0 for success and 1 for failure. The return code can be captured into a variable or directly checked whether the condition succeeded or not. The conditional expressions are discussed in detail in the following section while discussing the if/else/fi construct. |
trap <command> <args> |
A simple definition of signals is that they are messages sent by one process to another when the sending process needs something to be done by the receiving process. The trap command can trap the specified signals and execute another command or script. |
type |
The type command is used to get details about the commands, such as what type of command it is. For example, the type -t <command> command line determines and displays whether the specified command is a file, a built-in, an alias, a keyword, or a function. The type -a <command> command line displays all the information about the command, such as the pathname of the command if it is a file, its definition if it is an alias, and so on. The type -p <command> command line returns the full path of the specified command if it is a file such as an external command or a script. Thus, the type command is useful in identifying the type of an executable entity. |
umask <mask> |
The umask command takes an argument that is the complement of the file permissions to be set (in terms of the octal value). For example, if the default permissions for everything created by a user should be 755, then the umask 022 command in the user’s login script would accomplish this, as 0 is the complement of 7, and 2 is the complement of 5. Note that the mask only removes permission bits from what a program requests: directories are normally created with 777 and therefore end up as 755, whereas regular files are normally created with 666 and end up as 644. Thus, executing the umask command in the beginning of a shell session could eliminate the need to manually change the file permissions for many files. However, if specific files need to be given different permissions, we would still have to execute the chmod command. |
unalias |
The unalias command is used to remove the alias name created for a command line. |
unset <arg> |
The unset command undefines a variable or function. After executing unset on a variable, the variable is not available for use. |
wait <procid> |
The wait command will make the script wait for the specified process identified by the argument passed to the command. The argument may specify more than one process to wait. The return code of the wait command is the return code of the last process waited for. If none of the specified processes exist, the wait command returns 127 as return code. |
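A few of the built-in commands from Table 4.2 can be seen working together in the following sketch, which is only one possible illustration. The function names and the sample input are hypothetical, and the input line is supplied to read through the bash here-string notation <<<, which is used here only to keep the example self-contained.

```bash
#!/usr/bin/env bash
# A function named like a built-in takes precedence over it. Inside the
# function, "builtin" reaches the real echo and avoids infinite recursion.
echo () {
    builtin echo "[log] $*"
}

tagged=$(echo "starting")        # runs the function, which calls the built-in
kind_before=$(type -t echo)      # the name currently refers to a function

unset -f echo                    # remove the function definition
kind_after=$(type -t echo)       # the name refers to the built-in again

# return passes a status code back to the caller of a function.
is_positive () {
    if (( $1 > 0 )); then
        return 0
    fi
    return 1
}
is_positive 5;  rc_pos=$?
is_positive -3; rc_neg=$?

# read splits one input line into words; the last variable takes the rest.
read first second rest <<< "alpha beta gamma delta"

echo "$tagged|$kind_before|$kind_after|$rc_pos|$rc_neg|$first|$second|$rest"
```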
The bash language provides a rich set of flow control constructs, which form the backbone of the scripts. These are discussed in this subsection.
The ‘if/else’ construct can be used in several forms, as shown here. It should be noted that the ‘if’ construct is surrounded by the ‘if’ and ‘fi’ keywords.
if <condition>; then
statements
fi
if <condition>; then
statements1
else
statements2
fi
if <condition>; then
statements1
elif <condition>; then
statements2
. . .
else
statementsN
fi
The first form of the ‘if’ construct is very simple and evaluates the condition. If the condition is true, then the statements are executed; otherwise the control transfers to the command following the ‘fi’ keyword. The second form of the construct ensures that either of the statement groups is executed depending on whether the condition is evaluated to true or false. In the third form, the ‘elif’ stands for ‘else if’ and checks another condition if the first one fails. In this form, there could be as many ‘elif’s as there are conditions to be evaluated, and the ‘else’ clause appears only at the end. In all the cases, there should be only one ‘else’ clause, even though there could be more than one ‘elif’ clause.
The conditions tested in the ‘if’ and ‘elif’ constructs are one or more statements that are treated as logical conditions. If the condition contains more than one statement, then they are linked through logical operators. For example, every Linux command that returns a zero or non-zero value can be tested in these constructs. The zero return code is considered as success (and hence true), and the non-zero return code is considered as failure (and hence false). The logical operators && and || are used to test the ‘logical and’ and ‘logical or’ conditions for two or more statements. The logical operator ! preceding a condition is used to test for negation of the condition. The operators <, >, =, and != may be used on string operands for testing whether the first string is less than, greater than, equal to, and not equal to the second string, respectively. The -n operator preceding a string checks for the string to be not null or non-zero length, and the -z operator preceding a string checks for the string to be null or zero length. The operators -lt, -gt, -eq, -ne, -ge, and -le are used on integer operands and stand for less than, greater than, equal to, not equal to, greater than or equal to, and less than or equal to, respectively.
The first type of conditions test whether a specific command was run successfully or not. Such conditions are checked without enclosing brackets. The second type of conditions that can be tested with the ‘if’ and ‘elif’ constructs are those that are enclosed in square brackets, as in [ . . . ]. When conditions are specified within the square brackets, there should be a space after the opening bracket [ and the beginning of condition statement and before the ending of condition statement and the closing bracket ].
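The string and integer operators can be combined as in the following sketch. The classify function is a hypothetical helper written only for this illustration. Note that inside single square brackets the < and > string operators have to be escaped with a backslash; otherwise the shell would treat them as redirection symbols.

```bash
#!/usr/bin/env bash
# classify is a hypothetical helper used only for this illustration.
classify () {
    local value=$1
    if [ -z "$value" ]; then
        echo "empty"
    elif [ "$value" -lt 0 ]; then
        echo "negative"
    elif [ "$value" -ge 0 ] && [ "$value" -le 9 ]; then
        echo "single digit"
    else
        echo "large"
    fi
}

r1=$(classify "")
r2=$(classify -4)
r3=$(classify 7)
r4=$(classify 42)

# String comparison: \< is escaped so that it is not taken as a redirection.
order="undecided"
if [ "apple" \< "banana" ] && [ "apple" != "banana" ]; then
    order="apple sorts first"
fi

echo "$r1|$r2|$r3|$r4|$order"
```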
There are special test conditions designed for testing the file attributes. These are of the form [ <option> <filename> ], where the option checks for the specific attribute or condition. The different command options used to check the existence and the attributes of a file are presented in Table 4.3.
Command Option |
Description |
---|---|
-a and -e |
This option is used to check for the existence of the file and return true if the file exists. |
-b |
This option returns true if the file exists and is a block device file. |
-c |
This option returns true if the file exists and is a character device file. |
-d |
This option returns true if the file exists and is a directory. |
-f |
This option returns true if the file exists and is any regular file such as a data file or program source file or an executable file and so on. |
-h |
This option returns true if the file exists and is a symbolic link to another file or directory. |
-p |
This option returns true if the file exists and is a named pipe. |
-r |
This option returns true if the file exists and is read permitted, which means that the user executing the script has permission to read the file. |
-s |
This option returns true if the file exists and is of non-zero size. |
-w |
This option returns true if the file exists and is writable, which means that the user executing the script has write permission on the file. |
-x |
This option returns true if the file exists and is execute permitted, which means that the user executing the script has execute permission on the file. |
-O |
This option returns true if the file exists and the user that is executing the script owns it. |
-G |
This option returns true if the file exists and the user that is executing the script belongs to the group that owns the file. |
-S |
This option returns true if the file exists and is a socket file. |
There are logical operators that compare two files, as in <file1> <operator> <file2>, as shown in Table 4.4.
Operator |
Description |
---|---|
file1 -ot file2 |
The -ot operator returns true if file1 is older than file2, or when file2 exists and file1 does not exist. |
file1 -nt file2 |
The -nt operator returns true if file1 is newer than file2, or when file1 exists and file2 does not exist. |
file1 -ef file2 |
The -ef operator returns true if file1 and file2 both refer to the same device and inode numbers. |
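Several of the options from Table 4.3 and the operators from Table 4.4 can be tried against scratch files, as in the following sketch. The file names are invented, and the scratch directory is created with the external mktemp command and removed at the end; the use of mktemp and ln is an assumption about the tools available on the system.

```bash
#!/usr/bin/env bash
workdir=$(mktemp -d)               # scratch directory for the test files
datafile="$workdir/data.txt"
emptyfile="$workdir/empty.txt"
linkfile="$workdir/link"

echo "some content" > "$datafile"  # regular file with non-zero size
: > "$emptyfile"                   # regular file of zero size
ln -s "$datafile" "$linkfile"      # symbolic link to the data file

results=""
[ -d "$workdir" ]   && results="$results dir"
[ -f "$datafile" ]  && results="$results regular"
[ -s "$datafile" ]  && results="$results nonempty"
[ -s "$emptyfile" ] || results="$results empty"
[ -h "$linkfile" ]  && results="$results symlink"
[ "$linkfile" -ef "$datafile" ] && results="$results same-inode"
[ -e "$workdir/missing" ] || results="$results missing"

# -nt is also true when the first file exists and the second does not.
[ "$datafile" -nt "$workdir/missing" ] && results="$results newer"

rm -rf "$workdir"                  # clean up the scratch directory
echo "$results"
```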
The test built-in command can be used along with the ‘if’ and ‘elif’ constructs, in place of the square brackets.
Example condition statements are shown here to demonstrate some of these concepts.
# CpFile.sh script
OLDFILE=oldscript.sh
NEWFILE=newscript.sh
if cp "$OLDFILE" "$NEWFILE" 2>/dev/null; then
    echo $OLDFILE is copied to $NEWFILE
else
    echo Failed to copy $OLDFILE to $NEWFILE
fi
# CheckString.sh script
YOURNAME="Satya Sai"
NAME1="Satya Sai"
NAME2="John Smith"
if [ "$YOURNAME" = "$NAME1" ]; then
    echo $YOURNAME is $NAME1
else
    echo $YOURNAME is not $NAME1
fi
# CheckFile.sh script
FILE=/home/satya/linkFile
if [ -h "$FILE" ]; then
    echo $FILE is a symbolic link
else
    echo $FILE is not a symbolic link
fi
# testCommand.sh script
VARNAME=
if test -z "$VARNAME"; then
    echo VARNAME is a null string
else
    echo VARNAME is not a null string
fi
The examples are provided only to demonstrate how to code the ‘if’ construct or to help the readers understand existing shell scripts. However, the readers are encouraged to practice writing their own examples to test many of the conditions not shown in the book.
There are two forms of the ‘for’ loop: in the first form, it is used to execute a group of statements a fixed number of times, and in this form it is very similar to the ‘for’ loop in the C/C++ language. The general syntax and an example script are provided here.
for (( (expr1); (expr2); (expr3) )) do
statements
done
# ForLoop.sh
for (( (i=0); (i< 4); i++ ))
do
    echo Loop counter value is $i
done
The output of this script would be as shown below.
Loop counter value is 0
Loop counter value is 1
Loop counter value is 2
Loop counter value is 3
The ‘expr1’ is an expression that is evaluated while entering the loop and may be considered as the loop initiation expression. Typically, a loop counter variable is initiated in this expression. The ‘expr2’ expression is a conditional expression and is evaluated at the beginning of every iteration; the loop counter variable is tested in this expression. An iteration is defined as executing the set of statements inside the loop once. If the expression evaluates to true (a non-zero arithmetic value), then the next iteration is executed. The loop exits when this expression evaluates to false (zero) for the first time. Therefore, when beginning the loop, it should evaluate to true; otherwise the loop does not execute even one iteration. The ‘expr3’ is an expression that is executed every time after executing the iteration. Therefore, this expression is used to change the value of the variable that is initiated in ‘expr1’ and tested in ‘expr2’, so that the loop exit criteria are built within the loop description itself. It is important to note that the surrounding pairs of parentheses (( . . . )) are required for the shell to interpret the expressions as arithmetic. It is very easy to set up the ‘for’ loop with proper exit criteria and therefore to avoid the infinite looping situation; however, the conditional expression should be set up properly.
The second kind of ‘for’ loop also executes the iteration for a fixed number of times, but the number of iterations is not controlled by a counter as has been done in the previous case. Rather, the loop is executed over a list of items, such as the list of command-line arguments, the list of directories in the $PATH variable, the list of files found within a directory, a list of values built programmatically and so on. In all these cases, the values within the list are assumed to be separated by the first character of the IFS special variable, which was discussed earlier. The general syntax and an example are provided here.
for <variable> in $<listvar>; do
statements
done
# ForLoop2.sh
FRUITS="apple orange pear banana"
for fruit in $FRUITS; do
    echo $fruit
done
The output of this script would be as shown below.
apple
orange
pear
banana
If the field separator in the list variable is not the default IFS character (usually whitespace), then the IFS can be set explicitly before the for loop as in IFS=: to separate the directories within the PATH global variable, or as in IFS=/ to separate the individual subdirectory names within a directory name, so that the shell interprets the items in the list appropriately. Similarly, the command-line arguments can be accessed through the $@ list variable, as shown in the following example.
# ForLoop3.sh
for arg in "$@"; do
    echo $arg
done
The output of this script would be similar to the previous outputs, depending on the arguments provided at the command-line. However, it should be remembered that the $0 variable (the name of the script) is not an item in the $@ list.
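The technique of setting IFS before a ‘for’ loop, mentioned earlier for colon-separated lists, can be sketched as follows. SEARCH_PATH is a made-up, PATH-like value used so that the result is predictable; the real PATH variable could be used in the same way. Saving and restoring the previous IFS value is a precaution added here, not something the text above requires.

```bash
#!/usr/bin/env bash
# SEARCH_PATH is a made-up, PATH-like value so that the result is predictable.
SEARCH_PATH="/usr/local/bin:/usr/bin:/bin"

old_ifs=$IFS            # remember the current field separator
IFS=:                   # split list items on colons instead of whitespace

dir_count=0
last_dir=""
for dir in $SEARCH_PATH; do      # unquoted, so the shell splits it using IFS
    dir_count=$(( dir_count + 1 ))
    last_dir=$dir
    echo "$dir"
done

IFS=$old_ifs            # restore the previous separator
echo "Number of directories: $dir_count"
```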
The next type of looping construct is the ‘while’ loop, which is very similar to the one seen in C/C++ programming languages. The general syntax of the construct is provided here.
while <condition>; do
statements
done
In the ‘while’ loop, the set of statements within the loop are executed as long as the ‘condition’ is true. The condition is checked at the beginning of every iteration. Whenever the condition is evaluated to false, the ‘while’ loop immediately stops further iterations, and the control flows to the statement that follows the end of the loop. However, if it is desired to exit the loop in the middle of an iteration, then the break command should be executed to break the loop by force. Another looping construct of interest is the ‘until’ loop, which is very similar to the ‘while’ loop but only different in one aspect. The ‘until’ loop executes the iterations as long as the condition is evaluated to false and exits the loop whenever the condition becomes true. The meaning of this construct is to execute the loop until the condition is true. In all other aspects, it behaves similar to the ‘while’ loop. The general syntax is provided below.
until <condition>; do
statements
done
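The two constructs can be compared side by side with a small sketch; the variable names are hypothetical. The ‘while’ loop also shows the break command leaving the loop before its condition becomes false.

```bash
#!/usr/bin/env bash
# while: iterates as long as the condition is true
i=0
while_out=""
while [ "$i" -lt 5 ]; do
    if [ "$i" -eq 3 ]; then
        break                    # leave the loop in the middle of an iteration
    fi
    while_out="$while_out$i"
    i=$(( i + 1 ))
done

# until: iterates as long as the condition is false
j=0
until_out=""
until [ "$j" -ge 3 ]; do
    until_out="$until_out$j"
    j=$(( j + 1 ))
done

echo "while collected: $while_out (stopped at i=$i)"
echo "until collected: $until_out (stopped at j=$j)"
```

Both loops collect the same digits: the first because of the break at 3, the second because its condition becomes true at 3.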
The ‘case’ construct may be used when it is necessary to test for multiple values of an expression and then execute different groups of statements each time when the expression is evaluated to a different value. In other words, the case statement effectively replaces a complicated if …elif … elif … type of construct. The general syntax and an example script using the ‘case’ construct are given here.
case <expr> in
( pattern1 )
statements ;;
( pattern2 )
statements ;;
. . .
. . .
esac
# CaseConstruct.sh
# Echo each argument, then print its name if it is 1, 2, or 3.
for arg in "$@"; do
    echo "$arg"
    case $arg in
        ( 1 )
            echo One ;;
        ( 2 )
            echo Two ;;
        ( 3 )
            echo Three ;;
    esac
done
When the script is executed with the command-line ./CaseConstruct.sh 1 2 3, the output of this script would be as shown below.
1
One
2
Two
3
Three
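The patterns in a ‘case’ construct are not limited to single literal values. As a hedged sketch that goes slightly beyond the example above, the following function combines alternative patterns with the | character, uses a bracket expression, and ends with the catch-all pattern *, which plays the role of the final else in an if … elif chain. The function name describe and the messages are invented for this illustration.

```shell
#!/bin/bash
# CasePatterns.sh (hypothetical name): alternatives and a default branch.

describe() {
    case "$1" in
        1 | one)
            echo "One" ;;
        2 | two)
            echo "Two" ;;
        [3-9])
            # Bracket expression: any single digit from 3 to 9.
            echo "A digit from 3 to 9" ;;
        *)
            # Default branch: reached when no earlier pattern matches.
            echo "Unrecognized: $1" ;;
    esac
}

for arg in 1 two 7 apple; do
    describe "$arg"
done
```

Patterns are tried from top to bottom and only the first match is executed, so the catch-all pattern belongs at the end.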
The last construct that we are going to discuss is the ‘select’ construct, which is used to construct a menu of options and to take appropriate action based on the input provided by the user. The general syntax of the construct is shown below.
select <variable> in <listvar>; do
statements
done
In this construct, the ‘listvar’ is the list of items displayed in the menu and is equivalent to the other lists that we have discussed before. The items in the list can be simple words if the default IFS character is used as a field separator. By assigning a different value to IFS before the ‘select’ construct, menu options can contain multiple words separated by whitespace. When executing a ‘select’ construct, the shell numbers the menu options sequentially from 1 to N, where 1 and N correspond to the first and last items in the list. The shell then displays the numbered menu and reads a line from the standard input when the user chooses an option by entering the appropriate number and pressing the ENTER key. The list item that corresponds to the number entered is stored in the ‘variable’ named in the select clause, while the text actually typed by the user is saved in the REPLY variable. If the input does not correspond to any menu number, the ‘variable’ is set to an empty value. The value of the variable is accessible within the construct’s statements using the usual $<variable> notation. Within the ‘select’ construct, the break statement should be used to leave the construct; otherwise, the ‘select’ construct behaves like an infinite loop. The PS3 variable holds the prompt that is displayed to tell the user that a choice has to be made. The example provided here helps you to understand the concept of using this construct.
# SelectConstruct.sh
PS3='Please enter your choice '
OPTIONS="Apple Orange Pear Banana"
select choice in $OPTIONS; do
    # $choice is empty when the input is not one of the menu numbers.
    if [ "$choice" ]; then
        echo "Your choice of fruit is $choice"
    else
        echo "Your choice is not valid – exiting . . ."
        break
    fi
done
When this script is executed with the command line ‘./SelectConstruct.sh’, the menu is displayed as shown below.
1) Apple
2) Orange
3) Pear
4) Banana
Please enter your choice
If you enter anything other than the specified choices, then the message “Your choice is not valid – exiting . . .” is displayed and the script exits.
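The earlier remark about changing IFS to build menu options that contain several words can be sketched as follows. The function name pick_fruit and the option strings are assumptions made for this illustration. The input is supplied from a here-string so that the sketch runs without a keyboard; removing the redirection makes it interactive.

```shell
#!/bin/bash
# SelectIfs.sh (hypothetical name): menu items that contain spaces.

pick_fruit() {
    local PS3='Please enter your choice: '
    local options="Granny Smith apple:Blood orange:Pear"
    local IFS=:      # split the list on colons instead of whitespace
    local choice
    select choice in $options; do
        if [ -n "$choice" ]; then
            # REPLY holds what was typed; choice holds the matching item.
            echo "You entered $REPLY and selected: $choice"
        else
            echo "Invalid entry: $REPLY"
        fi
        break        # leave the menu after a single answer
    done
}

# Simulate a user typing 2 and pressing ENTER.
pick_fruit <<< "2"
```

The shell writes the menu and the PS3 prompt to the standard error stream, so only the echo lines appear when the output of the function is captured or piped.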
The discussion to this point refers to the features provided by the bash shell and is neither complete nor exhaustive. As mentioned earlier, the purpose of this discussion is to enable you to understand the power of the shell and start writing simple scripts. You are therefore encouraged to further explore the shell functionality and the rich set of commands in order to get more details.