7.2. Intercepting System Calls

Processes run in two modes: user and kernel. Most of the time a process runs in user mode, where it has access to limited resources. When a process needs a service offered by the kernel, it invokes a system call. System calls serve as gates into the kernel. They are software interrupts that the operating system processes in kernel mode. The following sections show how LKMs can perform various tricks by intercepting system calls.

7.2.1. The System Call Table

The Linux kernel maintains a system call table, which is simply a set of pointers to the functions that implement the system calls. To see the list of system calls implemented by your kernel, see /usr/include/bits/syscall.h. The kernel stores the system call table in a structure called sys_call_table, which you can find in the arch/i386/kernel/entry.S file.
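
Each entry in the system call table is selected by a system call number, and user-space code can see those numbers directly. The following is a minimal sketch, assuming a Linux system with glibc, that invokes getpid by number through the C library's syscall() function instead of the usual wrapper. The name demo_raw_getpid is hypothetical and exists only for this illustration.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>  /* SYS_getpid and the other system call numbers */
#include <unistd.h>       /* syscall(), getpid() */

/*
 * Ask the kernel for the process ID by system call number rather than
 * through the getpid() wrapper. demo_raw_getpid is a hypothetical name.
 */
long demo_raw_getpid(void)
{
    return syscall(SYS_getpid);
}
```

Calling demo_raw_getpid() should return the same value as getpid(), because both end up at the same entry in the kernel's table.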

7.2.2. strace Is Your Friend

Often it is necessary to hook into programs to understand what system calls they invoke. The strace tool can do this. For example, consider the following C program, which simply prints the contents of the /etc/passwd file:

    #include <stdio.h>
    #include <stdlib.h>   /* exit() */

    int main(void)
    {
        FILE *myfile;
        char tempstring[1024];

        if (!(myfile = fopen("/etc/passwd", "r"))) {
            fprintf(stderr, "Could not open file\n");
            exit(1);
        }

        /* Loop on the result of fscanf() instead of feof(), which
           would print the last token twice. The field width keeps
           the read inside the buffer. */
        while (fscanf(myfile, "%1023s", tempstring) == 1)
            fprintf(stdout, "%s", tempstring);

        fclose(myfile);
        exit(0);
    }

Assuming you have compiled the preceding code with the gcc compiler to produce an executable called a.out, run the following strace command:

    [notroot]$ strace -o strace.out ./a.out > /dev/null

The output from strace is now stored in strace.out. Take a look at it to see all the system calls invoked by a.out. For example, issue the following grep command to see that the fopen() library call in a.out invokes the open() system call to open the /etc/passwd file:

    [notroot]$ grep "/etc/passwd" strace.out
    open("/etc/passwd", O_RDONLY) = 3

7.2.3. Forcing Access to sys_call_table

Because sys_call_table is no longer exported in the 2.6 kernels, we can access it only by brute force. LKMs have access to kernel memory, so it is possible to gain access to sys_call_table by comparing known locations with exported system calls. Although sys_call_table itself is not exported, a few system calls such as sys_read() and sys_write() are still exported and available to LKMs. To demonstrate how to get access to sys_call_table in the 2.6 kernels, we will write a simple LKM that intercepts sys_open() and prevents anyone from opening the /tmp/test file.
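
The brute-force idea can be modeled in user space before looking at the kernel code. The sketch below is an illustration under stated assumptions, not the module's implementation: it slides over an ordinary array one word at a time until the slot at a known index holds the address of a known function. Every name in it (demo_read, demo_find_table, DEMO_NR_READ, DEMO_MAX_TRY) is hypothetical.

```c
#include <stddef.h>  /* NULL */
#include <stdint.h>  /* uintptr_t */

#define DEMO_NR_READ  3   /* hypothetical table index of the known function */
#define DEMO_MAX_TRY  64  /* give up after this many word-sized steps */

/* Stand-in for an exported function whose address the searcher knows. */
long demo_read(long arg)
{
    return arg + 1;
}

/*
 * Starting at `start`, advance one word at a time and return the first
 * position whose slot DEMO_NR_READ holds the address of demo_read.
 * Returns NULL if no match is found within DEMO_MAX_TRY steps.
 * The caller must make sure start[0 .. DEMO_MAX_TRY + DEMO_NR_READ - 1]
 * is readable memory.
 */
uintptr_t *demo_find_table(uintptr_t *start)
{
    for (int i = 0; i < DEMO_MAX_TRY; i++, start++) {
        if (start[DEMO_NR_READ] == (uintptr_t)demo_read)
            return start;
    }
    return NULL;
}
```

The search is bounded on purpose. A scan like this reads memory it knows nothing about, so the step limit is the only thing that stops it from running off the end of the region.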

We'll walk through the critical bits here, but you'll find the full source code for intercept_open.c in the next section. Notice that the my_init() function is called during initialization. This function attempts to gain access to sys_call_table by starting at the address of system_utsname. The system_utsname structure contains a list of system information and is known to exist before the system call table. Therefore, the function starts at the location of system_utsname and iterates up to 1,024 (MAX_TRY) times. Because sys_table is a pointer to unsigned long, each iteration advances one word (4 bytes on i386), and the function compares the entry at index __NR_read of the candidate table with the address of sys_read(), which is assumed to be available to the LKM. Once a match is found, the loop breaks and we have access to sys_call_table:

    while(i) {
        if(sys_table[__NR_read] == (unsigned long)sys_read) {
            sys_call_table=sys_table;
            flag=1;
            break;
        }
        i--;
        sys_table++;
    }

The LKM invokes xchg() to alter the system call table so that sys_call_table[__NR_open] points to our_fake_open_function():

    original_sys_open =(void * )xchg(&sys_call_table[__NR_open], our_fake_open_function);

This causes our_fake_open_function() to be invoked instead of the original sys_open() call. The xchg() function also returns the previous value of the table entry, which we store in original_sys_open as a pointer to the original sys_open(). We use this pointer to reset the system call table to point to the original sys_open() when the LKM is unloaded:

    xchg(&sys_call_table[__NR_open], original_sys_open);

The our_fake_open_function() function checks whether the filename parameter refers to the file we are trying to protect, which in our case is assumed to be /tmp/test. However, it is not sufficient to compare the string /tmp/test with the value of filename, because if a process's current directory is /tmp, for example, it might invoke sys_open() with test as the parameter. The surest way to check whether filename refers to /tmp/test is to compare the inode of /tmp/test with the inode of the file corresponding to filename.
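
The swap-and-restore pattern built on xchg() can be tried out in user space. The following is a minimal sketch that uses C11 atomic_exchange() as a stand-in for the kernel's xchg(), with a single function pointer playing the part of one table entry. All names are hypothetical, and the check on the value 42 merely stands in for the module's file check.

```c
#include <errno.h>      /* EACCES */
#include <stdatomic.h>  /* atomic_exchange(), atomic_load() */

/* Every name here is hypothetical; this is a user-space model. */
typedef long (*demo_handler)(long);

/* Stand-in for the original handler: it simply echoes its argument. */
static long demo_real_open(long arg)
{
    return arg;
}

/* One "table entry", initially pointing at the original handler. */
static _Atomic(demo_handler) demo_open_slot = demo_real_open;

/* Previous entry, saved so calls can be forwarded and the slot restored. */
static demo_handler demo_saved_open;

/* Replacement handler: refuse one argument, forward everything else. */
static long demo_fake_open(long arg)
{
    if (arg == 42)
        return -EACCES;
    return demo_saved_open(arg);
}

/* Install the replacement; the exchange hands back the old entry. */
void demo_hook(void)
{
    demo_saved_open = atomic_exchange(&demo_open_slot, demo_fake_open);
}

/* Put the saved entry back, as the module does when it is unloaded. */
void demo_unhook(void)
{
    atomic_exchange(&demo_open_slot, demo_saved_open);
}

/* Call through the slot, the way the kernel calls through its table. */
long demo_dispatch(long arg)
{
    demo_handler fn = atomic_load(&demo_open_slot);
    return fn(arg);
}
```

Before demo_hook() runs, demo_dispatch(42) reaches the original handler. Afterward it returns -EACCES while other arguments are still forwarded, and demo_unhook() restores the original behavior.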

Inodes are data structures that contain information about files in the system. Because every file has its own inode, we can be certain of our results. To obtain the inode, our_fake_open_function() invokes user_path_walk() and passes it filename and a structure of type nameidata, as required by the function. However, before user_path_walk() is called with /tmp/test as a parameter, the LKM calls the following functions:

    fs=get_fs();
    set_fs(get_ds());

The user_path_walk() function expects filename to point into user-space memory. The string "/tmp/test", however, is part of our kernel module and therefore lives in kernel space, so user_path_walk() would reject it. To avoid this, before we invoke user_path_walk() we call get_fs() to save the current address limit that the kernel uses to validate user-space pointers, and then call set_fs() with get_ds() as a parameter. This raises the limit to cover kernel memory so that user_path_walk() can succeed.
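
The inode comparison at the heart of the module has a simple user-space analogue. The sketch below, which assumes a POSIX system and uses the hypothetical name demo_same_file, compares two paths with stat(). It also compares the device number, because the inode numbers that stat() reports are unique only within a single filesystem.

```c
#include <sys/stat.h>  /* stat(), struct stat */

/*
 * Return 1 if both paths refer to the same file, 0 if they do not,
 * and -1 if either path cannot be examined.
 * demo_same_file is a hypothetical helper for this illustration.
 */
int demo_same_file(const char *a, const char *b)
{
    struct stat sa, sb;

    if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
        return -1;

    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```

With this helper, a process whose current directory is /tmp would get 1 for demo_same_file("test", "/tmp/test") when the file exists, even though the two strings differ. That is the case a plain string comparison misses.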

Once the module is done calling user_path_walk(), it restores the limit:

    set_fs(fs);

The module then records whether the two inodes are equal. If they are, we know the user is attempting to open /tmp/test, and the module returns -EACCES:

    same_file = (inode == inode_t);

    if (same_file)
        return -EACCES;

Otherwise, the module invokes the original sys_open():

    return original_sys_open(filename,flags,mode);

7.2.3.1 intercept_open.c

Following is the full source code of our intercept_open LKM:

    #include <linux/module.h>
    #include <linux/kernel.h>
    #include <linux/init.h>
    #include <linux/syscalls.h>
    #include <linux/unistd.h>
    #include <linux/utsname.h>   /* system_utsname */
    #include <linux/proc_fs.h>
    #include <asm/uaccess.h>
    #include <linux/namei.h>

    int flag=0;

    /* No trailing semicolon: the macro is used inside a declaration. */
    #define MAX_TRY 1024

    MODULE_LICENSE ("GPL");

    unsigned long *sys_call_table;

    asmlinkage long (*original_sys_open) (const char __user *filename, int flags, int mode);

    asmlinkage long our_fake_open_function(const char __user *filename, int flags, int mode)
    {
        int error;
        int same_file=0;
        struct nameidata nd,nd_t;
        struct inode *inode,*inode_t;
        mm_segment_t fs;

        error=user_path_walk(filename,&nd);

        if(!error)
        {
            inode=nd.dentry->d_inode;

            /*Have to do this before calling user_path_walk()
              from kernel space:*/
            fs=get_fs();
            set_fs(get_ds());

            /*Protect /tmp/test. Change this to whatever file
              you want to protect*/
            error=user_path_walk("/tmp/test",&nd_t);

            set_fs(fs);

            if(!error)
            {
                inode_t=nd_t.dentry->d_inode;
                same_file=(inode==inode_t);

                /*Drop the references taken by user_path_walk()*/
                path_release(&nd_t);
            }

            path_release(&nd);
        }

        if(same_file)
            return -EACCES;

        return original_sys_open(filename,flags,mode);
    }

    static int __init my_init (void)
    {
        int i=MAX_TRY;
        unsigned long *sys_table;

        sys_table = (unsigned long *)&system_utsname;

        while(i)
        {
            if(sys_table[__NR_read] == (unsigned long)sys_read)
            {
                sys_call_table=sys_table;
                flag=1;
                break;
            }
            i--;
            sys_table++;
        }

        if(flag)
        {
            original_sys_open =(void * )xchg(&sys_call_table[__NR_open], our_fake_open_function);
        }

        return 0;
    }

    static void my_exit (void)
    {
        /*Only restore the entry if the table was found and hooked*/
        if(flag)
            xchg(&sys_call_table[__NR_open], original_sys_open);
    }

    module_init(my_init);
    module_exit(my_exit);

7.2.3.2 Compiling and testing intercept_open

To compile intercept_open.c, use the following makefile:

    obj-m += intercept_open.o

Compile using the following make command:

    [notroot]$ make -C /usr/src/linux-`uname -r` SUBDIRS=$PWD modules

Create /tmp/test:

    [notroot]$ echo hi > /tmp/test

Load intercept_open.ko:

    [root]# insmod ./intercept_open.ko

Try to open /tmp/test:

    [root]# cat /tmp/test
    cat: /tmp/test: Permission denied

Unload the module:

    [root]# rmmod intercept_open
    [root]# cat /tmp/test
    hi

7.2.4. Intercepting sys_unlink() Using System.map

In the previous section, we looked at how to obtain the address of sys_call_table by searching kernel memory. However, if the kernel's System.map file is available, you can use it to obtain the location of sys_call_table, and this location can be hardcoded into the LKM. An LKM that denies the deletion of files by intercepting sys_unlink() is a good illustration. First, find the location of sys_call_table from System.map:

    [notroot]$ grep sys_call_table /boot/System.map
    c044fd00 D sys_call_table

The module's source code hardcodes the address to obtain sys_call_table. The address is specific to one kernel build, so it must be looked up again for any other kernel:

    *(long *)&sys_call_table=0xc044fd00;

The module alters the system call table to point __NR_unlink to hacked_sys_unlink, and stores the original location of sys_unlink():

    original_sys_unlink =(void * )xchg(&sys_call_table[__NR_unlink], hacked_sys_unlink);

The hacked_sys_unlink() function returns -1 whenever it is called. It never invokes the original sys_unlink():

    asmlinkage long hacked_sys_unlink(const char *pathname)
    {
        return -1;
    }

This prevents any process from being able to delete any file on the system. Because system calls report errors as negative errno values and EPERM is 1, processes see the failure as "Operation not permitted."

7.2.4.1 intercept_unlink.c

Following is the full source code of our intercept_unlink LKM:

    #include <linux/module.h>
    #include <linux/kernel.h>
    #include <linux/init.h>
    #include <linux/syscalls.h>
    #include <linux/unistd.h>

    MODULE_LICENSE ("GPL");

    unsigned long *sys_call_table;

    asmlinkage long (*original_sys_unlink) (const char *pathname);

    /*return -1. this will prevent any process from unlinking any file*/
    asmlinkage long hacked_sys_unlink(const char *pathname)
    {
        return -1;
    }

    static int __init my_init (void)
    {
        /*obtain sys_call_table from hardcoded value
          we found in System.map*/
        *(long *)&sys_call_table=0xc044fd00;

        /*store original location of sys_unlink. Alter sys_call_table
          to point __NR_unlink to our hacked_sys_unlink*/
        original_sys_unlink =(void * )xchg(&sys_call_table[__NR_unlink], hacked_sys_unlink);

        return 0;
    }

    static void my_exit (void)
    {
        /*restore original sys_unlink in sys_call_table*/
        xchg(&sys_call_table[__NR_unlink], original_sys_unlink);
    }

    module_init(my_init);
    module_exit(my_exit);

7.2.4.2 Compiling and testing intercept_unlink

To compile the module, use the following makefile:

    obj-m += intercept_unlink.o

Compile using the following make command:

    [notroot]$ make -C /usr/src/linux-`uname -r` SUBDIRS=$PWD modules

Create a test file:

    [notroot]$ touch /tmp/testfile

Load the module:

    [root]# insmod ./intercept_unlink.ko

Attempt to delete the file:

    [root]# rm -rf /tmp/testfile
    rm: cannot remove `/tmp/testfile': Operation not permitted

Unload the module:

    [root]# rmmod intercept_unlink

Now you should be able to delete the file:

    [root]# rm -rf /tmp/testfile

7.2.5. Intercepting sys_exit() in 2.4 Kernels

The 2.4 kernels export the sys_call_table symbol. Many people still use the 2.4 kernels, so this section quickly shows you how to write an LKM for the 2.4 kernel to intercept sys_exit(). This example is very simple and straightforward, and once you understand how intercept_exit.c works, you'll be able to port the other examples in this chapter to 2.4 kernels.

The intercept_exit module intercepts sys_exit() and prints the value of error_code passed to sys_exit() onto the console. The init_module() function is called when the LKM is loaded. This function stores a reference to the original sys_exit() call, and it points sys_call_table[__NR_exit] to our_fake_exit_function:

    original_sys_exit = sys_call_table[__NR_exit];
    sys_call_table[__NR_exit]=our_fake_exit_function;

The our_fake_exit_function() call prints the value of error_code and then calls the original sys_exit():

    asmlinkage int our_fake_exit_function(int error_code)
    {
        printk("HEY! sys_exit called with error_code=%d\n",error_code);
        return original_sys_exit(error_code);
    }

The LKM restores sys_call_table[__NR_exit] to point to original_sys_exit when it is unloaded:

    sys_call_table[__NR_exit]=original_sys_exit;

7.2.5.1 intercept_exit.c

Following is the full source code of our intercept_exit LKM:

    #include <linux/module.h>
    #include <linux/kernel.h>
    #include <sys/syscall.h>

    MODULE_LICENSE("GPL");

    extern void *sys_call_table[];

    asmlinkage int (*original_sys_exit)(int);

    asmlinkage int our_fake_exit_function(int error_code)
    {
        /*print message on console every time we are called*/
        printk("HEY! sys_exit called with error_code=%d\n",error_code);

        /*call original sys_exit and return its value*/
        return original_sys_exit(error_code);
    }

    int init_module(void)
    {
        /*store reference to the original sys_exit call*/
        original_sys_exit = sys_call_table[__NR_exit];

        /*manipulate sys_call_table to call our fake exit
          function instead*/
        sys_call_table[__NR_exit]=our_fake_exit_function;

        return 0;
    }

    void cleanup_module(void)
    {
        /*restore original sys_exit*/
        sys_call_table[__NR_exit]=original_sys_exit;
    }

7.2.5.2 Compiling and testing intercept_exit

Compile intercept_exit.c:

    [notroot]$ gcc -D__KERNEL__ -DMODULE -I/usr/src/linux/include -c intercept_exit.c

Insert it into the kernel:

    [root]# insmod ./intercept_exit.o

Ask ls to list a nonexistent file. This will cause ls to exit with a nonzero value, and our LKM will print this value:

    [notroot]$ ls /tmp/nonexistent
    ls: /tmp/nonexistent: No such file or directory
    HEY! sys_exit called with error_code=1

Unload the module:

    [root]# rmmod intercept_exit