============================================================== Linux Kernel State Tracer (LKST) -------------------------------- Date: Nov. 25, 2002 Version: 1.3.1 Authors: Kazuyoshi Serizawa Tetsuhito Nakamura Keisuke Hatasaki Takahiro Tsurukubo Yasunori Goto Satoshi Sakamoto Takao Indoh ============================================================== COPYRIGHT (C) HITACHI,LTD. 2001-2002 ALL RIGHTS RESERVED. COPYRIGHT (C) FUJITSU,LTD. 2001-2002 ALL RIGHTS RESERVED. ============================================================== Introduction ------------- Linux Kernel State Tracer(LKST) records information as trace data about events in the Linux Kernel. It records various events like process context switch, send signal, exception, memory allocation, send packet, and so on. This program is a facility for analyzing trouble of the Linux Kernel. It becomes possible that developers investigate troubles by this facility without stopping important(used for Mission Critical purposes) server. Functions of present Linux could not offer enough information for developers to solve the trouble of the system. - Standard Kernel:When the kernel is panic, it displays only the registers and stack information. If developers want more information, it is necessary to insert printk() in the kernel and recompile it. - LKCD :LKCD store a dump when the system crash. But, if developers use it only, they cannot investigate the sequence of events till kernel panic. (LKCD is a.k.a. Linux Kernel Crash Dumps) Event information of the kernel is always recorded as trace data on the server under running by using LKST. So, these problems can be solved. Moreover, LKST has some features. - It is possible to change dynamically which events are recorded. Developers can obtain information about the events which they concern only. And, It reduces the overhead of components which is not related with trouble. - It is possible to change each handlers related each events. A default handler of all events is just recording the events. But, if it is necessary, default handler can be changed. This function can be used as following, +Notify user processes when the handler detects abnormal status. +Change a maskset,which controls what type of events should be recorded, dynamically by the handler. -> The system can usually run with a few events trace for the cause of good performance. And, when the handler detects abnormal status, it can change a maskset to get more detail information. Release Notes ------------- - Update for kernel 2.4.19. - Separate Kernel Hooks from LKST patch. - Add module that record only specified process. - Make sure lkstlogd release own resources. - lkstlogd report warning to syslog. - Add and modify APIs for event-handler and maskset. - Fix some bugs. Installation ------------ See file of "Install" or "Install_rpm". How To Use ---------- Please refer the online manual for details of each command. 1) Display the present kernel trace data It is necessary to execute 2 commands. a. Get a log buffer from kernel Enter the command as follows. Then LKST log buffer is saved to file. % lkstbuf read -f logfile b. Display the trace data Enter the command as follows. Trace data is written to stdout. This command can be specified various options to filter the traced record. % lkstbuf print -f logfile ------------------------------------------------------------------- event_type=context_switch cpu=02, pid=00001008 time=Tue Mar 26 18:48:53.143274134 2002 arg1=0xd68a4000 0x00000000 : pointer to task_struct(prev) arg2=0xf2dde000 0x00000000 : pointer to task_struct(next) arg3=0x00000001 0x0000000a : process state/process count event_type=mem_get_freepage cpu=02, pid=00001008 time=Tue Mar 26 18:48:53.143270345 2002 arg1=0xecc0f000 0x00000000 : pointer to page arg2=0x000001f0 0x00000000 : type of page(gfp_mask) arg3=0x00000000 0x00000000 : the number of page(order) arg4=0xc0168197 0x00000000 : call address event_type=mem_malloc cpu=02, pid=00001008 time=Tue Mar 26 18:48:53.143268419 2002 arg1=0x00000018 0x00000000 : size arg2=0x000001f0 0x00000000 : flags arg3=0xdfe3e5a0 0x00000000 : objp arg4=0xc0168578 0x00000000 : call address ... ... ------------------------------------------------------------------- 2) Change which events are recorded. The traced event types can be changed by doing as follows. a.Get a maskset file. At first, please get a maskset file which control what event types should be recorded by LKST. % lkstm read -m id -d | grep 0x > maskset_file "id" is the decimal number which is used as an identifier of the maskset. Id 0, 1, and 2 are the reserved maskset; Id 0 does not record any event types, id 1 records all of the event types, and id 2 records the default event types. b.Edit the maskset file. Next, edit the saved maskset_file. In default, event of spinlock() is not recorded, so id of event handler for event 0x080 is 0x0ff. (Event type number of spin_lock() is 0x080.) : : 0x07f 0x001 0x080 0x0ff <<< 0x081 0x0ff : : To record the event, please change the value "0x0ff" to "0x001" at right side of the line which contains its event number. : : 0x07f 0x001 0x080 0x001 <- change 0x081 0x0ff : : Oppositely, If one event is not wanted to record, please delete the line which is written it's event number. ex) event type of system call entry ( number = 0x030). : : 0x02f 0x001 0x030 0x0ff <- change 0x031 0x001 : : All the of Event type numbers are written in lkst-events-1.06.pdf that is available from download page. c.Write the new maskset. The new maskset is written when the next command is executed. % lkstm write -m id -f maskset_file "id" can be specified from 4 to 254. Please repeat the operation from a. to c., if you want to make more masksets. If you want to confirm the state of the maskset, maskset_read have to be executed alone. % lkstm read -m id d.Change maskset New maskset becomes effective if maskset_set command is executed, and the traced event types are changed. % lkstm set -m id e.Confirm which maskset is currently selected. If you want to know which maskset has been selected, use lkst_status command. % lkst status ------------------------------------------------------------------- number of masksets : 4 number of event-handlers: 4 number of buffers : 5 current_maskset_id : 2 current_buffer_id (cpu: 000): 1 current_buffer_id (cpu: 001): 2 current_buffer_id (cpu: 002): 3 current_buffer_id (cpu: 003): 4 ------------------------------------------------------------------- ID of the selected maskset is shown to the right of "current_maskset_id" in the forth line. In this example, it is 2. And the number of right of "number of masksets" is total number of the masksets. In this example, it is 4. cf. Other information means as follows. "number of event-handlers" :number of registered event handlers "number of buffers" :number of created buffers "current_buffer_id" :ID of currently selected buffers for all CPUs 3) Invoke LKST logging daemon LKST logging daemon continues to output the containts of event buffer to log files. Please refer the online manual of lkstlogd(8) for details. Name of the files are /var/log/lkst/sebufCC.S, CC represents cpu id corresponds to the file and S represents a sequence number (described later). If directory /var/log/lkst does not exist, please make it as follows. % mkdir /var/log/lkst This is an example of command to invoke the daemon, % lkstlogd -a -b 2097152 -l 10485760 -n 2 LKST logging daemon will stop and restart to write the file when the daemon receives a signal SIGUSR1. To send the signal, this is convenient, % kill -USR1 `cat /var/run/lkstlogd.pid` The size of each file are limited as specified by -l option, so old records would be overwritten. To avoid overwitten, LKST logging daemon can change files. Said sequence number will be increased on this change and to discriminate new set of files and old files. To change file, % kill -USR2 `cat /var/run/lkstlogd.pid` examples/lkst_watch_daemon is an example of this feature. This script watch unexpected termination of another daemon, then change the files. 4) Dump logs in panic() Please do "3) Install addon tools" in "Install". And just do, % insmod lkst_mod_panicdump.o maskset_id_base=2 A parameter "maskset_id_base" specifies which maskset to be used as a template of the new maskset. For testing, this module can call panic(). See 6)-c below. 5) How to display stack trace on each event a.Insmod the stack trace module. Please do "3) Install addon tools" in "Install". And do, % insmod lkst_mod_stacktrace maskset_id_base=2 A parameter "maskset_id_base" specifies a template. b.Execute special formatter Execute lkst_mod_stacktrace_read.pl. % lkstbuf read > logfile % perl lkst_mod_stacktrace_read.pl \ -s /boot/System.map-`uname -r` -k /proc/ksyms -d logfile All of the options can be omitted. The default value for -s and -k are the same as those values of above command line. If option -d is omitted, this formatter reads from current event buffers. ------------------------------------------------------------------- 0x00057f24: 0000269005494693 [mc], \ type: INT_HARDWARE_ENTRY(0x0010), pid: 0000000000, cpu: 000 \ 0x00000000 0x00000000 0x00000000 0x00000001 \ 0x00000000 0x00000000 0x00000000 0x00000000 \ 0x00057f26: 0000269005511255 [mc], \ type: INT_TASKLETHI_ENTRY(0x0012), pid: 0000000000, cpu: 000 \ 0x00000000 0xc01257d0 0x00000000 0x00000000 \ 0x00000000 0x00000000 0x00000000 0x00000000 \ 0x00057f28: 0000269005513862 [mc], \ type: INT_BH_ENTRY(0x0016), pid: 0000000000, cpu: 000 \ 0x00000000 0x00000000 0x00000000 0xc012a390 \ 0x00000000 0x00000000 0x00000000 0x00000000 \ ------------------------------------------------------------------- NOTE: Contents of stack trace might include some unrelated address. 6) Change event handlers As described on the "Introduction", users can register additional event handler in order to add function more than recording the events. Please refer addons/lkst_mod_panicdump.c as an example. a.Register new event-handler. b.Change maskset. When lkst_mod_panicdump.o is loaded, the initialiser in the module will do them automatically. The initialiser will register lkst_evhandler_panicdump() as a new eventhandler that ID is 128, and create a new maskset named "maskset_panicdump". c.Control event handler. The control function of the event handler can invoke panic() for testing. % echo panic | lksteh ctrl -e 128 -f - The control function compares the argument and "panic", if correspond, the call panic("LKST PANIC TEST"); 7) Add user buffer If a new user buffer is added, data can be recorded in the added buffer. It becomes possible that a lot of data can be recorded without overflowing the standard buffer. Please refer exapmles/lkst_test_run. This is an example of switching the buffer where data is recorded when the specified command is executed. % cd examples % sh lkst_test_run 65536 ps aux The first argument is the size of new buffer, and latter arguments are command to execute. 8) How to use LKST as an event counter a.Insmod the counter module. Make addons/lkst_mod_event_count.o, and insmod it. b.Execute command. Execute addons/lkst_mod_event_count_read, and then the total of each event counter is displayed. NOTE: After insmod, lkst_mod_event_count_read can be executed only once. 9) How to insert trace point a.Define new event type in lkst_events.h as follows. LKST_ETYPE_DEF(event-ID, priority, hook-type, event-name, description) event-ID --- Value of the event type(0x000..0xfff) 0x000-0x0ff: preset 0x100-0x1ff: user 0x200-0xeff: reserved 0xf00-0xfff: system priority --- Priority of the event type(0x00..0xff) hook-type --- Type of hook header for Kernel Hooks Specify either the following according to the insertion location of the HOOK macro. - NORMAL: If you insert HOOK macro in the kernel, use this type. - MODULE: If you insert HOOK macro in the module, use this type. - INLINE: If you insert HOOK macro in the in-line function of the kernel, use this type. If you insert the same HOOK macro in the two or more places, use this. NOTE: If you insert HOOK macro in the in-line function of the module, use MODULE type. ~~~~~~ event-name --- Mnemonic of the event type description --- Description of the event type (example) LKST_ETYPE_DEF(0x100, 0x0A, NORMAL, NEW_EVENT, NEW_EVENT) b.Insert following HOOK macro where you want to add the trace point. Add following sentence to the file to be added the trace point. #include - LKST_HOOK_INLINE(event-name, argument1, argument2, argument3, argument4) The case of inserting HOOK macro in the in-line function or the macro function. If you insert the same HOOK macros in the two or more places, use this. - LKST_HOOK(event-name, argument1, argument2, argument3, argument4) The case except the above-mentioned. * event-name : It should be the same as what defined by the LKST_ETYPE_DEF macro. * argument1..4 : argument for HOOK. c.When the trace point is in the module, the macro for declaration of HOOK header is put in it. And the macro for the initialization and termination of HOOK is inserted respectively in the function of the module-initialization and module-termination. Declaration macro for HOOK header LKST_HOOK_DECLARE(event-name) -- Declaration for LKST_HOOK LKST_HOOK_DECLARE_INLINE(event-name) -- Declaration for LKST_HOOK_INLINE Initialization macro LKST_HOOK_INIT_MODULE(event-name, ret-variable) Termination macro LKST_HOOK_TERMINATE_MODULE(event-name, ret-variable) * event-name : It should be the same as what defined by the LKST_ETYPE_DEF macro. * ret-valiable : variable which receives return value. See sample program "examples/lkst_mod_trc.c". 10) How to extract LKST trace data from LKCD dump file a.Create dump file. See LKCD documents which can be downloaded at http://lkcd.sourceforge.net/. (i) Save system memory to dump device. Enter the command as follows. % lkcd config % echo 1 > /proc/sys/kernel/sysrq Hit Sysrq+C, then start saving system memory to dump device, and rebooting automatically. NOTE: At default, system memory is saved at swap disk. So if swapon runs in booting your system, system memory saved at swap disk is collapsed. Please save system memory at another disk except swap disk, or comment out the line of swapon in the rc.sysinit. (ii) Crate dump file. Enter the command as follows. Dump file is created at /var/log/dump/N (N is an integer of 0 or more.). % lkcd save b.Run lcrash command (i) Enter the command as follows. lcrash command starts and prompt is shown. % cd /var/log/dump/N (N is an integer of 0 or more.) % lcrash -n N map = map.N, dump = dump.N, outfile = stdout, kerntypes = kerntypes .N Please wait... Initializing vmdump access ... Done. Loading system map ................................. Done. Loading type info (Kerntypes) ... Done. Initializing arch specific data ... Done. Loading ksyms from dump ....... Done. lkstdumplib being initialized >> (ii) Run lkstdump sub-command Enter the command as follows. Trace data is written to file . >> lkstdump logfile buffer[0] write(1024 records) buffer[1] write(1024 records) buffer[2] write(1024 records) buffer[3] write(1024 records) buffer[4] write(1024 records) 's file format is the same as the file created by "lkstbuf read", so you can see by using "lsktbuf print". Known Bugs --------- - The time data in the event records written by IOCTL(LKST_IOC_BUFFER_READ) and lkst_buffer_read do not conform to the draft of POSIX standard 1003.25. Future plans ------------ - Performance improvement. - Adding more extensions for analyzing various faults. - Add connectivity to Event Logging (http://evlog.sourceforge.net/). Reporting Problems and Discussing LKST -------------------------------------- email: lkst-develop@lists.sourceforge.net visit: https://sourceforge.net/projects/lkst/ or http://oss.hitachi.co.jp/sdl/english/kstracer_main.html Licence -------- COPYRIGHT (C) HITACHI,LTD. 2001-2002 ALL RIGHTS RESERVED. COPYRIGHT (C) FUJITSU,LTD. 2001-2002 ALL RIGHTS RESERVED. This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. Change History -------------- 1.3.1 25th of November, 2002 Release v1.3.1 Update for kernel 2.4.19. Separate Kernel Hooks from LKST patch. Add module that record only specified process. Make sure lkstlogd release own resources. lkstlogd report warning to syslog. Add and modify APIs for event-handler and maskset. Fix some bugs. 1.3 30th of August, 2002 Release v1.3 Performance improvement of Kernel Hooks. Modify default maskset. Support CSV format by lkstbuf command. Add and modify APIs for event-handler. Add config to delete hooks on inline func. Add script to make /dev/lkst. Fix some bugs. 1.2 28th of June, 2002 Release v1.2 Performance improvement of default handler. 1.1 14th of May, 2002 Release v1.1 Replace LKCD patch with the official one. 1.01 26th of April, 2002 Release v1.01 Update lkstlogd command. Add trace data of exception events. Fix some bugs. 1.0 12th of April, 2002 Release v1.0 Add kernel crash dump function by using LKCD. Update some command. Add log header to log file. Fix some bugs. 0.9 31st of March, 2002 Release v0.9 for some enhancements. Add LKST logging daemon, Replace trace point hook to use Kernel Hooks, Add some add-ons and exaples. 0.1a 18th of January, 2002 Fix bug of the racing between etrc_evhandler_default() and etrc_buffer_{delete,create}(). Add some minor modification, clearfy some comments. 0.1 14th of December, 2001 First public drop