
1. Introduction

This article summarizes several debugging methods that are commonly used when the system behaves abnormally.

2. Debuggerd

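Use debuggerd together with echo t > /proc/sysrq-trigger to debug deadlocks and unexpected sleeps in both process (user) space and kernel space. debuggerd dumps the user-space stacks of a process, while writing t to /proc/sysrq-trigger makes the kernel dump the state of every task to the kernel log.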

3. The kill command

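kill -6 (SIGABRT) can be sent to any process to print its core dump backtrace. The data is saved to the rotating files /data/tombstones/tombstone_0{0..9}, and a copy is also written to /data/anr/traces.txt. The result is the same as the core dump that debuggerd prints.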

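kill -3 (SIGQUIT) prints the core dump backtrace of processes in the zygote process space. The data is saved only to /data/anr/traces.txt, and the result is the same kind of traces.txt output that the watchdog service in AMS produces after it detects an ANR.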
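The numeric arguments above are signal numbers. As a small sketch, assuming a Linux host (where the numbering matches what Android uses), Python's standard signal module can confirm which signal each number stands for. The helper name describe_signal is only illustrative.

```python
import signal


def describe_signal(number: int) -> str:
    """Return the symbolic name of a signal number on the current platform."""
    return signal.Signals(number).name


# The two numbers used with kill in this section.
for number in (6, 3):
    print(f"kill -{number} sends {describe_signal(number)}")
```

Knowing the name helps when reading logs, because tombstones and traces usually mention the signal by name rather than by number.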

4. strace

    usage: strace [-CdffhiqrtttTvVxxy] [-I n] [-e expr]...
                  [-a column] [-o file] [-s strsize] [-P path]...
                  -p pid... / [-D] [-E var=val]... [-u username] PROG [ARGS]
       or: strace -c[df] [-I n] [-e expr]... [-O overhead] [-S sortby]
                  -p pid... / [-D] [-E var=val]... [-u username] PROG [ARGS]
    -c -- count time, calls, and errors for each syscall and report summary
    -C -- like -c but also print regular output
    -w -- summarise syscall latency (default is system time)
    -d -- enable debug output to stderr
    -D -- run tracer process as a detached grandchild, not as parent
    -f -- follow forks, -ff -- with output into separate files
    -i -- print instruction pointer at time of syscall
    -q -- suppress messages about attaching, detaching, etc.
    -r -- print relative timestamp, -t -- absolute timestamp, -tt -- with usecs
    -T -- print time spent in each syscall
    -v -- verbose mode: print unabbreviated argv, stat, termios, etc. args
    -x -- print non-ascii strings in hex, -xx -- print all strings in hex
    -y -- print paths associated with file descriptor arguments
    -h -- print help message, -V -- print version
    -a column -- alignment COLUMN for printing syscall results (default 40)
    -b execve -- detach on this syscall
    -e expr -- a qualifying expression: option=[!]all or option=[!]val1[,val2]...
       options: trace, abbrev, verbose, raw, signal, read, write
    -I interruptible --
       1: no signals are blocked
       2: fatal signals are blocked while decoding syscall (default)
       3: fatal signals are always blocked (default if -o FILE PROG)
       4: fatal signals and SIGTSTP (^Z) are always blocked
          (useful to make strace -o FILE PROG not stop on ^Z)
    -o file -- send trace output to FILE instead of stderr
    -O overhead -- set overhead for tracing syscalls to OVERHEAD usecs
    -p pid -- trace process with process id PID, may be repeated
    -s strsize -- limit length of print strings to STRSIZE chars (default 32)
    -S sortby -- sort syscall counts by: time, calls, name, nothing (default time)
    -u username -- run command as username handling setuid and/or setgid
    -E var=val -- put var=val in the environment for command
    -E var -- remove var from the environment for command
    -P path -- trace accesses to path

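Typical invocations, using PID 1364 as the example target:

strace -Ff -p 1364 -T
    Attach to the process, follow forks, and print the time spent in each system call.

strace -Ff -p 1364 -T -r
    As above, with a relative timestamp on each line.

strace -Ff -p 1364 -T -t
    As above, with an absolute timestamp on each line.

strace -Ff -p 1364 -T -tt
    Alternatively, an absolute timestamp with microseconds.

strace -Ff -p 1364 -T -tt -o /data/strace.log
    Write the trace to /data/strace.log instead of stderr.

strace -Ff -p 1364 -c
    Print a summary of the time, calls, and errors counted for each system call.

strace -Ff -p 1364 -c -w
    Print the same summary, but based on system call latency (time spent waiting) rather than system time.

strace -Ff -p 1364 -y -tt -T
    Also print the paths associated with file descriptor arguments.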
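A trace written with -T and -o can get long, so it may help to filter it for slow calls afterwards. The following is a minimal sketch, assuming each completed call ends with the <seconds> suffix that -T appends. The function name slow_syscalls, the threshold, and the sample lines are made up for illustration and are not real trace output.

```python
import re
from typing import Iterable, List, Tuple

# Matches the "<seconds>" suffix that strace -T appends to a completed call.
DURATION_RE = re.compile(r"<(\d+\.\d+)>\s*$")
# Matches the syscall name that precedes the opening parenthesis.
SYSCALL_RE = re.compile(r"([a-z_][a-z0-9_]*)\(")


def slow_syscalls(lines: Iterable[str], threshold_s: float) -> List[Tuple[float, str, str]]:
    """Return (duration, syscall, raw line) for calls taking at least threshold_s, slowest first."""
    hits = []
    for line in lines:
        line = line.rstrip("\n")
        duration = DURATION_RE.search(line)
        name = SYSCALL_RE.search(line)
        if not duration or not name:
            continue  # skip lines without both a call name and a duration
        seconds = float(duration.group(1))
        if seconds >= threshold_s:
            hits.append((seconds, name.group(1), line))
    hits.sort(key=lambda hit: hit[0], reverse=True)
    return hits


# Made-up sample lines in the shape written by: strace -f -tt -T -o FILE
SAMPLE = [
    '1364  08:02:01.210000 futex(0x7f001000, FUTEX_WAIT_PRIVATE, 2, NULL) = 0 <2.501000>',
    '1370  08:02:01.220000 read(5, "abc", 16) = 3 <0.000040>',
    '1370  08:02:01.230000 epoll_wait(7, [], 16, 1000) = 0 <1.000900>',
]

for seconds, name, _ in slow_syscalls(SAMPLE, 0.5):
    print(f"{seconds:.6f}s {name}")
```

To use it on a real capture, pass an open file object for the log (for example the /data/strace.log file pulled from the device) instead of SAMPLE.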

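5. Application process ANR

(1) First use strace -fF -p {$PID} to identify which thread is in the ANR state.
(2) Use debuggerd -b {$PID} to check the state of that thread's backtrace.
(3) Asynchronous wait: ANR thread A:A1 is waiting for thread A:B1 in the same process space to finish a task. Use strace to trace the state of that thread.
(4) Synchronous sleep: ANR thread A:A1 is waiting for a lock. Check which thread holds the lock.
(5) System call blocking: ANR thread A:A1 is sleeping inside a system call. Print the process's kernel-space stack and analyze why the system call is sleeping.
(6) Inter-process communication wait: ANR thread A:A1 is sleeping during a binder transaction. Use the binder thread state under the current process's proc entry to determine which thread is waiting on which. For example, if thread A:A1 is waiting on thread B:B1, use strace or debuggerd to check the state of B:B1. The analysis of B:B1 must in turn consider asynchronous wait, synchronous sleep, system call blocking, and inter-process communication wait.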
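Step (6) is recursive: each blocked thread leads to another thread that must be examined in the same way. The sketch below illustrates only that bookkeeping. The waits_on mapping is hypothetical and would have to be filled in by hand from the strace, debuggerd, and binder state you collect. It is not produced by any tool.

```python
from typing import Dict, List, Tuple


def follow_wait_chain(start: str, waits_on: Dict[str, str]) -> Tuple[List[str], bool]:
    """Walk "X waits on Y" edges starting from start.

    Returns (chain, is_cycle). is_cycle is True when the walk returns to a
    thread that is already on the chain, which indicates a deadlock.
    """
    chain = [start]
    seen = {start}
    current = start
    while current in waits_on:
        current = waits_on[current]
        chain.append(current)
        if current in seen:
            return chain, True
        seen.add(current)
    return chain, False


# Hypothetical relationships written down during analysis (process:thread).
waits_on = {"A:A1": "B:B1", "B:B1": "C:C1"}

chain, is_cycle = follow_wait_chain("A:A1", waits_on)
print(" -> ".join(chain), "(deadlock)" if is_cycle else "(inspect the last thread)")
```

When there is no cycle, the last thread on the chain is the one to analyze next, since every thread before it is only waiting.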

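6. Monkey stability problems

How to approach monkey problems: when a monkey test stops, there are only two possible cases:

a. The system restarted abnormally.
b. The kernel's memory reclaim OOM-killed monkey (a memory leak).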

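1. On Android it is usually case a (abnormal system restart), which comes in several types:
   1) An important native system process aborts. Its parent process, init, kills all child processes and restarts the system.
   2) The system server watchdog detects an ANR and kills system server. zygote detects that its system server child process has exited and kills itself. init detects that its zygote child process has exited, kills all child processes, and restarts.
   3) A thread in the system server process space hits an exception and aborts, after which the flow is the same as in 2).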

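2. To investigate this kind of problem, first check the background log to see whether case a (abnormal system restart) occurred, by searching for the keyword:
    AndroidRuntime START com.android.internal.os.ZygoteInit

If the keyword appears two or more times, the system has restarted. After confirming case a, you still need to determine which of types 1), 2), and 3) applies, as follows: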

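For type 1), search for the keywords below, then search backward from the match to confirm whether a native system process such as surfaceflinger hit an exception:
    ServiceManager( 1584): service display died
    ServiceManager( 1584): service usagestats died
    ServiceManager( 1584): service batterystats died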

For type 2), search for the keyword:
    WATCHDOG KILLING SYSTEM PROCESS

For type 3), search for the keyword:
    system_server
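The keyword searches above can be collected into a single pass over a saved log. This is a rough sketch that uses only the keywords quoted in this section. The function name summarize_restart_clues and the sample lines are illustrative, and the output is a set of hints to follow up manually, not a verdict.

```python
import re
from typing import Dict, Iterable

ZYGOTE_START = "AndroidRuntime START com.android.internal.os.ZygoteInit"
WATCHDOG_KILL = "WATCHDOG KILLING SYSTEM PROCESS"
# Pattern taken from the "service ... died" sample lines quoted above.
SERVICE_DIED_RE = re.compile(r"ServiceManager\(\s*\d+\): service (\S+) died")


def summarize_restart_clues(lines: Iterable[str]) -> Dict[str, object]:
    """Count the restart-related keywords found in a log."""
    lines = list(lines)
    zygote_starts = sum(ZYGOTE_START in line for line in lines)
    died = [m.group(1) for m in map(SERVICE_DIED_RE.search, lines) if m]
    return {
        "zygote_starts": zygote_starts,
        "restarted": zygote_starts >= 2,  # two or more starts means a restart
        "services_died": died,            # clue for type 1)
        "watchdog_kills": sum(WATCHDOG_KILL in line for line in lines),  # type 2)
        "system_server_mentions": sum("system_server" in line for line in lines),  # type 3)
    }


# Made-up sample log built from the keywords in this section.
SAMPLE = [
    "D/AndroidRuntime( 1590): >>>>>> AndroidRuntime START com.android.internal.os.ZygoteInit <<<<<<",
    "ServiceManager( 1584): service display died",
    "D/AndroidRuntime( 3001): >>>>>> AndroidRuntime START com.android.internal.os.ZygoteInit <<<<<<",
]

print(summarize_restart_clues(SAMPLE))
```

Because the system_server keyword is very broad, its count is only a pointer to lines worth reading, and the lines just before the second Zygote start are usually the place to begin.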

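3. Notes on analyzing the scene of the failure: first confirm the PIDs of the Zygote and SystemServer processes in the log file, and only then search the log near the point of first failure. Once the system has restarted several times, it is easy to get lost in the log.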

Keyword for Zygote initialization:
    01-01 08:01:53.770 D/AndroidRuntime( 1590): >>>>>> AndroidRuntime START com.android.internal.os.ZygoteInit <<<<<<

Keywords for Zygote starting system server:
    01-01 08:02:01.210 I/dalvikvm( 1590): System server process 2369 has been created
    01-01 08:02:01.220 I/SystemServer( 2369): start SystemServer main :16606
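Pulling these PIDs out of a long log by hand is error-prone when there are several restarts. Here is a minimal sketch that groups them per boot, assuming the log lines have exactly the shape of the samples above. The message text may differ on other Android versions, so treat the two patterns as assumptions to adjust. The name boot_pids is illustrative.

```python
import re
from typing import Dict, Iterable, List, Optional

# Patterns copied from the sample log lines above.
ZYGOTE_RE = re.compile(
    r"AndroidRuntime\(\s*(\d+)\): >>>>>> AndroidRuntime START "
    r"com\.android\.internal\.os\.ZygoteInit"
)
SYSTEM_SERVER_RE = re.compile(
    r"\(\s*(\d+)\): System server process (\d+) has been created"
)


def boot_pids(lines: Iterable[str]) -> List[Dict[str, Optional[int]]]:
    """Return one {'zygote': pid, 'system_server': pid or None} entry per Zygote start."""
    boots: List[Dict[str, Optional[int]]] = []
    for line in lines:
        match = ZYGOTE_RE.search(line)
        if match:
            boots.append({"zygote": int(match.group(1)), "system_server": None})
            continue
        match = SYSTEM_SERVER_RE.search(line)
        # Only accept the message when it was logged by the most recent Zygote.
        if match and boots and boots[-1]["zygote"] == int(match.group(1)):
            boots[-1]["system_server"] = int(match.group(2))
    return boots


SAMPLE = [
    "01-01 08:01:53.770 D/AndroidRuntime( 1590): >>>>>> AndroidRuntime START com.android.internal.os.ZygoteInit <<<<<<",
    "01-01 08:02:01.210 I/dalvikvm( 1590): System server process 2369 has been created",
    "01-01 08:02:01.220 I/SystemServer( 2369): start SystemServer main :16606",
]

for boot in boot_pids(SAMPLE):
    print(boot)
```

Each entry corresponds to one boot, so the last entry before a failure gives the PIDs to filter on when reading the surrounding log.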
