Memory leak analysis of a NAND monkey aging test

1. On-site records

<6>[233147.509309] SysRq : Manual OOM execution
<4>[233147.514648] kworker/2:2 invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=0
<6>[233147.514669] kworker/2:2 cpuset=/ mems_allowed=0
<4>[233147.514680] CPU: 2 PID: 6447 Comm: kworker/2:2 Tainted: G O 3.10.65 #1
<4>[233147.514702] Workqueue: events moom_callback
<0>[233147.514710] Call trace:
<4>[233147.517515] [<ffffffc000088704>] dump_backtrace+0x0/0x11c
<4>[233147.517531] [<ffffffc000088840>] show_stack+0x20/0x30
<4>[233147.517545] [<ffffffc000776364>] dump_stack+0x1c/0x28
<4>[233147.517555] [<ffffffc00077504c>] dump_header.isra.13+0x90/0x1a0
<4>[233147.517567] [<ffffffc0001548e0>] oom_kill_process+0x84/0x36c
<4>[233147.517576] [<ffffffc000155050>] out_of_memory+0x268/0x290
<4>[233147.517584] [<ffffffc0003c13e0>] moom_callback+0x28/0x34
<4>[233147.517598] [<ffffffc0000b70e4>] process_one_work+0x270/0x3f0
<4>[233147.517607] [<ffffffc0000b8238>] worker_thread+0x210/0x330
<4>[233147.517619] [<ffffffc0000be100>] kthread+0xb4/0xc0
<4>[233147.517624] Mem-Info:
<4>[233147.517633] DMA per-cpu:
<4>[233147.517640] CPU 0: hi: 186, btch: 31 usd: 15
<4>[233147.517647] CPU 1: hi: 186, btch: 31 usd: 126
<4>[233147.517653] CPU 2: hi: 186, btch: 31 usd: 33
<4>[233147.517660] CPU 3: hi: 186, btch: 31 usd: 101
<4>[233147.517675] active_anon:6 inactive_anon:491 isolated_anon:0
<4>[233147.517675] active_file:126 inactive_file:152 isolated_file:0
<4>[233147.517675] unevictable:1727 dirty:0 writeback:1 unstable:0
<4>[233147.517675] free:65151 slab_reclaimable:2187 slab_unreclaimable:17740
<4>[233147.517675] mapped:364 shmem:0 pagetables:1563 bounce:0
<4>[233147.517675] free_cma:59401
<4>[233147.517711] DMA free:260604kB min:6644kB low:30268kB high:31928kB active_anon:24kB inactive_anon:1964kB active_file:504kB inactive_file:608kB unevictable:6908kB isolated(anon):0kB isolated(file):0kB present:1032192kB managed:690172kB mlocked:0kB dirty:0kB writeback:4kB mapped:1456kB shmem:0kB slab_reclaimable:8748kB slab_unreclaimable:70960kB kernel_stack:7104kB pagetables:6252kB unstable:0kB bounce:0kB free_cma:237604kB writeback_tmp:0kB pages_scanned:43 all_unreclaimable? no
<4>[233147.517718] lowmem_reserve[]: 0 0 0
<4>[233147.517728] DMA: 503*4kB (UEM) 259*8kB (UEMC) 145*16kB (UEMC) 98*32kB (UEM) 129*64kB (UEMC) 161*128kB (UEMC) 55*256kB (UMC) 26*512kB (UMC) 9*1024kB (UMC) 14*2048kB (MC) 159*4096kB (MRC) = 754948kB
<4>[233147.517778] 2525 total pagecache pages
<4>[233147.517787] 496 pages in swap cache
<4>[233147.517793] Swap cache stats: add 38367408, delete 38366912, find 10676545/15065336
<4>[233147.517798] Free swap = 1048kB
<4>[233147.517803] Total swap = 163836kB
<4>[233147.541740] 258048 pages RAM
<4>[233147.541754] 7902 pages reserved
<4>[233147.541759] 919895 pages shared
<4>[233147.541764] 46073 pages non-shared
<6>[233147.541770] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
<6>[233147.541822] [ 1094] 0 1094 2265 62 5 37 -1000 ueventd
<6>[233147.541845] [ 1584] 0 1584 2265 59 5 36 -1000 fswatcherd
<6>[233147.541862] [ 1619] 1023 1619 4065 0 7 207 -1000 sdcard
<6>[233147.541874] [ 1620] 1036 1620 6560 0 13 1031 -1000 logd
<6>[233147.541885] [ 1621] 0 1621 2515 71 5 49 -1000 healthd
<6>[233147.541896] [ 1622] 0 1622 3344 0 7 109 -1000 lmkd
<6>[233147.541909] [ 1623] 1000 1623 2604 0 6 81 -1000 servicemanager
<6>[233147.541920] [ 1624] 0 1624 5456 0 10 229 -1000 vold
<6>[233147.541932] [ 1625] 1000 1625 19260 0 41 746 -1000 surfaceflinger
<6>[233147.541943] [ 1626] 0 1626 2528 1 6 73 -1000 sh
<6>[233147.541955] [ 1637] 0 1637 2528 10 6 70 -1000 sh
<6>[233147.541968] [ 1691] 0 1691 2528 48 6 52 -1000 sh
<6>[233147.541979] [ 1693] 0 1693 2538 0 5 190 -1000 debuggerd
<6>[233147.541991] [ 1694] 0 1694 3016 0 7 245 -1000 debuggerd64
<6>[233147.542002] [ 1695] 0 1695 3152 0 7 114 -1000 rild
<6>[233147.542013] [ 1696] 1019 1696 6682 0 13 360 -1000 drmserver
<6>[233147.542025] [ 1698] 1012 1698 2622 1 6 80 -1000 installd
<6>[233147.542036] [ 1700] 1017 1700 3895 0 9 195 -1000 keystore
<6>[233147.542048] [ 1701] 0 1701 2202 0 5 45 -1000 multi_ir
<6>[233147.542059] [ 1702] 0 1702 16354 0 32 732 -1000 systemmixservic
<6>[233147.542071] [ 1703] 0 1703 16352 2 30 733 -1000 isomountmanager
<6>[233147.542083] [ 1704] 0 1704 16606 0 30 756 -1000 gpioservice
<6>[233147.542094] [ 1705] 0 1705 16355 0 30 733 -1000 securefileserve
<6>[233147.542106] [ 1707] 0 1707 367607 0 95 2005 -1000 main
<6>[233147.542117] [ 1708] 0 1708 4047 70 7 411 -1000 adbd
<6>[233147.542129] [ 1847] 0 1847 2489 0 6 80 -1000 logcat
<6>[233147.542142] [ 8012] 0 8012 2489 0 5 96 -1000 logcat
<6>[233147.542156] [ 6752] 0 6752 513629 0 113 2664 -1000 main
<6>[233147.542168] [ 6754] 0 6754 6209 0 12 219 -1000 netd
<6>[233147.542179] [ 6755] 1013 6755 39828 0 46 1044 -1000 mediaserver
<6>[233147.542191] [ 7214] 1000 7214 542387 0 179 10210 -941 system_server
<6>[233147.542203] [ 8091] 10010 8091 520573 0 126 7493 -705 ndroid.systemui
<6>[233147.542217] [ 8519] 1000 8519 370035 0 74 2343 -705 iracastReceiver
<6>[233147.542229] [ 8629] 1010 8629 4031 0 9 225 -1000 wpa_supplicant
<6>[233147.542240] [ 9404] 1014 9404 2578 0 6 86 -1000 dhcpcd
<6>[233147.542252] [16422] 10021 16422 520429 0 113 3643 117 putmethod.latin
<6>[233147.542267] [31474] 10018 31474 535839 0 167 11959 0 er.firelauncher
<6>[233147.542282] [ 1106] 10028 1106 376544 0 100 2980 294 ay.happyplay.aw
<6>[233147.542294] [ 1131] 10003 1131 516663 0 104 3068 294 d.process.media
<6>[233147.542305] [ 1155] 10026 1155 517634 0 101 2960 470 ftwinner.update
<3>[233147.542315] Out of memory: Kill process 1155 (ftwinner.update) score 480 or sacrifice child

/ # cpu_monitor -u 1 -m 500

-----------H64--Mem State {unit:MB}------------
--- Total -- Memory: 977 -- Swap: 159 -- Vma: 245759
Anon  Slab  Cache  Buffer  KernStack  Total-Free  Sys-Free  Cma-Free  Swap-Free  Vma-Free
   3    77     11       0          1         246        14       232          7    245691
   6    77      9       0          1         245        14       231         10    245691
   0    77      8       0          1         251        18       232          5    245691
   0    77      7       0          1         252        19       233          4    245691
   0    77      7       0          1         252        19       233          4    245691
   0    77      7       0          1         252        18       233          4    245691
   0    77      7       0          1         253        18       234          4    245691
   0    77      7       0          1         253        18       234          4    245691
   0    77      7       0          1         252        18       234          4    245691
   0    77      7       0          1         252        19       233          4    245691
   0    77      7       0          1         252        18       234          4    245691
   0    77      7       0          1         252        18       233          4    245691
   0    77      7       0          1         252        18       233          4    245691
   0    77      7       0          1         252        18       234          4    245691
   0    77      8       0          1         252        18       234          4    245691
   0    77      7       0          1         251        19       232          4    245691
   0    77      7       0          1         252        18       233          4    245691
   0    77      8       0          1         252        18       234          4    245691

2. How Android computes Lost RAM:

Lost RAM = Total RAM - Used RAM - Free RAM, where:

Used RAM = PSS of Android user processes (including Anon) + kernel slab + KernelStack + PageTables

Free RAM = cached processes + kernel file cache + kernel free

The Used RAM figure above does not cover all kernel memory. It misses memory that kernel drivers allocate directly from the buddy allocator, for example GPU page alloc/DMA alloc, VE/DE CMA alloc, audio DMA alloc, binder vmalloc and zram alloc. Therefore Lost RAM = kernel reserve + kernel driver page alloc.
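As a rough check of the formula above, the following Python sketch plugs in the dumpsys meminfo figures quoted later in this article. The function and parameter names are illustrative only and are not taken from Android. The result (503939 kB) is close to, but not identical with, the 500139 kB that dumpsys reports; the remainder presumably comes from terms that this simplified formula leaves out, which is an assumption rather than something the article confirms.

```python
def lost_ram_kb(total_kb, used_pss_kb, kernel_used_kb,
                cached_pss_kb, cached_kernel_kb, free_kb):
    """Simplified Lost RAM = Total RAM - Used RAM - Free RAM (all values in kB)."""
    used_kb = used_pss_kb + kernel_used_kb                      # Used RAM
    free_total_kb = cached_pss_kb + cached_kernel_kb + free_kb  # Free RAM
    return total_kb - used_kb - free_total_kb


# Figures copied from the dumpsys meminfo output quoted in this article.
lost = lost_ram_kb(
    total_kb=1000408,
    used_pss_kb=177513,
    kernel_used_kb=133716,
    cached_pss_kb=0,
    cached_kernel_kb=14908,
    free_kb=170332,
)
print(lost)  # 503939
```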

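3. Problem description

(1) Android memory leak check: after restarting Android with stop/start, Lost RAM stays unchanged, so a memory leak inside Android is tentatively ruled out as the cause of Lost RAM.

(2) Kernel memory leak check: kmemleak found no obvious large leak, in either slab allocations or the filesystem's page alloc requests.

(3) Some kernel modules use buddy memory directly through page alloc. The suspicion was that something goes wrong there, so that memory is never released and consumption keeps growing.

Checking kernel memory usage with page owner showed normal amounts and no obvious discrepancy, so this suspicion is tentatively ruled out. (Note: swap space configured on zram allocates memory directly from buddy during compression, and this memory is counted in Lost RAM.)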

(4) The memory statistics themselves look wrong

cat /proc/pageinfo

Free pages count per migrate type at order      0     1     2     3     4     5     6     7     8     9    10
Node 0, zone DMA, type   Unmovable             51   136    19   327   339   152    27     2     1     1     0
Node 0, zone DMA, type Reclaimable            227    60    26    14     2     1     1     1     0     1     0
Node 0, zone DMA, type     Movable            255   616    92    30    19     2     4     2     3     4   100
Node 0, zone DMA, type     Reserve              0     0     0     0     0     0     0     0     0     0     2
Node 0, zone DMA, type         CMA           1163  1577  1027   186    15    17    19    17    15    12    16
Node 0, zone DMA, type     Isolate              0     0     0     0     0     0     0     0     0     0     0

Number of blocks type   Unmovable  Reclaimable  Movable  Reserve  CMA  Isolate
Node 0, zone DMA               33            4      138        2   75        0
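The per-order totals can be derived from the table above with a few lines of Python. This is a minimal sketch that assumes 4 kB pages; the names `FREE_BLOCKS`, `blocks_per_order` and `free_kb` are illustrative. Summing the rows exactly as printed gives 51, 22 and 19 blocks for orders 6, 7 and 8 and a total of 672328 kB, slightly more than the 46, 20 and 18 blocks and 669000 kB used in the hand calculation that follows, so one of the two was probably transcribed from a slightly different snapshot.

```python
PAGE_KB = 4  # assumption: 4 kB pages, so an order-n block is 4 << n kB

# Free-block counts per migrate type for orders 0..10, copied from the table.
FREE_BLOCKS = {
    "Unmovable":   [51, 136, 19, 327, 339, 152, 27, 2, 1, 1, 0],
    "Reclaimable": [227, 60, 26, 14, 2, 1, 1, 1, 0, 1, 0],
    "Movable":     [255, 616, 92, 30, 19, 2, 4, 2, 3, 4, 100],
    "Reserve":     [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2],
    "CMA":         [1163, 1577, 1027, 186, 15, 17, 19, 17, 15, 12, 16],
    "Isolate":     [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
}


def blocks_per_order(table):
    """Sum the free-block counts of all migrate types for each order."""
    return [sum(column) for column in zip(*table.values())]


def free_kb(table):
    """Total free memory in kB implied by the free lists."""
    return sum(count * (PAGE_KB << order)
               for order, count in enumerate(blocks_per_order(table)))


per_order = blocks_per_order(FREE_BLOCKS)
total_kb = free_kb(FREE_BLOCKS)
print(per_order)
print(total_kb)
```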

Free memory computed from the pageinfo node:

order   block size   free blocks   free kB
0       4 kB         1696          6784
1       8 kB         2389          19112
2       16 kB        1164          18624
3       32 kB        557           17824
4       64 kB        375           24000
5       128 kB       172           22016
6       256 kB       46            11776
7       512 kB       20            10240
8       1024 kB      18            18432
9       2048 kB      18            36864
10      4096 kB      118           483328

free: 6784 + 19112 + 18624 + 17824 + 24000 + 22016 + 11776 + 10240 + 18432 + 36864 + 483328 = 669000 kB (653 MB)

dumpsys meminfo

Total RAM: 1000408 kB (status critical)
Free RAM: 185240 kB (0 cached pss + 14908 cached kernel + 170332 free)
Used RAM: 311229 kB (177513 used pss + 133716 kernel)
Lost RAM: 500139 kB

4. Problem analysis

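Results:

(1) Suspicion 1: Lost RAM ≈ (pageinfo) free - (meminfo) free. With the figures above, 669000 kB - 170332 kB = 498668 kB, which is close to the reported Lost RAM of 500139 kB. This suggests that the free figure the system reads from /proc/meminfo is off, and that dumpsys meminfo attributes the whole deviation to Lost RAM. Why does the free figure in /proc/meminfo differ from the statistics in pageinfo?

/proc/meminfo free: si_meminfo --> freeram = global_page_state(NR_FREE_PAGES)
pageinfo free: counted from pdata->zone->free_area[order] --> free_list[mtype]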

<6>[234260.975325] SysRq : Show Memory
<4>[234260.978880] Mem-Info:
<4>[234260.978889] DMA per-cpu:
<4>[234260.978897] CPU 0: hi: 186, btch: 31 usd: 30
<4>[234260.978903] CPU 1: hi: 186, btch: 31 usd: 182
<4>[234260.978919] active_anon:27 inactive_anon:140 isolated_anon:32
<4>[234260.978919] active_file:39 inactive_file:161 isolated_file:0
<4>[234260.978919] unevictable:1727 dirty:0 writeback:0 unstable:0
<4>[234260.978919] free:64562 slab_reclaimable:2160 slab_unreclaimable:17660
<4>[234260.978919] mapped:305 shmem:0 pagetables:1524 bounce:0
<4>[234260.978919] free_cma:59927
<4>[234260.978951] DMA free:258248kB min:6644kB low:30268kB high:31928kB active_anon:108kB inactive_anon:560kB active_file:156kB inactive_file:644kB unevictable:6908kB isolated(anon):128kB isolated(file):0kB present:1032192kB managed:690172kB mlocked:0kB dirty:0kB writeback:0kB mapped:1220kB shmem:0kB slab_reclaimable:8640kB slab_unreclaimable:70640kB kernel_stack:6672kB pagetables:6096kB unstable:0kB bounce:0kB free_cma:239708kB writeback_tmp:0kB pages_scanned:32 all_unreclaimable? no
<4>[234260.978957] lowmem_reserve[]: 0 0 0
<4>[234260.978968] DMA: 512*4kB (UEMC) 392*8kB (UEMC) 351*16kB (UEMC) 136*32kB (UEMC) 117*64kB (UEMC) 133*128kB (UEMC) 47*256kB (UMC) 23*512kB (UMC) 9*1024kB (UMC) 14*2048kB (MC) 159*4096kB (MRC) = 752624kB
<4>[234260.979016] 2097 total pagecache pages
<4>[234260.979025] 143 pages in swap cache
<4>[234260.979031] Swap cache stats: add 45655669, delete 45655526, find 10847251/16089610
<4>[234260.979036] Free swap = 3404kB
<4>[234260.979041] Total swap = 163836kB
<4>[234260.982533] 258048 pages RAM
<4>[234260.982533] 7902 pages reserved
<4>[234260.982533] 788206 pages shared
<4>[234260.982533] 46679 pages non-shared

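DMA free:258248kB
DMA: 512*4kB (UEMC) 392*8kB (UEMC) 351*16kB (UEMC) 136*32kB (UEMC) 117*64kB (UEMC) 133*128kB (UEMC) 47*256kB (UMC) 23*512kB (UMC) 9*1024kB (UMC) 14*2048kB (MC) 159*4096kB (MRC) = 752624kB

These two figures differ markedly: the free lists add up to 752624 kB, while the zone counter reports only 258248 kB free, a gap of 494376 kB.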
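The gap between the two figures can be reproduced by summing the `count*sizekB` terms of the buddy line from the Show Memory log and comparing the result with the `DMA free:` value. The sketch below is a small illustration in Python; the helper name `buddy_free_kb` is made up for this example.

```python
import re


def buddy_free_kb(line):
    """Sum every 'count*sizekB' term of a show_mem buddy line, in kB."""
    terms = re.findall(r"(\d+)\*(\d+)kB", line)
    return sum(int(count) * int(size_kb) for count, size_kb in terms)


# Buddy line and zone free counter copied from the Show Memory log.
BUDDY_LINE = (
    "DMA: 512*4kB (UEMC) 392*8kB (UEMC) 351*16kB (UEMC) 136*32kB (UEMC) "
    "117*64kB (UEMC) 133*128kB (UEMC) 47*256kB (UMC) 23*512kB (UMC) "
    "9*1024kB (UMC) 14*2048kB (MC) 159*4096kB (MRC) = 752624kB"
)
ZONE_FREE_KB = 258248  # "DMA free:258248kB"

listed_kb = buddy_free_kb(BUDDY_LINE)
gap_kb = listed_kb - ZONE_FREE_KB
print(listed_kb, gap_kb)  # 752624 494376
```

The same check on the earlier OOM log gives 754948 kB on the free lists against 260604 kB in the counter, so the gap is of the same size in both snapshots.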

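(2) Suspicion 2: the Movable type alone has 100 free order-10 blocks (409600 kB, 400 MB), and across all migrate types there are 118 such blocks, totalling 483328 kB or about 472 MB. In theory the monkey test should make the system more and more fragmented, so why are there so many contiguous 4 MB Movable blocks?

Over the course of the monkey test, the number of free Movable order-10 blocks does keep growing, and the growth tracks Lost RAM closely. The suspicion is that during page_alloc, the accounting for order-0 page requests, __mod_zone_page_state(zone, NR_FREE_PAGES, -(i << order)), goes wrong.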
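To illustrate how an accounting error of this kind would produce the observed symptom, here is a purely hypothetical toy model in Python. It is not kernel code and does not claim to reproduce the actual bug: `ToyZone` and `buggy_account` are invented for this sketch. The model keeps per-order free lists and a separate free-page counter, then applies one counter decrement that has no matching removal from the free lists.

```python
class ToyZone:
    """Toy zone: per-order free block counts plus a separate page counter."""

    def __init__(self, max_order=10):
        self.free_blocks = [0] * (max_order + 1)
        self.nr_free_pages = 0  # stands in for NR_FREE_PAGES

    def free(self, order):
        """Return a block to the free list and account for its pages."""
        self.free_blocks[order] += 1
        self.nr_free_pages += 1 << order

    def alloc(self, order):
        """Take a block off the free list and account for its pages."""
        self.free_blocks[order] -= 1
        self.nr_free_pages -= 1 << order

    def buggy_account(self, order):
        """Hypothetical fault: the counter drops but no block leaves the list."""
        self.nr_free_pages -= 1 << order

    def pages_on_free_lists(self):
        return sum(count << order
                   for order, count in enumerate(self.free_blocks))


zone = ToyZone()
for _ in range(4):
    zone.free(10)        # four free order-10 blocks, 4096 pages
zone.alloc(10)           # correct allocation: list and counter both drop
zone.buggy_account(10)   # faulty path: only the counter drops

drift_pages = zone.pages_on_free_lists() - zone.nr_free_pages
print(zone.pages_on_free_lists(), zone.nr_free_pages, drift_pages)
```

In this model the free lists still hold 3072 pages while the counter reports 2048, which has the same shape as the mismatch between the buddy lists and NR_FREE_PAGES seen in the logs.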

When reposting, please credit the source: Memory leak analysis of a NAND monkey aging test https://www.yhzz.com.cn/a/12722.html
