有个ft测试的环境,其中nginx使用docker发布的。测试用例是curl的时候,没有获得nginx的响应。

docker ps
CONTAINER ID        IMAGE                                                                  COMMAND                  CREATED             STATUS              PORTS                                                                                         NAMES
e39be26bc976        cdn-release-docker.artnj.zte.com.cn/img-for-ottci/python:latest        "/bin/bash -c /home/s"   12 minutes ago      Up 12 minutes       22/tcp, 9220/tcp                                                                              RF_THUMBNAIL
a7f8ff2c2fac        ft_thumbnail:V5.01.01.01                                               "/usr/sbin/init"         12 minutes ago      Up 12 minutes                                                                                                     FT_thumbnail
4881975f8e10        cdn-release-docker.artnj.zte.com.cn/img-for-ottci/python:latest        "/bin/bash -c /home/s"   31 minutes ago      Up 31 minutes       22/tcp, 9220/tcp                                                                              RF_filepackager
bfdc3beaf7e4        ft_filepackager:V5.01.01.01                                            "/usr/sbin/init bash"    31 minutes ago      Up 31 minutes                                

 进入容器,看一下具体信息:

docker exec -it bfdc3beaf7e4 bash

  敲一下top命令,可以发现多个nginx的work进程的cpu很高。

bash-4.2# top
TERM environment variable not set.
bash-4.2#
bash-4.2# echo TERM
TERM
bash-4.2# echo $TERM
dumb
bash-4.2# export M
bash-4.2# export TERM=dumb
bash-4.2# top
top - 09:28:13 up 81 days,  7:01,  0 users,  load average: 17.46, 44.24, 34.20
Tasks:  49 total,   2 running,  47 sleeping,   0 stopped,   0 zombie
%Cpu(s):  2.2 us,  4.4 sy,  0.0 ni, 93.4 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 26321361+total, 48650644 free, 24579464 used, 18998350+buff/cache
KiB Swap:        0 total,        0 free,        0 used. 23644137+avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 1435 root      20   0 9855.0m  44528   3236 R  93.3  0.0  24:27.53 nginx

bash-4.2# strace -p 1435
bash: strace: command not found

  

由于并没有将strace命令一起打包到这个镜像中,所以需要在容器外和容器内的进程对应起来。然后查看对应的进程信息:

[root@localhost ~]# docker top bfdc3beaf7e4 |grep -i nginx
root                26807               22109               0                   17:33               ?                   00:00:00            wget -T 10 -t 1 -O /home/zxcdn/agent/plugin/stat/../../crontab/nginxstatus.tmp http://127.0.0.1:6620/NGINXSTATUS
root                30753               16121               0                   16:53               ?                   00:00:00            nginx: master process /home/zxcdn/ottcache/nginx/sbin/nginx -p /home/zxcdn/ottcache/nginx
root                30754               30753               72                  16:53               ?                   00:29:28            nginx: worker process

  有很多nginx的work进程,而且进程名相同,只截取部分。nginx开启的work进程是64个,所以有一大片。那怎么对应呢?有个小技巧,就是在容器里运行ps -ef |grep -i nginx |grep -i worker,然后对应的顺序,在容器外和docker top |grep -i nginx |grep -i worker是一致的。

在容器外找到对应的cpu高的进程滞后,先看一下这个进程是用户态cpu高还是内核态cpu高,可以使用

pidstat -p 30754 1 来确定,然后本文刚好是内核态cpu高,则可以使用strace 来看为什么该进程cpu高了。本文是因为对应的lua代码的覆盖率的频率设置太低,而且lua脚本有个地方触发了循环,导致了频繁操作read和write。

open("/home/devops/luacov/luacov.stats.out", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 18
fstat(18, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5178ddc000
write(18, "398:/home/zxcdn/ottcac 0 0 0 0 0"..., 4096) = 4096
write(18, " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0"..., 4096) = 4096
write(18, "0 0 0 0 0 0 0 0 0 0 0 0 3 3 0 0 "..., 4096) = 4096
write(18, " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0"..., 4096) = 4096
write(18, "1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 "..., 4096) = 4096
write(18, "0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 "..., 4096) = 4096
write(18, "28 0 0 0 0 0 114 114 114 0 114 0"..., 4096) = 4096
write(18, "0 0 0 0 0 0 0 0 0 0 0 0 889 0 0 "..., 4096) = 4096
write(18, " 0 0 0 0 0 0 0 0 0 0 480 0 0 480"..., 4096) = 4096
write(18, " 0 0 0 0 354 354 0 0 0 0 0 0 0 0"..., 4096) = 4096
write(18, "0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 "..., 204) = 204
close(18)                               = 0
munmap(0x7f5178ddc000, 4096)            = 0
close(20)                               = 0
close(22)                               = 0
open("/home/devops/luacov/luacov.stats.out", O_RDONLY) = 18
fstat(18, {st_mode=S_IFREG|0644, st_size=41164, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5178ddc000
read(18, "398:/home/zxcdn/ottcac 0 0 0 0 0"..., 4096) = 4096
read(18, " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0"..., 4096) = 4096
read(18, "0 0 0 0 0 0 0 0 0 0 0 0 3 3 0 0 "..., 4096) = 4096
read(18, " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0"..., 4096) = 4096
read(18, "1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 "..., 4096) = 4096
read(18, "0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 "..., 4096) = 4096
read(18, "28 0 0 0 0 0 114 114 114 0 114 0"..., 4096) = 4096
read(18, "0 0 0 0 0 0 0 0 0 0 0 0 889 0 0 "..., 4096) = 4096
read(18, " 0 0 0 0 0 0 0 0 0 0 480 0 0 480"..., 4096) = 4096
read(18, " 0 0 0 0 354 354 0 0 0 0 0 0 0 0"..., 4096) = 4096
read(18, "0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 "..., 4096) = 204