How do you keep the docker daemon from exiting?

    programV2 · 2020-12-02 10:50:57 +08:00 · 1686 views

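    I suddenly got a service-monitoring alert and found that all 4 container services on docker had gone down at the same time. Looking through the log with journalctl -u docker.service, I found the following. Has anyone here run into this? How should I troubleshoot it? Thanks for any pointers! I'd also like to ask which approach everyone uses these days to keep the docker daemon from exiting.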

    Docker version 18.09.7, build 2d0083d. Linux kernel: 4.15.0-123-generic #126~16.04.1-Ubuntu SMP

    systemd[1]: Stopping Docker Application Container Engine...

    074627-05:00" level=info msg="Processing signal 'terminated'"

    049975-05:00" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"

    444047-05:00" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"

    699341-05:00" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"

    441246-05:00" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"

    204594-05:00" level=info msg="stopping event stream following graceful shutdown" error="<nil>" module=libcontainerd namespace=moby

    664254-05:00" level=info msg="stopping event stream following graceful shutdown" error="context canceled" module=libcontainerd namespace=plugins.moby

    systemd[1]: Stopped Docker Application Container Engine.

    12 replies · 2020-12-03 10:34:28 +08:00
    programV2 (OP) · #1 · 2020-12-02 10:54:51 +08:00
    Afterwards I restarted the docker daemon process; the rest of the log is below:

    systemd[1]: Starting Docker Application Container Engine...
    3548512-05:00" level=info msg="parsed scheme: \"unix\"" module=grpc
    4983202-05:00" level=info msg="scheme \"unix\" not registered, fallback to default scheme" module=grpc
    5244875-05:00" level=info msg="parsed scheme: \"unix\"" module=grpc
    5268345-05:00" level=info msg="scheme \"unix\" not registered, fallback to default scheme" module=grpc
    6388785-05:00" level=info msg="ccResolverWrapper: sending new addresses to cc: [{unix:///run/containerd/containerd.sock 0 <nil>}]" module=grpc
    6429898-05:00" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
    6495301-05:00" level=info msg="pickfirstBalancer: HandleSubConnStateChange: 0xc4206e0ab0, CONNECTING" module=grpc
    7938884-05:00" level=info msg="pickfirstBalancer: HandleSubConnStateChange: 0xc4206e0ab0, READY" module=grpc
    8150455-05:00" level=info msg="ccResolverWrapper: sending new addresses to cc: [{unix:///run/containerd/containerd.sock 0 <nil>}]" module=grpc
    8187954-05:00" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
    8245945-05:00" level=info msg="pickfirstBalancer: HandleSubConnStateChange: 0xc4206e0da0, CONNECTING" module=grpc
    8847325-05:00" level=info msg="pickfirstBalancer: HandleSubConnStateChange: 0xc4206e0da0, READY" module=grpc
    5119350-05:00" level=info msg="[graphdriver] using prior storage driver: overlay2"
    8117760-05:00" level=info msg="Graph migration to content-addressability took 0.00 seconds"
    8433459-05:00" level=warning msg="Your kernel does not support swap memory limit"
    8515283-05:00" level=warning msg="Your kernel does not support cgroup rt period"
    8529936-05:00" level=warning msg="Your kernel does not support cgroup rt runtime"
    9099663-05:00" level=info msg="Loading containers: start."
    7191821-05:00" level=info msg="Default bridge (docker0) is assigned with an IP address 172.17.0.0/16. Daemon option --bip can be used to set a preferred IP address"
    3364491-05:00" level=info msg="Loading containers: done."
    3809274-05:00" level=warning msg="failed to retrieve runc version: unknown output format: runc version spec: 1.0.1-dev\n"
    3797654-05:00" level=info msg="Docker daemon" commit=2d0083d graphdriver(s)=overlay2 version=18.09.7
    4010995-05:00" level=info msg="Daemon has completed initialization"
    9557408-05:00" level=info msg="API listen on /var/run/docker.sock"
    systemd[1]: Started Docker Application Container Engine.
    programV2 (OP) · #2 · 2020-12-02 11:22:27 +08:00
    Has nobody run into this?
    programV2 (OP) · #3 · 2020-12-02 11:29:38 +08:00
    Output of docker info:

    Images: 5
    Server Version: 18.09.7
    Storage Driver: overlay2
    Backing Filesystem: extfs
    Supports d_type: true
    Native Overlay Diff: true
    Logging Driver: json-file
    Cgroup Driver: cgroupfs
    Plugins:
    Volume: local
    Network: bridge host macvlan null overlay
    Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
    Swarm: inactive
    Runtimes: runc
    Default Runtime: runc
    Init Binary: docker-init
    containerd version:
    runc version: N/A
    init version: v0.18.0 (expected: fec3683b971d9c3ef73f284f176672c44b448662)
    Security Options:
    apparmor
    seccomp
    Profile: default
    Kernel Version: 4.15.0-123-generic
    Operating System: Ubuntu 16.04.6 LTS
    OSType: linux
    Architecture: x86_64
    CPUs: 2
    Total Memory: 1001MiB
    Name: xxxx2.localdomain
    ID: xx42:xxxx:5JEH:DQ7M:SI6Z:JE5G:ZE75:ZVYI:3FDM:NCXE:DIDP:ELIT
    Docker Root Dir: /var/lib/docker
    Debug Mode (client): false
    Debug Mode (server): false
    Registry: https://index.docker.io/v1/
    Labels:
    Experimental: false
    Insecure Registries:
    127.0.0.0/8
    Live Restore Enabled: false

    WARNING: No swap limit support
    cheng6563 · #4 · 2020-12-02 14:18:29 +08:00
    074627-05:00" level=info msg="Processing signal 'terminated'"
    This looks like it was stopped manually.
    Configure automatic restart for the systemd service.
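
A rough sketch of the auto-restart suggestion, assuming a systemd drop-in is acceptable on the host. The function name `write_docker_restart_dropin`, the file name `restart.conf`, and the 5-second delay are illustrative assumptions, not something from this thread:

```sh
# Sketch: write a systemd drop-in that restarts docker.service after a failure.
# File name and delay are illustrative assumptions.
write_docker_restart_dropin() {
    # $1: drop-in directory, normally /etc/systemd/system/docker.service.d
    dropin_dir=$1
    mkdir -p "$dropin_dir" || return 1
    printf '%s\n' '[Service]' 'Restart=on-failure' 'RestartSec=5s' \
        > "$dropin_dir/restart.conf"
}

# Typical use, as root:
#   write_docker_restart_dropin /etc/systemd/system/docker.service.d
#   systemctl daemon-reload
```

One caveat: systemd does not apply `Restart=` when the unit is stopped by a systemd operation such as `systemctl stop`. The "Stopping Docker Application Container Engine..." line in the log suggests that kind of stop, so this setting likely guards against crashes rather than against what happened here.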
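    mritd · #5 · 2020-12-02 14:30:57 +08:00
    Hahaha, damn, I thought I was the only one who got burned by this, hahaha.

    This morning I also found that the docker daemon was gone on some servers, and then noticed that the containers were actually still running (--live-restore was enabled).
    The log likewise showed a terminated signal, and I traced the time to around 6:40.

    So I went straight to systemctl list-timers: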

    ```sh
    06:40:13 CST 7h ago apt-daily-upgrade.timer apt-daily-upgrade.service
    ```

    Next:

    cat /var/log/unattended-upgrades/unattended-upgrades-dpkg.log

    ```sh
    Log started: 2020-12-02 06:40:15
    (Reading database ... 110792 files and directories currently installed.)
    Preparing to unpack .../containerd_1.3.3-0ubuntu2.1_amd64.deb ...
    Unpacking containerd (1.3.3-0ubuntu2.1) over (1.3.3-0ubuntu2) ...
    Setting up containerd (1.3.3-0ubuntu2.1) ...
    Processing triggers for man-db (2.9.1-1) ...
    ```

    What a trap. The remedy is:

    ```sh
    apt-mark hold docker docker.io containerd
    systemctl disable apt-daily.timer apt-daily-upgrade.timer
    systemctl stop apt-daily.timer apt-daily-upgrade.timer
    ```
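
The diagnosis in this reply (check the timers, then read the unattended-upgrades dpkg log) could be narrowed with a small filter like the one below. It is only a sketch: the function name `runtime_upgrades_in_log` and the package-name pattern are assumptions and should be adjusted to whatever is installed on the machine.

```sh
# Sketch: show when a dpkg run started and which container-runtime
# packages it unpacked. The package pattern is an assumption.
runtime_upgrades_in_log() {
    # $1: log file, e.g. /var/log/unattended-upgrades/unattended-upgrades-dpkg.log
    grep -E '^(Log started:|Unpacking (containerd|docker|runc))' "$1"
}

# Usage:
#   runtime_upgrades_in_log /var/log/unattended-upgrades/unattended-upgrades-dpkg.log
```

If a matching "Unpacking" line sits right under a "Log started" timestamp close to the moment the daemon received its terminated signal, an automatic upgrade is a plausible cause.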
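    programV2 (OP) · #6 · 2020-12-02 15:47:02 +08:00 via iPhone
    @mritd Mine had already gone down last night, and it cost me a whole day of troubleshooting. Does your approach work? I started by turning off automatic upgrades.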
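    mritd · #7 · 2020-12-02 15:51:32 +08:00
    @programV2 apt-mark hold keeps a package from being upgraded during an upgrade. I had forgotten to hold containerd at the time, so I simply turned off automatic upgrades as well. This thing nearly did me in.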
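
Since a forgotten hold on containerd was the gap here, a quick check of which packages are actually held may help. This is a sketch that assumes `apt-mark showhold` prints one package name per line; the function name `unheld_packages` is made up for illustration:

```sh
# Sketch: read held package names on stdin (one per line, as printed by
# `apt-mark showhold`) and print each wanted package that is NOT held.
unheld_packages() {
    held=$(cat)
    for pkg in "$@"; do
        printf '%s\n' "$held" | grep -qxF -- "$pkg" || printf '%s\n' "$pkg"
    done
}

# Usage:
#   apt-mark showhold | unheld_packages docker.io containerd
```

Empty output would mean every listed package is on hold.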
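    programV2 (OP) · #8 · 2020-12-02 16:04:00 +08:00 via iPhone
    @mritd A lot of people reported this on overseas forums last night. Do so few people on this forum use containers? Why has nobody reported it? I almost thought some other software of mine was at fault; I never would have guessed it was this.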
    mritd · #9 · 2020-12-02 16:10:38 +08:00
    @programV2 #8 It hit 3 of my machines in a row: one production, one test, and one overseas VPS. That made me think something more was going on, so I investigated, haha.
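    programV2 (OP) · #10 · 2020-12-02 16:45:16 +08:00 via iPhone
    @mritd Are you on Ubuntu 16 as well? One more question: I use journalctl -u docker.service to view the log and want to copy it, but the log is too long, and selecting text on my phone can't capture all of it. Is there a command suited to copying out the whole log in this situation?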
    julyclyde · #11 · 2020-12-02 19:50:13 +08:00
    Hmm, but in principle, restarting docker and containerd shouldn't affect containers that are already running, should it?
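
On this point, the `docker info` output posted earlier in the thread shows "Live Restore Enabled: false", whereas mritd's containers kept running with --live-restore on. By default the Docker daemon shuts down running containers when it terminates; the live restore option (`"live-restore": true` in /etc/docker/daemon.json) changes that. A sketch for checking the setting, with the helper name `live_restore_enabled` being an assumption for illustration:

```sh
# Sketch: read `docker info` output on stdin and succeed only if it
# reports live restore as enabled.
live_restore_enabled() {
    grep -Eq '^[[:space:]]*Live Restore Enabled:[[:space:]]*true[[:space:]]*$'
}

# Usage:
#   docker info 2>/dev/null | live_restore_enabled && echo "live restore is on"
```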
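    mritd · #12 · 2020-12-03 10:34:28 +08:00
    @programV2 #10 We're on 18/20. Normally you can just pipe journalctl -u docker into the next program; on a Mac there's pbcopy. To copy straight to my local machine from within ssh as well, I built a tool of my own that can copy to the local machine from arbitrarily remote hosts. On a phone, I have no idea....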
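
A sketch of the piping idea above, for getting the whole log out without selecting text by hand. The function name `dump_unit_log`, the output path, and `user@host` are placeholders, not anything from this thread:

```sh
# Sketch: write a unit's full journal to a file, bypassing the pager.
dump_unit_log() {
    # $1: unit name, $2: output file
    journalctl -u "$1" --no-pager > "$2"
}

# On the server:
#   dump_unit_log docker.service /tmp/docker.log
# From a Mac over ssh, straight into the clipboard (user@host is a placeholder):
#   ssh user@host 'journalctl -u docker.service --no-pager' | pbcopy
```

Writing to a file first also makes it easy to fetch the log with scp or attach it somewhere, which may be more practical than a clipboard on a phone.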