Redis frequently logs messages about blocked AOF fsync

I. Background

  • A customer reported that the instances in their Redis Cluster frequently log messages about AOF writes being blocked, and asked what causes them.

  • AOF-related settings in the Redis configuration file

appendonly                                yes
appendfilename                            redis.aof
appendfsync                               everysec
auto-aof-rewrite-percentage               100
auto-aof-rewrite-min-size                 1gb
no-appendfsync-on-rewrite                 no
  • Redis log
3921:S 24 Jul 21:00:54.067 * Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.
3921:S 24 Jul 21:00:56.071 * Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.
3921:S 24 Jul 21:00:58.075 * Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.
3921:S 24 Jul 21:01:00.078 * Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.
3921:S 24 Jul 21:01:01.516 * FAIL message received from 390b9d8ce1c68ffc6637ee5a3c27d895a20ebaad about 96d0027ca496bcab060b142088607e2dfe465d71
3921:S 24 Jul 21:01:02.081 * Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.
3921:S 24 Jul 21:01:04.084 * Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.
3921:S 24 Jul 21:01:06.086 * Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.
3921:S 24 Jul 21:01:08.089 * Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.
3921:S 24 Jul 21:01:10.093 * Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.
3921:S 24 Jul 21:01:12.097 * Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.
3921:S 24 Jul 21:01:14.002 * Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.
3921:S 24 Jul 21:01:16.006 * Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.
3921:S 24 Jul 21:01:18.008 * Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.
3921:S 24 Jul 21:01:20.011 * Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.
3921:S 24 Jul 21:01:22.015 * Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.
3921:S 24 Jul 21:01:22.436 * Clear FAIL state for node 96d0027ca496bcab060b142088607e2dfe465d71: slave is reachable again.
3921:S 24 Jul 22:02:17.350 * Starting automatic rewriting of AOF on 100% growth
3921:S 24 Jul 22:02:17.356 * Background append only file rewriting started by pid 10604
3921:S 24 Jul 22:02:17.944 * AOF rewrite child asks to stop sending diffs.
10604:C 24 Jul 22:02:17.944 * Parent agreed to stop sending diffs. Finalizing AOF...
10604:C 24 Jul 22:02:17.944 * Concatenating 0.00 MB of AOF diff received from parent.
10604:C 24 Jul 22:02:17.944 * SYNC append only file rewrite performed
10604:C 24 Jul 22:02:17.948 * AOF rewrite: 6 MB of memory used by copy-on-write
3921:S 24 Jul 22:02:17.957 * Background AOF rewrite terminated with success
3921:S 24 Jul 22:02:17.957 * Residual parent diff successfully flushed to the rewritten AOF (0.00 MB)
3921:S 24 Jul 22:02:17.957 * Background AOF rewrite finished successfully
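
To see how often the warning fires and whether it overlaps with a rewrite, the log can be summarized with a small script. The following is a minimal sketch, not part of the original troubleshooting: the function names are made up for illustration, the regular expression assumes the log line layout shown above, and the year 2000 is a placeholder because the log carries no year.

```python
import re
from datetime import datetime

FSYNC_WARNING = "Asynchronous AOF fsync is taking too long"

# Assumed layout: "<pid>:<role> <day> <month> <HH:MM:SS.mmm> <level> <message>"
LINE_RE = re.compile(
    r"^(\d+):([A-Z]) (\d{2} \w{3} \d{2}:\d{2}:\d{2}\.\d{3}) (\S) (.*)$"
)


def parse_line(line):
    """Return (pid, role, timestamp, message), or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    pid, role, stamp, _level, message = m.groups()
    # The log has no year; a placeholder year is enough to compute differences.
    when = datetime.strptime("2000 " + stamp, "%Y %d %b %H:%M:%S.%f")
    return int(pid), role, when, message


def fsync_warning_gaps(lines):
    """Seconds between consecutive 'fsync is taking too long' warnings."""
    parsed = (parse_line(line) for line in lines)
    times = [p[2] for p in parsed if p and FSYNC_WARNING in p[3]]
    return [round((b - a).total_seconds(), 3) for a, b in zip(times, times[1:])]
```

Feeding it the lines of the log file gives the spacing between warnings; in the excerpt above they arrive roughly two seconds apart. Comparing those timestamps with the "Background append only file rewriting started" lines shows whether the warnings fall inside a rewrite window.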

II. Resolution

  • The no-appendfsync-on-rewrite parameter

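  • While a BGREWRITEAOF is running, the child process rewriting the AOF and the main process appending to the AOF both hit the disk at the same time. A rewrite usually involves a large amount of disk I/O, so the main process can block when it writes to the AOF. With no-appendfsync-on-rewrite set to no (the current value), Redis keeps calling fsync during the rewrite as appendfsync dictates; this is the safest choice and does not weaken durability, but the blocking has to be tolerated. What if it is set to yes? The main process then skips fsync while a BGSAVE or BGREWRITEAOF child is running, which for that window is equivalent to appendfsync no: the data is only written to the operating system's buffers and no disk flush is issued, so the main process does not compete for the disk and does not block. The cost is that if the host crashes or loses power before the kernel has flushed those buffers, the data is lost. How much? With the default Linux settings, up to about 30 seconds of data in the worst case.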

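  • Therefore, if the application cannot tolerate latency but can accept losing a small amount of data, set it to yes. If the application cannot tolerate data loss, leave it at no.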
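
Because the parameter only takes effect while a BGSAVE or BGREWRITEAOF child is running, it can help to check whether delayed fsyncs actually pile up during those windows before changing it. The sketch below compares two snapshots of `INFO persistence` output; the helper names are hypothetical, and it assumes the snapshots are passed in as raw text (for example captured with `redis-cli INFO persistence`).

```python
def parse_info(text):
    """Parse the 'key:value' lines of INFO output into a dict of strings."""
    fields = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and section headers such as "# Persistence".
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, value = line.split(":", 1)
        fields[key] = value
    return fields


def delayed_fsync_delta(before, after):
    """Return (growth of aof_delayed_fsync, whether a child is saving in 'after')."""
    a, b = parse_info(before), parse_info(after)
    delta = int(b.get("aof_delayed_fsync", 0)) - int(a.get("aof_delayed_fsync", 0))
    child_active = (
        b.get("aof_rewrite_in_progress") == "1"
        or b.get("rdb_bgsave_in_progress") == "1"
    )
    return delta, child_active
```

If `aof_delayed_fsync` grows mostly while `aof_rewrite_in_progress` or `rdb_bgsave_in_progress` is 1, the trade-off described above applies directly. If it also grows when no child is running, the disk is probably busy for other reasons and this parameter alone is unlikely to remove the warnings.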

  • AOF-related settings in the Redis configuration file after the change

appendonly                                yes
appendfilename                            redis.aof
appendfsync                               everysec
auto-aof-rewrite-percentage               100
auto-aof-rewrite-min-size                 1gb
# changed from no to yes
no-appendfsync-on-rewrite                 yes
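
Editing the configuration file only takes effect after a restart. The same parameter can also be changed on a running instance with `CONFIG SET`. The following is a rough sketch assuming a redis-py style client object (one exposing `config_set`, `config_get` and `config_rewrite`) created with `decode_responses=True`; the function name is made up for illustration, and connection details are left to the caller.

```python
PARAM = "no-appendfsync-on-rewrite"


def set_no_appendfsync_on_rewrite(client, value="yes", persist=False):
    """Apply the setting at runtime and return the value the server reports back.

    `client` is assumed to behave like a redis-py `redis.Redis` instance.
    """
    if value not in ("yes", "no"):
        raise ValueError("value must be 'yes' or 'no'")
    client.config_set(PARAM, value)
    if persist:
        # CONFIG REWRITE writes the running configuration back to redis.conf.
        client.config_rewrite()
    # CONFIG GET returns a mapping of parameter name to current value.
    return client.config_get(PARAM).get(PARAM)
```

In a Redis Cluster, `CONFIG SET` applies to one instance at a time, so the call has to be repeated against every node (masters and replicas) that should use the new value.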