Redis performance monitoring and troubleshooting

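We ran into a Redis performance bottleneck, so I am writing down what I learned while troubleshooting it for future reference. This post only organizes the approach; it does not go into detailed usage of each tool.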

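The general approach is as follows:

1. Check the slow log

See http://www.cnblogs.com/onmyway20xx/p/5486604.html

   Are there any obviously slow queries? If slow queries show up, it is generally taken as a sign that Redis already has a fairly clear performance bottleneck.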

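2. Check the output of INFO

   INFO returns a lot of information; the parts usually worth watching are:

   # Memory
   used_memory_human:795.13K      # memory currently allocated by Redis; part of it may have been swapped out
   used_memory_rss:18259968       # memory the operating system has assigned to the Redis process (resident set size), in bytes
   used_memory_peak_human:9.51M   # peak memory used by Redis
   mem_fragmentation_ratio:22.43  # used_memory_rss / used_memory. When mem_fragmentation_ratio < 1, used_memory > used_memory_rss,
                                  # which means Redis is already using swap, and performance will suffer badly.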

3. Use redis-benchmark to measure the current server's performance

4. Use MONITOR to count how many Redis operations a single request performs
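
The memory check in step 2 can be reproduced by hand. The following is a minimal Python sketch, not part of Redis: the helper names (human_to_bytes, fragmentation_ratio, memory_hint) are made up for illustration, and because used_memory_human is a rounded value the computed ratio is only approximate.

```python
# Hypothetical helpers for illustration; none of these names come from Redis.
UNITS = {"B": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def human_to_bytes(text):
    """Convert a size such as '795.13K' to bytes (Redis uses 1024-based units)."""
    text = text.strip()
    unit = text[-1].upper()
    if unit in UNITS:
        return float(text[:-1]) * UNITS[unit]
    return float(text)  # plain byte count without a unit suffix


def fragmentation_ratio(used_memory, used_memory_rss):
    """Same formula as the INFO comment: used_memory_rss / used_memory."""
    return used_memory_rss / used_memory


def memory_hint(ratio):
    """Rough reading of the ratio; a real diagnosis needs more than this one number."""
    if ratio < 1.0:
        return "rss < used_memory: part of the data has probably been swapped out"
    return "rss >= used_memory: no sign of swapping from this ratio alone"


# Values taken from the INFO sample above.
used = human_to_bytes("795.13K")
ratio = fragmentation_ratio(used, 18259968)
print(round(ratio, 2), "-", memory_hint(ratio))
```
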

1. INFO

The INFO command returns information about the server, including:

server: General information about the Redis server
clients: Client connections section
memory: Memory consumption related information
persistence: RDB and AOF related information
stats: General statistics
replication: Master/slave replication information
cpu: CPU consumption statistics
commandstats: Redis command statistics
cluster: Redis Cluster section
keyspace: Database related statistics

It also accepts an optional section argument to control what is returned:

[root@~]# redis-cli info
[root@~]# redis-cli info default
[root@~]# redis-cli info all

For details, see: http://www.redis.cn/commands/info.html
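
For scripted monitoring it helps to turn the INFO text into a dictionary. The sketch below is one possible way to do that in Python, assuming the usual layout of "# Section" headers followed by key:value lines. parse_info is a hypothetical helper name, and the sample text reuses the memory values shown earlier in this post.

```python
def parse_info(text):
    """Parse INFO output into {section_name: {key: value}} (values stay strings)."""
    sections = {}
    current = {}  # lines seen before the first '# Section' header are discarded
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            # A header such as '# Memory' starts a new section.
            current = sections.setdefault(line[1:].strip().lower(), {})
            continue
        key, sep, value = line.partition(":")
        if sep:
            current[key] = value
    return sections


SAMPLE = """# Memory
used_memory_human:795.13K
used_memory_rss:18259968
used_memory_peak_human:9.51M
mem_fragmentation_ratio:22.43
"""

info = parse_info(SAMPLE)
print(info["memory"]["mem_fragmentation_ratio"])
```

In practice the text would come from running redis-cli info and capturing its output.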

2. MONITOR

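MONITOR is a debugging command that streams back every command the server processes, which helps us see what operations are being run against the database. There are three ways to use it: as an argument to redis-cli, over a raw telnet session, or from the interactive redis-cli prompt: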

[root@~]# redis-cli monitor
OK
1417532512.619715 [0 127.0.0.1:55043] "REPLCONF" "ACK" "6623624"
[root@~]# telnet 127.0.0.1 6379
Trying 127.0.0.1...
Connected to 127.0.0.1.
Escape character is '^]'.
monitor
+OK
+1417532567.733458 [0 127.0.0.1:55043] "REPLCONF" "ACK" "6623708"
+1417532568.735936 [0 127.0.0.1:55043] "REPLCONF" "ACK" "6623708"
quit
+OK
Connection closed by foreign host.
[root@~]# redis-cli 
127.0.0.1:6379> monitor
OK
1417532590.785487 [0 127.0.0.1:55043] "REPLCONF" "ACK" "6623736"
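
To count how many Redis operations a request triggers, the MONITOR output can be captured to a file and tallied. The following Python sketch assumes the line format shown above (timestamp, [db client-address], then quoted arguments); parse_monitor_line and count_commands are hypothetical names, and the quoting is handled only approximately via shlex.

```python
import re
import shlex
from collections import Counter

# timestamp [db client] "CMD" "arg" ... ; telnet output carries a leading '+'.
MONITOR_LINE = re.compile(r'^\+?(\d+\.\d+) \[(\d+) (\S+)\] (.+)$')


def parse_monitor_line(line):
    """Return a dict for one MONITOR line, or None for lines such as 'OK'."""
    match = MONITOR_LINE.match(line.strip())
    if match is None:
        return None
    timestamp, db, client, rest = match.groups()
    args = shlex.split(rest)  # simplification: binary escapes are not decoded
    if not args:
        return None
    return {
        "time": float(timestamp),
        "db": int(db),
        "client": client,
        "command": args[0].upper(),
        "args": args[1:],
    }


def count_commands(lines):
    """Count how often each command name appears in captured MONITOR output."""
    parsed = (parse_monitor_line(line) for line in lines)
    return Counter(entry["command"] for entry in parsed if entry is not None)


captured = [
    "OK",
    '1417532512.619715 [0 127.0.0.1:55043] "REPLCONF" "ACK" "6623624"',
    '+1417532567.733458 [0 127.0.0.1:55043] "REPLCONF" "ACK" "6623708"',
    '+1417532568.735936 [0 127.0.0.1:55043] "REPLCONF" "ACK" "6623708"',
]
counts = count_commands(captured)
print(counts.most_common())
```
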

Because MONITOR returns every command the server processes, it has a performance cost. Results from the official benchmark tool are shown below.

Benchmark results without MONITOR running:

[root@~/software/redis-2.8.17]# src/redis-benchmark -c 10 -n 100000 -q
PING_INLINE: 51020.41 requests per second
PING_BULK: 50607.29 requests per second
SET: 37257.82 requests per second
GET: 49800.80 requests per second
INCR: 38699.69 requests per second
LPUSH: 38910.51 requests per second
LPOP: 39277.30 requests per second
SADD: 54614.96 requests per second
SPOP: 51948.05 requests per second
LPUSH (needed to benchmark LRANGE): 38819.88 requests per second
LRANGE_100 (first 100 elements): 20112.63 requests per second
LRANGE_300 (first 300 elements): 9025.27 requests per second
LRANGE_500 (first 450 elements): 6836.67 requests per second
LRANGE_600 (first 600 elements): 5406.28 requests per second
MSET (10 keys): 19394.88 requests per second

Benchmark results with MONITOR running (redis-cli monitor > /dev/null):

[root@~/software/redis-2.8.17]# src/redis-benchmark -c 10 -n 100000 -q
PING_INLINE: 42211.91 requests per second
PING_BULK: 42936.88 requests per second
SET: 26143.79 requests per second
GET: 33990.48 requests per second
INCR: 26553.37 requests per second
LPUSH: 27337.34 requests per second
LPOP: 27225.70 requests per second
SADD: 30459.95 requests per second
SPOP: 39494.47 requests per second
LPUSH (needed to benchmark LRANGE): 26315.79 requests per second
LRANGE_100 (first 100 elements): 22055.58 requests per second
LRANGE_300 (first 300 elements): 8104.38 requests per second
LRANGE_500 (first 450 elements): 6371.05 requests per second
LRANGE_600 (first 600 elements): 5031.95 requests per second
MSET (10 keys): 14861.05 requests per second

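Most of the figures drop: SET falls by about 30% and SADD by about 44%, for example. LRANGE_100 is slightly higher in the second run, which is most likely run-to-run variation.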

For details, see: http://www.redis.cn/commands/monitor.html
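
The size of the drop is easier to judge as a percentage. The sketch below parses redis-benchmark -q output and computes the relative change per test; parse_benchmark and percent_change are hypothetical helper names, and the sample strings contain a few lines copied from the two runs above.

```python
import re

# Matches lines such as 'SET: 37257.82 requests per second'.
BENCH_LINE = re.compile(r'^(.+?): ([\d.]+) requests per second$')


def parse_benchmark(text):
    """Map each test name to its requests-per-second figure."""
    results = {}
    for line in text.splitlines():
        match = BENCH_LINE.match(line.strip())
        if match:
            results[match.group(1)] = float(match.group(2))
    return results


def percent_change(before, after):
    """Relative change in percent for every test present in both runs."""
    return {
        name: (after[name] - before[name]) / before[name] * 100.0
        for name in before
        if name in after
    }


WITHOUT_MONITOR = """SET: 37257.82 requests per second
GET: 49800.80 requests per second
SADD: 54614.96 requests per second
LRANGE_100 (first 100 elements): 20112.63 requests per second
"""

WITH_MONITOR = """SET: 26143.79 requests per second
GET: 33990.48 requests per second
SADD: 30459.95 requests per second
LRANGE_100 (first 100 elements): 22055.58 requests per second
"""

change = percent_change(parse_benchmark(WITHOUT_MONITOR), parse_benchmark(WITH_MONITOR))
for name, pct in change.items():
    print(f"{name}: {pct:+.1f}%")
```
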

3. SLOWLOG

SLOWLOG reads the slow query log.

SLOWLOG LEN returns the current length of the slow log.

[root@~/software/redis-2.8.17]# redis-cli 
127.0.0.1:6379> slowlog len
(integer) 28

SLOWLOG GET returns the slow log entries. Without a count argument it returns only the 10 most recent entries by default.

127.0.0.1:6379> slowlog get 
 1) 1) (integer) 27
    2) (integer) 1417531320
    3) (integer) 24623
    4) 1) "info"

The fields of each entry are, in order:

A unique progressive identifier for every slow log entry.
The unix timestamp at which the logged command was processed.
The amount of time needed for its execution, in microseconds (note: microseconds, not milliseconds).
The array composing the arguments of the command.

SLOWLOG GET N returns the N most recent slow log entries.

127.0.0.1:6379> slowlog get 2
1) 1) (integer) 27
  2) (integer) 1417531320
  3) (integer) 24623
  4) 1) "info"
2) 1) (integer) 26
  2) (integer) 1417528379
  3) (integer) 21363
  4) 1) "get"
    2) "user:score"
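
Since the raw entries use a Unix timestamp and microseconds, a small conversion makes them easier to read. This is a minimal Python sketch that assumes each entry has already been loaded as a list in the four-field order described above; describe_slowlog_entry is a hypothetical helper, and the two entries are the ones from the output above.

```python
from datetime import datetime, timezone


def describe_slowlog_entry(entry):
    """Convert [id, unix_timestamp, duration_us, args] into a readable dict."""
    entry_id, timestamp, duration_us, args = entry[:4]
    return {
        "id": entry_id,
        "logged_at": datetime.fromtimestamp(timestamp, tz=timezone.utc).isoformat(),
        "duration_ms": duration_us / 1000.0,  # microseconds -> milliseconds
        "command": " ".join(args),
    }


entries = [
    [27, 1417531320, 24623, ["info"]],
    [26, 1417528379, 21363, ["get", "user:score"]],
]
described = [describe_slowlog_entry(entry) for entry in entries]
for item in described:
    print(item)
```

Read this way, the INFO call in entry 27 took about 24.6 ms, not 24.6 seconds.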

SLOWLOG RESET clears the slow log. Once it is executed, all previous slow log entries are lost.

127.0.0.1:6379> slowlog reset
4. Troubleshooting Redis latency

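Recently, as the data volume kept growing and many writes were happening concurrently, Redis started responding slowly.

redis-cli can be used to measure Redis's response time: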

redis-cli --latency -h xxx -p xxxx

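This command keeps sending PING to the Redis server and reports the minimum, maximum, and average round-trip latency; it does not insert any data. Press Ctrl+C at any time to stop the test.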
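
The same kind of measurement can be sketched in Python for use inside an application. This is only an approximation of the idea (time repeated round trips, then report min, max, and average), not a reimplementation of redis-cli. measure_latency is a hypothetical helper that takes any zero-argument callable; with a client library such as redis-py one would presumably pass the client's ping method, but a stub is used here so the sketch runs without a server.

```python
import time


def measure_latency(ping, samples=100, clock=time.perf_counter):
    """Time `samples` calls to `ping` and return min/max/avg in milliseconds."""
    timings = []
    for _ in range(samples):
        start = clock()
        ping()  # one request/response round trip
        timings.append((clock() - start) * 1000.0)
    return {
        "min": min(timings),
        "max": max(timings),
        "avg": sum(timings) / len(timings),
        "samples": len(timings),
    }


def fake_ping():
    """Stand-in for a real PING so the example needs no Redis server."""
    return True


stats = measure_latency(fake_ping, samples=5)
print("min: {min:.3f} ms, max: {max:.3f} ms, avg: {avg:.3f} ms ({samples} samples)".format(**stats))
```
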

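The possible causes of latency are listed below:

Hardware and OS: hardware is the lowest layer of every problem. If the hardware is slow (low CPU clock speed, little memory, slow disk I/O), everything running on it responds slowly. Running inside a virtual machine can also reduce performance considerably. With enough budget, this is easy to solve. On the OS side, Linux's own resource scheduling adds some latency too; it is usually small and can be ignored.
Network: if the client and Redis run on the same server, connecting through a Unix domain socket is noticeably faster than going through a TCP/IP port.
Redis commands: commands with high time complexity, such as LREM, SORT, and SUNION, can take a long time. Repeatedly opening new connections also adds latency, so reuse connections. For large numbers of writes, use pipelining: the client sends a batch of commands without waiting for each reply, which cuts the number of network round trips and is therefore much faster. Note that pipelining is only a batching mechanism and, unlike a transaction (MULTI/EXEC), is not atomic.
Persistence: Redis has to fork a child process to persist data, and the fork itself causes latency. If the data changes a lot and the RDB save interval is short, the cost is significant. On top of that, disks can be slow and unpredictable; on a virtual disk inside a VM, or worse, on an NFS share, the latency becomes painful. So if the system does not need persistence, turn it off.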
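
To make the pipelining point concrete, the sketch below encodes several commands in the Redis wire protocol (RESP arrays of bulk strings) and concatenates them into one buffer, which is what lets a client send them in a single write instead of one round trip each. encode_command and build_pipeline are hypothetical names, and the key names are made up for illustration; a real client library would normally do this for you.

```python
def encode_command(*args):
    """Encode one command as a RESP array of bulk strings."""
    parts = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg if isinstance(arg, bytes) else str(arg).encode("utf-8")
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(parts)


def build_pipeline(commands):
    """Concatenate several encoded commands into a single buffer."""
    return b"".join(encode_command(*command) for command in commands)


# Hypothetical keys; three writes that would otherwise cost three round trips.
payload = build_pipeline([
    ("SET", "user:1:score", 10),
    ("SET", "user:2:score", 20),
    ("INCR", "writes"),
])
print(payload)
```

The whole payload could then be written to the socket at once, with the three replies read back afterwards in order.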
Related references: https://blog.csdn.net/li123128/article/details/85531376
https://blog.csdn.net/xuyunti/article/details/84766936