早上业务监控发现主备出现较高延迟,观察slave status如下:
mysql> show slave status\G *************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Host: 10.62.52.115 Master_User: sla Master_Port: 8803 Connect_Retry: 2 Master_Log_File: mysql-bin.000029 Read_Master_Log_Pos: 49766 Relay_Log_File: relay-bin.000046 Relay_Log_Pos: 10963 Relay_Master_Log_File: mysql-bin.000016 Slave_IO_Running: Yes Slave_SQL_Running: Yes Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 0 Last_Error: Skip_Counter: 0 Exec_Master_Log_Pos: 10790 Relay_Log_Space: 21289747354 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Master_SSL_Allowed: No Master_SSL_CA_File: Master_SSL_CA_Path: Master_SSL_Cert: Master_SSL_Cipher: Master_SSL_Key: Seconds_Behind_Master: 0 Master_SSL_Verify_Server_Cert: No Last_IO_Errno: 0 Last_IO_Error: Last_SQL_Errno: 0 Last_SQL_Error: Replicate_Ignore_Server_Ids: Master_Server_Id: 521158803 Master_UUID: 97946853-a581-11ea-a199-fcbcd17390d4 Master_Info_File: mysql.slave_master_info SQL_Delay: 0 SQL_Remaining_Delay: NULL Slave_SQL_Running_State: Waiting for Slave Workers to free pending events Master_Retry_Count: 86400 Master_Bind: Last_IO_Error_Timestamp: Last_SQL_Error_Timestamp: Master_SSL_Crl: Master_SSL_Crlpath: Retrieved_Gtid_Set: 97946853-a581-11ea-a199-fcbcd17390d4:856187-1631090 Executed_Gtid_Set: 97946853-a581-11ea-a199-fcbcd17390d4:1-1614044, e00cb509-a581-11ea-be11-f41d6b6ff7b3:1-587 Auto_Position: 1 Replicate_Rewrite_DB: Channel_Name: Master_TLS_Version: 1 row in set (0.00 sec)当前显示主从并无异常,但是Slave_SQL_Running_State状态让人起疑。
Slave_SQL_Running_State: Waiting for Slave Workers to free pending events按照大佬们的说法先改成单线程复制,等复制延迟追上后再换回多线程。 执行stop slave;时卡住,无法退出。观察另一套从库发现正在执行一条delete 8w+秒。 由于无法预估影响时间,且从库不做读库使用,直接选择重做主从。
