MySQL排障实战(一)—— 连接异常中断
问题背景
数栈数据质量模块,接入客户的数据源后,一执行就报错。报错信息:
{"logInfo": {{"jobid":"1a4ebbbd","msg_info":"2021-12-20 14:23:54:submit job is success"}} "engineLog": {com.mysql.cj.jdbc.exceptions.CommunicationsException: Communications link failure The last packet successfully received from the server was 43 milliseconds ago. The last packet sent successfully to the server was 70 milliseconds ago.}
问题分析
通常这类错误是数据库连接空闲时间超过wait_timeout,连接被服务端回收。但是这个场景下,新建立的连接立刻就中断了。
产品功能运行几秒钟就会报错。
MySQL服务端日志
错误日志里有大量报错
2021-12-20T01:43:32.319449Z 102 [Note] Aborted connection 102 to db: 'idatatesta' user: 'acc' host: 'BF-202111221553' (Got an error reading communication packets) 2021-12-20T01:43:32.495966Z 103 [Note] Aborted connection 103 to db: 'idatatestb' user: 'acc' host: 'BF-202111221553' (Got an error reading communication packets) 2021-12-20T01:43:32.898337Z 104 [Note] Aborted connection 104 to db: 'idatatesta' user: 'acc' host: 'BF-202111221553' (Got an error reading communication packets) 2021-12-20T01:43:32.993156Z 105 [Note] Aborted connection 105 to db: 'idatatestb' user: 'acc' host: 'BF-202111221553' (Got an error reading communication packets) 2021-12-20T01:47:09.177221Z 108 [Note] Aborted connection 108 to db: 'idatatesta' user: 'acc' host: 'BF-202111221553' (Got an error reading communication packets) 2021-12-20T01:47:09.304512Z 109 [Note] Aborted connection 109 to db: 'idatatesta' user: 'acc' host: 'BF-202111221553' (Got an error reading communication packets) 2021-12-20T01:47:42.890847Z 110 [Note] Aborted connection 110 to db: 'idatatesta' user: 'acc' host: 'BF-202111221553' (Got an error reading communication packets)
在客户实例上重现问题
使用这个数据源同步数据没有遇到问题。研发把数据质量的业务代码单独拉出来,做了一个测试案例,一执行就会报错。
使用java代码测试
import java.sql.*; public class Test { public static void main(String[] args) { try { test(); } catch (SQLException throwables) { throwables.printStackTrace(); } } public static void test() throws SQLException { String jdbcUrl = "jdbc:mysql://host:port/idatatesta?useUnicode=true&queryTimeout=2400&characterEncoding=utf-8&tinyInt1isBit=false&useSSL=false&serverTimezone=Asia/Shanghai&autoReconnect=true&failOverReadOnly=false&allowMultiQueries=true"; String username = ""; String password = ""; String processSql = "CREATE PROCEDURE xxxaabb() BEGIN\n" + " DROP TABLE\n" + " IF\n" + " EXISTS idatatesta.temp_data_24c92b71;\n" + " set @val_0 =(\n" + " SELECT\n" + " count( 1 ) AS null_count \n" + " FROM\n" + " `idatatesta`.`user` \n" + " WHERE\n" + " NAME IS NULL \n" + " );\n" + " CREATE TABLE\n" + " IF\n" + " NOT EXISTS idatatesta.temp_data_24c92b71 (\n" + " `tenant_id` INT ( 11 ) NOT NULL DEFAULT '0',\n" + " `monitor_id` INT ( 11 ) NOT NULL,\n" + " `rule_id` INT ( 11 ) NOT NULL,\n" + " `record_id` INT ( 11 ) NOT NULL,\n" + " `val` VARCHAR ( 255 ) \n" + " ) ENGINE = INNODB DEFAULT CHARSET = utf8;\n" + " insert INTO idatatesta.temp_data_24c92b71 ( tenant_id, monitor_id, rule_id, record_id, val )\n" + " VALUES\n" + " ( 57, 507, 1623, 13819, @val_0 );\n" + " drop TABLE\n" + " IF\n" + " EXISTS `idatatesta`.`dq_monitor_24c92b71`;\n" + " CREATE TABLE `idatatesta`.`dq_monitor_24c92b71` (\n" + " `id` INT ( 11 ) UNSIGNED NOT NULL COMMENT '编号',\n" + " `name` VARCHAR ( 255 ) DEFAULT NULL COMMENT '姓名',\n" + " `create_date` datetime DEFAULT NULL COMMENT '创建时间',\n" + " `update_date` datetime DEFAULT NULL ON UPDATE CURRENT_TIMESTAMP COMMENT '修改>时间',\n" + " `sex` VARCHAR ( 4 ) DEFAULT NULL COMMENT '0:女,1:男' \n" + " ) ENGINE = INNODB DEFAULT CHARSET = utf8 COMMENT = '用户表';\n" + " ALTER TABLE `idatatesta`.`dq_monitor_24c92b71` ADD `dq_job_id` VARCHAR ( 100 );\n" + " ALTER TABLE `idatatesta`.`dq_monitor_24c92b71` ADD `dq_rule_id` INT;\n" + " ALTER TABLE `idatatesta`.`dq_monitor_24c92b71` ADD cyc_time datetime;\n" + " ALTER TABLE `idatatesta`.`dq_monitor_24c92b71` ADD check_column VARCHAR ( 100 );\n" + " INSERT INTO `idatatesta`.`dq_monitor_24c92b71` SELECT\n" + " *,\n" + " '24c92b71',\n" + " 1623,\n" + " '2021-12-24 13:52:26',\n" + " 'name' \n" + " FROM\n" + " `idatatesta`.`user` \n" + " WHERE\n" + " NAME IS NULL;\n" + "end;"; Connection connection = DriverManager.getConnection(jdbcUrl, username, password); Statement statement = connection.createStatement(); System.out.println("XXX"); System.out.println(processSql); statement.execute(processSql); System.out.println("YYY"); //调用存储过程 String procCall = "call xxxaabb()"; CallableStatement callableStatement = connection.prepareCall(procCall); System.out.println("ZZZ"); callableStatement.execute(); System.out.println("AAA"); } }
测试代码很简单,就是创建一个存储过程,然后再调用存储过程。
代码在创建存储过程的时候就会抛异常。
更换jdbc驱动
使用最新版本的jdbc驱动测试,一样会报同样的错。
使用mysql客户端测试
将创建存储过程的SQL单独复制出来,使用mysql客户端连接,创建存储过程,也会报错。
ERROR 2013 (HY000): Lost connection to MySQL server during query
简化测试案例
用最简单的SQL测试,同样会报错。只要sql语句超过一定长度,就能复现问题。
mysql> select 'VVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVV' as c; ERROR 2013 (HY000): Lost connection to MySQL server during query
搭建测试环境尝试
客户的mysql安装在windows上,版本是5.7.35。我们在windows服务器上搭建一个类似的环境做测试,没有复现问题,排查mysql bug。
安装mysql
下载mysql windows软件包、
解压
初始化数据库
启动mysqld
修改root默认密码
mysqld --initialize mysqld
默认密码在datadir下的错误日志中:
2021-12-30T07:07:58.587500Z 0 [Warning] CA certificate ca.pem is self signed. 2021-12-30T07:07:59.837500Z 1 [Note] A temporary password is generated for root@localhost: s.rntBi)D3(b
测试
安装完成后测试,并没有复现问题。
到客户环境复现问题
客户一直说他的环境没有问题。我们登录客户环境,使用mysql客户端连接数据库,使用同样的SQL复现了问题。
客户之前使用navicat,没有复现问题,但是使用mysql命令行客户端很快就复现了问题。
客户环境下,使用127.0.0.1访问数据库,执行sql不会报错。
确认网络环境
跟客户了解,使用了花生壳做内网穿透。客户基本也认可了问题出在花生壳上。
安装花生壳
我们在自己的服务器上也安装了花生壳,尝试重现问题。然而并没有问题。
确认客户的花生壳端口映射配置
这里的映射类型是HTTP,
修改映射类型
建议客户将映射类型改成TCP。修改后再测试,没有再出现执行长SQL就会连接断开的现象。
总结
这是一个奇葩的问题。
我们的思路是
复现问题
用最简单的场景复现问题
对比差异(参数、网络环境、软件版本等)
定位和解决问题