首页 > 代码库 > MHA+ProxySQL实现读写分离高可用

MHA+ProxySQL实现读写分离高可用

最近在研究ProxySQL,觉得还挺不错的,所以就简单的折腾了一下,ProxySQL目前也是Percona在推荐的一个读写分离的中间件。关于详细的介绍可以参考官方文档。https://github.com/sysown/proxysql/wiki

本文主要介绍的是MHA+ProxySQL读写分离以及高可用,ProxySQL的细节请参考文档,目前已经有人写发非常详细了,文章最后会给出链接。当然和Group Replication,PXC搭配那是更完美的。关于MHA的配置参考我前面的文章,本文就不再介绍。下面来看看

MHA和ProxySQL搭配使用的架构图:

技术分享

 

测试环境(1主2从):

M-> 192.168.0.20S1-> 192.168.0.30S2-> 192.168.0.40proxysql -> 192.168.0.10

1. 检查主从复制是否正常:

技术分享
[root@db_server_yayun_04 ~]# masterha_check_repl --conf=/data/mha/3306/mha.cnf Thu Jun 15 17:45:32 2017 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.Thu Jun 15 17:45:32 2017 - [info] Reading application default configuration from /data/mha/3306/mha.cnf..Thu Jun 15 17:45:32 2017 - [info] Updating application default configuration from /usr/local/bin/load_cnf..Thu Jun 15 17:45:32 2017 - [info] Reading server configuration from /data/mha/3306/mha.cnf..Thu Jun 15 17:45:32 2017 - [info] Setting max_ping_errors to 10, ping_interval to 3.Thu Jun 15 17:45:32 2017 - [info] MHA::MasterMonitor version 0.57.Thu Jun 15 17:45:32 2017 - [info] GTID failover mode = 1Thu Jun 15 17:45:32 2017 - [info] Dead Servers:Thu Jun 15 17:45:32 2017 - [info] Alive Servers:Thu Jun 15 17:45:32 2017 - [info]   192.168.0.20(192.168.0.20:3306)Thu Jun 15 17:45:32 2017 - [info]   192.168.0.30(192.168.0.30:3306)Thu Jun 15 17:45:32 2017 - [info]   192.168.0.40(192.168.0.40:3306)Thu Jun 15 17:45:32 2017 - [info] Alive Slaves:Thu Jun 15 17:45:32 2017 - [info]   192.168.0.30(192.168.0.30:3306)  Version=5.7.17-log (oldest major version between slaves) log-bin:enabledThu Jun 15 17:45:32 2017 - [info]     GTID ONThu Jun 15 17:45:32 2017 - [info]     Replicating from 192.168.0.20(192.168.0.20:3306)Thu Jun 15 17:45:32 2017 - [info]     Primary candidate for the new Master (candidate_master is set)Thu Jun 15 17:45:32 2017 - [info]   192.168.0.40(192.168.0.40:3306)  Version=5.7.17-log (oldest major version between slaves) log-bin:enabledThu Jun 15 17:45:32 2017 - [info]     GTID ONThu Jun 15 17:45:32 2017 - [info]     Replicating from 192.168.0.20(192.168.0.20:3306)Thu Jun 15 17:45:32 2017 - [info]     Primary candidate for the new Master (candidate_master is set)Thu Jun 15 17:45:32 2017 - [info] Current Alive Master: 192.168.0.20(192.168.0.20:3306)Thu Jun 15 17:45:32 2017 - [info] Checking slave configurations..Thu Jun 15 17:45:32 2017 - [info] Checking replication filtering settings..Thu Jun 15 17:45:32 2017 - [info]  binlog_do_db= , binlog_ignore_db= Thu Jun 15 17:45:32 2017 - [info]  Replication filtering check ok.Thu Jun 15 17:45:32 2017 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.Thu Jun 15 17:45:32 2017 - [info] Checking SSH publickey authentication settings on the current master..Thu Jun 15 17:45:33 2017 - [info] HealthCheck: SSH to 192.168.0.20 is reachable.Thu Jun 15 17:45:33 2017 - [info] 192.168.0.20(192.168.0.20:3306) (current master) +--192.168.0.30(192.168.0.30:3306) +--192.168.0.40(192.168.0.40:3306)Thu Jun 15 17:45:33 2017 - [info] Checking replication health on 192.168.0.30..Thu Jun 15 17:45:33 2017 - [info]  ok.Thu Jun 15 17:45:33 2017 - [info] Checking replication health on 192.168.0.40..Thu Jun 15 17:45:33 2017 - [info]  ok.Thu Jun 15 17:45:33 2017 - [warning] master_ip_failover_script is not defined.Thu Jun 15 17:45:33 2017 - [warning] shutdown_script is not defined.Thu Jun 15 17:45:33 2017 - [info] Got exit code 0 (Not master dead).MySQL Replication Health is OK.[root@db_server_yayun_04 ~]# 
View Code

2. 配置后端MySQL。登入ProxySQL,把MySQL主从的信息添加进去。将主库master也就是做写入的节点放到HG 100中,salve节点做读放到HG 1000。在proxysql输入命令:

mysql -uadmin -padmin -h127.0.0.1 -P6032
[admin@127.0.0.1][(none)]> insert into mysql_servers(hostgroup_id,hostname,port,weight,max_connections,max_replication_lag,comment) values(100,192.168.0.20,3306,1,1000,10,test my proxysql);Query OK, 1 row affected (0.00 sec)[admin@127.0.0.1][(none)]> insert into mysql_servers(hostgroup_id,hostname,port,weight,max_connections,max_replication_lag,comment) values(1000,192.168.0.30,3306,1,1000,10,test my proxysql);Query OK, 1 row affected (0.00 sec)[admin@127.0.0.1][(none)]> insert into mysql_servers(hostgroup_id,hostname,port,weight,max_connections,max_replication_lag,comment) values(1000,192.168.0.40,3306,1,1000,10,test my proxysql); Query OK, 1 row affected (0.00 sec)[admin@127.0.0.1][(none)]> select * from mysql_servers;+--------------+--------------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+------------------+| hostgroup_id | hostname     | port | status | weight | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms | comment          |+--------------+--------------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+------------------+| 100          | 192.168.0.20 | 3306 | ONLINE | 1      | 0           | 1000            | 10                  | 0       | 0              | test my proxysql || 1000         | 192.168.0.30 | 3306 | ONLINE | 1      | 0           | 1000            | 10                  | 0       | 0              | test my proxysql || 1000         | 192.168.0.40 | 3306 | ONLINE | 1      | 0           | 1000            | 10                  | 0       | 0              | test my proxysql |+--------------+--------------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+------------------+3 rows in set (0.00 sec)

3. 配置后端MySQL用户。这个用户需要先在后端MySQL(20,30,40)里真实存在,一个是监控账号、一个是程序账号。
(1) 监控账号(用来监控后端mysql是否存活以及read_only变量):

GRANT SUPER, REPLICATION CLIENT ON *.* TO proxysql@192.168.0.10 IDENTIFIED BY proxysql;

(2) 程序账号(这里为了后面测试方便给了all权限):

GRANT all ON *.* TO yayun@192.168.0.10 identified by yayun;

4. 在后端MySQL里添加完之后再配置ProxySQL:这里需要注意,default_hostgroup需要和上面的对应。(proxysql)

[admin@127.0.0.1][(none)]> insert into mysql_users(username,password,active,default_hostgroup,transaction_persistent) values(yayun,yayun,1,100,1);Query OK, 1 row affected (0.00 sec)[admin@127.0.0.1][(none)]> select * from mysql_users;+----------+----------+--------+---------+-------------------+----------------+---------------+------------------------+--------------+---------+----------+-----------------+| username | password | active | use_ssl | default_hostgroup | default_schema | schema_locked | transaction_persistent | fast_forward | backend | frontend | max_connections |+----------+----------+--------+---------+-------------------+----------------+---------------+------------------------+--------------+---------+----------+-----------------+| yayun    | yayun    | 1      | 0       | 100               | NULL           | 0             | 1                      | 0            | 1       | 1        | 10000           |+----------+----------+--------+---------+-------------------+----------------+---------------+------------------------+--------------+---------+----------+-----------------+1 row in set (0.00 sec)[admin@127.0.0.1][(none)]> 

5. 设置健康监测账号(proxysql):

[admin@127.0.0.1][(none)]> set mysql-monitor_username=proxysql;Query OK, 1 row affected (0.00 sec)[admin@127.0.0.1][(none)]> set mysql-monitor_password=proxysql;Query OK, 1 row affected (0.00 sec)[admin@127.0.0.1][(none)]> 

6. 加载配置和变量:因为修改了servers、users和variables,所以加载的时候要执行:

[admin@127.0.0.1][(none)]> load mysql servers to runtime;Query OK, 0 rows affected (0.01 sec)[admin@127.0.0.1][(none)]> load mysql users to runtime;Query OK, 0 rows affected (0.00 sec)[admin@127.0.0.1][(none)]> load mysql variables to runtime;Query OK, 0 rows affected (0.00 sec)[admin@127.0.0.1][(none)]> save mysql servers to disk;Query OK, 0 rows affected (0.05 sec)[admin@127.0.0.1][(none)]> save mysql users to disk;Query OK, 0 rows affected (0.02 sec)[admin@127.0.0.1][(none)]> save mysql variables to disk;Query OK, 74 rows affected (0.01 sec)[admin@127.0.0.1][(none)]> 

7. 连接数据库,通过proxysql的客户端接口访问(6033):(我这里从40从库上面发起连接)

 mysql -uyayun -pyayun -h192.168.0.10 -P6033
[root@db_server_yayun_04 ~]# mysql -uyayun -pyayun -h192.168.0.10 -P6033               mysql: [Warning] Using a password on the command line interface can be insecure.Welcome to the MySQL monitor.  Commands end with ; or \g.Your MySQL connection id is 4Server version: 5.5.30 (ProxySQL)Copyright (c) 2000, 2016, Oracle and/or its affiliates. All rights reserved.Oracle is a registered trademark of Oracle Corporation and/or itsaffiliates. Other names may be trademarks of their respectiveowners.Type help; or \h for help. Type \c to clear the current input statement.[yayun@192.168.0.10][(none)]> show databases;+--------------------+| Database           |+--------------------+| information_schema || mysql              || performance_schema || sys                || yayun              |+--------------------+5 rows in set (0.01 sec)[yayun@192.168.0.10][(none)]> 

 8. 创建表并且写入数据进行查询:

[yayun@192.168.0.10][(none)]> use yayunDatabase changed, 2 warnings[yayun@192.168.0.10][yayun]> create table t1 ( id int);Query OK, 0 rows affected (0.08 sec)[yayun@192.168.0.10][yayun]> insert into t1 select 1;Query OK, 1 row affected (0.05 sec)Records: 1  Duplicates: 0  Warnings: 0[yayun@192.168.0.10][yayun]> select * from t1;+------+| id   |+------+|    1 |+------+1 row in set (0.00 sec)[yayun@192.168.0.10][yayun]> 

可以看到创建了表,并且插入了数据,查询也没问题。proxysql有个类似审计的功能,可以查看各类SQL的执行情况。在proxysql执行SQL查看。

[admin@127.0.0.1][(none)]> select * from stats_mysql_query_digest;+-----------+--------------------+----------+--------------------+----------------------------------+------------+------------+------------+----------+----------+----------+| hostgroup | schemaname         | username | digest             | digest_text                      | count_star | first_seen | last_seen  | sum_time | min_time | max_time |+-----------+--------------------+----------+--------------------+----------------------------------+------------+------------+------------+----------+----------+----------+| 100       | yayun              | yayun    | 0xA9518ABEA63705E6 | create table t1 ( id int)        | 1          | 1497577980 | 1497577980 | 79733    | 79733    | 79733    || 100       | yayun              | yayun    | 0x3765930C7143F468 | select * from t1                 | 1          | 1497577997 | 1497577997 | 1537     | 1537     | 1537     || 100       | yayun              | yayun    | 0x4BBB5CD4BC2CFD94 | insert into t1 select ?          | 1          | 1497577986 | 1497577986 | 33350    | 33350    | 33350    || 100       | information_schema | yayun    | 0x620B328FE9D6D71A | SELECT DATABASE()                | 1          | 1497577955 | 1497577955 | 4994     | 4994     | 4994     || 100       | information_schema | yayun    | 0x594F2C744B698066 | select USER()                    | 1          | 1497577951 | 1497577951 | 0        | 0        | 0        || 100       | information_schema | yayun    | 0x226CD90D52A2BA0B | select @@version_comment limit ? | 1          | 1497577951 | 1497577951 | 0        | 0        | 0        |+-----------+--------------------+----------+--------------------+----------------------------------+------------+------------+------------+----------+----------+----------+6 rows in set (0.00 sec)[admin@127.0.0.1][(none)]> 

9. 可以看到读写都发送到了组100上面,组100是主库,说明没有读写分离。那是因为还有配置没有完成,我们需要自己定义规则。
定义路由规则,如:除select * from tb for update的select全部发送到slave,其他的的语句发送到master。

[admin@127.0.0.1][(none)]> INSERT INTO mysql_query_rules(active,match_pattern,destination_hostgroup,apply) VALUES(1,^SELECT.*FOR UPDATE$,100,1);Query OK, 1 row affected (0.00 sec)[admin@127.0.0.1][(none)]> INSERT INTO mysql_query_rules(active,match_pattern,destination_hostgroup,apply) VALUES(1,^SELECT,1000,1);Query OK, 1 row affected (0.00 sec)[admin@127.0.0.1][(none)]> load mysql query rules to runtime;Query OK, 0 rows affected (0.00 sec)[admin@127.0.0.1][(none)]> save mysql query rules to disk;Query OK, 0 rows affected (0.02 sec)[admin@127.0.0.1][(none)]> select rule_id,active,match_pattern,destination_hostgroup,apply from runtime_mysql_query_rules;+---------+--------+----------------------+-----------------------+-------+| rule_id | active | match_pattern        | destination_hostgroup | apply |+---------+--------+----------------------+-----------------------+-------+| 1       | 1      | ^SELECT.*FOR UPDATE$ | 100                   | 1     || 2       | 1      | ^SELECT              | 1000                  | 1     |+---------+--------+----------------------+-----------------------+-------+2 rows in set (0.00 sec)[admin@127.0.0.1][(none)]> 

清理掉统计信息,再次进行测试。

select * from stats_mysql_query_digest_reset;

再次运行读写,然后再查看,发现已经实现读写分离。

[admin@127.0.0.1][(none)]> select * from stats_mysql_query_digest;                                                                    +-----------+------------+----------+--------------------+-------------------------+------------+------------+------------+----------+----------+----------+| hostgroup | schemaname | username | digest             | digest_text             | count_star | first_seen | last_seen  | sum_time | min_time | max_time |+-----------+------------+----------+--------------------+-------------------------+------------+------------+------------+----------+----------+----------+| 1000      | yayun      | yayun    | 0x3765930C7143F468 | select * from t1        | 1          | 1497578494 | 1497578494 | 21751    | 21751    | 21751    || 100       | yayun      | yayun    | 0x4BBB5CD4BC2CFD94 | insert into t1 select ? | 1          | 1497578492 | 1497578492 | 54852    | 54852    | 54852    |+-----------+------------+----------+--------------------+-------------------------+------------+------------+------------+----------+----------+----------+2 rows in set (0.00 sec)[admin@127.0.0.1][(none)]> 

重点来了,如何配合MHA实现高可用呢?那么需要利用到proxysql里面的mysql_replication_hostgroups表。mysql_replication_hostgroups 表的主要作用是监视指定主机组中所有服务器的read_only值,并且根据read_only的值将服务器分配给写入器或读取器主机组,定义 hostgroup 的主从关系。ProxySQL monitor 模块会监控 HG 后端所有servers 的 read_only 变量,如果发现从库的 read_only 变为0、主库变为1,则认为角色互换了,自动改写 mysql_servers 表里面 hostgroup 关系,达到自动 Failover 效果。

我们看看mysql_replication_hostgroups表结构:

[admin@127.0.0.1][(none)]> show create table mysql_replication_hostgroups\G*************************** 1. row ***************************       table: mysql_replication_hostgroupsCreate Table: CREATE TABLE mysql_replication_hostgroups (    writer_hostgroup INT CHECK (writer_hostgroup>=0) NOT NULL PRIMARY KEY,    reader_hostgroup INT NOT NULL CHECK (reader_hostgroup<>writer_hostgroup AND reader_hostgroup>0),    comment VARCHAR,    UNIQUE (reader_hostgroup))1 row in set (0.00 sec)[admin@127.0.0.1][(none)]> 

3个字段很明显,写主机组,读主机组,备注。那么我们现在插入数据。我们的100是写,1000是读。

[admin@127.0.0.1][(none)]> insert into  mysql_replication_hostgroups (writer_hostgroup,reader_hostgroup,comment)values(100,1000,测试我的读写分离高可用);Query OK, 1 row affected (0.00 sec)[admin@127.0.0.1][(none)]> load mysql servers to runtime;Query OK, 0 rows affected (0.01 sec)[admin@127.0.0.1][(none)]> save mysql servers to disk;Query OK, 0 rows affected (0.03 sec)[admin@127.0.0.1][(none)]> select * from runtime_mysql_replication_hostgroups;+------------------+------------------+-----------------------------------+| writer_hostgroup | reader_hostgroup | comment                           |+------------------+------------------+-----------------------------------+| 100              | 1000             | 测试我的读写分离高可用            |+------------------+------------------+-----------------------------------+1 row in set (0.00 sec)[admin@127.0.0.1][(none)]> 

我们用vc-mysql-sniffer在从库抓一下。看看能看见什么。

[root@db_server_yayun_03 ~]# ./vc-mysql-sniffer # Time: 061617 10:26:45# User@Host: unknown_user[unknown_user] @ 192.168.0.10:36371 []# Query_time: 0.005198SHOW GLOBAL VARIABLES LIKE read_only;# Time: 061617 10:26:46# User@Host: unknown_user[unknown_user] @ 192.168.0.10:36372 []# Query_time: 0.000564SHOW SLAVE STATUS;# Time: 061617 10:26:46# User@Host: unknown_user[unknown_user] @ 192.168.0.10:36372 []# Query_time: 0.004008SHOW GLOBAL VARIABLES LIKE read_only;# Time: 061617 10:26:48# User@Host: unknown_user[unknown_user] @ 192.168.0.10:36371 []# Query_time: 0.004923SHOW GLOBAL VARIABLES LIKE read_only;# Time: 061617 10:26:49# User@Host: unknown_user[unknown_user] @ 192.168.0.10:36372 []# Query_time: 0.003617SHOW GLOBAL VARIABLES LIKE read_only;

可以看见在检查read_only变量。我们先看看现在主机组的关系:

[admin@127.0.0.1][(none)]> select * from runtime_mysql_servers;+--------------+--------------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+------------------+| hostgroup_id | hostname     | port | status | weight | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms | comment          |+--------------+--------------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+------------------+| 100          | 192.168.0.20 | 3306 | ONLINE | 1      | 0           | 1000            | 10                  | 0       | 100            | test my proxysql || 1000         | 192.168.0.30 | 3306 | ONLINE | 1      | 0           | 1000            | 10                  | 0       | 100            | test my proxysql || 1000         | 192.168.0.40 | 3306 | ONLINE | 1      | 0           | 1000            | 10                  | 0       | 100            | test my proxysql |+--------------+--------------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+------------------+3 rows in set (0.00 sec)[admin@127.0.0.1][(none)]> 

可以看见100主机组是主库,1000主机组是从库。下面使用MHA在线变换主从关系。我们把从库40提升为主库。

技术分享
[root@db_server_yayun_04 ~]# masterha_master_switch --master_state=alive --conf=/data/mha/3306/mha.cnf --new_master_host=192.168.0.40 --new_master_port=3306 --orig_master_is_new_slave Fri Jun 16 10:30:39 2017 - [info] MHA::MasterRotate version 0.57.Fri Jun 16 10:30:39 2017 - [info] Starting online master switch..Fri Jun 16 10:30:39 2017 - [info] Fri Jun 16 10:30:39 2017 - [info] * Phase 1: Configuration Check Phase..Fri Jun 16 10:30:39 2017 - [info] Fri Jun 16 10:30:39 2017 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.Fri Jun 16 10:30:39 2017 - [info] Reading application default configuration from /data/mha/3306/mha.cnf..Fri Jun 16 10:30:39 2017 - [info] Updating application default configuration from /usr/local/bin/load_cnf..Fri Jun 16 10:30:39 2017 - [info] Reading server configuration from /data/mha/3306/mha.cnf..Fri Jun 16 10:30:39 2017 - [info] Setting max_ping_errors to 10, ping_interval to 3.Fri Jun 16 10:30:39 2017 - [info] GTID failover mode = 1Fri Jun 16 10:30:39 2017 - [info] Current Alive Master: 192.168.0.20(192.168.0.20:3306)Fri Jun 16 10:30:39 2017 - [info] Alive Slaves:Fri Jun 16 10:30:39 2017 - [info]   192.168.0.30(192.168.0.30:3306)  Version=5.7.17-log (oldest major version between slaves) log-bin:enabledFri Jun 16 10:30:39 2017 - [info]     GTID ONFri Jun 16 10:30:39 2017 - [info]     Replicating from 192.168.0.20(192.168.0.20:3306)Fri Jun 16 10:30:39 2017 - [info]     Primary candidate for the new Master (candidate_master is set)Fri Jun 16 10:30:39 2017 - [info]   192.168.0.40(192.168.0.40:3306)  Version=5.7.17-log (oldest major version between slaves) log-bin:enabledFri Jun 16 10:30:39 2017 - [info]     GTID ONFri Jun 16 10:30:39 2017 - [info]     Replicating from 192.168.0.20(192.168.0.20:3306)Fri Jun 16 10:30:39 2017 - [info]     Primary candidate for the new Master (candidate_master is set)It is better to execute FLUSH NO_WRITE_TO_BINLOG TABLES on the master before switching. Is it ok to execute on 192.168.0.20(192.168.0.20:3306)? (YES/no): yFri Jun 16 10:30:40 2017 - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. This may take long time..Fri Jun 16 10:30:40 2017 - [info]  ok.Fri Jun 16 10:30:40 2017 - [info] Checking MHA is not monitoring or doing failover..Fri Jun 16 10:30:40 2017 - [info] Checking replication health on 192.168.0.30..Fri Jun 16 10:30:40 2017 - [info]  ok.Fri Jun 16 10:30:40 2017 - [info] Checking replication health on 192.168.0.40..Fri Jun 16 10:30:40 2017 - [info]  ok.Fri Jun 16 10:30:40 2017 - [info] 192.168.0.40 can be new master.Fri Jun 16 10:30:40 2017 - [info] From:192.168.0.20(192.168.0.20:3306) (current master) +--192.168.0.30(192.168.0.30:3306) +--192.168.0.40(192.168.0.40:3306)To:192.168.0.40(192.168.0.40:3306) (new master) +--192.168.0.30(192.168.0.30:3306) +--192.168.0.20(192.168.0.20:3306)Starting master switch from 192.168.0.20(192.168.0.20:3306) to 192.168.0.40(192.168.0.40:3306)? (yes/NO): yesFri Jun 16 10:30:42 2017 - [info] Checking whether 192.168.0.40(192.168.0.40:3306) is ok for the new master..Fri Jun 16 10:30:42 2017 - [info]  ok.Fri Jun 16 10:30:42 2017 - [info] 192.168.0.20(192.168.0.20:3306): SHOW SLAVE STATUS returned empty result. To check replication filtering rules, temporarily executing CHANGE MASTER to a dummy host.Fri Jun 16 10:30:42 2017 - [info] 192.168.0.20(192.168.0.20:3306): Resetting slave pointing to the dummy host.Fri Jun 16 10:30:42 2017 - [info] ** Phase 1: Configuration Check Phase completed.Fri Jun 16 10:30:42 2017 - [info] Fri Jun 16 10:30:42 2017 - [info] * Phase 2: Rejecting updates Phase..Fri Jun 16 10:30:42 2017 - [info] Fri Jun 16 10:30:42 2017 - [info] Executing master ip online change script to disable write on the current master:Fri Jun 16 10:30:42 2017 - [info]   /usr/local/bin/master_ip_online_change --command=stop --orig_master_host=192.168.0.20 --orig_master_ip=192.168.0.20 --orig_master_port=3306 --orig_master_user=root --orig_master_password=123 --new_master_host=192.168.0.40 --new_master_ip=192.168.0.40 --new_master_port=3306 --new_master_user=root --new_master_password=123 --orig_master_ssh_user=root --new_master_ssh_user=root   --orig_master_is_new_slaveFri Jun 16 10:30:42 2017 783184 Set read_only on the new master.. ok.Fri Jun 16 10:30:42 2017 786894 Drpping app user on the orig master..Fri Jun 16 10:30:42 2017 787716 Waiting all running 3 threads are disconnected.. (max 1500 milliseconds){Time => 0,Command => Sleep,db => undef,Id => 282,Info => undef,User => proxysql,State => ‘‘,Host => 192.168.0.10:54482}{Time => 1620,Command => Binlog Dump GTID,db => undef,Id => 296,Info => undef,User => repl,State => Master has sent all binlog to slave; waiting for more updates,Host => 192.168.0.30:53451}{Time => 874,Command => Binlog Dump GTID,db => undef,Id => 310,Info => undef,User => repl,State => Master has sent all binlog to slave; waiting for more updates,Host => 192.168.0.40:53136}Fri Jun 16 10:30:43 2017 294399 Waiting all running 2 threads are disconnected.. (max 1000 milliseconds){Time => 1621,Command => Binlog Dump GTID,db => undef,Id => 296,Info => undef,User => repl,State => Master has sent all binlog to slave; waiting for more updates,Host => 192.168.0.30:53451}{Time => 875,Command => Binlog Dump GTID,db => undef,Id => 310,Info => undef,User => repl,State => Master has sent all binlog to slave; waiting for more updates,Host => 192.168.0.40:53136}Fri Jun 16 10:30:43 2017 800400 Waiting all running 2 threads are disconnected.. (max 500 milliseconds){Time => 1621,Command => Binlog Dump GTID,db => undef,Id => 296,Info => undef,User => repl,State => Master has sent all binlog to slave; waiting for more updates,Host => 192.168.0.30:53451}{Time => 875,Command => Binlog Dump GTID,db => undef,Id => 310,Info => undef,User => repl,State => Master has sent all binlog to slave; waiting for more updates,Host => 192.168.0.40:53136}Fri Jun 16 10:30:44 2017 306937 Set read_only=1 on the orig master.. ok.Fri Jun 16 10:30:44 2017 311939 Waiting all running 3 queries are disconnected.. (max 500 milliseconds){Time => 0,Command => Sleep,db => undef,Id => 281,Info => undef,User => proxysql,State => ‘‘,Host => 192.168.0.10:54481}{Time => 1622,Command => Binlog Dump GTID,db => undef,Id => 296,Info => undef,User => repl,State => Master has sent all binlog to slave; waiting for more updates,Host => 192.168.0.30:53451}{Time => 876,Command => Binlog Dump GTID,db => undef,Id => 310,Info => undef,User => repl,State => Master has sent all binlog to slave; waiting for more updates,Host => 192.168.0.40:53136}Fri Jun 16 10:30:44 2017 810421 Killing all application threads..Fri Jun 16 10:30:44 2017 816513 done.SIOCSIFFLAGS: Cannot assign requested addressFri Jun 16 10:30:45 2017 - [info]  ok.Fri Jun 16 10:30:45 2017 - [info] Locking all tables on the orig master to reject updates from everybody (including root):Fri Jun 16 10:30:45 2017 - [info] Executing FLUSH TABLES WITH READ LOCK..Fri Jun 16 10:30:45 2017 - [info]  ok.Fri Jun 16 10:30:45 2017 - [info] Orig master binlog:pos is mysql-bin.000001:1610.Fri Jun 16 10:30:45 2017 - [info]  Waiting to execute all relay logs on 192.168.0.40(192.168.0.40:3306)..Fri Jun 16 10:30:45 2017 - [info]  master_pos_wait(mysql-bin.000001:1610) completed on 192.168.0.40(192.168.0.40:3306). Executed 0 events.Fri Jun 16 10:30:45 2017 - [info]   done.Fri Jun 16 10:30:45 2017 - [info] Getting new masters binlog name and position..Fri Jun 16 10:30:45 2017 - [info]  mysql-bin.000002:194Fri Jun 16 10:30:45 2017 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST=192.168.0.40, MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER=repl, MASTER_PASSWORD=xxx;Fri Jun 16 10:30:45 2017 - [info] Executing master ip online change script to allow write on the new master:Fri Jun 16 10:30:45 2017 - [info]   /usr/local/bin/master_ip_online_change --command=start --orig_master_host=192.168.0.20 --orig_master_ip=192.168.0.20 --orig_master_port=3306 --orig_master_user=root --orig_master_password=123 --new_master_host=192.168.0.40 --new_master_ip=192.168.0.40 --new_master_port=3306 --new_master_user=root --new_master_password=123 --orig_master_ssh_user=root --new_master_ssh_user=root   --orig_master_is_new_slaveFri Jun 16 10:30:45 2017 744690 Set read_only=0 on the new master.Fri Jun 16 10:30:45 2017 745124 Creating app user on the new master..Fri Jun 16 10:30:45 2017 - [info]  ok.Fri Jun 16 10:30:45 2017 - [info] Fri Jun 16 10:30:45 2017 - [info] * Switching slaves in parallel..Fri Jun 16 10:30:45 2017 - [info] Fri Jun 16 10:30:45 2017 - [info] -- Slave switch on host 192.168.0.30(192.168.0.30:3306) started, pid: 9457Fri Jun 16 10:30:45 2017 - [info] Fri Jun 16 10:30:45 2017 - [info] Log messages from 192.168.0.30 ...Fri Jun 16 10:30:45 2017 - [info] Fri Jun 16 10:30:45 2017 - [info]  Waiting to execute all relay logs on 192.168.0.30(192.168.0.30:3306)..Fri Jun 16 10:30:45 2017 - [info]  master_pos_wait(mysql-bin.000001:1610) completed on 192.168.0.30(192.168.0.30:3306). Executed 0 events.Fri Jun 16 10:30:45 2017 - [info]   done.Fri Jun 16 10:30:45 2017 - [info]  Resetting slave 192.168.0.30(192.168.0.30:3306) and starting replication from the new master 192.168.0.40(192.168.0.40:3306)..Fri Jun 16 10:30:45 2017 - [info]  Executed CHANGE MASTER.Fri Jun 16 10:30:45 2017 - [info]  Slave started.Fri Jun 16 10:30:45 2017 - [info] End of log messages from 192.168.0.30 ...Fri Jun 16 10:30:45 2017 - [info] Fri Jun 16 10:30:45 2017 - [info] -- Slave switch on host 192.168.0.30(192.168.0.30:3306) succeeded.Fri Jun 16 10:30:45 2017 - [info] Unlocking all tables on the orig master:Fri Jun 16 10:30:45 2017 - [info] Executing UNLOCK TABLES..Fri Jun 16 10:30:45 2017 - [info]  ok.Fri Jun 16 10:30:45 2017 - [info] Starting orig master as a new slave..Fri Jun 16 10:30:45 2017 - [info]  Resetting slave 192.168.0.20(192.168.0.20:3306) and starting replication from the new master 192.168.0.40(192.168.0.40:3306)..Fri Jun 16 10:30:45 2017 - [info]  Executed CHANGE MASTER.Fri Jun 16 10:30:46 2017 - [info]  Slave started.Fri Jun 16 10:30:46 2017 - [info] All new slave servers switched successfully.Fri Jun 16 10:30:46 2017 - [info] Fri Jun 16 10:30:46 2017 - [info] * Phase 5: New master cleanup phase..Fri Jun 16 10:30:46 2017 - [info] Fri Jun 16 10:30:46 2017 - [info]  192.168.0.40: Resetting slave info succeeded.Fri Jun 16 10:30:46 2017 - [info] Switching master to 192.168.0.40(192.168.0.40:3306) completed successfully.
View Code

可以看见已经成功切换。我们再来看看运行主机组的状态:

[admin@127.0.0.1][(none)]> select * from runtime_mysql_servers;+--------------+--------------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+------------------+| hostgroup_id | hostname     | port | status | weight | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms | comment          |+--------------+--------------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+------------------+| 100          | 192.168.0.40 | 3306 | ONLINE | 1      | 0           | 1000            | 10                  | 0       | 100            | test my proxysql || 1000         | 192.168.0.30 | 3306 | ONLINE | 1      | 0           | 1000            | 10                  | 0       | 100            | test my proxysql || 1000         | 192.168.0.40 | 3306 | ONLINE | 1      | 0           | 1000            | 10                  | 0       | 100            | test my proxysql || 1000         | 192.168.0.20 | 3306 | ONLINE | 1      | 0           | 1000            | 10                  | 0       | 100            | test my proxysql |+--------------+--------------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+------------------+4 rows in set (0.00 sec)[admin@127.0.0.1][(none)]> 

可以看见40自动设置成了主机组100,也就是master。我们进行读写测试。

[admin@127.0.0.1][(none)]> select * from stats_mysql_query_digest;+-----------+------------+----------+--------------------+-----------------------------+------------+------------+------------+----------+----------+----------+| hostgroup | schemaname | username | digest             | digest_text                 | count_star | first_seen | last_seen  | sum_time | min_time | max_time |+-----------+------------+----------+--------------------+-----------------------------+------------+------------+------------+----------+----------+----------+| 1000      | yayun      | yayun    | 0x3765930C7143F468 | select * from t1            | 1          | 1497580568 | 1497580568 | 34920    | 34920    | 34920    || 100       | yayun      | yayun    | 0x5A680F86B3D8FB2B | select * from t1 for update | 1          | 1497580565 | 1497580565 | 9609     | 9609     | 9609     || 100       | yayun      | yayun    | 0x4BBB5CD4BC2CFD94 | insert into t1 select ?     | 1          | 1497580557 | 1497580557 | 133003   | 133003   | 133003   || 100       | yayun      | yayun    | 0x99531AEFF718C501 | show tables                 | 1          | 1497580544 | 1497580544 | 2051     | 2051     | 2051     || 100       | yayun      | yayun    | 0x74A739578E179F19 | show processlist            | 1          | 1497580532 | 1497580532 | 4335     | 4335     | 4335     |+-----------+------------+----------+--------------------+-----------------------------+------------+------------+------------+----------+----------+----------+5 rows in set (0.00 sec)[admin@127.0.0.1][(none)]> 

可以看见除了for update语句和insert发送到主库,查询是发送到了从库。当然虽然1000主机组里面有40,也就是会有非常少量的查询才会发送到主库。这个没法避免。
下面进行主库宕机测试,启动mha监控,关闭主库。mha切换日志,可以看见30当选主库:

技术分享
[root@db_server_yayun_04 log]# cat manager.log Fri Jun 16 10:38:59 2017 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.Fri Jun 16 10:38:59 2017 - [info] Reading application default configuration from /data/mha/3306/mha.cnf..Fri Jun 16 10:38:59 2017 - [info] Updating application default configuration from /usr/local/bin/load_cnf..Fri Jun 16 10:38:59 2017 - [info] Reading server configuration from /data/mha/3306/mha.cnf..Fri Jun 16 10:38:59 2017 - [info] Setting max_ping_errors to 10, ping_interval to 3.Fri Jun 16 10:39:53 2017 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.Fri Jun 16 10:39:53 2017 - [info] Reading application default configuration from /data/mha/3306/mha.cnf..Fri Jun 16 10:39:53 2017 - [info] Updating application default configuration from /usr/local/bin/load_cnf..Fri Jun 16 10:39:53 2017 - [info] Reading server configuration from /data/mha/3306/mha.cnf..Fri Jun 16 10:39:53 2017 - [info] Setting max_ping_errors to 10, ping_interval to 3.nfo]   192.168.0.20(192.168.0.20:3306)  Version=5.7.17-log (oldest major version between slaves) log-bin:enabledFri Jun 16 10:38:59 2017 - [info]     GTID ONFri Jun 16 10:38:59 2017 - [info]     Replicating from 192.168.0.40(192.168.0.40:3306)Fri Jun 16 10:38:59 2017 - [info]   192.168.0.30(192.168.0.30:3306)  Version=5.7.17-log (oldest major version between slaves) log-bin:enabledFri Jun 16 10:38:59 2017 - [info]     GTID ONFri Jun 16 10:38:59 2017 - [info]     Replicating from 192.168.0.40(192.168.0.40:3306)Fri Jun 16 10:38:59 2017 - [info]     Primary candidate for the new Master (candidate_master is set)Fri Jun 16 10:38:59 2017 - [info] Current Alive Master: 192.168.0.40(192.168.0.40:3306)Fri Jun 16 10:38:59 2017 - [info] Checking slave configurations..Fri Jun 16 10:38:59 2017 - [info] Checking replication filtering settings..Fri Jun 16 10:38:59 2017 - [info]  binlog_do_db= , binlog_ignore_db= Fri Jun 16 10:38:59 2017 - [info]  Replication filtering check ok.Fri Jun 16 10:38:59 2017 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.Fri Jun 16 10:38:59 2017 - [info] Checking SSH publickey authentication settings on the current master..Fri Jun 16 10:38:59 2017 - [info] HealthCheck: SSH to 192.168.0.40 is reachable.Fri Jun 16 10:38:59 2017 - [info] 192.168.0.40(192.168.0.40:3306) (current master) +--192.168.0.20(192.168.0.20:3306) +--192.168.0.30(192.168.0.30:3306)Fri Jun 16 10:38:59 2017 - [warning] master_ip_failover_script is not defined.Fri Jun 16 10:38:59 2017 - [warning] shutdown_script is not defined.Fri Jun 16 10:38:59 2017 - [info] Set master ping interval 3 seconds.Fri Jun 16 10:38:59 2017 - [warning] secondary_check_script is not defined. It is highly recommended setting it to check master reachability from two or more routes.Fri Jun 16 10:38:59 2017 - [info] Starting ping health check on 192.168.0.40(192.168.0.40:3306)..Fri Jun 16 10:38:59 2017 - [info] Ping(SELECT) succeeded, waiting until MySQL doesnt respond..Fri Jun 16 10:39:26 2017 - [warning] Got error on MySQL select ping: 2006 (MySQL server has gone away)Fri Jun 16 10:39:26 2017 - [info] Executing SSH check script: exit 0Fri Jun 16 10:39:27 2017 - [info] HealthCheck: SSH to 192.168.0.40 is reachable.Fri Jun 16 10:39:29 2017 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at reading initial communication packet, system error: 111)Fri Jun 16 10:39:29 2017 - [warning] Connection failed 2 time(s)..Fri Jun 16 10:39:32 2017 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at reading initial communication packet, system error: 111)Fri Jun 16 10:39:32 2017 - [warning] Connection failed 3 time(s)..Fri Jun 16 10:39:35 2017 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at reading initial communication packet, system error: 111)Fri Jun 16 10:39:35 2017 - [warning] Connection failed 4 time(s)..Fri Jun 16 10:39:38 2017 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at reading initial communication packet, system error: 111)Fri Jun 16 10:39:38 2017 - [warning] Connection failed 5 time(s)..Fri Jun 16 10:39:41 2017 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at reading initial communication packet, system error: 111)Fri Jun 16 10:39:41 2017 - [warning] Connection failed 6 time(s)..Fri Jun 16 10:39:44 2017 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at reading initial communication packet, system error: 111)Fri Jun 16 10:39:44 2017 - [warning] Connection failed 7 time(s)..Fri Jun 16 10:39:47 2017 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at reading initial communication packet, system error: 111)Fri Jun 16 10:39:47 2017 - [warning] Connection failed 8 time(s)..Fri Jun 16 10:39:50 2017 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at reading initial communication packet, system error: 111)Fri Jun 16 10:39:50 2017 - [warning] Connection failed 9 time(s)..Fri Jun 16 10:39:53 2017 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at reading initial communication packet, system error: 111)Fri Jun 16 10:39:53 2017 - [warning] Connection failed 10 time(s)..Fri Jun 16 10:39:53 2017 - [warning] Master is not reachable from health checker!Fri Jun 16 10:39:53 2017 - [warning] Master 192.168.0.40(192.168.0.40:3306) is not reachable!Fri Jun 16 10:39:53 2017 - [warning] SSH is reachable.Fri Jun 16 10:39:53 2017 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /data/mha/3306/mha.cnf again, and trying to connect to all servers to check server status..Fri Jun 16 10:39:53 2017 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.Fri Jun 16 10:39:53 2017 - [info] Reading application default configuration from /data/mha/3306/mha.cnf..Fri Jun 16 10:39:53 2017 - [info] Updating application default configuration from /usr/local/bin/load_cnf..Fri Jun 16 10:39:53 2017 - [info] Reading server configuration from /data/mha/3306/mha.cnf..Fri Jun 16 10:39:53 2017 - [info] Setting max_ping_errors to 10, ping_interval to 3.Fri Jun 16 10:39:53 2017 - [info] GTID failover mode = 1Fri Jun 16 10:39:53 2017 - [info] Dead Servers:Fri Jun 16 10:39:53 2017 - [info]   192.168.0.40(192.168.0.40:3306)Fri Jun 16 10:39:53 2017 - [info] Alive Servers:Fri Jun 16 10:39:53 2017 - [info]   192.168.0.20(192.168.0.20:3306)Fri Jun 16 10:39:53 2017 - [info]   192.168.0.30(192.168.0.30:3306)Fri Jun 16 10:39:53 2017 - [info] Alive Slaves:Fri Jun 16 10:39:53 2017 - [info]   192.168.0.20(192.168.0.20:3306)  Version=5.7.17-log (oldest major version between slaves) log-bin:enabledFri Jun 16 10:39:53 2017 - [info]     GTID ONFri Jun 16 10:39:53 2017 - [info]     Replicating from 192.168.0.40(192.168.0.40:3306)Fri Jun 16 10:39:53 2017 - [info]   192.168.0.30(192.168.0.30:3306)  Version=5.7.17-log (oldest major version between slaves) log-bin:enabledFri Jun 16 10:39:53 2017 - [info]     GTID ONFri Jun 16 10:39:53 2017 - [info]     Replicating from 192.168.0.40(192.168.0.40:3306)Fri Jun 16 10:39:53 2017 - [info]     Primary candidate for the new Master (candidate_master is set)Fri Jun 16 10:39:53 2017 - [info] Checking slave configurations..Fri Jun 16 10:39:53 2017 - [info] Checking replication filtering settings..Fri Jun 16 10:39:53 2017 - [info]  Replication filtering check ok.Fri Jun 16 10:39:53 2017 - [info] Master is down!Fri Jun 16 10:39:53 2017 - [info] Terminating monitoring script.Fri Jun 16 10:39:53 2017 - [info] Got exit code 20 (Master dead).Fri Jun 16 10:39:53 2017 - [info] MHA::MasterFailover version 0.57.Fri Jun 16 10:39:53 2017 - [info] Starting master failover.Fri Jun 16 10:39:53 2017 - [info] Fri Jun 16 10:39:53 2017 - [info] * Phase 1: Configuration Check Phase..Fri Jun 16 10:39:53 2017 - [info] Fri Jun 16 10:39:54 2017 - [info] GTID failover mode = 1Fri Jun 16 10:39:54 2017 - [info] Dead Servers:Fri Jun 16 10:39:54 2017 - [info]   192.168.0.40(192.168.0.40:3306)Fri Jun 16 10:39:54 2017 - [info] Checking master reachability via MySQL(double check)...Fri Jun 16 10:39:54 2017 - [info]  ok.Fri Jun 16 10:39:54 2017 - [info] Alive Servers:Fri Jun 16 10:39:54 2017 - [info]   192.168.0.20(192.168.0.20:3306)Fri Jun 16 10:39:54 2017 - [info]   192.168.0.30(192.168.0.30:3306)Fri Jun 16 10:39:54 2017 - [info] Alive Slaves:Fri Jun 16 10:39:54 2017 - [info]   192.168.0.20(192.168.0.20:3306)  Version=5.7.17-log (oldest major version between slaves) log-bin:enabledFri Jun 16 10:39:54 2017 - [info]     GTID ONFri Jun 16 10:39:54 2017 - [info]     Replicating from 192.168.0.40(192.168.0.40:3306)Fri Jun 16 10:39:54 2017 - [info]   192.168.0.30(192.168.0.30:3306)  Version=5.7.17-log (oldest major version between slaves) log-bin:enabledFri Jun 16 10:39:54 2017 - [info]     GTID ONFri Jun 16 10:39:54 2017 - [info]     Replicating from 192.168.0.40(192.168.0.40:3306)Fri Jun 16 10:39:54 2017 - [info]     Primary candidate for the new Master (candidate_master is set)Fri Jun 16 10:39:54 2017 - [info] Starting GTID based failover.Fri Jun 16 10:39:54 2017 - [info] Fri Jun 16 10:39:54 2017 - [info] ** Phase 1: Configuration Check Phase completed.Fri Jun 16 10:39:54 2017 - [info] Fri Jun 16 10:39:54 2017 - [info] * Phase 2: Dead Master Shutdown Phase..Fri Jun 16 10:39:54 2017 - [info] Fri Jun 16 10:39:54 2017 - [info] Forcing shutdown so that applications never connect to the current master..Fri Jun 16 10:39:54 2017 - [warning] master_ip_failover_script is not set. Skipping invalidating dead master IP address.Fri Jun 16 10:39:54 2017 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.Fri Jun 16 10:39:54 2017 - [info] * Phase 2: Dead Master Shutdown Phase completed.Fri Jun 16 10:39:54 2017 - [info] Fri Jun 16 10:39:54 2017 - [info] * Phase 3: Master Recovery Phase..Fri Jun 16 10:39:54 2017 - [info] Fri Jun 16 10:39:54 2017 - [info] * Phase 3.1: Getting Latest Slaves Phase..Fri Jun 16 10:39:54 2017 - [info] Fri Jun 16 10:39:54 2017 - [info] The latest binary log file/position on all slaves is mysql-bin.000002:497Fri Jun 16 10:39:54 2017 - [info] Retrieved Gtid Set: 900f20f2-f48c-11e6-8d78-000c2930a8b9:1Fri Jun 16 10:39:54 2017 - [info] Latest slaves (Slaves that received relay log files to the latest):Fri Jun 16 10:39:54 2017 - [info]   192.168.0.20(192.168.0.20:3306)  Version=5.7.17-log (oldest major version between slaves) log-bin:enabledFri Jun 16 10:39:54 2017 - [info]     GTID ONFri Jun 16 10:39:54 2017 - [info]     Replicating from 192.168.0.40(192.168.0.40:3306)Fri Jun 16 10:39:54 2017 - [info]   192.168.0.30(192.168.0.30:3306)  Version=5.7.17-log (oldest major version between slaves) log-bin:enabledFri Jun 16 10:39:54 2017 - [info]     GTID ONFri Jun 16 10:39:54 2017 - [info]     Replicating from 192.168.0.40(192.168.0.40:3306)Fri Jun 16 10:39:54 2017 - [info]     Primary candidate for the new Master (candidate_master is set)Fri Jun 16 10:39:54 2017 - [info] The oldest binary log file/position on all slaves is mysql-bin.000002:497Fri Jun 16 10:39:54 2017 - [info] Retrieved Gtid Set: 900f20f2-f48c-11e6-8d78-000c2930a8b9:1Fri Jun 16 10:39:54 2017 - [info] Oldest slaves:Fri Jun 16 10:39:54 2017 - [info]   192.168.0.20(192.168.0.20:3306)  Version=5.7.17-log (oldest major version between slaves) log-bin:enabledFri Jun 16 10:39:54 2017 - [info]     GTID ONFri Jun 16 10:39:54 2017 - [info]     Replicating from 192.168.0.40(192.168.0.40:3306)Fri Jun 16 10:39:54 2017 - [info]   192.168.0.30(192.168.0.30:3306)  Version=5.7.17-log (oldest major version between slaves) log-bin:enabledFri Jun 16 10:39:54 2017 - [info]     GTID ONFri Jun 16 10:39:54 2017 - [info]     Replicating from 192.168.0.40(192.168.0.40:3306)Fri Jun 16 10:39:54 2017 - [info]     Primary candidate for the new Master (candidate_master is set)Fri Jun 16 10:39:54 2017 - [info] Fri Jun 16 10:39:54 2017 - [info] * Phase 3.3: Determining New Master Phase..Fri Jun 16 10:39:54 2017 - [info] Fri Jun 16 10:39:54 2017 - [info] Searching new master from slaves..Fri Jun 16 10:39:54 2017 - [info]  Candidate masters from the configuration file:Fri Jun 16 10:39:54 2017 - [info]   192.168.0.30(192.168.0.30:3306)  Version=5.7.17-log (oldest major version between slaves) log-bin:enabledFri Jun 16 10:39:54 2017 - [info]     GTID ONFri Jun 16 10:39:54 2017 - [info]     Replicating from 192.168.0.40(192.168.0.40:3306)Fri Jun 16 10:39:54 2017 - [info]     Primary candidate for the new Master (candidate_master is set)Fri Jun 16 10:39:54 2017 - [info]  Non-candidate masters:Fri Jun 16 10:39:54 2017 - [info]  Searching from candidate_master slaves which have received the latest relay log events..Fri Jun 16 10:39:54 2017 - [info] New master is 192.168.0.30(192.168.0.30:3306)Fri Jun 16 10:39:54 2017 - [info] Starting master failover..Fri Jun 16 10:39:54 2017 - [info] From:192.168.0.40(192.168.0.40:3306) (current master) +--192.168.0.20(192.168.0.20:3306) +--192.168.0.30(192.168.0.30:3306)To:192.168.0.30(192.168.0.30:3306) (new master) +--192.168.0.20(192.168.0.20:3306)Fri Jun 16 10:39:54 2017 - [info] Fri Jun 16 10:39:54 2017 - [info] * Phase 3.3: New Master Recovery Phase..Fri Jun 16 10:39:54 2017 - [info] Fri Jun 16 10:39:54 2017 - [info]  Waiting all logs to be applied.. Fri Jun 16 10:39:54 2017 - [info]   done.Fri Jun 16 10:39:54 2017 - [info]  Replicating from the latest slave 192.168.0.20(192.168.0.20:3306) and waiting to apply..Fri Jun 16 10:39:54 2017 - [info]  Waiting all logs to be applied on the latest slave.. Fri Jun 16 10:39:54 2017 - [info]  Resetting slave 192.168.0.30(192.168.0.30:3306) and starting replication from the new master 192.168.0.20(192.168.0.20:3306)..Fri Jun 16 10:39:54 2017 - [info]  Executed CHANGE MASTER.Fri Jun 16 10:39:54 2017 - [info]  Slave started.Fri Jun 16 10:39:54 2017 - [info]  Waiting to execute all relay logs on 192.168.0.30(192.168.0.30:3306)..Fri Jun 16 10:39:54 2017 - [info]  master_pos_wait(mysql-bin.000001:1879) completed on 192.168.0.30(192.168.0.30:3306). Executed 4 events.Fri Jun 16 10:39:54 2017 - [info]   done.Fri Jun 16 10:39:54 2017 - [info]   done.Fri Jun 16 10:39:54 2017 - [info] Getting new masters binlog name and position..Fri Jun 16 10:39:54 2017 - [info]  mysql-bin.000002:459Fri Jun 16 10:39:54 2017 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST=192.168.0.30, MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER=repl, MASTER_PASSWORD=xxx;Fri Jun 16 10:39:54 2017 - [info] Master Recovery succeeded. File:Pos:Exec_Gtid_Set: mysql-bin.000002, 459, 56195f28-36e2-11e7-991b-000c29e3f5ab:1-6,900f20f2-f48c-11e6-8d78-000c2930a8b9:1Fri Jun 16 10:39:54 2017 - [warning] master_ip_failover_script is not set. Skipping taking over new master IP address.Fri Jun 16 10:39:54 2017 - [info] Setting read_only=0 on 192.168.0.30(192.168.0.30:3306)..Fri Jun 16 10:39:54 2017 - [info]  ok.Fri Jun 16 10:39:54 2017 - [info] ** Finished master recovery successfully.Fri Jun 16 10:39:54 2017 - [info] * Phase 3: Master Recovery Phase completed.Fri Jun 16 10:39:54 2017 - [info] Fri Jun 16 10:39:54 2017 - [info] * Phase 4: Slaves Recovery Phase..Fri Jun 16 10:39:54 2017 - [info] Fri Jun 16 10:39:54 2017 - [info] Fri Jun 16 10:39:54 2017 - [info] * Phase 4.1: Starting Slaves in parallel..Fri Jun 16 10:39:54 2017 - [info] Fri Jun 16 10:39:54 2017 - [info] -- Slave recovery on host 192.168.0.20(192.168.0.20:3306) started, pid: 9905. Check tmp log /data/mha/3306/log/192.168.0.20_3306_20170616103953.log if it takes time..Fri Jun 16 10:39:54 2017 - [info] Fri Jun 16 10:39:54 2017 - [info] Log messages from 192.168.0.20 ...Fri Jun 16 10:39:54 2017 - [info] Fri Jun 16 10:39:54 2017 - [info]  Resetting slave 192.168.0.20(192.168.0.20:3306) and starting replication from the new master 192.168.0.30(192.168.0.30:3306)..Fri Jun 16 10:39:54 2017 - [info]  Executed CHANGE MASTER.Fri Jun 16 10:39:54 2017 - [info]  Slave started.Fri Jun 16 10:39:54 2017 - [info]  gtid_wait(56195f28-36e2-11e7-991b-000c29e3f5ab:1-6,900f20f2-f48c-11e6-8d78-000c2930a8b9:1) completed on 192.168.0.20(192.168.0.20:3306). Executed 0 events.Fri Jun 16 10:39:54 2017 - [info] End of log messages from 192.168.0.20.Fri Jun 16 10:39:54 2017 - [info] -- Slave on host 192.168.0.20(192.168.0.20:3306) started.Fri Jun 16 10:39:54 2017 - [info] All new slave servers recovered successfully.Fri Jun 16 10:39:54 2017 - [info] Fri Jun 16 10:39:54 2017 - [info] * Phase 5: New master cleanup phase..Fri Jun 16 10:39:54 2017 - [info] Fri Jun 16 10:39:54 2017 - [info] Resetting slave info on the new master..Fri Jun 16 10:39:54 2017 - [info]  192.168.0.30: Resetting slave info succeeded.Fri Jun 16 10:39:54 2017 - [info] Master failover to 192.168.0.30(192.168.0.30:3306) completed successfully.Fri Jun 16 10:39:54 2017 - [info] ----- Failover Report -----mha: MySQL Master failover 192.168.0.40(192.168.0.40:3306) to 192.168.0.30(192.168.0.30:3306) succeededMaster 192.168.0.40(192.168.0.40:3306) is down!Check MHA Manager logs at db_server_yayun_04:/data/mha/3306/log/manager.log for details.Started automated(non-interactive) failover.Selected 192.168.0.30(192.168.0.30:3306) as a new master.192.168.0.30(192.168.0.30:3306): OK: Applying all logs succeeded.192.168.0.20(192.168.0.20:3306): OK: Slave started, replicating from 192.168.0.30(192.168.0.30:3306)192.168.0.30(192.168.0.30:3306): Resetting slave info succeeded.Master failover to 192.168.0.30(192.168.0.30:3306) completed successfully.
View Code

再来看看proxysql里面现在主机组的情况:

[admin@127.0.0.1][(none)]> select * from runtime_mysql_servers;+--------------+--------------+------+---------+--------+-------------+-----------------+---------------------+---------+----------------+------------------+| hostgroup_id | hostname     | port | status  | weight | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms | comment          |+--------------+--------------+------+---------+--------+-------------+-----------------+---------------------+---------+----------------+------------------+| 100          | 192.168.0.30 | 3306 | ONLINE  | 1      | 0           | 1000            | 10                  | 0       | 100            | test my proxysql || 1000         | 192.168.0.30 | 3306 | ONLINE  | 1      | 0           | 1000            | 10                  | 0       | 100            | test my proxysql || 1000         | 192.168.0.40 | 3306 | SHUNNED | 1      | 0           | 1000            | 10                  | 0       | 100            | test my proxysql || 1000         | 192.168.0.20 | 3306 | ONLINE  | 1      | 0           | 1000            | 10                  | 0       | 100            | test my proxysql |+--------------+--------------+------+---------+--------+-------------+-----------------+---------------------+---------+----------------+------------------+4 rows in set (0.01 sec)[admin@127.0.0.1][(none)]> 

可以看见40已经是SHUNNED状态,这个已经自动被踢出。30当选主库,自动被设置在主机组100. 再次运行读写测试:

[admin@127.0.0.1][(none)]> select * from stats_mysql_query_digest;+-----------+--------------------+----------+--------------------+----------------------------------+------------+------------+------------+----------+----------+----------+| hostgroup | schemaname         | username | digest             | digest_text                      | count_star | first_seen | last_seen  | sum_time | min_time | max_time |+-----------+--------------------+----------+--------------------+----------------------------------+------------+------------+------------+----------+----------+----------+| 100       | yayun              | yayun    | 0x4BBB5CD4BC2CFD94 | insert into t1 select ?          | 1          | 1497581038 | 1497581038 | 114743   | 114743   | 114743   || 1000      | information_schema | yayun    | 0x620B328FE9D6D71A | SELECT DATABASE()                | 1          | 1497581026 | 1497581026 | 31128    | 31128    | 31128    || 100       | information_schema | yayun    | 0x594F2C744B698066 | select USER()                    | 1          | 1497581025 | 1497581025 | 0        | 0        | 0        || 1000      | yayun              | yayun    | 0x3765930C7143F468 | select * from t1                 | 1          | 1497581045 | 1497581045 | 3283     | 3283     | 3283     || 100       | information_schema | yayun    | 0x226CD90D52A2BA0B | select @@version_comment limit ? | 1          | 1497581025 | 1497581025 | 0        | 0        | 0        |+-----------+--------------------+----------+--------------------+----------------------------------+------------+------------+------------+----------+----------+----------+5 rows in set (0.00 sec)

可以看见依然没有问题。到这里本文就结束了,上面是1主2从,如果是1主1从,那么从挂了怎么办呢?需要额外一条规则,那就是在mysql_servers的hostgroup 1000 里面要插一条主库的记录,然后把weight设小,当读不到从库,回去主库查询

关于ProxySQL比较详细的文章可以参考如下:


ProxySQL之安装及配置详解

ProxySQL之读写分离与分库路由演示

 

MHA+ProxySQL实现读写分离高可用