MySQL MHA简介及其优点

MHA是众多使用MySQL数据库企业高可用的不二选择,它简单易用,功能强大,实现了基于MySQL replication架构的自动主从故障转移,本文主要使用原文描述MHA的主要特性及其优点,尽可能通过原文来理解透彻,供大家参考。 


MHA performs automating master failover and slave promotion with minimal downtime, usually within 10-30 seconds. MHA prevents replication consistency problems and saves on expenses of having to acquire additional servers. All this with zero performance degradation, no complexity (easy-to-install) and requiring no change to existing deployments.

MHA also provides scheduled online master switching, safely changing the currently running master to a new master, within mere seconds (0.5-2 seconds) of downtime (blocking writes only).

MHA provides the following functionality, and can be useful in many deployments in which high availability, data integrity and near non-stop master maintenance are required.

Automated master monitoring and failover(自动监控Master以及故障转移)

MHA can monitor MySQL masters in an existing replication environment, performing automatic master failover upon detection of master failure. MHA guarantees the consistency of all the slaves by identifying differential relay log events from the most current slave and applying them to all the other slaves, including those slaves which still haven't received the latest relay log events. MHA can normally perform failover in a matter of seconds: 9-12 seconds to detect master failure, optionally 7-10 seconds to power off the master machine to avoid split brain, and a few seconds to apply differential relay logs to the new master. Total downtime is normally 10-30 seconds. A specific slave can be designated as a candidate master (setting priorities) in a configuration file. Since MHA maintains consistency between slaves, any slave can be promoted to become the new master. Consistency problems, which would ordinarily cause sudden replication failure, will not occur.

Interactive (manually initiated) Master Failover(交互模式,手动触发Master故障转移)

MHA can be configured for manually initiated (non-automatic), interactive failover, without monitoring the master.

Non-interactive master failover (非交互模式Master故障转移)

Non-interactive, automatic master failover without monitoring the master is also supported. This feature is especially useful when MySQL master software monitoring is already in use. For example, you can use Pacemaker(Heartbeat) for detecting master failure and virtual IP address takeover, while using MHA for master failover and slave promotion.

Online switching master to a different host(在线切换Master到异机)

It is often necessary to migrate an existing master to a different machine, like when the current master has H/W RAID controller or RAM problems, or when you want to replace it with a faster machine, etc. This is not a master crash, but scheduled master maintenance is required. Scheduled master maintenance should be done as quickly as possible, since it entails partial downtime (master writes are disabled). On the other hand, you should block/kill current running sessions very carefully because consistency problems between different masters may occur (i.e "updating master1, updating master 2, committing master1, getting error on committing master 2" will result in data inconsistency). Both fast master switch and graceful blocking writes are required.

MHA provides graceful master switching within 0.5-2 seconds of writer blockage. 0.5-2 seconds of writer downtime is often acceptable, so you can switch masters even without allocating a scheduled maintenance window. Actions such as upgrading to higher versions, faster machine, etc. become much easier.


Master failover and slave promotion can be done very quickly

MHA normally can do failover in seconds (9-12 seconds to detect master failure, optionally 7-10 seconds to power off the master machine to avoid split brain, a few seconds for applying differential relay logs to the new master, so total downtime is normally 10-30 seconds), as long as slaves does not delay replication seriously. After recovering the new master, MHA recovers the rest slaves in parallel. Even though you have tens of slaves, it does not affect master recovery time, and you can recover slaves very quickly.


DeNA uses MHA on 150+ {master, slaves} environments. When one of the master crashed, MHA completed failover in 4 seconds. Doing failover in 4 seconds is never possible with traditional active/passive clustering solution.

Master crash does not result in data inconsistency