Fault simulation in a primary/standby environment (configured for manual switchover)

2022-07-26 16:03:29



Environment:
OS: CentOS 7
DB: DM8
Primary: 192.168.1.135
Standby: 192.168.1.134
The dmwatcher.ini configuration file on the primary and the standby is as follows:

[dmdba@host134 slnngk]$ more dmwatcher.ini
[GRP1]
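DW_TYPE       =  GLOBAL     ## global watcher type
DW_MODE       =  MANUAL     ## manual switchover
DW_ERROR_TIME    =  10      ## time after which the remote watcher is judged to have failed
INST_RECOVER_TIME =  60     ## interval at which the primary's watcher starts recovery
INST_ERROR_TIME  =  10      ## time after which the local instance is judged to have failed
INST_OGUID     =  453332    ## unique OGUID of the watcher system
INST_INI      =  /dmdbms/data/slnngk/dm.ini  # path to the dm.ini configuration file
INST_AUTO_RESTART =  1      ## enable automatic restart of the instance
INST_STARTUP_CMD  =  /dmdbms/product/bin/dmserver # start from the command line
RLOG_SEND_THRESHOLD =  0    ## time threshold for the primary sending logs to the standby; disabled by default
RLOG_APPLY_THRESHOLD =  0   ## time threshold for the standby replaying logs; disabled by default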
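
The two settings that drive everything below are DW_MODE and INST_AUTO_RESTART. As a minimal sketch, assuming the simple "[GROUP]" / "KEY = VALUE  # comment" layout shown above, the following Python reads such a file and reports both. The helper name parse_watcher_ini is hypothetical and not part of DM8; only the parameter names come from the configuration above.

```python
def parse_watcher_ini(text):
    """Parse dmwatcher.ini-style text into {group: {key: value}}.

    Hypothetical helper for illustration; it only understands the
    simple "[GROUP]" / "KEY = VALUE  # comment" layout shown above.
    """
    groups = {}
    current = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            current = groups.setdefault(line[1:-1], {})
        elif "=" in line and current is not None:
            key, value = line.split("=", 1)
            current[key.strip()] = value.strip()
    return groups


SAMPLE = """
[GRP1]
DW_TYPE       =  GLOBAL
DW_MODE       =  MANUAL     ## manual switchover
INST_INI      =  /dmdbms/data/slnngk/dm.ini  # path to dm.ini
INST_AUTO_RESTART =  1
"""

cfg = parse_watcher_ini(SAMPLE)["GRP1"]
manual_switchover = cfg["DW_MODE"].upper() == "MANUAL"
auto_restart = cfg["INST_AUTO_RESTART"] == "1"
print(manual_switchover, auto_restart)
```

With both flags true, the expectation is that the watcher restarts a failed local instance but never promotes the standby on its own, which is what the steps below test.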

 

1. Stop the standby database
[root@host134 ~]# systemctl stop DmServiceslnngk.service

dmwatcher brings the database back up:
[root@host134 ~]# ps -ef|grep slnngk
dmdba 19750 1 0 Jul15 ? 00:19:34 /dmdbms/product/bin/dmwatcher path=/dmdbms/data/slnngk/dmwatcher.ini -noconsole
dmdba 23199 1 1 13:49 ? 00:00:00 /dmdbms/product/bin/dmserver /dmdbms/data/slnngk/dm.ini mount
root 23538 32322 0 13:50 pts/4 00:00:00 grep --color=auto slnngk
dmdba 31905 26367 0 10:16 pts/0 00:00:14 dmmonitor /dmdbms/data/slnngk/dmmonitor.ini
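
Reading the ps listing by eye works, but the same check can be scripted. A minimal sketch, assuming the standard `ps -ef` column layout (UID PID PPID C STIME TTY TIME CMD); the helper name dm_processes is hypothetical:

```python
import os


def dm_processes(ps_text, names=("dmwatcher", "dmserver", "dmmonitor")):
    """Return {program_name: [pid, ...]} for DM processes in `ps -ef` output.

    Hypothetical helper; assumes the standard `ps -ef` column layout
    (UID PID PPID C STIME TTY TIME CMD).
    """
    found = {}
    for line in ps_text.splitlines():
        fields = line.split(None, 7)  # CMD is the 8th column and may contain spaces
        if len(fields) < 8 or not fields[1].isdigit():
            continue  # skips blank lines and the ps header row
        program = os.path.basename(fields[7].split()[0])
        if program in names:
            found.setdefault(program, []).append(int(fields[1]))
    return found


PS_SAMPLE = """\
dmdba 19750 1 0 Jul15 ? 00:19:34 /dmdbms/product/bin/dmwatcher path=/dmdbms/data/slnngk/dmwatcher.ini -noconsole
dmdba 23199 1 1 13:49 ? 00:00:00 /dmdbms/product/bin/dmserver /dmdbms/data/slnngk/dm.ini mount
root 23538 32322 0 13:50 pts/4 00:00:00 grep --color=auto slnngk
dmdba 31905 26367 0 10:16 pts/0 00:00:14 dmmonitor /dmdbms/data/slnngk/dmmonitor.ini
"""

procs = dm_processes(PS_SAMPLE)
print(procs)
```

A dmserver entry with a recent start time next to a long-running dmwatcher entry is the sign that the watcher restarted the instance.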

 

2. Stop the dmwatcher process on the standby
[root@host134 ~]# systemctl stop DmWatcherServiceGRP1

At this point both the dmwatcher process and the database process on the standby are stopped:
[root@host134 ~]# ps -ef|grep slnngk
root 25001 32322 0 14:01 pts/4 00:00:00 grep --color=auto slnngk
dmdba 31905 26367 0 10:16 pts/0 00:00:15 dmmonitor /dmdbms/data/slnngk/dmmonitor.ini

 

Start the dmwatcher process:
[root@host134 ~]# systemctl start DmWatcherServiceGRP1

dmwatcher then brings the standby database back up:
[root@host134 ~]# ps -ef|grep slnngk
dmdba 25477 1 0 14:04 ? 00:00:00 /dmdbms/product/bin/dmwatcher path=/dmdbms/data/slnngk/dmwatcher.ini -noconsole
dmdba 25507 1 1 14:04 ? 00:00:00 /dmdbms/product/bin/dmserver /dmdbms/data/slnngk/dm.ini mount
root 25694 32322 0 14:05 pts/4 00:00:00 grep --color=auto slnngk
dmdba 31905 26367 0 10:16 pts/0 00:00:15 dmmonitor /dmdbms/data/slnngk/dmmonitor.ini

 

 

3. Stop the primary database
[root@host135 soft]# systemctl stop DmServiceslnngk.service

The watcher brings the primary back up:
[root@host135 soft]# ps -ef|grep slnngk
dmdba 694 1 0 Jul15 ? 00:20:14 /dmdbms/product/bin/dmwatcher path=/dmdbms/data/slnngk/dmwatcher.ini -noconsole
dmdba 710 1 1 14:23 ? 00:00:00 /dmdbms/product/bin/dmserver /dmdbms/data/slnngk/dm.ini mount
root 981 11261 0 14:24 pts/5 00:00:00 grep --color=auto slnngk

 

The database state is open:

[dmdba@host135 ~]$ disql sysdba/dameng123

Server[LOCALHOST:5236]:mode is primary, state is open
login used time : 2.627(ms)
disql V8
SQL> select status$ from SYS."V$DATABASE";

LINEID     STATUS$    
---------- -----------
1          4

used time: 3.409(ms). Execute id is 800.
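
The query returns the numeric code 4, which matches the "state is open" banner printed at login. For reference, here is a small lookup sketch. The code-to-name mapping is an assumption based on my reading of the DM8 documentation for V$DATABASE, so verify it against your own version; the function name describe_status is hypothetical.

```python
# Meaning of V$DATABASE.STATUS$ codes. Assumption: taken from my reading of
# the DM8 system-view documentation; verify against your DM8 version.
DB_STATUS = {
    1: "STARTUP",
    2: "STARTUP (redo finished)",
    3: "MOUNT",
    4: "OPEN",
    5: "SUSPEND",
    6: "SHUTDOWN",
}


def describe_status(code):
    """Translate a STATUS$ code into a readable name."""
    return DB_STATUS.get(code, "UNKNOWN(%d)" % code)


print(describe_status(4))
```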

 

 

Now try killing the dmserver process:
[root@host135 soft]# ps -ef|grep slnngk
dmdba 694 1 0 Jul15 ? 00:20:14 /dmdbms/product/bin/dmwatcher path=/dmdbms/data/slnngk/dmwatcher.ini -noconsole
dmdba 710 1 0 14:23 ? 00:00:01 /dmdbms/product/bin/dmserver /dmdbms/data/slnngk/dm.ini mount
root 2319 11261 0 14:31 pts/5 00:00:00 grep --color=auto slnngk

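[root@host135 soft]# kill -9 710

Because I configured this primary/standby pair for manual switchover, no switchover takes place. The watcher automatically brings the primary back up, and it keeps the primary role.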

 

4. Stop the dmwatcher process on the primary
[root@host135 soft]# systemctl stop DmWatcherServiceGRP1
Now both the database process and the watcher process are gone:
[root@host135 soft]# ps -ef|grep slnngk
root 3872 11261 0 14:37 pts/5 00:00:00 grep --color=auto slnngk

The monitor can no longer see any information about the primary (output of show in dmmonitor):

show
2022-07-26 14:47:02 
#================================================================================#
GROUP            OGUID       MON_CONFIRM     MODE            MPP_FLAG  
GRP1             453332      TRUE            MANUAL          FALSE     


<<DATABASE GLOBAL INFO:>>
DW_IP               MAL_DW_PORT  WTIME                WTYPE     WCTLSTAT  WSTATUS        INAME            INST_OK   N_EP  N_OK  ISTATUS     IMODE     DSC_STATUS     RTYPE     RSTAT    
192.168.1.134       52141        2022-07-26 14:47:01  GLOBAL    VALID     OPEN           SLNNGKBAK        OK        1     1     OPEN        STANDBY   DSC_OPEN       REALTIME  INVALID  

EP INFO:
INST_IP             INST_PORT  INST_OK   INAME            ISTATUS     IMODE     DSC_SEQNO  DSC_CTL_NODE RTYPE     RSTAT    FSEQ            FLSN            CSEQ            CLSN            DW_STAT_FLAG          
192.168.1.134       5236       OK        SLNNGKBAK        OPEN        STANDBY   0          0            REALTIME  UNKNOWN  382176          437742          382176          437742          NONE                  

DATABASE(SLNNGKBAK) APPLY INFO FROM (UNKNOWN), REDOS_PARALLEL_NUM (1):
DSC_SEQNO[0], (RSEQ, SSEQ, KSEQ)[383314, 383314, 383314], (RLSN, SLSN, KLSN)[437742, 437742, 437742], N_TSK[0], TSK_MEM_USE[0] 
REDO_LSN_ARR: (437742)

 

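Start the watcher manually:
[root@host135 soft]# systemctl start DmWatcherServiceGRP1

The primary is now restored, still in the primary role. No switchover occurred, because I configured manual switchover.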
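
The observations from steps 1 to 4 can be condensed into a tiny lookup. This is only a summary of what this test showed for DW_MODE = MANUAL with INST_AUTO_RESTART = 1, not a model of dmwatcher's actual logic. The function name expected_outcome is hypothetical, and it deliberately returns None for any configuration that was not exercised here.

```python
def expected_outcome(watcher_running, dw_mode="MANUAL", inst_auto_restart=1):
    """Summarize what steps 1-4 observed when the local dmserver goes down.

    Only the MANUAL / INST_AUTO_RESTART=1 combination was exercised in this
    test; other combinations return None rather than guessing.
    """
    if dw_mode != "MANUAL" or inst_auto_restart != 1:
        return None
    if watcher_running:
        return "dmwatcher restarts dmserver; role unchanged"
    return "instance stays down until dmwatcher is started; role unchanged"


print(expected_outcome(watcher_running=True))
print(expected_outcome(watcher_running=False))
```

In both cases the role stays the same, so promoting the standby takes the explicit takeover described next.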

 

5. Manually switch the standby to primary (commands entered in the dmmonitor console)

choose takeover GRP1
Can choose one of the following instances to do takeover:
1: SLNNGKBAK

takeover GRP1.SLNNGKBAK

Now check the database status:

show
2022-07-26 15:25:12 
#================================================================================#
GROUP            OGUID       MON_CONFIRM     MODE            MPP_FLAG  
GRP1             453332      TRUE            MANUAL          FALSE     


<<DATABASE GLOBAL INFO:>>
DW_IP               MAL_DW_PORT  WTIME                WTYPE     WCTLSTAT  WSTATUS        INAME            INST_OK   N_EP  N_OK  ISTATUS     IMODE     DSC_STATUS     RTYPE     RSTAT    
192.168.1.134       52141        2022-07-26 15:25:11  GLOBAL    VALID     OPEN           SLNNGKBAK        OK        1     1     OPEN        PRIMARY   DSC_OPEN       REALTIME  VALID    

EP INFO:
INST_IP             INST_PORT  INST_OK   INAME            ISTATUS     IMODE     DSC_SEQNO  DSC_CTL_NODE RTYPE     RSTAT    FSEQ            FLSN            CSEQ            CLSN            DW_STAT_FLAG          
192.168.1.134       5236       OK        SLNNGKBAK        OPEN        PRIMARY   0          0            REALTIME  VALID    383964          440754          383964          440755          NONE                  

ERROR DATABASE:

<<DATABASE GLOBAL INFO:>>
DW_IP               MAL_DW_PORT  WTIME                WTYPE     WCTLSTAT  WSTATUS        INAME            INST_OK   N_EP  N_OK  ISTATUS     IMODE     DSC_STATUS     RTYPE     RSTAT    
192.168.1.135       52141        2022-07-26 15:19:11  GLOBAL    VALID     ERROR          SLNNGK           OK        1     1     OPEN        PRIMARY   DSC_OPEN       REALTIME  VALID    

EP INFO:
INST_IP             INST_PORT  INST_OK   INAME            ISTATUS     IMODE     DSC_SEQNO  DSC_CTL_NODE RTYPE     RSTAT    FSEQ            FLSN            CSEQ            CLSN            DW_STAT_FLAG          
192.168.1.135       5236       OK        SLNNGK           OPEN        PRIMARY   0          0            REALTIME  VALID    383951          439290          383951          439290          NONE                  

#================================================================================#

The original primary 192.168.1.135 is now listed under ERROR DATABASE with WSTATUS ERROR; its last watcher timestamp is 15:19:11.
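
The show output is wide and hard to scan, so here is a minimal sketch that pulls out the row after each "<<DATABASE GLOBAL INFO:>>" marker. It assumes the header/row layout shown above, where WTIME is the only column whose value contains a space. The helper name parse_global_info is hypothetical, and the sample rows are copied from the output above with runs of spaces collapsed.

```python
GLOBAL_MARK = "<<DATABASE GLOBAL INFO:>>"


def parse_global_info(show_text):
    """Extract the per-database rows that follow each GLOBAL INFO marker.

    Hypothetical helper; assumes the header/row layout shown above, where
    WTIME is the only column whose value contains a space.
    """
    lines = [ln.strip() for ln in show_text.splitlines()]
    rows = []
    for i, line in enumerate(lines):
        if line != GLOBAL_MARK or i + 2 >= len(lines):
            continue
        header = lines[i + 1].split()
        values = lines[i + 2].split()
        if "WTIME" not in header:
            continue
        t = header.index("WTIME")
        # Re-join the date and time halves of WTIME into one value.
        values[t:t + 2] = [" ".join(values[t:t + 2])]
        if len(values) == len(header):
            rows.append(dict(zip(header, values)))
    return rows


HEADER = ("DW_IP MAL_DW_PORT WTIME WTYPE WCTLSTAT WSTATUS INAME INST_OK "
          "N_EP N_OK ISTATUS IMODE DSC_STATUS RTYPE RSTAT")
SAMPLE = "\n".join([
    GLOBAL_MARK, HEADER,
    "192.168.1.134 52141 2022-07-26 15:25:11 GLOBAL VALID OPEN SLNNGKBAK "
    "OK 1 1 OPEN PRIMARY DSC_OPEN REALTIME VALID",
    "ERROR DATABASE:",
    GLOBAL_MARK, HEADER,
    "192.168.1.135 52141 2022-07-26 15:19:11 GLOBAL VALID ERROR SLNNGK "
    "OK 1 1 OPEN PRIMARY DSC_OPEN REALTIME VALID",
])

for db in parse_global_info(SAMPLE):
    print(db["INAME"], db["IMODE"], db["WSTATUS"], db["WTIME"])
```

Both instances report IMODE PRIMARY here, but only SLNNGKBAK has WSTATUS OPEN; the SLNNGK row is the stale last report from the old primary.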

 

Next, write some data on the current primary, then start the original primary and see whether the data is synchronized.

192.168.1.134
su - dmdba
[dmdba@host134 ~]$ disql hxl/dameng123

Server[LOCALHOST:5236]:mode is primary, state is open
login used time : 3.029(ms)
disql V8
SQL> select * from tb_test01;

LINEID ID NAME
---------- -- -----
1 1 name1
2 2 name2
3 3 name3
4 4 name4
5 5 name5

used time: 4.038(ms). Execute id is 600.
SQL> insert into tb_test01 values(6,'name6');
affect rows 1

used time: 1.427(ms). Execute id is 601.
SQL> insert into tb_test01 values(7,'name7');
affect rows 1

SQL> commit;
executed successfully
used time: 9.266(ms). Execute id is 603.
SQL> select * from tb_test01;

LINEID ID NAME
---------- -- -----
1 1 name1
2 2 name2
3 3 name3
4 4 name4
5 5 name5
6 6 name6
7 7 name7

7 rows got

 

Now start the watcher on the original primary:
[root@host135 soft]# systemctl start DmWatcherServiceGRP1

show  
2022-07-26 15:32:23 
#================================================================================#
GROUP            OGUID       MON_CONFIRM     MODE            MPP_FLAG  
GRP1             453332      TRUE            MANUAL          FALSE     


<<DATABASE GLOBAL INFO:>>
DW_IP               MAL_DW_PORT  WTIME                WTYPE     WCTLSTAT  WSTATUS        INAME            INST_OK   N_EP  N_OK  ISTATUS     IMODE     DSC_STATUS     RTYPE     RSTAT    
192.168.1.134       52141        2022-07-26 15:32:23  GLOBAL    VALID     OPEN           SLNNGKBAK        OK        1     1     OPEN        PRIMARY   DSC_OPEN       REALTIME  VALID    

EP INFO:
INST_IP             INST_PORT  INST_OK   INAME            ISTATUS     IMODE     DSC_SEQNO  DSC_CTL_NODE RTYPE     RSTAT    FSEQ            FLSN            CSEQ            CLSN            DW_STAT_FLAG          
192.168.1.134       5236       OK        SLNNGKBAK        OPEN        PRIMARY   0          0            REALTIME  VALID    384114          440911          384114          440912          NONE                  

<<DATABASE GLOBAL INFO:>>
DW_IP               MAL_DW_PORT  WTIME                WTYPE     WCTLSTAT  WSTATUS        INAME            INST_OK   N_EP  N_OK  ISTATUS     IMODE     DSC_STATUS     RTYPE     RSTAT    
192.168.1.135       52141        2022-07-26 15:32:23  GLOBAL    VALID     OPEN           SLNNGK           OK        1     1     OPEN        STANDBY   DSC_OPEN       REALTIME  VALID    

EP INFO:
INST_IP             INST_PORT  INST_OK   INAME            ISTATUS     IMODE     DSC_SEQNO  DSC_CTL_NODE RTYPE     RSTAT    FSEQ            FLSN            CSEQ            CLSN            DW_STAT_FLAG          
192.168.1.135       5236       OK        SLNNGK           OPEN        STANDBY   0          0            REALTIME  VALID    383954          440910          383954          440910          NONE                  

DATABASE(SLNNGK) APPLY INFO FROM (SLNNGKBAK), REDOS_PARALLEL_NUM (1):
DSC_SEQNO[0], (RSEQ, SSEQ, KSEQ)[384113, 384113, 384114], (RLSN, SLSN, KLSN)[440910, 440910, 440911], N_TSK[0], TSK_MEM_USE[512] 
REDO_LSN_ARR: (440910)

The original primary has started and rejoined the cluster in the standby role. Check the data synchronization:

 

192.168.1.135
su - dmdba
[dmdba@host135 ~]$ disql hxl/dameng123
SQL> select * from tb_test01;

LINEID ID NAME
---------- -- -----
1 1 name1
2 2 name2
3 3 name3
4 4 name4
5 5 name5
6 6 name6
7 7 name7

7 rows got

used time: 6.329(ms). Execute id is 0

The data has been synchronized.
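
Besides comparing table contents, the APPLY INFO line in the earlier show output gives a quick numeric check. The sketch below parses the (RLSN, SLSN, KLSN) triple and reports the gap between KLSN and RLSN. My reading of the fields is an assumption, not something stated in this post: RLSN is the LSN the standby has replayed, and KLSN is the highest LSN it holds, including any received but not yet replayed log package. The helper name replay_backlog is hypothetical.

```python
import re

APPLY_RE = re.compile(r"\(RLSN, SLSN, KLSN\)\[(\d+),\s*(\d+),\s*(\d+)\]")


def replay_backlog(apply_line):
    """Return KLSN - RLSN parsed from a dmmonitor APPLY INFO line.

    Assumption: RLSN is the LSN the standby has replayed and KLSN is the
    highest LSN it holds, so the difference is redo not yet replayed.
    """
    m = APPLY_RE.search(apply_line)
    if m is None:
        raise ValueError("no (RLSN, SLSN, KLSN) triple found")
    rlsn, _slsn, klsn = map(int, m.groups())
    return klsn - rlsn


line = ("DSC_SEQNO[0], (RSEQ, SSEQ, KSEQ)[384113, 384113, 384114], "
        "(RLSN, SLSN, KLSN)[440910, 440910, 440911], N_TSK[0], TSK_MEM_USE[512]")
print(replay_backlog(line))
```

For the line captured at 15:32:23 the gap is 1, so the new standby was essentially caught up right after it rejoined.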

 

Source: https://www.cnblogs.com/hxlasky/p/16521320.html
