KingbaseES single-instance environment: a WAL (xlog) log cleanup failure case

2022-05-13 13:31:06  Source: Internet

Tags: wal xlog kingbase sys 12 26003 log KingbaseES


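Case description:
While manually cleaning up WAL logs with the sys_archivecleanup tool, the control file showed that the WAL segment for the latest checkpoint was "000000010000000000000008", but the cleanup command mistakenly removed every WAL segment older than "000000010000000000000009". When the database was started, it could not read the WAL segment containing the checkpoint, so startup failed.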

Database version:

test=# select version();
                                                       version                                                       
------------------------------------------------------------------------------------------------------------------
 KingbaseES V008R006C005B0054 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-46), 64-bit

The WAL cleanup was performed as follows:

1) Check the current control file information

2) List the WAL segments and clean them up

Before cleanup:

[kingbase@node1 sys_wal]$ ls -lh
total 80M
-rw------- 1 kingbase kingbase 16M May 11 13:26 000000010000000000000006
-rw------- 1 kingbase kingbase 16M May 11 13:26 000000010000000000000007
-rw------- 1 kingbase kingbase 16M May 11 13:26 000000010000000000000008
-rw------- 1 kingbase kingbase 16M May 11 13:00 000000010000000000000009
-rw------- 1 kingbase kingbase 16M May 11 13:02 00000001000000000000000A
drwx------ 2 kingbase kingbase  78 May 11 13:49 archive_status

Cleanup command:
[kingbase@node1 bin]$ ./sys_archivecleanup /data/kingbase/v8r6_054/data/sys_wal 000000010000000000000009

After cleanup:

[kingbase@node1 sys_wal]$ ls -lh
total 32M

-rw------- 1 kingbase kingbase 16M May 11 13:00 000000010000000000000009
-rw------- 1 kingbase kingbase 16M May 11 13:02 00000001000000000000000A
drwx------ 2 kingbase kingbase  78 May 11 13:49 archive_status
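
The effect of the command above can be reproduced offline before touching any files. The sketch below assumes that sys_archivecleanup follows the same rule as PostgreSQL's pg_archivecleanup: every segment whose name sorts before the given "oldest kept" file, ignoring the 8-character timeline prefix, is removed. The function name `segments_removed` is hypothetical and used only for illustration.

```python
def segments_removed(wal_files, oldest_kept):
    """Return the segment names that sort before oldest_kept.

    Assumption: as in PostgreSQL's pg_archivecleanup, the comparison
    skips the first 8 hex digits (the timeline ID).
    """
    removed = []
    for name in wal_files:
        # Only 24-character hex names are WAL segments; skip archive_status etc.
        if len(name) != 24 or any(c not in "0123456789ABCDEF" for c in name):
            continue
        if name[8:] < oldest_kept[8:]:
            removed.append(name)
    return sorted(removed)


# Directory listing taken from the "before cleanup" output above.
listing = [
    "000000010000000000000006",
    "000000010000000000000007",
    "000000010000000000000008",
    "000000010000000000000009",
    "00000001000000000000000A",
    "archive_status",
]

for name in segments_removed(listing, "000000010000000000000009"):
    print("would remove:", name)
```

With "000000010000000000000009" as the argument, segment "000000010000000000000008" is in the removal list, which is exactly the segment the latest checkpoint depends on.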

I. Database startup fails

1. Start the database service

[kingbase@node1 bin]$ ./sys_ctl start -D /data/kingbase/v8r6_054/data/
......
2022-05-12 15:29:34.641 CST [25993] HINT:  Future log output will appear in directory "sys_log".
...... stopped waiting
sys_ctl: could not start server
Examine the log output.

2. Check the database sys_log log

2022-05-12 15:29:35.309 CST [26003] LOG:  invalid primary checkpoint record
2022-05-12 15:29:35.309 CST [26003] PANIC:  could not locate a valid checkpoint record
2022-05-12 15:29:35.309 CST [26003] LOG:  kingbase ran into a problem it couldn't handle,it needs to be shutdown to prevent damage to your data
2022-05-12 15:29:35.346 CST [26003] WARNING:  
        ERROR:  -----------------------stack error start-----------------------
        ERROR:  TIME: 2022-05-12 15:29:35.309749+08
        ERROR:  1 26003 0x7fc2aa18ef6b debug_backtrace (backtrace.so)
        ERROR:  2 26003 0x7fc2aa18f53a <symbol not found> (backtrace.so)
        ERROR:  3 26003 0x7fc2b390a670 <symbol not found> (libc.so.6)
        ERROR:  4 26003 0x7fc2b390a5f7 gsignal (libc.so.6)
        ERROR:  5 26003 0x7fc2b390bce8 abort (libc.so.6)
        ERROR:  6 26003 0x9148dc errfinish + 0x4d008d3c
        ERROR:  7 26003 0x54011c StartupXLOG + 0x4cc3457c
        ERROR:  8 26003 0x774f51 StartupProcessMain + 0x4ce693b1
        ERROR:  9 26003 0x550550 AuxiliaryProcessMain + 0x4cc449b0
        ERROR:  10 26003 0x76f5c7 StartChildProcess + 0x4ce63a27
        ERROR:  11 26003 0x77350d PostmasterMain + 0x4ce6796d
        ERROR:  12 26003 0x6cb0af main + 0x4cdbf50f
        ERROR:  13 26003 0x7fc2b38f6b15 __libc_start_main (libc.so.6)
        ERROR:  14 26003 0x4a1659 _start + 0x4cbaac39

2022-05-12 15:29:40.654 CST [25993] LOG:  startup process (PID 26003) was terminated by signal 6: Aborted
2022-05-12 15:29:40.654 CST [25993] LOG:  aborting startup due to startup process failure
2022-05-12 15:29:40.728 CST [25993] LOG:  database system is shut down

As shown above, the database could not read the checkpoint record from the WAL at startup, so startup failed.

II. Read the database control file

[kingbase@node1 bin]$ ./sys_controldata -D /data/kingbase/v8r6_054/data
sys_control version number:            1201
Catalog version number:               202202151
Database system identifier:           7096019857358041449
Database cluster state:               in production
sys_control last modified:             Wed 11 May 2022 01:26:44 PM CST
Latest checkpoint location:           0/8000058
Latest checkpoint's REDO location:    0/8000028
Latest checkpoint's REDO WAL file:    000000010000000000000008
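
The relationship between the REDO location and the REDO WAL file in this output can be checked by hand. The following is a minimal sketch, assuming the PostgreSQL-style segment naming scheme (timeline, then the segment number split into two 8-digit hex fields), a 16 MB segment size (consistent with the 16M files in the listings), and timeline 1 (the "00000001" prefix of the file names). The function name `wal_file_for_lsn` is hypothetical.

```python
def wal_file_for_lsn(lsn, timeline=1, segment_size=16 * 1024 * 1024):
    """Map an LSN such as '0/8000028' to the name of the WAL segment holding it.

    Assumptions: PostgreSQL-style naming, 16 MB segments, timeline 1.
    """
    high, low = lsn.split("/")
    position = (int(high, 16) << 32) | int(low, 16)   # absolute byte position
    segno = position // segment_size                   # segment number
    segs_per_xlogid = 0x100000000 // segment_size      # segments per 4 GB "xlog id"
    return "%08X%08X%08X" % (
        timeline,
        segno // segs_per_xlogid,
        segno % segs_per_xlogid,
    )


# REDO location reported by sys_controldata above.
print(wal_file_for_lsn("0/8000028"))
```

For the REDO location 0/8000028 this yields 000000010000000000000008, matching the "Latest checkpoint's REDO WAL file" line. The checkpoint record itself, at 0/8000058, lies in the same segment.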

III. Check the current WAL logs

As shown below, the WAL segment "000000010000000000000008" that the checkpoint refers to is missing.

[kingbase@node1 sys_wal]$ ls -lh
total 32M
-rw------- 1 kingbase kingbase 16M May 11 13:00 000000010000000000000009
-rw------- 1 kingbase kingbase 16M May 11 13:02 00000001000000000000000A
drwx------ 2 kingbase kingbase  78 May 11 13:49 archive_status

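Tips:
Because the WAL segment holding the database checkpoint is missing, the database cannot determine a consistent state at startup, so startup fails. In this situation you can restore the database to an earlier point in time from a physical backup and then start it. If no physical backup exists, you can also rebuild the control file and start the database. Both methods lose data, so before cleaning up WAL logs, always confirm that the WAL file you select is the correct one.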
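
The diagnosis in sections II and III (compare the control file with the contents of sys_wal) can be automated. This is only a sketch: it parses the text that sys_controldata prints, using the line label shown in the output above, and checks the result against a directory listing. The function names `redo_wal_file` and `checkpoint_segment_present` are hypothetical.

```python
import re


def redo_wal_file(controldata_text):
    """Extract the REDO WAL file name from sys_controldata output."""
    match = re.search(
        r"^Latest checkpoint's REDO WAL file:\s*([0-9A-F]{24})\s*$",
        controldata_text,
        re.MULTILINE,
    )
    if match is None:
        raise ValueError("REDO WAL file line not found in control data")
    return match.group(1)


def checkpoint_segment_present(controldata_text, wal_listing):
    """Return True if the segment needed by the latest checkpoint is listed."""
    return redo_wal_file(controldata_text) in set(wal_listing)


# Values copied from the outputs in sections II and III.
controldata = """\
Latest checkpoint location:           0/8000058
Latest checkpoint's REDO location:    0/8000028
Latest checkpoint's REDO WAL file:    000000010000000000000008
"""
after_cleanup = [
    "000000010000000000000009",
    "00000001000000000000000A",
    "archive_status",
]

if not checkpoint_segment_present(controldata, after_cleanup):
    print("missing checkpoint segment:", redo_wal_file(controldata))
```

In practice the text would come from running sys_controldata and the listing from the sys_wal directory; both are hard-coded here so the example runs without a database.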

IV. Rebuild the control file

1. Rebuild the control file with sys_resetwal

[kingbase@node1 bin]$ ./sys_resetwal -l 00000001000000000000000A -D /data/kingbase/v8r6_054/data
The database server was not shut down cleanly.
Resetting the write-ahead log might cause data to be lost.
If you want to proceed anyway, use -f to force reset.
[kingbase@node1 bin]$ ./sys_resetwal -l 00000001000000000000000A -D /data/kingbase/v8r6_054/data -f
Write-ahead log reset

2. Check the WAL logs after the control file has been rebuilt

[kingbase@node1 sys_wal]$ ls -lh
total 16M
-rw------- 1 kingbase kingbase 16M May 12 15:46 00000001000000000000000B
drwx------ 2 kingbase kingbase   6 May 12 15:46 archive_status

3. Check the control file information

[kingbase@node1 bin]$ ./sys_controldata -D /data/kingbase/v8r6_054/data
sys_control version number:            1201
Catalog version number:               202202151
Database system identifier:           7096019857358041449
Database cluster state:               shut down
sys_control last modified:             Thu 12 May 2022 03:46:38 PM CST
Latest checkpoint location:           0/B000028
Latest checkpoint's REDO location:    0/B000028
Latest checkpoint's REDO WAL file:    00000001000000000000000B

V. Start the database instance and verify

1. Start the database

[kingbase@node1 bin]$ ./sys_ctl start -D /data/kingbase/v8r6_054/data/
waiting for server to start....2022-05-12 15:54:53.731 CST [30496] LOG:  sepapower extension initialized
.....
 done
server started

2. Check the sys_log log (the database started normally)

[kingbase@node1 sys_log]$ tail -100 kingbase-2022-05-12_155453.log
2022-05-12 15:54:53.919 CST [30498] LOG:  database system was shut down at 2022-05-12 15:46:38 CST
2022-05-12 15:54:54.132 CST [30496] LOG:  database system is ready to accept connections

3. Connect to the database

[kingbase@node1 bin]$ ./ksql -U system -W  test -p 54322
Password: 
ksql (V8.0)
Type "help" for help.


test=# \d prod
Did not find any relation named "prod".
test=# \d
               List of relations
 Schema |        Name         | Type  | Owner  
--------+---------------------+-------+--------
 public | sys_stat_statements | view  | system
 public | t1                  | table | system
(2 rows)

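VI. Summary

1. WAL logs can be cleaned up with the sys_archivecleanup tool. First use the control file to determine which WAL segments must be kept: the file named in "Latest checkpoint's REDO WAL file" and every segment after it.
2. When running the cleanup, always confirm that the segments being kept are the correct ones.
3. In a production environment, it is best to have two people confirm that the operation is correct.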
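
The confirmation step recommended above can be backed by a small guard that refuses an unsafe argument before the command is ever run. The sketch below only builds the argument list in the form used earlier in this article (`./sys_archivecleanup <wal directory> <oldest kept file>`) and does not execute anything. It assumes that names are compared without the 8-character timeline prefix, as in PostgreSQL's pg_archivecleanup; the function name `cleanup_command` is hypothetical.

```python
def cleanup_command(wal_dir, oldest_kept, redo_wal_file):
    """Build the sys_archivecleanup argument list, refusing unsafe input.

    Assumption: segments are ordered by name without the 8-digit
    timeline prefix, as in PostgreSQL's pg_archivecleanup.
    """
    if oldest_kept[8:] > redo_wal_file[8:]:
        raise ValueError(
            "cleanup would remove %s, which the latest checkpoint needs"
            % redo_wal_file
        )
    return ["./sys_archivecleanup", wal_dir, oldest_kept]


wal_dir = "/data/kingbase/v8r6_054/data/sys_wal"
redo = "000000010000000000000008"  # from sys_controldata

# The argument used in this case is rejected.
try:
    cleanup_command(wal_dir, "000000010000000000000009", redo)
except ValueError as err:
    print("refused:", err)

# Keeping the REDO WAL file itself is accepted.
print(cleanup_command(wal_dir, redo, redo))
```

Passing the REDO WAL file itself as the oldest file to keep is the latest argument this check allows; any earlier segment name is also accepted and simply removes less.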

Source: https://www.cnblogs.com/kingbase/p/16266365.html
