机房突然断电zabbix运行日志出现大量的pg_error报错信息

zabbix_server.log日志出现query failed: [0] PGRES_FATAL_ERROR:ERROR: could not read block 619 in file "base/17376/55998": read only 0 of 32768 bytes大量报错

机房断电zabbix恢复运行，查看日志发现有大量的PGRES_FATAL_ERROR错误信息，截取部分日志。

报错信息：

[select clock,ns,value from history_uint where itemid=36570 and clock<=1662337221 and clock>1661732421 order by clock desc limit 2]

134751:20220906:092021.356 [Z3005] query failed: [0] PGRES_FATAL_ERROR:ERROR: could not read block 619 in file "base/17376/55998": read only 0 of 32768 bytes

[select clock,ns,value from history_uint where itemid=36570 and clock<=1662337221 and clock>1661732421 order by clock desc limit 2]

134751:20220906:092021.359 [Z3005] query failed: [0] PGRES_FATAL_ERROR:ERROR: could not read block 873 in file "base/17376/55991": read only 0 of 32768 bytes

[select clock,ns,value from history_uint where itemid=36567 and clock<=1662337221 and clock>1661732421 order by clock desc limit 2]

134751:20220906:092021.361 [Z3005] query failed: [0] PGRES_FATAL_ERROR:ERROR: could not read block 873 in file "base/17376/55991": read only 0 of 32768 bytes

[Z3005] query failed: [0] PGRES_FATAL_ERROR:ERROR: could not read block 874

分析：

发现是因为突然断电，导致pg数据库的表部分索引出现问题了，需要修复，但是由于zabbix数据使用timescaledb时序数据库插件以及超表分区功能，故无法指定单表修复。

修复损坏的 PostgreSQL 数据库/ 表

If your server happened to crash, PostgresSQL database is corrupted, but didn’t contain too precious information, you may try the following fix.

如果你的服务器突然发生崩溃，PostgresSQL 突然被中断，但是没有包含太多之前的信息，你可以尝试安装以下方法修复。

The typical symptoms of a corrupted Postgres database would be like below:

常见的因为数据库运行突然中断的日志结果如下：

2013-03-05 11:29:50 GMT ERROR: invalid page header in block 608102 of

relation base/16385/16615 2013-03-05 11:29:50 GMT STATEMENT: COPY

public.history (itemid, clock, value) TO stdout; 2013-03-05 11:29:50

GMT LOG: could not send data to client: Broken pipe

Or 或者

Query failed: [0] PGRES_FATAL_ERROR:ERROR: right sibling's left-link doesn't match:

block 149266 links to 70823 instead of expected 71357 in index "history_uint_1"

The actual fix is quite easy, and basically sets “zero_damaged_pages = on”, then performs vacuum and reindexing.

实际的修复也简单，在数据库设置sets “zero_damaged_pages = on”，然后执行vacuum and reindexing重新建立索引即可。

DATABASE=yourdatabase

TABLES=$(echo \\d | psql $DATABASE | grep "^ public" | awk '{print $3}')

for TABLE in $TABLES; do

echo $TABLE

echo "SET zero_damaged_pages = on; VACUUM FULL $TABLE; REINDEX TABLE $TABLE" | psql $DATABASEdone

在zabbix server或者是pg数据库服务器，创建shell脚本，将以上的复制到脚本，修改为在使用的数据库。

修复脚本：

vim pg_repair_index.sh

!# /bin/bash

DATABASE=zabbix #报错的数据库名称

TABLES=$(echo \\d | psql $DATABASE | grep "^ public" | awk '{print $3}')

for TABLE in $TABLES; do

echo $TABLE

echo "SET zero_damaged_pages = on; VACUUM FULL $TABLE; REINDEX TABLE $TABLE" | psql $DATABASE

done

给脚本执行权限

chmod +x pg_repair_index.sh

执行执行脚本 ./pg_repair_index.sh，会自动重置每个表的索引。

发表于 2022-09-06 15:13
阅读 ( 2299 )
分类：zabbix开源软件

机房突然断电zabbix运行日志出现大量的pg_error报错信息

报错信息：

分析：

修复损坏的 PostgreSQL 数据库/ 表

修复脚本：

你可能感兴趣的文章

相关问题

0 条评论

作家榜 »