After a sudden power outage in the server room, the Zabbix log fills with pg_error messages

zabbix_server.log is flooded with errors like: query failed: [0] PGRES_FATAL_ERROR:ERROR: could not read block 619 in file "base/17376/55998": read only 0 of 32768 bytes

After power was restored and Zabbix came back up, the log contained a large number of PGRES_FATAL_ERROR entries. An excerpt follows.

Error messages:

 [select clock,ns,value from history_uint where itemid=36570 and clock<=1662337221 and clock>1661732421 order by clock desc limit 2]

134751:20220906:092021.356 [Z3005] query failed: [0] PGRES_FATAL_ERROR:ERROR:  could not read block 619 in file "base/17376/55998": read only 0 of 32768 bytes

 [select clock,ns,value from history_uint where itemid=36570 and clock<=1662337221 and clock>1661732421 order by clock desc limit 2]

134751:20220906:092021.359 [Z3005] query failed: [0] PGRES_FATAL_ERROR:ERROR:  could not read block 873 in file "base/17376/55991": read only 0 of 32768 bytes

 [select clock,ns,value from history_uint where itemid=36567 and clock<=1662337221 and clock>1661732421 order by clock desc limit 2]

134751:20220906:092021.361 [Z3005] query failed: [0] PGRES_FATAL_ERROR:ERROR:  could not read block 873 in file "base/17376/55991": read only 0 of 32768 bytes

[Z3005] query failed: [0] PGRES_FATAL_ERROR:ERROR: could not read block 874
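Before repairing anything, it helps to know how many distinct data files are affected. A small sketch, run here against one sample line copied from the excerpt above; in practice, point the pipeline at the real zabbix_server.log (the path comment below is an assumption about a typical install):

```shell
# Sample line taken from the log excerpt above. For a real log, replace the
# echo with something like: grep 'could not read block' zabbix_server.log
LOG='134751:20220906:092021.356 [Z3005] query failed: [0] PGRES_FATAL_ERROR:ERROR:  could not read block 619 in file "base/17376/55998": read only 0 of 32768 bytes'

# Extract the distinct damaged file paths (base/<db oid>/<filenode>):
echo "$LOG" | grep -o 'base/[0-9]*/[0-9]*' | sort -u
# -> base/17376/55998
```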


Analysis:

The sudden power loss corrupted some of the indexes on the PostgreSQL tables, which need to be repaired. However, because the Zabbix data uses the TimescaleDB time-series extension with hypertable partitioning, the repair cannot simply target a single named table.
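The file path in each error can be mapped back to a concrete relation: in "base/17376/55998", 17376 is the database OID and 55998 the relation's filenode, and pg_class resolves a filenode to the table or index it belongs to. A minimal sketch; the psql call is left commented out because it assumes a live server and that the affected database is named zabbix:

```shell
# Filenode taken from the error message above.
FILENODE=55998
# relkind tells you whether the damaged file is a table ('r') or index ('i').
SQL="SELECT relname, relkind FROM pg_class WHERE relfilenode = ${FILENODE};"
echo "$SQL"
# To actually run it, connect to the affected database:
#   echo "$SQL" | psql zabbix
```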

 

Repairing a corrupted PostgreSQL database

If your server crashed and the PostgreSQL database is corrupted, but it doesn't hold irreplaceable data, you can try the following fix.

The typical symptoms of a corrupted PostgreSQL database look like this:

 

2013-03-05 11:29:50 GMT ERROR:  invalid page header in block 608102 of relation base/16385/16615
2013-03-05 11:29:50 GMT STATEMENT:  COPY public.history (itemid, clock, value) TO stdout;
2013-03-05 11:29:50 GMT LOG:  could not send data to client: Broken pipe

Or:

Query failed: [0] PGRES_FATAL_ERROR:ERROR:  right sibling's left-link doesn't match:

block 149266 links to 70823 instead of expected 71357 in index "history_uint_1"

The actual fix is quite simple: set "zero_damaged_pages = on", then run VACUUM and REINDEX on each table to rebuild the indexes.

DATABASE=yourdatabase
TABLES=$(echo \\d | psql $DATABASE | grep "^ public" | awk '{print $3}')
for TABLE in $TABLES; do
  echo $TABLE
  echo "SET zero_damaged_pages = on; VACUUM FULL $TABLE; REINDEX TABLE $TABLE" | psql $DATABASE
done

 

On the Zabbix server or the PostgreSQL host, create a shell script, paste in the snippet above, and change DATABASE to the database in use.

Repair script:

vim pg_repair_index.sh

#!/bin/bash

 

DATABASE=zabbix   # name of the affected database

 

TABLES=$(echo \\d | psql $DATABASE | grep "^ public" | awk '{print $3}')

 

for TABLE in $TABLES; do

   echo $TABLE

   echo "SET zero_damaged_pages = on; VACUUM FULL $TABLE; REINDEX TABLE $TABLE" | psql $DATABASE

done

 

Make the script executable:

chmod +x pg_repair_index.sh

Run the script with ./pg_repair_index.sh; it rebuilds the indexes of every table automatically.
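One caveat: the script only matches tables in the public schema, while TimescaleDB stores hypertable chunks in the _timescaledb_internal schema, so corrupted chunk tables would be skipped. A sketch of a variant that covers them, under the assumptions that psql connects as a superuser (zero_damaged_pages requires it) and that the affected database is named zabbix; the chunk name in the preview is hypothetical:

```shell
#!/bin/bash
DATABASE=zabbix   # assumed name of the affected database

repair_table() {
  # Emit the repair statements for one table. Piping them into psql (rather
  # than using psql -c) runs each statement separately, so VACUUM is not
  # wrapped in a transaction block.
  echo "SET zero_damaged_pages = on; VACUUM FULL $1; REINDEX TABLE $1;"
}

# Dry-run preview for a hypothetical chunk table name:
repair_table "_timescaledb_internal._hyper_1_5_chunk"

# Actual run over every chunk table (uncomment on the database host):
# for CHUNK in $(psql -At "$DATABASE" -c "SELECT schemaname || '.' || tablename FROM pg_tables WHERE schemaname = '_timescaledb_internal'"); do
#   repair_table "$CHUNK" | psql "$DATABASE"
# done
```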

御前侍卫张五哥