机房断电zabbix恢复运行,查看日志发现有大量的PGRES_FATAL_ERROR错误信息,截取部分日志。
[select clock,ns,value from history_uint where itemid=36570 and clock<=1662337221 and clock>1661732421 order by clock desc limit 2]
134751:20220906:092021.356 [Z3005] query failed: [0] PGRES_FATAL_ERROR:ERROR: could not read block 619 in file "base/17376/55998": read only 0 of 32768 bytes
[select clock,ns,value from history_uint where itemid=36570 and clock<=1662337221 and clock>1661732421 order by clock desc limit 2]
134751:20220906:092021.359 [Z3005] query failed: [0] PGRES_FATAL_ERROR:ERROR: could not read block 873 in file "base/17376/55991": read only 0 of 32768 bytes
[select clock,ns,value from history_uint where itemid=36567 and clock<=1662337221 and clock>1661732421 order by clock desc limit 2]
134751:20220906:092021.361 [Z3005] query failed: [0] PGRES_FATAL_ERROR:ERROR: could not read block 873 in file "base/17376/55991": read only 0 of 32768 bytes
[Z3005] query failed: [0] PGRES_FATAL_ERROR:ERROR: could not read block 874
发现是因为突然断电,导致pg数据库的表部分索引出现问题了,需要修复,但是由于zabbix数据使用timescaledb时序数据库插件以及超表分区功能,故无法指定单表修复。
If your server happened to crash, PostgresSQL database is corrupted, but didn’t contain too precious information, you may try the following fix.
如果你的服务器突然发生崩溃,PostgresSQL 突然被中断,但是没有包含太多之前的信息,你可以尝试安装以下方法修复。
The typical symptoms of a corrupted Postgres database would be like below:
常见的因为数据库运行突然中断的日志结果如下:
2013-03-05 11:29:50 GMT ERROR: invalid page header in block 608102 of
relation base/16385/16615 2013-03-05 11:29:50 GMT STATEMENT: COPY
public.history (itemid, clock, value) TO stdout; 2013-03-05 11:29:50
GMT LOG: could not send data to client: Broken pipe
Or 或者
Query failed: [0] PGRES_FATAL_ERROR:ERROR: right sibling's left-link doesn't match:
block 149266 links to 70823 instead of expected 71357 in index "history_uint_1"
The actual fix is quite easy, and basically sets “zero_damaged_pages = on”, then performs vacuum and reindexing.
实际的修复也简单,在数据库设置sets “zero_damaged_pages = on”,然后执行vacuum and reindexing重新建立索引即可。
DATABASE=yourdatabase
TABLES=$(echo \\d | psql $DATABASE | grep "^ public" | awk '{print $3}')
for TABLE in $TABLES; do
echo $TABLE
echo "SET zero_damaged_pages = on; VACUUM FULL $TABLE; REINDEX TABLE $TABLE" | psql $DATABASEdone
在zabbix server或者是pg数据库服务器,创建shell脚本,将以上的复制到脚本,修改为在使用的数据库。
vim pg_repair_index.sh
!# /bin/bash
DATABASE=zabbix #报错的数据库名称
TABLES=$(echo \\d | psql $DATABASE | grep "^ public" | awk '{print $3}')
for TABLE in $TABLES; do
echo $TABLE
echo "SET zero_damaged_pages = on; VACUUM FULL $TABLE; REINDEX TABLE $TABLE" | psql $DATABASE
done
给脚本执行权限
chmod +x pg_repair_index.sh
执行执行脚本 ./pg_repair_index.sh,会自动重置每个表的索引。
错误信息