首页 > 代码库 > pg_repack bloat 处理测试初步

pg_repack bloat 处理测试初步

一、软件安装


1.软件需求:

postgresql-9.5.2.tar.gz

pg_repack-1.3.4.zip


2.安装pg_repack


[root@localhost pg_repack-1.3.4]# export PATH=/opt/pgsql/9.5.2/bin:$PATH

[root@localhost pg_repack-1.3.4]# export LD_LIBRARY_PATH=/opt/pgsql/9.5.2/lib

[root@localhost pg_repack-1.3.4]# export MANPATH=/opt/pgsql/9.5.2/share/man:$MANPATH

[root@localhost pg_repack-1.3.4]# make

make[1]: Entering directory `/home/soft/pg_repack-1.3.4/bin‘

gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -O2 -I/opt/pgsql/9.5.2/include -DREPACK_VERSION=1.3.4 -I. -I./ -I/opt/pgsql/9.5.2/include/server -I/opt/pgsql/9.5.2/include/internal -D_GNU_SOURCE   -c -o pg_repack.o pg_repack.c

gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -O2 -I/opt/pgsql/9.5.2/include -DREPACK_VERSION=1.3.4 -I. -I./ -I/opt/pgsql/9.5.2/include/server -I/opt/pgsql/9.5.2/include/internal -D_GNU_SOURCE   -c -o pgut/pgut.o pgut/pgut.c

gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -O2 -I/opt/pgsql/9.5.2/include -DREPACK_VERSION=1.3.4 -I. -I./ -I/opt/pgsql/9.5.2/include/server -I/opt/pgsql/9.5.2/include/internal -D_GNU_SOURCE   -c -o pgut/pgut-fe.o pgut/pgut-fe.c

gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -O2 pg_repack.o pgut/pgut.o pgut/pgut-fe.o -L/opt/pgsql/9.5.2/lib -lpq -L/opt/pgsql/9.5.2/lib -Wl,--as-needed -Wl,-rpath,‘/opt/pgsql/9.5.2/lib‘,--enable-new-dtags  -lpgcommon -lpgport -lz -lreadline -lrt -lcrypt -ldl -lm -o pg_repack

make[1]: Leaving directory `/home/soft/pg_repack-1.3.4/bin‘

make[1]: Entering directory `/home/soft/pg_repack-1.3.4/lib‘

gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -O2 -fpic -DREPACK_VERSION=1.3.4 -I. -I./ -I/opt/pgsql/9.5.2/include/server -I/opt/pgsql/9.5.2/include/internal -D_GNU_SOURCE   -c -o repack.o repack.c

gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -O2 -fpic -DREPACK_VERSION=1.3.4 -I. -I./ -I/opt/pgsql/9.5.2/include/server -I/opt/pgsql/9.5.2/include/internal -D_GNU_SOURCE   -c -o pgut/pgut-be.o pgut/pgut-be.c

gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -O2 -fpic -DREPACK_VERSION=1.3.4 -I. -I./ -I/opt/pgsql/9.5.2/include/server -I/opt/pgsql/9.5.2/include/internal -D_GNU_SOURCE   -c -o pgut/pgut-spi.o pgut/pgut-spi.c

( echo ‘{ global:‘; gawk ‘/^[^#]/ {printf "%s;\n",$1}‘ exports.txt; echo ‘ local: *; };‘ ) >exports.list

gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -O2 -fpic -shared -Wl,--version-script=exports.list -o pg_repack.so repack.o pgut/pgut-be.o pgut/pgut-spi.o -L/opt/pgsql/9.5.2/lib -Wl,--as-needed -Wl,-rpath,‘/opt/pgsql/9.5.2/lib‘,--enable-new-dtags  

sed ‘s,REPACK_VERSION,1.3.4,g‘ pg_repack.sql.in > pg_repack--1.3.4.sql;

sed ‘s,REPACK_VERSION,1.3.4,g‘ pg_repack.control.in > pg_repack.control

make[1]: Leaving directory `/home/soft/pg_repack-1.3.4/lib‘

make[1]: Entering directory `/home/soft/pg_repack-1.3.4/regress‘

make[1]: Nothing to be done for `all‘.

make[1]: Leaving directory `/home/soft/pg_repack-1.3.4/regress‘

[root@localhost pg_repack-1.3.4]# make install

make[1]: Entering directory `/home/soft/pg_repack-1.3.4/bin‘

/bin/mkdir -p ‘/opt/pgsql/9.5.2/bin‘

/usr/bin/install -c  pg_repack ‘/opt/pgsql/9.5.2/bin‘

make[1]: Leaving directory `/home/soft/pg_repack-1.3.4/bin‘

make[1]: Entering directory `/home/soft/pg_repack-1.3.4/lib‘

/bin/mkdir -p ‘/opt/pgsql/9.5.2/lib‘

/bin/mkdir -p ‘/opt/pgsql/9.5.2/share/extension‘

/bin/mkdir -p ‘/opt/pgsql/9.5.2/share/extension‘

/usr/bin/install -c -m 755  pg_repack.so ‘/opt/pgsql/9.5.2/lib/pg_repack.so‘

/usr/bin/install -c -m 644 .//pg_repack.control ‘/opt/pgsql/9.5.2/share/extension/‘

/usr/bin/install -c -m 644  pg_repack--1.3.4.sql pg_repack.control ‘/opt/pgsql/9.5.2/share/extension/‘

make[1]: Leaving directory `/home/soft/pg_repack-1.3.4/lib‘

make[1]: Entering directory `/home/soft/pg_repack-1.3.4/regress‘

make[1]: Nothing to be done for `install‘.

make[1]: Leaving directory `/home/soft/pg_repack-1.3.4/regress‘

[root@localhost pg_repack-1.3.4]# 


3.创建初始环境


[postgres@localhost ~]$ createdb bloatdb

[postgres@localhost ~]$ psql -d bloatdb -c "create extension pgstattuple;"

CREATE EXTENSION

[postgres@localhost ~]$ psql -d bloatdb -c "CREATE EXTENSION pg_repack;"

CREATE EXTENSION

[postgres@localhost ~]$ 

$ psql bloatdb

psql (9.5.2)

Type "help" for help.


bloatdb=# \dx

                                   List of installed extensions

    Name     | Version |   Schema   |                         Description                          

-------------+---------+------------+--------------------------------------------------------------

 pg_repack   | 1.3.4   | public     | Reorganize tables in PostgreSQL databases with minimal locks

 pgstattuple | 1.3     | public     | show tuple-level statistics

 plpgsql     | 1.0     | pg_catalog | PL/pgSQL procedural language

(3 rows)



二、静态(无活跃交易)膨胀整理测试


1.处理表tbl指定索引

1).准备环境

bloatdb=# create table tbl(id int primary key, first varchar(20),second varchar(20));

CREATE TABLE

bloatdb=# create index idx_tbl_first on tbl (first);

CREATE INDEX

bloatdb=# create index idx_tbl_second on tbl (second);

CREATE INDEX

bloatdb=# SELECT count(*) FROM tbl;

 count 

-------

     0

(1 row)


bloatdb=# SELECT pg_size_pretty(pg_total_relation_size(‘tbl‘));

 pg_size_pretty 

----------------

 24 kB

(1 row)


bloatdb=# INSERT INTO tbl VALUES(generate_series(1,10000), ‘first‘||(random()*(10^3))::integer, ‘second‘||(random()*(10^3))::integer);

INSERT 0 10000

bloatdb=# SELECT count(*) FROM tbl;

 count 

-------

 10000

(1 row)


bloatdb=# SELECT pg_size_pretty(pg_total_relation_size(‘tbl‘));

 pg_size_pretty 

----------------

 1584 kB

(1 row)


bloatdb=# 


更新列

bloatdb=# UPDATE tbl SET first= ‘updated-001‘;

UPDATE 10000

bloatdb=# SELECT count(*) FROM tbl;

 count 

-------

 10000

(1 row)


bloatdb=# SELECT pg_size_pretty(pg_total_relation_size(‘tbl‘));

 pg_size_pretty 

----------------

 3376 kB

(1 row)


bloatdb=# 


2).查询膨胀率

建立膨胀统计表

[postgres@localhost ~]$ /home/soft/pg_bloat_check-master/pg_bloat_check.py -c "dbname=bloatdb" --create_stats_table


膨胀统计

[postgres@localhost ~]$ /home/soft/pg_bloat_check-master/pg_bloat_check.py -c "dbname=bloatdb" -t tbl 

1. public.idx_tbl_second.......................................................(52.69%) 417 kB wasted

2. public.idx_tbl_first........................................................(52.64%) 413 kB wasted

3. public.tbl_pkey.............................................................(57.79%) 388 kB wasted

[postgres@localhost ~]$ 


3).处理膨胀

指定数据库的特定索引

[postgres@localhost ~]$ pg_repack -d bloatdb --index idx_tbl_first

INFO: repacking index "public"."idx_tbl_first"

[postgres@localhost ~]$ /home/soft/pg_bloat_check-master/pg_bloat_check.py -c "dbname=bloatdb" -t tbl

1. public.idx_tbl_second.......................................................(52.69%) 417 kB wasted

2. public.tbl_pkey.............................................................(57.79%) 388 kB wasted

3. public.idx_tbl_first.....................................................(0.93%) 3121 bytes wasted

[postgres@localhost ~]$ 


2.处理表tbl所有索引

1).准备环境

bloatdb=# update tbl set second=‘chris‘;

UPDATE 10000

bloatdb=# SELECT count(*) FROM tbl;

 count 

-------

 10000

(1 row)


bloatdb=# SELECT pg_size_pretty(pg_total_relation_size(‘tbl‘));

 pg_size_pretty 

----------------

 3600 kB

(1 row)


bloatdb=#

bloatdb=# update tbl set first=‘chris‘;

UPDATE 10000

bloatdb=# SELECT count(*) FROM tbl;

 count 

-------

 10000

(1 row)


bloatdb=# SELECT pg_size_pretty(pg_total_relation_size(‘tbl‘));

 pg_size_pretty 

----------------

 4176 kB

(1 row)


bloatdb=# 

2).检查膨胀

[postgres@localhost ~]$ /home/soft/pg_bloat_check-master/pg_bloat_check.py -c "dbname=bloatdb" -t tbl 

1. public.idx_tbl_second.......................................................(59.94%) 820 kB wasted

2. public.idx_tbl_first........................................................(40.94%) 409 kB wasted

3. public.tbl_pkey.............................................................(28.73%) 193 kB wasted

[postgres@localhost ~]$ 


3).处理tbl表所有索引膨胀

[postgres@localhost ~]$ pg_repack -d bloatdb --table tbl --only-indexes

INFO: repacking indexes of "tbl"

INFO: repacking index "public"."idx_tbl_first"

INFO: repacking index "public"."idx_tbl_second"

INFO: repacking index "public"."tbl_pkey"

[postgres@localhost ~]$ /home/soft/pg_bloat_check-master/pg_bloat_check.py -c "dbname=bloatdb" -t tbl 

1. public.idx_tbl_first.....................................................(1.23%) 3028 bytes wasted

2. public.idx_tbl_second....................................................(1.23%) 3028 bytes wasted

3. public.tbl_pkey..........................................................(1.23%) 3028 bytes wasted

[postgres@localhost ~]$ 


3.处理tbl数据和索引膨胀

1).索引膨胀

[postgres@localhost ~]$ /home/soft/pg_bloat_check-master/pg_bloat_check.py -c "dbname=bloatdb" -t tbl

1. public.idx_tbl_first.........................................................(57.87%) 49 MB wasted

2. public.idx_tbl_second........................................................(39.29%) 34 MB wasted

3. public.tbl_pkey..............................................................(51.22%) 26 MB wasted

 

2).处理膨胀online VACUUM FULL 数据库bloatdb表tbl(数据和索引)

[postgres@localhost ~]$ pg_repack --no-order --table tbl -d bloatdb

INFO: repacking table "tbl"

[postgres@localhost ~]$ /home/soft/pg_bloat_check-master/pg_bloat_check.py -c "dbname=bloatdb" -t tbl

1. public.tbl_pkey..............................................................(0.0%) 0 bytes wasted

2. public.idx_tbl_second........................................................(0.0%) 0 bytes wasted

3. public.idx_tbl_first.........................................................(0.0%) 0 bytes wasted

[postgres@localhost ~]$ 


三、动态(有交易发生时)膨胀处理


1.整个表做膨胀处理

1).初始条件

-- clear table data

bloatdb=# select * from tbl;

 id | first | second 

----+-------+--------

(0 rows)


bloatdb=# 

bloatdb=# INSERT INTO tbl VALUES(generate_series(1,100000), ‘first‘||(random()*(10^3))::integer, ‘second‘||(random()*(10^3))::integer);

INSERT 0 100000

bloatdb=# UPDATE tbl SET first= ‘updated-001‘;

UPDATE 100000

bloatdb=# 

-- check bloat

[postgres@localhost ~]$ /home/soft/pg_bloat_check-master/pg_bloat_check.py -c "dbname=bloatdb" -t tbl 

1. public.idx_tbl_second........................................................(67.26%) 17 MB wasted

2. public.idx_tbl_first.........................................................(67.46%) 17 MB wasted

3. public.tbl_pkey............................................................(63.91%) 9832 kB wasted

[postgres@localhost ~]$ 


2).大量插入数据同时做膨胀处理

statement_timeout=0, 视情况调整:maintenance_work_mem,wal_keep_segments(streaming,SSD<2000>)

先插入数据,过程中处理膨胀加上-T参数值为3600.

-- session 1:insert data

bloatdb=# INSERT INTO tbl VALUES(generate_series(100001,3000000), ‘first‘||(random()*(10^3))::integer, ‘second‘||(random()*(10^3))::integer);

光标闪烁

-- session 2:repack during insert

$ pg_repack -d bloatdb --no-order --table tbl --wait-timeout=3600

INFO: repacking table "tbl"

光标闪烁


--session 1 finish insert

bloatdb=# INSERT INTO tbl VALUES(generate_series(100001,3000000), ‘first‘||(random()*(10^3))::integer, ‘second‘||(random()*(10^3))::integer);

INSERT 0 2900000

bloatdb=# 


-- session 2: finish repack

[postgres@localhost ~]$ pg_repack -d bloatdb --no-order --table tbl --wait-timeout=3600

INFO: repacking table "tbl"

-- session 2:膨胀检查

[postgres@localhost ~]$ /home/soft/pg_bloat_check-master/pg_bloat_check.py -c "dbname=bloatdb" -t tbl 

1. public.tbl_pkey..............................................................(0.0%) 0 bytes wasted

2. public.idx_tbl_second........................................................(0.0%) 0 bytes wasted

3. public.idx_tbl_first.........................................................(0.0%) 0 bytes wasted

[postgres@localhost ~]$ 

-- session 1: 数据检查

bloatdb=# select count(*) from tbl ;

  count  

---------

 3000000

(1 row)


bloatdb=# 


2.指定tbl表所有索引膨胀处理


1).准备数据


--session 1: insert data

bloatdb=# delete FROM tbl;

DELETE 3000000

bloatdb=# INSERT INTO tbl VALUES(generate_series(1,100000), ‘first‘||(random()*(10^3))::integer, ‘second‘||(random()*(10^3))::integer);

INSERT 0 100000

bloatdb=# update tbl set first=‘chris‘;

UPDATE 100000

bloatdb=# 


-- session 2:check bloat


[postgres@localhost ~]$ /home/soft/pg_bloat_check-master/pg_bloat_check.py -c "dbname=bloatdb" -t tbl 

1. public.tbl_pkey..............................................................(41.14%) 28 MB wasted

2. public.idx_tbl_second.......................................................(4.32%) 4471 kB wasted

3. public.idx_tbl_first........................................................(2.96%) 2889 kB wasted

[postgres@localhost ~]$


2).online insert and repack

--session 1: insert large data

bloatdb=# INSERT INTO tbl VALUES(generate_series(100001,3000000), ‘first‘||(random()*(10^3))::integer, ‘second‘||(random()*(10^3))::integer);

光标闪烁


-- session 2:process bloat,during session 1 inert large data

[postgres@localhost ~]$ pg_repack -d bloatdb --table tbl --only-indexes -T 3600

INFO: repacking indexes of "tbl"

INFO: repacking index "public"."idx_tbl_first"

INFO: repacking index "public"."idx_tbl_second"

光标闪烁

--session 1:insert finish

bloatdb=# INSERT INTO tbl VALUES(generate_series(100001,3000000), ‘first‘||(random()*(10^3))::integer, ‘second‘||(random()*(10^3))::integer);

INSERT 0 2900000

bloatdb=# 

--session 2:repack finish

[postgres@localhost ~]$ pg_repack -d bloatdb --table tbl --only-indexes -T 3600

INFO: repacking indexes of "tbl"

INFO: repacking index "public"."idx_tbl_first"

INFO: repacking index "public"."idx_tbl_second"

INFO: repacking index "public"."tbl_pkey"


3) check table data and index bloat

--session 2:check bloat

[postgres@localhost ~]$ /home/soft/pg_bloat_check-master/pg_bloat_check.py -c "dbname=bloatdb" -t tbl 

1. public.tbl_pkey..............................................................(0.0%) 0 bytes wasted

2. public.idx_tbl_first.........................................................(0.0%) 0 bytes wasted

3. public.idx_tbl_second........................................................(0.0%) 0 bytes wasted

[postgres@localhost ~]$ 

--session 1:check table data

bloatdb=# select count(*) from tbl;

  count  

---------

 3000000

(1 row)


bloatdb=# 



3.指定tbl表指定索引膨胀处理


注意:--index(concurrently方式创建索引),--only-indexes无法同时使用。

[postgres@localhost ~]$ pg_repack -d bloatdb --index idx_tbl_first --only-indexes 

ERROR: cannot specify --index (-i) and --only-indexes (-x)


1).准备数据

-- read data

bloatdb=# delete FROM tbl;

DELETE 3000000

bloatdb=# INSERT INTO tbl VALUES(generate_series(1,100000), ‘first‘||(random()*(10^3))::integer, ‘second‘||(random()*(10^3))::integer);

INSERT 0 100000

bloatdb=# update tbl set first=‘chris‘;

UPDATE 100000

bloatdb=# 

-- check bloat

[postgres@localhost ~]$ /home/soft/pg_bloat_check-master/pg_bloat_check.py -c "dbname=bloatdb" -t tbl 

1. public.idx_tbl_second........................................................(47.57%) 97 MB wasted

2. public.tbl_pkey.............................................................(9.44%) 7206 kB wasted

3. public.idx_tbl_first........................................................(3.11%) 3040 kB wasted

[postgres@localhost ~]$ 



2).online insert and repack

--session 1: insert large data

bloatdb=# INSERT INTO tbl VALUES(generate_series(100001,3000000), ‘first‘||(random()*(10^3))::integer, ‘second‘||(random()*(10^3))::integer);

光标闪烁


-- session 2:process bloat,during session 1 inert large data

[postgres@localhost ~]$ pg_repack -d bloatdb --index idx_tbl_second --wait-timeout=3600

INFO: repacking index "public"."idx_tbl_second"

光标闪烁


--session 1:insert finish

bloatdb=# INSERT INTO tbl VALUES(generate_series(100001,3000000), ‘first‘||(random()*(10^3))::integer, ‘second‘||(random()*(10^3))::integer);

INSERT 0 2900000

bloatdb=# 

--session 2:repack finish

[postgres@localhost ~]$ pg_repack -d bloatdb --index idx_tbl_second --wait-timeout=3600

INFO: repacking index "public"."idx_tbl_second"

[postgres@localhost ~]$ 


3) check table data and index bloat

--session 2:check bloat

[postgres@localhost ~]$ /home/soft/pg_bloat_check-master/pg_bloat_check.py -c "dbname=bloatdb" -t tbl 

1. public.idx_tbl_first........................................................(50.77%) 102 MB wasted

2. public.tbl_pkey...............................................................(47.6%) 65 MB wasted

3. public.idx_tbl_second........................................................(0.0%) 0 bytes wasted

[postgres@localhost ~]$ 

--session 1:check table data

bloatdb=# select count(*) from tbl;

  count  

---------

 3000000

(1 row)


bloatdb=# 

 


测试结论:

  1. 一般同等条件下,索引比数据更容易膨胀。

  2. 在磁盘空间较紧张的情况下,建议一条接着一条索引处理。

  3. 一般bloat处理所需磁盘空闲空间是对象size的2倍,所以处理前必须先关注空闲磁盘空间大小。

  4. 注意pg_repack版本对Pg版本的支持情况,9.6截至2016-11-26仍未支持,详见http://pgxn.org/dist/pg_repack/doc/pg_repack.html#Releases。

  5. 处理存在在线交易的表或者索引对象的bloat时,注意设置超时参数--wait-timeout,一般设置为1800或3600(特别感谢李海龙建议)。

特别声明:本说明只针对此次测试环境,在生产环境要在业务低峰时期运行,为了保证系统数据安全,建议先备份数据,然后做膨胀处理


本文出自 “yiyi” 博客,请务必保留此出处http://heyiyi.blog.51cto.com/205455/1876843

pg_repack bloat 处理测试初步