PostgreSQL 14 added support for using LZ4 as a TOAST compression algorithm. Compared with the long-standing PGLZ option, LZ4 is notably faster, and in many cases its compression ratio is just as good. From the perspective of reducing database CPU overhead, LZ4 is generally the more attractive choice.
Setting up a simple comparison
To see the difference directly, create two tables with the same structure and only change the compression method used for the TEXT column:
# 创建 t2 表,压缩使用 pglz
CREATE TABLE T2(
ID INT,
C1 TEXT COMPRESSION pglz
);
# 创建 t3 表,压缩使用 lz4
CREATE TABLE T3(
ID INT,
C1 TEXT COMPRESSION lz4
);
Before running the test, enable timing in psql:
tmp01=# \timing
Timing is on.
Insert performance
Load 100,000 rows into each table using the same generated text pattern:
tmp01=# INSERT INTO T2 SELECT i, repeat(i::varchar||'kkk',1000) FROM generate_series(1,100000) a(i);
INSERT 0 100000
Time: 2003.361 ms (00:02.003)
tmp01=# INSERT INTO T3 SELECT i, repeat(i::varchar||'kkk',1000) FROM generate_series(1,100000) a(i);
INSERT 0 100000
Time: 563.887 ms
With 100,000 inserted rows, the LZ4 table finishes the write in much less time. In this test, insert time drops by about 71% compared with PGLZ.
Verifying which compression was used
You can confirm the compression method on stored values with pg_column_compression:
tmp01=# SELECT ID, pg_column_compression ( c1 ) FROM t2 LIMIT 1;
id | pg_column_compression
----+-----------------------
1 | pglz
(1 row)
tmp01=# SELECT ID, pg_column_compression ( c1 ) FROM t3 LIMIT 1;
id | pg_column_compression
----+-----------------------
1 | lz4
(1 row)
As expected, t2.c1 is compressed with PGLZ and t3.c1 with LZ4.
Storage footprint
Compression speed is only part of the picture. The test data also shows a size difference in favor of LZ4.
Use the following query to inspect the relation size information:
SELECT
relname,
relpages,
relpages * 8 / 1024 size_MB,
reltuples,
reltoastrelid,
pg_relation_filepath ( reltoastrelid )
FROM
pg_class
WHERE
relname IN ( 't2', 't3' );
Output:
relname | relpages | size_mb | reltuples | reltoastrelid | pg_relation_filepath
---------+----------+---------+-----------+---------------+----------------------
t2 | 1799 | 14 | 100000 | 40970 | base/16385/40970
t3 | 1126 | 8 | 100000 | 40975 | base/16385/40975
For this particular dataset, LZ4 not only runs faster but also produces a smaller result. That does not guarantee the same outcome for every real-world workload, but in this test it clearly has an advantage.
Update performance
The same pattern appears when updating the column values:
tmp01=# UPDATE t2 SET c1 = REPEAT('abc',1000);
UPDATE 100000
Time: 1026.617 ms (00:01.027)
tmp01=# UPDATE t3 SET c1 = REPEAT('abc',1000);
UPDATE 100000
Time: 456.406 ms
LZ4 benefits here both from faster compression and, in this case, from producing smaller compressed values, which can also help read and write performance.
What happens with random strings
Compression is not always applied. If the data is effectively random, PostgreSQL may decide that compression is not worthwhile and store the original value instead.
Create a helper function that generates random strings, then insert test data:
-- 创建 random_string 函数
CREATE
OR REPLACE FUNCTION random_string ( INT ) RETURNS TEXT AS $$ SELECT
array_to_string(
ARRAY ( SELECT chr( ascii( 'B' ) + round( random( ) * 25 ) :: INTEGER ) FROM generate_series ( 1, $1 ) ),
''
) $$ LANGUAGE SQL;
-- 插入数据
INSERT INTO T4 SELECT i, random_string(1000) FROM generate_series(1,100000) a(i);
Then inspect one row from t4 (which uses the same structure as t2):
tmp01=# SELECT ID, pg_column_compression ( c1 ) FROM t4 LIMIT 1;
id | pg_column_compression
----+-----------------------
1 |
(1 row)
An empty result from pg_column_compression means no compression algorithm was used. The value was stored as-is.
This is expected behavior: when PostgreSQL determines that compression would not reduce the data enough, it keeps the original form instead of forcing compression.
Checking storage strategy and compression metadata
pg_column_compression is convenient for looking at stored values, but you can also inspect a column’s storage policy and compression setting directly in the catalog:
SELECT
attrelid,
attname,
attstorage,
attcompression
FROM
pg_attribute
WHERE
attrelid IN ( 't2' :: regclass, 't3' :: regclass )
AND attname = 'c1';
Output:
attrelid attname attstorage attcompression
40967 c1 x p
40972 c1 x l
In this case, neither table explicitly changed the storage strategy at creation time, so both columns use the default x, which means extended: the value may be compressed and may also be moved to the TOAST table.
The compression codes are:
p: PGLZl: LZ4
The storage strategy codes are:
- p : plain
- x : extended
- e : external
- m : main
One important limitation is that these options only apply to variable-length data types. Fixed-size base types such as int do not support being moved out of line in this way.
Changing an existing column to LZ4
If a table already exists, you can switch the column’s compression setting like this:
ALTER TABLE t4 ALTER COLUMN c1 SET COMPRESSION lz4;
That change only affects rows written after the setting is changed. Existing rows keep their previous compression behavior, which can be verified with pg_column_compression.
For workloads with compressible text data, PostgreSQL 14’s LZ4 support is worth serious attention. In the test shown here, it improves write speed substantially and also reduces storage size, though the exact gain still depends on the shape of the data itself.