Home About Me

Testing TOAST Compression in PostgreSQL 14: How LZ4 Compares to PGLZ

PostgreSQL 14 added support for using LZ4 as a TOAST compression algorithm. Compared with the long-standing PGLZ option, LZ4 is notably faster, and in many cases its compression ratio is just as good. From the perspective of reducing database CPU overhead, LZ4 is generally the more attractive choice.

Setting up a simple comparison

To see the difference directly, create two tables with the same structure and only change the compression method used for the TEXT column:

# 创建 t2 表,压缩使用 pglz
CREATE TABLE T2(
    ID INT,
    C1 TEXT COMPRESSION pglz
);

# 创建 t3 表,压缩使用 lz4
CREATE TABLE T3(
    ID INT,
    C1 TEXT COMPRESSION lz4
);

Before running the test, enable timing in psql:

tmp01=# \timing
Timing is on.

Insert performance

Load 100,000 rows into each table using the same generated text pattern:

tmp01=# INSERT INTO T2 SELECT i, repeat(i::varchar||'kkk',1000) FROM generate_series(1,100000) a(i);
INSERT 0 100000
Time: 2003.361 ms (00:02.003)
tmp01=# INSERT INTO T3 SELECT i, repeat(i::varchar||'kkk',1000) FROM generate_series(1,100000) a(i);
INSERT 0 100000
Time: 563.887 ms

With 100,000 inserted rows, the LZ4 table finishes the write in much less time. In this test, insert time drops by about 71% compared with PGLZ.

Verifying which compression was used

You can confirm the compression method on stored values with pg_column_compression:

tmp01=# SELECT ID, pg_column_compression ( c1 ) FROM t2 LIMIT 1;
 id | pg_column_compression
----+-----------------------
  1 | pglz
(1 row)

tmp01=# SELECT ID, pg_column_compression ( c1 ) FROM t3 LIMIT 1;
 id | pg_column_compression
----+-----------------------
  1 | lz4
(1 row)

As expected, t2.c1 is compressed with PGLZ and t3.c1 with LZ4.

Storage footprint

Compression speed is only part of the picture. The test data also shows a size difference in favor of LZ4.

Use the following query to inspect the relation size information:

SELECT
    relname,
    relpages,
    relpages * 8 / 1024 size_MB,
    reltuples,
    reltoastrelid,
    pg_relation_filepath ( reltoastrelid )
FROM
    pg_class
WHERE
    relname IN ( 't2', 't3' );

Output:

 relname | relpages | size_mb | reltuples | reltoastrelid | pg_relation_filepath
---------+----------+---------+-----------+---------------+----------------------
 t2      |     1799 |      14 |    100000 |         40970 | base/16385/40970
 t3      |     1126 |       8 |    100000 |         40975 | base/16385/40975

For this particular dataset, LZ4 not only runs faster but also produces a smaller result. That does not guarantee the same outcome for every real-world workload, but in this test it clearly has an advantage.

Update performance

The same pattern appears when updating the column values:

tmp01=# UPDATE t2 SET c1 =  REPEAT('abc',1000);
UPDATE 100000
Time: 1026.617 ms (00:01.027)
tmp01=# UPDATE t3 SET c1 =  REPEAT('abc',1000);
UPDATE 100000
Time: 456.406 ms

LZ4 benefits here both from faster compression and, in this case, from producing smaller compressed values, which can also help read and write performance.

What happens with random strings

Compression is not always applied. If the data is effectively random, PostgreSQL may decide that compression is not worthwhile and store the original value instead.

Create a helper function that generates random strings, then insert test data:

-- 创建 random_string 函数
CREATE
    OR REPLACE FUNCTION random_string ( INT ) RETURNS TEXT AS $$ SELECT
    array_to_string(
        ARRAY ( SELECT chr( ascii( 'B' ) + round( random( ) * 25 ) :: INTEGER ) FROM generate_series ( 1, $1 ) ),
        ''
    ) $$ LANGUAGE SQL;


-- 插入数据
INSERT INTO T4 SELECT i, random_string(1000) FROM generate_series(1,100000) a(i);

Then inspect one row from t4 (which uses the same structure as t2):

tmp01=# SELECT ID, pg_column_compression ( c1 ) FROM t4 LIMIT 1;
 id | pg_column_compression
----+-----------------------
  1 |
(1 row)

An empty result from pg_column_compression means no compression algorithm was used. The value was stored as-is.

This is expected behavior: when PostgreSQL determines that compression would not reduce the data enough, it keeps the original form instead of forcing compression.

Checking storage strategy and compression metadata

pg_column_compression is convenient for looking at stored values, but you can also inspect a column’s storage policy and compression setting directly in the catalog:

SELECT
    attrelid,
    attname,
    attstorage,
    attcompression
FROM
    pg_attribute
WHERE
    attrelid IN ( 't2' :: regclass, 't3' :: regclass )
    AND attname = 'c1';

Output:

attrelid    attname attstorage  attcompression
40967   c1  x   p
40972   c1  x   l

In this case, neither table explicitly changed the storage strategy at creation time, so both columns use the default x, which means extended: the value may be compressed and may also be moved to the TOAST table.

The compression codes are:

  • p: PGLZ
  • l: LZ4

The storage strategy codes are:

  • p : plain
  • x : extended
  • e : external
  • m : main

One important limitation is that these options only apply to variable-length data types. Fixed-size base types such as int do not support being moved out of line in this way.

Changing an existing column to LZ4

If a table already exists, you can switch the column’s compression setting like this:

ALTER TABLE t4 ALTER COLUMN c1 SET COMPRESSION lz4;

That change only affects rows written after the setting is changed. Existing rows keep their previous compression behavior, which can be verified with pg_column_compression.

For workloads with compressible text data, PostgreSQL 14’s LZ4 support is worth serious attention. In the test shown here, it improves write speed substantially and also reduces storage size, though the exact gain still depends on the shape of the data itself.