
Data needs protection
Databases hold a lot of information. And usually, part of that information is so critical that it has to be properly protected from unauthorized access. At the SQL interface level, this is handled by the Access Control List (ACL). Commands like GRANT and REVOKE work with that mechanism — and they do their job well and reliably.
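For example, with standard PostgreSQL syntax (the table and role names are made up for illustration), restricting a sensitive table to a single application role looks like this:

REVOKE ALL ON payroll FROM PUBLIC;          -- remove any privileges granted to PUBLIC
GRANT SELECT, UPDATE ON payroll TO hr_app;  -- only the application role may read and modify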
Special features of Postgres Pro Enterprise
Postgres Pro Enterprise provides additional access control features on top of the ACL mechanism: separation of duties between privileged DBMS users, and restriction of the DBMS administrator's access to data.
But what if an attacker gets direct access to the database files? That can happen, for example, if a backup ends up stored in an insufficiently secure location. Or if your database is deployed in the cloud and you have no idea what kind of administrators might have physical access to the servers.
Yes, you can encrypt backups — most modern backup systems support that. But imagine you need to archive 10 TB of data. Even copying that much data takes ages, and if you also encrypt it… the pain gets real. Especially if you have to use GOST crypto, since there’s no hardware acceleration for it in Intel/AMD, ARM, or even Elbrus CPUs. Restoring that archive takes forever too.
And what if an attacker has access to the file system? Say, a hardware admin in a data center who maintains servers, including your DB servers. They might not have SQL access, but they see the database files. Nothing stops them from logging into the server and running a simple cp to grab the entire database. Disk encryption won’t save you here, and neither will encrypted backups. In a large data center with different teams handling different roles, the probability of this happening might be low — but it’s not zero.
For threats like this, the next bastion of defense was invented: encrypting the database files themselves. Naturally, there are multiple ways to implement this. For legal reasons, I’ll call it protective transformation or encoding.
Level 0. Client-side encoding (pgcrypto)
The most obvious and easiest way is: don’t store sensitive data in plaintext in the DB, but encode it before it even gets there. That means the encoding keys must be kept either on the client or on some proxy that sits in the middle of your SQL queries and results.
PostgreSQL has a well-known extension called pgcrypto, which implements this. Many vendors have built their own solutions on top of it.
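A minimal sketch of what that looks like (the table, column, and key literal are invented for illustration; pgp_sym_encrypt and pgp_sym_decrypt are standard pgcrypto functions):

CREATE EXTENSION IF NOT EXISTS pgcrypto;
CREATE TABLE customers (id int PRIMARY KEY, card_number bytea);
-- every write has to call the encryption function explicitly
INSERT INTO customers VALUES (1, pgp_sym_encrypt('4111 1111 1111 1111', 'key-held-by-client'));
-- every read has to pass the same key to the decryption function
SELECT pgp_sym_decrypt(card_number, 'key-held-by-client') FROM customers WHERE id = 1;

Everything SQL itself sees in card_number is an opaque bytea blob, which is exactly where the limitations below come from.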
But this method never became mainstream. Why? The downsides are obvious:
Keys have to be distributed, synchronized, and stored securely on every client.
Every client must have the right keys to read/write protected tables.
If a key is compromised or rotated — you need to re-encrypt everything and re-distribute new keys to all clients.
No filtering or searching in pgcrypto-protected tables — SQL only sees unreadable blobs.
No constraints — SQL doesn’t know what’s inside those columns.
Queries must be rewritten to call pgcrypto functions explicitly.
What we need instead is something transparent for both SQL and the user. That’s called Transparent Data Encryption (TDE) — though for consistency, I’ll keep calling it Transparent Data Encoding.
Transparent Data Encoding
It’s called “transparent” because for SQL and users it looks like there’s nothing special going on. You run normal INSERT, UPDATE, SELECT. The executor works with “clean” data. But what actually lands on disk is unreadable noise. Perfect!
In this model, the DBMS itself handles the keys and the encoding/decoding. There are several ways to implement it, from simple but slow to complex and efficient. Let’s go through them in order.
Level 1. Percona pg_tde_basic
This was the “classic” TDE from Percona (now discontinued: github.com/percona/pg_tde).
The idea: move encoding/decoding into the memory buffer manager. That way, data is encrypted before being written to table files or WAL, and decrypted when read back. The easiest way is to write a custom Table Access Method (TAM) — basically a heap clone that encodes tuples on write and decodes them on read. Done, data protected!

But here’s the catch:
What about indexes? They don’t go through the heap/pg_tde access method, so they’re left unencrypted.
What about custom TAMs like Citus? They’re left unprotected.
And the biggest issue: it’s slow. Really slow.
Our tests show that even in optimistic cases queries slow down massively. If you’re reading the same tuples repeatedly (e.g. a user dictionary), each read requires decoding again and again. Even with AES hardware acceleration, that’s expensive.
Clearly, the encoding/decoding needs to be pushed deeper.
Level 2. Cybertec PGEE, EnterpriseDB
These two TDE implementations differ internally but share one approach: encrypt the entire cluster. Every table, every index, even the system catalog.

The database can operate in two modes: either with no encryption at all, or as a Postgres cluster fully encrypted end-to-end. This approach is the simplest to implement: you never need to wonder whether a given page is encrypted or not, or what’s happening inside the WAL buffer — the cluster always behaves consistently, as defined at initdb time. In other words, the main advantages here are simplicity and reliability.
As for the encryption itself, this mode typically uses AES-XTS for relation pages and AES-CTR for WAL segments. CTR relies on the LSN counter as nonce/counter, which ensures strong protection. We’ll return to the question of XTS security a bit later.
So the benefits of this approach are clear. But the downsides are just as obvious:
Reduced performance. Since the entire cluster is always encrypted, if only 10 out of 100 tables actually require protection, the effort spent encrypting the other 90 slows the system down without providing any benefit.
Complications with technical support. PostgreSQL is extremely reliable, but even so, failures and tricky cases do happen. For a non-encrypted, non-sensitive table, it’s usually easy to get quick internal approval to send a “broken” file to the vendor’s support team. But requesting decryption in order to provide a file is a major headache.
To address these two problems, another approach is used — selectively marking protected, encrypted tables.
Level 3. Pangolin SE, Fujitsu EP, Percona Distribution
To avoid encrypting everything indiscriminately, there needs to be a way to mark which tables should be protected. Both Pangolin and Fujitsu independently arrived at the idea that the most natural way to do this is by creating a “protected” tablespace. All tables located in such a tablespace are encrypted using specific keys, while all others remain unencrypted, avoiding unnecessary CPU load and preserving performance.
In both systems, the sign of a table being protected is a special file located in the root of the protected tablespace directory. This approach typically uses AES‑XTS or AES‑CBC for encrypting relation files and AES‑CTR for encrypting WAL files.
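Conceptually, usage in this model looks something like this (a sketch: the syntax for declaring a tablespace protected is vendor-specific, so the CREATE TABLESPACE line is only a placeholder; placing a table into a tablespace is standard PostgreSQL syntax):

-- hypothetical: each vendor has its own way of marking the tablespace as protected
CREATE TABLESPACE secure_ts LOCATION '/var/lib/pgsql/secure_ts';
-- standard: anything created here is encrypted, all other tables stay plain
CREATE TABLE payments (id bigint PRIMARY KEY, card_number text) TABLESPACE secure_ts;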

The key difference between these approaches is that Pangolin encrypts the entire WAL, while Fujitsu encrypts only the data of protected tables. Unfortunately, the method chosen by Fujitsu is highly invasive: it requires inserting around a hundred changes with OpenSSL calls across various resource managers, which creates major issues when merging with the vanilla upstream. It also makes generating a unique nonce/counter a non-trivial task, since at the moment of XLogInsert the reliable counter value — the LSN — is not yet known.
The TDE implementation in Pangolin was developed several years ago, and its use in production systems has proven both feasible and reliable.
We should also mention the new open-source Percona TDE implementation, released in Percona Distribution for PostgreSQL 17.4.1. Building on the key management framework from pg_tde_basic, they designed an approach very similar to Pangolin and Fujitsu. However, instead of placing protected tables into a dedicated tablespace, Percona marks them with a special TAM, pg_tde. Based on this marker, the system operates in much the same way as Pangolin, though there are important differences.
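Marking a table in Percona's model looks roughly like this (a sketch: the access method name tde_heap is used here for illustration; check the release documentation for the exact name and the key-provider setup it requires):

CREATE EXTENSION pg_tde;
-- the table access method, rather than the tablespace, is what marks the table as encrypted
CREATE TABLE payments (id bigint PRIMARY KEY, card_number text) USING tde_heap;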
The most obvious drawback of the Percona approach is its inability to protect non-standard TAMs such as Citus Columnar and TigerData Hypercore (from the TimescaleDB authors). An additional serious limitation of Percona’s current TDE implementation in release 17.5 is that it does not support pg_rewind, which effectively makes it unsuitable for complex data recovery scenarios. Tantor’s TDE implementation is also based on this same open-source solution from Percona.
This model works better, but we wanted even stronger security.
So what’s the drawback of the previous approach, and why did we at Postgres Professional take a slightly different path? There are two main issues, and they share a common root:
Key load — how many gigabytes of data can you safely encrypt without changing the encryption key.
Re-encrypting modified pages under the same key.
Let me unpack this a bit. In cryptography, it’s considered safe to encrypt only a limited amount of data with a single key. Encrypt too much, and the cipher becomes vulnerable to attacks based on statistical analysis of the data. For WAL files encrypted using AES‑CTR with the WAL LSN counter, this limit is quite large. But for AES‑XTS and AES‑CBC, there’s a real threat: comparing encrypted data across different points in time can help an attacker break the encryption. And the more data the attacker can collect, the higher the risk.
The systems we mentioned earlier implement various strategies to reduce key load — for example, using a separate encryption key per table, or even per relation file fork. Still, these measures can fall short if the same table pages are repeatedly re-encrypted with the same key. Rotating a table key (switching to a new one) requires re-encrypting the entire table — a slow, blocking operation.
Why is re-encryption so risky? The threat model we’re defending against is regular, repeated access by an attacker to database files. The attacker can accumulate copies of encrypted data over time and then run specialized analysis tools on it, greatly simplifying the task of breaking the encryption. That’s why, even though a single relation file fork is limited to 1 GB, in practice an attacker can end up with orders of magnitude more data by repeatedly copying that file at different times.

If we dive a bit deeper into cryptography, the origins of XTS and its shortcomings are well described in the article “You Don’t Want XTS”. The same article also explains why CBC isn’t the best choice for protecting blocks that are modified repeatedly.
So, to increase both security and efficiency, we need to move away from XTS/CBC and adopt high-speed authenticated algorithms, like GCM or similar. Almost all OpenSSL implementations support GCM, so it can be considered the primary choice. We also need key rotation, which allows us to limit how much data is encrypted under a single key.
All three AES modes — XTS, CBC, GCM — operate on 16-byte blocks. However, CBC is a feedback-based algorithm: encrypting the next block depends on the result of the previous one, which prevents fast parallel execution. In XTS, the tweak for each 16-byte block is typically derived from the previous block’s tweak, which also limits parallelism. GCM, on the other hand, is designed so that the AES operation itself can be executed on blocks in parallel. GCM is a highly reliable algorithm and is one of the three approved for TLS 1.3 encryption.
Level 80. Postgres Pro Enterprise
When we started developing our own TDE implementation, our goal was to tackle these issues and provide reliable protection against threats from direct access to database files:
The ability to rotate table encryption keys on the fly.
Encryption must prevent analysis of changes to the same table page, meaning each new encryption requires a unique IV (Initialization Vector).
The integrity of encrypted data must be additionally protected using a MAC (Message Authentication Code).
Another objective was to support protection both for individual tables and for entire databases — the latter being especially important for highly sensitive data.
Let’s now see how we implemented this.
Key rotation without re-encrypting tables
The idea behind this solution is simple: let’s remember which key was used for each table page. We don’t need to store the full key for every page — just its index or identifier in a global key list, which is stored securely elsewhere. This way, after a key rotation, all new data is encrypted with the new key, while old data remains accessible with the old keys used to encrypt it previously. All keys — old and new — are securely stored in a central key bundle.
If a full re-encryption of a table is ever needed, it can be done with a VACUUM FULL command, which creates a complete copy of the table encrypted entirely with the new key. This allows a rapid response if a table’s encryption key is compromised: a single VACUUM FULL is enough to eliminate the risk posed by the compromised key.
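So the response to a compromised key boils down to rotating the key and then rewriting the affected tables (standard syntax; the table name is illustrative):

-- rewrites the table completely, so every page ends up encrypted with the current key
VACUUM FULL payments;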
But here’s where it gets tricky. In AES-XTS/CBC implementations mentioned earlier, no additional metadata was required. Now, we need to store the key identifier for each page, retrieve it from the key list for decryption, and manage this metadata for every relation page.
An obvious solution might be to store a few extra bytes directly on the page. We seriously considered this, but it turned out to be extremely complex. Postgres assumes the special space size is identical for all relations of a given TAM. To implement it directly on pages, we’d either have to shrink the usable page space for all TAMs, including non-standard ones like Citus and TimescaleDB, or introduce a huge number of code changes to handle this variability. The first option would degrade performance for unencrypted tables, and the second would be prohibitively complex and risky — not worth the effort for ongoing maintenance.
We borrowed a solution from another successful feature — compressed tables (CFS). Analogous to CFS’s cfm files, our TDE implementation keeps special tde files alongside relation files. These tde files don’t store compression info — they store metadata for decrypting pages. Each tde file holds 128 bytes of metadata per relation page. The 128-byte size was chosen with plenty of headroom, enough to support a variety of encryption schemes. Naturally, during key rotation, as well as when creating new encrypted tablespaces, new keys are automatically and securely propagated to standby servers via WAL.
By creating a separate metadata file, we successfully solved the problem of storing the key index for each page. Of course, we had to work on fault tolerance, since writing a modified page now requires two separate file operations. But leveraging our experience with CFS, we achieved high reliability.

Protection against time-based analysis
As mentioned earlier, the main weakness of XTS (and even more so CBC or CTR) for protecting individual relation file pages is its vulnerability to time-based analysis — attacks that compare snapshots of the same page taken at different times. In cryptography, secure encryption should produce a byte sequence indistinguishable from white noise, meaning it has full entropy and no detectable patterns. Even if the same data is encrypted multiple times, the output should look completely different each time.
How do we achieve this? By generating and using random initialization vectors (IVs). But once a page has been encrypted with a random IV, how do we decrypt it later? The IV must be stored safely, and this is where our tde file comes in again. The 128 bytes allocated per page provide ample space to store the IV along with other metadata.
During page encryption, the system:
Generates a random IV.
Encrypts the page using AES‑GCM with that IV.
Saves both the encrypted page and the corresponding metadata (including the IV) to disk.
During decryption, the process is reversed:
Read the encrypted page from disk.
Retrieve the 128-byte metadata.
Extract the IV and the key identifier.
Decrypt the page.
Another significant advantage of AES‑GCM is performance. On modern x86-64 CPUs with AVX2 and AVX-512 instruction sets, AES‑GCM in recent OpenSSL versions runs faster than AES‑CBC and even AES‑XTS.
With this design, each saved page snapshot is protected by a unique IV, ensuring strong security against attacks that attempt to analyze changes over time.
Protection against tampering using MAC
A MAC of encrypted data acts like a checksum, allowing verification of its integrity. OpenSSL calculates the MAC during encryption, which can later be checked during decryption. To support this, we store the MAC for each encrypted page in the tde file alongside the other metadata.
PostgreSQL’s standard 16-bit page checksum provides only a basic level of protection against data corruption. Here, the MAC serves as an additional integrity check: a 32-bit MAC offers orders of magnitude stronger detection of corrupted data than the standard 16-bit checksum.
For compatibility with standard PostgreSQL utilities like pg_basebackup or pg_amcheck, Postgres Pro Enterprise also computes the usual 16-bit checksum on the already encrypted data and stores it on disk. This checksum, along with the page header, remains unencrypted, allowing standard utilities to operate correctly on protected table and index files without attempting to interpret the page contents.
Full database encryption
Because our TDE implementation is not tied to any specific TAM, we can apply encryption not only to individual tables but to entire databases in the cluster. This means that all regular tables, created by users or applications, as well as the system catalog, are encrypted.
The process is straightforward: create the database in a protected tablespace:
CREATE DATABASE my_secret_database TABLESPACE my_tde_tablespace;
You can also encrypt an existing database by moving it entirely into a protected tablespace:
ALTER DATABASE my_secret_database SET TABLESPACE my_tde_tablespace;
Note
Moving an entire database into a protected tablespace is a blocking operation and can take a considerable amount of time.
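An individual table can be protected the same way, by moving it into the protected tablespace (standard PostgreSQL syntax; like moving a whole database, this rewrites the table, so it comes out encrypted on the other side):

ALTER TABLE payments SET TABLESPACE my_tde_tablespace;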
Postgres Pro Enterprise TDE — solving key issues
Let’s review the problems we highlighted while examining earlier approaches to database encryption, and see how Postgres Pro Enterprise addresses them.
Problem: The need to distribute, synchronize, and securely store access keys among all clients accessing protected data
Solution: Solved. Transparent Data Encryption (TDE) uses keys safely stored on the database server. Clients accessing protected data do not require any special encryption keys.
Problem: Inability to filter or search protected data using SQL in pgcrypto tables
Solution: Solved. TDE allows the SQL engine to operate on “plain” (unencrypted) data, making filtering and searching fully functional.
Problem: Constraints cannot be used
Solution: Solved. TDE allows SQL constraints to work on plain, unencrypted data, preserving database integrity checks.
Problem: Database queries must be specially modified to include pgcrypto calls
Solution: Solved. Queries require no modifications. TDE has minimal impact on query performance, thanks to the high-speed AES-GCM encryption algorithm.
Problem: Key usage limitations
Solution: Solved. The encryption key load can be controlled by using multiple keys for a single table. Postgres Pro Enterprise TDE is resistant to statistical attacks that rely on large volumes of data encrypted under the same key.
Problem: Vulnerability to time-based attacks
Solution: Solved. The AES-GCM algorithm with a random initialization vector (IV) is resistant to time-based attacks.
What’s next
Perfection has no limits! We don’t stop at what’s already achieved and continue evolving our TDE implementation. The main directions for improvement are as follows:
CFS + TDE: compressed and encrypted tables
We have often been asked whether TDE can protect tables stored in compressed tablespaces. The main obstacle was that, in the current architecture, pages were first encrypted, and compression occurred at a lower level. Since encrypted data has very high entropy, compression was effectively impossible. In upcoming releases, we plan to solve this by applying TDE encryption after the data has been compressed by CFS. The CFS system will manage the metadata for encryption, allowing the creation of compressed-and-encrypted tablespaces.
Do not encrypt data headers in WAL
The current TDE implementation in Postgres Pro Enterprise, like most others, encrypts WAL data entirely. This means utilities such as pg_waldump and pg_probackup need access to the encryption keys: viewing the WAL structure with pg_waldump or performing PITR (Point In Time Recovery) with pg_probackup is impossible without them. Consequently, running pg_probackup in remote mode for encrypted databases can be challenging.
At first glance, the solution seems simple: add a flag to the WAL record header indicating encryption. However, this approach comes with many difficulties. Following the "Fujitsu way" would require modifying all resource managers, spreading encryption logic across many lines of code. Even then, third-party resource managers from plug-in TAMs such as Citus Columnar or TigerData Hypercore would remain unprotected.
The challenge is that WAL writing in PostgreSQL is append-only. For reliability, PostgreSQL always appends new data to WAL segment files, replacing zeroed data with real data, and never modifies already written records. Moreover, the checksum is stored at the beginning of the WAL record, which makes retroactive encryption impossible.
Nevertheless, we found a solution, and in the next major release, Postgres Pro Enterprise 18, we plan to implement it. Unfortunately, this cannot be backported to Enterprise 17 because it requires slight changes to the WAL file format.