A common task in enterprise systems is to load large volumes of data into PostgreSQL — sometimes tens or even hundreds of millions of rows. At first glance, this seems simple: just write a loop in Java and call save() for every record. But in reality, such an approach can be painfully slow. Even a perfectly tuned PostgreSQL instance won’t help if the application is sending data inefficiently.

This article explains how to significantly accelerate bulk inserts when working with PostgreSQL through Spring and Hibernate. We’ll walk through which Spring and Hibernate settings are worth enabling, why they matter, and how much performance they can actually unlock. We’ll also look at how to build your own data-insertion layer for PostgreSQL — one that lets you switch between different insertion strategies, leverage PostgreSQL’s custom capabilities, and parallelize the process. Finally, we’ll see how to integrate this layer with Spring and what real gains each approach can deliver.
Spring and Hibernate Configuration
Let’s start with the most obvious step — tuning Spring + Hibernate to speed up data insertion into the database.
Baseline implementation using Spring + Hibernate
First, let’s look at how long a plain, baseline insert takes. We generate a clean starter project on start.spring.io, hook up a repository, configure the entity, connect the database, and run a simple transactional snippet that inserts random entities via the save() method.
@Transactional
fun saveBySpring(count: Int) {
    val currencies = currencyRepo.findAll()
    val accounts = accountRepo.findAll()
    for (i in 0 until count) {
        pdCustomRepository.save(
            getRandomEntity(id = null, currencies.random(), accounts.random())
        )
    }
}
The result is pretty underwhelming — over 10 minutes. We’ll use this number as our baseline and see how far we can push it down.

Hibernate also consumed a noticeable amount of RAM — I logged memory usage, and it was quite high. The reason is that Hibernate keeps every object in the session-level cache until the transaction ends. With a small heap size, this can easily lead to an OutOfMemoryError.
If you enable logging, you’ll notice that data is being sent to the database exactly at the moment when save() is called, somewhere in the middle of entity generation — not at the end.

Hibernate has a very useful setting, generate_statistics, which shows detailed information about the queries sent to the database. In our case, 4 million records were sent — and zero of them were batched.

Why bother with batching at all? Put simply, it’s like moving books from one shelf to another: doing it in stacks of several books is much faster than carrying each book one by one. Let’s see how to achieve that.
Enabling batching: JDBC batch size
Hibernate provides the jdbc.batch_size setting, but it doesn’t always work as expected. Simply turning it on won’t necessarily enable batching — in my case, it didn’t do anything.
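For reference, both this setting and the statistics flag mentioned above are usually enabled through Spring Boot's property passthrough; a minimal sketch (the batch size value is illustrative):

spring.jpa.properties.hibernate.jdbc.batch_size=100
spring.jpa.properties.hibernate.generate_statistics=true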

When I first created the database, I defined a sequence to generate IDs for me. The idea was that there could eventually be multiple backend instances sharing the same database, and I wanted object IDs to stay consistent. This allowed me to use GenerationType.IDENTITY without worrying about generating IDs on the backend side.
However, according to Hibernate’s documentation, batching does not work when the ID uses GenerationType.IDENTITY.

So what do we do in this situation? We have to request IDs from the database explicitly. To do that, we add a SequenceGenerator and switch to GenerationType.SEQUENCE.
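A minimal sketch of the resulting ID mapping; the generator and sequence names here are illustrative, not necessarily the ones used in the project:

@Id
@GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "pd_seq")
@SequenceGenerator(name = "pd_seq", sequenceName = "payment_document_seq", allocationSize = 1)
var id: Long? = null    // allocationSize = 1 for now: every new entity still costs a sequence round trip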

Now let’s see how this affects performance.

The improvement is decent — around 23%. The high memory consumption remains, though, because the session-level cache is still there.
If we check the logs again, we can see that 40 batches were sent, but the total number of SQL statements — 4 million — hasn’t gone anywhere. The overhead comes from reading IDs from the database. It’s also worth noting that batching now happens at the end of the transaction, not during execution.

How to reduce ID read requests from the database
As you might have guessed, this is about caching. There are two options here. The first is provided by PostgreSQL itself, implemented by adding the CACHE keyword to the sequence. The second is implemented in Hibernate, i.e., on the backend side.

Let’s break down how these approaches differ.
In the first approach, a cache is created at the session level. This helps PostgreSQL avoid concurrent access to the sequence. For example, on the first request, the sequence will return 50 IDs at once, but only one will be sent to the backend, while the remaining 49 are cached in the database session. Unused IDs are not returned to the sequence. So, if only one record is created in a session, the other 49 IDs are lost.
In the second approach, caching happens on the backend side. Here, the sequence should use INCREMENT BY 50, which increases the counter by 50 on each sequence call. The backend then knows it has 50 IDs in reserve, which it can use across the application without further database queries. Let’s see how this approach looks from a logging perspective.

The number of sequence requests decreased by 50 times — which is quite significant. Now, let’s compare the performance of these two approaches.

The biggest performance gain was achieved using application-side caching — about 30%. With database-side caching, the gain was much smaller, around 2%. I also decided to test both approaches under a 10-thread load and got the following results:

The delta between these two approaches decreased, but the difference is still notable. Let’s summarize ID caching.
Application-side caching is faster because it reduces the number of database calls, cutting down on transport overhead. After all, the fastest DB request is the one that never happens!
Database-side caching is slower than application-side caching. Its speed benefit comes only from reducing PostgreSQL sequence locks.
Database-side caching is session-based. Unused IDs disappear when the session ends. With application-side caching, values are lost only if the backend restarts, which happens much less frequently.
These approaches can be combined, taking into account the characteristics of each.
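As a sketch, the application-side variant boils down to two coordinated settings: ALTER SEQUENCE payment_document_seq INCREMENT BY 50 on the database side (the database-side alternative would instead be CACHE 50 on the sequence itself), plus a matching allocationSize in the mapping. Names are illustrative:

@Id
@GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "pd_seq")
@SequenceGenerator(name = "pd_seq", sequenceName = "payment_document_seq", allocationSize = 50)
var id: Long? = null    // Hibernate may now hand out 50 IDs per database round trip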
Method saveAll()
The next idea is to collect all entities into a list and call saveAll(). Let’s see what performance gain this approach brings. The test method looks like this:
@Transactional
fun saveBySpring(count: Int) {
    val currencies = currencyRepo.findAll()
    val accounts = accountRepo.findAll()
    (0 until count)
        .map { getRandomEntity(id = null, currencies.random(), accounts.random()) }
        .let { pdCustomRepository.saveAll(it) }
}
The improvement was minimal — around 2% overall.

The reason is that under the hood, saveAll simply calls save in a loop. So, we don’t see a significant difference.
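In Spring Data's SimpleJpaRepository, saveAll is essentially just this loop (paraphrased in Kotlin, with the save call passed in as a function so the sketch stands on its own):

fun <S> saveAll(entities: Iterable<S>, save: (S) -> S): List<S> {
    val result = mutableListOf<S>()
    for (entity in entities) {
        result.add(save(entity))    // one save() per entity, the same path as the baseline loop
    }
    return result
}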

Now let’s look at the PostgreSQL and Hibernate logs. You can see that Hibernate sends data row by row, adding an INSERT INTO statement for each entity along with column names. PostgreSQL writes the data exactly as Hibernate sends it.

Besides regular inserts, there are multi-row inserts, where you send the INSERT header once and then pass multiple rows in the VALUES clause, separated by commas. PostgreSQL supports this form, and in theory it should be faster than row-by-row inserts for bulk operations.

So, how can we turn regular inserts into multi-row inserts? Unfortunately, Spring doesn’t do this by default, but the JDBC driver can.
Driver configuration: reWriteBatchedInserts
There is a very useful setting called reWriteBatchedInserts, which allows the JDBC driver to transform regular inserts into multi-row inserts like this:

Let’s see what changes appear in the logs.

Hibernate still sends inserts one by one, but the driver combines them on the fly into a multi-row insert. This gives a significant speed boost without changing any code.
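The flag is usually switched on right in the JDBC URL; a sketch (host and database name are placeholders):

spring.datasource.url=jdbc:postgresql://localhost:5432/demo?reWriteBatchedInserts=true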

The total time is now 4 minutes 37 seconds, and this is just with this setting. Next, we’ll think about how to reduce memory usage.
Clearing the session-level cache via EntityManager
As mentioned earlier, Hibernate keeps all objects in the session-level cache until the end of the transaction. However, we can clear it if the data already sent to the database is no longer needed.
To do this, we inject an EntityManager using @PersistenceContext and manually manage the entire insert process. When it’s time to batch, we flush the data to the database and clear the cache.
@PersistenceContext
lateinit var entityManager: EntityManager

@Transactional
fun saveBySpringWithManualBatching(count: Int) {
    val currencies = currencyRepo.findAll()
    val accounts = accountRepo.findAll()
    for (i in 0 until count) {
        entityManager.persist(getRandomEntity(id = null, currencies.random(), accounts.random()))
        if (i != 0 && i % batchSize == 0) {
            entityManager.flush()
            entityManager.clear()
        }
    }
}
Besides a time gain of 18 seconds, this approach significantly reduces memory usage. You don’t have to worry about OutOfMemoryError even if your heap size is small.

Configuring order_inserts
Another useful feature is the order_inserts setting. When we insert multiple entity types mixed together in one transaction (for example, customer → supplier → customer → supplier), there’s a risk of ending up with as many batches as there are records. This happens because Hibernate closes the current batch and starts a new one whenever an entity of a different type appears.
Setting order_inserts=true forces Hibernate to group objects by type first, and then form batches. This allows batching to work more efficiently and consistently.
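In Spring Boot this is again a one-line property; hibernate.order_updates is its counterpart for updates and is usually enabled alongside it:

spring.jpa.properties.hibernate.order_inserts=true
spring.jpa.properties.hibernate.order_updates=true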

Intermediate results: with all the settings applied, we reduced the time from 10 minutes 30 seconds to 4 minutes 19 seconds. The table below summarizes the performance boost for each approach.

So, what if we skip Hibernate and try to manually generate the inserts we need?
Writing your own layer
Nothing prevents us from constructing such a statement ourselves and sending it to the database. Let’s build our own layer that generates the required INSERT statements from entities.
INSERT INTO payment_document (account_id, amount, expense, cur, order_date, order_number, payment_purpose, prop_10, prop_15, prop_20)
VALUES (1000004, '10.23', true, 'RUB', '2023-06-25', '123456', 'some purpose', 'some 10', 'some 15', 'some 20'),
       (1000005, '11.23', true, 'RUB', '2023-06-26', '123457', 'some purpose', 'some 10', 'some 15', 'some 20'),
       (1000006, '12.23', true, 'RUB', '2023-06-27', '123458', 'some purpose', 'some 10', 'some 15', 'some 20');
But a natural question arises: how do we construct this statement?
Custom insert implementation
The first idea is to take an entity, use reflection to extract data and column names, and generate the required SQL string.
val data = PaymentDocumentEntity(
    account = AccountEntity().apply { id = 1 },
    expense = false,
    amount = BigDecimal("10.11"),
    cur = CurrencyEntity(code = "RUB"),
    orderDate = LocalDate.parse("2023-01-01"),
    orderNumber = "123",
    prop20 = "1345",
    prop15 = "END",
    paymentPurpose = "paymentPurpose",
    prop10 = "prop10",
)

fun getDataFromEntity(entity: BaseEntity) =
    entity.javaClass.declaredFields.map { field ->
        field.trySetAccessible()
        getDataFromEntityByField(entity, field).toString()
    }

fun getDataFromEntityByField(entity: BaseEntity, field: Field) =
    when (val obj: Any? = field.get(entity)) {
        null -> null
        is BaseEntity -> {
            field.annotations
                .filterIsInstance(JoinColumn::class.java)
                .firstOrNull()
                ?.referencedColumnName
                ?.takeIf { it.isNotEmpty() }
                ?.let { obj.javaClass.getDeclaredField(it) }
                ?.apply { trySetAccessible() }
                ?.get(obj)
                ?: obj.id
        }
        else -> obj
    }
However, everyone knows that accessing data via reflection is not fast, so let’s consider other options. Manually listing column names for insertion is not recommended, because a column might be renamed in the table, and there’s no validation for native queries.
Kotlin lets you reference fields via KProperty using the KClass::field syntax. There’s also Hibernate’s jpamodelgen library, which generates metamodel classes with constants in the Entity_.field format.
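For illustration, the two alternatives side by side; the generated metamodel class shown in the comment is an assumption of what jpamodelgen would typically produce, not code from the repository:

// A Kotlin property reference is refactoring-safe, but it yields the property name,
// which still has to be mapped to the database column by the naming strategy.
val viaKProperty = PaymentDocumentEntity::orderNumber.name   // "orderNumber"

// With hibernate-jpamodelgen, a generated PaymentDocumentEntity_ class would expose
// constants instead, e.g. PaymentDocumentEntity_.ORDER_NUMBER.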

The performance difference between all these methods turned out to be minor — about 3 seconds per operation for 4 million entities. In this article, I left the reflection approach as the simplest option.

Architecture of the custom layer
The layer consists of three parts:
Processor — contains the logic to transform data for saving;
Saver — manages the saving process;
Factory — selects the correct saver based on the requested saver type (a minimal interface sketch follows below).
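For orientation, here is a hedged sketch of what these three parts might look like as interfaces. The method names mirror those mentioned later in the article (addDataForSave, saveData), but the saver types and exact signatures are illustrative assumptions, not the repository’s code:

interface EntityProcessor<E : BaseEntity> {
    fun toRow(entity: E): List<String?>            // turns an entity into a row of column values
}

interface EntitySaver<E : BaseEntity> : AutoCloseable {
    fun addDataForSave(entity: E)                  // buffer a row; flush when the batch is full
    fun saveData()                                 // send the current batch to the database
    fun commit()
}

enum class SaverType { INSERT, PREPARED_STATEMENT, COPY, COPY_BINARY }

class SaverFactory<E : BaseEntity>(private val savers: Map<SaverType, () -> EntitySaver<E>>) {
    fun getSaver(type: SaverType): EntitySaver<E> =
        savers[type]?.invoke() ?: error("No saver registered for $type")
}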

Looking at the implementations, you can see abstract classes with common logic and many savers. Each saver is responsible for its own insertion method. There are quite a few in the repository, and we’ll cover the main ones in this article.

How it works
The first abstraction layer is responsible for committing and rolling back transactions and receives a Connection object.

The second abstraction layer maintains a counter and sends data in batches.

The very first implementation is a basic multi-row insert, where the data is passed directly inside the INSERT statement.
INSERT INTO payment_document (account_id, amount, expense, cur, order_date, order_number, payment_purpose, prop_10, prop_15, prop_20)
VALUES (1000004, '10.23', true, 'RUB', '2023-06-25', '123456', 'some purpose', 'some 10', 'some 15', 'some 20'),
       (1000005, '11.23', true, 'RUB', '2023-06-26', '123457', 'some purpose', 'some 10', 'some 15', 'some 20'),
       (1000006, '12.23', true, 'RUB', '2023-06-27', '123458', 'some purpose', 'some 10', 'some 15', 'some 20');
The saver itself looks like this:

It contains a small cache where we store data strings extracted via reflection in the addDataForSave method. The saveData method is called automatically when a batch is sent to the database.
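A hedged sketch of what such a saver could look like. The column list comes from the examples above, getDataFromEntity is the reflection helper shown earlier, and the real class in the repository is more elaborate:

import java.sql.Connection

class InsertByEntitySaver(
    private val connection: Connection,
    private val batchSize: Int = 5_000,
) {
    private val rows = mutableListOf<String>()     // the small cache of data strings

    fun addDataForSave(entity: PaymentDocumentEntity) {
        // Naive value rendering: no escaping or NULL handling, which is exactly why
        // we later move to prepared statements.
        rows += getDataFromEntity(entity).joinToString(prefix = "(", postfix = ")") { "'$it'" }
        if (rows.size == batchSize) saveData()     // flush automatically when the batch is full
    }

    fun saveData() {
        if (rows.isEmpty()) return
        connection.createStatement().use { st ->
            st.execute(
                "INSERT INTO payment_document (account_id, amount, expense, cur, order_date, " +
                    "order_number, payment_purpose, prop_10, prop_15, prop_20) VALUES " + rows.joinToString(",")
            )
        }
        rows.clear()
    }
}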
We obtain the required saver through a factory. The method accepts an enum representing the saver type and returns the corresponding implementation.

How do we use it? With the @Component annotation, we create the factory implementation for the required entity and inject it into a service. Based on the requested type, we get the specific saver from the factory, generate random data, and then commit it to the database.
Let’s see what this gives us in terms of time and memory.

First, note the memory consumption. It is quite low: the application stays within 600–900 MB. The downward spikes indicate minor GC activity. However, if we look at execution time, we can see that it is 46 seconds slower than the latest Hibernate-based approach.
So, let’s try rewriting our insert to use prepared statements instead of regular statements.

The first issue I ran into with this approach was exceeding the maximum number of parameters allowed in a query (the limit is described here).

This brings us to the topic of batch size. Determining the maximum batch size is straightforward: take the maximum allowed number of parameters and divide it by the number of columns.
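For example, assuming the usual protocol limit of 65,535 bind parameters per statement and the 10 columns of our table:

const val MAX_BIND_PARAMS = 65_535                  // PostgreSQL extended-protocol limit on bind parameters
const val COLUMN_COUNT = 10                         // insertable columns of payment_document
val maxBatchSize = MAX_BIND_PARAMS / COLUMN_COUNT   // 6553 rows per prepared multi-row INSERT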
I also ran benchmarks with different batch sizes and insertion strategies, and summarized the results in the table below.

In my case, the most efficient batch size turned out to be 5,000 records. Another important observation: inserting data via prepared statements is faster than using regular statements.
Most developers already know that regular statements are unsafe due to SQL injection. But let’s break down why prepared statements are faster. To do this, we enable logging on the PostgreSQL side and collect execution timings for the parse, bind, and execute phases.

I put the collected data into a table. The most important metrics are at the top.


From this table, we see that the parse phase for a prepared statement is executed only five times. Also, if you look closely at the bind phase, you’ll notice that it takes longer for the first ten executions. Let’s understand why.

The reason is the so-called server-side prepare, an optimization that allows PostgreSQL to reuse query plans, avoiding repeated parsing and planning (documentation).
Here’s how it works:
The JDBC driver detects that the query is not “garbage” and decides to cache it.
It assigns an alias (e.g., S_1) and asks PostgreSQL to compute and cache an optimized plan for this query.
After that, the driver no longer sends the full SQL text — it sends only the alias. PostgreSQL then uses the cached query plan.

The number of times a query must be repeated before it is cached is controlled by the prepareThreshold parameter, which defaults to 5. That’s why the prepared statement is parsed only five times. After that, the plan is cached, and during the next five calls the bind phase is slower because PostgreSQL is computing the optimal plan.
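If needed, the threshold can also be tuned right in the connection string; a sketch with illustrative values:

spring.datasource.url=jdbc:postgresql://localhost:5432/demo?prepareThreshold=5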
So, let’s move forward and think about what else could be slowing down data insertion in PostgreSQL.
Indexes and triggers
Triggers are tricky — they may contain business logic, and removing or disabling them often requires approvals from the business side. In my case, the table has no triggers.
But indexes are easier to work with. You can drop them before the insert and recreate them afterward.
Many people use this approach when restoring a database from scratch: drop indexes → load data → recreate indexes. But I do not recommend doing this in production systems. First, something might go wrong — a handler may crash and leave you without indexes. Second, during large insert operations, users run without indexes, which is also far from ideal.
However, colleagues tell me they still consider this approach, so I decided to try it.
PostgreSQL has a view called pg_indexes, which returns index names as well as the script needed to recreate each index.

By pulling the data from this view, we can determine which indexes to drop and how to restore them later.
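A rough sketch of the drop-and-restore flow using this view, assuming a JdbcTemplate is available; primary-key indexes are skipped because they are backed by constraints and cannot be dropped this way:

val indexes = jdbcTemplate.queryForList(
    "SELECT indexname, indexdef FROM pg_indexes " +
        "WHERE tablename = 'payment_document' AND indexname NOT LIKE '%_pkey'"
)
indexes.forEach { jdbcTemplate.execute("DROP INDEX IF EXISTS ${it["indexname"]}") }
// ... perform the bulk insert ...
indexes.forEach { jdbcTemplate.execute(it["indexdef"] as String) }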

So the process is:
Drop indexes before the insert.
Perform the insert.
Run the index creation script to restore them.
Let’s see what the measurements show.

In this case, the results were the opposite of what we wanted. Instead of getting a performance boost, execution time got worse — 7 minutes 19 seconds, which is way too much.
Let’s figure out why this happened. Looking at the logs:

We can see that both the DROP INDEX and the insert itself run fast. But restoring the indexes takes 5 minutes 11 seconds. On top of that, the process heavily loads the CPU.
Let’s check what else PostgreSQL can offer.
Using PostgreSQL’s custom methods
Besides regular INSERT, PostgreSQL provides the COPY command, which allows copying data between a file and a table. Let’s try using it for bulk inserts and see what kind of performance boost it brings (documentation).
The syntax for COPY is quite simple: you provide the table name, the list of columns, and the data source. PostgreSQL’s JDBC driver exposes this through its Copy API (the CopyManager class), which we can use directly.

COPY via CSV and file
Since COPY is designed for transferring data between a file and a table, let’s benchmark this scenario. We generate a CSV file with the required data and feed it to the driver.
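A hedged sketch of feeding a CSV file to the driver’s CopyManager; connection is assumed to be a plain java.sql.Connection to PostgreSQL (unwrapped to org.postgresql.PGConnection), the file name is illustrative, and error handling is omitted:

val copyManager = connection.unwrap(PGConnection::class.java).copyAPI
FileReader("payment_document.csv").use { reader ->
    copyManager.copyIn(
        "COPY payment_document (account_id, amount, expense, cur, order_date, order_number, " +
            "payment_purpose, prop_10, prop_15, prop_20) FROM STDIN WITH (FORMAT csv)",
        reader
    )
}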

Here are the results:

Even though the data must first be written to disk before insertion, this method turned out faster than multi-row INSERT. Memory consumption is also excellent: around 100 MB, because the data is streamed to disk instead of being stored in RAM.
This approach is especially convenient when inserting data from external sources — for example, large XML files. You can build a streaming processor that converts a large XML into CSV and then pipes it directly into the database.
Let’s see what happens if we keep the data in memory instead of on disk.
COPY with an in-memory buffer
In the next test, we use the same COPY mechanism but store our buffer in memory using a StringWriter.

Here are the results:

As expected, performance improves — writing to memory is faster than writing to disk.
But PostgreSQL also supports a binary COPY format.
COPY in binary format
Theoretically, the binary format should be faster than the text format — PostgreSQL claims so in its documentation. But there’s a catch: the same documentation warns that the binary format may not be portable across architectures and PostgreSQL versions. That’s a serious concern — nobody wants to upgrade PostgreSQL and suddenly run into data format incompatibilities.
Still, let’s see how fast it is.
To create a binary file, we must explicitly write:
the file header,
the column count,
the data,
the trailer.

Writing the values is not always as trivial as it is for a long. Take LocalDate as an example — converting it into PostgreSQL’s binary representation requires a transformation along these lines:
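A sketch of what such a conversion might look like (not necessarily the author’s exact code): in binary COPY every field is written as a 4-byte length followed by the payload, and a DATE is a 4-byte count of days since 2000-01-01, the PostgreSQL epoch:

import java.io.DataOutputStream
import java.time.LocalDate
import java.time.temporal.ChronoUnit

private val POSTGRES_EPOCH: LocalDate = LocalDate.of(2000, 1, 1)

fun writeLocalDate(out: DataOutputStream, date: LocalDate) {
    out.writeInt(4)                                                        // field length in bytes
    out.writeInt(ChronoUnit.DAYS.between(POSTGRES_EPOCH, date).toInt())    // days since the Postgres epoch
}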

And it gets even more complicated with BigDecimal, for which I reused PostgreSQL driver logic.

Let’s see how much overhead these conversions introduce. I ran the following benchmark:

We can see that the overall code runs fast. If we compare raw reflection extraction with reflection → binary conversion, the difference is not dramatic — around 300 ms.
Now let’s look at full measurements. Here comes the unpleasant surprise:

Even though we prepared all data exactly as PostgreSQL expects, the binary version took longer.
Comparing CSV with binary: as shown in the screenshot, the file sizes differ — the CSV file is smaller. Both formats took the same time to generate (56 seconds), but writing the binary data took 5 seconds longer.

In-memory binary COPY
Below is an example of binary COPY with an in-memory buffer.

The performance difference is roughly the same as in the file-based version (1–2%).
Note: I also performed tests on PostgreSQL 17.4, where binary COPY started outperforming CSV by the same 1–2%. PostgreSQL clearly keeps optimizing this area.
Integrating COPY with Spring
Now let’s figure out how to integrate COPY into Spring transactions so that we can use our saver implementations within transactional boundaries, with code that looks like this:

Here’s what we need to do:
Add a saveByCopy() method to our abstract repository.
Obtain the Spring-managed Connection via TransactionSynchronizationManager.
Register the saver as a resource of the current transaction.
Before commit, call saveData() on the saver to flush the remaining data to the DB.
Since we’re using Spring's managed connection, we don’t need to commit manually — Spring handles it for us.
Here is what the saveByCopy() method looks like. We obtain the saver, call addDataForSave to buffer the data, and everything else is handled by Spring.
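A hedged sketch of the saveByCopy() idea. T is the repository’s entity type, while the resource key, the createCopySaver() helper, and the EntitySaver type from the earlier sketch are illustrative assumptions; Spring 5.3+ interface defaults on TransactionSynchronization are assumed:

fun saveByCopy(entity: T) {
    @Suppress("UNCHECKED_CAST")
    val saver = TransactionSynchronizationManager.getResource(this) as? EntitySaver<T>
        ?: createCopySaver().also { created ->
            TransactionSynchronizationManager.bindResource(this, created)
            TransactionSynchronizationManager.registerSynchronization(object : TransactionSynchronization {
                override fun beforeCommit(readOnly: Boolean) = created.saveData()   // flush the tail before commit
            })
        }
    saver.addDataForSave(entity)   // only buffers; Spring commits the underlying connection
}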

Then we extend our JPA repository with functionality from the abstract repository.
As a result, the custom JPA repository provides all JPA methods plus our new functionality.

Let’s check the results:

The performance is almost the same as with the factory-based approach — but now we can use this method inside Spring-managed transactions.
Now let’s try parallelizing the insertion process and see what kind of performance gain we can get.
Parallel insertion
Before implementing anything, I first logged how long it actually takes to send data to the database. It turned out to be 2 minutes 53 seconds — roughly two-thirds of the total time. During this period, our main thread is simply sleeping, waiting for the database to accept the data.

So let’s hand off the data-sending step to separate threads, while the main thread keeps generating new data instead of idling. Here’s what we’ll do:
Create a dedicated ThreadPoolExecutor to manage tasks. You could use plain threads, but ThreadPoolExecutor gives better control over application threads.
Introduce a ConcurrentSaverHandler, which will coordinate the saving process and distribute the load across multiple savers. As soon as the handler detects that the current saver is about to send data to the DB, it switches to another saver and continues writing there.
Extend CopyByEntitySaver so we can submit non-blocking save calls to the database.
A diagram might look like this:

Drawbacks of this approach:
Higher connection usage under load. Instead of a single connection per transaction, you might now need five.
Loss of transaction atomicity for parallel savers.
We’ll be committing saver data before the main transaction commits, which means that if something goes wrong at commit time, some batches may already be persisted while others are not — you must account for this.
Implementation Details
We’ll create CopyByEntityConcurrentSaver, which extends CopyByEntitySaver. It accepts an ExecutorService and stores the current database-sending task in a saveDataJob variable.

If the main thread loops through all savers and catches up to one that hasn’t finished sending data, we have to wait for that send to complete.
Here’s the flow: at each stage of interaction with the DB, we check whether the current task has completed by inspecting saveDataJob. Calling get() waits for completion and returns the result. Once the task finishes, the next call to saveData submits a new job to the executor, invoking the parent class’s saveData method.
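A simplified sketch of that flow, deliberately decoupled from the real CopyByEntitySaver hierarchy: the flush action is passed in as a function (for example, the parent saver’s saveData()):

import java.util.concurrent.ExecutorService
import java.util.concurrent.Future

class AsyncFlushingSaver(
    private val executor: ExecutorService,
    private val flush: () -> Unit,
) {
    private var saveDataJob: Future<*>? = null

    fun saveData() {
        saveDataJob?.get()                         // block only if the previous batch is still in flight
        saveDataJob = executor.submit { flush() }  // hand the new batch off to a worker thread
    }

    fun awaitLastJob() {
        saveDataJob?.get()                         // called before commit to make sure everything landed
    }
}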

We also maintain counters for the number of created entities and for the number of saver switches.
The first helps determine when a saver’s batch is full and it’s time to send data.
The second tells us which saver we’re currently using.

The easiest way to choose the active saver and detect batch fullness is via modulo arithmetic (%).
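For illustration, two small helpers (names are assumptions, not the repository’s):

fun <S> chooseSaver(savers: List<S>, switchCount: Int): S =
    savers[switchCount % savers.size]                             // round-robin over the savers

fun isBatchFull(createdCount: Int, batchSize: Int): Boolean =
    createdCount != 0 && createdCount % batchSize == 0            // time to flush and switch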

It’s important to highlight that you must close connections on both commit and rollback. Obvious to some, but not to everyone: if you forget to close a connection, it will leak from the pool, eventually exhausting all available connections.
Integrating ConcurrentSaverHandler with Spring transactions
We again use TransactionSynchronizationManager to retrieve the handler and register it as a transaction resource on first access.

Before commit, we call commit() on the handler, which in turn commits all saver instances. Then, via the afterCompletion hook, we verify that everything succeeded; if not, we roll back any uncommitted transactions.
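A hedged sketch of those hooks, assuming handler is the ConcurrentSaverHandler retrieved from the transaction resources; the rollback method name is an assumption:

TransactionSynchronizationManager.registerSynchronization(object : TransactionSynchronization {
    override fun beforeCommit(readOnly: Boolean) = handler.commit()          // commit every saver first
    override fun afterCompletion(status: Int) {
        if (status != TransactionSynchronization.STATUS_COMMITTED) {
            handler.rollbackUncommitted()                                    // hypothetical cleanup method
        }
    }
})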

Using our multithreaded copy saver from the repository looks exactly the same as using the regular one.

Let’s check the results.

Results
The results are excellent — parallelization dramatically speeds up the operation. Memory usage also looks very healthy.
Final thoughts
Spring and Hibernate introduce overhead when inserting data, but you can noticeably speed things up by tuning them. In my case, I managed to increase insert throughput by roughly 40%.
In your own abstraction layer, you can tailor things exactly as you need, use PostgreSQL’s custom functions, and even parallelize insertions. It will run faster — but you lose many of Spring’s built-in goodies.
On many Spring projects, auditing and other features rely on its ecosystem. If you don’t need them, this approach is viable; if you do, you’ll need to optimize within Spring.
COPY is about 10% faster than multi-row INSERT — at least on my hardware. But keep in mind that it only works with PostgreSQL; other databases don’t support it.
Binary Copy provides only marginal benefits over CSV. I didn’t see any real gain. In PostgreSQL 17.4 it’s slightly faster; in 14.5 it’s slightly slower.
And the biggest win overall: I managed to speed up the insert process by a factor of six, from 10 minutes 28 seconds down to 1 minute 47 seconds.
Feel free to test these approaches on your own datasets — and let me know in the comments how it goes! Feedback and new ideas are very welcome.