By default, the attributes of an entity are loaded eagerly (all at once). Are you sure that you want that?
Description: If not, then it is important to know that attributes can be loaded lazily as well, via Hibernate bytecode instrumentation (another approach is via subentities). This is useful for column types that store large amounts of data: CLOB, BLOB, VARBINARY, etc.
Key points:
- In pom.xml, activate Hibernate bytecode instrumentation (e.g., use the Maven bytecode enhancement plugin)
- Annotate the columns that should be loaded lazily with @Basic(fetch = FetchType.LAZY), as in the sketch below
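For illustration, here is a minimal sketch of the entity side (the Author entity and avatar column are illustrative, and it assumes bytecode enhancement with lazy initialization is activated in pom.xml as described above):

```java
import javax.persistence.Basic;
import javax.persistence.Entity;
import javax.persistence.FetchType;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.Lob;

@Entity
public class Author {

    @Id
    @GeneratedValue
    private Long id;

    private String name;

    // large column loaded only when its getter is actually called,
    // provided that bytecode enhancement (lazy initialization) is active
    @Lob
    @Basic(fetch = FetchType.LAZY)
    private byte[] avatar;

    // getters and setters omitted for brevity
}
```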
Source code can be found here.
Without seeing and inspecting the SQL fired behind the scenes and the corresponding binding parameters, we are prone to introduce performance penalties that may remain there for a long time (e.g. N+1).
Update (please read): The solution described below is useful if you already have Log4J 2 in your project. If not, it is better to rely on TRACE (thank you Peter Wippermann for your suggestion) or log4jdbc (thank you Sergei Poznanski for your suggestion and SO answer). Neither approach requires the exclusion of Spring Boot's default logging. An example of TRACE can be found here, and an example of log4jdbc here.
Description based on Log4J 2: While the application is under development or maintenance, it is useful to view and inspect the prepared statement binding parameter values instead of assuming them. One way to do this is via a Log4J 2 logger setting.
Key points:
- In pom.xml, exclude Spring Boot's default logging (read the update above)
- In pom.xml, add the Log4j 2 dependency
- In log4j2.xml, add the loggers needed to print the SQL statements and their binding parameters
Output sample:
Source code can be found here.
Without ensuring that batching is actually working, we are prone to serious performance penalties. There are different cases when batching is disabled, even if we have it set up and think that it is working behind the scenes. For checking, we can use hibernate.generate_statistics to display details (including batching details), but we can go with datasource-proxy as well.
Description:View the query details (query type, binding parameters, batch size, etc.) via datasource-proxy.
Key points:
- In pom.xml, add the datasource-proxy dependency
- Intercept the DataSource bean
- Wrap the DataSource bean via the ProxyFactory and an implementation of the MethodInterceptor (a sketch follows below)
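As a hedged sketch of these key points (class, bean, and logger names are illustrative; it follows the widely used BeanPostProcessor approach and assumes a recent datasource-proxy version):

```java
import java.lang.reflect.Method;
import javax.sql.DataSource;
import net.ttddyy.dsproxy.listener.logging.SLF4JLogLevel;
import net.ttddyy.dsproxy.support.ProxyDataSourceBuilder;
import org.aopalliance.intercept.MethodInterceptor;
import org.aopalliance.intercept.MethodInvocation;
import org.springframework.aop.framework.ProxyFactory;
import org.springframework.beans.factory.config.BeanPostProcessor;
import org.springframework.stereotype.Component;
import org.springframework.util.ReflectionUtils;

@Component
public class DataSourceProxyBeanPostProcessor implements BeanPostProcessor {

    @Override
    public Object postProcessAfterInitialization(Object bean, String beanName) {
        if (bean instanceof DataSource) {
            // wrap the original DataSource bean in a Spring AOP proxy
            ProxyFactory factory = new ProxyFactory(bean);
            factory.setProxyTargetClass(true);
            factory.addAdvice(new ProxyDataSourceInterceptor((DataSource) bean));
            return factory.getProxy();
        }
        return bean;
    }

    private static class ProxyDataSourceInterceptor implements MethodInterceptor {

        private final DataSource dataSource;

        ProxyDataSourceInterceptor(DataSource dataSource) {
            // datasource-proxy logs each query with type, bindings, batch size, etc.
            this.dataSource = ProxyDataSourceBuilder.create(dataSource)
                    .name("DATA_SOURCE_PROXY")
                    .logQueryBySlf4j(SLF4JLogLevel.INFO)
                    .multiline()
                    .build();
        }

        @Override
        public Object invoke(MethodInvocation invocation) throws Throwable {
            Method proxyMethod = ReflectionUtils.findMethod(
                    dataSource.getClass(), invocation.getMethod().getName());
            return (proxyMethod != null)
                    ? proxyMethod.invoke(dataSource, invocation.getArguments())
                    : invocation.proceed();
        }
    }
}
```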
Output sample:
Source code can be found here.
By default, 100 inserts will result in 100 SQL INSERT
statements, and this is bad since it results in 100 database round trips.
Description: Batching is a mechanism capable of grouping INSERTs, UPDATEs, and DELETEs and, as a consequence, it significantly reduces the number of database round trips. One way to achieve batch inserts is to use the SimpleJpaRepository#saveAll(Iterable<S> entities) method. Here, we do this with MySQL.
Key points:
- In application.properties, set spring.jpa.properties.hibernate.jdbc.batch_size
- In application.properties, set spring.jpa.properties.hibernate.generate_statistics (just to check that batching is working)
- In application.properties, set the JDBC URL with rewriteBatchedStatements=true (optimization specific to MySQL)
- In application.properties, set the JDBC URL with cachePrepStmts=true (this enables caching and is useful if you decide to set prepStmtCacheSize, prepStmtCacheSqlLimit, etc. as well; without this setting the cache is disabled)
- In application.properties, set the JDBC URL with useServerPrepStmts=true (this way you switch to server-side prepared statements, which may lead to a significant performance boost)
- Avoid the IDENTITY generator, since it will cause batching to be disabled
- In the entity, add a @Version property of type Long to avoid the extra SELECTs fired before batching (this also prevents lost updates in multi-request transactions). The extra SELECTs are the effect of using merge() instead of persist(). Behind the scenes, saveAll() uses save(), which, in the case of non-new entities (entities that already have IDs), will call merge(), which instructs Hibernate to fire a SELECT statement to ensure that there is no record in the database having the same identifier.
- Pay attention to the amount of data passed to saveAll() so as not to "overwhelm" the persistence context. Normally, the EntityManager should be flushed and cleared from time to time, but during the saveAll() execution you simply cannot do that, so if saveAll() receives a list with a large amount of data, all that data will hit the persistence context (first-level cache) and stay in memory until flush time. Using a relatively small amount of data should be OK. For a large amount of data, please check the next example (item 5). A minimal code sketch follows after this item.

Output sample:
Source code can be found here.
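To make the key points of this item concrete, here is a minimal sketch of an entity and service that batch well with saveAll() (entity, repository, property values, and the assigned-identifier strategy are illustrative assumptions, not the exact code of the linked source):

```java
import java.util.List;
import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Version;
import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

// application.properties (illustrative values):
//   spring.jpa.properties.hibernate.jdbc.batch_size=30
//   spring.jpa.properties.hibernate.generate_statistics=true
//   spring.datasource.url=jdbc:mysql://localhost:3306/db?rewriteBatchedStatements=true&cachePrepStmts=true&useServerPrepStmts=true

@Entity
public class Author {

    @Id
    private Long id; // manually assigned identifier (IDENTITY would disable batching)

    @Version
    private Long version; // null version => Spring Data treats the entity as new => persist(), no extra SELECT

    private String name;

    // getters and setters omitted for brevity
}

interface AuthorRepository extends JpaRepository<Author, Long> {
}

@Service
class BulkAuthorService {

    private final AuthorRepository authorRepository;

    BulkAuthorService(AuthorRepository authorRepository) {
        this.authorRepository = authorRepository;
    }

    @Transactional
    public void saveAuthors(List<Author> authors) {
        // with hibernate.jdbc.batch_size set, these INSERTs are grouped into batches
        authorRepository.saveAll(authors);
    }
}
```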
Using batching should result in a boost in performance, but pay attention to the amount of data stored in the persistence context before flushing it. Storing a large amount of data in memory can lead again to performance penalties. Item 4 fits well for a relatively small amount of data.
Description: Batch inserts via EntityManager in MySQL (or another RDBMS). This way you can easily control the flush() and clear() cycles of the persistence context (first-level cache). This is not possible via the Spring Boot saveAll(Iterable<S> entities) method. Another advantage is that you can call persist() instead of merge(), which is used behind the scenes by the Spring Boot saveAll(Iterable<S> entities) and save(S entity) methods.
Key points:
- In application.properties, set spring.jpa.properties.hibernate.jdbc.batch_size
- In application.properties, set spring.jpa.properties.hibernate.generate_statistics (just to check that batching is working)
- In application.properties, set the JDBC URL with rewriteBatchedStatements=true (optimization specific to MySQL)
- In application.properties, set the JDBC URL with cachePrepStmts=true (this enables caching and is useful if you decide to set prepStmtCacheSize, prepStmtCacheSqlLimit, etc. as well; without this setting the cache is disabled)
- In application.properties, set the JDBC URL with useServerPrepStmts=true (this way you switch to server-side prepared statements, which may lead to a significant performance boost)
- Avoid the IDENTITY generator, since it will cause batching to be disabled (a flush/clear sketch follows after this item)

Output sample:
Source code can be found here.
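A minimal sketch of the flush/clear cycle described above (the DAO name is illustrative and the batch size constant is an assumption that should mirror hibernate.jdbc.batch_size):

```java
import java.util.List;
import javax.persistence.EntityManager;
import javax.persistence.PersistenceContext;
import org.springframework.stereotype.Repository;
import org.springframework.transaction.annotation.Transactional;

@Repository
public class BatchDao {

    private static final int BATCH_SIZE = 30; // keep in sync with hibernate.jdbc.batch_size

    @PersistenceContext
    private EntityManager entityManager;

    @Transactional
    public <T> void saveInBatches(List<T> entities) {
        int count = 0;
        for (T entity : entities) {
            entityManager.persist(entity); // persist(), not merge(): no extra SELECT
            if (++count % BATCH_SIZE == 0) {
                entityManager.flush();  // push the current batch of INSERTs to the database
                entityManager.clear();  // detach the batch so the persistence context stays small
            }
        }
        entityManager.flush(); // flush the remaining entities
        entityManager.clear();
    }
}
```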
The way we fetch data from the database determines how an application will perform. In order to build the optimal fetching plan, we need to be aware of each fetching type. Direct fetching is the simplest (since we don't write any explicit query) and very useful when we know the entity's primary key.
Description:Direct fetching via Spring Data, EntityManager
, and Hibernate Session
examples.
Key points:
- Via Spring Data, use findById()
- Via EntityManager, use EntityManager#find()
- Via Hibernate, use Session#get()
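A short sketch of the three variants (the Author entity and AuthorRepository are assumed to exist):

```java
import java.util.Optional;
import javax.persistence.EntityManager;
import javax.persistence.PersistenceContext;
import org.hibernate.Session;
import org.springframework.stereotype.Service;

@Service
public class DirectFetchingService {

    private final AuthorRepository authorRepository; // assumed Spring Data repository

    @PersistenceContext
    private EntityManager entityManager;

    public DirectFetchingService(AuthorRepository authorRepository) {
        this.authorRepository = authorRepository;
    }

    public void fetchById(long id) {
        // via Spring Data
        Optional<Author> viaSpringData = authorRepository.findById(id);

        // via EntityManager
        Author viaEntityManager = entityManager.find(Author.class, id);

        // via Hibernate Session, unwrapped from the EntityManager
        Session session = entityManager.unwrap(Session.class);
        Author viaSession = session.get(Author.class, id);
    }
}
```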
Source code can be found here.
Fetching more data than needed is one of the most common issues causing performance penalties. Fetching entities without the intention of modifying them is also a bad idea.
Description:Fetch only the needed data from the database via Spring Data Projections (DTOs). See also items 25-32.
Key points:
- Write an interface (projection) containing getters only for the columns that should be fetched
- Write the proper query returning a List of projections
- If suitable, limit the number of returned rows (e.g., via LIMIT). Here, we can use the query builder mechanism built into the Spring Data repository infrastructure (a sketch follows below)

Output example (select first 2 rows; select only "name" and "age"):
Source code can be found here.
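For illustration, a minimal sketch of such a projection (interface, entity, and method names are illustrative; the Author entity is assumed to exist):

```java
import java.util.List;
import org.springframework.data.jpa.repository.JpaRepository;

// closed projection: only "name" and "age" are selected
public interface AuthorNameAge {

    String getName();

    int getAge();
}

// separate file: the repository relying on the query builder mechanism
interface AuthorRepository extends JpaRepository<Author, Long> {

    // derived query limited to the first 2 rows, returning only the projection
    List<AuthorNameAge> findFirst2ByOrderByAgeAsc();
}
```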
Storing date, time and timestamps in the database in different/specific formats can cause real issues when dealing with conversions.
Description:This recipe shows you how to store date, time, and timestamps in UTC time zone in MySQL. For other RDBMSs (e.g. PostgreSQL), just remove " useLegacyDatetimeCode=false
" and adapt the JDBC URL.
Key points:
- spring.jpa.properties.hibernate.jdbc.time_zone=UTC
- spring.datasource.url=jdbc:mysql://localhost:3306/db_screenshot?useLegacyDatetimeCode=false
Source code can be found here.
Executing more SQL statements than needed is always a performance penalty. It is important to strive to reduce their number as much as possible, and relying on references is one of the easy-to-use optimizations.
Description: A Proxy can be useful when a child entity can be persisted with a reference to its parent. In such cases, fetching the parent entity from the database (executing the SELECT statement) is a performance penalty and a pointless action. Hibernate can set the underlying foreign key value for an uninitialized Proxy.
Key points:
- Rely on EntityManager#getReference() (in Spring Data, use JpaRepository#getOne(); in Hibernate, use load())
- Here, we have two entities, Tournament and TennisPlayer, and a tournament can have multiple players (@OneToMany)
- We fetch the tournament via a Proxy (this will not trigger a SELECT), we create a new tennis player, we set the Proxy as the tournament for this player, and we save the player (this will trigger an INSERT in the tennis players table, tennis_player)

Output sample:
INSERT
is triggered, and no SELECT
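A minimal sketch of that flow (service and repository names are illustrative; getOne() delegates to EntityManager#getReference()):

```java
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Service
public class TennisPlayerService {

    private final TournamentRepository tournamentRepository;     // assumed JpaRepository<Tournament, Long>
    private final TennisPlayerRepository tennisPlayerRepository; // assumed JpaRepository<TennisPlayer, Long>

    public TennisPlayerService(TournamentRepository tournamentRepository,
                               TennisPlayerRepository tennisPlayerRepository) {
        this.tournamentRepository = tournamentRepository;
        this.tennisPlayerRepository = tennisPlayerRepository;
    }

    @Transactional
    public void addPlayerToTournament(Long tournamentId) {
        // getOne() returns an uninitialized Proxy; no SELECT is executed
        Tournament tournament = tournamentRepository.getOne(tournamentId);

        TennisPlayer player = new TennisPlayer();
        player.setName("New player");
        player.setTournament(tournament); // only the foreign key value is needed here

        tennisPlayerRepository.save(player); // triggers a single INSERT into tennis_player
    }
}
```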
Source code can be found here.
N+1 is another issue that may cause serious performance penalties. In order to eliminate it, you have to find/recognize it. This is not always easy, but here is one of the most common scenarios that lead to N+1.
Description: N+1 is an issue of lazy fetching (but eager fetching is not exempt). Just in case you didn't have the chance to see it in action, this application reproduces the N+1 behavior. In order to avoid N+1, it is better to rely on JOIN+DTO (there are examples of JOIN+DTOs in items 36-42).
Key points:
- Define two entities, Category and Product, having a @OneToMany relationship
- Fetch all Product entities lazily, so without Category (results in 1 query)
- Loop through the fetched Product collection, and for each entry, fetch the corresponding Category (results in N queries); see the sketch below

Output sample:
Source code can be found here.
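A minimal sketch that reproduces this behavior (service and repository names are illustrative; Product and Category follow the description above):

```java
import java.util.List;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Service
public class NPlusOneService {

    private final ProductRepository productRepository; // assumed JpaRepository<Product, Long>

    public NPlusOneService(ProductRepository productRepository) {
        this.productRepository = productRepository;
    }

    @Transactional(readOnly = true)
    public void displayProductsAndCategories() {
        // 1 query: Category is LAZY, so it is not fetched here
        List<Product> products = productRepository.findAll();

        // N queries: touching the lazy association triggers one SELECT per product
        for (Product product : products) {
            System.out.println(product.getName() + " -> " + product.getCategory().getName());
        }
    }
}
```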
Passing SELECT DISTINCT
to an RDBMS has a negative impact on performance.
Description:Starting with Hibernate 5.2.2, we can optimize SELECT DISTINCT
via the HINT_PASS_DISTINCT_THROUGH
hint. Mainly, the DISTINCT
keyword will not hit the RDBMS, and Hibernate will take care of the de-duplication task.
Key points:
- Use @QueryHints(value = @QueryHint(name = HINT_PASS_DISTINCT_THROUGH, value = "false"))
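A minimal repository sketch (the Author and Book entities are illustrative):

```java
import static org.hibernate.jpa.QueryHints.HINT_PASS_DISTINCT_THROUGH;

import java.util.List;
import javax.persistence.QueryHint;
import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.data.jpa.repository.Query;
import org.springframework.data.jpa.repository.QueryHints;

public interface AuthorRepository extends JpaRepository<Author, Long> {

    // DISTINCT de-duplicates the parent entities in memory;
    // the hint keeps the keyword out of the SQL sent to the database
    @QueryHints(value = @QueryHint(name = HINT_PASS_DISTINCT_THROUGH, value = "false"))
    @Query("SELECT DISTINCT a FROM Author a LEFT JOIN FETCH a.books")
    List<Author> fetchWithBooks();
}
```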
Output sample:
Source code can be found here.
Java Reflection is considered slow and, therefore, a performance penalty.
Description: Prior to Hibernate version 5, the dirty checking mechanism relied on the Java Reflection API. Starting with Hibernate version 5, the dirty checking mechanism relies on bytecode enhancement. This approach sustains better performance, especially when you have a relatively large number of entities.
Key points:
- In pom.xml, activate bytecode enhancement (e.g., use the Maven bytecode enhancement plugin)

Output sample:
The enhanced User.class can be seen here.
Source code can be found here.
Treating Java 8 Optional
as a "silver bullet" for dealing with nulls can cause more harm than good. Using things for what they were designed is the best approach.
Description: This application is a proof of concept of how to correctly use Java 8 Optional in entities and queries.
Key points:
- Some built-in Spring Data queries already return Optional (e.g., findById())
- Return Optional from your own queries as well
- Use Optional in entity getters
- The sample data is loaded via data-mysql.sql
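A minimal sketch of these points (the Author entity and its fields are illustrative):

```java
import java.util.Optional;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import org.springframework.data.jpa.repository.JpaRepository;

@Entity
public class Author {

    @Id
    @GeneratedValue
    private Long id;

    private String name;

    // Optional only in the getter; the persistent field stays a plain String
    public Optional<String> getName() {
        return Optional.ofNullable(name);
    }

    public void setName(String name) {
        this.name = name;
    }
}

// separate file: custom queries can return Optional just like findById()
interface AuthorRepository extends JpaRepository<Author, Long> {

    Optional<Author> findByName(String name);
}
```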
Source code can be found here.
There are a few ways to screw up your @OneToMany
bi-directional relationship implementation. And, I am sure that this is a thing that you want to do correctly right from the start.
Description: This application is a proof of concept of how to correctly implement the bidirectional @OneToMany
association.
Key points:
- Use mappedBy on the parent side
- Use orphanRemoval on the parent side in order to remove children without references
- Override equals() and hashCode() as here (a sketch follows below)

Source code can be found here.
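For illustration, a minimal sketch of such a bidirectional association (Author and Book are illustrative names, and the identifier-based equals()/hashCode() style shown is one common choice, not necessarily the exact one linked above):

```java
import java.util.ArrayList;
import java.util.List;
import javax.persistence.CascadeType;
import javax.persistence.Entity;
import javax.persistence.FetchType;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.JoinColumn;
import javax.persistence.ManyToOne;
import javax.persistence.OneToMany;

@Entity
public class Author {

    @Id
    @GeneratedValue
    private Long id;

    // parent side: mappedBy + orphanRemoval, as described in the key points
    @OneToMany(cascade = CascadeType.ALL, mappedBy = "author", orphanRemoval = true)
    private List<Book> books = new ArrayList<>();

    // helper methods keep both sides of the association in sync
    public void addBook(Book book) {
        books.add(book);
        book.setAuthor(this);
    }

    public void removeBook(Book book) {
        books.remove(book);
        book.setAuthor(null);
    }
}

@Entity
class Book {

    @Id
    @GeneratedValue
    private Long id;

    @ManyToOne(fetch = FetchType.LAZY)
    @JoinColumn(name = "author_id")
    private Author author;

    public void setAuthor(Author author) {
        this.author = author;
    }

    // identifier-based equality that stays stable across entity state transitions
    @Override
    public boolean equals(Object obj) {
        if (this == obj) {
            return true;
        }
        if (!(obj instanceof Book)) {
            return false;
        }
        return id != null && id.equals(((Book) obj).id);
    }

    @Override
    public int hashCode() {
        return getClass().hashCode(); // constant per class, safe for entities in hash-based collections
    }
}
```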
When direct fetching is not an option, we can think of JPQL/HQL query fetching.
Description:This application is a proof of concept of how to write a query via JpaRepository
, EntityManager
and Session
.
Key points:
- For JpaRepository, use @Query or Spring Data Query Creation
- For EntityManager and Session, use the createQuery() method

Source code can be found here.
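A minimal sketch of the three variants (the Author entity is assumed to exist; queries and method names are illustrative):

```java
import java.util.List;
import javax.persistence.EntityManager;
import javax.persistence.PersistenceContext;
import org.hibernate.Session;
import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.data.jpa.repository.Query;
import org.springframework.stereotype.Service;

// via JpaRepository: @Query or Spring Data Query Creation
interface AuthorRepository extends JpaRepository<Author, Long> {

    @Query("SELECT a FROM Author a WHERE a.age > ?1")
    List<Author> fetchOlderThan(int age);

    List<Author> findByName(String name); // derived from the method name
}

@Service
class AuthorFetchService {

    @PersistenceContext
    private EntityManager entityManager;

    public List<Author> fetchViaEntityManager(int age) {
        return entityManager
                .createQuery("SELECT a FROM Author a WHERE a.age > :age", Author.class)
                .setParameter("age", age)
                .getResultList();
    }

    public List<Author> fetchViaSession(int age) {
        // the Hibernate Session is unwrapped from the current EntityManager
        Session session = entityManager.unwrap(Session.class);
        return session
                .createQuery("SELECT a FROM Author a WHERE a.age > :age", Author.class)
                .setParameter("age", age)
                .getResultList();
    }
}
```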
In MySQL, the TABLE
generator is something that you will always want to avoid. Never use it!
Description: In MySQL and Hibernate 5, the GenerationType.AUTO generator type will result in using the TABLE generator. This adds a significant performance penalty. Switching to the IDENTITY generator can be done by using GenerationType.IDENTITY or the native generator.
Key points:
- Use GenerationType.IDENTITY
instead of GenerationType.AUTO
- Use the native generator exemplified in this source code
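For illustration, here are the two mappings side by side (entity names are illustrative):

```java
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;
import org.hibernate.annotations.GenericGenerator;

@Entity
public class AuthorIdentity {

    // maps to MySQL AUTO_INCREMENT and avoids the TABLE generator
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;

    private String name;
}

@Entity
class AuthorNative {

    // Hibernate's "native" generator picks identity or sequence depending on the dialect
    @Id
    @GeneratedValue(generator = "native")
    @GenericGenerator(name = "native", strategy = "native")
    private Long id;

    private String name;
}
```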
Output sample:
Source code can be found here.
We love to call this method, don't we? But calling it for managed entities is a bad idea, since Hibernate uses the dirty checking mechanism to help us avoid such redundant calls.
Description: This application is an example of when calling save()
for a managed entity is redundant.
Key points:
- At flush time, Hibernate triggers the UPDATE statements for managed entities without the need to explicitly call the save() method (see the sketch below)

Source code can be found here.
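A minimal sketch (entity, repository, and method names are illustrative):

```java
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Service
public class AuthorService {

    private final AuthorRepository authorRepository; // assumed JpaRepository<Author, Long>

    public AuthorService(AuthorRepository authorRepository) {
        this.authorRepository = authorRepository;
    }

    @Transactional
    public void renameAuthor(Long id, String newName) {
        Author author = authorRepository.findById(id)
                .orElseThrow(() -> new IllegalArgumentException("No author with id " + id));

        author.setName(newName);
        // no save() call needed: the entity is managed, so dirty checking
        // triggers the UPDATE automatically at flush/commit time
    }
}
```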
In PostgreSQL, using GenerationType.IDENTITY
will disable insert batching.
Description: In PostgreSQL, (BIG)SERIAL acts "almost" like MySQL's AUTO_INCREMENT. In this example, we use GenerationType.SEQUENCE, which enables insert batching, and we optimize it via the hi/lo optimization algorithm.
Key points:
- Use GenerationType.SEQUENCE instead of GenerationType.IDENTITY
- Rely on the hi/lo algorithm to fetch multiple identifiers in a single database roundtrip (you can go even further and use the Hibernate pooled and pooled-lo identifier generators, which are optimizations of hi/lo); a mapping sketch follows after this item

Output sample:
Source code can be found here.
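A minimal mapping sketch (entity name, sequence name, and increment size are illustrative assumptions):

```java
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;
import org.hibernate.annotations.GenericGenerator;
import org.hibernate.annotations.Parameter;

@Entity
public class Author {

    // SEQUENCE keeps insert batching enabled; the hi/lo optimizer fetches a whole
    // range of identifiers per database roundtrip
    @Id
    @GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "hilo")
    @GenericGenerator(
            name = "hilo",
            strategy = "org.hibernate.id.enhanced.SequenceStyleGenerator",
            parameters = {
                    @Parameter(name = "sequence_name", value = "hilo_sequence"),
                    @Parameter(name = "initial_value", value = "1"),
                    @Parameter(name = "increment_size", value = "100"),
                    @Parameter(name = "optimizer", value = "hilo")
            })
    private Long id;

    private String name;
}
```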
JPA supports the SINGLE_TABLE, JOINED, and TABLE_PER_CLASS inheritance strategies. Each of them has its pros and cons. For example, in the case of SINGLE_TABLE
, reads and writes are fast, but as the main drawback, NOT NULL constraints are not allowed for columns from subclasses.
Description: This application is a sample of the JPA Single Table inheritance strategy (SINGLE_TABLE).
Key points:
- Annotate the base entity with @Inheritance(strategy = InheritanceType.SINGLE_TABLE); a mapping sketch follows after this item

Output example (below is a single table obtained from four entities):
Source code can be found here.
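A minimal sketch of the mapping (entity names are illustrative; the linked example uses four entities, while this sketch uses three for brevity):

```java
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.Inheritance;
import javax.persistence.InheritanceType;

// base entity: all subclasses end up in one table with a discriminator column
@Entity
@Inheritance(strategy = InheritanceType.SINGLE_TABLE)
public class Book {

    @Id
    @GeneratedValue
    private Long id;

    private String title;
}

@Entity
class Ebook extends Book {

    // subclass columns live in the same table, hence no NOT NULL constraints on them
    private String downloadUrl;
}

@Entity
class Paperback extends Book {

    private int numberOfPages;
}
```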
Without counting and asserting SQL statements, it is very easy to lose control of the SQL executed behind the scenes and, therefore, introduce performance penalties.
Description: This application is a sample of counting and asserting the SQL statements triggered "behind the scenes." It is very useful to count the SQL statements in order to ensure that your code is not generating more SQL statements than you may think (e.g., N+1 can be easily detected by asserting the number of expected statements).
Key points:
- In pom.xml, add dependencies for datasource-proxy and Vlad Mihalcea's db-util
- Create the ProxyDataSourceBuilder with countQuery()
- Reset the counters via SQLStatementCountValidator.reset()
- Assert the number of INSERT, UPDATE, DELETE, and SELECT statements via assertInsertCount(), assertUpdateCount(), assertDeleteCount(), and assertSelectCount(long expectedNumberOfSql); a sketch follows after this item

Output example (when the number of expected SQL statements is not equal to reality, an exception is thrown):
Source code can be found here.
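A minimal sketch of the counting/asserting part (the test class, repository, and expected counts are illustrative):

```java
import static com.vladmihalcea.sql.SQLStatementCountValidator.assertInsertCount;
import static com.vladmihalcea.sql.SQLStatementCountValidator.assertSelectCount;
import static com.vladmihalcea.sql.SQLStatementCountValidator.reset;

import java.util.List;

public class AuthorSqlCountCheck {

    // assumed to be injected (e.g., via @SpringBootTest and @Autowired)
    private AuthorRepository authorRepository;

    public void assertExpectedStatementCount() {
        reset(); // clear the counters before the code under inspection

        List<Author> authors = authorRepository.findAll();

        // an exception is thrown if the executed statement counts differ from the expectations
        assertSelectCount(1);
        assertInsertCount(0);
    }
}
```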
Don't reinvent the wheel when you need to tie up specific actions to a particular entity lifecycle event. Simply rely on built-in JPA callbacks.
Description: This application is a sample of enabling the JPA callbacks (Pre/PostPersist, Pre/PostUpdate, Pre/PostRemove, and PostLoad).
Key points:
- Add the callback methods in the entity; they must return void and take no arguments (see the sketch below)

Output sample:
Source code can be found here.
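A minimal entity sketch (the Author entity and log messages are illustrative):

```java
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.PostLoad;
import javax.persistence.PostPersist;
import javax.persistence.PrePersist;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

@Entity
public class Author {

    private static final Logger logger = LoggerFactory.getLogger(Author.class);

    @Id
    @GeneratedValue
    private Long id;

    private String name;

    // callback methods must return void and take no arguments
    @PrePersist
    private void beforePersist() {
        logger.info("@PrePersist callback for Author ...");
    }

    @PostPersist
    private void afterPersist() {
        logger.info("@PostPersist callback for Author ...");
    }

    @PostLoad
    private void afterLoad() {
        logger.info("@PostLoad callback for Author ...");
    }

    // @PreUpdate, @PostUpdate, @PreRemove, and @PostRemove work the same way
}
```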
A bidirectional @OneToOne
is less efficient than a unidirectional @OneToOne
that shares the Primary Key with the parent table.
Description: Instead of a bidirectional @OneToOne, it is better to rely on a unidirectional @OneToOne and @MapsId. This application is a proof of concept.
Key points:
- Use @MapsId on the child side
- Use a unidirectional @OneToOne association; this will share the Primary Key with the parent table (a mapping sketch follows below)

Source code can be found here.
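A minimal mapping sketch (Author and Book are illustrative names):

```java
import javax.persistence.Entity;
import javax.persistence.FetchType;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.JoinColumn;
import javax.persistence.MapsId;
import javax.persistence.OneToOne;

@Entity
public class Author {

    @Id
    @GeneratedValue
    private Long id;

    private String name;
}

@Entity
class Book {

    @Id
    private Long id; // populated from the Author's identifier thanks to @MapsId

    private String title;

    // unidirectional @OneToOne sharing the parent's Primary Key;
    // the primary key column of book also acts as the foreign key
    @MapsId
    @OneToOne(fetch = FetchType.LAZY)
    @JoinColumn(name = "author_id")
    private Author author;
}
```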
Fetching more data than needed is bad. Moreover, fetching entities (adding them to the persistence context) when you don't plan to modify them is one of the most common mistakes that implicitly lead to performance penalties. Items 25-32 show different ways of extracting DTOs.
Description:Using DTOs allows us to extract only the needed data. In this application, we rely on SqlResultSetMapping
and EntityManager
.
Key points:
- Rely on SqlResultSetMapping and EntityManager (a sketch follows below)
Source code can be found here.
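A minimal sketch of the mapping and the query (entity, DTO, table, and column names are illustrative):

```java
import java.util.List;
import javax.persistence.ColumnResult;
import javax.persistence.ConstructorResult;
import javax.persistence.Entity;
import javax.persistence.EntityManager;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.PersistenceContext;
import javax.persistence.SqlResultSetMapping;
import org.springframework.stereotype.Repository;

// plain DTO populated through a constructor result mapping
public class AuthorDto {

    private final String name;
    private final int age;

    public AuthorDto(String name, int age) {
        this.name = name;
        this.age = age;
    }
}

@Entity
@SqlResultSetMapping(
        name = "AuthorDtoMapping",
        classes = @ConstructorResult(
                targetClass = AuthorDto.class,
                columns = {
                        @ColumnResult(name = "name"),
                        @ColumnResult(name = "age", type = Integer.class)
                }))
class Author {

    @Id
    @GeneratedValue
    private Long id;

    private String name;
    private int age;
}

@Repository
class AuthorDtoDao {

    @PersistenceContext
    private EntityManager entityManager;

    @SuppressWarnings("unchecked")
    public List<AuthorDto> fetchNameAndAge() {
        // only the two needed columns are selected and mapped straight into DTOs
        return entityManager
                .createNativeQuery("SELECT name, age FROM author", "AuthorDtoMapping")
                .getResultList();
    }
}
```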
Stay tuned for our next installment, where we explore the remaining top 25 best performance practices for Spring Boot 2 and Hibernate 5!
If you liked the article, you might also like the book.
See you in part 2!