If you work with Java applications that use Hibernate, you’ve probably encountered large log files containing hundreds or thousands of SQL statements. Analyzing these queries manually can be time-consuming, especially when running automated test cases where each test generates its own logs.

In this guide, you’ll learn how to automatically extract Hibernate SQL queries from multiple log files and export them into a CSV file for analysis, reporting, troubleshooting, or performance testing.

What Problem Are We Solving?

Imagine you have a folder structure like this:

TestCase1/
    cdc-server.log

TestCase2/
    cdc-server.log

TestCase3/
    cdc-server.log

...

TestCase16/
    cdc-server.log

Each cdc-server.log contains Hibernate SQL statements such as:

2024-04-26 10:21:25.856 [main] DEBUG org.hibernate.SQL RequestId= -

    select
        consentdat0_.id as id1_16_,
        consentdat0_.delete_status as delete_s4_16_
    from
        consent_data consentdat0_
    where
        consentdat0_.delete_status=?

The SQL query may span multiple lines and can be followed by another SQL statement or a completely different log entry:

2024-04-26 10:21:25.876 [main] INFO c.p.cdc.OpenBankingApplication RequestId= -
FailedTokenFallback.csv already exists

The goal is to generate a CSV file like this:

Test Case	Timestamp	SQL Query
TestCase1	2024-04-26 10:21:25.856	SELECT …
TestCase1	2024-04-26 10:21:30.112	INSERT …
TestCase2	2024-04-26 10:22:15.443	UPDATE …

Why Extract Hibernate Queries?

There are many reasons to extract SQL queries from logs:

Performance Analysis

Identify slow-running queries.

Database Optimization

Review generated SQL and improve indexing.

Test Automation

Compare queries executed by different test cases.

Auditing

Maintain a record of database operations.

Troubleshooting

Detect unexpected table access or missing joins.

Understanding Hibernate SQL Logs

Hibernate logs SQL statements using:

DEBUG org.hibernate.SQL

A typical query begins with:

2024-04-26 10:21:03.473 [main] DEBUG org.hibernate.SQL RequestId= -

and continues until:

another log entry starts
another Hibernate SQL statement starts
the file ends

Example:

create table achq_payments (
   transaction_id varchar(255) not null,
   achq_token varchar(255),
   primary key (transaction_id)
)

Notice that the SQL spans multiple lines.

Challenges When Parsing Hibernate Logs

Challenge 1: Multi-Line Queries

Unlike simple log messages, SQL statements often span dozens of lines.

Example:

select
    id,
    name,
    email
from
    customer
where
    status=?

A parser must collect all lines belonging to the same query.

Challenge 2: Consecutive SQL Statements

Example:

2024-04-26 10:21:03.473 DEBUG org.hibernate.SQL ...

create table table1 ...

2024-04-26 10:21:03.519 DEBUG org.hibernate.SQL ...

create table table2 ...

The parser must recognize where one query ends and the next begins.

Challenge 3: Mixed Log Levels

Example:

DEBUG org.hibernate.SQL ...

followed by:

INFO OpenBankingApplication ...

The parser must stop collecting SQL when another log entry appears.

Python Solution

The following Python script processes all test case folders and extracts Hibernate SQL statements into a CSV file.

import os
import re
import csv

BASE_PATH = r"C:\Logs"

sql_start_pattern = re.compile(
    r'^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}).*DEBUG org\.hibernate\.SQL'
)

log_line_pattern = re.compile(
    r'^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}'
)

with open('hibernate_queries.csv', 'w', newline='', encoding='utf-8') as csvfile:

    writer = csv.writer(csvfile)
    writer.writerow(['Test Case', 'Timestamp', 'SQL Query'])

    for test_case in os.listdir(BASE_PATH):

        folder_path = os.path.join(BASE_PATH, test_case)

        if not os.path.isdir(folder_path):
            continue

        log_file = os.path.join(folder_path, 'cdc-server.log')

        if not os.path.exists(log_file):
            continue

        with open(log_file, 'r', encoding='utf-8', errors='ignore') as f:

            current_timestamp = None
            current_query = []
            collecting = False

            for line in f:

                sql_match = sql_start_pattern.match(line)

                if sql_match:

                    if current_query:
                        writer.writerow([
                            test_case,
                            current_timestamp,
                            ' '.join(current_query)
                        ])

                    current_timestamp = sql_match.group(1)
                    current_query = []
                    collecting = True
                    continue

                if collecting:

                    if log_line_pattern.match(line):

                        writer.writerow([
                            test_case,
                            current_timestamp,
                            ' '.join(current_query)
                        ])

                        current_query = []
                        collecting = False

                        if sql_start_pattern.match(line):
                            current_timestamp = sql_start_pattern.match(line).group(1)
                            collecting = True

                    else:
                        current_query.append(line.strip())

            if current_query:
                writer.writerow([
                    test_case,
                    current_timestamp,
                    ' '.join(current_query)
                ])

print("Done.")

Sample Output

The generated CSV may look like:

Test Case,Timestamp,SQL Query
TestCase1,2024-04-26 10:21:03.473,"create table achq_payments (...)"
TestCase1,2024-04-26 10:21:03.519,"create table finicity_bank (...)"
TestCase1,2024-04-26 10:21:03.559,"create table finicity_bank_address (...)"
TestCase2,2024-04-26 10:21:25.856,"select consentdat0_.id ... order by consentdat0_.decision_limit_time asc"

Enhancements You Can Add

Export to Excel

Instead of CSV, use:

import pandas as pd

df = pd.read_csv("hibernate_queries.csv")
df.to_excel("hibernate_queries.xlsx")

Count Queries per Test Case

SELECT COUNT(*)
FROM extracted_queries
GROUP BY test_case

Or in Python:

from collections import Counter

counter = Counter()

counter[test_case] += 1

Detect Slow Queries

If Hibernate logs execution times, you can extract:

Query executed in 523 ms

and identify performance bottlenecks.

Compare Queries Between Test Runs

Store results from multiple executions:

Run1.csv
Run2.csv
Run3.csv

Then compare:

Missing queries
Additional queries
Different SQL generation

This is useful during application upgrades and regression testing.

Best Practices

Enable SQL Logging Only When Needed

Hibernate SQL logging can generate huge files.

Example:

logging.level.org.hibernate.SQL=DEBUG

Use it primarily in:

Development
Testing
Troubleshooting

Archive Log Files

Large test suites can produce gigabytes of logs.

Compress old logs:

zip testlogs.zip *.log

Store Extracted Queries Separately

Keep:

Raw Logs
Extracted Queries
Performance Reports

in separate folders for easier analysis.

Conclusion

Hibernate SQL logs contain valuable information about the database operations performed by your application. When executing multiple automated test cases, manually reviewing these logs becomes impractical.

By using a simple Python parser, you can automatically:

Scan hundreds of log files
Detect Hibernate SQL statements
Handle multi-line queries
Identify query boundaries
Export results to CSV or Excel
Simplify debugging and performance analysis

This approach is particularly useful for QA teams, database administrators, performance engineers, and Java developers working with Hibernate-based applications. Proper SQL extraction can dramatically reduce investigation time and provide deeper insight into application behavior during testing and production troubleshooting.

Extracting Hibernate SQL Queries from Log Files and Exporting Them to CSV: A Complete Beginner’s Guide

What Problem Are We Solving?

Why Extract Hibernate Queries?

Performance Analysis

Database Optimization

Test Automation

Auditing

Troubleshooting

Understanding Hibernate SQL Logs

Challenges When Parsing Hibernate Logs

Challenge 1: Multi-Line Queries

Challenge 2: Consecutive SQL Statements

Challenge 3: Mixed Log Levels

Python Solution

Sample Output

Enhancements You Can Add

Export to Excel

Count Queries per Test Case

Detect Slow Queries

Compare Queries Between Test Runs

Best Practices

Enable SQL Logging Only When Needed

Archive Log Files

Store Extracted Queries Separately

Conclusion

This article is inspired by real-world challenges we tackle in our projects. If you're looking for expert solutions or need a team to bring your idea to life,

Let's talk!

Extracting Hibernate SQL Queries from Log Files and Exporting Them to CSV: A Complete Beginner’s Guide

What Problem Are We Solving?

Why Extract Hibernate Queries?

Performance Analysis

Database Optimization

Test Automation

Auditing

Troubleshooting

Understanding Hibernate SQL Logs

Challenges When Parsing Hibernate Logs

Challenge 1: Multi-Line Queries

Challenge 2: Consecutive SQL Statements

Challenge 3: Mixed Log Levels

Python Solution

Sample Output

Enhancements You Can Add

Export to Excel

Count Queries per Test Case

Detect Slow Queries

Compare Queries Between Test Runs

Best Practices

Enable SQL Logging Only When Needed

Archive Log Files

Store Extracted Queries Separately

Conclusion

This article is inspired by real-world challenges we tackle in our projects. If you're looking for expert solutions or need a team to bring your idea to life,

Let's talk!

Related Posts

Create Tables with Composite Primary Keys and One-to-Many Relationships in SQL Server

Fix “Pool Empty. Unable to Fetch a Connection” Errors in Java Applications

A Robust Linux Backup Script