If you work with Java applications that use Hibernate, you’ve probably encountered large log files containing hundreds or thousands of SQL statements. Analyzing these queries manually can be time-consuming, especially when running automated test cases where each test generates its own logs.
In this guide, you’ll learn how to automatically extract Hibernate SQL queries from multiple log files and export them into a CSV file for analysis, reporting, troubleshooting, or performance testing.
What Problem Are We Solving?
Imagine you have a folder structure like this:
TestCase1/
cdc-server.log
TestCase2/
cdc-server.log
TestCase3/
cdc-server.log
...
TestCase16/
cdc-server.logEach cdc-server.log contains Hibernate SQL statements such as:
2024-04-26 10:21:25.856 [main] DEBUG org.hibernate.SQL RequestId= -
select
consentdat0_.id as id1_16_,
consentdat0_.delete_status as delete_s4_16_
from
consent_data consentdat0_
where
consentdat0_.delete_status=?The SQL query may span multiple lines and can be followed by another SQL statement or a completely different log entry:
2024-04-26 10:21:25.876 [main] INFO c.p.cdc.OpenBankingApplication RequestId= -
FailedTokenFallback.csv already existsThe goal is to generate a CSV file like this:
| Test Case | Timestamp | SQL Query |
|---|---|---|
| TestCase1 | 2024-04-26 10:21:25.856 | SELECT … |
| TestCase1 | 2024-04-26 10:21:30.112 | INSERT … |
| TestCase2 | 2024-04-26 10:22:15.443 | UPDATE … |
Why Extract Hibernate Queries?
There are many reasons to extract SQL queries from logs:
Performance Analysis
Identify slow-running queries.
Database Optimization
Review generated SQL and improve indexing.
Test Automation
Compare queries executed by different test cases.
Auditing
Maintain a record of database operations.
Troubleshooting
Detect unexpected table access or missing joins.
Understanding Hibernate SQL Logs
Hibernate logs SQL statements using:
DEBUG org.hibernate.SQLA typical query begins with:
2024-04-26 10:21:03.473 [main] DEBUG org.hibernate.SQL RequestId= -and continues until:
- another log entry starts
- another Hibernate SQL statement starts
- the file ends
Example:
create table achq_payments (
transaction_id varchar(255) not null,
achq_token varchar(255),
primary key (transaction_id)
)Notice that the SQL spans multiple lines.
Challenges When Parsing Hibernate Logs
Challenge 1: Multi-Line Queries
Unlike simple log messages, SQL statements often span dozens of lines.
Example:
select
id,
name,
email
from
customer
where
status=?A parser must collect all lines belonging to the same query.
Challenge 2: Consecutive SQL Statements
Example:
2024-04-26 10:21:03.473 DEBUG org.hibernate.SQL ...
create table table1 ...
2024-04-26 10:21:03.519 DEBUG org.hibernate.SQL ...
create table table2 ...The parser must recognize where one query ends and the next begins.
Challenge 3: Mixed Log Levels
Example:
DEBUG org.hibernate.SQL ...followed by:
INFO OpenBankingApplication ...The parser must stop collecting SQL when another log entry appears.
Python Solution
The following Python script processes all test case folders and extracts Hibernate SQL statements into a CSV file.
import os
import re
import csv
BASE_PATH = r"C:\Logs"
sql_start_pattern = re.compile(
r'^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}).*DEBUG org\.hibernate\.SQL'
)
log_line_pattern = re.compile(
r'^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}'
)
with open('hibernate_queries.csv', 'w', newline='', encoding='utf-8') as csvfile:
writer = csv.writer(csvfile)
writer.writerow(['Test Case', 'Timestamp', 'SQL Query'])
for test_case in os.listdir(BASE_PATH):
folder_path = os.path.join(BASE_PATH, test_case)
if not os.path.isdir(folder_path):
continue
log_file = os.path.join(folder_path, 'cdc-server.log')
if not os.path.exists(log_file):
continue
with open(log_file, 'r', encoding='utf-8', errors='ignore') as f:
current_timestamp = None
current_query = []
collecting = False
for line in f:
sql_match = sql_start_pattern.match(line)
if sql_match:
if current_query:
writer.writerow([
test_case,
current_timestamp,
' '.join(current_query)
])
current_timestamp = sql_match.group(1)
current_query = []
collecting = True
continue
if collecting:
if log_line_pattern.match(line):
writer.writerow([
test_case,
current_timestamp,
' '.join(current_query)
])
current_query = []
collecting = False
if sql_start_pattern.match(line):
current_timestamp = sql_start_pattern.match(line).group(1)
collecting = True
else:
current_query.append(line.strip())
if current_query:
writer.writerow([
test_case,
current_timestamp,
' '.join(current_query)
])
print("Done.")Sample Output
The generated CSV may look like:
Test Case,Timestamp,SQL Query
TestCase1,2024-04-26 10:21:03.473,"create table achq_payments (...)"
TestCase1,2024-04-26 10:21:03.519,"create table finicity_bank (...)"
TestCase1,2024-04-26 10:21:03.559,"create table finicity_bank_address (...)"
TestCase2,2024-04-26 10:21:25.856,"select consentdat0_.id ... order by consentdat0_.decision_limit_time asc"Enhancements You Can Add
Export to Excel
Instead of CSV, use:
import pandas as pd
df = pd.read_csv("hibernate_queries.csv")
df.to_excel("hibernate_queries.xlsx")Count Queries per Test Case
SELECT COUNT(*)
FROM extracted_queries
GROUP BY test_caseOr in Python:
from collections import Counter
counter = Counter()
counter[test_case] += 1Detect Slow Queries
If Hibernate logs execution times, you can extract:
Query executed in 523 msand identify performance bottlenecks.
Compare Queries Between Test Runs
Store results from multiple executions:
Run1.csv
Run2.csv
Run3.csvThen compare:
- Missing queries
- Additional queries
- Different SQL generation
This is useful during application upgrades and regression testing.
Best Practices
Enable SQL Logging Only When Needed
Hibernate SQL logging can generate huge files.
Example:
logging.level.org.hibernate.SQL=DEBUGUse it primarily in:
- Development
- Testing
- Troubleshooting
Archive Log Files
Large test suites can produce gigabytes of logs.
Compress old logs:
zip testlogs.zip *.logStore Extracted Queries Separately
Keep:
Raw Logs
Extracted Queries
Performance Reportsin separate folders for easier analysis.
Conclusion
Hibernate SQL logs contain valuable information about the database operations performed by your application. When executing multiple automated test cases, manually reviewing these logs becomes impractical.
By using a simple Python parser, you can automatically:
- Scan hundreds of log files
- Detect Hibernate SQL statements
- Handle multi-line queries
- Identify query boundaries
- Export results to CSV or Excel
- Simplify debugging and performance analysis
This approach is particularly useful for QA teams, database administrators, performance engineers, and Java developers working with Hibernate-based applications. Proper SQL extraction can dramatically reduce investigation time and provide deeper insight into application behavior during testing and production troubleshooting.


