Migrating legacy Python 2.7 scripts to Python 3 is a common task in modern software maintenance. While most syntax changes are straightforward, file encoding issues are among the most frequent and confusing problems developers encounter—especially when processing log files.
One of the most common errors looks like this:
UnicodeDecodeError: 'charmap' codec can't decode byte 0x90
This article explains why this error occurs, how Python 3 handles text differently, and how to fix it safely and correctly, using a real-world log-parsing example.
Why This Error Appears in Python 3 (But Not in Python 2)
Python 2.7 Behavior
- Files were read as raw byte strings (str)
- Encoding issues were often hidden
- Decoding happened implicitly or not at all
Python 3 Behavior
- Text files are decoded automatically
- The default encoding depends on the OS
- Windows: usually cp1252
- Linux/macOS: usually utf-8
- Invalid byte sequences raise UnicodeDecodeError
This makes Python 3 safer and more explicit, but also exposes previously hidden problems.
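This difference is easy to reproduce. The snippet below prints the platform default that `open()` falls back to, then decodes byte 0x90 with cp1252, which is one of the few byte values cp1252 leaves undefined; this raises the exact 'charmap' error shown above:

```python
import locale

# The encoding Python 3's open() uses when none is specified
# (typically cp1252 on Windows, utf-8 on Linux/macOS).
print(locale.getpreferredencoding(False))

# Byte 0x90 has no mapping in cp1252, so decoding it fails.
try:
    b"\x90".decode("cp1252")
except UnicodeDecodeError as exc:
    print(exc)  # 'charmap' codec can't decode byte 0x90 in position 0 ...
```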
Typical Scenario: Parsing Server Logs
A common use case is scanning application logs to extract request identifiers, correlation IDs, or trace IDs for debugging or analytics.
Example requirements:
- Read a large log file line by line
- Extract request IDs using a regular expression
- Group duplicate request IDs
- Output aggregated results
When such logs contain non-ASCII characters, Python 3 may fail while reading them.
The Root Cause: Incorrect File Encoding
The error:
UnicodeDecodeError: 'charmap' codec can't decode byte 0x90
means:
- Python tried to decode the file using cp1252
- The file contains bytes that do not exist in that encoding
This is extremely common for:
- Java application logs
- Logs aggregated from Linux systems
- Logs containing binary payloads or stack traces
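A minimal sketch of the failure mode, using a hypothetical log line: the Cyrillic character U+0450 encodes in UTF-8 as the two bytes D1 90, and 0x90 does not exist in cp1252. Reading the file back with the wrong encoding fails; reading it with UTF-8 works:

```python
import os
import tempfile

# Hypothetical log line containing a non-ASCII character (U+0450,
# whose UTF-8 encoding contains byte 0x90).
line = "2024-01-01 INFO user=\u0450 request handled\n"

path = os.path.join(tempfile.mkdtemp(), "app.log")
with open(path, "wb") as f:
    f.write(line.encode("utf-8"))

try:
    # What an unspecified encoding typically means on Windows:
    with open(path, "r", encoding="cp1252") as f:
        f.read()
except UnicodeDecodeError as exc:
    print("cp1252 fails:", exc)

# The correct, explicit encoding:
with open(path, "r", encoding="utf-8") as f:
    print("utf-8 works:", f.read().strip())
```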
The Correct Solution: Explicit Encoding Handling
Option 1: Use UTF-8 (Recommended)
Most modern systems generate UTF-8 logs.
with open(file_path, "r", encoding="utf-8") as file:
...
Option 2: Use UTF-8 with Error Handling (Safest)
If logs may contain mixed or invalid characters:
with open(file_path, "r", encoding="utf-8", errors="replace") as file:
...
This ensures:
- No crashes
- Invalid characters are safely replaced
- Log parsing continues uninterrupted
Final Python 3–Compatible Script (Production-Safe)
import re

file_path = "C:/path/to/application.log"
pattern = r'(Connector.*X-Request-ID:)"([a-f0-9\-]+)"'

x_request_id_data = {}

with open(file_path, "r", encoding="utf-8", errors="replace") as file:
    for line in file:
        match = re.search(pattern, line)
        if match:
            substring, x_request_id = match.groups()
            x_request_id_data.setdefault(x_request_id, []).append(substring)

for x_request_id, substrings in x_request_id_data.items():
    print(f"Duplicates for X-Request-ID: {x_request_id}, count {len(substrings)}")
    for substring in substrings:
        print(f"\t{substring}")
    print()
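The grouping logic above can be exercised in isolation with a few hypothetical log lines (the exact log format depends on your server; these lines are only illustrative):

```python
import re

pattern = r'(Connector.*X-Request-ID:)"([a-f0-9\-]+)"'

# Hypothetical log lines: two share the same request ID.
lines = [
    'Connector http-nio-8080 X-Request-ID:"a1b2c3"',
    'Connector http-nio-8443 X-Request-ID:"a1b2c3"',
    'Connector http-nio-8080 X-Request-ID:"ffee99"',
]

x_request_id_data = {}
for line in lines:
    match = re.search(pattern, line)
    if match:
        substring, x_request_id = match.groups()
        x_request_id_data.setdefault(x_request_id, []).append(substring)

for rid, subs in x_request_id_data.items():
    print(f"X-Request-ID {rid}: seen {len(subs)} time(s)")
```

Running this shows "a1b2c3" grouped with a count of 2 and "ffee99" with a count of 1, confirming that duplicates are aggregated per request ID.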
Why errors="replace" Is Often the Best Choice for Logs
✔ Prevents runtime crashes
✔ Keeps parsing logic simple
✔ Preserves valid data
✔ Ideal for large, heterogeneous log files
Alternative values:
- errors="ignore" → skips invalid characters silently
- errors="strict" → default, raises exceptions (not recommended for logs)
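The three modes are easy to compare side by side. Here, byte 0xE9 (a Latin-1 "é") is invalid as standalone UTF-8, so each mode handles it differently:

```python
data = b"caf\xe9 latte"  # 0xE9 is not a valid standalone UTF-8 byte

# replace: the invalid byte becomes U+FFFD (the replacement character)
print(data.decode("utf-8", errors="replace"))  # caf\ufffd latte

# ignore: the invalid byte is silently dropped
print(data.decode("utf-8", errors="ignore"))   # caf latte

# strict (the default): decoding raises
try:
    data.decode("utf-8", errors="strict")
except UnicodeDecodeError as exc:
    print("strict raises:", exc)
```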
Best Practices for Python 3 Log Processing
- Always specify encoding explicitly
- Never rely on OS defaults
- Use defensive decoding for production scripts
- Avoid loading entire log files into memory
- Prefer line-by-line processing
Conclusion
When upgrading Python 2.7 scripts to Python 3, encoding issues are not bugs—they are signals that your code is now behaving correctly.
By explicitly defining file encoding and handling invalid characters safely, you can:
- Make your scripts future-proof
- Avoid platform-specific failures
- Process real-world logs reliably
This approach is essential for modern DevOps, observability, and troubleshooting workflows.