Fixing UnicodeDecodeError When Migrating Python 2.7 Log Processing Scripts to Python 3

Migrating legacy Python 2.7 scripts to Python 3 is a common maintenance task. Most syntax changes are straightforward, but file encoding issues are among the most frequent and confusing problems developers run into, especially when processing log files.

One of the most common errors looks like this:

UnicodeDecodeError: 'charmap' codec can't decode byte 0x90

This article explains why this error occurs, how Python 3 handles text differently, and how to fix it safely and correctly, using a real-world log-parsing example.


Why This Error Appears in Python 3 (But Not in Python 2)

Python 2.7 Behavior

  • Files were read as raw byte strings (str)
  • Encoding issues were often hidden
  • Decoding happened implicitly or not at all

Python 3 Behavior

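  • Files opened in text mode are decoded to str automatically
  • If no encoding is passed, the default comes from the locale (locale.getpreferredencoding(False)), so it depends on the OS and its configuration
    • Windows: usually a legacy code page such as cp1252
    • Linux/macOS: usually utf-8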
  • Invalid byte sequences raise UnicodeDecodeError

This makes Python 3 safer and more explicit, but also exposes previously hidden problems.
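
To check which codec your own machine falls back to, and to see how Python 2 style byte reading compares with Python 3 text mode, here is a minimal sketch. The temporary file and its one-line content are illustrative only.

```python
import locale
import os
import tempfile

# Codec that open() falls back to when no encoding= argument is given.
default_encoding = locale.getpreferredencoding(False)
print("Default text encoding:", default_encoding)

# Illustrative log line with one non-ASCII character (e-acute), stored as UTF-8.
fd, path = tempfile.mkstemp(suffix=".log")
with os.fdopen(fd, "wb") as f:
    f.write("user=Jos\u00e9\n".encode("utf-8"))

try:
    # Python 2 style: binary mode returns raw bytes; nothing is decoded.
    with open(path, "rb") as f:
        raw = f.read()
    # Python 3 text mode: bytes are decoded with the codec you name.
    with open(path, "r", encoding="utf-8") as f:
        text = f.read()
finally:
    os.remove(path)

print(type(raw).__name__, len(raw))    # bytes 11
print(type(text).__name__, len(text))  # str 10
```

The length difference is the point: the accented character occupies two bytes on disk but is a single character once decoded.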


Typical Scenario: Parsing Server Logs

A common use case is scanning application logs to extract request identifiers, correlation IDs, or trace IDs for debugging or analytics.

Example requirements:

  • Read a large log file line by line
  • Extract request IDs using a regular expression
  • Group duplicate request IDs
  • Output aggregated results

When such logs contain non-ASCII characters and the script relies on the default encoding, Python 3 may fail while reading them.


The Root Cause: Incorrect File Encoding

The error:

UnicodeDecodeError: 'charmap' codec can't decode byte 0x90

means:

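  • Python tried to decode the file with a charmap-based codec, typically cp1252 on Windows
  • The file contains a byte (here 0x90) that has no mapping in that code page; in cp1252, the bytes 0x81, 0x8D, 0x8F, 0x90 and 0x9D are undefined
  • In practice the file is usually UTF-8: 0x90 is a valid continuation byte inside many multi-byte UTF-8 characters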

This is especially common with:

  • Java application logs
  • Logs aggregated from Linux systems
  • Logs containing binary payloads, or stack traces with non-ASCII messages
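
The error can be reproduced without any log file. The sketch below encodes an illustrative line as UTF-8, the way a Linux service would typically write it, and then decodes it as cp1252, which is what open() without encoding= does on a Windows machine whose locale uses that code page.

```python
# Illustrative log line containing an emoji, encoded as UTF-8.
line_bytes = "status=ok \U0001F40D\n".encode("utf-8")
print(line_bytes)  # b'status=ok \xf0\x9f\x90\x8d\n'

error_message = None
bad_byte = None
try:
    # Mimics open() without encoding= on a cp1252 Windows machine.
    line_bytes.decode("cp1252")
except UnicodeDecodeError as exc:
    error_message = str(exc)
    bad_byte = line_bytes[exc.start]  # first byte with no cp1252 mapping

print(error_message)
# 'charmap' codec can't decode byte 0x90 in position 12: character maps to <undefined>
print(hex(bad_byte))  # 0x90

# Decoding with the encoding the bytes were actually written in succeeds.
print(line_bytes.decode("utf-8") == "status=ok \U0001F40D\n")  # True
```

The first two bytes of the emoji (0xF0, 0x9F) happen to exist in cp1252, so decoding only fails at the third byte, 0x90.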

The Correct Solution: Explicit Encoding Handling

Option 1: Use UTF-8 (Recommended)

Most modern systems generate UTF-8 logs.

with open(file_path, "r", encoding="utf-8") as file:
    ...
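
Option 1 works as long as every byte in the file is well-formed UTF-8. A quick check with illustrative bytes shows that a strict UTF-8 decode still fails on a stray byte such as 0x90, this time with a different reason than the cp1252 error.

```python
valid = "caf\u00e9 \U0001F40D".encode("utf-8")  # well-formed UTF-8 (illustrative)
invalid = b"payload=\x90 ok"                   # stray byte, e.g. from a binary payload

print(valid.decode("utf-8") == "caf\u00e9 \U0001F40D")  # True

strict_error = None
try:
    invalid.decode("utf-8")
except UnicodeDecodeError as exc:
    strict_error = str(exc)

print(strict_error)
# 'utf-8' codec can't decode byte 0x90 in position 8: invalid start byte
```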

Option 2: Use UTF-8 with Error Handling (Safest)

If logs may contain mixed encodings or invalid byte sequences:

with open(file_path, "r", encoding="utf-8", errors="replace") as file:
    ...

This ensures:

  • No crashes
  • Invalid characters are safely replaced
  • Log parsing continues uninterrupted
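
Because replacement happens silently, it can help to record which lines were affected. A minimal sketch, assuming an illustrative two-line log file created on the fly, that uses exactly the open() call from Option 2 and notes every line containing U+FFFD:

```python
import os
import tempfile

REPLACEMENT_CHAR = "\ufffd"  # what errors="replace" substitutes for undecodable bytes

# Illustrative log: one clean line and one line with a stray 0x90 byte.
fd, path = tempfile.mkstemp(suffix=".log")
with os.fdopen(fd, "wb") as f:
    f.write(b"GET /health 200\n")
    f.write(b"payload=\x90 ok\n")

damaged_lines = []
try:
    with open(path, "r", encoding="utf-8", errors="replace") as file:
        for number, line in enumerate(file, start=1):
            if REPLACEMENT_CHAR in line:
                damaged_lines.append(number)  # record where data was replaced
finally:
    os.remove(path)

print(damaged_lines)  # [2]
```

A line that legitimately contains U+FFFD in the source log would also be counted, which is usually acceptable for a diagnostic count.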

Final Python 3–Compatible Script (Production-Safe)

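import re

# Path to the log file (adjust for your environment)
file_path = "C:/path/to/application.log"

# Group 1: the "Connector ... X-Request-ID:" prefix; group 2: the quoted request ID
pattern = re.compile(r'(Connector.*X-Request-ID:)"([a-f0-9\-]+)"')

# Maps each request ID to the list of prefixes it appeared with
x_request_id_data = {}

# Explicit UTF-8; undecodable bytes become U+FFFD instead of raising
with open(file_path, "r", encoding="utf-8", errors="replace") as file:
    for line in file:  # streams the file line by line
        match = pattern.search(line)
        if match:
            substring, x_request_id = match.groups()
            x_request_id_data.setdefault(x_request_id, []).append(substring)

for x_request_id, substrings in x_request_id_data.items():
    if len(substrings) < 2:
        continue  # report only IDs that actually occur more than once
    print(f"Duplicates for X-Request-ID: {x_request_id}, count {len(substrings)}")
    for substring in substrings:
        print(f"\t{substring}")
    print()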
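
Because the parsing loop only needs an iterable of strings, it can be factored into a function and exercised on in-memory lines without a real file. This is a sketch: the function name group_request_ids and the sample log lines are hypothetical, since the article does not show the actual log format; only the regular expression is taken from the script above.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List

# Same pattern as in the script above.
PATTERN = re.compile(r'(Connector.*X-Request-ID:)"([a-f0-9\-]+)"')


def group_request_ids(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map each X-Request-ID to the prefix substrings it appeared with."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        match = PATTERN.search(line)
        if match:
            substring, request_id = match.groups()
            groups[request_id].append(substring)
    return dict(groups)


# Hypothetical log lines; real logs may be formatted differently.
sample = [
    'INFO Connector[http-nio-8080] X-Request-ID:"3f2a-9c1b"\n',
    'INFO Connector[http-nio-8443] X-Request-ID:"3f2a-9c1b"\n',
    'INFO Connector[http-nio-8080] X-Request-ID:"77ee-01aa"\n',
    'DEBUG no request id here\n',
]

groups = group_request_ids(sample)
duplicates = {rid: subs for rid, subs in groups.items() if len(subs) > 1}
for request_id, substrings in duplicates.items():
    print(f"Duplicates for X-Request-ID: {request_id}, count {len(substrings)}")
# Duplicates for X-Request-ID: 3f2a-9c1b, count 2
```

Passing an open file object instead of the sample list gives the same streaming behavior as the original script.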

Why errors="replace" Is Often the Best Choice for Logs

✔ Prevents runtime crashes
✔ Keeps parsing logic simple
✔ Preserves valid data
✔ Ideal for large, heterogeneous log files

Alternative values:

  • errors="ignore" → silently drops undecodable bytes
  • errors="strict" → the default; raises UnicodeDecodeError (not recommended for logs)
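
The handlers can be compared side by side on the same illustrative bytes. Besides the values listed above, the sketch includes two further standard handlers: backslashreplace (usable for decoding since Python 3.5), which keeps the byte value visible as an escape, and surrogateescape, which is lossless.

```python
raw = b"id=\x90ok\n"  # illustrative bytes: 0x90 is not a valid UTF-8 start byte

results = {}
for handler in ("replace", "ignore", "backslashreplace", "surrogateescape"):
    results[handler] = raw.decode("utf-8", errors=handler)
    print(handler, ascii(results[handler]))
# replace 'id=\ufffdok\n'
# ignore 'id=ok\n'
# backslashreplace 'id=\\x90ok\n'
# surrogateescape 'id=\udc90ok\n'

# surrogateescape round-trips: encoding with the same handler restores the bytes.
print(results["surrogateescape"].encode("utf-8", errors="surrogateescape") == raw)  # True

# strict (the default) raises instead of substituting anything.
try:
    raw.decode("utf-8", errors="strict")
    strict_raised = False
except UnicodeDecodeError:
    strict_raised = True
print(strict_raised)  # True
```

For log analysis, backslashreplace is often a useful middle ground because the original byte survives in the output; surrogateescape strings, by contrast, cannot be written out as UTF-8 unless the same handler is used again.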

Best Practices for Python 3 Log Processing

  1. Always specify encoding explicitly
  2. Never rely on OS defaults
  3. Use defensive decoding for production scripts
  4. Avoid loading entire log files into memory
  5. Prefer line-by-line processing
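
One alternative to errors="replace", not the approach used in this article, applies when a log mixes UTF-8 lines with lines from a legacy 8-bit encoding: read the file in binary mode line by line and decode each line separately, falling back to another codec only where UTF-8 fails. In the sketch below, the helper name iter_decoded_lines and the choice of latin-1 as the fallback are assumptions. latin-1 never fails because it maps every byte to a character, but it mis-maps the 0x80 to 0x9F range for text that was really cp1252 (for example the euro sign).

```python
import os
import tempfile
from typing import Iterator, Tuple


def iter_decoded_lines(path: str) -> Iterator[Tuple[int, str, bool]]:
    """Yield (line_number, text, used_fallback) for each line of a log file.

    Hypothetical helper: each line is decoded as strict UTF-8 first and, if
    that fails, as latin-1, which maps every byte to a character.
    """
    with open(path, "rb") as f:  # binary mode streams raw byte lines
        for number, raw in enumerate(f, start=1):
            try:
                text, used_fallback = raw.decode("utf-8"), False
            except UnicodeDecodeError:
                text, used_fallback = raw.decode("latin-1"), True
            yield number, text, used_fallback


# Illustrative file: a UTF-8 line followed by a line in a legacy 8-bit encoding.
fd, path = tempfile.mkstemp(suffix=".log")
with os.fdopen(fd, "wb") as f:
    f.write("ok caf\u00e9\n".encode("utf-8"))
    f.write("legacy caf\u00e9\n".encode("latin-1"))

try:
    results = list(iter_decoded_lines(path))
finally:
    os.remove(path)

for number, text, used_fallback in results:
    print(number, used_fallback, ascii(text))
# 1 False 'ok caf\xe9\n'
# 2 True 'legacy caf\xe9\n'
```

Because the file is consumed one line at a time, memory use stays flat regardless of log size, and the used_fallback flag shows exactly which lines needed special handling.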

Conclusion

When upgrading Python 2.7 scripts to Python 3, encoding errors are not bugs in Python 3; they are signals that assumptions Python 2 silently tolerated are now being checked.

By explicitly defining file encoding and handling invalid characters safely, you can:

  • Make your scripts future-proof
  • Avoid platform-specific failures
  • Process real-world logs reliably

This approach is essential for modern DevOps, observability, and troubleshooting workflows.

This article is inspired by real-world challenges we tackle in our projects. If you're looking for expert solutions or need a team to bring your idea to life, let's talk!
