Migrating legacy Python 2.7 scripts to Python 3 is a common task in modern software maintenance. While most syntax changes are straightforward, file encoding issues are among the most frequent and confusing problems developers encounter—especially when processing log files.
One of the most common errors looks like this:
UnicodeDecodeError: 'charmap' codec can't decode byte 0x90
This article explains why this error occurs, how Python 3 handles text differently, and how to fix it safely and correctly, using a real-world log-parsing example.
Why This Error Appears in Python 3 (But Not in Python 2)
Python 2.7 Behavior
- Files were read as raw byte strings (str)
- Encoding issues were often hidden
- Decoding happened implicitly or not at all
Python 3 Behavior
- Text files are decoded automatically
- The default encoding depends on the OS
- Windows: usually cp1252
- Linux/macOS: usually utf-8
- Invalid byte sequences raise UnicodeDecodeError
This makes Python 3 safer and more explicit, but also exposes previously hidden problems.
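This difference is easy to reproduce. The snippet below prints the platform default that `open()` falls back to, then decodes byte 0x90 with cp1252, which is one of the few byte values cp1252 leaves undefined; this raises the exact 'charmap' error shown above:

```python
import locale

# The encoding Python 3's open() uses when none is specified
# (typically cp1252 on Windows, utf-8 on Linux/macOS).
print(locale.getpreferredencoding(False))

# Byte 0x90 has no mapping in cp1252, so decoding it fails.
try:
    b"\x90".decode("cp1252")
except UnicodeDecodeError as exc:
    print(exc)  # 'charmap' codec can't decode byte 0x90 in position 0 ...
```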
Typical Scenario: Parsing Server Logs
A common use case is scanning application logs to extract request identifiers, correlation IDs, or trace IDs for debugging or analytics.
Example requirements:
- Read a large log file line by line
- Extract request IDs using a regular expression
- Group duplicate request IDs
- Output aggregated results
When such logs contain non-ASCII characters, Python 3 may fail while reading them.
The Root Cause: Incorrect File Encoding
The error:
UnicodeDecodeError: 'charmap' codec can't decode byte 0x90
means:
- Python tried to decode the file using cp1252
- The file contains bytes that do not exist in that encoding
This is extremely common for:
- Java application logs
- Logs aggregated from Linux systems
- Logs containing binary payloads or stack traces
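A minimal sketch of the failure mode, using a hypothetical log line: the Cyrillic character U+0450 encodes in UTF-8 as the two bytes D1 90, and 0x90 does not exist in cp1252. Reading the file back with the wrong encoding fails; reading it with UTF-8 works:

```python
import os
import tempfile

# Hypothetical log line containing a non-ASCII character (U+0450,
# whose UTF-8 encoding contains byte 0x90).
line = "2024-01-01 INFO user=\u0450 request handled\n"

path = os.path.join(tempfile.mkdtemp(), "app.log")
with open(path, "wb") as f:
    f.write(line.encode("utf-8"))

try:
    # What an unspecified encoding typically means on Windows:
    with open(path, "r", encoding="cp1252") as f:
        f.read()
except UnicodeDecodeError as exc:
    print("cp1252 fails:", exc)

# The correct, explicit encoding:
with open(path, "r", encoding="utf-8") as f:
    print("utf-8 works:", f.read().strip())
```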
The Correct Solution: Explicit Encoding Handling
Option 1: Use UTF-8 (Recommended)
Most modern systems generate UTF-8 logs.
with open(file_path, "r", encoding="utf-8") as file:
...
Option 2: Use UTF-8 with Error Handling (Safest)
If logs may contain mixed or invalid characters:
with open(file_path, "r", encoding="utf-8", errors="replace") as file:
...
This ensures:
- No crashes
- Invalid characters are safely replaced
- Log parsing continues uninterrupted
Final Python 3–Compatible Script (Production-Safe)
import re

file_path = "C:/path/to/application.log"
pattern = r'(Connector.*X-Request-ID:)"([a-f0-9\-]+)"'

x_request_id_data = {}

with open(file_path, "r", encoding="utf-8", errors="replace") as file:
    for line in file:
        match = re.search(pattern, line)
        if match:
            substring, x_request_id = match.groups()
            x_request_id_data.setdefault(x_request_id, []).append(substring)

for x_request_id, substrings in x_request_id_data.items():
    print(f"Duplicates for X-Request-ID: {x_request_id}, count {len(substrings)}")
    for substring in substrings:
        print(f"\t{substring}")
    print()
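The grouping logic above can be exercised in isolation with a few hypothetical log lines (the exact log format depends on your server; these lines are only illustrative):

```python
import re

pattern = r'(Connector.*X-Request-ID:)"([a-f0-9\-]+)"'

# Hypothetical log lines: two share the same request ID.
lines = [
    'Connector http-nio-8080 X-Request-ID:"a1b2c3"',
    'Connector http-nio-8443 X-Request-ID:"a1b2c3"',
    'Connector http-nio-8080 X-Request-ID:"ffee99"',
]

x_request_id_data = {}
for line in lines:
    match = re.search(pattern, line)
    if match:
        substring, x_request_id = match.groups()
        x_request_id_data.setdefault(x_request_id, []).append(substring)

for rid, subs in x_request_id_data.items():
    print(f"X-Request-ID {rid}: seen {len(subs)} time(s)")
```

Running this shows "a1b2c3" grouped with a count of 2 and "ffee99" with a count of 1, confirming that duplicates are aggregated per request ID.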
Why errors="replace" Is Often the Best Choice for Logs
✔ Prevents runtime crashes
✔ Keeps parsing logic simple
✔ Preserves valid data
✔ Ideal for large, heterogeneous log files
Alternative values:
- errors="ignore" → skips invalid characters silently
- errors="strict" → default, raises exceptions (not recommended for logs)
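The three modes are easy to compare side by side. Here, byte 0xE9 (a Latin-1 "é") is invalid as standalone UTF-8, so each mode handles it differently:

```python
data = b"caf\xe9 latte"  # 0xE9 is not a valid standalone UTF-8 byte

# replace: the invalid byte becomes U+FFFD (the replacement character)
print(data.decode("utf-8", errors="replace"))  # caf\ufffd latte

# ignore: the invalid byte is silently dropped
print(data.decode("utf-8", errors="ignore"))   # caf latte

# strict (the default): decoding raises
try:
    data.decode("utf-8", errors="strict")
except UnicodeDecodeError as exc:
    print("strict raises:", exc)
```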
Best Practices for Python 3 Log Processing
- Always specify encoding explicitly
- Never rely on OS defaults
- Use defensive decoding for production scripts
- Avoid loading entire log files into memory
- Prefer line-by-line processing
Conclusion
When upgrading Python 2.7 scripts to Python 3, encoding issues are not bugs—they are signals that your code is now behaving correctly.
By explicitly defining file encoding and handling invalid characters safely, you can:
- Make your scripts future-proof
- Avoid platform-specific failures
- Process real-world logs reliably
This approach is essential for modern DevOps, observability, and troubleshooting workflows.