Describe the bug
Not sure whether this is a bug or just a difference in behaviour.
Not critical either way, but I thought I'd file it for reference anyway.
- When decoding invalid UTF-8 with errors='replace', CPython (and PyPy, for that matter) replaces every invalid byte with �.
- GraalPy, OTOH, treats each invalid three-byte sequence as a single character and emits only one � for it.
Operating system
Linux
CPU architecture
x86_64
GraalPy version
GraalPy 3.12.8 (GraalVM CE Native 25.0.2)
JDK version
No response
Context configuration
No response
Steps to reproduce
GraalPy:
$ graalpy -c "print(b'\xed\xae\x80\xed\xb0\x80'.decode(errors='replace'))"
��
CPython:
$ python -c "print(b'\xed\xae\x80\xed\xb0\x80'.decode(errors='replace'))"
������
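To see where CPython's six replacement characters come from, here is a small sketch, assuming CPython 3. It registers a custom error handler (named `recording_replace` here purely for illustration) that logs each `UnicodeDecodeError` span before substituting U+FFFD, just as 'replace' would. Running the same script under GraalPy should show which spans its decoder reports instead.

```python
import codecs

# Input from the report: the three-byte UTF-8 forms of the surrogates
# U+DB80 and U+DC00, which are not valid UTF-8.
DATA = b"\xed\xae\x80\xed\xb0\x80"

spans = []  # (start, end, reason) for every error the decoder reports


def recording_replace(exc):
    """Hypothetical handler: record the failing span, then behave like 'replace'."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    spans.append((exc.start, exc.end, exc.reason))
    # Substitute one U+FFFD and resume right after the reported span.
    return ("\ufffd", exc.end)


codecs.register_error("recording_replace", recording_replace)

decoded = DATA.decode("utf-8", errors="recording_replace")
print(len(decoded), "replacement characters")
for start, end, reason in spans:
    print(f"bytes {start}..{end}: {reason}")
```

On CPython this should report six one-byte spans (offsets 0 through 5). Because 0xED may only be followed by 0x80..0x9F in well-formed UTF-8, the decoder rejects the lead byte on its own. Each following 0xAE/0xB0 and 0x80 is then a stray continuation byte and gets its own �.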
Expected behavior
To match CPython (six � characters for this input) unless there is a good reason not to.
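For reference, CPython's output here appears to follow the "U+FFFD substitution of maximal subparts" practice described in chapter 3 of the Unicode Standard. At each position, the longest prefix of a well-formed sequence is replaced by one U+FFFD, or a single byte is replaced if no such prefix exists. The pure-Python decoder below is only an illustrative sketch of that rule, built from the well-formed byte ranges in the standard's Table 3-7. `decode_replace` and `_lead_info` are hypothetical names, not CPython or GraalPy APIs. It could serve as a cross-check against `bytes.decode(errors='replace')`.

```python
from typing import Optional, Tuple

REPLACEMENT = "\ufffd"


def _lead_info(lead: int) -> Optional[Tuple[int, int, int]]:
    """Return (sequence length, min second byte, max second byte) for a lead
    byte per Unicode Table 3-7, or None if the byte can never start a
    well-formed UTF-8 sequence (0x80..0xC1, 0xF5..0xFF)."""
    if lead <= 0x7F:
        return (1, 0, 0)
    if 0xC2 <= lead <= 0xDF:
        return (2, 0x80, 0xBF)
    if lead == 0xE0:
        return (3, 0xA0, 0xBF)  # rejects overlong forms
    if lead == 0xED:
        return (3, 0x80, 0x9F)  # rejects surrogates U+D800..U+DFFF
    if 0xE1 <= lead <= 0xEF:
        return (3, 0x80, 0xBF)
    if lead == 0xF0:
        return (4, 0x90, 0xBF)  # rejects overlong forms
    if 0xF1 <= lead <= 0xF3:
        return (4, 0x80, 0xBF)
    if lead == 0xF4:
        return (4, 0x80, 0x8F)  # rejects code points above U+10FFFF
    return None


def decode_replace(data: bytes) -> str:
    """Hypothetical reference decoder: one U+FFFD per maximal subpart."""
    out = []
    i, n = 0, len(data)
    while i < n:
        info = _lead_info(data[i])
        if info is None:
            # Stray continuation byte or a byte that is never valid in UTF-8.
            out.append(REPLACEMENT)
            i += 1
            continue
        length, lo, hi = info
        # Advance j over bytes that still extend a well-formed prefix.
        j = i + 1
        while j < i + length and j < n:
            b = data[j]
            allowed = (lo <= b <= hi) if j == i + 1 else (0x80 <= b <= 0xBF)
            if not allowed:
                break
            j += 1
        if j == i + length:
            # Complete, validated sequence: strict decoding cannot fail here.
            out.append(data[i:j].decode("utf-8"))
        else:
            # data[i:j] is a maximal subpart: replace it with a single U+FFFD.
            out.append(REPLACEMENT)
        i = j
    return "".join(out)
```

With the input from the report, 0xED 0xAE already fails at the second byte, so `decode_replace` yields six � characters, the same as CPython. An implementation that instead swallows the whole triplet would produce the two seen on GraalPy.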
Stack trace
Additional context
No response