Convert Java Byte Array to a UTF 8 Python String

I'm pulling JSON data through a REST API using Requests in Python. Unfortunately, one of the fields contains all sorts of unescaped and control characters that breaks the JSON.

I don't control the data, but I can request it undecoded as a string that the application stores as a Java byte array.

For example: [B@1cf3bd82

The question is how do I decode the string back into the original UTF-8 text as I'm working through the JSON? All of the examples I've found seem to work with a byte object, not a encoded string.

Thoughts?

Jon Skeet
people
quotationmark

You're currently printing out the result of calling toString() on the byte[]. That's never a good idea - arrays don't override toString().

You should use the new String(byte[], Charset) constructor:

String text = new String(bytes, StandardCharsets.UTF_8);

It's not entirely clear to me from the question where what is happening in terms of the data, but basically you need to modify the Java code - any Python code is probably irrelevant here.

people

See more on this question at Stackoverflow