WTF-8
simont


Tue 2006-11-21 12:09
WTF-8
[personal profile] fanf, Tue 2006-11-21 14:02
This kind of thing is why practical software should treat text with a bit of DWIMmery. Fortunately it's almost always possible to correctly glark the encoding without extra metadata.

(In fact the JSON spec requires decoders to implement this kind of trick; however because JSON files are Unicode and always start off with two ASCII characters, the decoder can work out the transformation format and endianness from the position of the nulls in the first four octets.)
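The null-position trick can be sketched roughly as follows. This is only an illustration, assuming the five-row table in RFC 4627 section 3 and input without a byte order mark; the names `detect_json_encoding` and `decode_json` are made up for the example and are not part of any library.

```python
import json

# Null pattern of the first four octets -> Python codec name.
# True means "this octet is 0x00". Table follows RFC 4627 section 3.
_NULL_PATTERNS = {
    (True, True, True, False): "utf-32-be",   # 00 00 00 xx
    (True, False, True, False): "utf-16-be",  # 00 xx 00 xx
    (False, True, True, True): "utf-32-le",   # xx 00 00 00
    (False, True, False, True): "utf-16-le",  # xx 00 xx 00
    (False, False, False, False): "utf-8",    # xx xx xx xx
}


def detect_json_encoding(data: bytes) -> str:
    """Guess the encoding of a JSON text from where the nulls fall (hypothetical helper)."""
    head = data[:4]
    if len(head) < 4:
        # Two ASCII characters need at least four octets in UTF-16 or UTF-32,
        # so anything shorter can only be UTF-8.
        return "utf-8"
    pattern = tuple(octet == 0 for octet in head)
    try:
        return _NULL_PATTERNS[pattern]
    except KeyError:
        raise ValueError("first four octets do not look like a JSON text") from None


def decode_json(data: bytes):
    """Decode the octets with the guessed encoding, then parse (hypothetical helper)."""
    return json.loads(data.decode(detect_json_encoding(data)))
```

For instance, `decode_json('{"a": 1}'.encode("utf-16-le"))` goes through the `xx 00 xx 00` row. The sketch relies on the text starting with two ASCII characters, which is the property mentioned above.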
[identity profile] kaet.livejournal.com, Tue 2006-11-21 14:08
Ooh, yes, I'd not noticed that property of JSON, but you're right! Very useful. There's a similarly tedious description of that technique, using the "<?xm", in the XML spec. Amazingly, as of at least a year ago, nobody seemed to have implemented the relevant DWIM filter [from InputStream (octet sequence) to Reader (character sequence)] for XML in Java, and I've not been able to convince people that implementing it would be a productive use of my time. So we all arse around with character set objects and get it wrong :(.
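A filter of that octets-to-characters shape might look something like the sketch below. It is in Python rather than Java, follows the byte patterns listed in Appendix F of the XML 1.0 spec (leaving out the EBCDIC case), and uses made-up names (`sniff_xml_encoding`, `xml_bytes_to_text`). It falls back to UTF-8 when there is no byte order mark and no encoding declaration.

```python
import re

# Byte order marks, longest first so FF FE 00 00 is not mistaken for UTF-16LE.
_BOMS = [
    (b"\x00\x00\xfe\xff", "utf-32-be"),
    (b"\xff\xfe\x00\x00", "utf-32-le"),
    (b"\xef\xbb\xbf", "utf-8"),
    (b"\xfe\xff", "utf-16-be"),
    (b"\xff\xfe", "utf-16-le"),
]

# How "<?" (or just "<") looks in the wider encodings when there is no BOM.
_NO_BOM = [
    (b"\x00\x00\x00<", "utf-32-be"),
    (b"<\x00\x00\x00", "utf-32-le"),
    (b"\x00<\x00?", "utf-16-be"),
    (b"<\x00?\x00", "utf-16-le"),
]

# Encoding declaration inside an ASCII-compatible "<?xml ... ?>" prologue.
_DECL = re.compile(rb"<\?xml[^>]*?encoding\s*=\s*[\"']([A-Za-z][A-Za-z0-9._\-]*)[\"']")


def sniff_xml_encoding(data: bytes):
    """Return (codec name, number of BOM octets to skip). Hypothetical helper."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    for prefix, name in _NO_BOM:
        if data.startswith(prefix):
            return name, 0
    if data.startswith(b"<?xm"):
        # ASCII-compatible: the declaration itself can be read as plain octets.
        match = _DECL.match(data)
        if match:
            return match.group(1).decode("ascii"), 0
    return "utf-8", 0  # no BOM and no declaration: assume UTF-8


def xml_bytes_to_text(data: bytes) -> str:
    """The DWIM filter: octet sequence in, character sequence out."""
    encoding, skip = sniff_xml_encoding(data)
    return data[skip:].decode(encoding)
```

A declared encoding name that Python has no codec for would raise `LookupError` from `decode`; a real filter would need to decide what to do in that case.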