Accidental i18n
simont

Fri 2018-02-16 12:41

I told a silly story in the pub last night which I suddenly realise would make a fun post here as well. It's from a few years ago originally, but I don't think it matters.

You may have heard of that old chestnut in which an alleged Cambridge University researcher allegedly claims that people can still read written text with no problems even if the internal letters of each word are arbitrarily reordered, as long as the first and last letters of each word are still the right ones.

This is nonsense, of course, and it's been debunked before. But a few years ago, Gareth and I were discussing it, and I dashed off a Perl one-liner to do that scrambling transformation. (Perhaps it seemed like a good Perl-golf challenge to waste half an hour on, or something like that.)

I got a draft implementation working quickly enough, although it didn't quite fit on one line:

$ perl -pe 's!(?<=\b[a-z])[a-z]*(?=[a-z]\b)!join"",map{$_->[1]}
sort{$a->[0]<=>$b->[0]}map{[rand,$_]}split//,$&!egi'
But soft, what light through yonder window breaks?
But soft, what lghit tughroh yedonr woindw bkears?

But shortly before I reached the working version, I made a small error of a kind that Perl makes uniquely easy: I briefly got my scalar and list contexts confused, tried leaving out the join step, and this happened:

$ perl -pe 's!(?<=\b[a-z])[a-z]*(?=[a-z]\b)!map{$_->[1]}
sort{$a->[0]<=>$b->[0]}map{[rand,$_]}split//,$&!egi'
But soft, what light through yonder window breaks?
B1t s2t, w2t l3t t5h y4r w4w b4s?

Of course – if you don't explicitly use join to turn the list of characters back into a single string, then the /e modifier evaluates the replacement in scalar context, and map in scalar context returns the number of elements it generated: in other words, the length of the list. Slap forehead, mutter ‘oh yes, this is Perl’, fix bug.
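A few lines are enough to see the difference. This is a minimal sketch (the variable names are my own); the final substitution is a cut-down version of the buggy one-liner, restricted to the single word ‘through’ and without the shuffle, since shuffling doesn't change the count.

```perl
use strict;
use warnings;

my @chars = split //, "hroug";

# List context: map returns the transformed elements themselves.
my @upper = map { uc($_) } @chars;

# Scalar context: the same map returns how many elements it generated.
my $count = map { uc($_) } @chars;

# join is what turns the list back into a single string.
my $joined = join "", map { uc($_) } @chars;

# The replacement part of s///e is evaluated in scalar context, so a
# bare map there yields a count instead of a string.
(my $buggy = "through") =~ s!(?<=\bt)[a-z]*(?=h\b)!map { uc($_) } split //, $&!e;

print "@upper\n";  # prints "H R O U G"
print "$count\n";  # prints "5"
print "$joined\n"; # prints "HROUG"
print "$buggy\n";  # prints "t5h"
```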

But I'm glad I made the mistake, because look at what the wrong program is actually doing: it's exactly a tool for abbreviating long words in the style of ‘i18n’ and ‘l10n’. Of course that's not a hard thing to do, but I was very amused to have managed to do it completely by accident!
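Doing it on purpose takes just one substitution. In the sketch below (the subroutine name `numeronym` is my own label, borrowed from the usual term for abbreviations like ‘i18n’), each word of three or more letters is split into first letter, interior and last letter, and the interior is replaced by its length. On the test sentence it gives the same output as the accidental version.

```perl
use strict;
use warnings;

# Replace the interior of each word of 3+ letters with its length,
# e.g. "internationalization" -> "i18n". Shorter words are untouched.
sub numeronym {
    my ($text) = @_;
    $text =~ s/\b([a-z])([a-z]+)([a-z])\b/$1 . length($2) . $3/egi;
    return $text;
}

print numeronym("internationalization localization"), "\n";
# prints "i18n l10n"
print numeronym("But soft, what light through yonder window breaks?"), "\n";
# prints "B1t s2t, w2t l3t t5h y4r w4w b4s?"
```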

siderea, Sat 2018-02-17 02:22
Hah! Love your new password generator!
kaberett, Mon 2018-02-19 22:25
... that is charming and I thank you for sharing. :-)
sunflowerinrain, Thu 2018-05-03 08:07
People have problems reading anagrams? Oh. Or was it just too much of a generalisation? I can attest, as a copy-editor and proof-reader in a previous life, that sometimes people don't even notice that letters are out of order - but maybe that's just the writers...

Pretty Perl, though. Makes me wish I'd tried harder to learn it.

Happy birthday with Hugs! xx

simont, Thu 2018-05-03 08:16
Thank you :-) *hug*

I think the debunking page pointed out that it does depend a lot on which anagram you pick. Certainly in a lot of cases English is redundant enough that there's obviously only one word in the language that's an anagram of what's written. And in many cases only one word would make sense at all in the context of the sentence – so it doesn't even matter whether what's written is an anagram or a different kind of typo or simply an inkblot, because you could make sense of the sentence regardless. And people get pretty good at unscrambling the kinds of anagram that arise from easy typos, because we see typos all the time and get used to them.
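Whether a scrambled word has only one possible answer can be checked mechanically: reduce every word to a canonical key (first letter, interior letters sorted, last letter) and look up everything that shares the key. A minimal sketch follows; the subroutine names `scramble_key` and `candidates` are hypothetical, and the tiny word list is made up to stand in for a real dictionary. It shows ‘tughroh’ resolving to a single word, while ‘salt’ and ‘slat’ collide.

```perl
use strict;
use warnings;

# Canonical key under first-and-last-fixed scrambling: any two words
# that are such scrambles of each other get the same key.
sub scramble_key {
    my ($word) = @_;
    $word = lc $word;
    return $word if length($word) < 3;
    my $mid = join "", sort { $a cmp $b } split //, substr($word, 1, -1);
    return substr($word, 0, 1) . $mid . substr($word, -1);
}

# Made-up sample vocabulary; a real check would load a full dictionary.
my @words = qw(salt slat through trough tough light yonder window);

# Group the vocabulary by key.
my %by_key;
push @{ $by_key{ scramble_key($_) } }, $_ for @words;

# Every listed word that the scrambled form could have come from.
sub candidates {
    my ($scrambled) = @_;
    return @{ $by_key{ scramble_key($scrambled) } // [] };
}

print join(",", candidates("tughroh")), "\n"; # prints "through"
print join(",", candidates("salt")), "\n";    # prints "salt,slat"
```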

(Indeed, sometimes I'll find myself misreading one word as another word that it could plausibly have been a typo for, because my internal typo-correction system got its suggestion in before another part of my mind realised that the sentence made sense as written; that tends to happen when the sentence is unusually contorted or complex, so the second realisation is slow to arrive. My favourite one of those was the time I managed to misread ‘auction’ as ‘suction’ – sadly I can't remember the context any more, but what struck me at the time was that until I made that subconscious misreading I hadn't even known that the two words were only a typo apart!)

But in other cases, where the word is not a common one and the context doesn't make it clear, you could imagine it being a lot harder. Imagine, for example, a sentence such as "The patient was taken to hospital with the symptoms of axiwuoefhibrfjksda". From context you'd guess that the last word might be an anagram of some polysyllabic medical term (setting aside the fact that in this case I obviously just flailed at the keyboard at random) – but that's as far as context could take you, and you'd still have the whole space of polysyllabic medical terms to narrow down, which would be especially hard if the right answer wasn't a word you already knew!