My Evil Hack of the Week
It's early in the week, but I doubt I'll beat this in the next few days: yesterday evening I implemented a string search function (equivalent to Perl's rindex
) recursively.
It's reasonably well known among LiveJournal users that putting tables in an LJ entry has a tendency to break the formatting of people's friends pages. In fact, it's not all tables; it's only tables which don't explicitly close <tr>
and <td>
tags.
This is because LJ's HTML cleaning function, which is applied separately to the HTML generated by each individual journal entry, doesn't know that those two elements can be implicitly closed in HTML 4; so if it sees either of those tags without a corresponding explicit close tag, it remembers, and at the very end of the string it's cleaning it splurges out a huge string of pointless </tr>
and </td>
tags, long after the </table>
before which they might have at least managed to be relatively harmless if silly. When such an LJ entry is output within a page style that uses tables for layout, these unexpected closing tags disrupt the style's external tables and things go badly wrong.
Once upon a time I attempted to prepare a patch against the HTML cleaner which fixed this problem, but unfortunately failed because the code was more hideous than I was prepared to deal with when not being paid large amounts of money. So instead I worked around it in my own journal style (which, thanks to ick-
, I use for absolutely all my LJ reading): an S2 journal style is a program in a Perl-</tr>
and </td>
tags at the end of those HTML fragments and stripping them off.
Unfortunately, the recent introduction of ‘tagging’ of LJ entries broke my defensive code: the HTML fragment which shows the tags on an entry seems to be appended to what comes out of the HTML cleaner before it gets passed to my style code, meaning that those spurious tags now aren't guaranteed to appear at the very end of the entry, but can be somewhere in the middle. Yesterday I encountered the first LJ entry I'd seen with both a bug-
After a bit of thought, the obvious answer seemed to be that any </tr>
or </td>
tag which appears after the last </table>
can be identified as spurious and removed. That sounded easy enough. So I went to the S2 documentation and looked for their equivalent of rindex
… and they don't have one. Their string class provides a boolean contains()
method to tell you if one string occurs in another, but sadly not one which will tell you where it occurs.
Well, that's not a crippling problem; string functions aren't that hard, so I can just write it myself. The problem is that S2 doesn't provide any looping constructs except foreach
: no while
, and no general for
. It looks rather as if they were trying to ensure the language was unable to express an indefinite loop, in the manner of Douglas Hofstadter's BlooP, presumably to prevent badly-
Fortunately, they did leave in recursion. (In fact I was already using recursion for my earlier table-fix_tables
function, if given a string ending in a spurious close tag, stripped the tag off and then recursed in case there was another.) So now I had to implement a string search function using recursion as my only method of looping. Given the existence of the contains()
primitive, binary search seemed appropriate: divide the haystack string in half, see if the right-
So now my S2 style is running a recursive fix_tables
function, which in turn is calling a recursive rindex
function several times. Total recursion depth is (number of spurious close tags) + log(size of LJ entry HTML), which fortunately doesn't seem to have run into the recursion limit yet. And it all seems to work: the entry that triggered the problem yesterday has been fixed, but I reproduced the problem in a private entry of my own and am confident my defence now works again.
It's thoroughly silly that I had to do it in the first place, of course. If LJ were to feel like supplying index
and rindex
primitives, or better still fixing the HTML-
(I described this hack and the reasons for it at post-
no subject
no subject
no subject
no subject
But yeah, this was fun :-)
no subject
no subject
no subject
If that isn't part of a full-on filk, it damn well should be. (If so, bang goes my productivity for the afternoon.)
no subject
no subject
no subject
You might be thinking of the time I searched my diary archives for sentences in iambic pentameter and managed to find just enough rhyming ones to turn into a rather disjointed not-quite-sonnet, but that certainly wasn't my collection of filk-seeds.
My filk-seed collection includes, off the top of my head:
For the last of those I have about three quarters of the rest of the song, if I can still remember it, including a triple rhyme that would have made Tom Lehrer blush (maybe). Oh, and there was one where it occurred to me to start off "Eleanor Rigby" with "Eleanor Ripley" and turn it into an Aliens filk, but I never came up with any actual lines for it, althoughGenerally, though, my attitude to filk is that once you've come up with the one outstandingly well crafted line that sparks the whole thing off, coming up with enough filler to pad out the rest of the song is not really motivating and you might just as well leave it at that.
A meme? Moi?
Don't worry -- I didn't mean to imply that you'd copied it from anywhere, merely that (a) it was an automated summary of your character in an extremely oblique way and (b) people might like to copy it from you :)
once you've one outstandingly line ... coming up with enough filler isn't really motivating
It took me *aaaages* to work out what filk *was*. I had associations because I knew what people described as it, but it didn't seem to make sense. Then I went to wikipedia and discovered my understanding was right :) Their description is beautifully worked out, something like, songs characterised by being about, or by the sort of people who write filk about, scifi/fantasy songs... It really is "know it when you hear it" but wikipedia was good at capturing that essence like a butterfly of etymology[1].
[1] Entomology pun not intended :)
no subject
no subject
* Because closing tags is the one true way? I disagree. I see the good side of all tags being open-content-close, but I can also see good reasons for ones that are open-content-separator-content-separator-content-separator-close.
* Because you shouldn't break LJ? That's not moral, that's dirty pragmatism :)
no subject
Suggestion 2: foreach [1...n] {strip} where n is greater than the likely available stack space :)
no subject
rindex
in S2; the compiler writer will still have to either do the job or write code to do it for him (which is bound to be at least as hard).Suggestion 2: if I knew of an S2 language construction that would generate [1..n] for a given n then I'd be a lot happier...
no subject
2. Oh. Sorry, I assumed "loop n times" would be in *any* nonstupid language, I hadn't actually looked in to S2. Foreach a string? Can you make a very long loop and break out after n decrements to zero? Sorry, I don't know.
no subject
then
move_where_I_make_the_split
no subject
no subject