My Evil Hack of the Week
It's early in the week, but I doubt I'll beat this in the next few days: yesterday evening I implemented a string search function (equivalent to Perl's rindex) recursively.
It's reasonably well known among LiveJournal users that putting tables in an LJ entry has a tendency to break the formatting of people's friends pages. In fact, it's not all tables; it's only tables which don't explicitly close <tr> and <td> tags.
This is because LJ's HTML cleaning function, which is applied separately to the HTML generated by each individual journal entry, doesn't know that those two elements can be implicitly closed in HTML 4; so if it sees either of those tags without a corresponding explicit close tag, it remembers, and at the very end of the string it's cleaning it splurges out a huge string of pointless </tr> and </td> tags, long after the </table> before which they might have at least managed to be relatively harmless if silly. When such an LJ entry is output within a page style that uses tables for layout, these unexpected closing tags disrupt the style's external tables and things go badly wrong.
Once upon a time I attempted to prepare a patch against the HTML cleaner which fixed this problem, but unfortunately failed because the code was more hideous than I was prepared to deal with when not being paid large amounts of money. So instead I worked around it in my own journal style (which, thanks to ick-
, I use for absolutely all my LJ reading): an S2 journal style is a program in a Perl-</tr> and </td> tags at the end of those HTML fragments and stripping them off.
Unfortunately, the recent introduction of ‘tagging’ of LJ entries broke my defensive code: the HTML fragment which shows the tags on an entry seems to be appended to what comes out of the HTML cleaner before it gets passed to my style code, meaning that those spurious tags now aren't guaranteed to appear at the very end of the entry, but can be somewhere in the middle. Yesterday I encountered the first LJ entry I'd seen with both a bug-
After a bit of thought, the obvious answer seemed to be that any </tr> or </td> tag which appears after the last </table> can be identified as spurious and removed. That sounded easy enough. So I went to the S2 documentation and looked for their equivalent of rindex … and they don't have one. Their string class provides a boolean contains() method to tell you if one string occurs in another, but sadly not one which will tell you where it occurs.
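To make the rule concrete, here is a minimal sketch in Python rather than S2. It is not the code from my style: the function name strip_spurious_close_tags is hypothetical, Python's built-in str.rfind stands in for the missing rindex, and leaving an entry untouched when it contains no </table> at all is an assumption.

```python
def strip_spurious_close_tags(html: str) -> str:
    """Remove any </tr> or </td> that appears after the last </table>."""
    table_end = "</table>"
    pos = html.rfind(table_end)  # stand-in for the rindex that S2 lacks
    if pos == -1:
        # Assumption: with no table in the entry, leave it alone.
        return html
    cut = pos + len(table_end)
    head, tail = html[:cut], html[cut:]
    # Everything after the last </table> cannot legitimately close a row or cell.
    for tag in ("</tr>", "</td>"):
        tail = tail.replace(tag, "")
    return head + tail
```

Close tags inside the table are left alone; only the text after the final </table> is touched, so it no longer matters whether the spurious tags sit at the very end of the fragment or somewhere before an appended footer.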
Well, that's not a crippling problem; string functions aren't that hard, so I can just write it myself. The problem is that S2 doesn't provide any looping constructs except foreach: no while, and no general for. It looks rather as if they were trying to ensure the language was unable to express an indefinite loop, in the manner of Douglas Hofstadter's BlooP, presumably to prevent badly-
Fortunately, they did leave in recursion. (In fact I was already using recursion for my earlier table-fix_tables function, if given a string ending in a spurious close tag, stripped the tag off and then recursed in case there was another.) So now I had to implement a string search function using recursion as my only method of looping. Given the existence of the contains() primitive, binary search seemed appropriate: divide the haystack string in half, see if the right-
So now my S2 style is running a recursive fix_tables function, which in turn is calling a recursive rindex function several times. Total recursion depth is (number of spurious close tags) + log(size of LJ entry HTML), which fortunately doesn't seem to have run into the recursion limit yet. And it all seems to work: the entry that triggered the problem yesterday has been fixed, but I reproduced the problem in a private entry of my own and am confident my defence now works again.
It's thoroughly silly that I had to do it in the first place, of course. If LJ were to feel like supplying index and rindex primitives, or better still fixing the HTML-
(I described this hack and the reasons for it at post-