simont ([personal profile] simont) wrote 2003-05-30 04:04 pm

(no subject)

Whew. When someone looks back over my life of geekery and compiles my all-time list of Most Evil Hacks, I think this may well score somewhere near the top.

Now that my LiveJournal account allows me to tinker with the style system, I thought I'd redesign my friends page so that it's actually legible; the standard Generator style is pretty good, but not quite to my taste. So off I went and started fiddling.

The template for a friends-page entry is allowed to contain %%ljuser:user%%, which produces a standard reference-to-LJ-user for the author of that entry. Except that if the entry is in a community, it produces a ref-to-LJ-user for the name of the community. Not too helpful in itself; that's why it's also allowed to contain %%altposter%%, which allows you to provide an extra piece of HTML for when the poster name and the journal name differ. So my initial stab at the title line on my friends page went something like:

  • Define %%altposter%% to expand to ‘%%ljuser:poster%% on ’.
  • Define the full author label to be ‘%%altposter%%%%ljuser:user%%’.

Thus, on an ordinary post I got ‘username’, but on a community post I got ‘username on community’. All very simple.
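A minimal sketch of that first attempt, in Python rather than LJ's template language. The function names, and the "[user-icon]" marker standing in for whatever HTML %%ljuser:...%% really emits, are illustrative assumptions only.

```python
def ljuser(name: str) -> str:
    # Hypothetical stand-in for the HTML that %%ljuser:...%% expands to.
    return f"[user-icon]{name}"


def author_label(poster: str, journal: str) -> str:
    # %%altposter%% only contributes text when poster and journal differ.
    altposter = f"{ljuser(poster)} on " if poster != journal else ""
    # Full label: %%altposter%%%%ljuser:user%%
    return altposter + ljuser(journal)


print(author_label("simont", "simont"))
print(author_label("simont", "some_community"))
```

The second call shows the problem described next: the community name still gets the ordinary user icon.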

Only trouble is, it looks a bit ugly having an LJ user icon before the community name. There's a separate community icon, after all, so why can't I use that?

Well, it's hard, because although the ‘altposter’ mechanism allows you to include extra HTML when there's a community involved, there's no easy way to take HTML out when there's a community involved (or equivalently to include extra HTML when there isn't a community involved). So any text I put in the main HTML template to display the name of an ordinary user with an LJ user icon will necessarily also appear in community posts, displaying the name of the community with an LJ user icon, which is the one thing I don't want.

So I had a think. My first idea was to use CSS: there's a CSS property (display: none) which completely inhibits display of any element it's applied to, to allow CSSed pages to provide fallback navigation stuff for non-CSS browsers. But I occasionally view my friends page in w3m, which doesn't support CSS, so that was no good.

What else in HTML inhibits display of any text enclosed in it? I slept on the problem and realised the answer is obvious: comments. And HTML comments, it turns out, don't nest! So this seemed the obvious solution:

  • Define %%altposter%% to expand to ‘%%ljuser:owner%% on %%ljcomm:community%% <!--’.
  • Define the full author label to be ‘%%altposter%% %%ljuser:user%% <!-- -->’.

So when the altposter text is present, it produces a single large comment covering the normal user name, the second open-comment is ignored, and the final --> closes the comment. When the altposter text is missing, you just get the ordinary user name followed by a trivial HTML comment. Simple and elegant, I thought.
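The intended effect can be sketched in Python under one simplifying assumption: a comment runs from '<!--' to the first following '-->', with no nesting. USER, POSTER and COMMUNITY are placeholders for the expanded template variables, not real LJ output.

```python
import re


def render(html: str) -> str:
    # Assumed model: '<!--' opens a comment, the first later '-->' closes it.
    return re.sub(r"<!--.*?-->", "", html, flags=re.DOTALL).strip()


def author_label(altposter: str) -> str:
    # Template from the post: %%altposter%% %%ljuser:user%% <!-- -->
    return f"{altposter} USER <!-- -->"


print(render(author_label("")))
print(render(author_label("POSTER on COMMUNITY <!--")))
```

With an empty altposter only USER survives; with the altposter text present, the unclosed '<!--' swallows USER and the second opener.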

Only it doesn't work. Perusing the LJ source code (yay for open-source websites!) I found that the value of %%altposter%% was being run through Perl's HTML::Parser to clean out any particularly malicious tags, and HTML::Parser was marking the unclosed comment as ‘text’ rather than ‘comment’ or ‘tag’. And LJ, taking no chances, therefore turned the angle bracket into an HTML entity just in case. So, back to the drawing board.
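A rough model of the behaviour observed here, not LJ's actual cleaner and not the HTML::Parser API: the function name and its logic are assumptions that only reproduce the effect described, in which a comment opener with no closer inside the fragment is treated as plain text and has its angle bracket escaped.

```python
def clean_fragment(fragment: str) -> str:
    """Neutralise any comment opener with no matching '-->' in the fragment."""
    pos = 0
    while True:
        start = fragment.find("<!--", pos)
        if start == -1:
            return fragment  # every comment, if any, was properly closed
        end = fragment.find("-->", start + 4)
        if end == -1:
            # Unclosed: classed as text, so escape the angle brackets.
            return fragment[:start] + fragment[start:].replace("<", "&lt;")
        pos = end + 3


print(clean_fragment("POSTER on COMMUNITY <!--"))
print(clean_fragment("before <!-- closed --> after"))
```

Because the cleaner sees %%altposter%% on its own, the closing '-->' supplied later by the main template cannot rescue the opener.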

Now here's where it gets really unpleasant.

It suddenly occurred to me that the HTML sequence <!--> has a rather interesting function. If you see it outside a comment, it should open a new comment. But if you see it inside a comment, it should close one. It took some deep thinking to work out how to apply this, but eventually I came up with a solution which worked on paper:

  • Define %%altposter%% as ‘<!-->%%ljuser:owner%% on %%ljcomm:community%%<!-->’.
  • Define the full author label as ‘<!-- -%%altposter%%->%%ljuser:user%%<!-- -->’.

So when the altposter text is absent, the author label is a trivial HTML comment, followed by the ordinary user name, followed by another trivial HTML comment. But when the altposter text is present, the first <!--> closes the initial comment so that the community-specific text gets displayed, and then the second <!--> opens a new comment which swallows the -> and the ordinary user name. Then the final <!-- --> makes sure that we aren't in a comment no matter what happened. And meanwhile, HTML::Parser thinks the entire altposter text is a single large comment, so it doesn't mess with it at all.
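The on-paper reasoning can be checked with a small Python sketch. It assumes the simple model used in the argument above: '<!--' opens a comment, and the first '-->' found after the whole opener closes it. USER, POSTER and COMMUNITY are placeholders for the expanded variables.

```python
def segments(html: str):
    """Yield (kind, text) pairs where kind is 'text' or 'comment'."""
    pos = 0
    while pos < len(html):
        start = html.find("<!--", pos)
        if start == -1:
            yield ("text", html[pos:])
            return
        if start > pos:
            yield ("text", html[pos:start])
        # Look for the closer only after the full four-character opener.
        end = html.find("-->", start + 4)
        end = len(html) if end == -1 else end + 3
        yield ("comment", html[start:end])
        pos = end


def author_label(altposter: str) -> str:
    # Template from the post: <!-- -%%altposter%%->%%ljuser:user%%<!-- -->
    return "<!-- -" + altposter + "->" + "USER" + "<!-- -->"


ALTPOSTER = "<!-->POSTER on COMMUNITY<!-->"

for label in (author_label(""), author_label(ALTPOSTER)):
    print(list(segments(label)))
```

In the community case the first segment is the comment '<!-- -<!-->', the visible text is 'POSTER on COMMUNITY', and the last comment runs from the second '<!-->' through the final '-->'.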

Unfortunately, this still didn't work in practice, because Mozilla thinks <!--> is a completely self-contained comment equivalent to <!-- --> – i.e. it will leave you outside comment state no matter which state you saw it in. Fortunately I managed to work around this special case by adding a further space to the second <!-->, making it <!-- >. I'm not sure why that worked, but it seems to.

So now, in a community post, I get the user icon for the user name and the community icon for the community name; and for an ordinary post, I just get an ordinary user name with an ordinary user icon. I really don't think this should have had to be that hard; but I'm pretty pleased with myself for having risen to the challenge :-)

[identity profile] senji.livejournal.com 2003-05-30 08:12 am (UTC)(link)
Unfortunately you don't appear to have any unlocked posts in communities in your friends list, so I can't look at this...

[identity profile] senji.livejournal.com 2003-05-30 08:21 am (UTC)(link)
<!-- >-> is a very interesting html fragment, as for that matter is -<!-->

friendsfriends seems very silly - I have about 1800 of those!

[identity profile] enslore.livejournal.com 2003-05-30 08:29 am (UTC)(link)
I have no idea what you're talking about technically, but "Ooh, pretty".

[identity profile] kaet.livejournal.com 2003-05-30 08:46 am (UTC)(link)
An SGML true-comment is any text excluding COM (--), surrounded by two COM. There are a number of places it can be used: inside many MDs (<! ... >) at sensible places, and also in an MD of its own. Any number of true-comments can occur in a comment MD, including zero, so <!> is a valid comment.

<!-- --> should only leave you out of a comment if you are not in one. If you are in a comment the first <! is seen as part of that comment (not an MDO), the COM is taken as the comment end, the space is permitted, the next COM puts you back into the next comment and the > is seen as comment material in the next comment (not as an MDC). The easiest way to illustrate this is as

<!-- > These <!-- --> are <!-- --> each <!-- --> comments <!-- --> in <!-- --> a <!-- --> single <!-- --> comment <!-- --> MD <!-->

which is a single comment declaration containing multiple comments with the texts:


  • > These <!
  • > are <!
  • > each <!
  • > comments <!
  • > in <!
  • > a <!
  • > single <!
  • > comment <!
  • > MD <!
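The rules described above can be sketched as a tiny Python parser. This is an illustration of those rules only (MDO '<!', COM '--' toggling comment state, MDC '>' closing the declaration when outside a comment), not a full SGML parser; the function name is made up for the example.

```python
def comment_texts(decl: str) -> list[str]:
    """Split one SGML comment declaration into the texts of its comments."""
    if not decl.startswith("<!"):
        raise ValueError("not a markup declaration")
    texts, current = [], []
    in_comment = False
    i = 2  # step over the MDO
    while i < len(decl):
        if decl.startswith("--", i):
            # COM toggles comment state; leaving a comment completes its text.
            if in_comment:
                texts.append("".join(current).strip())
                current = []
            in_comment = not in_comment
            i += 2
        elif in_comment:
            current.append(decl[i])
            i += 1
        elif decl[i] == ">":
            return texts  # MDC outside a comment ends the declaration
        else:
            i += 1  # separator between comments (only whitespace is legal)
    raise ValueError("declaration never closed")


words = ["These", "are", "each", "comments", "in", "a", "single", "comment", "MD"]
decl = "<!-- > " + " <!-- --> ".join(words) + " <!-->"
for text in comment_texts(decl):
    print(text)
```

The same function returns an empty list for '<!>', matching the point that a comment declaration may contain zero comments.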


SGML, yay! Goodness knows what any browsers do with that (and it was a real arse entering all those character entities just now!) and HTML probably "traditionally" does something wrong too.

I can't think of a way of closing a comment declaration whether in one or not, (though I'm convinced there must be a way), it's difficult because of the way COM is used both to open and to close a comment within a comment declaration and is yet banned from a comment.

The space will have worked because Mozilla is (spit) using MDO-COM (<!--) as a comment start token and COM-MDC (-->) as a comment close token, and has a lexer dumb enough not to step over the whole token before looking for the end, the COM standing for both tokens, like parsing abc as ab followed by bc.
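The overlapping-token explanation can be tried out in a short Python sketch. It is a model of the lexer behaviour described above, not Mozilla's code; the opener_skip parameter is an invented knob saying how far past the start of '<!--' the lexer moves before it starts looking for '-->'.

```python
def visible_text(html: str, opener_skip: int) -> str:
    """Strip comments; opener_skip=4 consumes the whole opener, 2 lets the
    opener's own '--' double as the start of the closing '-->'."""
    out, pos = [], 0
    while True:
        start = html.find("<!--", pos)
        if start == -1:
            out.append(html[pos:])
            return "".join(out)
        out.append(html[pos:start])
        end = html.find("-->", start + opener_skip)
        if end == -1:
            return "".join(out)  # unclosed comment swallows the rest
        pos = end + 3


original = "<!-- -<!-->POSTER on COMMUNITY<!-->->USER<!-- -->"
with_space = "<!-- -<!-->POSTER on COMMUNITY<!-- >->USER<!-- -->"

print(visible_text(original, 4))    # whole opener consumed first
print(visible_text(original, 2))    # overlapping tokens
print(visible_text(with_space, 2))  # overlapping tokens, extra space added
```

Under the overlapping model the second '<!-->' closes itself, so '->USER' leaks into the page; with the extra space there is no '-->' for the opener to share, and the comment runs on to the final '-->' as intended.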

[identity profile] brad.livejournal.com 2003-05-30 10:43 am (UTC)(link)
Or, switch to the new style system (S2) at:

http://www.livejournal.com/customize/

The new style system is a full programming language, tweaked for templating, but sandboxed so you can't muck with the host system. No %%%% rubbish.

S2 will eventually become the default, but we're working on the site's distributed memory cache (http://www.livejournal.com/community/lj_maintenance/60984.html) before we officially announce it.

BTW, S2 also lets you customize the comments and reply pages.

[identity profile] brad.livejournal.com 2003-05-30 11:21 am (UTC)(link)
We used to HTML-clean each entry for entry-specific rules (like lj-cut expansion) and then clean the whole page, but we cut off that final step to improve performance after we started caching a pre-cleaned version of the style first.

Of course, there are any number of ways to make the style and entries cooperate to achieve a malicious result, so we go to great lengths to detect a bunch of other hacks like yours and break them enough so they don't work. Yours might be new and actually slip through. I'll look into it.

The lame fix might be as easy as replacing all occurrences of %%- with %%&#45; or %% -

In any case, 90+% of the world uses IE6, and IE6 introduced the "http-only" cookie, which we use, so even if people sneak JavaScript in, they can't read 90% of people's sessions. The remaining 10% either use fancy browsers which optionally disable access to cookies from JS, or those users are paranoid and do IP-bound login sessions, so if their session cookie is stolen, it's useless.
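For illustration only, this is how an http-only cookie can be issued with Python's standard library. The cookie name and value are made up, and nothing here reflects how LiveJournal's own (Perl) code sets its session cookie.

```python
from http.cookies import SimpleCookie

cookie = SimpleCookie()
cookie["session"] = "opaque-session-token"  # hypothetical name and value
cookie["session"]["httponly"] = True        # ask browsers to hide it from scripts
cookie["session"]["path"] = "/"

header = cookie.output()
print(header)
```

A browser that honours the HttpOnly attribute still sends the cookie with requests but does not expose it through document.cookie, which is the protection described above.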

Heh... we used to just put the easily-replayable md5 password in the cookie. :P We've come a little ways since then. ;)

[identity profile] senji.livejournal.com 2003-05-30 11:46 am (UTC)(link)
In any case, 90+% of the world uses IE6

Really? I'd've guessed at about:

50% IE6
25% IE5.5
10% IE5
5% Minority browsers on Windows
10% Everyone else

Of course, I guess that Livejournal doesn't have a representative group of all internet users, any more than anything else (short of maybe google) does.

[identity profile] brad.livejournal.com 2003-05-30 12:04 pm (UTC)(link)
Sorry, I guess what I meant to write was: 90% of the world uses IE, most of which are IE6. But you're right... IE6 isn't that dominant. The lesser IEs account for quite a bit.

[identity profile] damerell.livejournal.com 2003-06-01 12:45 pm (UTC)(link)
You believe User-Agent, don't you?

[identity profile] brad.livejournal.com 2003-06-02 08:43 am (UTC)(link)
I'm sorry, you're right.

I meant to say: 89.8% IE users, and 0.02% Opera/Konqueror users faking as IE.

There are also the stupid spiders faking as IE who are obviously not human and we shut them down when they don't obey, so we don't count those.

[identity profile] damerell.livejournal.com 2003-06-02 08:49 am (UTC)(link)
Where do those "89.8%" and "0.02%" figures come from? They seem like sheer speculation - conventional wisdom says the proportion of non-IE browsers is very small, but then it's largely based on User-Agent and a few more bogus techniques.

What we do know is that nearly all non-IE browsers have an option to fake the User-Agent, as do most proxies and suchlike tools.

[It is also interesting that the UK's largest Web-on-TV service fakes User-Agent as IE, as well.]

[identity profile] brad.livejournal.com 2003-06-02 09:33 am (UTC)(link)
In the same way one can TCP-fingerprint an OS, I'm sure I could spend a little more effort and fingerprint the real browser based on HTTP behavior, but I don't think it's worth it....

I can't believe it's more than a few percent.

(I only use Mozilla on Linux, so I'm no pro-IE junkie, either... I'm just being realistic....)

[identity profile] damerell.livejournal.com 2003-06-02 09:40 am (UTC)(link)
I don't think it's more than a few percent either, but all I'm saying is that I don't _know_, and neither does anyone else unless they do as you propose (on a large traffic site that's accessible to all browsers - can you think of anything like that?) - and that to some degree our perceptions of what's realistic are informed by the common view that's based purely on bogus data from User-Agent (and often also on counting all HTTP requests, which plays down text browsers, and sometimes from sites that only work well with IE!), and so our idea of what's realistic might actually be quite wrong.

BTW, while I'm bypassing the proper channels to bug you about stuff, can I ask if all this new styles stuff is likely to make it feasible for users to view other people's comments pages (and ideally journals) in a style of their choosing?

[I'm only one person, but the day you implement that, I'll be a paid user. :-]

[identity profile] brad.livejournal.com 2003-06-02 10:05 am (UTC)(link)
I wonder if Google's Zeitgeist takes into account user-agent spoofing. They'd be the ideal people to publish accurate numbers.

S2 will let you, yes, but all S2 stuff is on hold for performance work right now. I figure I'll resume S2 in a week.

I think we decided there will be an option, "[ ] Show friends' entries in my style", which will turn all links on your friends page into having "?style=mine". And that style=mine will work on any S1 or S2 URL as well.
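As a sketch of what such link rewriting might look like (the function name and the example URL are hypothetical, and the real feature would live in LJ's own code), a link can be given a style=mine parameter while keeping any query arguments it already has:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_style_mine(url: str) -> str:
    """Return url with its query string forced to contain style=mine."""
    parts = urlsplit(url)
    # Keep existing arguments, dropping any previous style= setting.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != "style"]
    query.append(("style", "mine"))
    return urlunsplit(parts._replace(query=urlencode(query)))


print(with_style_mine("http://www.livejournal.com/users/example/123.html"))
print(with_style_mine("http://www.livejournal.com/users/example/123.html?mode=reply"))
```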

[identity profile] damerell.livejournal.com 2003-06-04 04:50 pm (UTC)(link)
I predict, by the way, that you will face a storm of protest from people bemoaning their inability to make other people view their revolting styles.