(Reply) [entries|reading|network|archive]
simont

[ userinfo | dreamwidth userinfo ]
[ archive | journal archive ]

[personal profile] simont Mon 2002-12-02 06:25
I said N^N because that's the total number of different ways to choose one of N ending states for each of N starting states. The number of bits you'd need to specify one of those ways is the base-2 log of that many, which is more like N*log(N); I intended this to be clear but probably failed :-)

The multiple-byte stuff is a concern, yes; in particular, the number of necessary states in the C state machine gets pretty large as soon as you start considering backslashes escaping closing quotes, and the state when you've just seen a / and may or may not be about to see a * (or vice versa), and that sort of thing. I'm working on a shortcut to avoid having to store all those states explicitly.

Nested comments would be a problem, I agree; in particular, they would make the number of states in the lexical analyser infinite. And one thing I've been so far trying to avoid is having variable-size annotations in each node, because that way lies garbage collection and all sorts of yuck (to say nothing of the fact that if you can't put a constant upper bound on the time required to combine two node annotations, you lose the log-time guarantee on the B-tree operations...).

Still, glad you liked it; I'd hate to have wasted that much typing :-)
Link Read Comments
Reply:
This account has disabled anonymous posting.
If you don't have an account you can create one now.
HTML doesn't work in the subject.
More info about formatting