simont |
Mon 2002-12-02 06:25 |
I said N^N because that's the total number of different ways to choose one of N ending states for each of N starting states. The number of bits you'd need to specify one of those ways is the base-2 log of that many, which is more like N*log(N); I intended this to be clear but probably failed :-)
The multiple-byte stuff is a concern, yes; in particular, the number of necessary states in the C state machine gets pretty large as soon as you start considering backslashes escaping closing quotes, and the state when you've just seen a / and may or may not be about to see a * (or vice versa), and that sort of thing. I'm working on a shortcut to avoid having to store all those states explicitly.
Nested comments would be a problem, I agree; in particular, they would make the number of states in the lexical analyser infinite. And one thing I've been so far trying to avoid is having variable-size annotations in each node, because that way lies garbage collection and all sorts of yuck (to say nothing of the fact that if you can't put a constant upper bound on the time required to combine two node annotations, you lose the log-time guarantee on the B-tree operations...).
Still, glad you liked it; I'd hate to have wasted that much typing :-)
|
|