Line comments versus line splicing
Here's a thing I was pondering the other day about lexing.
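Suppose you have a language containing two moderately common lexical features: a comment mechanism in which comments are newline-terminated (C++ //, shell #), and a line-splicing mechanism in which a backslash at the end of a line joins the next line on to it. How should the two interact?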
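C++ and sh have different answers. In C++, a backslash at the end of a // comment still line-splices, so the comment carries on into the next physical line. (This can catch out people who do ASCII art in // comments.) In shell, it's the other way round: a comment renders everything to end-of-line inert, including a trailing backslash, so no splicing happens.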
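One way to picture the difference is as the order of two text passes. The Python sketch below is a toy model only, not how either a C++ compiler or a shell is really implemented: the function names are invented for illustration, and it ignores string literals, quoting, and the shell rule that # only starts a comment at the beginning of a word.

```python
import re


def splice(text):
    """Delete every backslash that is immediately followed by a newline."""
    return text.replace("\\\n", "")


def strip_comments(text, marker):
    """Delete everything from marker up to, but not including, the newline."""
    return re.sub(re.escape(marker) + r"[^\n]*", "", text)


def cxx_style(text):
    # Toy model of the C++ ordering: splice first, then remove // comments,
    # so a backslash at the end of a comment extends the comment.
    return strip_comments(splice(text), "//")


def sh_style(text):
    # Toy model of the shell ordering: the comment swallows its own trailing
    # backslash, so nothing is left to splice afterwards.
    return splice(strip_comments(text, "#"))
```

Under this model, a trailing backslash in a // comment makes the following line vanish into the comment, whereas the same backslash in a # comment leaves the following line alone.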
Anyway. Leaving aside how existing languages handle this interaction, how would we like to handle it, if we were designing a new syntax with both of these features? For the sake of illustration, I'll assume that the language also has syntactically significant newlines, so that line-splicing is actually needed.
A thing that's often annoyed me in make is that if you have a sprawling variable definition with lots of line-splicing, you can't attach a comment to any individual line of it:
OBJECTS = one.o \
two.o \ # putting a comment here doesn't work
three.o \
four.o # putting it here doesn't either, arrgh \
five.o
So, for a start, it would be nice to arrange that at least one of the above syntaxes works. Either would be OK, but I prefer the first, because of the scenario I mentioned above about people doing ASCII art in comments.
Another credible use case is if you want to comment out one of the line-spliced lines:
OBJECTS = one.o \
two.o \
# three.o \
four.o
(Also, you'd probably want to include a comment saying why it was commented out, which adds extra confusion.)
So I think that leads me to the following rules:
- A backslash causes line splicing if it is the last lexical token on a physical line, i.e. it still line-splices even if a comment-to-eol appears after it. (Also, while I'm here, it also shouldn't matter if whitespace appears after it. Syntactically significant whitespace is often unhelpful, but syntactically significant trailing whitespace is especially egregious.)
- A backslash appearing at the end of a comment-to-eol is just part of the comment and has no line-splicing effect.
- If a line with a line-splicing backslash is followed by one or more lines containing nothing but comments-to-eol (with optional initial whitespace), then those comment-only lines are completely ignored and the subsequent one will be spliced on to the original line instead.
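As a rough check that the three rules hang together, here is a minimal Python sketch of them. It rests on assumptions the rules don't state: # is the comment character, there is no quoting (so any # starts a comment), spliced pieces are joined with a single space, and a blank line ends a splice. The names strip_comment and logical_lines are made up for this illustration.

```python
def strip_comment(line, comment_char="#"):
    """Return the part of a physical line before any comment-to-eol."""
    # Assumption: no quoting or escaping, so the first comment_char
    # on the line always starts a comment.
    return line.split(comment_char, 1)[0]


def logical_lines(text, comment_char="#"):
    """Join physical lines into logical lines under the three rules."""
    result = []
    pieces = []  # fragments of the logical line currently being assembled
    for physical in text.splitlines():
        # Dropping the comment before looking for the backslash gives
        # rule 2: a backslash inside a comment is never seen.
        code = strip_comment(physical, comment_char).strip()
        if not code and physical.strip():
            # Rule 3: a comment-only line contributes nothing and does
            # not end a splice that is in progress.
            continue
        # Rule 1: the backslash splices if it is the last thing left once
        # the trailing comment and trailing whitespace are gone.
        splices = code.endswith("\\")
        if splices:
            code = code[:-1].rstrip()
        if code:
            pieces.append(code)
        if not splices:
            if pieces:
                result.append(" ".join(pieces))
            pieces = []
    if pieces:
        # The input ended while a splice was still open.
        result.append(" ".join(pieces))
    return result
```

Note that rule 2 needs no code of its own here: it falls out of stripping the comment before checking for the trailing backslash.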
So that permits all of the following usages:
OBJECTS = one.o \
two.o \ # This object is notable for some reason
three.o \
# the next two objects must not be reordered
four_a.o \
four_b.o \
# end of reordering constraint
five.o \
# six.o \ # removed until bug 1234 is fixed
seven.o \
eight.o
So. What use case have I missed with that analysis, and/or why is my modest proposal going to lead to crawling horrors in some totally different scenario which existing languages don't mess up so badly?
no subject
I think this is precisely where pragmatism and conceptual simplicity diverge: the option you describe leads directly to the ASCII art trap, which it seems to me is a severe violation of the Principle of Least Astonishment (especially in C/C++'s strict implementation where invisible trailing whitespace can change the meaning) and imparts a fair amount of mindbendingness to my way of thinking.
your point that the syntax of preprocessor directives in C/C++ is an obvious misfeature in the first place. (If they could be freely interpolated, life would be much better in many ways [...])
That's a considerably broader point than the one I'd intended to make, but I'll cheerfully subscribe to it regardless. Being able to embed macro definitions anywhere, including in the expansions of other macros, would certainly offer scope for some exciting perversions :-)
Though that makes comment-to-nl itself feel like a mistake in the language
It is odd, now I think about it, that neither of C/C++'s comment syntaxes nests properly. If they were going to throw in a second comment syntax, they could have had it fix the bug in the first one!
you don't make explicit what you want this to do: [...] I'm guessing you want it to treat ten.o and eleven.o as being on the same line?
Yes, that was my intention. (I thought I had made that explicit, though perhaps I left an unintended loophole in the wording. The line containing ten.o has a line-splicing backslash by rule 1, and therefore rule 3 applies to the line following it.)
Rule 3 is a curveball, I agree, but then I was approaching the entire problem from a use-cases perspective to see what would happen, and since the conceptually clean approaches in C/C++ and shell both turn out to lead to weird use cases, it didn't surprise me that my use-case-focused reanalysis led to weird concepts :-)
a far neater solution is to set aside a dedicated character, not used for anything else, and say that character to end of line is whitespace
Hmm, yes, that is quite neat; a sort of comment-to-just-after-nl as opposed to (or rather, offered as an alternative to) the more usual comment-to-just-before-nl.
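For comparison, that suggestion can be sketched in a few lines of Python. The choice of ~ as the dedicated character is purely hypothetical, as is the function name, and the sketch assumes the character can never appear inside a string or anywhere else in the language.

```python
import re

CONTINUE_CHAR = "~"  # hypothetical stand-in for the dedicated character


def fold_continuations(text, char=CONTINUE_CHAR):
    """Replace char, the rest of its line, and the newline with one space."""
    # The newline is part of the match, so the following physical line
    # ends up on the same logical line.
    return re.sub(re.escape(char) + r"[^\n]*\n", " ", text)
```

In this model a comment-only line between two continued lines needs no special rule, because it is itself just another stretch of whitespace that swallows its own newline.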
no subject
…but they've prohibited it from working the way you might expect (and hijacked # inside macros for another purpose, which is why I was sick and used "define" as the macro parameter's name (-8 ).
I'm guessing they decided that macros defining macros was a can of worms best left unopened.
At this point I'm remembering Modula-3's pragma syntax: <* … *> to go with the Wirth (* … *) for comments. (How come everyone's happy using [ … ] for arrays in Pascal instead of (. … .), but still uses (* … *) instead of { … } for comments?) Had C used something similar, we could have avoided all this __attribute__ and __declspec nonsense for starters!
If I ever design a language in the C problem space, knowing what I know now, the pragma syntax will be something like: