For clarity, I'd prefer to think of three or more features interacting, not two:
- Newline to terminate a comment
- Newline for at least one other purpose
- A token that can precede a newline to 'escape' it, token+nl turning into nothing
There are then three issues:
- Precedence
- Whether parsing transforms any of these syntactic structures into something that affects lower-precedence ones
- Whether or not we want a single newline to be able to serve multiple syntactic purposes
Having the escaping mechanism at lowest precedence is obviously pointless. So should escaping have higher precedence than all possible uses of a newline, or be lower than some but higher than others? To my way of thinking, the least mind-bending option is to say that escaping has unequivocally higher precedence than any syntax that uses newlines: either you see the escape token at the end of a line, or the newline isn't escaped. End of story.
End of that story, at any rate. There is still the question of how syntaxes that rely on newlines should compose when an unescaped newline occurs. C/C++ say that comment-to-nl has higher precedence than preprocessor directives, and that comment-to-nl is equivalent to newline, so '#define MOO baa//quack' defines MOO to baa. That, I think, is fair enough. Accepting, of course, your point that the syntax of preprocessor directives in C/C++ is an obvious misfeature in the first place. (If they could be freely interpolated, life would be much better in many ways, and the issue of whether comment-to-nl was deemed equivalent to whitespace or newline wouldn't arise. Though that makes comment-to-nl itself feel like a mistake in the language; hohum.)
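A minimal sketch of that behaviour, assuming a C99-or-later compiler (so that // comments are recognised). The STR helpers and the function name moo_expansion are my own, purely for illustration. Comments are replaced by whitespace before directives are processed, so the comment never becomes part of the macro body:

```c
/* Two-level stringification, so the argument is macro-expanded
   before the # operator is applied to it. */
#define STR_(s) #s
#define STR(s)  STR_(s)

#define MOO baa//quack

/* Returns the spelling of MOO's expansion as a string literal. */
const char *moo_expansion(void)
{
    return STR(MOO);
}
```

Here moo_expansion() gives "baa", with no trace of quack.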
So far as it goes, I'm fairly happy with C/C++'s rules. They let me write what I need to without tripping up, and when I do see something grotesque (which, naturally, was written by someone else) I can work out what it does. There are worse syntactic problems to worry about.
Looking at your proposed rules, my understanding is that comment-to-nl should be treated as a newline and should have higher precedence than the escape token, while newline as a statement delimiter should have lower precedence than the escape token. Then your rule 3 throws a curveball.
Picking things apart more carefully, I think the totality of what you're suggesting is:
- '//' to end of line is tokenised as comment
- '\' nl? comment+ is elided
…though you don't make explicit what you want this to do:
ten.o \ # A comment
# another comment
eleven.o
…I'm guessing you want it to treat ten.o and eleven.o as being on the same line?
While that has some heuristic benefits, it looks pretty clunky when formalised. Maybe a better solution is for shell to have a commenting style that sits within the line, so the problem goes away!
If designing a syntax from scratch, a far neater solution is to set aside a dedicated character, not used for anything else, and say that character to end of line is whitespace. Now your example becomes:
OBJECTS = one.o ♦
two.o ♦ This object is notable for some reason
three.o ♦
♦ the next two objects must not be reordered
four_a.o ♦
four_b.o ♦
♦ end of reordering constraint
five.o ♦
♦ six.o ♦ removed until bug 1234 is fixed
seven.o ♦
eight.o
…which I find far more tolerable.
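To make the rule concrete, here is a rough sketch of a pre-pass that treats "marker to end of line" as whitespace. It assumes the marker is ♦ encoded as UTF-8, that the newline itself is swallowed along with the comment, and that the whole run collapses to one space. The name elide_marker_to_eol is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* UTF-8 encoding of U+2666, the stand-in marker character. */
#define DIAMOND "\xE2\x99\xA6"

/*
 * Copies `in` to `out`, replacing everything from a marker up to and
 * including the following newline with a single space. `out` must have
 * room for strlen(in) + 1 bytes. Returns the number of bytes written,
 * not counting the terminating NUL.
 */
size_t elide_marker_to_eol(const char *in, char *out)
{
    const size_t mlen = strlen(DIAMOND);
    size_t n = 0;

    while (*in != '\0') {
        if (strncmp(in, DIAMOND, mlen) == 0) {
            in += mlen;
            while (*in != '\0' && *in != '\n')
                in++;           /* skip the comment text */
            if (*in == '\n')
                in++;           /* swallow the newline too */
            out[n++] = ' ';     /* the whole run counts as whitespace */
        } else {
            out[n++] = *in++;   /* ordinary byte: copy through */
        }
    }
    out[n] = '\0';
    return n;
}
```

A line with no marker keeps its newline, so the last line of the list still ends the assignment. A line consisting only of a marker and a comment vanishes into whitespace.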
Even better, people should heed the moral, clear in hindsight, that newline is a syntactic element like any other and, like any other, you should avoid confusing nesting and interaction of syntactic constructs like the plague!
To my way of thinking, the least mind-bending option is to say that escaping has unequivocally higher precedence than any syntax that uses newlines: either you see the escape token at the end of a line, or the newline isn't escaped.
I think this is precisely where pragmatism and conceptual simplicity diverge: the option you describe leads directly to the ASCII art trap, which it seems to me is a severe violation of the Principle of Least Astonishment (especially in C/C++'s strict implementation where invisible trailing whitespace can change the meaning) and imparts a fair amount of mindbendingness to my way of thinking.
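The trailing-whitespace hazard is easy to see in a sketch of the strict rule. This models only the rule as discussed here (backslash immediately followed by newline), not any particular compiler, and real implementations may be more forgiving. The name splice_strict is mine.

```c
#include <stddef.h>

/*
 * Strict line splicing: deletes each backslash that is immediately
 * followed by a newline, together with that newline. Anything else,
 * including a backslash followed by a space, is copied unchanged.
 * `out` must have room for strlen(in) + 1 bytes.
 */
size_t splice_strict(const char *in, char *out)
{
    size_t n = 0;

    while (*in != '\0') {
        if (in[0] == '\\' && in[1] == '\n')
            in += 2;            /* escaped newline: drop both bytes */
        else
            out[n++] = *in++;   /* everything else passes through */
    }
    out[n] = '\0';
    return n;
}
```

Given "a.o \" followed directly by a newline, the two lines join. With one invisible space after the backslash, the input comes back untouched and the lines stay separate.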
your point that the syntax of preprocessor directives in C/C++ is an obvious misfeature in the first place. (If they could be freely interpolated, life would be much better in many ways [...])
That's a considerably broader point than the one I'd intended to make, but I'll cheerfully subscribe to it regardless. Being able to embed macro definitions anywhere, including in the expansions of other macros, would certainly offer scope for some exciting perversions :-)
Though that makes comment-to-nl itself feel like a mistake in the language
It is odd, now I think about it, that neither of C/C++'s comment syntaxes nests properly. If they were going to throw in a second comment syntax, they could have had it fix the bug in the first one!
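A small illustration of the non-nesting, assuming a C99-or-later compiler (some compilers will warn about the first function). The function names are made up for the example.

```c
/* The first close marker ends the comment, so "+ 1" is live code
   and the function returns 2 rather than 1. */
int block_comment_does_not_nest(void)
{
    return 1 /* outer /* inner */ + 1;
}

/* A block-comment opener inside a line comment opens nothing,
   so the second statement still runs and the result is 6. */
int block_opener_inside_line_comment(void)
{
    int v = 5; // /* this does not open a block comment
    v += 1;    // and this does not close one */
    return v;
}
```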
you don't make explicit what you want this to do: [...] I'm guessing you want it to treat ten.o and eleven.o as being on the same line?
Yes, that was my intention. (I thought I had made that explicit, though perhaps I left an unintended loophole in the wording. The line containing ten.o has a line-splicing backslash by rule 1, and therefore rule 3 applies to the line following it.)
Rule 3 is a curveball, I agree, but then I was approaching the entire problem from a use-cases perspective to see what would happen, and since the conceptually clean approaches in C/C++ and shell both turn out to lead to weird use cases, it didn't surprise me that my use-case-focused reanalysis led to weird concepts :-)
a far neater solution is to set aside a dedicated character, not used for anything else, and say that character to end of line is whitespace
Hmm, yes, that is quite neat; a sort of comment-to-just-after-nl as opposed to (or rather, offered as an alternative to) the more usual comment-to-just-before-nl.
The perversion of defining a macro in the expansion of another macro is orthogonal to the issue of whether or not preprocessor directives can be freely interpolated. The C/C++ preprocessor could have allowed:
#define MOO(define) #define x QUACK
…but they've prohibited it from working the way you might expect (and hijacked # inside macros for another purpose, which is why I was sick and used "define" as the macro parameter's name (-8 ).
I'm guessing they decided that macros defining macros was a can of worms best left unopened.
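For the curious, a sketch of what that definition actually does in standard C, assuming a conforming preprocessor. Inside a function-like macro, # stringifies the parameter that follows it, and the result of an expansion is never re-read as a directive. The STR helpers and the function name are mine, added only to make the expansion visible.

```c
/* Two-level stringification, so the argument is macro-expanded
   before the # operator is applied to it. */
#define STR_(s) #s
#define STR(s)  STR_(s)

/* "#define" here is the # operator applied to the parameter that
   happens to be named "define"; no macro named x is ever defined. */
#define MOO(define) #define x QUACK

/* Returns the spelling of MOO(hello) after expansion. */
const char *moo_of_hello(void)
{
    return STR(MOO(hello));
}
```

The expansion of MOO(hello) is the three tokens "hello" x QUACK, which is what moo_of_hello() spells out (with the inner quotes escaped).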
At this point I'm remembering Modula-3's pragma syntax: <* … *> to go with the Wirth (* … *) for comments. (How come everyone's happy using [ … ] for arrays in Pascal instead of (. … .), but still uses (* … *) instead of { … } for comments?) Had C used something similar, we could have avoided all this __attribute__ and __declspec nonsense for starters!
If I ever design a language in the C problem space, knowing what I know now, the pragma syntax will be something like:
void f()
{
    #pragma #switch(COMPILER_VERSION) {
        RECENT_GCC:
            diagnostic push;
            diagnostic ignored "-Wuninitialized";
            break;
        MSVC:
            warning(push);
            warning(disable: C4700);
            break;
    }
    // Do some stuff...
    #pragma #switch(COMPILER_VERSION) {
        RECENT_GCC:
            diagnostic pop;
            break;
        MSVC:
            warning(pop);
            break;
    }
}
If designing a syntax from scratch, a far neater solution is to set aside a dedicated character, not used for anything else, and say that character to end of line is whitespace.
That looks suspiciously much like how TeX does comments (and line-splicing).
In a language with significant newlines you need both comment-with-splice and comment-without-splice - the latter so that you can add a comment without affecting the meaning of the layout. For example, I might have a makefile containing
OUTPS=${SRC:.tex=.ps}
OUTPDF=${SRC:.tex=.pdf}
OUTDVI=${SRC:.tex=.dvi}
And I should be able to comment out any of those lines without inadvertently concatenating them.
But I do see what you mean. /-8
I would like it if more languages had similar.