simont: A picture of me in 2016 (Default)
simont ([personal profile] simont) wrote2011-03-17 10:23 am

C abuse of the week

A while back I wanted to write some C code that looped over every element in a collection. But I wanted the code to be parametrisable in the face of different implementations of that collection, because it might have to link against several other pieces of code which might all store the things in question in a different format.

So I thought I'd do this:

    LOOP_OVER_ALL_ELEPHANTS(elephant) {
/* loop body executed once for each elephant */
}

And then each implementation would define the macro LOOP_OVER_ALL_ELEPHANTS appropriately for how it happened to be storing its elephants. If it was an array, the macro would expand to

    for (elephant = array; elephant < array + len; elephant++)

and if it was a linked list, it would expand to

    for (elephant = head; elephant != NULL; elephant = elephant->next)

and I presumed that any more complicated thing like a C++ STL collection would have some kind of similar idiom available.

Unfortunately, the very first implementation I tried to test against turned out to be storing its elephants in a circularly linked list – and you can't iterate over that in a for statement, because the start condition is the same as the end condition. If you write the apparently obvious adaptation of the linear-list code,

    for (elephant = head; elephant != head; elephant = elephant->next)

then the loop terminates after zero iterations and you look foolish.

I solved the problem by rewriting my loop in the form

    LOOP_OVER_ALL_ELEPHANTS(elephant) {
/* loop body executed once for each elephant */
} END_LOOP_OVER_ALL_ELEPHANTS(elephant)

which permitted the porting code to define a pair of macros containing the two ends of a do-while, so as to defer the first loop termination test until after the first run of the loop body. This felt inelegant and cumbersome, and I found myself wishing for a more metaprogrammable language to work in.

But this morning, I do believe I've solved it. You can iterate over a circularly linked list using a construction that can be encapsulated into my original macro, requiring no loop machinery after the single closing brace. It's strictly legal standard C as far as I can see, and it goes like this:

    switch (elephant = head, 0)
while (elephant = elephant->next, elephant != head)
case 0:

Then you can follow that with a single braced block (or even an unbraced statement), and everything works – including continue. So I could have put that lot into my macro, and it would have been fine!

More generally, this construction permits a sort of variant form of the for statement, which does its test after the loop body rather than before. You could almost define it as a macro in its own right:

#define for_after(init, condition, increment) \
switch(init, 0) while (increment, condition) case 0:

Of course that isn't quite syntactically right, because the three clauses of the for are separated by commas rather than semicolons, and can't contain initialisers in the way that a proper for can. Also there's the unusual constraint that the body of such a statement cannot contain case statements intended to bind to switches outside the loop (so you can't combine a loop of this form with Duff's Device).

But it's closer than I thought I'd be able to get!

eta: there's a bug in the switch-based solution above. See my follow-up post for details.

crazyscot: Selfie, with C, in front of an alpine lake (Default)

[personal profile] crazyscot 2011-03-17 10:46 am (UTC)(link)
*applause*
andrewducker: (Default)

[personal profile] andrewducker 2011-03-17 10:48 am (UTC)(link)
I just wrote a few paragraphs on why this kind of code would make me tear my hair out.

And then realised that you were working in C, not C++, and thus objects and interfaces were not an option for you!

And that made me remember why I like OOP so much.
andrewducker: (Default)

[personal profile] andrewducker 2011-03-17 11:03 am (UTC)(link)
Yeah - this is much easier in C#, where all of the built in collection types support IEnumerable, and if you were creating a new collection type then not supporting IEnumerable would be a bit silly.

(And yes, I know that C# isn't suitable for all sorts of things.)
andrewducker: (Default)

[personal profile] andrewducker 2011-03-17 11:20 am (UTC)(link)
Yup. Array implements IEnumerable and IEnumerable<>. I think it does some jiggery-pokery under the covers to do so, but I've never dug into how they actually work. I would have suspected they were significantly slower than pure C++, but it seems that I am incorrect:
http://www.mightyware.com/cpp_vs_csharp_arrays.bhs

[identity profile] plasmasturm.org (from livejournal.com) 2011-03-17 04:54 pm (UTC)(link)

Which is why I like functional programming so much.

;-)

[identity profile] plasmasturm.org (from livejournal.com) 2011-03-17 04:56 pm (UTC)(link)

Or heck, I’ll throw Forth into the ring too. =)

gerald_duck: (Duckula)

[personal profile] gerald_duck 2011-03-17 05:29 pm (UTC)(link)
Nonono! C++ beats C for this sort of thing, but not because of objects and interfaces. You don't want to sully a simple linked-list traversal with run-time polymorphism, abstract interfaces and virtual function calls!

As I said below, the C++ feature that's actually useful here is generic programming and concept conformance. (Function overloading is indispensable to implementation of the forthcoming C++0x idiom.) Nothing about the idiom there (pedantry: except using ++i to increment the iterator and i==j to compare, rather than inc(i) and equal(i,j)) uses object-orientation at all.
andrewducker: (Default)

[personal profile] andrewducker 2011-03-18 04:28 pm (UTC)(link)
Ok - not being a C++ person, how does it know what kind of iterator to assign to a particular collection type, and how do you write new ones?
gerald_duck: (choccyduck)

[personal profile] gerald_duck 2011-03-18 07:35 pm (UTC)(link)
By convention (in the Standard Template Library, originally designed by SGI but then standardised), if you have a container type C then you expect C::iterator to be an iterator over it that lets you modify the contained values, C::const_iterator to be one that only lets you read the contained values. Maybe C::iterator is the actual declared type; maybe it's a suitable typedef.

Also, by convention, if you have a container c, you expect c.begin() and c.end() to give you, respectively, an iterator pointing at the first item in the container and an iterator pointing immediately beyond the last item.

This latter convention is being firmly embedded in the C++0x standard that's due out any day now: when someone says for (int &x: myContainer) that is shorthand expressing a for loop between myContainer.begin() and myContainer.end(). (It doesn't actually care what the type returned by those functions is, provided thy both return the same type and it can be incremented, compared and dereferenced.)

This is achieved by container authors implementing iterators for the container types they create. Indeed, in most cases, creating the iterators is the very meat of container implementation and much else gets expressed in terms of them, at least implicitly.

As mentioned, iterators don't exhibit an interface in the Java sense: there's no run-time polymorphism. An iterator is, broadly, any type for which you can compare values for equality, increment values and dereference values (i.e. apply prefix * to them). Given those features, you can write generic code that works on any kind of iterator. (In reality that's approximately the input iterator concept — there are other concepts both more and less stringent, as well as regrettable gotchas, but the power of the system more than compensates in practice.)
andrewducker: (Default)

[personal profile] andrewducker 2011-03-18 09:46 pm (UTC)(link)
I assume that there's a reason for having a convention rather than an actual contract. It seems odd to me that you wouldn't specify that "I am a container, and do the things that containers do" rather than just implementing the necessary methods and _not_ declaring that you're doing so.
gerald_duck: (mallard)

[personal profile] gerald_duck 2011-03-18 10:59 pm (UTC)(link)
The "input iterator" concept is just a useful set of features, coupled with the observation that if some classes implement all those features and some algorithms confine themselves to requiring just those features then they'll get along usefully together.

Unfortunately, one of the wrinkles is the key distinction between an input iterator and a forward iterator: you can copy a forward iterator and use the copy later to make a second pass through the data, whereas an input iterator is once-only (compare traversing a linked list with reading lines from a TCP socket). This distinction between contexts can't be directly enforced by the compiler. There is an iterator_category mechanism to get around the problem, but it's clunky.

There is also a proposal to extend the language with explicit concepts, but it's typical Stroustrup filth and was rightly rejected from C++0x. In due course, someone will come up with something nice to replace it, I hope.

[identity profile] atheorist.livejournal.com 2011-03-17 03:07 pm (UTC)(link)
Do you prefer the switch to a flag like this?

int specialfirsttimeflag;
for (elephant = head, specialfirsttimeflag= 1;
elephant != head && specialfirsttimeflag--;
elephant = elephant->next) {

Or you can't because it might conflict with a name outside the macro?

[identity profile] cartesiandaemon.livejournal.com 2011-03-17 09:13 pm (UTC)(link)
I was trying an alternative solution. I think you may be able to fix the problems this approach has with not quite being semantically identical to a "for" block, but I couldn't overcome the restrictions on not being able to declare another variable.

You could possibly wrap up the "compare elephant and head, except the first time, then just return true" logic into a function with a static variable in, but it would mean the code would be definitely non-reentrant. (Unless your preprocessor did something clever with the LINE directive in the macro, but I can't remember, that wasn't in C90 either, right?)

[identity profile] cartesiandaemon.livejournal.com 2011-03-18 10:48 am (UTC)(link)
:)

Thank you. I don't think I quite thought through what i did want. But i think that where I'm going is that if you want to use a function to represent the "compare el against head, except ignore the first and every second match" logic instead of a simple local var, it needs to identify when its part of the same loop and when it isn't, which it might be able to do with a combination of (a) using some thead local storage or thread id to maintain when the next call is from the same thread and (b) using a line directive to determine when its called from different (nested) places in the same file and thread.

Your solution is probably right because it uses syntax to resolve a problem with insufficiently expressive syntax. Whereas mine (if it worked) would be how to solve a problem if the constraints here were logical parts of the program for some reason.

I agree it's horrendous, that's why i recommend using a language with convenient local variables, fewer macros, etc, etc :)
gerald_duck: (ascii)

[personal profile] gerald_duck 2011-03-17 05:24 pm (UTC)(link)
Seriously, seriously, I know there's lots to hate in C++, starting with Bjarne Stroustrup having been born and working through to C++0x's lambda function syntax, but generic programming is one thing it gets more right than any other language I've seen.

The STL iterator concepts are a well thought out framework for generic programming.

You can write code like this:

for (Elephant::iterator i=elephant.begin(); i!=elephant.end(); ++i) {
    // loop body executed once for each elephant
}


…and it works regardless of the type of elephant. In C++0x, it gets even simpler:

for (auto &e: elephant) {
    // loop body executed once for each elephant
}


The iterator for each container type deals with the specifics of stepping through the elements of an array, a linked list, a circularly linked list, a balanced tree, a hash table, a readdir(), a circular buffer, the characters of a UTF-8 string… whatever. Because the code is generic rather than run-time polymorphic, the optimiser is told precisely what's going on for a particular use of the idiom, gets to work on the code as an inlined entirety and compiles it efficiently.

What's more, unlike a macro your generic code gets syntax checked up-front, even if full semantic checking isn't possible until it knows what type it's going to get used on.

It's things like this that make me think C++ is a strictly superior language to C, these days. And these days you can code in C++ for a $10 consumer device, so compiler unavailability is almost a non-issue. C++ code even compiles on Solaris for crying out loud!

[identity profile] nunfetishist.livejournal.com 2011-03-17 06:31 pm (UTC)(link)
Actually, C++ is a real ballache on a $3 MCU :)
gerald_duck: (Oh really?)

[personal profile] gerald_duck 2011-03-18 07:45 pm (UTC)(link)
Howso? What do you need for C++ that you don't need for C? OK, there's plenty it's nice to have for C++ — a terminate handler, some typeinfo structures, perhaps a library of two — but none are essential, surely?

[identity profile] cartesiandaemon.livejournal.com 2011-03-17 07:20 pm (UTC)(link)
Yeah, this.

(Although I'm unsurprised Simon's effectively unable to change the codebase, to me something like this is a BIG FLAG that switching to a more expressive language is long overdue.)

[identity profile] cartesiandaemon.livejournal.com 2011-03-17 07:23 pm (UTC)(link)
I'm quite impressed.

Although honestly, while I love the tweaks, I think for all practical purposes, making it clear that LOOP_OVER_ALL_ELEPHANTS can't substitute in as an arbitrary statement is fine, it's clearly magic, you shouldn't TRY to use it in a more sophisticated context. (Also, pretty much always put braces after ifs and loops, unless the loop is a single short statement on the same line as the if, and even then I very rarely find I want to.)