it looks naïvely as though the viewitems version would take O(n) time
The Python help says that if D is a dictionary then D.viewitems() returns "a set-like object providing a view on D's items", which seems to me like a clear hint that it's not going to the effort of physically constructing an actual Python set object but is instead providing a duck-typed thing that behaves like a set. So I would expect it to DTRT, performance-wise.
In C++, what you want is relatively straightforward:
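Something along these lines, at any rate (a sketch, assuming a std::map<std::string, int> called dictionary; the exact types don't matter):

#include <map>
#include <string>

// Reconstruction of the idea, not the exact snippet: look the key up once,
// keep the iterator, then compare the mapped value.
bool has_mapping(const std::map<std::string, int>& dictionary,
                 const std::string& key, int value)
{
    auto it = dictionary.find(key);   // the whole statement + variable declaration
    return it != dictionary.end() && it->second == value;
}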
It's straightforward if you're happy to use a whole statement and a variable declaration! All of the Python idioms I suggest above, including the two-clause one that seems to me most analogous to your C++ version, can be used in the middle of a larger expression without having to stop to declare local variables first.
Indeed, a quick test confirms that viewitems() answers queries of this type quickly, as I'd expected. I just did this:
>>> d = dict([(i,i*i) for i in range(1000000)])
>>> (5,47) in d.viewitems()
False
>>> (5,47) in d.items()
False
and there was a delay of a second or so when constructing the dictionary in the first place, and when testing via items(), but testing via viewitems() was instant.
Yeah, I like C++, but that just seems to be a different way of expressing "if key in dictionary and dictionary[key] == value". FWIW, I think you can obviously choose to use a local variable in Python, or to write out "dictionary.find(key)" twice in C++, depending on whether you trust your optimiser to avoid doing double work if you write the code twice.
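By the "write it out twice" version I mean something like this (again just a sketch, with a std::map<std::string, int> and a function name I've made up):

#include <map>
#include <string>

// No local variable, but the lookup is written (and, absent heroic
// optimisation, performed) twice.
bool has_mapping_twice(const std::map<std::string, int>& dictionary,
                       const std::string& key, int value)
{
    return dictionary.find(key) != dictionary.end()
        && dictionary.find(key)->second == value;
}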
It seems what you actually want is an operator built into the language which does something like:
#define MACRO(lhs,rhs) (lhs) && (lhs rhs)
MACRO(dictionary[key],==value)
but without the ugliness of being a macro. But I don't know if any language does have that?
You could cook one up in Lisp :-)
ROFL. Yeah, I guess you can. The limiting factor is that in most languages, you can't pass "==value" as an argument, but I guess in Lisp that's just a function and then it works. It just has lots of brackets in :)
If any C++ compiler managed to optimise two calls to dictionary.find(key) down to one I'd be extraordinarily impressed. Even more so if it managed without complicity between compiler and STL.
Delving deeper, I've always felt the STL gets things significantly wrong with iterators. Rather than iterators, it should be dealing in ranges: things you can dereference, increment and test for truth. Amongst the many other, more major advantages, that would mean one could do:
if (auto r = dictionary.find(key))
if (r->second==value)
You're still doing two steps, but at least they nest and scope tidily!
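To make that concrete, here's roughly the sort of thing I have in mind: a toy wrapper, with names I've made up rather than taken from any real library.

#include <iterator>
#include <map>
#include <string>

// A range is an iterator plus the end it's heading for: dereferenceable,
// incrementable, and testable for truth ("is there anything left?").
template <typename Iter>
struct range {
    Iter cur, last;
    explicit operator bool() const { return cur != last; }
    typename std::iterator_traits<Iter>::reference operator*() const { return *cur; }
    Iter operator->() const { return cur; }
    range& operator++() { ++cur; return *this; }
};

// A find() that hands back a range rather than a bare iterator.
template <typename Map, typename Key>
range<typename Map::const_iterator> find_range(const Map& m, const Key& k)
{
    return { m.find(k), m.end() };
}

bool has_mapping(const std::map<std::string, int>& dictionary,
                 const std::string& key, int value)
{
    if (auto r = find_range(dictionary, key))
        if (r->second == value)
            return true;
    return false;
}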
If any C++ compiler managed to optimise two calls to dictionary.find(key) down to one I'd be extraordinarily impressed
Hm. I guess I thought "calling a member function with no side effects, doing a comparison, then calling the same function again with the same arguments" was one of the most obvious optimisations, but I guess (a) "no side effects" isn't obvious (if this is in a function, dictionary should probably be passed as a const reference, and then it will use a const version of find, assuming there is one; but even then, if it's in a library, there's no way for the calling code to know it isn't modifying some global or mutable state), and (b) even if it were, I know really, really little about what compilers actually do in practice, so you're probably right.
Also, the compiler doesn't know the structure find is traversing doesn't alias other objects (the key, for example). And it needs to pay attention to what side effects the compare functor might have (probably none, but not definitely none, without analysis).
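For instance (a deliberately contrived sketch): nothing stops a comparator having observable side effects, in which case folding two identical find() calls into one would change what the program does.

#include <cstdio>
#include <map>

// A comparator that counts how many times it's called.
struct chatty_less {
    static int calls;
    bool operator()(int a, int b) const { ++calls; return a < b; }
};
int chatty_less::calls = 0;

int main()
{
    std::map<int, int, chatty_less> dictionary{{5, 25}};
    dictionary.find(5);
    dictionary.find(5);
    // Two finds really do mean twice the comparisons; folding them away
    // would alter the program's observable behaviour.
    std::printf("comparisons: %d\n", chatty_less::calls);
}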
Also, it's easy to write inherently efficient code that uses std::map, so compiler authors might not have seen optimising inefficient uses as a priority.
A quick check with "gcc -S" shows you're right. I don't know if whole-program optimisation can do any better, although probably not.
I feel silly, because I've been generally trying to prioritise clarity over micro-optimisation, which normally means _not_ using an extra local variable (unless it makes it equally or more clear, as in some loops, or is necessary, as in cpu-bound inner loops) and now I wonder if I'm wrong.
I wonder, _should_ C++ be able to apply "const" somewhere to say "this function doesn't change any global or static state"? Would it help?
You can specify the const keyword when defining a class method, indicating that it's permissible to apply the method to a const instance of that class.
But the semantics aren't quite the same. It's easy to imagine functions that don't change any state and are safe to apply to a const object, but which nonetheless don't reliably return the same value if called twice. (For instance, a function that returns the number of nanoseconds until the object's expiry date, or one that returns the current contents of the file whose pathname is stored in the object. eta: or, come to think of it, a function that simply dereferences a pointer stored in the object! Much simpler, and no need to invoke the mysterious concurrency of the real world :-)
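Roughly, for that last one (a throwaway sketch):

struct watcher {
    const int* p;
    // Perfectly valid on a const watcher, but two calls needn't agree if
    // something else writes through *p in between.
    int current() const { return *p; }
};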
So you actually want a marker less like "const" and more like "pure", indicating that whatever the function does has no side effects and depends on only the obvious input values, so that the optimiser is permitted to assume that apparently identical calls to it are actually identical and fold them together.
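GCC (and Clang) do in fact have attributes along these lines, though as extensions rather than anything standard. Roughly (the function names here are made up for illustration):

// __attribute__((pure)):  no side effects; the result depends only on the
//                         arguments and on any global memory it reads.
// __attribute__((const)): stronger still; no side effects and the result
//                         depends on the arguments alone.
int table[256];                                   // some global state
__attribute__((pure))  int table_lookup(int i) { return table[i]; }
__attribute__((const)) int square(int x)       { return x * x; }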
const is a semantic contract, and an overloaded-function differentiator, which is more to ensure correctness for the benefit of user code than intended to be any help at all to the compiler.
Apart from overload resolution, I believe there is never any circumstance in which changing the constness of a function would affect how it compiled. In the very most extreme case, it might const_cast<MyType *>(this).
Conversely, except in the actually-quite-common case where polymorphic objects or function pointers impinge, the compiler could, in theory, recursively annotate functions with their purity. But take a look at just how deep the call stack gets when you say if (dictionary.count(key) && dictionary[key]==value), having regard to all the std::pair accessors, forwardings to the underlying red-black tree, std::less calls, etc. While determining purity and eliminating a richer set of common sub-expressions is conceptually possible, and might even be achievable in time better than O(n²) on program size, it's a tall order.
In many cases the STL manages to look like it ought to make what you want to do short and elegant (if not always comprehensible to the untrained eye), but on deeper inspection either the standard or all implementations manage to make it just ever so slightly the wrong side of impossible. (If it didn't get so close, it wouldn't be so irritating.)
if (std::count_if(C.equal_range(key), value)>0) ...
Except that equal_range returns a std::pair of iterators and the <algorithm> functions and equivalent container methods only take individual iterators, so you have to at least calculate the range as a temporary.
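i.e. you end up with something like this (a sketch, taking C to be a std::multimap<std::string, int> for the sake of argument):

#include <algorithm>
#include <map>
#include <string>

bool contains_pair(const std::multimap<std::string, int>& C,
                   const std::string& key, int value)
{
    // Name the pair of iterators first; only then can the algorithm have them.
    auto r = C.equal_range(key);
    return std::count_if(r.first, r.second,
                         [&](const std::pair<const std::string, int>& kv)
                         { return kv.second == value; }) > 0;
}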
if (C.count(C::value_type(key, value))>0) ...
Except that count() (or just find()!) is only defined on key_type, not value_type.
This is another case in point for my claim up there that the STL should offer ranges instead of — or at least as well as — iterators.
Given range support, especially if containers themselves presented the range concept, the algorithms that currently accept a pair of iterators would all accept a range and your std::count_if example would work out of the box.
Now that C++11 is out there, I'm seriously considering putting some concrete proposals in for the next revision of C++, including ranges.