simont: A picture of me in 2016 (Default)
simont ([personal profile] simont) wrote2011-12-07 03:45 pm

RepugNaNt

I had a disgusting thought just now. I needed to have some Python code check whether a dictionary contains a particular key/value pair. The obvious approach,

if dictionary[key] == value:

isn't sufficient, because I need to not throw an exception if the dictionary doesn't contain any mapping for that key at all.

An obvious way is to use a second clause to handle the latter case, so you end up writing

if key in dictionary and dictionary[key] == value:

But that's unpleasantly verbose, and also undesirable if dictionary or key is a complicated expression or one with side effects. So I wondered if we could do better.

Another possibility is to use the get() method of Python dictionaries, which lets you specify a default return value if the dictionary hasn't got the key in question. So you might say, for instance,

if dictionary.get(key, None) == value:

But in order to be able to do that, you have to be able to think of a default that will guarantee to compare unequal to value; for instance here, what if the value is None? If you know something about value (e.g. its type) then this is probably feasible one way or another, but one can imagine more general situations in which that wouldn't be the case.

I think the right answer, in fact, is to write

if (key, value) in dictionary.viewitems():

where viewitems() is a standard method which presents the dictionary as if it were a set of ordered pairs. (You could also use dictionary.items(), which would be semantically equivalent, but much slower due to constructing an explicit list and then iterating along it.)

But before I found the viewitems() method, I thought a bit harder about the get() approach. Perhaps, I thought, I could define a class with a comparison method that always returned false, so that an instance of that class would compare unequal to anything, including itself.

And then, as I thought the phrase ‘compare unequal to anything, including itself’, a stray neuron fired in my head. I think this would work pretty reliably:

if dictionary.get(key, float('nan')) == value:

and its only downside is that it is the most disgusting thing ever! :-)

ptc24: (Default)

[personal profile] ptc24 2011-12-07 04:11 pm (UTC)(link)
Ewww...

Especially the way you have to say float('nan') - surely there's a better way of getting hold of NaN??

[identity profile] cartesiandaemon.livejournal.com 2011-12-07 04:14 pm (UTC)(link)
*thinks* Now that you mention it, if someone calls your hypothetical function with (key, NaN) is the desired behaviour to return false under all circumstances, or to test for NaN? :)
gerald_duck: (frontal)

[personal profile] gerald_duck 2011-12-07 04:16 pm (UTC)(link)
I'm no Python expert, but it looks naïvely as though the viewitems version would take O(n) time where the others took O(log n)? Or does Python optimise?

In C++, what you want is relatively straightforward:
auto i = dictionary.find(key);
if (i!=dictionary.end() && *i==value) {
I assume Python has nothing similar to C++ STL's iterators to help you?
gerald_duck: (ascii)

[personal profile] gerald_duck 2011-12-07 04:20 pm (UTC)(link)
…except strictly it's probably i->second==value rather than *i==value, because dictionary will probably be an associative container.

…and you'd have to name the iterator's type explicitly before C++11.

…and in C++ you might be using an associative container that can map the same key to multiple values, in which case you'd need to know whether you wanted all entries with that key to have the specified value or at least one.

[identity profile] ptc24.livejournal.com 2011-12-07 04:39 pm (UTC)(link)
http://www.python.org/dev/peps/pep-3106/ (http://www.python.org/dev/peps/pep-3106/) has "pseudo-code to define the semantics":

class d_items:

    def __init__(self, d):
        self.__d = d

    def __len__(self):
        return len(self.__d)

    def __contains__(self, (key, value)):
        return key in self.__d and self.__d[key] == value

    (etc.)


which suggests that that sort of optimisation (if you can really call it that) is what is in mind.

[identity profile] cartesiandaemon.livejournal.com 2011-12-07 04:49 pm (UTC)(link)
Yeah, I like C++, but that just seems to be a different way of expressing "if key in dictionary and dictionary[key] == value". FWIW, I think you can obviously choose to use a local variable in python, or to write out "dictionary.find(key)" twice in C++, depending whether you trust your optimiser to avoid doing double work if you write the code twice.

It seems what you actually want is an operator built into the language which does something like:

#define MACRO(lhs,rhs) (lhs) && (lhs rhs)

MACRO(dictionary[key],==value)

but without the ugliness of being a macro. But I don't know if any language does have that?

[identity profile] cartesiandaemon.livejournal.com 2011-12-07 05:05 pm (UTC)(link)
ROFL. Yeah, I guess you can. The limiting factor is that in most languages, you can't pass "==value" as an argument, but I guess in LISP that's just a function and then it works. It just has lots of brackets in :)

[identity profile] ptc24.livejournal.com 2011-12-07 04:55 pm (UTC)(link)
I'm sure that in lisp macros don't affect the level of ugliness.
gerald_duck: (penelope)

[personal profile] gerald_duck 2011-12-07 05:16 pm (UTC)(link)
If any C++ compiler managed to optimise two calls to dictionary.find(key) down to one I'd be extraordinarily impressed. Even more so if it managed without complicity between compiler and STL.

Delving deeper, I've always felt the STL gets things significantly wrong with iterators. Rather than iterators, it should be dealing in ranges, things you can dereference, increment and test for truth. Amongst the many more major advantages, that would mean one could do:

if (auto r = dictionary.find(key))
    if (r->second==value)

You're still doing two steps, but at least they nest and scope tidily!

[identity profile] cartesiandaemon.livejournal.com 2011-12-07 05:29 pm (UTC)(link)
If any C++ compiler managed to optimise two calls to dictionary.find(key) down to one I'd be extraordinarily impressed

Hm. I guess I thought "calling a member function with no side effects, doing a comparison, then calling the same function again with the same arguments" was one of the most obvious optimisations, but I guess (a) "no side effects" isn't obvious (if this is in a function, dictionary should probably be passed as a const reference, and then it will use a const version of find, assuming there is one, but I guess even then, if it's in a library, there's no way for the calling code to know it wasn't modifying some global or mutable state) and (b) and even if it were I know really, really little about what compilers actually do in practice, so you're probably right.
gerald_duck: (Duck of Doom)

[personal profile] gerald_duck 2011-12-07 05:45 pm (UTC)(link)
Also, the compiler doesn't know the structure find is traversing doesn't alias other objects (the key, for example). And it needs to pay attention to what side effects the compare functor might have (probably none, but not definitely none, without analysis).

Also, it's easy to write inherently efficient code that uses std::map, so compiler authors might not have seen optimising inefficient uses as a priority.

[identity profile] cartesiandaemon.livejournal.com 2011-12-08 11:24 am (UTC)(link)
A quick with "gcc -S" shows you're right. I don't know if a whole program optimisation can do any better, although probably not.

I feel silly, because I've been generally trying to prioritise clarity over micro-optimisation, which normally means _not_ using an extra local variable (unless it makes it equally or more clear, as in some loops, or is necessary, as in cpu-bound inner loops) and now I wonder if I'm wrong.

I wonder, _should_ C++ be able to apply "const" somewhere to say "this function doesn't change any global or static state"? Would it help?
gerald_duck: (duck and computer)

[personal profile] gerald_duck 2011-12-08 11:53 am (UTC)(link)
const is a semantic contract, and an overloaded-function differentiator, which is more to ensure correctness for the benefit of user code than intended to be any help at all to the compiler.

Apart from overload resolution, I believe there is never any circumstance in which changing the constness of a function would affect how it compiled. In the very most extreme case, it might const_cast<MyType *>(this).

Conversely, except in the actually-quite-common case where polymorphic objects or function pointers impinge, the compiler could, in theory, recursively annotate functions with their purity. But take a look at just how deep the call stack gets when you say if (dictionary.count(key) && dictionary[key]==value), having regard to all the std::pair accessors, forwardings to the underlying red-black tree, std::less calls, etc. While determining purity and eliminating a richer set of common sub-expressions is conceptually possible, and might even be achievable in time better than O(n2) on program size, it's a tall order.

[identity profile] deliberateblank.livejournal.com 2011-12-07 08:07 pm (UTC)(link)
In many cases the STL manages to look like it ought to make what you want to do short and elegant (if not always comprehensible to the untrained eye), but on deeper inspection either the standard or all implementations manage to make it just ever so slightly the wrong side of impossible. (If it didn't get so close, it wouldn't be so irritating.)

if (std::count_if(C.equal_range(key), value)>0) ...

Except that equal_range returns a std::pair of iterators and the <algorithm> functions and equivalent container methods only take individual iterators, so you have to at least calculate the range as a temporary.

if (C.count(C::value_type(key, value))>0) ...

Except that count() (or just find()!) are only defined on key_type, not value_type.
gerald_duck: (ascii)

[personal profile] gerald_duck 2011-12-08 11:31 am (UTC)(link)
This is another case in point for my claim up there that the STL should offer ranges instead of — or at least as well as — iterators.

Given range support, especially if containers themselves presented the range concept, the algorithms that currently accept a pair of iterators would all accept a range and your std::count_if example would work out of the box.

Now that C++11 is out there, I'm seriously considering putting some concrete proposals in for the next revision of C++, including ranges.
fanf: (Default)

[personal profile] fanf 2011-12-07 04:45 pm (UTC)(link)
if dictionary.get(key, not value) == value:

[identity profile] cartesiandaemon.livejournal.com 2011-12-07 05:03 pm (UTC)(link)
Yeah. I considered that, if the function is parameterised by the type of value, you could have a meta-type, but I didn't expect to think of anything neat. Thinking about it, the tricky (but not very efficient) way would be to have a vector (or any other container type) containing value (assuming python doesn't support vectors nested infinitely deep :))

But "not" is very good. Are there any weird edge cases? :)
fanf: (Default)

[personal profile] fanf 2011-12-07 05:08 pm (UTC)(link)
It doesn't fix the NaN <> NaN problem :-)

[identity profile] ptc24.livejournal.com 2011-12-07 05:13 pm (UTC)(link)
Not _quite_ equivalent to what simont had originally:
>>> class foo:
...   def __cmp__(self, other):
...     if other is True: return 0
...     if other is False: return 0
...     return 1
... 
>>> f = foo()
>>> g = foo()
>>> f == g
False
>>> d = {}
>>> d[5] = f
>>> d.get(6, not f) == f
True
>>> 6 in d and d[6] == f
False

Note cmp is -1,0,1 for less than/equal/greater than.
fanf: (Default)

[personal profile] fanf 2011-12-07 05:46 pm (UTC)(link)
If you abuse semantics like that you deserve to lose :-)

[identity profile] geekette8.livejournal.com 2011-12-07 05:07 pm (UTC)(link)
I may be missing something, but what is wrong with:

try:
  if dictionary[key] == value:
    # do stuff
  else:
    # do other stuff
except:
  # do different stuff

?

[identity profile] fluffyrichard.livejournal.com 2011-12-22 07:27 am (UTC)(link)
Actually, that's close to what I'd probably write (though not always; context is everything, and I'd do "except KeyError" not bare except, of course). Though the reason for that is that I wouldn't call the variable tmpvar; I'd give it a meaningful name, and think of it as an opportunity to write more self-documenting code.

I don't think I've ever come across this in practice, though; normally, I just work with dicts which either don't contain None, or where None would be expected to be treated the same as the entry being missing.
emperor: (Default)

[personal profile] emperor 2011-12-07 06:03 pm (UTC)(link)
That should be "except KeyError:" :p

[sorry, "except:" is one of my pet python hates]
andrewducker: (Default)

[personal profile] andrewducker 2011-12-07 10:28 pm (UTC)(link)
Why not just create a function that does the nasty multi-line thing, and give it a nice interface, rather than trying to create something horrific that you can put inline?
andrewducker: (Default)

[personal profile] andrewducker 2011-12-07 11:21 pm (UTC)(link)
It doesn't read naturally to me,and tell me what it's doing. I'd rather end up with code like:
if dictionary.itemequals(key,value):
but that may just be my lack of familiary with Python showing. If the former code is idiomatic Python then go for it.
andrewducker: (Default)

[personal profile] andrewducker 2011-12-08 08:42 am (UTC)(link)
My brain clicked this morning, and suddenly the viewitems seems much more natural. I was clearly too tired to be looking at code when I commented last night.
ext_3375: Banded Tussock (Default)

[identity profile] hairyears.livejournal.com 2011-12-07 10:52 pm (UTC)(link)
It's a rare and precious thing for me to see 'a proper language' doing something even nastier than my daily fare of VBA.
sparrowsion: photo of male house sparrow (string-handling kitten)

[personal profile] sparrowsion 2011-12-08 10:11 am (UTC)(link)
Standard Python idiom for creating unique sentinel values is to create an object() just for the job (since if you try to == it will fall back to is). So:
if dictionary.get(key, object()) == value:
gets you the same solution without NaN abuse. And remember that objects are pretty lightweight—they don't even have a __dict__.

[identity profile] ptc24.livejournal.com 2011-12-08 04:30 pm (UTC)(link)
Again, vulnerable to stupid classes such as:

>>> class bar:
...   def __cmp__(self, other):
...     return 0


but as [livejournal.com profile] fanf says, if you do that, you deserve to lose.
sparrowsion: photo of male house sparrow (string-handling kitten)

[personal profile] sparrowsion 2011-12-08 05:49 pm (UTC)(link)
If value can be a bar then I think the whole concept is an intractable problem. (Unless you know about bar and special case it, and given that the original issue was how to do this elegantly, that fails heavily.)