it looks naïvely as though the viewitems version would take O(n) time
The Python help says that if D is a dictionary then D.viewitems() returns "a set-like object providing a view on D's items", which seems to me like a clear hint that it's not going to the effort of physically constructing an actual Python set object but is instead providing a duck-typed thing that behaves like a set. So I would expect it to DTRT, performance-wise.
In C++, what you want is relatively straightforward:
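Something along these lines, at any rate (a sketch, assuming a std::map<std::string, int> called dictionary; the exact types don't matter):

#include <map>
#include <string>

// Reconstruction of the idea, not the exact snippet: look the key up once,
// keep the iterator, then compare the mapped value.
bool has_mapping(const std::map<std::string, int>& dictionary,
                 const std::string& key, int value)
{
    auto it = dictionary.find(key);   // the whole statement + variable declaration
    return it != dictionary.end() && it->second == value;
}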
It's straightforward if you're happy to use a whole statement and a variable declaration! All of the Python idioms I suggest above, including the two-clause one that seems to me most analogous to your C++ version, can be used in the middle of a larger expression without having to stop to declare local variables first.
Indeed, a quick test confirms that viewitems() answers queries of this type quickly, as I'd expected. I just did this:
>>> d = dict([(i,i*i) for i in range(1000000)])
>>> (5,47) in d.viewitems()
False
>>> (5,47) in d.items()
False
and there was a delay of a second or so when constructing the dictionary in the first place, and when testing via items(), but testing via viewitems() was instant.
Yeah, I like C++, but that just seems to be a different way of expressing "if key in dictionary and dictionary[key] == value". FWIW, I think you can obviously choose to use a local variable in Python, or to write out "dictionary.find(key)" twice in C++, depending on whether you trust your optimiser to avoid doing double work if you write the code twice.
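By the "write it out twice" version I mean something like this (again just a sketch, with a std::map<std::string, int> and a function name I've made up):

#include <map>
#include <string>

// No local variable, but the lookup is written (and, absent heroic
// optimisation, performed) twice.
bool has_mapping_twice(const std::map<std::string, int>& dictionary,
                       const std::string& key, int value)
{
    return dictionary.find(key) != dictionary.end()
        && dictionary.find(key)->second == value;
}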
It seems what you actually want is an operator built into the language which does something like:
#define MACRO(lhs,rhs) (lhs) && (lhs rhs)
MACRO(dictionary[key],==value)
but without the ugliness of being a macro. But I don't know if any language does have that?
You could cook one up in Lisp :-)
ROFL. Yeah, I guess you can. The limiting factor is that in most languages, you can't pass "==value" as an argument, but I guess in Lisp that's just a function and then it works. It just has lots of brackets in :)
If any C++ compiler managed to optimise two calls to dictionary.find(key) down to one I'd be extraordinarily impressed. Even more so if it managed without complicity between compiler and STL.
Delving deeper, I've always felt the STL gets things significantly wrong with iterators. Rather than iterators, it should be dealing in ranges: things you can dereference, increment and test for truth. Amongst the many other, more major advantages, that would mean one could do:
if (auto r = dictionary.find(key))
if (r->second==value)
You're still doing two steps, but at least they nest and scope tidily!
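To make that concrete, here's roughly the sort of thing I have in mind: a toy wrapper, with names I've made up rather than taken from any real library.

#include <iterator>
#include <map>
#include <string>

// A range is an iterator plus the end it's heading for: dereferenceable,
// incrementable, and testable for truth ("is there anything left?").
template <typename Iter>
struct range {
    Iter cur, last;
    explicit operator bool() const { return cur != last; }
    typename std::iterator_traits<Iter>::reference operator*() const { return *cur; }
    Iter operator->() const { return cur; }
    range& operator++() { ++cur; return *this; }
};

// A find() that hands back a range rather than a bare iterator.
template <typename Map, typename Key>
range<typename Map::const_iterator> find_range(const Map& m, const Key& k)
{
    return { m.find(k), m.end() };
}

bool has_mapping(const std::map<std::string, int>& dictionary,
                 const std::string& key, int value)
{
    if (auto r = find_range(dictionary, key))
        if (r->second == value)
            return true;
    return false;
}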
If any C++ compiler managed to optimise two calls to dictionary.find(key) down to one I'd be extraordinarily impressed
Hm. I guess I thought "calling a member function with no side effects, doing a comparison, then calling the same function again with the same arguments" was one of the most obvious optimisations, but I guess (a) "no side effects" isn't obvious (if this is in a function, dictionary should probably be passed as a const reference, and then it will use a const version of find, assuming there is one; but even then, if it's in a library, there's no way for the calling code to know it isn't modifying some global or mutable state), and (b) even if it were, I know really, really little about what compilers actually do in practice, so you're probably right.
Also, the compiler doesn't know the structure find is traversing doesn't alias other objects (the key, for example). And it needs to pay attention to what side effects the compare functor might have (probably none, but not definitely none, without analysis).
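For instance (a deliberately contrived sketch): nothing stops a comparator having observable side effects, in which case folding two identical find() calls into one would change what the program does.

#include <cstdio>
#include <map>

// A comparator that counts how many times it's called.
struct chatty_less {
    static int calls;
    bool operator()(int a, int b) const { ++calls; return a < b; }
};
int chatty_less::calls = 0;

int main()
{
    std::map<int, int, chatty_less> dictionary{{5, 25}};
    dictionary.find(5);
    dictionary.find(5);
    // Two finds really do mean twice the comparisons; folding them away
    // would alter the program's observable behaviour.
    std::printf("comparisons: %d\n", chatty_less::calls);
}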
Also, it's easy to write inherently efficient code that uses std::map, so compiler authors might not have seen optimising inefficient uses as a priority.
A quick check with "gcc -S" shows you're right. I don't know if whole-program optimisation can do any better, although probably not.
I feel silly, because I've been generally trying to prioritise clarity over micro-optimisation, which normally means _not_ using an extra local variable (unless it makes it equally or more clear, as in some loops, or is necessary, as in cpu-bound inner loops) and now I wonder if I'm wrong.
I wonder, _should_ C++ be able to apply "const" somewhere to say "this function doesn't change any global or static state"? Would it help?
You can specify the const keyword when defining a class method, indicating that it's permissible to apply the method to a const instance of that class.
But the semantics aren't quite the same. It's easy to imagine functions that don't change any state and are safe to apply to a const object, but which nonetheless don't reliably return the same value if called twice. (For instance, a function that returns the number of nanoseconds until the object's expiry date, or one that returns the current contents of the file whose pathname is stored in the object. eta: or, come to think of it, a function that simply dereferences a pointer stored in the object! Much simpler, and no need to invoke the mysterious concurrency of the real world :-)
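Roughly, for that last one (a throwaway sketch):

struct watcher {
    const int* p;
    // Perfectly valid on a const watcher, but two calls needn't agree if
    // something else writes through *p in between.
    int current() const { return *p; }
};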
So you actually want a marker less like "const" and more like "pure", indicating that whatever the function does has no side effects and depends on only the obvious input values, so that the optimiser is permitted to assume that apparently identical calls to it are actually identical and fold them together.
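GCC (and Clang) do in fact have attributes along these lines, though as extensions rather than anything standard. Roughly (the function names here are made up for illustration):

// __attribute__((pure)):  no side effects; the result depends only on the
//                         arguments and on any global memory it reads.
// __attribute__((const)): stronger still; no side effects and the result
//                         depends on the arguments alone.
int table[256];                                   // some global state
__attribute__((pure))  int table_lookup(int i) { return table[i]; }
__attribute__((const)) int square(int x)       { return x * x; }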
const is a semantic contract, and an overloaded-function differentiator, which is more to ensure correctness for the benefit of user code than intended to be any help at all to the compiler.
Apart from overload resolution, I believe there is never any circumstance in which changing the constness of a function would affect how it compiled. In the very most extreme case, it might const_cast<MyType *>(this).
Conversely, except in the actually-quite-common case where polymorphic objects or function pointers impinge, the compiler could, in theory, recursively annotate functions with their purity. But take a look at just how deep the call stack gets when you say if (dictionary.count(key) && dictionary[key]==value), having regard to all the std::pair accessors, forwardings to the underlying red-black tree, std::less calls, etc. While determining purity and eliminating a richer set of common sub-expressions is conceptually possible, and might even be achievable in time better than O(n²) on program size, it's a tall order.
In many cases the STL manages to look like it ought to make what you want to do short and elegant (if not always comprehensible to the untrained eye), but on deeper inspection either the standard or all implementations manage to make it just ever so slightly the wrong side of impossible. (If it didn't get so close, it wouldn't be so irritating.)
if (std::count_if(C.equal_range(key), value)>0) ...
Except that equal_range returns a std::pair of iterators and the <algorithm> functions and equivalent container methods only take individual iterators, so you have to at least calculate the range as a temporary.
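i.e. you end up with something like this (a sketch, taking C to be a std::multimap<std::string, int> for the sake of argument):

#include <algorithm>
#include <map>
#include <string>

bool contains_pair(const std::multimap<std::string, int>& C,
                   const std::string& key, int value)
{
    // Name the pair of iterators first; only then can the algorithm have them.
    auto r = C.equal_range(key);
    return std::count_if(r.first, r.second,
                         [&](const std::pair<const std::string, int>& kv)
                         { return kv.second == value; }) > 0;
}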
if (C.count(C::value_type(key, value))>0) ...
Except that count() (or just find()!) is only defined on key_type, not value_type.
This is another case in point for my claim up there that the STL should offer ranges instead of — or at least as well as — iterators.
Given range support, especially if containers themselves presented the range concept, the algorithms that currently accept a pair of iterators would all accept a range and your std::count_if example would work out of the box.
Now that C++11 is out there, I'm seriously considering putting some concrete proposals in for the next revision of C++, including ranges.