Lvalue min and max [entries|reading|network|archive]
simont

[ userinfo | dreamwidth userinfo ]
[ archive | journal archive ]

Sat 2019-03-09 10:34
Lvalue min and max

I was just reading a paper on gcd algorithms, and was struck by the following statement appearing in a snippet of pseudocode:

max{u, v} := |u-v| / 2;

in which the obviously intended meaning was ‘Replace the larger of u and v with the value computed on the RHS.’ In other words, the author is implicitly considering the max function to return an lvalue.

It had never occurred to me before that max could return an lvalue! If I'd wanted to express the same concept myself, I'd surely have either pre-sorted my variables, or written some tedious if statements out longhand, or else resorted to the notation of ‘arg max’, in which you replace your set of values to compare with some kind of mapping of indices to values (say, an array or hash), and where max(set) simply returns the largest element of the set, argmax(mapping) returns whatever index maps to the largest value. I wouldn't even have thought of writing it like this.

I think for pseudocode purposes, any of my other options would have been worse, because each one introduces extra bookkeeping or formalism that the reader has to mentally clear out of the way to get at the underlying concept. So I quite liked this notation for its pedagogical value – but at the same time, I recognised that you'd have to translate it back into one of those other less clear idioms as soon as you wanted to implement the algorithm in a real programming language.

Or do you? Now that I think about it, there's no reason at all why a programming language couldn't define a function like min or max to return an lvalue. In fact, in C++, a simple implementation is as easy as adding three & characters to the most obvious version:

// before
template<typename T>
T max(T u, T v) { return u>v ? u : v; }
// after
template<typename T>
T &max(T &u, T &v) { return u>v ? u : v; }

But for maximum flexibility, you'd surely want max to return an lvalue if possible, and fall back to an rvalue if not. You'd like to be able to write max(u,v) = w as the above pseudocode did, but you also still want to be able to write x = max(y,1000) without a compile error.

And that doesn't seem so easy in C++. The obvious approach is to have both of the above template definitions in scope at the same time. Overload resolution will use the by-value version in cases where only that one matches, and you hope it will manage to pick the reference version in cases where both match. But sadly, when I tried it, that hope turned out to be misplaced – the language defines no preference order between those two templates.

I wonder if any other languages allow you to get the lvalue/rvalue flexible version to work!

LinkReply
[personal profile] jackSat 2019-03-09 11:26
Everybody stand back, I know inadvisable preprocessor tricks!

#define max(a,b) (a>b?a:b)

Works, right? If both arguments are l-values, you can assign to it. If either isn't, you can't. Obviously with all the usual reasons why you don't use a define for that. But it doesn't work with a function?

Link Reply to this | Thread
[personal profile] simontSat 2019-03-09 11:35

Hmmm. I was going to say that the C ?: operator doesn't generate an lvalue, even if both arguments are. But I'd forgotten that it does in C++!

So yes, you're right: the same language in which I just failed to do this in the sensible way clearly can do this equivocation at some level. Hmmm.

(But then, it's also perfectly possible that there is some way to do it with templates, using one of those fiddly modern things like std::enable_if. I deliberately didn't say it was impossible, only that the obvious approach didn't work.)

Link Reply to this | Parent
[personal profile] jackSat 2019-03-09 11:31

Wait, what about:

template
T& max(T& u, T& v) { return u>v ? u : v; }

template
const T& max(const T& u, const T& v) { return u>v ? u : v; }

The compiler seems to figure that out, even though it couldn't decide between the reference version and the non-reference version. It did seem like the sort of thing that C++ would have, although sadly I'm not surprised it turns out to be finicky what combinations you can actually do.

Link Reply to this
[personal profile] fanfSat 2019-03-09 14:22

How do you disambiguate which lvalue to use when they are equal?

Link Reply to this | Thread
[personal profile] simontSat 2019-03-09 14:44

If the arguments are being presented as a list, you can use list index as the tie-breaking criterion. If they're in some unordered context like a multiset, why does it matter? :-)

Link Reply to this | Parent
[personal profile] crazyscotSat 2019-03-09 20:33

(edit: posted correct version of my scratch code)

Isn't this what rvalue references were made for?


#include <iostream>
#include <utility>

template<typename T>
T&& rvmax(T&& u, T&& v) { return std::forward<T>(u>v ? u : v); }

int main()
{
int var1 = 100;
int var2 = 9;
std::cout << "before: " << var1 << ',' << var2 << std::endl;
rvmax(var1,var2) = (var1-var2) / 2;
std::cout << "after: " << var1 << ',' << var2 << std::endl;
std::cout << "rvalue max: " << rvmax(100,9) << std::endl;
}


$./a.out
before: 100,9
after: 45,9
rvalue max: 100

Link Reply to this | Thread
[personal profile] simontSat 2019-03-09 21:34

Apparently you know more about std::forward than I do! I only know of it as the mystic rune you copy and paste when you want to pass the argument list of one function straight to another (usually in conjunction with a template parameter pack); I've never got my head round exactly what it does in detail.

I suppose that does put it in precisely this kind of headspace of 'preserve whatever lvalueosity you had as input', but I certainly couldn't have come up with this more sophisticated use of it.

That said, hmmm. This technique lets me give two variables as the arguments, or two integer literals, but something still goes wrong if I give it one of each:

$ cat one-of-each.cpp
#include <utility>

template<typename T>
T&& rvmax(T&& u, T&& v) { return std::forward<T>(u>v ? u : v); }

int f(int x) { return rvmax(x, 1000); }

$ g++ -c one-of-each.cpp
one-of-each.cpp: In function ‘int f(int)’:
one-of-each.cpp:6:36: error: no matching function for call to ‘rvmax(int&, int)’
 int f(int x) { return rvmax(x, 1000); }
                                    ^
one-of-each.cpp:4:5: note: candidate: template<class T> T&& rvmax(T&&, T&&)
 T&& rvmax(T&& u, T&& v) { return std::forward<T>(u>v ? u : v); }
     ^~~~~
one-of-each.cpp:4:5: note:   template argument deduction/substitution failed:
one-of-each.cpp:6:36: note:   deduced conflicting types for parameter ‘T’ (‘int&’ and ‘int’)
 int f(int x) { return rvmax(x, 1000); }
                                    ^
Link Reply to this | Parent | Thread
[personal profile] crazyscotSat 2019-03-09 22:34

std::forward<T>(x) is pretty much exactly "preserve the lvalueosity of x" - it is defined as static_cast<T&&>(x). https://stackoverflow.com/questions/3582001/advantages-of-using-forward goes into detail.

You need to be careful with mixing and matching rvalues and lvalues in this way. In the general case, what do you want to get out - an rvalue or an lvalue? If it's an lvalue, you'd better be passing in a pair of lvalues - you can't expect to be able to write to an rvalue.

If - as your function definition implies - you only wanted an rvalue, well, surely you only need to be passing by value in the first place? (Though you might be dealing with expensive objects and want to have std::move semantics available for efficiency, so you maybe want to support that with rvalue references.)

Messing around, I came to this excrescence. At the cost of writing the template code three times (wasn't that what templates were supposed to save us from?!) you can cope with mix-and-match inputs - but even then, assigning to something that might return an rvalue is quite rightly an error:


#include <iostream>
#include <utility>

template<typename T>
T&& rvmax(T& u, T&& v) { return std::forward<T>(u>v ? u : v); }

template<typename T>
T&& rvmax(T&& u, T& v) { return std::forward<T>(u>v ? u : v); }

template<typename T>
T&& rvmax(T&& u, T&& v) { return std::forward<T>(u>v ? u : v); }

int main()
{
int var1 = 100, var2 = 200;
int test = std::move(rvmax(101, var1));
int test2 = std::move(rvmax(var1,4242));
rvmax(var1, var2) = 999;
// rvmax(42, var1) = 999; // error: using xvalue (rvalue reference) as lvalue
std::cout << var1 << " " << var2 << " " << rvmax(199,var1) << std::endl;
}


Then, in a flash of inspiration, I realised that templates do in fact save the day if you decouple the type deductions. As long as your two inputs are the same type - or sufficiently compatible that '>' is defined and you can return one as if it was the other - you can replace the triply-defined template with this single one:

template<typename T, typename Tp>
T&& rvmax(T&& u, Tp&& v) { return std::forward<T>(u>v ? u : v); }

Link Reply to this | Parent | Thread
[personal profile] simontSun 2019-03-10 09:11

Then, in a flash of inspiration, I realised that templates do in fact save the day if you decouple the type deductions. As long as your two inputs are the same type - or sufficiently compatible that '>' is defined and you can return one as if it was the other - you can replace the triply-defined template with this single one:

Ah – and here is where my original wild guess about std::enable_if comes in, because if you wanted to enforce that those two types really are the same up to references, you can make it into

template<typename T, typename U,
         typename = typename std::enable_if<std::is_same<
             typename std::remove_reference<T>::type,
             typename std::remove_reference<U>::type>::value>::type>
T&& rvmax(T&& u, U&& v) { return std::forward<T>(u>v ? u : v); }

which looks ugly – not much chance of avoiding that in C++ templates – but at least semantically it's directly specifying exactly what I asked for.

But even so, I'm afraid, there's still something wrong. With or without my enable_if clause, I can't do this:

char g(char x) { return rvmax(x, 1); }

because the integer literal has type int, which isn't compatible with the char.

You need to be careful with mixing and matching rvalues and lvalues in this way. In the general case, what do you want to get out - an rvalue or an lvalue?

I think [personal profile] jack more or less nailed the requirements, when he observed that the naïve and dangerous macro implementation of max, for all its failings in other areas, actually does everything I want on the rvalue/lvalue axis. What I want is exactly the same semantics w.r.t. the two input values that the C++ version of the ternary operator would give: return a reference if possible, and if not (either because an input isn't an lvalue to begin with or because it stopped being one due to an implicit type conversion), fall back to returning by value. You might call it "opportunistic lvalue semantics".

Put another way: any invocation of max that would have worked before anyone even started thinking about lvalues should still compile and run successfully somehow, but now, the ones that can return an lvalue should.

In other words, what I really wanted was exactly the thing I tried for in the original post: simply provide both the by-value and the by-reference templates, and have the compiler pick the by-reference one whenever it can and fall back to the other one if it can't. But the problem was that if both templates are valid, the compiler complained rather than picking the one I wanted.

Aha – but now that I've conditioned the reference version with an enable_if, perhaps now I can achieve that by putting the explicit by-value version of the template back in, and conditioning it in the same way on exactly the inverse condition? Then the two templates can never both be valid, and I should get what I want in both cases.

In other words, if you use the above template with an enable_if, and adding this template alongside it, that seems to do what I wanted in every case I've so far checked:

template<typename T, typename U,
         typename = typename std::enable_if<!std::is_same<
             typename std::remove_reference<T>::type,
             typename std::remove_reference<U>::type>::value>::type>
auto rvmax(T u, U v) { return u > v ? u : v; }

although I did need to adjust it by changing the return type to auto, so as to return the same type that the heterogeneous ?: would have chosen given the same pair of input types.

I think the biggest remaining risk is that there might still be some class of input type pairs for which the wrong one of these templates will end up being selected. I can't think of one right now, but it seems very plausible that the next comment will point one out and I'll slap my forehead. But hopefully whatever it is can be worked around by adding further clauses to the two enable_if expressions.

(And even if we've finally got it right now, I don't think this discussion would really sell C++ to someone who wasn't already committed to it for a given project! :-)
Link Reply to this | Parent | Thread
[personal profile] simontSun 2019-03-10 09:29

so as to return the same type that the heterogeneous ?: would have chosen given the same pair of input types

Afterthought: for extra extra credit, you ideally want to select the output type for max and min between two different integer types in a way that guarantees the result type can hold the answer, which the default C++ integer promotions don't.

For example, consider giving an int8_t and a uint64_t to min(). You don't want the return type to be uint64_t, because if the int8_t had a negative value, it won't fit. On the other hand, any value of the uint64_t input that is greater than the largest possible int8_t cannot possibly be the return value anyway, because whatever was in the int8_t must have been smaller. So min(int8_t,uint64_t) can safely have return type int8_t, counter to all the C++ promotion rules!

I think the right answer is to choose whichever input type has a smaller minimum value, for min(), and whichever has a larger maximum value, for max().

I'm confident you can actually achieve all of this with another giant mess of enable_if, but I can't be bothered this margin is too small to contain it.

Link Reply to this | Parent
[identity profile] bjh21.me.ukSun 2019-03-10 10:29
Perl

The flexible version seems to work in Perl. The only oddity is a syntax I'd not seen before to define an lvalue subroutine:

sub max($$) : lvalue {
    $_[0] > $_[1] ? $_[0] : $_[1];
}

You can even do max($a, 10) = 20; and as you'll get an error iff $a > 10.

Link Reply to this | Thread
[identity profile] bjh21.me.ukSun 2019-03-10 10:31
Re: Perl

Gah. I meant “iff $a <= 10”, of course!

Link Reply to this | Parent
navigation
[ go | Previous Entry | Next Entry ]
[ add | to Memories ]