Adages
simont | Fri 2013-07-12 09:41
[personal profile] simont | Fri 2013-07-12 12:36
  • Multi-paradigm is good, Do Everything My One True Way is bad. Some things are OO-shaped, some functional-shaped, some plain procedural, and you want to be able to switch back and forth within different parts of the same program without undue faff.
  • Compile to nice fast native code, or at the very least, don't design any language so that you can't sensibly do that. Requiring users to wait for Moore's Law is a mug's game.
  • Complete and correct bindings to all OS APIs. This is a major thing that keeps me using C and occasionally C++ in spite of their many flaws: whenever you use a different language, you sooner or later end up having to do some OS interaction that that language's abstraction didn't bother to include. But it's not really a language feature as such; I think what I really want is for all OS maintainers to publish cross-language specs for their APIs so that every language can auto-process them into its own appropriate form.
  • Define the language semantics better than C/C++. Undefined behaviour can bite you in so many different ways, and you have to know all the gotchas to reliably avoid it. Each language feature should be safely usable without having to be an expert in the rest of the language first.
  • Check as much as possible at compile time; in Python it's really annoying to find a code path your tests didn't encounter crashing in the field due to a typo or type-o that static type checking would have picked up instantly.
  • Make it possible to extend the language in a metaprogramming-type way, but try to do better than C++ at making the simple things simple, and try to avoid the need for puzzle games (which C++ is no better at avoiding than C – the CRTP is a good example).
  • Managed languages (GC, bounds checking etc) versus non-managed (C/C++ with feet firmly in the line of fire at all times) I'm not sure about; sometimes I think what I'd really like is a hybrid (e.g. GC with an optional explicit free operation which causes an allocated thing to become instantly invalid, so that refs from it become GCable in turn and refs to it throw a StalePointerException, and also I want a feature to let me conveniently treat a big byte array as an unmanaged sandbox), and other times what I think I'd really like is a means of writing unmanaged code which includes a static compile-time proof that you don't overrun any array etc. Probably neither is actually feasible.
  • I'm vaguely intrigued by aspect-oriented programming, in that it seems to be a reaction to a thing that's annoyed me a lot, but I've never actually tried it so who knows if it would work well in practice. I think ideally I'd just want a metaprogramming facility strong enough to let me do it myself if I happened to feel like it in a particular program.
  • Extending 'define the semantics clearly' above, a slightly silly idea that I can't quite convince myself is actually wrong would be to specify that all intermediate values in integer arithmetic expressions have the semantics of true mathematical integers, to the extent that the runtime will link in GMP or equivalent if it really has to (which I wanted available anyway, since arbitrary-precision integers are a really nice facility to have, and the best thing about Python is that it provides them as a native type). Narrowing should occur only as a result of an assignment or explicit cast. If you need speed, you can request a compiler warning when the expression you've written requires GMP to code-generate, and then manually insert casts to eliminate the need; then you'll be in control of where the casts go and won't be startled by them. (There's a sketch of this behaviour just after this list.)
  • My biggest blue-sky wish is that I'd like a built-in language feature to construct pieces of code on the fly and JIT them into native code, so you could implement all sorts of user-scripting of things in a way that was actually reasonably efficient, and also increase the use of the approach "first laboriously work out exactly what you want to do, and then do it in a loop a zillion times really fast".
  • AND A PONY. There, see what I mean?
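(To make the integer-semantics bullet above concrete, here's a rough sketch of the behaviour I mean, approximated in C# with System.Numerics.BigInteger standing in for the GMP-backed native integers. Everything here is invented for illustration, not part of any real proposal.)

    using System;
    using System.Numerics;

    class IntegerSemanticsSketch {
        static void Main() {
            uint a = 4000000000, b = 4000000000;
            // Plain uint arithmetic would quietly wrap modulo 2^32; with
            // true mathematical integers the intermediate product is exact.
            BigInteger exact = (BigInteger)a * b;   // 16000000000000000000
            // Narrowing happens only at an explicit cast:
            uint narrowed = (uint)(exact & 0xFFFFFFFF);
            Console.WriteLine("exact:    " + exact);
            Console.WriteLine("narrowed: " + narrowed);
        }
    }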
[identity profile] cartesiandaemon.livejournal.com | Fri 2013-07-12 15:02
Yeah, that seems like a good list :)

I was about to say, it's possibly surprising more languages aren't designed to compile to C, and then let the C compiler writers do all the heavy lifting of making them fast on every platform. But I guess I've just reinvented Java, oops :)

hybrid eg. GC with an optional explicit free operation which causes an allocated thing to become instantly invalid

Yeah, except I always think of the reverse: it seems most variables go out of scope at fairly clearly defined places, and I've grown very fond of having RAII work the C++ way (always) rather than the C# way (every time you use an IDisposable class, you can be leak-safe and exception-safe, provided you remember to add the "don't leak this" boilerplate). But it would be nice to have a garbage collector for things left over. Although I suppose the GC needs to be aware of any other variables which might still hold a pointer to them.
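(To illustrate, a small invented C# example of that failure mode: the type can define Dispose, but every call site has to remember the boilerplate.)

    using System;
    using System.IO;

    class DisposeSketch {
        // Forgot the boilerplate: the file handle leaks until the GC
        // happens to finalize the reader, some arbitrary time later.
        static string FirstLineLeaky(string path) {
            var reader = new StreamReader(path);
            return reader.ReadLine();
        }

        // The "don't leak this" version, repeated at every use site:
        static string FirstLineSafe(string path) {
            using (var reader = new StreamReader(path)) {
                return reader.ReadLine();
            }   // Dispose runs here, even on an exception
        }
    }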
[personal profile] simont | Fri 2013-07-12 15:12
Compiling to C: yes, I've had that thought too. It's certainly attractive as a means of writing an initial implementation of an experimental language, or one you aren't sure will see wide uptake yet, because as you say, you can get all the portability of C (up to OS API integration) and its performance too by using existing compilers. I think where it falls down worst is debugging – it'd be almost impossible to take a language compiled like that and get sensible integration with gdb or your other debugger of choice, and sooner or later that will become a serious pain. So eventually you'll want your language to have an end-to-end compiling and debugging story.

provided you remember to add the "don't leak this" boilerplate

Argh, yes, dammit, any time you find that you always have to remember to add the "work properly" option it's a sign that someone has cocked something up. My usual example of that is find -print0 or grep -Z piped into xargs -0.
[personal profile] andrewducker | Fri 2013-07-12 16:25
C# + bolt-ons seems to cover a lot of this. Although bits of it will definitely end up puzzle-like :->
[identity profile] strawberryfrog.livejournal.com | Fri 2013-07-12 16:32
"My biggest blue-sky wish is that I'd like a built-in language feature to construct pieces of code on the fly"

The next version of .Net should have something like this - http://en.wikipedia.org/wiki/Microsoft_Roslyn
[personal profile] simont | Mon 2013-07-15 10:54
Hmm. That looks interesting, though not quite the same as what I had in mind. The sort of thing I was thinking of might look something like:
    array-of-double coefficients; // coefficients of a polynomial
    array-of-double input, output; // desired evaluations of the polynomial

    // Construct a function to evaluate this polynomial without loop overhead
    jit_function newfunc = new jit_function [ double x -> double ];
    newfunc.mainblock.declare { double ret = 0; }
    for (i = coefficients.length; i-- > 0 ;)
        newfunc.mainblock.append { ret = ret * x + @{coefficients[i]}; }
    newfunc.mainblock.append { return ret; }
    function [ double -> double ] realfunc = newfunc.compile();

    // Now run that function over all our inputs
    for (i = 0; i < input.length; i++)
        output[i] = realfunc(input[i]);
(Disclaimer: syntax is 100% made up on the spot for illustrative purposes and almost certainly needs major reworking to not have ambiguities, infelicities, and no end of other cockups probably including some that don't have names yet. I'm only trying to illustrate the sort of thing that I'd like to be possible, and about how easy it should be for the programmer.)

So an important aspect of this is that parsing and semantic analysis are still done at compile time – the code snippets we're adding to newfunc are not quoted strings, they're their own special kind of entity which the compile-time parser breaks down at the same time as the rest of the code. We want to keep runtime support as small as we can, so we want to embed a code generator at most, not a front end. The idea is that we could figure out statically at compile time the right sequence of calls to an API such as libjit, and indeed that might be a perfectly sensible way for a particular compiler to implement this feature. The smallest possible runtime for highly constrained systems would do no code generation at all – you'd just accumulate a simple bytecode and then execute it – but any performance-oriented implementation would want to do better than that.
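(For what it's worth, .NET can already express roughly the runtime half of this, though with none of the compile-time parsing. Here's a hedged sketch of the polynomial builder using System.Reflection.Emit, whose output the CLR then JIT-compiles to native code; the names and details are mine, not a spec of the feature.)

    using System;
    using System.Reflection.Emit;

    class PolyJitSketch {
        // Build a function evaluating the polynomial by Horner's rule,
        // with the loop unrolled into straight-line code.
        static Func<double, double> BuildPoly(double[] coefficients) {
            var dm = new DynamicMethod("poly", typeof(double),
                                       new[] { typeof(double) });
            ILGenerator il = dm.GetILGenerator();
            il.Emit(OpCodes.Ldc_R8, 0.0);                   // ret = 0
            for (int i = coefficients.Length; i-- > 0; ) {
                il.Emit(OpCodes.Ldarg_0);                   // ret = ret * x
                il.Emit(OpCodes.Mul);
                il.Emit(OpCodes.Ldc_R8, coefficients[i]);   //     + coefficients[i]
                il.Emit(OpCodes.Add);
            }
            il.Emit(OpCodes.Ret);
            return (Func<double, double>)
                dm.CreateDelegate(typeof(Func<double, double>));
        }
    }

You'd call BuildPoly(coefficients) once and then apply the returned delegate in the inner loop, matching the shape of the pseudocode above.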

Importantly, the function we're constructing gets its own variable scope (ret in the above snippet is scoped to only exist inside newfunc and wouldn't clash with another ret in the outer function), but it's easy to import values from the namespace in which a piece of code is constructed (as I did above with the @ syntax to import coefficients[i]). It should be just as easy to import by reference, so that you end up with a runnable function which changes the outer program's mutable state.

Example uses for this sort of facility include the above (JIT-optimising a computation that we know we're about to do a zillion times), and also evaluation of user-provided code. My vision is that any program which embeds an expression grammar for users to specify what they want done (e.g. gnuplot, or convert -fx) should find that actually the easiest way to implement that grammar is to parse and semantically analyse it, then code-gen by means of calls to the above language feature, and end up with a runnable function that does exactly and only what the user asked for, fast, without the overhead of bytecode evaluation or traversing an AST.

[identity profile] strawberryfrog.livejournal.com | Mon 2013-07-15 16:21
If you're going to parse it at compile time, then any language with first-class functions will do something much simpler than this, unless I'm missing something. In C#:

Func<int, int> doubler = x => x * 2;

in Javascript:

var doubler = function(x) { return x * 2 };

I know, there's no "compile time" in JS. But it's equivalent syntax anyway.

If it's deferred until runtime, then the C# syntax is far more complex and unwieldy but probably more flexible: http://blogs.msdn.com/b/csharpfaq/archive/2009/09/14/generating-dynamic-methods-with-expression-trees-in-visual-studio-2010.aspx
[personal profile] simont | Mon 2013-07-15 16:36
any language with first-class functions will do something much simpler than this

Indeed, it's simpler and hence less flexible. Both those examples are more or less fixed in form at compile time; you get to plug in some implicit parameters (e.g. capturing variables from the defining scope) but you can't change the number of statements in the function, as I demonstrated in my half-baked polynomial example above. I don't know C# well at all, but I know that in JS you'd only be able to do my polynomial example by building the function source up in a string and doing an eval.

(OK, I suppose you can build one up in JS by composing smaller functions, along the lines of
var poly = function(x) { return 0; }
for (i = ncoeffs; i-- > 0 ;) {
    builder = function(p, coeff) {
        return function(x) { return x*p(x)+coeff; };
    };
    poly = builder(poly, coeffs[i]);
}
but I've no confidence that running it wouldn't still end up with n function call overheads every time a degree-n polynomial was evaluated. Also, I had to try several times to get the recursion to do the right thing in terms of capturing everything by value rather than reference, so even if that does turn out to work efficiently it still fails the puzzle-game test.)
[identity profile] strawberryfrog.livejournal.com | Mon 2013-07-15 16:45
I see - you could vary the number of statements with the "generating dynamic methods with expression trees" method above, but it would be (a) checked at runtime, not at compile time, and (b) fugly. Roslyn may address the second issue somewhat, but probably not the first.
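(Roughly what that looks like for the polynomial example above, assuming System.Linq.Expressions; the helper is invented, and you can judge the fugliness for yourself.)

    using System;
    using System.Linq.Expressions;

    class PolyExprSketch {
        static Func<double, double> BuildPoly(double[] coefficients) {
            ParameterExpression x = Expression.Parameter(typeof(double), "x");
            Expression body = Expression.Constant(0.0);
            // The shape of the tree (how many multiply-adds) is chosen at
            // runtime; type mismatches also only surface at runtime.
            for (int i = coefficients.Length; i-- > 0; ) {
                body = Expression.Add(
                    Expression.Multiply(body, x),
                    Expression.Constant(coefficients[i]));
            }
            return Expression.Lambda<Func<double, double>>(body, x).Compile();
        }
    }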
[identity profile] mdw [distorted.org.uk] | Mon 2013-07-15 15:49
Are you sure you don't want to learn Common Lisp?
Let's see...

  • Multi-paradigm: oh, yes. We got that.

  • Fast compilers: check. More or less. Common Lisp is dynamically typed, but with optional type annotations. Good compilers can do extensive type inference. By default, Lisp is fairly safe (array bounds are checked, for example), but this can be turned off locally. Common Lisp was always intended to be compiled, and designed by people who'd already made Lisp implementations competitive with the Fortran compilers of the day.

  • OS bindings: hmm. Every decent implementation has an FFI. The last time I looked, the CFFI-POSIX project wasn't going anywhere, though.

  • Semantics: yep. Pretty good ANSI standard; a draft of it is available online as an excellently hyperlinked document -- the `HyperSpec' -- which, to the tiny extent that it differs from the official standard, probably reflects real life better. Common Lisp nails down a left-to-right order of evaluation, which eliminates a lot of C annoyance; aliasing just works correctly; and while runtime type-checking isn't mandated, all implementations I know of will do it unless you wind the `safety' knob down.

  • Compile-time checking: hmm. A decent implementation will surprise you with how much it checks, even in the absence of explicit annotations. Unlike Python, Lisp implementations will want you to declare variables explicitly or they'll give you lots of warnings. SBCL's compiler diagnostics are less than excellent, though: the usual clue that it's found a problem is a warning explaining how it elided the entire body of your function, and then you notice that it proved you'd passed an integer to `cdr' somewhere near the beginning. If you like writing type declarations then you can do that and get better diagnostics.

  • Metaprogramming: Lisp's most distinctive feature is that it's its own metalanguage.

  • Explicit free: no, sorry. The FFI will let you allocate and free stuff manually, but it's a separate world filled with danger.

  • Slabs of bytes: a standard part of the FFI. Don't expect this to be enjoyable, though.

  • AOP: you can probably make one out of macros. Maybe you'll need a code walker.

  • Integer arithmetic: Lisp integers always have the semantics of true mathematical integers. Lisp systems typically have lots of different integer representations (heap-allocated bignums; immediate fixnums which have type-tag bits and live in the descriptor space; and various sizes of unboxed integer used for intermediate results, and as array or structure elements) and use whichever is appropriate in any given case (so `narrowing' occurs automatically, but only when it's safe). A good compiler, e.g., SBCL, will do hairy interval arithmetic in its type system in order to work out which arithmetic operations might overflow. If you wind up SBCL's `speed' knob you get notes about where the compiler couldn't prove that overflow was impossible. SBCL also has some nonportable features for declaring variables which should have wrap-on-overflow semantics instead.

  • Runtime compiler: got that too. It compiles Lisp code to native machine code. And it's well accustomed to program-generated code, because of all the macro expansions.

  • (asdf:operate 'asdf:load-op "CL-PONY"). (Not really.)


[personal profile] simont | Mon 2013-07-15 16:25
It is certainly true that some of the things on my wish list are things Lisp has long been famous for having. Unfortunately, I'm sorry to say, I would really like them in a language that isn't syntactically Lisplike!

I'm not so wedded to the C/C++ style of syntax as to tolerate no divergence from even the bad parts (e.g. C's declarator syntax would probably be the first thing to go if I ever did sit down at the drawing board for serious), but I do think that one or two basic amenities such as an infix expression grammar are not things I'm prepared to do without in the language I use all the time for everything. I tolerate Lispy syntax in my .emacs because I don't spend my whole life writing .emacs; I'd lose patience if I did.
[identity profile] cartesiandaemon.livejournal.com | Tue 2013-07-16 10:28
Hm. Come to think of it, maybe "deciding on a subset of lisp and a consistent easy-to-use syntax for it" is what language design should be :)