A very small shell script

A very small shell script

[entries|reading|network|archive]

simont

[ userinfo | dreamwidth userinfo ]

[ archive | journal archive ]

Wed 2009-02-04 10:54

A very small shell script

Link

gerald_duck

Wed 2009-02-04 11:08

Back in 2007 I e-mailed orders@thinkgeek.com, suggesting they should sell the heavy metal t-shirt in hoodie form. I got a reply:

Thank you for your feedback! We always appreciate getting new ideas from our customers. While we don't do custom printing, I have forwarded your suggestion to the appropriate parties. Please let us know if you have any further suggestions.

So the e-mails are read. I don't know if that's a sufficiently encouraging response for you to bother sending them an e-mail, nor whether they'd give you any money if they used an idea submitted in that way.

Link	Reply to this \| Thread

lnr

Wed 2009-02-04 11:24

I bought Mike that t-shirt for Christmas: he complains that several of the metals on there aren't very heavy at all :)

Link	Reply to this \| Parent

simont

Wed 2009-02-04 12:21

I'm guessing the restriction about the USA and Canada is most likely to have something to do with the possibility of paying money, so my guess is that as long as I'm outside North America I can't get them to send me money no matter what I do.

In principle the money isn't the main point (it's not that much anyway), but now that their T&Cs have annoyed me I find I'm inclined to stay annoyed on general principle...

Link	Reply to this \| Parent \| Thread

pseudomonas.livejournal.com

Wed 2009-02-04 12:38

They ought at least to give other people the option of payment in ThinkGeek vouchers or something.

Link	Reply to this \| Parent

pjc50.livejournal.com

Wed 2009-02-04 12:39

I think it may also be them not wanting to bother with non-American IP laws (e.g. not wanting to get into the whole continental mess of "droit moral")

Link	Reply to this \| Parent

cartesiandaemon.livejournal.com

Wed 2009-02-04 12:41

Oh, I cynically assumed that was just a form autoresponse. I guess I shouldn't for thinkgeek.

Link	Reply to this \| Parent \| Thread

simont

Wed 2009-02-04 12:43

On the other hand, thinkgeek might have less moral problem than some people with claiming that a suggestion which was actually ignored had been forwarded to "the appropriate parties", on the basis that they're more mentally equipped than some to consider it perfectly sensible that in a given case the set of appropriate people might happen to be empty :-)

Link	Reply to this \| Parent \| Thread

cartesiandaemon.livejournal.com

Wed 2009-02-04 12:52

ROFL. Touche.

I do love that script. (Although I keep wondering, is it safe to take the remainder of the random number? I thought that was risky. Although admittedly, even if it were, I don't expect Arthur Dent is cryptographically consistent.)

Link	Reply to this \| Parent \| Thread

simont

Wed 2009-02-04 14:18

I actually did almost replace $((RANDOM % 3)) with $((RANDOM / 10923)). I decided not to, in the end, because the former is more readable, and ultimately the aim of this script is to be read and understood by humans.

However, thinking about it further ... (Warning: extreme pedantry ahead.)

You're certainly right in general that with several types of commonly used low-end RNG it's better to take the most significant part of an output value rather than the least significant part, since state only propagates up the word and never down, and hence (e.g.) taking the k low bits of each output from an n-bit linear-congruential generator will give you a period of at most 2k instead of the 2n you'd get by taking the k high bits. Other types of RNG don't suffer from this at all: a linear feedback shift register, for instance, generates a stream of bits with no bit any worse than any other, so if you had one of those then you could safely reduce its output by whatever means was most convenient for you. However, after a quick source dive it looks as if the RNG behind bash's $RANDOM shell variable is indeed a linear congruential one; so yes, this would impair its quality, at least in principle.

However, that analysis applies to the situation where you're extracting many values from the RNG in succession without re-seeding it. Every fresh instance of bash will re-seed its RNG – and since this script only generates one random number during its entire run, it will generate only one random number per re-seed. So in fact the output values will be the seed values fed through one or other arbitrary transformation, and neither of those transformations looks to me as if it'll be likely to turn a crappy stream of seeds (bash mostly seems to seed off its PID) into particularly random-looking output.

To verify this, I've just tried out both versions of the script on a 1000-long test run. Both, in fact, seem pretty poor but in different ways: the mod-3 version as shown above seems to generate a lot of contiguous sequences of four or more of the same phrase, whereas the division version often seems to generate the same phrase every other run for even longer stretches. Short of using /dev/urandom, I suspect the only way to really do better would be to compute some sort of cryptographic hash (e.g. MD5) of the current high-resolution time and the pid. Which would be feasible in a shell script, admittedly, if even less portable. (I'm already unhappy that I had to make the script bash-specific in order to have it be concise enough to still be funny.)

Incidentally, it's also a mistake to consider this sort of thing to be a cryptographic problem: any such RNG is one you certainly should not have been using for crypto purposes in the first place, and any crypto-quality RNG should be equally (or, at least, sufficiently) strong at both ends of its output.

Oh, and finally: Zaphod's original specification of the Arthur Dent script never mentioned good randomness. All you have to do is to program it to say three things. My script definitely says three things. :-þ

Link	Reply to this \| Parent \| Thread

simont

Wed 2009-02-04 14:47

Here's some slightly more rigorous analysis.

In the red corner, we have my original script:

#!/bin/bash
case $((RANDOM % 3)) in
  0) echo "What?" ;;
  1) echo "I don't understand." ;;
  2) echo "Where's the tea?" ;;
esac

In the blue corner, the version which uses the most rather than least significant part of each output word:

#!/bin/bash
case $((RANDOM / 10923)) in
  0) echo "What?" ;;
  1) echo "I don't understand." ;;
  2) echo "Where's the tea?" ;;
esac

And in the green corner, the MD5-based version:

#!/bin/bash
x=`date +$$.%F%T%N | md5sum | head -c7`
case $((0x$x % 3)) in
  0) echo "What?" ;;
  1) echo "I don't understand." ;;
  2) echo "Where's the tea?" ;;
esac

(The last one depends not only on bash but also on the GNU versions of date and head, plus a working md5sum.)

We now generate a file of test output from each one by means of

for i in {1..1000}; do ./vsss; done > output

and then analyse each output file by correlating triples of successive output values:

perl -ne 'chomp; $a=$b;$b=$c;$c=$_;
          $x{"$a:$b:$c"}++ if defined $a;
          END{printf "%s\n", join ",",values %x}' output

And the results are, respectively:

1,52,1,2,4,63,3,62,3,62,96,63,3,53,52,51,1,1,63,102,63,1,51,96,49
27,149,28,1,1,2,28,4,1,3,146,148,148,1,1,155,155
39,45,42,36,41,33,41,36,37,39,39,36,22,37,45,38,34,40,38,38,36,28,34,38,33,41,32

See what I mean? Both the ones based on bash's $RANDOM are utter rubbish, but the MD5 version has actually managed to produce all 27 possible triples at plausibly similar frequencies. I think if anything my original mod-3 version did slightly better than the dividing one: it at least managed to produce 25 of the 27 possible triples to the dividing version's 17, even if its distribution still sucked.

eta: indeed, a chi-squared test confirms that the mod-3 version is the better of the two simple ones. Chi-squared values for the three output distributions are respectively 849, 2728 and 16.5. (With 26 degrees of freedom, you expect a properly random output to be somewhere below 30-50.)

Link	Reply to this \| Parent

cartesiandaemon.livejournal.com

Thu 2009-02-05 15:51

Oh, thank you. You're quite right, I'd hoped someone would do the analysis. The appropriate model should either be a program in a loop, or a program run less than every second (or however long it takes to say "what"). I just had an uneasy feeling that something was wrong.

Zaphod's original specification of the Arthur Dent script never mentioned good randomness.

Yeah, that's what I was thinking, but it didn't manage to prevent me saying anything. Of course, if you put all this in a footnote on the T-shirt, the T-shirt would be really geeky :)

Link	Reply to this \| Parent

navigation

[	go	\|	Previous Entry \| Next Entry	]
[	add	\|	to Memories	]