A very small shell script

simont

Wed 2009-02-04 10:54

simont

Wed 2009-02-04 14:47

Here's some slightly more rigorous analysis.

In the red corner, we have my original script:

#!/bin/bash
# $RANDOM is an integer in 0..32767; keep only its remainder mod 3.
case $((RANDOM % 3)) in
  0) echo "What?" ;;
  1) echo "I don't understand." ;;
  2) echo "Where's the tea?" ;;
esac

In the blue corner, the version which uses the most rather than least significant part of each output word:

#!/bin/bash
# 32768 / 3 rounds up to 10923, so this quotient is always 0, 1 or 2.
case $((RANDOM / 10923)) in
  0) echo "What?" ;;
  1) echo "I don't understand." ;;
  2) echo "Where's the tea?" ;;
esac

And in the green corner, the MD5-based version:

#!/bin/bash
# Hash the PID plus a nanosecond timestamp; keep the first 7 hex digits (28 bits).
x=$(date +$$.%F%T%N | md5sum | head -c7)
case $((0x$x % 3)) in
  0) echo "What?" ;;
  1) echo "I don't understand." ;;
  2) echo "Where's the tea?" ;;
esac

(The last one depends not only on bash but also on the GNU versions of date and head, plus a working md5sum.)

We now generate a file of test output from each one by means of

for i in {1..1000}; do ./vsss; done > output

and then analyse each output file by correlating triples of successive output values:

perl -ne 'chomp; $a=$b;$b=$c;$c=$_;
          $x{"$a:$b:$c"}++ if defined $a;
          END{printf "%s\n", join ",",values %x}' output
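The one-liner above prints bare counts, so it does not say which triple each count belongs to. A minimal sketch of a labelled variant, using awk instead of perl; the function name `count_triples` is made up for illustration and is not part of the original test:

```shell
count_triples() {
  # Reads one output value per line on stdin and prints "a:b:c count"
  # for every triple of successive lines that actually occurred.
  awk '{
    a = b; b = c; c = $0                  # slide the three-line window
    if (NR >= 3) seen[a ":" b ":" c]++    # window is full from line 3 on
  }
  END { for (t in seen) print t, seen[t] }' | LC_ALL=C sort
}
```

Running `count_triples < output` should give the same counts as the perl version, one per line with its triple in front. As with the perl version, triples that never occur are simply absent rather than listed with a count of zero.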

And the results are, respectively:

1,52,1,2,4,63,3,62,3,62,96,63,3,53,52,51,1,1,63,102,63,1,51,96,49
27,149,28,1,1,2,28,4,1,3,146,148,148,1,1,155,155
39,45,42,36,41,33,41,36,37,39,39,36,22,37,45,38,34,40,38,38,36,28,34,38,33,41,32

See what I mean? Both the ones based on bash's $RANDOM are utter rubbish, but the MD5 version has actually managed to produce all 27 possible triples at plausibly similar frequencies. I think if anything my original mod-3 version did slightly better than the dividing one: it at least managed to produce 25 of the 27 possible triples to the dividing version's 17, even if its distribution still sucked.
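One thing worth ruling out is that the arithmetic itself skews the two simple versions. The sketch below assumes $RANDOM is uniform over 0..32767 (the range bash documents) and counts how many of those 32768 values land in each of the three cases; `bucket_counts` is a hypothetical helper name, not something from the original scripts.

```shell
bucket_counts() {
  # $1 is "mod" (RANDOM % 3) or "div" (RANDOM / 10923).
  # Prints how many of the 32768 possible values map to cases 0, 1 and 2.
  awk -v mode="$1" 'BEGIN {
    for (r = 0; r < 32768; r++) {
      b = (mode == "mod") ? r % 3 : int(r / 10923)
      n[b]++
    }
    print n[0], n[1], n[2]
  }'
}
```

Both `bucket_counts mod` and `bucket_counts div` print `10923 10923 10922`, so each mapping is within one value of perfectly even. Under that uniformity assumption, the lopsided triple counts would have to come from how successive $RANDOM values relate to one another, not from the mod or the division.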

eta: indeed, a chi-squared test confirms that the mod-3 version is the better of the two simple ones. Chi-squared values for the three output distributions are respectively 849, 2728 and 16.5. (With 26 degrees of freedom, a properly random output has an expected chi-squared of 26, and should come out below about 38.9 at the 95% level or 45.6 at the 99% level.)
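For anyone wanting to reproduce those figures, here is a minimal sketch of the calculation, assuming the null hypothesis that all 27 triples are equally likely and that any triple missing from the list was observed zero times. The function name `chi2` is invented for this example.

```shell
chi2() {
  # $1: comma-separated observed counts (triples never seen may be omitted)
  # $2: total number of categories, defaulting to 27 (3 outputs cubed)
  printf '%s\n' "$1" | LC_ALL=C awk -F, -v k="${2:-27}" '{
    n = 0
    for (i = 1; i <= NF; i++) n += $i     # total number of triples counted
    e = n / k                             # expected count per category
    x = (k - NF) * e                      # each unseen category contributes e
    for (i = 1; i <= NF; i++) x += ($i - e) * ($i - e) / e
    printf "%.1f\n", x
  }'
}
```

Fed the three result lines above, this prints 849.0, 2727.5 and 16.5, matching the quoted values after rounding. Each list sums to 998, which is the number of overlapping triples in 1000 lines of output.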
