A very small shell script
simont


Wed 2009-02-04 10:54
A very small shell script
simont, Wed 2009-02-04 14:47
Here's some slightly more rigorous analysis.

In the red corner, we have my original script:

#!/bin/bash
case $((RANDOM % 3)) in
  0) echo "What?" ;;
  1) echo "I don't understand." ;;
  2) echo "Where's the tea?" ;;
esac
In the blue corner, the version which uses the most rather than least significant part of each output word:
#!/bin/bash
case $((RANDOM / 10923)) in
  0) echo "What?" ;;
  1) echo "I don't understand." ;;
  2) echo "Where's the tea?" ;;
esac
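The divisor 10923 follows from bash's $RANDOM range of 0 to 32767: 32768/3 rounded up is 10923, so the integer division can only give 0, 1 or 2. Here is a quick sketch that checks the bucket boundaries. The `bucket` function name is made up for illustration and is not part of the script above.

```shell
#!/bin/bash
# bucket: hypothetical helper mirroring the $((RANDOM / 10923)) expression.
bucket() { echo $(( $1 / 10923 )); }

# $RANDOM lies in 0..32767; print the bucket on each side of every boundary.
for v in 0 10922 10923 21845 21846 32767; do
  echo "$v -> $(bucket "$v")"
done
```

The three buckets hold 10923, 10923 and 10922 values respectively, so the split is as even as an integer divisor allows.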
And in the green corner, the MD5-based version:
#!/bin/bash
# Hash the PID plus a nanosecond timestamp; keep the first 7 hex digits (28 bits).
x=$(date +$$.%F%T%N | md5sum | head -c7)
case $((0x$x % 3)) in
  0) echo "What?" ;;
  1) echo "I don't understand." ;;
  2) echo "Where's the tea?" ;;
esac
(The last one depends not only on bash but also on the GNU versions of date and head, plus a working md5sum.)

We now generate a file of test output from each one by means of

for i in {1..1000}; do ./vsss; done > output
and then analyse each output file by correlating triples of successive output values:
perl -ne 'chomp; $a=$b;$b=$c;$c=$_;
          $x{"$a:$b:$c"}++ if defined $a;
          END{printf "%s\n", join ",",values %x}' output
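The perl one-liner prints only the counts, in whatever order the hash happens to yield them. If you also want to see which triple each count belongs to, something like the following awk sketch should do the same counting and label the output. The `count_triples` name is invented here, and the demo input is a toy sequence rather than real script output.

```shell
#!/bin/bash
# count_triples: hypothetical helper, an awk rendering of the perl one-liner
# that prints each triple next to its count, sorted by triple.
count_triples() {
  awk '
    { a = b; b = c; c = $0 }              # slide a three-line window
    NR >= 3 { count[a ":" b ":" c]++ }    # count once the window is full
    END { for (k in count) print k, count[k] }
  ' "$@" | sort
}

# Toy input: five lines give three overlapping triples.
printf '%s\n' A B A B A | count_triples
```

On the toy input this prints `A:B:A 2` and `B:A:B 1`. Run it as `count_triples output` on a file generated by the loop above; triples that never occur simply do not appear in the listing.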
And the results are, respectively:
1,52,1,2,4,63,3,62,3,62,96,63,3,53,52,51,1,1,63,102,63,1,51,96,49
27,149,28,1,1,2,28,4,1,3,146,148,148,1,1,155,155
39,45,42,36,41,33,41,36,37,39,39,36,22,37,45,38,34,40,38,38,36,28,34,38,33,41,32
See what I mean? Both the ones based on bash's $RANDOM are utter rubbish, but the MD5 version has actually managed to produce all 27 possible triples at plausibly similar frequencies. I think if anything my original mod-3 version did slightly better than the dividing one: it at least managed to produce 25 of the 27 possible triples to the dividing version's 17, even if its distribution still sucked.

eta: indeed, a chi-squared test confirms that the mod-3 version is the better of the two simple ones. Chi-squared values for the three output distributions are respectively 849, 2728 and 16.5. (With 26 degrees of freedom, you expect a properly random output to give a value somewhere around 26, and only rarely above about 40 to 45.)
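For anyone wanting to reproduce those figures, here is a minimal sketch of the calculation, assuming the null hypothesis is that all 27 triples are equally likely and that triples which never appeared count as zero. The `chi2` function name is made up for this example.

```shell
#!/bin/bash
# chi2: hypothetical helper. Reads comma-separated triple counts, one list per
# line, and prints the chi-squared statistic against a uniform 27-cell model.
chi2() {
  awk -F, -v cells=27 '{
    total = 0
    for (i = 1; i <= NF; i++) total += $i
    expected = total / cells               # uniform expectation per triple
    stat = 0
    for (i = 1; i <= cells; i++) {
      observed = (i <= NF) ? $i : 0        # triples never seen count as zero
      stat += (observed - expected) ^ 2 / expected
    }
    printf "%.1f\n", stat
  }'
}

# The three result lines from the analysis, in the same order.
chi2 <<'EOF'
1,52,1,2,4,63,3,62,3,62,96,63,3,53,52,51,1,1,63,102,63,1,51,96,49
27,149,28,1,1,2,28,4,1,3,146,148,148,1,1,155,155
39,45,42,36,41,33,41,36,37,39,39,36,22,37,45,38,34,40,38,38,36,28,34,38,33,41,32
EOF
```

This prints 849.0, 2727.5 and 16.5, which agree with the quoted values once rounded. Each list sums to 998, since 1000 outputs yield 998 overlapping triples, so the expected count per triple is about 37.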
