Statistical personal history
simont


Sun 2006-07-09 14:32
[personal profile] simont | Sun 2006-07-09 14:53
Hm. Sounds as if this would be better done on the cumulative frequency graph (because then you can conveniently measure the residual as the integral of absolute or rms error over the entire run, without having to solve the same problem to find the curve you're trying to approximate to).

The thing that still worries me is, how do you pick how many Gaussians to use? I mean, given as many Gaussians to play with as I have data points, I imagine I'd get my best approximation to the real cumulative frequency graph by arranging one Gaussian centred at the point of each diary entry with a very small variance, and then I'm back where I started. Drop the maximum number of Gaussians and you're forced to get a good fit by matching them to overall features of the graph rather than individual data points, but the number of Gaussians is still an adjustable parameter which trades off overview against detail, so you're back at the same problem of needing a human to judge what tradeoff they really wanted.

The real trouble is, I intuitively feel that there ought to be some combined measure of overview and detail which is maximised (reflecting the idea that you've got a decent amount of both) at some interim level of granularity, but every concrete metric I've so far come up with turns out to be monotonic in granularity one way or the other.
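The monotonic behaviour described above can be seen directly on the cumulative frequency graph. Below is a minimal sketch with made-up entry days: the model is an equal-weight sum of Gaussian CDFs, one centred on each entry, and the RMS residual against the empirical cumulative frequency shrinks as the shared standard deviation does, so the "best" fit is always the degenerate one-spike-per-entry case.

```python
# Sketch (hypothetical data): approximate the empirical cumulative
# frequency of diary entries by a mixture of Gaussian CDFs, one per
# entry, and watch the RMS residual fall as the shared sigma shrinks,
# i.e. as granularity increases.
import math

entries = [3, 5, 11, 12, 13, 25, 40, 41, 60]  # entry days (made up)

def phi(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def model_cdf(t, centres, sigma):
    """Equal-weight mixture CDF: one Gaussian per entry."""
    return sum(phi((t - c) / sigma) for c in centres) / len(centres)

def empirical_cdf(t, centres):
    return sum(1 for c in centres if c <= t) / len(centres)

def rms_residual(centres, sigma, t_max=70):
    errs = [(model_cdf(t, centres, sigma) - empirical_cdf(t, centres)) ** 2
            for t in range(t_max + 1)]
    return math.sqrt(sum(errs) / len(errs))

for sigma in (20.0, 5.0, 1.0, 0.1):
    print(f"sigma={sigma:5.1f}  RMS residual={rms_residual(entries, sigma):.4f}")
```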
[identity profile] ptc24.livejournal.com | Sun 2006-07-09 15:10
how do you pick how many Gaussians to use?

See my other post. Think of it as a predictive model: you use part of the data to build a model of your mental state, then test that model by trying to predict the rest of the data. The smoothed curve you generate represents the probability of making an entry on a particular day. If you use too many Gaussians, you're overfitting, so you use cross-validation to see if you're doing that.
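A minimal sketch of that cross-validation idea, on synthetic data (all names and numbers below are made up). A kernel bandwidth h stands in for the number of Gaussians: a tiny h behaves like one narrow Gaussian per data point. Each candidate h is scored by the log-likelihood it assigns to held-out days, and overfitting shows up as a poor held-out score at very small h.

```python
# Smooth a 0/1 entry-per-day record with Gaussian kernels of width h,
# then score each h by the Bernoulli log-likelihood of held-out days.
import math
import random

random.seed(0)
N_DAYS = 200
# Synthetic diary: a busy patch (days 40-79) and a quiet background.
entry = [1 if random.random() < (0.6 if 40 <= d < 80 else 0.05) else 0
         for d in range(N_DAYS)]

def smoothed_prob(day, train_days, h, eps=1e-3):
    """Kernel-weighted average of training days' 0/1 entries."""
    num = den = 0.0
    for d in train_days:
        w = math.exp(-0.5 * ((day - d) / h) ** 2)
        num += w * entry[d]
        den += w
    p = num / den if den > 0 else 0.5
    return min(1 - eps, max(eps, p))  # clip so the log stays finite

def heldout_loglik(h, test_every=5):
    test = [d for d in range(N_DAYS) if d % test_every == 0]
    train = [d for d in range(N_DAYS) if d % test_every != 0]
    ll = 0.0
    for d in test:
        p = smoothed_prob(d, train, h)
        ll += math.log(p) if entry[d] else math.log(1 - p)
    return ll

for h in (0.5, 2.0, 8.0, 32.0):
    print(f"h={h:5.1f}  held-out log-likelihood={heldout_loglik(h):8.2f}")
```

The held-out score is exactly the kind of combined overview-vs-detail measure discussed upthread: it is penalised both by oversmoothing (missing the busy patch) and by overfitting (confident spiky predictions that miss on unseen days), so it tends to peak at an intermediate bandwidth.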

ISTR coming up with a scoring system for guess-the-probability games; there was a log in it somewhere. Alternatively you could use a simpler scoring system: take the dot product of the smoothed graph (discretised into days) and the raw data (again binned into days).
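The half-remembered system with "a log in it" is presumably the logarithmic scoring rule. A small sketch with made-up numbers, contrasting it with the dot-product score: the log score is a proper scoring rule (reporting your honest probability maximises your expected score), whereas the dot product is maximised by predicting probability 1 every day regardless of the data.

```python
# Score a per-day probabilistic forecast against 0/1 outcomes,
# two ways: logarithmic score vs dot product (made-up numbers).
import math

entries = [1, 0, 1, 1, 0]              # 1 = made an entry that day
forecast = [0.8, 0.3, 0.9, 0.6, 0.1]   # smoothed per-day probabilities

def log_score(probs, outcomes):
    """Logarithmic scoring rule: log p on entry days, log(1-p) otherwise."""
    return sum(math.log(p) if y else math.log(1 - p)
               for p, y in zip(probs, outcomes))

def dot_score(probs, outcomes):
    """Dot product of forecast and binned data."""
    return sum(p * y for p, y in zip(probs, outcomes))

print("log score:", log_score(forecast, entries))
print("dot score:", dot_score(forecast, entries))

# The dot product rewards pushing every probability to 1, so it can't
# distinguish an honest forecast from blind overconfidence:
assert dot_score([1.0] * 5, entries) >= dot_score(forecast, entries)
```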