Grand unified theory
Rather late in the day compared to many people, I've recently been taking steps toward joining the DVCS generation.
For a year or two I've been an occasional light user of bzr, mostly to hold temporary branches off my main SVN repository. I only really came to understand DVCSes via git, by dint of playing with test repositories and examining the output of git fast-export until I actually understood how its data structure fitted together and could work out everything else by reasoning about that. Having done so, I immediately migrated all my bzr repositories to git, because that kind of understanding is very valuable to me and bzr's documentation seems to place almost no emphasis on imparting it.
At the weekend, though, I actually did find the document which explains bzr's data structure – and it turns out to be much the same as git's. As, I discovered after a brief browse on another website, is the data structure of Mercurial. The user interfaces vary, but all three of these DVCSes have an essentially similar underlying data model.
And, curiously, a thing that struck me about this model is that it's surprisingly similar to something I already know about: Usenet.
- Your fundamental entities – whether articles or commits – are exchanged between sites in a peer-to-peer manner, with no necessary hierarchy forcing it to happen in a particularly organised way. Any given site will probably have only a subset of the entities in existence.
- Each entity must therefore have a universally unique identifier, so as to keep track of it once it's wandered around servers for a bit.
- The unique identifiers are far too long and fiddly to type (though you can cut and paste them), so typically client software also lets you use friendlier numeric identifiers (git is an exception here, admittedly) – but those are local to a particular site, so you have to take care not to refer to them in your article text, or else you'll make no sense to people reading them on a different site.
- Entities are linked together by means of each one referencing those from which it naturally follows on. Since there's nothing stopping multiple entities from citing the same parent, this naturally gives rise to a branching tree layout.
- Nearly all entities are followups to existing ones, which quote a lot of context from the parent entity and add a small contribution of their own.
- Entities, once created, typically cannot be rewritten while keeping the same unique identifier; they can instead be superseded by creating a second version with a different ID, leading to occasional glitches in the thread diagram when somebody else turns out to have already followed up to the original version of the entity.
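The shared model the bullets describe can be sketched as a toy content-addressed store. This is a hedged illustration of the general idea, not any real tool's on-disk format (all names here are invented): each entity's unique ID is a hash over its body, author, and parent IDs, so amending an entity – or superseding an article – necessarily produces a new ID.

```python
import hashlib

def make_entity(body, parents=(), author="a@example.net"):
    """Create an immutable entity (commit or article) with a content-derived ID.

    The ID hashes the author, parent IDs, and body together, so any change --
    including a 'supersede' -- necessarily yields a different ID, while the
    parent references still place it in the same branching tree.
    """
    payload = "\n".join([author, *sorted(parents), body]).encode()
    uid = hashlib.sha1(payload).hexdigest()
    return {"id": uid, "body": body, "parents": tuple(parents), "author": author}

# A tiny thread: a root, a followup, and a superseded version of the followup.
root = make_entity("First post!")
reply = make_entity("Me too.", parents=(root["id"],))
reply_v2 = make_entity("Me too, but fixed.", parents=(root["id"],))

assert reply["id"] != reply_v2["id"]            # superseding changes the ID
assert reply["parents"] == reply_v2["parents"]  # both still follow up to the root
```

Note the glitch described above falls out naturally: anyone who already referenced `reply["id"]` still points at the old version, since nothing about `reply_v2` rewrites their parent links.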
Even some of the fine details match up: git's separate ‘Author’ and ‘Committer’ fields, allowing one person to commit data originally written by another, are reminiscent of Usenet's occasionally distinct ‘From’ and ‘Reply-To’ headers.
Of course, there are some differences too. So the next question is, what can we learn from this comparison? What killer features do DVCSes have from which Usenet could benefit, or vice versa?
- Merges. DVCSes, of course, would be of no use at all if it weren't for the merge commit: a commit which references two or more parent commits, so that two branches of the diverging thread tree recombine into one. This is certainly something which I (and I'd guess others) would occasionally have found useful in a sprawling Usenet discussion. Of course there's nothing stopping you posting an article whose References header cites two posts neither of which is a direct ancestor of the other, but in practice support for gracefully handling this in clients is patchy at best. But the likes of gitk have shown the way: when can we expect Usenet clients as a matter of course to generate thread diagrams which support merging?
- Cancels. In the opposite direction, Usenet has the concept of a cancel message, which you proactively send out to all sites to warn them that something you (or someone else) previously posted should be discounted. I could see uses for that sort of thing in the DVCS world: if a commit introduces a non-obvious security hole, for example, it would be nice to be able to tag it after the fact with a warning marker (perhaps GPG-signed to show you mean it) which would be automatically propagated to everyone who pulled from you. Then there'd be some future commit which included a ‘resolved’ marker for that warning, indicating that it fixed the bug – and then the client software could mark all the revisions in between as unsafe, and automatically track which branches the bug had propagated to but the fix had not.
- Rebasing. Usenetters occasionally supersede a posted article with one only slightly different from it. DVCSes permit this (git commit --amend), but many of them go much further, supporting far more complex transformations of the commit history. The typical example is the rebase, in which an entire strand of development is reconstructed as if it originated from a different parent commit – so all the articles on the strand are reposted with different IDs and references, but (essentially) the same actual content. It can surely only be lack of imagination that has prevented Usenet from embracing this idea wholeheartedly: there's an almost practical use case of correcting your error when you've followed up to the wrong article by mistake (and moving over the followups to the erroneous post too, if you didn't realise fast enough), but much more excitingly we could introduce the ability to transplant all the comments of your local troll so that they follow up to totally different posts and make no sense! Anyone who thinks Usenet just isn't chaotic enough must be licking their lips already over this one.
- AOLers. It can't be too soon to start preparing for the day when major DVCS-hosted software projects get invaded by legions of people from some mass-market ISP which has made it unwisely easy to access them. Expect repositories around the world to be flooded with commits which just add ‘ME TOO!’ to the end of comments in source files, or which fail to compile because all the punctuation is totally wrong. DVCS maintainers should start working on a killfile mechanism now, before it's too late!
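The merge idea above is structurally trivial once you have the parent-reference model: a merge is just an entity with more than one parent, and a merge-aware thread diagram only needs to walk the ancestor DAG. A minimal sketch, with a made-up four-node graph purely for illustration:

```python
# Toy DAG of entities: id -> list of parent ids. A 'merge' is simply an
# entity citing two or more parents, recombining diverged branches of the
# thread tree -- exactly what a References header with two non-ancestral
# message IDs would mean on Usenet.
graph = {
    "root": [],
    "a1": ["root"],          # one branch of the discussion
    "b1": ["root"],          # a diverging branch
    "merge": ["a1", "b1"],   # a followup citing both, recombining them
}

def ancestors(graph, node):
    """All entities reachable from `node` via parent links (excluding itself)."""
    seen, stack = set(), list(graph[node])
    while stack:
        cur = stack.pop()
        if cur not in seen:
            seen.add(cur)
            stack.extend(graph[cur])
    return seen

# The merge's history contains both branches, which is what a merge-capable
# client (gitk-style) would need in order to draw the recombined thread.
assert ancestors(graph, "merge") == {"root", "a1", "b1"}
```

The same reachability walk would serve the cancel/‘resolved’ idea: mark one node bad, another fixed, and flag every entity that can reach the bad one but not the fix.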
no subject
RSS aggregation irks me as it serves the same function as NNTP with a much less efficient protocol.
no subject
I've been thinking on and off about the peak and decline of text Usenet (http://www.cam.ac.uk/cs/newsserver/newsvolume.html) for a while now. While the easier moderation of blogs and web forums might indeed be part of it, I suspect that actually the greater ease of both creation and discovery is a big part of it too.
Creating a blog or web forum can take as little as seconds; creating a newsgroup in a managed hierarchy takes days at a minimum and can take months in the worst case. Equally, finding web forums is trivial (http://www.google.com/search?q=cycling+forum) and newsgroups don't tend to appear in the same lists. The ease of creation, plus time, is arguably part of the cause of the ease of discovery.
I think these things are, at least by now, primary and the better moderation features of the web secondary, because you don't get to be put off Usenet by the trolls and flamewars if you don't even get to see it in the first place. (Granted that this might have been less true ten years ago when a greater proportion of the online population would have been aware of Usenet.)
no subject
no subject
- http://eagain.net/articles/git-for-computer-scientists/
- http://keithp.com/blogs/Repository_Formats_Matter/
no subject
no subject
You think you're joking, but honestly, I'm surprised it doesn't sound like that now :) It might not be "me too" or "not me", but, e.g., editing brace styles is probably the equivalent for software :)
no subject
I think a world in which unskilled people do devastating things because the bar has been lowered for being able to write software arrived when people started doing security-critical stuff in PHP. Github for ~~lesbians~~ AOLers would merely be another manifestation of the problem. )-8

no subject
Even crossed out, what on earth have lesbians got to do with any of this?!
no subject
no subject
eta: and it was only three months ago, at least if the HTTP last-modified date on the image can be trusted. My mind is going, Dave.
Imagine if we used Usenet posts to build a product!
(Anonymous) 2009-12-10 05:19 pm (UTC)