Back once again for the origin/master
simont


Sun 2014-10-26 19:58
Back once again for the origin/master

This weekend, after preparation and faff lasting several months, I migrated all of my free software projects out of Subversion into a collection of git repositories.

Good grief, it was a faff! It's surprisingly like moving house – there's no end of just-one-more-things that all have to be sorted out, and every so often you turn a corner and find another huge list of things to add to the list, and it's exhausting sorting it all out. I did as much as I could ahead of time (e.g. I did the work of stopping various projects from depending on a monotonic revision number a few weeks in advance of the moment of migration itself), and there's a big list of sortings-out left to be done which I'll get round to once I've rested, but even so, the actual ‘moving day’ still had a lot of bits and pieces I couldn't move to another day. I'll find work restful tomorrow, I think!

[xpost |http://simont.livejournal.com/241653.html]

andrewducker, Sun 2014-10-26 21:53

Interesting - what kind of issues did you bump into?

simont, Mon 2014-10-27 13:10

'Let me 'splain. No, is too much. Let me sum up.'

That's a question inviting a considerably longer answer than the original post :-) In rough categories:

Repository translation. To begin with, I had one big SVN repository containing a lot of largely separate projects of mine in subdirs, so I had to translate that into one git repo per project. I wasn't satisfied with just doing it by git-svn, because that didn't catch the various tags, branching and remerging relationships, within a project or between projects (e.g. my projects ipbt and xtruss are derived from the PuTTY source base, and I wanted to generate git repos for those which would make it easy to continue merging bug fixes from PuTTY in future). Another annoyance arises from the fact that some of my projects are libraries used by others, and I'd previously relied on having a single monotonic revision number covering all of that stuff so I could atomically update the library and one or more of its clients; that doesn't sit well with git, so I ended up converting most of those relationships into git-submodule links. (A double-edged sword – on the one hand this means commits to the library require manual faff to propagate into the client projects, but on the other, it means commits to the library with one project in mind can't break another project as a side effect, because I'll notice when I do the manual update of that project.) Finally, my SVN repo was itself translated from CVS about a decade ago, and badly represented some interesting branching structures that had happened before then, which I wanted to try to make better rather than worse. I ended up writing an entirely custom repository translation script (reusing a chunk of Python I already had to read SVN dump files) which included some reasonably well abstracted basic concepts plus a zillion special cases hardcoded per revision or project. 
I prepared that ahead of time and ran it repeatedly for test purposes, and even so, when I came to run it for real I found it had missed a couple of things (but only execute permissions on a couple of files, apparently due to inconsistent representation in the SVN repo, which I fixed up after the fact with git filter-branch).
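As an illustration of the "read SVN dump files" step, here is a minimal sketch of a dump-file record reader in Python. It is not the author's script: the function names, the hand-made sample data, and the idea of counting nodes per top-level directory are all assumptions made for the example. It relies only on the documented dump layout, in which each record is a block of `Key: value` headers, a blank line, and then `Content-length` bytes of body.

```python
import io
from collections import Counter


def read_records(stream):
    """Yield (headers, body) pairs from an SVN dump stream opened in binary mode."""
    while True:
        line = stream.readline()
        if not line:
            return  # end of file
        if line == b"\n":
            continue  # blank padding between records
        headers = {}
        while line not in (b"", b"\n"):
            key, _, value = line.rstrip(b"\n").partition(b": ")
            headers[key.decode("utf-8")] = value.decode("utf-8")
            line = stream.readline()
        # Records without a Content-length header carry no body.
        length = int(headers.get("Content-length", "0"))
        yield headers, stream.read(length)


def top_level_project(node_path):
    """Hypothetical rule: the first path component names the project."""
    return node_path.split("/", 1)[0]


def count_nodes_per_project(stream):
    """Count node records under each top-level directory of the dump."""
    counts = Counter()
    for headers, _body in read_records(stream):
        if "Node-path" in headers:
            counts[top_level_project(headers["Node-path"])] += 1
    return counts


# Hand-made sample, not output from a real repository.
SAMPLE = (
    b"SVN-fs-dump-format-version: 2\n\n"
    b"Revision-number: 1\nProp-content-length: 10\nContent-length: 10\n\n"
    b"PROPS-END\n\n"
    b"Node-path: puzzles/README\nNode-kind: file\nNode-action: add\n"
    b"Text-content-length: 6\nContent-length: 6\n\n"
    b"hello\n\n\n"
    b"Node-path: putty/README\nNode-kind: file\nNode-action: add\n"
    b"Text-content-length: 4\nContent-length: 4\n\n"
    b"hi!\n\n\n"
)

if __name__ == "__main__":
    print(count_nodes_per_project(io.BytesIO(SAMPLE)))
```

A real translator would go on to map each node to a per-project git history and handle copies, tags, and merges, which is where the per-revision special cases described above come in.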

Repository hosting. Having got some git repos, where should I put them, and how should I control access? For reasons of caution (I do maintain a security project), I wanted to keep them on the same server I normally depend on, rather than outsourcing to somewhere like github; also I wanted some kind of mandatory audit trail of who had committed what (because SVN does give you that, at least the way I had it set up, and I didn't want to take a backward step in that area), and also I wanted to avoid having to do the same thing I'd done with SVN (see section 2 in my old article) using userv, because that is inflexible in that it requires all your committers to have shell accounts on the host machine and also looked like considerably more faff to set up in git than it was in svn (because git has a wider range of repo access strategies so the custom-helper mechanism slots in at a different level of the system). I ended up with a system that allows SSH access to the repositories using a service account that has one or more authorised public keys per committer; that account has no write access to the repositories and also runs a custom shell (preventing subversion of OpenSSH restricted keys by Shellshock-style exploits), and once it's done initial verification it hands off actual push requests to a second service account which does further checking (in particular enforcing that committers only push stuff containing their own names, giving the audit trail I mentioned above) before actually allowing the push. So that all needed preparing ahead of time, testing carefully, and getting my co-developers to give me SSH keys and test-commit to a throwaway repo.
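The "committers only push stuff containing their own names" check could look roughly like the sketch below. This is an assumption about shape, not a description of the actual setup: the post does not say how the check is implemented, so the function names, the choice of the committer-name field, and the tab-separated log format are all illustrative. The git invocation uses `git log --format=%H%x09%cn`, which prints each commit hash and committer name separated by a tab.

```python
import subprocess


def parse_log_lines(text):
    """Parse lines of '<sha>TAB<committer name>' into (sha, name) pairs."""
    pairs = []
    for line in text.splitlines():
        if line.strip():
            sha, _, name = line.partition("\t")
            pairs.append((sha, name))
    return pairs


def offending_commits(commits, allowed_names):
    """Return the hashes whose committer name is not one the pusher may use."""
    return [sha for sha, name in commits if name not in allowed_names]


def committers_in_push(old, new):
    """List (sha, committer name) for the commits a push would add.

    Assumes it runs inside the receiving repository before refs are updated.
    """
    if set(old) == {"0"}:
        # New ref: consider commits not reachable from any existing ref.
        revs = [new, "--not", "--all"]
    else:
        revs = [f"{old}..{new}"]
    result = subprocess.run(
        ["git", "log", "--format=%H%x09%cn", *revs],
        check=True, capture_output=True, text=True,
    )
    return parse_log_lines(result.stdout)
```

For example, `offending_commits(committers_in_push(old, new), {"Alice Example"})` would return an empty list only if every pushed commit names the authenticated committer, and the push could be refused otherwise.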

Build and versioning. Because all my projects had previously lived in SVN, my automated build system (which takes care of distributing builds across various Windows, Linux and MacOS machines and collecting all the results together into the right places) was only prepared to cope with checking code out of SVN to build. So I had to rework that to cope sensibly with git. Also, the projects' build scripts and version numbering systems often depended on the SVN monotonic integer revision number, because it's just so damn convenient – most of my stuff which didn't really need a development/stabilisation release cycle (e.g. my puzzles) was packaged into tarballs versioned with the SVN revision number, so I had to do a load of per-project work to find alternatives to that everywhere. Again, that was all done ahead of time.
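The post does not say which replacement versioning scheme each project adopted. One common option, shown here purely as an assumption, is to build a tarball version from the commit date plus an abbreviated commit hash: the date keeps versions roughly ordered the way SVN revision numbers were, and the hash identifies the exact commit.

```python
from datetime import datetime, timezone


def version_string(commit_time, commit_sha, hash_length=7):
    """Build 'YYYYMMDD.<short hash>' from a commit's Unix timestamp and hash.

    Hypothetical scheme for illustration; the date is taken in UTC so the
    result does not depend on the build machine's time zone.
    """
    day = datetime.fromtimestamp(commit_time, tz=timezone.utc).strftime("%Y%m%d")
    return f"{day}.{commit_sha[:hash_length]}"


if __name__ == "__main__":
    # Timestamp for 2014-10-26 12:00:00 UTC and a made-up hash.
    print(version_string(1414324800, "a1b2c3d4e5f60718293a4b5c6d7e8f9012345678"))
```

Unlike an SVN revision number, two commits on the same day sort only by hash under this scheme, so it suits "latest snapshot" tarballs better than anything that needs strict ordering.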

Migration day. On the day of actually throwing the switch, I rendered the SVN repo read-only, ran the migration script for real, moved all the repositories into the hosting setup I'd prepared, reconfigured the build system to pick stuff up from the new git world rather than the old SVN world, and converted my website updating mechanism to retrieve the web content from git too. Then I ran a bunch of test builds and tried to fix all the unanticipated problems; by the time I went to bed it was nearly all working and the overnight cron builds mostly succeeded, bar a couple of trivial things (like failure to recreate directories in new checkouts) that were easily fixed the next morning.

Checkouts. Of course on several machines I had a zillion SVN checkout directories pointing at an SVN upstream that's now no longer current. So in the past couple of days I've also been going through and converting those into shiny new git clones, and that's been pointing out a series of follow-up glitches (mostly, having left things out of my freshly made-up .gitignore files) and also the odd piece of uncommitted work here and there which I've been retrieving and putting where it ought to be. Worse than that, though, are the checkouts that were not SVN but git-svn based, in which I'd derived lots of temporary dev branches from the SVN history to hold patches part-way through polishing (both by me, and submitted by contributors). I still haven't figured out how to convert the most complicated ones of those into equivalent repos based on the now-official upstream git history; I'm fairly sure some kind of complicated git filter-branch is what I want, but I haven't yet got the details to work. And I also need to notify downstream projects (e.g. the Android port of my puzzle collection) which are currently based on informal git conversions of my code, because they will probably need to switch to tracking the new upstream git...
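One ingredient of converting a git-svn dev branch is working out which SVN revision it forked from. git-svn records this in each imported commit message as a `git-svn-id: <url>@<rev> <uuid>` line, so a branch's local commits are the ones above the newest commit carrying that line. The sketch below shows only that step; the function names, URL, and UUID are made up for the example, and it is not the filter-branch approach mentioned above.

```python
import re

# Matches the trailer git-svn appends to imported commit messages.
GIT_SVN_ID = re.compile(
    r"^git-svn-id: (?P<url>\S+)@(?P<rev>\d+) (?P<uuid>\S+)$", re.MULTILINE
)


def svn_revision(message):
    """Return the SVN revision recorded in a commit message, or None."""
    match = None
    for match in GIT_SVN_ID.finditer(message):
        pass  # keep the last occurrence, which is the trailer
    return int(match.group("rev")) if match else None


def local_commits(messages):
    """Given commit messages newest first, return (local count, base SVN rev).

    The base revision is None if no message carries a git-svn-id line.
    """
    for index, message in enumerate(messages):
        rev = svn_revision(message)
        if rev is not None:
            return index, rev
    return len(messages), None


if __name__ == "__main__":
    branch = [
        "WIP: polish contributed patch\n",
        "Fix thing\n\ngit-svn-id: svn://svn.example.org/sgt/puzzles@10286 "
        "01234567-89ab-cdef-0123-456789abcdef\n",
    ]
    print(local_commits(branch))
```

Given a table from SVN revision to the corresponding commit in the new repository (which the translation script would have to supply), simple branches could then be moved with `git rebase --onto <new-base> <old-base> <branch>`; branches with merges are the complicated case described above.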

jack, Mon 2014-10-27 15:38

Thank you, the details, though long, were very interesting :)

BTW, I think the relevant links for ESR's cvs-fast-export and reposurgeon are probably http://esr.ibiblio.org/?p=4761 and http://www.catb.org/esr/dvcs-migration-guide.html, plus his recent blog. I think the rough history is that there was a tool that mostly worked (taken over by ESR from someone else, I think), but in the last few weeks he suddenly got serious, made it a lot faster, wrote a decent guide, etc.

cjwatson, Tue 2014-10-28 12:17

Yes, I've had very good results with reposurgeon. As a downstream of Simon's, I'll probably be using this to rewrite the halibut and putty Debian packaging repositories into forms that make them appear as though they have always been based on the new upstream git repositories, unless Simon tells me that's a terrible idea. (This will be a non-fast-forwarding change, of course, but since I'm the only direct committer to both of those, I think that's tolerable.)

simont, Tue 2014-10-28 12:31

> into forms that make them appear as though they have always been based on the new upstream git repositories

We have always been at war with git-svn. Oh no, wait, we have never been at war with git-svn. :-)

(Seems sensible enough to me. I've just been figuring out how to do the same thing to the more convoluted of my old informal git-svn checkouts using git filter-branch rather than reposurgeon...)
