Archive for December, 2008

My Experiences With git-svn

Tuesday, December 30th, 2008

For the past two days, I’ve been giving git-svn a go at work. I was wanting a way to easily create task branches without touching the central subversion repository, and someone I worked with tipped me off to git’s standard svn interaction (thanks Eric). Plus I was just curious to learn more about git.

Here are my experiences and impressions so far.

My Setup

At work, I’m using IntelliJ Idea 8.0.1 on a Windows XP machine. The code I’m working on right now is a module contained within a larger framework, so I’m actually dealing with two subversion repositories; one for the framework, and another for the module. The larger project points to the module via an svn:externals definition.

Installing git With Cygwin

The first thing I did was get an install of git going. Since I’m using XP, I did this via Cygwin, which is the official route to git on Windows.

To actually be able to use git-svn, I had to install two Cygwin packages: git and subversion-perl.

I ran into some issues with the subversion perl bindings, but reinstalling subversion-perl fixed the problem.

I also needed to download Error.pm from CPAN and place it in <cygwin_root>/lib/perl5.

The final (minor) issue I hit with the installation was git-svn not being in a $PATH directory by default. (If you don’t want to mess with this, you can just say git svn (no hyphen) instead, and it’s all the same — as far as I can tell.)

All in all, the installation wasn’t a walk in the park, but it wasn’t too bad either.

Git-ting The Project

As I mentioned, the code I’m working on is an inner module for a larger piece of code. I toyed with the idea of trying to git the whole thing, but making the svn:externals definition work out seemed like it would be complicated at best. So, for now, I removed the externals definition on my local copy and checked out the inner module directly. (I’d be interested to hear about other ways to handle this.)

To do the checkout, I first prepared a directory:

git-svn init -s http://url.of.the/subversion/repository

(The -s means “stdlayout” — the svn repository has a normal trunk/branches/tags layout. The docs have more info on the different options).

Then to actually grab the repository:

git-svn fetch

Git stores the entire project history along with each repository (see the git tutorial if you want to know more about git), so the fetch can take a long time. This step took about 6 hours to get the ~12k revisions in my project.

If you don’t care about having the whole history, you can tell git to use any revision number as a starting point:

git-svn fetch -rREVISION

Using It

At this point, I could start using normal git commands to manipulate my local repository. To try it out, I created a local branch to do some work in:

rdickerson@rdickerson $> git branch -a     
* master
 
rdickerson@rdickerson$> git checkout -b aa_cache_fix
Switched to a new branch "aa_cache_fix"
 
rdickerson@rdickerson $> git branch -a
* aa_cache_fix
  master

(Again, see the git tutorial if you’re confused).

I made the changes I needed and committed back into the task branch:

rdickerson@rdickerson $> git commit -a

At this point I needed to merge back into the master, but I wanted to consolidate all the incremental changes I had made in the task branch into one big commit I could pass back to subversion. To bundle all of my branch commits together, I used the --squash merge option and made the large single commit to the master branch:

rdickerson@rdickerson $> git checkout master
rdickerson@rdickerson $> git merge aa_cache_fix --squash
rdickerson@rdickerson $> git commit -a -m "Fixed AA Cache"

Then I made sure my project was up-to-date and pushed the changes back out to subversion (the lines = redaction):

rdickerson@rdickerson $> git-svn rebase
Current branch master is up to date.
 
rdickerson@rdickerson $> git-svn dcommit
Committing to --------------- ...
Authentication realm: ------------------
Password for '---------': 
        M       ------- File #1
        M       ------- File #2
Committed r12457
        M       ------- File #1
        M       ------- File #2
r12457 = 09121633e5ca898cc96cd4f2264c987c7f29fc0f (git-svn)
No changes between current HEAD and refs/remotes/git-svn
Resetting to the latest refs/remotes/git-svn

And my commit shows up in both the git and subversion logs. Hooray!

Idea Integration

There are a couple Idea git plugins available, but none of them seem 100% perfect. I went with git4idea, which seems to be the standard.

Idea lets you specify version control systems on a per-directory level, so using git for my inner module and svn for everything else was no problem:

vcs2

The plugin has a nice, simple interface for switching branches:

git_checkout_idea1

Similar things exist for merging, stashing, pushing, pulling, etc.

I did run into some issues after I created a new file, added it to git, then renamed it — everything looked great through the console, but Idea’s file statuses were messed up, even after using the “Synchronize” and “Refresh” commands. I had to reopen the project to get things right again.

Another minor thing is that the commit dialog has no indication of what branch you’re in, so it would be pretty easy to accidentally commit to the wrong branch.

Impressions

So far, using git locally has been great. I can:

  • Create, switch, and merge branches quickly and easily
  • Perform basic git operations from Idea
  • Make commits and view project history even if I’m working offline or the subversion server is down. (Which admittedly happens rarely).

Keep in mind it’s only been two days, so who knows how I’ll feel after using it for a while. So far, though, I’ve learned a lot about git and set myself up with a swank local VCS in the process, so I’m going to say it was worth the effort.

RD.B Repost: The Limits of Proving Programs Correct

Tuesday, December 30th, 2008

This post originally appeared on http://robdickerson.blogspot.com. I’m no longer using that site, but I liked this post, so I’m reposting it here.

My university required senior computer science students to complete a capstone project before they graduated. The idea was to have every person perform a research project or internship that brought together all that he or she had learned over the last four years.

My capstone project was of the independent research variety. I’m very interested in formal logic and proof systems, and wanted to examine parallels between generating a formal proof and writing a computer program.

What I ended up doing was creating a system that started out with a formal program specification written in Z-notation, and simultaneously generated a working Spark-Ada program and a proof of that program’s conformity to the Z spec.

(Since each line of code transforms the program from one state to another, you can figure pre- and post- conditions of individual lines of code. So, the problem becomes a search for a list of statements whose pre- and post- conditions chain together to take the Z-specified preconditions to the Z-specified post conditions. The sequence of statements make up the working code, and the chain of line-by-line pre- and post- conditions are the correctness proof.)

The programs it could generate were very simple (they could only use a ridiculously small subset of Ada). It could write methods that did things like swapping the values of two variables. But it worked for the most part. And it was a lot of fun to work on, and I learned a lot.

The biggest thing I learned was the limited utility of proving code correct.

Something a fellow named Benjamin Russell said on the Haskell mailing list today got me thinking about software proofs (and inspired this post). He said that, “[b]asically, if you can prove that the program is correct, then you don’t need to test it.”

I disagree.

What I learned doing my senior capstone is that when you prove a program correct, what you’re actually proving is that the program conforms to some specification. Maybe it’s a specification formally defined in a language like Z, or maybe it’s a non-formal specification that resides in your head.

But really, a specification is prone to error just like program code. As your specification gets more detailed, it starts to become another programming language. With my capstone, all I really did is write a Z-to-Spark compiler. I’d shifted the opportunity for human error up a level — now, instead of making mistakes in Ada code, people were free to make mistakes in their Z spec.

It’s like saying you can write provably correct assembly code that matches a C specification. Maybe C or Z is an easier language to reason in, but you still have to take a lot of time to examine your specification, to make sure everything in the system meshes together — really, the formal specification becomes your new coding language.

On the flip side of the coin, as your specification becomes less formal, that opportunity for error shifts back toward the coding process. Less constraints in the specification mean more freedom in writing the code, which means more opportunity for mistakes in your actual program.

I guess what I’m saying is that in this approach:

1) come up with an over-arching specification for everything
2) write code that matches the specification

being able to prove code correct (prove it meets some spec) only gets you step two. Step one is just has hard (or harder). Creating a really formal specification means you have a new programming language to deal with, and a less-formal spec means you can have inconsistencies or missing information that bleed into your code base.

There is some essential complexity in developing software that cannot be eliminated by correctness proofs — you’re just moving opportunity for error from the “coding” side of the balance to the “specification” side.

All of that being said, if you’re doing something that has to be right, like air traffic control or data encryption, and you’re willing to spend a lot of time thinking about your formal specification, then I think provably correct software is the right goal. But correctness proofs are just one part of the process. Even if everything is proved correct, you still have to really test your software. If not to test the code, then to test the spec.

I think this is one of the reasons that test-driven development is such a practical idea for “normal” software development. It recognizes that your specifications are just as incomplete and subject-to-change as your code. With TDD, your formal specification is your suite of tests, and proving your code meets the spec is easy — just run the tests.

As far as program specifications go, unit tests are a nice balance between a formal approach (a machine checks conformity) and an informal approach (the tests are always changing and never complete).

For most programming, I think this balance is the best we have.

The Experiment Ends

Tuesday, December 30th, 2008

So I’ve been posting to robdickerson.blogspot.com for a while, to get the feel for how it would work out. I think I’m going to give up on Blogspot though.

  • I have little control over how the blogspot page is set up. I want more robust handling for code listings and console interaction display, and I have posts languishing in purgatory because I’m not happy with how Blogspot displays them.
  • I think categorizing my posts is enough to distinguish software stuff from other stuff. I think my fear of topic-dilution is unfounded. As it stands now, I barely post enough to fill a single blog anyway, so I’m not really at risk of over-inundating this space.
  • I feel it would be aesthetically pleasing to have a third argument, but I don’t. It’s my site, I don’t need to explain myself.

I’ll probably repost a couple things I liked from robdickerson.blogspot.com here, then let the Blogspot site wither away.