My Experiences With git-svn

December 30th, 2008

For the past two days, I’ve been giving git-svn a go at work. I was wanting a way to easily create task branches without touching the central subversion repository, and someone I worked with tipped me off to git’s standard svn interaction (thanks Eric). Plus I was just curious to learn more about git.

Here are my experiences and impressions so far.

My Setup

At work, I’m using IntelliJ Idea 8.0.1 on a Windows XP machine. The code I’m working on right now is a module contained within a larger framework, so I’m actually dealing with two subversion repositories; one for the framework, and another for the module. The larger project points to the module via an svn:externals definition.

Installing git With Cygwin

The first thing I did was get an install of git going. Since I’m using XP, I did this via Cygwin, which is the official route to git on Windows.

To actually be able to use git-svn, I had to install two Cygwin packages: git and subversion-perl.

I ran into some issues with the subversion perl bindings, but reinstalling subversion-perl fixed the problem.

I also needed to download Error.pm from CPAN and place it in <cygwin_root>/lib/perl5.

The final (minor) issue I hit with the installation was git-svn not being in a $PATH directory by default. (If you don’t want to mess with this, you can just say git svn (no hyphen) instead, and it’s all the same — as far as I can tell.)

All in all, the installation wasn’t a walk in the park, but it wasn’t too bad either.

Git-ting The Project

As I mentioned, the code I’m working on is an inner module for a larger piece of code. I toyed with the idea of trying to git the whole thing, but making the svn:externals definition work out seemed like it would be complicated at best. So, for now, I removed the externals definition on my local copy and checked out the inner module directly. (I’d be interested to hear about other ways to handle this.)

To do the checkout, I first prepared a directory:

git-svn init -s http://url.of.the/subversion/repository

(The -s means “stdlayout” — the svn repository has a normal trunk/branches/tags layout. The docs have more info on the different options).

Then to actually grab the repository:

git-svn fetch

Git stores the entire project history along with each repository (see the git tutorial if you want to know more about git), so the fetch can take a long time. This step took about 6 hours to get the ~12k revisions in my project.

If you don’t care about having the whole history, you can tell git to use any revision number as a starting point:

git-svn fetch -rREVISION

Using It

At this point, I could start using normal git commands to manipulate my local repository. To try it out, I created a local branch to do some work in:

rdickerson@rdickerson $> git branch -a     
* master
 
rdickerson@rdickerson$> git checkout -b aa_cache_fix
Switched to a new branch "aa_cache_fix"
 
rdickerson@rdickerson $> git branch -a
* aa_cache_fix
  master

(Again, see the git tutorial if you’re confused).

I made the changes I needed and committed back into the task branch:

rdickerson@rdickerson $> git commit -a

At this point I needed to merge back into the master, but I wanted to consolidate all the incremental changes I had made in the task branch into one big commit I could pass back to subversion. To bundle all of my branch commits together, I used the --squash merge option and made the large single commit to the master branch:

rdickerson@rdickerson $> git checkout master
rdickerson@rdickerson $> git merge aa_cache_fix --squash
rdickerson@rdickerson $> git commit -a -m "Fixed AA Cache"

Then I made sure my project was up-to-date and pushed the changes back out to subversion (the lines = redaction):

rdickerson@rdickerson $> git-svn rebase
Current branch master is up to date.
 
rdickerson@rdickerson $> git-svn dcommit
Committing to --------------- ...
Authentication realm: ------------------
Password for '---------': 
        M       ------- File #1
        M       ------- File #2
Committed r12457
        M       ------- File #1
        M       ------- File #2
r12457 = 09121633e5ca898cc96cd4f2264c987c7f29fc0f (git-svn)
No changes between current HEAD and refs/remotes/git-svn
Resetting to the latest refs/remotes/git-svn

And my commit shows up in both the git and subversion logs. Hooray!

Idea Integration

There are a couple Idea git plugins available, but none of them seem 100% perfect. I went with git4idea, which seems to be the standard.

Idea lets you specify version control systems on a per-directory level, so using git for my inner module and svn for everything else was no problem:

vcs2

The plugin has a nice, simple interface for switching branches:

git_checkout_idea1

Similar things exist for merging, stashing, pushing, pulling, etc.

I did run into some issues after I created a new file, added it to git, then renamed it — everything looked great through the console, but Idea’s file statuses were messed up, even after using the “Synchronize” and “Refresh” commands. I had to reopen the project to get things right again.

Another minor thing is that the commit dialog has no indication of what branch you’re in, so it would be pretty easy to accidentally commit to the wrong branch.

Impressions

So far, using git locally has been great. I can:

  • Create, switch, and merge branches quickly and easily
  • Perform basic git operations from Idea
  • Make commits and view project history even if I’m working offline or the subversion server is down. (Which admittedly happens rarely).

Keep in mind it’s only been two days, so who knows how I’ll feel after using it for a while. So far, though, I’ve learned a lot about git and set myself up with a swank local VCS in the process, so I’m going to say it was worth the effort.

RD.B Repost: The Limits of Proving Programs Correct

December 30th, 2008

This post originally appeared on http://robdickerson.blogspot.com. I’m no longer using that site, but I liked this post, so I’m reposting it here.

My university required senior computer science students to complete a capstone project before they graduated. The idea was to have every person perform a research project or internship that brought together all that he or she had learned over the last four years.

My capstone project was of the independent research variety. I’m very interested in formal logic and proof systems, and wanted to examine parallels between generating a formal proof and writing a computer program.

What I ended up doing was creating a system that started out with a formal program specification written in Z-notation, and simultaneously generated a working Spark-Ada program and a proof of that program’s conformity to the Z spec.

(Since each line of code transforms the program from one state to another, you can figure pre- and post- conditions of individual lines of code. So, the problem becomes a search for a list of statements whose pre- and post- conditions chain together to take the Z-specified preconditions to the Z-specified post conditions. The sequence of statements make up the working code, and the chain of line-by-line pre- and post- conditions are the correctness proof.)

The programs it could generate were very simple (they could only use a ridiculously small subset of Ada). It could write methods that did things like swapping the values of two variables. But it worked for the most part. And it was a lot of fun to work on, and I learned a lot.

The biggest thing I learned was the limited utility of proving code correct.

Something a fellow named Benjamin Russell said on the Haskell mailing list today got me thinking about software proofs (and inspired this post). He said that, “[b]asically, if you can prove that the program is correct, then you don’t need to test it.”

I disagree.

What I learned doing my senior capstone is that when you prove a program correct, what you’re actually proving is that the program conforms to some specification. Maybe it’s a specification formally defined in a language like Z, or maybe it’s a non-formal specification that resides in your head.

But really, a specification is prone to error just like program code. As your specification gets more detailed, it starts to become another programming language. With my capstone, all I really did is write a Z-to-Spark compiler. I’d shifted the opportunity for human error up a level — now, instead of making mistakes in Ada code, people were free to make mistakes in their Z spec.

It’s like saying you can write provably correct assembly code that matches a C specification. Maybe C or Z is an easier language to reason in, but you still have to take a lot of time to examine your specification, to make sure everything in the system meshes together — really, the formal specification becomes your new coding language.

On the flip side of the coin, as your specification becomes less formal, that opportunity for error shifts back toward the coding process. Less constraints in the specification mean more freedom in writing the code, which means more opportunity for mistakes in your actual program.

I guess what I’m saying is that in this approach:

1) come up with an over-arching specification for everything
2) write code that matches the specification

being able to prove code correct (prove it meets some spec) only gets you step two. Step one is just has hard (or harder). Creating a really formal specification means you have a new programming language to deal with, and a less-formal spec means you can have inconsistencies or missing information that bleed into your code base.

There is some essential complexity in developing software that cannot be eliminated by correctness proofs — you’re just moving opportunity for error from the “coding” side of the balance to the “specification” side.

All of that being said, if you’re doing something that has to be right, like air traffic control or data encryption, and you’re willing to spend a lot of time thinking about your formal specification, then I think provably correct software is the right goal. But correctness proofs are just one part of the process. Even if everything is proved correct, you still have to really test your software. If not to test the code, then to test the spec.

I think this is one of the reasons that test-driven development is such a practical idea for “normal” software development. It recognizes that your specifications are just as incomplete and subject-to-change as your code. With TDD, your formal specification is your suite of tests, and proving your code meets the spec is easy — just run the tests.

As far as program specifications go, unit tests are a nice balance between a formal approach (a machine checks conformity) and an informal approach (the tests are always changing and never complete).

For most programming, I think this balance is the best we have.

The Experiment Ends

December 30th, 2008

So I’ve been posting to robdickerson.blogspot.com for a while, to get the feel for how it would work out. I think I’m going to give up on Blogspot though.

  • I have little control over how the blogspot page is set up. I want more robust handling for code listings and console interaction display, and I have posts languishing in purgatory because I’m not happy with how Blogspot displays them.
  • I think categorizing my posts is enough to distinguish software stuff from other stuff. I think my fear of topic-dilution is unfounded. As it stands now, I barely post enough to fill a single blog anyway, so I’m not really at risk of over-inundating this space.
  • I feel it would be aesthetically pleasing to have a third argument, but I don’t. It’s my site, I don’t need to explain myself.

I’ll probably repost a couple things I liked from robdickerson.blogspot.com here, then let the Blogspot site wither away.

Programming Rob

September 8th, 2008

When it comes to writing posts on this site, most usually fall into the “software” category or the “everything else” category of my life.

So, depending on your point of view, this is either

  1. A personal blog with a bunch of boring technical stuff interspersed, or
  2. A software blog with a bunch of irrelevant personal stuff interspersed

In light of that observation, I’ve decided to experiment with another site where I can just write about software, leaving this site open for the “everything else” category.

The site is: http://robdickerson.blogspot.com

(Oh, and I also wanted to experiment with a setup where I don’t have to maintain a Wordpress install. :P )

We’ll see how it goes.

Turing… Again

June 23rd, 2008

It seems like I keep coming back to this earlier post. Maybe I have an obsessive personality? I’ve been trying for days to figure out. So far, I don’t think I do. I have some rigorous studies planned in the coming weeks; I’ll let you know my findings.

In the meantime, I came across this interesting paper called “The Origins of the Turing Thesis Myth,” which yet again knocks down the whole “Turing Machines equal Computers” thing.

When I went on my little rant, I talked about how I think Turing Machines are not the same as modern computers because for any given computer, there is a Turing Machine that can out-compute it; that is, the TM can perform computations the computer cannot.

This paper goes conversely, examining things computers can do that Turing Machines cannot. Mainly, incorporate user interaction. Previously unbeknownst to me, Turing himself actually defined the notion of a “choice machine”; so far, my understanding of this construct is that it exists to capture the non-algorithmic nature of interaction from sources “outside” of the TM (like users). I’m still getting to beknow the idea, though, so my understanding of it is likely to change in the near future.

After reading this paper, I had to change my thinking on TMs being “more powerful” than modern computers. Although the ideas presented are mostly new to me, so far the way a computer responds to external input does seems like an effective example of something a TM cannot model that a computer can do. (Of course, the added disjunct only further validates my TM != Computer thinking, so my larger worldview is still intact. Whew.)

Anyway, an interesting paper…

Programmer Behavior Catalog: Extraneous Brainstorming

February 25th, 2008

One of the disadvantages — pretty much the disadvantage — of working in a cube farm is that you have to listen to every conversation that any one of about seven or eight people participates in during the work day. Every conversation they have, all day long, you have to listen to.

It pretty much sucks big time for programmer productivity. (Fun fact: did you know that a programmer’s job revolves around spending long stretches of time concentrating?). On the up side, though, listening to coworkers talk all the time can reveal some interesting things about the people you work with.

This morning I was enjoying my mandatory eavesdropping privileges, and caught part of a technical conversation between two of my fellow programmers. Coder A had come to Coder B to ask if there was a way to schedule a task with a certain piece of software. By cleverly analyzing vocal cues (Coder B said, “ummm…”), I was able to determine that Coder B probably did not know the answer.

In a situation like this, some people in Coder B’s position would say, “I don’t know,” and let Coder A get back to finding a solution to his problem. If you think the story ends this way, you probably haven’t spent a lot of time with programmers.

Programmers like to help people. Not for altruistic reasons, but for prove-their-intellectual-prowess reasons. If you come to a programmer for help, they won’t send you away empty-handed. They’ll help you. They’ll help the bejezus out of you. The rest of the day, you’ll be walking around in a daze because of how hard you got helped, and when you finally snap out of it, you’ll tell all your friends how wonderfully helpful Coder B is, and how smart and handsome he must be to have been so helpful, and everyone will love Coder B. At least, that’s the idea.

In practice, though, sometimes you don’t know something, and you can’t help someone who asks about the thing you don’t know. So how do you escape this situation while still maintaining the illusion that you are the most helpful (and intelligent and handsome) person in your company?

You do what Coder B did. Pick a related problem, and help solve that one. Did you pick a related problem that Coder A has already solved? No problem! Offer alternate solutions and explain the extremely subtle pros and cons of each one. Basically, if you’ve managed to talk about any technical thing for at least five minutes… mission accomplished.

In this case, Coder B said something along the lines of “well, if you wanted to schedule the task, it’d have to run standalone, so you’d have to modify the code in X way so it didn’t depend on thing Y.” Coder A flat out said, “I’ve already taken care of that,” but Coder B launched into a description of the different ways you could modify the code; how you could make it a command object which would take more code re-writing, or how you could write a macro to call it, which would be easier but would be kind of ugly, or blah blah blah.

Basically, Coder B started brainstorming for ways to solve an already-solved problem.

Programmers are good at finding alternate solutions and weighing the pros and cons of each one — it’s an essential skill for a programmer to have. But the dark side of that skill is situations like this. How many times have you heard a programmer ramble about the relative merits of different solutions, when a perfectly acceptable solution has already been chosen?

So how do you get a programmer to shut the hell up and move on? The problem is that once the brain train gets rolling, it’s not stopping until it runs out of brain, uh, coal. If you ever find yourself watching precious minutes dissolve as a programmer brainstorms extraneously at you, my advice is to just turn around and run away. The more tenacious coders will follow you, running down the hallway shouting about run time efficiency versus code complexity. To ensure your getaway succeeds, kick the programmer in the shin before you flee.

It’s Official

February 13th, 2008

I’m a nerd god.


NerdTests.com says I'm a Nerd God.  What are you?  Click here!

Learning Haskell with ProjectEuler

December 18th, 2007

Over the past few months, I’ve been spending some time writing code in Haskell. I could devote quite a bit of text to how much I enjoy programming in this language. Suffice it to say, every good thing you’ve ever heard about Haskell is true.

You know, probably.

What this post is really about, though, is Haskell applied to a website on which I’ve been spending some time lately: http://projecteuler.net/. The site offers a collection of mathematical problems that are meant to be solved by writing some code.

If you’re just starting with Haskell (but are an otherwise experienced programmer), this site is a fantastic source of practice. Haskell lends itself particularly well to mathy things like this, and the problems build in complexity as you go along, forcing your code to become increasingly sophisticated.

So, the first problem is pretty much a one-liner, but by the time you get through even the first ten, you’ve started using folds, cool recursions, and interesting function compositions. Many problems force you to do a lot of thinking about complexity and performance, which can be a little tricky in Functional Land if you’ve just arrived from Imperative-ville.

And if you put in some effort to maximize the elegance of your solution code, there is a lot you are likely to learn. You may even find that some functional techniques you used to think impossibly clever start feeling more like second nature. And hey, isn’t that the process you went through to learn imperative languages anyway?

If you are already familiar with Haskell, well… you can still get a lot of enjoyment by thinking smug thoughts about how your three line solution would have looked in most other languages…

Illogical Arguments

November 13th, 2007

I came across this article which echoes some of the same frustrations I’ve mentioned before.

I’m glad I’m not the only one bothered by this…

Java Generics: Too Much Work For Too Little Type Safety?

March 22nd, 2007

As of Java 5, the Java language includes generics support.

This (long overdue?) feature lets you “generify” classes and methods to work on varieties of object types, while still maintaining some level of type safety.

Use of generics can be seen all over the place in the Java Collections library. Want a list of Strings? Declare:

List<String>

Want to map an integer id to a name? Say:

Map<Integer, String>

Sounds good? Let’s do a little experimenting with the way the system works.

Let’s say you want to write a class that represents a directed graph. (’Graph” here in the graph theory sense). So, you’ll want a class that contains a collection of vertex items along with a collection of edge items; each edge connects two vertices.

The Graph class will be responsible for creating the graph structure, but users of the class can make graphs out of anything they want. For example, you could make an interstate diagram with vertices of City objects and edges of Highway objects. Or, someone may make an Automaton class that contains a Graph with vertices made out of States, and edges made out of Transitions.

The natural thing to do is make Graph generic, and take in two type parameters; one for the ‘vertex’ type, and one for the ‘edge’ type. Then, your class may be used thusly:

Graph<City,Highway> interstateMap

or

new Graph<State,Transition>()

Great. So, let’s say you’ve started using this Graph class, and you have a bunch of Graphs with VType and EType for the vertex type and edge type, respectively. Now, you want to write a method that does something to Graph<VType, EType>s. So, you create a method with a signature something like:

public void doSomething(Graph<VType, EType> graph)

But wait. You also have classes that extend VType and EType, and you want your method to work on Graphs of those, too. Calling doSomething with a Graph<VTypeExtention, ETypeExtention> isn’t allowed unless you change your method signature to:

public void doSomething(Graph<? extends VType, ? extends EType> graph)

This is starting to get ugly. But, at least we’re sure to be type-safe, right?

But watch what’s legal to do inside of this method:

public void doSomething(Graph<? extends VType, ? extends EType> graph){
        Graph graph2 = graph;
        graph2.addVertex(new Foo());
}

So for all that overhead to make sure the Graph had the right type parameters, it’s that easy to add a vertex with the wrong type.

And what’s really bad is that not only will this code make it past the compiler, but you won’t even get a runtime exception when you call addVertex. The exception will come when you try to access the Foo item and call VType methods on it. It could be much later and at an unrelated place in the execution of the program.

You can’t even do the dynamic check manually, since type parameters aren’t allowed to participate in instanceof statements.

This is all caused by the erasure system Java uses to deal with Generics. The compiler makes static checks against the type parameters, then erases the type parameter information, then generates the byte code. At runtime,

Graph<? extends VType, ? extends EType> graph

turns into

Graph graph

and any dynamic type checking is out the window.

As far as guaranteed type safety goes, you might as well have made Graph with collections of plain old Objects.

I’m sure the erasure system is in place for many good reasons (backward compatibility among them). But, at the end of the day, issues like this make me feel like Java generics generate a pile of busywork for the programmer, but fail to truly pay off.