I’ve been a proponent of version control since my very first interaction with CVS. And although the concept of using CVS on a project nowadays makes me shudder, I can’t deny that I have respect for its role in introducing me to version control.
When I was working for the University of Georgia, I was introduced to Subversion. Subversion was, and still is, a powerful version control tool. However, recently I’ve come to prefer Git due to the increased flexibility that it provides. There are some things that are possible with Git that aren’t doable with Subversion, and in my humble opinion these features can make managing multiple copies of an application much, much easier.

In today’s article I’m going to go over a few introductory Git terms, and then I’m going to give some tips on two cool features in Git that have made my life easier. Both of these features are things that Subversion just couldn’t do. This article is not an attempt at a fair comparison between Subversion and Git, so if that’s what you’re looking for, then look elsewhere.

Now, let’s get started.

Git Terminology: Bare Repositories, Remotes, Push and Pull, etc.

In order for what I’m going to explain to make any kind of sense, I’ve got to explain a few Git concepts. If you’re already fluent in Git, feel free to skip ahead. I’m going to explain these terms from the point of view of someone who already understands Subversion. If you don’t already understand Subversion, then you’ll probably be a little lost – but don’t be afraid to use the Google machine and get caught up with the rest of the class.

The first thing that I need to explain is the concept of push and pull. If you’ve worked in Subversion for a while, you understand that when you run a commit, Subversion takes your changes and uploads them to the Subversion repository on your organization’s version control server. Git works a little bit differently. Unlike in Subversion, your working copy is a bona fide Git repository. So, when you commit your changes, you’re not doing anything on the network. You’re just updating your local Git repository.
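To make that concrete, here’s a minimal sketch of a purely local commit. The directory, file name, and identity are made up for the demo, and it assumes a reasonably recent Git on your PATH.

```shell
set -e
demo=$(mktemp -d)                      # throwaway directory for the demo
git init -q "$demo/app"
cd "$demo/app"
git config user.name "Demo User"       # hypothetical identity, demo only
git config user.email "demo@example.com"

echo "<?php echo 'hello';" > index.php
git add index.php
git commit -q -m "Add index.php"

# No remote is configured, yet the commit is already part of the history.
remote_count=$(git remote | wc -l | tr -d ' ')
commit_count=$(git rev-list --count HEAD)
echo "remotes: $remote_count, commits: $commit_count"
```

Nothing here touches the network; the commit lives entirely inside the .git directory of the working copy.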

Of course, version control doesn’t do much good if you can’t share your changes with your team members. So, Git introduces a few new commands. The push command publishes the commits in your repository to a Git repository elsewhere. That repository can be reached over straight SSH, or you may have some sort of Git server (e.g. gitolite or gitosis) running that your client talks to. Much like Subversion, you can’t publish any changes unless your local branch already contains the latest commits from that repository. (If you try, the push will be rejected.) So, that’s where pull comes in. It allows you to retrieve changes published to the shared repository and merge them with your current repository.
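Here’s a rough sketch of that push, reject, pull, push cycle. To keep it self-contained, the “shared” repository is just a bare repository in a temp directory, and Alice and Bob are hypothetical teammates; on a real project the remote would live on a server.

```shell
set -e
demo=$(mktemp -d)
git init -q --bare "$demo/central.git"             # stand-in for the shared repository
git --git-dir="$demo/central.git" symbolic-ref HEAD refs/heads/master

# Alice creates the first commit and publishes it.
git init -q "$demo/alice"
cd "$demo/alice"
git symbolic-ref HEAD refs/heads/master            # name the initial branch master
git config user.name "Alice"; git config user.email "alice@example.com"
echo "one" > notes.txt
git add notes.txt
git commit -q -m "First commit"
git remote add origin "$demo/central.git"
git push -q origin master

# Bob clones, commits, and pushes.
git clone -q "$demo/central.git" "$demo/bob"
cd "$demo/bob"
git config user.name "Bob"; git config user.email "bob@example.com"
echo "two" > bob.txt
git add bob.txt
git commit -q -m "Bob's commit"
git push -q origin master

# Alice commits without pulling first, so her push is rejected.
cd "$demo/alice"
echo "three" > alice.txt
git add alice.txt
git commit -q -m "Alice's commit"
if git push -q origin master; then
    push_result="accepted"
else
    push_result="rejected"
fi

# Pulling merges Bob's work into hers, after which the push goes through.
git pull -q --no-rebase --no-edit origin master
git push -q origin master
```

The --no-rebase flag just tells newer versions of Git to merge rather than ask how to reconcile the two histories.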

In Git, these shared repositories are called remotes. However, not all remotes have to be shared among several people. A remote is simply some other Git repository, usually on another machine, that has a shared history with your Git repository. By default, the “official” repository that your code was cloned from is named “origin”, but you can name remotes whatever you like. Additionally, you can interact with remotes over several transports (a local file path, SSH, the native git:// protocol, or HTTP/HTTPS), making this feature really flexible.
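As a small sketch of how remotes are named and listed: the two bare repositories below are local stand-ins I made up for the demo, and “backup” is an arbitrary name. With a real server you would use a URL such as ssh://user@host/path/to/remote/repo instead of a local path.

```shell
set -e
demo=$(mktemp -d)
git init -q --bare "$demo/central.git"     # hypothetical shared repository
git init -q --bare "$demo/backup.git"      # hypothetical second remote
git init -q "$demo/app"
cd "$demo/app"

git remote add origin "$demo/central.git"  # "origin" is only a convention
git remote add backup "$demo/backup.git"   # any name works

git remote -v                              # list every remote with its URL
backup_url=$(git config --get remote.backup.url)
```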

As I mentioned, remotes are Git repositories. However, most remotes are a special kind of Git repository known as a bare repository. Your working directory on your computer would be a non-bare repository. The distinction is simple: non-bare repositories are places where code needs to be edited or run, while bare repositories are strictly for storage – and thus don’t have decompressed, complete copies of your source files readily available. Bare repositories only consist of the Git repository database. The files that make up your application are not extracted in editable form.
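You can see the difference on disk with a quick experiment. This is only a sketch; the directory names are invented for the demo.

```shell
set -e
demo=$(mktemp -d)
git init -q "$demo/working"                # non-bare: your files plus a .git directory
git init -q --bare "$demo/storage.git"     # bare: only the repository database

working_is_bare=$(git -C "$demo/working" rev-parse --is-bare-repository)
storage_is_bare=$(git -C "$demo/storage.git" rev-parse --is-bare-repository)
echo "working: $working_is_bare, storage: $storage_is_bare"

ls -a "$demo/working"                      # .git sits next to where source files go
ls "$demo/storage.git"                     # HEAD, objects, refs, ... at the top level
```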

Finally, Subversion has the concept of a trunk and branches. In Git, there are only branches, and by convention the master branch is the equivalent of the trunk in SVN.

Now that we’ve got the basics down, let’s get rolling with the fun stuff!

Code Management: Use the Stash

One of the unfortunate things about PHP development is that you have to store database and application startup settings somewhere. And if you have a half-decent system admin, the login credentials to the database on your development machine will not be the same as the login credentials on the production server. Usually, the database login credentials and database names will vary from development machine to development machine as well. If any of these machine-specific changes make it into the central repository, then someone’s code is going to be broken or conflict when they update or pull.

At Boxkite, we solved this problem by making database.dist.php a versioned file for database settings. Then we required developers to create their own database.php in the same directory to get the app working. We then used the svn:ignore setting to make sure that the user-specific database file never made it up to the central repository.

This strategy works fine. You can even take it a step further in Git by not requiring the separate file. You could store the production settings in database.php, then when users clone the repository they could change it to support their local settings and run:

git update-index --assume-unchanged path/to/config/database.php

– and voila! The database settings won’t be committed.
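If you do go this route, it helps to know how to spot and undo the flag later. Here’s a sketch in a scratch repository; the config/database.php path and the file contents are placeholders.

```shell
set -e
demo=$(mktemp -d)
git init -q "$demo/app"
cd "$demo/app"
git config user.name "Demo User"; git config user.email "demo@example.com"
mkdir config
echo "production settings" > config/database.php
git add config/database.php
git commit -q -m "Add database settings"

# Make a local edit, then tell Git to assume the file is unchanged.
echo "my local settings" > config/database.php
git update-index --assume-unchanged config/database.php
status_hidden=$(git status --porcelain)        # the edit no longer shows up

# Flagged files are marked with a lowercase letter in `git ls-files -v`.
flagged=$(git ls-files -v | grep '^h')

# Clear the flag when you want Git to notice the file again.
git update-index --no-assume-unchanged config/database.php
status_visible=$(git status --porcelain)
```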

Recently, I’ve been staying away from this convention for a few reasons.

  1. If someone updates database.php with real, useful changes (additional config parameters, etc, etc) you’ve got a bit of a mess on your hands.
  2. Something doesn’t feel right about telling Git to lie about the status of the file. Doesn’t seem like a clean solution. Maybe I’m just neurotic.
  3. It also seems like doing this is something that could cause a new intern or someone else headaches down the road, because it’s not immediately obvious what has happened when you are looking at the output of Git’s status command.

If we were in Subversion, we wouldn’t have much of a choice. However, Git introduces a new concept called the stash. To do a merge on a file, say foo.php, that file can’t have uncommitted changes sitting out in your working directory. Git will complain and ask you to either stash or commit the file.

Now, if you’re currently working on foo.php and you’re not entirely sure you want to share those changes yet, a commit is a bad idea. After a commit, your changes are a part of the history of the repository. Instead, I recommend you utilize the stash, which is a part of your local repository that’s not shared with any remotes.

The stash is a stack, so you can push things on and pop them off in last-in-first-out order. Optionally, you can also explicitly recall a particular entry and apply it. It’s dead simple to use. Running git stash will stash all uncommitted changes to tracked files in your working directory, reverting them to how they appeared at the last commit. This allows you to pull, merge, and do whatever you like. Then, when you’re ready to have that work-in-progress back, you simply run git stash apply and Git will redo those changes in the files in your working directory. (The entry stays on the stack after an apply; git stash pop applies it and removes it in one step.)
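A quick sketch of the round trip, using a scratch repository and a placeholder foo.php. I’m using git stash pop here instead of apply so the stack ends up empty again.

```shell
set -e
demo=$(mktemp -d)
git init -q "$demo/app"
cd "$demo/app"
git config user.name "Demo User"; git config user.email "demo@example.com"
echo "committed version" > foo.php
git add foo.php
git commit -q -m "Add foo.php"

echo "work in progress" > foo.php      # an uncommitted edit
git stash                              # shelve it; foo.php is back at the last commit
after_stash=$(cat foo.php)

git stash list                         # newest entry is listed first as stash@{0}

# ...pull, merge, or switch branches here...

git stash pop                          # reapply the newest entry and drop it
after_pop=$(cat foo.php)
remaining=$(git stash list | wc -l | tr -d ' ')
```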

Deployments: Push to Non-Bare Repositories

If everything in Git is a repository, then it makes sense that if you have a web application running on a server somewhere then there’s a good chance that the actual web directory of the project will, itself, be a Git repository. (Much like it would be a Subversion working directory if you were using svn.) This makes a lot of sense, because it means you can push changes to the central Git repository, and then pull those changes down to the production server like you would with Subversion.

However, Git allows you to do something much more powerful than this. Git allows you to push directly to the copy of the application on the live server. This may seem like a subtle distinction, but there are some benefits.

  1. If you work for a company like mine that has an edge firewall around our network (where the central Git server is) and some of your clients are hosted offsite, then you don’t have many other options.
     - Configure your client’s server to connect to your company’s VPN (which probably wouldn’t make the security guys too happy)
     - Manually FTP files. (bleh)

  2. You don’t have to log into the server and CD to the directory to do the update.
     - May need to enter your password unless you have SSH keys set up, but still faster than manually navigating to the directory in the terminal.
     - Mentally, this means you don’t have to deal with a change of context. You’re still located in your working directory when you’re finished.

  3. You can script actions that happen automatically when you push to the directory using the post-receive hook.
     - Most of the time, you’ll probably just want to script the git reset command so your changes get unpacked to the working directory. But Ruby on Rails developers can use Git to run touch tmp/restart.txt and reload the application server and make the new code live.

Setting this process up is fairly simple. You’ve got to create the Git repository on the remote machine, then add the remote to your local repository. In brief, it looks something like this.

On the remote machine:

  1. Create the directory you want the project to live in on this machine. CD into the directory.
  2. Run git init
  3. Run git config receive.denyCurrentBranch ignore so that this repository will accept pushes to its checked-out branch.
  4. CD to .git/hooks, create a new file named post-receive, and make it executable (chmod +x post-receive). Then, in that file add the following lines to tell Git to update the working copies of the files every time an update is pushed:

#!/bin/sh
GIT_WORK_TREE=.. git reset --hard

Then, on your machine:

  1. CD to your local Git repository for the project.
  2. If you’re using SSH then you’ll need to add a remote using a command like this (I’ll assume the remote is named “production”):
    git remote add production ssh://user@host/path/to/remote/repo
  3. Then, all you need to do is push your master branch to the production repository:
    git push production master

Your code should now be on your server ready to run! It’s worth mentioning that you’re not restricted to using Git commands in the post-receive hook. You can execute whatever you need to get the job done and automate getting your code live!
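If you’d like to try the whole flow before touching a real server, here’s a sketch that plays both roles on one machine. The “server” is just a second directory, so the remote URL is a local path rather than ssh://user@host/..., and the file names are made up. The extra touch line in the hook stands in for whatever restart step your app needs.

```shell
set -e
demo=$(mktemp -d)

# "Remote machine": a non-bare repository that accepts pushes to its checked-out branch.
git init -q "$demo/server"
cd "$demo/server"
git symbolic-ref HEAD refs/heads/master
git config receive.denyCurrentBranch ignore
mkdir -p .git/hooks
cat > .git/hooks/post-receive <<'EOF'
#!/bin/sh
# Hooks triggered by a push run inside .git, so the work tree is one level up.
GIT_WORK_TREE=.. git reset --hard
# Any other command can follow, e.g. a Rails-style restart marker.
mkdir -p ../tmp && touch ../tmp/restart.txt
EOF
chmod +x .git/hooks/post-receive

# "Your machine": commit locally, add the remote, and push.
git init -q "$demo/local"
cd "$demo/local"
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"; git config user.email "demo@example.com"
echo "<?php echo 'live';" > index.php
git add index.php
git commit -q -m "First deploy"
git remote add production "$demo/server"
git push -q production master

deployed=$(cat "$demo/server/index.php")   # the file now exists in the server's work tree
```

One design note: newer versions of Git also offer receive.denyCurrentBranch updateInstead, which updates the work tree on push without a hook, so it may be worth a look if your server’s Git supports it.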

Conclusion

This has admittedly been a very brief overview of two particular aspects of Git that I’ve really enjoyed. At under 2000 words, I don’t think I really did the power of these features justice, but hopefully I did whet your appetite to learn more about Git if you’re not already a user. If you’re new to Git, hopefully this has reaffirmed your decision to make the jump from wherever you’re coming from.

As always, leave me some comment love with your thoughts.