How to Migrate from CVS to Git: A Step by Step Guide

Software Development, Software Solutions, Git

Sometimes during a long-running client project, you get to make a big, exciting change that benefits you as a developer. Most of my career has been dedicated to helping clients evolve their projects by introducing new frameworks, migrating platforms, and so on, all while continually developing new code to add long-desired functionality for their end users. So when the day finally arrived that I was tasked with moving a code base from a CVS repository to Git, I was eager to get started.


Since I hadn't made this conversion before, some research was necessary. I already had a few key objectives in mind:

  • Minimal downtime: The migration itself should be done in a short amount of time and with few resources in order to minimize disruption of ongoing development
  • Keep history of file changes: One of the key benefits of a repository is being able to view the history of changes to code resources. With Git, we would also stop losing file history when resources are moved or renamed in the future.


There were a few choices to be made in how to accomplish this. I had a PC workstation to use and in order to get access to all the tooling necessary to make the conversion, a Linux environment would be needed.

The quickest and easiest way to do this was to use a virtual machine with Vagrant. This would allow me to do the conversion and then quickly discard the virtual environment without any traces left behind. 

After setting up my quick and easy Ubuntu Linux environment, I had to now choose the exact conversion software. After researching a few solutions, I chose to use git-cvsimport as it provided a direct path from CVS to Git without any intermediate conversions. Additionally, I found these two resources which would give me the necessary commands:

Thought Process: How do I Structure the Git repo(s)?

Now that I had my environment set up, I needed to take a step back and figure out exactly how my project would be set up in a Git repository. I needed to convert a portal project which was structured as a couple dozen individual WAR projects, each containing one or more portlets, shared code Java projects that are packaged as JAR files within the portlets and a couple of other supporting Java packages for various functions outside the portal.  

In my case, this question was answered by the way code changes are made and deployed to the portal.
Since code is released all together and any code enhancements to the portal may span a couple of different portlets, it makes sense to package this all under one repository.

Put another way, there would be a 'root' parent folder with each project existing as a subdirectory underneath it. I searched for an easy (yet proper) way of merging the repositories. I looked at various solutions such as Git submodules, but they didn't seem like the correct approach for what we were trying to accomplish. Then I found an article showing a simple merge of several repositories into one, and it looked like the simplest and most straightforward way: Merging Two Git Repositories Into One Repository Without Losing File History.

I took into consideration that if any of the shared code projects were used outside of the portal project, then it would make sense to structure the repositories differently. Also, if each portlet was treated as an autonomous unit with no dependencies and individually deployed, then it would make sense to keep each project in its own individual repository as well.


Plan of Conversion

Now that I had done all necessary reading, researching and talking with colleagues, I knew what the final structure would look like and how to implement it.



1. Install Vagrant and VirtualBox

I simply followed the directions in section 'STEP-BY-STEP GUIDE OF INSTALLING SVN2GIT' from Michael March's HOWTO article linked above, using the slightly modified Vagrantfile listed below. The changes included removing the Subversion-related packages and adding git-cvs.

# -*- mode: ruby -*-
# vi: set ft=ruby :

# shell commands run inside the VM when it is provisioned
$script = <<SCRIPT
echo I am provisioning...
date > /etc/vagrant_provisioned_at
sudo apt-get -q -y install git-core git-cvs ruby rubygems
SCRIPT

Vagrant.configure("2") do |config|
  # set the base box type
  ## Usage :: For a redhat/centos VM w/hostname jira-lrn-com.lrn.com:
  ### OPERATINGSYSTEM='redhat' vagrant up jira-lrn-com.lrn.com
  ## Usage :: For a precise64 VM w/hostname jira-isostech-com.isostech.com:
  ### vagrant up
  os_name = 'precise64'
  config.vm.box = 'precise64'
  config.vm.box_url = 'http://files.vagrantup.com/precise64.box'

  # define a named machine only when a hostname is passed on the command
  # line; a plain 'vagrant up' uses the default machine
  unless ARGV[1].to_s.empty?
    config.vm.define :"#{ARGV[1]}" do |mmconfig|
      #mmconfig.vm.box = "apache"
    end
  end

  # share the host's 'migrate' folder with the VM as /migrate
  config.vm.synced_folder Dir.pwd + "/migrate/", "/migrate"
  config.vm.network :forwarded_port, guest: 80, host: 8080
  config.vm.network :forwarded_port, guest: 443, host: 8443

  config.vm.provider :virtualbox do |vb|
    vb.customize ["modifyvm", :id, "--memory", "1096"]
  end

  # Set the timezone to the host timezone. POSIX-style Etc/GMT zone names
  # use an inverted sign (Etc/GMT+5 is five hours behind UTC).
  timezone = format('Etc/GMT%+d', -(Time.now.utc_offset / 3600))
  config.vm.provision :shell, :inline => "if [ $(grep -c UTC /etc/timezone) -gt 0 ]; then echo \"#{timezone}\" | sudo tee /etc/timezone && dpkg-reconfigure --frontend noninteractive tzdata; fi"
  config.vm.provision "shell", inline: $script
end

NOTE: If you're running Vagrant on a Windows box and you don't want to mess around with configuring the SSH client or keys, just use PuTTY or a similar SSH client and log in with following credentials:
  • user: vagrant
  • password: vagrant
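
Once logged in, it may be worth confirming that the tools the later steps rely on are actually installed. The following is a minimal sketch rather than part of the original setup: missing_tools is a hypothetical helper name, and the tool list assumes that git cvsimport needs both the cvs client and cvsps on the PATH.

```shell
# Print the name of every listed command that is not found on the PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Assumed tool list for this migration; adjust it to your environment.
missing=$(missing_tools git cvs cvsps)
if [ -n "$missing" ]; then
  echo "Not installed: $missing"
else
  echo "All conversion tools found"
fi
```

If anything is reported as missing, re-running the provisioning step or installing the package by hand should be enough before moving on.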

2. Make sure all the code is checked into the HEAD stream in CVS

Since the projects would be restructured and branching would be handled differently, we'd want everything on the main trunk so that we could re-branch later in Git. I made sure all code was checked into the CVS repository and that the production release was merged into the HEAD stream. As you can imagine, the best time to do this conversion is when you're completely done with a development release cycle.
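
One way to spot work that has not been checked in yet is to ask CVS what an update would do without actually changing anything. This is a minimal sketch: pending_cvs_changes is a hypothetical helper, and the sample status lines stand in for the output of a real working copy.

```shell
# Keep only the status lines that indicate uncommitted local work:
# M = modified, A = added, R = removed, C = conflict.
pending_cvs_changes() {
  grep -E '^[MARC] ' || true
}

# Typical use inside a checked-out working copy
# (-n makes no changes on disk, -q keeps the output quiet):
#   cvs -nq update 2>/dev/null | pending_cvs_changes

# Demonstration with sample status lines instead of a live CVS server:
printf '%s\n' 'M src/Foo.java' '? build/' 'U src/Bar.java' 'A src/New.java' |
  pending_cvs_changes
```

An empty result in every developer's working copy is a reasonable sign that nothing is left outside the repository.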


3. Create tarball of each CVS project along with CVSROOT folder

Remote access to CVS is not enough for this conversion; we need a local copy of the CVS folders (full or subset) to be converted. In this example, we tar up a subset of the remote CVS server's applicable directories.

# on the CVS server, go to the CVS directory and create a tar
# in your home directory making sure to copy all project folders and CVSROOT
cd /home/cvsroot
tar -zcvf ~/cvsRepo.tar.gz CVSROOT/ JavaProject1/ PortletWAR1/ PortletWAR2/
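
Before copying the archive anywhere, it can help to confirm that it really contains CVSROOT and every project folder. A minimal sketch, where archive_contains is a hypothetical helper and the archive is assumed to have been created from inside the CVS root as shown above:

```shell
# Succeed only if every named top-level folder appears in the archive.
archive_contains() {
  archive=$1; shift
  listing=$(tar -tzf "$archive") || return 1
  for dir in "$@"; do
    printf '%s\n' "$listing" | grep -q "^$dir/" || { echo "missing: $dir"; return 1; }
  done
}

# Example call, using the archive and folder names from this step:
#   archive_contains ~/cvsRepo.tar.gz CVSROOT JavaProject1 PortletWAR1 PortletWAR2
```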


4. Convert each project folder from CVS to individual Git repositories

Copy the newly created tar file to the 'migrate' folder within your Vagrant folder, which is shared with the Linux instance. To keep things simple, expand the tar file into the same location as on the CVS server.

# on the Vagrant instance, create the location and expand the tar file
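# (sudo is needed when logged in as the unprivileged vagrant user;
# the chown lets that user read and lock the repository files)
sudo mkdir -p /home/cvsroot
sudo chown "$USER" /home/cvsroot
tar -zxvf /migrate/cvsRepo.tar.gz -C /home/cvsroot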


5. Create author conversion file if names known

Q: Why create an author conversion file?  A: So that your Git repository will have an accurate representation of the authors of each change in the repository.
If you have the names of everyone who has committed changes to the CVS repository, this step is easy: create a new author conversion file (author-conv-file.txt) in the format shown below. The left-hand side of each line is the CVS user ID. For consistency, the name and email on the right should match the user store that your final Git repository uses, so that all new development going forward has matching names. If your CVS repository uses a different user store with different user IDs, you'll need to find out what the Git user IDs are. If you're able to create this file, skip to section 'Conversion process with cvs-import'.

# format: userID=User Name <emailaddress>
djohnson=David Johnson <djohnson@isostech.com>
msmith=Michael Smith <msmith@oreillyauto.com>

If you ran a trial conversion to discover the author names (see the next section), be sure to remove the converted folders before continuing with the final conversion.
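
A malformed line in the author file is easy to miss, so a quick format check can save a second conversion run. This is a minimal sketch: invalid_author_lines is a hypothetical helper, the regular expression is only my reading of the format shown above, and treating lines that start with # as comments is an assumption of the sketch.

```shell
# Print every non-blank, non-comment line that does not look like
#   userID=User Name <emailaddress>
invalid_author_lines() {
  grep -vE '^[[:space:]]*(#.*)?$' "$1" |
    grep -vE '^[^=[:space:]]+=.+ <[^<>]+>$' || true
}

# Example call; no output means every entry matched the expected shape:
#   invalid_author_lines /migrate/author-conv-file.txt
```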


6. Pre-conversion process with cvs-import

If you are unsure what the author names are, then you will need to do the cvsimport process twice. This first conversion is so the author names can be determined; the second is for the actual conversion.

# From your migrate location, execute the cvsimport on
# each project folder with arguments:
# -a Import all commits
# -k kill keywords (recommended)
# -d root of CVS archive passing in location as argument
# -C target passing in name of git directory to create (matching project name)
# CVS project name
# (no -A author conversion file yet; this pass only discovers the user IDs)
cd /migrate
git cvsimport -a -k -d /home/cvsroot -C JavaProject1 JavaProject1
git cvsimport -a -k -d /home/cvsroot -C PortletWAR1 PortletWAR1
git cvsimport -a -k -d /home/cvsroot -C PortletWAR2 PortletWAR2
# next, you can follow this git shortlog procedure for each project to
# get a list of author names
cd JavaProject1
# (the pattern is quoted so the shell does not strip the backslash)
git shortlog | grep -E '^\w'

After executing the git shortlog procedure for each converted project, you will see a list of user IDs for creating your author conversion file as shown in the above section 'Create author conversion file if names known', for example:

mjohnson (57):
msmith (287):
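
With a couple dozen projects, collecting the IDs by hand gets tedious. The following is a minimal sketch of one way to gather them into a starting point for the author file; author_skeleton is a hypothetical helper, and the FULL NAME and EMAIL placeholders still have to be filled in manually.

```shell
# Print one "userID=FULL NAME <EMAIL>" skeleton line per distinct author
# found in the given trial-converted repositories.
author_skeleton() {
  for repo in "$@"; do
    (cd "$repo" && git log --all --format='%an')
  done | sort -u | sed 's/$/=FULL NAME <EMAIL>/'
}

# Example call from /migrate after the trial conversion:
#   author_skeleton JavaProject1 PortletWAR1 PortletWAR2 > author-conv-file.txt
```

This relies on the trial import having used the CVS user ID as the author name, which is what the shortlog output above suggests.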


7. Conversion process with cvs-import

With your author conversion file, you can now execute the final conversion.

# From your migrate location, execute the cvsimport on
# each project folder with arguments:
# -a Import all commits
# -k kill keywords (recommended)
# -d root of CVS archive passing in location as argument
# -A author conversion file passing in the location/name as argument
# -C target passing in name of git directory to create (matching project name)
# CVS project name
cd /migrate
git cvsimport -a -k -d /home/cvsroot -A author-conv-file.txt -C JavaProject1 JavaProject1
git cvsimport -a -k -d /home/cvsroot -A author-conv-file.txt -C PortletWAR1 PortletWAR1
git cvsimport -a -k -d /home/cvsroot -A author-conv-file.txt -C PortletWAR2 PortletWAR2
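
Repeating the same command for a couple dozen projects invites typos, so a loop is a natural fit. A minimal sketch using the same arguments as above; convert_projects is a hypothetical helper, and the GIT variable exists only so the loop can be dry-run with echo.

```shell
# Run git cvsimport once per project, stopping at the first failure.
# Usage: convert_projects <cvsroot> <author-file> <project>...
convert_projects() {
  cvsroot=$1; authors=$2; shift 2
  for project in "$@"; do
    ${GIT:-git} cvsimport -a -k -d "$cvsroot" -A "$authors" \
      -C "$project" "$project" || return 1
  done
}

# Dry run (prints the commands without the leading "git"):
#   GIT=echo convert_projects /home/cvsroot author-conv-file.txt JavaProject1 PortletWAR1
# Real run from /migrate:
#   convert_projects /home/cvsroot author-conv-file.txt JavaProject1 PortletWAR1 PortletWAR2
```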


8. Restructure each project to subfolders

Since we determined all projects would be living under a new parent folder, move all contents of projects into a subfolder of the same name. This process is repeated for each project folder git repository. This can be scripted if dealing with a large number of projects.

# create the project subfolder using the same name as the project folder
cd /migrate/JavaProject1
mkdir JavaProject1
# move everything into the project subfolder except the .git directory
# and the project subfolder itself (those are grave accents)
mv `ls -A | grep -vxF -e .git -e JavaProject1` JavaProject1
# finally, stage and commit the move
git add -A .
git commit -m "Move JavaProject1 files into JavaProject1 subfolder"
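
For a large number of projects, the same move can be wrapped in a function and called once per repository. This is a minimal sketch, not the exact commands I ran: move_into_subfolder is a hypothetical helper, it uses git mv instead of a plain mv, and it assumes every top-level entry is tracked, as it is right after an import.

```shell
# Move every top-level entry of a repository (except .git) into a
# subfolder named after the repository, then commit the move.
move_into_subfolder() {
  repo=$1
  name=$(basename "$repo")
  (
    cd "$repo" || exit 1
    mkdir "$name" || exit 1
    for entry in * .[!.]*; do
      [ -e "$entry" ] || continue          # skip unmatched glob patterns
      case $entry in .git|"$name") continue ;; esac
      git mv "$entry" "$name"/ || exit 1
    done
    git commit -q -m "Move $name files into $name subfolder"
  )
}

# Example call for each converted project:
#   for p in JavaProject1 PortletWAR1 PortletWAR2; do move_into_subfolder "/migrate/$p"; done
```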


9. Create a new repository directory to hold all the projects

Next, create the final git repository that the project git repositories will merge into.

# name the folder what you want the final repository name to be
mkdir /migrate/PortalProject
cd /migrate/PortalProject
# initialize git directory and add initial file to
# enable merge (this file will be deleted later)
git init
echo "my initial commit" > initial-commit.txt
git add .
git commit -m "Initial Commit"


10. Merge each individual project into the new repository directory

Next, each project git repository will be merged into the final repository by repeating the following steps.

# from your final repository directory, add each project as
# a remote and merge it - repeat process for all
cd /migrate/PortalProject
git remote add -f JavaProject1 /migrate/JavaProject1/
# Git 2.9 and later also need --allow-unrelated-histories on the merge
git merge JavaProject1/master

Once every project has been merged, you can discard the initial commit file.

# erase your initial commit file
git rm initial-commit.txt
git commit -m "Removed initial file"
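
The remote-and-merge pair also lends itself to a loop. A minimal sketch, assuming Git 2.9 or later (on older releases, drop the --allow-unrelated-histories flag); merge_projects is a hypothetical helper and expects absolute paths to the project repositories.

```shell
# Add each project repository as a remote of the final repository and
# merge its master branch, stopping at the first failure.
# Usage: merge_projects <final-repo> <project-repo>...
merge_projects() {
  final=$1; shift
  for repo in "$@"; do
    name=$(basename "$repo")
    (
      cd "$final" &&
      git remote add -f "$name" "$repo" &&
      git merge --allow-unrelated-histories -m "Merge $name" "$name/master"
    ) || return 1
  done
}

# Example call:
#   merge_projects /migrate/PortalProject /migrate/JavaProject1 /migrate/PortletWAR1
```

Because each project was already moved into its own subfolder, the merges should not conflict with one another.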


11. Move finalized repository to git repository server

Finally, move your repository into the Git server everyone shares. This example assumes you've already set up SSH keys in order to remotely connect.

cd /migrate/PortalProject
git remote add origin git@repo.isostech.com:PortalProject.git
git push origin master
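
Before destroying the VM, it is worth confirming that the server really has everything. A minimal sketch that compares the local branch with what the remote reports; push_is_complete is a hypothetical helper and assumes the remote is named origin as above.

```shell
# Succeed only if the named branch on "origin" points at the same commit
# as the local branch of that name.
push_is_complete() {
  branch=$1
  local_sha=$(git rev-parse --verify -q "refs/heads/$branch") || return 1
  remote_sha=$(git ls-remote origin "refs/heads/$branch" | cut -f1)
  [ "$local_sha" = "$remote_sha" ]
}

# Example, run from /migrate/PortalProject after the push:
#   push_is_complete master && echo "origin/master matches local master"
```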


12. Cleanup

Now that the conversion is complete, we can stop and clean up the Vagrant instance.

# on your host machine, within the vagrant project directory
vagrant destroy
# clean up any files in 'migrate' folder

There were a couple of issues I ended up dealing with after the conversion was completed. Some of these were just a matter of getting familiar with Git, and some involved the IDE tooling for Eclipse. The restructuring of the project has an effect on how you view history with Git. By default, Git will not display any history before the point of a resource move, so in order to view old history, I had to specify '--follow' for the Git log command. Also, when viewing history in Eclipse using EGit, I was not able to do comparisons or completely view an old revision of a file before the resource move. This is a known problem and will hopefully be fixed soon. In the meantime, I can use other tools to make file comparisons, so it's not a deal breaker.
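
The effect of '--follow' is easy to reproduce in a throwaway repository. The following is a minimal sketch with made-up file names and a temporary directory; it mimics the subfolder move from the restructuring step and counts the commits that git log reports with and without the option.

```shell
# Build a scratch repository with one file that is later moved.
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.name "Demo User"
git config user.email demo@example.com
echo 'class Foo {}' > Foo.java
git add Foo.java
git commit -q -m "Add Foo"
mkdir JavaProject1
git mv Foo.java JavaProject1/Foo.java
git commit -q -m "Move Foo into JavaProject1 subfolder"

# Count the commits reported for the file at its new location.
plain=$(git log --oneline -- JavaProject1/Foo.java | wc -l | tr -d ' ')
followed=$(git log --oneline --follow -- JavaProject1/Foo.java | wc -l | tr -d ' ')
echo "without --follow: $plain commit(s); with --follow: $followed commit(s)"
```

Without the option only the move commit should show up, while '--follow' should also list the commit that originally added the file.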

And that's all folks!  I hope this blog shows not only a quick solution for coming up with an environment in which to make this CVS to Git conversion, but also some of the thought processes involved with structuring your converted projects into Git repositories. Using tools like Vagrant gave me the opportunity to repeatedly try out different scenarios with an easy way to clean up and restart the whole process with just a few keystrokes.  Give it a try!

Special thanks to Scott Smith, Michael March and Kevin Brandon for discussing with me all things Git.
