Why Version Control Exists: Before Git (1/3)

Subject: Re: build failing on server

Hi all,

Does anyone still have parser.c from last Thursday’s build?

I’m looking for the version prior to the logging changes. The build was stable at that point, and I suspect a regression after that.

If someone has it, please send the file.

Thanks,
a tired developer, 1992


That might read like a strange mail today, but it was routine for the developers who lived through that era.

This was exactly the kind of friction that forced better tools into existence.

To understand why, we need to go back to a time when computers were still strange boxes that took their time “thinking”.

Mail and Pendrive Era

Picture yourself as a young, enthusiastic nerd in the 90s, sitting in front of your IBM PC clone running Minix. You and your friends are working on a fun project called Freax.

You know a guy who really knows his way around MASM, so you drop him a mail asking for help. He is on board, you send him the code as a zip over mail, and so the collaboration begins. Life goes on, your project is being talked about on IRC channels and, before anyone knows it, tens of people are sending over their patches for this revolutionary new software.

  • You send latest.zip to Dennis
  • Dennis replies with an updated parser.c
  • You manually pick and choose which changes to keep, then compile and run
  • Guido is joining as a new contributor, so Tim sends him project.zip
  • You notify Tim about the updated parser and send him latest_really.zip
  • Here comes Bjarne, wholly persuaded by Richard’s take on Libre licensing, and now we suddenly have I_Love_Libre.zip
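
To get a feel for the "pick and choose" step, here is a minimal sketch of comparing two unpacked archives with diff. The directory names mirror the zip names above, and the file contents are invented purely for illustration.

```shell
# Scratch directory standing in for two unpacked archives (hypothetical contents).
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir "$work/latest" "$work/latest_really"

# The copy you mailed out.
printf 'int parse(void) {\n    return 0;\n}\n' > "$work/latest/parser.c"
printf 'int main(void) { return 0; }\n' > "$work/latest/main.c"

# The copy that came back: parser.c was edited, main.c was not.
printf 'int parse(void) {\n    return 1;\n}\n' > "$work/latest_really/parser.c"
cp "$work/latest/main.c" "$work/latest_really/main.c"

# -r walks both trees, -u prints unified hunks.
# diff exits with status 1 when it finds differences, which is expected here.
changes=$(cd "$work" && diff -ru latest latest_really) || true
printf '%s\n' "$changes"
```

The output only tells you what differs between two trees. It says nothing about who changed what, in which order, or which of the many zips floating around is the newest.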

The issue is glaringly obvious.

THERE IS NO ORGANIZATION IN THE CODEBASE

Everyone has their own version of the truth. This is where early version control systems enter the picture.

Early Version Control Systems

Centralized VCS

  • There is one central repository that everyone depends on (Source of Truth)
  • You need to be connected to the server in order to make changes
  • Each developer gets a working copy, not the full repository
  • History lives on the central server

CVS

There is chatter about version control, and there is this software named CVS. You start using CVS with the help of your friend Ari.

Now you have one server hosting the project. The workflow looks something like this:

  1. Copy project from Server
    
    cvs checkout project
    

    Everyone has a local copy on their own computer, BUT meaningful actions still depend on the server. You start modifying main.c on your local filesystem.

    There is no lock on files. Did someone change main.c while you were working on it and commit before you? Well, tough luck.

  2. Update before committing
    
    cvs update
    

    This is the stage where you pull changes from the server (yes: change first, pull later).

    Remember your main.c changes? They now conflict with the latest version on the server. CVS merges what it can and marks the overlapping edits in the file; you have to manually pick and choose which changes to keep.

  3. Commit
    
    cvs commit
    

    After resolving conflicts, you commit your changes. Each file is committed independently - if something fails midway, you can end up with a partially applied change.
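
The conflict in step 2 can be reproduced without a CVS server. The sketch below uses diff3 from GNU diffutils as a stand-in for the merge that happens during an update. That is an assumption made for illustration: CVS runs its own merge code, and the file names and contents here are hypothetical.

```shell
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# base.c:   the revision you checked out
# mine.c:   your local edit
# theirs.c: what someone else committed in the meantime
printf 'init();\nrun();\nshutdown();\n' > "$work/base.c"
printf 'init();\nrun_fast();\nshutdown();\n' > "$work/mine.c"
printf 'init();\nrun_safe();\nshutdown();\n' > "$work/theirs.c"

# diff3 -m performs a three-way merge and prints the merged file.
# It exits with status 1 when the two edits overlap, as they do here.
merged=$(cd "$work" && diff3 -m mine.c base.c theirs.c) || true
printf '%s\n' "$merged"
```

Both edits touch the same line, so the merged text contains conflict markers (`<<<<<<<` and `>>>>>>>`) around the competing versions. Someone still has to open the file and decide which one survives.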

SVN

Someone on IRC suggested Subversion, so you decide to give it a try!

The workflow stays similar:

  1. Copy project from Server
    
    svn checkout project
    
  2. Lock file if you want
    
    svn lock main.c -m "Editing civilization"
    

    You can edit the file locally. Others can still modify it locally, but they cannot commit changes while the lock is held.

  3. Commit
    
    svn commit main.c -m "Update civilization"
    

    Now you can commit changes atomically. Some improvement finally!

  4. Unlock the File
    
    svn unlock main.c
    

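    The file is now available for someone else. (By default, svn commit already releases your locks on the files it commits, so an explicit unlock is mainly needed when you commit with --no-unlock or abandon your changes.)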
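
The difference between per-file commits and atomic commits can be sketched with plain directories. This is a toy model, not how either tool stores data: the `send` helper, the simulated connection drop, and the `r1`/`r2` revision directories are all assumptions made for illustration.

```shell
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Working copy with two files that must change together (hypothetical contents).
mkdir "$work/wc"
printf 'int helper(int x);\n' > "$work/wc/helper.h"
printf '#include "helper.h"\nint main(void) { return helper(0); }\n' > "$work/wc/main.c"

# Server state before the commit, once per model.
mkdir -p "$work/perfile" "$work/atomic/r1"
printf 'old\n' > "$work/perfile/helper.h"
printf 'old\n' > "$work/perfile/main.c"
cp "$work/perfile/"* "$work/atomic/r1/"

# send FILE DEST: copy one file, but pretend the connection drops on main.c.
send() {
    case "$1" in
        */main.c) return 1 ;;
    esac
    cp "$1" "$2"
}

# Per-file commit: each file lands on the server as soon as it is sent.
for f in "$work/wc/helper.h" "$work/wc/main.c"; do
    send "$f" "$work/perfile/" || break
done

# Atomic commit: build the new revision in a staging directory and publish it
# with a single rename, but only if every file arrived.
mkdir "$work/atomic/r2.tmp"
ok=yes
for f in "$work/wc/helper.h" "$work/wc/main.c"; do
    send "$f" "$work/atomic/r2.tmp/" || { ok=no; break; }
done
if [ "$ok" = yes ]; then
    mv "$work/atomic/r2.tmp" "$work/atomic/r2"
else
    rm -rf "$work/atomic/r2.tmp"
fi

perfile_helper=$(cat "$work/perfile/helper.h")
perfile_main=$(cat "$work/perfile/main.c")
atomic_revs=$(ls "$work/atomic")
echo "per-file server: helper.h='$perfile_helper' main.c='$perfile_main'"
echo "atomic server revisions: $atomic_revs"
```

In the per-file model the server ends up with the new helper.h next to the old main.c, a combination nobody ever wrote. In the atomic model the failed commit leaves no trace and the server still holds only the last complete revision.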

Issues with Centralized VCS

  1. The server is a Single Point of Failure
  2. History is stored on the server, making your working copy incomplete
  3. Branching, while technically possible, is expensive

Distributed VCS

  • No central server (No source of Truth)
  • Every local copy is a Full Repository including history and branches
  • Branches are lightweight and cheap, making them part of the everyday workflow

BitKeeper

Freax has become widespread: hundreds of contributors and thousands of lines of code.

CVS and SVN don’t scale to this level, so you decide to go distributed.

BitKeeper is a famous DVCS, but it is proprietary. Alas, the project needs it, so you compromise and start using it.

The catch is the license. Contributors are explicitly forbidden from reverse-engineering BitKeeper, and violating this condition means losing access entirely.

The latest workflow looks something like this:

  1. Clone the Repository
    
    bk clone bk://freax.bkbits.net/freax-2.5 freax
    cd freax
    

    You now have a full copy of the repository, with all history and branches.

  2. Make Changes Locally

    Edit the files you want to change. There is no need to manually lock and unlock files, and no connection to a server is required.

  3. Commit Changes
    
    bk commit
    

    Commits are local and atomic.

  4. Pull Changes
    
    bk pull
    

    Pull changes from others. This might create conflicts.

  5. Resolve Conflicts

    BitKeeper knows which changes conflict, so conflict resolution is easier.

    Merges are recorded explicitly, which makes them traceable.

  6. Push Changes
    
    bk push
    

    Your changes are now available to others. Contributors choose whether they want to pull them or not.
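
The distributed model above can be mimicked with directories to show its three key properties: a clone carries the full history, commits stay local, and pulling is a choice. This is a toy model with no real merging. The `commit` and `pull` helpers, the repository names, and the commit ids are hypothetical and have nothing to do with BitKeeper's actual storage format.

```shell
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# commit REPO ID MESSAGE: record a change in REPO only; nothing is sent anywhere.
commit() {
    printf '%s\n' "$3" > "$1/commits/$2"
}

# pull FROM INTO: copy over the commits that INTO has not seen yet.
pull() {
    for c in "$1"/commits/*; do
        [ -e "$2/commits/$(basename "$c")" ] || cp "$c" "$2/commits/"
    done
}

# The original repository with one commit.
mkdir -p "$work/origin/commits"
commit "$work/origin" c1 "initial import"

# Cloning copies the whole history, not just the latest files.
cp -R "$work/origin" "$work/dennis"
cp -R "$work/origin" "$work/tim"

# Dennis and Tim commit independently, with no server involved.
commit "$work/dennis" c2 "dennis: fix parser"
commit "$work/tim" c3 "tim: add logging"

# Tim decides to take Dennis's work; Dennis has not pulled from Tim.
pull "$work/dennis" "$work/tim"

dennis_history=$(ls "$work/dennis/commits" | xargs)
tim_history=$(ls "$work/tim/commits" | xargs)
echo "dennis: $dennis_history"
echo "tim:    $tim_history"
```

After the pull, Tim's repository holds all three commits while Dennis's holds only his own two. Neither of them needed the origin to be reachable to keep working.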

A new beginning

Freax now depends on a proprietary tool, governed by a license that could be revoked at any time.

For a project built on open collaboration, that dependency could not last…

This post is licensed under CC BY 4.0 by the author.