
Monday, August 31, 2009

Source Control Management 201 - Repository Design for efficient code management using any source control

Source control management is an essential part of development. It should be just as critical a part of the developer’s toolbox as a good text editor. No project is too small to deserve one. There is no such thing as a bad source control management system. While there are probably hundreds of SCM systems out there, some are more developer friendly than others. Some of the more common ones you hear about today are Subversion, CVS, Git, Visual SourceSafe, and Rational ClearCase.

A goal of a source control system is to answer the following questions:

  • What code is currently running in production?
  • What is the latest code a developer should be looking at?
  • What code was used to compile version X.XX.XXX?
  • How can a developer make changes without affecting other developers?
  • What changed between version X.XX.XXX and X.XX.XXY?
  • How can a developer incorporate changes between X.XX.XXX and X.XX.XXY to make X.XX.XXZ?
  • How can a developer rollback changes from X.XX.XXY to get back to X.XX.XXX?

All source control systems that I’ve worked with can answer all of the above questions. It does take some organizational skill on the developer’s part to allow them to do so. An SCM system stores code in a repository. It is up to the developers and the SCM team to organize the layout of the repository. One of the more common strategies is trunk focused.


All development is done on the trunk. When the code is stable enough to be ready for QA/production testing, it is branched into a Beta/Release Candidate (RC) branch, and the trunk version is incremented. The RC branch should have only minor bug-fix changes applied to it. All major changes are done in the trunk. As bugs are fixed in the RC branch, the changes are migrated into the trunk. When a version is released, the RC branch is moved into a Release branch. CHANGES ARE NEVER MADE IN THE RELEASE BRANCH. IT IS READ ONLY! If a bug fix must be made to an already released version, a new branch is created from it, and the change is made on that branch. The fix is then merged into the trunk.

This repository layout makes it easy to answer the developer’s questions:

  • What is the latest code I should develop from: Trunk
  • What code is currently running in production: Latest Read Only branch
  • How to make changes to version X.XX.XXX: Make a new branch from the version X.XX.XXX. After finishing your changes, merge them into the trunk.

Another strategy is to organize the repository around the production version.


The main trunk holds the currently released production version. The trunk IS READ ONLY! For development, a branch is created, and development is done on the branch. After the version is released to production, the trunk is replaced with a copy of the released branch and is again made read only. To develop the next version, another branch is created. If a fix has to be made to the production release, a branch for the fix is created. Once the fix is released, a new trunk is created from the fix branch, and the changes are merged into the branch currently under development.

This layout answers the same questions slightly differently:

  • What code is currently running in production: Trunk
  • What is the latest code I should develop from: Latest branch
  • How to make changes to version X.XX.XXX: Make a new branch from the version X.XX.XXX. After the bug fix is released into production, the branch is copied and becomes the new trunk. The changes are merged into the latest development branch.
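
The answers for the two layouts can be put side by side in a small lookup table. This is only an illustrative sketch: the layout names, the keys, and the `where_is` helper are made up here and do not correspond to the API of any SCM tool.

```python
# Hypothetical model of the two repository layouts described above.
# The names and keys are illustrative, not part of any SCM system.
LAYOUTS = {
    "trunk_focused": {
        "latest_development": "trunk",
        "production": "latest read-only release branch",
        "fix_merges_into": "trunk",
    },
    "production_focused": {
        "latest_development": "latest development branch",
        "production": "trunk",
        "fix_merges_into": "latest development branch",
    },
}


def where_is(layout, question):
    """Return the repository location that answers a question for a layout."""
    return LAYOUTS[layout][question]


for name in LAYOUTS:
    print(name, "-> production code lives in:", where_is(name, "production"))
```

Whichever layout a team picks, the point is that every developer gets the same answer to each question without having to ask.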

There are other repository layouts available, and your team might adapt either of the above structures to its own needs.



Sunday, August 09, 2009

Multithreading will only take you so far

Working with Oracle Coherence, I do a lot of thinking about distributed architecture, parallel processing, and multithreading. Making use of all this technology is a great way to solve many problems. It can often seem that as long as you can split your problem into small enough pieces, you’ll be able to process data instantaneously.

When thinking about distributed systems, we often forget that even after making a system distributed, we still have a limited number of resources across which to distribute the workload. The application is deployed on a specific number of machines, each with a specific number of CPU cores. Each CPU core can process one instruction at a time (not completely true, but for simplicity’s sake). For example, in a distributed system with a total of 4 CPU cores, where each workload takes 1 second to process and 10 workloads need to be processed, it will take at least 3 seconds to process all of them: two full rounds of 4 workloads, plus a third round for the remaining 2.


Adding more threads will not help once every core is busy; only adding more CPU cores will. That is critically important when you consider that for many complex operations, the result of an individual workload is not enough to provide a meaningful result to the end user. Results from all requested workloads have to be aggregated to create a final result. This makes it relatively simple to figure out how long a calculation will take:

Total Time = (Number of Workloads / Number of Cores, rounded up) * Time Taken per Workload
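
The formula can be checked with a few lines of Python. This is a minimal sketch: the function name `total_time` and its parameters are chosen here for illustration, and the division is rounded up on the assumption that a partially filled round of workloads still takes a full workload’s time. The 4-core figure in the usage line is also an assumption, picked because it reproduces the 3 seconds from the earlier example.

```python
import math


def total_time(workloads, cores, seconds_per_workload):
    """Estimate wall-clock time when each core runs one workload at a time."""
    # A partially filled last round still costs a full workload's time,
    # so the number of rounds is rounded up.
    rounds = math.ceil(workloads / cores)
    return rounds * seconds_per_workload


# 10 workloads of 1 second each on an assumed 4 cores.
print(total_time(10, 4, 1.0))  # prints 3.0
```

Doubling the thread count changes none of the inputs, which is why it does not change the result; doubling the cores to 8 brings the estimate down to 2 seconds.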

Another very important point to remember is that during performance testing, distributed systems behave differently. Requests from multiple clients will interfere with each other a lot more than they do in a straight processing system. In the prior example, a request that takes 3 seconds when run from 1 client can take 6 seconds with 2 clients, if the last work item for the first client is started after all work items of the second client.


That will probably not happen, yet you have to take the possibility into account. That is just the nature of the beast.
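
The interference between clients can be explored with a small simulation. This is a sketch under stated assumptions, not a model of any real scheduler: the `completion_times` function, the client labels "A" and "B", the 4 cores, and the first-come-first-served hand-out of workloads to whichever core frees up first are all chosen here for illustration.

```python
import heapq


def completion_times(queue, cores, seconds_per_workload=1.0):
    """Return the time at which each client's last workload finishes.

    queue lists client labels in the order their workloads are started.
    Each workload goes to whichever core becomes free first.
    """
    free_at = [0.0] * cores  # min-heap of times at which each core is free
    heapq.heapify(free_at)
    finished = {}
    for client in queue:
        start = heapq.heappop(free_at)
        end = start + seconds_per_workload
        heapq.heappush(free_at, end)
        finished[client] = max(finished.get(client, 0.0), end)
    return finished


# Client A alone: 10 workloads on 4 cores.
alone = completion_times(["A"] * 10, cores=4)

# Client A's last workload is queued behind all 10 of client B's.
shared = completion_times(["A"] * 9 + ["B"] * 10 + ["A"], cores=4)

print(alone["A"], shared["A"])  # prints 3.0 5.0
```

Under these assumptions, client A's request stretches from 3 seconds to 5 because idle cores are always reused immediately. The 6-second figure above corresponds to the stricter case where the two requests do not share cores within a round, so client B's three rounds run in full before client A's final round starts.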

