Source Repository Rules

Revision control, also known as version control or source control (and a part of software configuration management), is the management of changes to documents, source code, and other files. As release engineers, we work with a source repository every day. So, what should NOT be under source control?

Today I read a post on a blog about the JDK build. It simply lists a few commandments for dealing with software repositories and the build results of those repositories. Here are its commandments:

  1. There shalt not be binary files in the repository.
    Binary files (executables, native libraries, zip files, jar files, etc.) are NOT source and should not be managed in a source repository.
  2. Keep thy path names simple.
    Directory names and filenames in the repositories should never contain blanks or non-printing characters. Certain characters such as ‘$’ should also be avoided.
  3. There shall be one newline convention.
    The contents of all source files should follow the standard Unix conventions on newlines (no ^M's).
  4. Generated source files shall not be added as managed files.
    Source files generated during the build process should not be managed files in a repository.
  5. All output from the build shall be kept separated from the source.
    All files generated during the build should land in a well defined output only directory such as build/ or dist/. The src/ directory should never get written to during the build process.

Basically I agree with these 100%, but I want to add some examples drawn from material I found on the Internet. (SCM is a big topic; the commandments above are simple and far from complete. I know people just want to put everything under source control now that storage is cheap. 🙂 )

A general rule of source control: every file written by a person should be under control. Besides source code, that includes build description files, configuration files, etc. Conversely, anything generated automatically by the build system (target files, executables, binaries, bytecode, code or documents generated from XML) should not be under source control. However, 3rd-party libraries that you don't have the source for, or don't build yourself, should be under control.

You should only source control those files that:

  • ( need revision history OR are created outside of your build but are part of the build, install, or media ) AND
  • can’t be generated by the build process you control AND
  • are common to all users that build the product (no user config)

The list includes things like:

  • source files
  • make, project, and solution files — build files & project configuration files
  • other build tool configuration files (not user related)
  • 3rd party libraries
  • design documentation
  • description files like WSDL, XSL

For example, in the world of Java, what should not be checked in is anything that is generated from the items you do check into source control.

Things that should be checked in:

  1. Source files (.java, and other languages)
  2. 3rd party JARs
  3. Configuration XML or .properties
  4. HTML, CSS, JSPs for web apps
  5. SQL scripts
  6. Design (UML) and documentation (Word or HTML)
  7. Unit test classes and any test data

Things that should not be checked in:

  1. Compiled .class files
  2. Generated JAR or WAR files except those 3rd party JARs
  3. javadocs
  4. JUnit report HTML and results

To avoid checking such unnecessary files into your repository, you can define the rules in your version control client. Take GitHub as an example: you can define ignored files in a .gitignore file, and GitHub's collection of .gitignore templates is also a very good reference for which types of files to ignore in each language. For example, the templates cover:

  • Eclipse: external tool builders, locally stored launch configurations, CDT-specific files, and PDT-specific files.
  • Visual Studio: user-specific files, build results, MSTest results, Visual C++ cache files, profiler output, files generated by add-ons such as the Guidance Automation Toolkit, ReSharper, TeamCity, DotCover, NCrunch, InstallShield, and DocProject, plus Click-Once directories, published web output, the NuGet packages directory (unless NuGet Package Restore is enabled), Windows Azure build output, Windows Store app packages, RIA/Silverlight build products, backup and report files left over from project-file conversions, and SQL Server files.
  • Windows detritus: image file caches, folder config files, the Recycle Bin used on file shares, and Mac droppings.
  • Python: packages, installer logs, unit test / coverage reports, and Mr Developer files.

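In the same spirit, a minimal Java-oriented .gitignore might start like this (the patterns below are common examples I chose for illustration, not GitHub's full template):

```shell
# Sketch: write a minimal .gitignore for a Java project.
# The patterns are illustrative, not GitHub's full template.
cat > .gitignore <<'EOF'
# Compiled classes and generated archives -- build output, not source
*.class
*.war
*.jar
# ...but keep checked-in 3rd-party JARs
!lib/*.jar

# Build output directories
build/
dist/
target/

# Generated javadocs and test reports
javadoc/
EOF
```

The `!lib/*.jar` negation shows how to reconcile "no binaries in the repository" with "check in 3rd-party JARs": generated archives are ignored, while vendored libraries in lib/ stay tracked.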
How Compilers Build Software

Yesterday I refreshed my memory of the various source file types and how to build each of them with the corresponding tools. In this post I will summarize my study notes.

In this post I will use C as the sample language: C is higher level than assembly language, yet closer to the OS than Java, which makes it a good example.

Compiler for C,

There are various C compilers, and the most famous is the GNU Compiler Collection (GCC), a compiler system produced by the GNU Project. GCC is a key component of the GNU toolchain, the blanket term for the collection of programming tools produced by the GNU Project; in other words, GCC compiles code by chaining a series of tools together. GCC consists of the components below:

1. C preprocessor – The C preprocessor implements the macro languages used to transform C, C++ and other programs before they are compiled.

2. C compiler – The C compiler compiles source codes into assembly language.

3. assembler – The assembler translates assembly language into a target file (an object file containing binary code).

4. linker – The linker links target files into a single executable program.

More details about GNU GCC,

In the following examples, I will show you how to compile C codes via GCC. My demo platform is,

luhuang@luhuang-VirtualBox:~/workspace/Hello$ uname -a
Linux luhuang-VirtualBox 3.0.0-32-generic-pae #51-Ubuntu SMP Thu Mar 21 16:09:48 UTC 2013 i686 i686 i386 GNU/Linux

Source code:

Let’s see our material firstly:


/* main.c */
#include "hello.h"

int main(int argc, char *argv[]){
	if (MAX(1,2) == 2){
		hello("Hello!");
	}
	return 0;
}

/* hello.c */
#include <stdio.h>
#include "hello.h"

void hello(const char *string){
	printf("Greeting %s\n", string);
}

/* hello.h */
extern void hello(const char *string);

#define MAX(a,b) ((a) > (b) ? (a) : (b))

Let’s see what above three files will do:

1. In main.c, the first line tells the C compiler to include 'hello.h'.

2. hello.h defines a macro 'MAX', and main.c invokes the MAX macro.

3. hello.c includes two header files. The first, the standard stdio.h, provides the printf function.

4. hello.h declares the function prototype of hello() and defines the macro MAX(a,b).

Ok, let’s compile them!

1. let’s go to source dir,

luhuang@luhuang-VirtualBox:~/workspace/Hello$ ls
hello.c hello.h main.c

You can see it contains only the three source files described above.
2. Compile it. The option -c means compiling source code into a target file. Behind the scenes, GCC invokes the C preprocessor, the C compiler, and the assembler in sequence. In C, the basic unit of compilation is a source file (.c) together with its header files (.h), and the resulting target file's name ends with .o. A header file by itself does not generate a .o file.

luhuang@luhuang-VirtualBox:~/workspace/Hello$ gcc -c hello.c
luhuang@luhuang-VirtualBox:~/workspace/Hello$ gcc -c main.c
luhuang@luhuang-VirtualBox:~/workspace/Hello$ ls
hello.c  hello.h  hello.o  main.c  main.o

3. If you want to invoke the preprocessor explicitly, use the option -E. It tells GCC to process only the source code's #include directives and macros, without doing any compilation. In the following example you can see it replaces

	if (MAX(1,2) == 2){

with

	if (((1) > (2) ? (1) : (2)) == 2){

Let’s see how it preprocesses main.c,
luhuang@luhuang-VirtualBox:~/workspace/Hello$ gcc -E main.c

# 1 "main.c"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "main.c"
# 1 "hello.h" 1
extern void hello(const char *string);
# 2 "main.c" 2

int main(int argc, char *argv[]){
 if (((1) > (2) ? (1) : (2)) == 2){
  hello("Hello!");
 }
 return 0;
}

4. Let’s see how GCC compile source codes into assembly code with option -S. As I said above, GCC works in the way of toolchain. That is to say, if you invoke -S, it will do preprocessor -E firslty. Let’s see below example,

luhuang@luhuang-VirtualBox:~/workspace/Hello$ gcc -S hello.c
luhuang@luhuang-VirtualBox:~/workspace/Hello$ cat hello.s
	.file	"hello.c"
	.section	.rodata
.LC0:
	.string	"Greeting %s\n"
	.text
	.globl	hello
	.type	hello, @function
hello:
	.cfi_startproc
	pushl	%ebp
	.cfi_def_cfa_offset 8
	.cfi_offset 5, -8
	movl	%esp, %ebp
	.cfi_def_cfa_register 5
	subl	$24, %esp
	movl	$.LC0, %eax
	movl	8(%ebp), %edx
	movl	%edx, 4(%esp)
	movl	%eax, (%esp)
	call	printf
	leave
	.cfi_restore 5
	.cfi_def_cfa 4, 4
	ret
	.cfi_endproc
	.size	hello, .-hello
	.ident	"GCC: (Ubuntu/Linaro 4.6.1-9ubuntu3) 4.6.1"
	.section	.note.GNU-stack,"",@progbits

5. After step 4 we have the source code's assembly. Let's go one step further and generate the target file, again using the option -c:

luhuang@luhuang-VirtualBox:~/workspace/Hello$ gcc -c hello.c
luhuang@luhuang-VirtualBox:~/workspace/Hello$ file hello.o
hello.o: ELF 32-bit LSB relocatable, Intel 80386, version 1 (SYSV), not stripped

Here I use Linux's file command to inspect hello.o's file type. You can see that gcc -c generated target code in 32-bit, least-significant-byte-first, Intel x86 format. (Yes, target code cannot run across platforms 😦 )

Another way to inspect a target file is to check which symbols it defines and references. Here we use the nm command, which is very useful when you need to track down an 'undefined symbol' build error. In the following example you can see hello.o defines hello() and references printf(), which matches the source code.

luhuang@luhuang-VirtualBox:~/workspace/Hello$ nm hello.o
00000000 T hello
         U printf

Unix also provides the objdump command to retrieve detailed information about a target file. Below, the -x option prints hello.o's summary information:

luhuang@luhuang-VirtualBox:~/workspace/Hello$ objdump -x hello.o

hello.o:     file format elf32-i386
architecture: i386, flags 0x00000011:
start address 0x00000000

Idx Name          Size      VMA       LMA       File off  Algn
  0 .text         0000001c  00000000  00000000  00000034  2**2
  1 .data         00000000  00000000  00000000  00000050  2**2
                  CONTENTS, ALLOC, LOAD, DATA
  2 .bss          00000000  00000000  00000000  00000050  2**2
  3 .rodata       0000000d  00000000  00000000  00000050  2**0
  4 .comment      0000002b  00000000  00000000  0000005d  2**0
                  CONTENTS, READONLY
  5 .note.GNU-stack 00000000  00000000  00000000  00000088  2**0
                  CONTENTS, READONLY
  6 .eh_frame     00000038  00000000  00000000  00000088  2**2
00000000 l    df *ABS*	00000000 hello.c
00000000 l    d  .text	00000000 .text
00000000 l    d  .data	00000000 .data
00000000 l    d  .bss	00000000 .bss
00000000 l    d  .rodata	00000000 .rodata
00000000 l    d  .note.GNU-stack	00000000 .note.GNU-stack
00000000 l    d  .eh_frame	00000000 .eh_frame
00000000 l    d  .comment	00000000 .comment
00000000 g     F .text	0000001c hello
00000000         *UND*	00000000 printf

6. OK. Now we know how GCC preprocesses, compiles, and assembles source code into target files. Let's generate an executable from these target files. Here we use the option -o (so the full chain is: -E, -S, -c, and finally linking with -o):

luhuang@luhuang-VirtualBox:~/workspace/Hello$ gcc -o hello hello.o main.o
luhuang@luhuang-VirtualBox:~/workspace/Hello$ file hello
hello: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.15, not stripped

You might notice 'dynamically linked (uses shared libs)' in its file information: the executable is dynamically linked.

7. Run it.

luhuang@luhuang-VirtualBox:~/workspace/Hello$ ./hello
Greeting Hello!

ldd – print shared library dependencies

luhuang@luhuang-VirtualBox:~/workspace/Hello$ ldd hello
	linux-gate.so.1 =>  (0xb7796000)
	libc.so.6 => /lib/i386-linux-gnu/libc.so.6 (0xb7603000)
	/lib/ld-linux.so.2 (0xb7797000)

libc.so.6 is the standard C library, which provides functions like printf.

8. Linux supports both statically linked and dynamically linked libraries. Let's see how each of them works:

How a static linked library works:

1. gcc -c hello.c compiles hello.c to hello.o.

2. ar -rs archives hello.o into a static library, myhello.a.

3. ar -t lists which .o files have been archived.

4. gcc -c main.c compiles main.c to main.o.

5. gcc -o hello main.o fails with the complaint "undefined reference to `hello'".

6. gcc -o hello main.o myhello.a links it together with myhello.a. It works.

7. ldd shows the dynamic dependencies; it does not list myhello.a, because the archive's code has been linked into the executable itself.

luhuang@luhuang-VirtualBox:~/workspace/Hello$ gcc -c hello.c
luhuang@luhuang-VirtualBox:~/workspace/Hello$ ar -rs myhello.a hello.o
ar: creating myhello.a
luhuang@luhuang-VirtualBox:~/workspace/Hello$ ar -t myhello.a
hello.o
luhuang@luhuang-VirtualBox:~/workspace/Hello$ gcc -c main.c
luhuang@luhuang-VirtualBox:~/workspace/Hello$ gcc -o hello main.o
main.o: In function `main':
main.c:(.text+0x11): undefined reference to `hello'
collect2: ld returned 1 exit status
luhuang@luhuang-VirtualBox:~/workspace/Hello$ gcc -o hello main.o myhello.a
luhuang@luhuang-VirtualBox:~/workspace/Hello$ ldd hello
	linux-gate.so.1 =>  (0xb7786000)
	libc.so.6 => /lib/i386-linux-gnu/libc.so.6 (0xb75f3000)
	/lib/ld-linux.so.2 (0xb7787000)

How a dynamic linked library works:

1. Compile hello.c with the -fPIC (position-independent code) flag, so the code can be loaded anywhere in memory.

2. Use the -shared flag to turn hello.o into the shared library libhello.so.

3. Show information about libhello.so: it is a shared object.

4. gcc -c main.c to generate main.o.

5. Generate the executable. -L specifies a directory to search for libraries; here . means the current directory.

6. ldd shows the dynamic dependencies. You can see it complains 'libhello.so => not found'.

7. Although -L tells the linker where to find libraries at link time, we still have to tell the OS where to load them from at run time. On Linux, we can use LD_LIBRARY_PATH to specify the location of shared libraries.

luhuang@luhuang-VirtualBox:~/workspace/Hello$ gcc -c -fPIC hello.c
luhuang@luhuang-VirtualBox:~/workspace/Hello$ gcc -shared -o libhello.so hello.o
luhuang@luhuang-VirtualBox:~/workspace/Hello$ file libhello.so
libhello.so: ELF 32-bit LSB shared object, Intel 80386, version 1 (SYSV), dynamically linked, not stripped
luhuang@luhuang-VirtualBox:~/workspace/Hello$ gcc -c main.c
luhuang@luhuang-VirtualBox:~/workspace/Hello$ gcc -o hello main.c -L . -lhello
luhuang@luhuang-VirtualBox:~/workspace/Hello$ ldd hello
	linux-gate.so.1 =>  (0xb778c000)
	libhello.so => not found
	libc.so.6 => /lib/i386-linux-gnu/libc.so.6 (0xb75f9000)
	/lib/ld-linux.so.2 (0xb778d000)
luhuang@luhuang-VirtualBox:~/workspace/Hello$ ./hello
./hello: error while loading shared libraries: libhello.so: cannot open shared object file: No such file or directory
luhuang@luhuang-VirtualBox:~/workspace/Hello$ export LD_LIBRARY_PATH=.
luhuang@luhuang-VirtualBox:~/workspace/Hello$ ldd hello
	linux-gate.so.1 =>  (0xb7784000)
	libhello.so => ./libhello.so (0xb777f000)
	libc.so.6 => /lib/i386-linux-gnu/libc.so.6 (0xb75ee000)
	/lib/ld-linux.so.2 (0xb7785000)
luhuang@luhuang-VirtualBox:~/workspace/Hello$ ./hello
Greeting Hello!


Let me summarize how a compiler builds software. A compiler performs roughly these steps to turn source code into an executable:

1. Preprocess – expand #include directives and macros.

2. Compile – translate the source files into binary target (object) files; this is where language syntax errors are caught.

3. Link – combine the target files (and libraries) into an executable that the OS can load into memory and run.

You can also refer to book Software Build Systems: Principles and Experience for more details and further study.

Configuration Management

Configuration Management – in this post I will summarize some principles of CM.


“Configuration management (CM) is a process for establishing and maintaining consistency of a product’s performance, functional, and physical attributes with its requirements, design, and operational information throughout its life.”

Below is a checklist you can use to check whether your team has adopted CM; you should be able to answer “YES” to every question:

1. Can you reproduce any of your environments (Build, Test, Staging, Deploy, etc.), including OS versions, patches, networking configuration, technology stacks, and the other software applications and settings deployed on them?

2. Can you easily deploy your new changes into your environments in an incremental manner?

3. Are all your configuration changes tracked?

4. Are all of the CM rules/policies strictly enforced?

5. Can anyone from the development team, testing team, PMs, etc. easily access this CM information?

6. Can you apply your CM strategies without delaying software delivery?

Version Control

One of the essential elements of CM is reproducibility, so deploying a version control system in your organization is a MUST. There are lots of version control tools: RCS (Revision Control System) shipped with Linux, CVS (Concurrent Versions System), SVN (Subversion), Perforce, and so on. In this article I will not compare their advantages and disadvantages, but my recommendation is: if your team wants a commercial product, Perforce is the choice; if your team wants an open-source product, I strongly recommend SVN over CVS.

Version control all you have

Here the term ‘version control’ is not the same as ‘source control’, because version control is not merely for source code. In my opinion, all of the artifacts related to your development should be under version control: development code, testing code, database scripts, build/deploy scripts, documentation, library files, all of the configuration files, even compilers and tools. This helps new team members get up to speed easily.

Commit code frequently into mainline

When adopting version control, there are two principles. First, you should commit code frequently, so that (provided you have per-check-in builds) you find build errors early and can roll back to the last working version. Second, your team members can see what you changed and can build on your latest changes.

Some development teams create branches to do their development, but that causes the problems below:

1. It breaks the Continuous Integration principle: CI requires that we integrate as soon as possible, and working on a branch delays integration.

2. It is difficult to merge branches that have diverged too far.

3. It makes code refactoring difficult.

So, we should commit code frequently and commit them into mainline.

Write meaningful comments when committing code

Every commit should carry a meaningful comment, like "This commit is for bug#123456; it introduces the method getId()". Giving your commits meaningful comments is a courtesy to your teammates.
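One way to encourage this is a git commit-msg hook. The sketch below (the bug#NNNNNN convention and file names are my own assumptions for illustration, not a standard tool) rejects messages that lack a bug reference:

```shell
# Sketch: a git commit-msg hook that rejects messages lacking a bug
# reference like "bug#123456". In a real repository, save the generated
# file as .git/hooks/commit-msg and mark it executable.
cat > commit-msg <<'EOF'
#!/bin/sh
# Git passes the path of the proposed commit message file as $1.
grep -qE 'bug#[0-9]+' "$1" && exit 0
echo "commit rejected: message must reference a bug, e.g. bug#123456" >&2
exit 1
EOF
chmod +x commit-msg

# Demo: a message with a bug reference is accepted.
echo 'This commit is for bug#123456; it introduces method getId()' > msg
./commit-msg msg && echo accepted
```

A hook cannot make comments genuinely meaningful, but it does keep the bare minimum (a traceable bug number) out of the reviewer's way.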

Dependencies Management

In an SDLC, the most common dependencies are 3rd-party libraries, plus the transitive dependencies those libraries introduce.

Libraries management

Unless you are using an interpreted language, these libraries usually exist as binary files, like junit.jar. There is an argument about whether such libraries should be under version control; my suggestion is that they should. It is true that you can download them again from the Internet when you set up a new environment, and that released libraries basically do not change. But consider this scenario: your project uses a 3rd-party library ABC-3.2.1.jar, and the publisher re-releases ABC-3.2.1.jar with different contents (a "version 2" under the same name); your build then picks up the update automatically, which can break your build or introduce subtle problems. Maven provides powerful dependency management here: if you configure it to do so, it keeps a copy of each dependency in a local repository.

Modules management

Modules are your internal projects. Suppose you use Maven as the build management tool; then each of your internal projects can be installed into the local repository as a module.

Software Configuration Management

Configuration information is a critical component: it works together with the product's code and data to build the application, and through it we can control the build, deployment, and runtime behavior of a software product. A delivery team should think seriously about how to structure this information and how to manage it throughout the SDLC.

Configuration VS. Flexibility

Everyone wants the highest flexibility, but sometimes we cannot have all the good things at the same time; we need to sacrifice part of the flexibility for better product quality. The risk of changing configuration is lower than the risk of changing code. If development had its maximum flexibility, all of the configuration information would be stored in the source code, and every configuration change would cause a source change. That is too much.

Configuration Categories

We can configure a system during build, deployment, testing, and release. Below are the categories of configuration information:

1. Configuration information for building software into binary.

2. Configuration information for packaging into a war, ear, etc.

3. Configuration information for deploying software.

4. Configuration information for starting up/running the software.

Principles of Configuration management

We should treat configuration information the same as source code, manage them properly and test them. Below are some points we need to consider,

1. When, and which, configuration settings should be introduced during the SDLC? For example, junit.jar settings should be introduced only in the testing phase, while JDBC pool settings should differ between phases (say, System Testing versus Production).

2. Configuration information should be stored in the same repository as the source code, except for sensitive details, which belong somewhere else: user passwords and web console passwords should not be under source control, for security.

3. The instantiation of a configuration file should be automated.

4. There should be a naming convention for all of the configuration settings.

5. DRY (Don't Repeat Yourself): ensure that every configuration item serves a single purpose and that no two items duplicate the same purpose.
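As a minimal sketch of point 3 above (the file names, placeholders, and values are all hypothetical), a template plus a tiny script can instantiate per-environment configuration files automatically:

```shell
# Sketch: instantiate environment-specific config files from one template.
cat > db.properties.template <<'EOF'
jdbc.url=@DB_URL@
jdbc.pool.size=@POOL_SIZE@
EOF

# Fill the placeholders for a given environment.
instantiate() {
    sed -e "s|@DB_URL@|$1|" -e "s|@POOL_SIZE@|$2|" db.properties.template
}

instantiate 'jdbc:mysql://test-db:3306/app' 5  > db.properties.test
instantiate 'jdbc:mysql://prod-db:3306/app' 50 > db.properties.prod
```

The template (and the per-environment values, minus any passwords) can live in the same repository as the source, while the generated db.properties files are build output and stay out of version control.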

Environment Management

We can use virtualization technologies for environment management. In my current company we use Oracle VM to templatize environments, so a duplicate environment is available in a single click, and every change to an environment can be tracked.


Configuration management is a basic requirement for Continuous Integration and delivery management. We should version control everything required to build the software. Of course, we need to balance flexibility against configuration (I will have a later post introducing Maven's Convention over Configuration).

Continuous Integration – Session

“Continuous Integration is a software development practice where members of a team integrate their work frequently, usually each person integrates at least daily – leading to multiple integrations per day. Each integration is verified by an automated build (including test) to detect integration errors as quickly as possible. Many teams find that this approach leads to significantly reduced integration problems and allows a team to develop cohesive software more rapidly.” – Martin Fowler.

Some companies that lack a Release Team do not adopt CI, even though they claim to follow Agile development. In my opinion, Agile without CI in the SDLC is fake Agile.

Nowadays it is very common that you must be ready to deploy your artifact at any time to satisfy business requirements, and the era when you could deliver a quality product on time without a professional release engineer is gone! More and more organizations have a Build/Release team, in other terms, DevOps.

I once gave a sharing session to new hires about what CI is, how we practice it, and the function of the Build & Release team. I hoped that through that session people, especially the new hires, would understand our SDLC well.

I shared my slides below, with necessary modifications and some pages removed to comply with company policy.

Below is a brief outline of my session,


Maintain a Single Source Repository
Checkin & Automate the Build
Everyone can see what’s happening
Make it Easy for Anyone to Get the Latest Executable
Make Build Self-Testing
Automate Deployment

The value of CI:

Reduce risks
Reduce repetitive manual processes
Generate deployable software at any time and at any place
Enable better project visibility
Establish greater confidence in the software product from the development team

CI is not merely a tool for catching compile errors, although it does catch them.

Compiling is the most basic thing CI does; compile errors are not acceptable
The goal of CI is to surface integration and deployment problems as early as possible
Ideally, a successful CI build should have:
 1. Compiled successfully
 2. Passed all unit tests
 3. Reached the acceptable unit test coverage rate
 4. Passed all functional and regression tests
 5. Passed performance tests
 6. Passed user acceptance tests, if necessary
Any successful CI build should produce a deliverable package, so CI can and should give team members confidence that our product can be deployed to production at any time.

CI is one of the core practices of Agile. Effective CI needs the whole team to follow the other practices too; in turn, CI works with those practices to make the whole project better.

Test Driven Development
Automation Testing
Coding standard adherence
Small releases
Collective ownership


Commit code frequently
Don’t commit broken code
Fix broken builds immediately
Write automated developer tests
All tests and inspections must pass
Run private builds
Avoid getting broken code


Source code management –>
Source control system (like CVS, SCCS, Subversion) setup and maintenance
Setup and monitor daily continuous/production builds
Co-ordinate with the development team in case of build failures
Update build tools as per changes in the product output/strategies
Create branches and set up separate build systems after milestone releases
Create build storage archives for back tracking builds

Cross team co-ordination –>
Gather build specifications from the technical team
Document build instructions for the development team
Participate in release/milestone planning and scheduling with the product team
Announce and communicate vetted builds to QA and the documentation team
Work with the localization team to produce multi-language bits
Work with the sustaining team during product patches/upgrades
Coordinate with other dependent groups for their product milestone bits, e.g. application server, JDK, etc.

Build output –>
Setup kit server for build storage
Setup a cache server for storing other product bits or third party bits
Upload bits for internal and external customers
Create CD/DVD iso images and physical media

Code Quality Control –>
Set up coding standards
Monitor code quality trends

Software Engineering –>
Agile & CI governed
Automate as much as possible
Workflow optimized

You can download the slide from Continuous Integration-Session