How Make Builds Software

This weekend I went through the GNU Make manual (http://www.gnu.org/software/make/manual/make.html), and I now have a basic understanding of Make's syntax, its implicit and explicit rules, automatic and user-defined variables, its data structures and functions, and its execution phases (parse the makefiles to construct a directed acyclic graph, then execute the rules).

With the basics of Make under my belt, I couldn't wait to look into OpenJDK's make system! 🙂 I found a directory diagram of the OpenJDK source code (they call it the OpenJDK Mercurial Forest): https://blogs.oracle.com/kto/entry/openjdk_mercurial_forest. It was posted in Oct. 2007 and is a bit outdated, but overall it still reflects the current OpenJDK 8.

Before I describe my understanding of how OpenJDK builds, I want to introduce how Make builds source code spread across multiple directories and directory levels. For a simple make example (all of the source code in one directory), refer to my post Hello World in MAKE. It is not easy to construct a build system with Make for software spanning many directories. Basically, in Make we have two methods.

Method 1 – Recursive Make:

Take the structure below as an example: it has one Makefile under the root directory and one Makefile under each subdirectory. This hierarchy of modules can be nested arbitrarily deep. The method is that each makefile uses $(MAKE) to invoke its child makefiles recursively.

The top-level Makefile often looks a lot like a shell script:

MODULES = ant bee

all:
	for dir in $(MODULES); do \
	  (cd $$dir; ${MAKE} all); \
	done

The ant/Makefile looks like this:

all: main.o

main.o: main.c ../bee/parse.h
	$(CC) -I../bee -c main.c

The directed acyclic graph that make generates in memory should look like this: prog depends on main.o and parse.o, and main.o in turn depends on main.c and ../bee/parse.h.

The logic here is straightforward: to build prog, make recursively builds main.o and parse.o. For more details, refer to http://aegis.sourceforge.net/auug97.pdf (Recursive Make Considered Harmful), a paper that lists the ways recursive make is harmful. My understanding: with two subdirectories, as in the example above, everything is fine; but imagine hundreds of directories whose dependencies are complicated. In other words, we would have to hand-tweak the build sequence, which is painful. Another pitfall: suppose main.o needs a library generated by another makefile. Since that library is not a prerequisite in main.o's makefile, there is a risk that main.o is built improperly against an outdated library.
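To complete the picture (my own reconstruction; in the paper, parse.c and parse.h are actually generated from a yacc grammar), the bee side and the final link step might look like:

# bee/Makefile
all: parse.o
parse.o: parse.c parse.h
	$(CC) -c parse.c

# top-level link step, run after the recursive loop has built the objects
prog: ant/main.o bee/parse.o
	$(CC) -o prog ant/main.o bee/parse.o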

Method 2 – Inclusive Make:

Let's use the example below to explain what Inclusive Make is:

You might have noticed that each subdirectory has a .mk file. Inclusive means that the main entry makefile of GNU make includes those .mk files. This method ensures that there is always only one GNU make process (the recursive method above may spawn more than one). Best of all, it manages all of the dependency relationships together, which means we will not miss any dependencies, and there is less risk of building outdated or improper software.

The advantages:

1. Only one GNU make process runs, so startup is faster; the recursive method might invoke hundreds of processes.

2. Each directory still has its own makefile describing the rules for the files under it; maintaining one giant makefile for everything would be a nightmare.

3. All of the dependency relationships are maintained together, which reduces the risk of generating improper build artifacts.

4. As a consequence of point 3, make maintains the dependency ordering itself, so we no longer need to hand-maintain the $(MAKE) invocation sequence.

Well, let’s talk about its disadvantages 😦

1. It is more difficult and more complicated to compose the makefiles: in make, an include directive inserts the included text literally, so we have to take more care with variable declarations, value assignments, and goal definitions!
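To make the idea concrete, here is a minimal sketch of the inclusive style for the same ant/bee example (the file name module.mk and the exact layout are my own illustration, not from any particular project):

# Makefile (top level): one make process, all rules and dependencies visible
MODULES := ant bee
include $(patsubst %,%/module.mk,$(MODULES))

prog: ant/main.o bee/parse.o
	$(CC) -o prog ant/main.o bee/parse.o

# ant/module.mk
ant/main.o: ant/main.c bee/parse.h
	$(CC) -Ibee -c -o ant/main.o ant/main.c

# bee/module.mk
bee/parse.o: bee/parse.c bee/parse.h
	$(CC) -Ibee -c -o bee/parse.o bee/parse.c

Because make sees the whole dependency graph at once, it can build everything in the right order, and main.o can never be linked against an outdated parse.o.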

How OpenJDK maintains makefiles

We have looked at two methods by which make builds source code spread across many directory levels. Now let's look into OpenJDK and see how it maintains its build system structure.

As shown in my post Let's build openjdk, OpenJDK can be built with just one word: make. Below are its targets, from http://hg.openjdk.java.net/jdk8/jdk8/raw-file/tip/README-builds.html:

Make Target   Description
(empty)       build everything but no images
all           build everything including images
all-conf      build all configurations
images        create complete j2sdk and j2re images
install       install the generated images locally, typically in /usr/local
clean         remove all files generated by make, but not those generated by configure
dist-clean    remove all files generated by both make and configure (basically killing the configuration)
help          give some help on using make, including some interesting make targets

Let’s see how it works.

1. Under common/makefiles/ there is one file, Makefile, with only one line:

include ../../NewMakefile.gmk

2. NewMakefile.gmk has the following include snippets:

# ... and then we can include our helper functions
include $(root_dir)/common/makefiles/MakeHelpers.gmk
...
    ifeq ($(words $(SPEC)),1)
        # We are building a single configuration. This is the normal case. Execute the Main.gmk file.
        include $(root_dir)/common/makefiles/Main.gmk
    else
...
include $(root_dir)/common/makefiles/Jprt.gmk
...
help:
	$(info )
	$(info OpenJDK Makefile help)
...
	$(info )

.PHONY: help

3. Let's look into $(root_dir)/common/makefiles/Main.gmk.

Main.gmk includes MakeBase.gmk and others, and it defines all of the targets, which match the table I listed above.


Conclusion:

OpenJDK uses Inclusive Make to build its source code. It has one .gmk file under each source directory.

However, to support building components individually, OpenJDK also provides a Makefile for each component. OpenJDK consists of the components below.

Repository   Contains
. (root)     common configure and makefile logic
hotspot      source code and make files for building the OpenJDK Hotspot Virtual Machine
langtools    source code for the OpenJDK javac and language tools
jdk          source code and make files for building the OpenJDK runtime libraries and misc files
jaxp         source code for the OpenJDK JAXP functionality
jaxws        source code for the OpenJDK JAX-WS functionality
corba        source code for the OpenJDK CORBA functionality
nashorn      source code for the OpenJDK JavaScript implementation

For example, for corba, under the corba/ directory there is a Makefile:

...
#
# Makefile for building the corba workspace.
#

BUILDDIR=.
include $(BUILDDIR)/common/Defs.gmk
include $(BUILDDIR)/common/CancelImplicits.gmk

#----- commands

CHMOD = chmod
CP = cp
ECHO = echo # FIXME
...
# Default target
default: all

#----- classes.jar

CLASSES_JAR = $(LIB_DIR)/classes.jar
$(CLASSES_JAR):
	$(MKDIR) -p $(@D)
	$(BOOT_JAR_CMD) -cf $@ -C $(CLASSES_DIR) .

#----- src.zip

SRC_ZIP_FILES = $(shell $(FIND) $(SRC_CLASSES_DIR) \( -name \*-template \) -prune -o -type f -print )
...
jprt_build_product jprt_build_debug jprt_build_fastdebug: all
	( $(CD) $(OUTPUTDIR) && \
	  $(ZIP) -q -r $(JPRT_ARCHIVE_BUNDLE) build dist )

#-------------------------------------------------------------------
...
#
# Phonies to avoid accidents.
#
.PHONY: all build clean clobber debug jprt_build_product jprt_build_debug jprt_build_fastdebug

Basically, I am clear about how Make works! Awesome!!! 😉


Hello World in MAKE

I had no experience building software with MAKE. However, as I mentioned in my post Let's build openjdk, I am very interested in how java.net builds OpenJDK (studying an open community's work is a very good way to learn how to compose elegant build scripts). But I know that before I can enjoy OpenJDK's build scripts, I need to dig into MAKE first.

The source code I will use in this Hello World tutorial:

add.c

#include <stdio.h>
#include "calc.h"

void add(const char *string)
{
	printf("I am adder %s\n", string);
}

sub.c

#include <stdio.h>
#include "calc.h"

void sub(const char *string)
{
	printf("I am subber %s\n", string);
}

calc.c

#include "calc.h"

int main(int argc, char *argv[])
{
	add("1");
	sub("1");
	return 0;
}

calc.h

extern void add(const char *string);
extern void sub(const char *string);

Compiling the source code manually:

luhuang@luhuang:~/workspace/calculator$ pwd
/home/luhuang/workspace/calculator
luhuang@luhuang:~/workspace/calculator$ ls
add.c  calc.c  calc.h  makefile  sub.c
luhuang@luhuang:~/workspace/calculator$ gcc -g -c add.c
luhuang@luhuang:~/workspace/calculator$ gcc -g -c sub.c
luhuang@luhuang:~/workspace/calculator$ gcc -g -c calc.c
luhuang@luhuang:~/workspace/calculator$ gcc -g -o calculator calc.o add.o sub.o
luhuang@luhuang:~/workspace/calculator$ ./calculator
I am adder 1
I am subber 1
luhuang@luhuang:~/workspace/calculator$ ls
add.c  add.o  calc.c  calc.h  calc.o  calculator  makefile  sub.c  sub.o
luhuang@luhuang:~/workspace/calculator$

You can refer to my post How Compilers Build Software for more details.

Let’s start using MAKE!

http://www.gnu.org/software/make/manual/make.html

Preparing and Running Make,

You can simply run the command 'make' from the shell:

luhuang@luhuang:~$ make
make: *** No targets specified and no makefile found.  Stop.
luhuang@luhuang:~$

make needs a file called the makefile that describes the relationships among files in your program and provides commands for updating each file.

The command above fails with 'no makefile found' because that directory has no default file called 'makefile'. So first, let's compose a makefile.

# I am a comment
# Hello world in MAKE
calculator: calc.o add.o sub.o
	gcc -g -o calculator calc.o add.o sub.o

add.o: add.c calc.h
	gcc -g -c add.c

sub.o: sub.c calc.h
	gcc -g -c sub.c

calc.o: calc.c calc.h
	gcc -g -c calc.c

The makefile above defines the relationships among the files calc.c, sub.c, add.c, and calc.h. A line starting with # is a comment and is ignored.

-g enables debug information. -c compiles a source file into an object file, and -o names the output when linking the .o files into a program. Note that you need to put a tab character at the beginning of every recipe line! This is an obscurity that catches the unwary. If you prefer to prefix your recipes with a character other than tab, you can set the .RECIPEPREFIX variable to an alternate character. The first two lines of the makefile form one rule.
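For example, here is a minimal sketch of switching the prefix (this needs GNU make 3.82 or newer; the target name is just for illustration):

.RECIPEPREFIX = >

hello:
> echo no tab needed on this line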

A simple makefile consists of “rules” with the following shape:

     target ... : prerequisites ...
             recipe
             ...
             ...

Run it!

luhuang@luhuang:~/workspace/calculator$ make
gcc -g -c calc.c
gcc -g -c add.c
gcc -g -c sub.c
gcc -g -o calculator calc.o add.o sub.o
luhuang@luhuang:~/workspace/calculator$
luhuang@luhuang:~/workspace/calculator$ make
make: `calculator' is up to date.
luhuang@luhuang:~/workspace/calculator$ rm *.o
luhuang@luhuang:~/workspace/calculator$ rm calculator
luhuang@luhuang:~/workspace/calculator$ make -f makefile
gcc -g -c calc.c
gcc -g -c add.c
gcc -g -c sub.c
gcc -g -o calculator calc.o add.o sub.o
luhuang@luhuang:~/workspace/calculator$

You can see that the output sequence of the first make command matches our manual run, and that a makefile can be specified explicitly with the -f option. Make is also smart: if nothing in the source code has changed, it will not recompile the program, printing 'make: `calculator' is up to date.' instead. To force a rebuild I had to run 'rm *.o' and 'rm calculator' first. Let's improve that by introducing a new target, 'clean':

clean:
	rm *.o
	rm calculator

and see,

luhuang@luhuang:~/workspace/calculator$ make -f makefile
gcc -g -c calc.c
gcc -g -c add.c
gcc -g -c sub.c
gcc -g -o calculator calc.o add.o sub.o
luhuang@luhuang:~/workspace/calculator$ ls
add.c  add.o  calc.c  calc.h  calc.o  calculator  makefile  makefile~  sub.c  sub.o
luhuang@luhuang:~/workspace/calculator$ make -f makefile clean
rm *.o
rm calculator
luhuang@luhuang:~/workspace/calculator$ ls
add.c  calc.c  calc.h  makefile  makefile~  sub.c
luhuang@luhuang:~/workspace/calculator$
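One refinement from the manual: 'clean' is not a file that make creates, so if a file named clean ever appears in the directory, make will consider the target up to date and skip its recipe. Declaring the target phony avoids this:

.PHONY: clean
clean:
	rm *.o
	rm calculator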

Let's continue to improve it by removing the repetition:

# I am a comment
# Hello world in MAKE
SRCS = add.c sub.c calc.c
OBJS = $(SRCS:.c=.o)
HED  = calc.h
PROG = calculator
CC = gcc
CFLAGS = -g

$(PROG): $(OBJS)
	$(CC) $(CFLAGS) -o $@ $^

$(OBJS): $(HED)

clean:
	rm *.o
	rm $(PROG)

The improved script above uses key = value to define variables and $(key) to expand them. The variable CFLAGS exists so you can specify flags for the C compilation done by implicit rules. Another implicit rule is at work here: a file fileA.c compiles to an object file named fileA.o, and the substitution reference OBJS = $(SRCS:.c=.o) relies on this naming convention. The automatic variable $@ expands to the rule's target and $^ to all of its prerequisites; so in this case $@ is 'calculator' and $^ is the expansion of $(OBJS), i.e. 'add.o sub.o calc.o'.
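For reference, make's built-in implicit rule that turns fileA.c into fileA.o behaves roughly like the explicit pattern rule below ($< is the automatic variable for the first prerequisite; a sketch, since the real built-in rule also passes $(CPPFLAGS)):

%.o: %.c
	$(CC) $(CFLAGS) -c -o $@ $<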

Ok, let me summarize what I learnt so far.

1. What a rule looks like:

     target ... : prerequisites ...
             recipe
             ...
             ...

2. A simple makefile, and how to invoke make with the -f option.

3. How to define variables and use them.

4. How to define additional targets.

In the coming week, I will continue to refresh myself on make. Like other languages, make has conditionals, functions, and include directives. After I equip myself with enough make knowledge, I will digest the make scripts of OpenJDK.
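As a teaser for that coming study, here is a small sketch of what those features look like (the file name extra.mk is hypothetical):

SRCS := $(wildcard *.c)               # function: list all .c files
OBJS := $(patsubst %.c,%.o,$(SRCS))   # function: map .c names to .o names

ifeq ($(DEBUG),1)                     # conditional
CFLAGS += -g
endif

-include extra.mk                     # include; the leading '-' ignores a missing file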

!!! Stay tuned !!! 😉


Let’s build openjdk

As a builder, I am interested in how java.net builds OpenJDK. The post https://blogs.oracle.com/kto/entry/jdk_build_musings provides some insights into the world of JDK build and test. Below I summarize the insights that are generic and add my own comments. They are basic but interesting elements that a build system should have:

  • Continuous Build/Integration & Automated Test
    Every component, every integration area, every junction or merge point should be constantly built and smoke tested. Regarding continuous build/integration, I strongly recommend the book Continuous Integration: Improving Software Quality and Reducing Risk. Basically, a build system should be well designed to support continuous builds. There are lots of applications that support continuous build/integration, like Hudson, Jenkins, Cruise, etc., and they have rich extensions to support your own special requirements. As for smoke testing, CI requires that in your build flows, as soon as compilation of the source code completes, a set of testing steps is executed as well.
  • Build and Test Machines/Multiple Platforms
    The hardware/machine resources for a build and test system are cheap, and a bargain if they keep all developers shielded from bad changes, finding issues as early in the development process as possible. But it is also true that hardware/machine resources do not manage themselves, so there is also an expense to managing the systems; some of it can be automated, but not everything. Virtual machines can provide benefits here, but they also introduce complications. In my current company, we use Oracle VM machines to provide virtual environments that support build/release activities. By adopting VMs we can shorten the time spent preparing and configuring environments and, what is more, help ensure our configuration stays consistent. In OpenJDK, a script called configure verifies whether your environment is ready to build the JDK. I will introduce it later.
  • Partial Builds/Build Flavors
    With the JDK we have a history of doing what we call partial builds. The hotspot team rarely builds the entire jdk, but instead just builds hotspot (because that is the only thing they changed) and then places their hotspot in a vetted jdk image that was built by the Release Engineering team at the last build promotion. Ditto for the jdk teams that don't work on hotspot; they rarely build hotspot. This was and still is considered a developer optimization, but it is really only possible because of the way the JVM interfaces with the rest of the jdk; that interface rarely changes. To some degree, successful partial builds can indicate that the changes have not created an interface issue and can be considered somewhat 'compatible'.
    These partial builds create issues when there are changes in both hotspot and the rest of the jdk, where both changes need to be integrated at the same time, or more likely, in a particular order, e.g. hotspot integrates a new extern interface, later the jdk team integrates a change that uses or requires that interface, ideally after the hotspot changes have been integrated into a promoted build so everyone’s partial builds have a chance of working.
    The partial builds came about mostly because of build time, but also because of the time and space needed to hold all the sources of parts of the product you never really needed. I also think there is a comfort effect by a developer not having to even see the sources to everything he or she doesn’t care about. I’m not convinced that the space and time of getting the sources is that significant anymore, although I’m sure I would get arguments on that. The build speed could also become less of an issue as the new build infrastructure speeds up building and makes incremental builds work properly. But stay tuned on this subject, partial builds are not going away, but it’s clear that life would be less complicated without them.
  • Mercurial
    Probably applies to Git or any distributed Source Code Management system too.
    OpenJDK just uses Mercurial as its SCM tool, but I am open here: we can choose based on our own needs. I have experience with Perforce, VSS, SVN, and RCS! (Yes, the RCS from Linux! 🙂 )
  • Nested Repositories
    Not many projects have cut up the sources like the OpenJDK. There were multiple reasons for it, but it often creates issues for tools that either don’t understand the concept of nested repositories, or just cannot handle them.
  • Managing Build and Test Dependencies
    Some build and test dependencies are just packages or products installed on a system; I've often called those "system dependencies". But many are just tarballs or zip bundles that need to be placed somewhere and referred to. In my opinion this is a mess, and we need better organization here. Yeah yeah, I know someone will suggest Maven or Ivy, but it may not be that easy. Maven is an awesome tool for managing dependencies; you will definitely fall in love with it as soon as you give it a try!
  • Resolved Bugs and Changesets
    Having a quick connection between a resolved bug and the actual changes that fixed it is so extremely helpful that you cannot be without this. The connection needs to be both ways too. It may be possible to do this completely in the DSCM (Mercurial hooks), but in any case it is really critical to have that easy path between changes and bug reports. And if the build and test system has any kind of archival capability, also to that job data.
  • Distributed Builds
    Working in a distributed way is not so easy. I once built software in a distributed way using Cruise, which supports distributed builds by running them on different agents on different machines. For testing, for example unit testing and smoke testing, we can define two independent flows and trigger them as soon as source code compilation succeeds.
  • Killing Builds and Tests
    At some point, you need to be able to kill off a build or test, probably many builds and many tests on many different systems. This can be easy on some systems and hard on others. Using virtual machines or ghosted disk images provides a chance to just shut systems down and restart them in a pristine state, but that is not simple logic to get right for all systems. I think the concern here is how to gain better control of our build flows. To control the flows better, we can add pauses, split the flows into more independent flows, and define their dependencies so that one build triggers another only when all of its prerequisites are satisfied.

To support and enhance the build of OpenJDK, the OpenJDK team launched a project (https://blogs.oracle.com/kto/entry/build_infrastructure_project) to improve the points below:

  • Different build flavors, same build flow
  • Ability to use ‘make -j N‘ on large multi-CPU machines is critical, as is being able to quickly and reliably get incremental builds done, this means:
    • target dependencies must be complete and accurate
    • nested makes should be avoided
    • ant scripts should be avoided for multiple reasons (it is a form of nested make), but we need to allow for IDE builds at the same time
    • rules that generate targets will need to avoid timestamp changes when the result has not changed
    • Java package compilations need to be made parallel and we also need to consider some kind of javac server setup (something that had been talked about a long time ago)
  • Continued use of different compilers: gcc/g++ (various versions), Sun Studio (various versions), and Windows Visual Studio (various versions)
  • Allow for clean cross compilation, this means making sure we just build it and not run it as part of the build
  • Nested repositories need to work well, so we need a way to share common make logic between repositories
  • The build dependencies should be managed as part of the makefiles

(I am going to study how to build software with MAKE. The build scripts of OpenJDK will be good material. Hooray! 🙂 ) More details about the OpenJDK build infrastructure group: http://openjdk.java.net/groups/build/

Let’s build OpenJDK

(Here I am building based on http://hg.openjdk.java.net/jdk8/jdk8/raw-file/tip/README-builds.html, the OpenJDK 8 build README, and https://blogs.oracle.com/kto/entry/jdk8_new_build_infrastructure.)

Getting the source:

As I said above, OpenJDK uses Mercurial (http://mercurial.selenic.com/) for source control, so install Mercurial first if you have not already done so.

My environment is:

Linux luhuang-VirtualBox 3.0.0-32-generic-pae #51-Ubuntu SMP Thu Mar 21 16:09:48 UTC 2013 i686 i686 i386 GNU/Linux

Run the commands below as the root user or with sudo.

Install and update aptitude, purge openjdk-6* if installed, and install the necessary packages:

1. apt-get install aptitude

root@luhuang-VirtualBox:/home/luhuang# apt-get install aptitude
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following extra packages will be installed:
  libboost-iostreams1.46.1 libclass-accessor-perl libcwidget3 libept1
  libio-string-perl libparse-debianchangelog-perl libsub-name-perl
Suggested packages:
  aptitude-doc-en aptitude-doc tasksel debtags libcwidget-dev
  libhtml-parser-perl libhtml-template-perl libxml-simple-perl
The following NEW packages will be installed:
  aptitude libboost-iostreams1.46.1 libclass-accessor-perl libcwidget3 libept1
  libio-string-perl libparse-debianchangelog-perl libsub-name-perl
0 upgraded, 8 newly installed, 0 to remove and 449 not upgraded.
Need to get 2,985 kB of archives.
After this operation, 9,236 kB of additional disk space will be used.
Do you want to continue [Y/n]? Y
...

2. aptitude update

3. apt-get purge openjdk-6*

Why do we have to purge openjdk-6* before continuing?

“Install a bootstrap JDK. All OpenJDK builds require access to a previously released JDK called the bootstrap JDK or boot JDK. The general rule is that the bootstrap JDK must be an instance of the previous major release of the JDK. In addition, there may be a requirement to use a release at or beyond a particular update level”

4. aptitude install mercurial openjdk-7-jdk rpm ssh expect tcsh csh ksh gawk g++ ccache build-essential lesstif2-dev

root@luhuang-VirtualBox:/home/luhuang# aptitude install mercurial openjdk-7-jdk rpm ssh expect tcsh csh ksh gawk g++ ccache build-essential lesstif2-dev
The following NEW packages will be installed:
  ca-certificates-java{a} ccache csh expect gawk icedtea-7-jre-jamvm{a}
  java-common{a} ksh lesstif2{a} lesstif2-dev libbonobo2-0{a}
  libbonobo2-common{a} libexpat1-dev{a} libfontconfig1-dev{a}
  libfreetype6-dev{a} libgnome2-0{a} libice-dev{a} libnss3-1d{a}
  libpthread-stubs0{a} libpthread-stubs0-dev{a} librpm2{a} librpmbuild2{a}
  librpmio2{a} librpmsign0{a} libsigsegv2{a} libsm-dev{a} libx11-dev{a}
  libxau-dev{a} libxcb1-dev{a} libxdmcp-dev{a} libxext-dev{a} libxft-dev{a}
  libxp-dev{a} libxrender-dev{a} libxt-dev{a} mercurial mercurial-common{a}
  openjdk-7-jdk openjdk-7-jre{a} openjdk-7-jre-headless{a}
  openjdk-7-jre-lib{a} openssh-server{a} rpm rpm-common{a} rpm2cpio{a} ssh
  ssh-import-id{a} tcl8.5{a} tcsh ttf-dejavu-extra{a} tzdata-java{a}
  x11proto-core-dev{a} x11proto-input-dev{a} x11proto-kb-dev{a}
  x11proto-print-dev{a} x11proto-render-dev{a} x11proto-xext-dev{a}
  xorg-sgml-doctools{a} xtrans-dev{a} zlib1g-dev{a}
The following packages will be upgraded:
  libexpat1 libfreetype6 libnss3 tzdata
4 packages upgraded, 60 newly installed, 0 to remove and 445 not upgraded.
Need to get 80.5 MB/81.4 MB of archives. After unpacking 148 MB will be used.
Do you want to continue? [Y/n/?]
...

(It seems the dependencies have changed since the pages were last updated; I still had to install the packages below):

5. apt-get install libX11-dev libxext-dev libxrender-dev libxtst-dev

6. apt-get install libcups2-dev

7. apt-get install libasound2-dev

Run the commands below as your working user to get the jdk8/build sources. For convenience, I am also using the root user here:

8. hg clone http://hg.openjdk.java.net/jdk8/build jdk8-build

9. cd jdk8-build

10. sh ./get_source.sh

Example:

root@luhuang-VirtualBox:/media/sf_shared# hg clone http://hg.openjdk.java.net/jdk8/build jdk8-build
requesting all changes
adding changesets
adding manifests
adding file changes
added 774 changesets with 1018 changes to 118 files
updating to branch default
101 files updated, 0 files merged, 0 files removed, 0 files unresolved
...
root@luhuang-VirtualBox:/media/sf_shared/jdk8-build# sh ./get_source.sh
# Repositories:  corba jaxp jaxws langtools jdk hotspot nashorn

                corba:   /usr/bin/python -u /usr/bin/hg clone http://hg.openjdk.java.net/jdk8/build/corba corba
                 jaxp:   /usr/bin/python -u /usr/bin/hg clone http://hg.openjdk.java.net/jdk8/build/jaxp jaxp
Waiting 5 secs before spawning next background command.
                 jaxp:   requesting all changes
                corba:   requesting all changes
                corba:   adding changesets
                 jaxp:   adding changesets
                jaxws:   /usr/bin/python -u /usr/bin/hg clone http://hg.openjdk.java.net/jdk8/build/jaxws jaxws
            langtools:   /usr/bin/python -u /usr/bin/hg clone http://hg.openjdk.java.net/jdk8/build/langtools langtools
...

Then do your build:

11. chmod a+x common/bin/*

12. cd common/makefiles

13. bash ../autoconf/configure

Configure will try to figure out what system you are running on and where all necessary build components are. If you have all prerequisites for building installed, it should find everything; if it fails to detect any component automatically, it will exit and inform you about the problem. I think the philosophy of configure is awesome. In my daily builds I have some scripts and docs for sanity-checking a build server, but I have never designed a tool this elegant, one that automates everything, detects missing components, and gives smart suggestions!
Example (a failed configure check with suggestion):

configure: error: Could not find all X11 headers (shape.h Xrender.h XTest.h). You might be able to fix this by running 'sudo apt-get install libX11-dev libxext-dev libxrender-dev libxtst-dev'.
configure exiting with result code 1
configure: error: Could not find cups! You might be able to fix this by running 'sudo apt-get install libcups2-dev'.
configure exiting with result code 1

Example (a successful configure check):

...
A new configuration has been successfully created in
/media/sf_shared/jdk8-build/build/linux-x86-normal-server-release
using default settings.

Configuration summary:
* Debug level:    release
* JDK variant:    normal
* JVM variants:   server
* OpenJDK target: OS: linux, CPU architecture: x86, address length: 32

Tools summary:
* Boot JDK:       java version "1.7.0_21" OpenJDK Runtime Environment (IcedTea 2.3.9) (7u21-2.3.9-0ubuntu0.11.10.1) OpenJDK Client VM (build 23.7-b01, mixed mode, sharing)  (at /usr/lib/jvm/java-7-openjdk)
* C Compiler:     gcc-4.6 (Ubuntu/Linaro 4.6.1-9ubuntu3) version 4.6.1 (at /usr/bin/gcc-4.6)
* C++ Compiler:   g++-4.6 (Ubuntu/Linaro 4.6.1-9ubuntu3) version 4.6.1 (at /usr/bin/g++-4.6)

Build performance summary:
* Cores to use:   1
* Memory limit:   4031 MB
* ccache status:  installed and in use
...

Wow, so elegant! A good philosophy for sanity-testing a build server!

Building JDK 8 requires use of a version of JDK 7 that is at Update 7 or newer. JDK 8 developers should not use JDK 8 as the boot JDK, to ensure that JDK 8 dependencies are not introduced into the parts of the system that are built with JDK 7

Note that some Linux systems have a habit of pre-populating your environment variables for you, for example JAVA_HOME might get pre-defined for you to refer to the JDK installed on your Linux system. You will need to unset JAVA_HOME. It’s a good idea to run env and verify the environment variables you are getting from the default system settings make sense for building the OpenJDK.

14. make
Ready? Fasten your seatbelt. Go!

...
Compiling /media/sf_shared/jdk8-build/hotspot/src/share/vm/utilities/yieldingWorkgroup.cpp
Compiling /media/sf_shared/jdk8-build/hotspot/src/share/vm/runtime/vm_version.cpp
Linking vm...
ln: creating symbolic link `libjvm.so.1': Protocol error
Making signal interposition lib...
Making SA debugger back-end...
**NOTICE** Dtrace support disabled: /usr/include/sys/sdt.h not found
All done.
INFO: ENABLE_FULL_DEBUG_SYMBOLS=1
INFO: ALT_OBJCOPY=/usr/bin/objcopy
INFO: /usr/bin/objcopy cmd found so will create .debuginfo files.
INFO: STRIP_POLICY=min_strip
INFO: ZIP_DEBUGINFO_FILES=1
warning: [options] bootstrap class path not set in conjunction with -source 1.6
1 warning
Generating linux_i486_docs/jvmti.html
INFO: ENABLE_FULL_DEBUG_SYMBOLS=1
INFO: ALT_OBJCOPY=/usr/bin/objcopy
INFO: /usr/bin/objcopy cmd found so will create .debuginfo files.
INFO: STRIP_POLICY=min_strip
INFO: ZIP_DEBUGINFO_FILES=1
## Finished hotspot (build time 00:07:09)

## Starting corba
Compiling 6 files for BUILD_LOGUTIL
Creating corba/btjars/logutil.jar
Compiling 141 files for BUILD_IDLJ
...
## Finished jdk (build time 00:11:38)

----- Build times -------
Start 2013-08-28 18:54:01
End   2013-08-28 19:40:17
00:00:28 corba
00:31:07 hotspot
00:00:32 jaxp
00:01:57 jaxws
00:11:38 jdk
00:00:32 langtools
00:46:16 TOTAL
-------------------------
Finished building OpenJDK for target 'default'

Look, it has my signature!

luhuang@luhuang:~/build/jdk8-build/build/linux-x86-normal-server-release/jdk/bin$ date
Wed Aug 28 20:12:00 CST 2013
luhuang@luhuang:~/build/jdk8-build/build/linux-x86-normal-server-release/jdk/bin$ ./java -version
openjdk version "1.8.0-internal"
OpenJDK Runtime Environment (build 1.8.0-internal-luhuang_2013_08_28_18_53-b00)
OpenJDK Server VM (build 25.0-b47, mixed mode)
luhuang@luhuang:~/build/jdk8-build/build/linux-x86-normal-server-release/jdk/bin$

So, we are done with the build of OpenJDK 8. So easy! Thanks to OpenJDK's elegant build infrastructure, we can build it in just a few commands!


How Compilers Build Software

Yesterday I refreshed myself on various source file types and the tools that build each of them. In this post I will summarize my study notes.

In this post I will use C as the sample language: C is higher level than assembly language, yet closer to the OS than Java, which makes it a good example.

Compilers for C:

There are various C compilers, and the most famous is the GNU Compiler Collection (GCC), a compiler system produced by the GNU Project. GCC is a key component of the GNU toolchain, a blanket term for the collection of programming tools produced by the GNU Project; in other words, GCC compiles code as a chain of tools. GCC consists of the components below:

1. C preprocessor – implements the macro language used to transform C, C++ and other programs before they are compiled.

2. C compiler – compiles preprocessed source code into assembly language.

3. assembler – translates assembly language into object files (binary code).

4. linker – links object files into a single executable program.
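To make this pipeline concrete, here is a sketch that drives the four stages one at a time as make rules, using the hello.c/main.c example from later in this post (hello.i and hello.s are GCC's conventional suffixes for preprocessed and assembly output; main.o is left to make's implicit rule):

# Each GCC stage as an explicit rule; 'gcc -c hello.c' does all of this in one go.
hello.i: hello.c hello.h       # 1. preprocessor: expand #include and macros
	gcc -E hello.c -o hello.i
hello.s: hello.i               # 2. compiler: preprocessed C to assembly
	gcc -S hello.i -o hello.s
hello.o: hello.s               # 3. assembler: assembly to object code
	gcc -c hello.s -o hello.o
hello: hello.o main.o          # 4. linker: object files to executable
	gcc -o hello hello.o main.o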

More details about GCC: http://en.wikipedia.org/wiki/GNU_Compiler_Collection

In the following examples, I will show you how to compile C code with GCC. My demo platform is:

luhuang@luhuang-VirtualBox:~/workspace/Hello$ uname -a
Linux luhuang-VirtualBox 3.0.0-32-generic-pae #51-Ubuntu SMP Thu Mar 21 16:09:48 UTC 2013 i686 i686 i386 GNU/Linux

Source code:

Let's look at our material first:

main.c

#include "hello.h"

int main(int argc, char *argv[]){
	if (MAX(1,2) == 2){
		hello("Hello!");
	}
	return 0;
}

hello.c

#include <stdio.h>
#include "hello.h"

void hello(const char *string)
{
	printf("Greeting %s\n", string);
}

hello.h

extern void hello(const char *string);

#define MAX(a,b) ((a) > (b) ? (a) : (b))

Let's see what these three files do:

1. In main.c, the first line tells the C compiler to include 'hello.h'.

2. hello.h defines a macro 'MAX', and main.c invokes that macro.

3. hello.c includes two header files. The first is the standard stdio.h, which provides the printf function.

4. hello.h also defines the function prototype of hello() and the macro MAX(a,b).

Ok, let’s compile them!

1. Let's go to the source directory:

luhuang@luhuang-VirtualBox:~/workspace/Hello$ ls
hello.c hello.h main.c

You can see it contains only the three source files described above.

2. Compile them. The -c option compiles source code into an object file. Behind the scenes, it invokes the C preprocessor, the C compiler, and the assembler in sequence. In C, the basic unit of compilation is a .c source file (with its header files ending in .h), and the corresponding object file ends in .o. A header file by itself does not generate a .o file.

luhuang@luhuang-VirtualBox:~/workspace/Hello$ gcc -c hello.c
luhuang@luhuang-VirtualBox:~/workspace/Hello$ gcc -c main.c
luhuang@luhuang-VirtualBox:~/workspace/Hello$ ls
hello.c  hello.h  hello.o  main.c  main.o

3. If you want to invoke the preprocessor explicitly, use the -E option. With -E, GCC only processes the #include directives and macros in the source code; it does not do any compilation work. In the following example you can see that it replaces

	if (MAX(1,2) == 2){

with

if (((1) > (2) ? (1) : (2)) == 2){

Let’s see how it preprocesses main.c,
luhuang@luhuang-VirtualBox:~/workspace/Hello$ gcc -E main.c

# 1 "main.c"
# 1 ""
# 1 ""
# 1 "main.c"
# 1 "hello.h" 1
extern void hello(const char *string);
# 2 "main.c" 2

int main(int argc, char *argv[]){
if (((1) > (2) ? (1) : (2)) == 2){
hello("Hello!");
}
return 0;
}

4. Let's see how GCC compiles source code into assembly with the -S option. As I said above, GCC works as a toolchain; that is, if you invoke -S, it runs the preprocessor (-E) first. See the example below:

luhuang@luhuang-VirtualBox:~/workspace/Hello$ gcc -S hello.c
luhuang@luhuang-VirtualBox:~/workspace/Hello$ cat hello.s
	.file	"hello.c"
	.section	.rodata
.LC0:
	.string	"Greeting %s\n"
	.text
	.globl	hello
	.type	hello, @function
hello:
.LFB0:
	.cfi_startproc
	pushl	%ebp
	.cfi_def_cfa_offset 8
	.cfi_offset 5, -8
	movl	%esp, %ebp
	.cfi_def_cfa_register 5
	subl	$24, %esp
	movl	$.LC0, %eax
	movl	8(%ebp), %edx
	movl	%edx, 4(%esp)
	movl	%eax, (%esp)
	call	printf
	leave
	.cfi_restore 5
	.cfi_def_cfa 4, 4
	ret
	.cfi_endproc
.LFE0:
	.size	hello, .-hello
	.ident	"GCC: (Ubuntu/Linaro 4.6.1-9ubuntu3) 4.6.1"
	.section	.note.GNU-stack,"",@progbits
luhuang@luhuang-VirtualBox:~/workspace/Hello$

5. After step 4, we have the source code's assembly. Let's go one step further and generate the object file, using the -c option:

luhuang@luhuang-VirtualBox:~/workspace/Hello$ gcc -c hello.c
luhuang@luhuang-VirtualBox:~/workspace/Hello$ file hello.o
hello.o: ELF 32-bit LSB relocatable, Intel 80386, version 1 (SYSV), not stripped
luhuang@luhuang-VirtualBox:~/workspace/Hello$

Here I use Linux's file command to inspect hello.o's file type. You can see that gcc -c generated object code in 32-bit, least-significant-byte-first, Intel x86 format. (Yes, object code cannot run across platforms 😦 )

Another way to inspect an object file is to check which symbols it defines and references. Here we use the nm command, which is very useful when we need to track down an 'undefined symbol' build error. In the following example you can see that hello.o defines hello() and references printf(), which matches the source code.

luhuang@luhuang-VirtualBox:~/workspace/Hello$ nm hello.o
00000000 T hello
         U printf
luhuang@luhuang-VirtualBox:~/workspace/Hello$

Unix also provides another command, objdump, to retrieve detailed information about an object file. Below, the -x option dumps hello.o's headers and symbol table:

luhuang@luhuang-VirtualBox:~/workspace/Hello$ objdump -x hello.o

hello.o:     file format elf32-i386
hello.o
architecture: i386, flags 0x00000011:
HAS_RELOC, HAS_SYMS
start address 0x00000000

Sections:
Idx Name          Size      VMA       LMA       File off  Algn
  0 .text         0000001c  00000000  00000000  00000034  2**2
                  CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
  1 .data         00000000  00000000  00000000  00000050  2**2
                  CONTENTS, ALLOC, LOAD, DATA
  2 .bss          00000000  00000000  00000000  00000050  2**2
                  ALLOC
  3 .rodata       0000000d  00000000  00000000  00000050  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  4 .comment      0000002b  00000000  00000000  0000005d  2**0
                  CONTENTS, READONLY
  5 .note.GNU-stack 00000000  00000000  00000000  00000088  2**0
                  CONTENTS, READONLY
  6 .eh_frame     00000038  00000000  00000000  00000088  2**2
                  CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA
SYMBOL TABLE:
00000000 l    df *ABS*	00000000 hello.c
00000000 l    d  .text	00000000 .text
00000000 l    d  .data	00000000 .data
00000000 l    d  .bss	00000000 .bss
00000000 l    d  .rodata	00000000 .rodata
00000000 l    d  .note.GNU-stack	00000000 .note.GNU-stack
00000000 l    d  .eh_frame	00000000 .eh_frame
00000000 l    d  .comment	00000000 .comment
00000000 g     F .text	0000001c hello
00000000         *UND*	00000000 printf

6. OK. Now we know how GCC preprocesses, compiles, and assembles source code into object files. Let's generate an executable from these object files, using the -o option (the full chain so far: -E, -S, -c, -o):

luhuang@luhuang-VirtualBox:~/workspace/Hello$ gcc -o hello hello.o main.o
luhuang@luhuang-VirtualBox:~/workspace/Hello$ file hello
hello: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.15, not stripped

You might notice 'dynamically linked (uses shared libs)' in the file output: the executable uses dynamic linking.

7. Run it.

luhuang@luhuang-VirtualBox:~/workspace/Hello$ ./hello
Greeting Hello!

ldd – print shared library dependencies

luhuang@luhuang-VirtualBox:~/workspace/Hello$ ldd hello
	linux-gate.so.1 =>  (0xb7796000)
	libc.so.6 => /lib/i386-linux-gnu/libc.so.6 (0xb7603000)
	/lib/ld-linux.so.2 (0xb7797000)

libc.so.6 is the standard C library, which provides functions like printf.

8. Linux supports both statically linked and dynamically linked libraries. Let's see how each of them works.

How a statically linked library works:

1. gcc -c hello.c compiles hello.c to hello.o.

2. ar -rs archives hello.o into a static library, myhello.a.

3. ar -t lists the .o files that have been archived.

4. gcc -c main.c compiles main.c to main.o.

5. gcc -o hello main.o fails with the complaint: undefined reference to `hello'.

6. gcc -o hello main.o myhello.a links main.o against myhello.a. It works.

7. ldd shows the dynamic dependencies; it does not list myhello.a because its code has been copied into the executable itself.

luhuang@luhuang-VirtualBox:~/workspace/Hello$ gcc -c hello.c
luhuang@luhuang-VirtualBox:~/workspace/Hello$ ar -rs myhello.a hello.o
ar: creating myhello.a
luhuang@luhuang-VirtualBox:~/workspace/Hello$ ar -t myhello.a
hello.o
luhuang@luhuang-VirtualBox:~/workspace/Hello$ gcc -c main.c
luhuang@luhuang-VirtualBox:~/workspace/Hello$ gcc -o hello main.o
main.o: In function `main':
main.c:(.text+0x11): undefined reference to `hello'
collect2: ld returned 1 exit status
luhuang@luhuang-VirtualBox:~/workspace/Hello$ gcc -o hello main.o myhello.a
luhuang@luhuang-VirtualBox:~/workspace/Hello$ ldd hello
	linux-gate.so.1 =>  (0xb7786000)
	libc.so.6 => /lib/i386-linux-gnu/libc.so.6 (0xb75f3000)
	/lib/ld-linux.so.2 (0xb7787000)
luhuang@luhuang-VirtualBox:~/workspace/Hello$
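Tying this back to the Make posts above, the whole static-library flow could be captured in a makefile like this (a sketch; myhello.a keeps the transcript's naming rather than the usual libNAME.a convention):

CC = gcc
CFLAGS = -g

hello: main.o myhello.a        # the archive's code is copied into the executable
	$(CC) -o hello main.o myhello.a

myhello.a: hello.o             # archive the object file into a static library
	ar -rs myhello.a hello.o

main.o hello.o: hello.h        # both objects are rebuilt if the header changes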

How a dynamically linked library works:

1. Use the -fPIC (position-independent code) flag to compile hello.c. This allows the code to be loaded anywhere in memory.

2. Use the -shared flag to link hello.o into myhellolib.so.

3. file shows that myhellolib.so is a shared object.

4. gcc -c main.c generates main.o.

5. Generate the executable. -L specifies a directory to search for libraries; here . means the current directory.

6. ldd shows the dynamic dependencies. You can see it reports 'myhellolib.so => not found'.

7. Although -L tells the linker where to find libraries at link time, we still have to tell the OS where to load them from at run time. On Linux, LD_LIBRARY_PATH specifies where to look for shared libraries.

luhuang@luhuang-VirtualBox:~/workspace/Hello$ gcc -c -fPIC hello.c
luhuang@luhuang-VirtualBox:~/workspace/Hello$ gcc -shared -o myhellolib.so hello.o
luhuang@luhuang-VirtualBox:~/workspace/Hello$ file myhellolib.so
myhellolib.so: ELF 32-bit LSB shared object, Intel 80386, version 1 (SYSV), dynamically linked, not stripped
luhuang@luhuang-VirtualBox:~/workspace/Hello$ gcc -c main.c
luhuang@luhuang-VirtualBox:~/workspace/Hello$ gcc -o hello main.c -L . myhellolib.so
luhuang@luhuang-VirtualBox:~/workspace/Hello$ ldd hello
	linux-gate.so.1 =>  (0xb778c000)
	myhellolib.so => not found
	libc.so.6 => /lib/i386-linux-gnu/libc.so.6 (0xb75f9000)
	/lib/ld-linux.so.2 (0xb778d000)
luhuang@luhuang-VirtualBox:~/workspace/Hello$ ./hello
./hello: error while loading shared libraries: myhellolib.so: cannot open shared object file: No such file or directory
luhuang@luhuang-VirtualBox:~/workspace/Hello$
luhuang@luhuang-VirtualBox:~/workspace/Hello$ export LD_LIBRARY_PATH=.
luhuang@luhuang-VirtualBox:~/workspace/Hello$ ldd hello
	linux-gate.so.1 =>  (0xb7784000)
	myhellolib.so => ./myhellolib.so (0xb777f000)
	libc.so.6 => /lib/i386-linux-gnu/libc.so.6 (0xb75ee000)
	/lib/ld-linux.so.2 (0xb7785000)
luhuang@luhuang-VirtualBox:~/workspace/Hello$ ./hello
Greeting Hello!
luhuang@luhuang-VirtualBox:~/workspace/Hello$
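And the dynamically linked variant as a makefile (again a sketch; it compiles and links the shared object in one step with -fPIC -shared):

CC = gcc
CFLAGS = -g

hello: main.c myhellolib.so    # link against the shared object in the current dir
	$(CC) $(CFLAGS) -o hello main.c -L. myhellolib.so

myhellolib.so: hello.c hello.h # position-independent code, linked as a shared object
	$(CC) $(CFLAGS) -fPIC -shared -o myhellolib.so hello.c

As in the transcript, you would still run the result with LD_LIBRARY_PATH=. ./hello so that the loader can find the library at run time.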

Summary

Let me summarize how a compiler builds software. Basically, a compiler toolchain performs steps like these to convert source code into an executable:

1. Preprocess – expand #include directives and macros.

2. Compile and assemble – turn each source file into a binary object file.

3. Link – combine the object files (and any libraries) into an executable that can be loaded into memory and run.

You can also refer to the book Software Build Systems: Principles and Experience for more details and further study.