Distributed extensions
The distributed package is in the process of being merged into the core product.
Introduction
This "distributed" contrib package for CruiseControl allows a master build machine to distribute build requests to other physical machines on which the builds are performed and to return the results to the master.
In order to extend CruiseControl without requiring that our distributed extensions be merged in with the core CruiseControl code, we decided to add our code as a new contrib package. This complicates configuration a bit, but carefully following the steps below should have you distributing builds in no time. You should, however, already be familiar with CruiseControl if you expect to succeed with this more complex arrangement.
Overview
The distributed extensions make use of Jini for the service lookup and RMI features it provides. In addition to the usual CruiseControl startup, the user will have to start up a Jini service registrar and HTTP class server. Also, each build agent machine will need to have code installed locally and will need to start up and register its availability with the registrar. Once a federation of one or more agents is registered with a running registrar, CruiseControl has the ability to distribute builds through a new DistributedMasterBuilder that wraps an existing Builder in the CC configuration file. Examples are given below. Doing distributed builds is seamless in CruiseControl, and the user has the option of only distributing builds for projects they choose to distribute.
Compatibility with Prior Releases
If you will be distributing builds in an environment which includes Build Agents from CruiseControl version 2.6 or earlier, please see Upgrade Notes.
How-To
Building the code
- Build CruiseControl in the usual way. (See getting started -> source distribution to build from source.)
- In the contrib/distributed directory, run ant. The default target will build the distributed extensions.
You need the ANT_HOME environment variable set, and a junit.jar available to ant. Junit ant tasks don't work unless junit.jar is on ant's "boot" classpath. You can either copy a junit.jar file into your ANT_HOME/lib directory, or define the ANT_ARGS environment variable with a "-lib" directive pointing to a junit.jar. For example:

export ANT_HOME=~/devtools/apache-ant-1.6.5
export ANT_ARGS="-lib ~/devtools/cruisecontrol/main/lib/junit-3.8.2.jar"

You might need to set the JAVA_HOME environment variable if the JNLP API (javaws.jar) cannot be located otherwise.
A new directory will be created called dist that contains a number of subdirectories (agent, builder, core, lookup, and util). Also, a file will be created called cc-agent.zip. The zip file contents are identical to the agent subdirectory. The zip file can easily be transferred to any machine you wish to serve as a build agent, while the agent subdirectory can be used for testing by running a build agent locally. (Also see Java Web Start deployment of build agents.) After building, the distributed extensions tree will look similar to the example below. Directories whose comments are prefixed with '*' will contain copies of some shared configuration files.

contrib/
    conf/...        (Shared configuration files - some of which are copied into dist sub dirs)
    dist/           (New directory created by building distributed extensions)
        agent/...   (*Build Agent)
        builder/... (DistributedMasterBuilder class, used by master build loop)
        core/...
        lookup/...  (*Lookup Service - aka: Registrar)
        util/...    (*General utilities, including Agent Utility)
Basic Configuration
If you plan to rebuild the distributed extensions, note that any configuration files under the contrib/distributed/dist directory are liable to be cleaned and replaced. The originals reside in contrib/distributed/conf and you may find it preferable to change them there before you build the distributed extensions. In most cases, you should not have to edit any of these configuration files.
- (Optional) In the contrib/distributed/conf directory there is a file entitled agent.properties. Though the default typically works, one property may need to be set in this file: cruise.build.dir should be set to the directory the build agent should use as its build directory. It will be treated as a temporary directory, though some caching may occur.
- (Optional) In the contrib/distributed/conf directory you'll find the cruise.properties file. The default value of cruise.run.dir typically works, but it can be set to the root directory for the master copy of the code and build results. That is, if you follow the canonical CC configuration, this should be the parent directory of your checkout, logs, and output directories. The logs and output directories will be automatically populated by the results sent back from the build agents.
- Pre-populate your checkout directory with the projects you want to do distributed builds on, just as you would in a non-distributed CruiseControl scenario. Note that each agent must have all projects pre-populated unless you have configured specific builds to go to specific agents (more below). This is a limitation of the current architecture that would be nice to fix, possibly via distributed versions of Bootstrapper and/or Project plugins.
- Register the Distributed Plugin - You must "register" the Distributed plugin in your config.xml as shown below. (If you forget this step, you will see an error about no plugin being registered for "distributed" when starting CC.)
<plugin name="distributed" classname="net.sourceforge.cruisecontrol.builders.DistributedMasterBuilder"/>
- Now change your CruiseControl configuration (config.xml) to do distributed builds for a project (see <distributed> and examples below).
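Putting the registration and configuration steps together, a minimal config.xml might look like the sketch below. This is an illustration only: the project name, schedule interval, and ant settings are hypothetical, and the usual project contents (modification set, listeners, publishers, etc.) are omitted for brevity.

```xml
<cruisecontrol>
    <!-- register the distributed plugin once, at the top level -->
    <plugin name="distributed"
            classname="net.sourceforge.cruisecontrol.builders.DistributedMasterBuilder"/>
    <project name="BasicJavaProject">
        <schedule interval="30">
            <!-- the wrapped builder executes on a remote Build Agent -->
            <distributed>
                <ant antscript="ant.bat"
                     antworkingdir="C:/cruise-control-agent/checkout/BasicJavaProject"/>
            </distributed>
        </schedule>
    </project>
</cruisecontrol>
```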
<distributed>
<cruisecontrol> <project> <schedule> <distributed>
Execute the nested Builder on a remote Build Agent and optionally return build artifacts after the build completes.
The standard CruiseControl properties passed to builders are available from within the nested Builder.
Attributes
Attribute        Required  Description
entries          No        A semicolon-delimited list (key1=value1;key2=value2) used to find a matching agent on which to perform this build.
agentlogdir      No        Build artifacts directory on the remote Agent. All content of this directory is returned to the Master, and deleted after the build completes.
masterlogdir     No        Build artifacts directory on the Master into which Agent artifacts will be moved. Typically included in log merge.
agentoutputdir   No        Another artifacts directory on the remote Agent. All content of this directory is returned to the Master, and deleted after the build completes.
masteroutputdir  No        Another artifacts directory on the Master into which Agent artifacts will be moved. Typically included in log merge.
showProgress     No        (defaults to true) If true or omitted, the distributed builder will provide progress messages, as will the nested builder if it supports this feature (assuming the nested builder's own showProgress setting is not false). If false, no progress messages will be shown by the distributed builder or nested builder, regardless of the nested builder's showProgress setting. If any parent showProgress is false, then no progress will be shown, regardless of the distributed or nested builder settings.

Child Elements
Element         Cardinality  Description
<builder>       1            The nested <builder> to be executed on the remote Build Agent. See <composite> to execute multiple Builders.
<remoteResult>  0..*         Specifies an additional artifacts directory. All content of this directory is returned to the Master, and deleted from the Agent after the build completes. The element has two required attributes: "agentDir" and "masterDir". The "masterDir" is typically included in log merge.
Example:

<remoteResult agentDir="target/site" masterDir="target/site"/>
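For context, a sketch of a complete <distributed> block combining a nested builder with a <remoteResult> child might look like this (the paths, entries value, and project layout are illustrative assumptions, not required values):

```xml
<schedule interval="30">
    <distributed entries="os.name=Linux">
        <ant antscript="ant" antworkingdir="/home/agent/checkout/MyProject"/>
        <!-- content of the Agent's target/site dir is returned to the Master,
             then deleted from the Agent, after the build completes -->
        <remoteResult agentDir="target/site" masterDir="target/site"/>
    </distributed>
</schedule>
```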
Examples
- Given an existing configuration snippet:
<schedule interval="30">
    <ant antscript="ant.bat"
         antworkingdir="C:/cruise-control/checkout/BasicJavaProject" >
    </ant>
</schedule>
wrap the ant builder configuration with a <distributed> tag like this:

<schedule interval="30">
    <distributed>
        <ant antscript="ant.bat"
             antworkingdir="C:/cruise-control-agent/checkout/BasicJavaProject" >
        </ant>
    </distributed>
</schedule>
Note: antscript and antworkingdir attributes now refer to locations on your agent. All agents must conform to these same settings.
The Project Name value determines where Build Agent work directories are created. These defaults can be overridden by setting the 'agentlogdir' and 'agentoutputdir' attributes.
- You may have noticed the user-defined.properties file in the conf directory for the agent. These properties are, as you might expect, user-defined. Any unique properties you would like to indicate characteristics of THIS SPECIFIC agent should be added here in canonical property form (i.e. "key=value", without the quotes). In the CC configuration file an attribute can be added to the <distributed> tag containing semicolon-delimited entries used to match this agent with the projects that should be built on it. For instance, changing the example above to:

<schedule interval="30">
    <distributed entries="agent.name=number2">
        <ant antscript="ant.bat" ...
will ensure an agent with agent.name=number2 in its user-defined.properties file will be the recipient of this project's build requests. If multiple agents match a given set of entries, it is indeterminate which will receive the build request. For an agent to be considered a match, the agent must have at least all the entries defined for <distributed entries="...">. A matching agent may have more entries than those defined for <distributed entries="...">.
Even if no entries are listed in the user-defined.properties file, four entries are programmatically registered with every agent. These are os.name, java.vm.version (which may show the hotspot version in java 1.6.0_04+), and java.version, containing the Java system properties of the same names, and hostname, containing the hostname on which the agent is running. A more useful example than the previous one might be:

<distributed entries="os.name=Windows XP">
or

<distributed entries="os.name=Linux">

By configuring one project twice, with two different os.name properties, you could ensure that your project builds correctly on two different operating systems with only one instance of CruiseControl running. This requires two <project> configurations in your config.xml. Here's a more complex example:

<distributed entries="os.name=Windows XP;dotnet.version=1.1;fixpack=SP2">
- Using the <composite> tag in your config.xml file allows multiple builders to run for a single <project>. The <composite> tag is a "composite builder" which defines a list of builders that will be executed sequentially and treated as a single build. The config below causes a set of ant builds to be performed sequentially on the same Build Agent:
<project ...>
    <schedule ...>
        <distributed entries="...">
            <composite>
                <ant (build 1)...>
                <ant (build 2)...>
The example below will cause a set of builds to be performed sequentially on different agents (each with a different OS). Both the Windows and Linux builds must complete successfully before the entire Composite Build is considered successful.

<project ...>
    <schedule ...>
        <composite>
            <distributed entries="os.name=Windows XP">
                <ant (build 1)...>
            <distributed entries="os.name=Linux">
                <ant (build 1)...>
- By default, the canonical locations for log and output files are used on both the remote agents and the master. These can be overridden using the following attributes on the <distributed> tag:
<distributed agentlogdir="agent/log"
             masterlogdir="master/log"
             agentoutputdir="agent/output"
             masteroutputdir="master/output">
    ...
</distributed>
After a remote build, any files on the agent machine in dir "agent/log" will be copied back to the master machine into dir "master/log". The "logs" and "output" dirs will be deleted on the Agent after the build finishes.
NOTE: You may have problems when running a BuildAgent on the same machine as the main CC server due to the removal of the log/output dirs by the BuildAgent (if the main CC server needs the deleted directories). In such cases, you should override the canonical artifact dirs using these attributes.
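The two-operating-system setup described earlier requires two <project> elements in your config.xml. A minimal sketch follows; the project names, interval, and paths are illustrative, and the usual project contents (modification set, publishers, etc.) are omitted for brevity:

```xml
<cruisecontrol>
    <project name="MyProject-windows">
        <schedule interval="30">
            <distributed entries="os.name=Windows XP">
                <ant antscript="ant.bat"
                     antworkingdir="C:/cruise-control-agent/checkout/MyProject"/>
            </distributed>
        </schedule>
    </project>
    <project name="MyProject-linux">
        <schedule interval="30">
            <distributed entries="os.name=Linux">
                <ant antscript="ant"
                     antworkingdir="/home/agent/checkout/MyProject"/>
            </distributed>
        </schedule>
    </project>
</cruisecontrol>
```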
Doing distributed builds
Linux Note: Many Linux distros include the hostname in /etc/hosts for the "127.0.0.1" address on the same line as "localhost.localdomain" and "localhost". This interferes with the operation of Jini (an Agent finds the Lookup Service, but the MasterBuilder or Agent Utility can not find the Agent). You may need to edit the /etc/hosts file as shown below to list the actual hostname and ip address:

# This is NOT jini friendly.
#127.0.0.1 localhost.localdomain localhost ubuntudan
127.0.0.1 localhost.localdomain localhost
# actual host ip address and host name
10.6.18.51 ubuntudan
- Start the Lookup Service by navigating to the contrib/distributed/dist/lookup directory and running ant. The default target should start the registrar and class server.
- Start the agent by navigating to the contrib/distributed/dist/agent directory and running ant. The default target should start the build agent and register it with the Lookup Service. Note: while there is no reason you couldn't have an agent running in your build master, additional agents will require you to copy the cc-agent.zip to each machine, unzipping and configuring for each of them. Another option is to use the webstart BuildAgent features - see Java Web Start deployment of build agents for details.
- Test that Jini is running and your agent(s) is/are registered using the JiniLookUpUtility. In contrib/distributed/dist/util run ant test-jini. After 5 seconds you should see a list of services that have been registered with Jini. Since the Jini Lookup Service itself is a Jini service, you should have com.sun.jini.reggie.ConstrainableRegistrarProxy listed even if you have no agents running. If you do have agents running, however, you should see a Proxy service listed for each of them, with BuildAgentService listed as the type. You can also test the availability of services (Lookup and BuildAgents) by using the Agent Utility.
- You can manually run a build using the InteractiveBuildUtility. This allows you to test your configuration without starting CruiseControl. In contrib/distributed/dist/util run ant manual-build. If the distributed tag in your configuration file does not contain any entries, you'll be prompted to enter them. These are optional, however, and pressing ENTER at the prompt will pick up whatever agent is available. Note that you can pass in the path to your CruiseControl configuration file as an argument to the InteractiveBuildUtility and save a step when running it. (Note: This ant target is not working [reading input from the command prompt isn't working in ant - any fixes?], but the class should work outside of ant.)
- Start CruiseControl using the startup scripts (cruisecontrol.sh or cruisecontrol.bat) in contrib/distributed. Any builds that are queued for a distributed builder should be sent to your running agent. Typically, CruiseControl is run from the contrib/distributed directory (not main/bin), but this is not required. If CruiseControl can't find required jars, config files, etc., you may need to set the CCDIR environment variable to your CruiseControl/main directory before launching the contrib/distributed/cruisecontrol.bat/.sh file.
Advanced configuration
- If you plan to rebuild the distributed extensions, note that any configuration files under the contrib/distributed/dist directory are liable to be cleaned and replaced. The originals reside in contrib/distributed/conf and you may find it preferable to change them there before you build the distributed extensions. Since user-defined.properties and agent.properties are copied into the cc-agent.zip you'll need to unzip and make your changes locally on the agent.
- Jini as used in these distributed extensions has several configuration options. Note, however, that it is not likely you will need to make changes to start-jini.config.
- As delivered, Jini uses an insecure security policy. Should you choose to change this, create your own policy file(s) and change cruise.properties and agent.properties to reference your own versions. Note that the one copy of insecure.policy in contrib/distributed/conf is copied to the agent, lookup, and util subdirectories during the build.
- Jini, being a Sun product, uses Java's native logging, not Log4j or Commons-Logging. Jini logging configuration is via the jini.logging file. As with insecure.policy, one copy of jini.logging is duplicated for the agent, lookup, and util. Either independently change these copies or change the original once. Note: The jini logging settings do not work when running a Build Agent via Webstart.
- If your local network does not have DNS services set up properly (i.e. LAN hostnames are not resolved correctly), see the note: BAD DNS HACK in start-jini.config and transient-reggie.config. It is far better to fix your LAN DNS issues, check out other things (like the localhost issue), and only use the mentioned hard-coded DNS hack as a last resort. If you find no agents (including local ones) are being discovered, it is far more likely you have a mismatch between your Agent and config.xml entries settings.
- To keep track of problems on remote Build Agents, you may want to alter the main CruiseControl log4j.properties file main/log4j.properties to use an "Email" logger to notify you of errors via email. For example:
log4j.rootCategory=INFO,A1,FILE,Mail
...
# Mail is set to be a SMTPAppender
log4j.appender.Mail=org.apache.log4j.net.SMTPAppender
log4j.appender.Mail.BufferSize=100
log4j.appender.Mail.From=ccbuild@yourdomain.com
log4j.appender.Mail.SMTPHost=yoursmtp.mailhost.com
log4j.appender.Mail.Subject=CC has had an error!!!
log4j.appender.Mail.To=youremail@yourdomain.com
log4j.appender.Mail.layout=org.apache.log4j.PatternLayout
log4j.appender.Mail.layout.ConversionPattern=%d{dd.MM.yyyy HH:mm:ss} %-5p [%x] [%c{3}] %m%n
- CruiseControl manages its own thread count for simultaneous builds. While this makes sense when the build master is the only machine performing builds (normal CruiseControl use), it's nearly useless to do distributed builds without being able to do them simultaneously. As such, you will want to configure CruiseControl to run using approximately as many threads as you'll have running agents. For complicated reasons this may not be the best solution, but it should be adequate until a more sophisticated thread-count mechanism can be added to CruiseControl. In your CC configuration file, add a <threads> tag under the <cruisecontrol> tag at the top:
<system>
    <configuration>
        <threads count="5" />
    </configuration>
</system>

where 5 would be replaced with your expected number of build agents.
- Java Web Start deployment of build agents: Running ant war-agent from the contrib/distributed directory will use the file contrib/distributed/build-sign.properties to sign agent jars and bundle them into a deployable .war file (dist/cc-agent.war). Be sure you update build-sign.properties appropriately to use your signing information/certificate.
- Agent Utility: Running ant agent-util from inside the contrib/distributed/dist/util dir will launch a Build Agent monitoring utility. The Agent Utility can also be used to kill (and, if the agent was launched via webstart, restart) Build Agents. As of version 2.8, CruiseControl will automatically load a JMX Build Agent Utility into the JMX Control Panel if CCDist classes are available. See the -agentutil command line argument to disable the JMX Build Agent Utility if needed.
- Build Agent UI: Build Agents default to showing a simple User Interface. The Build Agent will detect if it is running in a headless environment and automatically bypass the UI. The UI can also be manually bypassed by passing -Djava.awt.headless=true or -skipUI to the Build Agent during startup (either via the command line or as a webstart jnlp parameter).
- Build Agent Unicast Lookup URL(s): To make BuildAgents find a Lookup Service via unicast, create the property registry.url in the agent.properties file and set its value to the url of the Lookup Service. If you need multiple unicast URLs, use a comma-separated list of Unicast Lookup Locators (URLs) as the property value, for example:

registry.url=jini://ubuntudan,jini://10.6.18.51

This can be useful in environments where multicast is not working or practical, or if multicasts are disabled, but it should be used only after checking out other things (like the localhost issue).
- Build Agent Entry Overrides: Build Agents support the assignment of 'EntryOverrides' that can be set at runtime. This allows you to add new 'entries' to certain agents while they are running. NOTE: If you are running multiple Agents on the same machine, they will share their EntryOverride settings.
Use Case: You have a Project that must only be built on machines with specific audio hardware. You can add a new "entries" value to the <distributed> tag of this Project in your config.xml, like:

<distributed entries="sound=hardwaremixable"> ... </distributed>
Deploy and launch all your agents, without modifying entries in user-defined.properties. You can now add a new 'Entry Override' (i.e. sound=hardwaremixable) to only those agents running on the correct hardware. Do this via the Build Agent UI or the Build Agent Utility. This new Agent entry will persist across Agent restarts.
NOTE: Be aware there is a bug in the Preferences API implementation in JRE 6.0 on non-Windows OS's that prevents these settings from persisting. See Sun Bug ID: 6568540 "(prefs) Preferences not saved in Webstart app, even after synch()" - you might want to vote for it.
To work around this bug, the saxon jars are no longer used in the agent.jnlp file. If this workaround causes problems for you, you can uncomment these jars in the agent.jnlp file (and the "ps.jarnames-xml-libs" patternset in the CCDist build.xml).
Todo for this implementation
- A default cruise.build.dir could be used on the agent, removing the requirement for any user configuration. The agent.properties file could have cruise.build.dir commented out so users would see they had the option to configure their own build location.
- Should we package the master like we do the agent? We shouldn't expect to run from a dist directory. It'd be nice if it were configurable to start up CruiseControl with or without Jini, or perhaps even to bring Jini up or down automatically given the presence of distributed tags in the configuration.
- More secure default Jini policy files.
- The agent busy state logic is kludgy. Jini contains a transaction framework (mahalo) and a mailbox service (mercury), either of which might be a way of managing busy state. Or the attempted RMI method could be utilized. A solution should be chosen and pursued to completion.
- The code to start/stop the Jini Lookup Service during CCDist unit tests is pretty ugly. Any suggestions to improve it are welcome. (Maybe Jeff's JiniStarter...)
- Add the following optional attributes to the <distributed> tag to support failing a build if an Agent can not be found in a timely fashion:
- AgentSearchRetryDelay - Corresponds directly to the message you see in the logs about "Couldn't find available agent. Waiting 30 seconds before retry.". There's a @todo comment on the field (DEFAULT_CACHE_MISS_WAIT). See usages of DistributedMasterBuilder.DEFAULT_CACHE_MISS_WAIT for more info in the source.
- AgentSearchRetryLimit - Defines how many times to perform the AgentSearchRetry loop (described in item 1). When the number of times through that retry loop exceeds the limit, a build failure would be thrown.
- AgentFindWaitDuration - The amount of time (seconds) to wait for a direct query of the Jini lookup cache to return a matching (and "non-busy") agent. The "find" returns immediately if an available agent is cached, but there can be cases where the current default delay (5 seconds) is not enough. See usages of MulticastDiscovery.DEFAULT_FIND_WAIT_DUR_MILLIS for more info
- More unit tests!
Limitations of this approach
- CruiseControl doesn't allow for a varying thread count. It would be useful to allow the build thread count to vary according to the number of active agents. The CC administrator shouldn't have to change the thread count when agents come and go. On the other hand, varying thread count directly with agent count is unsophisticated, as some of the active agents may not match the entries for a given build and thus will be idle. Perhaps build queuing should change so that a thread is spawned as long as an agent is able to take a build request; otherwise the request is queued.
- Does the antworkingdir attribute for AntBuilder have to correspond to the agent.properties configuration? If so, that prevents agents from differing from each other; each agent should be able to have an independent configuration. antworkingdir requires knowledge of the build agent that the master shouldn't need and that might vary from agent to agent. If the CCConfig API is changed, the agent could resolve env variables at remote build time (instead of using the env var values of the master).
Credits
This code was initially donated to the CruiseControl community by SolutionsIQ, Bellevue, WA.
The folks at SolutionsIQ responsible for this code include Jeff Ramsdale, Rand Huso, Pinak Mengle, and Mehruf Meherali.
Maintained by Dan Rollo