Page updated: 21/03/2007
 
gLite 3.0

lcg-RB - Update to version 3.0.7-0

Date 05.02.07
Priority Normal

Description


This update fixes various bugs; the full list is given under "Fixed bugs" below.
 
WARNING: this patch makes the RB forget all unfinished jobs.

This means that the RB should be drained sufficiently before the patch is applied. One can prevent new job submissions by letting the node's firewall refuse remote (!) connections to port 9002.
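One possible way to express such a firewall rule is sketched below. It assumes the node uses iptables with the default INPUT chain, which this page does not state; the function name drain_rules and the variable RB_PORT are illustrative only. The script merely prints the rules so that they can be reviewed before being run as root.

```shell
#!/bin/sh
# Sketch: print iptables rules that keep local access to the RB port
# but refuse remote connections. Assumes iptables and the INPUT chain.
# Review the output, then pipe it to "sh" as root to apply it.
RB_PORT=9002

drain_rules() {
    # Rule 1: connections arriving on the loopback interface stay allowed.
    echo "iptables -I INPUT 1 -i lo -p tcp --dport $RB_PORT -j ACCEPT"
    # Rule 2: every other connection to the port is refused.
    echo "iptables -I INPUT 2 -p tcp --dport $RB_PORT -j REJECT"
}

drain_rules
```

Once the update is complete, the same two rule specifications can be deleted again with "iptables -D INPUT" in place of "iptables -I INPUT n".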

The condor-lcgrb rpm replaces the condor rpm on lcg-RB nodes to make the Condor-G components more robust, particularly with respect to job proxies.

Between LCG-2_7_0 and gLite 3.0.0 the condor rpm on the lcg-RB was upgraded only because the gLite-WMS needed the newer version in the repository.

There were no significant improvements expected. Instead, there were worries that the newer version might not have undergone a comparable amount of stress testing.

Now we have encountered various problems that were not seen
with the old version:
  1. A job may be given the wrong proxy, which either makes the job immediately abort after successful submission, or lets the job live on when it should have aborted.

    The latter has been an issue for SAM, while the former has
    been plaguing the ATLAS FCR jobs and has been reported in
    at least GGUS tickets 12226 and 15694.
  2. GGUS ticket 12940 discusses how a gLite 3 RB can become
    unusable for a user if a proxy cannot be renewed for
    one of that user's jobs.

    An LCG-2_7_0 RB can be made to fail the same way if the
    condor rpm is upgraded to that of gLite 3.
  3. There have been more run-away gahp_server processes,
    dissociated from the controlling condor_gridmanager process,
    than observed with the old version, which had a separate
    condor_gridmanager instance per gahp_server.

Since the lcg-RB is not supposed to be given a lot of development or testing effort any more, it would seem best to return to the previous version of condor.

To avoid clashes with the gLite WMS, instead of downgrading condor it seems better to introduce a new rpm that happens to contain an older version of condor, and to remove the dependency on condor from the lcg-RB meta rpm.

The condor-lcgrb rpm contains the condor-6.6.6-lcg3 functionality, but relocated under /opt/condor-20.0.7 so that YAIM will consider it the highest version of condor available.
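The effect of the inflated directory version can be illustrated with a version-aware sort. This is only a sketch: how YAIM actually selects the condor directory is not described on this page, the helper name highest_version is made up, and /opt/condor-6.7.10 is a hypothetical stand-in for a newer condor installation. It relies on the -V option of GNU sort.

```shell
#!/bin/sh
# Sketch: show which directory name a version-aware sort ranks highest.
# Requires GNU sort with -V; /opt/condor-6.7.10 is a hypothetical name.
highest_version() {
    # Reads one directory name per line on stdin and prints the one
    # with the highest version number.
    sort -V | tail -n 1
}

printf '%s\n' /opt/condor-6.7.10 /opt/condor-20.0.7 | highest_version
```

A plain lexical sort would rank 6.7.10 above 20.0.7, so the result depends on the comparison being numeric per version component.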

The pre-install script will stop the Condor-G processes, but will not restart them, since the admin will first have to reconfigure condor on the lcg-RB, e.g. as follows:

/opt/glite/yaim/scripts/run_function \
the-site-info.def config_condor

/etc/init.d/edg-wl-jc start

A full reconfiguration will also work, of course:

/opt/glite/yaim/scripts/configure_node \
the-site-info.def lcg-RB

Fixed bugs

Number Description
 #13888 VOMS Admin: Internal database inconsistency detected: Got more roles than expected for user "<my DN>"
 #15566 VOMS Admin does not enforce the correct group semantics
 #16245 [VOMS Admin] Removing a VO should call check_parameters()
 #16472 VOMS Admin voms.request.webui.enabled config parameter does not work
 #17476 voms-admin fails in creating users correctly on oracle
 #18140 [ voms-admin ] create-group option doesn't work properly in the command line

Updated rpms

Name                                 Version   Full RPM name                                           Description
condor-lcgrb                         1.0.0-3   condor-lcgrb-1.0.0-3.i386.rpm                           condor 6.6.6 with LCG patches for LCG-RB
glite-security-voms-admin-client     1.2.15-1  glite-security-voms-admin-client-1.2.15-1.noarch.rpm    gLite VOMS Administration clients
glite-security-voms-admin-interface  1.0.5-1   glite-security-voms-admin-interface-1.0.5-1.noarch.rpm  gLite VOMS Administration service (interface)
lcg-RB                               3.0.7-0   lcg-RB-3.0.7-0.noarch.rpm                               lcg RB node

The RPMs can be updated using apt via

Service reconfiguration after update

The service must be reconfigured. See the Description section above for the commands.

Service restart after update

Service must be restarted.

How to apply the fix

  1. Update the RPMs (see above)
  2. Update configuration (see above)
  3. Restart the service if necessary (see above)
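Steps 2 and 3 can be sketched as a small wrapper around the commands given in the Description section. This is an illustration rather than a supported script: the variables SITE_INFO and RUN and the helper names step and post_update are hypothetical. By default the commands are only printed; setting RUN=1 executes them and should be done as root on the RB after the RPMs have been updated.

```shell
#!/bin/sh
# Sketch: print (or, with RUN=1, execute) the post-update commands.
# SITE_INFO is a hypothetical variable naming your site-info.def file.
SITE_INFO=${SITE_INFO:-the-site-info.def}

step() {
    # Echo the command; execute it only when RUN=1 is set.
    echo "+ $*"
    if [ "${RUN:-0}" = "1" ]; then
        "$@" || exit 1
    fi
}

post_update() {
    # Step 2: reconfigure condor through YAIM.
    step /opt/glite/yaim/scripts/run_function "$SITE_INFO" config_condor
    # Step 3: start the Condor-G components stopped by the pre-install script.
    step /etc/init.d/edg-wl-jc start
}

post_update
```

For example, "RUN=1 SITE_INFO=/path/to/site-info.def sh post_update.sh" would run both steps, where the script name and path are placeholders.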