LCG2 Release Notes



Document identifier:
Date: December 13, 2004
Author: CERN GRID Deployment Group (<support-lcg-deployment@cern.ch>)
Abstract: These notes explain what is in the new release

Contents

Changes from 2_2_0 to 2_3_0

This release contains the usual RH7.3 release as well as a full SL release. The released software is identical for both operating systems, and full interoperability between the two has been verified. Because LCFGng is not supported on SL, a new install method based on a scripted manual install has been created to accompany the SL release. The new install method has three steps: install the OS, apt-get the RPMs, and configure the node type. The documentation has also been rearranged. The old LCGReleaseNotes has been split into five documents: LCG2 Release Notes, LCG2 LCFG Install, LCG2 LCFG Upgrade, LCG2 New Sites and LCG2 Site Testing.


Major points:
=============

- The following nodes have been certified for SLC3:
  BDII, CE (pbs + condor), MON, Proxy, RB, SE, WN

- A site can now mix RH7.3 nodes with SLC3 nodes. A typical configuration
  could be all service nodes running RH7.3 with worker nodes running SLC3,
  or vice versa: service nodes on SLC3 with worker nodes on RH7.3. The
  latter configuration may be of interest because the service nodes are
  exposed to external connections, so their security exposure may be much
  greater than that of the worker nodes. Kernel security patches arrive
  faster for SLC3, while some sites may have serious difficulty obtaining
  them for RH7.3.

- Inter-site interoperability has been verified: sites running different
  operating systems will be able to communicate.

- Installation on SLC3 using LCFGng is not supported.

- The APEL accounting packages from GOC are included as part of the release.
  They require R-GMA, which was already included in the previous release.
  The packages parse the gatekeeper log files, the system messages and
  the PBS events (so far) to extract job information and publish it
  using R-GMA.

- LFC: new high performance LCG File Catalog
    - Based on lessons learned in the DCs over the last few months
    - Fixes performance and scalability problems seen in EDG Catalogs
  	Cursors for large queries
  	Timeouts and retries from the client
    - Provides more features than the EDG Catalogs
  	User exposed transaction API
  	Hierarchical namespace and namespace operations
  	Integrated GSI Authentication + Authorization
  	Access Control Lists (Unix Permissions and POSIX ACLs)
  	Checksums
    - Based on existing code base
  	Supports Oracle and MySQL database backends
    - Integration with GFAL and lcg_util complete
    - POOL Integration will be provided (November 2004)

  Both the EDG (old) and LFC (new) file catalogs are included, and all client
  tools (GFAL, lcg-util) support both. Each user job can select the catalog
  via an environment variable, which (as released) defaults to the old one.
  Together with the experiments, we have started migrating from the old
  catalog to the new one.
  For details about the design, implementation and performance of the
  new file catalog, see Jean-Philippe Baud's presentation at CHEP2004.
  More information can also be found in the GFAL/lcg-util README.

- GFAL improvements
    - thread-safe version (requested by Atlas)
    - provide 2 new methods for asynchronous pre-staging: srm_get and 
      srm_getstatus (requested by LHCb)
    - integration with the new LFC file catalog
    - increase timeout from 5 to 15 seconds when talking to BDII
    - build on SLC3 as well as RH73

- lcg_util improvements
    - do not return actual_guid if copy or registration in LRC failed
    - integration with the new LFC file catalog
    - fix typos in error messages

- The new GIP (Generic Information Provider) should prove to be more flexible
  and stable than previous methods. It is now installed on CE, SE, Proxy, RB.

- A number of larger and smaller bug fixes, as always.

For the full list see below.
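
The per-job catalog selection described in the LFC item above can be sketched as follows. This is only an illustration of the "environment variable with a default" pattern: the variable name LCG_CATALOG_TYPE and the values edg and lfc are assumptions, since these notes do not name them, and select_catalog is a hypothetical helper, not part of GFAL or lcg-util. The GFAL/lcg-util README remains the authority on the real names.

```python
import os

# Assumption: variable name and values are illustrative only; check the
# GFAL/lcg-util README for the names the release actually uses.
CATALOG_ENV_VAR = "LCG_CATALOG_TYPE"
SUPPORTED_CATALOGS = ("edg", "lfc")
DEFAULT_CATALOG = "edg"  # as released, the old EDG catalog is the default


def select_catalog(environ=None):
    """Return the catalog type requested by the job environment."""
    env = os.environ if environ is None else environ
    value = env.get(CATALOG_ENV_VAR, DEFAULT_CATALOG).strip().lower()
    if value not in SUPPORTED_CATALOGS:
        raise ValueError(f"unsupported catalog type: {value!r}")
    return value
```

A job that sets nothing keeps using the old catalog, which is what makes a gradual migration possible: each job opts in to the new catalog individually.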

Summary of changes with respect to the previous LCG2 Aug/2004 release:
======================================================================

Note: 3-digit numbers are Savannah *patch* numbers
      4-digit numbers are Savannah *bug*   numbers
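
For readers who post-process this list, the numbering convention above can be applied mechanically. The sketch below is a hypothetical helper (classify_entry is not part of any LCG tool) and assumes entries have the form "NUMBER - title" used throughout these notes.

```python
import re

# An entry is a 3- or 4-digit number, a dash, then the title.
ENTRY_RE = re.compile(r"^\s*(\d{3,4})\s+-\s+(.*)$")


def classify_entry(line):
    """Return (kind, number, title) for a changelog entry, or None."""
    match = ENTRY_RE.match(line)
    if match is None:
        return None  # a heading, a continuation line, or free text
    number, title = match.group(1), match.group(2).strip()
    kind = "patch" if len(number) == 3 else "bug"
    return kind, int(number), title
```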


- VDT:
  -------
  Upgrade to 1.2.0 (from 1.1.14-4.lcg1)
  
  - Globus functionally equivalent (2.4.3 with exactly the same patches)
  - fixes annoying (but harmless) complaints from
    /opt/globus/sbin/globus-initialization.sh
  - MyProxy v0.6.1 (from v0.5.9; minor bugfixes) 


- CondorG:
  --------
  Upgrade from condor-6.6.0-2.edg6 to condor-6.6.0-2.edg9 (change in
  gahp_server only) to help with the following:

  4500 - gahp_server memory usage
           Very important reduction of memory usage by "gahp_server"
           processes on RB (fixes by David Smith, will be sent to VDT)
  4578 - prevent condor_gridmanager restarting many jobs if
	   the gridmonitor returns an error.

  
- Information System:
  ------------------
  Updated BDII to version 3.1.11.
  197 - BDII is now also installed on the CE, so that sites can replace the
         SITE GIIS with a BDII
  196 - Publishing ResourceBroker and MyProxy service
  203 - Timeout protection when downloading the configuration
  207 - firewall fix for localhost when restarting
  214 - fix for long connections 
  216 - reduce "broker not able to plan" problem
  4558 - timeout while ldapsearch is active
  4596 - BDII DN substitution error

- Information Providers:
  ----------------------
  GIP upgraded to version 1.0.14.
  232 - Upgrade lcg generic information provider to 1.0.7
           Fix bug 4481.

- Workload Management System
  ---------------------------
  Upgraded from lcg2.1.32 -> lcg2.1.54 with the following changes:

  5109 - WMS daemon memory leaks
  4924 - logd race to setup confirmation socket
  4909 - PR sometimes doesn't renew a proxy
  4894 - NS can become unresponsive during dialogue with client
  4892 - NS can (partially) crash with 'Unable to receive'
  4891 - PR daemon can exit
  4836 - locallogger 'error getting event's jobid'
  3807 - LogMonitor must not crash on bad Condor-G log files
  3883 - Cancel request from jobcontroller never appeared in condor-g
  3884 - jobs not canceled, 'unknown to the system'
  3916 - job interactive doesn't work (not supported)
  4009 - Daemon startup during system boot
  4070 - FuzzyRanking (stochasticRankSelector)
  4126 - NS can crash with 'Pipe Closed'
  4127 - Stderr overwrites stdout when redirected to the same file
  4144 - Malformed requirements expression causes WM/NS to exit
  4261 - edg-wl-wm daemon fails under sl3
  4285 - UI-with-voms-certificate bug
  4286 - Thread behaviour & the WM
  4299 - interlogd can become deadlocked
  4318 - Matchmaking policy for resubmitted jobs
  4350 - job wrapper must set BDII to be used
  4365 - WL libraries/daemons must retry BDII queries

  Remember that when upgrading one should ensure that the WMS services are
  restarted, to allow the new version to become active.
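
  Fix 4365 above asks the WL libraries and daemons to retry BDII queries.
  The general pattern can be sketched as below. This is not the WMS code:
  query_with_retries and its parameters are hypothetical, the 15-second
  default simply mirrors the GFAL BDII timeout mentioned under 'Major
  points', and the number of attempts and the delay are assumptions.

```python
import time


def query_with_retries(query, attempts=3, timeout=15.0, delay=1.0,
                       sleep=time.sleep):
    """Call query(timeout) up to `attempts` times; return the first result."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return query(timeout)
        except (TimeoutError, ConnectionError) as exc:
            last_error = exc
            if attempt < attempts:
                sleep(delay)  # short pause before asking the BDII again
    raise RuntimeError(
        f"BDII query failed after {attempts} attempts") from last_error
```

  Only timeouts and connection errors are retried here; any other
  exception propagates immediately, so a malformed query is not repeated.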

- LCG Job Managers:
  -----------------
  N/A.

- Data Management:
  ----------------
  New edg-replica-manager v2.3.5.

  edg-reptor-client :
   5152 - overview: some memory leaks in edg-reptor-client (LDAP)
   5229 - overview: edg-reptor-client non-portable code
   2889 - "edg-rm pi" should filter out unwanted VOs.
           Old behaviour is still accessible by simply not specifying a VO.
   3466 - Wrong error message with a guid with no SURLs
   3586 - file zero size left during data transfer.  On an
           exception during copying, we delete the destination filehandle.
   4346 - edg-rm should obey LCG_GFAL_INFOSYS
   4370 - edg-rm cr/rep should not retry if catalog errors occur
   4390 - make edg-rm search harder for a correct $JAVACMD.
           We now check on the path too.
   4391 - edg-rm should abort if credentials have expired/are missing. 
   4405 - edg-rm fails with SEs with similar names

- GFAL:
  -----
  Upgraded to GFAL-client 1.4.0.

  For changes see the 'GFAL' paragraph in the 'Major points' section above.
   4577 - protect against NULL pointer to error message (reported by Jens)
   4579 - implement support for multiple files in SRM get/put requests:
          turlsfromsurls() (requested by LHCb)
   
- lcg-util:
  ---------
  New lcg-util 1.1.0.

  For changes see the 'lcg-util' paragraph in the 'Major points' section above.
   4131 - lcg_rf does not correctly get filesize for classic SE 
   4132 - lcg-del: do not remove entry from catalog and return an error
          if physical file removal failed. Print better error messages
	  in case of expired certificate.
   4133 - lcg-gt: for a classical SE, check protocol validity and use
          the requested protocol if ok
   4134 - return an error if provided/generated guid exists already in
          the LRC database
   4574 - add method lcg-sd to set the file status to "Done" (especially
          needed for dCache) 
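
  The ordering fixed in 4132 (remove the physical file first, and leave the
  catalog alone if that fails) can be sketched as follows. This is an
  illustration, not the lcg-del source: delete_replica and the two
  callables passed to it are hypothetical stand-ins for the storage and
  catalog operations.

```python
def delete_replica(surl, remove_physical_file, unregister_from_catalog):
    """Remove the physical file first; touch the catalog only on success."""
    try:
        remove_physical_file(surl)
    except OSError as exc:
        # Keep the catalog entry so it still points at the surviving file.
        return False, f"physical removal failed for {surl}: {exc}"
    unregister_from_catalog(surl)
    return True, ""
```

  Doing the two steps in the opposite order would risk leaving a file on
  the SE that no catalog entry refers to.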

- R-GMA:
  -----
  N/A.

- APEL accounting:
  ----------------
  Included the GOC accounting packages, which parse the gatekeeper log
  files, system messages and PBS events to extract job information and
  publish it using R-GMA.
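
  To give an idea of the kind of parsing involved, the sketch below
  extracts job information from PBS accounting records, which have the
  form "timestamp;record type;job id;key=value ...". It is not the APEL
  parser: the function names are hypothetical, the sample records and host
  name are invented for illustration, and the R-GMA publishing step is
  left out.

```python
def parse_pbs_record(line):
    """Split one PBS accounting record into its fields."""
    timestamp, record_type, job_id, message = line.rstrip("\n").split(";", 3)
    attributes = {}
    for item in message.split():
        key, sep, value = item.partition("=")
        if sep:
            attributes[key] = value
    return {"timestamp": timestamp, "type": record_type,
            "job_id": job_id, "attributes": attributes}


def finished_jobs(lines):
    """Yield a summary for every 'E' (job ended) record."""
    for line in lines:
        if line.count(";") < 3:
            continue  # not an accounting record
        record = parse_pbs_record(line)
        attrs = record["attributes"]
        if record["type"] != "E" or "start" not in attrs or "end" not in attrs:
            continue
        yield {
            "job_id": record["job_id"],
            "user": attrs.get("user"),
            "queue": attrs.get("queue"),
            "wall_seconds": int(attrs["end"]) - int(attrs["start"]),
        }


# Invented sample records: one job start ('S') and one job end ('E').
SAMPLE = [
    "12/13/2004 10:00:00;S;101.ce.example.org;user=alice queue=short",
    "12/13/2004 10:05:00;E;101.ce.example.org;user=alice queue=short "
    "start=1102932000 end=1102932300 Exit_status=0",
]
```

  Calling list(finished_jobs(SAMPLE)) yields a single summary, for the 'E'
  record only.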

- Others:
  ------
  CA updated to version 0.24, which added:
    - Slovenian SiGNET CA
    - SEE-GRID CA
    - Estonian Grid CA
    - and updated LIP CA
  205 - IUCC and BEGrid root certificates and updated Russia CA
  218 - Add new package lcg-info-api-ldap-1.1-0, which includes tools
        and an API to gather information related to Grid services
  225 - Include SIXT VO
        Following the idea of Fokke Dijkstra and Ron Trompert, introduce
        a mechanism for sites to add their new VOs easily
  234 - Update lcg env component to version 1.0.3 to fix bug 4483
  239 - Fix a problem introduced by patch 234
  267 - Add cron job for lcg-expiregridmapdir.pl on RB and SE to clean the
          hardlink between the user's certificate and the local username 
  4483 - LCG_GFAL_INFOSYS cannot be superseded with Environment in JDL

  Cleanup of unused configuration settings for the registration to TOP GIIS
  from SITE GIIS.

- Monitoring (Grid ICE):
  ---------------------
  N/A.


The translation was initiated by Laurence Field on 2004-12-13

