RELEASE NOTE, Apr/06, 2005

              ----------------------------------
              |                                |
              |  April/2005 LCG2-2.4.0 release |
              |                                |
              ----------------------------------

The April/2005 LCG2-2.4.0 release has been certified and tagged as:

    LCG-2_4_0

for the public release on Apr/06, 2005.

It supports the following Linux Operating Systems:

  - RH7.3
  - SLC3 running in 32bit mode
  - SLC3 running in 64bit mode (not yet available)

The release is now installed on the C&T testbed and has undergone
certification testing.

Major points:
=============

- We no longer distribute JAVA
  ----------------------------
  As has been mentioned on several occasions, the practice of including
  the JAVA RPMs in our repositories was at best questionable from a
  legal standpoint. Of course, the middleware still requires JAVA to be
  installed. The installation guides contain instructions on how to
  download the software from SUN and install it. This is certainly not
  a step forward in convenience, but we see no way to avoid it.

- The release procedure has changed
  ---------------------------------
  LCG 2-4-0 is the first of the new fixed-date releases. For those who
  didn't follow the development and discussions in this area, here it
  is in a nutshell:

  .. Every three months (1st April, 1st July, 1st October, ...) a major
     release will be made. This release can contain new services and
     new clients, and might require the installation of new RPMs and
     additional configuration changes.

     ----------------------------------------------------
     | It is expected that these major releases are put |
     | into operation by the sites within 3 weeks after |
     | the release came out.                            |
     ----------------------------------------------------

  .. In between there will be a release every month, if needed. Sites
     are not required to install these releases; however, we expect
     them to provide space for dteam on the shared file system used
     for user software installation, to make the client libs available
     to the VOs.

  .. The central services can get fully backward-compatible upgrades
     at any time. This will affect the CICs only.

  .. In addition, a security upgrade might be needed at any time. We
     expect the sites to follow those in a timely manner.

- Shared space for user level software installation
  --------------------------------------------------
  We have observed that most VOs can't make efficient use of sites
  that do not provide some shared space for the installation of user
  level software. It would be really helpful if you could consider
  this for your site. Please contact us if you have any questions
  concerning this.

  Several sites that already provide this to the VOs do not provide it
  for the dteam VO. This now becomes more important, since we would
  like to use the mechanism described in the workload management
  section to keep the client libs recent.

- Test Pilots
  ------------
  To make the upgrades smoother, we now deploy first on a few sites
  that volunteered to test. (Thanks for the help.)


Summary of changes with respect to the previous LCG2 Oct/2004 release:
======================================================================

Note: 3-digit numbers are Savannah *patch* numbers
      4-digit numbers are Savannah *bug*   numbers

- VDT:
  ----
  version 1.2.0

- CondorG:
  --------
  No change.

- YAIM:
  -----
  Careful: the following two changes are *NOT* backward-compatible:

  1. invocation
     ----------
     The syntax has been streamlined a bit:

       install_node   site-info.def meta-package [meta-package ...]
       configure_node site-info.def node-type    [node-type ...]

     This has been done to ease the configuration of machines with
     multiple node types and to solve problems with multihoming and
     aliasing. To properly configure a machine with multiple node
     types, the syntax above must be used with all node types or
     meta-packages on the same command line, as in the sketch below.
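     For illustration, a minimal sketch for a machine hosting both a
     CE and a classic SE. The meta-package and node-type names used
     here (lcg-CE, lcg-SE_classic, CE, SE_classic) are only an
     assumption, so check them against the list shipped with your YAIM
     version; site-info.def is assumed to be filled in already.

       # install the RPMs for both node types in a single call
       install_node   site-info.def lcg-CE lcg-SE_classic

       # configure both node types in the same call, as required for
       # machines hosting more than one node type
       configure_node site-info.def CE SE_classic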
  2. users
     -----
     The USERS_CONF file now has an extended format which includes
     information on VO membership and sgm status. No assumptions are
     made any more about local naming conventions.

  Other points
  ------------

  * The Java version required is 1.4.2_07. This should be obtained
    from Sun before starting any installation.

  * The functions used for any particular node type are defined in
    scripts/node-info.def. You can easily edit this to remove
    functions or add new ones.

  * YAIM now calls apt-get non-interactively.

  * Configuration of queues is now more flexible; one queue per VO is
    no longer imposed.

  * Configuration of user accounts can be turned off by removing the
    appropriate functions from node-info.def. Cron jobs can be turned
    off by setting CRON_DIR in site-info.def.

  Many bug fixes:
  ---------------
  5862 - config_javadotconf does not look properly for Java
  6020 - edguser:edguser have fixed UID/GID
  6164 - YAIM requires interactivity when running APT
  6167 - USERS_CONF required to exist
  6168 - YAIM should allow for selecting/de-selecting configuration functions
  6171 - RFE: create directories for SW_DIR when not existing
  6319 - YAIM automatically assumes one queue per VO (with the same name)
  6363 - GLOBUS_TCP_PORT_RANGE not configurable
  6456 - config_mkgridmap doesn't care about SGM logins
  6550 - Wrong permissions on VO.list files of tags
  6592 - Batch system hardcoded in /etc/globus.conf file
  6594 - config_users creates users also for VOs not supported
  6616 - site-info.def example still has various problems
  6623 - YAIM has /opt/{edg,globus,lcg} hardcoded everywhere
  6746 - install_node does not check $FUNCTIONS_DIR/local/*
  6800 - /opt/edg/var/edg-rgma/rgma-tools-defaults not configured
  6831 - When a VO and its group don't have the same name, config_gip
         creates a wrong lcg-info-generic.conf file
  6920 - Feature request: yaim and multiple edg-mkgridmap auth entries
  6937 - config_gip and install date
  6941 - APEL configuration with YAIM
  6964 - The config_rfio script does not actually start the rfiod daemon
  7005 - apel.conf files should be readable only by root
  7029 - YAIM configures Torque nodes with np=3
  7138 - config_gip creates the VO_Tags dirs with bad owners and rights
  7362 - yaim should allow the MySQL password to be stored in .my.cnf,
         not in an environment variable
  7363 - No quoting of MYSQL_PASSWORD
  7420 - Setup of attributes GlueSLArchitectureType and GlueSEName "hard coded"
  7424 - YAIM cron jobs for APEL
  7489 - config_gip has "pbs" hardcoded for one of the dynamic_scripts
  7514 - Clean up dynamic plug-in configuration
  7528 - config_gip breaks subject DNs in GRID_TRUSTED_BROKERS at whitespace
  7677 - Wrong names for torque resmom port in /etc/services

- Information System (BDII):
  --------------------------
  Updated BDII to version 3.2.5. The BDII has been re-engineered: the
  database swap method has been changed to improve performance under
  high load. A quick query example is shown below.

  Bug fixes:
  7330 - slapd errors parsing lcg-bdii-write-slapd.conf
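  As a quick sanity check after the upgrade, the BDII can be queried
  directly with ldapsearch. The sketch below assumes the usual BDII
  port (2170) and base DN (mds-vo-name=local,o=grid); replace the host
  name with that of your own BDII:

    # list the unique IDs of all CEs published by the BDII
    ldapsearch -x -H ldap://my-bdii.example.org:2170 \
        -b "mds-vo-name=local,o=grid" \
        "(objectClass=GlueCE)" GlueCEUniqueID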
- Information Providers
  ---------------------

  lcg-info-generic => 1.0.16
  5821 - Write wrapper script puts multiple commands on the same line
  6555 - File name typo causes error when removing temporary file
  6556 - DN comparisons can cause info not to be updated
  6557 - Script makes the assumption that the same script isn't called
         separately with different arguments

  lcg-info-dynamic-condor (1.0.0 => 1.0.1)
  6720 - Displayed 0 free CPUs when the system was idle while using Condor

  lcg-info-dynamic-lsf => 1.0.5
  6347 - Wrong name of the variable
  6662 - Illegal division by zero
  6668 - Free CPUs value not correct
  6774 - Using LSF v6.0 with long hostnames
  7030 - Missing array cleaning causes wrong numbers in CPU information
  7443 - Wrong value for MaxCPUTime and MaxWallClockTime
  7492 - Provider fails when a queue is not associated to a specific host group
  7536 - LSF version not detected correctly with LSF 6.1

  lcg-info-dynamic-pbs => 1.0.5
  6207 - Default time limits not taken into account
  6211 - MaxTotalJobs not processed
  6603 - Bad FreeCPU information
  6216 - Free CPU count incorrect
  6246 - GlueCEPolicyPriority is static
  6913 - Stopping and disabling queues in PBS not reflected in GlueCEStateStatus
  6924 - State not reset when looping over the queues

Monitoring (new)
----------------
New components have been included for job level monitoring. They allow
users and monitoring tools to follow the status of jobs, including
their CPU consumption and other indicators. The following two
components provide the basis for this. It should be pointed out that
the functionality is independent of the batch system and does not add
additional load to the services.

  lcg-mon-job-status (new, 1.0.3)
    This package contains a daemon that takes certain insert statements
    related to job status from the lbserver and publishes them via
    R-GMA. The daemon runs on the RB; the R-GMA table it uses is
    JobStatusRaw.

  lcg-mon-wn
    This package contains a script that is called by the job wrapper.
    The script publishes, every 5 minutes, some values about the
    running jobs. This script runs on the WN; the R-GMA table it uses
    is JobMonitor.

- Workload Management System
  --------------------------
  Updated to version lcg2.1.62. Besides various bug fixes and
  performance improvements, the system now includes an interface to
  file catalogues that conform to the DLI.

  A mechanism has been put in place that enables the user to choose
  between multiple versions of the middleware offered by a site. The
  selection is done by a simple JDL statement. Details on how this
  works for users can be found in the LCG User Guide. This feature
  will allow users to use the latest versions of the middleware even
  on sites that can't follow the monthly updates.

  7684 - Inefficient queries for 'subcluster' information
  7685 - Start lcg-mon-wn along with user payload
  7686 - Check for success of middleware version selection
  7687 - Export EDG_WL_RGMA_FILE, EDG_WL_RGMA_SOCK in init.d/edg-wl-lbserver
  7688 - FileListLock race between threaded client / other clients

  The changes in this version of the WMS should not have a direct
  impact on users or system administrators, although the stability and
  performance of the WMS should be somewhat improved. There is also
  added support for new components, such as the periodic logging of
  status information of running jobs and the logging of events from
  the LB server to R-GMA.

  Remember that when upgrading, one should ensure that the WMS
  services are restarted, so that the new version becomes active; see
  the sketch below.
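  A minimal sketch of restarting the services named in this note on
  the RB / LB host. The exact set of edg-wl-* init scripts present
  depends on the node configuration, so treat the list below as
  illustrative rather than exhaustive:

    # restart the L&B related services so the new version becomes active
    /sbin/service edg-wl-lbserver restart
    /sbin/service edg-wl-locallogger restart

    # restart any remaining edg-wl-* services installed on the node in
    # the same way, then verify (e.g. with ps) that they are running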
Known problems:
---------------

1. There are known weaknesses in the startup scripts for the WMS
   services. After rebooting or restarting services, care should be
   taken to ensure that the services are indeed running.

2. There is a very small chance that canceling a job multiple times
   can crash the workload manager. In this case the workload manager
   will be restarted automatically after a few minutes.

3. The LCFGng configuration object uicmnconfig, which is used by
   LCFGng to configure the UI, will incorrectly write the
   configuration of LBAddresses in the case where multiple network
   servers (i.e. nslines) are specified. See bug
   https://savannah.cern.ch/bugs/?func=detailitem&item_id=7582

4. The proxy updates of the locallogger and the LB server, usually run
   from cron, have a small chance of interfering if run very close
   together in time. For example, cron only needs to run
   '/sbin/service edg-wl-locallogger proxy' and not also
   '/sbin/service edg-wl-lbserver proxy'. The current configuration
   tools may still be adding both of these entries.

- LCG Job Managers:
  -----------------
  No change.

- Data Management:
  ----------------
  See lcg_utils.

  LCG File Catalog (LFC):
  -----------------------
  The LCG File Catalogue, already tested by the people active in ARDA
  and by DESY, is supported by this release. It gives access to a
  higher-performance, secure catalogue. The client libs are part of
  the WNs and UIs. This catalogue will interoperate with the RB via
  the DLI interface. Migration scripts from the RLS to the LFC are
  available on request.

  The LFC released is secure and depends on CSEC, a security plugin
  for client-server sockets. This is only a temporary dependency and
  will go away in the next release, because the code will be fully
  integrated into the LFC itself. The current LFC also depends on the
  Castor client library. This dependency will also be removed in the
  next release, because the Castor code will be fully integrated into
  the LFC.

  The LFC being secure means in particular that the machine hosting
  the LFC server should have a proper host certificate, and that a
  user needs a valid proxy to use the LFC client.

  Note: the INSTALL notes included in the RPMs are not up to date and
  cannot be updated at this time due to time constraints. Please refer
  instead to the "LFC Administrator's Guide", which can be found at:

  https://edms.cern.ch/file/579088/1/LFC-Administrator-Guide.pdf

  There is also a link to this guide from the GOC Wiki page (Data
  Management section):

  http://goc.grid.sinica.edu.tw/gocwiki/AdministrationFaq

  GFAL and lcg_util:
  ------------------
  6445 - Make error messages more user friendly, give more details
         about the problem

  Both GFAL and lcg_util are fully compliant with the secure LFC as
  catalogue backend. Note that when using the secure LFC, the user
  needs a valid proxy.

  The lcg_util version is 1.2.9 (fixing the problem with lcg-del).
  edg-rm is now a wrapper using the lcg_util tools.

  EIS tools:
  ----------
  These tools are targeted at end users and ease access to the
  information system. The only current documentation is the online
  help, but they will be described in the upcoming new version of the
  LCG-2 User Guide. A few usage examples are shown after this list.

  lcg-infosites
    Helps especially in finding SEs and their matching CEs. It is a
    replacement for the very inefficient "edg-rm printInfo" command.

  lcg-info
    For user-friendly LDAP browsing.
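  A few usage examples for lcg-infosites; the dteam VO is used purely
  as an illustration, and the exact set of subcommands may vary
  slightly between versions (the online help lists the ones
  available):

    # list the CEs that advertise support for the dteam VO
    lcg-infosites --vo dteam ce

    # list the SEs, with the available and used space they report
    lcg-infosites --vo dteam se

    # show which SEs are "close" to which CEs
    lcg-infosites --vo dteam closeSE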
- VOMS
  ----
  The version included is 1.3.7.

  This release includes support for VOMS on the CE and the classic SE.
  On other services a VOMS proxy is accepted as a standard grid proxy.
  On all nodes the gridmap files can be populated with information
  provided by the VOMS servers and LDAP-VO servers. This eases the
  interoperation with other grid infrastructures.

- DPM (Disk Pool Manager)
  -----------------------
  One of the interesting capabilities of the lightweight Disk Pool
  Manager is that it can convert an existing, populated Classic SE
  into a scalable SRM SE without moving the data. The server is in the
  final stage of testing and will be released in the very near future.
  We mention the DPM already here to help smaller sites in planning
  the migration from Classic SEs to SRM-enabled storage.

- dCache:
  -------
  dCache is an SRM-enabled disk pool manager provided by DESY and
  FNAL. For an overview of the system please refer to:

  http://www.dcache.org

  The first version of an installation and operations manual can be
  found at:

  http://grid-deployment.web.cern.ch/grid-deployment/documentation/public/gis/dCache-SiteAdmin-Guide/pdf/dCache4SiteAdmins.pdf

  Currently there is no simple migration path from an existing Classic
  SE to a dCache SE, but a site can choose dCache for a new SE. dCache
  is a highly flexible, tunable system. Due to resource limitations,
  SA1 can only support a limited number of standard configurations.
  For more complex setups, please contact the dCache team.

- R-GMA:
  ------
  R-GMA version: 4.0.7

  The new version of R-GMA is based on the version of R-GMA in gLite,
  but provides backward-compatible APIs for the client calls. This
  version is expected to be more stable and have improved performance.

  One service that will make use of the new R-GMA is edg-rgma-gin,
  which in this release has been activated and will export to R-GMA
  all the information that is produced by the Generic Information
  Provider and the GridICE sensors.

- APEL accounting:
  ----------------
  Upgrade to version 3.4.42. The R-GMA based accounting tool now
  supports PBS and LSF.

- Monitoring (GridICE):
  ---------------------
  GridICE: gridice-sensor-1.5.1-pl2_sl3
  The latest version provided by the INFN team has been included.

- WN on IA64:
  -----------

- Tank & Spark
  ------------
  The RPMs for this tool for user level software installation are
  included in this release. The configuration and documentation can be
  found here:

  http://grid-deployment.web.cern.ch/grid-deployment/cgi-bin/index.cgi?var=eis/docs

  This can be an alternative for sites that have problems providing a
  shared file system for the user software.

  Installation notes:
  -------------------
  RH7.3: it is installed by LCFGng; in addition, these few manual
  steps are required on the CE (an expanded sketch follows the list):

    a) /etc/rc.d/init.d/mysql start
    b) mysqladmin password <password>
    c) mysqladmin -h <CE host name> password <password>
    d) mysql -u root -p
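  For convenience, a slightly expanded sketch of the manual steps
  above. The password and host name are placeholders to be replaced
  with site-specific values, and the sketch assumes a fresh MySQL
  installation whose root account has no password yet:

    # start the MySQL server used by Tank & Spark on the CE
    /etc/rc.d/init.d/mysql start

    # set the MySQL root password, both for local connections and for
    # connections made via the CE host name
    mysqladmin -u root password 'change_me'
    mysqladmin -u root -h my-ce.example.org password 'change_me'

    # log in once to verify that the password works
    mysql -u root -p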