** LCG-2_7_0 RELEASE NOTES **

This is the official announcement of the release of LCG-2_7_0.
Please read these notes carefully before installing or upgrading.

** CONTENTS **

1 INTRODUCTION
2 INFORMATION CONCERNING PARTICULAR COMPONENTS
3 INFORMATION CONCERNING PARTICULAR NODES
4 CHANGELOG

Note in particular the information given on upgrading R-GMA.

We would like to thank the IT, SEE and UKI ROCs for their help in testing this release.

** INTRODUCTION **

This release of LCG-2_7_0 should be understood in the context of this year's timetable. As a ROC manager or site admin you may have heard about the planned merger of the LCG-2.x and gLite middleware stacks into gLite-3.0. While we can't give you an accurate release date for this important change, we think that we should share with you what drives the process that will lead to gLite-3.0.

The main event on the production system this year will be Service Challenge 4, whose production phase starts on June 1st. By then gLite-3.0 has to be up and running at all participating sites. To allow for local adaptation, verification, and the like, the release has to be given to the sites at least one month before the start. This makes it very likely that the next release will be available before April 30.

This raises the question of whether you should upgrade now or wait for gLite-3.0. Since several of the components come with many significant improvements and fixes, it is advisable to upgrade now. The only component for which you might want to wait for the first update to LCG-2_7_0 is the DPM. As you can see in the release notes, the VOMS enabled DPM is ready to enter certification and will be released as soon as possible.

Full details on the release can be found at
http://lcg.web.cern.ch/LCG/Sites/releases.html

** INFORMATION CONCERNING PARTICULAR COMPONENTS **

* R-GMA *

The R-GMA that comes with the LCG-2_7_0 release is not client-server compatible with the version of R-GMA that was installed with LCG-2_6_0. This means that, at a given site, the client version must match the version of the server. To ensure continuity of service there are two options.

If the site is small and will be upgraded in one go, just update the site MON box first, followed by all other nodes.

If the site is large and cannot be upgraded in one go, it is recommended that a new MON box be installed with the new version. As the other nodes are upgraded, the R-GMA clients on those nodes can be reconfigured to point to the new MON box instead of the old one.

Soon after this release the R-GMA Registry will stop accepting non-authenticated connections. This means that sites that have not moved to authenticated connectors will no longer be able to use R-GMA.

Please note also that the SFT Server is currently incompatible with an LCG-2_7_0 MON. If you want to build your own SFT server, use an LCG-2_6_0 MON box.

* REPOSITORY *

The main software repository is
http://grid-deployment.web.cern.ch/grid-deployment/gis/apt/LCG-2_7_0/sl3/en/i386

There is a mirror, updated daily, at
http://linuxsoft.cern.ch/LCG/apt/LCG-2_7_0/sl3/en/i386

Security updates have been separated so sites can distinguish between these and others.

Yum 2.4 support has been added.

* YAIM *

Please see the upgrade guide for information on which yaim variables have been added and removed.
http://grid-deployment.web.cern.ch/grid-deployment/documentation/LCG2-Manual-Upgrade/

Perhaps the most important change is that SE_HOST is no longer used. For a classic SE use CLASSIC_HOST, and then populate the variable SE_LIST with the hostnames of all your SEs (classic, DPM and d-Cache) as appropriate.

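As an illustration of the SE_HOST change described above, a site-info.def fragment might look like the following. This is a minimal sketch only: the hostnames are hypothetical, and the space-separated form of SE_LIST is an assumption; please check the upgrade guide for the authoritative syntax.

```shell
# Sketch of a site-info.def fragment (all hostnames are hypothetical).
# SE_HOST is no longer used; a classic SE is named by CLASSIC_HOST.
CLASSIC_HOST="classic-se.example.org"

# All SEs at the site, whatever their type (classic, DPM, d-Cache).
# ASSUMPTION: SE_LIST is a space-separated list of hostnames.
SE_LIST="$CLASSIC_HOST dpm.example.org dcache.example.org"
```

A site without a classic SE would presumably leave CLASSIC_HOST out and list only its other SEs in SE_LIST.
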
Note that the number of pool accounts per VO has increased significantly!

Also note that there is now a "groups.conf" besides the standard "users.conf", documented here:
http://goc.grid.sinica.edu.tw/gocwiki/Writing_a_good_groups%2econf_file_for_yaim

The Manual Installation Guide is available here
http://grid-deployment.web.cern.ch/grid-deployment/documentation/LCG2-Manual-Install/

More information on what yaim does can be found in the Generic Configuration Reference
http://grid-deployment.web.cern.ch/grid-deployment/gis/lcg-GCR/

VO Configuration

This release coincides with the launch of a tool to help sites find the information they need to configure VO support.
https://lcg-sft.cern.ch/yaimtool/yaimtool.py
This should enable site administrators to generate the necessary yaim parameters for the VOs they wish to support. Information for as many VOs as possible has been added. VO Managers can populate the database further. This service is being finalised but is usable now - feedback is appreciated.

* VOMS *

voms-client 1.6.10 is packaged in this release; it is also included in gLite R1.4.1. Because of some thread-safety issues and memory leaks in the 1.6.10 libraries, the old 1.5.4 client is still provided and used by default (the DPM gridftpd needs the new version).

The contents of the {$EDG_LOCATION,$GLITE_LOCATION}/etc/vomses directories are now configured by yaim. They are used when users type the command:

voms-proxy-init -voms [VO nickname]:/group/role=[VOMS-Role]

voms 1.6.10 and voms-admin 1.2.10 are included in the VOMS server LCG2 Release. There is no yaim configuration for VOMS servers.

* LCG_UTILS *

lcg_utils/gfal now default to using LFC as the catalogue type. To use the RLS a user must define this environment variable:

export LCG_CATALOG_TYPE=edg   # bash
or
setenv LCG_CATALOG_TYPE edg   # tcsh

* LCG-INFOSITES *

A completely new version of lcg-infosites. New features including requests to all local services have been included.
Some very slow queries have been removed this time.

* LCG-MON-STDOUT *

Many bug fixes.

* BDII *

Important changes provided by version 3.5.3:
- LANG=C setting moved to the correct place (was ineffective before);
- no longer touches static files on startup (bug 8800);
- configuration files no longer world-readable (bug 10916);
- measures to prevent recursive inclusion;
- info providers are supplied with host proxy when possible.

* XEN *

Information on using SLC3 and Xen can be found at
http://project-xen.web.cern.ch/project-xen/xen/howto.html
We will release some images of LCG-2_7_0/SLC3 to be used with Xen.

** INFORMATION CONCERNING PARTICULAR NODES **

* RB *

The RB publishes job status monitoring information to R-GMA. Please turn this off if local regulations require it.

High availability solutions for RB (and MyProxy) nodes, which separate state and processing, are now available. Please see the wiki for more:
http://goc.grid.sinica.edu.tw/gocwiki/AdministrationFaq

Yaim now configures the RB to use the DLI by default for matchmaking. You can use yaim's RB_RLS variable to define the VOs for which the RLS should still be used.

A new cron job on the RB removes stale sandboxes.

* CE *

The gridftp monitor has been removed from the CE. Please remove this rpm (lcg-mon-gridftp) if necessary. It has not been 'obsoleted' in the meta-rpm to avoid problems with combined node types. Note that yaim will remove this rpm for you if necessary.

A new cron job on the CE removes stale files left under grid account home directories.

The lcg-expiregridmapdir cron job now removes the least recently used links until the VO's pool account usage is again below a given threshold, by default 80 percent. It no longer checks whether an account has jobs in the batch system, as Condor-G (e.g. the RB) will regularly touch the relevant accounts on CEs that have unfinished jobs.

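The expiry policy described above for lcg-expiregridmapdir can be sketched as a small shell function. This only illustrates the least-recently-used selection and is not the actual script: the function name, the input format and the account names are assumptions made for the example.

```shell
# select_lru_to_release TOTAL THRESHOLD_PERCENT   (hypothetical helper)
# stdin : one "<last_used_epoch> <account>" line per pool account in use
# stdout: accounts to release, least recently used first, stopping as
#         soon as usage is strictly below THRESHOLD_PERCENT of TOTAL
select_lru_to_release() {
    total=$1
    threshold=$2
    # Oldest timestamps first, then walk the list while usage is too high.
    sort -n | awk -v total="$total" -v thr="$threshold" '
        { name[NR] = $2 }
        END {
            in_use = NR
            for (i = 1; i <= NR && in_use * 100 >= thr * total; i++) {
                print name[i]
                in_use--
            }
        }'
}
```

For example, with 10 pool accounts in a VO, 9 of them in use and the default threshold of 80, the function prints the two least recently used accounts, which brings usage down to 7 out of 10.
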
* SE *

Please ensure that your SE info providers are advertising the correct GlueSEArchitecture (http://infnforge.cnaf.infn.it/glueinfomodel/index.php/Spec/V12). You can now set this with SE_ARCH in yaim.

* WN *

A new cron job on the WN removes stale files left under grid account home directories.

* DPM/LFC *

LFC :
* virtual uids/gids
* VOMS support
* possibility to be read-only
* possibility to specify the number of threads at startup time
* new methods : lfc_getlinks, lfc_getreplica, lfc_readdirxr
* automatic reconnection if DB connection broken

Important note: By default, YAIM will run the Data Location Interface (DLI) for all LFCs (local and central). The DLI gives insecure read-only access to the LFC data. Please check with your VO whether this is acceptable. To turn the DLI off, here is the recipe :
* /sbin/service lfc-dli stop
* /sbin/chkconfig lfc-dli off
* in /etc/sysconfig/lfc-dli, set RUN_DLI="no"

DPM :
* multi-domain support
* DPNS only : possibility to specify the number of threads at startup time
* SRMv2 : fix of srmLs
* automatic reconnection if DB connection broken

For those upgrading DPM or LFC manually, please note the instructions in the following pages. If you use yaim, it will perform the steps described for you;
https://uimon.cern.ch/twiki/bin/view/LCG/LfcVirtualIdsAndVOMS
https://uimon.cern.ch/twiki/bin/view/LCG/MultiDomainDpm

A VOMS enabled DPM narrowly missed the cutoff for entry into LCG-2_7_0. Please be aware that it will be released as an update as soon as it has been certified.

* CLASSIC SE *

This is a good opportunity to upgrade your Classic SE to a DPM;
https://uimon.cern.ch/twiki/bin/view/LCG/ClassicSeToDpm

* BATCH SYSTEM *

The CIC On Duty has reported a non-negligible number of sites failing the regular SFTs. These specific failures seem to be caused by a wrong configuration of the site's batch system concerning the dteam VO:
- dteam jobs should have a higher priority (queue based) than those owned by other VOs.
- dteam should get access to at least one CPU, immediately when a CPU is free.

Short deadline jobs - there is useful information on setting up torque/maui to handle short deadline jobs;
http://egee-na4.ct.infn.it/wiki/index.php/Torque%20Configuration

MPI
The version of torque/maui provided with the release will not work for MPI. This will be fixed in future releases. In the meantime, please see the following page on how to support MPI;
http://goc.grid.sinica.edu.tw/gocwiki/MPI_Support_with_Torque

* VOBOX *

lcg-user-configuration-vobox-1.0.0-1.noarch.rpm
This is a completely new tool included in the VOBOX. It allows a VO to publish its own experiment services from the VOBOX in the BDII.

* d-CACHE *

Yaim does not yet support d-Cache with a postgresql based pnfs. To accommodate sites that have already upgraded to this version of pnfs, we now have two types of d-Cache SE.

lcg-SE_dcache
This has no dependency on pnfs at all, so upgrades of either type (postgresql or gdbm) should work at the rpm level.

lcg-SE_dcache_gdbm
This has a dependency on pnfs (i.e. the gdbm version) and is necessary for a new install. Please note however that pnfs_postgresql is the preferred implementation and migration is non-trivial. A new install with pnfs_postgresql will be supported in future.

For more information on pnfs_postgresql please see the d-Cache book;
http://www.dcache.org/manuals/index.shtml

** CHANGELOG **

CE Estimated Response Time Info Providers (v 1.4.1)

This information provider is new in LCG 2.7.0 and is contained in two RPMs, lcg-info-dynamic-scheduler-generic and lcg-info-dynamic-scheduler-pbs. Sites using torque/pbs as an LRMS and Maui as a scheduler are fully supported by this configuration; those using other schedulers and/or LRMS systems will need to provide the appropriate back-end plugins.

For sites meeting the following criteria, the system should work out of the box with no modifications whatsoever:

LRMS == torque
scheduler == maui
vo names == unix group names of that vo's pool accounts

Documentation on what to do if this is not the case can be found in the file lcg-info-dynamic-scheduler.txt in the doc directory

/opt/lcg/share/doc/lcg-info-dynamic-scheduler

There is also documentation in this directory indicating the requirements on the backend commands you will need to provide in the case that you are using a different scheduler or LRMS. Tim Bell at CERN can help sites using LSF.

Note for sites not configuring from scratch: check after upgrading that in /etc/maui.cfg the following appears:

ADMIN3 edginfo rgma

LCG_UTILS/GFAL

Lcg_util (v1.3.5)
bug #14107: lcg_utils gives bad errnos VO not specified and not provided in env
bug #11700: lcg-lr returns wrong error message on no such guid in LFC
bug #12226: lcg-cr vs. relative path in SRM SARoot
Reworked lcg_globus_ifce.c to reset SIGINT sighandler after copy and few misc. issues
Fixed some errno reporting in lcg_globus_ifce.c

GFAL (v1.7.7)
bug #10348: lcg-cr -v should report LFC endpoint
bug #10591: timeout too short for the ldap query to the BDII?
bug #11515: default SE functionality doesn't work for VOs like vo.lal.in2p3.fr
bug #12222: lcg-* tools should use GlueSAPath if available
bug #12535: lcg-cr gives incomplete error message when LCG_CATALOG_TYPE=edg
bug #13998: Better error messages required for MDS problems
bug #14121: Default Catalog should be LFC, not EDG
bug #14712: lcg-lr/lcg-la SEGV when num entries == internal buffer size

FTS
https://uimon.cern.ch/twiki/bin/view/LCG/FtsChangesFrom13To14

WMS
Changes between lcg2_1_68 and lcg2_1_73:
o change in the format of the logging of job status changes, for RGMA export
o by default use multiple threads during matchmaking to improve performance
o added the possibility to look for a script in a site specific location for execution in the job wrapper before the user's job
o added output sandbox size limiting: default limit 100 million bytes
o changes to improve the performance of gang matching
o more connection logging and thread identity logging in the network server and workload manager logs
o fix to improve the cleanup of user processes on the workernode when jobs finish or when the proxy lifetime is exceeded
o various changes to improve the performance of some internal operations

YAIM
bugs fixed
#11045 config_proxy_server: myproxy started twice
#8413 SE_TYPE value not competely used in config_gip
#8585 lcg-SEDCache installation missing steps/checks
#8721 YAIM sets lcg-info-generic.conf on D-Cache to a classic SE
#9012 Cannot configure a dpm as second SE
#9954 config_gip publishes wrong value for GlueSEAccessProtocolPort for SRM SEs
#9992 MYPROXY_TCP_PORT_RANGE is not defined
#10257 Inclusion of VOs in the default YAIM site-info.def
#10334 SE_dpm_disk should start DPM-enabled GridFTP
#10338 yaim-2.6.0-8: run_function doesn't work for dcache and dpm SEs
#10514 Request for Feature: monitoring-support of new Grid-Elements and processes
#10576 Tomcat requires maxPostSize set to 0 in server.xml
#10715 YAIM should refuse bad node type combinations
#10801 Yaim
missconfigures shift.conf if already available
#10844 SITE_NAME env variable needed
#10853 GlueCEInfoContactString misconfigured
#10871 DPM : please add the possibility to specify several disk servers
#10919 configure_node SE_dpm_disk does not create pool accounts or configure gridftp
#10943 GlueService mismatch between LCG and gLite e.g. MyProxy
#10956 yaim should take new R-GMA JobStatusRaw table into account
#11072 APEL Configuration file -GKLogProcessor SubmitHost
#11332 yaim should randomise apel publishing times
#11450 config_fmon_client: does not default INSTALL_ROOT to /opt
#11474 GlueServiceOwner
#11557 config_torque_server adds acls to queue each time it is run
#11599 lcg-expiregridmapdir cron job should run every hour
#12393 yaim doesn't configure TAR_UI properly when running as root and !central_certs
#11764 config_mkgridmap does not generate SGM entry if only VOMS is used
#12174 RB dynamic plugin not working.
#12198 GlueServiceVersion not published well
#12200 Service Site relation is not published correctly for some services
#12201 Use GlueServiceEndpoint instead of GlueServiceURI
#12209 vo_user_prefix in config_mkgridmap && user.conf format
#12597 yaim configure VOBOX's gatekeeper w/ ${JOB_MANAGER}
#12630 Duplicated blocks added to gsi-ssh config files upon reconfiguration
#12651 BDII modify URL
#12781 some VOs need more than 50 pool accounts
#13574 configure_node: use of tr without quotes, causes unexpected shell expansion
#13799 Alternative list of VOs for LFC server configuration
#13849 user configuration improvements in yaim
#13852 no sgm accounts for VOMS-only VOs
#13917 WN must not have GLOBUS_TCP_PORT_RANGE
#13966 GlueCEInfoDataDir not set correctly?
#14004 ERT dynamic plugin config to be added
#14074 Glue schema information not fully specified for SEs
#14081 SAPath is incorrect for DPM + dCache - should only be path on disk
#14200 environment variables missed in the site nodes
#14203 RBs should use the DLI
#14243 YAIM : all LFCs should run the DLI
#14268 Error with SE_DPM_mysql configuration; config_DPM_mgr
#14433 mistype in yaim site-info.def file

R-GMA
The new R-GMA version has fixed a number of bugs related to reliability and scalability. There is also improved tuple validation against the schema and a new RGMAUserException for badly formatted tuples.