gLite 3.1

glite-WMS - Update to version 3.1.12-0


Date 25.02.2009
Priority Normal

Description


Release 08_98 of the WMS for gLite 3.1/SL4. This set of patches affects CEs, Proxy, BDII, LB, UI, VOBOX, WMS and Worker nodes, and this release note applies to all of those nodes.
Changes with respect to the current production version (patch #1726):

  • Enabled submission to CREAM CEs. A newly introduced component of the WMS internal architecture, called ICE, implements the job submission service to CREAM. Its role is comparable to that of the three components JC (Job Controller), LM (Log Monitor) and CondorG for submission to the LCG CE.

  • Added a recovery procedure for the WM. This feature is enabled by setting EnableRecovery = true in the WorkloadManager section of the configuration file (https://twiki.cnaf.infn.it/cgi-bin/twiki/view/EgeeJra1It/WMSConfFile). It works as follows: upon restart, old requests are reconsidered and the LB is queried to determine exactly where to resume their processing, so that no operation is performed more than once.
Important: (1) if recovery is not enabled, simply stopping and restarting the glite-wms-workload_manager process (and, of course, restarting it after any kind of interruption) might cause requests to be duplicated; (2) recovery only works with "JobDir" (see below).
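
The resume-without-repeating idea can be pictured with a minimal Python sketch. It is not the WM's implementation: it assumes that a request goes through a fixed, ordered list of steps and that an LB query can report which step was last recorded for a job. The step names, the `Request` class and the `last_logged_step` callback are all hypothetical stand-ins.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Hypothetical, simplified processing steps of a request, in order.
STEPS: List[str] = ["match", "prepare", "submit"]


@dataclass
class Request:
    job_id: str
    performed: List[str] = field(default_factory=list)  # steps executed in this run


def recover(requests: List[Request],
            last_logged_step: Callable[[str], Optional[str]]) -> None:
    """Resume each old request right after the last step recorded for it.

    last_logged_step stands in for an LB query: it returns the name of the
    last completed step for a job, or None if nothing was recorded.
    """
    for req in requests:
        last = last_logged_step(req.job_id)
        start = 0 if last is None else STEPS.index(last) + 1
        for step in STEPS[start:]:
            # A real system would also record each step as it completes,
            # so that a later restart resumes here instead of repeating it.
            req.performed.append(step)
```

For example, `recover(reqs, {"job-a": "prepare"}.get)` makes `job-a` perform only `submit`, while a `job-b` with nothing recorded goes through all three steps. In this model, a restart without such a record cannot tell what was already done and would repeat steps.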

  • "JobDir" is a mailbox-based persistent communication mechanism, currently used between WMProxy and the WM. In the present release it is enabled by default. A tool is available for converting from the former filelist-based mechanism (conversion in the opposite direction is also supported); at the moment the conversion is not done automatically. Alternatively, the transition can be handled by putting the WMS in drain mode and waiting for the filelist to be empty.
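
The release note only describes JobDir as a mailbox-based persistent mechanism. The minimal Python sketch below shows the generic maildir-style pattern such a mechanism can follow; the `tmp/`, `new/` and `old/` layout and all function names are assumptions for illustration, not the actual JobDir code. The producer writes each request into `tmp/` and then renames it into `new/`, so the consumer never sees a half-written request; the consumer moves what it takes into `old/` and removes it only when done, so pending requests survive a crash on disk.

```python
import os
import tempfile
import uuid
from typing import List, Tuple

SUBDIRS = ("tmp", "new", "old")  # assumed layout, not taken from the gLite sources


def init_jobdir(base: str) -> None:
    for sub in SUBDIRS:
        os.makedirs(os.path.join(base, sub), exist_ok=True)


def deliver(base: str, request: str) -> str:
    """Producer side: write to tmp/, then atomically rename into new/."""
    name = uuid.uuid4().hex
    tmp_path = os.path.join(base, "tmp", name)
    with open(tmp_path, "w") as f:
        f.write(request)
        f.flush()
        os.fsync(f.fileno())  # data is on disk before the request becomes visible
    # rename() within one filesystem is atomic on POSIX: readers see all or nothing
    os.rename(tmp_path, os.path.join(base, "new", name))
    return name


def take_new(base: str) -> List[Tuple[str, str]]:
    """Consumer side: move ready requests to old/ and return (name, content)."""
    taken = []
    for name in os.listdir(os.path.join(base, "new")):
        old_path = os.path.join(base, "old", name)
        os.rename(os.path.join(base, "new", name), old_path)
        with open(old_path) as f:
            taken.append((name, f.read()))
    return taken


def done(base: str, name: str) -> None:
    """Drop a request once processing has finished; until then it stays in old/."""
    os.remove(os.path.join(base, "old", name))


if __name__ == "__main__":
    base = tempfile.mkdtemp()
    init_jobdir(base)
    name = deliver(base, "request-1")
    for n, body in take_new(base):
        print(n == name, body)  # prints: True request-1
        done(base, n)
```

In this sketch, whatever is still in `old/` after a crash is exactly the set of requests a restart would have to reconsider, which is consistent with recovery requiring JobDir.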

  • LDAP queries that fetch information from the BDII into the Information Supermarket (ISM) can now be pre-filtered. This is very helpful whenever a WMS instance is dedicated to a single VO. Typically, with a production BDII, the ISM reaches a size of 6000-7000 entries, and as a consequence match-making for a job can take on the order of ten seconds. Filtering on the VO name, as in this use case, significantly reduces the match-making time. The filter expression is set through the relevant parameter in the WorkloadManager section of the configuration file, as in the following example:
    • IsmIiLDAPCEFilterExt="(|(GlueCEAccessControlBaseRule=VO:cms)(GlueCEAccessControlBaseRule=VOMS:/cms/*))"
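
The example filter follows a pattern that can be generated for any VO: it matches CEs whose GlueCEAccessControlBaseRule is either `VO:<vo>` or starts with `VOMS:/<vo>/` (the trailing `*` is the LDAP substring wildcard). A minimal Python sketch, with hypothetical helper names, that builds the same expression and escapes the VO name as RFC 4515 requires for filter values:

```python
# Characters RFC 4515 reserves in filter values: backslash, asterisk, parentheses, NUL.
_RESERVED = {"\\", "*", "(", ")", "\0"}


def escape_ldap_value(value: str) -> str:
    # Each reserved character becomes a backslash plus two hex digits, e.g. '*' -> '\2a'
    return "".join("\\%02x" % ord(ch) if ch in _RESERVED else ch for ch in value)


def vo_ce_filter(vo: str) -> str:
    v = escape_ldap_value(vo)
    # The final '*' is deliberately unescaped: it is the substring wildcard,
    # so any rule starting with "VOMS:/<vo>/" (groups, roles) matches.
    return ("(|(GlueCEAccessControlBaseRule=VO:%s)"
            "(GlueCEAccessControlBaseRule=VOMS:/%s/*))" % (v, v))


def wms_conf_line(vo: str) -> str:
    return 'IsmIiLDAPCEFilterExt="%s"' % vo_ce_filter(vo)


print(wms_conf_line("cms"))
```

`wms_conf_line("cms")` yields exactly the configuration line shown above.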
      
      

  • LDAP queries to the BDII can now be performed asynchronously (set IsmIiLDAPSearchAsync = true in the WorkloadManager section). This mode is typically faster than the usual synchronous one.

  • Purchasing from CEMon has been temporarily disabled

  • Purchasing from R-GMA has been removed

  • Added support for MPI jobs according to the latest specifications from the MPI working group. The value "MPICH" for the JDL attribute JobType is now deprecated: set it to "Normal" and follow the new guidelines instead.
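
The JobType change itself is mechanical, so it can be scripted over existing JDL files. A minimal Python sketch with a hypothetical helper name; it only rewrites `JobType = "MPICH"` to `"Normal"` and deliberately does nothing else, since the remaining MPI settings must follow the new MPI working group guidelines, which this sketch does not cover.

```python
import re

# Match case-insensitively (ClassAd attribute names are case-insensitive).
_MPICH_JOBTYPE = re.compile(r'(\bJobType\s*=\s*)"MPICH"', re.IGNORECASE)


def migrate_mpich_jobtype(jdl: str) -> str:
    """Rewrite JobType = "MPICH" to JobType = "Normal"; leave everything else untouched."""
    return _MPICH_JOBTYPE.sub(r'\1"Normal"', jdl)


print(migrate_mpich_jobtype('JobType = "MPICH";\nExecutable = "mpi-app";'))
```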

  • Support for interactive jobs has been dropped. However, the functionality is not lost, since it can be achieved using a tool called i2glogin (formerly known as glogin). This approach is actually more flexible, since the user is fully in charge, and it follows the trend set by the new handling of MPI jobs.

  • Known issues:
    • Performance problems in the newly introduced ICE component when it has to deal with several CEs
    • Very often, especially under high load, the virtual memory occupation of the glite-wms-workload_manager process may reach very high values, such as 1 GB or more. This is not a memory leak, but simply the effect of a well-known problem with the allocator that comes with glibc (the so-called ptmalloc2); see the tcmalloc documentation for a more detailed explanation. The problem can be avoided by redirecting, at run time, to any optimized, lock-free alternative allocator, which prevents excessive swap activity. This is strongly recommended on hosts with 4 GB of RAM or less. Our recipe uses TCMalloc, an alternative allocator distributed by Google under a BSD license:
      1. Install the two RPMs google-perftools-devel-???.rpm and google-perftools-???.rpm (pick the latest version; older versions should work anyway).
      2. Enable the malloc redirection for the WM by editing the glite-wms-wm script: simply uncomment the following line:
#use_google_perf_tools=1
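
The second step is a one-line edit, so it can be scripted. A minimal Python sketch with hypothetical function names; the location of the glite-wms-wm script is not given here, so the path is left to the caller. Keep a backup of the script, and expect the change to take effect only when the WM is next started through the script.

```python
import re

# A commented-out "use_google_perf_tools=1" line, keeping its leading indentation.
_COMMENTED = re.compile(r'^([ \t]*)#[ \t]*(use_google_perf_tools=1)[ \t]*$', re.MULTILINE)


def enable_tcmalloc(script_text: str) -> str:
    """Return the script text with the use_google_perf_tools line uncommented."""
    return _COMMENTED.sub(r'\1\2', script_text)


def enable_in_file(path: str) -> bool:
    """Edit the script in place; return True if the line was uncommented."""
    with open(path) as f:
        text = f.read()
    new_text = enable_tcmalloc(text)
    if new_text != text:
        with open(path, "w") as f:  # truncating keeps the file's permissions
            f.write(new_text)
    return new_text != text
```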


Please also have a look at the list of known issues.

This update fixes various bugs; see the full list below.

Fixed bugs

Number Description
 #13494 ARC job submitter.
 #16308 A subscription update is not working (doesn't send the new correct expiration time)
 #21909 /etc/cron.d/glite-wms-check-daemons.cron needs to redirect stderr to /dev/null
 #23443 Documentation out of date
 #24690 Multiple retrieval of job output - Unable to perform job purge
 #26885 Job wrongly kept in ICE cache with status UNKNOWN
 #27215 WM to set the maximum output sandbox size
 #27797 Mixed int and string in Parameters attribute generates wrong jdl
 #27899 VO override does not work with JdlDefaultAttributes
 #28235 Previously used CEs are not considered at all in the resubmission
 #28249 The ICE's command line dumpICECache opens the ICE's database in ReadWrite mode that is wrong (should be readonly)
 #28498 org.glite.wms-utils.classad contains a non-portable module
 #28637 Delegation IDs not found when CREAM persistent storage is cleared
 #28642 User environment breaks WMS wrapper
 #28657 Unexpected exception thrown by ICE
 #29182 The purger under some particular circumstance segfaults
 #29538 ICE doesn't catch an exception raised by a voms function
 #30308 created .mpi file in MPICH job wrapper causes jobs to fail
 #30518 glite-wms-wm crashes during resubmission
 #30816 A collection with pending jobs can be processed multiple times
 #30896 WMS must limit number of files per sandbox
 #30900 MinPerusalTimeInterval default is too low
 #31006 more signals from the Batch System (especially LSF) to be trapped by the jobwrapper
 #31026 Jobwrapper: redirection to /dev/null for both stdout and stderr should be carefully avoided when applicable
 #31278 WMS should prevent non-SDJ jobs from being scheduled on SDJ CEs
 #32078 Problem with GangMatching statement involving GlueSEStatus
 #32345 WMProxy forwards request to WM when dirmanager segfaults
 #32366 glite-WMS does not yet support worker node monitoring
 #32528 The BDII information purchasing sometimes timeouts
 #32962 FQAN comparator does not work properly
 #32980 Maradona file should be removed at resubmission
 #33026 "no compatible resources" problem on SL4 WMS
 #33140 boost::timer overflows too quickly
 #33378 The WM startup script should create the jobdir input directory if required
 #34508 Any collection submitted while the WMS is down is not recovered upon WM startup
 #34510 When a collection is aborted the "Abort" event should be logged for the sub-nodes as well
 #35156 glite-wms-purgeStorage.sh hardcodes proxy file name
 #35250 DAG: glite_wms_wmproxy_dirmanager does not extract links from tar.gz
 #35544 org.glite.wms-utils.jobid fails build because of gcc-4 strictness
 #35878 org.glite.wms.common extra qualification relates to gcc-4 strictness
 #36145 Jobdir support to be enabled in the glite-wms-planner
 #36341 Possible bug in ICE when exiting for suicidal patch. Db can return an empty string into JobCacheIterator::refresh()
 #36466 gethostname is called repeatedly and often and could cause troubles
 #36496 WMProxy Server: any-user does not work
 #36536 The glite wms purge storage library should rely on LBProxy while logging CLEAR events.
 #36551 any exception raised while reading from the input will cause the WM to exit
 #36558 WMProxy Server: should log user id on syslog
 #36870 glite-wms-brokerinfo-access files RPM build, spec file using deprecated Copyright
 #36876 A method of creamJob can return an empty string for most long lived user proxy. This can cause a fail in a LB's method
 #36902 Cron job to renew host-proxy
 #36907 Incomplete error message reported by ICE when lease creation fails
 #36913 utility to convert filelist to jobdir
 #36962 ICE fails to build with the new WMS Purger interface
 #37659 ICE uses an ENDLINE line terminator for log4cpp's calls that is not portable to the new version log4cpp 1.0. Must be removed.
 #37674 Pointer returned by edg_wll_GetSequenceCode() is not checked for non-nullness
 #37756 ICE should not resubmit jobs which have been killed by CREAM due to expiring proxy
 #37862 Wrong default value for the GLITE_LOCATION variable in glite-wms-ice script
 #37916 There's an unused and useless method in an ICE class
 #38275 [ YAIM ] LB_HOST should be required in the WMS configuration
 #38359 some issues in the limit for the output sandbox in the WMS jobwrapper
 #38366 Recovery doesn't work with a list-match request
 #38509 The WM's recovery procedure hangs if no relevant events are found for a given request
 #38739 WMProxy Server: doesn't allow exec if there's only user DN in gacl file
 #38816 Suicidal patch bug
 #38828 A suicidal patch related issue
 #38975 [ yaim-wms ] clean glite-wms.pre variables
 #38976 [ yaim-wms ] replace 'echo' with 'yaimlog' command in config_glite_wms
 #38978 [ yaim-wms ] config_glite_wms should use GLITE_HOME_DIR instead of GLITE_USER_HOME
 #38997 [ yaim-wms ] GLITE_USER_HOME gives problems
 #39214 WMProxy does not check CRLs
 #39215 The purger needs some refinements
 #39217 JDL API C++: Parametric jobs are not well formed
 #39298 [yaim wms] Request to be able to set ExpiryPeriod and MatchRetryPeriod values via YAIM
 #39308 [ YAIM ] Various WMS configuration issues
 #39488 httpd-wmproxy-errors.log grows indefinitely
 #39501 Wrong message logged by ICE when job proxy files disappear
 #39641 User proxy mixup for job submissions too close in time
 #39657 Bad port specified in the WMProxy logs (443 instead of 7443)
 #39694 [ yaim-wms ] YAIM version for yaim wms
 #39903 Fermilab proxy cannot submit to WMS SL4, they are ok with SL3
 #40335 [ yaim-wms ] gLite script does not run well
 #40389 [WMS] Strange error on a 3.1 WMS
 #40967 Problems in script glite_wms_wmproxy_load_monitor
 #41418 Wrong value for the attribute purge_jobs in the WMS conf file
 #42587 Error processing DAG dependencies while generating the ISB for final node
 #42590 The WM terminates unexpectedly handling a cancel request.
 #44140 [ YAIM ] something hard-coded in config_glite_wms
 #44761 WM: segmentation fault during recovery
 #44762 WM: segmentation fault while processing remnants of aborted collections
 #44763 When a collection is aborted the "Abort" event should be logged for waiting, submitted or done-failed sub-nodes
 #45391 submit requests for pending collections are deleted by the recovery on wm exit
 #46209 [ yaim-wms] GlueServiceType: org.glite.wms.WMProxy, supported VOs not published

Updated rpms

Name Version Full RPM name Description
glite-WMS 3.1.12-0 glite-WMS-3.1.12-0.i386.rpm gLite metapackage (glite-WMS)
glite-ce-cream-client-api-c 1.9.4-0.slc4 glite-ce-cream-client-api-c-1.9.4-0.slc4.i386.rpm org.glite.ce.cream-client-api-c (1.9.4)
glite-ce-monitor-client-api-c 1.9.2-0.slc4 glite-ce-monitor-client-api-c-1.9.2-0.slc4.i386.rpm org.glite.ce.monitor-client-api-c (1.9.2)
glite-info-provider-service 1.0.3-0 glite-info-provider-service-1.0.3-0.noarch.rpm glite-info-provider-service
glite-jdl-api-cpp 3.1.16-2.slc4 glite-jdl-api-cpp-3.1.16-2.slc4.i386.rpm org.glite.jdl.api-cpp v. 3.1.16-2
glite-wms-brokerinfo 3.1.11-3.slc4 glite-wms-brokerinfo-3.1.11-3.slc4.i386.rpm org.glite.wms.brokerinfo
glite-wms-broker 3.1.5-6.slc4 glite-wms-broker-3.1.5-6.slc4.i386.rpm Increased age after properties cleaning
glite-wms-classad_plugin 3.1.9-3.slc4 glite-wms-classad_plugin-3.1.9-3.slc4.i386.rpm Increased age after properties cleaning
glite-wms-common 3.1.22-1.slc4 glite-wms-common-3.1.22-1.slc4.i386.rpm org.glite.wms.common
glite-wms-configuration 3.1.11-3.slc4 glite-wms-configuration-3.1.11-3.slc4.i386.rpm Increased age after properties cleaning
glite-wms-helper 3.1.30-1.slc4 glite-wms-helper-3.1.30-1.slc4.i386.rpm org.glite.wms.helper provides a framework to add modules to the workload manager to transform a user-defined job description (jdl) to a format ready for submission to the CE. org.glite.wms.helper also provides some fundamental modules related to matchmaking.
glite-wms-ice 3.1.31-1.slc4 glite-wms-ice-3.1.31-1.slc4.i386.rpm Bug fixes
glite-wms-ism 3.1.19-2.slc4 glite-wms-ism-3.1.19-2.slc4.i386.rpm org.glite.wms.ism
glite-wms-jobsubmission 3.1.11-1.slc4 glite-wms-jobsubmission-3.1.11-1.slc4.i386.rpm Increased age after properties cleaning
glite-wms-manager 3.1.48-1.slc4 glite-wms-manager-3.1.48-1.slc4.i386.rpm org.glite.wms.manager
glite-wms-matchmaking 3.1.8-3.slc4 glite-wms-matchmaking-3.1.8-3.slc4.i386.rpm Increased age after properties cleaning
glite-wms-purger 3.1.13-1.slc4 glite-wms-purger-3.1.13-1.slc4.i386.rpm Increased age after properties cleaning
glite-wms-utils-classad 3.1.7-1.slc4 glite-wms-utils-classad-3.1.7-1.slc4.i386.rpm org.glite.wms-utils.classad v. 3.1.7
glite-wms-utils-exception 3.1.3-2.slc4 glite-wms-utils-exception-3.1.3-2.slc4.i386.rpm org.glite.wms-utils.exception v. 3.1.3
glite-wms-utils-jobid 3.1.5-1.slc4 glite-wms-utils-jobid-3.1.5-1.slc4.i386.rpm org.glite.wms-utils.jobid v. 3.1.5
glite-wms-wmproxy 3.1.41-1.slc4 glite-wms-wmproxy-3.1.41-1.slc4.i386.rpm org.glite.wms.wmproxy v. 3.1.41-1
glite-yaim-wms 4.0.5-2 glite-yaim-wms-4.0.5-2.noarch.rpm glite-yaim-wms_R_4_0_5_2

The RPMs can be updated using yum via

Service reconfiguration after update

Service must be reconfigured.

Service restart after update

Service must be restarted.

How to apply the fix

  1. Update the RPMs (see above)
  2. Update configuration (see above)
  3. Restart the service if necessary (see above)