gLite 3.0.0 Release Notes
What is in gLite-3.0.0?
Overview:
- LCG-2.7.0 and updates
- gLite WMS/LB
- gLite CE
- gLite/LCG WN
- gLite/LCG UI
- FTS
- FTA
Who can upgrade?
Upgrades are supported from LCG-2.7.0. An upgrade from gLite-1.5.0 might work
for the components that the two releases have in common; however, this has not
been tested, and it may be advisable to reinstall instead.
Why two workload management systems?
Since many applications have not yet migrated all their production systems to
the gLite WLM, we have to keep the LCG RBs and CEs operating.
gLite CE or LCG CE?
All larger sites should deploy both CEs to ensure that the
majority of resources are available from both worlds.
If you are running two CEs please take care to avoid collisions of pool account
mapping. This is typically achieved either by allocating separate pool account
ranges to each CE or by allowing them to share a gridmapdir.
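The first option, separate pool account ranges, can be checked with a short
script before both CEs go into production. The sketch below is only an
illustration: the account prefix "dteam" and the numeric ranges are
hypothetical, and it assumes pool accounts are named as a prefix followed by a
zero-padded number.

```python
def ranges_overlap(first, second):
    """Return True if two inclusive (start, end) ranges share any number."""
    return first[0] <= second[1] and second[0] <= first[1]


def pool_accounts(prefix, start, end, width=3):
    """Expand an inclusive numeric range into account names (assumed naming scheme)."""
    return [f"{prefix}{n:0{width}d}" for n in range(start, end + 1)]


# Hypothetical allocation: the LCG CE gets dteam001-dteam050,
# the gLite CE gets dteam051-dteam100.
lcg_ce_range = (1, 50)
glite_ce_range = (51, 100)

if ranges_overlap(lcg_ce_range, glite_ce_range):
    print("Collision: both CEs could map users to the same pool account")
else:
    print("No collision between the two pool account ranges")
```

If the two CEs share a gridmapdir instead, a check like this is not needed,
since both CEs then see the same mapping state.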
Since the gLite WLMS can utilize LCG-CEs, smaller sites should stay with the LCG
CE to allow access to their resources and to the data stored at their site.
The project will make an announcement when this preference changes.
gLiteWMS/LB or LCG-RB?
Since the workload manager end-user APIs differ for these
services, you have to agree with your local user community on which node
type they prefer.
Large sites should try to add at least one gLite WMS/LB node to the set of RBs
that they are operating.
User Documentation
The team around Andrea Sciaba is currently working on an
updated version of the LCG user guide. It is a live document, and users
should check back frequently for updated versions. You can find the
document here:
https://edms.cern.ch/document/722398/1
As soon as the document has stabilized, we will also publish an HTML version.
The above document can be used as an introduction and manual for the system.
Those who need very detailed information on individual gLite components can
find links here:
../../../../../../glite-web/egee/documentation/
For the FTS additional material is available here:
../../../../../../documentation/DataManagement/R3.0/
More in depth material for the gLite WLM is available from:
http://egee-jra1-wm.mi.infn.it/egee-jra1-wm/
Links to additional LCG components can be found in the draft of the LCG User
Guide.
Pointers to documentation on the components of this release are being compiled
here
http://www.grid.kfki.hu/afs/gdebrecz/web/LCG/the-LCG-directory.html
where you can find additional useful information.
Installing the release
Please use the 'Generic Installation and Configuration
Guide' for installation details.
For upgrade instructions, refer to the 'Introduction to the
Manual Upgrade Procedure'.
There is a lot of useful node-specific information there
which you should take note of. Note that yaim has been renamed to glite-yaim and
relocated to /opt/glite. Many metapackages have been similarly renamed. Full
details are in the above mentioned guides.
The guides for native configuration via the gLite configuration management
scripts, the 'gLite v3.0 Advanced Installation and Configuration Guide' and the
'Generic gLite v3.0 VO Configuration Guide', will be updated during the next
few days. They are relevant for those who run the gLite workload management
services and VOMS and intend to avoid the YAIM wrappers.
Upgrading
The upgrade from LCG-2.7 to gLite-3.0 is described and has
been tested. The upgrade from gLite-1.5 to 3.0 has not been tested and cannot
be regarded as an upgrade, because several components that were in 1.5 are no
longer part of the release bundle. Sites and projects that depend on these will
be supported individually and should contact the deployment team via
deploy-grid-support@cern.ch.
Notes and known issues
- WN and UI
  - The tarball distributions for the WNs and UIs were released on May 10th.
- gLite WMS + LB
- gLite CE
  - The gLite CE is configured to support only VOMS proxies. Accounting is
    currently not supported on the gLite CE.
  - Users should be aware that the software tags have to be published on both
    the gLite-CE and the lcg-CE of a site.
  - If you are running two CEs (typically LCG and gLite versions), please take
    care to ensure no collisions of pool account mapping. This is typically
    achieved either by allocating separate pool account ranges to each CE or
    by allowing them to share a gridmapdir.
- dCache
  - Note that dCache may show errors if you have more than 56 CAs. If this is
    the case, currently the only fix is to identify CAs you do not need to
    support and remove them. See bug 16538 for reference.
  - Make sure you shut down dCache before removing them.
- Site BDII
  - If you decide to run the site BDII on your gLite CE, some additional steps
    are needed:
    - Add the site.ldif file manually, or better, create it by running
      config_gip before you start to configure the gLite CE.
    - Add the following line to the node-info.def:
      BDII_site_FUNCTIONS="config_edgusers config_bdii"
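For the dCache note above, it can help to know how many CAs are installed
before errors appear. The following is a minimal sketch, not an official tool:
it assumes the trusted CAs live in /etc/grid-security/certificates and that
each CA contributes exactly one file ending in ".0". Both are assumptions about
the local setup and should be adjusted if your site differs.

```python
import os

# Limit quoted in the dCache note above.
MAX_CAS = 56


def count_cas(cert_dir):
    """Count CA certificates, assuming one '<hash>.0' file per CA."""
    return sum(1 for name in os.listdir(cert_dir) if name.endswith(".0"))


def exceeds_limit(cert_dir, limit=MAX_CAS):
    """Return True if the directory holds more CAs than the limit."""
    return count_cas(cert_dir) > limit


if __name__ == "__main__":
    # Assumed default location of the trusted CA directory.
    cert_dir = "/etc/grid-security/certificates"
    if os.path.isdir(cert_dir):
        n = count_cas(cert_dir)
        status = "above" if n > MAX_CAS else "within"
        print(f"{n} CAs installed, {status} the limit of {MAX_CAS}")
    else:
        print(f"{cert_dir} not found; pass the correct directory to count_cas()")
```

The script only reports the count. Deciding which CAs to remove, and shutting
down dCache first, remains a manual step.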
Other issues to remain aware of
Between LCG-2_7_0 and gLite 3.0, MySQL has been upgraded
from 4.0 to 4.1. The password hashing format changed between these versions;
please keep this in mind. YAIM handles the change, but it is good to know about
anyway.
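To illustrate the change: MySQL 4.1 stores a 41-character password hash that
starts with "*", whereas 4.0 stored 16-character hashes. The sketch below
reproduces the 4.1-style hash in Python and classifies a stored hash by its
format. It is an illustration only; the function names and the example
password are made up here, and none of this is part of YAIM.

```python
import hashlib


def mysql41_password(plaintext):
    """Compute the MySQL 4.1-style hash: '*' + upper-case hex of SHA1(SHA1(password))."""
    inner = hashlib.sha1(plaintext.encode("utf-8")).digest()
    return "*" + hashlib.sha1(inner).hexdigest().upper()


def hash_format(stored):
    """Classify a stored hash as 'pre-4.1' (16 chars) or '4.1' (41 chars, leading '*')."""
    if len(stored) == 41 and stored.startswith("*"):
        return "4.1"
    if len(stored) == 16:
        return "pre-4.1"
    return "unknown"


new_hash = mysql41_password("example-password")  # hypothetical password
print(len(new_hash), hash_format(new_hash))
```

A check like hash_format() can be useful when inspecting accounts that were
created by hand rather than by YAIM, since such accounts may still carry
old-format hashes after the upgrade.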
Pointers to documentation on the components of this release are being compiled
here
http://lcg.web.cern.ch/LCG/Sites/the-LCG-directory.html
New Features
Before we start, this is not meant to be a complete list
of all new features, but to highlight the most important changes.
- lcg-RB
  - Condor is upgraded to 6.7.10. There is a new condor-lcg package, which
    provides LCG modifications to the gahp_server and grid_monitor.
  - Configuration of these is handled by YAIM.
- UI
  - The gLite 3.0 UI is a 'combined' UI, incorporating LCG and gLite
    components.
- WN
  - The gLite WN combines gLite and LCG components.
- DPM
  - SRM v2.1.1 interface implemented. It is available without srm-copy and
    without functional global space reservation.
  - Changes in DPM since LCG-2.7.0: virtual IDs, VOMS, and DPM replicate and
    drain utilities for administrators. Upgrade from LCG-2_7_0 is supported.
- LFC
  - VOMS enabled.
  - LFC supports directory and file permissions based on VOMS roles, groups,
    subgroups and user DN. No VOMS metadata associated with the user DN is
    available. The custodial flag requested by CMS is available as ftype (file
    type). There are no replica attributes, since they are available from the
    SE via getFileMetadata (SRM v1) or srmLs (SRM v2). The DLI interface is
    available in LFC v1.4.5. The possibility to set this flag is not
    implemented, and it is not available on the SE either.
  - Improved read performance for experiment access patterns. It is possible
    to use a DNS switch to balance the load across servers. It is not possible
    to serve 100Hz requests per server if security is on. The performance
    measured is 150Hz requests served on a normal 2GHz PC. SSL sessions are
    not currently implemented. Bulk operations are not available, but there is
    support for sessions (not failure prone) and transactions (failure prone).
- GFAL
  - POSIX file access via LFN is available.
  - Best replica selection is based on site location only.
- dCache support
  - The yaim script for configuring dCache has received many updates from
    GridPP. It offers extended functionality but is backward compatible.
- FTS
  - Improved retry logic, with an option for VO-specific settings and a plugin
    to control retry policy.
  - Improved monitoring.
  - The FTS web service can authorise using VOMS roles, DNs, groups and
    subgroups.
  - The FTS does not currently use the VOMS credentials when contacting the
    SRMs (it uses the plain credential retrieved from plain MyProxy).
  - Reshuffling and cancellation are possible. VO managers can change the
    order of jobs.
  - A retry policy plugin is available. The catalog plugin mechanism is
    available and is being validated (currently by ATLAS) to check that it
    meets their requirements. The deployment model for these VO-specific
    plugins is not clear yet. A VO can provide a plugin for catalogue
    operations. It is currently based on a Python script. The script is
    currently in a file and there is no mechanism to distribute it.
  - FTS endpoint discovery using the BDII to match source and destination.
  - srmcpy is now supported.
  - Infoprovider to publish channel topology.
  - Improved error handling. Full error messages are available to users and
    experiment frameworks using a polling pattern.
  - Notification (i.e. messaging) to users or to experiment frameworks is not
    available.
- gLite WLM
  - Compared to the LCG functionality:
    - Support for DAG (compound jobs)
    - Bulk jobs with shared input sandboxes
    - Parametric jobs
    - VOMS proxy renewal
    - Sandbox located on web servers or SEs
    - Shallow resubmission improving the reliability (only for single jobs)
    - Web service interface (WM Proxy) reducing the latency of the submission
      process.
    - Support for MPI on clusters without a shared file system.
    - Information caching from potentially multiple sources (Information
      Supermarket)
    - Real-time access to selected job output
    - APIs and command line tools for accessing logging and bookkeeping,
      enabling user monitoring and monitoring of non-RB jobs.
- R-GMA
Additional useful documentation
Additional useful documentation can be found at:
http://lcg.web.cern.ch/LCG/Sites/the-LCG-directory.html
Several HowTos including one on configuration for MPI can be found at the
gocwiki page:
http://goc.grid.sinica.edu.tw/gocwiki/MPI_Support_with_Torque
The main page
http://goc.grid.sinica.edu.tw/gocwiki/
includes links to other regional wikis with useful additional material.