===============================================================================
=========================== LCG1 Installation notes ===========================
===============================================================================
=========== C 2003 by Emanuele Leonardi - Emanuele.Leonardi@cern.ch ===========
===============================================================================

Reference tag: LCG1-1_1_0

These notes will assist you in installing the latest LCG1 tag. Read through
them carefully before starting the installation of a new site.

If you have already installed release LCG1-1_0_0 or LCG1-1_0_1 at your site,
then you can opt for an update of the nodes rather than a complete
re-installation. Appendix E summarizes the procedure to follow in this case.

Introduction and overall setup
==============================

In this text we assume that you are already familiar with the LCFGng server
installation and management. A detailed guide can be found at

http://grid-deployment.web.cern.ch/grid-deployment/gis/lcfgng-server73.pdf

Note that by following the procedure described in that document, you will
install on your LCFGng server the latest available version of each object. In
some cases this may be incompatible with the object version used on the
LCFGng client nodes. To make sure that the correct version of each object is
installed on your server, you should use the lcfgng_server_update.pl script,
available from CVS. See chapter "Preparing the installation of current tag"
below for instructions on how to obtain and run this script.

Files needed for the current LCG1 release are available from a CVS server at
CERN. This CVS server contains the list of rpms to install and the LCFGng
configuration files for each node type. The CVS area, called "lcg1" and
topologically equivalent to the edg-release area in the EDG CVS repository,
can be reached from

http://lcgapp.cern.ch/cgi-bin/viewcvs/viewcvs.cgi/?cvsroot=lcgdeploy

Note1: at the same location there is another directory called "lcg-release":
this area is used for the integration and certification software, NOT for
production. Just ignore it!

Note2: documentation about access to this CVS repository can be found at

http://grid-deployment.web.cern.ch/grid-deployment/documentation/cvs-guide

In the same CVS location we created an area for each of the sites
participating in LCG1, e.g. BNL, BUDAPEST, CERN, etc. These directories
(should) contain the configuration files used to install and configure the
nodes at the corresponding site. Site managers are required to keep these
directories up-to-date by committing all changes they make to their
configuration files back to CVS, so that we are able to keep track of the
status of each site at any given moment. When a site reaches a consistent
working configuration, site managers can (should) create a tag, which will
allow them to easily recover configuration information if needed. The tag
name should follow the convention described in

http://grid-deployment.web.cern.ch/grid-deployment/cgi-bin/index.cgi?var=lcg1Status

Note: if you have not done it yet, please get in touch with Louis Poncet or
Markus Schulz to activate your write-enabled account on the CVS server.
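As an illustration of this workflow, the following sketch shows how a site
manager might commit updated configuration files and tag a working
configuration. The directory and tag names are examples only: use your own
site directory and a tag name that follows the convention described at the
page above.

> cd <site_dir>
> cvs commit -m "Updated node profiles for LCG1-1_1_0"
> cvs tag <your_site_tag>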
Given the increasing number of sites joining the LCG production system,
support for installation at new sites is now organized in a hierarchical way:
new secondary (Tier2) sites should direct questions about installation
problems to their reference primary (Tier1) site. Primary sites will then
escalate problems when needed.

All site managers must in any case join and monitor the LCG-Rollout list,
where all issues related to the LCG deployment, including announcements of
updates and security patches, are discussed. You can join this list by going
to

http://cclrclsv.RL.AC.UK/archives/lcg-rollout.html

and clicking on the "Join or leave the list" link.

Preparing the installation of current tag
=========================================

The current LCG1 tag is ---> LCG1-1_1_0 <---

In the following instructions/examples, when you see the string <tag>, you
should replace it with the name of the tag defined above.

To install it, check it out on your LCFG server with

> cvs checkout -r <tag> -d <tag> lcg1

Note: the "-d <tag>" option will create a directory named <tag> and copy all
the files there. If you do not specify the -d parameter, the files will end
up in a subdirectory of the current directory named lcg1.

The default way to install the tag is to copy the content of the rpmlist
subdirectory to the /opt/local/linux/7.3/rpmcfg directory on the LCFG server.
This directory is NFS-mounted by all client nodes and is visible as
/export/local/linux/7.3/rpmcfg

Now go to the directory where you keep your local configuration files. If you
want to create a new one, you can check out from CVS any of the previous tags
with:

> cvs checkout -r <tag> -d <site_dir> <your_site>

If you want the latest (HEAD) version of your config files, just omit the
"-r <tag>" parameter.

Go to <site_dir>, copy there the template files from <tag>/source
(cfgdir-cfg.h.template, local-cfg.h.template, and site-cfg.h.template),
rename them cfgdir-cfg.h, local-cfg.h, and site-cfg.h, and edit their content
according to the instructions in the files.

NOTE: if you already have localized versions of these files, just compare
them with the new templates to verify that no new parameter needs to be set.

To download all the rpms needed to install this version you can use the
updaterep command. In <tag>/updaterep you can find two configuration files
for this script: updaterep.conf and updaterep_full.conf. The first tells
updaterep to download only the rpms which are actually needed to install the
current tag, while updaterep_full.conf does a full mirror of the LCG rpm
repository. Copy updaterep.conf to /etc/updaterep.conf and run the updaterep
command. By default all rpms are copied to the /opt/local/linux/7.3/RPMS
area, which is visible from the client nodes as /export/local/linux/7.3/RPMS.
You can change the repository area by editing /etc/updaterep.conf and
modifying the REPOSITORY_BASE variable.
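As a summary of the steps above, a possible end-to-end preparation sequence
for the current tag is sketched below. It assumes the default paths used in
this document, a working directory of /root for the CVS checkout, and that
the updaterep command is already installed and in the PATH on the LCFG
server; adapt these to your setup.

> cd /root
> cvs checkout -r LCG1-1_1_0 -d LCG1-1_1_0 lcg1
> cp LCG1-1_1_0/rpmlist/* /opt/local/linux/7.3/rpmcfg/
> cp LCG1-1_1_0/updaterep/updaterep.conf /etc/updaterep.conf
> updaterep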
IMPORTANT NOTICE: as the list and structure of Certification Authorities
(CAs) accepted by the LCG project can change independently from the
middle-ware releases, the rpm list related to the CA certificates and URLs
has been decoupled from the standard LCG1 release procedure. This means that
the version of the security-rpm.h file contained in the rpmlist directory
associated with the current tag could be incomplete or obsolete. Please go to

http://grid-deployment.web.cern.ch/grid-deployment/cgi-bin/index.cgi?var=lcg1Status

click on the "LCG1 CAs" link at the bottom of the page, and follow the
instructions there to update all CA-related settings. Changes and updates of
these settings will be announced on the LCG-Rollout mailing list.

To make sure that all the needed object rpms are installed on your LCFG
server, you should use the lcfgng_server_update.pl script, also located in
<tag>/updaterep. This script reports which rpms are missing or have the wrong
version and creates the /tmp/lcfgng_server_update_script.sh script, which you
can then use to fix the server configuration. Run it in the following way:

> lcfgng_server_update.pl <tag>/rpmlist/lcfgng-common-rpm.h
> /tmp/lcfgng_server_update_script.sh

> lcfgng_server_update.pl <tag>/rpmlist/lcfgng-server-rpm.h
> /tmp/lcfgng_server_update_script.sh

WARNING: always take a look at /tmp/lcfgng_server_update_script.sh and verify
that all rpm update commands look reasonable before running it.

In the source directory you should take a look at the redhat73-cfg.h file and
check that the location of the rpm lists (updaterpms.rpmcfgdir) and of the
rpm repository (updaterpms.rpmdir) are correct for your site (the defaults
are consistent with the instructions in this document). If needed, you can
redefine these paths from the local-cfg.h file.

Also in local-cfg.h you can (must!) replace the default root password with
the one you want to use for your site:

+auth.rootpwd <crypted_pwd>  <--- replace with your own crypted password

To obtain <crypted_pwd> you can use the following command:

> perl -e 'print crypt("MyPassword","y3")."\n"'

where "MyPassword" should be replaced with your own clear-text root password
and "y3" is a string of two randomly chosen characters (change "y3" to your
liking).

To finalize the adaptation of the current tag to your site you should edit
your site-cfg.h file. You can use the site-cfg.h.template file in the source
directory as a starting point. If you already have a site-cfg.h file that you
used to install the LCG1-1_0_0 or LCG1-1_0_1 tag, you can find a detailed
description of the modifications needed for the new tag in Appendix E below.

WARNING: the template file site-cfg.h.template assumes you want to run the
PBS batch system without sharing the /home directory between the CE and all
the WNs. This is the highly recommended setup. If for some reason you want to
run PBS in traditional mode, i.e. with the CE exporting /home with NFS and
all the WNs mounting it, you should edit your site-cfg.h file and comment out
the following two lines:

#define NO_HOME_SHARE
...
#define CE_JM_TYPE lcgpbs

In addition to this, your WN configuration file should include this line:

#include CFGDIR/UsersNoHome-cfg.h"

just after including Users-cfg.h (please note that BOTH Users-cfg.h AND
UsersNoHome-cfg.h must be included).

WARNING: in the current default configuration the "file" protocol access to
the SE is enabled. This means that SE and WN nodes must share a disk area
called /flatfiles/SE00. This is where the various VOs will store their files,
each VO using a different subdirectory named after the VO itself (e.g.
/flatfiles/SE00/atlas), plus an extra directory named "data" (i.e.
/flatfiles/SE00/data). If you are using an external file server to hold this
area, mounting it both on the SE and on the WNs, then you should create the
<VO> and "data" subdirectories yourself, set the ownership to root:<VO> (or
root:root for "data"), and set the access mode to 775, i.e.

> ls -l /flatfiles/SE00
drwxrwxr-x    3 root     alice        4096 Sep  2 16:13 alice
drwxrwxr-x    3 root     atlas        4096 Sep  2 16:13 atlas
drwxrwxr-x    3 root     cms          4096 Sep  2 16:13 cms
drwxrwxr-x    3 root     root         4096 Sep  2 16:13 data
drwxrwxr-x    3 root     dteam        4096 Sep  2 16:13 dteam
drwxrwxr-x    3 root     lhcb         4096 Sep  2 16:13 lhcb

If on the other hand you want to keep the /flatfiles/SE00 area on the local
disk of the SE, then you can tell LCFG to create the whole structure by
including the following line in your SE node configuration file:

#include CFGDIR/flatfiles-dirs-SECLASSIC-cfg.h"

Note: the line should be inserted close to where you include
StorageElement-cfg.h, but the exact position is not important.
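For the external file server case described above, the following sketch shows
one way to create the required directories by hand. It assumes the area is
already mounted on /flatfiles/SE00 and that the VO groups already exist on
the node; the VO list matches the example listing above.

@-----------------------------------------------------------------------------
#!/bin/bash
# Create the /flatfiles/SE00 layout by hand (external file server case).
FLATFILES=/flatfiles/SE00
for vo in alice atlas cms dteam lhcb; do
    mkdir -p ${FLATFILES}/${vo}
    chown root:${vo} ${FLATFILES}/${vo}
    chmod 775 ${FLATFILES}/${vo}
done
# Extra "data" directory, owned by root:root
mkdir -p ${FLATFILES}/data
chown root:root ${FLATFILES}/data
chmod 775 ${FLATFILES}/data
@-----------------------------------------------------------------------------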
Node installation and configuration
===================================

In your site-specific directory you should already have the do_mkxprof.sh
script. If you do not find it, you can check out the one in the CERN CVS area
with

> cvs checkout CERN/do_mkxprof.sh

This script just calls the mkxprof command, specifying that it should look
for configuration files starting from the current directory (option "-S.").
Feel free to use your preferred call to the mkxprof command, but note that
running mkxprof as a daemon is NOT recommended and can easily lead to massive
catastrophes if not used with extreme care: do it at your own risk.

To create the LCFG configuration for one or more nodes you can do

> ./do_mkxprof.sh node1 [node2 node3 ...]

If you get an error status for one or more of the configurations, you can get
a detailed report on the nature of the error by looking at

http://<LCFGng_server>/status/

and clicking on the name of the node with a faulty configuration (a small red
bug should be shown beside the node name).

Once all node configurations are correctly published, you can proceed and
install your nodes following any one of the installation procedures described
in the "LCFGng Server Installation Guide" mentioned above.

When the initial installation completes (expect two automatic reboots in the
process), each node type requires a few manual steps, detailed below, to be
completely configured. After completing these steps, some of the nodes need a
final reboot which will bring them up with all the needed services active.
The need for this final reboot is explicitly stated among the node
configuration steps below.

Common steps
------------

-- On the ResourceBroker, MyProxy, ComputingElement, and StorageElement nodes
   you should install the host certificate/key files in /etc/grid-security
   with names hostcert.pem and hostkey.pem. Also make sure that hostkey.pem
   is only readable by root with

   > chmod 400 /etc/grid-security/hostkey.pem

-- All Globus services grant access to LCG users according to the list of
   certificates contained in the /etc/grid-security/grid-mapfile file. The
   list of VOs included in grid-mapfile is defined in
   /opt/edg/etc/edg-mkgridmap.conf. By default all VOs accepted in LCG are
   included in this list. You can prevent VOs from accessing your site by
   commenting out the corresponding line in edg-mkgridmap.conf. E.g., by
   commenting out the line

   group ldap://grid-vo.nikhef.nl/ou=lcg1,o=alice,dc=eu-datagrid,dc=org .alice

   on your CE you will prevent users in the Alice VO from submitting jobs to
   your site.

   After installing a ResourceBroker, ComputingElement, or StorageElement
   node and modifying (if needed) the local edg-mkgridmap.conf file, you may
   force a first creation of the grid-mapfile by running

   > /opt/edg/sbin/edg-mkgridmap --output /etc/grid-security/grid-mapfile --safe

   Every 6 hours a cron job will repeat this procedure and update
   grid-mapfile.
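The quick checks below are not part of the official procedure; they are just
a sketch of how you might verify the common steps (certificate installation
and grid-mapfile generation) before moving on. The paths are the ones used
above.

> ls -l /etc/grid-security/hostcert.pem /etc/grid-security/hostkey.pem
> openssl x509 -in /etc/grid-security/hostcert.pem -noout -subject -enddate
> grep '^group' /opt/edg/etc/edg-mkgridmap.conf
> wc -l /etc/grid-security/grid-mapfile

The first two commands check that the host certificate/key are in place, that
the key permissions are correct, and that the certificate has not expired;
the last two show which VOs are enabled and that the generated grid-mapfile
is not empty.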
UserInterface
-------------

No additional configuration steps are currently needed on a UserInterface
node.

ResourceBroker
--------------

-- Configure the MySQL database. See the detailed recipe in Appendix C at the
   end of this document.

-- Reboot the node.

ComputingElement
----------------

-- Configure the PBS server. See the detailed recipe in Appendix B at the end
   of this document.

-- Create the first version of the /etc/ssh/ssh_known_hosts file by running

   > /opt/edg/sbin/edg-pbs-knownhosts

   A cron job will update this file every 6 hours.

-- If your CE is NOT sharing the /home directory with your WNs (this is the
   default configuration), then you have to configure sshd to allow WNs to
   copy job output back to the CE using scp. This requires the following two
   steps:

   1) modify the sshd configuration. Edit the /etc/ssh/sshd_config file and
      add these lines at the end:

      HostbasedAuthentication yes
      IgnoreUserKnownHosts yes
      IgnoreRhosts yes

      and then restart the server with

      > /etc/rc.d/init.d/sshd restart

   2) configure the script enabling WNs to copy output back to the CE.

      - in /opt/edg/etc, copy edg-pbs-shostsequiv.conf.template to
        edg-pbs-shostsequiv.conf, then edit this file and change the
        parameters to your needs. Most sites will only have to set NODES to
        an empty string.

      - create the first version of the /etc/ssh/shosts.equiv file by running

        > /opt/edg/sbin/edg-pbs-shostsequiv

        A cron job will update this file every 6 hours.

   Note: every time you add or remove WNs, do not forget to run

   > /opt/edg/sbin/edg-pbs-shostsequiv  <--- only if you do not share /home
   > /opt/edg/sbin/edg-pbs-knownhosts

   on the CE, or the new WNs will not work correctly until the next time cron
   runs these commands for you.

-- The CE is supposed to export information about the hardware configuration
   (i.e. CPU power, memory, disk space) of the WNs. The procedure to collect
   this information and publish it is described in Appendix D of this
   document.

-- Reboot the node.

-- If your CE exports the /home area to all WNs, then after rebooting it make
   sure that all WNs can still see this area. If this is not the case,
   execute this command on all WNs:

   > /etc/obj/nfsmount restart

StorageElement
--------------

-- Make sure that all subdirectories in /flatfiles/SE00 were correctly
   created (see the WARNING notice at the end of the "Preparing the
   installation of current tag" section).

-- Reboot the node.

-- If your SE exports the /flatfiles/SE00 area to all WNs, then after
   rebooting the node make sure that all WNs can still see this area. If this
   is not the case, execute this command on all WNs:

   > /etc/obj/nfsmount restart

WorkerNode
----------

-- If your WNs are NOT sharing the /home directory with your CE (this is the
   default configuration), then you have to configure ssh to enable them to
   copy job output back to the CE using scp. To this end you have to modify
   the ssh client configuration file /etc/ssh/ssh_config, adding these lines
   at the end:

   Host *
       HostbasedAuthentication yes

   Note: the "Host *" line might already exist. In this case, just add the
   second line after it.

-- Create the first version of the /etc/ssh/ssh_known_hosts file by running

   > /opt/edg/sbin/edg-pbs-knownhosts

   A cron job will update this file every 6 hours.
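A possible way to verify the host-based authentication setup described in the
ComputingElement and WorkerNode sections above (no shared /home) is sketched
here. Run it as a normal pool account user on a WN, replacing <CE_host> with
the hostname of your CE; this check is a suggestion, not part of the official
procedure.

> echo "hostbased test" > /tmp/hbtest.$$
> scp -o HostbasedAuthentication=yes /tmp/hbtest.$$ <CE_host>:/tmp/

The copy should complete without asking for a password or passphrase. If it
does not, check /etc/ssh/shosts.equiv and /etc/ssh/ssh_known_hosts on the CE,
and /etc/ssh/ssh_config on the WN.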
BDII Node
---------

To avoid having a single top MDS which could easily be overloaded, we have
split the information system into several (two for the moment) independent
information regions, each served by one or more regional MDSes. For this
schema to work, each and every BDII on the GRID must know the names of all
region MDSes and merge the information coming from them into a single
database.

All software needed to handle this data collection is now included in an rpm
installed on the BDII node. To configure it you have to:

- go to the /opt/edg/etc directory
- copy bdii-cron.conf.template to bdii-cron.conf

The template file included in the current tag already contains the correct
settings for the Regional MDSes active as of end of October 2003. These are:

MDS_HOST_LIST=" adc0026.cern.ch:2135/lcg00108.grid.sinica.edu.tw:2135
                lcgcs01.gridpp.rl.ac.uk:2135 "

Should new Regional MDS nodes appear, they will be announced on the
LCG-Rollout mailing list. In this case you will have to edit bdii-cron.conf
and add them to the correct group. You can find a description of the syntax
for the MDS_HOST_LIST variable in Appendix A of this document. If in doubt,
send e-mail to the LCG-Rollout mailing list asking for the correct setting of
MDS_HOST_LIST for your site.

Regional MDS Node
-----------------

No additional configuration steps are currently needed on a Regional MDS
node.

Note: if your site is hosting a Regional MDS node, once in a while you will
be notified of new sites joining your region. In this case, on the LCFGng
server you should modify the node configuration file for your Regional MDS
and add a line like:

EXTRA(globuscfg.allowedRegs_topmds) <site_GIIS>:2135

where <site_GIIS> is the hostname of the node hosting the GIIS for the new
site. Then you must update your Regional MDS node with mkxprof.

MyProxy Node
------------

-- Reboot the node after installing the host certificates (see "Common steps"
   above).

Testing
-------

IMPORTANT NOTICE 1: the new UI/RB services are no longer compatible with the
old UI/RB ones. When updating, make sure to update both nodes before testing
them, or all tests will fail.

IMPORTANT NOTICE 2: if /home is NOT shared between the CE and the WNs (this
is the default configuration), then, due to the way the new jobmanager works,
a globus-job-run command will take at least 2 minutes. Even in the
configuration with shared /home the execution time of globus-job-run will be
slightly longer than before. Keep this in mind when testing your system.

To perform the standard tests (edg-job-submit & co.) you need to have your
certificate registered in one VO and to sign the LCG usage guidelines.
Detailed information on how to do these two steps can be found at:

http://lcg-registrar.cern.ch/

If you are working in one of the four LHC experiments, then ask for
registration in the corresponding VO; otherwise you can choose the "LCG
Deployment Team" (aka DTeam) VO.

A test suite which will help you make sure your site is correctly configured
is now available. This software provides basic functionality tests and
various utilities to run automated sequences of tests and to present the
results in a common HTML format. Extensive on-line documentation about this
test suite can be found at

http://grid-deployment.web.cern.ch/grid-deployment/tstg/docs/LCG-Certification-help
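As an illustration of the standard tests mentioned above, a minimal job
submission from a UserInterface node could look like the following sketch.
The JDL content and file names are examples only, and a valid proxy and VO
registration (e.g. in dteam) are assumed.

> grid-proxy-init
> cat > test.jdl <<EOF
Executable    = "/bin/hostname";
StdOutput     = "hostname.out";
StdError      = "hostname.err";
OutputSandbox = {"hostname.out","hostname.err"};
EOF
> edg-job-submit -o jobid.txt test.jdl
> edg-job-status -i jobid.txt
> edg-job-get-output -i jobid.txt   <--- once the job status is Done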
Experiment software
-------------------

While waiting for the final agreement on a flexible, experiment-controlled
system to install and certify VO-specific application software, we have
included a few rpms for the Alice and CMS collaborations to allow an initial
use of the system. We will probably soon add a few rpms for Atlas and,
possibly, for LHCb, and this will require a new minor release of LCG1. After
that, with LCG version 2, we will switch to the new system and no VO-specific
software will be included in the LCG distribution anymore.

Appendix A
==========

Syntax for the MDS_HOST_LIST variable
-------------------------------------

The LCG information system is currently partitioned into several independent
regions, each served by one or more top level MDS servers. To collect a full
set of information, the BDII should contact one MDS server per region and
download from it all the information relative to that region.

The correct setting for MDS_HOST_LIST will then consist of a list of MDS
servers organized into several groups, each group corresponding to a
different region. The BDII will scan through this list, looping over the
different groups. For each group it will try to get the information from the
first MDS server in the group. If the server does not answer, the BDII will
try the second, etc. As soon as one of the servers in the group answers and
provides information about the region, the BDII will switch to the next group
and repeat the same operation. If none of the servers in one group answers,
no information for the corresponding region will be retrieved and the whole
region will disappear from the information available to the Resource Broker.

The value of MDS_HOST_LIST is thus a list of servers organized into several
groups. Groups of servers are separated by one or more "white space"
characters (i.e. space, tab, LF), while servers belonging to the same group
are separated by a "/" (slash) character. A server name can be followed by a
":" (colon) character and the slapd port number. If the slapd port number is
the default one (2135), it can be omitted from the server specification.

Example:

MDS_HOST_LIST=" server1/server2:2136/server3
                server4/server5:2135
                server6 "

In this example the grid is organized in three regions. The first region is
served by three servers: server1, server2, and server3. On server2 slapd is
listening on the (non-standard) port 2136. The second region is served by two
servers: server4 and server5. In this case the specification of the port for
server5 is not really needed. The third region is served by a single server,
server6, where slapd is listening on the standard port 2135.

N.B.: the server names must be in a format usable by the BDII to contact
them. Both numeric (e.g. 141.108.5.5) and text (e.g. adc0026.cern.ch) formats
are correct. If the MDS server is local to the BDII, then the network name
can be omitted, but in general this is not recommended as it might cause
confusion if someone wants to use the same configuration from a different
site.

N.B.: as explained before, the order in which the servers appear in a group
is relevant, as this is the order in which the BDII will query them, stopping
after the first successful contact. This means that listing first the MDS
servers which are closest to your site from the network efficiency point of
view may improve the update time of your BDII. Also, having different sites
use different query orderings results in a statistical balancing of the load
on the MDS servers of the same region, thus improving the general scalability
of the information system. If you are in doubt about the best way to sort the
MDS servers for any of the regions, please ask on the LCG-Rollout mailing
list.

At the time of writing (end of October 2003) two regions are active: the LCG1
East region and the LCG1 West region. The East region is served by two MDS
servers, one located at CERN and one in Taipei. The West region is served by
one MDS server located at RAL. The correct setting for the MDS_HOST_LIST
variable is then:

MDS_HOST_LIST=" adc0026.cern.ch:2135/lcg00108.grid.sinica.edu.tw:2135
                lcgcs01.gridpp.rl.ac.uk:2135 "

or, for sites network-wise closer to Taipei,

MDS_HOST_LIST=" lcg00108.grid.sinica.edu.tw:2135/adc0026.cern.ch:2135
                lcgcs01.gridpp.rl.ac.uk:2135 "

In all cases the ":2135" port specification can be omitted.
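The script below is a sketch of how one could check connectivity to the
servers listed in MDS_HOST_LIST, following the group/server syntax described
in this appendix. It only tests that the slapd port is reachable (using the
bash /dev/tcp facility) and is not part of the official procedure; replace
the MDS_HOST_LIST value with the one configured at your site.

@-----------------------------------------------------------------------------
#!/bin/bash
# Parse an MDS_HOST_LIST value (groups separated by white space, servers
# within a group separated by "/") and check that each slapd port answers.

MDS_HOST_LIST=" adc0026.cern.ch:2135/lcg00108.grid.sinica.edu.tw:2135
                lcgcs01.gridpp.rl.ac.uk:2135 "

for group in ${MDS_HOST_LIST}; do
    echo "Region group: ${group}"
    for server in $(echo ${group} | tr '/' ' '); do
        host=${server%%:*}
        port=${server##*:}
        # Use the default slapd port if none was specified
        [ "${port}" = "${host}" ] && port=2135
        if (echo > /dev/tcp/${host}/${port}) 2>/dev/null; then
            echo "  ${host}:${port} reachable"
        else
            echo "  ${host}:${port} NOT reachable"
        fi
    done
done
@-----------------------------------------------------------------------------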
Appendix B
==========

How to configure the PBS server on a ComputingElement
-----------------------------------------------------

1) Load the server configuration with this command (replace <CEhostname> with
   the hostname of the CE you are installing):

@-----------------------------------------------------------------------------
/usr/bin/qmgr <<EOF
set server operators = root@<CEhostname>
set server default_queue = short
set server log_events = 511
set server mail_from = adm
set server query_other_jobs = True
set server scheduler_iteration = 600
set server default_node = lcgpro
set server node_pack = False
create queue short
set queue short queue_type = Execution
set queue short resources_max.cput = 00:15:00
set queue short resources_max.walltime = 02:00:00
set queue short enabled = True
set queue short started = True
create queue long
set queue long queue_type = Execution
set queue long resources_max.cput = 12:00:00
set queue long resources_max.walltime = 24:00:00
set queue long enabled = True
set queue long started = True
create queue infinite
set queue infinite queue_type = Execution
set queue infinite resources_max.cput = 48:00:00
set queue infinite resources_max.walltime = 72:00:00
set queue infinite enabled = True
set queue infinite started = True
EOF
@-----------------------------------------------------------------------------

   Note that the queues short, long, and infinite are those defined in the
   site-cfg.h file, and the time limits are those in use at CERN. Feel free
   to add/remove/modify them to your liking, but do not forget to modify
   site-cfg.h accordingly.

2) Edit the file /var/spool/pbs/server_priv/nodes to add the list of
   WorkerNodes you plan to use. The CERN settings are:

@-----------------------------------------------------------------------------
lxshare0223.cern.ch np=2 lcgpro
lxshare0224.cern.ch np=2 lcgpro
lxshare0225.cern.ch np=2 lcgpro
lxshare0226.cern.ch np=2 lcgpro
lxshare0227.cern.ch np=2 lcgpro
lxshare0228.cern.ch np=2 lcgpro
lxshare0249.cern.ch np=2 lcgpro
lxshare0250.cern.ch np=2 lcgpro
lxshare0372.cern.ch np=2 lcgpro
lxshare0373.cern.ch np=2 lcgpro
@-----------------------------------------------------------------------------

   where np=2 gives the number of job slots (usually equal to #CPUs)
   available on the node, and lcgpro is the group name as defined in the
   default_node parameter in the server configuration.

3) Restart the PBS server

   > /etc/rc.d/init.d/pbs_server restart
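After restarting the server, a quick way to verify the configuration is
sketched below; this is a suggestion, not part of the original recipe.

> /usr/bin/qmgr -c "print server"   <--- full server and queue configuration
> pbsnodes -a                       <--- WorkerNodes known to the server
> qstat -q                          <--- configured queues and their limits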
Appendix C
==========

How to configure the MySQL database on a ResourceBroker
--------------------------------------------------------

First make sure that the mysql server is up and running:

> /etc/rc.d/init.d/mysql start

If it was already running you will just get notified of the fact.

Now choose a DB management password <password> you like (write it down
somewhere!) and then configure the server with the following commands:

> mysqladmin password <password>
> mysqladmin --password=<password> create lbserver20
> mysql --password=<password> lbserver20 < /opt/edg/etc/server.sql
> mysql --password=<password> \
        -e "grant all on lbserver20.* to lbserver@localhost" lbserver20

Note that the database name "lbserver20" is hardwired in the LB server code
and cannot be changed, so use it exactly as shown in the commands.

Appendix D
==========

Publishing WN information from the CE
-------------------------------------

When submitting a job, users of LCG are supposed to state in their JDL the
minimal hardware resources (memory, scratch disk space, CPU time) required to
run the job. These requirements are matched by the RB against the information
in the BDII to select a set of available CEs where the job can run.

For this schema to work, each CE must publish some information about the
hardware configuration of the WNs connected to it. This means that site
managers must collect information about the WNs available at their site and
insert it in the information published by the local CE.

The procedure to do this is the following:

- choose a WN which is "representative" of your batch system (see below for a
  definition of "representative") and make sure that the chosen node is fully
  installed and configured. In particular, check that all expected NFS
  partitions are correctly mounted.

- on the chosen WN run the following script as root, saving the output to a
  file.

@-----------------------------------------------------------------------------
#!/bin/bash
echo -n 'hostname: '
host `hostname -f` | sed -e 's/ has address.*//'
echo "Dummy: `uname -a`"
echo "OS_release: `uname -r`"
echo "OS_version: `uname -v`"
cat /proc/cpuinfo /proc/meminfo /proc/mounts
df
@-----------------------------------------------------------------------------

- copy the obtained file to /opt/edg/var/info/edg-scl-desc.txt on your CE,
  replacing any pre-existing version.

- restart the GRIS on the CE with

  > /etc/rc.d/init.d/globus-mds restart

Definition of "representative WN": in general, WNs are added to a batch
system at different times and with heterogeneous hardware configurations. All
these WNs often end up being part of a single queue, so that when an LCG job
is sent to the batch system, there is no way to ask for a specific hardware
configuration (note: LSF and other batch systems offer ways to do this, but
the current version of the Globus gatekeeper is not able to take advantage of
this possibility). This means that the site manager has to choose a single WN
as "representative" of the whole batch cluster. In general it is recommended
that this node be chosen among the "least powerful" ones, to avoid sending
jobs with heavy hardware requirements to under-spec nodes.
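As an additional check (not part of the original procedure), you can query
the GRIS on the CE after the restart to confirm that it answers and publishes
the WN information. The base DN shown is the standard Globus MDS GRIS default
and is an assumption; replace <CE_host> with the hostname of your CE.

> ldapsearch -x -h <CE_host> -p 2135 -b "mds-vo-name=local,o=grid" | less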
Appendix E
==========

Update procedure from release LCG1-1_0_0/1 to LCG1-1_1_0
--------------------------------------------------------

Before updating the nodes
-------------------------

- Modifications to your site-cfg.h file:

  #define SITE_EDG_VERSION  LCG1-1_1_0
  #define CE_IP_RUNTIMEENV  LCG-1 ALICE-3.09.06 CMKIN-1.1.0 CMKIN-VALID CMSIM-VALID

- On the CE, SE, and RB make a backup copy of the
  /opt/edg/etc/edg-mkgridmap.conf file, as it will be overwritten during the
  update. You can then copy it back after the update.
  Note: this step is needed only if your site accepts only a subset of the
  current 5 LCG VOs (Alice, Atlas, CMS, LHCb, and DTeam).

- Node configuration files should not need any change.

Update Procedure
----------------

Run mkxprof (or do_mkxprof) for all your nodes. You may want to verify that
all nodes were actually updated by looking at the /var/obj/log/client file on
the nodes.

After updating the nodes
------------------------

On WorkerNode and UserInterface: nothing to do.

On Regional MDS and ResourceBroker: reboot the node.

On StorageElement: make sure that all subdirectories in /flatfiles/SE00 were
correctly created (see the WARNING notice at the end of the "Preparing the
installation of current tag" section) and then reboot the node.

On ComputingElement: follow the procedure to publish WN information described
in Appendix D and then reboot the node.

On BDII: apply the procedure to create the /opt/edg/etc/bdii-cron.conf file
as described in the BDII configuration paragraph above.

On MyProxy: restart the myproxy-server process with:

> /etc/rc.d/init.d/myproxy stop
> /etc/rc.d/init.d/myproxy start

(or just reboot the node!)