===============================================================================
========================== LCG-2 Installation notes ===========================
===============================================================================
========== (C) 2004 by Emanuele Leonardi - Emanuele.Leonardi@cern.ch ==========
===============================================================================

Reference tag: LCG-2_0_0

These notes will assist you in installing the latest LCG-2 tag and in
upgrading from the previous tag. This document is not a typical release
note: in addition to release material, it covers some general aspects of
LCG2.

This document is intended for:

1) Sites that run LCG2 and need to upgrade to the current version
2) Sites that move from LCG1 to LCG2
3) Sites that join the LCG
4) Sites that operate LCG2

What is LCG?
============

This is best answered by the material found on the project's web site
http://lcg.web.cern.ch/LCG/ . There you can find information about the
nature of the project and its goals. At the end of this introduction you
can find a section that collects most of the references.

How to join LCG2?
=================

If you want to join LCG and add resources to it, you should contact the
LCG deployment manager Ian Bird (Ian.Bird@cern.ch) to establish contact
with the project. If you only want to use LCG, you can follow the steps
described in the LCG User Overview
(http://lcg.web.cern.ch/LCG/peb/grid_deployment/user_intro.htm). The
registration and initial training using the LCG-2 User's Guide
(https://edms.cern.ch/file/454439//LCG-2-Userguide.pdf) should take about
a week. However, only about 8 hours of this involve working with the
system; the majority is spent waiting for the registration process with
the VOs and the CA.

If you are interested in adding resources to the system, you should first
register as a user and subscribe to the LCG Rollout mailing list
(http://www.listserv.rl.ac.uk/archives/lcg-rollout.html).
In addition you need to contact the Grid Operation Center (GOC)
(http://goc.grid-support.ac.uk/gridsite/gocmain/) and get access to the
GOC-DB to register your resources with them. This registration is the
basis for your system being present in their monitoring. It is mandatory
to register at least your service nodes in the GOC DB; it is not
necessary to register all farm nodes. Please see Appendix H for a
detailed description.

LCG has introduced a hierarchical support model for sites. Regions have
primary sites (P-sites) that support the smaller centers in their region.
If you do not know who your primary site is, please contact the LCG
deployment manager Ian Bird. Once you have identified your primary site,
you should fill in the form that you find at the end of this guide in
Appendix G and send it to your primary site AND to the deployment team at
CERN (support-lcg-deployment@cern.ch). The site security contacts and
sysadmins will receive material from the LCG security team that describes
the security policies of LCG.

Discuss with the grid deployment team or with your primary site a
suitable layout for your site. Various configurations are possible.
Experience has shown that starting with a small standardized setup and
evolving from it to a larger, more complex system is highly advisable.

The typical layout for a minimal site is a user interface node (UI),
which allows users to submit jobs to the grid. This node will use the
information system and resource broker of either the primary site or the
CERN site. A site that can provide resources will add a computing element
(CE), which acts as a gateway to the computing resources, and a storage
element (SE), which acts as a gateway to the local storage. In addition,
a few worker nodes (WN) can be added to provide the computing power.
Large sites with many users that submit a large number of jobs will add a
resource broker (RB).
The resource broker distributes the jobs to the sites that are available
to run them and keeps track of the status of the jobs. For resource
discovery the RB uses an information index (BDII). It is good practice to
set up a BDII at each site that operates an RB. A complete site will add
a Proxy server node that allows the renewal of proxy certificates.

In case you don't find a setup described in this installation guide that
meets your needs, you should contact your primary site for further help.

After a site has been set up, the site manager or the support persons of
the primary site should run the initial tests that are described in the
first part of the chapter on testing. If these tests have run
successfully, the site should contact the deployment team via e-mail. The
mail should contain the site's GIIS name and the hostname of the GIIS. To
allow further testing, the site will be added to an LCG-BDII which is
used for testing new sites. Then the primary site or the site managers
can run the additional tests described. When a site has passed these
tests, the site or the primary site will announce this to the deployment
team, which, after a final round of testing, will add the site to the
list of core sites.

How to report problems
======================

The way problems are reported is currently changing. On the LCG user
introduction page
(http://lcg.web.cern.ch/LCG/peb/grid_deployment/user_intro.htm) you can
find information on the currently appropriate way to report problems.
Before reporting a problem, you should first try to consult your primary
site. Many problems are currently reported to the rollout list.
Internally we still use a Savannah-based bug tracking tool that can be
accessed via this link: https://savannah.cern.ch/bugs/?group=lcgoperation.
How to set up your site
=======================

With this release you have the option to either install and configure
your site using LCFGng, a fabric management tool that is supported by
LCG, or to install the nodes following a manual step-by-step description,
which can be used as a basis for configuring your local fabric management
system.

For very small sites the manual approach has the advantage that no
learning of the tool is required and no extra node needs to be
maintained. In addition, no re-installation of your nodes is required.
However, the maintenance of the nodes will require more work and is more
likely to introduce hidden misconfigurations. For medium to large sites
without their own fabric management tools, using LCFGng can be an
advantage. It is up to each site to decide which method is preferred.

The documentation for the manual installation can be found here:
http://grid-deployment.web.cern.ch/grid-deployment/documentation/manual-installation/

We currently support all node types with the exception of the PROXY
server node. This will follow soon. In case you decide to use the manual
setup, you should nevertheless have a look at parts of this document. For
example, the sections about firewalls and testing are valid for both
installation methods.

Network access
==============

The current software requires outgoing network access from all the
nodes, and incoming access on the RB, CE, and SE. Some sites have gained
experience with running their sites behind a NAT. We can provide contact
information for sites with experience of this setup. To configure your
firewall you should use the port table that we provide as a reference.
Please have a look at the chapter on firewall configuration.

General Note on Security
========================

While we provide kernel RPMs in our repositories and use certain versions
in the configuration, it has to be pointed out that you must make sure
that you consider the kernel you install to be safe.
If the provided default is not what you want, please replace it.

Sites Moving From LCG1 to LCG2
==============================

Since LCG2 is significantly different from both LCG1 and EDG, it is
mandatory to study this guide even for administrators with considerable
experience. In case you see the need to deviate from the described
procedures, please contact us.

Due to the many substantial changes with respect to LCG1, updating a site
from any of the LCG1 releases to LCG-2 is not possible in a reliable way.
A complete re-installation of the site is the only supported procedure.

Another change is related to the CVS repository used. For CERN internal
reasons we had to move to a different server and switch to a different
authorization scheme. See
http://grid-deployment.web.cern.ch/grid-deployment/documentation/cvs-guide/
for details about getting access to the CVS repository. For web-based
browsing, access to CVS is via
http://lcgdeploy.cvs.cern.ch/cgi-bin/lcgdeploy.cgi/

As described later, we changed the directory structure in CVS for LCG2.
There are now two relevant directories: lcg2 and lcg2-sites. The former
contains common elements, while the latter contains the site-specific
information.

In addition to the installation via LCFGng, an increasing number of node
types can now be installed manually. These are: Worker Nodes (WN), User
Interfaces (UI), Computing Elements (CE), classical Storage Elements
(SE), and the BDII. The Proxy Server is in preparation and almost
finished.
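As a concrete illustration of fetching the release files from CVS, the
sketch below builds a checkout command for the lcg2 module at the
reference tag. The CVSROOT shown is an assumption for illustration only;
take the real server path and authorization scheme from the CVS User's
Guide referenced above.

```shell
#!/bin/sh
# Hedged sketch: check out the lcg2 release module at the reference tag.
# The CVSROOT below is an ASSUMPTION -- the actual value and the
# authorization scheme are described in the CVS User's Guide.
CVSROOT=":pserver:anonymous@lcgdeploy.cvs.cern.ch:/cvs/lcgdeploy"
TAG="LCG-2_0_0"                        # reference tag of this release

CMD="cvs -d $CVSROOT checkout -r $TAG lcg2"
echo "$CMD"                            # printed, not executed, in this sketch
# eval "$CMD"                          # uncomment to perform the checkout
```

The same pattern with the lcg2-sites module gives you your site's
configuration directory.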
References:
===========

Documentation:
--------------

LCG Project Homepage:
http://lcg.web.cern.ch/LCG/

Starting point for users of the LCG infrastructure:
http://lcg.web.cern.ch/LCG/peb/grid_deployment/user_intro.htm

LCG-2 User's Guide:
https://edms.cern.ch/file/454439//LCG-2-Userguide.pdf

LCFGng server installation guide:
http://lcgdeploy.cvs.cern.ch/cgi-bin/lcgdeploy.cgi/lcg2/docs/LCFGng_server_install.txt

LCG-2 Manual Installation Guide:
http://grid-deployment.web.cern.ch/grid-deployment/documentation/manual-installation/

LCG GOC Mainpage:
http://goc.grid-support.ac.uk/gridsite/gocmain/

CVS User's Guide:
http://grid-deployment.web.cern.ch/grid-deployment/documentation/cvs-guide/

Registration:
-------------

LCG rollout list:
http://www.listserv.rl.ac.uk/archives/lcg-rollout.html
- join the list

Get a certificate and register in a VO:
http://lcg-registrar.cern.ch/
- read the LCG Usage Rules
- choose your CA and contact them to get a USER certificate (some CAs
  support online certificate requests)
- load your certificate into your web browser (read the instructions)
- choose your VO and register (LCG Registration Form)

GOC Database:
http://goc.grid-support.ac.uk/gridsite/db-auth-request/
- apply for access to the GOCDB

CVS read-write access and site directory setup:
mailto:louis.poncet@cern.ch
- prepare and send a NAME for your site following the schema
  <country-code>-<site-name>[-<institute>]
  (e.g. es-Barcelona-PIC, ch-CERN, it-INFN-CNAF)

Site contact database:
mailto:support-lcg-deployment@cern.ch
- fill in the form in Appendix G and send it

Report bugs and problems with the installation:
https://savannah.cern.ch/bugs/?group=lcgoperation

Notes about lcg2-20040407
-------------------------

In the previous beta-release the new LCG-BDII node type was introduced,
and for some time the two information system structures have been
operated in parallel. Since we expect many sites to move from LCG1 to
LCG2, we will now switch permanently to the new layout, which is
described later in some detail. The new LCG-BDII no longer relies on the
Regional MDSes but collects information directly from the Site GIISes.
The list of existing sites and their addresses is downloaded from a
pre-defined web location. See the notes in the BDII-specific section of
this document for installation and configuration. This layout will allow
sites and VOs to configure their own super- or subset of the LCG2
resources.

A new Replica Manager client was also introduced in the previous version.
This is the only client compatible with the current version of the RLS
server, so file replication at your site will not work until you have
updated to this release.

Introduction and overall setup
==============================

In this text we assume that you are already familiar with LCFGng server
installation and management. Please refer to the LCFGng_server_install.txt
file in the docs directory of the lcg2 release for an up-to-date guide on
how to set up an LCFGng server for use with LCG-2.

Note for sites which are already running LCG1: due to the incompatible
update of several configuration objects, an LCFG server cannot support
both LCG1 and LCG-2 nodes.
If you are planning to re-install your LCG1 nodes with LCG-2, the correct
way to proceed is:

1) kill the rdxprof process on all your nodes (or just switch your nodes
   off if you do not care about the extra down-time at your site);
2) update your LCFG server using the objects listed in the LCG-2 release;
3) prepare the new configuration files for your site as described in this
   document;
4) re-install all your nodes.

If you plan to keep your LCG1 site up while installing a new LCG-2 site,
you will need a second LCFG server. This is a matter of choice. The LCG1
installation is of very limited use once you set up the LCG-2 site, since
several core components are no longer compatible.

Files needed for the current LCG-2 release are available from a CVS
server at CERN. This CVS server contains the list of rpms to install and
the LCFGng configuration files for each node type. The CVS area, called
"lcg2", can be reached from
http://lcgdeploy.cvs.cern.ch/cgi-bin/lcgdeploy.cgi/

Note1: at the same location there is another directory called
"lcg-release": this area is used for the integration and certification
software, NOT for production. Please ignore it!

Note2: documentation about access to this CVS repository can be found at
http://grid-deployment.web.cern.ch/grid-deployment/documentation/cvs-guide/

In the same CVS location we created an area, called lcg2-sites, where all
sites participating in LCG-2 should store the configuration files used to
install and configure their nodes. Each site manager will find there a
directory for their site with a name in the format
<country-code>-<site-name> or <country-code>-<site-name>[-<institute>]
(e.g. es-Barcelona-PIC, ch-CERN, it-INFN-CNAF): this is where all site
configuration files should be uploaded.

Site managers are kindly asked to keep these directories up-to-date by
committing all changes they make to their configuration files back to
CVS, so that we will be able to keep track of the status of each site at
any given moment. Once a site reaches a consistent working configuration,
site managers should create a CVS tag, which will allow them to easily
recover their configuration information if needed.

Tag names should follow this convention:

The tags of the LCG-2 modules are: LCG2-<version>, e.g. LCG2-1_1_1 for
software release 1.1.1.

If you tag your local configuration files, the tag name must contain a
reference to the lcg2 release in use at the time. The format to use is:
LCG2-<version>_<site>_<date>
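To make the tagging convention concrete, here is a hedged sketch of
building and applying such a tag. The site and date components shown are
assumptions for illustration; the exact fields of the site-tag format
should be confirmed with the deployment team.

```shell
#!/bin/sh
# Hedged sketch: build a site configuration tag that references the lcg2
# release in use. SITE and DATE are ASSUMED example values -- check the
# exact convention with the deployment team.
RELEASE="LCG2-2_0_0"                 # lcg2 release tag in use at the site
SITE="ch-CERN"                       # hypothetical site directory name
DATE="20040415"                      # date of this configuration snapshot

SITETAG="${RELEASE}_${SITE}_${DATE}"
echo "$SITETAG"                      # -> LCG2-2_0_0_ch-CERN_20040415
# cvs tag "$SITETAG"                 # run inside your lcg2-sites/<site> dir
```

Tagging after every known-good configuration makes it easy to roll a site
back with a single checkout of the tagged revision.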