NC BioGrid Objectives and Requirements
Our planning and implementation have divided the grid effort into the following areas of concern, which serve as organizing principles:
In the language used below, the word "should" denotes an objective whereas "must" denotes a requirement that is essential to the functioning of the grid.
- The grid must span multiple independently managed domains. The participating organizations generally do not coordinate their choices of equipment, systems and protocols. They individually choose their own security regimes, and use a variety of authentication and authorization methods.
- The grid must include a diverse set of hardware and operating system platforms. This diversity arises because independently managed domains select equipment to optimize different criteria, and because selection criteria and purchasing decisions change over time. At the current time, operating systems within the scope of the project include Linux, Solaris, AIX, Windows and Mac OS X. Hardware platforms include Intel IA32 and IA64, Sun SPARC, IBM POWER4, and PowerPC G4.
- All aspects of grid function should be capable of dynamic update. In other words, it should be possible to add, modify and remove grid resources without downtime.
- Nodes that participate in the grid must be configured to meet a set of security standards that are dictated by a superset of the security policies of each organization in order to ensure the integrity of data and applications running on the grid.
- The grid must integrate multiple distributed, heterogeneous, independently managed data sources. While large, centralized data stores make sense in many situations, we see a continuing need to deal with data sources at many or all of our sites for the indefinite future.
- The grid must implement mechanisms to automatically "stage" input files and the applications to the hosts where the computation will take place, and to copy the resulting output to a specified central location. Note that by having the capability to stage applications, we remove the need for an administrator to maintain separate copies of the applications on all nodes in the grid.
- The grid should provide data caching and/or replication to minimize network traffic and ensure that data is close to where it is needed for computation.
- The grid should allow the user to find data based on characteristics - the physical location of the data should be transparent to the user.
- The grid should implement data encryption and integrity checks (e.g. message digesting) to ensure that data is transported across the network in a secure fashion.
- The grid must provide the backup/restore mechanisms and policies necessary to prevent data loss and minimize downtime across the grid. This will be critical when we move into the production phase at a later date. For now, backup/restore is provided on a site-by-site basis.
- The grid must allow for independent management of compute resources - we require this for our main sites, but in an academic environment it is common to have further division into sub-sites which are independently managed.
- The grid must provide a "meta-scheduler" that can intelligently and transparently select compute resources capable of running a user's job. This entails understanding the current and predicted loads on grid resources, including the ability to interface with third-party queuing systems (e.g. Platform LSF and Sun Grid Engine) that are front-ending grid-enabled clusters.
- The grid should provide job checkpointing so that a failed job can be restarted from the point at which it failed. This is crucial for large jobs that may take many hours or days to run to completion.
- Parallel processing should be available when parallelized code is available for the systems on the grid. The goal is to reduce the job's wall clock time when possible.
- The grid must ensure that appropriate security mechanisms (e.g. encryption and integrity checks) are in place to protect the components of a job (the application, parameters and associated data) that are moved across the network and stored on a grid node as part of a computation.
- The grid must create a uniform name space so that resources can be addressed consistently across the grid. That is, the name by which a resource is referenced should be the same regardless of where it is accessed from on the grid. This is important in a multi-site, multi-domain environment.
- The grid must allow local user IDs, security credentials and grid identity mappings to be independently managed at the "site" level, while centrally managing global grid user IDs.
- The grid must provide fine-grained access control mechanisms to restrict usage of grid resources based on individual and/or organizational identities.
- The grid must provide the programming interfaces necessary to build GUIs (preferably browser-based) for use by biologists for all of their grid-based work -- often referred to as a "portal" -- as well as to customize existing applications to enable them to access the grid.
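
As an illustration of the staging and integrity-check items above, the following is a minimal sketch in Python and is not part of the NC BioGrid design. It assumes a local file copy as a stand-in for a network transfer and uses SHA-256 as the message digest. The names `sha256_of`, `stage_file` and `demo`, the `exec_host/job_0001` directory layout and the sample input file are all hypothetical, chosen only for this example. Encryption in transit is out of scope for the sketch.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def stage_file(source: Path, staging_dir: Path) -> Path:
    """Copy source into staging_dir and verify the copy by digest.

    Hypothetical helper: a real grid would move the file across the
    network; a local copy stands in for that transfer here.
    """
    expected = sha256_of(source)
    staging_dir.mkdir(parents=True, exist_ok=True)
    staged = staging_dir / source.name
    shutil.copyfile(source, staged)
    if sha256_of(staged) != expected:
        # Remove the corrupt copy so a job never runs on bad input.
        staged.unlink()
        raise IOError(f"integrity check failed for {staged}")
    return staged


def demo() -> bool:
    """Stage one small input file into a temporary 'execution host' directory."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        source = root / "input.fasta"
        source.write_text(">seq1\nACGT\n")
        staged = stage_file(source, root / "exec_host" / "job_0001")
        return staged.read_text() == source.read_text()


if __name__ == "__main__":
    print("staged copy matches source:", demo())
```

The same digest comparison could be applied in the other direction when output is copied back to the central location, so that both legs of a job's data movement are checked.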