Applications on the NC BioGrid
The prototype application for the NC BioGrid has been NCBI BLAST. It was selected because of its wide use in the bioinformatics research community. BLAST is also well suited for this purpose because it is a relatively simple application that can be both data and compute intensive.
There are two ways in which BLAST has been grid-enabled for use on the NC BioGrid. First, the Avaki Data Grid is used to make BLAST databases (which are really just flat files) available to all nodes participating in the grid under a uniform path name. Second, the BLAST executable has been "registered" with the Avaki Compute Grid to allow it to be run on all hosts in the grid without having to pre-install the BLAST executable on them. Instead, when a user invokes BLAST on the grid, the BLAST executable along with user input files will be copied to grid nodes in a process that is transparent to the user.
A target user for this type of grid-enabled BLAST would be a bioinformatics researcher with a large number of gene sequences to compare against a protein or nucleotide database. Here is a slightly simplified example of how this can be done:
- The researcher puts each gene sequence into a separate FASTA-format input file. Each input file will correspond to a distinct job to be run on the grid.
- She then creates a small job definition file that specifies the names of the input and output files using a simple wildcarding syntax.
- Next, she executes the "avaki run" command to launch the BLAST job on the grid.
- The Compute Grid will create a candidate list of hosts that are available for running the job based on criteria such as architecture type and current load.
- The Compute Grid then launches jobs on the candidate hosts. During this process, the BLAST executable is copied to each host (unless it is already cached there), as is the input file that corresponds to that particular job. This process is sometimes referred to as file staging.
- When BLAST runs, it will use the Data Grid to access the database file that is to be searched.
- When BLAST completes, the Compute Grid will retrieve the output files and copy them to the location that the researcher specified in the job definition file.
- As jobs complete and free up slots on grid servers, the Compute Grid will launch additional jobs until all have been run.
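
The steps above can be illustrated with a small Python simulation. This is only a sketch of the general pattern (one job per sequence, host filtering, file staging with caching, and slot-based dispatch); it does not use or reproduce any Avaki interface. The names `split_fasta`, `Host`, `candidate_hosts`, and `run_jobs`, the `"blast"` label, and the load threshold are all hypothetical and chosen for illustration.

```python
from collections import deque
from dataclasses import dataclass, field


def split_fasta(text):
    """Split multi-record FASTA text into one string per sequence record."""
    records, current = [], []
    for line in text.splitlines():
        # A new header line closes the record collected so far.
        if line.startswith(">") and current:
            records.append("\n".join(current) + "\n")
            current = []
        if line.strip():
            current.append(line)
    if current:
        records.append("\n".join(current) + "\n")
    return records


@dataclass
class Host:
    """Hypothetical description of a grid node."""
    name: str
    arch: str
    load: float
    slots: int
    cached: set = field(default_factory=set)


def candidate_hosts(hosts, arch, max_load):
    """Keep hosts with the required architecture and an acceptable load."""
    return [h for h in hosts if h.arch == arch and h.load <= max_load]


def run_jobs(jobs, hosts, executable="blast"):
    """Simulate dispatch and return a log of (event, host, item) tuples."""
    pending = deque(jobs)
    running = deque()  # (host, job) pairs, oldest first
    log = []
    while pending or running:
        # Fill every free slot on every candidate host.
        for host in hosts:
            while pending and host.slots > 0:
                job = pending.popleft()
                # Stage the executable only if it is not cached on this host.
                if executable not in host.cached:
                    host.cached.add(executable)
                    log.append(("stage_exe", host.name, executable))
                log.append(("stage_input", host.name, job))
                host.slots -= 1
                running.append((host, job))
        if not running:
            raise RuntimeError("no candidate host has a free slot")
        # Pretend the oldest job finishes: collect output, free the slot.
        host, job = running.popleft()
        host.slots += 1
        log.append(("collect_output", host.name, job))
    return log


if __name__ == "__main__":
    grid = [Host("node-a", "x86", 0.1, 1), Host("node-b", "x86", 0.9, 2)]
    usable = candidate_hosts(grid, arch="x86", max_load=0.5)
    for event in run_jobs(["seq1.fasta", "seq2.fasta"], usable):
        print(event)
```

In this sketch a job "finishes" in the order it was started, which is a simplification; a real scheduler would react to whichever job completes first. The point is only that each freed slot immediately receives the next pending job, and that the executable is staged to a host at most once.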