Technical Documentation:Application Server
From VectorBase Development Wiki
Application Server Documentation
The application server accepts job and search submissions and returns the results, all via SOAP messages. The job submission SOAP page runs on top of Apache and PHP. Search runs on top of Apache Axis and Tomcat and is built on Lucene, a Java-based search engine.
Submitting Jobs
Jobs, such as BLAST requests, are submitted to the application server through a SOAP call as follows:
- The user submits a BLAST job to the webserver via the web interface.
- The job submission is encoded with all relevant data and sent to the application server.
- The application server reads the SOAP message and attempts to submit the job to a compute grid for processing.
- If there is an open node in (any of?) the outside grids (e.g. Cluster A, Biocomplexity Cluster, etc.), then the job is submitted to the first open node.
- If there are no open nodes outside, the job is submitted to the internal Xgrid cluster.
- The web server polls (either by automatic page refresh or by user choice) the application server for the status of the job.
- The application server gets the status from the appropriate grid (this may be done either synchronously or otherwise?) and returns it to the webserver as a SOAP message.
- Once the status indicates that the job is finished, the webserver requests the job output/results from the application server.
- The application server fetches the job results from the appropriate compute grid and returns them to the web server as a SOAP message.
- The webserver displays the result to the user and optionally stores it for later usage or processing.
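The polling part of this workflow can be sketched as follows. This is a minimal illustration, not the actual web server code: the function name, the 'finished' status string, and the two callables (which stand in for the SOAP status and results calls) are all assumptions.

```php
<?php
// Sketch only: the status string and both callables are assumptions,
// not the real VectorBase SOAP interface.

/**
 * Poll a job until it reports 'finished' or the poll limit is reached.
 *
 * @param callable $getStatus  function(string $jobId): string, e.g. a wrapper around a SOAP status call
 * @param callable $getResults function(string $jobId): string, called once the job has finished
 * @return string|null the job results, or null if the job did not finish in time
 */
function pollJobUntilDone(callable $getStatus, callable $getResults, string $jobId,
                          int $maxPolls = 30, int $delaySeconds = 5): ?string
{
    for ($attempt = 1; $attempt <= $maxPolls; $attempt++) {
        if ($getStatus($jobId) === 'finished') {
            return $getResults($jobId);
        }
        // Wait between polls, but not after the final attempt.
        if ($attempt < $maxPolls && $delaySeconds > 0) {
            sleep($delaySeconds);
        }
    }
    return null;
}
```

In the real GUI the waiting is done by page refresh rather than by a blocking loop, so treat this only as a compact statement of the submit, poll, fetch order.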
Job Submission to HPCC Cluster
Our local application server is configured to serve as a Sun Grid Engine submission host. This facilitates running computation jobs on the HPCC cluster (which uses Grid Engine) in downtown South Bend. A number of configuration steps are required for the application server to submit jobs to the HPCC cluster.
Searching
Incomplete
Add New Computation Services
There are a number of steps that must be performed in order to add a new computational service to Vectorbase. These include modifications to the front-end GUI allowing the user to set parameters, upload data, submit jobs, get results, etc. Likewise there are modifications to the application server so that it can accept those jobs, run the appropriate programs, return results, etc.
Front-end GUI
Much of the GUI work depends upon the application being provided: what input it accepts, what output it produces, what parameters it takes, etc. What is described here is therefore primarily the core functionality needed to integrate the tool into the Ensembl GUI. However, you should expect that designing and implementing the front-end GUI will be the most time-consuming portion of adding a computation service. The front-end GUI needs to provide this functionality:
- Job submission screen that allows the user to specify input, like FASTA sequences, and to set parameters for the application. Also, if a job has been submitted, the user should be able to come back later (say, in a different web browser session) and retrieve the status and results of the job with a specified job id number.
- Perform the actual job submission by sending a SOAP request to the application server.
- Job status screen that shows information about the job that was submitted, whether it is still running or has finished, and links to results or details about errors that have occurred.
- Job results screen that presents the results in a way that is useful to the user. The results might be in a format suitable for input into another program, or presented in HTML with links to Vectorbase organisms, the gene browser, etc.
The front-end GUI is written with a combination of PHP, HTML, and JavaScript. PHP provides the ability to connect to the Vectorbase databases to build specialized GUI elements, for example a list of organisms. JavaScript is useful for error checking, controlling which GUI elements are accessible, setting or resetting default parameters, etc.
The first task is to create an initial PHP file for the application. This file goes into /Volumes/Web2/vectorbase/sections/Tools. You may want to copy an existing file from one of the other tools as a start. This file acts as a coordinator by showing the appropriate screen (submission, status, results) based upon the current state of the user's interaction. All of the tool-specific files are put in the tool_includes subdirectory; you should create a subdirectory under tool_includes for your application.
Next you need to incorporate your tool into the Vectorbase GUI. This involves connecting to the vectorbase_ui database and manually inserting records into some tables. Tools (currently) has two menu entries: the one with id=25 is the tool list shown on the main Vectorbase page and on the Tools page, and the one with id=7 is the submenu of the Tools menu. You can make these different if you want, or leave your tool off the submenu (especially if the submenu starts getting crowded). Here are the SQL commands I used to add ClustalW:
- insert into menu_items (display_name, parent, order_by, link, help_name, page_section) values ('ClustalW', 25, 3, '/Tools/ClustalW', 'clustalw', 'ClustalW');
- insert into menu_items (display_name, parent, order_by, link, help_name, page_section) values ('ClustalW', 7, 3, '/Tools/ClustalW', 'clustalw', 'ClustalW');
The display_name is what is displayed on the web page, order_by determines the order in which the tools are displayed, help_name is a unique id for the help info (see below), and link is the URL of your new PHP file without the extension. I don't know what page_section is for, but I gave it a value based on the BLAST entry.
Once you do the inserts, you should be able to reload the Vectorbase web page and see your link. That adds the link, but not the help text for mouse-overs or the description on the Tools page. To add this information, you need to add an entry to the web_help table. Here is the command I used to add ClustalW:
- insert into web_help (help_name, short_text, long_text) values ('clustalw', 'Multiple sequence alignment.', 'Multiple sequence alignment with the ClustalW program.');
The help_name field should match the help_name field in menu_items above. Now you can work on your php file to provide an interface specific for your application.
As you write your application-specific screens, take advantage of the "mouseover" capability, which provides a detailed help description for a particular GUI element; this allows for a concise input screen, with the more detailed description relegated to the help text. Give each GUI element a unique tag, then insert an entry into the web_help table for each of those tags.
Job Submission Screen
The current convention is to have a file called input.php which handles job submission; you should try to keep the interface uniform with the job submission screens for the other tools. The screen has two overall sections: one for submitting a new job and another for getting the status of an existing job. Getting the status of an existing job is standard; it accepts a job id number and calls the Job Status screen. For a new job, you should provide a minimal input scenario where almost all parameters are given default values and a detailed input scenario where the user can tweak all the possible parameters. The input.php program has three primary sections. The first is JavaScript, which handles switching between the minimal and detailed input scenarios; JavaScript is also a good way to enable or disable GUI elements based upon other settings. For example, choosing a particular value for one parameter may allow some additional parameters to be set. The second is the HTML input elements for submitting a new job, and the third provides the GUI for getting the status of an existing job.
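One way to support both the minimal and the detailed input scenario on the PHP side is to start from a table of defaults and overlay whatever the user actually filled in. The sketch below assumes this approach; the function name and the default values are made up for illustration (only the parameter names ktuple, wlength, and format come from the ClustalW service).

```php
<?php
// Hypothetical default values, for illustration only; take the real
// defaults from the application's own documentation.
const SKETCH_DEFAULTS = [
    'ktuple'  => '1',
    'wlength' => '5',
    'format'  => 'clustal',
];

/**
 * Overlay submitted form values on top of the defaults.
 * Unknown fields are ignored and blank fields fall back to the default.
 */
function resolveParameters(array $formInput, array $defaults): array
{
    $resolved = $defaults;
    foreach (array_keys($defaults) as $name) {
        $value = $formInput[$name] ?? '';
        if (is_scalar($value) && trim((string) $value) !== '') {
            $resolved[$name] = trim((string) $value);
        }
    }
    return $resolved;
}
```

With this shape, the minimal scenario simply submits fewer fields, and both scenarios go through the same code path.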
Perform Job Submission
The Jobs.php program at /Volumes/Web2/vectorbase/includes provides the general interface for communicating with the web service on the application server. The basic interface handles submitting a job, getting job status, and getting job results. This file needs to be modified and a subclass of the JobSubmission class created which handles submission of your job type to the application server; this primarily involves wrapping up the parameters for the job into a SOAP request. The parameters as well as the job type need to correspond with web service changes (described in application server section below).
For submitting a new job, the current convention is to have a file called submit.php which creates an instance of your handler class, sets its parameters from the GUI input elements, then performs the SOAP request. This program also puts job information into the Vectorbase database so that the user can retrieve the status and results of their job. After the job is submitted, the user is redirected to the Job Status screen.
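To make the "wrapping up the parameters" step concrete, here is a rough sketch of a handler class. The real JobSubmission class in Jobs.php is not shown on this page, so the base class below is a hypothetical stand-in, and its method names are assumptions. The only parts taken from this page are the request layout (a submitter plus one element named after the job type) and the ClustalW job type name.

```php
<?php
// Stand-in for the real JobSubmission class in Jobs.php; its actual
// methods are not documented here, so everything below is hypothetical.
abstract class JobSubmissionSketch
{
    protected $submitter;
    protected $parameters = [];

    public function __construct(string $submitter)
    {
        $this->submitter = $submitter;
    }

    public function setParameter(string $name, $value): void
    {
        $this->parameters[$name] = $value;
    }

    /** Job type name; it must match the element name in JobService.wsdl. */
    abstract protected function jobType(): string;

    /** Build the structure that would be handed to the SOAP client. */
    public function buildRequest(): array
    {
        return [
            'submitter'      => $this->submitter,
            $this->jobType() => $this->parameters,
        ];
    }
}

class ClustalWSubmissionSketch extends JobSubmissionSketch
{
    protected function jobType(): string
    {
        return 'ClustalW';
    }
}
```

With PHP's SOAP extension, the resulting array could then be passed to something like `$client->__soapCall('SubmitJob', [$request])`; whether Jobs.php does it exactly this way is not something this page confirms.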
Job Status Screen
The current convention is to have a file called status.php which provides job status information. There are two basic states for this screen: either the job is still running, or the job has finished and results are available. Besides showing the status of the current job, this program should also provide information about the input provided; this information was saved into the Vectorbase database as part of the job submission. If the job is still running, the program waits for some period of time and then redirects back to the status screen. If the job has finished, then summary result information should be shown with links to view the actual results. If the application returns multiple results, some additional parsing may be performed to provide separate links. For example, BLAST may have multiple hits, so those results can be provided in combination and/or as individual results. If results can be provided in different formats, this screen should provide those options to the user.
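The two states of the status screen might be handled along these lines. This is only a sketch: the function name, the 'finished' status string, the results.php URL, and the format names are all assumptions, and a meta refresh tag is just one way of implementing the "wait, then reload" behaviour.

```php
<?php
// Hypothetical helper: status values, URL layout, and format names are
// assumptions for illustration, not the existing status.php.
function buildStatusPage(string $jobId, string $status, array $formats = ['raw'],
                         int $refreshSeconds = 10): array
{
    $id = htmlspecialchars($jobId, ENT_QUOTES, 'UTF-8');

    if ($status !== 'finished') {
        // Still running: ask the browser to reload the page after a delay.
        return [
            'head' => '<meta http-equiv="refresh" content="' . $refreshSeconds . '">',
            'body' => "<p>Job $id is still running.</p>",
        ];
    }

    // Finished: offer one link per available result format.
    $links = [];
    foreach ($formats as $format) {
        $query = http_build_query(['id' => $jobId, 'format' => $format]);
        $links[] = '<a href="results.php?' . htmlspecialchars($query, ENT_QUOTES, 'UTF-8') . '">'
                 . htmlspecialchars($format, ENT_QUOTES, 'UTF-8') . '</a>';
    }
    return [
        'head' => '',
        'body' => "<p>Job $id has finished.</p><p>Results: " . implode(' | ', $links) . '</p>',
    ];
}
```

The 'head' part is meant to be emitted inside the page's head element and the 'body' part wherever the status message belongs.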
Job Results
Finally, the job results need to be provided to the user in a useful way. If the results can be provided in multiple formats, separate php programs can be written to perform any processing required. There should always be a way for the user to obtain just the raw results in text format (if applicable for the application) so that they can perform their own processing unencumbered by additional formatting. Likewise if the results are large, provide a mechanism for the results to be sent directly to a local file on the user's computer.
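For the "send results directly to a local file" case, the usual approach is to send the raw text with a Content-Disposition header so the browser saves it instead of displaying it. A small sketch, in which the function name and the file name pattern are assumptions:

```php
<?php
// Hypothetical helper; the download file name pattern is an assumption.
function rawDownloadHeaders(string $jobId, int $lengthBytes): array
{
    // Keep only safe characters so the job id cannot inject header content.
    $safeId = preg_replace('/[^A-Za-z0-9_-]/', '', $jobId);

    return [
        'Content-Type: text/plain; charset=utf-8',
        'Content-Disposition: attachment; filename="job-' . $safeId . '-results.txt"',
        'Content-Length: ' . $lengthBytes,
    ];
}
```

A results page would pass each of these strings to header() before echoing the unmodified job output.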
Application Server
The basic steps are:
- Install the application.
- Modify JobService.wsdl to include the new job type.
- Write a new PHP file to handle the job request.
- Modify soapjobs.php to include the new job type.
- Set up temporary directories.
- Test it!
Install Application
The application should be installed under /Volumes/App1/bin. Create a separate directory tree for the application and try to maintain separate version directories; if the application needs to be installed in order to run, make it install within its directory tree. Likewise, if the application needs additional libraries or tools that are not part of standard Mac OS X, install them within that application's directory tree. Avoid installing into common Mac OS X directories (like /usr/local) if possible, as this can create unexpected dependencies among tools and cause maintenance problems in the future if newer versions are installed.
For example, the hottest new bioinformatic program, biowedgie, which uses GNU autoconf, may be installed like this.
- cd /Volumes/App1/bin
- mkdir biowedgie
- cd biowedgie
- mkdir src
- cd src
- tar zxvf biowedgie-1.0.tar.gz
- cd biowedgie-1.0
- ./configure --prefix=/Volumes/App1/bin/biowedgie
- make install
Once the application is installed, verify that you can run it properly. Try giving it valid input, and determine its command-line parameters, where input files are read from, where output files are written to, etc. The SOAP job will run the program non-interactively, so you need to determine which parameters you must set to control input and output, as opposed to optional parameters that a user may or may not set.
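Because the program runs non-interactively with user-supplied values, the command line needs to be assembled carefully. The sketch below shows one cautious way to do it with escapeshellarg(); the function name, the `--name=value` flag style, and the use of stdin/stdout redirection are assumptions, so adapt them to how your application really takes its input.

```php
<?php
// Sketch only: the "--flag=value" style and the redirections are assumptions;
// real applications differ in how they take options and files.
function buildCommand(string $executable, array $options, string $inputFile, string $outputFile): string
{
    $parts = [escapeshellarg($executable)];

    foreach ($options as $flag => $value) {
        $flag = (string) $flag;
        // Accept only simple option names; values are shell-escaped below.
        if (!preg_match('/^[A-Za-z][A-Za-z0-9_-]*$/', $flag)) {
            throw new InvalidArgumentException("Unexpected option name: $flag");
        }
        $parts[] = '--' . $flag . '=' . escapeshellarg((string) $value);
    }

    // Feed the input file on stdin and capture stdout in the output file.
    $parts[] = '< ' . escapeshellarg($inputFile);
    $parts[] = '> ' . escapeshellarg($outputFile);

    return implode(' ', $parts);
}
```

Required input/output settings are fixed by the code, while the `$options` array carries only the optional parameters a user is allowed to change.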
If the application has database files, for example BLAST creates specialized database files for its reference sequences, then those files should be put under /Volumes/App1/db. Make a separate directory for your application.
ND Cluster Integration
If the application is to be run on the ND cluster then it needs to be installed on the cluster machines. Log into a cluster head node, like opteron.hpcc.nd.edu, with the vbase account. The application should be installed under /dscratch/vbase. The same general guidelines for creating a separate directory tree, installing additional libraries, etc. for the application server also apply to the ND cluster. Beware that root access is not available on the ND cluster, so nothing can be installed in common directories like /usr/local. Application database files should be put under /dscratch/vbase/db.
Modify JobService.wsdl
The JobService.wsdl file defines a general interface for submitting computational jobs, checking their status, and obtaining their results. This file resides at /var/www/vectorbase/applications/definitions on the application server node. In order to add a new computation service, this file needs to be modified to include two things. The first is a complexType for your computation service which describes all of the parameters that will be passed to the job. The second is an entry for the job type linking your type of job with the complexType defined in the first step. Both of these modifications are made in the <types> section. For the ClustalW computation service, I defined this complexType for the parameters:
<complexType name="clustalw">
  <element name="alignment" minOccurs="1" maxOccurs="1" type="xs:string"/>
  <element name="sequence" minOccurs="1" maxOccurs="unbounded" type="xs:string"/>
  <element name="ktuple" minOccurs="1" maxOccurs="1" type="xs:string"/>
  <element name="wlength" minOccurs="1" maxOccurs="1" type="xs:string"/>
  <element name="score" minOccurs="1" maxOccurs="1" type="xs:string"/>
  <element name="tdiag" minOccurs="1" maxOccurs="1" type="xs:string"/>
  <element name="gpenalty" minOccurs="1" maxOccurs="1" type="xs:string"/>
  <element name="matrix" minOccurs="1" maxOccurs="1" type="xs:string"/>
  <element name="opengap" minOccurs="1" maxOccurs="1" type="xs:string"/>
  <element name="endgap" minOccurs="1" maxOccurs="1" type="xs:string"/>
  <element name="extgap" minOccurs="1" maxOccurs="1" type="xs:string"/>
  <element name="sepgap" minOccurs="1" maxOccurs="1" type="xs:string"/>
  <element name="format" minOccurs="1" maxOccurs="1" type="xs:string"/>
  <element name="order" minOccurs="1" maxOccurs="1" type="xs:string"/>
</complexType>
To link this type of job, I added the ClustalW line to the job type in the WSDL below. Note that the type attribute references the complexType defined above. The name attribute is used when the job is submitted (to specify what type of job it is) and by soapjobs.php (to determine what type of job it is).
<complexType name="job">
  <element name="submitter" minOccurs="1" maxOccurs="1" type="xs:string"/>
  <choice>
    <element name="BLAST" minOccurs="1" maxOccurs="1" type="jsnstypes:blast"/>
    <element name="ClustalW" minOccurs="1" maxOccurs="1" type="jsnstypes:clustalw"/>
  </choice>
</complexType>
Note that this file is a *LIVE* file: as soon as you save changes to the file they become active. Be careful to save only once you have completed your changes and double-checked that your WSDL is correct.
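Since the file is live, it can help to edit a copy and at least confirm that the copy is well-formed XML before it replaces the real file. The sketch below does that with PHP's DOM and libxml functions; the function name is made up, and the check covers XML well-formedness only, not whether the WSDL is semantically valid.

```php
<?php
// Sketch: checks XML well-formedness only; it does not validate WSDL semantics.
function xmlErrors(string $xml): array
{
    if (trim($xml) === '') {
        return ['document is empty'];
    }

    // Collect parser errors instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $doc = new DOMDocument();
    $doc->loadXML($xml);

    $messages = [];
    foreach (libxml_get_errors() as $error) {
        $messages[] = 'line ' . $error->line . ': ' . trim($error->message);
    }

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $messages;
}
```

Running this on the edited copy (for example via file_get_contents()) and copying it over the live file only when the returned list is empty catches unbalanced tags before they reach users.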
Write New PHP File
As more and more computational services are provided, they should be split up into separate PHP files on the application server for easier maintenance. Currently, however, they are all together in soapjobs.php.
Modify soapjobs.php
This file is responsible for implementing the JobService web service; it is located at /var/www/vectorbase/applications on the application server. To add your own computational service, you need to write a subclass of the Jobs class which handles your computational job, and you need to modify the JobService class so that it triages your job to that handler class. The handler class generally implements just a single method, submit(), that extracts the job parameters, creates any temporary files, creates a script for running the job, and submits the script to the computational cluster. The JobService class, which triages the incoming jobs, looks at the job type as defined in JobService.wsdl and creates an instance of the handler class to process the job request, as shown in the following code:
public function SubmitJob($submission) {
    if ($submission["BLAST"]) {
        $blast = new BlastJob($submission);
        return $blast->submit();
    } elseif($submission["ClustalW"]) {
        $new_job = new ClustalWJob($submission);
        return $new_job->submit();
    } elseif($submission["seq"]) {
        $seq = new seqJob($submission);
        return $seq->submit();
    } else {
        return array("message" => "Don't understand job type");
    }
}
You may want to copy the handler class from another job type as a starting point, though you will need to modify it to call your application with the appropriate parameters.
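If you have no existing handler to look at, the following sketch shows the general shape of a submit() method: pull the parameters out of the submission, write an input file and a job script, and hand the script to the cluster. Everything here is hypothetical: the real handler would extend the Jobs class (not shown on this page), the "BioWedgie" job type is borrowed from the fictional install example, the cluster submission is represented by a callable, and the return format is a guess.

```php
<?php
// Stand-in only: the real Jobs base class and the cluster submission code
// in soapjobs.php are not shown on this page, so all names are hypothetical.
class BioWedgieJobSketch
{
    private $params;
    private $workDir;

    public function __construct(array $submission, string $workDir)
    {
        // Job parameters arrive under the job type name defined in the WSDL.
        $this->params  = $submission['BioWedgie'] ?? [];
        $this->workDir = rtrim($workDir, '/');
    }

    /** Write the input file and a shell script that runs the application. */
    public function writeScript(string $executable): string
    {
        $id     = bin2hex(random_bytes(8));
        $input  = "{$this->workDir}/$id.in";
        $script = "{$this->workDir}/$id.sh";

        file_put_contents($input, (string) ($this->params['sequence'] ?? ''));

        // Results go to stdout, which is what JobService passes back.
        $body = "#!/bin/sh\n"
              . escapeshellarg($executable) . ' < ' . escapeshellarg($input) . "\n";
        file_put_contents($script, $body);
        chmod($script, 0700);

        return $script;
    }

    /** $submitToCluster is a placeholder for the real grid submission call. */
    public function submit(string $executable, callable $submitToCluster): array
    {
        $script = $this->writeScript($executable);
        return ['id' => $submitToCluster($script)]; // return shape is an assumption
    }
}
```

Keeping script generation separate from the actual cluster call makes it possible to inspect the generated script by hand before any job is really submitted.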
Currently JobService expects the results to be written to stdout which is going to be passed back to the front-end GUI once the job is complete. If you want to perform any parsing of the output, you might have the output go to a temporary file then run a parsing script on the output file. It is probably better to parse the job output as part of running the job versus doing all of the parsing as part of the front-end GUI.
Set Up Temporary Directories
Your application might need to create temporary files; for example, a script that runs the application is generally generated and submitted to the computation cluster. Create a directory under /Volumes/App1/job_tmpspace/job_input for your application. Likewise, create a directory under /Volumes/App1/job_tmpspace/job_output where the running job will place its output. Make sure to set the permissions on these directories so that the web service program can read and write them.
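The directory setup can also be scripted. The sketch below creates the job_input and job_output directories for one application under a given base directory; the function name and the 0770 mode are assumptions, and the right owner, group, and mode depend on which account the web service runs as.

```php
<?php
// Sketch: pass the real base directory (/Volumes/App1/job_tmpspace on this
// page) when running on the application server. The 0770 mode is an assumption.
function ensureJobDirectories(string $base, string $application, int $mode = 0770): array
{
    if (!preg_match('/^[A-Za-z0-9_-]+$/', $application)) {
        throw new InvalidArgumentException('Unexpected application name');
    }

    $dirs = [];
    foreach (['job_input', 'job_output'] as $kind) {
        $dir = rtrim($base, '/') . "/$kind/$application";
        if (!is_dir($dir) && !mkdir($dir, $mode, true) && !is_dir($dir)) {
            throw new RuntimeException("Could not create $dir");
        }
        // mkdir()'s mode is subject to the umask, so set it explicitly.
        chmod($dir, $mode);
        $dirs[$kind] = $dir;
    }
    return $dirs;
}
```

The function is safe to run repeatedly, since existing directories are left in place and only their mode is reset.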
Testing
Try out all the different front-end GUI screens. Make sure you can set all of the various parameters for your application and that they flow through to the command-line parameters for the job, and verify that the results displayed are complete. Because some files are shared by multiple tools, it is also a good idea to check that you haven't accidentally broken the other tools.
Here are some hints for debugging problems:
- You are getting RPC/XML errors when the front-end GUI sends a SOAP message. Maybe you have a syntax error in soapjobs.php that is preventing the server from processing the SOAP request. Load the page directly [[1]]. You should get the message "SOAP-ENV:ServerBad Request. Can't find HTTP_RAW_POST_DATA"; any other response should indicate what the problem with the PHP file is.
- Maybe there is a problem with the WSDL file. Try loading it directly in a web browser at [[2]]. This will not validate the syntax, but it will at least ensure the file is accessible.
Notes for Specific Computational Services
BLAST
Currently BLAST is not running on the ND cluster, only the local Xgrid.
ClustalW
Currently ClustalW is running only on the ND cluster.
