The U of A UITS (formerly CCIT) group maintains High Performance Computing systems for use in research applications. A shared memory supercomputer (marin) and a Linux cluster (ICE) are available for running jobs requiring a large amount of memory, parallel processing, or certain visualization and scientific applications. ICE (Integrated Computing Environment) is a large cluster of Silicon Graphics Altix machines (8 CPUs per node); for more details see http://www.sgi.com/products/servers/altix/ice/. The UA News article http://uanews.org/node/20578 discusses the HPC systems, including their world ranking in terms of power and "green-ness". Other HPC systems are described at http://www.hpc.arizona.edu
** NOTE: All BIO5 researchers are encouraged to take advantage of the availability of high priority CPU hours resulting from dedicated ICE cluster compute nodes funded by a TRIF grant (see below for details).
Marin is the front end to a large shared memory system, whereas ICE (Integrated Computing Environment) is a cluster of networked computing nodes, each with its own separate memory space. Clusters are easier to scale up and cost less than large shared memory computers. The new HPC systems are cooled with chilled water instead of air conditioning, making them "greener" than earlier hardware. Programming on clusters can be somewhat more complicated than on shared memory systems, because without shared memory, information must be communicated between processors by passing messages. Both marin and the ICE cluster use PBS (Portable Batch System) for scheduling jobs. On both systems, every PI receives a monthly allocation of 1000 CPU hours in the default job queue. The processors on ICE are faster than those on marin because the ICE system is newer; for example, an 8-processor NCBI blastn job that took 29 hours on marin finishes in 17 hours on ICE.
To get an account on the UA High Performance Computing systems (HPC includes both ICE and marin), you need a UANetID and a faculty sponsor. The accounts page is: http://anise.u.arizona.edu/HPC2/02_hpc/02_hpc_starting/01_hpc_accounts.shtml. If you are a faculty member or PI and wish to sponsor an account, the link is: https://account.arizona.edu/faculty.html
We recommend using the ssh command from an XTerminal window in a BioDesk session or from a MacOSX Terminal; the command looks like: ssh -X email@example.com.
(The -X argument allows you to open X Window GUI-based applications such as the nedit editor, gsAssembler, and consed.) For help with BioDesk, see the BioDesk help page. You may also connect to ICE using a Windows-based ssh client, such as SSH Secure Shell or PuTTY; however, in order to run X Window applications you will also need to install the Xming server on your Windows desktop. To download SSH Secure Shell or PuTTY, go to https://sitelicense.arizona.edu/nocost.php. If you are already logged into marin, you may simply type ''. Unlike the command prompts on marin, your command prompt on ICE will not display 'ICE', but will instead be [service0], [service1], etc.
All of the user data partitions are shared (/home, /scr*, /genome), meaning that in these directories the same files are visible on marin and on the ICE system. You may notice that your home directory has been prefixed with /homeA or /homeB and as a result you'll need to alter scripts that contain full paths for files. Other directories, such as /usr/local/bin and /usr/local/blast/bin are separate because these contain executable files that are not portable between the two systems.
Applications presently available include NCBI BLAST, mpiBLAST, blat, EMBOSS, PAML, Interpro, HMMER, OligoArray2, as well as the 454/Newbler, celera, MIRA, and Velvet short-read assemblers. Long-running jobs such as IM, mrbayes, and genetree can be compiled and run with the mtcp checkpointing library to allow job restart after the maximum CPU time for an individual job submission has been used. Many applications, such as BLAST, blat, HMMER, and EMBOSS, require that a module be loaded prior to use (see the modules question below). Several Life Sciences applications are located in /usr/local/bin, and some are in /genome/ICEbin. To access the programs in /genome/ICEbin without having to specify a full path, add /genome/ICEbin to your PATH environment variable (see the next question). To check whether an application is available on ICE, use the 'which' command to find the full path of the program, and the 'file' command to determine whether it is suitable for running on ICE, i.e. an ELF 64-bit LSB executable for AMD x86-64. (By contrast, marin executables are IA-64, not x86-64.)
Here is an example of an application that can run on ICE:
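A sketch of such a check, using 'ls' as a stand-in binary (substitute the name of the application you actually want to run, e.g. blastall):

```shell
#!/bin/sh
# Find the full path of a program, then inspect its binary format.
# 'ls' stands in here for an application such as blastall.
prog=$(which ls)
echo "$prog"
file "$prog"
# On ICE, a suitable binary reports: ELF 64-bit LSB executable, AMD x86-64
```

If 'file' instead reports an IA-64 executable, the program was built for marin and will not run on the ICE cluster.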
Contact firstname.lastname@example.org to find out if a particular application can be installed on the ICE cluster.
Note: when editing your .cshrc file you need to be very careful! Mistakes such as splitting long lines (or using an editor that automatically wraps lines) can cause your account to behave badly. It is a good idea to save a working copy of .cshrc just in case anything goes awry. If you are using the nedit editor and your PATH is long, be sure that nedit does not automatically wrap the line: select Preferences -> Default Settings -> Wrap -> None, then Preferences -> Save Defaults. To modify your PATH, in your home directory, edit the file .cshrc so that it includes the following 6 lines (copy and paste):
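A minimal sketch of such a PATH addition in csh syntax (illustrative only; the recommended lines for your account may differ):

```shell
# Illustrative .cshrc addition (csh syntax).
# Appends /genome/ICEbin to the search path if the directory exists.
if ( -d /genome/ICEbin ) then
    set path = ( $path /genome/ICEbin )
endif
```

After editing, log out and back in (or run 'source ~/.cshrc') for the change to take effect.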
Modules allow you to choose which software packages you use. Modules can specify particular compilers, or indicate that you wish to use BioPerl, BioPython, EMBOSS, mpiBLAST, HMMER, or other applications. The 'module avail' command shows a list of all available modules. To use a module, you must first load it with the 'module load' command; for example, to run mpiBLAST you would type 'module load mpiblast' (or include the module load command in your job submit script). See /genome/ICEbiop.csh for an example. To see which modules you have already loaded, use the command 'module list'. There is also a 'module unload' command, which you may need to use before loading the module for a different version of an application.
With your supercomputer account you get a 5 GB disk allocation, and your files are accessible from both marin and ICE. You can check how much of your quota has been used by typing the 'quota -v' command when logged into marin or ICE. If you need more space, the xdisk utility can be used to set up temporary allocations of up to 200 GB; smaller xdisk allocations can be retained for longer periods of time. You must select one of the space/time combinations listed by the 'xdisk -c query' command (do not use numbers that are not listed for number of days or number of MB). The number of days of your xdisk allocation may be extended ONCE only.
Each PI is allocated 1000 CPU hours per month in the default job queue, to be distributed among all group members. A lower priority queue (windfall) allows an unlimited number of job submissions and use of CPU hours not being used by higher priority jobs. In certain circumstances, hours in the High Priority Queue may be allocated. For more information about using the High Priority job queue, see the question below or contact Susan Miller.
Use the 'va' command when logged into ICE. Remaining CPU hours in the default and high_priority queues will be displayed.
There is no charge for using up to 1000 CPU hours per month in the Default Job Queue. See also the question below regarding the High Priority Job Queue. If you wish to purchase dedicated compute nodes, see http://anise.u.arizona.edu/HPC2/02_hpc/03_hpc_systems/03_hpc_cluster.shtml.
As with the marin system, you need to write a job submit script for the Portable Batch System (PBS) queueing software. The script must contain directives to PBS that specify the number of nodes and processors to use, and the amount of CPU time and wall time requested for the job. You must also specify your PI's group in the group_list directive; to find the name of your group, run the 'va' command when logged into ICE. To avoid using more of your monthly allocation of resources than you intend to, write the PBS directives carefully, keeping the specific job requirements in mind. Examples can be found at http://anise.u.arizona.edu/HPC2/02_hpc/02_hpc_starting/03_05_hpc_user_guide_for_SGI_ICE_cluster.shtml and in the file /genome/ICEblast.csh. If you request a large number of CPUs, your allocation is charged for all of them whether or not your job actually uses them. PBS documentation is available at: http://anise.u.arizona.edu/HPC2/02_hpc/02_hpc_starting/03_hpc_guides.shtml
When you have finished editing the PBS script, use the qsub command to submit it, e.g. 'qsub mySubmitFile.csh'. You can monitor the status of your job with the PBS 'qstat' command, e.g. 'qstat -u yourNetID'.
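As a sketch, a csh submit script along the lines of /genome/ICEblast.csh might look like the following; every directive value here (job name, group, queue, resource counts, email, paths) is an illustrative placeholder, and the exact resource syntax on ICE may differ from this, so consult the examples linked above:

```shell
#!/bin/csh
### Illustrative PBS submit script -- adjust every value for your job.
#PBS -N myblastjob
#PBS -W group_list=mypigroup        # your PI's group, shown by the 'va' command
#PBS -q default                     # or 'windfall' / 'high_priority'
#PBS -l select=1:ncpus=8            # nodes and processors requested
#PBS -l cput=8:0:0                  # total CPU time requested
#PBS -l walltime=1:0:0              # wall-clock time requested
#PBS -M email@example.com           # where PBS sends job status mail

module load blast                   # load required modules (see modules question)
cd /path/to/your/working/directory
blastall -p blastn -a 8 -d /genome/nt -i myquery.fasta -o myquery.out
```

Requesting only the CPU and wall time a job actually needs keeps your monthly allocation from being charged for idle processors.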
On the ICE system, NCBI BLAST jobs may be run with up to 8 processors. To use more than 8 processors, see the next question and answer. To run BLAST with 8 processors, copy the file /genome/ICEblast.csh to your directory and modify it by adding your email address, group name, and the full paths to your input and output files. Then use the qsub command to submit your job file as described in the answer to the previous question.
On the ICE system, NCBI BLAST jobs may be run with up to 8 processors. To use more than 8 processors, mpiBLAST is required. However, mpiBLAST incurs message-passing overhead, and larger jobs may not scale very well. Additionally, mpiBLAST requires a special segmented formatting of the BLAST database by the mpiformatdb tool. For more information, see: http://www.mpiblast.org/
The BLAST databases in the /genome directory are accessible both from marin and from the ICE cluster. To see a list of nucleotide BLAST databases, use the command 'ls /genome/*.nsq /genome/*/*.nsq' (This lists databases in /genome and immediate subdirectories of /genome). To see protein BLAST databases, use .psq in place of .nsq in the ls command.
There are approximately 180,000 CPU hours per month available in the High Priority Job Queue, to be shared among BIO5 and Life Sciences researchers. These High Priority Queue hours are available as a result of the funding of a number of compute nodes by a TRIF grant written by BIO5 researchers. If you would like an allocation of High Priority hours, create a group as described in the next question, then contact Susan Miller (email@example.com). Currently there is no automatic renewal of High Priority hours, and allocations are made on a case-by-case basis. For information on funding your own dedicated compute nodes within ICE, see http://anise.u.arizona.edu/HPC2/001_rc_baseplus.shtml.
Go to http://marin.hpc.arizona.edu/sponsor and create a group named b5abcxyz, where abc are the PI's initials and xyz is an abbreviation for the project you're using ICE for. Then add your group members' UANetIDs to that group so that they can submit jobs to the high_priority queue. After your "b5abcxyz" group has been created, contact Susan Miller (firstname.lastname@example.org) for an allocation of hours in the high priority job queue. For such submissions, use the following PBS directives:
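An illustrative sketch of such directives, using the queue and group names described above (the group name is a placeholder for your own b5abcxyz group):

```shell
#PBS -q high_priority           # submit to the BIO5 high priority queue
#PBS -W group_list=b5abcxyz     # the BIO5 group created in the step above
```

These lines go in your submit script alongside the usual resource directives.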
Copy /genome/ICEbin/IM_mtcp_submit.pl and /genome/ICEbin/IM_mtcp_restart.pl to your directory and modify them to use your email address, group, directory, and desired command strings.
The stable version is BioPerl 1.4, released in 2004. Version 1.5.2 was released in 2006 and is considered a "developer's release", which is not guaranteed to be stable. For most purposes either version should work; if you are having problems with one version, try the other by unloading the loaded bioperl module and loading the module for the other version. For specific details about the BioPerl differences, see http://bioperl.org/wiki/FAQ#What_is_the_difference_between_1.5.2_and_1.4.3F_What_do_you_mean_developer_release.3F
The latest version of PAML is 4b, and these programs are installed in /usr/local/bin. If you plan to run PAML utilities using BioPerl's Bio::Tools::Run::Phylo::PAML modules, you need to use PAML version 3.13d, which is installed in /genome/ICEbin. BioPerl cannot parse output from later PAML versions.
Try the 'hottip' command on the ICE system, or refer to the High Performance Computing User Guide. You may also subscribe to the HPC-Discuss Listserv and you can email questions to HPC-CONSULT@listserve.arizona.edu.
Please email us at email@example.com to report any answers that are not clear or any errors that may be present in this document.