AutoFACT: Automatic Functional Annotation/Classification Tool
The who, what, when, where and why of AutoFACT. AutoFACT is a Perl script that reads a FASTA sequence file and the corresponding BLAST output files and performs automatic functional annotation.

1. What is AutoFACT?
AutoFACT is a tool for Automatic Functional Annotation and Classification of sequences, written by Liisa Koski et al., Université de Montréal, Canada.
Key features of AutoFACT are:
- Analyzes nucleotide and protein sequence data
- Determines the most informative functional description by combining multiple BLAST reports from several user-selected databases.
- Assigns putative metabolic pathways, functional classes, enzyme classes, GeneOntology terms and locus names.
- Generates output in HTML, text and GFF formats for the user's convenience.
- Performs transitive EST annotation (trEST), see below for details (NEW to version 3.1).

2. What do I need to know to use this FAQ?
Familiarity with basic Unix commands and conventions is assumed, as is the ability to use the sftp utility to transfer files to and from the supercomputer and to run parallel BLAST jobs on the High Performance Computing system. For more help with these topics, see the relevant FAQs.

3. How do I run AutoFACT?
AutoFACT runs on a single processor, so it is most efficient to run the desired BLAST jobs on parallel processors, placing the outputs into directories named as required by AutoFACT. When these BLAST outputs are in place, AutoFACT can be run by submitting a job to a single processor.
Begin by creating a directory (e.g. autoF) that will contain the input sequence file and the following subdirectories that AutoFACT requires:
    LSU/
    SSU/
    cog/
    est_others/
    html/
    kegg/
    nr/
    pfam/
    smart/
    uniref90/
In each of the above subdirectories, AutoFACT will expect a BLAST output file, named by appending .out to the name of the input file. For example, if the input file is named myseqs.fasta, each output file must be named myseqs.fasta.out. To accomplish this, set the -o flag in the BLAST command to include the relevant AutoFACT subdirectory and the .out file, e.g.
    -o LSU/myseqs.fasta.out

4. Can I see an example of the directory setup for AutoFACT?
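As a starting point, the layout and naming rule described in question 3 can be prepared and checked with a short Python sketch. This is not part of AutoFACT: the function names are hypothetical, and the subdirectory list is simply copied from this FAQ. The FAQ states that every listed subdirectory should hold a BLAST output file, so the sketch checks all of them by default; pass a narrower `subdirs` value if your setup differs.

```python
from pathlib import Path

# Subdirectory names copied from this FAQ (not read from AutoFACT itself).
AUTOFACT_SUBDIRS = (
    "LSU", "SSU", "cog", "est_others", "html",
    "kegg", "nr", "pfam", "smart", "uniref90",
)


def create_layout(workdir, subdirs=AUTOFACT_SUBDIRS):
    """Create workdir and each listed subdirectory; existing ones are left alone."""
    root = Path(workdir)
    for name in subdirs:
        (root / name).mkdir(parents=True, exist_ok=True)
    return root


def expected_outputs(workdir, fasta_name, subdirs=AUTOFACT_SUBDIRS):
    """Return the BLAST output path expected in each subdirectory.

    The rule from the FAQ: append ".out" to the input file name.
    """
    root = Path(workdir)
    return [root / name / (fasta_name + ".out") for name in subdirs]


def missing_outputs(workdir, fasta_name, subdirs=AUTOFACT_SUBDIRS):
    """Return the expected BLAST output files that do not exist yet."""
    return [
        path
        for path in expected_outputs(workdir, fasta_name, subdirs)
        if not path.is_file()
    ]
```

For example, `create_layout("autoF")` followed by `missing_outputs("autoF", "myseqs.fasta")` lists the files that the BLAST jobs still need to produce before AutoFACT is submitted. The listing that follows shows such a layout on disk once BLAST output is in place.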
[aura0][~/autoF]> ls -l
-rw-------  1 sjmiller nrsc  583 Oct 13 09:43 autofact.csh
drwxrwxr-x  2 sjmiller nrsc 8192 Sep 22 14:35 LSU/
drwxrwxr-x  2 sjmiller nrsc 8192 Sep 22 14:35 SSU/
drwxrwxr-x  2 sjmiller nrsc 8192 Sep 22 14:33 cog/
drwxrwxr-x  2 sjmiller nrsc 8192 Sep 22 14:35 est_others/
drwxrwxr-x  2 sjmiller nrsc 8192 Sep 22 14:43 html/
drwxrwxr-x  2 sjmiller nrsc 8192 Sep 22 14:33 kegg/
drwxrwxr-x  2 sjmiller nrsc 8192 Oct 31 15:16 nr/
drwxrwxr-x  2 sjmiller nrsc 8192 Sep 22 14:33 pfam/
drwxrwxr-x  2 sjmiller nrsc 8192 Sep 22 14:35 smart/
-rw-rw-r--  1 sjmiller nrsc  352 Sep 22 18:30 testseq.fasta
drwxrwxr-x  2 sjmiller nrsc 8192 Oct 13 09:45 uniref90/
[aura0][~/autoF]> ls -l LSU/
-rw-rw-r--  1 sjmiller nrsc 9376 Sep 23 10:11 testseq.fasta.out
[aura0][~/autoF]> ls -l SSU/
-rw-rw-r--  1 sjmiller nrsc 2674 Sep 23 10:11 testseq.fasta.out
You can see that each subdirectory contains an output file with the same name. However, the contents of these files differ, because each results from a BLAST against a different database.

5. How do I submit my AutoFACT job?
Create a submit file containing the following lines, editing your email address, group, the AutoFACT directory path, and the input sequence file as needed. The 'time /genome/bin/AutoFACT/scripts/AutoFACT.pl ...' command must be on a single line.
    #!/bin/csh
    #PBS -N AutoFACT
    #PBS -m bea
    #PBS -M my_email@email.arizona.edu
    #PBS -W group=mygroup
    #PBS -q default
    ### Set the number of cpus that will be used.
    #PBS -l select=1:ncpus=1
    ### Important!!! Include this line for your 1p job.
    ### Without it, the whole cpu node containing 4 cpus will be allocated.
    #PBS -l place=pack:shared
    #PBS -l cput=120:0:0
    #PBS -l walltime=120:0:0
    setenv BLASTMAT /usr/local/BLAST/data
    setenv PATH2AUTOFACT /genome/bin/AutoFACT
    cd /home4/u16/myhome/autoF
    time /genome/bin/AutoFACT/scripts/AutoFACT.pl -a -g -f myseqs.fasta

6. What is the output produced by AutoFACT?
AutoFACT creates an HTML file in the directory from which it is run. This file can be opened in a browser and contains the functional annotations discovered by AutoFACT, including links to relevant URLs. AutoFACT also writes results to a tab-delimited .gff file that can be viewed with Excel or a text editor.
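Because the .gff file is tab-delimited, it can also be read by a script instead of Excel. The following is a minimal Python sketch, not an AutoFACT utility: the function name is hypothetical, and since this FAQ does not describe what each column of AutoFACT's .gff output holds, the sketch returns raw fields without interpreting them. It skips blank lines and lines starting with "#", which is the usual comment convention in GFF files.

```python
import csv


def read_tab_delimited(path):
    """Return the data rows of a tab-delimited file as lists of strings.

    Blank lines and lines beginning with '#' are skipped.
    Quote characters are kept literally (QUOTE_NONE), because
    annotation text may contain unbalanced quotes.
    """
    rows = []
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            if not row or row[0].startswith("#"):
                continue
            rows.append(row)
    return rows
```

Calling `read_tab_delimited("myseqs.fasta.gff")` (a hypothetical file name; use whatever name AutoFACT actually produced in your run directory) gives one list per record, which can then be filtered or counted as needed.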