AutoFACT: Automatic Functional Annotation/Classification Tool
The who, what, when, where and why of AutoFACT. AutoFACT is a Perl script that reads a FASTA sequence file and the corresponding BLAST output files and performs automatic functional annotation.

1. What is AutoFACT?

AutoFACT is a tool for Automatic Functional Annotation and Classification of sequences, written by Liisa Koski et al., Université de Montréal, Canada.

Key features of AutoFACT are:

  • Analyzes nucleotide and protein sequence data
  • Determines the most informative functional description by combining multiple BLAST reports from several user-selected databases.
  • Assigns putative metabolic pathways, functional classes, enzyme classes, Gene Ontology terms and locus names.
  • Generates output in HTML, text and GFF formats for the user's convenience.
  • Performs transitive EST annotation (trEST), see below for details (NEW to version 3.1).
     For more details, see BMC Bioinformatics. 2005 Jun 16;6(1):151

 

2. What do I need to know to use this FAQ?

Familiarity with basic Unix commands and conventions is assumed, as is the ability to use the sftp utility to transfer files to and from the supercomputer and to run parallel BLAST jobs on the High Performance Computing system. For more help with these topics, see the relevant FAQs.

 

3. How do I run AutoFACT?

AutoFACT runs on a single processor, so it is most efficient to run the desired BLAST jobs on parallel processors first, placing the outputs into directories named as AutoFACT requires. Once these BLAST outputs are in place, AutoFACT can be run by submitting a job to a single processor. Begin by creating a directory (e.g. autoF) that will contain the input sequence file and the following subdirectories that AutoFACT requires:

 

LSU/  SSU/  cog/  est_others/  html/  kegg/  nr/  pfam/  smart/  uniref90/

In each of the above subdirectories, AutoFACT expects a BLAST output file named by appending .out to the name of the input file. For example, if the input file is named myseqs.fasta, each output file must be named myseqs.fasta.out. To accomplish this, set the -o flag in each BLAST command to the relevant AutoFACT subdirectory followed by the .out file name, e.g. -o LSU/myseqs.fasta.out
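
The layout and naming rule described above can be sketched in a few lines of shell. This is a minimal sketch, assuming a Bourne-compatible shell; the function name autofact_layout is hypothetical and is not part of AutoFACT. It creates the subdirectories listed above and prints, for each one, the path that the corresponding BLAST command's -o flag would point to.

```shell
#!/bin/sh
# Sketch only: autofact_layout is a hypothetical helper, not part of AutoFACT.
# The directory list mirrors the one given in the text above.
AUTOFACT_DIRS="LSU SSU cog est_others html kegg nr pfam smart uniref90"

autofact_layout() {
    workdir=$1   # working directory, e.g. autoF
    input=$2     # input sequence file name, e.g. myseqs.fasta
    for d in $AUTOFACT_DIRS; do
        mkdir -p "$workdir/$d" || return 1
        # Path to pass to the BLAST -o flag, relative to the working directory.
        echo "$d/$input.out"
    done
}
```

For example, running autofact_layout autoF myseqs.fasta would create autoF/LSU, autoF/SSU and so on, and print one line per subdirectory, such as LSU/myseqs.fasta.out.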

 

4. Can I see an example of the directory setup for AutoFACT?

[aura0][~/autoF]> ls -l


-rw-------   1 sjmiller nrsc 583   Oct 13 09:43 autofact.csh

drwxrwxr-x 2 sjmiller nrsc 8192 Sep 22 14:35 LSU/
drwxrwxr-x 2 sjmiller nrsc 8192 Sep 22 14:35 SSU/
drwxrwxr-x 2 sjmiller nrsc 8192 Sep 22 14:33 cog/
drwxrwxr-x 2 sjmiller nrsc 8192 Sep 22 14:35 est_others/
drwxrwxr-x 2 sjmiller nrsc 8192 Sep 22 14:43 html/
drwxrwxr-x 2 sjmiller nrsc 8192 Sep 22 14:33 kegg/
drwxrwxr-x 2 sjmiller nrsc 8192 Oct 31 15:16 nr/
drwxrwxr-x 2 sjmiller nrsc 8192 Sep 22 14:33 pfam/
drwxrwxr-x 2 sjmiller nrsc 8192 Sep 22 14:35 smart/
-rw-rw-r--  1 sjmiller nrsc 352   Sep 22 18:30 testseq.fasta
drwxrwxr-x 2 sjmiller nrsc 8192 Oct 13 09:45 uniref90/


[aura0][~/autoF]> ls -l  LSU/

-rw-rw-r-- 1 sjmiller nrsc 9376 Sep 23 10:11 testseq.fasta.out

[aura0][~/autoF]> ls -l  SSU/

-rw-rw-r-- 1 sjmiller nrsc 2674 Sep 23 10:11 testseq.fasta.out

 

You can see that each subdirectory contains an output file with the same name. However, the contents of these files differ, since each results from a BLAST search against a different database.
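
Before submitting the AutoFACT job, it can be useful to confirm that every report is in place. The following is a minimal sketch, assuming a Bourne-compatible shell; check_blast_outputs is a hypothetical helper, not an AutoFACT command. It takes the working directory, the input file name, and the subdirectories to check, and flags any report that is absent or empty.

```shell
#!/bin/sh
# Sketch only: check_blast_outputs is a hypothetical helper, not an AutoFACT command.
# Usage: check_blast_outputs WORKDIR INPUT SUBDIR [SUBDIR ...]
check_blast_outputs() {
    workdir=$1
    input=$2
    shift 2
    missing=0
    for d in "$@"; do
        report="$workdir/$d/$input.out"
        # -s is true only if the file exists and is not empty.
        if [ -s "$report" ]; then
            echo "ok       $d/$input.out"
        else
            echo "MISSING  $d/$input.out"
            missing=$((missing + 1))
        fi
    done
    # Return non-zero if any report is absent or empty.
    [ "$missing" -eq 0 ]
}
```

With the listing above, check_blast_outputs ~/autoF testseq.fasta LSU SSU would report both files as ok.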

5. How do I submit my AutoFACT job?

Create a submit file containing the following lines, editing the email address, group, AutoFACT directory path, and input sequence file name as needed. The 'time /genome/bin/AutoFACT/scripts/AutoFACT.pl ...' command must be entered on a single line.

#!/bin/csh
#PBS -N AutoFACT

#PBS -m bea
#PBS -M my_email@email.arizona.edu

#PBS -W group=mygroup

#PBS -q default

### Set the number of cpus that will be used.
#PBS -l select=1:ncpus=1

### Important!!! Include this line for your 1p job.
### Without it, the whole cpu node containing 4 cpus will be allocated.
#PBS -l place=pack:shared
#PBS -l cput=120:0:0
#PBS -l walltime=120:0:0

setenv BLASTMAT /usr/local/BLAST/data
setenv PATH2AUTOFACT /genome/bin/AutoFACT
cd /home4/u16/myhome/autoF
time /genome/bin/AutoFACT/scripts/AutoFACT.pl -a -g -f myseqs.fasta

 

6. What is the output produced by AutoFACT?

AutoFACT creates an HTML file in the directory from which it is run. This file can be opened in a browser and contains the functional annotations discovered by AutoFACT, including links to relevant URLs. AutoFACT also writes results to a tab-delimited .gff file that can be viewed with Excel or a text editor.
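
The .gff file can also be skimmed from the command line. The following is a minimal sketch, assuming the file follows the usual nine-column, tab-delimited GFF layout with the descriptive text in the ninth column; that layout is an assumption about AutoFACT's output, and gff_summary is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: gff_summary is a hypothetical helper, not part of AutoFACT.
# Assumes a standard nine-column, tab-delimited GFF file.
gff_summary() {
    # Skip comment lines (starting with #) and lines with fewer than 9 fields;
    # print column 1 (sequence name) and column 9 (attributes).
    awk -F '\t' '!/^#/ && NF >= 9 { print $1 "\t" $9 }' "$1"
}
```

Pass the path of the .gff file as the only argument; each output line pairs a sequence name with its attribute text.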