Become familiar with basic UNIX commands for listing files, moving
around the directory structure, and viewing output files. Numerous free
tutorials are available on the web; search for ‘Unix Tutorials’ to find
some of them.
Using UNIX:
A few of the many UNIX tutorials available are:
http://www.ee.surrey.ac.uk/Teaching/Unix
http://www.math.utah.edu/lab/unix/unixtutorial.html
http://www.unixtools.com/tutorials.html
Some short QuickTime movies are accessible from the UA Computer-Based
Training Site (type UNIX in the ‘Find a Course’ box):
http://uacbt.arizona.edu/default.htm
(U of A users do not need to purchase these; a UA NetID and password
are all that is needed.)
Trimming file names:
There is a script named trimlabel that will trim the first letter and
two digits from a filename. It will rename all files in the current
directory.
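The trimlabel script itself is not reproduced here, so the following is
only a minimal Python sketch of the renaming it is described as doing. It
assumes that "the first letter and two digits" means a leading prefix such
as A01; the names trimmed_name and trim_directory are hypothetical, and
files that do not start with that pattern are left alone.

```python
import os
import re

# Assumption: the prefix to trim is one letter followed by exactly two digits.
PREFIX = re.compile(r"^[A-Za-z][0-9]{2}")


def trimmed_name(name):
    """Return name without a leading letter+two-digit prefix."""
    new = PREFIX.sub("", name, count=1)
    # Keep the original name if trimming would leave an empty string.
    return new if new else name


def trim_directory(directory=".", dry_run=True):
    """Rename regular files in directory; return the (old, new) pairs."""
    changes = []
    taken = set(os.listdir(directory))
    for old in sorted(taken):
        src = os.path.join(directory, old)
        new = trimmed_name(old)
        if new == old or not os.path.isfile(src):
            continue
        if new in taken:
            # Skip instead of overwriting a file that already has this name.
            continue
        taken.add(new)
        changes.append((old, new))
        if not dry_run:
            os.rename(src, os.path.join(directory, new))
    return changes
```

Calling trim_directory() with the default dry_run=True only reports what
would be renamed, which is a sensible check before touching the
chromatogram files.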
Setting up to run phred/polyphred/phrap/consed:
A set of directories must be created, and the script mkpolydirs will do
this for you. Then move your chromatogram files into the chromat_dir.
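For readers without access to mkpolydirs, the directory setup can be
approximated as below. This is a sketch under assumptions: chromat_dir and
edit_dir are named in this guide, while phd_dir and poly_dir are assumed
from the layout conventionally used with phred, phrap, and polyphred. The
real script may create a different set, and make_project_dirs is a
hypothetical name.

```python
import os

# Assumed directory set; only chromat_dir and edit_dir are named in this guide.
PROJECT_DIRS = ("chromat_dir", "phd_dir", "poly_dir", "edit_dir")


def make_project_dirs(project_root="."):
    """Create the project sub-directories, leaving existing ones untouched."""
    created = []
    for name in PROJECT_DIRS:
        path = os.path.join(project_root, name)
        if not os.path.isdir(path):
            os.makedirs(path)
            created.append(name)
    return created
```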
Using phred to improve base-calling:
phred -id chromat_dir -trim_alt "" -trim_scf -cd scf_dir
The scf_dir will contain SCF (Standard Chromatogram Format) chromatograms
that also contain the phred quality info and base calls. These files
are about half the size of the original chromatograms. To save disk
space you can gzip the original chromatograms or move them to another
system. The SCF files can be imported into Sequencher. To use them with
polyphred, phrap, and consed, run the scf2chromat script - this renames
the original chromat directory (if it is still present) and renames the
scf_dir as chromat_dir as required by phrap, etc.
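The directory swap performed by scf2chromat, as described above, can be
sketched in Python as follows. The backup name chromat_dir.orig and the
function name swap_scf_into_chromat are assumptions made for illustration;
the actual script may use a different backup name.

```python
import os


def swap_scf_into_chromat(project_root=".", backup_name="chromat_dir.orig"):
    """Make scf_dir the new chromat_dir, keeping any old chromat_dir as a backup."""
    chromat = os.path.join(project_root, "chromat_dir")
    scf = os.path.join(project_root, "scf_dir")
    backup = os.path.join(project_root, backup_name)  # hypothetical backup name
    if not os.path.isdir(scf):
        raise FileNotFoundError("scf_dir not found; run phred with -cd scf_dir first")
    if os.path.isdir(chromat):
        if os.path.exists(backup):
            raise FileExistsError(backup + " already exists; refusing to overwrite it")
        # Rename the original chromatogram directory out of the way.
        os.rename(chromat, backup)
    # phrap, polyphred, and consed expect the chromatograms in chromat_dir.
    os.rename(scf, chromat)
    return chromat
```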
Running phred/polyphred/phrap/consed:
Move down into the edit_dir (using the command cd edit_dir) and type
polyphredPhrap. Lots of output will scroll by; when it finishes, run
consed. In consed, open the most recent assembly, then double-click on
a contig. From the contig view, Navigate to Tags and select
polyPhredRank1. This lets you step through each polymorphic site.
Repeat for Rank2, Rank3, etc. NOTE: if your sequences are purely
homozygous (e.g.
cloned DNA or mitochondrial sequences) then refcomp is more appropriate
than polyphred.
consed Output:
Output from consed appears to be quite limited: you can only export
consensus sequences or an ace-format assembly, which does not show the
aligned sequences clearly. It might be possible to write a BioPerl
script that produces a nicer output.
polyphred output:
In the edit_dir you will see a file with its name ending in
‘.polyphred.out’ and that file contains detailed information on
predicted polymorphisms. The BEGIN_POLY tag marks a region that lists
the position, flanking sequence, and rank of each polymorphism. (Rank 1
marks the most likely polymorphisms; larger rank numbers are less
likely to be real.) The BEGIN_GENOTYPE region
lists position in the consensus, position in the read, name of the
read, two most significant bases and rank. To change the sensitivity of
polyphred, read the complete documentation for available options.
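To pull the two regions described above out of a ‘.polyphred.out’ file, a
short parser along these lines may help. It is a sketch under assumptions:
each region is taken to end at a matching END_ line, fields are taken to be
whitespace-separated, and the rank is taken to be the last field of each
BEGIN_POLY row. Check these against your own output file before relying on
it. The function names are hypothetical.

```python
def read_sections(lines):
    """Group whitespace-split rows by the BEGIN_<NAME> region they sit in."""
    sections = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("BEGIN_"):
            current = line[len("BEGIN_"):]
            sections.setdefault(current, [])
        elif line.startswith("END_"):
            # Assumption: every region is closed by an END_<NAME> line.
            current = None
        elif current is not None:
            sections[current].append(line.split())
    return sections


def polymorphisms_with_rank(sections, max_rank, rank_column=-1):
    """Return POLY rows whose rank field (assumed last) is <= max_rank."""
    return [row for row in sections.get("POLY", [])
            if row[rank_column].isdigit() and int(row[rank_column]) <= max_rank]
```

A file can be passed directly, since an open file handle yields lines:
read_sections(open("myproject.polyphred.out")), where the file name is a
placeholder. The result is a dictionary with keys such as "POLY" and
"GENOTYPE".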