Summary
-------
We provide a library for selecting a set of reference genes whose codon
usage serves as the optimization target. In addition, the library accepts a
variable number of fitness factors, such as the translation speed of codons
or tRNA abundance. Given these fitness factors, the result is reported as
the strength of each factor that yields the best match between simulated and
reference codon usage. In a next step, the strengths can be tuned and a
codon usage table can be generated, which can then be used to adapt a gene
sequence with classic codon optimization tools such as OPTIMIZER.

Example
-------
In an example workflow, you first select a FASTA file that contains the
genes you want to use. You can load it either from a local file or from a
URL. In both cases a histogram of codon usage and amino acid usage is
generated.
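The counting behind such a histogram can be sketched as follows. This is a minimal illustration using only the Python standard library, not the library's own API: the names `read_fasta` and `codon_counts` are hypothetical, and the two short genes are made-up example data.

```python
from collections import Counter
from io import StringIO


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def codon_counts(sequences):
    """Count in-frame codons, ignoring a trailing partial codon."""
    counts = Counter()
    for seq in sequences:
        usable = len(seq) - len(seq) % 3
        counts.update(seq[i:i + 3] for i in range(0, usable, 3))
    return counts


# Made-up input; a real run would open a file instead of a StringIO.
example = StringIO(">gene1\nATGGCTGCTTAA\n>gene2\nATGGCC\nGCTTAG\n")
genes = list(read_fasta(example))
counts = codon_counts(seq for _, seq in genes)

# Crude text histogram, most frequent codon first.
for codon, n in counts.most_common():
    print(f"{codon} {'#' * n}")
```

Restricting the analysis to the first n genes would, in this sketch, amount to slicing the list as `genes[:n]` before counting.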
You can then (optionally) load a list of highly expressed genes; we
support the format of the HEG database. To check, for example, whether the
codon usage bias (CUB) is as you expect, you can visualize it by plotting
the results of various dimensionality reduction methods.

If you do not want to use all the genes, you can enter a number n. Only
the first n genes will then be analysed.

You now have to select a fitness matrix, which gives the probability of
one amino acid being represented by another one.

Additionally, you can select a number of fitness functions, each of which
assigns a fitness to every codon. These functions will be normalized!
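The text does not say how the fitness functions are normalized. As one plausible reading, the sketch below rescales the values so that they sum to 1 within each group of synonymous codons. The normalization scheme, the function name `normalize_per_amino_acid`, and the fitness values are all assumptions made for illustration; the codon assignments for alanine and lysine follow the standard genetic code.

```python
def normalize_per_amino_acid(fitness, codon_to_aa):
    """Rescale codon fitness values to sum to 1 per amino acid.

    Assumption: normalization is done per synonymous group. The library
    may use a different scheme.
    """
    totals = {}
    for codon, value in fitness.items():
        aa = codon_to_aa[codon]
        totals[aa] = totals.get(aa, 0.0) + value

    normalized = {}
    for codon, value in fitness.items():
        total = totals[codon_to_aa[codon]]
        # Guard against a group whose fitness values are all zero.
        normalized[codon] = value / total if total > 0 else 0.0
    return normalized


# Small excerpt of the standard genetic code (alanine and lysine).
codon_to_aa = {"GCT": "A", "GCC": "A", "AAA": "K", "AAG": "K"}

# Hypothetical raw fitness values, e.g. some measure of translation speed.
raw_fitness = {"GCT": 3.0, "GCC": 1.0, "AAA": 2.0, "AAG": 2.0}

normalized = normalize_per_amino_acid(raw_fitness, codon_to_aa)
```

With this choice, fitness functions measured on very different scales become comparable, because each one only expresses a preference among synonymous codons.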
If you want to perform a test run, you have to enter the parameters
alpha, beta, selection, t_i for every test function. alpha and beta are
parameters of the <todo> model of codon substitution and are related to
transition/transversion bias. Input values may be separated by commas,
by whitespace/tabs, or by a combination of these.
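Accepting this mixed separator format could look like the sketch below. It only shows how a parameter string might be tokenized; `parse_parameters` is a hypothetical helper, not a function of the library, and the numbers are arbitrary.

```python
import re


def parse_parameters(text):
    """Split on commas and/or whitespace (including tabs) and convert to floats."""
    tokens = [t for t in re.split(r"[,\s]+", text.strip()) if t]
    return [float(t) for t in tokens]


# Commas, spaces and a tab mixed in one input line.
values = parse_parameters("0.5, 1.2\t0.8  2")
print(values)
```

Filtering out empty tokens means that a stray leading or trailing comma does not cause a conversion error.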
You can compare the absolute codon usage and the relative codon usage
(normalized for each amino acid) in a comparison plot. To test the
optimization of the distance, you can first optimize on the first gene only
and inspect the comparison again to see whether the algorithm works at all.
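The difference between absolute and relative codon usage can be made concrete with a short sketch. Relative usage divides each codon count by the total count of its amino acid. The closely related RSCU (relative synonymous codon usage) multiplies that fraction by the number of synonymous codons, so that a value of 1 means "used as often as expected under uniform usage". The function names and the counts are illustrative assumptions; the codon groups follow the standard genetic code.

```python
from collections import Counter


def relative_usage(counts, codon_to_aa):
    """Fraction of each codon among all codons of the same amino acid."""
    totals = {}
    for codon, aa in codon_to_aa.items():
        totals[aa] = totals.get(aa, 0) + counts.get(codon, 0)
    return {
        codon: (counts.get(codon, 0) / totals[aa] if totals[aa] else 0.0)
        for codon, aa in codon_to_aa.items()
    }


def rscu(counts, codon_to_aa):
    """Observed count divided by the mean count of the synonymous codons."""
    rel = relative_usage(counts, codon_to_aa)
    degeneracy = Counter(codon_to_aa.values())  # synonymous codons per amino acid
    return {codon: rel[codon] * degeneracy[aa] for codon, aa in codon_to_aa.items()}


# Alanine (four codons) and lysine (two codons) from the standard genetic code.
codon_to_aa = {
    "GCT": "A", "GCC": "A", "GCA": "A", "GCG": "A",
    "AAA": "K", "AAG": "K",
}

# Hypothetical absolute codon counts.
absolute = {"GCT": 3, "GCC": 1, "AAA": 1, "AAG": 1}

relative = relative_usage(absolute, codon_to_aa)
rscu_values = rscu(absolute, codon_to_aa)
```

In this example GCT accounts for three of the four alanine codons, so its relative usage is 0.75 and its RSCU is 3.0, while both lysine codons have an RSCU of 1.0.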
In a last step, you can optimize over all genes you have read in. The
result consists of the optimal parameters, a goodness of fit, and the RSCU
(relative synonymous codon usage), which you can use for optimization with
the help of, e.g., OPTIMIZER.

Authors and License
-------------------
GPLv3
Jan-Hendrik Trösemeier,
Susanne Lipp,
Christel Kamp
Contact: name.lastname at pei.de