【5.4.2】codonw

I. Introduction

A tool dating from 2005 that can be installed and run locally on Linux.

http://codonw.sourceforge.net/index.html

Calculates the codon usage indices

  • CAI: Codon adaptation index
  • Fop: Frequency of optimal codons
  • Nc: Effective number of codons
  • CBI: Codon Bias Index

Calculates amino acid indices

  • GRAVY score
  • Aromaticity

Calculates correspondence analysis of

  • Codon Usage
  • RSCU (Relative synonymous codon usage)
  • Amino acid usage

Correspondence analysis

  • can include/exclude codons/amino acids
  • can generate detailed reports of trends
  • attempts to identify optimal codons automatically
  • can allow additional data sets to be added
  • records any number of trends

Calculates gene parameters

  • Gene length
  • GC, GC3s and codon position specific G+C
  • Dinucleotide composition (in all three codon frames)
  • Amino acid usage
  • Relative amino acid usage
  • Codon usage
  • Relative Synonymous codon usage

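II. Installation

cd /data/user/sam/project/codon_optimization/lib
wget "https://sourceforge.net/projects/codonw/files/codonw/SourceCode-1.4.4%28zip%29/CodonWSourceCode_1_4_4.zip"

unzip CodonWSourceCode_1_4_4.zip
cd codonW

After unpacking, compile the program following the build notes shipped with the source and make sure the resulting codonw binary is on your PATH.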

III. Usage

Getting help:

codonw -help

Alternatively, start the interactive menu and type h:

codonw

Generate the codon usage of the sequences:

codonw input.dat -nomenu input.out input.blk

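By default, codonw writes the codon usage of each gene to the file input.blk. This dataset has no problems, so no warning messages should appear. However, analysing an earlier version of the dataset, based on EMBL release 50, in which SCCHRIII had 230 annotated ORFs, produces these typical warning messages: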

Warning: Sequence 178 "SCCHRIII.PE178______" does not begin with a recognised start codon
Warning: Sequence 178 "SCCHRIII.PE178______" is not terminated by a stop codon
Warning: Sequence 202 "SCCHRIII.PE202______" does not begin with a recognised start codon
Warning: Sequence 202 "SCCHRIII.PE202______" has 1 internal stop codon(s)
Warning: Sequence 202 "SCCHRIII.PE202______" is not terminated by a stop codon

input.dat is the input sequence file, available at http://codonw.sourceforge.net/input.dat

Example of input.blk output:

Phe UUU   27 1.38 Ser UCU    4 0.71 Tyr UAU    6 1.00 Cys UGU    4 1.14
    UUC   12 0.62     UCC    8 1.41     UAC    6 1.00     UGC    3 0.86
Leu UUA   12 1.24     UCA    7 1.24 TER UAA    0 0.00 TER UGA    0 0.00
    UUG    9 0.93     UCG    4 0.71     UAG    1 3.00 Trp UGG    5 1.00

    CUU   19 1.97 Pro CCU    5 1.43 His CAU    3 1.00 Arg CGU    3 1.50
    CUC    7 0.72     CCC    0 0.00     CAC    3 1.00     CGC    0 0.00
    CUA    9 0.93     CCA    7 2.00 Gln CAA   11 1.69     CGA    1 0.50
    CUG    2 0.21     CCG    2 0.57     CAG    2 0.31     CGG    0 0.00

Ile AUU   19 1.33 Thr ACU    5 0.71 Asn AAU   12 1.20 Ser AGU    8 1.41
    AUC   10 0.70     ACC    8 1.14     AAC    8 0.80     AGC    3 0.53
    AUA   14 0.98     ACA   10 1.43 Lys AAA   13 1.08 Arg AGA    5 2.50
Met AUG   14 1.00     ACG    5 0.71     AAG   11 0.92     AGG    3 1.50

Val GUU   16 2.00 Ala GCU   10 1.33 Asp GAU    8 1.07 Gly GGU   21 2.00
    GUC    5 0.62     GCC    9 1.20     GAC    7 0.93     GGC    3 0.29
    GUA    7 0.88     GCA    9 1.20 Glu GAA    7 1.40     GGA   10 0.95
    GUG    4 0.50     GCG    2 0.27     GAG    3 0.60     GGG    8 0.76

459 codons in YCG9_Probable___ (used Universal Genetic code)
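
In each cell of the table, the first number is the codon count and the second is its RSCU: the observed count divided by the count expected if all synonymous codons of that amino acid were used equally. The following is a minimal Python sketch of that arithmetic, using the Phe and Gly counts from the table above. The function name `rscu` is illustrative and is not part of CodonW.

```python
def rscu(family_counts):
    """Return RSCU per codon for one synonymous family (dict: codon -> count)."""
    total = sum(family_counts.values())
    if total == 0:
        # Amino acid absent from the gene: no usage to compare.
        return {codon: 0.0 for codon in family_counts}
    expected = total / len(family_counts)  # count under equal usage
    return {codon: n / expected for codon, n in family_counts.items()}


# Counts copied from the YCG9 table above.
phe = rscu({"UUU": 27, "UUC": 12})
gly = rscu({"GGU": 21, "GGC": 3, "GGA": 10, "GGG": 8})

print({codon: round(value, 2) for codon, value in phe.items()})
print({codon: round(value, 2) for codon, value in gly.items()})
```

Rounded to two decimals, the results agree with the table (UUU 1.38, UUC 0.62; GGU 2.00, GGC 0.29, GGA 0.95, GGG 0.76).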

Example of input.out output:

title                    	T3s	C3s	A3s	G3s	CAI	CBI	Fop	Nc	GC3s	GC	L_sym	L_aa	Gravy	Aromo	
YCG9_Probable__________13	0.4337	0.2347	0.3588	0.1852	0.123	0.075	0.446	54.09	0.335	0.394	439	458	0.610699	0.122271	
YCG8________573_residues_	0.2876	0.3595	0.4222	0.1875	0.100	0.020	0.394	52.46	0.439	0.446	180	190	-0.211579	0.084211	
ALPHA2________633_residue	0.3636	0.2273	0.4939	0.2177	0.109	-0.034	0.397	58.73	0.328	0.351	204	210	-0.667143	0.052381
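
Because input.out is a tab-separated table, it is easy to load for downstream analysis. The sketch below parses rows copied from the example above with Python's standard csv module. The function name `read_indices` is hypothetical, and treating non-numeric fields as missing values is an assumption made here for safety, not documented CodonW behaviour.

```python
import csv
import io

# Header and two data rows copied from the input.out example above.
# The title column is padded with spaces and some lines end with a tab.
SAMPLE = (
    "title                    \tT3s\tC3s\tA3s\tG3s\tCAI\tCBI\tFop\tNc\tGC3s\tGC\t"
    "L_sym\tL_aa\tGravy\tAromo\t\n"
    "YCG9_Probable__________13\t0.4337\t0.2347\t0.3588\t0.1852\t0.123\t0.075\t"
    "0.446\t54.09\t0.335\t0.394\t439\t458\t0.610699\t0.122271\t\n"
    "ALPHA2________633_residue\t0.3636\t0.2273\t0.4939\t0.2177\t0.109\t-0.034\t"
    "0.397\t58.73\t0.328\t0.351\t204\t210\t-0.667143\t0.052381\n"
)


def read_indices(handle):
    """Map each sequence title to a dict of index name -> float (or None)."""
    reader = csv.reader(handle, delimiter="\t")
    header = [name.strip() for name in next(reader)]
    rows = {}
    for fields in reader:
        if not fields:
            continue  # skip blank lines
        record = {}
        for name, value in zip(header[1:], fields[1:]):
            if not name:
                continue  # empty column created by a trailing tab
            try:
                record[name] = float(value)
            except ValueError:
                record[name] = None  # assumption: non-numeric means missing
        rows[fields[0].strip()] = record
    return rows


rows = read_indices(io.StringIO(SAMPLE))
for title, record in rows.items():
    print(title, record["CAI"], record["Nc"], record["GC3s"])
```

For a real run, replace `io.StringIO(SAMPLE)` with `open("input.out", newline="")`.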

Codon usage indices

codonw input.dat -all_indices -c_type 2 -f_type 4 -nomenu

  • -c_type 2: selects the reference set used for CAI; 2 stands for Saccharomyces cerevisiae
  • -f_type 4: selects the reference set used for Fop/CBI; 4 stands for Saccharomyces cerevisiae
  • -all_indices: calculates all codon usage indices: T3s, C3s, A3s, G3s, CAI, CBI, Fop, Nc, GC3s, GC, L_sym, L_aa, Gravy and Aromaticity
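
Several of these indices can be checked by hand from the codon counts. As a sketch, the code below recomputes L_aa, L_sym and GC3s for YCG9 from the input.blk counts shown earlier. It assumes that L_aa counts all non-stop codons, that L_sym excludes Met, Trp and stop codons, and that GC3s is the G+C fraction at the third position of the L_sym codons. The variable names are illustrative.

```python
from itertools import product

BASES = "UCAG"
# Universal genetic code in UCAG order; '*' marks stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = ["".join(triplet) for triplet in product(BASES, repeat=3)]
CODE = dict(zip(CODONS, AMINO))

# YCG9 codon counts from the input.blk example, in the same UCAG order
# (UUU, UUC, UUA, UUG, UCU, ...).
COUNTS = [27, 12, 12, 9, 4, 8, 7, 4, 6, 6, 0, 1, 4, 3, 0, 5,
          19, 7, 9, 2, 5, 0, 7, 2, 3, 3, 11, 2, 3, 0, 1, 0,
          19, 10, 14, 14, 5, 8, 10, 5, 12, 8, 13, 11, 8, 3, 5, 3,
          16, 5, 7, 4, 10, 9, 9, 2, 8, 7, 7, 3, 21, 3, 10, 8]
usage = dict(zip(CODONS, COUNTS))

# Translatable codons: everything except stops.
l_aa = sum(n for codon, n in usage.items() if CODE[codon] != "*")

# Synonymous codons: also drop Met (M) and Trp (W), which have one codon each.
synonymous = {codon: n for codon, n in usage.items() if CODE[codon] not in "*MW"}
l_sym = sum(synonymous.values())

# G or C at the third position of synonymous codons.
gc3s = sum(n for codon, n in synonymous.items() if codon[2] in "GC") / l_sym

print(l_aa, l_sym, round(gc3s, 3))
```

Under these assumptions the script gives L_aa = 458, L_sym = 439 and GC3s = 0.335, the same values reported for YCG9 in input.out.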

Codon usage pooled over all sequences (reported as "Average of genes")

codonw input.dat -nomenu -cutot input.out input.coa 

Result:

Phe UUU 1483 1.14 Ser UCU 1094 1.47 Tyr UAU 1000 1.12 Cys UGU  434 1.18
    UUC 1117 0.86     UCC  773 1.04     UAC  789 0.88     UGC  303 0.82
Leu UUA 1349 1.55     UCA  882 1.19 TER UAA   47 1.27 TER UGA   36 0.97
    UUG 1549 1.78     UCG  487 0.66     UAG   28 0.76 Trp UGG  665 1.00

    CUU  698 0.80 Pro CCU  747 1.27 His CAU  677 1.15 Arg CGU  328 0.86
    CUC  364 0.42     CCC  415 0.71     CAC  499 0.85     CGC  171 0.45
    CUA  671 0.77     CCA  911 1.55 Gln CAA 1388 1.35     CGA  151 0.39
    CUG  604 0.69     CCG  281 0.48     CAG  668 0.65     CGG  103 0.27

Ile AUU 1612 1.35 Thr ACU 1052 1.38 Asn AAU 1778 1.17 Ser AGU  717 0.97
    AUC 1018 0.85     ACC  660 0.87     AAC 1262 0.83     AGC  500 0.67
    AUA  943 0.79     ACA  883 1.16 Lys AAA 2118 1.13 Arg AGA 1038 2.71
Met AUG 1156 1.00     ACG  444 0.58     AAG 1645 0.87     AGG  504 1.32

Val GUU 1184 1.49 Ala GCU 1055 1.40 Asp GAU 1905 1.25 Gly GGU 1284 1.87
    GUC  674 0.85     GCC  765 1.01     GAC 1145 0.75     GGC  552 0.80
    GUA  622 0.78     GCA  836 1.11 Glu GAA 2371 1.41     GGA  557 0.81
    GUG  690 0.87     GCG  368 0.49     GAG  995 0.59     GGG  355 0.52

53400 codons in Average of genes (used Universal Genetic code)
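
With -cutot the file contains a single table for the whole input rather than one table per gene. Conceptually this amounts to counting in-frame codons for each gene and adding the counts together. The sketch below illustrates that idea on two made-up sequences; it is not CodonW's own code, and the gene names and sequences are invented for the example.

```python
from collections import Counter


def count_codons(seq):
    """Count in-frame codons of a coding sequence, written in RNA letters."""
    seq = seq.upper().replace("T", "U")
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    return Counter(seq[i:i + 3] for i in range(0, usable, 3))


# Toy sequences, invented for illustration only.
genes = {
    "geneA": "ATGTTTTTCGGTTAA",
    "geneB": "ATGTTTGGTGGCTAG",
}

pooled = Counter()
for seq in genes.values():
    pooled.update(count_codons(seq))  # add this gene's counts to the total

print(sum(pooled.values()), "codons in", len(genes), "genes")
print(pooled.most_common(3))
```

RSCU values for the pooled table are then computed from the summed counts, exactly as for a single gene.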

Correspondence Analysis (COA)

codonw input.dat -coa_cu -nomenu -silent

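This runs a COA on codon usage. The summary file is "summary.coa", which contains most of the data produced by the COA. A series of other .coa files is generated as well.

Use the codon tables (.coa files) that the previous step generated from the gene sequences: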

codonw input.dat -fop_file fop.coa -nomenu

codonw input.dat -cai_file cai.coa -cbi_file cbi.coa -nomenu result2.out result2.blk
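
As background for -cai_file: CAI, as commonly defined, is the geometric mean of the relative adaptiveness values (w) of the codons in a gene, where w compares each codon with the most used synonymous codon in a reference set of highly expressed genes. The sketch below shows only that formula. The w values are hypothetical and were not taken from cai.coa, and the code assumes every w is greater than zero.

```python
import math


def cai(codons, w):
    """Geometric mean of w over the codons that have a w value."""
    logs = [math.log(w[codon]) for codon in codons if codon in w]
    if not logs:
        return float("nan")  # nothing to score
    return math.exp(sum(logs) / len(logs))


# Hypothetical relative adaptiveness values for two codon families.
w = {"UUU": 0.5, "UUC": 1.0, "GGU": 1.0, "GGC": 0.25}

# AUG has no entry in w, so it does not contribute to the score.
gene = ["AUG", "UUU", "UUC", "GGU", "GGC"]

print(round(cai(gene, w), 3))
```

A gene built only from the preferred codons (w = 1.0) scores 1.0; every less preferred codon pulls the score down.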

For Cricetulus griseus, search NCBI for the relevant genome sequence ( https://www.ncbi.nlm.nih.gov/nuccore/NC_007936.1?report=fasta ). Three FASTA variants are available: complete, coding, and gene.

cd /data/user/sam/project/codon_optimization/lib/codonW/genome/CHO_coding

codonw CHO_coding.fasta -coa_cu -nomenu -silent

IV. Errors

Error 1

[sam@g02 CHO_complete]$ codonw CHO_complete.fasta -coa_cu -nomenu -silent

 Welcome to CodonW  for Help type h


Warning: Sequence   1 "NC_007936.1_Cricetul" does not begin with a recognised start codon

Warning: Sequence   1 "NC_007936.1_Cricetul" has 385 internal stop codon(s)

Warning: Sequence   1 "NC_007936.1_Cricetul" is not terminated by a stop codon


                Number of sequences: 1
        WARNING  1 sequences had internal stop codons   WARNING
Generating correspondence analysis
Problems with the number genes used for fop adjusting to 1 gene

Sequence 1 "NC_007936.1_Cricetul" contains no amino acids with 2 synonymous codons
        --Nc was not calculated
WARNING An attempt to calculate CAI relative adaptivnesss FAILED
 no Phe amino acids found in the high bias dataset
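
The log above is consistent with the complete record being read as one single coding sequence ("Number of sequences: 1"): it has no start codon, hundreds of in-frame stops, and no gene set from which to derive optimal codons. The coding-sequence FASTA is probably the more suitable input. A quick pre-check along the following lines can flag such records before running codonw. This is a rough sketch: the function name `cds_problems` is hypothetical, only ATG is accepted as a start codon (CodonW's own list may differ), and stops follow the universal genetic code.

```python
STOPS = {"TAA", "TAG", "TGA"}  # universal genetic code


def cds_problems(seq, starts=("ATG",)):
    """Return a list of reasons why `seq` does not look like a clean CDS."""
    seq = seq.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    problems = []
    if len(seq) % 3:
        problems.append("length not a multiple of 3")
    if not codons or codons[0] not in starts:
        problems.append("no recognised start codon")
    internal = sum(codon in STOPS for codon in codons[:-1])
    if internal:
        problems.append(f"{internal} internal stop codon(s)")
    if not codons or codons[-1] not in STOPS:
        problems.append("not terminated by a stop codon")
    return problems


print(cds_problems("ATGTTTGGTTAA"))  # a clean toy CDS
print(cds_problems("CCCTAAGGGTTT"))  # a toy non-coding fragment
```

Records that return a non-empty list can be dropped or inspected before they reach the correspondence analysis.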
