【2.2.1】Computing protein sequence statistics with EMBOSS pepstats
1. Introduction
- Web server: http://bar.utoronto.ca/cgi-bin/emboss/pepstats
- Documentation: http://bar.utoronto.ca/cgi-bin/emboss/help/pepstats
- Download: ftp://emboss.open-bio.org/pub/EMBOSS/ (download the EMBOSS package)
Properties it can calculate:
- Molecular weight
- Number of residues
- Average residue weight
- Charge
- Isoelectric point
- For each type of amino acid: number, molar percent, DayhoffStat
- For each physico-chemical class of amino acid: number, molar percent
- Probability of protein expression in E. coli inclusion bodies
- Molar extinction coefficient (A280)
- Extinction coefficient at 1 mg/ml (A280)
2. Usage
Options:
[sam@g02 view]$ pepstats --help
Calculate statistics of protein properties
Version: EMBOSS:6.6.0.0
   Standard (Mandatory) qualifiers:
  [-sequence]          seqall     Protein sequence(s) filename and optional
                                  format, or reference (input USA)
  [-outfile]           outfile    [*.pepstats] Pepstats program output file
   Additional (Optional) qualifiers: (none)
   Advanced (Unprompted) qualifiers:
   -aadata             datafile   [Eamino.dat] Amino acid properties
   -mwdata             datafile   [Emolwt.dat] Molecular weight data for amino
                                  acids
   -pkdata             datafile   [Epk.dat] Values of pKa for amino acids
   -[no]termini        boolean    [Y] Include charge at N and C terminus
   -mono               boolean    [N] Use monoisotopic weights
   General qualifiers:
   -help               boolean    Report command line options and exit. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose
Example output:
PEPSTATS of LACI_ECOLI from 1 to 360
Molecular weight = 38590.16 Residues = 360
Average Residue Weight = 107.195 Charge = 1.5
Isoelectric Point = 6.3901
A280 Molar Extinction Coefficients = 22920 (reduced) 23045 (cystine bridges)
A280 Extinction Coefficients 1mg/ml = 0.594 (reduced) 0.597 (cystine bridges)
Improbability of expression in inclusion bodies = 0.660
Residue    Number    Mole%     DayhoffStat
A = Ala    44        12.222    1.421
B = Asx    0         0.000     0.000
C = Cys    3         0.833     0.287
D = Asp    17        4.722     0.859
E = Glu    15        4.167     0.694
F = Phe    4         1.111     0.309
G = Gly    22        6.111     0.728
H = His    7         1.944     0.972
I = Ile    18        5.000     1.111
J = ---    0         0.000     0.000
K = Lys    11        3.056     0.463
L = Leu    41        11.389    1.539
M = Met    10        2.778     1.634
N = Asn    12        3.333     0.775
O = ---    0         0.000     0.000
P = Pro    14        3.889     0.748
Q = Gln    28        7.778     1.994
R = Arg    19        5.278     1.077
S = Ser    32        8.889     1.270
T = Thr    19        5.278     0.865
U = ---    0         0.000     0.000
V = Val    34        9.444     1.431
W = Trp    2         0.556     0.427
X = Xaa    0         0.000     0.000
Y = Tyr    8         2.222     0.654
Z = Glx    0         0.000     0.000
Property     Residues                   Number    Mole%
Tiny         (A+C+G+S+T)                120       33.333
Small        (A+B+C+D+G+N+P+S+T+V)      197       54.722
Aliphatic    (A+I+L+V)                  137       38.056
Aromatic     (F+H+W+Y)                  21        5.833
Non-polar    (A+C+F+G+I+L+M+P+V+W+Y)    200       55.556
Polar        (D+E+H+K+N+Q+R+S+T+Z)      160       44.444
Charged      (B+D+E+H+K+R+Z)            69        19.167
Basic        (H+K+R)                    37        10.278
Acidic       (B+D+E+Z)                  32        8.889
Command line:
pepstats ${sequence}.txt 1-pepstats.txt
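For batch work it can be handy to drive pepstats from a script and pull a few numbers out of the report. The following is a minimal sketch, not an official interface: the function names `run_pepstats` and `parse_summary` are made up for illustration, the regular expressions are written against the layout of the example output above, and only the `-sequence` and `-outfile` qualifiers from the help text are used.

```python
import re
import subprocess


def run_pepstats(fasta_path, out_path):
    """Run pepstats on one input file (assumes pepstats is on PATH)."""
    subprocess.run(
        ["pepstats", "-sequence", fasta_path, "-outfile", out_path],
        check=True,  # raise CalledProcessError if pepstats exits non-zero
    )


def parse_summary(report):
    """Extract a few 'name = value' fields from the text of a pepstats report.

    Only the first match of each field is returned, so for a multi-sequence
    report this describes the first sequence only.
    """
    patterns = {
        "molecular_weight": r"Molecular weight\s*=\s*([\d.]+)",
        "residues": r"Residues\s*=\s*(\d+)",
        "charge": r"Charge\s*=\s*(-?[\d.]+)",
        "isoelectric_point": r"Isoelectric Point\s*=\s*([\d.]+)",
    }
    summary = {}
    for key, pattern in patterns.items():
        match = re.search(pattern, report)
        if match:
            summary[key] = float(match.group(1))
    return summary
```

Typical use would be to call `run_pepstats("seq.txt", "1-pepstats.txt")`, read the output file, and pass its text to `parse_summary`. Fields that are not found are simply left out of the returned dictionary.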
3. Terminology
3.1 extinction coefficient
The molar absorption coefficient, also called the molar extinction coefficient, measures how strongly a substance absorbs light of a given wavelength. It is denoted by the symbol "ε".
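The A280 lines of the example output can be checked by hand. The sketch below assumes the commonly quoted per-residue coefficients at 280 nm (5500 for Trp, 1490 for Tyr, 125 per cystine bridge). These constants are an assumption on my part rather than something taken from the pepstats documentation, but they are consistent with the LACI_ECOLI figures shown earlier. The function names are illustrative.

```python
def molar_extinction_a280(n_trp, n_tyr, n_cys, cystine_bridges=False):
    """Estimate the molar extinction coefficient at 280 nm from residue counts.

    Assumed coefficients: Trp 5500, Tyr 1490, 125 per cystine (Cys-Cys pair).
    """
    eps = 5500 * n_trp + 1490 * n_tyr
    if cystine_bridges:
        # Every two cysteines are assumed to form one cystine bridge.
        eps += 125 * (n_cys // 2)
    return eps


def extinction_1mg_per_ml(eps, molecular_weight):
    """Convert a molar extinction coefficient to absorbance at 1 mg/ml."""
    return eps / molecular_weight


# Counts from the example output: W = 2, Y = 8, C = 3, MW = 38590.16
reduced = molar_extinction_a280(2, 8, 3)
bridged = molar_extinction_a280(2, 8, 3, cystine_bridges=True)
print(reduced, bridged)  # 22920 23045
print(round(extinction_1mg_per_ml(reduced, 38590.16), 3))  # 0.594
print(round(extinction_1mg_per_ml(bridged, 38590.16), 3))  # 0.597
```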
3.2 Dayhoff
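DayhoffStat is an amino acid's molar percentage divided by its Dayhoff statistic. The Dayhoff statistic is the amino acid's relative occurrence per 1000 residues, normalised to 100; pepstats reads it from the EMBOSS data file Edayhoff.freq.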
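As a quick illustration of that division, here is a small sketch. The Dayhoff values in `DAYHOFF_SAMPLE` are not read from Edayhoff.freq: they were back-calculated from the example output above (Mole% divided by DayhoffStat), so treat them as assumptions that cover only four residues.

```python
# Assumed Dayhoff statistics, inferred from the LACI_ECOLI example output.
DAYHOFF_SAMPLE = {"A": 8.6, "C": 2.9, "Q": 3.9, "W": 1.3}


def dayhoff_stat(count, length, dayhoff):
    """Molar percentage of a residue divided by its Dayhoff statistic."""
    mole_percent = 100.0 * count / length
    return mole_percent / dayhoff


# Ala occurs 44 times in the 360-residue example sequence.
print(round(dayhoff_stat(44, 360, DAYHOFF_SAMPLE["A"]), 3))  # 1.421
# Gln occurs 28 times.
print(round(dayhoff_stat(28, 360, DAYHOFF_SAMPLE["Q"]), 3))  # 1.994
```

A value above 1 means the residue is more frequent in this protein than the Dayhoff reference frequency, and a value below 1 means it is less frequent.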
3.3 inclusion bodies
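The probability of expression in inclusion bodies is sometimes treated as a measure of solubility. However, a recombinant protein expressed in E. coli can end up either soluble in the cytoplasm or insoluble in inclusion bodies. If the Harrison model predicts that a given protein is likely to be expressed in inclusion bodies, that does not mean it is impossible to obtain it in soluble form in the cytoplasm. One example: the cell division protein FtsA from Thermotoga maritima with a C-terminal His-tag has a Harrison probability of 58% of being expressed in inclusion bodies, yet a large amount of soluble protein is found in the E. coli cytosol (F. van den Ent and J. Lowe, EMBO J. 19, 5300-5307, 2000). Whether a protein is expressed in inclusion bodies depends not only on its sequence but also on many other factors, such as the E. coli strain, the incubation temperature, the type of expression vector, the strength of the promoter, and the medium.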
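To make the Harrison model less of a black box, here is a rough sketch of the calculation. Everything numeric in it is an assumption: the coefficients (15.43, 29.56, 0.03, the 1.71 threshold, and the 0.4934 / 0.276 / 0.0392 polynomial) are the revised Wilkinson-Harrison values as I recall them, not figures taken from this post or verified against the pepstats source. They do reproduce the 0.660 printed for LACI_ECOLI above. The function name `harrison_score` is illustrative.

```python
def harrison_score(counts, length):
    """Return (cv_minus_threshold, probability) from residue counts.

    counts maps one-letter residue codes to their number of occurrences.
    """
    # Fraction of turn-forming residues (Asn, Gly, Pro, Ser).
    turn_fraction = sum(counts.get(aa, 0) for aa in "NGPS") / length
    # Approximate net charge per residue: (Arg + Lys) - (Asp + Glu).
    net_charge = (
        counts.get("R", 0) + counts.get("K", 0)
        - counts.get("D", 0) - counts.get("E", 0)
    ) / length
    # Assumed model coefficients; see the note above.
    cv = 15.43 * turn_fraction - 29.56 * abs(net_charge - 0.03)
    diff = cv - 1.71
    probability = 0.4934 + 0.276 * abs(diff) - 0.0392 * diff ** 2
    return diff, probability


# Residue counts taken from the example output (360 residues).
laci = {"N": 12, "G": 22, "P": 14, "S": 32, "R": 19, "K": 11, "D": 17, "E": 15}
diff, prob = harrison_score(laci, 360)
print(f"{prob:.3f}")  # 0.660
```

Note that only sequence composition enters the calculation, which is one reason the prediction cannot account for strain, temperature, vector, or medium.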
3.4 Other web tools
https://web.expasy.org/cgi-bin/protparam/protparam