|
|
|
|
|
Recent publications
|
2007 |
|
1. |
Construction of
mathematical model for high-level expression of foreign genes in
pPIC9 vector and its verification |
|
|
Bingli Wu, Lei Cha,
Zepeng Du, Xiaomin Ying, Hua Li, Liyan Xu, Xiaofei Zheng, Enmin Li,
Wuju Li |
|
|
Biochemical and
Biophysical Research Communications, 2007, 354:498–504 |
|
|
Abstract:
In this report, we introduced a mathematical model for high-level
expression of foreign genes in pPIC9 vector. At first, we collected
40 heterologous genes expressed in pPIC9 vector, and these 40 genes
were classified into high-level expression group (expression level
>100mg/L, 12 genes) and low-level expression group (expression level
<100mg/L, 28 genes). Then, the Naive Bayes method was used to
construct the model with RNA secondary structure profile of 3'-end
of foreign genes as features. The classification accuracy from
leave-one-out cross-validation was 100%. Finally, another five genes
collected from the literature were used to test the predictive ability of the
model; four of the five were predicted correctly.
In addition, the model was also verified by expressing
human neutrophil gelatinase-associated lipocalin (NGAL) gene with
expression level more than 100mg/L. Therefore, we propose that the
model can be used to predict the expression level of heterologous
genes before experiments and optimize the experiment designs to
obtain the high-level expression. Furthermore, we have developed a
web server for evaluation and design for high-level expression of
foreign genes, which is accessible at
http://ppic9.med.stu.edu.cn/ppic9 |
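The workflow described in this abstract (a Naive Bayes classifier scored by leave-one-out cross-validation) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the Gaussian likelihood, the function names, and the two toy features per gene are assumptions made for the example, whereas the paper uses RNA secondary structure profiles of the 3'-end as features.

```python
import math
from statistics import mean, pvariance


def fit_gaussian_nb(X, y):
    """Estimate a prior and per-feature Gaussian parameters for each class."""
    model = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        cols = list(zip(*rows))
        model[label] = {
            "prior": len(rows) / len(X),
            "mean": [mean(c) for c in cols],
            # A small floor keeps the variance positive for constant features.
            "var": [pvariance(c) + 1e-6 for c in cols],
        }
    return model


def predict(model, x):
    """Return the class with the highest log posterior for sample x."""
    def log_posterior(params):
        score = math.log(params["prior"])
        for v, m, var in zip(x, params["mean"], params["var"]):
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        return score
    return max(model, key=lambda label: log_posterior(model[label]))


def loocv_accuracy(X, y):
    """Hold out each sample once, train on the rest, and count correct calls."""
    hits = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        hits += predict(fit_gaussian_nb(train_X, train_y), X[i]) == y[i]
    return hits / len(X)


# Toy data, invented for illustration only (two hypothetical features per gene).
X = [[-12.0, 0.30], [-11.5, 0.35], [-12.5, 0.28],
     [-4.0, 0.70], [-3.5, 0.75], [-4.5, 0.65]]
y = ["low", "low", "low", "high", "high", "high"]
print(loocv_accuracy(X, y))
```

With only 40 training genes, leave-one-out is a natural choice because every sample is used for testing exactly once while the model is always trained on the other 39.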
|
|
|
2. |
Predicting siRNA
efficiency |
|
|
W. Li and L. Cha |
|
|
Cell. Mol. Life Sci.,
2007, 64:1785 – 1792 |
|
|
Abstract: Since
the identification of RNA-mediated interference (RNAi) in 1998, RNAi
has become an effective tool to inhibit gene expression. The
inhibition mechanism is triggered by introducing a short
interfering double-stranded RNA (siRNA, 19~27 bp) into the
cytoplasm, where the guide strand of siRNA (usually antisense
strand) binds to its target messenger RNA and the expression of the
target gene is blocked. RNAi has been widely applied in gene
functional analysis, and as a potential therapeutic strategy in
viral diseases, drug target discovery, and cancer therapy. Among the
factors which may compromise inhibition efficiency, how to design
siRNAs with high efficiency and high specificity to its target gene
is critical. Although many algorithms have been developed for this
purpose, it is still difficult to design such siRNAs. In this
review, we will briefly discuss prediction methods for siRNA
efficiency and the problems of present approaches. |
|
|
|
3. |
Prediction and analysis of novel microRNAs in the Arabidopsis thaliana genome |
|
|
金伟波,孔栋,应晓敏,郭爱光,李伍举 |
|
|
Acta Biophysica Sinica (生物物理学报), 2007, 23:389-396 |
|
|
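Abstract: MicroRNAs (miRNAs) are endogenous small RNAs, 21~25 nt in length, found in animals and plants. They play a key role in post-transcriptional gene regulation, but low-abundance and tissue-specific miRNAs are often hard to discover. To systematically identify novel non-homologous miRNAs in the Arabidopsis thaliana genome, we first screened the whole genome on the basis of the features of reported Arabidopsis miRNAs and obtained 453 possible miRNA precursors. Second, to filter these precursors further, we built a support vector machine model, GenomicSVM, from human miRNA precursor data; its sensitivity and specificity on the human test set (30 human miRNA precursors and 1000 negative miRNA precursors) were 86.3% and 98.1%, respectively, and its accuracy on the Arabidopsis test set (78 miRNA precursors) was 93.6%. Finally, we applied GenomicSVM to the 453 precursor sequences and obtained 37 candidate novel Arabidopsis miRNA precursors, which provides guidance for further experimental miRNA discovery. |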
|
|
|
|
|
2006 |
|
1. |
A comparative study of yeast ncRNA and mRNA based on k-tuple composition |
|
|
李华、应晓敏、查磊、李伍举 |
|
|
Acta Biophysica Sinica (生物物理学报), 2006, 22:110-116 |
|
|
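Abstract: Like mRNAs, ncRNAs are important functional molecules. Using k-tuple (k-word) content as features, we classified and compared yeast mature ncRNA sequences with the coding regions, upstream sequences, and downstream sequences of mRNAs. The results show that, based on the 3-tuple content of mature ncRNA sequences and mRNA coding regions, the mean leave-one-out cross-validation (LOOCV) classification accuracy for ncRNA versus mRNA reached 93.93%; based on the 4-tuple and 5-tuple content of upstream sequences, the accuracies were 92.49% and 92.76%, respectively; based on the 4-tuple and 5-tuple content of downstream sequences, they were 91.58% and 90.60%, respectively; and using the 4-tuple and 5-tuple content of upstream and downstream sequences together, the mean accuracies were 94.68% and 94.83%, respectively. A t-test identified k-tuples that differ with statistical significance between ncRNAs and mRNAs in their upstream and downstream sequences. These results indicate that the 3-tuple content of mature ncRNA sequences and mRNA coding regions, and the 4- or 5-tuple content of upstream and downstream sequences, can effectively distinguish ncRNA from mRNA. The findings help not only to identify ncRNAs and mRNAs accurately but also to discover ncRNA-specific transcription factor binding sites.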
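The k-tuple content used as features above can be computed along the following lines. This is a minimal sketch under stated assumptions: the function name `ktuple_content`, the overlapping sliding window, and the normalization by the number of windows are illustrative choices, not details confirmed by the paper.

```python
from collections import Counter
from itertools import product


def ktuple_content(seq, k, alphabet="ACGU"):
    """Return the relative frequency of every k-tuple over the alphabet.

    Windows overlap and slide one base at a time. Windows containing
    characters outside the alphabet still count toward the total but are
    not reported, so the frequencies then sum to less than 1.
    """
    seq = seq.upper().replace("T", "U")
    total = len(seq) - k + 1
    if total <= 0:
        raise ValueError("sequence is shorter than k")
    counts = Counter(seq[i:i + k] for i in range(total))
    return {"".join(t): counts["".join(t)] / total
            for t in product(alphabet, repeat=k)}


freqs = ktuple_content("AUGAUG", 3)
print(len(freqs))      # number of possible 3-tuples: 4**3 = 64
print(freqs["AUG"])    # 2 of the 4 windows are AUG -> 0.5
```

Each sequence becomes a fixed-length vector (64 values for k = 3, 256 for k = 4, 1024 for k = 5), which is what allows sequences of different lengths to be compared by a classifier.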
|
|
|
|
2. |
BioSun 2.0: a comprehensive software package for computer-aided design of molecular biology experiments
|
|
|
查磊, 应晓敏, 曹源, 李华, 李伍举 |
|
|
Bulletin of the Academy of Military Medical Sciences (军事医学科学院院刊), 2006, 30:461-464 |
|
|
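Abstract: In 2004 we released BioSun 1.0, a software system for computer-aided design of molecular biology experiments that provides fairly comprehensive data processing and analysis functions. To better serve biomedical researchers, we have upgraded the system and released version 2.0. The main new functions are: several forms of Blast-based sequence alignment, ClustalW-based multiple sequence alignment and phylogenetic tree construction, display of protein three-dimensional structures, RNAfold-based RNA secondary structure prediction, and sequence format conversion. A comparison with the commercial comprehensive bioinformatics packages DNASIS MAX 2.05, DNAStar 5.0, Vector NTI 9.1, and BioEdit 7.0 shows that BioSun 2.0 is easy to operate, rich in functions, and cost-effective, and that it can meet the routine needs of biomedical laboratories. |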
|
|
|
3. |
Mprobe 2.0: Computer-Aided Probe Design for Oligonucleotide Microarray |
|
|
Wuju Li, Xiaomin Ying |
|
|
Applied
bioinformatics, 2006, 5:181-186 |
|
|
Abstract:
DNA chips have proven to be effective tools in detecting
gene expression levels. Compared with DNA chips using complementary
DNA as probes, oligonucleotide microarrays using oligonucleotides as
probes have attracted great attention because of their well known
advantages. The design of gene-specific probes for each target is
essential to the development of oligonucleotide microarrays. We have
previously reported the development of a probe design software
termed Mprobe 1.0. Here, we present a new version of this software,
termed Mprobe 2.0. Several new features are included in Mprobe 2.0.
Firstly, a paradox-based sequence database management system has
been developed and integrated into the software, which consequently
allows interoperability with sequences in GenBank, EMBL, and FASTA
formats. Secondly, in contrast to setting a fixed threshold for the
secondary structure of probes in Mprobe 1.0 and other related
software, Mprobe 2.0 employs a different method. After parameters
such as GC type, probe melting temperature and GC contents have been
evaluated, candidate probes are sorted by the free energy from high
to low value, followed by specificity analysis. Thirdly, Mprobe 2.0
provides users with substantial parameter options in the visual
mode. Mprobe 2.0 possesses an easier interface for users to manage
sequences annotated in different formats and design the optimal
probes for oligonucleotide microarrays and other applications.
AVAILABILITY: The program is free for non-commercial users and can
be downloaded from the web page |
|
|
|
2005 |
|
1. |
How many genes are needed for early detection of breast cancer,
based on gene expression patterns in peripheral blood cells? |
|
|
Wuju Li |
|
|
Breast Cancer
Research, 2005, vol. 7 (5): E5. |
|
|
|
Abstract:
In
their recent report [1], Sharma
and coworkers explore the early detection of breast
cancer. They analyzed a gene expression data set
(1368 genes in 62 normal and 40 tumour samples,
including sample duplication in different batches)
using the nearest shrunken centroid method. They
identified a panel of 37 genes that permitted early
detection, with the classification accuracy being
about 82%. This is a typical problem with sample
classification based on gene expression profiling.
The objective is to achieve high prediction accuracy
with as few genes as possible, and so feature
selection plays an important role; examination of a
large number of genes will increase the
dimensionality, computational complexity, and
clinical cost. According to our previous study of
data sets from patients with colon cancer, leukaemia
and breast cancer [2], we
estimated that five or six genes, rather than 37,
would be sufficient for the early detection of
breast cancer [1]. So how many
genes are indeed needed? In order to address this
question, we evaluated the data presented by Sharma
and coworkers using the Tclass system [2].
In the
Tclass system, Fisher's linear discriminant analysis
and a step-wise optimization procedure for feature
selection are used to analyze a batch adjusted data
set [1] in two ways. The first is
to take the prediction accuracy from the training
set as the object function. The second way is to
take the classification accuracy from the
leave-one-out cross-validation as the object
function. For the former, the selected optimal
feature sets are evaluated by randomly dividing all
tissue samples into a training set (e.g. 50%, 67%,
or 85% of samples) and a test set 200 times. The
relationship between the prediction accuracy and the
number of genes is illustrated in Fig.
1, which shows that the greatest prediction
accuracy was achieved using six genes (Fig.
1a); other peaks in accuracy occurred when 10,
13, or 15 genes were used (Fig.
1b). Furthermore, two genes, the 481st
(BC009696) and the 801st (BC000514), permitted
classification accuracy as high as 86%, which is
greater than the 82% achieved by Sharma and
coworkers [1] with the selected 37
genes. |
|
|
|
|
2. |
An approach to studying lung cancer-related proteins in human blood
|
|
|
Ting Xiao, Wantao
Ying, Lei Li, Zhi Hu, Ying Ma, Liyan Jiao,
Jinfang Ma, Yun Cai, Dongmei Lin, Suping Guo,
Naijun Han, Xuebing Di, Min Li, Dechao Zhang, Kai Su,
Jinsong Yuan, Hongwei Zheng, Meixia Gao, Jie He,
Susheng Shi, Wuju Li, Ningzhi Xu, Husheng Zhang,
Yan Liu, Kaitai Zhang, Yanning Gao, Xiaohong Qian,
and Shujun Cheng |
|
|
Molecular
& Cellular Proteomics,
2005, published online. |
|
|
Abstract:
Early-stage lung cancer detection is the first step towards
successful clinical therapy and increased patient survival.
Clinicians monitor cancer progression by profiling tumor cell
proteins in the blood plasma of afflicted patients. Blood plasma,
however, is a difficult medium for cancer protein assessment, because it
is rich in albumins and heterogeneous protein species. We report
herein a method to detect the proteins released into the circulatory
system by tumor cells. Initially, we analyzed the protein components
in the conditioned medium (CM) of lung cancer primary cell or organ
cultures, and in the adjacent normal bronchus using 1-D PAGE and
nano-ESI-MS/MS. We identified 299 proteins involved in key cellular
processes such as cell growth, organogenesis and signal transduction.
We selected 13 interesting proteins from this list, and analyzed
them in 628 blood plasma samples using ELISA. We detected 11 of
these 13 proteins in the plasma of lung cancer patients and
non-patient controls. Our results showed that plasma MMP1 levels
were elevated significantly in late-stage lung cancer patients, and
that the plasma levels of 14-3-3 sigma, beta and eta in the lung
cancer patients were significantly lower than those in the control
subjects. To our knowledge, this is the first time that fascin,
ezrin, CD98, annexin A4, 14-3-3 sigma, 14-3-3 beta and 14-3-3 eta
proteins have been detected in human plasma by ELISA. The
preliminary results showed that a combination of CD98, fascin, PIGR/SC
and 14-3-3 eta had a higher sensitivity and specificity than any
single marker. In conclusion, we report a method to detect proteins
released into blood by lung cancer. This pilot approach may lead to
the identification of novel protein markers in blood and provide a
new method of identifying tumor biomarker profiles for guiding both
early detection and therapy of human cancer. |
|
|
|
|
|
|
| |
Genome Class Prediction Based on Amino Acid
Composition (AAC) from Proteomes |
|
Wuju Li, Tao Liu,
Xiaomin Ying, and Ming Fan |
|
Molecular
& Cellular Proteomics,
2004, vol.3 (10): S79. |
|
Abstract:
With genomic sequences from the three domains of life becoming
increasingly available, the relationships between the AAC and the
genome classes (organisms' phenotype) have been widely studied in
the following two aspects. The first aspect is to concentrate on the
difference of AAC of proteins from particular type or whole
proteomes in different genome classes. The second aspect is to study
the issue of genome class prediction based on the AAC. The purpose
of the above two aspects is to explain why certain organisms can
live in extreme conditions of temperature, salinity, or pressure.
Here we ask whether the genome classes can be predicted
as accurately as possible using small subsets of
amino acids. In order to investigate the issue systematically,
Fisher's linear discriminant analysis (FLDA) was applied to the
following four data sets: DOMAIN, LIFE, HTHAB, and ARCHAEA. The
DOMAIN is about the three domains of life (16 archaea, 75 bacteria,
and 6 eukaryotic genomes). The LIFE is about the three lifestyles
(13 HTH, 4 TH, and 79 MES). The HTHAB includes 10 HTH in archaea and
3 HTH in bacteria. The ARCHAEA is about the three lifestyles in
archaea (10 HTH, 3 TH, and 3 MES). By using the feature selection method of all
possible combinations of features (amino acids), we found that the
cross-validation accuracies for above four data sets could reach 94.8%, 97.9%,
100.0%, and 100.0% by only using the compositions of four (A, I, K, and Q),
five (I, K, P, V, and Y), two (E and Q), and two (M and Q) amino acids
respectively. The average cross-validation accuracy reaches 98.2%.
Therefore, AAC from the proteomes provides an alternative way to determine the
genome classes such as the lifestyle or the domains of life. To our
knowledge, correspondence
analysis, principal component analysis (PCA),
and hierarchical cluster analysis have been applied to study the
distinction of different genome classes using the AAC, but classification
methods have not been used. Our work therefore represents a first attempt
in this direction. |
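The two ingredients of this abstract, the amino acid composition of a proteome and the exhaustive enumeration of small amino acid subsets, can be sketched as below. The function names and the toy "proteome" are assumptions for illustration; the classifier itself (FLDA with cross-validation) is not reproduced here.

```python
from collections import Counter
from itertools import combinations

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes


def amino_acid_composition(proteins):
    """Return the fraction of each standard amino acid across all proteins."""
    counts = Counter()
    for protein in proteins:
        # Ignore ambiguity codes such as X or B.
        counts.update(c for c in protein.upper() if c in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no standard amino acids found")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def feature_subsets(size):
    """Enumerate every subset of `size` amino acids (all possible combinations)."""
    return list(combinations(AMINO_ACIDS, size))


# Toy proteome, invented for illustration.
aac = amino_acid_composition(["MKQ", "AQ"])
print(aac["Q"])                  # 2 of 5 residues -> 0.4
print(len(feature_subsets(4)))   # C(20, 4) = 4845 candidate four-residue subsets
```

Because there are only 20 amino acids, trying every subset of two to five residues stays computationally cheap, which is what makes the exhaustive search in the abstract feasible.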
|
| |
|
| |
RDfolder:
a web server for prediction of RNA secondary structure |
|
Xiaomin Ying, Hong Luo, Jingchu Luo
and Wuju Li |
|
Nucleic Acids Research,
2004, vol.32: W150-W153. |
|
Abstract: Prediction
of RNA secondary structure is important in the functional analysis
of RNA molecules. The RDfolder web server described in this paper
provides two methods for prediction of RNA secondary structure: random
stacking of helical regions and helical regions distribution. The
random stacking method predicts secondary structure by Monte Carlo
simulations. The method of helical regions distribution predicts secondary
structure based on the helices that appear most frequently in the
set of structures, which are generated by the random stacking method.
The RDfolder web server can be accessed at
http://rna.cbi.pku.edu.cn. |
|
|
|
| |
BioSun:
a software system for computer-aided design of molecular biology experiments |
|
李伍举, 应晓敏 |
|
Bulletin of the Academy of Military Medical Sciences (军事医学科学院院刊), 2004,
vol. 28(5): 401-404 |
|
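Abstract: This paper describes BioSun, a bioinformatics software system that we researched and developed ourselves for computer-aided design of molecular biology experiments; it runs under Windows. Its main functions are: visual sequence editing; a database management system that accepts multiple sequence formats (EMBL, GenBank, and FastA); several modes of sequence comparison; several modes of epitope prediction; RNA secondary structure prediction based on multiple algorithms; restriction site analysis and restriction map construction; computer-aided PCR experiment design; computer-aided probe design for oligonucleotide microarrays; computer-aided primer design for cDNA microarrays; and design for high-level expression of foreign genes in prokaryotic systems. BioSun uses a graphical user interface, allows flexible management of graphics and text files, and is flexible to operate and rich in functions. It can be used for computer-aided design of molecular biology experiments and is of considerable value for speeding up experiments and raising their success rate. |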
|
|
| |
|
Samcluster:
An integrated scheme for automatic discovery of sample classes using
gene expression profile |
|
Wuju Li, Ming Fan and Momiao Xiong |
|
Bioinformatics, 2003, vol.19:
811-817 |
| |
Motivation:
Feature (gene) selection can dramatically improve the accuracy
of gene expression profile based sample class prediction. Many
statistical methods for feature (gene) selection such as stepwise
optimization and Monte Carlo simulation have been developed for
tissue sample classification. In contrast to class prediction,
few statistical and computational methods for feature selection
have been applied to clustering algorithms for pattern discovery.
Results: An integrated scheme and corresponding
program SamCluster for automatic discovery of sample classes based
on gene expression profile is presented in this report. The scheme
incorporates the feature selection algorithms based on the calculation
of CV (coefficient of variation) and t-test into hierarchical
clustering and proceeds as follows. At first, the genes with their
CV greater than the pre-specified threshold are selected for cluster
analysis, which results in two putative sample classes. Then,
significantly differentially expressed genes in the two putative
sample classes with p-values 0.01, 0.05, or 0.1 from t-test are
selected for further cluster analysis. The above processes were
iterated until the two stable sample classes were found. Finally,
the consensus sample classes are constructed from the putative
classes that are derived from the different CV thresholds, and
the best putative sample classes that have the minimum distance
between the consensus classes and the putative classes are identified.
To evaluate the performance of the feature selection for cluster
analysis, the proposed scheme was applied to four expression datasets
COLON, LEUKEMIA72, LEUKEMIA38, and OVARIAN. The results show that
there are only 5, 1, 0, and 0 samples that have been misclassified,
respectively. We conclude that the proposed scheme, SamCluster,
is an efficient method for discovery of sample classes using gene
expression profile.
Availability: The related program SamCluster
is available upon request or from the web page
http://www.sph.uth.tmc.edu:8052/hgc/Downloads.asp or
http://www.biosun.com.cn/softwares/samcluater.html
|
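The first step of the scheme, selecting genes whose coefficient of variation (CV) exceeds a pre-specified threshold, might look like the sketch below. The names, the use of the sample standard deviation, and the toy expression values are assumptions for illustration; the clustering, t-test, and consensus steps are not shown.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """CV = standard deviation / |mean| (sample standard deviation assumed)."""
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined for a zero mean")
    return stdev(values) / abs(m)


def select_genes_by_cv(expression, threshold):
    """Keep genes whose CV across samples is greater than the threshold."""
    return [gene for gene, values in expression.items()
            if coefficient_of_variation(values) > threshold]


# Toy expression matrix (gene -> values across samples), invented for illustration.
expression = {
    "g1": [10.0, 10.0, 10.0, 10.0],  # flat across samples, CV = 0
    "g2": [1.0, 9.0, 1.0, 9.0],      # strongly variable
}
print(select_genes_by_cv(expression, 0.5))
```

Filtering on CV needs no class labels, which is why it can seed an unsupervised clustering before any putative sample classes exist.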
|
|
|
|
Epitope prediction for the SARS virus |
|
李伍举.
刘涛. |
|
Medical Journal of Chinese People's Liberation Army (解放军医学杂志), 2003, vol.28(6):S9-S10 |
|
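Abstract: [Objective] To predict epitopes of the two SARS virus membrane proteins S and M, using protein secondary structure prediction together with a comprehensive epitope prediction method that integrates Hopp & Woods hydrophilicity, Janin surface accessibility, Karplus-Schulz backbone flexibility, and charge distribution, so as to provide a basis for SARS vaccine design. [Results] Analysis of the two SARS virus membrane proteins S and M with Goldkey and other software yielded 14 and 7 possible epitopes, respectively. |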
| |
Note: The relevant Goldkey functions have been integrated into BioSun, our most recently released software. |
| |
| |
|
|
Phylogenetic analysis of the novel coronavirus, the probable pathogen of infectious atypical pneumonia |
|
刘涛. 李伍举.
范明. |
|
Medical Journal of Chinese People's Liberation Army (解放军医学杂志), 2003, vol.28(6):S1-S5 |
|
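Abstract: Since March 2003, a novel coronavirus (SARS-CoV) has been tentatively identified as the pathogen of severe acute respiratory syndrome (SARS), the fatal infectious disease that broke out at the end of 2002. The virus has the genome organization typical of other known coronaviruses. Phylogenetic analysis of the virus can guide further experimental research. We first constructed a phylogenetic tree of SARS-CoV at the whole-genome level to establish its evolutionary position, and then analyzed, at both the nucleic acid and protein levels, the phylogenetic trees of five major homologous SARS-CoV proteins: replicase, S protein, E protein, M protein, and N protein. The results show that SARS-CoV is homologous to the known coronaviruses but has features clearly different from the others: the evolutionary histories of its homologous genes differ from one another, and the evolutionary history of the structural protein genes differs from that of the genome. SARS-CoV is relatively closely related to IBV and TGV, especially IBV; the particularly close relationship at the level of the E and M proteins deserves attention and consideration in further experimental research. |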
| |
| |
|
|
High-level expression, purification, and identification of the M3-M4 loop gene fragment of the human NMDA receptor main subunit |
|
张玉梅.
孙长凯.
范明.
李伍举.
刘淑红.
赵杰.
韩大跃.
王嘉玺. |
|
Chinese Journal of Biochemistry and Molecular Biology (中国生物化学与分子生物学报), 2003, vol.19(5):588-593 |
|
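Abstract: The target fragment of the M3-M4 loop of the human N-methyl-D-aspartate (NMDA) receptor main subunit was obtained by genetic engineering for use as an immunogen in further studies of immunogenicity and related applications. Total RNA was extracted from human glioma tissue, and the gene fragment of the M3-M4 loop of the human NMDA receptor main subunit was amplified by RT-PCR and then optimized and modified according to the computer-aided prediction method based on the mathematical model for high-level expression of foreign genes in the prokaryotic expression vector pBV220. The target gene was cloned into pBV220 and transformed into Escherichia coli DH5α, and expression was induced by a temperature shift. Expression of the recombinant in E. coli was examined at the protein level; the product was purified by preparative SDS-PAGE and identified by relative molecular mass, immunoreactivity, and peptide mass fingerprinting. The results show that a prokaryotic expression vector for the M3-M4 loop of the human NMDA receptor main subunit (named pBV NR1L3) was successfully constructed and that high-level expression was achieved through gene optimization. Gel scanning analysis showed that the expressed product accounted for about 29% of total bacterial protein, and the purity of the recombinant peptide exceeded 95%. |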
|
|
|
|
Tclass:
Tumor Classification System Based on Gene Expression Profile |
|
Li Wuju and Xiong Momiao |
|
Bioinformatics 2002, vol.18:
325-326 |
|
Summary: A method
that incorporates feature selection into Fisher’s linear discriminant
analysis for gene expression based tumor classification and a corresponding
program Tclass were developed. The proposed method was applied to
a public gene expression data set for colon cancer that consists of
22 normal and 40 tumor colon tissue samples to evaluate its performance
for classification. Preliminary results demonstrated that using only
a subset of genes ranging from 3 to 10 can achieve high classification
accuracy.
Availability: The program is written in Matlab and
is being rewritten in the Java language. The source code is available
upon request. |
|
|
|
|
MProbe:
computer aided probe design for oligonucleotide microarrays |
|
Wuju Li, Jian Huang, Ming Fan, Shengqi
Wang |
|
Applied Bioinformatics 2002:1(3):163-166. |
|
Abstract: The present
work describes a complete probe design software system for oligonucleotide
microarrays based on Kane’s research on probe sensitivity and specificity
(Kane’s rule). Combining Kane’s rule and traditional criteria for
probe design we constructed MProbe, the software system for oligonucleotide
microarrays using Java. The general criteria for probe design are:
(1) probes may have different lengths that range from 20 to 100 bases;
(2) they should have a similar melting temperature (Tm) or GC content;
(3) they should not contain stable secondary structures; and (4) they
abide by Kane’s rule. |
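Criteria (1) and (2) above lend themselves to a simple screening function, sketched below. The function names, the GC target of 0.5, and the tolerance of 0.1 are hypothetical values chosen for the example, not MProbe's settings; the secondary-structure check (3) and Kane's rule (4) are not implemented here.

```python
def gc_content(seq):
    """Fraction of G and C bases in a probe sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)


def passes_basic_criteria(probe, target_gc=0.5, gc_tolerance=0.1,
                          min_len=20, max_len=100):
    """Check probe length (criterion 1) and GC similarity (criterion 2).

    target_gc and gc_tolerance are illustrative assumptions.
    """
    if not min_len <= len(probe) <= max_len:
        return False
    return abs(gc_content(probe) - target_gc) <= gc_tolerance


print(passes_basic_criteria("ATGC" * 6))   # 24 nt, GC = 0.5
print(passes_basic_criteria("ATGC" * 3))   # 12 nt, too short
print(passes_basic_criteria("AT" * 15))    # 30 nt, GC = 0.0
```

Running the cheap length and GC filters first reduces the number of candidates that must go through the more expensive structure and specificity checks.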
|
|
|
Bioinformatics of gene expression profiles |
|
李伍举 |
|
Bulletin of the Academy of Military Medical Sciences (军事医学科学院院刊), 2002, vol.26(1):73-76. |
| |
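Abstract: DNA microarray technology is another major biotechnology, following recombinant DNA technology and PCR amplification. Microarray experiments make it possible to observe simultaneously the dynamic expression levels of thousands of genes in a given biological phenomenon. Compared with the earlier research paradigm of studying the expression of single genes, this greatly changes the outlook of molecular biologists, allowing life phenomena and their nature to be studied systematically and globally at the genome level. Microarray technology has already been applied to tumor subtyping, tumor classification, gene function studies, construction of gene regulatory networks, drug target identification, and many other areas. In essence, however, what a microarray experiment yields directly is a gene expression profile (that is, a gene expression matrix in which rows represent genes and columns represent experimental samples), and practical applications of microarrays are realized through bioinformatic processing of this matrix. Bioinformatics is therefore an extremely important link in microarray-based molecular biology research. This article reviews the bioinformatics methods related to gene expression profiles. |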
|
|
|
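Analysis of the physicochemical properties and antigenicity of receptor activation-related peptides of the human N-methyl-D-aspartate receptor main subunit |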
|
孙长凯.
赵杰.
李伍举.
冯健男.
刘淑红, et al. |
|
National Medical Journal of China (中华医学杂志), 2002, vol.82(1):50-53 |
|
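Abstract. Objective: To analyze the antigenicity and physicochemical properties of P1 and P2, two receptor activation-related peptides on NR1a, the main subunit of the human N-methyl-D-aspartate receptor (NMDAR). Methods: The amino acid sequence of human NR1a was retrieved from a protein database with the GOLDKEY software. P1 (151 amino acids) was taken in the reverse direction from before its first transmembrane domain, and P2 (144 amino acids) in the forward direction from after its third transmembrane domain. Multi-parameter analysis was performed using Hopp & Woods and Kyte hydrophilicity, Janin surface accessibility, Karplus-Schulz backbone flexibility, and Welling antigenicity; amino acid sites and secondary structure features were compared using the Prosite program and the Chou-Fasman method. On this basis the antigenic sites of P1 and P2 were determined and compared with existing experimental results. Results: P1 and P2 may contain 6 and 7 sequences of 8~15 aa with good antigenicity, respectively. The relevant sequences of P1 lie mainly at its amino terminus, far from the amino acid residues critical for ligand binding. Those of P2 are distributed more evenly and either contain important sites related to receptor activation or lie close to residues critical for ligand binding. The overall antigenicity, hydrophilicity, and accessibility of P2 are stronger than those of P1, most notably in its 15 residues nearest the membrane. Both fragments contain a certain number of β-turns, but P1 contains more cysteine residues and random coil, whereas P2 contains more aromatic residues and is predominantly α-helical. Conclusion: The two receptor activation-related peptides P1 and P2 on NR1a, the main subunit of human NMDAR, both have a certain number of antigenic sites; compared with P1, P2 may more readily serve as a molecular target for immune intervention against NMDAR. |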
|
|
|
|
Feature (gene) selection in gene expression-based tumor classification |
|
Xiong M, Li W, Zhao J, Jin L, Boerwinkle E. |
|
Mol Genet Metab.
2001 vol.73(3):239-47. |
|
Abstract:
There is increasing interest in changing the emphasis of tumor
classification from morphologic to molecular. Gene expression
profiles may offer more information than morphology and provide an
alternative to morphology-based tumor classification systems. Gene
selection involves a search for gene subsets that are able to
discriminate tumor tissue from normal tissue, and may have either
clear biological interpretation or some implication in the molecular
mechanism of the tumorigenesis. Gene selection is a fundamental issue in gene expression-based tumor
classification. In the formation of a discriminant rule, the number of genes is large relative to the number of tissue samples. Too many genes can harm the
performance of the tumor classification system and increase the cost as well. In this report, we discuss criteria and illustrate techniques for reducing the
number of genes and selecting an optimal (or near optimal) subset of genes from an initial set of genes for tumor classification. The practical advantages of
gene selection over other methods of reducing the dimensionality (e.g., principal components), include its simplicity, future cost savings, and higher
likelihood of being adopted in a clinical setting. We analyze the expression profiles of 2000 genes in 22 normal and 40 colon tumor tissues, 5776 sequences
in 14 human mammary epithelial cells and 13 breast tumors, and 6817 genes in 47 acute lymphoblastic
leukemia and 25 acute myeloid leukemia samples. Through these three
examples, we show that using 2 or 3 genes can achieve more than 90%
accuracy of classification. This result implies that after initial
investigation of tumor classification using microarrays, a small number of selected genes may be used as biomarkers for tumor classification, or may have
some relevance in tumor development and serve as a potential drug target. In this report we also show that stepwise Fisher's linear discriminant function is
a practicable method for gene expression-based tumor classification. |
|
|
|
|
Computational
methods for gene expression-based tumor classification |
|
Xiong M, Jin L, Li W, Boerwinkle E. |
|
Biotechniques.
2000 vol.29(6):1264-8,1270. |
|
Abstract:
Gene expression profiles may offer more or additional information than classic morphologic- and histologic-based
tumor classification systems. Because the number of tissue samples examined is usually much smaller than the number of genes examined, efficient data
reduction and analysis methods are critical. In this report, we propose a principal component and discriminant analysis method of tumor classification using
gene expression profile data. Expression of 2000 genes in 40 tumor and 22 normal colon tissue samples is used to examine the feasibility of gene
expression-based tumor classification systems. Using this method, the percentage of correctly classified normal and tumor tissue was 87.0%. The combined
approach using principal components and discriminant analysis provided superior sensitivity and specificity compared to an approach using simple differences
in the expression levels of individual genes. |
|
|
| |
|
GeneDn:
for high-level expression design of heterologous genes in a prokaryotic
system |
|
Li Wu Ju, Lei Hong Xing, Pei Wu Hong
and Wu Jia Jin |
|
Bioinformatics 1998, vol.14:
884-885. |
|
RESULTS: Based on
the mathematical model of high-level expression of heterologous genes
in prokaryotic vector pBV220, we developed a program GeneDn for high-level
expression design of natural and synthetic genes. AVAILABILITY: The
program is written in Turbo Pascal 7.0. The source code and related
material are available upon request. |
|
|
|
|
Prediction
of RNA secondary structure based on helical regions distribution |
|
Li Wuju and Wu Jiajin |
|
Bioinformatics 1998, vol.14:
700-706. |
|
MOTIVATION: RNAs
play an important role in many biological processes and knowing their
structure is important in understanding their function. Due to difficulties
in the experimental determination of RNA secondary structure, the
methods of theoretical prediction for known sequences are often used.
Although many different algorithms for such predictions have been
developed, this problem has not yet been solved. It is thus necessary
to develop new methods for predicting RNA secondary structure. The
most-used at present is Zuker's algorithm which can be used to determine
the minimum free energy secondary structure. However many RNA secondary
structures verified by experiments are not consistent with the minimum
free energy secondary structures. In order to solve this problem,
a method used to search a group of secondary structures whose free
energy is close to the global minimum free energy was developed by
Zuker in 1989. When considering a group of secondary structures, if
there is no experimental data, we cannot tell which one is better
than the others. This case also occurs in combinatorial and heuristic
methods. These two kinds of methods have several weaknesses. Here
we show how the central limit theorem can be used to solve these problems.
RESULTS: An algorithm for predicting RNA secondary structure based
on helical regions distribution is presented, which can be used to
find the most probable secondary structure for a given RNA sequence.
It consists of three steps. First, list all possible helical regions.
Second, according to central limit theorem, estimate the occurrence
probability of every helical region based on the Monte Carlo simulation.
Third, add the helical region with the biggest probability to the
current structure and eliminate the helical regions incompatible with
the current structure. The above processes can be repeated until no
more helical regions can be added. Take the current structure as the
final RNA secondary structure. To assess the reliability
of the program, it was tested on three RNA sequences: tRNAPhe, Pre-tRNATyr,
and the Tetrahymena ribosomal RNA intervening sequence.
AVAILABILITY: The program is written in Turbo Pascal 7.0. The source
code is available upon request. |
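The first of the three steps, listing all possible helical regions, can be illustrated with the sketch below. It is an assumption-laden simplification rather than the published algorithm: it allows Watson-Crick and G-U pairs, requires a hypothetical minimum stem length of 3 and at least 3 unpaired bases in a hairpin loop, and reports only maximal stems. The probability estimation by Monte Carlo simulation and the greedy assembly of the final structure are not shown.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def helical_regions(seq, min_len=3, min_loop=3):
    """List maximal helical regions as (i, j, length) tuples (0-based).

    A region pairs i with j, i+1 with j-1, and so on for `length` pairs.
    min_len and min_loop are illustrative thresholds.
    """
    seq = seq.upper().replace("T", "U")
    n = len(seq)

    def can_pair(i, j):
        return (seq[i], seq[j]) in PAIRS

    regions = []
    for i in range(n):
        for j in range(i + min_loop + 1, n):
            if not can_pair(i, j):
                continue
            # Skip stems that can be extended outward: they are not maximal.
            if i > 0 and j < n - 1 and can_pair(i - 1, j + 1):
                continue
            length = 0
            # Extend inward while bases pair and the loop stays large enough.
            while (j - length) - (i + length) > min_loop and \
                    can_pair(i + length, j - length):
                length += 1
            if length >= min_len:
                regions.append((i, j, length))
    return regions


print(helical_regions("GGGAAAACCC"))  # [(0, 9, 3)]
```

In the toy hairpin, the three G bases at positions 0 to 2 pair with the three C bases at positions 9 to 7, leaving a four-base loop, so a single region is reported.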
|
|
|
|
|
Quantitative analysis of the expression levels of foreign genes in the pBV220 vector |
|
|
李伍举,吴加金 |
|
|
Chinese Journal of Virology (病毒学报), 1997, vol.13: 126-133. |
|
|
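Abstract: Using sequence analysis techniques including RNA secondary structure prediction based on random stacking of helical regions and calculation of codon bias, we analyzed the expression levels of 22 foreign genes carried in the pBV220 vector, among them human interleukin-2 and human interleukin-4. The results show that the secondary-structure free energies of the -30~39 region at the 5' end and the 30~-39 region at the 3' end are statistically significantly related to expression level, followed by the local codon bias of the 9 bp at the 3' end; the number of bases between the SD sequence and the start codon ATG, within the range 8±3, has no significant relationship with expression level. In addition, a discriminant function was constructed by discriminant analysis, with a discriminant concordance rate as high as 95.5%. |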
|
|