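1. What are SNPs and SSLPs?
SNP: a single nucleotide polymorphism is a DNA sequence polymorphism caused by a change of a single nucleotide at an allelic position in the genome.
SSLP: a simple sequence length polymorphism is an array of repeat sequences that varies in length; SSLPs include satellite DNA, minisatellites and microsatellites (STRs).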
2. Summary of key points:
I. Introduction to genomes
1. Gene: A DNA segment containing biological information and hence coding for an RNA and/or polypeptide molecule.
Genome: The entire genetic complement of a living organism.
• Prokaryote
• Eukaryote: nuclear genome + organelle (chloroplast, mitochondrion) genome
2. Transcriptome: the coding RNA; the product of genome expression
3. Proteome: all the proteins present in a cell at a particular time, that is, all the proteins made from the transcriptome.
4. Development and current state of genomics
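II. Genome mapping
What is the experimental basis of genetic mapping? Linkage analysis.
1. Purpose of genome mapping: the shotgun method struggles with large DNA molecules that contain repeats. ① Shotgun sequencing breaks the DNA into fragments, sequences them and then reassembles them; for large genomes, the human genome in particular, this is difficult because the data to be analysed become more and more complex as the number of fragments grows. ② Errors arise in repetitive regions: part of a repeat region may be left out, or two fragments from the same chromosome or from different chromosomes may be joined incorrectly. In short, a map should be built before sequencing; by marking the positions of genes and other distinctive features it guides the sequencing.
2. Types of genome map: genetic maps and physical maps
3. Genetic map: a map built with genetic techniques that shows the positions of genes and other sequence features in the genome. The genetic techniques include cross-breeding experiments. Linkage analysis is the basis of genetic mapping.
4. Physical map:
5. Comparison of genetic and physical maps: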
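Genetic mapping produces a genetic map, also called a linkage map.
Mapping method: linkage analysis, including cross-breeding experiments and pedigree analysis. The relative distances between markers are calculated from genetic experiments.
Markers: phenotypic traits, genes or DNA markers.
Map unit: centimorgan (cM); 1 cM is defined as a recombination (crossover) frequency of 1%.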
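As a rough illustration of the map unit defined above, the sketch below turns offspring counts from a hypothetical cross into a recombination frequency and reads it as centimorgans. The function name and the counts are invented for illustration, and the direct 1% = 1 cM reading is only reasonable for closely spaced markers, where multiple crossovers are rare.

```python
def map_distance_cm(recombinant_offspring: int, total_offspring: int) -> float:
    """Recombination frequency in percent, read directly as cM."""
    if total_offspring <= 0:
        raise ValueError("total_offspring must be positive")
    if not 0 <= recombinant_offspring <= total_offspring:
        raise ValueError("recombinant_offspring must lie between 0 and total_offspring")
    return 100.0 * recombinant_offspring / total_offspring


# Hypothetical test cross: 120 recombinants among 1000 offspring.
print(map_distance_cm(120, 1000))  # 12.0
```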
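Physical mapping
Mapping method: molecular biology techniques measure the absolute distances between markers and place DNA markers, genes or clones directly at their actual positions in the genome.
Map unit: depends on the mapping method. Radiation hybrid mapping uses the centiRay (cR); restriction mapping and clone mapping use DNA length in base pairs (bp).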
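6. DNA markers used for genetic mapping:
① Restriction fragment length polymorphism (RFLP): typed by Southern hybridization or by PCR;
② Simple sequence length polymorphism (SSLP): includes satellite DNA, minisatellite DNA and microsatellite DNA (also called STRs).
Microsatellites are better markers than minisatellites, first because minisatellites are not evenly distributed and are usually found in the telomeric regions at the chromosome ends, and second because PCR is better suited to typing microsatellite DNA; microsatellites are also more polymorphic.
They are usually typed by PCR combined with capillary electrophoresis;
③ Single nucleotide polymorphism (SNP): the most densely distributed DNA marker. Most SNPs are biallelic. SNPs are mostly typed by oligonucleotide hybridization analysis. Screening strategies: DNA chips and solution hybridization (fluorescence quenching).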
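7. Satellite DNA is a class of highly repetitive sequences. When DNA is centrifuged in a caesium chloride density gradient (at speeds of up to tens of thousands of revolutions per minute), each molecule settles at the position in the gradient that matches its buoyant density, which depends on base composition rather than on molecular size; seen from outside the tube, DNA of different densities forms separate bands. When the band intensities are analysed, one or more small satellite bands can be seen besides the main band. The DNA in these bands is called satellite DNA; its GC content is generally lower than that of main-band DNA, so its buoyant density is also lower.
Minisatellite DNA, sometimes called variable number of tandem repeats (VNTR): the repeat unit is tens of nucleotides long; commonly located in telomeric and subtelomeric regions.
Microsatellite DNA, or simple tandem repeats (STR): the repeat unit is 1-4 nucleotides long and 10-50 units are repeated in tandem; dispersed throughout the genome.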
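The microsatellite definition above (a 1-4 nucleotide unit repeated in tandem at least about 10 times) maps naturally onto a regular expression. The following is a minimal sketch, not a production STR caller: the function name and the example sequence are hypothetical, and only perfect, uninterrupted repeats are reported.

```python
import re


def find_microsatellites(seq: str, min_copies: int = 10):
    """Return (start, repeat_unit, copy_number) for perfect tandem repeats.

    The lazy quantifier makes the regex try the shortest unit first,
    so 'CACACA...' is reported with unit 'CA' rather than 'CACA'.
    """
    if min_copies < 2:
        raise ValueError("min_copies must be at least 2")
    pattern = re.compile(r"([ACGT]{1,4}?)\1{%d,}" % (min_copies - 1))
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        hits.append((match.start(), unit, len(match.group(0)) // len(unit)))
    return hits


# Hypothetical sequence: a (CA)12 repeat flanked by unique bases.
example = "GG" + "CA" * 12 + "TT"
print(find_microsatellites(example))  # [(2, 'CA', 12)]
```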
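8. Linkage analysis in different model organisms:
Fruit flies, mice and similar species: planned breeding experiments;
Humans: pedigree analysis;
Bacteria, which do not undergo meiosis: conjugation, transduction and transformation.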
9. Deficiencies of genetic maps:
• Limited resolution
• Limited accuracy
Recombination hot spots
Differences in crossover frequency between the sexes
Multiple crossovers between two loci
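III. Physical maps
Questions: How does restriction mapping differ from RFLP analysis? What role does FISH play in physical mapping?
Restriction mapping is a physical mapping method that gives the physical distance (kb) between two restriction sites. An RFLP is a polymorphic marker in an individual's genome: a base change at a restriction site changes the length of the restriction fragments. Linkage analysis of the recombination frequency of these markers between parents and offspring gives the genetic distance between RFLPs. FISH hybridizes fluorescently labelled DNA fragments to chromosomes so that the positions of different DNA fragments on the chromosome, and the physical distances between them, can be observed.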
1. Physical maps identify the exact location of DNA sequences in the genome.
2. Principles (types) of physical mapping (p88):
• The earliest physical map: the cytogenetic map (10 Mb)
• Restriction mapping: restriction map (kb). Its scale is limited by the size of the restriction fragments; methods include electrophoresis and optical mapping (gel stretching and molecular combing);
• STS mapping: any unique DNA sequence can serve as an STS.
Sources of STSs: expressed sequence tags (ESTs), SSLPs and random genomic sequences
• Clone-based mapping: a flow cytometer can be used for detection
• RH (radiation hybrid) mapping (1 Mb)
• FISH (fluorescent in situ hybridization)
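3. Cloning vectors for large fragments in physical mapping:
• Plasmid: 10 kb
• λ phage: 15 kb
• Cosmid: 50 kb
• P1 phage: up to 125 kb
• PAC (P1 artificial chromosome): up to 300 kb
• YAC (yeast artificial chromosome): 200~2000 kb; e.g. a human genome YAC library of 32,000 clones with 1 Mb inserts.
• BAC (bacterial artificial chromosome): 100~300 kb; e.g. 300,000 clones with 300 kb inserts, covering the human genome 30-fold.
The BAC is the standard large-insert cloning vector of the HGP.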
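The library examples above can be checked with a one-line coverage calculation: number of clones times insert size, divided by genome size. The sketch below assumes a human genome of about 3000 Mb, a figure not stated in these notes but implied by the 30-fold coverage quoted for the BAC library; the function name is hypothetical.

```python
def library_coverage(n_clones: int, insert_kb: float, genome_mb: float) -> float:
    """Fold coverage of a clone library: total cloned DNA / genome size."""
    return n_clones * insert_kb / (genome_mb * 1000)


HUMAN_GENOME_MB = 3000  # assumption: roughly 3000 Mb

# BAC example from the notes: 300,000 clones with 300 kb inserts.
print(library_coverage(300_000, 300, HUMAN_GENOME_MB))   # 30.0
# YAC example from the notes: 32,000 clones with 1 Mb inserts (about 10.7-fold).
print(round(library_coverage(32_000, 1000, HUMAN_GENOME_MB), 1))
```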
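4. Restriction mapping procedure: extract DNA → cut with a rare-cutting restriction endonuclease → separate and identify the DNA fragments
(optical methods: gel stretching and molecular combing)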
5. Fluorescent in situ hybridization (P94):
The position at which the probe hybridizes to the chromosomal DNA is visualized by detecting the fluorescent signal emitted by the labeled DNA.
Flow of FISH:
① Probe:
• ~100 kb (from a BAC clone of the human genome)
• tagged directly with fluorophores, with targets for antibodies, or with biotin (by nick translation or PCR using tagged nucleotides).
② Interphase or metaphase chromosomes attached to a glass slide
③ Blocking the repetitive DNA
④ Hybridizing
⑤ Detection by fluorescence microscope
Development of FISH:
Ø Radioactively labeled in situ hybridization
• Sensitivity
• Resolution
Ø Fluorescence in situ hybridization
• Repetitive DNA sequences
• Mechanically stretched chromosomes (resolution reaches 200~300 kb)
• Non-metaphase chromosomes (resolution down to 25 kb)
Application of FISH:
Ø Medical applications
• Discovering cytogenetic variation: deletions and translocations on chromosomes.
• Detecting pathogens in samples of a patient's tissue.
Ø Academic research
• Genome mapping
• Genome comparison
6. Principle of STS (sequence-tagged site) mapping:
STSs are short sequences that are operationally unique in the genome and are used to generate mapping reagents.
Principle for STS mapping (p96):
A collection of overlapping DNA fragments is assembled and checked for how often two STSs are separated by a break
The most common sources of STSs (P98):
Ø ESTs (expressed sequence tags)
Ø SSLPs
Ø Random genomic sequences
7. Radiation hybrid (RH) mapping. Radiation hybrid (RH) map: A genome map in which STSs are positioned relative to one another on the basis of the frequency with which they are separated by radiation-induced breaks. The frequency is assayed by analysing a panel of human-hamster hybrid cell lines, each produced by lethally irradiating human cells and fusing them with recipient hamster cells such that each carries a collection of human chromosomal fragments. The unit of distance is the centiRay (cR), denoting a 1% chance of a break occurring between two loci. (p98)
RH mapping workflow:
Generate the radiation hybrid cell lines (the panel) → choose the STSs → set up the PCR system and reaction conditions → process the PCR results → build the RH map
Map unit: centiRay (cR): a 1% frequency of breakage between two markers when the DNA is exposed to an X-ray dose of N rad.
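8. Clone libraries and radiation hybrid cell lines compared as STS mapping reagents:
Feature | Clone library | Radiation hybrid cell lines
Foreign fragments carried | One | Several
Number of clones needed for the library | Many | Few
Can be sequenced directly or used to build clone contigs | Yes | No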
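IV. Genome mapping and data mining
1. Genome mapping strategies:
Ø Clone contig method: top-down
Ø Whole-genome shotgun method: bottom-up
2. Species suited to whole-genome shotgun sequencing: small genomes, including prokaryotes and viruses;
Limiting factors:
3. Difficulties in genome sequencing: ① Repeats: tandem repeats; genome-wide repeats; ② Gaps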
4. DNA sequencing methods:
① Chain termination DNA sequencing (Sanger et al, 1977): the sequence of a single-stranded DNA molecule is determined by enzymatic synthesis of complementary polynucleotide chains, these chains terminating at specific nucleotide positions; detected by polyacrylamide gel electrophoresis
② Chemical degradation method (Maxam and Gilbert, 1977): the sequence of a double-stranded DNA molecule is determined by treatment with chemicals that cut the molecule at specific nucleotide positions; detected by polyacrylamide gel electrophoresis
③ Pyrosequencing: can be used to determine very short sequences quickly; no electrophoresis needed
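5. Assembling a contiguous DNA sequence:
① Assembly by the whole-genome shotgun method:
Advantages: sequencing is fast and works without a genetic or physical map. (Key features: clone libraries are built in at least two different types of vector, and at least one library must contain fragments longer than the longest repeat in the genome under study.)
② Assembly by the clone contig method:
Clone contigs can be built by chromosome walking (slow and laborious) or by clone fingerprinting: restriction patterns; repetitive DNA fingerprints; repetitive DNA PCR; STS content mapping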
6. Clone contig: A collection of clones whose DNA fragments overlap.
How to order clones into contigs:
a) Chromosome walking: A technique that can be used to construct a clone contig by identifying overlapping fragments of cloned DNA.
b) Clone fingerprinting: Any one of several techniques that compare cloned DNA fragments in order to identify ones that overlap.
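One way to picture restriction-pattern fingerprinting is to count how many restriction fragments of similar size two clones share; clones sharing many bands probably overlap. The sketch below is an illustration only: the function name, the 5% size tolerance and the fragment sizes are all hypothetical, and real fingerprinting software applies a statistical test rather than a raw count.

```python
def shared_bands(fingerprint_a, fingerprint_b, tolerance=0.05):
    """Count fragments (sizes in kb) present in both fingerprints.

    Two fragments match when their sizes differ by at most
    `tolerance` (as a fraction of the first fragment's size).
    Each fragment of clone B can be matched only once.
    """
    remaining = sorted(fingerprint_b)
    shared = 0
    for size in sorted(fingerprint_a):
        for i, other in enumerate(remaining):
            if abs(size - other) <= tolerance * size:
                shared += 1
                del remaining[i]
                break
    return shared


# Hypothetical restriction fingerprints (fragment sizes in kb).
clone_a = [1.2, 2.5, 3.1, 4.8, 7.0]
clone_b = [2.5, 3.1, 4.8, 5.5, 9.2]
clone_c = [0.8, 6.1, 10.4]

print(shared_bands(clone_a, clone_b))  # 3 -> likely overlapping
print(shared_bands(clone_a, clone_c))  # 0 -> no evidence of overlap
```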
7. A scaffold is a portion of the genome sequence reconstructed from end-sequenced whole-genome shotgun clones. Scaffolds are composed of contigs and gaps.
A contig is a contiguous length of genomic sequence in which the order of bases is known to a high confidence level.
8. Sequence gaps and physical gaps (P118); how to close them
9. Improvements for large-scale automated sequencing:
Ø Thermal cycle sequencing
Ø Fluorescent primers are the basis of automated sequence reading
Ø Capillary electrophoresis (CE) instead of polyacrylamide gel electrophoresis (PAGE)
New, unconventional sequencing methods:
Ø Pyrosequencing (p115)
• Sequencing by synthesis
• Ultra-high-throughput sequencing
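Principle: Step 1: the sequencing primer is hybridized to a PCR-amplified, single-stranded DNA template and incubated with the enzymes DNA polymerase, ATP sulfurylase, luciferase and apyrase, and with the substrates adenosine 5' phosphosulfate (APS) and luciferin.
Step 2: one of the four dNTPs (dATPαS, dTTP, dCTP, dGTP) is added to the reaction. If it pairs with the template (A-T, C-G), it is joined covalently to the end of the primer and pyrophosphate (PPi) is released.
Step 3: ATP sulfurylase converts the PPi to ATP in the presence of APS. The ATP drives the luciferase-mediated conversion of luciferin to oxyluciferin, which emits visible light in proportion to the amount of ATP.
Step 4: apyrase degrades the ATP and the unincorporated dNTP, which quenches the light signal and regenerates the reaction mixture.
Step 5: the next dNTP is added.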
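The dispense-detect-degrade cycle described above can be mimicked in a few lines. The sketch below is a toy model, not an instrument algorithm: it assumes an ideal reaction in which the light signal equals the number of nucleotides incorporated per dispensation, and the function names and the A, C, G, T dispensation order are illustrative assumptions. `read` is the strand being synthesized, i.e. the complement of the template.

```python
def pyrogram(read: str, dispensation_order: str = "ACGT", max_cycles: int = 100):
    """Simulate the light signals of an ideal pyrosequencing run.

    Each dispensed nucleotide is incorporated as many times as it
    matches the next bases of the growing strand; the signal is that count.
    """
    signals = []
    pos = 0
    for _ in range(max_cycles):
        for nucleotide in dispensation_order:
            run = 0
            while pos < len(read) and read[pos] == nucleotide:
                run += 1
                pos += 1
            signals.append((nucleotide, run))
            if pos == len(read):
                return signals
    return signals


def decode(signals) -> str:
    """Rebuild the sequence from (nucleotide, signal) pairs."""
    return "".join(nucleotide * count for nucleotide, count in signals)


signals = pyrogram("GGATC")
print(signals)
# [('A', 0), ('C', 0), ('G', 2), ('T', 0), ('A', 1), ('C', 0), ('G', 0), ('T', 1), ('A', 0), ('C', 1)]
print(decode(signals))  # GGATC
```

The homopolymer GG gives a single signal of height 2, which is why long runs of one base are harder to call accurately in real data.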
Ø DNA chip——based on DNA hybridization
Ø Solexa and GS20
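V. Genomes of different model organisms:
[1] Microbial genomes
1.1 Viral genomes
1. Types of virus: true viruses, prions and subviral agents (virusoids and viroids);
2. Hypotheses on the origin of viruses:
Ø Regressive hypothesis: viruses may once have been small cells that parasitized larger cells. Over time, genes not needed for the parasitic lifestyle were gradually lost.
Ø Cellular origin hypothesis: some viruses may have evolved from DNA or RNA that "escaped" from the genes of larger organisms.
Ø Coevolution hypothesis: viruses may have evolved from complexes of protein and nucleic acid that appeared on the early Earth at the same time as cells and have depended on cellular life ever since.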
3. Diversity of viral genomes:
Nucleic acid | DNA | RNA (the virus carries its own RNA replicase)
Shape of genome | Linear | Circular | Segmented
Strands of nucleic acid | Double-stranded | Single-stranded | Partially double-stranded
Polarity | Sense (+): can be translated or transcribed directly by the host cell | Antisense (−) | Ambisense (+/−)
4. Features of viral genomes:
Ø One kind of nucleic acid: DNA (commonly double-stranded) or RNA (commonly single-stranded)
Ø The size of virus genomes varies greatly (from 3 × 10^3 bp encoding 4 proteins to 1.2 × 10^6 bp encoding 100 proteins). The genome of a dsDNA virus is generally bigger than that of an RNA virus.
Ø Overlapping genes
Ø Viral genes are generally single-copy.
Ø Most of the genome is coding sequence. Phage genes are continuous, while genes of eukaryotic viruses can be discontinuous (genes with introns).
Features of DNA virus genomes:
Ø Size:
• dsDNA genomes (herpesvirus, poxvirus etc.) are bigger (120~280 kb);
• ssDNA genomes (parvovirus) are smaller (5 kb);
Ø High coding efficiency in small DNA viruses:
• Overlapping genes
• Both strands are used for coding.
Ø Inverted terminal repeats (ITRs) in the genomes of DNA viruses are important for replication initiation because they form hairpins.
Ø End replication problem for linear DNA genomes:
• Terminal protein priming (adenovirus)
• Site-specific nicking priming (poxvirus)
Features of RNA virus genomes:
Ø dsRNA genomes are segmented (e.g. reovirus).
Ø Many positive-sense ssRNA virus genomes (for example SARS coronavirus) have a 5' cap and a 3' poly(A) tail.
Ø Most ssRNA genomes are a single molecule, but there are exceptions (for example influenza virus).
Ø Overlapping genes, alternative splicing and frameshifting are common in RNA viruses.
Overlapping genes (OGs) are defined as adjacent genes whose coding sequences partially or entirely overlap. Many OGs have been identified in the genomes of prokaryotes, viruses, and mitochondria. Overlapping gene pairs can be divided into three types: unidirectional, convergent, and divergent.
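The three types of overlapping gene pair can be told apart from coordinates and strand alone. The sketch below assumes the usual reading of the terms (unidirectional: same strand; convergent: 3' ends overlap, → ←; divergent: 5' ends overlap, ← →). The `Gene` record, the function name and the coordinates are hypothetical, and a gene nested entirely inside another on the opposite strand is classified by the same simple rule.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    start: int   # leftmost coordinate, inclusive
    end: int     # rightmost coordinate, inclusive
    strand: str  # "+" or "-"


def classify_overlap(a: Gene, b: Gene) -> str:
    """Classify a gene pair as unidirectional, convergent or divergent."""
    if a.end < b.start or b.end < a.start:
        return "no overlap"
    if a.strand == b.strand:
        return "unidirectional"
    left = a if a.start <= b.start else b
    # Left gene on "+" runs rightwards into the other gene's 3' end.
    return "convergent" if left.strand == "+" else "divergent"


print(classify_overlap(Gene(100, 500, "+"), Gene(400, 800, "+")))  # unidirectional
print(classify_overlap(Gene(100, 500, "+"), Gene(400, 800, "-")))  # convergent
print(classify_overlap(Gene(100, 500, "-"), Gene(400, 800, "+")))  # divergent
```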
5. Problems in viral genome replication and the variety of solutions:
Ø Replication models
① Circular dsDNA virus: θ replication; σ replication
② Circular ssDNA virus
③ Linear dsDNA virus
Ø Primers and priming
Ø Integrity of the 5' end
6,Variation of virus genome:
Ø Genetic drift: SNP
Ø Antigenic shift
Ø Rearrangement of segmented genome
Ø Genetic recombination
7. Diversity of expression strategies
Ø Timing control: cascade regulation of viral infection
Ø Protein biosynthesis of eukaryotic RNA viruses
• Segmented genes
• Splicing and assembly of peptides
• IRES: picornavirus genomes contain an internal ribosome entry site (IRES)
• Nested subgenomic RNAs
• Discontinuous mRNA
1.2 Prokaryotic genomes
1. Genome features:
① Size: generally less than 5 Mb, with exceptions, e.g. 30 Mb for Bacillus megaterium.
② Most prokaryotic genomes are circular, but some are linear.
③ Compact genome organization: few non-coding sequences; both strands carry coding sequences.
④ The operon is the representative structure of prokaryotic genomes.
⑤ Structural genes are generally single-copy, with some exceptions, e.g. rrn, which codes for rRNA.
⑥ The genome of E. coli is replicated bidirectionally from a single origin, identified as the genetic locus oriC.
⑦ Lateral gene transfer: transfer of a gene from one species to another
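2. Physical structure of the E. coli genome: circular; replicated bidirectionally; contains operons; genes are continuous, without introns;
3. Minimal genome: at least 265~350 genes are needed. It must contain at least the functional and regulatory genes essential for sustaining life, plus the genes needed for reproduction.
4. Sequencing prokaryotic genomes has made the concept of a bacterial species more complicated: prokaryotes can exchange genes in several ways even when, by their biochemical and physiological properties, they belong to different species. Gene flow is central to the species concept, but the concept does not fit prokaryotes. Different strains of a single species can have very different genome sequences and even strain-specific genes.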
[2] Eukaryotic genomes
2.1 Nuclear genome
1. Features of the nuclear genome:
1. Size: varies widely, 10^7~10^11 bp
2. Ploidy level: generally diploid
3. Each eukaryotic chromosome contains many replicons
4. Monocistronic mRNA
5. Repeat sequences
6. Discontinuous genes: exons, introns, splicing and alternative splicing
7. Non-coding region: 90%
8. Gene density: gene deserts vs. gene islands
9. Overlapping genes and genes within genes are rare
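2. Pseudogenes:
Ø Inactivated, non-functional gene copies, often denoted ψ.
Ø Types and origins:
① Conventional pseudogene: arises by DNA duplication and mutation; usually located near the functional copy of the homologous gene.
a. nonsense mutation: a stop codon appears within the gene; b. the promoter is inactivated by mutation; c. defective splicing signals; d. occasionally a favourable mutation may reactivate it
② Processed pseudogene: formed when the mRNA of a functional gene is reverse-transcribed into cDNA that inserts into the genome.
a. No introns.
b. No promoter. Pseudogenes derived from RNA polymerase III transcripts are an exception because their promoters lie within the transcribed sequence, e.g. Alu sequences.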
3. Why do chromosome banding patterns and the isochore model suggest that genes are not evenly distributed on eukaryotic chromosomes? P207
4. Repeat sequences in eukaryotes: P219
5. Tandem repeats:
Type | Length of the repeat unit | Length of the cluster | Location and role
Satellite | <5 bp ~ >200 bp | ~Mb | Centromere
Minisatellite | <25 bp | <20 kb | Telomere
Microsatellite | 1~4 bp, <13 bp | <150 bp | Genome-wide
Origin: replication slippage; accumulation of mutations in saltatory replication
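6. Two mechanisms of DNA transposition: replicative transposition; nonreplicative transposition
7. DNA transposons of prokaryotes:
Ø Insertion sequences (IS)
Ø Composite transposons: a pair of IS elements flanks the transposon, which carries one or more genes, often antibiotic resistance genes. Composite transposons transpose conservatively, relying on the transposase of the IS elements.
Ø Tn3-type transposons: carry their own transposase gene and need no IS sequences to transpose; Tn3 elements transpose replicatively.
Ø Transposable phages: bacterial viruses for which replicative transposition is a normal part of the life cycle. They can excise after insertion.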
8. LTR elements:
Ø Retroviruses
Ø Endogenous retroviruses (ERVs) are retroviral genomes integrated into vertebrate chromosomes. Some are still active, but most are decayed relics.
Ø Retrotransposons have sequences similar to ERVs but are features of non-vertebrate eukaryotic genomes
9. How LTR elements form:
Ø LTRs (long terminal repeats) contain transcriptional promoter and enhancer sequences: U3 (contains a strong promoter)-R (direct repeat)-U5 (involved in transcription termination and polyadenylation)
Ø cDNA with direct LTRs forms during retrotransposition
Ø A 4-nt direct repeat forms at the integration site in the genome
10. Retroposons
Ø LINE (long interspersed nuclear element): contains reverse transcriptase.
Example of a LINE in the human genome: L1
Ø SINE (short interspersed nuclear element): its transposition depends on reverse transcriptase provided by other autonomous retroelements.
Example of a SINE in the human genome: Alu
11. Transposition mechanisms of LINEs and SINEs:
Ø LINE: a full-length LINE contains DNA endonuclease and reverse transcriptase genes.
• Cutting both strands at the target site provides the primer end.
• The retrotransposon serves as the template for cDNA synthesis
Ø SINE: transposed by "borrowing" enzymes from other autonomous retroelements
12. C-value paradox: larger eukaryotic genomes contain more repeat sequences, more spacer sequences and larger genes (p211);
13. CpG islands:
Ø CpG islands are stretches (>200 bp) of unmethylated DNA with a higher frequency of CpG dinucleotides than the genome as a whole (GC content >50%).
Ø Most housekeeping genes have CpG islands at the 5' end of the transcript.
Ø The human genome is estimated to contain over 30,000 CpG islands.
Ø CpG island methylation is correlated with gene inactivation and has been shown to be important during gene imprinting and tissue-specific gene expression
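The length and GC criteria above can be turned into a simple screen for candidate CpG islands. The sketch below uses the >200 bp and >50% GC thresholds from these notes; the observed/expected CpG ratio and its 0.6 cut-off are an added assumption taken from commonly used criteria, not from the notes, and the function names and test sequences are hypothetical.

```python
def cpg_stats(seq: str):
    """Return (GC content, observed/expected CpG ratio) for a sequence."""
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    cpg = seq.count("CG")
    gc_content = (c + g) / n if n else 0.0
    # Expected CpG count if C and G were placed independently: c * g / n.
    obs_exp = cpg * n / (c * g) if c and g else 0.0
    return gc_content, obs_exp


def looks_like_cpg_island(seq: str, min_len: int = 200,
                          min_gc: float = 0.5, min_obs_exp: float = 0.6) -> bool:
    """Crude screen: long enough, GC-rich and not depleted in CpG."""
    gc_content, obs_exp = cpg_stats(seq)
    return len(seq) > min_len and gc_content > min_gc and obs_exp > min_obs_exp


print(looks_like_cpg_island("CG" * 150))  # True  (artificial 300 bp CpG-rich stretch)
print(looks_like_cpg_island("AT" * 150))  # False (no G or C at all)
```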
2.2 Organelle genomes
1. Physical properties:
Ø Organelle genomes are usually circular, but there is a great deal of variability in different organisms.
Ø Copy number per cell:
• Human: about 800 mitochondria × about 10 genome copies each = 8000
• Yeast: about 65 mitochondria × about 100 genome copies each = 6500
Ø Mitochondrial genome sizes are variable and are unrelated to the complexity of the organism
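2. Features of the two types of mitochondrial genome:
Ø Human:
• Small genome (16.6 kb)
• Compact structure with very few spacer sequences
• A few overlapping genes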
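3. Features of chloroplast genomes
Ø Size: varies little between species, with similar composition and size (100~200 kb); about 200 genes, e.g. genes for rRNAs, tRNAs, ribosomal proteins and photosynthesis.
Ø Copy number
• About 1000 copies in green algae
• About 200 copies per cell in higher plants
Ø Feature: two large inverted repeats (the IR regions) encode the rRNAs; they prevent intramolecular recombination and keep the genome organization stable.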
4. The origins of organelle genomes
Ø Endosymbiont theory
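VI. Gene isolation and functional studies
1. At which steps is gene expression regulated:
2. ESTs (expressed sequence tags): a clone is picked at random from an existing cDNA library and the cDNA insert is sequenced once, in a single automated pass from the 5' or 3' end; the resulting cDNA sequence of about 60-500 bp is an EST.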
3. What a transcription map is:
Ø Markers: ESTs and complementary DNA
Ø Transcribed sequences total less than 3% of the whole genome. Most sequences, including most repeats, introns, pseudogenes and intergenic spacers, are not expressed.
Ø Static vs. dynamic transcription map.
Ø The transcription map is the bridge between the structural genome and the functional genome.
Ø Disadvantage: the function of regulatory sequences cannot be discovered from cDNA.
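4. Building a transcription map (flow chart of large-scale EST sequencing):
1) Construct the cDNA library
2) Random single-pass sequencing
3) Quality control of the library and sequences
4) Clustering and contig analysis
5) ORF finding
6) Functional classification and annotation (Gene Ontology)
7) Expression profile analysis
8) Alternative splicing analysis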
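Step 5 of the workflow above, ORF finding, can be sketched as a scan of the three forward reading frames for the longest stretch from ATG to a stop codon. This is a minimal sketch under simplifying assumptions: only the forward strand is scanned, the ORF must have both a start and a stop codon (real ESTs are often partial), and the function name and example sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(cdna: str) -> str:
    """Return the longest ATG...stop ORF in the three forward frames."""
    seq = cdna.upper()
    best = ""
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                orf = seq[start:i + 3]
                if len(orf) > len(best):
                    best = orf
                start = None  # look for the next ORF in this frame
    return best


print(longest_orf("CCATGAAATTTGGGTAACC"))  # ATGAAATTTGGGTAA
```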
5. Significance of transcription maps (significance of EST research):
Ø Construction of gene maps (gene expression profiles)
Ø Separation and identification of new genes
Ø Comparative analysis of gene expression
Ø Discovery of new SNPs
Ø e-hybridization and e-PCR
Ø Alternative splicing
6. Features of 5' and 3' ESTs:
Ø 5' EST:
• Short 5' UTR (~300 bp); highly conserved coding region; convenient for finding ORFs and new genes
• More regulatory information
• Convenient for clustering and assembling ESTs
Ø 3' EST:
• The 20~200 bp poly(A) tail of the mRNA makes it easy to synthesize the first cDNA strand with an oligo(dT) primer
• The 3' UTR has long, specific non-coding sequences (~770 bp on average) with low conservation
• 10% of mRNAs have repeats at the 3' end, which can serve as SSR markers;
• High specificity between organisms and high polymorphism between individuals
7. From EST to full-length cDNA: the principle of RACE (P194)
Rapid amplification of cDNA ends: a PCR-based technique for mapping the end of an mRNA molecule.
Ø 3' RACE for a 5' EST (P145)
Ø 5' RACE for a 3' EST
8. Methods for studying differences in gene expression:
Large-scale analysis of gene expression differences
• SSH (suppression subtractive hybridization); the workflow must be mastered!
• cDNA microarray (p172)
• SAGE (serial analysis of gene expression) (P171)
9. From traditional to large-scale techniques:
Based on hybridization: gene chip / cDNA microarray
SSH + cDNA microarray
Based on direct sequencing of small fragments of representative cDNA: SAGE (serial analysis of gene expression)
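Feature | EST | SSH | SAGE | Microarray
Large-scale sequencing | Yes | Yes | Yes | Yes
Finds new genes | Yes | Yes | Yes | No
Sequence obtained | Yes | Yes | No | No
Principle (SSH): SSH combines subtractive hybridization with PCR and is a simple, fast way to isolate differentially expressed genes. It relies on (1) hybridization kinetics: abundant single-stranded DNA reanneals faster than rare single-stranded DNA, which equalizes single-stranded DNAs of different abundance; and (2) suppression PCR: intrastrand annealing is favoured over interstrand annealing, so the inverted repeats at the two ends of non-target fragments anneal into a hairpin-like structure that cannot pair with the primers as a template. Amplification of non-target fragments is selectively suppressed, and the target genes are enriched and isolated.
Principle (SAGE): a short oligonucleotide sequence (9-11 bp) from a defined position within a transcript carries enough information to identify that transcript and can serve as its tag. The tags are joined in series into long concatemers; each concatemer cloned into a vector is sequenced and analysed with SAGE software. This identifies which genes are expressed, gives their expression abundance from the frequency of each tag, and can reveal new genes.
Method (SSH): (1) extract mRNA from the test and control groups, synthesize double-stranded cDNA and cut it with a restriction endonuclease that recognizes a 4-base site; (2) split the test cDNA into 2 equal portions and ligate a different adaptor to each; (3) carry out 2 rounds of subtractive hybridization and suppression PCR; (4) recover the enriched target genes.
Advantages (SSH): two rounds of subtractive hybridization and two rounds of PCR give high specificity (the false-positive rate can fall to 6%); hybridization equalizes genes of different abundance, so low-abundance differentially expressed genes can be recovered; the procedure is relatively simple, and it is currently the main method for isolating new genes.
Disadvantages (SSH): the starting material must be mRNA in μg amounts; the subtracted clones are short, so obtaining the full-length cDNA sequence is difficult.
Disadvantages (SAGE): tags are amplified and ligated with unequal efficiency; some cDNA strands are not synthesized as far as the AE site; a tag that falls in a conserved sequence cannot be assigned to a gene; tag frequency is not exactly proportional to transcript abundance.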
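SAGE workflow: ① Synthesize first-strand cDNA primed with biotinylated oligo(dT), then synthesize double-stranded cDNA. Digest the double-stranded cDNA with an anchoring enzyme (AE) that recognizes a 4-bp site, such as NlaIII (recognition site CATG); the 5' portion is released while the biotinylated 3' end stays bound to streptavidin-coated magnetic beads;
② Recover the bead-bound cDNA fragments carrying the 3' poly(A) tail and ligate them to linkers (A and B) that contain the site for the tagging enzyme (TE, a type IIS restriction enzyme), which cuts about 20 bp downstream of its recognition site. Digesting the sample with the tagging enzyme then releases the SAGE tags attached to their linkers;
③ Fill in the linker-bearing tags with DNA polymerase (Klenow) and ligate them into ditags carrying both linkers. Amplify the ditags by PCR and digest with the anchoring enzyme to obtain tail-to-tail SAGE ditags with anchoring-enzyme sites at both ends;
④ The linker-free ditags are ligated to one another into concatemers of varying length; after electrophoresis, fragments of suitable size are collected and cloned into a high-copy plasmid vector, forming the SAGE library.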
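The tag-counting idea behind SAGE can be imitated in silico: for each transcript, take the bases immediately 3' of the CATG site closest to the poly(A) tail and count how often each tag occurs. The sketch below is a simplification under stated assumptions: transcripts are given as mRNA-sense cDNA strings, the tag length is fixed at 10 bases (within the 9-11 bp range quoted earlier), and the function names and sequences are hypothetical.

```python
from collections import Counter


def sage_tag(cdna: str, anchor_site: str = "CATG", tag_len: int = 10):
    """Return the tag next to the 3'-most anchoring-enzyme site, or None."""
    seq = cdna.upper()
    pos = seq.rfind(anchor_site)
    if pos == -1:
        return None  # transcript has no anchoring-enzyme site
    tag = seq[pos + len(anchor_site): pos + len(anchor_site) + tag_len]
    return tag if len(tag) == tag_len else None


def tag_counts(transcripts) -> Counter:
    """Count tags over a collection of transcript sequences."""
    counts = Counter()
    for transcript in transcripts:
        tag = sage_tag(transcript)
        if tag is not None:
            counts[tag] += 1
    return counts


# Hypothetical transcripts: gene 1 is sampled twice, gene 2 once.
gene1 = "GGGCATG" + "AAACCCGGGT" + "TTAAAA"
gene2 = "TTCATG" + "C" * 10 + "AAAA"
print(tag_counts([gene1, gene1, gene2]))
# Counter({'AAACCCGGGT': 2, 'CCCCCCCCCC': 1})
```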
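10. Gene chip: an oligonucleotide array on a solid support
Principle:
Methods: Spotted microarrays
In situ oligo synthesis
Microfluidics
Integrated chips
Applications: gene expression analysis; SNP detection; screening/identifying specific sequences, etc.
Question: Are gene chips and microarrays the same? How do they differ?
In the broad sense a gene chip is any oligonucleotide microarray; in the narrow sense it is a microarray of oligonucleotides synthesized in situ, used mainly for SNP detection. A microarray usually means a cDNA microarray, made by spotting and used to measure gene expression profiles.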
VII. Gene Cloning and Function Research
1. Strategies for cloning a target gene, with representative methods:
① Functional cloning: Using information about the function of a known protein that could be involved in a genetic disease. This approach has very limited application.
② Phenotype cloning: Large-scale mutagenesis by transposon tagging
Gene expression difference analysis
③ Positional cloning: Using only information about the gene's approximate chromosomal location obtained from gene mapping
④ Positional candidate cloning: Using information from map position and the gene's possible function, homology, and expression pattern. This approach has been quite successful and will dominate other strategies.
2. What is positional cloning?
Ø The core problem for positional cloning: gene localization.
Ø Ways to express a gene's position on a chromosome:
§ Cytogenetic location: describes the rough position on the chromosome.
§ Molecular location: a gene's molecular address pinpoints the location of that gene in terms of base pairs.
Ø Methods for localizing a gene
Cytogenetic analysis; genome scan using molecular markers
Workflow of positional cloning:
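4. How to localize a gene:
Ø Cytogenetic abnormality
Ø Genome scanning: looking for the markers closest to the disease gene. (Polymorphic DNA markers at fairly wide spacing are used to scan the whole genome in a large number of samples, pedigrees or sib pairs; linkage or association analysis assigns the gene to certain chromosomal regions. High-density markers are then chosen in these regions for fine mapping to narrow the interval. All genes in the interval are examined and likely candidate genes are tested for variants.)
Three factors in gene localization by genome scan:
Ø Samples
Ø Polymorphic DNA markers
§ RFLP markers
§ Microsatellite or STR markers
§ Single nucleotide polymorphisms (SNPs)
Ø Statistical methods
§ Linkage analysis
§ Association analysis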
5. Linkage disequilibrium: a term used in the study of population genetics for the non-random association of alleles at two or more loci, not necessarily on the same chromosome.
Ø P(disease & M) ≠ P(disease) × P(M)
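The inequality above says that under linkage disequilibrium the joint frequency differs from the product of the individual frequencies. A common way to quantify that gap is the coefficient D and its normalized form r²; both are standard measures but are not named in these notes, so treat them, the function name and the frequencies below as illustrative assumptions.

```python
def linkage_disequilibrium(p_ab: float, p_a: float, p_b: float):
    """Return (D, r_squared) for two biallelic loci.

    p_ab: frequency of haplotypes carrying both allele A and allele B
    p_a, p_b: frequencies of allele A and allele B
    """
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r_squared = d * d / denom if denom > 0 else 0.0
    return d, r_squared


# Independent loci: joint frequency equals the product, so D = 0.
print(linkage_disequilibrium(0.25, 0.5, 0.5))  # (0.0, 0.0)
# Complete association: allele A is always found together with allele B.
print(linkage_disequilibrium(0.5, 0.5, 0.5))   # (0.25, 1.0)
```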
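Linkage analysis: uses the recombination fraction observed in pedigrees to calculate the chromosomal map distance between two loci. Depending on whether a suitable inheritance model is available for the disease, the analysis is parametric or non-parametric.
Parametric analysis: requires the inheritance model, allele frequencies and penetrance to be specified, and calculates the logarithm of odds (LOD) score. It finds markers linked to a disease gene efficiently, but a mis-specified model can lead to wrong conclusions. It is mainly suited to mapping genes for monogenic diseases with a known mode of inheritance.
Non-parametric analysis: for pairs of affected relatives in a pedigree, compares how often they share, at a given locus, the same allele inherited from a common ancestor. If this frequency differs significantly from that expected under independent Mendelian segregation, the marker allele is taken to be in linkage disequilibrium with the disease gene. It is applicable to polygenic diseases and can find several loci in linkage disequilibrium, but it does not give the map distance to the disease gene.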
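6. Genome-wide association analysis: tests whether alleles at a marker locus are in linkage disequilibrium (LD) with the disease gene. The closer the marker is to the disease gene, the lower its mutation rate and the higher its heterozygosity, the more likely the marker is to detect the disease locus.
7. Differences between linkage analysis and association analysis:
Ø Association analysis compares allele frequencies at marker loci across samples with the frequency of the disease-related gene, to judge whether linkage disequilibrium exists between them and how strong the correlation is.
Ø Linkage analysis follows the inheritance of marker alleles and the disease gene within pedigrees to judge whether they are linked and how tightly (map distance).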
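8. Lod score:
Ø The lod score is the logarithm of the ratio between the likelihood that two loci are linked at a given recombination fraction and the likelihood that they are unlinked:
Lod score = $\log_{10} \dfrac{L(\theta < 0.5)}{L(\theta = 0.5)}$
Ø Statistical significance of the lod score:
§ lod score > 3: evidence of linkage
§ 2 < lod score < 3: suggestive evidence of linkage
§ -2 < lod score < 2: uninformative of linkage
§ lod score < -2: exclusion of linkage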
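For the simplest case the lod score can be computed by hand. The sketch below assumes phase-known, fully informative meioses, so that the likelihood at recombination fraction θ is θ^k (1 − θ)^(n − k) for k recombinants among n meioses; real pedigree analysis is more involved. The function name and the counts are hypothetical.

```python
import math


def lod_score(recombinants: int, meioses: int, theta: float) -> float:
    """log10 of L(theta) / L(0.5) for phase-known, informative meioses."""
    if not 0 < theta < 0.5:
        raise ValueError("theta must lie strictly between 0 and 0.5")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must lie between 0 and meioses")
    non_recombinants = meioses - recombinants
    log_l_linked = (recombinants * math.log10(theta)
                    + non_recombinants * math.log10(1 - theta))
    log_l_unlinked = meioses * math.log10(0.5)
    return log_l_linked - log_l_unlinked


# Hypothetical family data: 1 recombinant in 10 meioses, tested at theta = 0.1.
print(round(lod_score(1, 10, 0.1), 2))  # 1.6
```

A score of about 1.6 falls in the uninformative range listed above; lod scores from independent families at the same θ can be added, which is how the threshold of 3 is usually reached.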
9. Transcript identification (p188~195)
ORF scanning: computer methods;
Hybridization tests: Northern blotting; zoo blotting
cDNA sequencing (P192);
Exon trapping (P195);
10. Two types of homologous sequences: paralogs and orthologs
Homologous genes are ones that share a common evolutionary ancestor, revealed by sequence similarities between the genes.
Orthologous genes are those homologs that are present in different organisms and whose common ancestor predates the split between the species.
Paralogous genes are present in the same organism, often members of a recognized multigene family, their common ancestor possibly or possibly not predating the species in which the genes are now found.
11. Strategies and methods for verifying gene function experimentally:
§ Expression patterns
• RNA expression assayed by Northern blot or by PCR amplification of cDNA with primers specific to the candidate transcript
• Look for misexpression (no expression, underexpression, overexpression)
§ Sequence differences
• Missense mutations identified by sequencing the coding region of the candidate gene from normal and abnormal individuals
§ Artificially interfering with expression of the gene
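12. Assigning gene function by experimental analysis:
Gene knock-out; RNAi; gene trap; gene overexpression
13. RNAi: within the cells of an organism, dsRNA triggers specific degradation of the homologous mRNA and so suppresses expression of the corresponding gene.
Ø A form of post-transcriptional gene silencing
Ø A mechanism widespread in living organisms that suppresses harmful RNA entering from outside
§ Viral RNA
§ Retroelements
§ Transgenic dsRNA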
14, Gene transfer methodology :
1)physical methods
§ Microinjection
§ Electroporation
§ Particle bombardment (gene gun)
§ Electrofusion
2)biological methods
§ Retroviral mediated gene transfer
3)chemical methods
§ Lipofection
• TransMessenger, RNAiFect, HiPerFect (Qiagen, for siRNA)
§ Non-liposomal lipids: FuGENE 6 (Roche)
§ Diethylaminoethyl (DEAE)-dextran
§ Calcium phosphate coprecipitation
4)other methods
§ Nuclear transplantation for embryo cell or ES cell
15, Gene targeting and transgenic mice
§ Mechanism: homologous recombination
§ Steps
§ Selection markers
§ Conditional knock-out: Cre-loxP/specific promoter
VIII. Comparative Genomics and Genome Evolution (2h)
1. The basic mechanisms in population evolution:
§ Variation
§ Selection
• Natural selection
• Neutral drift/random drift
§ Reproductive isolation
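2. Key points of the neutral theory:
(1) For each biological macromolecule, as long as its tertiary structure and function stay essentially unchanged, the rate of evolution, expressed as substitutions per site per year, is roughly constant in every lineage.
(2) Functionally less important molecules or parts of molecules evolve faster than functionally more important ones.
(3) During molecular evolution, mutations that disrupt the existing structure and function of a molecule less are substituted at a higher rate than more disruptive ones.
(4) Gene duplication usually precedes the appearance of a gene with a new function.
(5) Selective elimination of clearly deleterious mutations and random fixation of selectively neutral or slightly deleterious mutations are more frequent than positive Darwinian selection of clearly advantageous mutations.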
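3. Genetic drift:
The Hardy-Weinberg law of population genetics (1908): in an infinitely large, randomly mating population with no mutation, migration or selection, allele frequencies and genotype frequencies stay constant from generation to generation.
Neutral mutations do not affect survival or reproduction, so natural selection does not act on them. Their persistence, spread and loss in a population are entirely random, which makes the frequency of an allele fluctuate considerably as it passes from generation to generation.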
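The Hardy-Weinberg law stated above can be checked numerically: with allele frequencies p and q = 1 − p, the expected genotype frequencies are p², 2pq and q², and the allele frequency recovered from those genotypes is again p. The sketch below only illustrates that arithmetic; the function names and the value p = 0.7 are hypothetical, and drift in a finite population is not modelled.

```python
def hardy_weinberg(p: float) -> dict:
    """Expected genotype frequencies for allele frequencies p (A) and 1 - p (a)."""
    q = 1 - p
    return {"AA": p * p, "Aa": 2 * p * q, "aa": q * q}


def allele_frequency(genotypes: dict) -> float:
    """Frequency of allele A among the gametes produced by these genotypes."""
    return genotypes["AA"] + 0.5 * genotypes["Aa"]


generation_1 = hardy_weinberg(0.7)
print({k: round(v, 2) for k, v in generation_1.items()})
# {'AA': 0.49, 'Aa': 0.42, 'aa': 0.09}
print(round(allele_frequency(generation_1), 2))  # 0.7, unchanged in the next generation
```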
4. The molecular basis for variation and evolution:
Ø DNA duplication (p465~473)
§ By duplication of the entire genome;
§ By duplication of a single chromosome or part of a chromosome;
§ By duplication of a single gene or group of genes.
Ø Mutation (caused by DNA replication errors)
Ø Recombination
§ Homologous recombination
§ Transposition
Ø Horizontal gene transfer
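5. Synteny:
Ø Meaning
§ Conservation of gene order between different genomes
§ Can occur in corresponding segments of different genomes
§ Can also occur at different chromosomal positions within one genome
Ø Significance: the degree of synteny between two species can serve as a measure of the evolutionary distance between them, but highly conserved and highly variable segments should be avoided in the analysis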
6. Model organisms of the Human Genome Project:
Ø Escherichia coli
Ø Saccharomyces cerevisiae (budding yeast)
Ø Drosophila melanogaster (fruit fly)
Ø Caenorhabditis elegans (nematode)
Ø Mus musculus (mouse)
Ø Arabidopsis thaliana (thale cress)