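Introduction
I first used VBA in my first year of graduate school, to process DNA sequences (in Word) and to draw figures (in PowerPoint). I also wrote a few Excel functions for basic analysis and manipulation of DNA oligos. Later, while working on gene editing, I adapted those Excel functions to design Golden Gate primers, so that the primers needed for tRNA-gRNA constructs could be designed quickly and in batches. I have sent these Excel macro files to a few friends and students before, and I am sharing them again here in the hope that they are useful.
The code was written 20 years ago and was meant only for my own use, so it is simple and rough. Feel free to modify it to suit your own needs.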
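Usage
1. Load Tm_primer.xla
In Excel, open File > Options > Add-ins, click the "Go" button at the bottom of the window to open the Add-ins dialog, choose "Browse", locate Tm_primer.xla, and load it into Excel.
A quick test that the add-in loaded: enter a DNA sequence in cell A1, then enter the formula =Tm(A1) in another cell and check that it returns a value.
2. Refer to the Excel primer design template I provide: PTG primer design template.xlsm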
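(1) Primer for single gRNA
Used to design sgRNA primers by adding a 4-nt overhang sequence to the 20-bp spacer sequence. The example in the template is for the pRGEB32B0~11 vectors (the overhang of the F primer is the last 4 nt of the tRNA).
F primer formula: overhang cell & gRNA-spacer cell
R primer formula: overhang cell & UPPER(Rev(gRNA-spacer cell))
& concatenates the text in the cells
Rev(), a function in Tm_primer.xla, returns the reverse complement (in lowercase)
UPPER() converts the text to uppercase so that it can be told apart from the overhang sequence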
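The two formulas above can be cross-checked outside Excel. The following is a minimal Python sketch, not part of Tm_primer.xla: the function names are hypothetical, and the overhangs "aaaa" and "cccc" and the spacer are placeholders, not the real pRGEB32 sequences.

```python
# Placeholder sketch of the F and R primer formulas; all names are illustrative.
_COMPLEMENT = str.maketrans("AGCUT", "tcgaa")  # same base mapping as Rev()

def rev(sequence: str) -> str:
    """Mimic Rev(): uppercase the input, complement it in lowercase, reverse it."""
    return sequence.upper().translate(_COMPLEMENT)[::-1]

def sgrna_primers(f_overhang: str, r_overhang: str, spacer: str):
    """Return (F primer, R primer) built the same way as the worksheet formulas."""
    forward = f_overhang + spacer               # overhang & spacer
    reverse = r_overhang + rev(spacer).upper()  # overhang & UPPER(Rev(spacer))
    return forward, reverse

fwd, rvs = sgrna_primers("aaaa", "cccc", "GATTACAGATTACAGATTAC")
print(fwd)  # aaaaGATTACAGATTACAGATTAC
print(rvs)  # ccccGTAATCTGTAATCTGTAATC
```

Keeping the overhang in lowercase and the spacer part in uppercase makes it easy to see at a glance where the vector-specific 4 nt end.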
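(2) Primer design for PTG
Primers for Golden Gate assembly of tRNA-gRNA fusions, where Div is the position at which the spacer sequence is split.
Notes
- Use absolute references (with the $ sign) for the cells that hold the overhang sequences.
- When designing primers in batches, be sure to check, check, and check again.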
Appendix: source code of Tm_primer.xla
After loading Tm_primer.xla, you can view the source code in Excel's VBA editor (how to open the VBA editor: https://www.lanrenexcel.com/open-developer-tab/).
VBA help: https://docs.microsoft.com/zh-cn/office/vba/api/overview/
For a while I felt that Excel owed its power to VBA.
Tm_primer.xla contains the following functions.
Tm: a simple Tm calculation, using the formula 4 x (G + C) + 2 x (A + T)
Function Tm(sequence) As Single
    Dim Na As Integer
    Dim Ng As Integer
    Dim Nc As Integer
    Dim Nt As Integer
    sequence = UCase(sequence)
    Na = Len(sequence) - Len(Application.WorksheetFunction.Substitute(sequence, "A", ""))
    Ng = Len(sequence) - Len(Application.WorksheetFunction.Substitute(sequence, "G", ""))
    Nc = Len(sequence) - Len(Application.WorksheetFunction.Substitute(sequence, "C", ""))
    Nt = Len(sequence) - Len(Application.WorksheetFunction.Substitute(sequence, "T", ""))
    Tm = 4 * (Ng + Nc) + 2 * (Na + Nt)
End Function
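As a cross-check outside Excel, the same rule (commonly called the Wallace rule) can be written in a few lines of Python. This is only a sketch: the name tm_wallace is hypothetical and not part of Tm_primer.xla, and the test sequence is made up.

```python
def tm_wallace(sequence: str) -> int:
    """4 degrees per G or C, 2 degrees per A or T; other characters are ignored."""
    s = sequence.upper()
    gc = s.count("G") + s.count("C")
    at = s.count("A") + s.count("T")
    return 4 * gc + 2 * at

# 20-nt example with 6 G/C and 14 A/T: 4 * 6 + 2 * 14
print(tm_wallace("GATTACAGATTACAGATTAC"))  # 52
```

Like the VBA version, the sketch counts each base by name, so characters such as N or spaces simply contribute nothing.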
Tm2: a slightly more elaborate Tm calculation, using the formula 64.9 + 41 * (G + C - 16.4) / Len(sequence)
Function Tm2(sequence) As Single
    Dim Na As Integer
    Dim Ng As Integer
    Dim Nc As Integer
    Dim Nt As Integer
    sequence = UCase(Trim(sequence))
    Na = Len(sequence) - Len(Application.WorksheetFunction.Substitute(sequence, "A", ""))
    Ng = Len(sequence) - Len(Application.WorksheetFunction.Substitute(sequence, "G", ""))
    Nc = Len(sequence) - Len(Application.WorksheetFunction.Substitute(sequence, "C", ""))
    Nt = Len(sequence) - Len(Application.WorksheetFunction.Substitute(sequence, "T", ""))
    Tm2 = 64.9 + 41 * (Ng + Nc - 16.4) / Len(sequence)
End Function
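Two details of Tm2 are easy to miss: the denominator is the full length of the trimmed string (so any non-ACGT characters still count), and an empty cell leads to a division by zero. The Python sketch below illustrates both. The name tm2 and the explicit error are assumptions of this sketch, not behavior copied from the add-in, and str.strip() removes all surrounding whitespace whereas VBA Trim removes only spaces.

```python
def tm2(sequence: str) -> float:
    """64.9 + 41 * (G + C - 16.4) / N, with N the length of the trimmed input."""
    s = sequence.strip().upper()
    if not s:
        # The VBA function would fail with a division by zero here.
        raise ValueError("empty sequence")
    gc = s.count("G") + s.count("C")
    return 64.9 + 41 * (gc - 16.4) / len(s)

# 20 nt with 6 G/C: 64.9 + 41 * (6 - 16.4) / 20
print(round(tm2("GATTACAGATTACAGATTAC"), 2))  # approximately 43.58
```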
GC: GC content, returned as a fraction between 0 and 1
Function GC(sequence) As Single
    Dim Ng As Integer
    Dim Nc As Integer
    sequence = UCase(sequence)
    Ng = Len(sequence) - Len(Application.WorksheetFunction.Substitute(sequence, "G", ""))
    Nc = Len(sequence) - Len(Application.WorksheetFunction.Substitute(sequence, "C", ""))
    GC = (Ng + Nc) / Len(sequence)
End Function
Rev: converts a sequence to its reverse complement (returned in lowercase)
Function Rev(sequence) As String
    Dim t As String
    t = UCase(sequence)
    t = Application.WorksheetFunction.Substitute(t, "A", "t")
    t = Application.WorksheetFunction.Substitute(t, "G", "c")
    t = Application.WorksheetFunction.Substitute(t, "C", "g")
    t = Application.WorksheetFunction.Substitute(t, "U", "a")
    t = Application.WorksheetFunction.Substitute(t, "T", "a")
    Rev = StrReverse(t)
End Function
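Rev works because the input is uppercased first and every replacement is written in lowercase, so a base that has already been complemented can never be matched by a later substitution. The Python sketch below mirrors that chain and contrasts it with a naive same-case version. Both function names are hypothetical.

```python
def rev(sequence: str) -> str:
    """Mirror of Rev(): chained replacements from uppercase to lowercase, then reverse."""
    t = sequence.upper()
    for old, new in (("A", "t"), ("G", "c"), ("C", "g"), ("U", "a"), ("T", "a")):
        t = t.replace(old, new)
    return t[::-1]

def naive_rev(sequence: str) -> str:
    """Same chain but all uppercase: later steps overwrite earlier results."""
    t = sequence.upper()
    for old, new in (("A", "T"), ("G", "C"), ("C", "G"), ("T", "A")):
        t = t.replace(old, new)
    return t[::-1]

print(rev("GATTAC"))        # gtaatc
print(naive_rev("GATTAC"))  # GAAAAG, which is wrong
```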
Contrary: reverses the sequence (i.e., 5'-3' is rewritten as 3'-5'; the bases are not complemented)
Function Contrary(sequence) As String
    Contrary = StrReverse(sequence)
End Function
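specific: it has been so long that I have forgotten why I wrote this function. Judging from the code, it returns the position of the first uppercase letter in the sequence (as written, the strict comparisons skip "A" and "Z" themselves).
Function specific(sequence) As Integer
    Dim t, m As String
    t = sequence
    For i = 1 To Len(t)
        m = Mid(t, i, 1)
        If (Asc(m) > Asc("A") And Asc(m) < Asc("Z")) Then
            specific = i
            Exit For
        End If
    Next
End Function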
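To make the behavior of specific concrete, here is a small Python sketch. The first function follows the comparisons exactly as written in the VBA code. The second uses inclusive bounds, which is only a guess at what was intended. Both names and the test strings are hypothetical.

```python
def specific_as_written(sequence: str) -> int:
    """1-based position of the first character strictly between 'A' and 'Z'; 0 if none."""
    for i, ch in enumerate(sequence, start=1):
        if ord("A") < ord(ch) < ord("Z"):
            return i
    return 0

def first_uppercase(sequence: str) -> int:
    """Variant with inclusive bounds, so 'A' and 'Z' also count (an assumption)."""
    for i, ch in enumerate(sequence, start=1):
        if "A" <= ch <= "Z":
            return i
    return 0

print(specific_as_written("aaaaGATT"))  # 5
print(specific_as_written("aaaaATTG"))  # 6, because the leading 'A' is skipped
print(first_uppercase("aaaaATTG"))      # 5
```

With a lowercase overhang in front of an uppercase spacer, the returned position marks where the uppercase part of a primer begins.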
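MyLink: converts cells to hyperlinks in batch (written to make it easier for others to browse the data: gene IDs become hyperlinks that open the gene annotation page when clicked)
Function MyLink(Key As Range, Prev As Range, Post As Range)
    Dim s As String
    s = Trim(Prev.Text) & Trim(Key.Text) & Trim(Post.Text)
    Key.Worksheet.Hyperlinks.Add Key, s
    MyLink = s
End Function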
BlastParse: extracts the relevant information from a BLAST result file into an Excel sheet; it is run directly from the VBA editor
Sub BlastParse()
    Dim filename
    'Get the file name of blast result file
    With Application.FileDialog(msoFileDialogOpen)
        .AllowMultiSelect = False
        .Show
        filename = .SelectedItems(1)
    End With
    'open the blast result file
    Dim s As String
    Open filename For Input As #1
    'parse the blast result
    Const NewQuery As String = "Query=*"
    Const EndQuery As String = "Lambda*"
    Const NewHit As String = ">*"
    Const Para As String = "*Score*"
    Const EndAnno As String = "*Length*"
    Dim Nrow, Ncol As Integer
    Dim Hit As Boolean
    Dim Align As String
    Dim Anno As String
    'Set Dest = Application.Workbooks(1).Worksheets(1)
    Do While Not EOF(1)
        Line Input #1, s
        With Application.ActiveSheet
            If (s Like NewQuery) Then
                Nrow = Nrow + 1
                Ncol = 1
                .Cells(Nrow, Ncol) = s
                Hit = False
            End If
            If (s Like NewHit) And Not Hit Then
                Ncol = Ncol + 1
                Anno = ""
                Do While (Not s Like EndAnno) And Not EOF(1)
                    Anno = Anno & Trim(s)
                    Line Input #1, s
                Loop
                .Cells(Nrow, Ncol) = Anno
                Do While (Not EOF(1)) And Not (s Like Para)
                    Line Input #1, s
                Loop
                Ncol = Ncol + 1
                .Cells(Nrow, Ncol) = s
                Ncol = Ncol + 1
                Line Input #1, s
                .Cells(Nrow, Ncol) = s
                Ncol = Ncol + 1
                Line Input #1, s
                .Cells(Nrow, Ncol) = s
                Hit = True
            End If
        End With
    Loop
    Close #1
End Sub
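The control flow of BlastParse (one row per query, only the first hit kept, then the Score line and the two lines after it) is easier to follow on a small in-memory example. The Python sketch below mirrors that logic under some assumptions: the function name and the sample lines are invented, the Like patterns are replaced by equivalent startswith and substring tests, and a guard is added so that a hit line seen before any query line is ignored.

```python
def parse_blast(lines):
    """Return one row per query: [query, first-hit annotation, Score line, next 2 lines]."""
    rows = []
    it = iter(lines)
    hit_done = False
    for s in it:
        if s.startswith("Query="):                        # Like "Query=*"
            rows.append([s])
            hit_done = False
        if s.startswith(">") and rows and not hit_done:   # Like ">*", first hit only
            anno = ""
            while s is not None and "Length" not in s:    # collect up to Like "*Length*"
                anno += s.strip()
                s = next(it, None)
            while s is not None and "Score" not in s:     # skip ahead to Like "*Score*"
                s = next(it, None)
            rows[-1] += [anno, s or "", next(it, ""), next(it, "")]
            hit_done = True
    return rows

sample = [
    "Query= gene1",
    ">ref|X1| some protein",
    "          continued name",
    "          Length = 500",
    "",
    " Score = 100 bits (50), Expect = 1e-20",
    " Identities = 50/50 (100%)",
    " Strand = Plus / Plus",
    ">ref|X2| second hit",
    "          Length = 300",
    " Score = 40 bits (20), Expect = 0.1",
    "Query= gene2",
    " ***** No hits found ******",
]
rows = parse_blast(sample)
for row in rows:
    print(row)
```

As in the VBA version, wrapped annotation lines are joined without a separator, and a query with no hits yields a row that contains only the query line.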
ColMidd: apparently another helper written for the BLAST parsing above; it colors a stretch of characters in a cell red
Function ColMidd(s As Range, Begin As Integer, Length As Integer)
    With s.Characters(Start:=Begin, Length:=Length).Font
        .ColorIndex = 3
    End With
End Function
A recorded macro; wrapping it in a loop is enough to process a whole Excel sheet
Sub Macro1()
'
' Macro1 Macro
' Macro recorded by Kabin Xie, date: 2008-10-19
'
'
    Dim i As Integer
    For i = 2 To 1001
        Range("g" & i).Select
        ' Range("G2").Select
        ' ActiveCell.FormulaR1C1 = "SWTNSPRSPPKVRRD"
        With ActiveCell.Characters(Start:=1, Length:=7).Font
            .Name = "宋体"
            .FontStyle = "常规"
            .Size = 12
            .Strikethrough = False
            .Superscript = False
            .Subscript = False
            .OutlineFont = False
            .Shadow = False
            .Underline = xlUnderlineStyleNone
            .ColorIndex = xlAutomatic
        End With
        With ActiveCell.Characters(Start:=8, Length:=1).Font
            .Name = "宋体"
            .FontStyle = "常规"
            .Size = 12
            .Strikethrough = False
            .Superscript = False
            .Subscript = False
            .OutlineFont = False
            .Shadow = False
            .Underline = xlUnderlineStyleNone
            .ColorIndex = 3
        End With
        With ActiveCell.Characters(Start:=9, Length:=7).Font
            .Name = "宋体"
            .FontStyle = "常规"
            .Size = 12
            .Strikethrough = False
            .Superscript = False
            .Subscript = False
            .OutlineFont = False
            .Shadow = False
            .Underline = xlUnderlineStyleNone
            .ColorIndex = xlAutomatic
        End With
    Next
End Sub
Kabin Xie (谢卡斌)
2022.2.9