用excel操作序列(引物设计)

介绍

最早是我读研究生一年级时,用VBA处理过一些DNA序列处理(word)、绘制图片(Powerpoint),也写过几个excel函数用于DNA oligo的基本分析和操作。在做基因编辑时,为了快速批量设计tRNA-gRNA构建所需引物,修改了Excel的函数用于golden gate引物的设计。这些excel的宏文件也发送给过几个朋友、学生,这里再次发给大家,希望有所用途。
这些代码都是20年前写的,最初也只想自己用,非常简单粗糙,有兴趣的也可以依据自己的需要进行修改。

使用

1. 加载Tm_primer.xla

在excel中,通过“文件-选项-加载项”窗口中下发的“转到”按钮,打开“加载项”窗口,选择“浏览”找到Tm_primer.xla,加载到你的excel中.
简单的测试是否加载成功:excel的A1中输入一段DNA序列,然后输入公式Tm(A1)看是否运行正常。

2. 参考我提供的excel引物设计模板文件:PTG primer design template.xlsm

(1)Primer for single gRNA

用于sgRNA引物设计,将20bp的序列加4nt的overhang sequence,文档中的例子是用于pRGEB32B0~11载体(F引物的overhang是tRNA的末端4nt序列)
F引物公式:overhang所在单元格 & gRNA-spacer 单元格
R引物公式:overhang所在单元格 & upper(Rev(gRNA-spacer))
& 表示链接单元格中的字符
Rev(),Tm_primer.xla中的函数,获得互补序列
upper(),改为大写,与overhang序列区别

image.png

(2)Primer design for PTG

Golden gate assembly tRNA-gRNA fusion的引物,其中Div表示spacer序列中断开的位置。


image.png

注意事项

  1. overhang序列的单元格要用绝对引用(加$符号)
  2. 批量设计引物是一定要检查、检查再检查。

附Tm_primer.xla的源代码

加载Tm_primer.xla后,可以用excel中的VBA编辑器查看源代码(启动VBA编辑器:https://www.lanrenexcel.com/open-developer-tab/)。
VBA帮助:https://docs.microsoft.com/zh-cn/office/vba/api/overview/

我有一段时间觉得EXCEL的强大是由于VBA工具。

Tm_primer.xla文件中包括以下函数
Tm,简单的Tm计算 ,公式为(GC X 4 + AT x 2)

Function Tm(sequence) As Single
    Dim Na As Integer
    Dim Ng As Integer
    Dim Nc As Integer
    Dim Nt As Integer
    
    sequence = UCase(sequence)
    Na = Len(sequence) - Len(Application.WorksheetFunction.Substitute(sequence, "A", ""))
    Ng = Len(sequence) - Len(Application.WorksheetFunction.Substitute(sequence, "G", ""))
    Nc = Len(sequence) - Len(Application.WorksheetFunction.Substitute(sequence, "C", ""))
    Nt = Len(sequence) - Len(Application.WorksheetFunction.Substitute(sequence, "T", ""))

Tm = 4 * (Ng + Nc) + 2 * (Na + Nt)

End Function

Tm2, 复杂点的Tm值计算,公式为 64.9 + 41 * (G + C - 16.4) / Len(sequence)

Function Tm2(sequence) As Single
    Dim Na As Integer
    Dim Ng As Integer
    Dim Nc As Integer
    Dim Nt As Integer
    
    sequence = UCase(Trim(sequence))
    Na = Len(sequence) - Len(Application.WorksheetFunction.Substitute(sequence, "A", ""))
    Ng = Len(sequence) - Len(Application.WorksheetFunction.Substitute(sequence, "G", ""))
    Nc = Len(sequence) - Len(Application.WorksheetFunction.Substitute(sequence, "C", ""))
    Nt = Len(sequence) - Len(Application.WorksheetFunction.Substitute(sequence, "T", ""))

Tm2 = 64.9 + 41 * (Ng + Nc - 16.4) / Len(sequence)

End Function

GC含量

Function GC(sequence) As Single

  
    Dim Ng As Integer
    Dim Nc As Integer
    
    
    sequence = UCase(sequence)
   
    Ng = Len(sequence) - Len(Application.WorksheetFunction.Substitute(sequence, "G", ""))
    Nc = Len(sequence) - Len(Application.WorksheetFunction.Substitute(sequence, "C", ""))
   GC = (Ng + Nc) / Len(sequence)

End Function

Rev,转化为互补序列

Function Rev(sequence) As String
 Dim t As String
 t = UCase(sequence)
 t = Application.WorksheetFunction.Substitute(t, "A", "t")
 t = Application.WorksheetFunction.Substitute(t, "G", "c")
 t = Application.WorksheetFunction.Substitute(t, "C", "g")
 t = Application.WorksheetFunction.Substitute(t, "U", "a")
 t = Application.WorksheetFunction.Substitute(t, "T", "a")
 Rev = StrReverse(t)

End Function

Contrary,转为反向(即5'-3’转为3'-5’,序列不变)

Function Contrary(sequence) As String
Contrary = StrReverse(sequence)

End Function

specific, 太久了,忘了为什么写这个函数了


Function specific(sequence) As Integer
Dim t, m As String

t = sequence
For i = 1 To Len(t)
    m = Mid(t, i, 1)
    If (Asc(m) > Asc("A") And Asc(m) < Asc("Z")) Then
     specific = i
     Exit For
    End If
 Next
    
End Function

MyLink,批量改为超链接 (当时为了其他人查阅数据便,将基因编号改为超级链接,点击即可进入基因注释页面)


Function MyLink(Key As Range, Prev As Range, Post As Range)
Dim s As String

s = Trim(Prev.Text) & Trim(Key.Text) & Trim(Post.Text)

Key.Worksheet.Hyperlinks.Add Key, s

MyLink = s

End Function

BlastParse, 提取BLAST结果中的相关信息到excel表格中,是直接在VBA中运行的

Sub BlastParse()

Dim filename


'Get the file name of blast result file
With Application.FileDialog(msoFileDialogOpen)
    .AllowMultiSelect = False
    .Show
    
    filename = .SelectedItems(1)
End With

'open the blast result file


Dim s As String

Open filename For Input As #1


'pase the blast result
Const NewQuery As String = "Query=*"
Const EndQuery As String = "Lamda*"
Const NewHit As String = ">*"
Const Para As String = "*Score*"
Const EndAnno As String = "*Length*"
Dim Nrow, Ncol As Integer
Dim Hit As Boolean
Dim Align As String
Dim Anno As String



'Set Dest = Application.Workbooks(1).Worksheets(1)
Do While Not EOF(1)
    Line Input #1, s
    
    With Application.ActiveSheet
    If (s Like NewQuery) Then
     Nrow = Nrow + 1
     Ncol = 1
     .Cells(Nrow, Ncol) = s
     Hit = False
     
     
    End If
    
    If (s Like NewHit) And Not Hit Then
        
        Ncol = Ncol + 1
        Anno = ""
        Do While (Not s Like EndAnno) And Not EOF(1)
                Anno = Anno & Trim(s)
                Line Input #1, s
                
         Loop
         
        .Cells(Nrow, Ncol) = Anno
        
        Do While (Not EOF(1)) And Not (s Like Para)
            Line Input #1, s
        Loop
        Ncol = Ncol + 1
        .Cells(Nrow, Ncol) = s
        
        Ncol = Ncol + 1
        Line Input #1, s
        .Cells(Nrow, Ncol) = s
        
        Ncol = Ncol + 1
        Line Input #1, s
        .Cells(Nrow, Ncol) = s
        Hit = True
        
        
    End If
    
    End With
Loop
    

Close #1



End Sub

ColMidd,好像也是为了上面解析BLAST结果写的一个函数

Function ColMidd(s As Range, Begin As Integer, Length As Integer)

    With s.Characters(Start:=Begin, Length:=Length).Font
                .ColorIndex = 3
    End With
   
End Function

录制的一个宏,直接加个循环可以处理excel表格

Sub Macro1()
'
' Macro1 Macro
' 宏由 Kabin Xie 录制,时间: 2008-10-19
'

'
 Dim i As Integer
 For i = 2 To 1001
 
 Range("g" & i).Select
 
 '   Range("G2").Select
    'A ctiveCell.FormulaR1C1 = "SWTNSPRSPPKVRRD"
    With ActiveCell.Characters(Start:=1, Length:=7).Font
        .Name = "宋体"
        .FontStyle = "常规"
        .Size = 12
        .Strikethrough = False
        .Superscript = False
        .Subscript = False
        .OutlineFont = False
        .Shadow = False
        .Underline = xlUnderlineStyleNone
        .ColorIndex = xlAutomatic
    End With
    With ActiveCell.Characters(Start:=8, Length:=1).Font
        .Name = "宋体"
        .FontStyle = "常规"
        .Size = 12
        .Strikethrough = False
        .Superscript = False
        .Subscript = False
        .OutlineFont = False
        .Shadow = False
        .Underline = xlUnderlineStyleNone
        .ColorIndex = 3
    End With
    With ActiveCell.Characters(Start:=9, Length:=7).Font
        .Name = "宋体"
        .FontStyle = "常规"
        .Size = 12
        .Strikethrough = False
        .Superscript = False
        .Subscript = False
        .OutlineFont = False
        .Shadow = False
        .Underline = xlUnderlineStyleNone
        .ColorIndex = xlAutomatic
    End With
 Next
End Sub

谢卡斌
2022.2.9

你可能感兴趣的:(用excel操作序列(引物设计))