Regular expressions are a convenient way to check whether a string contains a desired pattern. They can also extract data from the string.
In this post, we’ll go through the basic usage of the regexp package in Go.
Let’s start with an easy example that only checks whether a value is contained in a string. The regexp package must be imported to use regular expressions, which are abbreviated as regex below.
import (
	"fmt"
	"regexp"
)

func runMatch() {
	reg, err := regexp.Compile("ID_\\d")
	if err != nil {
		fmt.Println("error in regex string")
		return // reg is nil here, so it must not be used
	}
	fmt.Println(reg.Match([]byte("ID_1")))           // true
	fmt.Println(reg.Match([]byte("ID_ONE")))         // false
	fmt.Println(reg.MatchString("ID_1"))             // true
	fmt.Println(reg.MatchString("ID_123"))           // true
	fmt.Println(reg.MatchString("something_ID_123")) // true
	fmt.Println(reg.MatchString("ID_ONE"))           // false
}
We have to compile the regex string first with regexp.Compile("string here"). It returns a *regexp.Regexp instance, which provides the methods that actually work with the regex string.
The example searches for ID_\d. \d is a metacharacter that has a special meaning. It is the same as [0-9], which matches a single digit. If it is [2-9], it expects a digit in the range of 2 to 9.
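To make the difference between the character classes concrete, here is a minimal sketch. The helper name matchesDigitRange is hypothetical and exists only for this illustration; it appends a class to ID_ and reports whether the input matches.

```go
package main

import (
	"fmt"
	"regexp"
)

// matchesDigitRange (hypothetical helper) compiles "ID_" followed by the
// given character class and reports whether s contains a match.
func matchesDigitRange(class, s string) (bool, error) {
	reg, err := regexp.Compile(`ID_` + class)
	if err != nil {
		return false, err
	}
	return reg.MatchString(s), nil
}

func main() {
	for _, class := range []string{`\d`, `[0-9]`, `[2-9]`} {
		ok1, _ := matchesDigitRange(class, "ID_1")
		ok7, _ := matchesDigitRange(class, "ID_7")
		// \d and [0-9] accept both inputs; [2-9] rejects ID_1.
		fmt.Printf("%-6s ID_1=%v ID_7=%v\n", class, ok1, ok7)
	}
}
```

Only [2-9] rejects ID_1, because 1 is outside its range.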
A backslash is handled as an escape character, and the character that follows it is handled as a metacharacter. If that escape sequence isn’t defined, the Compile function returns an error.
regexp.Compile("ID_\\d")
used in the example has two backslashes because a backslash is also an escape character inside a double-quoted Go string, so \\ is needed to pass a single backslash on to the regex. We can use back quotes instead.
regexp.Compile(`ID_\d`)
It handles the string as a raw string. A backslash is handled as a normal character.
The Match method accepts only a byte slice, but a string version, MatchString, is also provided.
fmt.Println(reg.Match([]byte("ID_1"))) // true
fmt.Println(reg.Match([]byte("ID_ONE"))) // false
fmt.Println(reg.MatchString("ID_1")) // true
fmt.Println(reg.MatchString("ID_ONE")) // false
Use the right one depending on the data type.
There is also the MustCompile function. It returns a regex instance as well, but it panics if the regex string is invalid.
func panicCompile() {
	defer func() {
		if r := recover(); r != nil {
			// panic: regexp: Compile(`ID_\p`): error parsing regexp: invalid character class range: `\p`
			fmt.Println("panic: ", r)
		}
	}()
	regexp.MustCompile("ID_\\p")
}
\p introduces a Unicode character class and must be followed by a class name such as \pL, so on its own it is invalid. Because MustCompile panics, it should generally be avoided, especially for patterns that are built at runtime. It’s much better to use the normal Compile function and handle the returned error correctly.
Don’t use MustCompile without a good reason!
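One situation that is often treated as a good reason is a hard-coded pattern stored in a package-level variable, where a panic at startup reveals a typo immediately. The sketch below contrasts that with a pattern that arrives at runtime. The names idPattern and compileUserPattern are assumptions made for this example.

```go
package main

import (
	"fmt"
	"regexp"
)

// idPattern is hard-coded, so an invalid pattern would be a programming
// error that shows up as soon as the program starts.
var idPattern = regexp.MustCompile(`ID_\d+`)

// compileUserPattern (hypothetical helper) compiles a pattern that is only
// known at runtime and returns the error instead of panicking.
func compileUserPattern(pattern string) (*regexp.Regexp, error) {
	reg, err := regexp.Compile(pattern)
	if err != nil {
		return nil, fmt.Errorf("invalid pattern %q: %w", pattern, err)
	}
	return reg, nil
}

func main() {
	fmt.Println(idPattern.MatchString("ID_42")) // true

	if _, err := compileUserPattern(`ID_\p`); err != nil {
		fmt.Println("rejected:", err)
	}
}
```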
QuoteMeta should be used if the regex string needs to be generated from input. Note that metacharacters in that input lose their special meaning, because QuoteMeta escapes all special characters.
func useQuoteMeta() {
	originalStr := "ID_\\p"
	quotedStr := regexp.QuoteMeta(originalStr)
	fmt.Println(originalStr) // ID_\p
	fmt.Println(quotedStr)   // ID_\\p
	_, err := regexp.Compile(quotedStr)
	if err != nil {
		fmt.Println("error in regex string")
	}
}
The original string is invalid, as we saw in the previous section, but QuoteMeta() turns it into a valid regex string by escaping all special characters.
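A typical use is to combine an escaped input with a fixed regex suffix. The following is a minimal sketch under that assumption; buildKeyPattern is a hypothetical helper, not part of the regexp package.

```go
package main

import (
	"fmt"
	"regexp"
)

// buildKeyPattern (hypothetical helper) matches the literal key followed by
// an underscore and one or more digits. QuoteMeta keeps characters such as
// "." in the key from being interpreted as metacharacters.
func buildKeyPattern(key string) (*regexp.Regexp, error) {
	return regexp.Compile(regexp.QuoteMeta(key) + `_\d+`)
}

func main() {
	reg, err := buildKeyPattern("A.B")
	if err != nil {
		fmt.Println("error in regex string")
		return
	}
	// Without QuoteMeta, "." would match any character and "AxB_1"
	// would be found first.
	fmt.Println(reg.FindString("AxB_1, A.B_22")) // A.B_22
}
```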
FindString or FindAllString can be used to find matching text in a string. When regex is needed, the target is usually not a fixed word, so metacharacters are used. The following example finds all matches of the form ID_X, where X is a single digit.
func runFindSomething() {
	reg, _ := regexp.Compile(`ID_\d`)
	text := "ID_1, ID_42, RAW_ID_52, "
	fmt.Printf("%q\n", reg.FindAllString(text, 2)) // ["ID_1" "ID_4"]
	fmt.Printf("%q\n", reg.FindAllString(text, 5)) // ["ID_1" "ID_4" "ID_5"]
	fmt.Printf("%q\n", reg.FindString(text))       // "ID_1"
}
Set the second parameter to limit the number of results. If the value is bigger than the number of matches, the result contains all of them. If it’s set to 1, only the first match is returned, as with FindString, but it is still wrapped in a slice.
Set -1 if all the matches are needed.
result := reg.FindAllString(text, -1)
fmt.Printf("%q\n", result) // ["ID_1" "ID_4" "ID_5"]
fmt.Printf("%q\n", result[2]) // "ID_5"
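Use an Index method such as FindAllStringIndex if the positions are needed, for example to cut the string. Each result is a [start end] pair, where end is the index just after the match.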
fmt.Printf("%v\n", reg.FindAllStringIndex(text, 5)) // [[0 4] [6 10] [17 21]]
How can we implement it if we want to know the key and the value? If the string is ID_1, the key is ID and the value is 1. A Submatch method is needed in this case, but the following way doesn’t work well.
text := "ID_1, ID_42, RAW_ID_52, "
reg, _ := regexp.Compile(`ID_\d`)
fmt.Printf("%q\n", reg.FindAllStringSubmatch(text, -1)) // [["ID_1"] ["ID_4"] ["ID_5"]]
The last key must be RAW_ID, and the matched strings would still need to be split at the underscore _. That’s not a good way. The target values can be extracted by using parentheses (). To get the key-value pairs, the pattern can be written in the following way. It uses some metacharacters. If you are not familiar with the syntax, see the official documentation.
reg2, err := regexp.Compile(`([^,\s]+)_(\d+)`)
if err != nil {
	fmt.Println("error in regex string")
}
fmt.Printf("%q\n", reg2.FindAllString(text, -1)) // ["ID_1" "ID_42" "RAW_ID_52"]
result2 := reg2.FindAllStringSubmatch(text, -1)
fmt.Printf("%q\n", result2) // [["ID_1" "ID" "1"] ["ID_42" "ID" "42"] ["RAW_ID_52" "RAW_ID" "52"]]
fmt.Printf("Key: %s, ID: %s\n", result2[2][1], result2[2][2]) // Key: RAW_ID, ID: 52
FindAllString just returns the matched strings, so extra work is needed to get the key-value pairs. The Submatch method returns a more convenient value. The first element is always the whole matched string, the same one that FindAllString returns. The second element corresponds to the first pair of parentheses, which is the key in this case. The third one corresponds to the value. With this result, we can easily use the key-value pairs in the following code.
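One way to use the result afterwards is to copy each submatch into a small struct. This is only a sketch; the pair type and the extractPairs function are hypothetical names introduced for this example.

```go
package main

import (
	"fmt"
	"regexp"
)

// pair (hypothetical type) holds one key-value result.
type pair struct {
	Key   string
	Value string
}

// extractPairs (hypothetical helper) collects the two parenthesized groups
// of every match. m[0] is the whole match, m[1] the key, m[2] the value.
func extractPairs(text string) ([]pair, error) {
	reg, err := regexp.Compile(`([^,\s]+)_(\d+)`)
	if err != nil {
		return nil, err
	}
	var pairs []pair
	for _, m := range reg.FindAllStringSubmatch(text, -1) {
		pairs = append(pairs, pair{Key: m[1], Value: m[2]})
	}
	return pairs, nil
}

func main() {
	pairs, err := extractPairs("ID_1, ID_42, RAW_ID_52, ")
	if err != nil {
		fmt.Println("error in regex string")
		return
	}
	for _, p := range pairs {
		fmt.Printf("Key: %s, ID: %s\n", p.Key, p.Value)
	}
}
```

A slice is used instead of a map because the same key, ID in this text, can appear more than once.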