在jenkins jaoco生成增量覆盖率功能时:
研读的一个python通过python-git获取diff时,碰到了这两个正则符号:(\S+)、(\d+)
https://github.com/raoweijian/jacoco-diff/blob/master/diff_processor.py
下面是找到的符号的说明:
\d+的出处:
https://books.google.com.hk/books?id=T5ncDgAAQBAJ&pg=PA192&lpg=PA192&dq=re.match(%27@@+-%5Cd%2B,%5Cd%2B+%5C%2B(%5Cd%2B),%5Cd%2B+@@%27,+line):&source=bl&ots=FrgzHyhsYi&sig=ACfU3U0Ex9AAolnnuDbQTzqw2mcvUHMkyQ&hl=zh-CN&sa=X&ved=2ahUKEwjgm5u7muLnAhWS7GEKHZLSCDEQ6AEwBHoECAkQAQ#v=snippet&q=%5Cd&f=false
书名:Learning R Programming
下面是(\S+)的说明:
来源:https://www.ntu.edu.sg/home/ehchua/programming/howto/Regexe.html
摘录:
1.10 Example: Swapping Words using Parenthesized Back-References ^(\S+)\s+(\S+)$ and $2 $1
The ^ and $ match the beginning and ending of the input string, respectively.
The \s (lowercase s) matches a whitespace (blank, tab \t, and newline \r or \n). On the other hand, the \S+ (uppercase S) matches anything that is NOT matched by \s, i.e., non-whitespace. In regex, the uppercase metacharacter denotes the inverse of the lowercase counterpart, for example, \w for word character and \W for non-word character; \d for digit and \D or non-digit.
The above regex matches two words (without white spaces) separated by one or more whitespaces.
Parentheses () have two meanings in regex:
to group sub-expressions, e.g., (abc)*
to provide a so-called back-reference for capturing and extracting matches.
The parentheses in (\S+), called parenthesized back-reference, is used to extract the matched substring from the input string. In this regex, there are two (\S+), match the first two words, separated by one or more whitespaces \s+. The two matched words are extracted from the input string and typically kept in special variables $1 and $2 (or \1 and \2 in Python), respectively.
To swap the two words, you can access the special variables, and print "$2 $1" (via a programming language); or substitute operator "s/(\S+)\s+(\S+)/$2 $1/" (in Perl).
Code Example in Python
Python keeps the parenthesized back references in \1, \2, .... Also, \0 keeps the entire match.
$ python3
>>> re.findall(r'^(\S+)\s+(\S+)$', 'apple orange')
[('apple', 'orange')] # A list of tuples if the pattern has more than one back references
# Back references are kept in \1, \2, \3, etc.
>>> re.sub(r'^(\S+)\s+(\S+)$', r'\2 \1', 'apple orange') # Prefix r for raw string which ignores escape
'orange apple'
>>> re.sub(r'^(\S+)\s+(\S+)$', '\\2 \\1', 'apple orange') # Need to use \\ for \ for regular string
'orange apple'
Code Example in Java
Java keeps the parenthesized back references in $1, $2, ....
import java.util.regex.Pattern;
import java.util.regex.Matcher;
public class TestRegexSwapWords {
public static void main(String[] args) {
String inputStr = "apple orange";
String regexStr = "^(\\S+)\\s+(\\S+)$"; // Regex pattern to be matched
String replacementStr = "$2 $1"; // Replacement pattern with back references
// Step 1: Allocate a Pattern object to compile a regex
Pattern pattern = Pattern.compile(regexStr);
// Step 2: Allocate a Matcher object from the Pattern, and provide the input
Matcher matcher = pattern.matcher(inputStr);
// Step 3: Perform the matching and process the matching result
String outputStr = matcher.replaceFirst(replacementStr); // first match only
System.out.println(outputStr); // Output: orange apple
}
}