Java Regular Expressions: Extraction and Replacement (Reposted)
Java Regular Expressions: Extraction and Replacement, Example 1
http://www.cnblogs.com/lihuiyy/archive/2012/10/08/2715138.html
For example, suppose there is a text file endlist.txt with the following content:
1300102,北京市
1300103,北京市
1300104,北京市
1300105,北京市
1300106,北京市
1300107,北京市
1300108,北京市
1300109,北京市
1300110,北京市
1300111,北京市
1300112,北京市
1300113,北京市
1300114,北京市
1300115,北京市
1300116,北京市
1300117,北京市
1300118,北京市
1300119,北京市
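The seven digits are the first seven digits of a mobile phone number, and the Chinese text after the comma is the region the number belongs to. The goal is to split these lines by their leading three digits (130, 131, 132, ...) and write them to 130.txt, 131.txt, 132.txt, ... respectively.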
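import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// The original post shows only main(); the class name here is arbitrary.
public class SplitByPrefix {
    public static void main(String[] args) {
        String childPath = "src/endlist.txt";
        StringBuilder buffer = new StringBuilder();
        // Read the whole file into memory as UTF-8 text.
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(new FileInputStream(childPath), StandardCharsets.UTF_8))) {
            int c;
            while ((c = br.read()) != -1) {
                buffer.append((char) c);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
        // Normalize line endings and make sure the last record ends with a newline,
        // because the pattern below requires one after every record.
        String data = buffer.toString().replace("\r\n", "\n");
        if (!data.isEmpty() && !data.endsWith("\n")) {
            data += "\n";
        }

        // One bucket per prefix 130..139.
        Map<String, List<String>> resultMap = new HashMap<String, List<String>>();
        for (int i = 0; i < 10; i++) {
            resultMap.put("13" + i, new ArrayList<String>());
        }

        // Group 1: the 3-digit prefix. Group 2: the remaining 4 digits, the comma,
        // the Chinese place name, and the newline.
        Pattern pattern = Pattern.compile("(\\d{3})(\\d{4},[\u4e00-\u9fa5]*\\n)");
        Matcher matcher = pattern.matcher(data);
        while (matcher.find()) {
            List<String> bucket = resultMap.get(matcher.group(1));
            if (bucket != null) { // skip prefixes outside 130..139
                bucket.add(matcher.group(2));
            }
        }

        // Write each non-empty bucket to src/13x.txt.
        for (int i = 0; i < 10; i++) {
            List<String> lines = resultMap.get("13" + i);
            if (lines.isEmpty()) {
                continue;
            }
            try (Writer writer = new OutputStreamWriter(
                    new FileOutputStream("src/13" + i + ".txt"), StandardCharsets.UTF_8)) {
                for (String line : lines) {
                    writer.append(line);
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}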
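The Pattern.compile call uses the regular expression "(\\d{3})(\\d{4},[\u4e00-\u9fa5]*\\n)". Each pair of parentheses forms one capturing group. Group indexes start at 1, and index 0 refers to the entire match. This expression therefore has two groups: group 1 is three digits, and group 2 is four digits, a comma, zero or more Chinese characters, and a newline. The while (matcher.find()) loop shows the extraction, using matcher.group(1) and matcher.group(2).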
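To make the group numbering concrete, here is a minimal sketch that lists every group of the first match. It assumes a single record without the trailing newline, so the pattern drops the final \\n. The class name GroupIndexDemo and the helper groupsOf are hypothetical names introduced only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupIndexDemo {
    // Same two groups as the article's pattern, minus the trailing newline (an assumption
    // made so that a single record can be matched on its own).
    private static final Pattern RECORD = Pattern.compile("(\\d{3})(\\d{4},[\\u4e00-\\u9fa5]*)");

    /** Returns group 0, group 1 and group 2 of the first match, or an empty list if nothing matches. */
    static List<String> groupsOf(String input) {
        List<String> groups = new ArrayList<String>();
        Matcher m = RECORD.matcher(input);
        if (m.find()) {
            // groupCount() excludes group 0, so iterate from 0 up to and including it.
            for (int i = 0; i <= m.groupCount(); i++) {
                groups.add(m.group(i));
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        // The place name is written with Unicode escapes so the file compiles under any source encoding.
        String record = "1300102,\u5317\u4eac\u5e02";
        List<String> groups = groupsOf(record);
        for (int i = 0; i < groups.size(); i++) {
            System.out.println("group(" + i + ") = " + groups.get(i));
        }
    }
}
```

For the record 1300102,北京市 the list holds the whole match first, then 130, then 0102,北京市, which is why the article's loop uses group(1) as the map key and group(2) as the value.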
Example 2
http://www.cnblogs.com/jxgxy/archive/2012/07/30/2615997.html
import java.util.ArrayList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class test {
    public static void main(String[] args) {
        getStrings(); // extract specific content from a string with a regular expression
        System.out.println("********************");
        replace(); // replace string content with a regular expression
        System.out.println("********************");
        strSplit(); // split a string with a regular expression
        System.out.println("********************");
        strMatch(); // match a string against a pattern
    }

    private static void strMatch() {
        String phone = "13539770000";
        // Check whether phone is a valid mobile number
        // (rule: starts with 1, second digit is 3, 5 or 8, followed by any 9 digits).
        System.out.println(phone + ":" + phone.matches("1[358][0-9]{9}")); // true
        String str = "abcd12345efghijklmn";
        // Check whether str contains 12345 in the middle. String.matches() must match
        // the whole string, so the pattern needs \w+ on both sides.
        System.out.println(str + ":" + str.matches("\\w+12345\\w+")); // true
        System.out.println(str + ":" + str.matches("\\w+123456\\w+")); // false
    }

    private static void strSplit() {
        String str = "asfasf.sdfsaf.sdfsdfas.asdfasfdasfd.wrqwrwqer.asfsafasf.safgfdgdsg";
        // split() takes a regular expression, so the literal dot must be escaped.
        String[] strs = str.split("\\.");
        for (String s : strs) {
            System.out.println(s);
        }
    }

    private static void getStrings() {
        String str = "rrwerqq84461376qqasfdasdfrrwerqq84461377qqasfdasdaa654645aafrrwerqq84461378qqasfdaa654646aaasdfrrwerqq84461379qqasfdasdfrrwerqq84461376qqasfdasdf";
        // Reluctant .*? stops at the nearest closing "qq"; group 1 is the text in between.
        Pattern p = Pattern.compile("qq(.*?)qq");
        Matcher m = p.matcher(str);
        ArrayList<String> strs = new ArrayList<String>();
        while (m.find()) {
            strs.add(m.group(1));
        }
        for (String s : strs) {
            System.out.println(s);
        }
    }

    private static void replace() {
        String str = "asfas5fsaf5s4fs6af.sdaf.asf.wqre.qwr.fdsf.asf.asf.asf";
        // Replace every '.' with '_'. The dot is a regex metacharacter, so it is escaped as \.
        // and, because the backslash is itself special in a Java string literal, written as "\\.".
        str = str.replaceAll("\\.", "_");
        System.out.println(str);
    }
}
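The strMatch() method above wraps 12345 in \\w+ on both sides because String.matches() succeeds only when the pattern covers the entire string. The sketch below contrasts that with Matcher.find(), which looks for the pattern anywhere in the input. The class name MatchVsFindDemo and its two helper methods are hypothetical and serve only as an illustration.

```java
import java.util.regex.Pattern;

class MatchVsFindDemo {
    /** True only if the regex matches the entire input (the behavior of String.matches). */
    static boolean wholeMatch(String regex, String input) {
        return input.matches(regex);
    }

    /** True if the regex matches some substring of the input (the behavior of Matcher.find). */
    static boolean containsMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String str = "abcd12345efghijklmn";
        // The bare pattern does not cover the whole string, so matches() rejects it.
        System.out.println(wholeMatch("12345", str));
        // find() only needs the pattern to occur somewhere in the string.
        System.out.println(containsMatch("12345", str));
        // Padding the pattern with \w+ lets matches() span the whole string.
        System.out.println(wholeMatch("\\w+12345\\w+", str));
    }
}
```

When the only question is whether a substring occurs, find() with the bare pattern is usually the simpler choice. Note that the padded pattern is slightly stricter, since \\w+ requires at least one word character on each side of 12345.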