Regular Expressions
I. Overview
1. Concept: Expressions that conform to certain rules.
2. Function: Used specifically for operating on strings.
3. Characteristic: Specific symbols stand for certain code operations, which simplifies writing. Learning regular expressions is therefore mostly learning how to use these special symbols.
4. Benefit: Complex string operations can be simplified.
5. Disadvantage: The more conditions a rule has to satisfy, the longer the expression becomes and the harder it is to read.
II. Common Symbols
Description: X stands for a character x or a matching rule.
1. Characters
x       The character x
\\      The backslash character
\t      The tab character ('\u0009')
\n      The newline (line feed) character ('\u000A')
\r      The carriage-return character ('\u000D')
\f      The form-feed character ('\u000C')
\a      The alert (bell) character ('\u0007')
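In Java source code a regular expression is written as a string literal, so every backslash in the table above has to be doubled. The following is a minimal sketch of that; the class name EscapeDemo and the helper splitOnTabOrNewline are made up for illustration.

```java
class EscapeDemo
{
    // Splits s wherever a tab or a newline occurs (hypothetical helper).
    static String[] splitOnTabOrNewline(String s)
    {
        return s.split("[\\t\\n]");
    }

    public static void main(String[] args)
    {
        // The regex \\ matches one backslash; as a Java literal it is "\\\\".
        System.out.println("a\\b".matches("a\\\\b"));                // true
        // The regex \t matches a tab; as a Java literal it is "\\t".
        System.out.println("a\tb".matches("a\\tb"));                 // true
        System.out.println(splitOnTabOrNewline("x\ty\nz").length);   // 3
    }
}
```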
2. Character classes
[abc]           a, b, or c (simple class)
[^abc]          Any character except a, b, or c (negation)
[a-zA-Z]        a through z or A through Z, inclusive (range)
[a-d[m-p]]      a through d, or m through p: [a-dm-p] (union)
[a-z&&[def]]    d, e, or f (intersection)
[a-z&&[^bc]]    a through z, except b and c: [ad-z] (subtraction)
[a-z&&[^m-p]]   a through z, and not m through p: [a-lq-z] (subtraction)
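The union, intersection, and subtraction forms are easiest to check by testing single characters against them. A small sketch, where the class name CharClassDemo and the helper in are hypothetical names chosen for this example:

```java
class CharClassDemo
{
    // Returns true when the one-character string s belongs to charClass.
    static boolean in(String charClass, String s)
    {
        return s.matches(charClass);
    }

    public static void main(String[] args)
    {
        System.out.println(in("[abc]", "b"));           // true
        System.out.println(in("[^abc]", "b"));          // false
        System.out.println(in("[a-d[m-p]]", "n"));      // true  (union)
        System.out.println(in("[a-z&&[def]]", "e"));    // true  (intersection)
        System.out.println(in("[a-z&&[^bc]]", "b"));    // false (subtraction)
        System.out.println(in("[a-z&&[^m-p]]", "q"));   // true  (subtraction)
    }
}
```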
3. Predefined character classes
.       Any character (may or may not match line terminators)
\d      A digit: [0-9]
\D      A non-digit: [^0-9]
\s      A whitespace character: [ \t\n\x0B\f\r]
\S      A non-whitespace character: [^\s]
\w      A word character: [a-zA-Z_0-9]
\W      A non-word character: [^\w]
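As a quick illustration of the predefined classes, the sketch below tests a few single characters and strips everything that is not a digit. PredefinedDemo and keepDigits are hypothetical names, and the sample strings are made up.

```java
class PredefinedDemo
{
    // Removes every non-digit character from s (hypothetical helper).
    static String keepDigits(String s)
    {
        return s.replaceAll("\\D", "");
    }

    public static void main(String[] args)
    {
        System.out.println("7".matches("\\d"));      // true
        System.out.println("\t".matches("\\s"));     // true
        System.out.println("_".matches("\\w"));      // true
        System.out.println("-".matches("\\W"));      // true
        // By default the dot does not match a line terminator;
        // the embedded flag (?s) turns that on.
        System.out.println("\n".matches("."));       // false
        System.out.println("\n".matches("(?s)."));   // true
        System.out.println(keepDigits("Tel: 010-1234"));   // 0101234
    }
}
```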
4. Boundary matchers
^       The beginning of a line
$       The end of a line
\b      A word boundary
\B      A non-word boundary
\A      The beginning of the input
\G      The end of the previous match
\Z      The end of the input, except for the final line terminator (if any)
\z      The end of the input
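The effect of a boundary matcher shows up when counting matches with and without it. The following sketch assumes a made-up sample string; BoundaryDemo and count are hypothetical names.

```java
import java.util.regex.*;

class BoundaryDemo
{
    // Counts how many times regex is found in s.
    static int count(String regex, String s)
    {
        Matcher m = Pattern.compile(regex).matcher(s);
        int n = 0;
        while (m.find())
            n++;
        return n;
    }

    public static void main(String[] args)
    {
        String s = "cat concat cat";
        System.out.println(count("cat", s));            // 3
        System.out.println(count("\\bcat\\b", s));      // 2, "concat" is skipped
        System.out.println(count("^cat", s));           // 1
        System.out.println(count("cat$", s));           // 1
        // \Z tolerates a final line terminator, \z does not.
        System.out.println(count("cat\\Z", "cat\n"));   // 1
        System.out.println(count("cat\\z", "cat\n"));   // 0
    }
}
```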
5. Greedy quantifiers
X?      X, once or not at all
X*      X, zero or more times
X+      X, one or more times
X{n}    X, exactly n times
X{n,}   X, at least n times
X{n,m}  X, at least n but not more than m times
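A short sketch of how the quantifiers behave on runs of the letter a, plus one line showing what "greedy" means in practice. QuantifierDemo and greedyReplace are hypothetical names used only here.

```java
class QuantifierDemo
{
    // Replaces the longest stretch from an 'a' to a 'b' with '#'.
    static String greedyReplace(String s)
    {
        return s.replaceAll("a.*b", "#");
    }

    public static void main(String[] args)
    {
        System.out.println("".matches("a?"));           // true
        System.out.println("aa".matches("a?"));         // false
        System.out.println("".matches("a*"));           // true
        System.out.println("".matches("a+"));           // false
        System.out.println("aaa".matches("a{3}"));      // true
        System.out.println("aa".matches("a{3,}"));      // false
        System.out.println("aaaa".matches("a{2,3}"));   // false
        // Greedy: .* runs to the last 'b', not the first one.
        System.out.println(greedyReplace("a1b2b"));     // #
    }
}
```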
6. Groups and capturing
Capturing groups are numbered by counting their opening parentheses from left to right. For example, in the expression ((A)(B(C))) there are four such groups:
1 ((A)(B(C)))
2 (A)
3 (B(C))
4 (C)
Group zero always stands for the entire expression. In a replacement string, $n refers to the content matched by group n.
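The numbering can be verified by matching the expression against the input "ABC" and printing each group. This is a sketch only; GroupDemo and groupOf are hypothetical names, and the date string in the last line is invented sample data.

```java
import java.util.regex.*;

class GroupDemo
{
    // Returns what group n captured when regex matches all of input, else null.
    static String groupOf(String regex, String input, int n)
    {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.matches() ? m.group(n) : null;
    }

    public static void main(String[] args)
    {
        String regex = "((A)(B(C)))";
        Matcher m = Pattern.compile(regex).matcher("ABC");
        if (m.matches())
        {
            System.out.println(m.groupCount());   // 4 (group 0 is not counted)
            // Prints 0: ABC, 1: ABC, 2: A, 3: BC, 4: C
            for (int i = 0; i <= m.groupCount(); i++)
                System.out.println(i + ": " + m.group(i));
        }
        // $n in a replacement string inserts the text captured by group n.
        System.out.println("2024-05".replaceAll("(\\d+)-(\\d+)", "$2/$1"));   // 05/2024
    }
}
```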
III. Specific Functions of Regular Expressions
There are four specific functions: matching, cutting (splitting), replacement, and acquisition (extraction).
1. Matching: the boolean matches(String regex) method of the String class. The rule is applied to the whole string; as soon as any part fails to conform, matching stops and false is returned.
Example:
class MatchesDemo
{
    // A QQ number here has 5 to 15 digits and must not start with 0.

    // Approach 1: check by hand with String methods.
    public static void qqCheck_1(String qq)
    {
        if (!qq.startsWith("0"))
        {
            if (qq.length()>=5&&qq.length()<=15)
            {
                try
                {
                    // parseLong throws if qq is not a valid number
                    Long.parseLong(qq);
                    System.out.println(qq);
                }
                catch (NumberFormatException e)
                {
                    System.out.println("Contains illegal characters!");
                }
            }
            else
                System.out.println("The length of your input is illegal!");
        }
        else
            System.out.println("The number must not start with 0, please retype it!");
    }

    // Approach 2: the same check with one regular expression.
    public static void qqCheck_2(String qq)
    {
        // first digit 1-9, followed by 4 to 14 more digits
        String regex="[1-9]\\d{4,14}";
        if (qq.matches(regex))
        {
            System.out.println(qq);
        }
        else
            System.out.println(qq+": It's an illegal number!");
    }

    // Mobile number check: 11 digits starting with 13, 15 or 18.
    public static void phoneCheck(String phone)
    {
        String regex="1[358]\\d{9}";
        if (phone.matches(regex))
        {
            System.out.println(phone+":::is ok..");
        }
        else
            System.out.println("The mobile phone number is invalid!");
    }

    public static void main(String[] args)
    {
        String qq="125696";
        qqCheck_1(qq);
        qqCheck_2(qq);

        String phone="13345678910";
        phoneCheck(phone);
    }
}
2. Cutting: the String[] split(String regex) method of the String class.
Example:
class SplitDemo
{
    public static void main(String[] args)
    {
        String regex1="\\.";       // cut on a literal dot
        String regex2=" +";        // cut on one or more spaces
        String regex3="(.)\\1+";   // cut on runs of a repeated character

        String[] arr="192.168.1.62".split(regex1);
        print(arr);

        arr ="wo shi shui 545 21 3".split(regex2);
        print(arr);

        arr="erkktyqqquizzzzzo".split(regex3);
        print(arr);
    }

    // Prints each element on its own line.
    public static void print(String[] arr)
    {
        for (String s : arr)
        {
            System.out.println(s);
        }
    }
}
Explanation:
Cutting on repeated characters: to reuse part of a rule, wrap it in a group with (). Groups are numbered starting from 1. To refer to an existing group inside the same regular expression, write \n, where n is the group number.
To obtain the characters matched by a group, $n can be used. Inside a regular expression $ denotes the end of a line, so it cannot refer to a group there; it is generally used in replacement strings, as in the next function.
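To see \n and $n side by side, the sketch below collapses an immediately repeated word into a single occurrence: \1 inside the pattern refers back to group 1, and $1 in the replacement inserts what group 1 captured. BackrefDemo and dedupeWords are hypothetical names, and the sample sentence is made up.

```java
class BackrefDemo
{
    // Collapses "word word ..." into a single "word".
    static String dedupeWords(String s)
    {
        // \\b keeps the match on whole words, so the "is" inside "this"
        // is not treated as the start of a repetition.
        return s.replaceAll("\\b(\\w+)( \\1\\b)+", "$1");
    }

    public static void main(String[] args)
    {
        System.out.println(dedupeWords("this is is a test test test"));   // this is a test
    }
}
```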
3. Replacement: the String replaceAll(String regex, String replacement) method of the String class.
Example:
class ReplaceDemo
{
    public static void main(String[] args)
    {
        String regex1="\\d{5,}";   // five or more consecutive digits
        String regex2="(.)\\1+";   // a character repeated at least twice

        // Replace every run of five or more digits with #.
        String s1="erej569831217woshi2659810wosxs12356f";
        s1=s1.replaceAll(regex1,"#");

        // Collapse each run of a repeated character to one character ($1 is group 1).
        String s2="erkktyqqquizzzzzo";
        s2=s2.replaceAll(regex2,"$1");

        System.out.println("s1:"+s1);
        System.out.println("s2:"+s2);
    }
}
4. Acquisition: extract the substrings that match the rule from a string.
Operation steps:
1) Encapsulate the regular expression in an object.
2) Associate the regular expression object with the string to be operated on.
3) After the association, obtain the regular expression matching engine.
4) Use the engine to operate on the substrings that match the rule, for example to extract them.
Example:
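import java.util.regex.*;

class PatternDemo
{
    public static void main(String[] args)
    {
        String s= "ming tian jiu yao fang jia le ,da jia. ";
        // words of exactly four lowercase letters
        String regex="\\b[a-z]{4}\\b";
        get(s,regex);
    }

    public static void get(String s,String regex)
    {
        // 1) encapsulate the rule in a Pattern object
        Pattern p=Pattern.compile(regex);

        // 2) and 3) associate it with the string to get the matching engine
        Matcher m=p.matcher(s);

        // 4) find() moves to the next match; group() returns the matched text
        while(m.find())
        {
            System.out.println(m.group());
            // start index (inclusive) and end index (exclusive) of the match
            System.out.println(m.start()+"...."+m.end());
        }
    }
}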
IV. Practice
Choosing among the four functions (way of thinking):
1) If you only want to know whether a string is valid or not, use matching.
2) If you want to replace existing substrings with another string, use replacement.
3) If you want to break a string into several strings in a customized way, use cutting. It returns the substrings other than those that match the rule.
4) If you want to obtain the substrings that meet a requirement, use acquisition. It returns the substrings that match the rule.
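The four choices above can be compared on one input string. This is only a sketch: the class FourFunctionsDemo, the helper findAll, and the sample string are all made up for illustration.

```java
import java.util.*;
import java.util.regex.*;

class FourFunctionsDemo
{
    // Acquisition: collects every substring of s that matches regex.
    static List<String> findAll(String regex, String s)
    {
        List<String> found = new ArrayList<String>();
        Matcher m = Pattern.compile(regex).matcher(s);
        while (m.find())
            found.add(m.group());
        return found;
    }

    public static void main(String[] args)
    {
        String s = "tel:13812345678,qq:125696";

        // 1) Matching: is the whole string valid?
        System.out.println("125696".matches("[1-9]\\d{4,14}"));   // true
        // 2) Replacement: swap matching substrings for something else
        System.out.println(s.replaceAll("\\d{5,}", "#"));         // tel:#,qq:#
        // 3) Cutting: keep what lies between the matches
        System.out.println(Arrays.toString(s.split("[:,]")));     // [tel, 13812345678, qq, 125696]
        // 4) Acquisition: keep the matches themselves
        System.out.println(findAll("\\d+", s));                   // [13812345678, 125696]
    }
}
```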
Exercise 1
class ReplaceTest
{
    public static void main(String[] args)
    {
        // A stuttered sentence: first remove the dots, then collapse repeated characters.
        String s="Me and me...I..I want...want...Must....Science....Science......Editing and editing...Cheng...Cheng Cheng....";
        System.out.println(s);

        // Step 1: remove every run of dots.
        String regex="\\.+";
        s=s.replaceAll(regex,"");
        System.out.println(s);

        // Step 2: replace each run of a repeated character with one copy of it.
        regex="(.)\\1+";
        s=s.replaceAll(regex,"$1");
        System.out.println(s);
    }
}
Exercise 2
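import java.util.*;

class IPSortTest
{
    public static void main(String[] args)
    {
        // Sort the addresses segment by segment. Padding every segment to three
        // digits makes the natural String order agree with the numeric order.
        String ip="192.68.1.254 102.49.23.013 10.10.10.10 2.2.2.2 8.109.90.301";
        System.out.println(ip);

        // Step 1: put two zeros in front of every segment.
        String regex="(\\d+)";
        ip=ip.replaceAll(regex,"00$1");
        System.out.println(ip);

        // Step 2: keep only the last three digits of every segment.
        regex="0*(\\d{3})";
        ip=ip.replaceAll(regex,"$1");
        System.out.println(ip);

        // Step 3: cut on spaces and let a TreeSet sort the addresses.
        regex=" ";
        String[] arr=ip.split(regex);

        TreeSet<String> ts=new TreeSet<String>();
        for (String str : arr)
        {
            ts.add(str);
        }

        // Step 4: strip the leading zeros again. \\d+ is required here: with a
        // single \\d, a segment such as 102 would be shortened to 12.
        regex="0*(\\d+)";
        for (String s : ts)
        {
            System.out.println(s.replaceAll(regex,"$1"));
        }
    }
}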
Exercise 3
class CheckMail
{
    public static void main(String[] args)
    {
        String mail="123a809bc@sina.com.cn";

        // Stricter rule: letters and digits after @, then 1 to 3 dot-separated parts of letters.
        String regex="\\w+@[a-zA-Z0-9]+(\\.[a-zA-Z]+){1,3}";
        // Looser rule; this assignment overrides the stricter one above.
        regex="\\w+@\\w+(\\.\\w+)+";

        boolean b=mail.matches(regex);
        System.out.println(b);
    }
}
Exercise 4
import java.net.*;
import java.util.regex.*;
import java.io.*;

class Spider
{
    public static void main(String[] args)throws Exception
    {
        getWebMail();
    }

    // Reads a web page line by line and prints every mail address found in it.
    public static void getWebMail()throws Exception
    {
        URL url=new URL("http://tieba.baidu.com/p/1390896758");
        URLConnection conn=url.openConnection();

        BufferedReader br = new BufferedReader(new InputStreamReader(conn.getInputStream()));
        String line=null;

        // Compile the rule once and reuse it for every line.
        String regex="\\w+@\\w+(\\.\\w+)+";
        Pattern p=Pattern.compile(regex);

        while ((line=br.readLine())!=null)
        {
            Matcher m=p.matcher(line);
            while (m.find())
            {
                System.out.println(m.group());
            }
        }
        br.close();
    }

    // Reads a local file line by line and prints every mail address found in it.
    public static void getFileMail()throws Exception
    {
        File file=new File("E:\\Java Study\\Practice\\day25\\mail.txt");
        BufferedReader br=new BufferedReader(new FileReader(file));
        String line=null;

        // [a-zA-Z], not [a-zA-z]: the range A-z also covers [ \ ] ^ _ and `.
        String regex="\\w+@[a-zA-Z]+(\\.[a-zA-Z]+)+";
        Pattern p=Pattern.compile(regex);

        while ((line=br.readLine())!=null)
        {
            Matcher m=p.matcher(line);
            while (m.find())
            {
                System.out.println(m.group());
            }
        }
        br.close();
    }
}