Java Career - Java Foundation - Regular Expressions

Keywords: Java Mobile Programming network

regular expression

I. Overview

1. Concept: An expression that conforms to certain rules.

2. Function: Used specifically for operating on strings.

3. Characteristic: Specific symbols stand in for code operations, which simplifies writing. Learning regular expressions therefore means learning how to use these special symbols.

4. Benefit: Complex operations on strings can be simplified.

5. Disadvantage: The more symbols a rule defines, the longer it becomes and the harder it is to read.

 

II. Common Symbols

Note: X stands for a character x or a matching rule.

1. Characters

x    The character x

\\   Backslash character

\t   Tab character ('\u0009')

\n   Newline (line feed) character ('\u000A')

\r   Carriage-return character ('\u000D')

\f   Form-feed character ('\u000C')

\a   Alert (bell) character ('\u0007')

2. Character classes

[abc]           a, b, or c (simple class)

[^abc]          Any character except a, b, or c (negation)

[a-zA-Z]        a through z or A through Z, inclusive (range)

[a-d[m-p]]      a through d, or m through p: [a-dm-p] (union)

[a-z&&[def]]    d, e, or f (intersection)

[a-z&&[^bc]]    a through z, except for b and c: [ad-z] (subtraction)

[a-z&&[^m-p]]   a through z, and not m through p: [a-lq-z] (subtraction)
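
The character classes above can be tried out one at a time. The following is a minimal sketch; the class name CharClassSketch and its helper method are made up for illustration, and each call tests a single character against one class.

```java
import java.util.regex.Pattern;

class CharClassSketch
{
    // Returns true when the whole input matches the given character class.
    static boolean matches(String charClass, String input)
    {
        return Pattern.matches(charClass, input);
    }

    public static void main(String[] args)
    {
        System.out.println(matches("[abc]", "b"));          // true: simple class
        System.out.println(matches("[^abc]", "b"));         // false: negation
        System.out.println(matches("[a-d[m-p]]", "n"));     // true: union
        System.out.println(matches("[a-z&&[def]]", "e"));   // true: intersection
        System.out.println(matches("[a-z&&[^bc]]", "b"));   // false: subtraction
        System.out.println(matches("[a-z&&[^m-p]]", "q"));  // true: subtraction
    }
}
```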

3. Predefined character classes

.    Any character (may or may not match line terminators)

\d   A digit: [0-9]

\D   A non-digit: [^0-9]

\s   A whitespace character: [ \t\n\x0B\f\r]

\S   A non-whitespace character: [^\s]

\w   A word character: [a-zA-Z_0-9]

\W   A non-word character: [^\w]
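
As a rough illustration of the predefined classes, the sketch below uses \D, \s and \w through ordinary String methods. The class and method names are hypothetical, and the sample inputs are invented. Note that each backslash is doubled inside a Java string literal.

```java
class PredefinedClassSketch
{
    // Keeps only the digits by deleting every non-digit (\D).
    static String digitsOnly(String s)
    {
        return s.replaceAll("\\D", "");
    }

    // Collapses each run of whitespace (\s+) into a single space.
    static String squeezeWhitespace(String s)
    {
        return s.replaceAll("\\s+", " ");
    }

    // True when the input consists solely of word characters (\w).
    static boolean isWord(String s)
    {
        return s.matches("\\w+");
    }

    public static void main(String[] args)
    {
        System.out.println(digitsOnly("tel: 133-4567 8910"));      // 13345678910
        System.out.println(squeezeWhitespace("wo  shi\t\tshui"));  // wo shi shui
        System.out.println(isWord("user_01"));                     // true
        System.out.println(isWord("user-01"));                     // false
    }
}
```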

4. Boundary matchers

^    The beginning of a line

$    The end of a line

\b   A word boundary

\B   A non-word boundary

\A   The beginning of the input

\G   The end of the previous match

\Z   The end of the input but for the final terminator, if any

\z   The end of the input
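
Boundary matchers consume no characters; they only constrain where a match may sit. A small sketch, with a hypothetical helper count() that reports how many times a rule is found in an invented sample string:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundarySketch
{
    // Counts how many times the regex is found in the input.
    static int count(String regex, String input)
    {
        Matcher m = Pattern.compile(regex).matcher(input);
        int n = 0;
        while (m.find())
        {
            n++;
        }
        return n;
    }

    public static void main(String[] args)
    {
        String s = "cat concat cat";
        System.out.println(count("cat", s));         // 3: every occurrence
        System.out.println(count("\\bcat\\b", s));   // 2: whole words only
        System.out.println(count("\\Bcat", s));      // 1: only the one inside "concat"
        System.out.println(count("^cat", s));        // 1: at the beginning
        System.out.println(count("cat$", s));        // 1: at the end

        // \Z tolerates a final line terminator, \z does not.
        System.out.println(count("abc\\Z", "abc\n")); // 1
        System.out.println(count("abc\\z", "abc\n")); // 0
    }
}
```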

5. Greedy quantifiers

X?  X, once or not at all

X*  X, zero or more times

X+  X, one or more times

X{n} X, exactly n times

X{n,} X, at least n times

X{n,m} X, at least n times, but not more than m times
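
The quantifiers can be checked with String.matches, which requires the whole string to fit the rule. This is only a sketch with invented sample strings; the last helper shows what "greedy" means: the quantifier takes as many characters as it can while still allowing a match.

```java
class QuantifierSketch
{
    // True when the whole input fits the regex.
    static boolean matches(String regex, String input)
    {
        return input.matches(regex);
    }

    // Replaces the first match of <.+> to show how much a greedy quantifier consumes.
    static String greedyReplace(String input)
    {
        return input.replaceFirst("<.+>", "#");
    }

    public static void main(String[] args)
    {
        System.out.println(matches("colou?r", "color"));   // true: u appears zero times
        System.out.println(matches("colou?r", "colour"));  // true: u appears once
        System.out.println(matches("ab*", "a"));           // true: b zero times
        System.out.println(matches("ab+", "a"));           // false: b needed at least once
        System.out.println(matches("\\d{3}", "1234"));     // false: exactly 3 digits required
        System.out.println(matches("\\d{3,}", "1234"));    // true: at least 3 digits
        System.out.println(matches("\\d{2,3}", "1234"));   // false: at most 3 digits

        System.out.println(greedyReplace("<a><b>"));       // #: .+ swallowed "a><b"
    }
}
```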

6. Groups and capturing

Capturing groups are numbered by counting their opening parentheses from left to right. For example, in the expression ((A)(B(C))) there are four such groups:

                    1     ((A)(B(C)))

                    2     (A)

                    3     (B(C))

                    4     (C)

Group zero always stands for the entire expression. In a replacement string, $n refers to the content matched by group n.
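
The numbering can be confirmed by asking a Matcher for each group in turn. A minimal sketch, assuming the input "ABC"; the class GroupNumberSketch and its helper are illustrative names only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupNumberSketch
{
    // Returns group 0 through group N of a whole-string match, or an empty array on no match.
    static String[] groups(String regex, String input)
    {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches())
        {
            return new String[0];
        }
        String[] out = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++)
        {
            out[i] = m.group(i);
        }
        return out;
    }

    public static void main(String[] args)
    {
        String[] g = groups("((A)(B(C)))", "ABC");
        for (int i = 0; i < g.length; i++)
        {
            System.out.println(i + " -> " + g[i]);
        }
        // 0 -> ABC
        // 1 -> ABC
        // 2 -> A
        // 3 -> BC
        // 4 -> C
    }
}
```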

 

III. Specific Functions of Regular Expressions

There are four specific functions: matching, splitting, replacement, and extraction.

1. Matching: the boolean matches(String regex) method in the String class. The whole string is checked against the rule; as soon as any part fails to conform, matching stops and false is returned.

Example:

    class MatchesDemo
    {
        /*
        Validate a QQ number.
        Requirements: 5 to 15 characters long, must not start with 0, digits only.
        */

        // Approach 1: without regular expressions
        public static void qqCheck_1(String qq)
        {
            if (!qq.startsWith("0"))
            {
                if (qq.length() >= 5 && qq.length() <= 15)
                {
                    try
                    {
                        // Long.parseLong throws NumberFormatException for non-numeric input
                        Long l = Long.parseLong(qq);
                        System.out.println(qq);
                    }
                    catch (NumberFormatException e)
                    {
                        System.out.println("Contains illegal characters!");
                    }
                }
                else
                    System.out.println("The length of your input is illegal!");
            }
            else
                System.out.println("The number must not start with 0, please re-enter it!");
        }

        // Approach 2: with a regular expression
        public static void qqCheck_2(String qq)
        {
            String regex = "[1-9]\\d{4,14}";
            if (qq.matches(regex)) // Match with the matches method of the String class
            {
                System.out.println(qq);
            }
            else
                System.out.println(qq + ": It's an illegal number!");
        }

        /*
        Matching
        Mobile phone numbers may only start with 13, 15, or 18.
        */
        public static void phoneCheck(String phone)
        {
            String regex = "1[358]\\d{9}";
            if (phone.matches(regex))
            {
                System.out.println(phone + ":::is ok..");
            }
            else
                System.out.println("The mobile phone number is invalid!");
        }

        public static void main(String[] args)
        {
            String qq = "125696";
            qqCheck_1(qq); // Without a regular expression
            qqCheck_2(qq); // With a regular expression

            String phone = "13345678910";
            phoneCheck(phone); // Check whether the phone number is valid
        }
    }

2. Splitting: the String[] split(String regex) method in the String class.

Example:

    class SplitDemo
    {
        public static void main(String[] args)
        {
            String regex1 = "\\.";      // Split on a literal dot
            String regex2 = " +";       // Split on one or more spaces
            String regex3 = "(.)\\1+";  // Split on any character repeated two or more times in a row

            String[] arr = "192.168.1.62".split(regex1); // Split on the dot
            print(arr);

            arr = "wo  shi   shui    545  21     3".split(regex2); // Split on spaces
            print(arr);

            arr = "erkktyqqquizzzzzo".split(regex3); // Split on repeated characters
            print(arr);
        }

        // Print every element of the array
        public static void print(String[] arr)
        {
            for (String s : arr)
            {
                System.out.println(s);
            }
        }
    }

Explanation:

Splitting on repeated characters: to reuse part of a rule, wrap it in a group with (). Groups are numbered starting from 1. To refer to an existing group inside the rule, use \n, where n is the group number.

To obtain the characters a group matched, use $n. Because $ denotes the end of a line inside a regular expression, $n cannot refer to a group within the rule itself; it is used in replacement strings, as in the next function.
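
The difference between \n and $n is easy to mix up, so here is a small sketch: \1 appears inside the rule, while $1 and $2 appear only in the replacement string. The class, method names and sample strings are invented for illustration.

```java
class BackrefSketch
{
    // \1 inside the rule: true when some character is immediately repeated.
    static boolean hasDoubledChar(String s)
    {
        return s.matches(".*(.)\\1.*");
    }

    // $1 and $2 inside the replacement: swap the two sides of each key=value pair.
    static String swapPairs(String s)
    {
        return s.replaceAll("(\\w+)=(\\w+)", "$2=$1");
    }

    public static void main(String[] args)
    {
        System.out.println(hasDoubledChar("hello"));  // true: "ll"
        System.out.println(hasDoubledChar("world"));  // false
        System.out.println(swapPairs("a=1 b=2"));     // 1=a 2=b
    }
}
```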

3. Replacement: the String replaceAll(String regex, String replacement) method in the String class.

Example:

    class ReplaceDemo
    {
        public static void main(String[] args)
        {
            String regex1 = "\\d{5,}";   // Runs of five or more digits
            String regex2 = "(.)\\1+";   // Any character repeated two or more times in a row

            String s1 = "erej569831217woshi2659810wosxs12356f";
            s1 = s1.replaceAll(regex1, "#"); // Replace each run of five or more digits with #

            String s2 = "erkktyqqquizzzzzo";
            s2 = s2.replaceAll(regex2, "$1"); // Collapse each repeated run to one character; $1 is the character captured by group 1

            System.out.println("s1:" + s1);
            System.out.println("s2:" + s2);
        }
    }

4. Extraction: retrieve the substrings that match the rule from a string.

Operation steps:

1) Encapsulate the regular expression in a Pattern object.

2) Associate the pattern object with the string to be operated on.

3) After the association, obtain the regex matching engine (a Matcher).

4) Use the engine to operate on the matching substrings, for example to extract them.

Example:

    import java.util.regex.*;

    class PatternDemo
    {
        public static void main(String[] args)
        {
            String s = "ming tian jiu yao fang jia le ,da jia. ";
            String regex = "\\b[a-z]{4}\\b";
            get(s, regex);
        }

        public static void get(String s, String regex)
        {
            // Encapsulate the rule in a Pattern object.
            Pattern p = Pattern.compile(regex);
            // Associate the pattern with the string to operate on and obtain the Matcher.
            Matcher m = p.matcher(s);

            //System.out.println(m.matches());
            // The matches method of the String class is itself implemented with Pattern and Matcher objects.
            // String merely wraps them for convenience, so it offers only that single function.

            while (m.find()) // find() applies the rule to the string and looks for the next matching substring
            {
                System.out.println(m.group()); // group() returns the text of the current match
                System.out.println(m.start() + "...." + m.end());
                // start() is the index of the first matched character; end() is the index just past the last one
            }
        }
    }
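
The commented-out m.matches() call in the listing above hints at a distinction worth spelling out: matches() tests the whole string, while find() looks for the rule anywhere inside it. The sketch below also includes lookingAt(), which tests only the beginning of the string. The class and helper names are hypothetical.

```java
import java.util.regex.Pattern;

class MatchVsFindSketch
{
    // True only when the entire input fits the regex.
    static boolean wholeMatch(String regex, String s)
    {
        return Pattern.compile(regex).matcher(s).matches();
    }

    // True when the input starts with something that fits the regex.
    static boolean prefixMatch(String regex, String s)
    {
        return Pattern.compile(regex).matcher(s).lookingAt();
    }

    // True when any substring of the input fits the regex.
    static boolean foundAnywhere(String regex, String s)
    {
        return Pattern.compile(regex).matcher(s).find();
    }

    public static void main(String[] args)
    {
        String regex = "[a-z]{4}";
        System.out.println(wholeMatch(regex, "ming tian"));     // false: the space and second word do not fit
        System.out.println(prefixMatch(regex, "ming tian"));    // true: "ming" at the start
        System.out.println(foundAnywhere(regex, "jiu ming"));   // true: "ming" further in
        System.out.println(wholeMatch(regex, "ming"));          // true
    }
}
```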

 

IV. Practice

How to choose among the four functions:

1) To know only whether a string is valid or not, use matching.

2) To turn an existing string into another string, use replacement.

3) To break a string into multiple strings in a custom way, use splitting. Splitting returns the substrings other than those that match the rule.

4) To obtain the substrings that meet a requirement, use extraction. Extraction returns the substrings that match the rule.
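
To make the choice concrete, the sketch below applies all four functions to one sample string. The rules are borrowed from the earlier examples; the class name, method names and sample text are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FourFunctionsSketch
{
    // Matching: is the whole string a valid QQ number?
    static boolean isValidQq(String s)
    {
        return s.matches("[1-9]\\d{4,14}");
    }

    // Replacement: hide every run of five or more digits.
    static String mask(String s)
    {
        return s.replaceAll("\\d{5,}", "#");
    }

    // Splitting: the rule describes the separators, the result is everything else.
    static String[] fields(String s)
    {
        return s.split(" +");
    }

    // Extraction: the rule describes the wanted substrings themselves.
    static List<String> numbers(String s)
    {
        List<String> found = new ArrayList<String>();
        Matcher m = Pattern.compile("\\d+").matcher(s);
        while (m.find())
        {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args)
    {
        String s = "qq 125696  tel 13345678910";
        System.out.println(isValidQq("125696"));   // true
        System.out.println(mask(s));               // qq #  tel #
        System.out.println(fields(s).length);      // 4
        System.out.println(numbers(s));            // [125696, 13345678910]
    }
}
```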

Exercise 1

    /*
    Exercise:
    Requirement: clean up the stuttered string s defined below so that it reads "I want to learn programming".

    Approach:
    Changing an existing string into another string calls for the replacement function.
    1. First remove the dots.
    2. Then collapse each run of a repeated character into a single character.
    Note: (.)\\1+ collapses runs of one repeated character, so it cleans up
    character-level stutter rather than repeated whole words.
    */
    class ReplaceTest
    {
        public static void main(String[] args)
        {
            String s = "Me and me...I..I want...want...Must....Science....Science......Editing and editing...Cheng...Cheng Cheng....";
            System.out.println(s);

            String regex = "\\.+";       // One or more dots
            s = s.replaceAll(regex, ""); // Remove the dots
            System.out.println(s);

            regex = "(.)\\1+";             // Any character repeated two or more times in a row
            s = s.replaceAll(regex, "$1"); // Collapse each run to a single character
            System.out.println(s);
        }
    }

Exercise 2

    /*
    Requirement:
    Sort the IP addresses in order of their address segments.
    192.68.1.254 102.49.23.013 10.10.10.10 2.2.2.2 8.109.90.301

    Approach:
    Natural string ordering works as long as every segment has exactly three digits.
    1. Pad each segment with as many zeros as the shortest segment needs (two), so every segment has at least three digits.
    2. Keep only the last three digits of each segment, so every segment has exactly three digits.
    */
    import java.util.*;

    class IPSortTest
    {
        public static void main(String[] args)
        {
            String ip = "192.68.1.254 102.49.23.013 10.10.10.10 2.2.2.2 8.109.90.301";
            System.out.println(ip);

            String regex = "(\\d+)";
            ip = ip.replaceAll(regex, "00$1"); // Make sure every segment has at least three digits
            System.out.println(ip);

            regex = "0*(\\d{3})";
            ip = ip.replaceAll(regex, "$1"); // Keep only the last three digits of every segment
            System.out.println(ip);

            regex = " ";
            String[] arr = ip.split(regex); // Split on spaces

            // A TreeSet sorts its elements by their natural (string) order
            TreeSet<String> ts = new TreeSet<String>();
            for (String str : arr)
            {
                ts.add(str);
            }

            // Strip the leading zeros of each segment. \\d+ (not \\d) is required:
            // with a single \\d, a segment such as 109 would be mangled to 19.
            regex = "0*(\\d+)";
            for (String s : ts)
            {
                System.out.println(s.replaceAll(regex, "$1"));
            }
        }
    }

Exercise 3

    // Requirement: validate an e-mail address.

    class CheckMail
    {
        public static void main(String[] args)
        {
            String mail = "123a809bc@sina.com.cn";
            String regex = "\\w+@[a-zA-Z0-9]+(\\.[a-zA-Z]+){1,3}"; // Stricter match
            regex = "\\w+@\\w+(\\.\\w+)+"; // Looser match; this assignment overrides the stricter rule above

            boolean b = mail.matches(regex);
            System.out.println(b);
        }
    }

Exercise 4

    /*
    Web crawler (spider)
    A crawler is a program that collects specific information from the network.
    Requirement: collect information such as e-mail addresses or QQ numbers.
    Application: searching blogs by keyword, for example, relies on a "spider" that fetches the blogs related to the keyword.
    */

    import java.net.*;
    import java.util.regex.*;
    import java.io.*;

    class Spider
    {
        public static void main(String[] args) throws Exception
        {
            //getFileMail();
            getWebMail();
        }

        // Collect e-mail addresses from a web page
        public static void getWebMail() throws Exception
        {
            // Wrap the web address in a URL object
            URL url = new URL("http://tieba.baidu.com/p/1390896758");
            // Connect to the server
            URLConnection conn = url.openConnection();

            // Regular expression that matches e-mail addresses
            String regex = "\\w+@\\w+(\\.\\w+)+";
            Pattern p = Pattern.compile(regex); // Encapsulate the regular expression

            // Buffered reader over the page; try-with-resources closes it when done
            try (BufferedReader br = new BufferedReader(new InputStreamReader(conn.getInputStream())))
            {
                String line = null;
                // Read the page line by line
                while ((line = br.readLine()) != null)
                {
                    // Associate the pattern with the current line
                    Matcher m = p.matcher(line);
                    // Find every matching address
                    while (m.find())
                    {
                        System.out.println(m.group()); // Print the matched address
                    }
                }
            }
        }

        // Collect the e-mail addresses in a given file, using the extraction function (Pattern and Matcher)
        public static void getFileMail() throws Exception
        {
            // Wrap the file in a File object
            File file = new File("E:\\Java Study\\Practice\\day25\\mail.txt");

            // Regular expression that matches e-mail addresses
            String regex = "\\w+@[a-zA-Z]+(\\.[a-zA-Z]+)+";
            // Create a Pattern object to encapsulate the regular expression
            Pattern p = Pattern.compile(regex);

            // Buffered reader over the file; try-with-resources closes it when done
            try (BufferedReader br = new BufferedReader(new FileReader(file)))
            {
                String line = null;
                // Read the file line by line
                while ((line = br.readLine()) != null)
                {
                    // Associate the pattern with the current line
                    Matcher m = p.matcher(line);
                    while (m.find()) // Find every matching string
                    {
                        System.out.println(m.group()); // Print the matched string
                    }
                }
            }
        }
    }

Posted by devstudio on Thu, 18 Apr 2019 09:54:34 -0700