Spring Boot code structure:
Required pom dependencies
Design of database tables
Without further ado, here is the code, starting with MeteorologicalService:
public void testReadByDoc(String path) throws Exception {
    Meteorological meteorological = new Meteorological();
    String[] content;
    // Position of the last "." (lastIndexOf, so dots in directory names are ignored)
    int i = path.lastIndexOf('.');
    // Read the file as a stream; try-with-resources closes it even if parsing fails
    try (InputStream is = new FileInputStream(path)) {
        if (path.length() - i == 4) {
            // .doc file
            HWPFDocument doc = new HWPFDocument(is);
            Range range = doc.getRange();
            // Read the Word document paragraph by paragraph
            content = MeteorologicalUtil.printInfo(range);
        } else {
            // .docx file
            XWPFDocument xdoc = new XWPFDocument(is);
            content = MeteorologicalUtil.printInfox(xdoc);
        }
    }
    // Remove empty paragraphs
    String[] contenta = MeteorologicalUtil.removeArrayEmptyTextBackNewArray(content);
    // Length of the cleaned array
    int len = contenta.length;
    // Time of the weather forecast
    Date time = MeteorologicalUtil.getTime(contenta);
    String s = contenta[len - 6];
    if (s.contains("and")) {
        // If this paragraph contains "and", the next paragraph is stored as the alert
        meteorological.setAlert(contenta[len - 5]);
    }
    // The last three paragraphs hold the values; drop the trailing character of each
    String minimum = contenta[len - 1].substring(0, contenta[len - 1].length() - 1);
    String maximum = contenta[len - 2].substring(0, contenta[len - 2].length() - 1);
    String windforce = contenta[len - 3].substring(0, contenta[len - 3].length() - 1);
    meteorological.setWeather(contenta[len - 4]);
    meteorological.setMaximum(maximum);
    meteorological.setMinimum(minimum);
    meteorological.setNowtime(time);
    meteorological.setWindforce(windforce);
    // Save the populated object
    meteorologicalMapper.insert(meteorological);
}
The code works in these steps:
First, parse the Word document, read its content paragraph by paragraph, and collect the text of each paragraph.
The document may contain blank lines and carriage returns that would throw off paragraph-based processing, so we strip them out before going further (part of the code is posted below).
Once the empty paragraphs are gone, we can pick the fields we need out of the remaining paragraphs (the processing code is also posted below).
Finally, the extracted data is put into the object and stored in the database.
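The cleaning and field-picking steps can be sketched in plain Java without POI. This is only an illustration: the class name `ParagraphFields` and its helpers are hypothetical, and unlike the utility method in this post it also trims surrounding whitespace.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of the original project
final class ParagraphFields {

    /** Strips carriage returns and surrounding whitespace, then drops empty paragraphs. */
    static List<String> clean(String[] paragraphs) {
        List<String> kept = new ArrayList<>();
        for (String p : paragraphs) {
            if (p == null) {
                continue;
            }
            String text = p.replace("\r", "").trim();
            if (!text.isEmpty()) {
                kept.add(text);
            }
        }
        return kept;
    }

    /** Returns the paragraph that sits offsetFromEnd places before the end (1 = last). */
    static String fromEnd(List<String> paragraphs, int offsetFromEnd) {
        return paragraphs.get(paragraphs.size() - offsetFromEnd);
    }

    /** Removes the final character, as the service does for the last three paragraphs. */
    static String dropLastChar(String s) {
        return s.isEmpty() ? s : s.substring(0, s.length() - 1);
    }
}
```

With a made-up input such as `{"Cloudy\r", "", "25;\r", "18;\r"}`, `clean` keeps three paragraphs, and `dropLastChar(fromEnd(list, 1))` yields `18`. Reading fields by their distance from the end only works while every document keeps the same layout.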
Because the documents have to be processed on a schedule, we also need a timer (Spring ships with one). Enough talk; here is the processing code.
To keep the service method short, some of the methods live in a utility class:
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import java.util.TimeZone;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.apache.poi.hwpf.usermodel.Range;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;

public class MeteorologicalUtil {

    /** Reads a .doc range into an array with one entry per paragraph. */
    public static String[] printInfo(Range range) {
        // Get the number of paragraphs
        int paraNum = range.numParagraphs();
        String[] paragraphArr = new String[paraNum];
        for (int i = 0; i < paraNum; i++) {
            paragraphArr[i] = range.getParagraph(i).text();
        }
        return paragraphArr;
    }

    /** Reads a .docx document into an array with one entry per paragraph. */
    public static String[] printInfox(XWPFDocument xwpfDocument) {
        List<XWPFParagraph> paragraphs = xwpfDocument.getParagraphs();
        // Get the number of paragraphs
        int paraNum = paragraphs.size();
        String[] paragraphArr = new String[paraNum];
        for (int i = 0; i < paraNum; i++) {
            paragraphArr[i] = paragraphs.get(i).getParagraphText();
        }
        return paragraphArr;
    }

    /**
     * Obtains the meteorological time.
     * @param arr the document paragraphs
     * @return the first time found in the paragraphs, or null if none can be parsed
     */
    public static Date getTime(String[] arr) {
        // The literal words are quoted so SimpleDateFormat does not read them as pattern
        // letters; adapt them (and the regex below) to the wording your documents use
        SimpleDateFormat simpleDateFormat = new SimpleDateFormat("yyyy'year'MM'month'dd'day'HH:mm");
        // Parse as GMT+0 so that no time-zone offset is applied
        simpleDateFormat.setTimeZone(TimeZone.getTimeZone("GMT+0"));
        for (String ph : arr) {
            String timeStr = patternTime(ph);
            if (timeStr != null) {
                try {
                    return simpleDateFormat.parse(timeStr);
                } catch (ParseException e) {
                    e.printStackTrace();
                    return null;
                }
            }
        }
        return null;
    }

    /**
     * Removes empty paragraphs.
     * @param strArray the raw paragraphs
     * @return a new array without empty entries
     */
    public static String[] removeArrayEmptyTextBackNewArray(String[] strArray) {
        List<String> strListNew = new ArrayList<>();
        for (String str : strArray) {
            // Check for null before calling replaceAll to avoid a NullPointerException
            if (str == null) {
                continue;
            }
            // Strip backspace and carriage-return control characters
            String cleaned = str.replaceAll("\b", "").replaceAll("\r", "");
            if (!cleaned.equals("")) {
                strListNew.add(cleaned);
            }
        }
        // Convert the list back into an array
        return strListNew.toArray(new String[0]);
    }

    public static String patternTime(String content) {
        // Matches times of the form "****year**month**day**:**" (one- or two-digit
        // month, day, hour and minute). Change the regular expression to match
        // other time formats in the text.
        Pattern pattern = Pattern.compile("((([0-9]{4})year([0-9]{2}|[1-9]))month([0-9]{2}|[1-9]))day([0-9]{2}|[1-9]):([0-9]{2}|[1-9])");
        // Try to find a time of this form
        Matcher matcher = pattern.matcher(content);
        if (matcher.find()) {
            // The text contains a matching string; extract it
            return matcher.group(0);
        }
        return null;
    }
}
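To see how the regular expression and the date format work together, here is a small self-contained sketch. The class name `ForecastTimeSketch` and the sample sentence are made up for illustration, and the regex is a simplified variant of the one in the utility class, not a drop-in replacement.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original project
final class ForecastTimeSketch {

    // Simplified pattern: 4-digit year, then 1-2 digits each for month, day, hour, minute
    private static final Pattern TIME = Pattern.compile(
            "[0-9]{4}year[0-9]{1,2}month[0-9]{1,2}day[0-9]{1,2}:[0-9]{1,2}");

    /** Returns the first time found in the paragraph, or null if there is none. */
    static Date extract(String paragraph) {
        Matcher m = TIME.matcher(paragraph);
        if (!m.find()) {
            return null;
        }
        // Quoted literals keep "year", "month" and "day" from being read as pattern letters
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy'year'MM'month'dd'day'HH:mm");
        fmt.setTimeZone(TimeZone.getTimeZone("GMT+0"));
        try {
            return fmt.parse(m.group());
        } catch (ParseException e) {
            return null;
        }
    }
}
```

For example, `ForecastTimeSketch.extract("Issued at 2019year5month3day08:30")` finds the substring `2019year5month3day08:30` and parses it as 3 May 2019, 08:30 GMT, while a paragraph without a time returns `null`.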
With the processing code in place, the next step is the timer. (It is easy to use because it comes with Spring.)
import java.io.File;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Configuration;
import org.springframework.scheduling.annotation.EnableScheduling;
import org.springframework.scheduling.annotation.Scheduled;

@Configuration      // Declares this as a configuration class
@EnableScheduling   // Enables scheduled tasks
public class MeteorologicalTask {

    @Autowired
    MeteorologicalService meteorologicalService;

    // @Scheduled(cron = "0 0 1 * * ?") // Runs at 1:00 a.m. every day (look up cron expressions online if the syntax is unfamiliar)
    @Scheduled(cron = "*/5 * * * * ?") // Runs every 5 seconds
    public void work() {
        // Folder that holds the documents
        File file = new File("C:\\Users\\qps12\\Desktop\\Meteorological Bureau 2");
        // Get all files and folders under the folder
        File[] fileList = file.listFiles();
        if (fileList == null) {
            // The folder does not exist or cannot be read
            return;
        }
        for (File currentFile : fileList) {
            // Only process regular files
            if (currentFile.isFile()) {
                // Absolute path of the current file
                String path = currentFile.getAbsolutePath();
                try {
                    // Parse the document, populate the object, and save the data
                    meteorologicalService.testReadByDoc(path);
                } catch (Exception e) {
                    e.printStackTrace();
                    return;
                }
                // Delete the file once it has been processed
                currentFile.delete();
            }
        }
    }
}
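The "process each file, then delete it" loop can also be written with `java.nio.file`, which reports failures through exceptions instead of a `null` array or a `false` return value. The sketch below is an assumption-laden variant, not the code used in this post: `FolderSweep` and `FileHandler` are hypothetical names, the handler stands in for `meteorologicalService.testReadByDoc`, and a failing file is skipped rather than aborting the whole run.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper class, not part of the original project
final class FolderSweep {

    /** Hypothetical callback standing in for the service call. */
    interface FileHandler {
        void handle(Path file) throws Exception;
    }

    /** Handles every regular file in dir and deletes it on success; returns the count. */
    static int sweep(Path dir, FileHandler handler) {
        int processed = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
            for (Path file : files) {
                if (!Files.isRegularFile(file)) {
                    continue; // skip sub-folders
                }
                try {
                    handler.handle(file);
                } catch (Exception e) {
                    // Leave the file in place so a later run can retry it
                    System.err.println("Failed to process " + file + ": " + e.getMessage());
                    continue;
                }
                Files.delete(file);
                processed++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return processed;
    }
}
```

Inside the scheduled method this could be called as `FolderSweep.sweep(folder, f -> meteorologicalService.testReadByDoc(f.toString()))`, where `folder` is the `Path` of the watched directory. Keeping failed files on disk means one malformed document does not block the rest, at the cost of retrying it on every run.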
Note: This approach only works for documents that share almost exactly the same layout. For documents without a regular structure, fuzzy matching would be a better fit (I have not studied it yet, so it is not covered here, haha). For reference, here is the document I processed: