National VAT Invoice Verification Platform of the State Taxation Administration
https://inv-veri.chinatax.gov...
Recently a friend came to me with a new requirement: a crawler for invoice verification. The site has some anti-crawling measures that are very unfriendly to novices, so I spent some time on Saturday working through it.
The difficulty is moderate. Analysis shows the JS is protected with the enterprise edition of sojson. It may be the latest v6 or it may be v5, with a webdriver check added on top, because the enhanced v6 protection against headless browsers is a paid feature. Either way it makes no difference: black cat or white cat, they are all 🐱
<!--more-->
debugger
The site greets you with a debugger trap right at the start. It is not really a problem: there are already so many articles on bypassing debugger traps that I will not repeat them. This post is mainly about the analysis process, so this part is brief...
With the debugger trap out of the way, everything is clean and comfortable, and I no longer have to worry about being rudely interrupted while debugging.
Identifying the JS obfuscation version
After all, sojson = ob obfuscation + the author's own custom code.
Generally speaking, plain ob obfuscation does not include a debugger trap, and this code looks very much like sojson. That's the one. After looking at the sojson advertisement, you can be sure it is a customized version.
Here is a screenshot of deobfuscated sojson v6 for comparison: same world, same routine...
Simple js processing
In April and May I learned a little about AST processing for JS, so I used it for some simple cleanup.
Restore the strings that were extracted into method calls, restore the operators that were extracted into helper functions, and delete the leftover helpers
Undo the control-flow flattening
Then use Charles to replace the site's script with the cleaned-up one. I have written about how to use Charles before... Because I only needed to extract the relevant code, a light deobfuscation pass was enough. I am already familiar with sojson, so enough rambling...
Then refresh the page, repeat the debugger bypass, and set breakpoints on the network requests; from there you can follow the whole process.
Initialization phase
Fetch the server address, which may differ from province to province
General process
ajaxSetup is added here. I actually only found this while tracing the flow backwards, and that is how I came across the URL signature...
RSA
JSEncrypt
The only public RSA libraries for JS that I found are JSEncrypt and a Node.js version redeveloped from JSEncrypt; I have not found any other standardized RSA library yet. Both rely on the browser's built-in crypto or on the crypto implemented by Node.js. I have not found an RSA library that is independent of the host environment, and the JS engine built into Java cannot run these libraries properly. So the goal of this step is to extract the public key so that the RSA encryption can be done in Java.
Java implementation of RSA algorithm
    package cn.gov.chinatax.utils;

    import javax.crypto.Cipher;
    import java.nio.charset.StandardCharsets;
    import java.security.GeneralSecurityException;
    import java.security.KeyFactory;
    import java.security.PublicKey;
    import java.security.spec.X509EncodedKeySpec;
    import java.util.Base64;

    /**
     * @Description
     * @auther Gouzai
     * @create 2020-06-05 18:44
     */
    public class RSA {

        public static String encryp(String str, String key) {
            try {
                // Decode the Base64-encoded X.509 public key.
                // java.util.Base64 replaces sun.misc.BASE64Decoder, which no longer exists in Java 9+.
                X509EncodedKeySpec bobPubKeySpec = new X509EncodedKeySpec(Base64.getDecoder().decode(key));
                // RSA is an asymmetric (public-key) algorithm
                KeyFactory keyFactory = KeyFactory.getInstance("RSA");
                // Build the public key object
                PublicKey publicKey = keyFactory.generatePublic(bobPubKeySpec);
                // Spell out the padding instead of relying on the provider default for "RSA"
                Cipher cipher = Cipher.getInstance("RSA/ECB/PKCS1Padding");
                cipher.init(Cipher.ENCRYPT_MODE, publicKey);
                // Use an explicit charset so the result does not depend on the platform default
                byte[] bytes = cipher.doFinal(str.getBytes(StandardCharsets.UTF_8));
                return Base64.getEncoder().encodeToString(bytes);
            } catch (GeneralSecurityException | IllegalArgumentException e) {
                // Covers invalid keys, padding errors, and malformed Base64 input
                e.printStackTrace();
            }
            return null;
        }
    }
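One way to sanity-check an encryption helper like the one above without the site's private key is a round trip with a throwaway key pair. The sketch below is only an illustration, not the code used against the site: the class name `RsaRoundTrip`, the 1024-bit test key size, and the explicit `RSA/ECB/PKCS1Padding` transformation (chosen on the assumption that the page uses JSEncrypt's default PKCS#1 v1.5 padding) are all assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyFactory;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.spec.X509EncodedKeySpec;
import java.util.Base64;
import javax.crypto.Cipher;

final class RsaRoundTrip {
    // Assumption: PKCS#1 v1.5 padding, which is what JSEncrypt uses by default.
    static final String TRANSFORMATION = "RSA/ECB/PKCS1Padding";

    /** Encrypts plainText with a Base64-encoded X.509 public key and returns Base64 ciphertext. */
    static String encrypt(String plainText, String base64PublicKey) throws GeneralSecurityException {
        byte[] der = Base64.getDecoder().decode(base64PublicKey);
        PublicKey publicKey = KeyFactory.getInstance("RSA").generatePublic(new X509EncodedKeySpec(der));
        Cipher cipher = Cipher.getInstance(TRANSFORMATION);
        cipher.init(Cipher.ENCRYPT_MODE, publicKey);
        byte[] encrypted = cipher.doFinal(plainText.getBytes(StandardCharsets.UTF_8));
        return Base64.getEncoder().encodeToString(encrypted);
    }

    /** Decrypts Base64 ciphertext with the matching private key (only available for our own test key). */
    static String decrypt(String base64CipherText, PrivateKey privateKey) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance(TRANSFORMATION);
        cipher.init(Cipher.DECRYPT_MODE, privateKey);
        byte[] decrypted = cipher.doFinal(Base64.getDecoder().decode(base64CipherText));
        return new String(decrypted, StandardCharsets.UTF_8);
    }

    /** Generates a throwaway key pair, encrypts and decrypts, and returns the recovered text. */
    static String roundTrip(String plainText) {
        try {
            KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
            generator.initialize(1024); // test-only key size
            KeyPair pair = generator.generateKeyPair();
            // getEncoded() on an RSA public key yields the X.509 (SubjectPublicKeyInfo) DER form
            String publicKeyBase64 = Base64.getEncoder().encodeToString(pair.getPublic().getEncoded());
            return decrypt(encrypt(plainText, publicKeyBase64), pair.getPrivate());
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("RSA round trip failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("hello")); // prints "hello"
    }
}
```

Because PKCS#1 v1.5 padding is randomized, the ciphertext differs on every call, so comparing ciphertext byte for byte with what the browser produces will not work; checking that decryption recovers the input is the practical test.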
RSA KEY
MIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQCXY6ndiMJE7wF0qg9emVQik7FnCBidCr8V+yG/++iN/CwV0Rfe81wnjg2I23nbLJVuT63Y1T4x2etNr58BTHuzrCRy8gj3HPaS0GSGuiN7EWI1s0Bg6N78nvStPxeinyD8Qh3Bqa+5Z014nbOqn20kW4d3efLAeI7A6yc2uMPvfwIDAQAB
Get verification code
The verification code (captcha) is bound to the invoice code and invoice number
Decrypting the verification code
After decryption you get the Base64 image, the verification time, and the input type of the verification code
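Turning a Base64 image string into bytes is straightforward in Java. The following is a minimal sketch under stated assumptions: the class name `CaptchaImage` is hypothetical, the optional `data:...;base64,` prefix handling is a guess about how the string might arrive, and the PNG signature check is only an example, since the post does not say which image format the site returns.

```java
import java.util.Arrays;
import java.util.Base64;

final class CaptchaImage {
    // The fixed 8-byte signature that starts every PNG file.
    private static final byte[] PNG_SIGNATURE =
            {(byte) 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A};

    /** Decodes a Base64 image string, tolerating an optional "data:...;base64," prefix. */
    static byte[] decode(String base64Image) {
        int comma = base64Image.indexOf(',');
        String payload = (base64Image.startsWith("data:") && comma >= 0)
                ? base64Image.substring(comma + 1)
                : base64Image;
        // The MIME decoder ignores line breaks that may be embedded in the payload.
        return Base64.getMimeDecoder().decode(payload);
    }

    /** Returns true if the bytes start with the PNG signature. */
    static boolean looksLikePng(byte[] data) {
        return data.length >= PNG_SIGNATURE.length
                && Arrays.equals(Arrays.copyOf(data, PNG_SIGNATURE.length), PNG_SIGNATURE);
    }

    public static void main(String[] args) {
        // Stand-in input: just the PNG signature, encoded the way an image might be delivered.
        String sample = "data:image/png;base64," + Base64.getEncoder().encodeToString(PNG_SIGNATURE);
        System.out.println(looksLikePng(decode(sample))); // prints "true"
    }
}
```

The decoded bytes can then be written to a file or handed to whatever recognition service is in use.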
Verification code identification
The verification code here is recognized with fish guide's customized recognition method, and there is a test interface below. To make testing easier, everyone gets up to 500 recognition attempts per day, which is plenty for testing, since one invoice can only be queried five times a day...
Since the recognition rate is above 98%, query failures caused by a wrong verification code are rarely seen.
Get invoice information
There is an fplx (invoice type) parameter here, which you can find with a global search. It is in the same file as the JS that fetches the server configuration
Sign first
Assemble the concatenated code, then pass it in
Temporary storage
Dynamically generate the JS code used for the concatenation
Get initialization data
Parse data
Set to text
Display it, and it's done
That completes the whole process. A one-stop service, in short...
Results
Two types of invoices were tested: the first has no itemized list (magpie tower tea restaurant), and the second has an itemized list (Wal Mart)
Easter egg
This website mainly checks the following:
sojson's usual routine, which is passed directly...
Some libraries have been modified, such as Base64. There are several Base64 implementations in the code. Never mix them up; if you do, GG
A check on window.navigator.webdriver
A check of the product of the screen's available width and height against a threshold
Code extraction tips
No matter where the code is extracted from, there is nothing to worry about anymore
Pitfalls
Java's String.split behaves differently from split in JS...
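The post does not say which difference caused trouble, but two well-known ones are that Java's `String.split` treats its argument as a regular expression and drops trailing empty strings by default, whereas JS treats a string separator literally and keeps them. The sketch below illustrates both; the class name `SplitPitfalls` and the helper `jsLikeSplit` are hypothetical, and the helper only approximates JS behavior for a literal, non-empty separator.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

final class SplitPitfalls {
    /** Approximates JS "str.split(sep)" for a literal, non-empty separator. */
    static String[] jsLikeSplit(String input, String separator) {
        // Pattern.quote treats the separator literally; limit -1 keeps trailing empty strings.
        return input.split(Pattern.quote(separator), -1);
    }

    public static void main(String[] args) {
        // Trailing empty strings: Java drops them, JS keeps them.
        System.out.println(Arrays.toString("a,b,,".split(",")));        // [a, b]
        System.out.println(Arrays.toString(jsLikeSplit("a,b,,", ","))); // [a, b, , ]

        // Regex metacharacters: "." matches every character in Java's split.
        System.out.println("1.2.3".split(".").length);        // 0
        System.out.println(jsLikeSplit("1.2.3", ".").length); // 3
    }
}
```

When porting array-style responses from the page's JS to Java, passing a negative limit is usually the safer default, because positional fields at the end of a record may legitimately be empty.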
Are government programmers really this bored??? Everything is passed as arrays. Don't they ever plan to update it?... Updating a single field means changing the whole process...
The parameter names are actually the initials of the Chinese Pinyin, such as fplx (invoice type), fpdm (invoice code), fphm (invoice number)...
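To keep the Pinyin-initial names readable on the Java side, one option is to map descriptive arguments onto them in a single place. This is a minimal sketch under assumptions: the class `InvoiceQuery`, the form-encoded output, and the sample values are hypothetical placeholders, and `fplx` is spelled as it appears in the invoice-information step; the real request may carry more fields and a different encoding.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

final class InvoiceQuery {
    /** Builds the parameter map; keys are the Pinyin-initial names mentioned in the post. */
    static Map<String, String> params(String invoiceType, String invoiceCode, String invoiceNumber) {
        Map<String, String> params = new LinkedHashMap<>(); // LinkedHashMap keeps insertion order
        params.put("fplx", invoiceType);   // fa piao lei xing: invoice type
        params.put("fpdm", invoiceCode);   // fa piao dai ma: invoice code
        params.put("fphm", invoiceNumber); // fa piao hao ma: invoice number
        return params;
    }

    /** Joins the parameters as a URL-encoded form body. */
    static String toFormBody(Map<String, String> params) {
        return params.entrySet().stream()
                .map(e -> encode(e.getKey()) + "=" + encode(e.getValue()))
                .collect(Collectors.joining("&"));
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        // Placeholder values only; they do not describe a real invoice.
        System.out.println(toFormBody(params("01", "1234567890", "12345678")));
        // prints "fplx=01&fpdm=1234567890&fphm=12345678"
    }
}
```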