How to Break Ziroom's Anti-Crawling Mechanism

Keywords: Python github

I came across a question that asked, roughly, why the rental price cannot be extracted with XPath. The link to the question is here:

Question address

When I saw the question, I thought it would be easy to solve and that I would be done as soon as I finished writing the answer. I was too young, too simple. The site being crawled is Ziroom, a rental site.

My answer to the question was as follows:

When I saw your question, I planned to simply explain XPath, but I had underestimated it. It appears that Ziroom displays the price with a sprite image as an anti-crawling measure and, most importantly, the order of the digits in the sprite is random: every refresh serves a different image.

What is a sprite map

Simply put, a sprite map combines many small images into one large image, and each element then displays only one part of it by shifting the image's position. I won't go into the usual benefits of sprites here; Ziroom uses a sprite map purely as an anti-crawling measure.

To see how the sprite works, let's look at the sprite image Ziroom uses to display prices:

All the digits are combined into a single image.

Price display

So how is the price displayed, and what does the front-end code look like?
The HTML is as follows:

<p value="" class="price">
  <span style="background-position:1000px" class="num rmb">¥</span>
  <span style="background-position:-240px" class="num"></span>
  <span style="background-position:-210px" class="num"></span>
  <span style="background-position:-150px" class="num"></span>
  <span style="background-position:-210px" class="num"></span>
  <span class="gray-6"> (monthly)</span>
</p>

The inline CSS property background-position shifts the image so that each span shows a different digit.

But where is the sprite itself? Nothing here sets the image. For that, look at the CSS:
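As a first step, the offsets can be pulled out of the markup. The following is a minimal sketch using only the Python standard library; the class name OffsetCollector and the variable PRICE_HTML are made up for illustration, and the HTML is the snippet shown above.

```python
import re
from html.parser import HTMLParser

PRICE_HTML = """
<p value="" class="price">
  <span style="background-position:1000px" class="num rmb">¥</span>
  <span style="background-position:-240px" class="num"></span>
  <span style="background-position:-210px" class="num"></span>
  <span style="background-position:-150px" class="num"></span>
  <span style="background-position:-210px" class="num"></span>
  <span class="gray-6"> (monthly)</span>
</p>
"""


class OffsetCollector(HTMLParser):
    """Collect background-position offsets (in px) from the digit spans."""

    def __init__(self):
        super().__init__()
        self.offsets = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Only spans whose class is exactly "num" carry a digit;
        # the "num rmb" span is the currency symbol.
        if tag != "span" or attrs.get("class") != "num":
            return
        match = re.search(r"background-position:\s*(-?\d+)px", attrs.get("style") or "")
        if match:
            self.offsets.append(int(match.group(1)))


collector = OffsetCollector()
collector.feed(PRICE_HTML)
print(collector.offsets)  # [-240, -210, -150, -210]
```

The offsets alone say nothing about the price yet; they only become digits once the order of the digits in the sprite is known.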

body.ratio2 .price span.num {
    background-size: auto 30px;
    background-image: url(//static8.ziroom.com/phoenix/pc/images/price/e05092a2f84c9cca5e4d881535072ae1.png);
}

background-image sets the background image to display. We can extract the URL and prefix it with http: as follows:

http://static8.ziroom.com/pho...
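Extracting the URL can be automated with a regular expression. This is only a sketch: the function name sprite_url is hypothetical, the CSS is the rule quoted above, and a real page may contain several background-image rules that would need a more specific match.

```python
import re

CSS = """
body.ratio2 .price span.num {
    background-size: auto 30px;
    background-image: url(//static8.ziroom.com/phoenix/pc/images/price/e05092a2f84c9cca5e4d881535072ae1.png);
}
"""


def sprite_url(css, scheme="http"):
    """Return the first background-image URL in css, or None if there is none."""
    match = re.search(r"background-image:\s*url\(([^)]+)\)", css)
    if match is None:
        return None
    # Drop optional quotes and whitespace around the URL.
    url = match.group(1).strip(" '\"")
    # Protocol-relative URLs ("//host/path") need an explicit scheme.
    if url.startswith("//"):
        url = f"{scheme}:{url}"
    return url


print(sprite_url(CSS))
```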

If you visit that address, you will get an image similar to the one shown at the beginning:

Note: I don't know whether these images are cleaned up regularly. If the image no longer opens when you read this answer, you can go to the Ziroom website and find a current one.

So with this image, how is the price displayed? This is where the inline CSS in the HTML comes in. Look again at the HTML that displays the price:

<p value="" class="price">
  <span style="background-position:1000px" class="num rmb">¥</span>
  <span style="background-position:-240px" class="num"></span>
  <span style="background-position:-210px" class="num"></span>
  <span style="background-position:-150px" class="num"></span>
  <span style="background-position:-210px" class="num"></span>
  <span class="gray-6"> (monthly)</span>
</p>

First, what does the page show for the code above? This:

The price displayed is 2090. Comparing the order of the digits in the sprite, the background-position values in the HTML, and the 30px display size set in the CSS, you can work out the relationship between background-position and the displayed digit:

0px       1
-30px     7
-60px     4
-90px     3
-120px    5
-150px    9
-180px    8
-210px    0
-240px    2
-270px    6

Code implementation

If the sprite were fixed, we could write code like the following:

position_text_map = {
    "background-position:0px": 1,
    "background-position:-30px": 7,
    "background-position:-60px": 4,
    "background-position:-90px": 3,
    "background-position:-120px": 5,
    "background-position:-150px": 9,
    "background-position:-180px": 8,
    "background-position:-210px": 0,
    "background-position:-240px": 2,
    "background-position:-270px": 6
}

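# price_selector is assumed to be the lxml element for <p class="price">.
price = 0
# Relative path; class must equal "num", which skips the "num rmb" currency span.
for span_selector in price_selector.xpath("./span[@class='num']"):
    # Style attribute of the span itself, e.g. "background-position:-240px".
    position = span_selector.xpath("./@style")[0]
    price = price * 10 + position_text_map[position]
print(price)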

This yields the final price.

Random order

Unfortunately, it's not that simple. The sprite is randomly generated each time, so only the website knows which digit each position corresponds to; we don't.

So is it unsolvable? Of course not. This is where OCR (optical character recognition) comes in. Fortunately, the price has to be readable by human eyes, so the digits are not distorted the way CAPTCHA text is. We can find ready-made solutions on GitHub.

For example, Tesseract provides a complete solution for recognizing text in images. Its GitHub repository is:

tesseract-ocr/tesseract

There is also a corresponding Python package:

sirfz/tesserocr

Once we have recognized the order of the digits in the sprite, the rest is easy.
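One possible way to wire this together with tesserocr is sketched below. It assumes tesserocr, Pillow and the Tesseract language data are installed and that the sprite has already been downloaded to a local file; the function names read_sprite_digits and build_position_map are invented for illustration. Recognition quality is not guaranteed, and a sprite with a transparent background may need preprocessing first.

```python
import re


def read_sprite_digits(image_path):
    """OCR the sprite image and return its digits, left to right, as a string."""
    # Third-party imports are kept local so the rest of the module
    # can be used without Tesseract installed.
    import tesserocr
    from PIL import Image

    # The sprite is a single row of digits, so treat it as one text line.
    with tesserocr.PyTessBaseAPI(psm=tesserocr.PSM.SINGLE_LINE) as api:
        api.SetVariable("tessedit_char_whitelist", "0123456789")
        api.SetImage(Image.open(image_path))
        text = api.GetUTF8Text()
    # Drop whitespace and anything else that is not a digit.
    return re.sub(r"\D", "", text)


def build_position_map(sprite_digits, step=30):
    """Map each background-position offset (in px) to the digit shown there."""
    # A valid sprite contains every digit exactly once; anything else
    # means the OCR result cannot be trusted.
    if sorted(sprite_digits) != list("0123456789"):
        raise ValueError(f"unexpected OCR result: {sprite_digits!r}")
    return {-index * step: int(digit) for index, digit in enumerate(sprite_digits)}


# With the digit order from the example sprite in this post:
position_map = build_position_map("1743598026")
print(position_map[-240], position_map[-210], position_map[-150])  # 2 0 9
```

Because the sprite changes on every refresh, the map has to be rebuilt from the sprite that belongs to the same page response as the price spans.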

Summary

In short, each digit of the price is displayed as part of an image. The key technique is the sprite map: the actual text is converted into CSS offsets, which works somewhat like encryption, and that is how the anti-crawling measure is achieved.

Posted by p-co on Tue, 06 Aug 2019 20:11:50 -0700