Experience Notes (Hands-on) => Python Version - A Simple Example of Using Selenium with Docker

Keywords: Linux Selenium Docker Python pip

The Dockerfile is as follows:

FROM python
RUN pip install -i http://pypi.douban.com/simple \
    requests selenium retrying --trusted-host pypi.douban.com
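
As a quick sanity check after building the image, something like the following (a hypothetical one-off, not from the original post) can be run inside the container to confirm that all three packages installed:

# Run inside the built container (the file name check.py is hypothetical).
import requests
import selenium
import retrying  # just confirming it imports
print(requests.__version__, selenium.__version__)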

The docker-compose.yaml is as follows:

version: "3.7"
services:
  myspider:
    build: .
    volumes:  # Data Volume Mapping
      - /root/mycode:/root/mycode
    command: python /root/mycode/1.py
    # depends_on below only controls startup order: the selenium container is started
    # first and myspider after it. It does not wait for the program inside a container
    # to be ready (some programs start fast, some slow), so it cannot express a true
    # readiness dependency; handle that with a delay, a retry, or similar.
    depends_on:
      - selenium
  selenium:
    image: selenium/standalone-chrome # Pull this ready-made image; it comes fully configured
    ports:
      - "4444:4444"
    shm_size: 2g  # Set the container's shared memory to 2g (Chrome needs more than Docker's small default)
    hostname: selenium    # Other containers can reach this service by this name, e.g. http://selenium:4444/
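
Because depends_on only orders container startup, a more reliable readiness check than a fixed delay is to poll Selenium's status endpoint until it reports ready. A minimal sketch (assuming the image exposes the standard /wd/hub/status endpoint and the selenium hostname from the compose file above):

import time
import requests

def wait_for_selenium(url="http://selenium:4444/wd/hub/status", timeout=30):
    # Poll the status endpoint until it reports ready, or raise after `timeout` seconds.
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            payload = requests.get(url, timeout=0.5).json()
            if payload.get("value", {}).get("ready"):
                return
        except requests.exceptions.RequestException:
            pass  # service not up yet; keep polling
        time.sleep(0.5)
    raise TimeoutError("Selenium did not become ready in time")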

The crawler script 1.py is as follows:

import requests
from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
from retrying import retry

# Note: as explained at depends_on in docker-compose.yaml above,
# depends_on only orders container startup;
# it cannot guarantee the program inside a service is ready (when startup speeds are close, which finishes first is luck).

# The startup sequence can be handled by adding a fixed delay:
# import time
# time.sleep(3)    # A fixed sleep is always a bit off (too long or too short); the retrying approach below replaces it.

# Alternatively, decorate the check with the retrying module.
# For retrying usage, see https://segmentfault.com/a/1190000019301761#articleHeader17
@retry(
    stop_max_attempt_number=10000,  # give up after 10000 attempts...
    stop_max_delay=10 * 1000,       # ...or after 10 seconds total (milliseconds)
)
def verify_request():
    # Probe the Selenium service; any exception triggers a retry until it responds.
    response = requests.get("http://selenium:4444", timeout=0.5)
    print(response)

verify_request()
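
# A hypothetical refinement (not in the original post): by default @retry retries on
# ANY exception; passing retry_on_exception limits retries to connection errors, so a
# real bug in the code fails immediately instead of being retried 10000 times.
# @retry(
#     stop_max_attempt_number=10000,
#     stop_max_delay=10 * 1000,
#     retry_on_exception=lambda exc: isinstance(exc, requests.exceptions.ConnectionError),
# )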

# The following is the standard boilerplate for connecting to a Dockerized Selenium service; it can be reused as a template.
with webdriver.Remote(
    command_executor='http://selenium:4444/wd/hub',  # "selenium" is the hostname set in docker-compose.yaml
    desired_capabilities=DesiredCapabilities.CHROME
) as driver:
    driver.get('http://www.baidu.com')
    # Use an absolute path here, otherwise the file will not land in the mapped data volume.
    # The mapping is defined in the volumes section of docker-compose.yaml above.
    with open('/root/mycode/test.html', 'w') as f:    
        f.write(driver.page_source)
        print('Written successfully')
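
For reference, newer Selenium client versions also accept an options object instead of desired_capabilities (which Selenium 4 later removed). A minimal sketch of the same template in that style, assuming a client version that supports the options parameter:

from selenium import webdriver

# Same connection template, using ChromeOptions instead of DesiredCapabilities.
with webdriver.Remote(
    command_executor='http://selenium:4444/wd/hub',
    options=webdriver.ChromeOptions(),
) as driver:
    driver.get('http://www.baidu.com')
    print(driver.title)  # quick check that the session works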

Pitfalls

Selenium runs a server-side program, so it can be deployed "in a Docker container on a cloud server".
After the container was deployed...
it was only accessible from the cloud server itself,
not from remote machines.
(
    There was actually no need for remote access; a stray idea sent me on a detour... I just wanted to try remote access.
    The client code is also deployed in a container, and container-to-container access works completely fine.
    But if you do try remote access, it really does not work.
)

(
    My reasoning: since the cloud server host can reach the server-side program running inside the container,
    but remote machines cannot... it must be a connection configuration problem between container and host.
    Following that train of thought I searched for a long time, and came up with nothing.
)
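
Incidentally, a quick way to tell whether the published port is reachable from a remote machine at all (before blaming Selenium) is a raw TCP connect test. A minimal sketch, with the server address as a placeholder:

import socket

def port_open(host, port, timeout=3):
    # Return True if a TCP connection to host:port succeeds within the timeout.
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

print(port_open("your-cloud-server-ip", 4444))  # hypothetical placeholder address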

With no other option, I got on the "scientific Internet" (a firewall bypass) to search for solutions to this problem.
Later, I discovered by accident that the client could suddenly connect.
Further testing confirmed it: wow, remote access to this server really does require the firewall bypass...

But I still do not understand why my cloud server host can reach the container's server side without any firewall bypass.
(Although this doubt is beside the point.)

Posted by DarkJamie on Wed, 14 Aug 2019 00:39:35 -0700