Problem
I run janeczku/calibre-web in a k8s cluster using the linuxserver/docker-calibre-web image. After upgrading to the latest image on 2021-10-11, the web page frequently became unresponsive: the browser tab kept spinning, a timeout was reported only after a long wait, and from then on no request could complete normally.
After many attempts, I found that the problem can be reproduced from the front end by editing a book, fetching its metadata, and clicking save.
Check
- First, use kubectl port-forward calibre-6c5c84fd4f-z2mgb 8083:12345 to access the calibre pod directly, bypassing other network components in the cluster such as traefik, to rule out the cluster itself. The problem persists, so a cluster networking issue is tentatively ruled out.
- Next, check the calibre-web log. After some searching, it turns out that the project does not write its log to stdout but to a calibre-web.log file in the project directory. In the container, that is /app/calibre-web/calibre-web.log.
- The log file contains many repeated blocks like the one below. The initial suspicion: editing a book's metadata triggers a request to Google Scholar, which returns status code 403, and the retry logic that handles this response is faulty, so it retries forever and blocks the main thread of the web server.

    INFO {scholarly:116} Got an access denied error (403).
    [2021-10-13 10:10:29,801] INFO {scholarly:118} No other connections possible.
    [2021-10-13 10:10:29,801] INFO {scholarly:124} Will retry after 80.11549845444705 seconds (with another session).
    [2021-10-13 10:11:54,220] INFO {scholarly:105} Session proxy config is {}
- Next, look at the janeczku/calibre-web source: if the scholarly library has been installed with pip, the code queries Google Scholar; otherwise it skips the request. The project keeps some additional pip dependencies in a separate requirements file, optional-requirements.txt.

    # Improve this to check if scholarly is available in a global way, like other pythonic libraries
    try:
        from scholarly import scholarly
        have_scholar = True
    except ImportError:
        have_scholar = False

    @editbook.route("/scholarsearch/<query>",methods=['GET'])
    @login_required_if_no_ano
    @edit_required
    def scholar_search(query):
        if have_scholar:
            scholar_gen = scholarly.search_pubs(' '.join(query.split('+')))
            i=0
            result = []
            for publication in scholar_gen:
                del publication['source']
                result.append(publication)
                i+=1
                if(i>=10):
                    break
            return Response(json.dumps(result),mimetype='application/json')
        else:
            return "[]"
- From the Dockerfile in linuxserver/docker-calibre-web, you can see that optional-requirements.txt is indeed installed by default.
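The repeated retry blocks in calibre-web.log can be tallied to see how long the server has spent sleeping between 403 responses. The following is a minimal sketch, assuming the log uses the "Will retry after N seconds" wording shown in the excerpt; the function name summarize_retries is hypothetical and not part of calibre-web or scholarly.

```python
import re
from typing import Tuple

# Matches the retry message from the log excerpt and captures the wait in seconds.
RETRY_RE = re.compile(r"Will retry after (\d+(?:\.\d+)?) seconds")


def summarize_retries(log_text: str) -> Tuple[int, float]:
    """Return (number of retry waits, total seconds slept) found in log_text."""
    waits = [float(value) for value in RETRY_RE.findall(log_text)]
    return len(waits), sum(waits)


if __name__ == "__main__":
    # Sample lines copied from the log excerpt; in the container the text would
    # come from /app/calibre-web/calibre-web.log instead.
    sample = (
        "INFO {scholarly:116} Got an access denied error (403).\n"
        "[2021-10-13 10:10:29,801] INFO {scholarly:124} "
        "Will retry after 80.11549845444705 seconds (with another session).\n"
    )
    count, total = summarize_retries(sample)
    print(f"{count} retry wait(s), {total:.1f} seconds in total")
```

A steadily growing count while a single request is pending would be consistent with the endless-retry suspicion.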
Local reproduction
- First, start calibre-web locally with only the dependencies in requirements.txt installed. In this setup, book metadata can be edited normally.
- After running pip install -r optional-requirements.txt, editing book metadata reproduces the problem.
- The problem in this article is closely tied to the proxy situation on the calibre-web server side. In summary:
  - If scholarly is not installed, the problem does not occur, regardless of whether Google Scholar is reachable.
  - If scholarly is installed and Google Scholar is reachable normally, there is no problem.
  - If scholarly is installed and Google Scholar returns 403, the problem described in this article occurs.
  - If scholarly is installed and the server is in China without a proxy, testing shows that scholarly throws MaxTriesExceededException after the maximum number of retries. The server is blocked until the exception is thrown, then returns to normal.
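Which of the cases above applies depends first on whether scholarly is importable in the environment that runs calibre-web. The sketch below shows one possible way to check that without importing the package, using the standard library's importlib.util.find_spec. The helper name module_available is hypothetical, and this is not how calibre-web itself performs the check.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # True only when scholarly is installed in the current interpreter's environment.
    print("scholarly installed:", module_available("scholarly"))
```

Running this inside the container (or the local virtual environment) tells you whether the Google Scholar code path is active at all.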
Trace the source
Install scholarly, which provides an API for accessing Google Scholar:

    pip install scholarly

    ❯ pip show scholarly
    Name: scholarly
    Version: 1.2.2
The problematic code is excerpted below. You can see that when the request to Google returns a 403 status code, the loop never exits.
    # scholarly/_navigator.py
    if resp.status_code == 200 and not has_captcha:
        return resp.text
    elif has_captcha:
        self.logger.info("Got a captcha request.")
        self._session = self.pm._handle_captcha2(pagerequest)
        continue # Retry request within same session
    elif resp.status_code == 403:
        self.logger.info(f"Got an access denied error (403).")
        if not self.pm.has_proxy():
            self.logger.info("No other connections possible.")
            if not self.got_403:
                self.logger.info("Retrying immediately with another session.")
            else:
                if not self.pm._use_luminati:
                    w = random.uniform(60, 2*60)
                    self.logger.info("Will retry after {} seconds (with another session).".format(w))
                    time.sleep(w)
            self._new_session()
            self.got_403 = True
            continue # Retry request within same session
        else:
            self.logger.info("We can use another connection... let's try that.")
    else:
        self.logger.info(f"""Response code {resp.status_code}. Retrying...""")
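The effect of that continue can be illustrated with a simplified stand-in for the retry loop. This is a sketch under assumptions, not scholarly's actual code: the excerpt does not show where the retry counter is incremented, so the sketch assumes it happens at the end of the loop body, which continue skips. The names fetch_with_retries, get_status, count_403, and max_iterations are all hypothetical, and max_iterations exists only so the demonstration terminates.

```python
def fetch_with_retries(get_status, max_retries=5, count_403=False, max_iterations=50):
    """Simplified retry loop; returns (outcome, number of requests made)."""
    tries = 0
    iterations = 0
    while tries < max_retries:
        if iterations >= max_iterations:
            # Safety cap for the demo only; the real loop has no such exit.
            return "stuck", iterations
        iterations += 1
        status = get_status()
        if status == 200:
            return "ok", iterations
        if status == 403 and not count_403:
            # Mirrors the excerpt: jump back to the top without touching `tries`
            # (assumption: the counter is incremented below, after this branch).
            continue
        tries += 1
    return "gave up", iterations


if __name__ == "__main__":
    always_403 = lambda: 403
    print(fetch_with_retries(always_403))                  # never reaches max_retries
    print(fetch_with_retries(always_403, count_403=True))  # stops after max_retries
```

With count_403=False the counter never advances, so only the artificial cap ends the loop. With count_403=True each 403 counts as a try and the loop gives up after max_retries attempts, which matches the bounded behavior described for the no-proxy case.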
Temporary workaround
docker pull lwabish/calibre-web:china
The quicker fix is to fork linuxserver/docker-calibre-web and add pip uninstall scholarly -y to the Dockerfile, which drops Google Scholar support.
The newly built image has been uploaded to Docker Hub as lwabish/calibre-web.