Developing a natural language processing web server based on gensim

Keywords: sudo Nginx Python pip

1. Notes on Natural Language Processing

1. Differences between generators and lists

A generator produces its values one at a time through the next() mechanism and never builds the whole sequence in memory, whereas a list must allocate storage for every element up front.
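As a rough illustration of that difference, the sketch below builds the same sequence both ways and compares the size of the two objects. The function names squares_list and squares_gen and the value of n are made up for the example.

```python
import sys

def squares_list(n):
    """Build every value up front and keep them all in memory."""
    return [i * i for i in range(n)]

def squares_gen(n):
    """Yield one value at a time; nothing is stored between calls."""
    for i in range(n):
        yield i * i

n = 100000
as_list = squares_list(n)
as_gen = squares_gen(n)

# The list object grows with n; the generator object stays small.
list_bytes = sys.getsizeof(as_list)
gen_bytes = sys.getsizeof(as_gen)
print(list_bytes > gen_bytes)

# Values come out of the generator only when next() asks for them.
first = next(as_gen)
second = next(as_gen)
print(first, second)

# Both approaches produce the same sequence overall.
same = sum(squares_gen(n)) == sum(as_list)
print(same)
```

For a large corpus this is why reading sentences through a generator is usually preferred over loading them all into a list.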

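2. The meaning of the u, r, and b prefixes before a string

u indicates that the string is Unicode (in Python 3 every str is already Unicode, so the prefix is optional).

r indicates a raw string in which backslash escapes are not processed. For example, print('\n') moves the cursor to the next line, but print(r'\n') prints \n literally.

b indicates a bytes literal. In Python 2.x this is the same type as str; in Python 3 it is a separate bytes type.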
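A small sketch of the three prefixes under Python 3. The sample text is two Chinese characters written as escapes so the file itself stays ASCII; the variable names are only for illustration.

```python
# Raw strings keep backslashes literally instead of treating them as escapes.
escaped = '\n'      # one character: a newline
raw = r'\n'         # two characters: a backslash and the letter n
print(len(escaped), len(raw))

# In Python 3, u'...' is the same type as a plain string literal.
text = u'\u4e2d\u6587'
print(type(text) is str)

# b'...' is a bytes object; encode/decode converts between str and bytes.
data = text.encode('utf-8')
print(type(data) is bytes, len(text), len(data))
print(data.decode('utf-8') == text)
```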

3. Processing steps

1. Download the original Wikipedia dump

2. Parse the dump with wiki process.py to extract the Traditional Chinese text

3. Use opencc to convert the text into Simplified Chinese

4. Use jieba for word segmentation and part-of-speech tagging

5. Train the model with gensim

6. Interface Development

4. File upload

Select a corpus file to upload:

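# View function code fragment
from werkzeug.utils import secure_filename
file = request.files['file']
# save() needs a destination path; saving under the sanitized upload name is an assumption
file.save(secure_filename(file.filename))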

5. File Download

#https://www.jianshu.com/p/8daa3d011cfd
from flask import send_file, send_from_directory
import os
@app.route("/download/<filename>", methods=['GET'])
def download_file(filename):
    # You need to know two parameters, the first is the path to the local directory and the second is the file name (with extension)
    directory = os.getcwd()  # Assume in the current directory
    return send_from_directory(directory, filename, as_attachment=True)

6. HTTP 413

Caused by uploading a file that is too large.

Add client_max_body_size 200m; to the nginx configuration.

7. yield

Used to construct generators. yield returns the value of the expression to its right, much like return, but the function is only paused: on the next call to next(), execution continues with the statement after the yield.

def foo(num):
    print("starting...")
    while num<10:
        num=num+1
        yield num
for n in foo(0):  # the for loop calls next() implicitly
    print(n)
------------------------------------------
starting...
1
2
3
4
5
6
7
8
9
10
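
The for loop hides the calls to next(). As a sketch of what happens underneath, the same kind of generator can be driven by hand. The name count_up and the limit parameter are made up for this example.

```python
def count_up(num, limit=3):
    print("starting...")       # runs only when the first value is requested
    while num < limit:
        num = num + 1
        yield num              # pause here and hand num to the caller

gen = count_up(0)              # nothing is printed yet: the body has not started
first = next(gen)              # prints "starting...", runs to the first yield
second = next(gen)             # resumes after the yield, loops once more
third = next(gen)

try:
    next(gen)                  # the loop condition fails, the function returns
    exhausted = False
except StopIteration:
    exhausted = True           # a for loop catches this and stops quietly

print(first, second, third, exhausted)
```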

8. Navigation bar label selection effect

1. Add an id to each <li> in the navigation bar.

2. Mark the matching item as active when the page loads:

	$(document).ready(function(){
		var href = window.location.href;
		var id = href.substring(href.lastIndexOf('/') + 1);
		// If id is empty, $("#") makes jQuery throw a syntax error, so guard against it.
		if (id) {
			$("#" + id).addClass("active");
		}
	});

Server Deployment

1. First (gunicorn+flask+Nginx)

https://blog.csdn.net/qq_36114862/article/details/81380956 

	1) Install gunicorn

		pip install gunicorn

	2) Start gunicorn

		gunicorn --workers=3 main:app -b 127.0.0.1:5000

	3) Install Nginx

		sudo apt install nginx

	4) Start Nginx

		sudo /etc/init.d/nginx start  # usually starts automatically

	5) Check the Nginx configuration file

		sudo nginx -t

	6) Edit the Nginx configuration file

		sudo vim /etc/nginx/sites-available/default

		server {
			listen 80;
			server_name _; # External address
			location / {
				proxy_pass http://127.0.0.1:5000; # Same IP and port that gunicorn is bound to
				proxy_redirect off;
				proxy_set_header Host $http_host;
				proxy_set_header X-Real-IP $remote_addr;
				proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
				proxy_set_header X-Forwarded-Proto $scheme;
			}
		}

	7) Restart Nginx

		sudo service nginx restart

Appendix: Collection of issues encountered in deployment

1. Adjust Python Priority

sudo update-alternatives --install /usr/bin/python python /usr/bin/python2.7 10
sudo update-alternatives --install /usr/bin/python python /usr/bin/python3.7 20

2. virtualenv usage (important):

1. First install virtualenv; using pip under the default Python 2 is fine:

		[sudo] pip install virtualenv

2. Create virtual environments:

		virtualenv -p /usr/bin/python3 py3env

3. Activate the virtual environment:

		source py3env/bin/activate

You'll notice that the shell prompt now starts with (py3env), so you can safely develop with Python 3. Try installing a third-party library first:

		pip install httplib2

That's it!
If you want to exit the Python 3 virtual environment, enter the command: deactivate

The main point to remember is step 2, creating the virtual environment: it creates a Python 3 environment directly instead of a Python 2.x one.
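
To confirm from inside Python that the environment is really active, a quick check like the following can help. This is a sketch that relies only on attributes of the sys module; the helper name in_virtual_env is made up.

```python
import sys

def in_virtual_env():
    """Return True when the interpreter is running inside a virtual environment."""
    # Older virtualenv releases set sys.real_prefix; venv and newer virtualenv
    # make sys.prefix differ from sys.base_prefix.
    if hasattr(sys, "real_prefix"):
        return True
    return getattr(sys, "base_prefix", sys.prefix) != sys.prefix

print("python version:", sys.version_info[:3])
print("interpreter:", sys.executable)
print("inside a virtual environment:", in_virtual_env())
```

Run it once before and once after source py3env/bin/activate; the interpreter path and the last line should change.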

3. SyntaxError: Non-ASCII character '\xe6' in file test.py on line 1, but no encoding declared

Solution: add the following line at the top of the .py file:

# -*- coding: utf-8 -*-

4. No module named distutils.spawn (virtualenv with python3)

Solution: sudo apt-get install python3-distutils

5. pip install gensim fails with a read timeout

Solution: switch to a mirror source:

pip install -i https://pypi.mirrors.ustc.edu.cn/simple gensim

6. Undo in vim

Press u in normal (command) mode

7. Restart Nginx

sudo service nginx restart

8. HTTP 502

There are many possible causes; the one we have found so far is:

1. The port in the nginx configuration (proxy_pass) does not match the port gunicorn is listening on
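
One quick way to narrow down a port mismatch is to test whether anything accepts connections on the address nginx proxies to. The sketch below uses only the standard socket module; the helper name is_listening is made up, and 127.0.0.1:5000 is the gunicorn bind address used earlier in this post.

```python
import socket

def is_listening(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    # Should print True while gunicorn is running on the expected port.
    print("gunicorn reachable:", is_listening("127.0.0.1", 5000))
```

If this prints False while gunicorn is running, compare the -b value passed to gunicorn with the proxy_pass line in the nginx configuration.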

9. requirements.txt

1. Generate

	pip freeze > requirements.txt

2. Install

	pip install -r requirements.txt

10. The /upload request does not recognize the uploaded file

Add enctype='multipart/form-data' to the <form> element
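
Without enctype, a browser sends the form as application/x-www-form-urlencoded, which carries only the file name and not the file itself, so the server finds no uploaded file. The sketch below builds both request bodies by hand with the standard library to show the difference. The field name file matches the view code earlier in this post, while corpus.txt, its content, and the boundary string are made up for illustration.

```python
from urllib.parse import urlencode

filename = "corpus.txt"                 # hypothetical file name
content = b"hello corpus\n"             # hypothetical file content
boundary = "----sketchboundary"         # hypothetical boundary string

# Default form encoding: only the file name travels, never the bytes.
urlencoded_body = urlencode({"file": filename}).encode("ascii")

# multipart/form-data: each field is a part with its own headers and raw bytes.
multipart_body = b"\r\n".join([
    ("--" + boundary).encode("ascii"),
    ('Content-Disposition: form-data; name="file"; filename="%s"' % filename).encode("ascii"),
    b"Content-Type: text/plain",
    b"",
    content,
    ("--" + boundary + "--").encode("ascii"),
    b"",
])

print(urlencoded_body)
print(content in urlencoded_body, content in multipart_body)
```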

11. Add a logging module

1. Create the directory structure

2. Create the log configuration file

3. The following code creates the logger object

import logging.config
logging.config.fileConfig("./app/config/logger.ini")
logger = logging.getLogger("main")
logger.info('first log')
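
The contents of logger.ini are not shown in this post. As a sketch only, a minimal configuration in the format that logging.config.fileConfig expects could look like the one below; the handler and formatter names are assumptions, and the file is written to a temporary directory so the example runs anywhere.

```python
import logging
import logging.config
import os
import tempfile

# Hypothetical minimal config; the real logger.ini is not shown in the post.
LOGGER_INI = """
[loggers]
keys=root,main

[handlers]
keys=console

[formatters]
keys=simple

[logger_root]
level=WARNING
handlers=console

[logger_main]
level=INFO
handlers=console
qualname=main
propagate=0

[handler_console]
class=StreamHandler
level=INFO
formatter=simple
args=(sys.stdout,)

[formatter_simple]
format=%(asctime)s %(name)s %(levelname)s %(message)s
"""

# Write the config to a temporary file, then load it the same way as above.
config_dir = tempfile.mkdtemp()
config_path = os.path.join(config_dir, "logger.ini")
with open(config_path, "w") as f:
    f.write(LOGGER_INI)

logging.config.fileConfig(config_path)
logger = logging.getLogger("main")
logger.info("first log")
```

The qualname value is what ties the [logger_main] section to logging.getLogger("main").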

11. No model file was generated after file upload

The Python version that comes with the server is 3.5, while the development environment uses 3.7.2, so install 3.7.2 on the server.

1. Download Python

	wget https://www.python.org/ftp/python/3.7.2/Python-3.7.2.tgz

2. Unpack it: tar -zxvf Python-3.7.2.tgz

3. Build and install from the unpacked source directory: ./configure && sudo make && sudo make install

4. The following issues were encountered while installing Python 3.7.2

1. zipimport.ZipImportError: can't decompress data; zlib not available

	sudo apt install zlib1g-dev
	sudo apt install zlib1g

	After running the commands above, issue 13 appeared.

2. ModuleNotFoundError: No module named '_ctypes'

	sudo apt-get install libffi-dev

3. No module named _ssl

	1. vi /root/Python-3.7.1/Modules/Setup.dist

	2. Uncomment the four lines starting at:

		Socket module helper for socket(2)
		_socket socketmodule.c timemodule.c

	3. sudo apt-get install libssl-dev (you can try skipping straight to steps 3 and 4)

	4. Recompile and install:

		./configure --prefix=/usr/local/python37
		make
		make install

4. No module named '_bz2'

			sudo apt-get install libbz2-dev
			cd python3.7
			./configure
			make
			make install
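
After rebuilding, it can save time to check in one go whether the extension modules from the errors above are now importable. The following is a sketch using only importlib; the helper name check_modules is made up. Run it with the newly built interpreter.

```python
import importlib

# Standard-library modules that were reported missing in the errors above.
OPTIONAL_MODULES = ["zlib", "_ctypes", "_ssl", "_bz2"]

def check_modules(names):
    """Try to import each module and report whether it is available."""
    status = {}
    for name in names:
        try:
            importlib.import_module(name)
            status[name] = True
        except ImportError:
            status[name] = False
    return status

for name, ok in check_modules(OPTIONAL_MODULES).items():
    print(name, "ok" if ok else "MISSING")
```

Any module reported as MISSING means the matching -dev package was not installed before running ./configure and make.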

12. View memory

	free -m

13. Word2Vec error: execution hangs for a while and is then killed

14. Install gensim to replace the mirror source

pip install -i https://pypi.mirrors.ustc.edu.cn/simple gensim

15. Running gensim warns that smart_open is deprecated

Run pip uninstall -r requirements.txt to remove all packages, then reinstall them in the working environment.

Posted by Pieman86 on Sun, 21 Jul 2019 19:44:31 -0700