In an operating system, a process is the smallest unit of resource allocation, and a thread is the smallest unit of CPU scheduling.
Coroutine: concurrency within a single thread, also known as a micro-thread or fiber. In short, a coroutine is a lightweight user-space thread: it is controlled and scheduled by the user program itself rather than by the operating system. In other words, the programmer's code decides when to switch.
Reference: http://www.cnblogs.com/Eva-J/articles/8324673.html

    # Processes: the operating system is responsible for scheduling them
    # Threads: a process can start multiple threads; the smallest unit the CPU executes is actually the thread
    #   Starting a thread: creating a thread allocates its registers and stack
    #   Closing a thread
    # Coroutines
    #   Essentially run inside one thread
    #   Can switch between multiple tasks to save some I/O time
    #   Switching between coroutine tasks also takes time, but far less than switching between processes or threads
    #   A means of achieving concurrency
    import time

    def consumer():
        while True:
            x = yield            # pause here until the producer sends a value
            time.sleep(1)
            print('Processing data :', x)

    def producer():
        c = consumer()
        next(c)                  # advance the generator to its first yield
        for i in range(10):
            time.sleep(1)
            print('production data:', i)
            c.send(i)            # switch to the consumer and hand it i

    # This producer-consumer model simulates a program switching back and forth
    # between tasks, but it cannot avoid the I/O wait time
    producer()

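To make "the user program controls the scheduling" concrete, here is a minimal sketch of a round-robin scheduler built from the same yield mechanism. The names task, run_round_robin and log are hypothetical helpers for illustration only; they are not part of greenlet or gevent.

```python
from collections import deque


def task(name, steps, log):
    """A generator-based task: do one step of work, then give up control."""
    for i in range(steps):
        log.append(f"{name} step {i}")
        yield  # hand control back to the scheduler


def run_round_robin(tasks):
    """Resume each task in turn until every task has finished."""
    queue = deque(tasks)
    while queue:
        current = queue.popleft()
        try:
            next(current)        # run the task up to its next yield
        except StopIteration:
            continue             # the task is finished; drop it
        queue.append(current)    # not finished; put it at the back of the queue


log = []
run_round_robin([task("eat", 2, log), task("play", 3, log)])
print(log)
```

The two tasks interleave step by step even though only one thread is running, and every switch happens at a yield that the code itself chose.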
Install the modules with pip3 install greenlet and pip3 install gevent. Then continue:

    # Real coroutine modules do their switching with greenlet
    from greenlet import greenlet

    def eat():
        print('eating start')
        g2.switch()   # switch to g2
        print('eating end')
        g2.switch()

    def play():
        print('playing start')
        g1.switch()   # switch to g1
        print('playing end')

    g1 = greenlet(eat)   # wrap eat in greenlet g1
    g2 = greenlet(play)
    g1.switch()

- greenlet can implement coroutines, but manually naming the next coroutine to switch to every time is too cumbersome. Python also has gevent, a more powerful module than greenlet, which can switch between tasks automatically.
Reference: https://www.cnblogs.com/PrettyTom/p/6628569.html

    # A coroutine is a lightweight user-space thread: it is controlled and scheduled by the user program itself.
    import time
    import gevent

    def eat():
        print('eating start')
        # time.sleep(1)   # gevent cannot detect the time spent in time.sleep
        gevent.sleep(1)
        print('eating end')

    def play():
        print('playing start')
        gevent.sleep(1)
        print('playing end')

    g1 = gevent.spawn(eat)
    g2 = gevent.spawn(play)
    g1.join()
    g2.join()

The right way to use gevent:
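Why an undetected time.sleep defeats the switching can be shown without installing anything. The sketch below is an analogy that uses the standard library's asyncio instead of gevent: asyncio.sleep plays the role of gevent.sleep, and the helper names (blocking_task, cooperative_task, run_pair) are made up for this illustration.

```python
import asyncio
import time


async def blocking_task(name, events):
    events.append(f"{name} start")
    time.sleep(0.05)             # blocks the whole thread; no switch can happen here
    events.append(f"{name} end")


async def cooperative_task(name, events):
    events.append(f"{name} start")
    await asyncio.sleep(0.05)    # yields control, so another task can run meanwhile
    events.append(f"{name} end")


async def run_pair(task_func):
    """Run an 'eat' and a 'play' task together and record the order of events."""
    events = []
    await asyncio.gather(task_func("eat", events), task_func("play", events))
    return events


blocking_events = asyncio.run(run_pair(blocking_task))
cooperative_events = asyncio.run(run_pair(cooperative_task))
print(blocking_events)
print(cooperative_events)
```

With the blocking sleep, eat runs from start to end before play even begins. With the cooperative sleep, both tasks start before either one ends, which is the behaviour the gevent example above gets from gevent.sleep.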

    # Importing and calling this first patches the blocking I/O in all modules,
    # so gevent can detect time.sleep
    from gevent import monkey
    monkey.patch_all()
    import time
    import gevent
    import threading

    def eat():
        print(threading.current_thread().name)   # a "Dummy" thread: fake, virtual
        print(threading.current_thread())
        print('eating start')
        time.sleep(1.2)
        print('eating end')

    def play():
        print(threading.current_thread().name)
        print(threading.current_thread())
        print('playing start')
        time.sleep(1)
        print('playing end')

    g1 = gevent.spawn(eat)    # register the task as a greenlet; it switches automatically on I/O
    g2 = gevent.spawn(play)
    # g1.join()
    # g2.join()
    gevent.joinall([g1, g2])
    print('master')

    # Task switching between processes and threads is done by the operating system.
    # Switching between coroutine tasks is done by the program (the code). Only when
    # the coroutine module can recognize an I/O operation will the program switch
    # tasks and achieve the effect of concurrency.

Synchronous and asynchronous:

    # Synchronous and asynchronous
    from gevent import monkey
    monkey.patch_all()
    import time
    import gevent

    def task(n):
        time.sleep(1)
        print(n)

    def sync():
        for i in range(5):
            task(i)

    # Named async_ because async is a reserved keyword since Python 3.7
    def async_():
        g_lst = []
        for i in range(5):
            g = gevent.spawn(task, i)
            g_lst.append(g)
        gevent.joinall(g_lst)   # same as: for g in g_lst: g.join()

    sync()     # synchronous: the five waits run one after another
    async_()   # asynchronous: the five waits overlap

Using concurrency in a crawler:
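The difference between the two calls is easiest to see by timing them. The following sketch does that with the standard library's asyncio as a stand-in for gevent, so it runs without extra installs; DELAY, run_sequentially, run_concurrently and timed are assumed names for this illustration, and the wait is shortened to 0.1 seconds.

```python
import asyncio
import time

DELAY = 0.1  # assumed stand-in for the 1-second wait used in the text


async def task(n):
    await asyncio.sleep(DELAY)   # simulated I/O wait
    return n


async def run_sequentially():
    # Wait for each task before starting the next one: the waits add up.
    return [await task(i) for i in range(5)]


async def run_concurrently():
    # Start all tasks, then wait for them together: the waits overlap.
    return await asyncio.gather(*(task(i) for i in range(5)))


def timed(coro_func):
    """Run a coroutine function to completion and measure the elapsed time."""
    start = time.perf_counter()
    result = asyncio.run(coro_func())
    return result, time.perf_counter() - start


sync_result, sync_elapsed = timed(run_sequentially)
async_result, async_elapsed = timed(run_concurrently)
print(f"sequential: {sync_elapsed:.2f}s, concurrent: {async_elapsed:.2f}s")
```

Both versions produce the same results; the sequential run takes roughly five times the delay, while the concurrent run takes roughly one delay.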

    # Coroutines: a way to achieve concurrency within one thread
    #   They can hide the I/O wait of some tasks
    #   While a task is running, detecting I/O triggers a switch to another task
    # Multithreading is weakened in CPython (by the GIL)
    # Coroutines raise the CPU utilization of a single thread
    # For I/O-bound work, coroutines are often more efficient than multithreading, because switching is cheaper
    # Crawler example
    #   The time is spent waiting on I/O during each request
    from gevent import monkey
    monkey.patch_all()
    import gevent
    from urllib.request import urlopen   # built-in module

    def get_url(url):
        response = urlopen(url)
        content = response.read().decode('utf-8')
        return len(content)

    g1 = gevent.spawn(get_url, 'http://www.baidu.com')
    g2 = gevent.spawn(get_url, 'http://www.sogou.com')
    g3 = gevent.spawn(get_url, 'http://www.taobao.com')
    g4 = gevent.spawn(get_url, 'http://www.hao123.com')
    g5 = gevent.spawn(get_url, 'http://www.cnblogs.com')
    gevent.joinall([g1, g2, g3, g4, g5])
    print(g1.value)   # value holds the return value of get_url
    print(g2.value)
    print(g3.value)
    print(g4.value)
    print(g5.value)

    ret = get_url('http://www.baidu.com')   # an ordinary synchronous call
    print(ret)