Preface
Mail retrieval mainly uses two protocols: POP3 (Post Office Protocol version 3, mainly used by clients to download mail from the server) and IMAP (Internet Message Access Protocol, which lets clients manage mail that stays on the server). Python provides the corresponding modules poplib and imaplib. Although POP3 is widely supported, it is obsolescent, and the quality of POP3 server implementations varies widely, with many being poor. So if our mail server supports IMAP, it is better to use imaplib.IMAP4, because IMAP servers tend to be better implemented. Mainstream mail providers such as QQ Mail, 163, Gmail and Outlook all support IMAP, so we use IMAP to implement the mail-reading script.
Implementation process
Log in to the mailbox and read the original mail
Logging in to the mailbox is done with the imaplib library, so we first import imaplib and then use its methods to log in and read the mail. The email module is also needed to parse the raw message.
```python
import email
import imaplib


def get_mail(email_address, password):
    # Choose the IMAP server of your mail provider.
    server = imaplib.IMAP4_SSL("imap.gmail.com")
    server.login(email_address, password)
    # Select a mailbox folder; 'INBOX' is the default.
    server.select("INBOX")
    # Search for matching mail. The first argument is the charset (None means
    # no CHARSET is sent); the second is the search criterion, where "ALL"
    # matches every message.
    typ, data = server.search(None, "ALL")
    # data[0] is a space-separated byte string of message numbers.
    msg_list = data[0].split()
    # The last number is the newest message; index 0 is the oldest.
    latest = msg_list[-1]
    typ, datas = server.fetch(latest, "(RFC822)")
    server.logout()
    # datas[0][1] holds the raw message bytes. Parsing the bytes directly
    # avoids a UnicodeDecodeError on mail that is not UTF-8 encoded.
    return email.message_from_bytes(datas[0][1])
```
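A variation that may be useful: fetching only unread mail without marking it as read. The sketch below assumes an already logged-in connection; fetch_unseen and its limit parameter are hypothetical names chosen for this example. It relies on two IMAP details: select(readonly=True) opens the mailbox with EXAMINE, and fetching BODY.PEEK[] returns the whole message without setting the \Seen flag, whereas fetching RFC822 marks the message as read.

```python
import email
import imaplib


def fetch_unseen(server, mailbox="INBOX", limit=5):
    """Return up to `limit` of the newest unread messages, newest first.

    Hypothetical helper: `server` is assumed to be an already logged-in
    imaplib.IMAP4 / IMAP4_SSL connection.
    """
    if limit <= 0:
        return []
    # readonly=True opens the mailbox with EXAMINE, so no flags are changed.
    server.select(mailbox, readonly=True)
    typ, data = server.search(None, "UNSEEN")
    if typ != "OK":
        raise imaplib.IMAP4.error("SEARCH failed: %r" % data)
    numbers = data[0].split()
    messages = []
    # Message numbers grow with arrival order, so the tail holds the newest.
    for num in reversed(numbers[-limit:]):
        # BODY.PEEK[] returns the whole raw message without setting \Seen.
        typ, msg_data = server.fetch(num, "(BODY.PEEK[])")
        if typ == "OK":
            messages.append(email.message_from_bytes(msg_data[0][1]))
    return messages
```

To use it, open the connection with "with imaplib.IMAP4_SSL(host) as server:", call server.login(...) and pass server in. Since Python 3.5, IMAP4 objects support the with statement and call logout() on exit.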
The function above returns an email.message.Message object, that is, the raw mail. If we print it, we find that parts of it are unreadable (encoded headers, Base64 or quoted-printable bodies), so next we need to convert the raw mail into readable mail.
About email.message
An e-mail message consists of headers and a payload (also called the content). Headers are RFC 5322 or RFC 6532 style field names and values. The payload may be a simple text message, a binary object, or a structured sequence of sub-messages, each with its own set of headers and its own payload. The latter type of payload is indicated by the message having a MIME type such as multipart/* or message/rfc822.
The conceptual model provided by an EmailMessage object is that of an ordered dictionary of headers coupled with a payload that represents the RFC 5322 body of the message, which might be a list of sub-EmailMessage objects. In addition to the normal dictionary methods for accessing header names and values, there are methods for accessing specialized information from the headers (for example the MIME content type), for operating on the payload, for generating a serialized version of the message, and for recursively walking over the object tree.
The EmailMessage dictionary-like interface is indexed by header names, which must be ASCII values. The dictionary values are strings with some extra methods. Headers are stored and returned in case-preserving form, but field names are matched case-insensitively. Unlike a real dict, the keys are ordered and there can be duplicate keys. Additional methods are provided for working with headers that have duplicate keys. (The legacy email.message.Message class, which get_mail above returns, supports the same dictionary-style access.)
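The dictionary-style access described above can be tried on a small hand-written message; the header values below are made up for illustration. email.message_from_string() uses the legacy compat32 policy by default and returns an email.message.Message, so this is the same kind of object get_mail produces.

```python
import email

# A made-up raw message with a duplicated Received header.
raw = (
    "Received: from relay1.example.com\n"
    "Received: from relay2.example.com\n"
    "Subject: Weekly report\n"
    "Content-Type: text/plain; charset=utf-8\n"
    "\n"
    "Hello!\n"
)

msg = email.message_from_string(raw)
print(msg["subject"])            # Weekly report  (lookup ignores case)
print(msg.get_all("Received"))   # every Received value, in order
print(msg.keys())                # header names, duplicates included
print(msg.get_content_type())    # text/plain
print(msg.get_payload())         # the body text
```

Note that msg["Received"] returns only one of the duplicated values; get_all() is the method to use when a header may repeat.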
Converting raw mail to readable mail
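The Subject header and the display names in address headers such as From and To are often encoded strings (RFC 2047 encoded-words such as =?utf-8?b?...?=). To display them properly, we define a function that decodes them.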
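```python
from email.header import decode_header


def decode_str(s):
    # decode_header() splits an RFC 2047 header into (value, charset) chunks;
    # decode every chunk (values may be str or bytes) and join them.
    parts = []
    for value, charset in decode_header(s):
        if isinstance(value, bytes):
            value = value.decode(charset or 'ascii', errors='replace')
        parts.append(value)
    return ''.join(parts)
```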
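To see what decode_str has to handle, the sketch below runs decode_header() on a made-up Subject value that mixes plain text with an encoded-word carrying the UTF-8 bytes of "测试邮件" in Base64 ("b") encoding. Because a header can contain several chunks, every chunk returned by decode_header() has to be decoded, not only the first.

```python
from email.header import decode_header

# Hypothetical Subject value: plain "Re: " followed by an RFC 2047 encoded-word.
raw_subject = "Re: =?utf-8?b?5rWL6K+V6YKu5Lu2?="

chunks = decode_header(raw_subject)
print(chunks)
# [(b'Re: ', None), (b'\xe6\xb5\x8b\xe8\xaf\x95\xe9\x82\xae\xe4\xbb\xb6', 'utf-8')]

# Once a header contains an encoded-word, every chunk comes back as bytes,
# and plain chunks have charset None, so decode each one and join them.
subject = "".join(
    part.decode(charset or "ascii") if isinstance(part, bytes) else part
    for part, charset in chunks
)
print(subject)  # Re: 测试邮件
```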
Not every message is UTF-8 encoded, so we also define a function that detects the charset declared by a message part.
```python
def guess_charset(msg):
    # Return the charset parameter of the Content-Type header, lowercased,
    # e.g. "text/plain; charset=GBK; format=flowed" -> "gbk", or None if the
    # part does not declare one. The parameters are parsed properly, so
    # trailing fields such as format=flowed are not mixed into the result.
    return msg.get_content_charset()
```
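Letting the library parse the Content-Type parameters is safer than trimming the header string by hand: str.strip() treats its argument as a set of characters to remove from both ends, not as a suffix, so it can cut into the charset name itself. The part below is hypothetical, built only to compare the two approaches.

```python
from email.message import Message

# A hypothetical MIME part whose Content-Type carries extra parameters.
part = Message()
part["Content-Type"] = 'text/plain; charset="GBK"; format=flowed'

print(part.get_content_type())     # text/plain
print(part.get_content_charset())  # gbk  (unquoted and lowercased)
print(part.get_param("format"))    # flowed

# strip() removes any of the listed characters, so "a" and "s" are eaten too:
print("ascii".strip("; format=flowed; delsp=yes"))  # cii
```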
Next, loop over the parts of the message and collect the readable content.
```python
from email.utils import parseaddr


def print_info(msg, indent=0):
    # Collect the readable content in a local string rather than a global.
    mail_content = '\n'
    if indent == 0:
        # Decode the main headers of the top-level message.
        for header in ['From', 'To', 'Subject']:
            value = msg.get(header, '')
            if value:
                if header == 'Subject':
                    value = decode_str(value)
                else:
                    hdr, addr = parseaddr(value)
                    name = decode_str(hdr)
                    value = '%s <%s>' % (name, addr)
            mail_content += '%s%s: %s\n' % ('  ' * indent, header, value)
    # walk() yields the message itself and every nested part, so single-part
    # mail and nested multipart mail are both handled.
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers have no content of their own
        content_type = part.get_content_type()
        if content_type == 'text/plain':
            content = part.get_payload(decode=True)
            # Use the charset the part declares, falling back to UTF-8.
            charset = guess_charset(part) or 'utf-8'
            content = content.decode(charset, errors='replace')
            mail_content += '%sText:\n%s\n' % ('  ' * indent, content)
        else:
            # Other parts (usually text/html or attachments) are not read;
            # only their content type is recorded.
            mail_content += '%sAttachment: %s\n' % ('  ' * indent, content_type)
    return mail_content
```
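The loop over message parts is easier to follow with a concrete tree. The sketch below builds a hypothetical message with email.message.EmailMessage (a text/plain body, an HTML alternative and one binary attachment) and shows the depth-first order in which walk() visits it; only the leaf parts carry content.

```python
from email.message import EmailMessage

# Hypothetical message: plain-text body, HTML alternative, one attachment.
msg = EmailMessage()
msg["Subject"] = "Demo"
msg.set_content("Hello in plain text\n")
msg.add_alternative("<p>Hello in HTML</p>\n", subtype="html")
msg.add_attachment(b"\x00\x01", maintype="application",
                   subtype="octet-stream", filename="data.bin")

# walk() yields the message itself first, then every nested part, depth first.
types = [part.get_content_type() for part in msg.walk()]
print(types)
# ['multipart/mixed', 'multipart/alternative', 'text/plain', 'text/html', 'application/octet-stream']

# Containers report is_multipart() == True; the leaves hold the actual content.
leaves = [p.get_content_type() for p in msg.walk() if not p.is_multipart()]
print(leaves)
```

Adding the alternative and then the attachment converts the original text/plain message into multipart/alternative and then multipart/mixed, which is why the containers appear first in the list.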
Finally, call the functions above to print the mail content.
```python
if __name__ == '__main__':
    email_addr = "myEmail@gmail.com"
    password = "mypassword"  # placeholder: use your own (app) password
    content = print_info(get_mail(email_addr, password))
    print("mail content is: %s" % content)
```
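Hard-coding the password in the script is risky if the file is shared. One option, sketched below, is to read the credentials from environment variables; MAIL_ADDRESS, MAIL_PASSWORD and load_credentials are hypothetical names chosen for this example.

```python
import os


def load_credentials():
    """Read the mailbox address and password from environment variables.

    MAIL_ADDRESS and MAIL_PASSWORD are hypothetical variable names chosen
    for this sketch; rename them to suit your setup.
    """
    try:
        return os.environ["MAIL_ADDRESS"], os.environ["MAIL_PASSWORD"]
    except KeyError as missing:
        # Stop with a readable message instead of a KeyError traceback.
        raise SystemExit("Please set the environment variable %s" % missing.args[0])
```

In the main block, the two hard-coded assignments would then become email_addr, password = load_credentials().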
Related issues
Access to the mailbox denied?
Gmail enforces strict security, so before reading a Gmail mailbox we need to adjust two settings: enable the IMAP service, and allow access for less secure apps. The steps are:
- Open Gmail, go to Settings, open the "Forwarding and POP/IMAP" tab, select "Enable IMAP", and save the changes
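- Open the "Less secure app access" page of your Google Account and select "Enable". Google has since retired this setting for most accounts; on such accounts, log in with an app password (available once 2-Step Verification is turned on) instead of the normal password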
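When the server refuses the login, imaplib raises imaplib.IMAP4.error, whose message carries the server's reason. The sketch below, assuming a connection object created with imaplib.IMAP4_SSL, catches that exception and prints a hint instead of a traceback; login_or_explain is a hypothetical helper name, and the exact wording of the server's reason varies by provider.

```python
import imaplib


def login_or_explain(server, email_address, password):
    """Try to log in; on failure, print the server's reason and return False.

    Hypothetical helper: `server` is an imaplib.IMAP4 / IMAP4_SSL connection.
    imaplib raises imaplib.IMAP4.error when the server rejects LOGIN.
    """
    try:
        server.login(email_address, password)
    except imaplib.IMAP4.error as exc:
        # The text comes from the server, so its wording differs per provider.
        print("Login rejected: %s" % exc)
        print("Check that IMAP is enabled and that the password "
              "(or app password) is allowed for IMAP access.")
        return False
    return True
```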
Other mail settings?
Reference documentation: Read Gmail messages on other email clients using POP
Gmail related settings: Monitor the running status of Gmail settings