Getting started with data analysis, a review of Python basics: Python's basic commands and data structures

Keywords: Python Data Analysis Data Mining

My first programming language was C. I also took a Java course at university but learned very little. After that my programming assignments mostly used Python and C++, but I never studied either systematically: I usually read other people's code and adapted it, looking things up whenever I got stuck. In fact, I did not write much code at university. At present I mainly use SQL for data development and occasionally need a Python script to transfer data, and I am also teaching myself data analysis. Sometimes I cannot remember how to write basic functions, and my foundation is not solid. So I am going to relearn the basics of Python, record the learning process, and then continue with data analysis in more detail.

The main material for this study is "Python data analysis and mining practice" by Zhang Liangjun et al. I also filled in details and supplementary content by reading other bloggers' posts, the Rookie Tutorial (Runoob) site, the official Python documentation, and so on.

Basic commands

(0) Usage of print

print('*******Output numbers and variables*******')
# Direct output
print('------A string------')  # The string needs quotation marks
# Output variable
x = 99
print(x)
print('*******Format output*******')
# Format output integer
a = 5
str1 = ('I have %d apples' % a)
print(str1)
str2 = ('the length of %s is %d' % ('apple', len('apple')))
print(str2)

# Format output floating point number (float)
pi = 3.1415926
print('%.3f' % pi)  # Keep three decimal places
print(round(pi, 3))

print('*******format usage*******')
# format usage
print('{} {}'.format('hello', 'python'))
print('{0} {1}'.format('hello', 'python'))
print('{0} {1} {0}'.format('hello', 'python'))
print('{a} {b}'.format(a='hhhhello', b='ppppython'))
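The precision control shown with % also works inside the braces of format. The following is a minimal sketch with illustrative variable names; the f-string line is an addition that is not covered above and assumes Python 3.6 or later.

```python
# Illustrative values; these names are assumptions for the example
pi = 3.1415926
count = 5

# A format spec goes after a colon inside the braces
by_format = '{:.3f}'.format(pi)
# Named fields can carry a format spec as well
by_name = '{n:d} apples, pi is about {value:.2f}'.format(n=count, value=pi)
# f-strings (Python 3.6+) evaluate the expression in place
by_fstring = f'{count} apples, pi is about {pi:.2f}'

print(by_format)
print(by_name)
print(by_fstring)
```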

(1) Basic operation

Variables do not need a declared type and can be assigned directly, such as:

a = 2

Python also supports multiple assignment: a, b, c = 2, 3, 4
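Multiple assignment can be sketched a little further. The swap and the starred name below are additions for illustration, and the starred form assumes Python 3.

```python
# Multiple assignment unpacks the right-hand side into the names on the left
a, b, c = 2, 3, 4

# The same mechanism swaps two variables without a temporary
a, b = b, a

# A starred name collects whatever is left over (Python 3)
first, *rest = [1, 2, 3, 4]

print(a, b, c)
print(first, rest)
```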

Python arithmetic (addition, subtraction, multiplication, division, remainder and power):

Add: a = a + 1 or a += 1

Subtract: a = a - 1 or a -= 1

Multiply: a = a * 2 or a *= 2

Divide: a = a / 2 or a /= 2

Remainder: a = a % 2 or a %= 2

Power: a = a ** 2 or a **= 2
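A short sketch of these operators on a concrete value, assuming Python 3. Floor division (//) and divmod are not mentioned above and are included only for contrast with /.

```python
a = 7

quotient = a / 2         # true division always returns a float
floor_quotient = a // 2  # floor division keeps the integer part
remainder = a % 2        # what is left after floor division
power = a ** 2           # a squared

# divmod returns the floor quotient and the remainder together
q, r = divmod(a, 2)

print(quotient, floor_quotient, remainder, power, q, r)
```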

Python supports flexible string operations, such as:

s = 'I love python'
s = s + ' very much'
# Split s into a list of words, using a space as the separator
print(s.split(' '))

The result is ['I', 'love', 'python', 'very', 'much']

(Note: single quotation marks and double quotation marks have the same effect in Python)
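As a small illustration of string handling, the sketch below splits a string and joins it back. The join call is an addition not shown above; it is the standard inverse of split.

```python
s = 'I love python' + ' very much'

words = s.split(' ')         # split on single spaces
rejoined = '-'.join(words)   # join the pieces with a different separator

print(words)
print(rejoined)
print('abc' == "abc")        # both quote styles produce the same string
```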

(2) Conditionals and loops

Python uses indentation to mark the nesting level of statements. Curly braces {} are not needed, and statements at the same level must be indented by the same amount.


i = 1
if i < 0:
    print('i is negative')
elif i > 0:
    print('i is positive')

An intermediate branch cannot be written as else if, which raises a syntax error; it has to be written as elif.
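A complete chain with all three kinds of branch can be sketched as follows. The test values and the labels list are assumptions made for the example.

```python
labels = []
for i in (-3, 0, 5):
    if i < 0:
        labels.append('negative')
    elif i > 0:            # intermediate branches use elif
        labels.append('positive')
    else:                  # else comes last and takes no condition
        labels.append('zero')

print(labels)
```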

Loops: the while loop and the for loop

i = 1
s = 1

while i <= 100:
    i += 1
print(i)  # i has been incremented 100 times: 101

print('-----------Lovely dividing line-----------')

for j in range(100):
    s += 1
print(s)  # the body ran 100 times: 101

in: tests whether an element is contained in a sequence or other container

range(a, b, c): generates an arithmetic sequence that starts at a, has a step (common difference) of c, and stops before reaching b, so with a step of 1 the last value is b-1. range(100), with only one argument as in the code above, starts at 0 (the default), uses a step of 1 (the default), and ends at 99.
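The behaviour of range and in can be checked directly. This is a minimal sketch assuming Python 3, where range is lazy and list() is used to display its values; the summing loops are an illustrative use of while and for.

```python
print(list(range(5)))           # one argument: starts at 0, stops before 5
print(list(range(1, 10, 3)))    # start 1, step 3, stops before 10
print(list(range(10, 0, -2)))   # a negative step counts down

# in tests membership; it works on ranges, lists, strings and more
print(3 in range(5))
print('py' in 'python')

# Summing 1..100 with a while loop and with a for loop
i, total_while = 1, 0
while i <= 100:
    total_while += i
    i += 1

total_for = 0
for j in range(1, 101):
    total_for += j

print(total_while, total_for)
```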

(3) Functions

Python uses def to define functions

# Function definition
def add1(x):
    return x + 1

# Use function
print(add1(1))  # prints 2

Unlike in many programming languages, a Python function's return value can take various forms, such as a list or several values at once.
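Returning several values can be sketched as follows. Both helper functions are hypothetical and exist only for this example; when several values are returned, Python packs them into a tuple.

```python
def add_and_multiply(x, y):
    # Hypothetical helper: the two results are packed into one tuple
    return x + y, x * y

def first_squares(n):
    # A function can just as well return a list
    squares = []
    for k in range(n):
        squares.append(k * k)
    return squares

total, product = add_and_multiply(2, 3)  # unpack the returned tuple
both = add_and_multiply(2, 3)            # or keep it as one tuple

print(total, product, both)
print(first_squares(4))
```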

For simple functions, writing out a full def with a name, a body and a return is a little cumbersome. Python supports defining "inline functions" for such cases with lambda, which is similar to "anonymous functions" in Matlab.

f = lambda x: x + 2     # f(x)=x+2
g = lambda x, y: x + y  # g(x,y)=x+y

print('The value of f(1) is:', f(1))
print('The value of g(1,2) is:', g(1,2))
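A lambda is most useful when a small function is passed to another function. The sketch below uses it as the key argument of the built-in sorted; the word list is an assumption made for the example.

```python
words = ['pear', 'fig', 'banana']

# key receives each element and returns the value to sort by
by_length = sorted(words, key=lambda w: len(w))

# A lambda assigned to a name behaves like a one-line def
add1 = lambda x: x + 1

print(by_length)
print(add1(1))
```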

Data structures

Python has four built-in data structures:

list, tuple, dictionary, and set

They can be collectively called containers: each holds a collection of items, which can be numbers, strings, or even lists, or a mix of these.

(1) List / tuple

Lists and tuples are sequence structures, and their operations are almost the same, but there are two significant differences:

In appearance, a list is written in square brackets [] and a tuple in parentheses ()

Functionally, a list can be modified but a tuple cannot, for example:

a = [1,2,3]
a[0] = 7
print(a)

The output is [7, 2, 3]

If the square brackets are changed to (), the assignment raises a TypeError, because a tuple does not support item assignment

Lists and tuples can be converted to each other. For example, the result of tuple([1,2]) is (1, 2)
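The difference in mutability and the two conversions can be sketched like this. The try/except is there only so the example keeps running after the expected TypeError.

```python
a = [1, 2, 3]
a[0] = 7                 # lists are mutable

t = (1, 2, 3)
try:
    t[0] = 7             # tuples are not
except TypeError as err:
    print('TypeError:', err)

as_tuple = tuple(a)      # list -> tuple
as_list = list(t)        # tuple -> list

print(a, t, as_tuple, as_list)
```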

Lists have many useful methods (tuples cannot be modified, so they have few):

a.append(1) adds 1 to the end of list a

a.count(1) counts the number of occurrences of element 1 in list a

a.extend([1,2]) appends the contents of list [1,2] to the end of list a

a.index(1) finds the index of the first 1 in list a

a.insert(2,1) inserts 1 into the position where the index of list a is 2

a.pop(1) removes the element with index 1 in list a
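The methods above can be traced on a small list. The starting values are an assumption for the example, and 9 is inserted instead of 1 so the effect of insert is easy to see.

```python
a = [3, 1, 2]

a.append(1)             # a is now [3, 1, 2, 1]
ones = a.count(1)       # the value 1 appears twice
a.extend([1, 2])        # a is now [3, 1, 2, 1, 1, 2]
position = a.index(1)   # index of the first 1
a.insert(2, 9)          # a is now [3, 1, 9, 2, 1, 1, 2]
removed = a.pop(1)      # removes and returns the element at index 1

print(a, ones, position, removed)
```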

List comprehensions simplify code that processes the elements of a list one by one:

a = [1, 2, 3]
b = []
for i in a:
    b.append(i + 2)
# The loop above is equivalent to
c = [1, 2, 3]
d = [i+2 for i in c]
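A comprehension can also carry a condition or more than one for clause. A minimal sketch with made-up data:

```python
numbers = [1, 2, 3, 4, 5, 6]

plus_two = [n + 2 for n in numbers]
# An if clause keeps only the elements that pass the test
even_squares = [n * n for n in numbers if n % 2 == 0]
# Two for clauses behave like nested loops, outer one first
pairs = [(x, y) for x in (1, 2) for y in ('a', 'b')]

print(plus_two)
print(even_squares)
print(pairs)
```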

(2) Dictionary

A dictionary can be regarded as a mapping and is written with {}. Its subscripts are not integers starting from 0 but keys that you define yourself. The basic way to create a dictionary is:

d = {'today':20, 'tomorrow':'Friday'}

'today' and 'tomorrow' are the keys of the dictionary, and each key must be unique within the dictionary; 20 and 'Friday' are the values corresponding to the keys.

d['today']  #The value is 20

Besides writing a literal, you can build a dictionary with dict() from a sequence of key/value pairs, or with dict.fromkeys(), which gives every key the same value:

a = dict([['today', 20], ['tomorrow', 30]])
b = dict.fromkeys(['today', 'tomorrow'], 20)

Many of the functions and operations that work on lists, such as len() and in, also work on dictionaries.
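Common dictionary operations can be sketched as follows. The extra keys are made up for the example; get, items, len and in are standard dict features, with in testing keys rather than values.

```python
d = {'today': 20, 'tomorrow': 'Friday'}

d['yesterday'] = 19            # assigning to a new key adds an entry
d['today'] = 21                # assigning to an existing key overwrites it
missing = d.get('next week')   # get returns None instead of raising KeyError

has_today = 'today' in d       # in tests keys, not values
size = len(d)

for key, value in d.items():   # iterate over key/value pairs
    print(key, value)

from_pairs = dict([['today', 20], ['tomorrow', 30]])
from_keys = dict.fromkeys(['today', 'tomorrow'], 20)
print(from_pairs, from_keys)
```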

(3) Set

A set is created with curly braces {} or by converting a list with the set() function. It differs from a list in two ways:

  1. Its elements are unique and unordered; duplicates are removed automatically
  2. It does not support indexing

a = {1,2,2,3}  # the duplicate 2 is dropped: {1, 2, 3}
b = [1,2,3]
c = set(b)     # convert the list to a set
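Both properties can be verified in a few lines. This is a sketch with illustrative values; add and discard are standard set methods that are not mentioned above.

```python
a = {1, 2, 2, 3}          # the duplicate 2 is dropped
c = set([3, 1, 3, 2])     # converting a list also removes duplicates

same = (a == c)           # order does not matter for equality

try:
    a[0]                  # sets have no positions
except TypeError:
    indexable = False
else:
    indexable = True

a.add(4)                  # add and discard change a set in place
a.discard(1)
unique_sorted = sorted(set('banana'))

print(same, indexable, a, unique_sorted)
```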

Because of these properties (especially the lack of order), sets support some special operations.

a = {1, 2, 3, 4, 5}
b = {1, 2, 3, 6}
c = a | b  # Union of a and b: {1, 2, 3, 4, 5, 6}
d = a & b  # Intersection of a and b: {1, 2, 3}
e = a - b  # Difference of a and b (in a but not in b): {4, 5}
f = a ^ b  # Symmetric difference of a and b (in a or b, but not in both): {4, 5, 6}

(4) Functional programming

Functional programming is a programming paradigm. It treats computation as the evaluation of mathematical functions and avoids program state and mutable objects.

In Python, functional programming mainly relies on a few tools: lambda, map(), reduce(), and filter().

lambda was introduced earlier. It is mainly used to define "inline functions".

1. map function

a = [1,2,3]
b = map(lambda x:x+2, a)
b = list(b)

Here, map does not compute anything immediately: in Python 3 it returns a lazy iterator that produces results only when something consumes it, so it has to be materialized with list().

First, lambda defines the function f(x) = x+2, and then map() applies it to each element of the sequence (the list a above). map also accepts functions of several parameters, such as:

x = map(lambda x,y:x+y,a,b)  # Substitute a and b into x and y
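The lazy behaviour of map can be observed directly. A minimal sketch assuming Python 3 (in Python 2, map returned a list); the variable names are illustrative.

```python
a = [1, 2, 3]
b = map(lambda x: x + 2, a)

is_list = isinstance(b, list)   # b is a map object, not a list
first = next(b)                 # values are computed on demand
remaining = list(b)             # list() consumes whatever is left
exhausted = list(b)             # a second pass yields nothing

# With two iterables, map stops at the shorter one
sums = list(map(lambda x, y: x + y, [1, 2, 3], [10, 20]))

print(is_list, first, remaining, exhausted, sums)
```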

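map is often described as more efficient than list comprehensions and the for statement (a list comprehension is essentially a for loop). In CPython the difference is usually small, and when the function is a lambda a list comprehension is often just as fast, so it is worth measuring before relying on it.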

2. reduce function

The reduce function is similar to map, but where map applies a function to each element independently, reduce applies it cumulatively, carrying a running result forward, such as:

from functools import reduce  # import is not required in python2.x
reduce(lambda x,y: x*y, range(1, 100))

With map, each value is passed to the function on its own. Here, range(1, 100) generates the integers from 1 to 99 (99 numbers in total, with a step of 1), and the function is f(x,y)=x*y. reduce first calls the function with the first two numbers as x and y, then uses that result as x and the next number as y, and so on until every number has taken part. The result is the product 1 × 2 × ... × 99, that is, 99!. Written as a loop, it would be:

s = 1
for i in range(1,100):
    s = s * i
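That the two forms agree can be checked against math.factorial. The optional third argument of reduce, a starting value, is an addition not discussed above.

```python
from functools import reduce
import math

product = reduce(lambda x, y: x * y, range(1, 100))

# The same running product with an explicit loop
s = 1
for i in range(1, 100):
    s = s * i

# An optional third argument supplies the starting value
total = reduce(lambda x, y: x + y, [1, 2, 3], 10)

print(product == s == math.factorial(99))
print(total)
```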

3. filter function

As the name suggests, filter selects the elements of a sequence that satisfy a condition, such as:

c = list(filter(lambda x: 5 < x < 8, range(10)))

As with map, list() is needed to materialize the result, which is [6, 7].
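For comparison, the same selection can be written as a list comprehension with a condition. Passing None as the function is a standard filter feature not mentioned above; it keeps the truthy elements.

```python
selected = list(filter(lambda x: 5 < x < 8, range(10)))

# A list comprehension with an if clause gives the same result
same = [x for x in range(10) if 5 < x < 8]

# Passing None keeps only the truthy elements
truthy = list(filter(None, [0, 1, '', 'a', None]))

print(selected, same, truthy)
```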

All of the above can also be implemented with loops. Functions such as map() run their iteration inside the interpreter, which can be faster than an explicit for or while loop, although the gain depends on the function being applied. Used sensibly, map() and its relatives keep code both concise and reasonably efficient.
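Speed claims like this are easy to test. The following is a rough sketch using the standard timeit module, assuming a list of 1000 integers; the three function names are hypothetical. Timings vary with the machine and Python version, so no expected output is given.

```python
import timeit

data = list(range(1000))

def with_loop():
    out = []
    for x in data:
        out.append(x + 2)
    return out

def with_comprehension():
    return [x + 2 for x in data]

def with_map():
    return list(map(lambda x: x + 2, data))

# Each function is called 200 times; a smaller number means faster
for fn in (with_loop, with_comprehension, with_map):
    seconds = timeit.timeit(fn, number=200)
    print(fn.__name__, round(seconds, 4))
```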

Common libraries of Python data analysis tools

These libraries are described in detail in their official documentation, so I won't repeat that here. I can't retain much theory anyway, and practice works better. The data analysis posts that follow will use these libraries, and I will introduce the functions in detail as they come up.

Posted by tullmejs on Thu, 11 Nov 2021 19:19:32 -0800