Python Scientific Computing Library: SciPy
1. Introduction to SciPy
1.1. Introduction and Installation of SciPy
Official website: http://www.scipy.org/
SciPy installation: open cmd in C:\Python27\Scripts and run:
pip install scipy
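A quick way to confirm that the installation succeeded is to import the package and print its version. The sketch below assumes SciPy and NumPy are both available in the active interpreter; the version numbers printed depend on your environment.

```python
import numpy as np
import scipy

# Both packages expose their version as a string attribute.
print('SciPy version:', scipy.__version__)
print('NumPy version:', np.__version__)
```

If the import fails with ModuleNotFoundError, pip most likely installed into a different Python installation than the one being run.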
1.2. Installing Anaconda and Setting Up an Environment (Worked Example)
Create an environment: conda create -n env_name python=3.6
Example: conda create -n Py_36 python=3.6 # Create an environment named Py_36
List all environments: conda info -e
Enter an environment:
source activate Py_36 (OSX/Linux)
activate Py_36 (Windows)
1.3. Jupyter Installation
Introduction to Jupyter: Jupyter Notebook is an interactive notebook that supports more than 40 programming languages. Typical uses include data cleaning and conversion, numerical simulation, statistical modeling, and machine learning.
Jupyter installation: conda install jupyter notebook
Start Jupyter: activate the appropriate environment, then run in the console: jupyter notebook
Notebook server address: http://localhost:8888
Create new items (notebook, text file, folder) from the New menu.
Shut down the notebook server: press Ctrl+C twice in the console.
1.4. SciPy's 'hello world'
Requirement: save a multi-dimensional array to a .mat file, load the .mat file, then get the content and print it.
Step 1: Import the module required from scipy
from scipy import io # Module to be used
Step 2: Save data with savemat
io.savemat(file_name, mdict)
io.savemat('a.mat', {'array': a})
Step 3: Load data with loadmat
io.loadmat(file_name)
data = io.loadmat('a.mat')
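Example:
from scipy import io # Import the io module
import numpy as np # Import numpy under the alias np
arr = np.array([1,2,3,4,5,6])
io.savemat('test.mat',{'arr1':arr}) # Save arr under the variable name 'arr1'
loadArr = io.loadmat('test.mat') # loadmat returns a dict keyed by variable name
print(loadArr['arr1']) # Print the loaded array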
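The dictionary returned by loadmat contains more than the saved array, and a one-dimensional array comes back with two dimensions. The following sketch illustrates both points; writing to a temporary directory is a choice made only for this illustration, not part of the original example.

```python
import os
import tempfile

import numpy as np
from scipy import io

arr = np.array([1, 2, 3, 4, 5, 6])

# Write into a temporary directory so no file is left behind
# (an assumption made for this sketch).
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'test.mat')
    io.savemat(path, {'arr1': arr})
    loaded = io.loadmat(path)

# Keys that start with '__' are header metadata added by loadmat.
user_keys = [key for key in loaded if not key.startswith('__')]
print(user_keys)

# The 1-D input is stored as a 2-D row, so flatten it to recover the original shape.
print(loaded['arr1'].shape)
restored = loaded['arr1'].ravel()
print(restored)
```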
2. Implementing Statistical Functions with SciPy
Requirement: analyze random numbers with the statistical functions in scipy.stats
stats provides functions for generating random numbers from continuous distributions:
- Uniform distribution
x=stats.uniform.rvs(size = 20) generates 20 random numbers uniformly distributed on [0,1]
- Normal distribution
x=stats.norm.rvs(size = 20) generates 20 normally distributed random numbers
- Beta distribution
x=stats.beta.rvs(size=20, a=3,b=4) generates 20 random numbers from a beta distribution with parameters a=3, b=4
stats also provides discrete distributions:
- Bernoulli distribution (bernoulli)
- Geometric distribution (geom)
- Poisson distribution (poisson)
x=stats.poisson.rvs(0.6,loc=0,size = 20) generates 20 random numbers from a Poisson distribution with mean 0.6
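The distributions listed above can be sampled side by side as in the sketch below. The seed and the success probability 0.5 used for the Bernoulli and geometric distributions are assumptions chosen for illustration; they do not come from the original text.

```python
from scipy import stats

seed = 0  # hypothetical seed, used only to make the sketch repeatable
n = 20

samples = {
    'uniform': stats.uniform.rvs(size=n, random_state=seed),
    'norm': stats.norm.rvs(size=n, random_state=seed),
    'beta': stats.beta.rvs(a=3, b=4, size=n, random_state=seed),
    # p = 0.5 is an assumed success probability
    'bernoulli': stats.bernoulli.rvs(0.5, size=n, random_state=seed),
    'geom': stats.geom.rvs(0.5, size=n, random_state=seed),
    'poisson': stats.poisson.rvs(0.6, loc=0, size=n, random_state=seed),
}

# Print the observed range of each sample.
for name, x in samples.items():
    print(name, x.min(), x.max())
```

The continuous samples are floats, while the Bernoulli, geometric, and Poisson samples are integers; Bernoulli values are only 0 or 1, and geometric values start at 1.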
3. Calculating the Mean and Standard Deviation of Random Numbers
stats.norm.fit: fits a normal distribution to the generated data and returns the estimated mean and standard deviation.
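For the normal distribution, the fitted parameters coincide with the sample mean and the population standard deviation (np.std with its default ddof=0). A minimal sketch, where loc=5.0, scale=2.0, and the seed are assumed values for illustration:

```python
import numpy as np
from scipy import stats

# Assumed parameters: true mean 5.0, true standard deviation 2.0.
arr = stats.norm.rvs(loc=5.0, scale=2.0, size=900, random_state=0)

mean, std = stats.norm.fit(arr)

# The fitted values agree with the direct NumPy calculations.
print(mean, np.mean(arr))
print(std, np.std(arr))
```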
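4. Computing the Skewness of Random Numbers
1. Concept:
Skewness describes the asymmetry of a probability distribution.
2. Testing skewness with stats.skewtest()
There are two return values: the first is the test statistic (a z-score), and the second is the p-value for the null hypothesis that the skewness of the data matches that of a normal distribution (a value between 0 and 1). The skewness coefficient itself is computed by stats.skew().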
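It is easy to confuse the skewness coefficient with the skewness test statistic. The sketch below compares stats.skew() and stats.skewtest() on a symmetric sample and on a right-skewed one; the exponential distribution and the seed are assumptions chosen for illustration.

```python
from scipy import stats

symmetric = stats.norm.rvs(size=900, random_state=0)
right_skewed = stats.expon.rvs(size=900, random_state=0)  # assumed example of a skewed sample

for name, data in (('normal', symmetric), ('exponential', right_skewed)):
    coefficient = stats.skew(data)            # the skewness itself
    statistic, pvalue = stats.skewtest(data)  # z-score and p-value of the test
    print(name, coefficient, statistic, pvalue)
```

A positive coefficient indicates a longer right tail, and a small p-value indicates that the skewness differs significantly from that of a normal distribution.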
5. Calculating the Kurtosis of Random Numbers
1. Concept: kurtosis describes how peaked the probability distribution curve is and how heavy its tails are.
2. Calculating kurtosis with stats.kurtosis()
3. The kurtosis of a normal distribution is 3, so its excess kurtosis (excess_k = kurtosis - 3) is 0. By default, stats.kurtosis() returns the excess kurtosis.
A platykurtic distribution (excess_k < 0) is flatter than the normal distribution.
A leptokurtic distribution (excess_k > 0) is steeper than the normal distribution.
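The three cases can be illustrated with samples from a uniform, a normal, and a Laplace distribution. This is a sketch under assumed choices (the distributions, the sample size of 5000, and the seed are for illustration only); it also shows that fisher=False switches stats.kurtosis() from excess kurtosis to the plain kurtosis.

```python
from scipy import stats

samples = {
    'uniform (platykurtic)': stats.uniform.rvs(size=5000, random_state=0),
    'normal': stats.norm.rvs(size=5000, random_state=0),
    'laplace (leptokurtic)': stats.laplace.rvs(size=5000, random_state=0),
}

for name, data in samples.items():
    excess_k = stats.kurtosis(data)                 # fisher=True by default: excess kurtosis
    pearson_k = stats.kurtosis(data, fisher=False)  # plain kurtosis = excess kurtosis + 3
    print(name, excess_k, pearson_k)
```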
Example:(../Scipy/Test01/test1)
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
arr = stats.norm.rvs(size=900) # 900 normally distributed random numbers
(mean,std) = stats.norm.fit(arr) # Fit a normal distribution to the data
print('Mean:',mean)
print('Standard deviation:',std)
(skewness,pvalue1) = stats.skewtest(arr) # Skewness test: statistic and p-value
print('Skewness test statistic:')
print(skewness)
print('p-value of the skewness test:')
print(pvalue1)
(Kurtosistest,pvalue2) = stats.kurtosistest(arr) # Kurtosis test: statistic and p-value
print('Kurtosistest',Kurtosistest)
print('pvalue2',pvalue2)
(Normltest,pvalue3) = stats.normaltest(arr) # Normality test: statistic and p-value
print('Normltest',Normltest)
print('pvalue3',pvalue3)
num = stats.scoreatpercentile(arr,95) # Value at a given percentile
print('Value at the 95th percentile:')
print(num)
indexPercent = stats.percentileofscore(arr,1) # Percentile rank of a given value
print('Percentile rank of the value 1:')
print(indexPercent)
plt.hist(arr) # Build the histogram
plt.show() # Display the figure
6. Testing for Normality
1. The normality test returns two values; the second is the p-value.
2. Use stats.normaltest() to run the test.
In general, pvalue > 0.05 means the hypothesis that the data are normally distributed cannot be rejected.
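The 0.05 rule can be applied as in the following sketch, which runs the test on a normal sample and on a clearly non-normal (uniform) sample. The threshold name alpha, the seed, and the choice of the uniform distribution are assumptions for illustration.

```python
from scipy import stats

alpha = 0.05  # significance threshold from the rule of thumb above
normal_data = stats.norm.rvs(size=900, random_state=0)
uniform_data = stats.uniform.rvs(size=900, random_state=0)

for name, data in (('normal', normal_data), ('uniform', uniform_data)):
    statistic, pvalue = stats.normaltest(data)
    verdict = 'cannot reject normality' if pvalue > alpha else 'reject normality'
    print(name, statistic, pvalue, verdict)
```

A p-value above the threshold does not prove that the data are normal; it only means the test found no significant evidence against it.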
7. Finding the Value at a Given Percentile of the Data
1. Use scoreatpercentile to calculate the value at a given percentile
Format: scoreatpercentile(data set, percentage)
stats.scoreatpercentile(name_arr,percent)
2. Example: find the value at the 95th percentile
num = stats.scoreatpercentile(arr,95)
print(num)
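With deterministic data the result is easy to check by hand. In the sketch below the data set (the integers 1 to 100) is an assumption for illustration; with default settings, scoreatpercentile interpolates linearly between neighbouring values and agrees with np.percentile.

```python
import numpy as np
from scipy import stats

data = np.arange(1, 101)  # the integers 1..100 (assumed example data)

value = stats.scoreatpercentile(data, 95)
print(value)

# np.percentile uses the same linear interpolation by default.
print(np.percentile(data, 95))
```

The 95th percentile falls at position 0.95 * 99 = 94.05 in the sorted data (counting from 0), so the result lies between the values 95 and 96.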
8. Finding the Percentile That Corresponds to a Given Value
1. Use percentileofscore to calculate the percentile rank of a given value
Format: percentileofscore(data set, value)
2. Example: indexPercent = stats.percentileofscore(arr, 1)
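percentileofscore also accepts a kind argument that controls how values equal to the given score are counted. A small sketch on an assumed four-element data set:

```python
from scipy import stats

data = [1, 2, 3, 4]  # assumed example data

results = {}
for kind in ('rank', 'weak', 'strict', 'mean'):
    # 'weak' counts values <= score, 'strict' counts values < score,
    # 'mean' averages the two, and 'rank' (the default) averages the ranks of ties.
    results[kind] = stats.percentileofscore(data, 3, kind=kind)
    print(kind, results[kind])
```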
Histogram display
import matplotlib.pyplot as plt
If matplotlib is missing, install it in the Anaconda environment: (py36) C:\Users\Lenovo> conda install matplotlib
plt.hist(arr) # Build the histogram
plt.show() # Display the chart
9. Comprehensive Practice
1. Find the following statistics for the test scores:
mean, median, mode, range, variance, standard deviation, coefficient of variation (standard deviation / mean), skewness, kurtosis
Step 1: Create two two-dimensional arrays of [score, number of occurrences]
def createOneScore():
    arrEasy = np.array([
        [0,20],[2.5,24],[5,16],[7.5,19],[10,23],[12.5,26],
        [15,29],[17.5,23],[20,27],[22.5,31],[27.5,40],[30,53],
        [32.5,66],[35,90],[37.5,110],[40,160],[42.5,138],[45,175],
        [47.5,182],[50,195],[52.5,118],[55,217],[57.5,226],[60,334],
        [62.5,342],[65,359],[67.5,510],[70,521],[72.5,300],[75,210],
        [75.5,90],[80,20]
    ])
    return createScore(arrEasy) # createScore is defined in Step 2
def createTwoScore():
    arrDiff = np.array([
        [0,2],[2.5,4],[5,6],[7.5,9],[10,13],[12.5,16],[15,19],
        [17.5,23],[20,27],[40,130],[42.5,148],[45,165],[47.5,182],
        [50,195],[52.5,108],[55,217],[57.5,226],[60,334],
        [62.5,342],[65,349],[67.5,500],[70,511],[72.5,300],
        [75,200],[75.5,80],[80,20]
    ])
    return createScore(arrDiff) # createScore is defined in Step 2
Step 2: Create a function that expands the incoming [score, count] array into a one-dimensional array of individual scores
def createScore(arr):
    score = [] # Scores of all trainees
    row = arr.shape[0] # Number of [score, count] rows
    for i in np.arange(0,row):
        for j in np.arange(0,int(arr[i][1])): # Repeat once per occurrence
            score.append(arr[i][0]) # Column 0 holds the score
    score = np.array(score)
    return score
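The same expansion can be written without explicit loops using np.repeat. The sketch below compares both approaches on a small hypothetical table; the three rows are made up for illustration and are not part of the exercise data.

```python
import numpy as np

# Hypothetical [score, count] rows, used only for this illustration.
table = np.array([[0, 2], [2.5, 1], [5, 3]])

# Loop version: append each score once per occurrence.
looped = []
for value, count in table:
    looped.extend([value] * int(count))
looped = np.array(looped)

# Vectorized version: repeat each score by its count.
repeated = np.repeat(table[:, 0], table[:, 1].astype(int))
print(repeated)
```

For frequency tables with thousands of occurrences per row, np.repeat avoids the Python-level inner loop.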
Step 3: Create a function that computes statistics for the incoming array
def calStatValue(score):
    # Measures of central tendency
    print('Mean')
    print(np.mean(score))
    print('Median')
    print(np.median(score))
    print('Mode')
    print(stats.mode(score))
    # Measures of dispersion
    print('Range')
    print(np.ptp(score))
    print('Variance')
    print(np.var(score))
    print('Standard deviation')
    print(np.std(score))
    print('Coefficient of variation')
    print(np.std(score)/np.mean(score))
    # Measures of skewness and kurtosis
    print('Skewness')
    print(stats.skew(score))
    print('Kurtosis')
    print(stats.kurtosis(score))
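Two of these statistics deserve a closer look. The coefficient of variation is the standard deviation divided by the mean, and scipy.stats offers variation() for it; stats.mode() returns a result object rather than a bare number. The sketch below uses a small made-up score list, and np.atleast_1d is applied as a precaution because the shape of the mode result differs between SciPy versions.

```python
import numpy as np
from scipy import stats

score = np.array([60, 60, 70, 80, 90])  # made-up scores for illustration

# Coefficient of variation: standard deviation divided by mean.
cv_manual = np.std(score) / np.mean(score)
cv_scipy = stats.variation(score)
print(cv_manual, cv_scipy)

# stats.mode returns an object with .mode and .count fields.
mode_result = stats.mode(score)
most_common = np.atleast_1d(mode_result.mode)[0]
occurrences = np.atleast_1d(mode_result.count)[0]
print(most_common, occurrences)
```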
Step 4: Create a function that draws a simple box plot and histogram
def drawGraghic(score):
    plt.boxplot([score],labels=['score']) # Box plot
    plt.title('Box plot')
    plt.show()
    plt.hist(score,100) # Histogram with 100 bins
    plt.show()
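To read the box plot, it helps to know which numbers it summarizes. With matplotlib's default settings, the box spans the first to third quartile, the line inside marks the median, and points more than 1.5 times the interquartile range beyond the box are drawn as outliers. The sketch below computes those quantities with NumPy on a made-up data set.

```python
import numpy as np

score = np.array([10, 20, 30, 40, 50, 60, 70, 80, 200])  # made-up data with one extreme value

q1, median, q3 = np.percentile(score, [25, 50, 75])
iqr = q3 - q1  # interquartile range: the height of the box

# Points beyond these fences are the ones a default box plot marks as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = score[(score < lower_fence) | (score > upper_fence)]

print(q1, median, q3, iqr)
print(outliers)
```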
Complete code for the case:
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
def createScore(arr):
    score = [] # Scores of all trainees
    row = arr.shape[0] # Number of [score, count] rows
    for i in np.arange(0,row): # Traverse all rows
        for j in np.arange(0,int(arr[i][1])): # Repeat once per occurrence (column 1 holds the count)
            score.append(arr[i][0]) # Column 0 holds the score
    score = np.array(score)
    return score
def createOneScore():
    arrOne = np.array([
        [0,20],[2.5,24],[5,16],[7.5,19],[10,23],
        [12.5,26],[15,29],[17.5,23],[20,27],
        [35,90],[37.5,110],[40,160],
        [42.5,138],[45,175],[47.5,182],[50,195],
        [52.5,118],[55,217],[57.5,226],[60,334],
        [72.5,300],[75,210],[75.5,90],[80,20]
    ])
    return createScore(arrOne)
def createTwoScore():
    arrTwo = np.array([
        [0,2],[2.5,4],[5,6],[7.5,9],[10,13],
        [12.5,16],[15,19],[17.5,23],[20,27],
        [35,90],[37.5,110],[40,130],
        [42.5,148],[45,165],[47.5,182],[50,195],
        [52.5,108],[55,217],[57.5,226],[60,334],
        [72.5,300],[75,200],[75.5,80],[80,20]
    ])
    return createScore(arrTwo)
def calStatValue(score):
    # Measures of central tendency
    print('Mean')
    print(np.mean(score))
    print('Median')
    print(np.median(score))
    print('Mode')
    print(stats.mode(score))
    # Measures of dispersion
    print('Range')
    print(np.ptp(score))
    print('Variance')
    print(np.var(score))
    print('Standard deviation')
    print(np.std(score))
    print('Coefficient of variation')
    print(np.std(score)/np.mean(score))
    # Measures of skewness and kurtosis
    (skewness,pvalue1) = stats.skewtest(score) # Skewness test: statistic and p-value
    print('Skewness')
    print(stats.skew(score))
    (Kurtosistest,pvalue2) = stats.kurtosistest(score) # Kurtosis test: statistic and p-value
    print('Kurtosis')
    print(stats.kurtosis(score))
    return
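# Drawing
def drawGraghic(score):
    plt.boxplot([score],labels=['score']) # Box plot
    plt.title('Box plot')
    plt.show()
    plt.hist(score,100) # Histogram with 100 bins
    plt.show()
    return
# Run the case for both score sets
for score in (createOneScore(), createTwoScore()):
    calStatValue(score)
    drawGraghic(score)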