SciPy Foundations and Advanced Usage

Keywords: jupyter Python Anaconda pip

SciPy: a Python Scientific Computing Library

1. Introduction to SciPy

1.1. Introduction and Installation of SciPy

Official website: http://www.scipy.org/SciPy
Installation: open cmd in C:\Python27\Scripts and run:
    pip install scipy
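
A quick way to confirm that the installation worked is to import SciPy from the same interpreter and print its version. This is a minimal sketch; the variable name scipy_version is only illustrative.

```python
# Confirm that SciPy is importable from the current interpreter.
import sys

import scipy

scipy_version = scipy.__version__
print("Python executable:", sys.executable)
print("SciPy version:", scipy_version)
```

If the import fails, pip most likely installed into a different Python than the one being run.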

1.2. Installation of Anaconda and Environment Building (Demonstration by Example)

Create an environment: conda create -n env_name python=3.6
Example: conda create -n Py_36 python=3.6   # create an environment named Py_36
List all environments: conda info -e
Enter the environment: source activate Py_36   (OSX/Linux)
                       activate Py_36          (Windows)

1.3. Jupyter Installation

Introduction to Jupyter: Jupyter Notebook is an interactive notebook that
    supports more than 40 programming languages
    is used for data cleaning and transformation, numerical simulation, statistical modeling, machine learning, etc.
Install Jupyter: conda install jupyter notebook
Start Jupyter: activate the appropriate environment, then
    run in the console: jupyter notebook
Notebook server address: http://localhost:8888
    New (notebook, text file, folder)
Close the notebook server: press Ctrl+C twice

1.4. SciPy's 'hello world'

Requirement: save a multi-dimensional array to a .mat file, load the .mat file, then get its content and print it
Step 1: Import the required SciPy module
    from scipy import io   # module to be used
Step 2: Save data with savemat
    io.savemat(file_name, mdict)
    io.savemat('a.mat', {'array': a})
Step 3: Load data with loadmat
    io.loadmat(file_name)
    data = io.loadmat('a.mat')
Example:
from scipy import io              # import io
import numpy as np                # import numpy and name it np
arr = np.array([1,2,3,4,5,6])
io.savemat('test.mat',{'arr1':arr})
loadArr = io.loadmat('test.mat')
print(loadArr['arr1'])            # get the content and print it
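
It can help to see what loadmat actually returns. The sketch below assumes the default savemat and loadmat settings and writes to a temporary directory (an illustrative choice, not part of the original example). The result is a dictionary that also holds metadata keys starting with "__", and a one-dimensional array comes back as a two-dimensional row.

```python
import os
import tempfile

import numpy as np
from scipy import io

arr = np.array([1, 2, 3, 4, 5, 6])

# Write into a temporary directory so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "test.mat")
    io.savemat(path, {"arr1": arr})
    loaded = io.loadmat(path)

# Metadata keys start with "__"; everything else is user data.
user_keys = [key for key in loaded if not key.startswith("__")]
print(user_keys)

# With default settings a 1-D array is stored as a row, so it loads as 2-D.
print(loaded["arr1"].shape)

# ravel() flattens it back to one dimension.
restored = loaded["arr1"].ravel()
print(restored)
```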

2. Implementing Statistical Functions with SciPy

Requirement: analyze random numbers with the statistical functions in scipy.stats

stats provides functions for generating random numbers from continuous distributions:
- Uniform distribution
x = stats.uniform.rvs(size=20) generates 20 random numbers uniformly distributed on [0, 1]
- Normal distribution
x = stats.norm.rvs(size=20) generates 20 normally distributed random numbers
- Beta distribution
x = stats.beta.rvs(size=20, a=3, b=4) generates 20 random numbers from a beta distribution with parameters a=3, b=4
stats also provides discrete distributions:
- Bernoulli distribution (bernoulli)
- Geometric distribution (geom)
- Poisson distribution (poisson)
x = stats.poisson.rvs(0.6, loc=0, size=20) generates 20 random numbers from a Poisson distribution with mean 0.6
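
The distributions listed above can be sampled side by side as in the following sketch. The seed and the Bernoulli and geometric parameters (0.5 and 0.3) are assumptions chosen only for illustration.

```python
import numpy as np
from scipy import stats

seed = 0  # hypothetical seed, used only to make the run repeatable

samples = {
    # continuous distributions
    "uniform": stats.uniform.rvs(size=20, random_state=seed),
    "norm": stats.norm.rvs(size=20, random_state=seed),
    "beta": stats.beta.rvs(3, 4, size=20, random_state=seed),
    # discrete distributions
    "bernoulli": stats.bernoulli.rvs(0.5, size=20, random_state=seed),
    "geom": stats.geom.rvs(0.3, size=20, random_state=seed),
    "poisson": stats.poisson.rvs(0.6, loc=0, size=20, random_state=seed),
}

# Print the observed range of each sample.
for name, x in samples.items():
    print(name, np.min(x), np.max(x))
```

Passing random_state is optional; without it every run produces different numbers.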

3. Calculating the Mean and Standard Deviation of Random Numbers

stats.norm.fit: fits a normal distribution to the generated data and returns the estimated mean and standard deviation.
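
For the normal distribution, the fitted parameters coincide with the sample mean and the standard deviation computed with ddof=0. The sketch below illustrates this on a hypothetical sample drawn with mean 5 and standard deviation 2; those values and the seed are assumptions.

```python
import numpy as np
from scipy import stats

# Hypothetical sample: 1000 draws from a normal distribution (mean 5, std 2).
arr = stats.norm.rvs(loc=5, scale=2, size=1000, random_state=0)

# norm.fit returns the estimated location (mean) and scale (std).
mean, std = stats.norm.fit(arr)

print(mean, np.mean(arr))   # the two means agree
print(std, np.std(arr))     # the two standard deviations agree
```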

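4. Computing the Skewness of Random Numbers

1. Concept:
Skewness describes the asymmetry of a probability distribution.

2. Computing skewness with stats.skewtest()
stats.skewtest() returns two values: the test statistic and the p-value. The p-value (between 0 and 1) belongs to the null hypothesis that the sample's skewness matches that of a normal distribution; it is not the probability that the data are normal. The skewness value itself is returned by stats.skew().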
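
The difference between the skewness value and the skewness test can be seen in a small sketch. The two samples below (one normal, one exponential) and the seed are assumptions made for illustration.

```python
from scipy import stats

# Hypothetical samples, seeded for repeatability.
symmetric = stats.norm.rvs(size=1000, random_state=0)
right_skewed = stats.expon.rvs(size=1000, random_state=0)

# stats.skew returns the skewness value itself.
skew_sym = stats.skew(symmetric)
skew_right = stats.skew(right_skewed)
print("skewness (normal sample):", skew_sym)
print("skewness (exponential sample):", skew_right)

# stats.skewtest returns a test statistic and a p-value.
statistic, pvalue = stats.skewtest(right_skewed)
print("skewtest statistic:", statistic)
print("skewtest p-value:", pvalue)
```

A positive skewness indicates a longer right tail; a small p-value indicates that the sample's skewness differs from that of a normal distribution.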

5. Calculating the Kurtosis of Random Numbers

1. Concept: kurtosis describes how peaked and heavy-tailed a probability distribution curve is
2. Calculate kurtosis with stats.kurtosis(), which returns the excess kurtosis by default
3. The kurtosis of the normal distribution is 3, so its excess kurtosis excess_k is 0.
A platykurtic distribution (excess_k < 0) is flatter than the normal distribution.
A leptokurtic distribution (excess_k > 0) is steeper than the normal distribution.
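
A short sketch of the two cases, assuming a uniform sample as the flat example and a Laplace sample as the peaked example (both choices, and the seed, are illustrative). It also shows that fisher=False switches stats.kurtosis from excess kurtosis to plain kurtosis, which differs by 3.

```python
import numpy as np
from scipy import stats

# Hypothetical samples, seeded for repeatability.
flat = stats.uniform.rvs(size=5000, random_state=0)
peaked = stats.laplace.rvs(size=5000, random_state=0)

# Default (fisher=True): excess kurtosis, which is 0 for a normal distribution.
excess_flat = stats.kurtosis(flat)
excess_peaked = stats.kurtosis(peaked)
print("uniform excess kurtosis:", excess_flat)     # platykurtic
print("laplace excess kurtosis:", excess_peaked)   # leptokurtic

# fisher=False: plain (Pearson) kurtosis, which is 3 for a normal distribution.
pearson_flat = stats.kurtosis(flat, fisher=False)
print("difference:", pearson_flat - excess_flat)
```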

Example: (../Scipy/Test01/test1)
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

arr = stats.norm.rvs(size=900)
(mean, std) = stats.norm.fit(arr)
print('mean', mean)                     # fitted mean
print('standard deviation', std)        # fitted standard deviation
(skewness, pvalue1) = stats.skewtest(arr)
print('skewness test statistic')
print(skewness)
print('p-value of the skewness test')
print(pvalue1)
(kurtosistest, pvalue2) = stats.kurtosistest(arr)
print('kurtosistest', kurtosistest)     # kurtosis test statistic
print('pvalue2', pvalue2)
(normaltest, pvalue3) = stats.normaltest(arr)
print('normaltest', normaltest)         # normality test statistic
print('pvalue3', pvalue3)
num = stats.scoreatpercentile(arr, 95)  # value at a given percentile
print('Value at the 95th percentile:')
print(num)
indexPercent = stats.percentileofscore(arr, 1)  # percentile rank of a given value
print('Percentile rank of the value 1:')
print(indexPercent)
plt.hist(arr)   # draw the histogram
plt.show()      # display the figure

6. Normality Test

1. The normality test returns two values; the second is the p-value.
2. Use stats.normaltest() to run the test
In general, pvalue > 0.05 means the hypothesis of a normal distribution is not rejected.
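
The 0.05 rule can be sketched as follows. The two samples, the seed, and the names alpha and results are assumptions for illustration; the outcome for the normal sample depends on the random draw, so only the decision rule is shown.

```python
from scipy import stats

alpha = 0.05  # significance level used in the rule of thumb above

# Hypothetical samples, seeded for repeatability.
test_samples = {
    "normal": stats.norm.rvs(size=1000, random_state=0),
    "uniform": stats.uniform.rvs(size=1000, random_state=0),
}

results = {}
for name, sample in test_samples.items():
    statistic, pvalue = stats.normaltest(sample)
    results[name] = (statistic, pvalue)
    verdict = "not rejected" if pvalue > alpha else "rejected"
    print(name, "p-value:", pvalue, "-> normality", verdict)
```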

7. Calculate the Value at a Given Percentile of the Data

1. Use scoreatpercentile to calculate the value at a given percentile
Format: scoreatpercentile(data set, percentage)
stats.scoreatpercentile(name_arr, percent)
2. Example: find the value at the 95th percentile
num = stats.scoreatpercentile(arr, 95)
print(num)
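
As a sanity check, the result can be compared with NumPy's percentile function, which uses the same linear interpolation by default. The data set below (the integers 1 to 100) is a hypothetical example.

```python
import numpy as np
from scipy import stats

arr = np.arange(1, 101)  # hypothetical data: the integers 1..100

num = stats.scoreatpercentile(arr, 95)   # value at the 95th percentile
same = np.percentile(arr, 95)            # NumPy equivalent

print(num)
print(same)
```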

8. Find the Percentile Rank of a Given Value

1. Use percentileofscore to calculate the percentile rank of a given value
Format: percentileofscore(data set, value)
2. Example: indexPercent = stats.percentileofscore(arr, 1)
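
When the value occurs more than once in the data, the kind parameter of percentileofscore decides how ties are counted. A minimal sketch on a small hypothetical data set:

```python
from scipy import stats

data = [1, 2, 3, 3, 4]  # hypothetical data with a tie at 3

# 'rank' is the default; the other kinds treat ties differently.
ranks = {
    kind: stats.percentileofscore(data, 3, kind=kind)
    for kind in ("rank", "weak", "strict", "mean")
}

for kind, value in ranks.items():
    print(kind, value)
```

Here 'strict' counts only values below 3, 'weak' counts values less than or equal to 3, and 'mean' averages the two.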

Histogram display

Install matplotlib in the Anaconda environment: (py36) C:\Users\Lenovo> conda install matplotlib
import matplotlib.pyplot as plt
plt.hist(arr)   # draw the histogram
plt.show()      # display the chart
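
The bar heights that plt.hist draws can also be inspected as numbers. The sketch below uses np.histogram with 10 equal-width bins, which is assumed here to match Matplotlib's default bin count; the sample and seed are illustrative.

```python
import numpy as np
from scipy import stats

# Hypothetical sample, seeded for repeatability.
arr = stats.norm.rvs(size=900, random_state=0)

# counts[i] is the number of values falling between edges[i] and edges[i + 1].
counts, edges = np.histogram(arr, bins=10)

for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"{left:7.3f} to {right:7.3f}: {count}")
```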

9. Comprehensive Practice

1. Find the following values for the test scores:
Mean, median, mode, range, variance,
standard deviation, coefficient of variation (standard deviation/mean), skewness, kurtosis
Step 1: Create two two-dimensional arrays of [score, number of occurrences]
def createOneScore():
    arrEasy = np.array([
    [0,20],[2.5,24],[5,16],[7.5,19],[10,23],[12.5,26],
    [15,29],[17.5,23],[20,27],[22.5,31],[27.5,40],[30,53],
    [32.5,66],[35,90],[37.5,110],[40,160],[42.5,138],[45,175],
    [47.5,182],[50,195],[52.5,118],[55,217],[57.5,226],[60,334],
    [62.5,342],[65,359],[67.5,510],[70,521],[72.5,300],[75,210],
    [75.5,90],[80,20]
    ])
    return createScore(arrEasy)    # createScore is defined in Step 2
def createTwoScore():
    arrDiff = np.array([
    [0,2],[2.5,4],[5,6],[7.5,9],[10,13],[12.5,16],[15,19],
    [17.5,23],[20,27],[40,130],[42.5,148],[45,165],[47.5,182],
    [50,195],[52.5,108],[55,217],[57.5,226],[60,334],
    [62.5,342],[65,349],[67.5,500],[70,511],[72.5,300],
    [75,200],[75.5,80],[80,20]
    ])
    return createScore(arrDiff)    # createScore is defined in Step 2
Step 2: Create a function that flattens the incoming two-dimensional array into a one-dimensional array
def createScore(arr):
    score = []          # scores of all trainees
    row = arr.shape[0]
    for i in np.arange(0,row):
        for j in np.arange(0,int(arr[i][1])):
            score.append(arr[i][0])    # append the score once per occurrence
    score = np.array(score)
    return score
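
The nested loop can also be written with np.repeat, which repeats each score by its number of occurrences. This is an alternative sketch, not the original author's method; the function names flatten_loop and flatten_repeat and the small table are hypothetical.

```python
import numpy as np

# Hypothetical table of [score, number of occurrences].
table = np.array([[0, 2], [2.5, 1], [5, 3]])


def flatten_loop(arr):
    """Expand the table with explicit loops, as in Step 2."""
    score = []
    for i in range(arr.shape[0]):
        for _ in range(int(arr[i][1])):
            score.append(arr[i][0])
    return np.array(score)


def flatten_repeat(arr):
    """Expand the table with np.repeat (column 0 repeated by column 1)."""
    return np.repeat(arr[:, 0], arr[:, 1].astype(int))


print(flatten_loop(table))
print(flatten_repeat(table))
```

Both functions return the same array; the np.repeat version avoids the Python-level loop.
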
Step 3: Create a function that computes statistics of the incoming array
def calStatValue(score):
    # measures of central tendency
    print('mean value')
    print(np.mean(score))
    print('Median')
    print(np.median(score))
    print('Mode number')
    print(stats.mode(score))
    # measures of dispersion
    print('range')
    print(np.ptp(score))
    print('variance')
    print(np.var(score))
    print('standard deviation')
    print(np.std(score))
    print('Coefficient of variation')
    print(np.std(score)/np.mean(score))
    # measures of skewness and kurtosis
    print('skewness')
    print(stats.skew(score))
    print('kurtosis')
    print(stats.kurtosis(score))
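
The same measures can be checked on a tiny data set whose values are easy to verify by hand. In this sketch the data and the name summary are hypothetical, and np.atleast_1d is used because the shape of the stats.mode result differs between SciPy versions.

```python
import numpy as np
from scipy import stats

score = np.array([1, 2, 2, 3, 4])  # hypothetical scores

summary = {
    "mean": np.mean(score),
    "median": np.median(score),
    # .mode may be a scalar or a length-1 array depending on the SciPy version
    "mode": np.atleast_1d(stats.mode(score).mode)[0],
    "range": np.ptp(score),
    "variance": np.var(score),
    "standard deviation": np.std(score),
    "coefficient of variation": np.std(score) / np.mean(score),
    "skewness": stats.skew(score),
    "excess kurtosis": stats.kurtosis(score),
}

for name, value in summary.items():
    print(name, value)
```
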
Step 4: Create a function that draws a simple box plot and histogram
def drawGraghic(score):
    plt.boxplot([score], labels=['score'])    # box plot (recent Matplotlib releases rename labels to tick_labels)
    plt.title('Box plot')
    plt.show()
    plt.hist(score,100)                       # histogram with 100 bins
    plt.show()

//step5: 
//step6: 

//Complete code for the case:
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
def createScore(arr):
    score = []                  # scores of all trainees
    row = arr.shape[0]          # number of [score, occurrences] rows
    for i in np.arange(0,row):  # traverse all rows
        for j in np.arange(0,int(arr[i][1])):  # repeat once per occurrence (column 1)
            score.append(arr[i][0])            # append the score (column 0)
    score = np.array(score)
    return score
def createOneScore():
    arrOne = np.array([
        [0,20],[2.5,24],[5,16],[7.5,19],[10,23],
        [12.5,26],[15,29],[17.5,23],[20,27],
        [35,90],[37.5,110],[40,160],
        [42.5,138],[45,175],[47.5,182],[50,195],
        [52.5,118],[55,217],[57.5,226],[60,334],
        [72.5,300],[75,210],[75.5,90],[80,20]
        ])
    return createScore(arrOne)
def createTwoScore():
    arrTwo = np.array([
        [0,2],[2.5,4],[5,6],[7.5,9],[10,13],
        [12.5,16],[15,19],[17.5,23],[20,27],
        [35,90],[37.5,110],[40,130],
        [42.5,148],[45,165],[47.5,182],[50,195],
        [52.5,108],[55,217],[57.5,226],[60,334],
        [72.5,300],[75,200],[75.5,80],[80,20]
        ])
    return createScore(arrTwo)
def calStatValue(score):
    # measures of central tendency
    print('mean value')
    print(np.mean(score))
    print('Median')
    print(np.median(score))
    print('Mode number')
    print(stats.mode(score))
    # measures of dispersion
    print('range')
    print(np.ptp(score))
    print('variance')
    print(np.var(score))
    print('standard deviation')
    print(np.std(score))
    print('Coefficient of variation')
    print(np.std(score)/np.mean(score))

    # measures of skewness and kurtosis
    (skewness,pvalue1) = stats.skewtest(score)
    print('skewness')
    print(stats.skew(score))
    print('skewness test p-value')
    print(pvalue1)

    (kurtosistest,pvalue2) = stats.kurtosistest(score)
    print('kurtosis')
    print(stats.kurtosis(score))
    print('kurtosis test p-value')
    print(pvalue2)
    return

# Drawing
def drawGraghic(score):
    plt.boxplot([score], labels=['score'])    # box plot (recent Matplotlib releases rename labels to tick_labels)
    plt.title('Box plot')
    plt.show()
    plt.hist(score,100)                       # histogram with 100 bins
    plt.show()
    return

# Example driver: analyze and plot the first data set
score = createOneScore()
calStatValue(score)
drawGraghic(score)

Posted by Elizabeth on Sat, 11 May 2019 02:26:47 -0700