# Machine learning (11) -- logistic regression

Keywords: Python Machine Learning

Classification algorithm - logistic regression

[also an iterative method: the weights w are updated step by step]

From linear regression to a classification problem:

Input: the output of the linear regression formula is used as the input of the logistic regression sigmoid function:

$$z = w^{T}x + b$$

Logistic regression formula [that is, how the sigmoid converts the input into a probability value]:

$$h(z) = \frac{1}{1 + e^{-z}}$$

Here e ≈ 2.71 and z is the result of the regression.

Output: a probability value in the interval (0, 1)
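As a minimal sketch of the step above, the snippet below feeds a linear-regression result into the sigmoid. The weights, bias, sample values and the 0.5 threshold are made up for illustration and do not come from the original notes.

```python
import numpy as np


def sigmoid(z):
    """Map a linear-regression output z to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


# Hypothetical weights, bias and one sample, chosen only for illustration.
w = np.array([0.5, -0.25])
b = 0.1
x = np.array([2.0, 4.0])

z = np.dot(w, x) + b    # result of the regression: 1.0 - 1.0 + 0.1
p = sigmoid(z)          # probability of the positive class
label = int(p >= 0.5)   # assumed decision threshold of 0.5

print(z, p, label)
```

A value of z equal to 0 maps to exactly 0.5, large positive z approaches 1, and large negative z approaches 0.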

Loss function and optimization of logistic regression

The principle is the same as in linear regression, but because this is a classification problem the loss function is different (log-likelihood loss). It has no closed-form solution, so it is minimized iteratively, here by gradient descent:

$$cost(h(x), y) = \begin{cases} -\log(h(x)) & \text{if } y = 1 \\ -\log(1 - h(x)) & \text{if } y = 0 \end{cases}$$

[when the target value is 1, the closer the probability is to 1, the smaller the loss]

[of the two classes, the one with fewer samples is taken as the positive class, and the predicted probability refers to that class] [when the target value is 0, the closer the probability is to 1, the greater the loss]

[logistic regression only judges whether a sample belongs to one class, by predicting the probability of belonging to it. h(x) always refers to that same class, so the two loss curves are mirror images, which gives the two cases above] [similar to information entropy: the smaller the better]
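The two cases described above can be checked numerically. This is a small sketch with made-up probabilities; the function name `log_loss_single` is hypothetical and not part of any library.

```python
import numpy as np


def log_loss_single(y, p):
    """Log-likelihood loss for one sample: y is 0 or 1, p is the predicted P(y = 1)."""
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))


# Target is 1: a probability close to 1 gives a small loss.
loss_pos_good = log_loss_single(1, 0.9)
loss_pos_bad = log_loss_single(1, 0.1)

# Target is 0: a probability close to 1 gives a large loss.
loss_neg_good = log_loss_single(0, 0.1)
loss_neg_bad = log_loss_single(0, 0.9)

print(loss_pos_good, loss_pos_bad, loss_neg_good, loss_neg_bad)
```

The loss depends only on how far the predicted probability is from the true label, so `loss_pos_good` equals `loss_neg_good` and `loss_pos_bad` equals `loss_neg_bad`.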

[as in linear regression, gradient descent works by repeatedly updating the weights]
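The weight update can be sketched with plain NumPy. The toy data, learning rate and iteration count below are assumptions chosen for illustration; scikit-learn's own solvers work differently, so this is not how `LogisticRegression` is implemented.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def mean_log_loss(w, X, y):
    """Average log-likelihood loss over all samples."""
    p = sigmoid(X @ w)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))


# Toy data: the first column is a constant 1 so that w[0] acts as the bias.
X = np.array([[1.0, 0.5], [1.0, 1.5], [1.0, 3.0], [1.0, 4.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])

w = np.zeros(2)
learning_rate = 0.5  # assumed value
initial_loss = mean_log_loss(w, X, y)

for _ in range(2000):
    # Gradient of the mean log loss with respect to w.
    gradient = X.T @ (sigmoid(X @ w) - y) / len(y)
    w -= learning_rate * gradient

final_loss = mean_log_loss(w, X, y)
predictions = (sigmoid(X @ w) >= 0.5).astype(int)

print(initial_loss, final_loss, predictions)
```

With all weights at zero every sample gets probability 0.5, so the initial loss is ln 2; each update then moves w in the direction that lowers the loss.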

Comparative analysis of loss functions:

Syntax:

sklearn.linear_model.LogisticRegression

sklearn.linear_model.LogisticRegression(penalty = 'l2', C = 1.0) [with regularization]

Logistic regression classifier

penalty: type of regularization term

C: inverse of the regularization strength (a smaller value means stronger regularization)

coef_: regression coefficients
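A short sketch of how `C` affects `coef_`, using synthetic data from `make_classification` (an assumption made here so the example is self-contained; it is not the data set used in the case below). The estimator is left on its default L2 regularization.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary classification data, for illustration only.
X, y = make_classification(n_samples=200, n_features=4, n_informative=2,
                           n_redundant=0, random_state=0)

# Smaller C means stronger regularization (default penalty is L2).
strong = LogisticRegression(C=0.01).fit(X, y)
weak = LogisticRegression(C=100.0).fit(X, y)

strong_norm = np.linalg.norm(strong.coef_)
weak_norm = np.linalg.norm(weak.coef_)

# predict_proba returns one column per class; each row sums to 1.
proba = weak.predict_proba(X[:3])

print(strong.coef_.shape, strong_norm, weak_norm)
print(proba)
```

Stronger regularization shrinks the coefficients, so `strong_norm` comes out smaller than `weak_norm`.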

Application: [only applicable to binary classification]

• Judge the user's gender
• Predict whether users will buy a given product category
• Judge whether a comment is positive or negative

[logistic regression is a powerful tool to solve binary classification problems]

Case:

Predicting whether a breast tumor is benign or malignant

Data Description:

(1) 699 samples with 11 columns in total. The first column is the sample id, the next 9 columns are medical features of the tumor, and the last column is the tumor class.

(2) Contains 16 missing values, marked with "?".
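The handling of "?" markers can be tried on a tiny inline sample first. The three rows and the short column names below are invented for illustration; only the layout (id, 9 features, class) follows the description above.

```python
import io

import numpy as np
import pandas as pd

# Three made-up rows in the same layout: id, 9 features, class.
raw = io.StringIO(
    "1,5,1,1,1,2,1,3,1,1,2\n"
    "2,8,4,5,1,2,?,7,3,1,4\n"
    "3,8,10,10,8,7,10,9,7,1,4\n"
)
names = ["id"] + [f"f{i}" for i in range(1, 10)] + ["Class"]  # hypothetical names

df = pd.read_csv(raw, names=names)

# "?" forces column f6 to be read as text; turn it into NaN and drop the row.
df = df.replace(to_replace="?", value=np.nan).dropna()

# The remaining values are still strings, so convert them back to numbers.
df["f6"] = pd.to_numeric(df["f6"])

print(df)
```

Dropping rows is reasonable here because only 16 of the 699 samples are affected; with more missing data, imputation would be the usual alternative.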

Target value:

The malignant class is taken as the positive example.

Workflow:

1. Fetch the data online (pandas)
2. Handle missing values and standardize the features
3. Run the LogisticRegression estimator
```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Assumption: the original listing omits the read step. This is the UCI copy
# of the Wisconsin breast cancer data, which matches the description above.
DATA_URL = "https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.data"


def logistic():
    """
    Use logistic regression for binary classification of tumors
    (benign vs. malignant) based on cell features.
    :return: None
    """
    # Construct column label names
    column = ['Sample code number', 'Clump Thickness', 'Uniformity of Cell Size',
              'Uniformity of Cell Shape', 'Marginal Adhesion',
              'Single Epithelial Cell Size', 'Bare Nuclei', 'Bland Chromatin',
              'Normal Nucleoli', 'Mitoses', 'Class']

    # Read the data (the file has no header row)
    data = pd.read_csv(DATA_URL, names=column)

    print(data)

    # Handle missing values: turn "?" into NaN, then drop those rows
    data = data.replace(to_replace='?', value=np.nan)
    data = data.dropna()

    # Split the data: columns 1-9 are the features, column 10 ("Class") is the target
    x_train, x_test, y_train, y_test = train_test_split(
        data[column[1:10]], data[column[10]], test_size=0.25)

    # Standardize: fit on the training set only, then apply to the test set
    std = StandardScaler()
    x_train = std.fit_transform(x_train)
    x_test = std.transform(x_test)

    # Logistic regression prediction
    lg = LogisticRegression(C=1.0)
    lg.fit(x_train, y_train)

    print(lg.coef_)

    y_predict = lg.predict(x_test)

    print("Accuracy:", lg.score(x_test, y_test))

    # In this data set class 2 is benign and class 4 is malignant
    print("Recall rate:", classification_report(
        y_test, y_predict, labels=[2, 4], target_names=["Benign", "Malignant"]))

    return None


if __name__ == "__main__":
    logistic()
```
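To show what the recall figure in such a report means, here is a small sketch on hand-made labels (2 = benign, 4 = malignant). The label lists are invented for illustration and are not results from the data set.

```python
from sklearn.metrics import accuracy_score, confusion_matrix, recall_score

# Made-up true and predicted classes: 4 malignant and 6 benign samples.
y_true = [4, 4, 4, 4, 2, 2, 2, 2, 2, 2]
y_pred = [4, 4, 4, 2, 2, 2, 2, 2, 2, 4]

accuracy = accuracy_score(y_true, y_pred)

# Recall for the malignant class: found malignant / all truly malignant.
malignant_recall = recall_score(y_true, y_pred, pos_label=4)

# Rows are true classes, columns are predicted classes, in the order [2, 4].
matrix = confusion_matrix(y_true, y_pred, labels=[2, 4])

print(accuracy, malignant_recall)
print(matrix)
```

Accuracy treats both kinds of error the same, whereas malignant recall drops only when a malignant tumor is predicted as benign, which is the costlier mistake in this setting.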

Result analysis:

When it comes to cancer, the metric to focus on is recall: as many of the truly malignant tumors as possible should be detected.

Posted by jasonmills58 on Sun, 26 Sep 2021 16:33:35 -0700