Python user churn data mining: building logistic regression, XGBoost, random forest, decision tree, support vector machine and naive Bayes models, plus K-means user profiling
Original link: http://tecdat.cn/?p=24346
1.1 Project background: In today's highly homogeneous brand marketing environment, competition between enterprises is mainly competition for customers. The motto "the user is God" pushes many enterprises to win as many customers as possible at any cost. However, in the process of develop ...
Posted by jds580s on Wed, 17 Nov 2021 00:26:02 -0800
Introduction to data analysis, Python basics: an introduction to Python's basic commands and data structures
My first programming language was C. I also took Java at university but learned very little. Since then my programming homework has relied mainly on Python and C++, though I never studied either systematically. Usually I read other people's code and adapt it, and when I don't understand something I look it up on the spot. In fact, the Univers ...
Posted by tullmejs on Thu, 11 Nov 2021 19:19:32 -0800
Feature learning notes of data mining training camp
These notes cover the content of the Alibaba Cloud Tianchi Longzhu data mining training camp. Learning link: Tianchi Lab, a real-time online data analysis collaboration tool with free computing resources (aliyun.com)
1. Summary of learning points
Further analyze the features and process the data. Complete the analysis ...
Posted by chris_2001 on Wed, 10 Nov 2021 05:49:41 -0800
How to crawl the titles on CSDN's site-wide hot list and count keyword frequency | crawler case study
Contents
Preface
Environment
Crawler code
Keyword extraction code
Main program code
Summary
Preface
Recently I was on a business trip and found a "Xiaoqiang" (slang for a cockroach) in my hotel room. Bored, I wrote some crawler code for fun; if you ask why, it simply suited the occasion. This article mainly crawls the 100 titles of CSDN's ...
Posted by Maharg105 on Thu, 04 Nov 2021 17:39:54 -0700
R language principal component regression (PCR) and multiple linear regression feature dimensionality reduction analysis of vehicle fuel consumption, design and performance data and spectral data
Original link: http://tecdat.cn/?p=24152
What is PCR? (PCR = PCA + MLR)
• PCR is a regression technique that handles many X variables
• Given Y and X data:
• Run PCA on the X matrix to define new variables: the principal components (scores)
• In multiple linear regression (MLR), some of these new variables are used ...
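The "PCR = PCA + MLR" recipe in the excerpt above can be sketched in a few lines. This is a minimal illustration in Python with NumPy, not the linked post's code (which is written in R); the function name `pcr_fit_predict`, the synthetic data and the choice of two components are all assumptions made for the example.

```python
import numpy as np

def pcr_fit_predict(X, y, n_components):
    """Principal component regression: PCA on X, then least squares on the scores."""
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean                                   # center the predictors
    # PCA via SVD: the rows of Vt are the principal directions of Xc
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components].T                  # shape (n_features, n_components)
    scores = Xc @ components                          # the new variables (scores)
    # MLR step: regress the centered response on the retained scores
    gamma, *_ = np.linalg.lstsq(scores, y - y_mean, rcond=None)
    beta = components @ gamma                         # coefficients in the original X space
    y_hat = Xc @ beta + y_mean                        # fitted values on the training data
    return beta, y_hat

# Synthetic data (assumption): five correlated X columns driven by two latent factors
rng = np.random.default_rng(0)
latent = rng.normal(size=(100, 2))
loadings = rng.normal(size=(2, 5))
X = latent @ loadings + 0.01 * rng.normal(size=(100, 5))
y = latent @ np.array([1.5, -2.0]) + 0.01 * rng.normal(size=100)

beta, y_hat = pcr_fit_predict(X, y, n_components=2)
r2 = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
print(f"R^2 with 2 components: {r2:.3f}")
```

Keeping fewer components than there are X columns is what gives PCR its dimensionality reduction; with all components retained, the fit coincides with ordinary least squares.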
Posted by Dujo on Thu, 04 Nov 2021 08:15:49 -0700
Chapter 5: Using Item Pipeline to process data
In the previous chapter, we learned how to extract and encapsulate data. In this chapter, we learn how to process the crawled data. In Scrapy, an Item Pipeline is a component that processes data. An Item Pipeline is a class that implements a specific interface. It is usua ...
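The "specific interface" mentioned above centers on a `process_item(self, item, spider)` method that returns the (possibly modified) item. A minimal sketch follows; the class name `PriceConverterPipeline`, the `price` field and the exchange rate are hypothetical and chosen only for illustration. The class needs no Scrapy import, so it can be tried standalone with a plain dict.

```python
class PriceConverterPipeline:
    """Hypothetical pipeline: converts an item's 'price' field with a fixed rate."""

    exchange_rate = 8.5  # assumed rate, for illustration only

    def process_item(self, item, spider):
        # Scrapy calls this method once for every item a spider yields
        price = float(str(item["price"]).lstrip("$"))
        item["price"] = round(price * self.exchange_rate, 2)
        # Returning the item hands it on to the next pipeline component
        return item


# Standalone check with a dict item; the spider argument is unused here
pipeline = PriceConverterPipeline()
result = pipeline.process_item({"name": "Example Book", "price": "$10.00"}, spider=None)
print(result)
```

In a real project, a pipeline like this would be switched on through the `ITEM_PIPELINES` setting, which maps the class path to an integer that determines the order in which pipelines run.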
Posted by MuseiKaze on Mon, 01 Nov 2021 08:28:37 -0700
Probability model evaluation index
Source: Data STUDIO
Author: Yun Duojun
1. Brier Score
The accuracy of a probability prediction is called its "calibration": a measure of the difference between the probability predicted by the algorithm and the real outcome. A commonly used indicator is the Brier score, which is calculated as the mean square err ...
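For binary outcomes, the Brier score is the mean squared difference between each predicted probability and the actual 0/1 result. A minimal pure-Python sketch is given here; the function name `brier_score` and the sample values are illustrative assumptions, not taken from the original article.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    if len(y_true) != len(y_prob):
        raise ValueError("y_true and y_prob must have the same length")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


y_true = [1, 0, 1, 0]
confident = [0.9, 0.1, 0.8, 0.2]        # probabilities close to the true outcomes
uninformative = [0.5, 0.5, 0.5, 0.5]    # always predicts 50%

score_confident = brier_score(y_true, confident)
score_uninformative = brier_score(y_true, uninformative)
print(score_confident, score_uninformative)
```

Lower is better: a perfect forecaster scores 0, and the constant 50% forecast scores 0.25. scikit-learn provides the same metric as `sklearn.metrics.brier_score_loss`.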
Posted by tomz0r on Thu, 28 Oct 2021 02:00:07 -0700
Python implementation of CART decision tree algorithm (detailed comments)
1. Introduction to CART decision tree algorithm
The CART (Classification And Regression Trees) algorithm is a tree construction algorithm that can be used for both classification and regression tasks. Compared with ID3 and C4.5, which are limited to classification tasks (and, in ID3's case, to discrete features), the CART algorithm is much more widely applicable. It can ...
Posted by elklabone on Wed, 27 Oct 2021 00:59:01 -0700
Python 3 implementation and improvement of Apriori algorithm
The code follows Machine Learning in Action. The improvements come partly from Data Mining: Concepts and Techniques and partly from https://blog.csdn.net/weixin_30702887/article/details/98992919 I summarize and implement them here as a record of my study of the Apriori algorithm.
First ...
Posted by narch31 on Wed, 13 Oct 2021 21:22:17 -0700
Data mining for disk damage prediction
Data import and preprocessing
First, import all the packages required for this data mining task:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.decompositi ...
Posted by goaman on Mon, 11 Oct 2021 22:14:02 -0700