Python user churn data mining: building logistic regression, XGBoost, random forest, decision tree, support vector machine, and naive Bayes models, plus K-means user profiling

Original link: http://tecdat.cn/?p=24346 1.1 Project background: In today's highly homogeneous brand-marketing environment, competition between enterprises is mainly competition for customers. The slogan "User is God" pushes many enterprises to acquire as many customers as possible at any cost. However, in the process of develop ...

Posted by jds580s on Wed, 17 Nov 2021 00:26:02 -0800

Introduction to data analysis, Python basics: an introduction to Python's basic commands and data structures

My first programming language was C. I also took Java at university but learned very little. Later, my programming homework relied mainly on Python and C++, but I never studied them systematically. Generally, I read other people's code and adapt it myself, and when I don't understand something I look it up on the spot. In fact, the Univers ...

Posted by tullmejs on Thu, 11 Nov 2021 19:19:32 -0800

Feature learning notes of data mining training camp

These notes cover material from the Alibaba Cloud Tianchi Longzhu data mining training camp. Learning link: Tianchi Lab, a real-time online data analysis collaboration tool with free computing resources (aliyun.com). 1. Summary of learning points: further analyze the features and process the data; complete the analysis ...

Posted by chris_2001 on Wed, 10 Nov 2021 05:49:41 -0800

How to crawl the titles on CSDN's site-wide comprehensive hot list and count keyword frequency | crawler cases

Contents: Preface, Environment, Crawler code, Keyword extraction code, Main program code, Summary. Preface: Recently, I was on a business trip and found a "Xiaoqiang" (cockroach) in my hotel room. So, bored on the trip, I wrote some crawler code for fun; it seemed to fit the occasion. This article mainly crawls the 100 titles of CSDN's ...

Posted by Maharg105 on Thu, 04 Nov 2021 17:39:54 -0700

R language principal component regression (PCR) and multiple linear regression feature dimensionality reduction analysis of vehicle fuel consumption, design and performance data and spectral data

Original link: http://tecdat.cn/?p=24152 What is PCR? (PCR = PCA + MLR) • PCR is a regression technique that handles many X variables • Given Y and X data: • Perform PCA on the X matrix – define new variables: the principal components (scores) • In multiple linear regression (MLR), some of these new variables are used ...

Posted by Dujo on Thu, 04 Nov 2021 08:15:49 -0700

Chapter 5: using Item Pipeline to process data

Chapter 5: using Item Pipeline to process data. In the previous chapter, we learned how to extract and encapsulate data. In this chapter, we learn how to process the crawled data. In Scrapy, an Item Pipeline is a component that processes data. An Item Pipeline is a class that implements a specific interface. It is usua ...

Posted by MuseiKaze on Mon, 01 Nov 2021 08:28:37 -0700
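
The Item Pipeline excerpt above describes a class with a specific interface. A minimal sketch of what such a class can look like follows. Scrapy pipelines are plain Python classes that expose a `process_item(self, item, spider)` method; the class name `PriceConverterPipeline`, the fields `price_gbp` and `price_cny`, and the exchange rate are hypothetical and are not taken from the original post.

```python
class PriceConverterPipeline:
    """Hypothetical pipeline: add a CNY price computed from a GBP price."""

    # Assumed exchange rate, for illustration only.
    exchange_rate = 8.5

    def process_item(self, item, spider):
        # Scrapy calls process_item() once for every item a spider yields.
        item["price_cny"] = round(item["price_gbp"] * self.exchange_rate, 2)
        # Returning the item hands it on to the next pipeline in the chain.
        return item


# Stand-alone check without running a crawl; the spider argument is unused here.
pipeline = PriceConverterPipeline()
result = pipeline.process_item(
    {"name": "A Light in the Attic", "price_gbp": 10.0}, spider=None
)
print(result)
```

In a real project the class would be enabled through the `ITEM_PIPELINES` setting; here it is called directly so the sketch runs without Scrapy installed.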

Probability model evaluation index

Source: Data STUDIO Author: Yun Duojun 1. Brier Score: The accuracy of a probability prediction is called its "calibration", a measure of the difference between the probability predicted by the algorithm and the actual outcome. A commonly used indicator is the Brier score, which is calculated as the mean square err ...

Posted by tomz0r on Thu, 28 Oct 2021 02:00:07 -0700
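
As a rough illustration of the Brier score entry above, the sketch below computes the mean squared difference between predicted probabilities and binary outcomes in plain Python. The function name `brier_score` and the sample values are assumptions made for this example, not code from the original post.

```python
def brier_score(y_true, y_prob):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    if len(y_true) != len(y_prob):
        raise ValueError("y_true and y_prob must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Hypothetical labels and predicted probabilities of the positive class.
y_true = [1, 0, 1, 1]
y_prob = [0.9, 0.2, 0.6, 0.5]

score = brier_score(y_true, y_prob)
print(f"Brier score: {score:.3f}")  # lower is better; 0 means perfect predictions
```

scikit-learn provides the same metric as `sklearn.metrics.brier_score_loss` for binary targets.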

Python implementation of CART decision tree algorithm (detailed comments)

1. Introduction to the CART decision tree algorithm: CART (Classification And Regression Trees) is a tree-construction algorithm that can be used for both classification and regression tasks. Compared with ID3 and C4.5, which can only be used for discrete data and classification tasks, CART has a much wider range of application. It can ...

Posted by elklabone on Wed, 27 Oct 2021 00:59:01 -0700

Python 3 implementation and improvement of Apriori algorithm

The code follows Machine Learning in Action. The improvements come partly from Data Mining: Concepts and Techniques and partly from https://blog.csdn.net/weixin_30702887/article/details/98992919. Here I summarize and implement them, and record what I learned about the Apriori algorithm. First ...

Posted by narch31 on Wed, 13 Oct 2021 21:22:17 -0700

Data mining for disk damage prediction

Data import and preprocessing: First, import all the packages required for this data mining task: import pandas as pd import matplotlib.pyplot as plt import numpy as np from sklearn.preprocessing import StandardScaler from sklearn.preprocessing import MinMaxScaler from sklearn.model_selection import train_test_split from sklearn.decompositi ...

Posted by goaman on Mon, 11 Oct 2021 22:14:02 -0700