Apache Kudu cannot delete nonexistent data
Extend KafkaConnect Sink with Apache Kudu client.
The Apache Kudu Java client is used. One day I suddenly found that the job could not be submitted, and it kept reporting an error.
Later it turned out that this is a validation mechanism of Kudu itself. In order to bypass this validation, and to better match our SQL habits, I have ...
Posted by simmosn on Wed, 23 Oct 2019 07:02:25 -0700
8 common SQL misuses
Common SQL misuses
1. LIMIT statement
Paging queries are one of the most commonly used scenarios, but also one of the most error-prone. For simple statements like the one below, the approach DBAs usually think of is to add a composite index on the type, name, and create_time fields. With it, the condition and the sort can effe ...
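As a hedged illustration of the idea (the table and column names here are hypothetical, and SQLite stands in for the database for brevity), a composite index that covers both the filter and the sort lets a LIMIT/OFFSET paging query be served from the index:

```python
import sqlite3

# In-memory demo table; the names (orders, type, name, create_time) are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, type INTEGER, name TEXT, create_time TEXT)"
)
conn.executemany(
    "INSERT INTO orders (type, name, create_time) VALUES (?, ?, ?)",
    [(i % 3, f"item{i}", f"2019-10-{i % 28 + 1:02d}") for i in range(100)],
)

# Composite index on (type, name, create_time): both the WHERE clause and the
# ORDER BY can be satisfied by the index, so paging avoids an extra sort pass.
conn.execute("CREATE INDEX idx_type_name_ctime ON orders (type, name, create_time)")

page = conn.execute(
    "SELECT name FROM orders WHERE type = ? ORDER BY name, create_time LIMIT 10 OFFSET 0",
    (1,),
).fetchall()
print(len(page))  # 10 rows per page
```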
Posted by DBHostS on Wed, 23 Oct 2019 05:42:36 -0700
Kafka self built cluster synchronizes data to Alibaba cloud Kafka Standard Edition through MirrorMaker
Explanation: 1. Only two topics are synchronized this time; subsequent optimizations will continue to be updated... 2. Self-built cluster: CDH 5.8, Kafka 2.1.0; Alibaba Cloud cluster: Standard Edition Kafka 0.10.x. Pitfalls: 1. Adding the Kafka MirrorMaker (CMM) role instance in CDH, which apparently does not support SSL connections. 2. VPC network access. I didn't know that the purc ...
Posted by midgar777 on Thu, 17 Oct 2019 21:15:50 -0700
I. MapReduce basic principle
I. MapReduce overview
1. Definition
MapReduce is a programming framework for distributed computing. Its core function is to integrate the business logic code written by the user with its built-in default components into a complete distributed program that runs concurrently on a Hadoop cluster.
2. Advantages and disadvantages
(1) Advantages: 1) Easy to program: wi ...
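To make the definition above concrete, here is a minimal sketch of the two phases for word counting, in plain Python rather than Hadoop: the user writes only map_fn and reduce_fn (the "business logic"), while run_mapreduce plays the role of the framework's shuffle-and-group step.

```python
from collections import defaultdict

# User-written business logic: a map function and a reduce function.
def map_fn(line):
    # Emit a (word, 1) pair for each word in the input line.
    return [(word, 1) for word in line.split()]

def reduce_fn(word, counts):
    # Aggregate all counts collected for one key.
    return (word, sum(counts))

def run_mapreduce(lines):
    # The "framework" part: group map output by key (shuffle), then reduce each group.
    groups = defaultdict(list)
    for line in lines:
        for key, value in map_fn(line):
            groups[key].append(value)
    return dict(reduce_fn(k, v) for k, v in groups.items())

result = run_mapreduce(["hello hadoop", "hello mapreduce"])
print(result)  # {'hello': 2, 'hadoop': 1, 'mapreduce': 1}
```

In real MapReduce the shuffle happens across machines, but the division of labor is the same: the framework moves and groups the data, the user supplies the two functions.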
Posted by young_coder on Thu, 17 Oct 2019 18:18:22 -0700
There's something wrong with the crawler
After optimizing the code of the Go crawler for the treasure net site, the following error was reported. It took half an hour to find the cause, so I record it here.
The code is as follows:
There is a Parser of type interface:
type Parser interface {
    Parser(contents []byte, url string) ParserResult
    Serialize() (funcName string, args interface{})
}
...
Posted by robin105 on Thu, 17 Oct 2019 15:14:25 -0700
Elegant singleton implementation in Go
While writing code, we often encounter logic that should run only once globally, such as global initialization, or the singleton pattern among design patterns. For the singleton pattern, Java has eager and lazy variants, implemented with the synchronized keyword, whose purpose is to initiali ...
Posted by pieai on Thu, 17 Oct 2019 13:04:35 -0700
Crawler performance analysis and optimization
Two days ago we wrote a single-task version of the crawler that crawls user information from the treasure net site, so how is its performance?
We can judge from the network utilization: the performance analysis window in Task Manager shows a download rate of about 200 kbps, which is quite slow.
We analyze the design of single tas ...
Posted by JamieWAstin on Thu, 17 Oct 2019 10:45:42 -0700
pandas hierarchical index
hierarchical indexing
Next, create a Series. For the index, pass a list consisting of two sub-lists:
the first sub-list is the outer index, and the second is the inner index.
Example code:
import pandas as pd
import numpy as np
ser_obj = pd.Series(np.random.randn(12),index=[
['a', 'a', 'a', 'b', 'b', 'b', 'c', ...
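The excerpt above is cut off mid-list; a complete version of the same idea looks like the sketch below (the index labels past the cutoff are assumed here, continuing the visible a/a/a, b/b/b, c... pattern over 12 values):

```python
import pandas as pd
import numpy as np

# Outer index ('a'..'d') paired with an inner index (0..2): 12 values in total.
# The labels after 'c' in the truncated excerpt are an assumption.
ser_obj = pd.Series(
    np.random.randn(12),
    index=[
        ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c', 'd', 'd', 'd'],
        [0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2],
    ],
)
print(ser_obj.index)   # a two-level MultiIndex
print(ser_obj['a'])    # selecting an outer label yields an inner Series of length 3
```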
Posted by nikifi on Thu, 17 Oct 2019 07:11:38 -0700
Parallel programming with Scala Actor (Spark notes)
1.1. Course objectives
1.1.1. Objective 1: become familiar with Scala Actor concurrent programming
1.1.2. Objective 2: prepare for learning Akka
Note: Scala Actor applies to Scala 2.10.x and earlier.
Scala added Akka as its default Actor implementation in version 2.11.x, and the old one has been deprecated.
1.2. What is Scala Actor
1.2.1. Concept
The Actor in Scala can realize the ...
Posted by bogu on Wed, 16 Oct 2019 01:13:40 -0700
Implementing a web crawler with Node
I. Preface
I had always felt that a crawler is a very high-end thing; in the era of big data, crawlers are especially important. After a lot of exploration, I finally implemented this function with Node, including analysis of the crawled content.
II. Main text
1. First of all, build an HTTP service; here we use the familiar koa (this is ...
Posted by itisme on Fri, 11 Oct 2019 11:39:34 -0700