Spark project: data service analysis

Keywords: Spark SQL, Hive, Hadoop

Business logic processing

Peers

To decide whether two objects are peers, we can check whether they have passed through several of the same places, using longitude and latitude. Alternatively, each monitoring device can be marked: whenever an object passes a device, it is captured by that device.

Approach:

  1. Keyed by each object's RSFZ (identity), aggregate and sort all places the object passed through within the specified time window (TGSJ is the pass time, SBBH the device ID). The processed data look like:
    rsfz,[(tgsj3, sbbh3), (tgsj2, sbbh2), (tgsj4, sbbh4), (tgsj5, sbbh5)]

  2. According to the number of places a peer pair must share, segment each object's track into overlapping place sequences (see the sliding-window sketch after this list). For example, the data above decompose into:
    sbbh3 -> sbbh2 -> sbbh4,(rsfz,[ sbbh3 tgsj3,sbbh2 tgsj2,sbbh4 tgsj4 ])
    sbbh2 -> sbbh4 -> sbbh5,(rsfz,[ sbbh2 tgsj2,sbbh4 tgsj4,sbbh5 tgsj5 ])

  3. Aggregate the monitored objects that pass through the same place sequence:
    sbbh3 -> sbbh2 -> sbbh4,[(rsfz1,[sbbh3 tgsj3, sbbh2 tgsj2, sbbh4 tgsj4]),(rsfz2,[sbbh3 tgsj3, sbbh2 tgsj2, sbbh4 tgsj4]),(rsfz3,[sbbh3 tgsj3, sbbh2 tgsj2, sbbh4 tgsj4])]

  4. Within each place sequence, compare the pass times of the grouped objects at the same places; if the time differences are small enough, the objects are judged to be peers.
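
The segmentation in step 2 is effectively a sliding window of three consecutive devices over each object's time-ordered track. A minimal standalone sketch of that windowing (the class and method names here are illustrative only, not part of the project code):

import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of step 2: the class, method, and window size (3 places)
// are assumptions used to show the idea, not part of the project code.
public class TrackSegmenter {

    /** One monitoring record: pass time (tgsj) at a device (sbbh). */
    public static class Record {
        final String tgsj;
        final String sbbh;
        public Record(String tgsj, String sbbh) { this.tgsj = tgsj; this.sbbh = sbbh; }
    }

    /**
     * Slide a window of windowSize consecutive devices over one object's
     * time-ordered track and emit (sbbh-sequence, tgsj-sequence) pairs.
     */
    public static List<String[]> segment(List<Record> sortedTrack, int windowSize) {
        List<String[]> windows = new ArrayList<>();
        for (int i = 0; i + windowSize <= sortedTrack.size(); i++) {
            StringBuilder devices = new StringBuilder();
            StringBuilder times = new StringBuilder();
            for (int j = i; j < i + windowSize; j++) {
                devices.append(sortedTrack.get(j).sbbh).append(",");
                times.append(sortedTrack.get(j).tgsj).append(",");
            }
            windows.add(new String[]{devices.toString(), times.toString()});
        }
        return windows;
    }
}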

Code

package com.monitor.together;

import org.apache.commons.collections.IteratorUtils;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.api.java.function.PairFunction;
import org.apache.spark.api.java.function.VoidFunction;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import scala.Tuple2;

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

// TimeUtils and JdbcUtils are helper classes from the example project (see the link at the end).
public class TogetherCompute {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .enableHiveSupport()
                .appName("TogetherCompute")
                .getOrCreate();

        JavaSparkContext jsc = new JavaSparkContext(spark.sparkContext());
        jsc.setLogLevel("ERROR");
        Dataset<Row> allData = spark.sql("select * from t_people_together");
        JavaRDD<Row> allDataRDD = allData.javaRDD();

        JavaPairRDD<String, Tuple2<String, String>> allPairRDD = allDataRDD.mapToPair(new PairFunction<Row, String, Tuple2<String, String>>() {
            @Override
            public Tuple2<String, Tuple2<String, String>> call(Row row) throws Exception {
                String rsfz = row.getAs("rsfz");
                String tgsj = row.getAs("tgsj");
                String sbbh = row.getAs("sbbh");
                return new Tuple2<String, Tuple2<String, String>>(rsfz, new Tuple2<String, String>(tgsj, sbbh));
            }
        });

        JavaPairRDD<String, Iterable<Tuple2<String, String>>> groupDataRDD = allPairRDD.groupByKey();

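        // Debug pass: concatenate each group's pass times and device IDs; the results are not used downstream.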
        groupDataRDD.foreach(new VoidFunction<Tuple2<String, Iterable<Tuple2<String, String>>>>() {
            @Override
            public void call(Tuple2<String, Iterable<Tuple2<String, String>>> s2) throws Exception {
                String t1 = s2._1;
                StringBuilder sbt2 = new StringBuilder();
                StringBuilder sbt3 = new StringBuilder();

                Iterator<Tuple2<String, String>> iter = s2._2.iterator();

                while (iter.hasNext()) {
                    Tuple2<String, String> tuple = iter.next();
                    sbt2.append(tuple._1).append(",");
                    sbt3.append(tuple._2).append(",");
                }
            }
        });

        JavaRDD<Tuple2<String, Tuple2<String, String>>> flatDataRDD = groupDataRDD.flatMap(new FlatMapFunction<Tuple2<String, Iterable<Tuple2<String, String>>>, Tuple2<String, Tuple2<String, String>>>() {
            @Override
            public Iterator<Tuple2<String, Tuple2<String, String>>> call(Tuple2<String, Iterable<Tuple2<String, String>>> s2) throws Exception {
                List<Tuple2<String, Tuple2<String, String>>> result = new ArrayList<Tuple2<String, Tuple2<String, String>>>();
                List<Tuple2<String, String>> list = IteratorUtils.toList(s2._2.iterator());
                /**
                 * The data have already been grouped by key (rsfz) via groupByKey.
                 * flatMap then cuts each object's track into overlapping windows of
                 * three consecutive records and recombines them, so that the output
                 * is keyed by the device sequence: [sbbh-sequence, (rsfz, tgsj-sequence)].
                 */

                for (int i = 0; i < list.size() - 2; i++) {
                    StringBuilder sbTGSJ = new StringBuilder();
                    StringBuilder sbSBBH = new StringBuilder();
                    for (int j = 0; j < 3; j++) {
                        if (j + i < list.size()) {
                            sbTGSJ.append(list.get(j + i)._1).append(",");
                            sbSBBH.append(list.get(j + i)._2).append(",");
                        } else {
                            break;
                        }
                    }
                    System.out.println("sbTime:" + sbTGSJ.toString());
                    System.out.println("sbKkbh:" + sbSBBH.toString());
                    result.add(new Tuple2<String, Tuple2<String, String>>(sbSBBH.toString(), new Tuple2<String, String>(s2._1, sbTGSJ.toString())));
                }

                return result.iterator();
            }
        });

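        // Re-key the windowed records by their device sequence so identical sequences can be grouped.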
        flatDataRDD.mapToPair(new PairFunction<Tuple2<String, Tuple2<String, String>>, String, Tuple2<String, String>>() {
            @Override
            public Tuple2<String, Tuple2<String, String>> call(Tuple2<String, Tuple2<String, String>> t2) throws Exception {
                return new Tuple2<String, Tuple2<String, String>>(t2._1, t2._2);
            }
        }).groupByKey().map(new Function<Tuple2<String, Iterable<Tuple2<String, String>>>, String>() {
            @Override
            public String call(Tuple2<String, Iterable<Tuple2<String, String>>> v1) throws Exception {
                Set<String> rsfzSet = new HashSet<String>();
                Set<String> tgsjSet = new HashSet<String>();
                StringBuilder sbrsfz = new StringBuilder();
                StringBuilder sbtgsj = new StringBuilder();

                String sbbh = v1._1;

                List<Tuple2<String, String>> list = IteratorUtils.toList(v1._2.iterator());
                for (int i = 0; i < list.size(); i++) {
                    for (int j = i + 1; j < list.size(); j++) {
                        String rsfz1 = list.get(i)._1;
                        String rsfz2 = list.get(j)._1;

                        String tgsj1 = list.get(i)._2;
                        String tgsj2 = list.get(j)._2;

                        String[] times01 = tgsj1.split(",");
                        String[] times02 = tgsj2.split(",");

                        // Compare the pass times of the two objects at each device in the sequence;
                        // a difference of no more than 3 minutes marks the pair as peers.
                        for (int k = 0; k < times01.length && k < times02.length; k++) {
                            double subMinutes = TimeUtils.getSubMinutes(times01[k], times02[k]);
                            if (subMinutes <= 3) {
                                rsfzSet.add(rsfz1);
                                rsfzSet.add(rsfz2);
                                tgsjSet.add(tgsj1);
                                tgsjSet.add(tgsj2);
                            }
                        }
                    }
                }
                for (String rsfz : rsfzSet) {
                    sbrsfz.append(rsfz).append(",");
                }
                for (String tgsj : tgsjSet) {
                    sbtgsj.append(tgsj).append(";");
                }
                // Output format: sbbh-sequence & rsfz-list & tgsj-list
                return sbbh + "&" + sbrsfz.toString() + "&" + sbtgsj.toString();
            }
        }).filter(new Function<String, Boolean>() {
            @Override
            public Boolean call(String v1) throws Exception {
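                // Keep only device sequences for which at least one peer pair was found (the rsfz list is non-empty).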
                return v1.split("&").length > 1;
            }
        }).foreach(new VoidFunction<String>() {
            @Override
            public void call(String s) throws Exception {

                // Result string format: sbbh-sequence & rsfz-list & tgsj-list
                String[] parts = s.split("&");
                String sbbh = parts[0];
                String rsfz = parts[1];
                String tgsj = parts.length > 2 ? parts[2] : "";

                Connection conn = JdbcUtils.getConnection();

                String sql = "insert into t_people_result2 (CJSJ,RSFZ,TGSJ,SBBH) values (?,?,?,?)";
                PreparedStatement pstmt = conn.prepareStatement(sql);

                // Add a timestamp (collection time, CJSJ).
                long cjsj = System.currentTimeMillis();
                pstmt.setString(1, String.valueOf(cjsj));
                pstmt.setString(2, rsfz);
                pstmt.setString(3, tgsj);
                pstmt.setString(4, sbbh);
                pstmt.executeUpdate();
                JdbcUtils.free(pstmt, conn);
            }
        });
    }
}
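
The class above depends on two helper classes from the example project, TimeUtils and JdbcUtils, which are not reproduced in this post. A rough sketch of what they might look like, assuming TGSJ values use the yyyy-MM-dd HH:mm:ss format and placeholder MySQL connection settings (adjust both to the real data and environment):

// Hypothetical helper sketches; the real implementations are in the example repository linked below.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;
import java.text.ParseException;
import java.text.SimpleDateFormat;

class TimeUtils {
    // Assumed TGSJ format; adjust to match the actual data.
    public static double getSubMinutes(String tgsj1, String tgsj2) throws ParseException {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        long diffMs = Math.abs(fmt.parse(tgsj1).getTime() - fmt.parse(tgsj2).getTime());
        return diffMs / 1000.0 / 60.0;
    }
}

class JdbcUtils {
    // Placeholder connection settings; replace with the real MySQL host, database and credentials.
    private static final String URL =
            "jdbc:mysql://localhost:3306/monitor?useUnicode=true&characterEncoding=utf8";

    public static Connection getConnection() throws SQLException {
        return DriverManager.getConnection(URL, "root", "password");
    }

    public static void free(Statement stmt, Connection conn) {
        try { if (stmt != null) stmt.close(); } catch (SQLException ignored) { }
        try { if (conn != null) conn.close(); } catch (SQLException ignored) { }
    }
}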

Implementation:

  1. Create a directory on HDFS and import the data file into it
-- Create the /people_together directory on HDFS
bin/hadoop dfs -mkdir /people_together

-- Upload the local people01.csv file to the /people_together directory on HDFS
bin/hdfs dfs -put /root/people01.csv /people_together

  2. Open Hive and create an external table
-- In Hive
CREATE EXTERNAL TABLE t_people_together (ID string,
RSFZ string,
GRXB string,
PSQK string,
SSYZ string,
SSYS string,
XSYZ string,
XSYS string,
TGSJ string,
SBBH string,
JWZB string)
row format delimited fields terminated by ',' lines terminated by '\n' location '/people_together' tblproperties ("skip.header.line.count"="1", "skip.footer.line.count"="1");

-- Check the imported data
select * from t_people_together;

  3. Create the result table in MySQL
CREATE TABLE t_people_result2 (
CJSJ text,
RSFZ text,
TGSJ text,
SBBH text
)charset utf8 collate utf8_general_ci;

  4. Submit the Spark job
./spark-submit \
--master spark://bigdata01:7077 \
--class com.monitor.together.TogetherCompute \
--deploy-mode client \
/root/monitoranalysis-1.0-SNAPSHOT.jar

Example code

https://github.com/yy1028500451/MonitorAnalysis/tree/master/src/main/java/com/monitor/together
