Using Java to build a meta path and generate an adjacency matrix

Keywords: Java Machine Learning Back-end

My understanding of a meta path (my work is on text, so I take commenter - number of comments - commenter as the meta path):
A meta path is a path in which multiple nodes of one type are connected through a node of another type. In the Yelp dataset, I want to build the meta path commenter - number of comments - commenter. That means finding the commenters that have the same number of comments; each such group forms a small adjacency matrix. There are many different comment counts: for example, the commenters with 142 comments form one adjacency matrix, the commenters with 143 comments form another, and so on. These small adjacency matrices are finally combined into one large adjacency matrix over all nodes, and that large matrix is the adjacency matrix of the meta path commenter - number of comments - commenter.
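To make the definition concrete, here is a minimal sketch in Java. It assumes the comment counts are already available as an int array, and it simply connects every pair of nodes with the same number of comments. The class name `MetaPathToy`, the method `sameCountAdjacency`, and the sample counts are hypothetical and only for illustration; setting the diagonal to 1 is also a choice of this sketch.

```java
import java.util.Arrays;

class MetaPathToy {
    // a[i][j] is 1 when node i and node j have the same number of comments, else 0.
    static int[][] sameCountAdjacency(int[] counts) {
        int n = counts.length;
        int[][] a = new int[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                if (counts[i] == counts[j]) {
                    a[i][j] = 1; // also true for i == j (a choice of this sketch)
                }
            }
        }
        return a;
    }

    public static void main(String[] args) {
        // Hypothetical data: nodes 0, 1 and 3 have 142 comments, node 2 has 143
        int[] counts = {142, 142, 143, 142};
        for (int[] row : sameCountAdjacency(counts)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

This pairwise comparison needs no sorting, but it checks every pair of nodes, so it is only meant to show what the final matrix should contain.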
Pseudocode:
1. Extract the two columns you need from the dataset. I extracted 1000 rows here as an exercise. Sort the rows by number of comments so that equal values are adjacent. (When building the real dataset later, the corresponding features should be saved as well.)
2. Read the data into a two-dimensional array (reference link).
3. Replace the first-column element of each row with its array subscript (0 to 999) to obtain a new array.
4. Traverse the array to find the end position of each group and store these positions in an array. Rows with the same value in the second column (number of comments) belong to one group; the subscripts of each group are stored in the array c and used to construct the adjacency matrix.
5. Create the adjacency matrix (at its final dimensions). It is filled in using the subscripts stored in c.
6. Store the generated adjacency matrix as an xls file (reference link).
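Steps 4 and 5 can be sketched as follows, assuming the second column (number of comments) is already sorted and is passed in as a String array. The names `SortedGroups`, `groupEnds`, and `blockAdjacency` are hypothetical, and the matrix is held as `int[][]` for brevity. This is only one possible way to write these steps, not the full code of this post.

```java
import java.util.ArrayList;
import java.util.List;

class SortedGroups {
    // Step 4: return the last index of every run of equal values in a sorted array.
    static List<Integer> groupEnds(String[] sortedCounts) {
        List<Integer> ends = new ArrayList<>();
        for (int k = 0; k < sortedCounts.length; k++) {
            boolean isLast = (k == sortedCounts.length - 1);
            // equals() compares string contents; == would compare object references
            if (isLast || !sortedCounts[k].equals(sortedCounts[k + 1])) {
                ends.add(k);
            }
        }
        return ends;
    }

    // Step 5: fill one square block per group; both bounds are inclusive.
    static int[][] blockAdjacency(String[] sortedCounts) {
        int n = sortedCounts.length;
        int[][] a = new int[n][n];
        int start = 0;
        for (int end : groupEnds(sortedCounts)) {
            for (int u = start; u <= end; u++) {
                for (int m = start; m <= end; m++) {
                    a[u][m] = 1;
                }
            }
            start = end + 1; // the next group begins right after this one
        }
        return a;
    }
}
```

Using a list for the end positions means the number of groups does not have to be known in advance, and the last group is recorded even though no different value follows it.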

import org.apache.poi.ss.usermodel.DataFormatter;

import org.apache.poi.xssf.usermodel.XSSFRow;
import org.apache.poi.xssf.usermodel.XSSFSheet;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

import java.io.*;
import java.util.ArrayList;
import java.util.List;

public class getData {
    //Read every row (including the first) of the given Excel sheet into a two-dimensional String array
    public static String[][] getSheetData(XSSFSheet sheet) throws IOException {
        //DataFormatter renders each cell as text, whatever its underlying type
        DataFormatter formatter = new DataFormatter();
        int rowCount = sheet.getPhysicalNumberOfRows();
        String[][] testArray = new String[rowCount][];
        for (int rowId = 0; rowId < rowCount; rowId++) {
            XSSFRow row = sheet.getRow(rowId);
            List<String> testSetList = new ArrayList<String>();
            for (int column = 0; column < row.getPhysicalNumberOfCells(); column++) {
                testSetList.add(formatter.formatCellValue(row.getCell(column)));
            }
            testArray[rowId] = testSetList.toArray(new String[testSetList.size()]);
        }
        return testArray;
    }
    //Print the 2D array, build the adjacency matrix from it, and export the matrix
    public static void printDoubleArray(String[][] testArray) throws IOException{

        //Print read in data
        for(int i =0; i<testArray.length;i++ )
        {
            for (int j=0; j<testArray[i].length;j++)
            {
                System.out.print(testArray[i][j]+" ");
            }
            System.out.println();
        }

        //Create a two-dimensional array. The first column becomes the index 0, 1, 2, ... 999; the second column keeps the value from the original array
        String narray[][] = new String[testArray.length][testArray[0].length];
        for (int k = 0; k < testArray.length; k++) {
            narray[k][0] = String.valueOf(k);//The first column holds the index 0, 1, 2, ...
            narray[k][1] = testArray[k][1];//The second column holds the value from the original array
        }
        //Print out this two-dimensional array
        for(int k =0; k<narray.length;k++ )
        {
            for (int j=0; j<narray[k].length;j++)
            {
                System.out.print(narray[k][j]+" \\ ");
            }
            System.out.println();
        }
        //Create an adjacency matrix A
        String A[][]=new String[testArray.length][testArray.length];

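        //Traverse the array and record the index at which each run of equal values in the second column ends
        int cont[] = new int[10];//Record the end position of each group
        int k = 0;
        for (int l = 0; l < cont.length; l++) {
            if (k < narray.length) {
                //Note: this scan restarts from 0 on every outer iteration, so the same end positions are stored again after a 0 element
                for (k = 0; k < narray.length - 1 && l < cont.length; k++) {
                    //Compare string contents with equals(); != would compare object references
                    if (!narray[k][1].equals(narray[k + 1][1])) {
                        cont[l] = k;
                        l++;
                    }
                }
            }
        }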

        //Output cont array
        for(int m =0; m<cont.length;m++ )
      {
          System.out.println(cont[m]+" ...");
      }

      /* //Record the location of the first group
      for(int k =0; k<narray.length-1;k++ )
        { count++;
            if(narray[k][1]!=narray[k+1][1]) {

                break;
            }}
        System.out.println(count+",,,");*/


//        //Find the first column corresponding to the number of comments and build an array c
//        String c [] = new String[count];
//        int i=0;
//        for(int j=0;j< count;j++){
//            c[i] = narray[j][0];
//            i++;
//        }
//        //Fill in the corresponding value in the adjacency matrix
//        for(int k=0;k<c.length;k++){
//            for(int m=0;m<c.length;m++){
//                A[Integer.valueOf(c[k])][Integer.valueOf(c[m])]=String.valueOf(1);
//                A[Integer.valueOf(c[m])][Integer.valueOf(c[k])]= String.valueOf(1);
//            }
//        }
//        //Output the adjacency matrix at this time
//
//        for(int k =0; k<A.length;k++ )
//        {
//            for (int j=0; j<A[k].length;j++)
//            {
//                System.out.print(A[k][j]+" || ");
//            }
//            System.out.println();
//        }
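        //For each group, build the array c of first-column indices that share the same number of comments
        int z = 0;
        //A 0 element in cont marks the end of the recorded groups
        while (z < cont.length && cont[z] != 0) {
            String[] c = getData.buildc(cont, z, narray);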
//        String[] c = new String[cont[z]];
//
//        int i=0;
//        for(int j=0;j<c.length;j++){
//            c[i] = narray[j][0];
//            i++;
//        }

//        //Output c matrix
//            for(int j=0;j<c.length;j++){
//                System.out.println(c[j]+" --");
//            }
            //Fill in the corresponding values in the adjacency matrix for the current group c
            //Note: both upper bounds are exclusive, so the last index in c is not filled
            int first = Integer.parseInt(c[0]);
            int last = Integer.parseInt(c[c.length - 1]);
            for (int u = first; u < last; u++) {
                for (int m = first; m < last; m++) {
                    A[u][m] = String.valueOf(1);
                    A[m][u] = String.valueOf(1);
                }
            }
            z++;
        }
//        //Output the adjacency matrix at this time
//
//        for(int t =0; t<A.length;t++ )
//        {
//            for (int j=0; j<A[t].length;j++)
//            {
//                System.out.print(A[t][j]+" || ");
//            }
//            System.out.println();
//        }

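        // Export the matrix as tab-separated text (plain text under an .xls file name, not a binary Excel workbook)
        int rowNum = A.length;
        int columnNum = A[0].length;
        // try-with-resources closes the writer even if writing fails
        try (FileWriter fw = new FileWriter("D:\\HomeWork\\webHomework\\getData\\MYDATA.xls")) {
            for (int i = 0; i < rowNum; i++) {
                for (int j = 0; j < columnNum; j++) {
                    // Cells that were never set are null; write 0 for them
                    fw.write((A[i][j] == null ? "0" : A[i][j]) + "\t"); // tab separator
                }
                fw.write("\n"); // line break
            }
        } catch (IOException e) {
            e.printStackTrace();
        }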


    }
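    //Build the array c holding the first-column indices of group z
    public static String[] buildc(int[] cont, int z, String[][] narray) {
        //z is the subscript into cont; cont[z] is the end position of group z
        String c[];
        if (z == 0) {
            //The first group runs from 0 to cont[0], inclusive
            c = new String[cont[z] + 1];
            for (int j = 0; j < c.length; j++) {
                c[j] = narray[j][0];
            }
        } else {
            //Later groups run from cont[z-1]+1 to cont[z], inclusive
            c = new String[cont[z] - cont[z - 1]];
            int j = cont[z - 1] + 1;
            for (int k = 0; k < c.length; k++) {
                //k counts the iterations and indexes c; j walks through narray
                c[k] = narray[j][0];
                j++;
            }
        }
        return c;
    }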



    public static void main(String[] args) throws IOException {
        File file = new File("D:\\HomeWork\\webHomework\\getData\\1000data.xlsx");
        // try-with-resources closes the stream and the workbook when done
        try (FileInputStream fis = new FileInputStream(file);
             XSSFWorkbook wb = new XSSFWorkbook(new BufferedInputStream(fis))) {
            printDoubleArray(getSheetData(wb.getSheetAt(0)));
        }
    }
}

I have not deleted the commented-out code; it is intermediate code from while I was writing.
I am a beginner, so if you know a better way, please leave a note in the comment area.
Shortcomings:

  1. This is really a semi-finished product, and many details are still not right. For example, the array generated in step 4 stores the group end positions repeatedly, with a 0 element separating the repeats. If you know in advance how many groups you are building, though, this causes no problem.
  2. There are still problems with the first and last rows in the generated matrix. For example, 0-298 is one group and 299-652 is another, but row 299 of the generated matrix has no value. This appears to be because the loop that fills the matrix stops before the last index of each group (its upper bound is exclusive), and because matrix subscripts start at 0 while rows are usually counted from 1. The problem is minor.
  3. The data here is grouped first and then sorted, so the adjacency matrix is built one block at a time. Later, the data could be sorted first and then grouped; the resulting matrix is a sparse matrix.
  4. After the matrix is constructed, it can be converted into a .mat file with MATLAB.

The shortcomings above are my next tasks. I will post the code when it is done.
I also ran into some problems when using IDEA; see my notes: https://note.youdao.com/s/ZI2289I2
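As a rough sketch of one possible direction for point 3, the node indices can be grouped by number of comments with a map, without sorting the rows, and only the connected pairs can be kept instead of the full matrix. This assumes the counts are available as a String array; the names `SparseMetaPath`, `groupByCount`, and `edges` are hypothetical, and this is not the finished next version of the code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SparseMetaPath {
    // Group node indices by number of comments; the input does not need to be sorted.
    static Map<String, List<Integer>> groupByCount(String[] counts) {
        Map<String, List<Integer>> groups = new LinkedHashMap<>();
        for (int i = 0; i < counts.length; i++) {
            groups.computeIfAbsent(counts[i], key -> new ArrayList<>()).add(i);
        }
        return groups;
    }

    // List each undirected edge once as {smaller index, larger index}.
    static List<int[]> edges(String[] counts) {
        List<int[]> result = new ArrayList<>();
        for (List<Integer> members : groupByCount(counts).values()) {
            for (int a = 0; a < members.size(); a++) {
                for (int b = a + 1; b < members.size(); b++) {
                    result.add(new int[]{members.get(a), members.get(b)});
                }
            }
        }
        return result;
    }
}
```

Storing only the pairs avoids allocating an n by n array when most entries would be 0; the dense matrix can still be rebuilt from the pairs when it is needed.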

Posted by harinath on Sat, 06 Nov 2021 15:13:29 -0700