How MaxCompute Implements Cross-Project Authorization

Keywords: Big Data SQL Java JSON

In real enterprise scenarios, data, functions, and computing resources that live in different projects often need to be used together. This article uses practical examples to show how such cross-project use is set up.

Preparation

  • Project: grant_from, the project that owns the data, functions, and resources being authorized.
  • Project: grant_to, the project in which the authorized user runs computations.
  • User: ALIYUN$xxxx@aliyun.com, the owner of the grant_from project. All authorization operations below are performed by this account.
  • User: RAM$xxxx@aliyun.com:chuanxue2, a sub-account that has been given the development role in the grant_to project through the big data development suite. When computing in grant_to, it now needs the data, functions, and resources in grant_from.
  • Table: dual. It exists in both grant_from and grant_to and contains a single row; it is used for testing the UDF.
  • Table: grant_from.wc_in, a table in the grant_from project. It needs to be authorized to RAM$xxxx@aliyun.com:chuanxue2 so that this user can read it when computing in grant_to.
  • Function: getPersonName, the function in grant_from to be authorized.
  • Resources: resource_file.txt and several jar packages, which are resource files in grant_from.

Tables

Granting authorization

First add the user to grant_from, then grant that account permissions on the table. Once authorized, the account can reference the table from another project.

-- The following statements are executed in grant_from by ALIYUN$xxxx@aliyun.com
-- add user only needs to be run once; skip it if the user has already been added.
odps@ grant_from>add user RAM$xxxx@aliyun.com:chuanxue2;
-- For SQL queries alone, the Select permission is enough. To use the table as a MapReduce input table, the Describe permission is also required, so both are granted here.
odps@ grant_from>grant Select,Describe on table wc_in to user RAM$xxxx@aliyun.com:chuanxue2;

It is worth mentioning that data authorization can also be done more conveniently in the data management module of the big data development suite.

Use

In SQL, reference the table as ProjectName.TableName, for example:

-- The following statement is executed in grant_to by RAM$xxxx@aliyun.com:chuanxue2
odps@ grant_to>select * from grant_from.wc_in;

In MapReduce or Graph jobs, specify the project name when adding the input table. For MapReduce, for example:

    InputUtils.addTable(TableInfo.builder().projectName("grant_from").tableName("wc_in").build(), job);
    OutputUtils.addTable(TableInfo.builder().tableName("wc_out").build(), job);

Everything else works the same as for tables in the current project, so it is not covered again here.

Functions

Granting authorization

For example, grant_from has a UDF that parses the contents of JSON strings, and RAM$xxxx@aliyun.com:chuanxue2 now needs to use it in grant_to.
First, here is how ALIYUN$xxxx@aliyun.com uses this UDF in grant_from:

odps@ grant_from>select getPersonName('{"id":100,"name":"chuanxue","age":11}') from dual;
+-----+
| _c0 |
+-----+
| chuanxue |
+-----+

Before authorizing, determine which resource files this function depends on.

odps@ grant_from>desc function  getPersonName;
Name                                    getPersonName
Owner                                   ALIYUN$xxxx@aliyun.com
Created Time                            2017-05-26 13:31:33
Class                                   odps.test.GetPersonName
Resources                               grant_from/resources/getPersonName.jar,grant_from/resources/gson-2.2.4.jar

The authorization therefore consists of the following steps:

-- add user, as mentioned earlier, only needs to be run once; skip it if the user has already been added.
odps@ grant_from>add user RAM$xxxx@aliyun.com:chuanxue2;

-- Grant read permission on the function and on each resource it uses
odps@ grant_from>grant read on function getPersonName to user RAM$xxxx@aliyun.com:chuanxue2;
OK
odps@ grant_from>grant read on resource getPersonName.jar to user RAM$xxxx@aliyun.com:chuanxue2;
OK
odps@ grant_from>grant read on resource gson-2.2.4.jar to user RAM$xxxx@aliyun.com:chuanxue2;
OK
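
The resource grants follow mechanically from the Resources line printed by desc function. As an illustration only, the following Java sketch turns that line into the corresponding grant statements. GrantStatements, resourceName, and forResources are hypothetical helper names, not part of any MaxCompute SDK; the code only builds strings and assumes each entry has the form project/resources/name, as in the output above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of any MaxCompute SDK): builds the
// "grant read on resource ..." statements from the Resources line
// printed by "desc function".
final class GrantStatements {

    // Returns the bare resource name, e.g. "gson-2.2.4.jar" for
    // "grant_from/resources/gson-2.2.4.jar".
    static String resourceName(String path) {
        String trimmed = path.trim();
        return trimmed.substring(trimmed.lastIndexOf('/') + 1);
    }

    // Produces one grant statement per comma-separated entry.
    static List<String> forResources(String resourcesLine, String user) {
        List<String> statements = new ArrayList<>();
        for (String path : resourcesLine.split(",")) {
            if (path.trim().isEmpty()) {
                continue; // ignore empty entries
            }
            statements.add("grant read on resource " + resourceName(path)
                    + " to user " + user + ";");
        }
        return statements;
    }

    public static void main(String[] args) {
        String resources = "grant_from/resources/getPersonName.jar,"
                + "grant_from/resources/gson-2.2.4.jar";
        String user = "RAM$xxxx@aliyun.com:chuanxue2";
        for (String statement : forResources(resources, user)) {
            System.out.println(statement);
        }
    }
}
```

The printed statements still have to be run in grant_from by the project owner; the sketch does not talk to MaxCompute itself.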

Use

After authorization, RAM$xxxx@aliyun.com:chuanxue2 can use the authorized function in the grant_to project. The function is referenced as ProjectName:FunctionName.

-- This statement is executed in grant_to by the authorized sub-account

odps@ grant_to>select grant_from:getPersonName('{"id":100,"name":"chuanxue","age":11}') from dual;
+-----+
| _c0 |
+-----+
| chuanxue |
+-----+

Resources

Command line reference

Scenarios that require authorizing resources directly are less common. One example: instead of authorizing the function just mentioned, you can authorize only its resources and let the other party create the function themselves.
Resources are authorized in the same way as in the function example above, so that is not repeated here. Creating and calling the function looks like this:

-- Create the function getPersonName2 in grant_to. Its resources are the previously authorized getPersonName.jar and gson-2.2.4.jar in grant_from.
odps@ grant_to>create function getPersonName2 as odps.test.GetPersonName using grant_from/resources/getPersonName.jar,grant_from/resources/gson-2.2.4.jar;

-- Once created, the function is called like any other. The dual table here is the one in grant_to.
odps@ grant_to>select getPersonName2('{"id":100,"name":"chuanxue","age":11}') from dual;

The same applies to MapReduce jobs that reference jar packages from other projects. First authorize the jar in grant_from:

-- cx_word_count2.jar is simply cx_word_count.jar renamed and uploaded to grant_from for the cross-project reference.
odps@ grant_from>add jar C:\Users\chuanxue\Desktop\cx_word_count2.jar -f;
OK: Resource 'cx_word_count2.jar' have been updated.
odps@ grant_from>grant read on resource cx_word_count2.jar to user RAM$xxxx@aliyun.com:chuanxue2;
OK

After authorization, RAM$xxxx@aliyun.com:chuanxue2 can use this jar package in grant_to:

odps@ grant_to>jar -resources grant_from/resources/cx_word_count2.jar -classpath C:\Users\chuanxue\Desktop\cx_word_count2.jar odps.test.WordCount;

Read Resources in Java Code

Another scenario is reading resource files across projects from code. Within a single project, a resource file is typically read like this:

            byte[] buffer = new byte[1024];
            int bytesRead = 0;

            String filename = context.getJobConf().get("import.filename");
            bufferedInput = context.readResourceFileAsStream(filename);

            while ((bytesRead = bufferedInput.read(buffer)) != -1) {
              String chunk = new String(buffer, 0, bytesRead);
              importdata.append(chunk);
            }
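
The fragment above omits its declarations, and it decodes each 1024-byte chunk separately, which can split a multi-byte character at a chunk boundary. A minimal self-contained sketch of the same loop follows. It assumes the resource is UTF-8 text, and it uses a ByteArrayInputStream as a stand-in for the stream returned by context.readResourceFileAsStream(filename) so that it runs outside a MaxCompute job; ResourceReader and readAll are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: reads a whole resource stream into a String.
final class ResourceReader {

    // Collects all bytes first and decodes once, so a character that
    // straddles two 1024-byte chunks is still decoded correctly.
    static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream collected = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        int bytesRead;
        while ((bytesRead = in.read(buffer)) != -1) {
            collected.write(buffer, 0, bytesRead);
        }
        // Assumption: the resource file is UTF-8 encoded text.
        return new String(collected.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for context.readResourceFileAsStream("resource_file.txt").
        byte[] fake = "hello resource\n".getBytes(StandardCharsets.UTF_8);
        try (InputStream in = new ByteArrayInputStream(fake)) {
            System.out.print(readAll(in));
        }
    }
}
```

Inside a real job, the same readAll call would receive the stream obtained from the task context, and the try-with-resources block would close it afterwards.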

When the resources come from another project, the job is submitted from the command line like this:

odps@ grant_to>jar -resources grant_from/resources/cx_word_count3.jar,grant_from/resources/resource_file.txt -classpath C:\Users\chuanxue\Desktop\cx_word_count3.jar odps.test.WordCount;

The code still uses the bare file name:

        String filename = "resource_file.txt";
        bufferedInput = context.readResourceFileAsStream(filename);

In other words, the jar command is where the job is told that its resources come from another project. The MapReduce Java code is unchanged and does not need to specify the project for a resource.
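
To make the split concrete, here is a small Java sketch, under the assumption that cross-project resources are always written as project/resources/name as in the commands above. It builds the value passed to -resources from a project name and a list of bare resource names; those same bare names are what the job code passes to readResourceFileAsStream. CrossProjectResources and its methods are hypothetical names, not part of the MaxCompute client.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: builds the comma-separated value for the
// "-resources" option of the jar command.
final class CrossProjectResources {

    // "grant_from", "resource_file.txt" -> "grant_from/resources/resource_file.txt"
    static String qualified(String project, String name) {
        return project + "/resources/" + name;
    }

    // Joins the qualified names with commas, in the given order.
    static String resourcesArgument(String project, List<String> names) {
        List<String> qualifiedNames = new ArrayList<>();
        for (String name : names) {
            qualifiedNames.add(qualified(project, name));
        }
        return String.join(",", qualifiedNames);
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("cx_word_count3.jar", "resource_file.txt");
        // Project-qualified form for the command line.
        System.out.println("-resources " + resourcesArgument("grant_from", names));
        // Bare form used inside the job code.
        System.out.println("in code: " + names.get(1));
    }
}
```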

Posted by Satabi2 on Thu, 27 Jun 2019 12:57:28 -0700