In real enterprise use, data, functions, and computing resources that live in different projects often need to be used together. This article uses practical examples to show how to use them across projects.
Preparation
- Project: grant_from, the project that holds the data, functions, and resources to be authorized.
- Project: grant_to, the project in which the authorized user needs to run computations.
- User: ALIYUN$xxxx@aliyun.com, the owner of the grant_from project. All subsequent authorization operations are performed by this account.
- User: RAM$xxxx@aliyun.com:chuanxue2, a sub-account that was given the development role in the grant_to project through the big data development suite. When computing in grant_to, this user now needs the data, functions, and resources in grant_from.
- Table: dual. Both grant_from and grant_to have a dual table that contains a single row; it is used to test the UDF.
- Table: grant_from.wc_in, a table in the grant_from project. It needs to be authorized to RAM$xxxx@aliyun.com:chuanxue2 so that this user can use it when computing in grant_to.
- Function: getPersonName, the function in grant_from to be authorized.
- Resources: resource_file.txt and the jar packages are resource files in grant_from.
Table
Authorization
You need to add the user to the project and then grant that account permissions on the table. Once authorized, the account can reference the table across projects.
-- The following operations are executed in grant_from by ALIYUN$xxxx@aliyun.com.
-- add user only needs to be run once; skip it if the user was added before.
odps@ grant_from>add user RAM$xxxx@aliyun.com:chuanxue2;
-- For SQL queries alone, the Select permission is enough. To use the table as a MapReduce input table, the Describe permission is also needed. Both are granted here.
odps@ grant_from>grant Select,Describe on table wc_in to user RAM$xxxx@aliyun.com:chuanxue2;
It is worth mentioning that data authorization can be done more conveniently in the data management module of the big data development suite; see here for details.
Use
In SQL, reference the table as ProjectName.TableName, for example:
-- The following operation is performed in grant_to by RAM$xxxx@aliyun.com:chuanxue2.
odps@ grant_to>select * from grant_from.wc_in;
In MapReduce/Graph jobs, taking MapReduce as the example, the code can be written as:
InputUtils.addTable(TableInfo.builder().projectName("grant_from").tableName("wc_in").build(), job);
OutputUtils.addTable(TableInfo.builder().tableName("wc_out").build(), job);
Everything else works the same as for tables in the current project, so it is not repeated here.
Function
Authorization
For example, grant_from has a UDF that parses the content of a JSON string, and RAM$xxxx@aliyun.com:chuanxue2 should now also be able to use it in grant_to.
First, look at how ALIYUN$xxxx@aliyun.com uses this UDF in grant_from:
odps@ grant_from>select getPersonName('{"id":100,"name":"chuanxue","age":11}') from dual;
+-----+
| _c0 |
+-----+
| chuanxue |
+-----+
Before authorizing, determine which resource files this function depends on:
odps@ grant_from>desc function getPersonName;
Name          getPersonName
Owner         ALIYUN$xxxx@aliyun.com
Created Time  2017-05-26 13:31:33
Class         odps.test.GetPersonName
Resources     grant_from/resources/getPersonName.jar,grant_from/resources/gson-2.2.4.jar
So the authorization steps are:
-- add user: as mentioned earlier, this only needs to be run once; skip it if the user was already added.
odps@ grant_from>add user RAM$xxxx@aliyun.com:chuanxue2;
-- Authorize the function and each resource it uses.
odps@ grant_from>grant read on function getPersonName to user RAM$xxxx@aliyun.com:chuanxue2;
OK
odps@ grant_from>grant read on resource getPersonName.jar to user RAM$xxxx@aliyun.com:chuanxue2;
OK
odps@ grant_from>grant read on resource gson-2.2.4.jar to user RAM$xxxx@aliyun.com:chuanxue2;
OK
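Because every resource listed for the function needs its own grant, it is easy to miss one when a function depends on several jars. The following is a minimal sketch, assuming you copy the Resources field from the desc function output by hand. The class FunctionGrantPlan and its method are hypothetical helpers written for this article, not part of the ODPS SDK or console; they only build the statement text, which you would still run yourself in grant_from.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; it is not part of the ODPS SDK or console.
final class FunctionGrantPlan {

    // Builds the grant statements for a function and for every resource listed in
    // the "Resources" field of `desc function` (comma-separated references such as
    // grant_from/resources/getPersonName.jar).
    static List<String> grantStatements(String function, String resourcesField, String user) {
        List<String> statements = new ArrayList<>();
        statements.add("grant read on function " + function + " to user " + user + ";");
        for (String reference : resourcesField.split(",")) {
            String trimmed = reference.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate stray commas
            }
            // Keep only the resource name after the last '/'.
            String name = trimmed.substring(trimmed.lastIndexOf('/') + 1);
            statements.add("grant read on resource " + name + " to user " + user + ";");
        }
        return statements;
    }

    public static void main(String[] args) {
        String resources =
                "grant_from/resources/getPersonName.jar,grant_from/resources/gson-2.2.4.jar";
        for (String s : grantStatements("getPersonName", resources, "RAM$xxxx@aliyun.com:chuanxue2")) {
            System.out.println(s);
        }
    }
}
```

Running main prints one grant for the function followed by one grant per jar, in the same form as the statements shown above.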
Use
After authorization, RAM$xxxx@aliyun.com:chuanxue2 can use the authorized function in the grant_to project. The function is referenced as ProjectName:FunctionName.
-- This operation is performed in grant_to by the authorized sub-account.
odps@ grant_to>select grant_from:getPersonName('{"id":100,"name":"chuanxue","age":11}') from dual;
+-----+
| _c0 |
+-----+
| chuanxue |
+-----+
Resources
Referencing from the Command Line
Scenarios that call for authorizing resources on their own are less common. One example: for the function just mentioned, you can also authorize only the resources to the other party and let them create the function themselves.
Resource files are authorized in the same way as shown in the function section above, so that is not repeated here. Creating and calling the function looks like this:
-- Create the function getPersonName2 in grant_to. The resources used are getPersonName.jar and gson-2.2.4.jar from grant_from, which were authorized earlier.
odps@ grant_to>create function getPersonName2 as odps.test.GetPersonName using grant_from/resources/getPersonName.jar,grant_from/resources/gson-2.2.4.jar;
-- Once created, the function is called normally. The dual table here is the dual table in grant_to.
odps@ grant_to>select getPersonName2('{"id":100,"name":"chuanxue","age":11}') from dual;
The same applies to MapReduce jobs that reference jar packages from other projects. First, authorize in grant_from:
-- cx_word_count2.jar is simply cx_word_count.jar renamed and uploaded to grant_from for the cross-project reference.
odps@ grant_from>add jar C:\Users\chuanxue\Desktop\cx_word_count2.jar -f;
OK: Resource 'cx_word_count2.jar' have been updated.
odps@ grant_from>grant read on resource cx_word_count2.jar to user RAM$xxxx@aliyun.com:chuanxue2;
OK
After authorization, RAM$xxxx@aliyun.com:chuanxue2 can use this jar package in grant_to:
odps@ grant_to>jar -resources grant_from/resources/cx_word_count2.jar -classpath C:\Users\chuanxue\Desktop\cx_word_count2.jar odps.test.WordCount;
Read Resources in Java Code
Another scenario is reading resource files across projects in code. In general, a resource file in the current project is read as follows (see here):
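// bufferedInput (a BufferedInputStream) and importdata (a StringBuilder) are declared elsewhere in the job code.
byte[] buffer = new byte[1024];
int bytesRead = 0;
// The resource file name is taken from the job configuration.
String filename = context.getJobConf().get("import.filename");
bufferedInput = context.readResourceFileAsStream(filename);
// Read the resource in 1 KB chunks and append each chunk to importdata.
while ((bytesRead = bufferedInput.read(buffer)) != -1) {
    String chunk = new String(buffer, 0, bytesRead);
    importdata.append(chunk);
}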
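One detail of the loop above is that each 1 KB chunk is decoded to a String on its own, so a multi-byte character that straddles a chunk boundary could be decoded incorrectly. The following is a minimal sketch of an alternative that collects all bytes first and decodes once. It assumes the resource file is UTF-8 text, which the original does not state, and the class ResourceText is a hypothetical name. A ByteArrayInputStream stands in for the stream that context.readResourceFileAsStream would return, so the sketch runs without the MapReduce SDK.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of the ODPS MapReduce SDK.
final class ResourceText {

    // Reads the whole stream in 1 KB chunks, then decodes the collected bytes once.
    // Assumption: the resource is UTF-8 encoded text.
    static String readAll(InputStream in) {
        ByteArrayOutputStream collected = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        int bytesRead;
        try {
            while ((bytesRead = in.read(buffer)) != -1) {
                collected.write(buffer, 0, bytesRead);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(collected.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Stand-in for the stream returned by context.readResourceFileAsStream(filename).
        InputStream in = new ByteArrayInputStream(
                "contents of resource_file.txt".getBytes(StandardCharsets.UTF_8));
        System.out.println(readAll(in));
    }
}
```

Inside a real job you would pass the stream obtained from the task context to readAll instead of the in-memory stand-in.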
The call on the command line is:
odps@ grant_to>jar -resources grant_from/resources/cx_word_count3.jar,grant_from/resources/resource_file.txt -classpath C:\Users\chuanxue\Desktop\cx_word_count3.jar odps.test.WordCount;
In the code, you still use:
String filename = "resource_file.txt";
bufferedInput = context.readResourceFileAsStream(filename);
In other words, the jar command is where you tell the job that the resources come from another project. In the MapReduce Java code everything is used the same way as before, and there is no need to specify the project for the resources.
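The examples in this article use three different reference forms: ProjectName.TableName for tables, ProjectName:FunctionName for functions, and ProjectName/resources/ResourceName for resources. As a compact reminder, here is a small sketch that builds each form as a string, including the comma-separated value passed to the -resources option of the jar command. CrossProjectRef is a hypothetical class written for illustration; it is not an ODPS API and only concatenates strings.

```java
// Hypothetical helper for illustration; it only formats reference strings.
final class CrossProjectRef {

    // Table reference used in SQL, e.g. grant_from.wc_in
    static String table(String project, String table) {
        return project + "." + table;
    }

    // Function reference used in SQL, e.g. grant_from:getPersonName
    static String function(String project, String function) {
        return project + ":" + function;
    }

    // Resource reference, e.g. grant_from/resources/resource_file.txt
    static String resource(String project, String name) {
        return project + "/resources/" + name;
    }

    // Comma-separated value for the -resources option of the jar command.
    static String resourcesOption(String project, String... names) {
        StringBuilder joined = new StringBuilder();
        for (String name : names) {
            if (joined.length() > 0) {
                joined.append(',');
            }
            joined.append(resource(project, name));
        }
        return joined.toString();
    }

    public static void main(String[] args) {
        System.out.println(table("grant_from", "wc_in"));
        System.out.println(function("grant_from", "getPersonName"));
        System.out.println(resourcesOption("grant_from", "cx_word_count3.jar", "resource_file.txt"));
    }
}
```

Only the command line and SQL need these prefixed forms; inside the MapReduce Java code the plain resource name, such as resource_file.txt, is still used.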