Sqoop 5 Minutes Demo¶
This page will walk you through the basic usage of Sqoop. You need to have installed and configured Sqoop server and client in order to follow this guide. Installation procedure is described on Installation page. Please note that exact output shown in this page might differ from yours as Sqoop evolves. All major information should however remain the same.
Sqoop uses unique names or persistent ids to identify connectors, links, jobs and configs. We support querying a entity by its unique name or by its perisent database Id.
Start client in interactive mode using following command:
Configure client to use your Sqoop server:
sqoop:000> set server --host your.host.com --port 12000 --webapp sqoop
Verify that connection is working by simple version checking:
sqoop:000> show version --all client version: Sqoop 2.0.0-SNAPSHOT source revision 418c5f637c3f09b94ea7fc3b0a4610831373a25f Compiled by vbasavaraj on Mon Nov 3 08:18:21 PST 2014 server version: Sqoop 2.0.0-SNAPSHOT source revision 418c5f637c3f09b94ea7fc3b0a4610831373a25f Compiled by vbasavaraj on Mon Nov 3 08:18:21 PST 2014 API versions: [v1]
You should received similar output as shown above describing the sqoop client build version, the server build version and the supported versions for the rest API.
You can use the help command to check all the supported commands in the sqoop shell.
sqoop:000> help For information about Sqoop, visit: http://sqoop.apache.org/ Available commands: exit (\x ) Exit the shell history (\H ) Display, manage and recall edit-line history help (\h ) Display this help message set (\st ) Configure various client options and settings show (\sh ) Display various objects and configuration options create (\cr ) Create new object in Sqoop repository delete (\d ) Delete existing object in Sqoop repository update (\up ) Update objects in Sqoop repository clone (\cl ) Create new object based on existing one start (\sta) Start job stop (\stp) Stop job status (\stu) Display status of a job enable (\en ) Enable object in Sqoop repository disable (\di ) Disable object in Sqoop repository
Creating Job Object¶
Connectors implement the From for reading data from and/or To for writing data to. Generic JDBC Connector supports both of them List of supported directions for each connector might be seen in the output of show connector -all command above. In order to create a job we need to specifiy the From and To parts of the job uniquely identified by their link Ids. We already have 2 links created in the system, you can verify the same with the following command
sqoop:000> show links -all 2 link(s) to show: link with id 1 and name First Link (Enabled: true, Created by root at 11/4/14 4:27 PM, Updated by root at 11/4/14 4:27 PM) Using Connector id 2 Link configuration JDBC Driver Class: com.mysql.jdbc.Driver JDBC Connection String: jdbc:mysql://mysql.ent.cloudera.com/sqoop Username: sqoop Password: JDBC Connection Properties: protocol = tcp link with id 2 and name Second Link (Enabled: true, Created by root at 11/4/14 4:38 PM, Updated by root at 11/4/14 4:38 PM) Using Connector id 1 Link configuration HDFS URI: hdfs://nameservice1:8020/
Next, we can use the two link Ids to associate the From and To for the job.
sqoop:000> create job -f 1 -t 2 Creating job for links with from id 1 and to id 2 Please fill following values to create new job object Name: Sqoopy FromJob configuration Schema name:(Required)sqoop Table name:(Required)sqoop Table SQL statement:(Optional) Table column names:(Optional) Partition column name:(Optional) id Null value allowed for the partition column:(Optional) Boundary query:(Optional) ToJob configuration Output format: 0 : TEXT_FILE 1 : SEQUENCE_FILE Output format: 0 : TEXT_FILE 1 : SEQUENCE_FILE Choose: 0 Compression format: 0 : NONE 1 : DEFAULT 2 : DEFLATE 3 : GZIP 4 : BZIP2 5 : LZO 6 : LZ4 7 : SNAPPY 8 : CUSTOM Choose: 0 Custom compression format:(Optional) Output directory:(Required)/root/projects/sqoop Driver Config Extractors: 2 Loaders: 2 New job was successfully created with validation status OK and persistent id 1
Our new job object was created with assigned id 1.
Start Job ( a.k.a Data transfer )¶
You can start a sqoop job with the following command:
sqoop:000> start job --jid 1 Submission details Job ID: 1 Server URL: http://localhost:12000/sqoop/ Created by: root Creation date: 2014-11-04 19:43:29 PST Lastly updated by: root External ID: job_1412137947693_0001 http://vbsqoop-1.ent.cloudera.com:8088/proxy/application_1412137947693_0001/ 2014-11-04 19:43:29 PST: BOOTING - Progress is not available
You can iteratively check your running job status with status job command:
sqoop:000> status job --jid 1 Submission details Job ID: 1 Server URL: http://localhost:12000/sqoop/ Created by: root Creation date: 2014-11-04 19:43:29 PST Lastly updated by: root External ID: job_1412137947693_0001 http://vbsqoop-1.ent.cloudera.com:8088/proxy/application_1412137947693_0001/ 2014-11-04 20:09:16 PST: RUNNING - 0.00 %
And finally you can stop running the job at any time using stop job command:
sqoop:000> stop job --jid 1