2.4. Sqoop 5 Minutes Demo

This page will walk you through the basic usage of Sqoop. You need to have installed and configured Sqoop server and client in order to follow this guide. Installation procedure is described in Installation. Please note that exact output shown in this page might differ from yours as Sqoop evolves. All major information should however remain the same.

Sqoop uses unique names or persistent ids to identify connectors, links, jobs and configs. We support querying a entity by its unique name or by its perisent database Id.

2.4.1. Starting Client

Start client in interactive mode using following command:

sqoop2-shell

Configure client to use your Sqoop server:

sqoop:000> set server --host your.host.com --port 12000 --webapp sqoop

Verify that connection is working by simple version checking:

sqoop:000> show version --all
client version:
  Sqoop 2.0.0-SNAPSHOT source revision 418c5f637c3f09b94ea7fc3b0a4610831373a25f
  Compiled by vbasavaraj on Mon Nov  3 08:18:21 PST 2014
server version:
  Sqoop 2.0.0-SNAPSHOT source revision 418c5f637c3f09b94ea7fc3b0a4610831373a25f
  Compiled by vbasavaraj on Mon Nov  3 08:18:21 PST 2014
API versions:
  [v1]

You should received similar output as shown above describing the sqoop client build version, the server build version and the supported versions for the rest API.

You can use the help command to check all the supported commands in the sqoop shell.

sqoop:000> help
For information about Sqoop, visit: http://sqoop.apache.org/

Available commands:
  exit    (\x  ) Exit the shell
  history (\H  ) Display, manage and recall edit-line history
  help    (\h  ) Display this help message
  set     (\st ) Configure various client options and settings
  show    (\sh ) Display various objects and configuration options
  create  (\cr ) Create new object in Sqoop repository
  delete  (\d  ) Delete existing object in Sqoop repository
  update  (\up ) Update objects in Sqoop repository
  clone   (\cl ) Create new object based on existing one
  start   (\sta) Start job
  stop    (\stp) Stop job
  status  (\stu) Display status of a job
  enable  (\en ) Enable object in Sqoop repository
  disable (\di ) Disable object in Sqoop repository

2.4.3. Creating Job Object

Connectors implement the From for reading data from and/or To for writing data to. Generic JDBC Connector supports both of them List of supported directions for each connector might be seen in the output of show connector -all command above. In order to create a job we need to specifiy the From and To parts of the job uniquely identified by their link Ids. We already have 2 links created in the system, you can verify the same with the following command

sqoop:000> show link --all
2 link(s) to show:
link with name First Link (Enabled: true, Created by root at 11/4/14 4:27 PM, Updated by root at 11/4/14 4:27 PM)
Using Connector with name generic-jdbc-connector
  Link configuration
    JDBC Driver Class: com.mysql.jdbc.Driver
    JDBC Connection String: jdbc:mysql://mysql.ent.cloudera.com/sqoop
    Username: sqoop
    Password:
    JDBC Connection Properties:
      protocol = tcp
link with name Second Link (Enabled: true, Created by root at 11/4/14 4:38 PM, Updated by root at 11/4/14 4:38 PM)
Using Connector with name hdfs-connector
  Link configuration
    HDFS URI: hdfs://nameservice1:8020/

Next, we can use the two link names to associate the From and To for the job.

 sqoop:000> create job -f "First Link" -t "Second Link"
 Creating job for links with from name First Link and to name Second Link
 Please fill following values to create new job object
 Name: Sqoopy

 FromJob configuration

  Schema name:(Required)sqoop
  Table name:(Required)sqoop
  Table SQL statement:(Optional)
  Table column names:(Optional)
  Partition column name:(Optional) id
  Null value allowed for the partition column:(Optional)
  Boundary query:(Optional)

ToJob configuration

  Output format:
   0 : TEXT_FILE
   1 : SEQUENCE_FILE
  Choose: 0
  Compression format:
   0 : NONE
   1 : DEFAULT
   2 : DEFLATE
   3 : GZIP
   4 : BZIP2
   5 : LZO
   6 : LZ4
   7 : SNAPPY
   8 : CUSTOM
  Choose: 0
  Custom compression format:(Optional)
  Output directory:(Required)/root/projects/sqoop

  Driver Config
  Extractors:(Optional) 2
  Loaders:(Optional) 2
  New job was successfully created with validation status OK  and name jobName

Our new job object was created with assigned name Sqoopy. Note that if null value is allowed for the partition column, at least 2 extractors are needed for Sqoop to carry out the data transfer. On specifying 1 extractor in this scenario, Sqoop shall ignore this setting and continue with 2 extractors.

2.4.4. Start Job ( a.k.a Data transfer )

You can start a sqoop job with the following command:

sqoop:000> start job -name Sqoopy
Submission details
Job Name: Sqoopy
Server URL: http://localhost:12000/sqoop/
Created by: root
Creation date: 2014-11-04 19:43:29 PST
Lastly updated by: root
External ID: job_1412137947693_0001
  http://vbsqoop-1.ent.cloudera.com:8088/proxy/application_1412137947693_0001/
2014-11-04 19:43:29 PST: BOOTING  - Progress is not available

You can iteratively check your running job status with status job command:

sqoop:000> status job -n Sqoopy
Submission details
Job Name: Sqoopy
Server URL: http://localhost:12000/sqoop/
Created by: root
Creation date: 2014-11-04 19:43:29 PST
Lastly updated by: root
External ID: job_1412137947693_0001
  http://vbsqoop-1.ent.cloudera.com:8088/proxy/application_1412137947693_0001/
2014-11-04 20:09:16 PST: RUNNING  - 0.00 %

Alternatively you can start a sqoop job and observe job running status with the following command:

sqoop:000> start job -n Sqoopy -s
Submission details
Job Name: Sqoopy
Server URL: http://localhost:12000/sqoop/
Created by: root
Creation date: 2014-11-04 19:43:29 PST
Lastly updated by: root
External ID: job_1412137947693_0001
  http://vbsqoop-1.ent.cloudera.com:8088/proxy/application_1412137947693_0001/
2014-11-04 19:43:29 PST: BOOTING  - Progress is not available
2014-11-04 19:43:39 PST: RUNNING  - 0.00 %
2014-11-04 19:43:49 PST: RUNNING  - 10.00 %

And finally you can stop running the job at any time using stop job command:

sqoop:000> stop job -n Sqoopy