This project has retired. For details please refer to its Attic page.
Sqoop Java Client API Guide — Apache Sqoop documentation

Sqoop Java Client API Guide

This document explains how to use the Sqoop Java client API from an external application. The client API lets you execute the functions of the Sqoop commands from Java code. It requires the Sqoop client JAR and its dependencies.

The main class that provides wrapper methods for all the supported operations is SqoopClient:

public class SqoopClient {
  ...
}

The Java client API is explained below using the Generic JDBC Connector as an example. Before running an application that uses the Sqoop client API, check that the Sqoop server is running.
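A quick pre-flight check can be done from Java before any client calls are made. The sketch below is a hypothetical helper, not part of the Sqoop API: ServerProbe, hostOf, portOf and isListening are invented names. It takes the host and port from a server URL such as the http://localhost:12000/sqoop/ value used in this guide and only verifies that something accepts TCP connections there; it does not prove that the process is a healthy Sqoop server.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.URI;

// Hypothetical helper (not part of the Sqoop client API): a plain TCP reachability
// check against the host and port of the Sqoop server URL.
final class ServerProbe {
  private ServerProbe() {}

  // Host component of the server URL, e.g. "localhost".
  static String hostOf(String serverUrl) {
    return URI.create(serverUrl).getHost();
  }

  // Port component of the server URL; falls back to the scheme's default port when absent.
  static int portOf(String serverUrl) {
    URI uri = URI.create(serverUrl);
    if (uri.getPort() != -1) {
      return uri.getPort();
    }
    return "https".equalsIgnoreCase(uri.getScheme()) ? 443 : 80;
  }

  // True if a TCP connection succeeds within timeoutMs. This only shows that some
  // process is listening on that port, not that it is a working Sqoop server.
  static boolean isListening(String serverUrl, int timeoutMs) {
    try (Socket socket = new Socket()) {
      socket.connect(new InetSocketAddress(hostOf(serverUrl), portOf(serverUrl)), timeoutMs);
      return true;
    } catch (IOException e) {
      return false;
    }
  }
}
```

A caller might, for example, stop early with a clear message when ServerProbe.isListening(url, 2000) returns false instead of waiting for the first client call to fail.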

Workflow

The following workflow has to be followed to execute a Sqoop job on the Sqoop server:

  1. Create a LINK for a given connectorId: creates a link object and returns its linkId (lid).
  2. Create a JOB for a given “from” and “to” linkId: creates a job object and returns its jobId (jid).
  3. Start the JOB for a given jobId: starts the job on the server and creates a submission record.
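The ordering constraints of this workflow can be illustrated without a server. The following is a hypothetical in-memory stand-in, not the Sqoop client API: WorkflowModel and its methods are invented for illustration. It hands out ids the way the steps above describe (a lid per link, a jid per job, a record per start) and refuses to create a job whose From or To link does not exist yet, or to start a job that does not exist.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model of the link -> job -> submission workflow.
// It is NOT the Sqoop client API; it only illustrates the ordering constraints.
final class WorkflowModel {
  private long nextLinkId = 1;
  private long nextJobId = 1;
  private long nextSubmissionId = 1;
  private final Map<Long, Long> linkConnector = new HashMap<>();  // lid -> connectorId
  private final Map<Long, long[]> jobLinks = new HashMap<>();     // jid -> {fromLid, toLid}
  private final Map<Long, Long> submissionJob = new HashMap<>();  // submission id -> jid

  // Step 1: create a link for a connector and return its linkId (lid).
  long createLink(long connectorId) {
    long lid = nextLinkId++;
    linkConnector.put(lid, connectorId);
    return lid;
  }

  // Step 2: create a job between two existing links and return its jobId (jid).
  long createJob(long fromLinkId, long toLinkId) {
    if (!linkConnector.containsKey(fromLinkId) || !linkConnector.containsKey(toLinkId)) {
      throw new IllegalStateException("create both links before creating the job");
    }
    long jid = nextJobId++;
    jobLinks.put(jid, new long[] {fromLinkId, toLinkId});
    return jid;
  }

  // Step 3: start an existing job; every start records a new submission.
  long startJob(long jobId) {
    if (!jobLinks.containsKey(jobId)) {
      throw new IllegalStateException("unknown jobId " + jobId);
    }
    long submissionId = nextSubmissionId++;
    submissionJob.put(submissionId, jobId);
    return submissionId;
  }

  // Number of submission records created so far.
  int submissionCount() {
    return submissionJob.size();
  }
}
```

The model deliberately fails fast: trying to create a job before both of its links exist throws, which is the same prerequisite the real workflow imposes.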

Project Dependencies

Add the following Maven dependency:

<dependency>
  <groupId>org.apache.sqoop</groupId>
  <artifactId>sqoop-client</artifactId>
  <version>${requestedVersion}</version>
</dependency>

Initialization

First, initialize the SqoopClient class with the server URL as its argument:

import org.apache.sqoop.client.SqoopClient;

String url = "http://localhost:12000/sqoop/";
SqoopClient client = new SqoopClient(url);

The server URL can be changed afterwards with the setServerUrl(String) method:

client.setServerUrl(newUrl);

Job

A Sqoop job holds the From and To parts for transferring data from the From data source to the To data source. The From and the To parts are each identified by the id of their connector link, i.e. when creating a job we have to specify the FromLinkId and the ToLinkId. Thus the prerequisite for creating a job is to first create the links, as described in the Workflow section above.

Once the From and To linkIds are given, the job configs of the connector associated with each link have to be filled in. You can list all the From and To job configs and their inputs for a connector as shown in Display Config and Input Names For Connector. A connector can have one or more links. We use the From link and the To link to populate the corresponding MFromConfig and MToConfig, respectively.

In addition to filling in the From and To job configs, we also need to fill in the driver configs, which control the job execution engine environment. For example, if the job execution engine is MapReduce, we specify the number of mappers used to read data from the From data source (throttlingConfig.numExtractors in the example below).
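To build some intuition for why the extractor count matters, here is a simplified illustration, not Sqoop's own partitioning code: it splits the inclusive value range of a numeric partition column (such as the id column set as partitionColumn in the example below) into at most a given number of contiguous, non-overlapping ranges, one per extractor. The class name RangeSplitter and its even-split strategy are assumptions of this sketch; the Generic JDBC connector's actual partitioner may split differently.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified illustration (not Sqoop's partitioner): split the inclusive range
// [min, max] of a numeric partition column into at most `parts` contiguous ranges,
// e.g. one range per extractor/mapper.
final class RangeSplitter {
  private RangeSplitter() {}

  // Returns {start, end} pairs (both inclusive). Assumes max - min + 1 fits in a long.
  static List<long[]> split(long min, long max, int parts) {
    if (parts <= 0) {
      throw new IllegalArgumentException("parts must be positive");
    }
    if (min > max) {
      throw new IllegalArgumentException("min must not exceed max");
    }
    long count = max - min + 1;                 // number of distinct values in the range
    int n = (int) Math.min(parts, count);       // never create empty ranges
    long base = count / n;                      // every range gets at least this many values
    long extra = count % n;                     // the first `extra` ranges get one more value
    List<long[]> ranges = new ArrayList<>();
    long start = min;
    for (int i = 0; i < n; i++) {
      long length = base + (i < extra ? 1 : 0);
      ranges.add(new long[] {start, start + length - 1});
      start += length;
    }
    return ranges;
  }
}
```

For ids 1 to 10 and 3 extractors this sketch yields the ranges 1-4, 5-7 and 8-10; asking for more extractors than there are distinct values simply produces fewer ranges.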

Save Job

Here is the code to create and then save a job:

import org.apache.sqoop.client.SqoopClient;
import org.apache.sqoop.model.MDriverConfig;
import org.apache.sqoop.model.MFromConfig;
import org.apache.sqoop.model.MJob;
import org.apache.sqoop.model.MToConfig;
import org.apache.sqoop.validation.Status;

String url = "http://localhost:12000/sqoop/";
SqoopClient client = new SqoopClient(url);
// Create a job object between two existing links
long fromLinkId = 1; // link for the JDBC connector
long toLinkId = 2;   // link for the HDFS connector
MJob job = client.createJob(fromLinkId, toLinkId);
job.setName("Vampire");
job.setCreationUser("Buffy");
// set the "FROM" link job config values
MFromConfig fromJobConfig = job.getFromJobConfig();
fromJobConfig.getStringInput("fromJobConfig.schemaName").setValue("sqoop");
fromJobConfig.getStringInput("fromJobConfig.tableName").setValue("sqoop");
fromJobConfig.getStringInput("fromJobConfig.partitionColumn").setValue("id");
// set the "TO" link job config values
MToConfig toJobConfig = job.getToJobConfig();
toJobConfig.getStringInput("toJobConfig.outputDirectory").setValue("/usr/tmp");
// set the driver config values; numExtractors is an integer input
MDriverConfig driverConfig = job.getDriverConfig();
driverConfig.getIntegerInput("throttlingConfig.numExtractors").setValue(3);

// save the job and check the validation status returned by the server
Status status = client.saveJob(job);
if (status.canProceed()) {
  System.out.println("Created Job with Job Id: " + job.getPersistenceId());
} else {
  System.out.println("Something went wrong creating the job");
}

A saved job can be retrieved with the following methods:

Method         Description
getJob(jid)    Returns the job with the given id
getJobs()      Returns the list of all jobs on the Sqoop server

List of status codes

Status    Description
OK        There are no issues and no warnings.
WARNING   The validated entity is correct enough to proceed; this is not a fatal error.
ERROR     There are serious issues with the validated entity. We can’t proceed until the reported issues are resolved.
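The rule in this table can be mirrored in a small enum. ValidationStatus below is a hypothetical stand-in for illustration, not Sqoop's Status class: it encodes the table's rule that OK and WARNING allow you to proceed while ERROR does not (the check that status.canProceed() performs in the Save Job code), and, as a design choice of this sketch only, it treats the combined status of several inputs as the most severe one.

```java
// Hypothetical mirror of the three validation statuses in the table above;
// a stand-in for illustration, not Sqoop's own Status class.
enum ValidationStatus {
  OK, WARNING, ERROR;  // declared from least to most severe

  // OK and WARNING allow the entity to be used; ERROR blocks it.
  boolean canProceed() {
    return this != ERROR;
  }

  // Sketch assumption: the overall status of several inputs is the most severe one.
  static ValidationStatus worstOf(ValidationStatus... statuses) {
    ValidationStatus worst = OK;
    for (ValidationStatus s : statuses) {
      if (s.compareTo(worst) > 0) {
        worst = s;
      }
    }
    return worst;
  }
}
```

Declaring the constants in order of severity lets the enum's natural ordering (compareTo) do the comparison without a separate severity field.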

View Error or Warning Validation Messages

If a save returns a WARNING or ERROR status, iterate over the list of validation messages to find the cause. For example, for a link:

import java.util.List;
import org.apache.sqoop.model.MConfig;
import org.apache.sqoop.model.MInput;
import org.apache.sqoop.validation.Message;
import org.apache.sqoop.validation.Status;

printMessage(link.getConnectorLinkConfig().getConfigs());

private static void printMessage(List<MConfig> configs) {
  for (MConfig config : configs) {
    // messages attached to the config as a whole
    if (config.getValidationMessages() != null) {
      for (Message message : config.getValidationMessages()) {
        System.out.println("Config validation message: " + message.getMessage());
      }
    }
    // messages attached to the individual inputs of the config
    for (MInput<?> minput : config.getInputs()) {
      if (minput.getValidationMessages() == null) {
        continue;
      }
      if (minput.getValidationStatus() == Status.WARNING) {
        for (Message message : minput.getValidationMessages()) {
          System.out.println("Config Input Validation Warning: " + message.getMessage());
        }
      } else if (minput.getValidationStatus() == Status.ERROR) {
        for (Message message : minput.getValidationMessages()) {
          System.out.println("Config Input Validation Error: " + message.getMessage());
        }
      }
    }
  }
}

Job Start

Starting a job requires a job id. On a successful start, the getStatus() method of the returned submission reports “BOOTING” or “RUNNING”.

import org.apache.sqoop.model.MSubmission;
import org.apache.sqoop.submission.counter.Counter;
import org.apache.sqoop.submission.counter.CounterGroup;
import org.apache.sqoop.submission.counter.Counters;

// Start the job; this call returns without waiting for the job to finish
long jobId = 1;
MSubmission submission = client.startJob(jobId);
System.out.println("Job Submission Status : " + submission.getStatus());
if (submission.getStatus().isRunning() && submission.getProgress() != -1) {
  System.out.println("Progress : " + String.format("%.2f %%", submission.getProgress() * 100));
}
System.out.println("Hadoop job id : " + submission.getExternalId());
System.out.println("Job link : " + submission.getExternalLink());
Counters counters = submission.getCounters();
if (counters != null) {
  System.out.println("Counters:");
  for (CounterGroup group : counters) {
    System.out.println("\t" + group.getName());
    for (Counter counter : group) {
      System.out.println("\t\t" + counter.getName() + ": " + counter.getValue());
    }
  }
}
if (submission.getExceptionInfo() != null) {
  System.out.println("Exception info : " + submission.getExceptionInfo());
}

// Check the status of a running job (reuses the submission variable declared above)
submission = client.getJobStatus(jobId);
if (submission.getStatus().isRunning() && submission.getProgress() != -1) {
  System.out.println("Progress : " + String.format("%.2f %%", submission.getProgress() * 100));
}

// Stop a running job
client.stopJob(jobId);
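The progress checks in the job start code above skip the value -1, which that code treats as "no progress available". A small hypothetical helper can keep that check and the percentage formatting in one place; ProgressFormat is an invented name, not part of the Sqoop API, and the sketch assumes, as the code above does, that progress is a fraction between 0 and 1.

```java
import java.util.Locale;

// Hypothetical helper, not part of the Sqoop API: formats a submission's progress
// value the same way the job start code does, treating -1 as "not available".
final class ProgressFormat {
  private ProgressFormat() {}

  static String format(double progress) {
    if (progress == -1) {
      return "n/a";
    }
    // progress is assumed to be a fraction in [0, 1]; Locale.ROOT keeps '.' as the decimal separator
    return String.format(Locale.ROOT, "%.2f %%", progress * 100);
  }
}
```

With it, the progress line could become System.out.println("Progress : " + ProgressFormat.format(submission.getProgress())), still guarded by the isRunning() check.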

startJob(jobId), as used above, starts the job asynchronously. For a synchronous start, use the startJob(jid, callback, pollTime) method. If you are not interested in intermediate job status, pass null as the callback; the method then simply returns the final job status. pollTime is the interval between status requests sent to the Sqoop server and must be greater than zero; a low value makes the client hit the server frequently. When a synchronous job is started with a non-null callback, the client first invokes the callback's submitted(MSubmission) method on a successful start, then invokes updated(MSubmission) after every poll interval, and finally invokes finished(MSubmission) when the job execution completes.
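The callback sequence of a synchronous start can be modelled in plain Java. The sketch below is a hypothetical stand-in, not Sqoop's implementation of startJob(jid, callback, pollTime): PollCallback only mirrors the three hook names, statuses are plain strings in the usage, and a Supplier plays the role of the status request to the server. It rejects a non-positive poll time, calls submitted once, calls updated after each poll that still reports a running job, calls finished once at the end, and returns the final status whether or not a callback was supplied.

```java
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical callback mirroring the submitted/updated/finished hooks described above.
interface PollCallback<S> {
  void submitted(S status);
  void updated(S status);
  void finished(S status);
}

// Hypothetical model of a synchronous start: poll until the job is no longer running.
final class SyncPoller {
  private SyncPoller() {}

  static <S> S runSynchronously(S initial, Supplier<S> poll, Predicate<S> isRunning,
                                PollCallback<S> callback, long pollTimeMs) {
    if (pollTimeMs <= 0) {
      throw new IllegalArgumentException("pollTime must be greater than zero");
    }
    if (callback != null) {
      callback.submitted(initial);        // invoked once after a successful start
    }
    S current = initial;
    while (isRunning.test(current)) {
      try {
        Thread.sleep(pollTimeMs);         // wait one poll interval between status requests
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException("interrupted while waiting for the job", e);
      }
      current = poll.get();               // stands in for asking the server for the job status
      if (callback != null && isRunning.test(current)) {
        callback.updated(current);        // invoked after every poll while the job still runs
      }
    }
    if (callback != null) {
      callback.finished(current);         // invoked once when the job is no longer running
    }
    return current;                       // the final status, with or without a callback
  }
}
```

Passing null as the callback in this model behaves like the null-callback case described above: the loop still polls, but only the final status is returned.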

Display Config and Input Names For Connector

You can view the config and input names of the link and job configs for each connector:

import java.util.List;
import java.util.ResourceBundle;
import org.apache.sqoop.client.SqoopClient;
import org.apache.sqoop.model.MConfig;
import org.apache.sqoop.model.MInput;

String url = "http://localhost:12000/sqoop/";
SqoopClient client = new SqoopClient(url);
long connectorId = 1;
ResourceBundle bundle = client.getConnectorConfigBundle(connectorId);
// link config for the connector
describe(client.getConnector(connectorId).getLinkConfig().getConfigs(), bundle);
// from job config for the connector
describe(client.getConnector(connectorId).getFromConfig().getConfigs(), bundle);
// to job config for the connector
describe(client.getConnector(connectorId).getToConfig().getConfigs(), bundle);

// Prints each config's label, then the label and name of every input in that config
static void describe(List<MConfig> configs, ResourceBundle resource) {
  for (MConfig config : configs) {
    System.out.println(resource.getString(config.getLabelKey()) + ":");
    for (MInput<?> input : config.getInputs()) {
      System.out.println(resource.getString(input.getLabelKey()) + " : " + input.getName());
    }
    System.out.println();
  }
}
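The input names used in this guide, such as fromJobConfig.tableName or throttlingConfig.numExtractors, combine a config name and an input name separated by a dot. If job settings come from an external source (a properties file, for instance), a hypothetical helper like the one below could check and group them by config before they are applied with getStringInput(...). InputNames and its methods are illustrative assumptions, not part of the Sqoop API, and the sketch treats everything before the first dot as the config name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the Sqoop API: works with fully qualified input
// names of the form "configName.inputName", e.g. "fromJobConfig.tableName".
final class InputNames {
  private InputNames() {}

  // Everything before the first dot, e.g. "fromJobConfig" for "fromJobConfig.tableName".
  static String configName(String inputName) {
    int dot = inputName.indexOf('.');
    if (dot <= 0 || dot == inputName.length() - 1) {
      throw new IllegalArgumentException("expected configName.inputName, got: " + inputName);
    }
    return inputName.substring(0, dot);
  }

  // Groups full input names (kept as-is, since getStringInput expects the full name)
  // and their values by config name, preserving insertion order.
  static Map<String, Map<String, String>> groupByConfig(Map<String, String> values) {
    Map<String, Map<String, String>> grouped = new LinkedHashMap<>();
    for (Map.Entry<String, String> entry : values.entrySet()) {
      grouped.computeIfAbsent(configName(entry.getKey()), k -> new LinkedHashMap<>())
             .put(entry.getKey(), entry.getValue());
    }
    return grouped;
  }
}
```

Keeping the full name as the inner key means each grouped entry can be passed straight to the matching config object without rebuilding the name.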

This Sqoop 2 client API tutorial explained the workflow of creating a link, creating a job, and then starting the job.
