Sqoop 2 Connectors
This document describes how to use the built-in connectors. This includes a detailed description of how connectors partition, format their output, extract data, and load data.
Generic JDBC Connector
The Generic JDBC Connector can connect to any data source that adheres to the JDBC 4 specification.
Usage
To use the Generic JDBC Connector, create a link for the connector and a job that uses the link.
Link Configuration
Inputs associated with the link configuration include:
Input | Type | Description | Example |
---|---|---|---|
JDBC Driver Class | String | The full class name of the JDBC driver. Required; the driver must be accessible on the Sqoop server. | com.mysql.jdbc.Driver |
JDBC Connection String | String | The JDBC connection string to use when connecting to the data source. Required. Connectivity upon creation is optional. | jdbc:mysql://localhost/test |
Username | String | The username to provide when connecting to the data source. Optional. Connectivity upon creation is optional. | sqoop |
Password | String | The password to provide when connecting to the data source. Optional. Connectivity upon creation is optional. | sqoop |
JDBC Connection Properties | Map | A map of JDBC connection properties to pass to the JDBC driver. Optional. | profileSQL=true&useFastDateParsing=false |
FROM Job Configuration
Inputs associated with the Job configuration for the FROM direction include:
Input | Type | Description | Example |
---|---|---|---|
Schema name | String | The schema name the table is part of. Optional. | sqoop |
Table name | String | The table name to import data from. Optional. See notes below. | test |
Table SQL statement | String | The SQL statement used to perform a free form query. Optional. See notes below. | SELECT COUNT(*) FROM test ${CONDITIONS} |
Table column names | String | Columns to extract from the JDBC data source. Optional. Comma-separated list of columns. | col1,col2 |
Partition column name | String | The column name used to partition the data transfer process. Optional. Defaults to the table's primary key. | col1 |
Null value allowed for the partition column | Boolean | True or false depending on whether NULL values are allowed in the data of the partition column. Optional. | true |
Boundary query | String | The query used to define an upper and lower boundary when partitioning. Optional. | |
Notes
- Table name and Table SQL statement are mutually exclusive: provide one or the other, never both.
- Table column names should be provided only if Table name is provided.
- If the query joins tables whose columns share the same name, column aliases are required. For example: SELECT table1.id as "i", table2.id as "j" FROM table1 INNER JOIN table2 ON table1.id = table2.id.
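The free-form query mechanics can be sketched outside of Sqoop. The snippet below (an illustration, not connector code) substitutes a hypothetical partitioner condition for the ${CONDITIONS} token and runs the aliased join against an in-memory SQLite database standing in for the JDBC source:

```python
import sqlite3

# Illustrative tables standing in for the JDBC data source.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE table1 (id INTEGER);
    CREATE TABLE table2 (id INTEGER);
    INSERT INTO table1 VALUES (1), (2), (3);
    INSERT INTO table2 VALUES (2), (3), (4);
""")

# Aliased columns avoid the duplicate-name problem described above.
free_form = ('SELECT table1.id AS "i", table2.id AS "j" FROM table1 '
             "INNER JOIN table2 ON table1.id = table2.id "
             "WHERE ${CONDITIONS}")

# A condition like the ones the partitioner generates (hypothetical values).
condition = "table1.id >= 2 AND table1.id < 4"
query = free_form.replace("${CONDITIONS}", condition)

rows = con.execute(query).fetchall()
print(sorted(rows))  # [(2, 2), (3, 3)]
```

Only the rows in the partition's boundary survive; each partition of the transfer runs the same query with a different condition in place of ${CONDITIONS}.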
TO Job Configuration
Inputs associated with the Job configuration for the TO direction include:
Input | Type | Description | Example |
---|---|---|---|
Schema name | String | The schema name the table is part of. Optional. | sqoop |
Table name | String | The table name to load data into. Optional. See notes below. | test |
Table SQL statement | String | The SQL statement used to perform a free form query. Optional. See notes below. | INSERT INTO test (col1, col2) VALUES (?, ?) |
Table column names | String | Columns to insert into the JDBC data source. Optional. Comma-separated list of columns. | col1,col2 |
Stage table name | String | The name of the table used as a staging table. Optional. | staging |
Should clear stage table | Boolean | True or false depending on whether the staging table should be cleared after the data transfer has finished. Optional. | true |
Notes
- Table name and Table SQL statement are mutually exclusive: provide one or the other, never both.
- Table column names should be provided only if Table name is provided.
Partitioner
The Generic JDBC Connector partitioner generates conditions to be used by the extractor. How it partitions the data transfer depends on the partition column's data type, but each strategy roughly computes a partition interval as:
(upper boundary - lower boundary) / (max partitions)
By default, the primary key will be used to partition the data unless otherwise specified.
The following data types are currently supported:
- TINYINT
- SMALLINT
- INTEGER
- BIGINT
- REAL
- FLOAT
- DOUBLE
- NUMERIC
- DECIMAL
- BIT
- BOOLEAN
- DATE
- TIME
- TIMESTAMP
- CHAR
- VARCHAR
- LONGVARCHAR
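As an illustration of the formula above, a simplified integer partitioner (a hypothetical sketch, not the connector's actual code) could split the boundary range into contiguous conditions like this:

```python
# Simplified sketch of integer partitioning: split [lower, upper] into
# max_partitions contiguous ranges and emit one SQL condition per range.
def integer_partitions(column, lower, upper, max_partitions):
    # Core formula: (upper boundary - lower boundary) / (max partitions)
    interval = (upper - lower) / max_partitions
    conditions = []
    start = lower
    for i in range(max_partitions):
        last = i == max_partitions - 1
        end = upper if last else lower + round(interval * (i + 1))
        op = "<=" if last else "<"  # last partition includes the upper bound
        conditions.append(f"{column} >= {start} AND {column} {op} {end}")
        start = end
    return conditions

for c in integer_partitions("col1", 0, 100, 4):
    print(c)
# col1 >= 0 AND col1 < 25
# col1 >= 25 AND col1 < 50
# col1 >= 50 AND col1 < 75
# col1 >= 75 AND col1 <= 100
```

Each condition is handed to one extractor, so the partitions together cover the boundary range exactly once.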
Extractor
During the extraction phase, the JDBC data source is queried using SQL. This SQL will vary based on your configuration.
- If Table name is provided, then the SQL statement generated will take on the form SELECT * FROM <table name>.
- If Table name and Columns are provided, then the SQL statement generated will take on the form SELECT <columns> FROM <table name>.
- If Table SQL statement is provided, then the provided SQL statement will be used.
The conditions generated by the partitioner are appended to the end of the SQL query to query a section of data.
The Generic JDBC connector extracts CSV data usable by the CSV Intermediate Data Format.
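The three cases above can be sketched with a small helper (an illustration under assumed names, not the connector's code), using an in-memory SQLite database in place of the JDBC source:

```python
import sqlite3

# Sketch of how the generated extraction SQL varies with the job
# configuration; table and column names are illustrative.
def extraction_sql(table=None, columns=None, sql=None):
    if sql is not None:
        return sql                       # free-form statement used as-is
    cols = columns if columns else "*"   # fall back to all columns
    return f"SELECT {cols} FROM {table}"

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE test (col1 INTEGER, col2 TEXT)")
con.executemany("INSERT INTO test VALUES (?, ?)",
                [(1, "a"), (2, "b"), (3, "c")])

base = extraction_sql(table="test", columns="col1,col2")
print(base)  # SELECT col1,col2 FROM test

# A partitioner condition is appended so this query reads one slice.
rows = con.execute(base + " WHERE col1 >= 1 AND col1 < 3 "
                          "ORDER BY col1").fetchall()
print(rows)  # [(1, 'a'), (2, 'b')]
```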
Loader
During the loading phase, the JDBC data source is queried using SQL. This SQL will vary based on your configuration.
- If Table name is provided, then the SQL statement generated will take on the form INSERT INTO <table name> (col1, col2, ...) VALUES (?,?,..).
- If Table name and Columns are provided, then the SQL statement generated will take on the form INSERT INTO <table name> (<columns>) VALUES (?,?,..).
- If Table SQL statement is provided, then the provided SQL statement will be used.
This connector expects to receive CSV data consumable by the CSV Intermediate Data Format.
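A minimal sketch of this consume-and-insert flow, with an in-memory SQLite database in place of the JDBC target (illustrative only, not the connector's implementation):

```python
import csv
import io
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE test (col1 INTEGER, col2 TEXT)")

# Generated statement for the "Table name + columns" case above.
insert = "INSERT INTO test (col1, col2) VALUES (?, ?)"

# Incoming CSV records; each field binds to one placeholder.
csv_data = "1,a\n2,b\n3,c\n"
for record in csv.reader(io.StringIO(csv_data)):
    con.execute(insert, record)

print(con.execute("SELECT COUNT(*) FROM test").fetchone()[0])  # 3
```

Binding fields through placeholders rather than string concatenation is what the `?` markers in the generated INSERT statement are for.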
Destroyers
The Generic JDBC Connector performs two operations in the destroyer in the TO direction:
- Copy the contents of the staging table to the desired table.
- Clear the staging table.
No operations are performed in the FROM direction.
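The two TO-direction steps can be sketched in plain SQL (illustrative table names, run here against an in-memory SQLite database rather than a JDBC source):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE target (col1 INTEGER);
    CREATE TABLE staging (col1 INTEGER);
    INSERT INTO staging VALUES (1), (2), (3);
""")

# Step 1: copy the staging table's contents to the desired table.
con.execute("INSERT INTO target SELECT * FROM staging")
# Step 2: clear the staging table for the next run.
con.execute("DELETE FROM staging")

print(con.execute("SELECT COUNT(*) FROM target").fetchone()[0])   # 3
print(con.execute("SELECT COUNT(*) FROM staging").fetchone()[0])  # 0
```

Because the copy happens only in the destroyer, after all loaders finish, the target table never sees a partially transferred data set.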