2.2.5. Kite Connector

2.2.5.1. Usage

To use the Kite Connector, create a link for the connector and a job that uses the link. For more information on Kite, check out the Kite documentation: http://kitesdk.org/docs/1.0.0/Kite-SDK-Guide.html.
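
Both the FROM and TO job configurations below are driven by Kite dataset URIs. As a minimal sketch of what such a URI denotes, the following fragment uses the Kite SDK directly (not the connector); the URIs are illustrative examples:

    import org.kitesdk.data.Datasets;

    public class KiteUriExample {
      public static void main(String[] args) {
        // A Kite dataset URI names a dataset by storage scheme,
        // namespace, and dataset name.
        String hdfsUri = "dataset:hdfs:/tmp/ns/ds";  // HDFS-backed dataset
        String hiveUri = "dataset:hive:ns/ds";       // Hive-backed dataset

        // Datasets.exists checks whether a URI resolves to a dataset.
        System.out.println(Datasets.exists(hdfsUri));
      }
    }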

2.2.5.1.2. FROM Job Configuration

Inputs associated with the Job configuration for the FROM direction include:

+-------+--------+---------------------------------------------------------+--------------------------+
| Input | Type   | Description                                             | Example                  |
+=======+========+=========================================================+==========================+
| URI   | String | The Kite dataset URI to use. Required. See notes below. | dataset:hdfs:/tmp/ns/ds  |
+-------+--------+---------------------------------------------------------+--------------------------+

2.2.5.1.2.1. Notes

  1. The URI and the authority from the link configuration are merged internally to form a complete dataset URI. If the given dataset URI already contains an authority, the authority from the link configuration is ignored (see the sketch after this list).
  2. Only the hdfs and hive schemes are currently supported.
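
As an illustration of note 1, the following sketch shows how such a merge could behave. mergeAuthority is a hypothetical helper written for this example, not the connector's actual code:

    // Hypothetical sketch of the merge described in note 1; this is
    // not the connector's actual implementation.
    public class MergeAuthorityExample {

      // Insert the link's authority (e.g. "example.com:8020") into a
      // dataset URI that does not already carry one.
      static String mergeAuthority(String datasetUri, String authority) {
        // "dataset:hdfs://host:port/path" already has an authority: keep it.
        if (datasetUri.contains("://")) {
          return datasetUri;
        }
        // "dataset:hdfs:/tmp/ns/ds" has no authority: splice one in.
        int schemeEnd = datasetUri.indexOf(":/");
        return datasetUri.substring(0, schemeEnd) + "://" + authority
            + datasetUri.substring(schemeEnd + 1);
      }

      public static void main(String[] args) {
        // No authority in the URI -> the link's authority is used:
        // prints dataset:hdfs://example.com:8020/tmp/ns/ds
        System.out.println(
            mergeAuthority("dataset:hdfs:/tmp/ns/ds", "example.com:8020"));

        // URI already has an authority -> the link's authority is ignored:
        // prints dataset:hdfs://other:8020/tmp/ns/ds
        System.out.println(
            mergeAuthority("dataset:hdfs://other:8020/tmp/ns/ds", "example.com:8020"));
      }
    }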

2.2.5.1.3. TO Job Configuration

Inputs associated with the Job configuration for the TO direction include:

+-------------+--------+------------------------------------------------------------+--------------------------+
| Input       | Type   | Description                                                | Example                  |
+=============+========+============================================================+==========================+
| URI         | String | The Kite dataset URI to use. Required. See notes below.    | dataset:hdfs:/tmp/ns/ds  |
+-------------+--------+------------------------------------------------------------+--------------------------+
| File format | Enum   | The format of the data the Kite dataset should write out.  | PARQUET                  |
|             |        | Optional. See notes below.                                 |                          |
+-------------+--------+------------------------------------------------------------+--------------------------+

2.2.5.1.3.1. Notes

  1. The URI and the authority from the link configuration are merged internally to form a complete dataset URI. If the given dataset URI already contains an authority, the authority from the link configuration is ignored (see the sketch under the FROM notes above).
  2. Only the hdfs and hive schemes are currently supported.
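
For context on the File format input, the sketch below creates a dataset with an explicit Parquet format through the Kite SDK itself; the URI and schema literal are made-up examples:

    import org.apache.avro.generic.GenericRecord;
    import org.kitesdk.data.Dataset;
    import org.kitesdk.data.DatasetDescriptor;
    import org.kitesdk.data.Datasets;
    import org.kitesdk.data.Formats;

    public class CreateParquetDatasetExample {
      public static void main(String[] args) {
        // Describe the dataset: an Avro schema plus a storage format.
        DatasetDescriptor descriptor = new DatasetDescriptor.Builder()
            .schemaLiteral("{\"type\":\"record\",\"name\":\"ds\","
                + "\"fields\":[{\"name\":\"id\",\"type\":\"long\"}]}")
            .format(Formats.PARQUET)  // corresponds to the PARQUET enum value
            .build();

        // Create the dataset at the configured URI.
        Dataset<GenericRecord> dataset =
            Datasets.create("dataset:hdfs:/tmp/ns/ds", descriptor, GenericRecord.class);
        System.out.println(dataset.getDescriptor().getFormat());
      }
    }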

2.2.5.2. Partitioner

The Kite connector currently creates only one partition, so extraction is not parallelized.

2.2.5.3. Extractor

During the extraction phase, Kite is used to query a dataset. Because the partitioner produces only a single partition, only a single reader is created to read the dataset.
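
As a rough sketch of what that single reader does, using the Kite SDK directly (the dataset URI is illustrative):

    import org.apache.avro.generic.GenericRecord;
    import org.kitesdk.data.Dataset;
    import org.kitesdk.data.DatasetReader;
    import org.kitesdk.data.Datasets;

    public class SingleReaderExample {
      public static void main(String[] args) {
        // One reader scans the whole dataset, mirroring the connector's
        // single-partition extraction.
        Dataset<GenericRecord> dataset =
            Datasets.load("dataset:hdfs:/tmp/ns/ds", GenericRecord.class);
        try (DatasetReader<GenericRecord> reader = dataset.newReader()) {
          for (GenericRecord record : reader) {
            System.out.println(record);  // each record becomes one extracted row
          }
        }
      }
    }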

NOTE: The Avro schema Kite generates may differ slightly from the original schema, because Avro identifiers have strict naming requirements: a name must start with a letter or underscore and may contain only letters, digits, and underscores.
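
As a hypothetical illustration of why names change, a sanitizer along these lines (not the connector's actual code) must rewrite any identifier that violates Avro's rules:

    // Hypothetical sketch: rewrite an arbitrary column name into a legal
    // Avro identifier ([A-Za-z_][A-Za-z0-9_]*). Not the connector's code.
    public class AvroNameExample {
      static String toAvroName(String name) {
        // Replace every illegal character with an underscore.
        String cleaned = name.replaceAll("[^A-Za-z0-9_]", "_");
        // Names must not start with a digit.
        return cleaned.matches("^[0-9].*") ? "_" + cleaned : cleaned;
      }

      public static void main(String[] args) {
        System.out.println(toAvroName("first-name"));  // first_name
        System.out.println(toAvroName("2fast"));       // _2fast
      }
    }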

2.2.5.4. Loader

During the loading phase, Kite is used to write several temporary datasets. The number of temporary datasets equals the number of loaders in use.
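
A rough sketch of what each loader does, using the Kite SDK directly; the temporary dataset URI and the record field are made-up examples:

    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericData;
    import org.apache.avro.generic.GenericRecord;
    import org.kitesdk.data.Dataset;
    import org.kitesdk.data.DatasetWriter;
    import org.kitesdk.data.Datasets;

    public class LoaderSketch {
      public static void main(String[] args) {
        // Each loader writes into its own temporary dataset (URI illustrative).
        Dataset<GenericRecord> temp =
            Datasets.load("dataset:hdfs:/tmp/ns/ds-tmp-0", GenericRecord.class);
        Schema schema = temp.getDescriptor().getSchema();

        try (DatasetWriter<GenericRecord> writer = temp.newWriter()) {
          GenericRecord record = new GenericData.Record(schema);
          record.put("id", 1L);  // hypothetical field from the dataset schema
          writer.write(record);
        }
      }
    }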

2.2.5.5. Destroyers

The Kite connector's TO destroyer merges all of the temporary datasets into a single final dataset.
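
Conceptually, the merge amounts to copying each temporary dataset into the final one and removing the temporaries. The sketch below illustrates that idea with the Kite SDK; the URIs are made up, and the connector's actual implementation may merge more efficiently:

    import org.apache.avro.generic.GenericRecord;
    import org.kitesdk.data.Dataset;
    import org.kitesdk.data.DatasetReader;
    import org.kitesdk.data.DatasetWriter;
    import org.kitesdk.data.Datasets;

    public class MergeSketch {
      public static void main(String[] args) {
        Dataset<GenericRecord> target =
            Datasets.load("dataset:hdfs:/tmp/ns/ds", GenericRecord.class);
        String[] tempUris = {
            "dataset:hdfs:/tmp/ns/ds-tmp-0",
            "dataset:hdfs:/tmp/ns/ds-tmp-1",
        };

        try (DatasetWriter<GenericRecord> writer = target.newWriter()) {
          for (String tempUri : tempUris) {
            Dataset<GenericRecord> temp =
                Datasets.load(tempUri, GenericRecord.class);
            // Copy every record from the temporary dataset into the target...
            try (DatasetReader<GenericRecord> reader = temp.newReader()) {
              for (GenericRecord record : reader) {
                writer.write(record);
              }
            }
            // ...then drop the temporary dataset.
            Datasets.delete(tempUri);
          }
        }
      }
    }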