ETL Tools
This topic describes how to perform ETL tasks by using the ETL toolkit in a containerized deployment of i2 Analyze.
The run_etl_toolkit_tool_as_i2_etl client function is used to run the ETL tools described in this topic as the i2ETL user. For more information about this client function, see run_etl_toolkit_tool_as_i2_etl.
Building an ETL Client image
The ETL client image is built from the Dockerfile in images/etl_client.
The following docker build command builds the configured image:
docker build -t "etlclient_redhat:4.4.4" "images/etl_client"
Add Information Store ingestion source
The addInformationStoreIngestionSource tool defines an ingestion source in the Information Store. For more information about ingestion sources in the Information Store, see Defining an ingestion source.
You must provide the following arguments to the tool:
Argument | Description | Maximum characters |
---|---|---|
n | A unique name for the ingestion source | 30 |
d | A description of the ingestion source that might appear in the user interface | 100 |
Use the run_etl_toolkit_tool_as_i2_etl client function to run the tool. For example:
run_etl_toolkit_tool_as_i2_etl bash -c "/opt/i2/etltoolkit/addInformationStoreIngestionSource \
  -n <name> \
  -d '<description>'"
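The following is a minimal sketch of a hypothetical add_ingestion_source shell function (it is not part of the ETL toolkit) that checks the documented maximum lengths before it runs the tool. It assumes that the client functions are already loaded in your shell and that run_etl_toolkit_tool_as_i2_etl passes its arguments to the ETL client container unchanged. The values are wrapped in single quotes inside the bash -c string so that a description that contains spaces reaches the tool as a single argument.

```shell
# Hypothetical helper, not part of the ETL toolkit.
# Usage: add_ingestion_source <name> <description>
# Assumes run_etl_toolkit_tool_as_i2_etl is provided by the client functions.
add_ingestion_source() {
  local name="$1" description="$2"

  # Enforce the maximum lengths from the argument table (30 and 100).
  if [ -z "$name" ] || [ "${#name}" -gt 30 ]; then
    echo "The ingestion source name must be 1-30 characters long" >&2
    return 1
  fi
  if [ -z "$description" ] || [ "${#description}" -gt 100 ]; then
    echo "The description must be 1-100 characters long" >&2
    return 1
  fi

  # The values are single-quoted in the command string below, so they must
  # not contain single quotes themselves.
  case "$name$description" in
    *"'"*)
      echo "Values must not contain single quotes" >&2
      return 1
      ;;
  esac

  run_etl_toolkit_tool_as_i2_etl bash -c \
    "/opt/i2/etltoolkit/addInformationStoreIngestionSource -n '$name' -d '$description'"
}
```

For example, add_ingestion_source EXAMPLE_SOURCE "Records from a hypothetical example system" would define a source with that (illustrative) name and description, while a 31-character name is rejected before the container is started.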
Drop Information Store error tables
The dropInformationStoreErrorTables tool is used to remove the _ERROR and _REJECT tables from the Information Store.
Use the run_etl_toolkit_tool_as_i2_etl client function to run the tool. For example:
run_etl_toolkit_tool_as_i2_etl bash -c "/opt/i2/etltoolkit/dropInformationStoreErrorTables"
Clear Information Store staging schema
The clearInformationStoreStagingSchema tool is used to clear all the tables in the Information Store staging schema.
Use the run_etl_toolkit_tool_as_i2_etl client function to run the tool. For example:
run_etl_toolkit_tool_as_i2_etl bash -c "/opt/i2/etltoolkit/clearInformationStoreStagingSchema"
Create Information Store staging table
The createInformationStoreStagingTable tool creates a staging table that you can use to ingest data into the Information Store. For more information about creating the tables, see Creating the staging tables.
You must provide the following arguments to the tool:
Argument | Description |
---|---|
stid | The schema type identifier of the item type to create the staging table for |
sn | The name of the database schema to create the staging table in |
tn | The name of the staging table to create |
Use the run_etl_toolkit_tool_as_i2_etl client function to run the tool. For example:
run_etl_toolkit_tool_as_i2_etl bash -c "/opt/i2/etltoolkit/createInformationStoreStagingTable \
  -stid <schema_type_id> \
  -sn <schema_name> \
  -tn <table_name>"
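Because each run of the tool takes a single -stid and -tn, you might wrap it in a loop when you need staging tables for several item types. The following sketch defines a hypothetical create_staging_tables function that takes a database schema name followed by stid:table_name pairs. It assumes that the client functions are loaded; the schema name, schema type identifiers, and table names in the usage example are illustrative, not values from your schema.

```shell
# Hypothetical helper, not part of the ETL toolkit.
# Usage: create_staging_tables <schema_name> <stid:table_name>...
# Assumes run_etl_toolkit_tool_as_i2_etl is provided by the client functions.
create_staging_tables() {
  if [ "$#" -lt 2 ]; then
    echo "Usage: create_staging_tables <schema_name> <stid:table_name>..." >&2
    return 1
  fi
  local schema_name="$1"
  shift

  local pair stid table_name
  for pair in "$@"; do
    stid="${pair%%:*}"        # text before the first colon
    table_name="${pair#*:}"   # text after the first colon
    if [ "$stid" = "$pair" ] || [ -z "$stid" ] || [ -z "$table_name" ]; then
      echo "Expected stid:table_name but got '$pair'" >&2
      return 1
    fi
    # One tool run per item type; stop at the first failure.
    run_etl_toolkit_tool_as_i2_etl bash -c \
      "/opt/i2/etltoolkit/createInformationStoreStagingTable -stid '$stid' -sn '$schema_name' -tn '$table_name'" ||
      return 1
  done
}
```

For example, create_staging_tables IS_STAGING ET5:E_PERSON ET10:E_VEHICLE would run the tool twice, once for each hypothetical item type.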
Ingest Information Store records
The ingestInformationStoreRecords tool is used to ingest data into the Information Store. For more information about ingesting data into the Information Store, see The ingestInformationStoreRecords toolkit task.
You can use the following arguments with the tool:
Argument | Description |
---|---|
imf | The full path to the ingestion mapping file |
imid | The identifier of the ingestion mapping to use from the ingestion mapping file |
im | Optional: The import mode to use. Possible values are STANDARD, VALIDATE, BULK, DELETE, BULK_DELETE, or DELETE_PREVIEW. The default is STANDARD. |
icf | Optional: The full path to an ingestion settings file |
il | Optional: A label for the ingestion that you can use to refer to it later |
lcl | Optional: Whether to log (true or false) the links that were deleted or affected as a result of deleting entities |
Use the run_etl_toolkit_tool_as_i2_etl client function to run the tool. For example:
run_etl_toolkit_tool_as_i2_etl bash -c "/opt/i2/etltoolkit/ingestInformationStoreRecords \
  -imf <ingestion_mapping_file> \
  -imid <ingestion_mapping_id> \
  -im <import_mode>"
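A minimal sketch, assuming the client functions are loaded: the hypothetical ingest_records function below rejects import modes that are not in the argument table and passes STANDARD, the documented default, when no mode is given. The mapping file path and mapping identifier in the usage example are placeholders.

```shell
# Hypothetical helper, not part of the ETL toolkit.
# Usage: ingest_records <mapping_file> <mapping_id> [import_mode]
# Assumes run_etl_toolkit_tool_as_i2_etl is provided by the client functions.
ingest_records() {
  local mapping_file="$1" mapping_id="$2" mode="${3:-STANDARD}"

  if [ -z "$mapping_file" ] || [ -z "$mapping_id" ]; then
    echo "Usage: ingest_records <mapping_file> <mapping_id> [import_mode]" >&2
    return 1
  fi

  # Accept only the import modes listed in the argument table.
  case "$mode" in
    STANDARD | VALIDATE | BULK | DELETE | BULK_DELETE | DELETE_PREVIEW) ;;
    *)
      echo "Unsupported import mode: $mode" >&2
      return 1
      ;;
  esac

  run_etl_toolkit_tool_as_i2_etl bash -c \
    "/opt/i2/etltoolkit/ingestInformationStoreRecords -imf '$mapping_file' -imid '$mapping_id' -im $mode"
}
```

For example, ingest_records /path/to/mapping.xml Person VALIDATE runs the tool in VALIDATE mode, and a misspelled mode fails in the shell before the container is started.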
Sync Information Store records
The syncInformationStoreCorrelation tool is used after an error during correlation to synchronize the data in the Information Store with the data in the Solr index, so that the data returns to a usable state.
Use the run_etl_toolkit_tool_as_i2_etl client function to run the tool. For example:
run_etl_toolkit_tool_as_i2_etl bash -c "/opt/i2/etltoolkit/syncInformationStoreCorrelation"
Duplicate provenance check
The duplicateProvenanceCheck tool identifies records in the Information Store that have duplicate origin identifiers. Any provenance that has a duplicated origin identifier is added to a staging table in the Information Store.
Use the run_etl_toolkit_tool_as_i2_etl client function to run the tool. For example:
run_etl_toolkit_tool_as_i2_etl bash -c "/opt/i2/etltoolkit/duplicateProvenanceCheck"
Duplicate provenance delete
The duplicateProvenanceDelete tool deletes entity and link provenance that has duplicated origin identifiers from the Information Store. The provenance to delete is identified in the staging tables that the duplicateProvenanceCheck tool creates.
You can provide the following argument to the tool:
Argument | Description |
---|---|
stn | The name of the staging table that contains the origin identifiers to delete |
If no arguments are provided, duplicate origin identifiers are deleted from all staging tables.
Use the run_etl_toolkit_tool_as_i2_etl client function to run the tool. For example:
run_etl_toolkit_tool_as_i2_etl bash -c "/opt/i2/etltoolkit/duplicateProvenanceDelete -stn <staging_table_name>"
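The duplicate provenance tools form a two-step workflow: duplicateProvenanceCheck populates the staging tables, and duplicateProvenanceDelete removes the provenance that they identify. The following sketch chains the two steps in a hypothetical remove_duplicate_provenance function. It assumes that the client functions are loaded and that you do not need to review the staging tables between the steps; if you do, run the two tools separately instead.

```shell
# Hypothetical helper, not part of the ETL toolkit.
# Usage: remove_duplicate_provenance [staging_table_name]
# Assumes run_etl_toolkit_tool_as_i2_etl is provided by the client functions.
remove_duplicate_provenance() {
  local staging_table="$1"
  local delete_cmd="/opt/i2/etltoolkit/duplicateProvenanceDelete"

  # Step 1: record provenance with duplicated origin identifiers in staging tables.
  run_etl_toolkit_tool_as_i2_etl bash -c "/opt/i2/etltoolkit/duplicateProvenanceCheck" || return 1

  # Step 2: delete that provenance, from one staging table if a name is given
  # (-stn), otherwise from all of the staging tables.
  if [ -n "$staging_table" ]; then
    delete_cmd="$delete_cmd -stn '$staging_table'"
  fi
  run_etl_toolkit_tool_as_i2_etl bash -c "$delete_cmd"
}
```

The delete step only runs if the check step succeeds, so a failed check does not leave you deleting against stale staging tables from an earlier run.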
Generate Information Store index creation scripts
The generateInformationStoreIndexCreationScript tool generates the scripts that create the indexes for each item type in the Information Store. For more information, see database index management.
You must provide the following arguments to the tool:
Argument | Description |
---|---|
stid | The schema type identifier of the item type to create the index creation scripts for |
op | The location to create the scripts |
Use the run_etl_toolkit_tool_as_i2_etl client function to run the tool. For example:
run_etl_toolkit_tool_as_i2_etl bash -c "/opt/i2/etltoolkit/generateInformationStoreIndexCreationScript \
  -op <output_location> \
  -stid <schema_type_id>"
Generate Information Store index drop scripts
The generateInformationStoreIndexDropScript tool generates the scripts that drop the indexes for each item type in the Information Store. For more information, see database index management.
You must provide the following arguments to the tool:
Argument | Description |
---|---|
stid | The schema type identifier of the item type to create the index drop scripts for |
op | The location to create the scripts |
Use the run_etl_toolkit_tool_as_i2_etl client function to run the tool. For example:
run_etl_toolkit_tool_as_i2_etl bash -c "/opt/i2/etltoolkit/generateInformationStoreIndexDropScript \
  -op <output_location> \
  -stid <schema_type_id>"
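If you plan to drop the indexes for an item type and re-create them later, for example around a large ingestion, you might generate both scripts at the same time. The following hypothetical generate_index_scripts function does that for one schema type identifier. It assumes that the client functions are loaded and that the output location is a path that the ETL client container can write to.

```shell
# Hypothetical helper, not part of the ETL toolkit.
# Usage: generate_index_scripts <stid> <output_location>
# Assumes run_etl_toolkit_tool_as_i2_etl is provided by the client functions.
generate_index_scripts() {
  local stid="$1" output_location="$2"

  # Both tools require -stid and -op.
  if [ -z "$stid" ] || [ -z "$output_location" ]; then
    echo "Usage: generate_index_scripts <stid> <output_location>" >&2
    return 1
  fi

  run_etl_toolkit_tool_as_i2_etl bash -c \
    "/opt/i2/etltoolkit/generateInformationStoreIndexDropScript -op '$output_location' -stid '$stid'" ||
    return 1
  run_etl_toolkit_tool_as_i2_etl bash -c \
    "/opt/i2/etltoolkit/generateInformationStoreIndexCreationScript -op '$output_location' -stid '$stid'"
}
```

Generating the pair together keeps the drop and creation scripts for an item type in the same location, so they are easy to find when you run them.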
Delete orphaned database objects
The deleteOrphanedDatabaseObjects tool deletes entity and link database objects that are not associated with an i2 Analyze record from the Information Store.
You can provide the following argument to the tool:
Argument | Description |
---|---|
iti | Optional: The schema type identifier of the item type to delete orphaned database objects for |
If no item type identifier is provided, orphaned database objects for all item types are removed.
Use the run_etl_toolkit_tool_as_i2_etl client function to run the tool. For example:
run_etl_toolkit_tool_as_i2_etl bash -c "/opt/i2/etltoolkit/deleteOrphanedDatabaseObjects -iti <item_type_id>"
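The following sketch shows one way to pass the optional -iti argument only when an item type identifier is supplied, so that calling the hypothetical delete_orphaned_objects function with no argument removes orphaned objects for all item types. It assumes that the client functions are loaded.

```shell
# Hypothetical helper, not part of the ETL toolkit.
# Usage: delete_orphaned_objects [item_type_id]
# Assumes run_etl_toolkit_tool_as_i2_etl is provided by the client functions.
delete_orphaned_objects() {
  local item_type_id="$1"
  local cmd="/opt/i2/etltoolkit/deleteOrphanedDatabaseObjects"

  # Add -iti only when an identifier is supplied; without it, the tool
  # removes orphaned objects for all item types.
  if [ -n "$item_type_id" ]; then
    cmd="$cmd -iti '$item_type_id'"
  fi

  run_etl_toolkit_tool_as_i2_etl bash -c "$cmd"
}
```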
Disable merged property values
The disableMergedPropertyValues tool removes the database views used to define the property values of merged i2 Analyze records.
You can provide the following arguments to the tool:
Argument | Description |
---|---|
etd | The location of the root of the ETL toolkit |
stid | The schema type identifier to disable the views for |
If no schema type identifier is provided, the views for all of the item types are removed.
Use the run_etl_toolkit_tool_as_dba client function to run the tool as the database administrator. For example:
run_etl_toolkit_tool_as_dba bash -c "/opt/i2/etltoolkit/disableMergedPropertyValues \
  -etd <etl_toolkit_root> \
  -stid <schema_type_id>"
For more information about correlation, see Information Store data correlation.
Enable merged property values
The enableMergedPropertyValues tool creates the database views used to define the property values of merged i2 Analyze records.
You can provide the following arguments to the tool:
Argument | Description |
---|---|
etd | The location of the root of the ETL toolkit |
stid | The schema type identifier to create the views for |
If no schema type identifier is provided, the views for all of the item types are generated. If the views already exist, they are overwritten.
Use the run_etl_toolkit_tool_as_dba client function to run the tool as the database administrator. For example:
run_etl_toolkit_tool_as_dba bash -c "/opt/i2/etltoolkit/enableMergedPropertyValues \
  -etd <etl_toolkit_root> \
  -stid <schema_type_id>"
For more information about correlation, see Information Store data correlation.
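Because the enable and disable tools for merged property values take the same arguments, a single hypothetical function can run either one. The following sketch assumes that the client functions are loaded and that the root of the ETL toolkit in the ETL client container is /opt/i2/etltoolkit, the directory that the examples in this topic run the tools from. That default is an assumption, so supply the second argument if your layout differs.

```shell
# Hypothetical helper, not part of the ETL toolkit.
# Usage: set_merged_property_values enable|disable [etl_toolkit_root] [stid]
# Assumes run_etl_toolkit_tool_as_dba is provided by the client functions.
set_merged_property_values() {
  # ASSUMPTION: the ETL toolkit root defaults to /opt/i2/etltoolkit.
  local action="$1" etd="${2:-/opt/i2/etltoolkit}" stid="$3" tool

  case "$action" in
    enable) tool=enableMergedPropertyValues ;;
    disable) tool=disableMergedPropertyValues ;;
    *)
      echo "The action must be enable or disable" >&2
      return 1
      ;;
  esac

  local cmd="/opt/i2/etltoolkit/$tool -etd '$etd'"
  # Without -stid, the tool acts on the views for all item types.
  if [ -n "$stid" ]; then
    cmd="$cmd -stid '$stid'"
  fi

  # These tools run as the database administrator, not as the i2ETL user.
  run_etl_toolkit_tool_as_dba bash -c "$cmd"
}
```

For example, set_merged_property_values enable "" ET5 would re-create the views for a single hypothetical item type, and set_merged_property_values disable would remove the views for all item types.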