Ingesting data into the Information Store

The examples/pre-prod/walkthroughs/change-management/ingestDataWalkthrough.sh script is a worked example that demonstrates how to script the process of ingesting data into the Information Store.

Note: Before you complete this walkthrough, reset your environment to the base configuration. For more information, see Resetting your environment.

The example provides a scripted mechanism to ingest data by using ETL toolkit. The script ingests data with and without correlation identifiers using both STANDARD and BULK import modes.
For more information about ingestion, see:

To script the ingestion process, the following information is stored within the script:

The mapping of item type identifiers to staging tables
The mapping of staging tables to CSV files and format files.
A list of import mapping identifiers from the mapping file.

The script uses the run_sql_server_command_as_etl client function to import the data into the staging tables, and run_etl_toolkit_tool_as_i2_etl to run the ingestion commands. Note the external ETL user is performing the import into the staging tables, but the i2 Internal ETL user is executing ETL tasks such as creating the ingestion source or performing the import from the staging tables.

The data and ingestion artifacts that are used in the example are in the examples/data directory of the minimal toolkit.

Creating the ingestion sources

The run_etl_toolkit_tool_as_i2_etl client function is used to run addInformationStoreIngestionSource ETL toolkit tool to create the ingestion sources.

run_etl_toolkit_tool_as_i2_etl
addInformationStoreIngestionSource
For more information about ingestion sources in the Information Store, see Defining an ingestion source

For an example of how to use the tool, see the Create the ingestion sources section in the ingestDataWalkthrough.sh

Creating the staging tables

The run_etl_toolkit_tool_as_i2_etl client function is used to run createInformationStoreStagingTable ETL toolkit tool to create the staging tables.

run_etl_toolkit_tool_as_i2_etl
createInformationStoreStagingTable
For more information about creating staging tables in the Information Store, see Creating the staging tables

To create all of the staging tables in the example, schema type identifiers are mapped to staging table names. For an example of how the mappings are used, see the Create the staging tables section in the ingestDataWalkthrough.sh

Note: Because there are multiple staging tables for the LAC1 link type, a second loop is used to iterate through the staging table names of each schema type.

Ingesting data

The walkthrough demonstrates how to ingest both non-correlated and correlated data. For each type of data the staging tables and populated, the data ingestion task is executed, then the staging tables are cleaned.

Ingesting non correlated data

In this case, the SQL Server BULK INSERT command is used to insert the CSV data into the staging tables.

Ingesting correlated data

See the Ingesting correlated data section of the walkthrough script.

The ingestion procedure

The run_sql_server_command_as_etl client function is used to run the sql statement that inserts the data into the staging tables.

The example uses CSV and format files to insert the data. The files have the same name (person.csv and person.fmt). The staging table names are mapped to the CSV and format files. There is one mapping for base data (BASE_DATA_TABLE_TO_CSV_AND_FORMAT_FILE_NAME) and one for correlation data (CORRELATED_DATA_TABLE_AND_FORMAT_FILE_NAME). For an example of how the mappings are used, see the Insert the base data into the staging tables section in the ingestDataWalkthrough.sh

The run_etl_toolkit_tool_as_i2_etl client function is used to run ingestInformationStoreRecords ETL toolkit tool to ingest the data into the Information Store from the staging tables.

The ingestInformationStoreRecords tool is used with BULK and STANDARD import modes. Standard import mode is used to ingest the correlation data sets. Bulk import mode is used to ingest the base data set.

The import mapping identifiers to use with the ingestInformationStoreRecords tool are defined in the IMPORT_MAPPING_IDS and BULK_IMPORT_MAPPING_IDS lists.

A loop is used to ingest data for each mapping identifier in the lists.