Ingesting development data into the Information Store
When you are developing a configuration, ingest a small amount of representative test data into the system to ensure the schema is suitable for your data and you can configure i2 Analyze to meet your requirements. For more information about ingesting data, see Ingesting data into the Information Store.
Ensure that the DEPLOYMENT_PATTERN variable in the <config_name>/utils/variables.conf file is set to a pattern that includes the Information Store.
For example:
DEPLOYMENT_PATTERN="i2c_istore"
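The following minimal sketch shows one way to check this setting from a script before you ingest data. It assumes that the names of deployment patterns that include the Information Store contain the string istore, as i2c_istore does. The function name pattern_includes_istore is hypothetical and is not part of the i2 Analyze tooling.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if the DEPLOYMENT_PATTERN set in the given
# variables.conf file names a pattern that includes the Information Store.
# ASSUMPTION: such pattern names contain "istore" (for example "i2c_istore").
pattern_includes_istore() {
  local conf_file="$1" line value
  # Use the last DEPLOYMENT_PATTERN assignment in the file, if there is one.
  line="$(grep -E '^DEPLOYMENT_PATTERN=' "$conf_file" | tail -n 1)"
  value="${line#DEPLOYMENT_PATTERN=}"         # drop the variable name
  value="$(printf '%s' "$value" | tr -d '"')" # drop the double quotes
  case "$value" in
    *istore*) return 0 ;;
    *)
      echo "DEPLOYMENT_PATTERN \"${value}\" does not appear to include the Information Store" >&2
      return 1
      ;;
  esac
}
```

For example: pattern_includes_istore <config_name>/utils/variables.conf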
Process overview:
- Provide a data set
- Create the ingestion sources
- Provide and run scripts to complete the ingestion process
If you have deployed with the law enforcement schema, complete the steps in Example ingestion process to ingest example data into your Information Store.
Data sets
The /i2a-data directory contains the data sets that you ingest into the Information Store of a deployment. The data that you ingest into the Information Store must conform to the Information Store schema. However, there is not a 1-to-1 mapping between data sets and configs: the same data set can be ingested with different configs.
Each data set must contain at least one ingestion script. This script contains the functions that populate the staging tables with your data, and calls the ETL toolkit tools that ingest the data.
The expected directory structure is as follows:
- i2a-data
  - <data_set>
    - scripts
      - <script1>
      - <script2>
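Because each data set must contain at least one ingestion script, you can check this layout before you run manage-data. The following minimal sketch assumes that ingestion scripts are the regular files directly inside each <data_set>/scripts directory. The function name find_data_sets_without_scripts is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the name of every data set under the given
# i2a-data directory whose scripts directory contains no regular files.
# ASSUMPTION: ingestion scripts are regular files directly in <data_set>/scripts.
find_data_sets_without_scripts() {
  local data_root="$1" data_set_dir script found
  for data_set_dir in "$data_root"/*/; do
    [ -d "$data_set_dir" ] || continue  # the glob matched nothing
    found=no
    for script in "$data_set_dir"scripts/*; do
      if [ -f "$script" ]; then
        found=yes
        break
      fi
    done
    if [ "$found" = no ]; then
      basename "$data_set_dir"
    fi
  done
}
```

For example, find_data_sets_without_scripts i2a-data prints nothing when every data set contains at least one script.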
Ingesting data into the config dev environment
The manage-data command is used to manage the ingestion process in the dev environment.
To ingest data into the Information Store, you must create scripts that call the ETL tools to complete the actions that i2 Analyze requires to ingest data.
For more information, see:
- ETL tools
- Ingesting data into the Information Store
You can find example scripts in the examples/ingestion/scripts directory. These scripts demonstrate how to create staging tables, populate them, and ingest the data into the Information Store.
To run scripts, the manage-data command is called as follows:
manage-data -c <config_name> -t ingest -d <data_set> -s <script_name>
Where:
- <config_name> is the name of the config that is currently deployed and running in the config dev environment.
- <data_set> is the name of a directory in i2a-data.
- <script_name> is the name of a script in the directory specified for <data_set>.
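The following minimal sketch checks that a script exists before it builds the ingest command, so that a typo in <data_set> or <script_name> is caught early. It assumes that scripts live in the scripts subdirectory of the data set, as in the directory structure for data sets, and that data_root is the path to your i2a-data directory. The function name build_ingest_command is hypothetical; it only prints the command and does not run manage-data.

```shell
#!/usr/bin/env bash
# Hypothetical helper: checks that <script_name> exists in the scripts
# directory of <data_set>, then prints the manage-data ingest command.
# ASSUMPTION: data_root is the path to your i2a-data directory.
build_ingest_command() {
  local data_root="$1" config_name="$2" data_set="$3" script_name="$4"
  local script_path="${data_root}/${data_set}/scripts/${script_name}"
  if [ ! -f "$script_path" ]; then
    echo "Script not found: ${script_path}" >&2
    return 1
  fi
  # Print rather than run, so that the command can be reviewed first.
  printf 'manage-data -c %s -t ingest -d %s -s %s\n' \
    "$config_name" "$data_set" "$script_name"
}
```

For example, build_ingest_command i2a-data config-development law-enforcement-data-set-1 create-staging-tables prints the command that you then run yourself.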
Creating ingestion sources
The ingestion sources for a config are contained in the configuration. They are defined in the <config_name>/configuration/ingestion/scripts/create-ingestion-sources script.
- Copy the examples/ingestion/scripts/create-ingestion-sources file to the <config_name>/configuration/ingestion/scripts/ directory.
In the script, the INGESTION_SOURCES array contains the name and description of 2 example sources.
INGESTION_SOURCES=(
[Example Ingestion Source 1]=EXAMPLE_1
[Example Ingestion Source 2]=EXAMPLE_2
)
You can modify or add to the array of ingestion sources.
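In the example, each key of INGESTION_SOURCES appears to be a description and each value a source name. The following self-contained sketch shows one way to add a third source and list the entries. It is not a copy of the create-ingestion-sources script: it declares the array with declare -A, which bash (version 4 or later) requires for an array with string keys, and the MY_SOURCE entry and the list_ingestion_sources function are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: an associative array of ingestion sources, keyed by description.
# The third entry ("My Ingestion Source"=MY_SOURCE) is a hypothetical addition.
declare -A INGESTION_SOURCES=(
  ["Example Ingestion Source 1"]=EXAMPLE_1
  ["Example Ingestion Source 2"]=EXAMPLE_2
  ["My Ingestion Source"]=MY_SOURCE
)

# List each source as "<name>: <description>".
# Bash does not guarantee the iteration order of associative arrays.
list_ingestion_sources() {
  local description
  for description in "${!INGESTION_SOURCES[@]}"; do
    printf '%s: %s\n' "${INGESTION_SOURCES[$description]}" "$description"
  done
}
```

Quoting the keys keeps descriptions that contain spaces as single keys.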
To create the ingestion sources in the array, the manage-data command is called as follows:
manage-data -c <config_name> -t sources
Example ingestion process
The i2 Analyze minimal toolkit contains the example law-enforcement-data-set-1 data, which can be ingested when the example law enforcement schema (law-enforcement-schema.xml) is deployed. The data set consists of a number of CSV files that contain the data, and a mapping.xml file. For more information about the mapping file, see Ingestion mapping files.
Before you can ingest the law enforcement example data, complete the following steps to provide the data set and scripts:
- Copy the pre-reqs/i2analyze/toolkit/examples/data/law-enforcement-data-set-1 directory to the i2a-data directory.
- Copy the examples/ingestion/scripts directory to the i2a-data/law-enforcement-data-set-1 directory.
  The directory structure is as follows:
  - i2a-data
    - law-enforcement-data-set-1
      - scripts
        - ingest-law-enforcement-data-set-1
        - create-staging-tables
- Copy the examples/ingestion/scripts/create-ingestion-sources file to the <config_name>/configuration/ingestion/scripts/ directory.
- Use the manage-data command to create the ingestion sources that are defined in the example create-ingestion-sources script.
  For example:
  manage-data -c config-development -t sources
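The copy steps above can be sketched as a single shell function. This is a minimal sketch under stated assumptions: repo_root is the directory that contains both pre-reqs and examples, data_root is your i2a-data directory, and config_dir is the <config_name> directory of your config. The function name prepare_example_data_set is hypothetical, and the sketch does not run manage-data.

```shell
#!/usr/bin/env bash
# Hypothetical helper that performs the copy steps for the example data set.
# ASSUMPTIONS: repo_root contains pre-reqs/ and examples/; data_root is your
# i2a-data directory; config_dir is the <config_name> directory of your config.
prepare_example_data_set() {
  local repo_root="$1" data_root="$2" config_dir="$3"
  local data_set="law-enforcement-data-set-1"
  local sources_dir="${config_dir}/configuration/ingestion/scripts"

  mkdir -p "$data_root" "$sources_dir" || return 1
  # Copy the example data set (CSV files and mapping.xml) into i2a-data.
  cp -R "${repo_root}/pre-reqs/i2analyze/toolkit/examples/data/${data_set}" "${data_root}/" || return 1
  # Copy the example ingestion scripts directory into the data set directory.
  cp -R "${repo_root}/examples/ingestion/scripts" "${data_root}/${data_set}/" || return 1
  # Copy the script that defines the ingestion sources into the config.
  cp "${repo_root}/examples/ingestion/scripts/create-ingestion-sources" "${sources_dir}/" || return 1
}
```

After the copy, you still run manage-data -c config-development -t sources to create the ingestion sources.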
The example scripts separate the creation of the staging tables from the ingestion of data. To ingest the example data into the config-development config, run the following commands:
manage-data -c config-development -t ingest -d law-enforcement-data-set-1 -s create-staging-tables
manage-data -c config-development -t ingest -d law-enforcement-data-set-1 -s ingest-law-enforcement-data-set-1
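Because the staging tables must exist before the data is ingested, you might want to run the two commands together and stop if the first one fails. The following minimal sketch assumes that manage-data exits with a non-zero status when a script fails. The function name ingest_example_data and the MANAGE_DATA variable are hypothetical; MANAGE_DATA lets you point at scripts/manage-data when it is not on your PATH, or at a stub for a dry run.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: runs the two example scripts in order and stops if the
# first one fails, so that ingestion is not attempted without staging tables.
# ASSUMPTION: manage-data returns a non-zero exit status on failure.
# MANAGE_DATA names the command to call; it defaults to manage-data.
ingest_example_data() {
  local config_name="$1"
  local data_set="law-enforcement-data-set-1"
  local runner="${MANAGE_DATA:-manage-data}"
  local script_name
  for script_name in create-staging-tables ingest-law-enforcement-data-set-1; do
    "$runner" -c "$config_name" -t ingest -d "$data_set" -s "$script_name" || return 1
  done
}
```

For example: ingest_example_data config-development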
The manage-data script
The scripts/manage-data script is used to manage data in an environment. It can be used to run scripts that use the ETL toolkit tools, or to remove all data from the Information Store.
The manage-data command provides the following usage and help:
Usage:
manage-data -c <config_name> -t {ingest} -d <data_set> -s <script_name> [-v]
manage-data -c <config_name> -t {sources} [-s <script_name>] [-v]
manage-data -c <config_name> -t {delete} [-v]
manage-data -h
Options:
-c <config_name> Name of the config to use.
-t {delete|ingest|sources} The task to run. Either delete or ingest data, or add ingestion sources. Delete permanently removes all data from the database.
-d <data_set> Name of the data set to ingest.
-s <script_name> Name of the ingestion script file.
-v Verbose output.
-h Display the help.
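Because the delete task permanently removes all data from the database, you might want to add a confirmation step in front of it. The following minimal sketch asks you to type the config name again before it calls manage-data -c <config_name> -t delete. The function name delete_all_data and the MANAGE_DATA variable are hypothetical and are not part of manage-data.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around "manage-data -t delete", which permanently
# removes all data from the database. It asks for the config name to be
# typed again before it calls manage-data.
# MANAGE_DATA names the command to call; it defaults to manage-data.
delete_all_data() {
  local config_name="$1" answer
  printf 'This permanently deletes all data for config "%s". Type the config name to continue: ' "$config_name" >&2
  read -r answer || return 1
  if [ "$answer" != "$config_name" ]; then
    echo "Cancelled." >&2
    return 1
  fi
  "${MANAGE_DATA:-manage-data}" -c "$config_name" -t delete
}
```

For example: delete_all_data config-development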
After you add data to your environment, you can continue to develop the rest of the configuration.