Querying an external data source
In this part of the walkthrough, you will connect to the NYPD Complaint Dataset as your external data source, and marshal the data into entities, links, and properties so that you can return results that can be displayed in Analyst's Notebook.
Again, use the troubleshooting guide if you need to.
Create an app token
To connect to the NYPD dataset, you need an app token that allows you to make unlimited requests (within reason) to Socrata's API. If you don't use an app token, the APIs will throttle by IP address.
Visit this link to sign in or create an account.
Click your name at the top right to access My Profile.
Click Edit Account Settings.
In the side pane, click Developer Settings.
At the bottom of the page, click Create New App Token. Specify your own "Application Name" and "Description", and save.
If you leave the site, you can always retrieve the app token by logging into your account again.
Access the data
To start the process of accessing the data, create a file named data-service.ts alongside the index.ts file in the src folder. This new file will ultimately contain the code for making requests to the data source.
Make the connector configurable
To be able to access and query the NYPD Complaint Dataset, you need to configure your connector with settings for both the URL and the app token.
In config/settings.json, add the socrata_url setting, as follows:
{
  "socrata_url": "https://data.cityofnewyork.us/resource/7x9x-zpz6.json"
}
Unlike the URL, the app token is a secret, and so you should access it from the environment. In this example, the environment variable will be named ENV_SOCRATA_TOKEN.
Within the config/settings.json file, you can add a setting named token and populate its value from the environment.
You can also add a .env file at the root of the connector that loads the value into the Node.js environment when you run the connector locally.
Important: Do not check in or distribute the .env file, because it contains your own secret API token.
For example, the config/settings.json file:
{
  "socrata_url": "https://data.cityofnewyork.us/resource/7x9x-zpz6.json",
  "token": "${env.ENV_SOCRATA_TOKEN}"
}
And the .env file:
ENV_SOCRATA_TOKEN=<YOUR TOKEN>
You can also add a .env.sample file to the connector root that contains information and examples for other developers and deployers. Always use dummy data in this file. For example:
# The token required by the connector to access the data source.
# To generate a token, navigate to https://data.cityofnewyork.us/profile/app_tokens
# Then, from "Developer Settings" run "Create New App Token" and follow the on-screen instructions.
ENV_SOCRATA_TOKEN=abc123
Requests to the data source will require access to the configuration information. In data-service.ts, add an import to bring in the settings namespace from the @i2analyze/i2connect package and assign constants for the configuration values.
import { settings } from '@i2analyze/i2connect';
const baseUrl = settings.getString('socrata_url', true);
const token = settings.getString('token');
See the topic in the Further Materials section for more information about making a connector configurable.
Model the data returned from the data source
Next, you're going to create a TypeScript interface that models an individual complaint from the NYPD dataset. Some data sources might already have type definitions that you could use to model the data returned, so it's worth checking before creating your own. Add the following interface to data-service.ts.
export interface IComplaint {
  /** Complaint number */
  readonly cmplnt_num: string;
  /** Crime status */
  readonly crm_atpt_cptd_cd: string;
  /** Jurisdiction code */
  readonly jurisdiction_code: string;
  /** Offense classification code */
  readonly ky_cd: string;
  /** Level of offense */
  readonly law_cat_cd: string;
  /** Offense description */
  readonly ofns_desc: string;
  /** Precinct code */
  readonly addr_pct_cd: string;
  /** Borough name */
  readonly boro_nm: string;
  /** Coordinates - latitude */
  readonly latitude: string;
  /** Coordinates - longitude */
  readonly longitude: string;
  /** Victim age group */
  readonly vic_age_group: string;
  /** Victim race */
  readonly vic_race: string;
  /** Victim sex */
  readonly vic_sex: string;
  /** Suspect age group */
  readonly susp_age_group: string;
  /** Suspect race */
  readonly susp_race: string;
  /** Suspect sex */
  readonly susp_sex: string;
}
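An interface such as IComplaint only describes the expected shape at compile time; nothing checks the JSON that actually arrives. As an optional illustration, the following sketch shows a runtime check on the one field that the rest of the walkthrough relies on for identifiers. The names IComplaintLike, isComplaintLike, and filterComplaints are hypothetical and are not part of the i2 Connect SDK or of the walkthrough code.

```typescript
// Hypothetical helpers for illustration only.
// A reduced shape is used here so that the example is self-contained;
// the real IComplaint interface has many more fields.
interface IComplaintLike {
  readonly cmplnt_num: string;
  readonly boro_nm?: string;
}

/** Returns true when the value is an object with a string cmplnt_num. */
function isComplaintLike(value: unknown): value is IComplaintLike {
  if (typeof value !== 'object' || value === null) {
    return false;
  }
  const candidate = value as Record<string, unknown>;
  return typeof candidate.cmplnt_num === 'string';
}

/** Keeps only the rows that pass the check; a non-array payload yields an empty list. */
function filterComplaints(payload: unknown): IComplaintLike[] {
  return Array.isArray(payload) ? payload.filter(isComplaintLike) : [];
}
```

Whether to drop, log, or reject rows that fail such a check is a design choice that depends on how much you trust the data source.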
Create a function to query the data source
To make an HTTP request from your connector, this example uses an npm package called node-fetch (alternatives are available). You need to add the node-fetch package along with the @types/node-fetch type definitions to your connector.
npm
npm install node-fetch@2.6.7
npm install -D @types/node-fetch@2.5.12
yarn
yarn add node-fetch@2.6.7
yarn add -D @types/node-fetch@2.5.12
You must then import the installed packages, along with the URL class, at the top of data-service.ts with the following code:
import fetch from 'node-fetch';
import { URL } from 'url';
In the same file, add a function called requestData() that the connector services can use to query the NYPD data source. The input to this function is the set of query parameters for constructing the URL for the query. The function also uses the token and baseUrl constants that you defined earlier.
/**
 * Request some data from the NYPD dataset.
 *
 * @param queryParams - The request object that will be encoded into the query parameters. See https://dev.socrata.com/docs/queries/ for more details.
 */
export async function requestData(queryParams: Record<string, string>): Promise<IComplaint[]> {
  const url = new URL(baseUrl);
  for (const [key, value] of Object.entries(queryParams)) {
    url.searchParams.append(key, value);
  }
  // Append the token value if it exists.
  if (token) {
    url.searchParams.append('$$app_token', token);
  }
  const response = await fetch(url.href);
  if (response.status === 200) {
    return (await response.json()) as IComplaint[];
  } else {
    throw new Error(response.statusText);
  }
}
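To see what requestData() sends without making a network call, the URL-building steps can be isolated. The following is a minimal sketch: buildQueryUrl is a hypothetical helper that mirrors those steps, the token is a dummy value, and the global URL class is assumed to be available (as it is in current Node.js versions) so that the example needs no imports.

```typescript
// Hypothetical helper for illustration only; it mirrors the URL-building
// steps of requestData() but returns the URL instead of fetching it.
function buildQueryUrl(baseUrl: string, queryParams: Record<string, string>, token?: string): URL {
  const url = new URL(baseUrl);
  for (const [key, value] of Object.entries(queryParams)) {
    // searchParams.append takes care of URL-encoding keys and values.
    url.searchParams.append(key, value);
  }
  // Append the token value only if one was provided.
  if (token) {
    url.searchParams.append('$$app_token', token);
  }
  return url;
}

const exampleUrl = buildQueryUrl(
  'https://data.cityofnewyork.us/resource/7x9x-zpz6.json',
  { $limit: '5' },
  'abc123' // Dummy token, as in the .env.sample file.
);

// Reading the parameters back returns the original, unencoded values.
console.log(exampleUrl.searchParams.get('$limit'));
console.log(exampleUrl.href);
```

Printing the href in this way can help when troubleshooting a query, but avoid logging it with a real token.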
Adapting the data
Currently, the NYPD Connector: Get all service creates a Location entity, a Complaint entity, and a Locatedat link, using the specified schema. Now, you'll update the service to query the NYPD data source and adapt the returned data to align with that schema.
First, add the following imports to your result-building.ts file:
import { IComplaint } from './data-service';
import { nypdcomplaintdataschema as schema } from './schema/nypd-complaint-data-schema';
const { Complaint, Location, Person } = schema.entityTypes;
Then, add the functions to adapt the data from the source data model to your schema types.
The code below demonstrates the addition of a Location entity, given a Complaint. As part of adapting the data, you have to:
Convert the addr_pct_cd property from a string to an integer, which is ultimately a number primitive type in JavaScript. This can be done using the parseInt function.
Combine the longitude and latitude into a GeoJSON point (assuming that the longitude and latitude are in the WGS84 spatial reference system) to set the Coordinates property on the Location entity. In doing so, the longitude and latitude need to be parsed as floats using the parseFloat function.
Add the following to result-building.ts:
export function addLocation(datum: IComplaint, result: services.IResult): records.IResultEntityRecord<typeof Location> {
  const locationId = `Borough: ${datum.boro_nm} Precinct: ${datum.addr_pct_cd}`;
  const entity = result.addEntity(Location, locationId);
  entity.setProperties({
    'Precinct Code': parseInt(datum.addr_pct_cd, 10),
    'Borough Name': datum.boro_nm,
    Coordinates: {
      type: 'Point',
      coordinates: [parseFloat(datum.longitude), parseFloat(datum.latitude)],
    },
  });
  entity.setSourceReference(sourceReference);
  return entity;
}
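Both parseInt and parseFloat return NaN when a value is missing or is not numeric, which can happen if a row in the dataset lacks a precinct code or coordinates. The addLocation function above does not guard against that. As an optional sketch, the hypothetical helpers below (toInteger, toPoint, and the IGeoPoint type are not part of the SDK or the walkthrough) return undefined instead of NaN, so that the caller can decide what to do with incomplete rows.

```typescript
// Hypothetical helpers for illustration only.
interface IGeoPoint {
  type: 'Point';
  coordinates: [number, number];
}

/** Parses a base-10 integer, returning undefined instead of NaN. */
function toInteger(value: string | undefined): number | undefined {
  const parsed = parseInt(value ?? '', 10);
  return Number.isNaN(parsed) ? undefined : parsed;
}

/**
 * Builds a GeoJSON point in [longitude, latitude] order,
 * or returns undefined if either value cannot be parsed.
 */
function toPoint(longitude: string | undefined, latitude: string | undefined): IGeoPoint | undefined {
  const lon = parseFloat(longitude ?? '');
  const lat = parseFloat(latitude ?? '');
  if (Number.isNaN(lon) || Number.isNaN(lat)) {
    return undefined;
  }
  return { type: 'Point', coordinates: [lon, lat] };
}
```

How an undefined result is then handled, for example by leaving the property unset, depends on your schema and is not shown here.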
Next, add three similar functions for creating Complaint, Suspect, and Victim entities:
export function addComplaint(
  datum: IComplaint,
  result: services.IResult
): records.IResultEntityRecord<typeof Complaint> {
  const complaintId = `Complaint: ${datum.cmplnt_num}`;
  const entity = result.addEntity(Complaint, complaintId);
  entity.setProperties({
    'Complaint Number': datum.cmplnt_num,
    'Crime Status': datum.crm_atpt_cptd_cd,
    'Jurisdiction Code': parseInt(datum.jurisdiction_code, 10),
    'Offence Classification Code': parseInt(datum.ky_cd, 10),
    'Level Of Offence': datum.law_cat_cd,
    'Offence Description': datum.ofns_desc,
  });
  entity.setSourceReference(sourceReference);
  return entity;
}
export function addSuspect(datum: IComplaint, result: services.IResult): records.IResultEntityRecord<typeof Person> {
  const suspectId = `Suspect: ${datum.cmplnt_num}`;
  const entity = result.addEntity(Person, suspectId);
  entity.setProperties({
    'Age Group': datum.susp_age_group,
    Race: datum.susp_race,
    Sex: datum.susp_sex,
  });
  entity.setSourceReference(sourceReference);
  return entity;
}
export function addVictim(datum: IComplaint, result: services.IResult): records.IResultEntityRecord<typeof Person> {
  const victimId = `Victim: ${datum.cmplnt_num}`;
  const entity = result.addEntity(Person, victimId);
  entity.setProperties({
    'Age Group': datum.vic_age_group,
    Race: datum.vic_race,
    Sex: datum.vic_sex,
  });
  entity.setSourceReference(sourceReference);
  return entity;
}
Update the service to return real data
Now you just need to wire everything together, so that your service returns the data that you retrieve from the NYPD data source.
First, at the top of the src/index.ts file, add the following imports:
import { addComplaint, addLocation, addSuspect, addVictim, addLink } from './result-building';
import { requestData } from './data-service';
Along with the schema link types:
const { Locatedat, Suspectof, Victimof } = schema.linkTypes;
In the service callback, you must first make a request to fetch data using requestData, and then loop through the complaints that you retrieve to build up the result set. For example, update the NYPD Connector: Get all service with the following:
addService(
  {
    id: 'getAll',
    name: 'NYPD Connector: Get all',
    description: 'A service that retrieves all data.',
  },
  async ({ result }) => {
    // Limit the number of rows requested from the NYPD complaint dataset.
    const data = await requestData({ $limit: '100' });
    for (const datum of data) {
      const locationEntity = addLocation(datum, result);
      const complaintEntity = addComplaint(datum, result);
      const suspectEntity = addSuspect(datum, result);
      const victimEntity = addVictim(datum, result);
      addLink(Locatedat, datum.cmplnt_num, complaintEntity, locationEntity, result);
      addLink(Victimof, datum.cmplnt_num, victimEntity, complaintEntity, result);
      addLink(Suspectof, datum.cmplnt_num, suspectEntity, complaintEntity, result);
    }
  }
);
Note: You can query the data source using SoQL (Socrata Query Language), which supports a number of configuration parameters. When executing requestData, set a $limit parameter to restrict the number of records retrieved. It's good to keep this value small to reduce the response time of each request until you are more comfortable with SoQL.
Reload the connector configuration in i2 Analyze
To make the new service available, you must reload the connector so that i2 Analyze picks up the configuration changes. Just like when you deployed the connector for the first time, you can use the Admin Console.
Open a web browser and navigate to https://i2analyze.eia:9443/opal/admin#/connectors.
If you are prompted to log in, enter Jenny and Jenny as the username and password.
Click the Reload gateway button.
Investigate in Analyst's Notebook
Now you can see what happens when you use the connector from the Analyst's Notebook desktop client.
Open Analyst's Notebook and log in when prompted.
Click the External Searches button in the ribbon. Find the query named NYPD Connector: Get all.
Click Open to run the service, which now queries the data source and returns results.
Next steps
Next, you can configure your own parameterized search.