Connect to an external data source
Here, you connect to the NYPD Complaint Dataset as your external data source and marshal its data into entities, links, and properties so that the results can be displayed in Analyst's Notebook.
Again, use the troubleshooting guide if you need to.
Create a Socrata app token
You need an app token that will allow you to make unlimited requests to Socrata's API (within reason). If you don't use an app token, the APIs will throttle by IP address.
Visit this link to register your account.
After you log in, navigate to https://data.cityofnewyork.us/profile/edit/developer_settings.
On subsequent visits, you can reach this page again by following these steps:
Click your name in the top right of the header bar, and then select My Profile.
Click the pencil icon next to the Your Profile heading.
In the side panel, click Developer Settings.
At the bottom of the page, click Create New App Token, specify your own "Application Name" and "Description", and save.
If you leave the site, you can retrieve your app token at any time by logging in to your account again.
Query the external data source
Retrieve the raw data
Java
To apply the app token to the connector application, we can specify it in the application.properties file. Update the application.properties file to include the following properties:

    # Resource URL, for example https://data.cityofnewyork.us/resource/7x9x-zpz6.json
    socrata.url=
    # API Token. Create a Socrata account and create an API Token. Paste it here
    socrata.api.token=

Here you should specify the NYPD Complaint Dataset API resource for the socrata.url key as the URL in the comment, and your Socrata API Token for the socrata.api.token key.
Create a transport folder under the rest\externalsource directory. This will be used to hold the Socrata response classes.

    - rest
      - externalsource
        - transport
        - SocrataClient.java

Under rest\externalsource\transport, create a file called SocrataResponseData.java and populate it with this:

    package com.i2group.demo.rest.externalsource.transport;

    import lombok.ToString;

    /**
     * A POJO (Plain Old Java Object) for response objects from Socrata. These represent the fields in
     * the NYPD database.
     */
    @ToString
    public class SocrataResponseData {
      // You can map properties using the @JsonProperty("field_name") annotation if you want to rename them,
      // or directly refer to them to use the transport names.
      //
      // The Jackson ObjectMapper can deserialize data into appropriate types in many cases.
      // For example:
      // "stringField" : "This is a string" => public String stringField;
      // "dateAndTimeField" :"2018-01-12T00:00:00.000" => public LocalDateTime dateAndTimeField;
      // "numberField" : 1 => public int numberField;
      // "optionalNumberField" : null => public Integer optionalNumberField;

      // TODO: Add response fields to this class.
    }

This class represents a single object from the Socrata data source.
When we make a request to the Socrata data source, the response will contain a list of SocrataResponseData objects. Let's create a class that represents that! In the rest\externalsource\transport directory, create a class called SocrataResponse.java and populate it with this code:

    package com.i2group.demo.rest.externalsource.transport;

    import java.util.ArrayList;

    /** Response class for Socrata */
    public class SocrataResponse extends ArrayList<SocrataResponseData> {}

Now we can start to build the request to the external data source. In the ExternalConnectorDataService.java class, add a constructor to initialize the SocrataClient class with the Socrata values added in the properties file.

    private final SocrataClient socrataClient;

    /**
     * Constructor used to initialize the Socrata client and factory objects used to retrieve complaint data.
     *
     * @param baseUrl The URL of the NYPD complaint dataset.
     * @param apiToken The API token used to access the NYPD complaint dataset.
     */
    @Autowired
    public ExternalConnectorDataService(
        @Value("${socrata.url}") String baseUrl,
        @Value("${socrata.api.token}") String apiToken) {
      socrataClient = new SocrataClient(baseUrl, apiToken);
    }

You'll notice that the @Value annotation was not automatically imported. Resolve this by adding the following to the import section:

    import org.springframework.beans.factory.annotation.Value;

To construct the request to the external source, replace the retrieveTestDataFromExternalSource() method with the following:

    public I2ConnectData retrieveTestDataFromExternalSource() {
      final List<I2ConnectEntityData> entities = new ArrayList<>();
      final List<I2ConnectLinkData> links = new ArrayList<>();

      // TODO: Make request to external source and map data back to your own schema

      // Set some request parameters
      // See https://dev.socrata.com/docs/queries/ for SODA query params
      final Map<String, Object> params = new HashMap<>();
      params.put("limitValue", 1); // Only returning 1 entity for the moment. Increase when ready
      final String url = "?$limit={limitValue}";

      // Make the request and map the whole response body as a string so that you can
      // see what is returned
      // TODO: Remove this since it's just for debugging
      System.out.println(socrataClient.get(url, String.class, params));

      // Make the request, map the response to an object model and print each object
      final SocrataResponse response = socrataClient.get(url, SocrataResponse.class, params);
      response.forEach(
          entry -> {
            // TODO: Map response data to your own object model
            System.out.println("item: " + entry);
          });

      final I2ConnectData connectorResponse = new I2ConnectData();
      connectorResponse.entities = entities;
      connectorResponse.links = links;
      return connectorResponse;
    }

You need to implement the ExternalConnectorDataService and SocrataResponseData classes so that they retrieve data from the NYPD Complaint Dataset and use it to create entities and links to be returned to i2 Analyze. It should not be necessary to modify the example SocrataClient.
The dataset can be queried using SoQL (Socrata Query Language). To do this, in ExternalConnectorDataService.retrieveTestDataFromExternalSource(), you must construct a URL with specified parameters (if necessary) to retrieve the data. By default, a $limit parameter has been set to the value of 1 to restrict the number of records retrieved. It's best to keep this value small to reduce the response time of each request until you are more comfortable with SoQL.
Python
To apply the app token to the connector application, create a file named application.yml under the static directory and add the following contents to the file:

    socrata:
      url: # Resource URL, for example https://data.cityofnewyork.us/resource/7x9x-zpz6.json
      token: # Replace with Socrata API token

Here you should specify the NYPD Complaint Dataset API resource for the socrata.url key as the URL in the comment, and your Socrata API Token for the socrata.token key.
In service.py, add the following imports at the top of the file:

    import yaml
    import requests
    import logging

    logger = logging.getLogger(__name__)

To construct the request to the external source, replace the query_external_datasource() function with the following:

    def query_external_datasource():
        """
        Builds the request URL and queries the external data source using specified parameters.

        Raises:
            FileNotFoundError: If the application.yml file does not exist / could not be found.

        Returns:
            dict: The connector response containing the entities and links.
        """
        entities = []
        links = []

        # TODO: Make request to external source
        with open('static/application.yml') as yml_file:
            config = yaml.safe_load(yml_file)

        base_url = config['socrata']['url']
        api_token = config['socrata']['token']

        if api_token is None:
            # logger.warn is deprecated; logger.warning is the supported call
            logger.warning('apiToken is not specified, requests may be rejected!')

        limit = 1  # Only returning 1 entity for the moment. Increase when ready

        # Set some request parameters
        # See https://dev.socrata.com/docs/queries/ for SODA query params
        request_url = f"{base_url}?$limit={limit}"
        socrata_response = requests.get(request_url, headers={'X-App-Token': api_token})
        records = socrata_response.json()

        for entry in records:
            # TODO: Map response data to your own object model
            # You can remove the following print line as it's only for debugging
            print(entry)

        response = I2ConnectData(
            entities=entities,
            links=links
        )
        return response.to_dict()

You need to implement the query_external_datasource function in service.py so that it retrieves data from the NYPD Complaint Dataset and uses it to create entities and links to be returned to i2 Analyze.
The dataset can be queried using SoQL (Socrata Query Language). To do this, you must construct a URL with specified parameters (if necessary) to retrieve the data. By default, a $limit parameter has been set to the value of 1 to restrict the number of records retrieved. It's best to keep this value small to reduce the response time of each request until you are more comfortable with SoQL.
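As a rough illustration of building such a URL, the sketch below appends SoQL parameters to a resource URL. The helper name build_soql_url, the placeholder resource URL, and the column name boro_nm are assumptions made for the example; check the dataset's own column list before filtering on a field.

```python
from urllib.parse import urlencode


def build_soql_url(base_url, limit=1, where=None, select=None):
    """Return base_url with SoQL parameters appended as a query string."""
    params = {"$limit": limit}
    if where:
        params["$where"] = where
    if select:
        params["$select"] = ",".join(select)
    # safe="$," leaves the "$" in parameter names and the commas in column
    # lists unescaped; everything else is percent-encoded.
    return f"{base_url}?{urlencode(params, safe='$,')}"


# Placeholder URL and column name (assumptions, not taken from the real dataset).
url = build_soql_url(
    "https://example.invalid/resource/abcd-1234.json",
    limit=5,
    where="boro_nm = 'BRONX'",
)
print(url)
```

Letting urlencode do the escaping avoids hand-building query strings, which tends to break as soon as a $where clause contains spaces or quotes.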
(Optional) Verify the data
It's worth testing that you are successfully querying the data and returning results. Print the returned value to the console and check that it matches with the data you see when you make a request to the acquire endpoint via Postman.
Marshal the data to objects
To make it easier to create entities and links using the data retrieved, you can create a class to represent a single row of the dataset. This will have a field for each of the columns of the data. You can then write a function that deserializes the incoming data into a collection of these objects.
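One possible shape for such a row class in Python is sketched below. The class name Complaint, its attribute names, and the record keys (cmplnt_num, ofns_desc, boro_nm, rpt_dt) are assumptions for illustration; use the keys you actually see in the response you printed earlier.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Complaint:
    """One row of the dataset. Field names are illustrative assumptions."""
    complaint_number: str
    offence: Optional[str]
    borough: Optional[str]
    reported_on: Optional[datetime]


def parse_timestamp(value):
    """Parse a timestamp such as 2018-01-12T00:00:00.000, or return None if absent."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%f") if value else None


def marshal(records):
    """Convert decoded JSON records (a list of dicts) into Complaint objects."""
    return [
        Complaint(
            complaint_number=record["cmplnt_num"],  # treated as required
            offence=record.get("ofns_desc"),        # optional columns use .get()
            borough=record.get("boro_nm"),
            reported_on=parse_timestamp(record.get("rpt_dt")),
        )
        for record in records
    ]


sample = [{"cmplnt_num": "123", "ofns_desc": "ROBBERY", "rpt_dt": "2018-01-12T00:00:00.000"}]
complaints = marshal(sample)
print(complaints[0])
```

Using .get() for optional columns means a record that lacks a field produces None rather than a KeyError, which matters because rows in the dataset may omit keys that have no value.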
Note that in Java, there is a library that makes this process much easier: jackson-annotations.
You might want to add source references to the entities and links that are returned by your connector. This allows users to trace the source of the data represented by those entities and links. For information on adding source references, see here.
(Optional) Verify your marshalling function
Test that you are successfully marshalling the data. You should be able to assert against the properties of your object to verify the expected and actual results are equal.
Extract entities and links from objects
You can create entities and links from the objects that represent rows of the dataset and define their properties using the relevant fields.
Implement this extraction, deriving entities from each record and establishing links between them.
Create a response object with a list of entities and links to be returned. Take care not to duplicate entities in the list, and to assign property values in the correct format. Refer to the data model examples if you need to.
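A minimal sketch of deduplicating while extracting is shown below. It uses plain dicts instead of the connector's model classes, and the type IDs, property IDs, and key names (id, typeId, fromEndId, toEndId, properties) are assumptions for illustration; match them to your own schema and model classes.

```python
def extract_entities_and_links(rows):
    """Build entity and link lists from row dicts without duplicating entities."""
    entities = {}  # keyed by ID, so a repeated ID is stored only once
    links = {}
    for row in rows:
        complaint_id = f"COMPLAINT-{row['complaint_number']}"
        entities.setdefault(complaint_id, {
            "id": complaint_id,
            "typeId": "COMPLAINT",  # hypothetical entity type ID
            "properties": {"COMPLAINT_NUMBER": row["complaint_number"]},
        })

        borough = row.get("borough")
        if not borough:
            continue  # no location value, so no location entity or link

        # Deriving the ID from the value makes equal boroughs share one entity.
        location_id = f"LOCATION-{borough}"
        entities.setdefault(location_id, {
            "id": location_id,
            "typeId": "LOCATION",  # hypothetical entity type ID
            "properties": {"BOROUGH": borough},
        })

        link_id = f"{complaint_id}|{location_id}"
        links.setdefault(link_id, {
            "id": link_id,
            "typeId": "LOCATED_AT",  # hypothetical link type ID
            "fromEndId": complaint_id,
            "toEndId": location_id,
            "properties": {},
        })
    return list(entities.values()), list(links.values())


rows = [
    {"complaint_number": "1", "borough": "BRONX"},
    {"complaint_number": "2", "borough": "BRONX"},
    {"complaint_number": "3", "borough": None},
]
entities, links = extract_entities_and_links(rows)
print(len(entities), len(links))
```

Keying the collections by a deterministic ID is what prevents duplicates here: the second BRONX row reuses the existing location entity and only adds a new link.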
(Optional) Validate your return response
Verify that the response returned from your function is valid and is as expected.
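Two checks that are cheap to automate are that no entity ID appears twice and that every link end refers to an entity in the response. The sketch below assumes entities and links are plain dicts with id, fromEndId, and toEndId keys; the function name validate_response is hypothetical, and the key names should be adjusted to your own model.

```python
def validate_response(entities, links):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    seen = set()
    for entity in entities:
        if entity["id"] in seen:
            problems.append(f"duplicate entity id: {entity['id']}")
        seen.add(entity["id"])
    for link in links:
        for end in ("fromEndId", "toEndId"):
            if link[end] not in seen:
                problems.append(f"link {link['id']} has unknown {end}: {link[end]}")
    return problems


good = validate_response(
    [{"id": "A"}, {"id": "B"}],
    [{"id": "L1", "fromEndId": "A", "toEndId": "B"}],
)
bad = validate_response(
    [{"id": "A"}, {"id": "A"}],
    [{"id": "L1", "fromEndId": "A", "toEndId": "C"}],
)
print(good)
print(bad)
```

Returning a list of problems rather than raising on the first one lets you see every issue in a response in a single run.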
View results in Analyst's Notebook
You should now be able to log in to Analyst's Notebook and run your query. If there are any errors, check that your schema is in the right shape, that your data is clean, and that there are no missing values.
Next steps
Next, you can configure your own parameterized search.