<aside> 🙌 Volunteer opportunities for the ETL Pipeline will open up once the initial infrastructure is complete, likely in early September 2022. We intend to create an open source project for the ETL Pipeline and the Classifyr tagging interface, with tasks well suited to volunteer developers.

</aside>

To accommodate a large number of data sources and their disparate formats, an ETL (Extract, Transform, Load) Pipeline is required. This pipeline will be responsible for automatically extracting data from each source, parsing the records into a machine-readable format, transforming them to a common schema, and mapping call types to their APCO standard counterparts (as defined by Classifyr).
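The stages described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the pipeline's actual implementation: the column names, the common schema fields, and the call type codes below are all hypothetical placeholders (they are not real APCO codes), and the load step is left out.

```python
import csv
import io
from typing import Dict, Iterator, List

# Hypothetical per-source field mapping: source column -> common schema field.
FIELD_MAP = {
    "CallType": "call_type",
    "ReceivedAt": "received_at",
    "Priority": "priority",
}

# Hypothetical call type mapping of the kind Classifyr might supply.
# The target values are placeholders, not real APCO codes.
CALL_TYPE_MAP = {
    "TRAFFIC ACC": "EXAMPLE-TRAFFIC-COLLISION",
    "MED ASSIST": "EXAMPLE-MEDICAL",
}


def extract(raw_csv: str) -> Iterator[Dict[str, str]]:
    """Parse raw CSV text into one dict per record."""
    yield from csv.DictReader(io.StringIO(raw_csv))


def transform(record: Dict[str, str]) -> Dict[str, str]:
    """Rename known fields to the common schema and map the call type."""
    common = {
        FIELD_MAP[key]: (value or "").strip()
        for key, value in record.items()
        if key in FIELD_MAP
    }
    source_type = common.get("call_type", "").upper()
    # Unknown call types are flagged rather than dropped.
    common["standard_call_type"] = CALL_TYPE_MAP.get(source_type, "UNMAPPED")
    return common


def run(raw_csv: str) -> List[Dict[str, str]]:
    """Extract and transform every record in the input."""
    return [transform(record) for record in extract(raw_csv)]


SAMPLE = (
    "CallType,ReceivedAt,Priority\n"
    "Traffic Acc,2022-01-01T00:05:00,2\n"
    "Noise,2022-01-01T00:09:00,4\n"
)

for row in run(SAMPLE):
    print(row)
```

Flagging unmapped call types instead of discarding them keeps those records available for later tagging.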

Cost Calculations

Cost estimates assume roughly 1.3 GB per data set per year and 1 year’s worth of data from x data sets, where x depends on the phase:

The data sets that we’ve explored to date come from larger cities, which presumably generate more data than a typical call center. This makes it difficult to determine the actual total data size for the full-scale 8,000 call center milestone. As a result, the numbers used here are based on the data that we have today and are likely to come down.
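The sizing assumption can be written out as simple arithmetic. The sketch below uses only the 1.3 GB per data set year figure from this page. The data set counts of 10 and 100 are made-up examples, and treating the 8,000 call center milestone as 8,000 data sets is an assumption made here for illustration. It estimates storage volume only, not cost.

```python
# Figure taken from this page: estimated size of one data set for one year.
GB_PER_DATA_SET_YEAR = 1.3


def estimated_storage_gb(data_sets: int, years: int = 1) -> float:
    """Return the estimated raw data volume in GB."""
    if data_sets < 0 or years < 0:
        raise ValueError("data_sets and years must be non-negative")
    return GB_PER_DATA_SET_YEAR * data_sets * years


# 10 and 100 are hypothetical phase sizes; 8000 assumes one data set
# per call center at the full-scale milestone.
for count in (10, 100, 8000):
    print(f"{count} data sets: about {estimated_storage_gb(count):,.0f} GB")
```

Because the current figure comes from larger cities, the result for the full-scale case is better read as an upper bound than as a forecast.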

Cost estimates do not include the cost of hosting and operating Classifyr.

** The “Graybike phase” refers to the initial buildout by contractor graybike.com

OpenSearch + Logstash

Logstash is a free and open server-side data processing pipeline that ingests data from a multitude of sources and allows for per-source transformations.

https://lucid.app/publicSegments/view/1661e79e-9e18-4552-a843-4f7b77a89776/image.png

AWS OpenSearch is a managed implementation of OpenSearch (a fork of Elasticsearch) that can be launched quickly, with a range of security controls and integrations with AWS and external tools and services. OpenSearch includes OpenSearch Dashboards (a fork of Kibana), which can be used for quick analysis of data and has plugins available to expand its capabilities.

Using a cluster of Logstash containers, we can both pull data from data sources and accept data through an API. Logstash has a number of options included for data input, transformations, and output, along with a large number of plugins and support for scripted options. This would allow Logstash to communicate directly with Classifyr to get updated field and call type mappings.
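The idea of periodically picking up updated mappings from Classifyr can be illustrated with a small cache that refetches after a time-to-live expires. In practice this logic would live in Logstash plugins or scripted options rather than in Python. The fetch function, the mapping contents, and the five-minute interval below are all hypothetical, and no real Classifyr API is assumed.

```python
import time
from typing import Callable, Dict, Optional


class MappingCache:
    """Holds call type mappings and refetches them once they are stale."""

    def __init__(
        self,
        fetch: Callable[[], Dict[str, str]],
        ttl_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch          # hypothetical call out to Classifyr
        self._ttl = ttl_seconds      # how long a fetched mapping is trusted
        self._clock = clock          # injectable so the cache can be tested
        self._mappings: Dict[str, str] = {}
        self._loaded_at: Optional[float] = None

    def get(self) -> Dict[str, str]:
        """Return current mappings, refreshing them if the TTL has passed."""
        now = self._clock()
        if self._loaded_at is None or now - self._loaded_at >= self._ttl:
            self._mappings = dict(self._fetch())
            self._loaded_at = now
        return self._mappings


def fetch_from_classifyr_stub() -> Dict[str, str]:
    """Stand-in for a real request; returns placeholder data only."""
    return {"TRAFFIC ACC": "EXAMPLE-TRAFFIC-COLLISION"}


cache = MappingCache(fetch_from_classifyr_stub, ttl_seconds=300.0)
print(cache.get())
```

Passing the fetch function in as a parameter keeps the refresh logic independent of however the mappings end up being exposed.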

Community Engagement

Although we can accept information via Classifyr to automate much of the process, volunteers from the network will still be able to contribute Logstash configurations to be managed by a system such as Ansible or Puppet.

Access to OpenSearch Dashboards can be provided using SAML or a user store. Fine-grained access controls allow index, field, and data-level permissions. Additionally, many data science tools can connect to OpenSearch, allowing future R911 partners to use the tools they are most comfortable with.
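As a rough illustration of index, field, and data-level permissions, the sketch below builds a read-only role definition as a Python dictionary. The overall shape follows our understanding of role definitions in the OpenSearch Security plugin and should be checked against its documentation before use. The index pattern, field names, and agency identifier are hypothetical.

```python
import json
from typing import Dict, List


def build_read_only_role(
    index_pattern: str, visible_fields: List[str], agency_id: str
) -> Dict[str, object]:
    """Build a role limited to one index pattern, some fields, some rows."""
    # Data-level restriction: only documents for the given agency.
    # "agency_id" is a hypothetical field in the common schema.
    dls_query = {"term": {"agency_id": agency_id}}
    return {
        "cluster_permissions": [],
        "index_permissions": [
            {
                "index_patterns": [index_pattern],    # index-level
                "dls": json.dumps(dls_query),         # data-level, as a JSON string
                "fls": list(visible_fields),          # field-level
                "masked_fields": [],
                "allowed_actions": ["read"],
            }
        ],
        "tenant_permissions": [],
    }


role = build_read_only_role(
    index_pattern="calls-*",
    visible_fields=["call_type", "standard_call_type", "received_at"],
    agency_id="example-agency",
)
print(json.dumps(role, indent=2))
```

A definition like this could be kept in version control alongside the Logstash configurations so that access changes are reviewed the same way.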

Cost Estimate