IARPA RFI – Tapping Security Cameras for Computer Vision and Facial Recognition Algorithm Training Datasets



April 1, 2019

For computer vision and facial recognition systems to work reliably, they need training datasets that approximate real-world conditions. So far, researchers have had access to only a small number of image datasets, many of which are heavily populated with still pictures of fair-skinned men. This limitation impacts the accuracy of the technology when it comes across types of images it's not familiar with – those of women or people of color, for instance.

Another challenge is the varying quality of the images on video feeds available from surveillance cameras. Often the cameras' scope and angle, as well as the lighting or weather during a given recording, make it difficult for law enforcement to track or re-identify people from security camera footage as they try to reconstruct crimes, protect critical infrastructure and secure special events.

To help solve this problem, the Intelligence Advanced Research Projects Activity has issued a request for information regarding video data that will help improve computer vision research in multicamera networks. IARPA is seeking capability statements for an annotated video collection of 960 hours that includes:

Data collected over multiple days with varying illumination from a network of at least 20 cameras with varying positions, views, resolutions and frame rates that include both overlapping and non-overlapping fields of view.
Data captured over 10,000 sq. meters in urban and semi-urban environments with multiple intersections, building entrances/exits and pedestrian foot traffic as well as signs, vehicles, trees and other obstructions.
Data involving a minimum of 5,000 pedestrians and at least 200 subject volunteers given instructions on how to behave and/or where to go in the camera network.
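
As a rough illustration, the figures listed above can be written down as a simple checklist for comparing a candidate collection against the RFI. This is only a sketch: the class, field and function names are hypothetical, and it assumes that the 960 hours and the 10,000 sq. meters are treated as minimums, which the summary does not state explicitly.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class CollectionSpec:
    """Headline figures of a video collection (hypothetical field names)."""
    video_hours: float
    cameras: int
    area_sq_m: float
    pedestrians: int
    volunteers: int


# Figures taken from the RFI summary; treating all of them as
# minimums is an assumption made for this sketch.
RFI_MINIMUMS = CollectionSpec(
    video_hours=960,
    cameras=20,
    area_sq_m=10_000,
    pedestrians=5_000,
    volunteers=200,
)


def unmet_requirements(candidate, minimums=RFI_MINIMUMS):
    """Return the names of the figures where the candidate falls short."""
    return [
        f.name
        for f in fields(minimums)
        if getattr(candidate, f.name) < getattr(minimums, f.name)
    ]


# A made-up candidate collection, for illustration only.
candidate = CollectionSpec(
    video_hours=1_000, cameras=18, area_sq_m=12_000,
    pedestrians=5_000, volunteers=150,
)
print(unmet_requirements(candidate))  # → ['cameras', 'volunteers']
```

The checklist covers only the numeric figures; qualitative items such as varying illumination or overlapping fields of view would still need manual review.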

Related Links:

IBM Diversity in Faces (DiF) Dataset – DiF is a large and diverse dataset that seeks to advance the study of fairness and accuracy in facial recognition technology. The first of its kind available to the global research community, DiF provides a dataset of annotations of one million human facial images.

Flickr (Multimedia Commons) YFCC100M Core Dataset – YFCC100M is the largest publicly and freely usable multimedia collection, containing the metadata of around 99.2 million photos and 0.8 million videos from Flickr, all of which were shared under one of the various Creative Commons licenses.

RFI Information:

FBO Solicitation: Camera Network Research Data Collection

Solicitation Number: IARPA-RFI-19-06
Agency: Office of the Director of National Intelligence
Office: Intelligence Advanced Research Projects Activity
Response Date: May 10, 2019 12:00 pm Eastern

Synopsis: This RFI seeks capability statements relating to the collection of research data from multi-camera video networks in support of computer vision research. Over the past five years, there have been notable advances in computer vision approaches to facilitate tracking and re-identification of persons in security camera networks. However, the primary datasets available to the research community for algorithm training and performance evaluations, while incredibly valuable, are somewhat limited in subject count, camera network scope, and environmental factors, resulting in a disconnect between the data being leveraged by researchers and the types of video data that would exist in actual video networks utilized by public safety and law enforcement entities. Further research in the area of computer vision within multi-camera video networks may support post-event crime scene reconstruction, protection of critical infrastructure and transportation facilities, military force protection, and the operations of National Special Security Events.