Dataset – IBM Diversity in Faces


April 20, 2019

Diversity in Faces (DiF) is a large and diverse dataset that seeks to advance the study of fairness and accuracy in facial recognition technology. The first of its kind available to the global research community, DiF provides a dataset of annotations of one million human facial images.

How do we measure and ensure diversity for human faces in AI systems?

We are familiar with how faces differ by age, gender, and skin tone, and how different faces can vary across some of these dimensions. But, as prior studies have shown, these dimensions are not adequate for characterizing the full diversity of human faces. Dimensions like face symmetry, facial contrast, the pose of the face, and the length or width of facial features (eyes, nose, forehead, etc.) are also important. For facial recognition systems to perform as desired, and for their outcomes to become increasingly accurate, training data must be diverse and offer a breadth of coverage. For example, the training datasets must be large enough and varied enough that the technology learns all the ways in which faces differ and can accurately recognize those differences in a variety of situations. The images must reflect the distribution of features in faces we see in the world.
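As a rough illustration of what "breadth of coverage" could mean in practice, the sketch below scores how evenly a set of images is spread across the categories of a single annotated attribute. It uses Shannon entropy normalized by its maximum, which is one common choice and is an assumption here; the article does not state which measures are used with DiF. The function names, the age-group labels, and the counts are all hypothetical.

```python
import math
from collections import Counter


def shannon_diversity(labels):
    """Shannon entropy (in nats) of the label distribution."""
    counts = Counter(labels)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def evenness(labels):
    """Entropy divided by its maximum, log(k), for k observed categories.

    A value near 1.0 means the categories are almost equally represented;
    a value near 0.0 means one category dominates.
    """
    k = len(set(labels))
    if k <= 1:
        return 0.0  # a single category carries no diversity
    return shannon_diversity(labels) / math.log(k)


# Hypothetical annotations: one age-group label per image.
balanced = ["young", "middle", "older"] * 100
skewed = ["young"] * 280 + ["middle"] * 15 + ["older"] * 5

print("balanced evenness:", round(evenness(balanced), 3))
print("skewed evenness:", round(evenness(skewed), 3))
```

A score like this covers only one attribute at a time and only the categories that actually appear in the data, so a dataset that omits a group entirely can still look balanced. It is a starting point for comparison, not a full measure of facial diversity.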

To help accelerate the study of diversity and coverage of data for AI facial recognition systems, IBM Research has released a large and diverse dataset called Diversity in Faces (DiF) to advance the study of fairness and accuracy in facial recognition technology.