Live Datasets for Visual AI

People in Public - 175k


The People in Public dataset is a consented, large-scale video dataset of people doing things in public places. Our team has pioneered the use of a custom-designed mobile app that combines video collection, activity labeling, and bounding box annotation into a single step. Our goal is to make collecting annotated video datasets as easy and cheap as recording a video. Currently, we are collecting a dataset of the MEVA classes. This package provides a release of this dataset, containing 184,379 annotated activity instances collected by over 150 subjects in 44 countries around the world.


This dataset contains 184,379 video clips of 66 classes of activities performed by people in public places. The activity labels are subsets of the 37 activities in the Multiview Extended Video with Activities (MEVA) dataset and are consistent with the Activities in Extended Video (ActEV) challenge.


Release summary

This release was curated to export PIP-175k with additional context.


Follow the installation instructions for vipy. We recommend

pip install vipy[all]

to include a fast JSON parser (ujson) for loading large ground truth annotations.
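If you are unsure whether the optional fast parser is available in your environment, you can check with a standard import fallback (a minimal sketch; the `json_backend` name is our own, not part of vipy):

```python
# Check whether the optional fast JSON parser (ujson) is installed,
# falling back to the standard-library json module if it is not.
try:
    import ujson as json_backend
except ImportError:
    import json as json_backend

# Round-trip a small record to confirm the backend works
record = {"activity": "person_opens_door", "startframe": 0}
assert json_backend.loads(json_backend.dumps(record)) == record
```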

Unpack pip_175k.tar.gz in /path/to/, then run Python from /path/to/pip_175k:

import vipy
pip = vipy.util.load('pip_175k.json')  # list of annotated videos


v = pip[0]  # first video in the list
v.play()  # display unannotated video
v.show()  # generate video annotations and show video
v.quicklook().show()   # display video summary image
v[0].savefig().saveas('out.png')  # save the annotated first frame of this video as a PNG
v.tracks()  # track IDs and tracks in this video
v.activities()  # activity IDs and activities in this video
v_doors = [v for v in pip if 'door' in v.category()]  # only videos with door categories
categories = set([v.category() for v in pip])  # set of pip categories
d_pip2meva = vipy.util.load('categories_pip_to_meva.pkl')  # category mapping
d_category_to_counts = vipy.util.countby(pip, lambda v: v.category())
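vipy.util.countby groups a list by a key function and counts each group. For readers without the dataset on hand, an equivalent sketch using only the standard library (the category strings below are hypothetical stand-ins for v.category()):

```python
from collections import Counter

def countby(items, keyfn):
    # Count items grouped by the value of keyfn, like vipy.util.countby
    return dict(Counter(keyfn(x) for x in items))

# hypothetical category labels standing in for [v.category() for v in pip]
categories = ['person_opens_car_door', 'person_closes_car_door',
              'person_sits_down', 'person_opens_car_door']
d_category_to_counts = countby(categories, lambda c: c)
print(d_category_to_counts['person_opens_car_door'])  # 2
```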

Toolchain Exports

v.csv('/path/to/out.csv')  # export annotations for this video as flat CSV file (with header)
pipcsv = [v.csv() for v in pip]  # export all annotations for this dataset as list of tuples
v.json()  # export this annotated video as json 
v.torch()   # export frames as torch tensor
v.numpy()  # export frames as numpy array
framelabels = [(labels, im) for (labels, im) in v.labeled_frames()]  # framewise activity labels for multi-label loss
v.mindim(256).randomcrop( (224,224) ).torch(startframe='random', length=64)   # resize so the minimum dimension is 256 (scaling annotations), take a random 224x224 crop,
                                                                              # and export a 64-frame torch tensor of size 1x64x224x224 from a random start frame
mp4file = v.filename()  # absolute path to the MP4 video file
mp4file_resized = v.resize(cols=256).saveas('resized.mp4').filename()  # absolute path to the resized MP4 video file
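Since v.csv() returns a list of flat tuples per video, the per-video exports can be merged into one CSV file for the whole dataset. A minimal sketch with the standard library, using hypothetical rows and column names in place of the real vipy export schema:

```python
import csv
import io

# hypothetical per-video annotation rows standing in for [v.csv() for v in pip]
pipcsv = [
    [('video_0.mp4', 0, 'person_opens_car_door'),
     ('video_0.mp4', 45, 'person_closes_car_door')],
    [('video_1.mp4', 12, 'person_sits_down')],
]

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(['filename', 'startframe', 'activity'])  # hypothetical header
for rows in pipcsv:
    writer.writerows(rows)  # flatten per-video rows into one file

print(len(buf.getvalue().splitlines()))  # 4 (one header line plus three rows)
```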

If you are training with this dataset, we recommend following the notebook demo in Best Practices for Training to generate framewise activity labels and tensors.

Alternatively, contact us and we can work with you to export a dataset to your specifications that can be imported directly by your toolchain.

PIP Collection Notes


Activity clips include temporal padding before and after each activity; this temporal padding may result in negative start times for some activities.
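When consuming the annotations, negative start times introduced by padding can be clipped to the valid extent of the video. A minimal frame-based sketch (the helper name is our own, not a vipy API):

```python
def clamp_interval(startframe, endframe, numframes):
    # Clip a padded activity interval to the valid frame range [0, numframes].
    # Temporal padding can push startframe below zero.
    return (max(0, startframe), min(endframe, numframes))

print(clamp_interval(-30, 120, 300))  # (0, 120)
```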

Best Practices for Training

Notebook demo [html][ipynb] showing best practices for using the PIP-175k dataset for training.


Creative Commons Attribution 4.0 International (CC BY 4.0)

Every subject in this dataset has consented to have their personally identifiable information shared publicly for the purpose of advancing computer vision research. Non-consented subjects have their faces blurred out.


Supported by the Intelligence Advanced Research Projects Activity (IARPA) via Department of Interior / Interior Business Center (DOI/IBC) contract number D17PC00344. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon. Disclaimer: The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of IARPA, DOI/IBC, or the U.S. Government.


Visym Labs <>