Work Description

Title: PedX Dataset (Open Access)

Methodology
  • Data Acquisition: The PedX dataset contains more than 5,000 pairs of high-resolution stereo images with 2,500 frames of 3D LiDAR point clouds. The cameras and LiDAR sensors are calibrated and time synchronized. Three four-way stop intersections with heavy pedestrian-vehicle interaction were selected for data collection. Cameras were installed on the roof of the car to obtain driver-perspective images. To cover all four crosswalks at an intersection, images were captured by two pairs of stereo cameras: one pair facing forward and the other facing the incoming road from the left. The dataset includes more than 14,000 pedestrian models at distances of 5-45 m from the cameras, and reliable 2D and 3D labels are provided for each instance.

  • Sensor Setup: 4 laser scanners (Velodyne HDL-32E); 4 color cameras, 12 megapixels (Allied Vision Manta G-1236C); 4 lenses, 12 mm (V1228-MPY)

  • Calibration: Camera-to-camera calibration for stereo camera pairs using MATLAB Camera Calibration Toolbox. Camera-to-LiDAR calibration using 50 manually selected constraints between 3D LiDAR points and 2D pixel locations.

  • Scene Selection: Capture sites and times were chosen to maximize traffic, scene complexity, and variation in lighting and weather. Focus on 4-way stop intersections without traffic signals.

  • Data Formatting: Raw 12-bit Bayer images were converted into compressed PNG/JPEG formats. Rectified images are provided.
Description
  • PedX is a large-scale multi-modal collection of pedestrians at complex urban intersections. The dataset provides high-resolution stereo images and LiDAR data with manual 2D and automatic 3D annotations. The data was captured using two pairs of stereo cameras and four Velodyne LiDAR sensors.
Creator
  • Kim, Wonhui; Ramanagopal, Manikandasriram Srinivasan; Barto, Charles; Yu, Ming-Yuan; Rosaen, Karl; Goumas, Nick; Vasudevan, Ram; Johnson-Roberson, Matthew
Depositor
  • ramv@umich.edu
Contact information
  • Ram Vasudevan, ramv@umich.edu
Discipline
Date coverage
  • 2017-11-30 to 2017-12-12
Citations to related material
  • Kim, W., et al., "PedX: Benchmark Dataset for Metric 3D Pose Estimation of Pedestrians in Complex Urban Intersections." https://arxiv.org/abs/1809.03605
Resource type
  • Dataset
Last modified
  • 05/09/2023
Published
  • 05/09/2023
DOI
  • https://doi.org/10.7302/0fv2-nn47
License
To Cite this Work:
Kim, W., Ramanagopal, M. S., Barto, C., Yu, M., Rosaen, K., Goumas, N., Vasudevan, R., Johnson-Roberson, M. (2023). PedX Dataset [Data set], University of Michigan - Deep Blue Data. https://doi.org/10.7302/0fv2-nn47

Relationships

This work is not a member of any user collections.

Files (Count: 4; Size: 262 GB)

Date: 10 Sep 2018

Dataset Title: PedX Dataset

Dataset Creators: Ram Vasudevan, Wonhui Kim, Manikandasriram Srinivasan Ramanagopal, Charles Barto, Ming-Yuan Yu, Karl Rosaen, Nick Goumas and Matthew Johnson-Roberson

Dataset Contact: Ram Vasudevan, ramv@umich.edu

Key Points:

The dataset was collected to analyze pedestrian behavior in complex urban environments.
Multiple sensors, including LiDAR scanners and cameras, were used to capture data at 4-way stop intersections.
Scenes were selected to maximize traffic, complexity, and variation in lighting and weather.
The PedX dataset includes data on both pedestrians and vehicles at intersections, and can be used for research in transportation, computer vision, and machine learning.
Data collection procedures were designed to ensure the safety of all participants.
The dataset includes a variety of data types, including 2D image data, 3D LiDAR data, and corresponding labels for each instance.
The dataset is publicly available for researchers to download and use in their own research.
The research team conducted several experiments and analyses using the PedX dataset, including studies of pedestrian-vehicle interaction and of the accuracy of pedestrian detection algorithms.
Results of research using the PedX dataset can inform the development of autonomous vehicles and pedestrian safety measures in urban environments.

Research Overview:

PedX is a large-scale multi-modal collection of pedestrians at complex urban intersections. The dataset provides high-resolution stereo images and LiDAR data with manual 2D and automatic 3D annotations. The data was captured using two pairs of stereo cameras and four Velodyne LiDAR sensors at three 4-way stop intersections in a downtown area. The selected intersections provided the opportunity to study complex interactions between pedestrians and vehicles under varying lighting and weather conditions.

Methodology:

PedX is a large-scale multi-modal collection of pedestrians at complex urban intersections. The dataset provides high-resolution stereo images and LiDAR data with manual 2D and automatic 3D annotations. The data was captured using two pairs of stereo cameras and four Velodyne LiDAR sensors.

Data Acquisition:

The PedX dataset contains more than 5,000 pairs of high-resolution stereo images with 2,500 frames of 3D LiDAR point clouds. The cameras and LiDAR sensors are calibrated and time synchronized. Three four-way stop intersections with heavy pedestrian-vehicle interaction were selected for data collection. Cameras were installed on the roof of the car to obtain driver-perspective images. To cover all four crosswalks at an intersection, images were captured by two pairs of stereo cameras: one pair facing forward and the other facing the incoming road from the left. The dataset includes more than 14,000 pedestrian models at distances of 5-45 m from the cameras, and reliable 2D and 3D labels are provided for each instance.
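Because the cameras and LiDAR are time synchronized, frames can be associated across modalities using the per-frame timestamp files included with the dataset. The sketch below is an illustration only (not the official pedx tooling): it matches each point cloud to the image frame with the nearest timestamp, and it assumes the timestamps-*.txt files hold one numeric timestamp per line, which should be verified against the actual files.

# Illustrative sketch: pair each LiDAR frame with the nearest image frame by timestamp.
# Assumes one numeric timestamp (seconds) per line in frame order; the real file
# format may differ.
import bisect

def load_timestamps(path):
    # Read one float timestamp per non-empty line (last token on each line).
    with open(path) as f:
        return [float(line.split()[-1]) for line in f if line.strip()]

def nearest_image_frame(lidar_t, image_ts):
    # Return the index of the image timestamp closest to lidar_t (image_ts is sorted).
    i = bisect.bisect_left(image_ts, lidar_t)
    candidates = [j for j in (i - 1, i) if 0 <= j < len(image_ts)]
    return min(candidates, key=lambda j: abs(image_ts[j] - lidar_t))

image_ts = load_timestamps("timestamps/timestamps-images-20171212T2030.txt")
lidar_ts = load_timestamps("timestamps/timestamps-pointclouds-20171212T2030.txt")
pairs = [(k, nearest_image_frame(t, image_ts)) for k, t in enumerate(lidar_ts)]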

Sensor Setup:

4 Laserscanners: Velodyne HDL-32E
4 Color cameras, 12 Megapixels: Allied Vision Manta G-1236C
4 Lenses, 12mm: V1228-MPY

Calibration:

Camera-to-camera calibration for stereo camera pairs using MATLAB Camera Calibration Toolbox.
Camera-to-LiDAR calibration using 50 manually selected constraints between 3D LiDAR points and 2D pixel locations.
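As a generic illustration of how the camera-to-LiDAR calibration is typically used (this is not the dataset's own loader), the sketch below projects LiDAR points into a camera image given an intrinsic matrix K and an extrinsic rotation R and translation t. The numeric values are placeholders; in practice the parameters must be parsed from the calib_cam_to_cam_*.txt and calib_cam_to_range_*.txt files.

# Illustrative only: project Nx3 LiDAR points (range-sensor frame) to pixel coordinates.
import numpy as np

def project_lidar_to_image(points_xyz, K, R, t):
    cam = R @ points_xyz.T + t.reshape(3, 1)   # transform points into the camera frame
    in_front = cam[2] > 0                      # keep only points in front of the camera
    uvw = K @ cam[:, in_front]
    uv = (uvw[:2] / uvw[2]).T                  # perspective divide -> pixel coordinates
    return uv, in_front

# Placeholder calibration values for illustration only.
K = np.array([[7000.0, 0.0, 2048.0],
              [0.0, 7000.0, 1500.0],
              [0.0, 0.0, 1.0]])
R, t = np.eye(3), np.zeros(3)
uv, mask = project_lidar_to_image(np.random.rand(100, 3) * 20.0, K, R, t)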

Scene Selection:

Capture sites and times were chosen to maximize traffic, scene complexity, and variation in lighting and weather.
Focus on 4-way stop intersections without traffic signals.

Data Formatting:

Image compression: raw 12-bit Bayer images were converted into compressed PNG/JPEG formats.
Rectified images provided.
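For reference, a conversion of the kind described above might look like the following minimal sketch. It assumes 12-bit Bayer data stored in a 16-bit container with an RGGB pattern and a hypothetical frame size; the actual raw capture format may differ, and the published dataset already contains the converted images.

# Assumed raw format: 12-bit Bayer samples in a 16-bit container, RGGB pattern.
# The 4112 x 3008 size matches a 12 MP sensor but is an assumption here.
import cv2
import numpy as np

raw = np.fromfile("frame.raw", dtype=np.uint16).reshape(3008, 4112)
raw = raw << 4                                    # scale 12-bit values into the 16-bit range
bgr = cv2.cvtColor(raw, cv2.COLOR_BayerRG2BGR)    # demosaic the Bayer mosaic
cv2.imwrite("frame.png", bgr)                     # 16-bit lossless PNG
cv2.imwrite("frame.jpg", (bgr >> 8).astype(np.uint8),
            [cv2.IMWRITE_JPEG_QUALITY, 95])       # 8-bit compressed JPEG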

Distance: Histograms show distributions of pedestrian distances.

Orientation: Polar histograms display distributions of pedestrian body orientations relative to the world reference frame.

Links to Relevant Sources:
https://deepblue.lib.umich.edu/data/concern/data_sets/6h440s98b?locale=en The University of Michigan Deep Blue Data record.
http://pedx.io/ The PedX website.
https://arxiv.org/pdf/1809.03605.pdf The PedX paper: "PedX: Benchmark Dataset for Metric 3D Pose Estimation of Pedestrians in Complex Urban Intersections."
https://github.com/umautobots/pedx The PedX dataset scripts and loaders.

Utilizing the data:
We provide a demo script to assist in visualizing the data. Follow the steps below to visualize the data once you have downloaded it from the Deep Blue repository; a sketch of the typical edits appears after the list.

1. Obtain the demo scripts from our repository: https://github.com/umautobots/pedx
2. Ensure the Python requirements listed in requirements.txt are installed (pip install -r requirements.txt).
3. Edit the demo.py file and modify the capture_date variable to specify the date.
4. If the data is not in the ./data folder, modify the basedir variable to point to the folder's location.
5. If desired, modify the frame_id variable to set the starting frame.
6. Modify the savedir variable if outputs should be saved somewhere other than the default location.
7. Execute the demo.py script to load and visualize the data from the dataset.
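The variable names below come from the steps above; the values are placeholders for your own setup and should be adjusted to where you extracted the data.

# Typical edits to demo.py (placeholder values).
capture_date = '20171212T2030'   # step 3: one of the three capture dates
basedir = '/path/to/pedx/data'   # step 4: only needed if the data is not in ./data
frame_id = '0001234'             # step 5: starting frame, zero-padded as in the file names
savedir = '/path/to/output'      # step 6: only if saving somewhere other than the default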

Files contained here:

The dataset includes raw and rectified images and point cloud data captured at the selected intersections. The data is organized in folders based on the location and date of capture. Each folder contains a series of image and point cloud files, along with metadata describing the sensor setup and calibration. The folders are divided by sequence. The PedX dataset contains more than 5,000 pairs of high-resolution stereo images with 2,500 frames of 3D LiDAR point clouds.

1. Data contains the rectified images, point clouds, calibrated parameters and frame metadata.

2. All manual/automatic annotations are in data/labels. 2D/3D annotations are provided at the instance level.

3. We provide 3 video sequences captured at different 4-way stop intersections on different dates.

4. Capture dates: 20171130T2000, 20171207T2024, 20171212T2030

a. The cameras are color-coded for convenience.
b. Cameras: ylw79D0, red707B, blu79CF, grn43E3

5. Stereo pairs: ylw79D0-red707B, blu79CF-grn43E3 (left-right camera); see the disparity sketch after this list.

6. We provide a simple Python demo script, demo.py. The pedx package provides Python helper functions to load and visualize the data. We have tested the script with the Python packages listed in requirements.txt.
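Because the stereo pairs listed in item 5 are provided as rectified images, a standard block matcher can produce a disparity map directly. The sketch below uses OpenCV's semi-global matcher; the file names and matcher settings are placeholders, not part of the official PedX tooling.

# Illustrative only: compute a disparity map from one rectified stereo pair.
import cv2

left = cv2.imread("20171212T2030_blu79CF_0001234.jpg", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("20171212T2030_grn43E3_0001234.jpg", cv2.IMREAD_GRAYSCALE)

matcher = cv2.StereoSGBM_create(minDisparity=0, numDisparities=128, blockSize=7)
disparity = matcher.compute(left, right).astype("float32") / 16.0  # SGBM returns fixed-point values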

The data tree for this dataset should look like this when fully extracted onto your system.

pedx/
├── pedx/
└── data/
    ├── images/
    │   ├── 20171130T2000/
    │   ├── 20171207T2024/
    │   └── 20171212T2030/
    │       ├── ylw79D0/
    │       ├── red707B/
    │       ├── blu79CF/
    │       └── grn43E3/
    │           └── 20171212T2030_grn43E3_0001234.jpg
    ├── pointclouds/
    │   ├── 20171130T2000/
    │   ├── 20171207T2024/
    │   └── 20171212T2030/
    │       └── 20171212T2030_0001234.ply
    ├── labels/
    │   ├── 2d/
    │   │   ├── 20171130T2000/
    │   │   ├── 20171207T2024/
    │   │   └── 20171212T2030/
    │   └── 3d/
    │       ├── smpl/
    │       │   ├── 20171130T2000/
    │       │   ├── 20171207T2024/
    │       │   └── 20171212T2030/
    │       └── segment/
    │           ├── 20171130T2000/
    │           ├── 20171207T2024/
    │           └── 20171212T2030/
    ├── calib/
    │   ├── calib_cam_to_cam_blu79CF-grn43E3.txt
    │   ├── calib_cam_to_cam_blu79CF-red707B.txt
    │   ├── calib_cam_to_range_blu79CF.txt
    │   └── calib_cam_to_range_ylw79D0.txt
    └── timestamps/
        ├── timestamps-images-20171130T2000.txt
        ├── timestamps-images-20171207T2024.txt
        ├── timestamps-images-20171212T2030.txt
        ├── timestamps-pointclouds-20171130T2000.txt
        ├── timestamps-pointclouds-20171207T2024.txt
        └── timestamps-pointclouds-20171212T2030.txt
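A small path helper based on the layout and naming pattern shown above (<date>_<camera>_<frame>.jpg for images, <date>_<frame>.ply for point clouds). It mirrors the tree for illustration; the official loaders live in the pedx package on GitHub.

# Build file paths from the directory layout above (illustrative, not the pedx loader).
import os

def image_path(datadir, date, camera, frame_id):
    name = "{}_{}_{:07d}.jpg".format(date, camera, frame_id)
    return os.path.join(datadir, "images", date, camera, name)

def pointcloud_path(datadir, date, frame_id):
    name = "{}_{:07d}.ply".format(date, frame_id)
    return os.path.join(datadir, "pointclouds", date, name)

print(image_path("pedx/data", "20171212T2030", "grn43E3", 1234))
# -> pedx/data/images/20171212T2030/grn43E3/20171212T2030_grn43E3_0001234.jpg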

Related publication(s):

Kim, W., Ramanagopal, M. S., Barto, C., Yu, M.-Y., Rosaen, K., Goumas, N., Vasudevan, R., and Johnson-Roberson, M., "PedX: Benchmark Dataset for Metric 3D Pose Estimation of Pedestrians in Complex Urban Intersections," IEEE Robotics and Automation Letters, 2019. https://arxiv.org/abs/1809.03605

Use and Access:

A simple Python demo script, demo.py, is provided to show usage. The pedx package provides Python helper functions to load and visualize the data. We have tested the script with the Python packages listed in requirements.txt. These can be obtained from the dataset's GitHub repository: https://github.com/umautobots/pedx

To Cite Data:

Vasudevan, R., Kim, W., Srinivasan Ramanagopal, M., Barto, C., Yu, M.-Y., Rosaen, K., Goumas, N., and Johnson-Roberson, M. (2018). Pedestrian Behavior Dataset from Complex Urban Scenes [Data set]. University of Michigan - Deep Blue. https://deepblue.lib.umich.edu/data/concern/data_sets/6h440s98b?locale=en (see also http://pedx.io/ and https://arxiv.org/abs/1809.03605)


Total work file size of 262 GB is too large to download directly. Consider using Globus (see below).

Globus is the platform Deep Blue Data uses to make large data sets (greater than 3 GB) available.
