Making sense of 3D data. Nico Blodow (blodow@cs.tum.edu), Intelligent Autonomous Systems, TUM, Germany. PowerPoint presentation, June 14, 2012.



SLIDE 1

Making sense of 3D data

Nico Blodow blodow@cs.tum.edu Intelligent Autonomous Systems, TUM, Germany June 14, 2012

SLIDE 2

Motivation

Central question in many 3D perception applications:

How can we – at all times – know what is going on around us?

SLIDE 3

Motivation

Central question in many 3D perception applications:

How can we – at all times – know what is going on around us?

Focus of my work:

Dynamic Scene Perception and Spatio-temporal Memory for Robot Manipulation

SLIDE 4

Motivation

In service robotics especially, we have little to no control over the environment:

Wide range of objects:

  • textured, non-textured
  • 3D objects, flat objects (cutlery, paper, ...)
  • indistinguishable objects (12 identical cups)
  • state of objects (my cup, empty/full milk carton, ...)
  • clutter, occlusions
SLIDE 5

Motivation

In service robotics especially, we have little to no control over the environment:

Wide range of object locations:

  • table top
  • containers (cupboards, drawers. . . )
  • fridge
SLIDE 6

Motivation

In service robotics especially, we have little to no control over the environment:

Other problems:

  • humans interfere with task / objects
  • large universe of objects
  • ever changing universe of objects
  • lighting
  • . . .
SLIDE 7

Motivation

Many approaches:

  • environment mapping, room / furniture classification
  • table extents and positions, object catalog, container contents
  • object detection, reconstruction and classification
  • object identity resolution, tracking, etc.

Key Challenges

  • data throughput
  • dynamic environments
  • humans
  • hard constraints on processing times

This means: we need fast as well as general algorithms

SLIDE 8

Outline

1 GPU-Accelerated depth image processing

pcl::cuda Kinect Results

2 Point Cloud Compression

Octree Octree-based PC Compression Detail Component Compression

3 Unstructured Information Management Architecture

Next Best View Room and furniture mapping

SLIDE 9

Past

Current strategies for optimization:

  • downsampling = much less data
  • spatial locators / tree structures
  • ignoring some problems (online processing, humans)
  • reordering points for cache optimization
  • "framedropping" / using slow scanners – problem: Kinect

While these are all good and valid strategies (we can reach processing speeds in the range of seconds), our target is < 30 ms.

The Kinect produces VGA × 5 bytes @ 30 Hz = 44 MB/s! ⇒ GPGPU programming
SLIDE 10

pcl::cuda

  • focus on real-time point cloud processing
  • implemented in thrust, a CUDA template library similar to the STL
  • biggest problem: data transfer between Host (= CPU) and Device (= GPU)
  • therefore: all algorithms should be implemented on the GPU to minimize performance hits
  • input data: Kinect Bayer image + depth image (or of course anything from pcl::io)
SLIDE 11

pcl::cuda

  • pcl::cuda::io: deals with IO, projection of depth data to 3D, GPU memory transfer methods, Kinect "dediscretization", subcloud extraction, etc.
  • pcl::cuda::nn: neighborhood search; depth-image-based neighborhood search
  • pcl::cuda::features: infrastructure for feature estimation, several implementations for normal estimation
  • pcl::cuda::sampleconsensus: robust estimation techniques and models; RANSAC and (novel, parallel) MultiSAC estimators, novel optimized plane estimator
SLIDE 12

pcl::gpu

  • itseez’s reimplementation in “pure” CUDA
  • kinfu — Kinect Fusion reimplementation
  • features — normals, spin images, PFH, FPFH, VFH etc.
  • octree search structures
SLIDE 13

Improving Kinect Data

e.g. Wall (top down view)

SLIDE 14

Improving Kinect Data

Kinect data discretized in disparity
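A small sketch of why this matters (baseline and focal length are typical illustrative values, not from the slides): depth is inversely proportional to disparity, so one integer disparity step corresponds to a depth step that grows roughly quadratically with distance.

```python
# Depth from disparity: z = f * b / d. With disparity quantized to integer
# steps, the induced depth quantization step grows with distance.
# f (focal length in pixels) and b (baseline in meters) are assumed values.
f_px = 580.0
b_m = 0.075

def depth(disparity_px):
    return f_px * b_m / disparity_px

near_step = depth(100) - depth(101)  # depth step at ~0.44 m
far_step = depth(10) - depth(11)     # depth step at ~4.3 m
print(round(near_step, 4), round(far_step, 4))
```

At a few meters the quantization step reaches tens of centimeters, which is why flat walls appear as discrete depth "slices" in the top-down views.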

SLIDE 15

Improving Kinect Data

Normal estimation (and all other feature computations) will have errors

SLIDE 16

Improving Kinect Data

If we knew the true geometry, we could compute whether the measured (red) point could have been sampled from that surface (purple point)

SLIDE 17

Improving Kinect Data

We don’t know the model (except for e.g. in RANSAC), but we can assume it to be smooth

SLIDE 18

Improving Kinect Data

→ same parameters:

SLIDE 19

MultiSAC plane estimation

  • Replace the 3-point sample for plane estimation with 1 point + (smooth/oversmooth) normal
  • leads to a lower number of iterations k = log(1 − p) / log(1 − (1 − ε)^s)

  1 create a batch of plane hypotheses on the GPU by sampling 1 point each
  2 iterate (CPU) over k plane hypotheses, computing inliers on the GPU
  3 after accepting a model, each model created from one of its inliers can be invalidated easily
  4 compare the plane equations of the accepted model with all other valid models; only recompute inliers when necessary
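The iteration-count formula can be evaluated directly; a minimal sketch (confidence p and outlier ratio ε are chosen for illustration):

```python
import math

def ransac_iterations(p=0.99, eps=0.5, s=3):
    """k = log(1 - p) / log(1 - (1 - eps)^s): iterations needed to draw
    at least one all-inlier sample with confidence p at outlier ratio eps,
    where s is the minimal sample size."""
    return math.ceil(math.log(1 - p) / math.log(1 - (1 - eps) ** s))

# 3-point plane samples vs. 1 point + normal:
print(ransac_iterations(s=3))  # 35
print(ransac_iterations(s=1))  # 7
```

Shrinking the sample size from 3 to 1 cuts the required iterations by roughly a factor of five at these settings, which is the point of the 1-point + normal parameterization.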

SLIDE 20

Performance on NVIDIA GTX 560

  • CUDA yields remarkable speedup for highly parallel tasks

Example:

  • openni_camera driver in ROS: 70% CPU usage
  • OpenNIGrabber in PCL: 30% CPU usage
  • Our Solution: 3% CPU usage, 3% GPU usage.

                                      CPU (OpenMP)      CUDA
    Disparity to Cloud + smoothing    25 − 35 ms        2 − 2.5 ms
    Normal Estimation                 250 − 1000 ms     0.5 ms
    Fast Normal Estimation            2.5 − 3.5 ms      < 0.15 ms
    Surface Orientation Segmentation  1 s               ≈ 100 ms
    Multiple Plane Estimation         > 10 s¹           50 − 200 ms

¹ possibly much longer
SLIDE 21

Using the semantic map in perception

SLIDE 22

Harnessing OpenGL + CUDA interoperability

Using semantic maps for real time semantic segmentation

Figure panels: normal space, depth image, and mask from the sensor's point of view (< 1 ms); semantic map (normal space), distances between Kinect data and semantic map, distances filtered (≈ 1 ms).
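The filtering step can be pictured as a per-pixel distance threshold between the live depth image and the depth image rendered from the semantic map (pure-Python sketch; the threshold value is an assumption, and the real pipeline runs this on the GPU via the OpenGL/CUDA interop described above):

```python
# Per-pixel comparison of measured depth against the rendered semantic map.
# Pixels deviating by more than tau are "not explained by the map",
# i.e. candidate foreground/objects.
def segment(measured, rendered, tau=0.02):
    return [abs(m - r) > tau for m, r in zip(measured, rendered)]

measured = [1.00, 1.01, 0.80, 1.015]  # live Kinect depth (m)
rendered = [1.00, 1.00, 1.00, 1.00]   # depth rendered from the semantic map (m)
print(segment(measured, rendered))    # [False, False, True, False]
```

Only the pixel 20 cm in front of the mapped wall survives the filter, so downstream processing touches a small fraction of the frame.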

SLIDE 23

Outline

1 GPU-Accelerated depth image processing

pcl::cuda Kinect Results

2 Point Cloud Compression

Octree Octree-based PC Compression Detail Component Compression

3 Unstructured Information Management Architecture

Next Best View Room and furniture mapping

SLIDE 24

Motivation

[Diagram: Point Cloud Stream → Compression → Network → Decompression → Point Cloud Stream]

Goals/Motivation:

  • Efficient for real-time processing
  • General compression approach for unstructured point clouds (varying size, resolution, density, point ordering)

  • Exploit spatial sparseness of point clouds
  • Exploit temporal redundancies in point cloud streams
  • Keep introduced coding distortion below sensor noise

(Work with Julius Kammerl)

SLIDE 25

Background

(image © NVIDIA Research)

  • Hierarchical tree data structures can efficiently describe sparse 3D information
  • The focus on real-time compression favors an octree-based point cloud compression approach
  • Octree structures enable fast spatial decomposition
SLIDE 26

Octree-based Encoding

Serialized Octree: 00000100 01000001 00011000 00100000 00000100 01000001 00011000 00100000

  • The root node describes a cubic bounding box which encapsulates all points
  • Child nodes recursively subdivide the point space
  • Nodes have up to eight children ⇒ byte encoding
  • Point encoding by serializing high-resolution octree structures!
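The byte encoding of a node can be sketched as one occupancy bit per child (the bit ordering is an assumption; the example values happen to match the first bytes shown above):

```python
# One byte per octree node: bit i is set iff child i contains points.
def child_byte(occupied_children):
    b = 0
    for i in occupied_children:
        b |= 1 << i
    return b

print(format(child_byte({2}), "08b"))     # 00000100
print(format(child_byte({0, 6}), "08b"))  # 01000001
```

Serializing these bytes in a fixed traversal order reproduces the structure exactly, so the decoder can rebuild the tree (and thus the voxelized points) from the byte stream alone.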
SLIDE 27

Temporal Encoding

Temporally adjacent point clouds often correlate strongly:

Serialized Octree A: 00000100 01000001 00011000 00100000 00000100 01000001 00011000 00100000 00000100 01000010 00011000 00000010
Serialized Octree B: 00000100 01000010 00011000 00000010

Differentially encode octree structures using XOR:

XOR Encoded Octree B: 00000000 00000011 00000000 00000010

  • Gain: reduced entropy of the serialized binary data!
  • Compression using a fast range coder (a fixed-point version of an arithmetic entropy coder)
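The XOR step itself is tiny; a sketch with made-up byte values (decoding works because XOR is its own inverse):

```python
# Differential encoding of two serialized octrees: identical subtrees
# XOR to zero bytes, which the entropy coder then compresses very well.
def xor_encode(frame_a: bytes, frame_b: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(frame_a, frame_b))

a = bytes([0b00000100, 0b01000001, 0b00011000, 0b00100000])
b = bytes([0b00000100, 0b01000010, 0b00011000, 0b00100010])

diff = xor_encode(a, b)
print([format(x, "08b") for x in diff])
# ['00000000', '00000011', '00000000', '00000010']

assert xor_encode(a, diff) == b  # decoding = XOR with the previous frame
```

Nonzero bytes mark exactly the nodes whose child occupancy changed between frames, which is also what the change-detection demo below exploits.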

SLIDE 28

Results

  • Experimental results of octree-based point cloud compression
SLIDE 29

Results

  • Data rate comparison between regular octree compression (gray) and differential octree compression (black) at 1 mm³ resolution
SLIDE 30

Demo - change detection

Applications:

  • 3D (not just 2.5D!) video streaming
  • Real-time spatial change detection based on XOR comparison of the octree structure
SLIDE 31

Detail encoding

  • Challenge: with increased octree resolution, complexity grows exponentially
  • Solution: limit the octree resolution and encode point detail coefficients
  • Enables a trade-off between complexity and compression performance
  • Also applicable to point components (color, normals, etc.)
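One way to picture the detail coefficients (1D sketch; the voxel size and quantization step are assumptions): store the coarse voxel index plus a quantized offset from the voxel center.

```python
# Encode a coordinate as (voxel index, quantized detail coefficient);
# voxel_size bounds the octree resolution, step bounds the point precision.
def encode(x, voxel_size=0.009, step=0.001):
    v = int(x // voxel_size)
    center = (v + 0.5) * voxel_size
    return v, round((x - center) / step)

def decode(v, detail, voxel_size=0.009, step=0.001):
    return (v + 0.5) * voxel_size + detail * step

v, d = encode(0.0123)
print(v, d, decode(v, d))  # reconstruction error is at most step / 2
```

Coarsening the octree shrinks the expensive tree serialization while the cheap per-point coefficients preserve precision, which is exactly the trade-off the slide describes.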
slide-32
SLIDE 32

Compression Pipeline

Encoding Pipeline: Point Cloud → Octree Structure → Point Component Encoding (Voxel Avg. + Detail Coefficients) → Point Detail Encoding (Position Detail Coefficients) → Entropy Encoding → Binary Serialization → Compressed PC

Decoding Pipeline: Compressed PC → Entropy Decoding → Point Detail Decoding → Point Component Decoding → Octree Structure → Point Cloud

SLIDE 33

Results

  • Experimental results for point detail encoding at an octree resolution of 9 mm³
  • Enables fast real-time encoding with high point precision
  • Constant run-time with Octree + Point Detail Coding
  • (Byproducts: octree-based search operations, downsampling, point density analysis, change detection, occupancy maps)
SLIDE 34

Demo - point cloud compression

  • Point cloud compression demo
SLIDE 35

Outline

1 GPU-Accelerated depth image processing

pcl::cuda Kinect Results

2 Point Cloud Compression

Octree Octree-based PC Compression Detail Component Compression

3 Unstructured Information Management Architecture

Next Best View Room and furniture mapping

SLIDE 36

Plugging everything together

SLIDE 37

Dealing with multimodal data

Robust robot systems have to solve really hard tasks:

  • harness various state-of-the-art methods from different fields
  • data sources (images, point clouds, web stores, ...) and data formats can be very specialized
  • UIMA (IBM Watson) is designed to deal with unstructured information created by different experts
  • → can this be transferred from the text domain to 2D/3D data and ontological knowledge?
SLIDE 38

UIMA — Concepts (1/4)

(image courtesy of http://uima.apache.org)

  • Unstructured information processing and management
  • The Common Analysis Structure (CAS) holds the original raw data and can hold multiple Subjects of Analysis (SOFAs), multiple Views, and Annotations
  • Collection Readers: generate a CAS with the initial document (web scrapers, sensor streams, ontologies, etc.)
SLIDE 39

UIMA — Concepts (2/4)

(image courtesy of http://uima.apache.org)

  • Analysis Engines (primitive or aggregate) can enrich the CAS with Annotations on the different Views/SOFAs
  • Annotations are stored in Index Repositories for access and search
  • All data structures adhere to a language-agnostic hierarchical type system
SLIDE 40

UIMA — Concepts (3/4)

(image courtesy of http://uima.apache.org)

  • Analysis Engines (= ensembles of experts) are controlled by flow controllers → (task/scene/resource adaptation, ...)
  • CAS Consumers process the annotated CAS for storage or reuse (e.g., the interface to KnowRob!)
  • A MongoDB database collects long-term memory, object models, environment models, etc.
SLIDE 41

UIMA — Concepts (4/4)

(image courtesy of http://uima.apache.org)

  • Collection Engines can be spawned from Java, C++, or the command line from a CPE Descriptor
  • The CPM is responsible for CAS management, failure recovery, and scale-out
SLIDE 42

UIMA — Type System for Perception

  • subtypes of Identifiable: storable in MongoDB
  • a Scene consists of Clusters taken at a certain time
  • Clusters contain a point cloud and a list of Annotations
  • "semantic" type hierarchy
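A hypothetical sketch of that type hierarchy (field names and structure are assumptions based on the slide, not the actual IAS-UIMA types):

```python
# Everything derives from Identifiable so it can be stored in MongoDB;
# a Scene groups the Clusters segmented at one point in time.
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Identifiable:
    doc_id: str = ""  # MongoDB document id

@dataclass
class Annotation(Identifiable):
    kind: str = ""    # e.g. a feature, plane, or Goggles annotation

@dataclass
class Cluster(Identifiable):
    cloud: List[Tuple[float, float, float]] = field(default_factory=list)
    annotations: List[Annotation] = field(default_factory=list)

@dataclass
class Scene(Identifiable):
    timestamp: float = 0.0
    clusters: List[Cluster] = field(default_factory=list)

scene = Scene(timestamp=1339632000.0,
              clusters=[Cluster(annotations=[Annotation(kind="plane")])])
print(scene.clusters[0].annotations[0].kind)  # plane
```

Because every type descends from Identifiable, any Scene, Cluster, or Annotation can be persisted and later retrieved by id, which is what makes the spatio-temporal memory queryable.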

SLIDE 44

Example: Robot Google Goggles

  • 3D sensors can effectively segment object hypotheses (but the resolution is too low)
  • leverage Google Goggles' multimodal image analysis
  • the corresponding (higher-res) camera image region is uploaded to Goggles
  • Response: a list of things (of various types) detected in the object image
SLIDE 45

Example: Robot Google Goggles

SLIDE 46

Example: Robot Google Goggles

Goggles response types: User Submitted, Logo / Brand, Similar Image (+ original site), Text + Translation, Product Link (+ Barcodes, Landmarks, etc.)

Segmentation / Clustering: Kinect-based Tabletop Object Detection

SLIDE 47

Current IAS-UIMA Capabilities

  • German-Deli scraping
  • Kinect/PCD File input
  • ROS → UIMA interface
  • GPU-based normal estimation, clustering
  • per-cluster feature estimation (all PCL features)
  • per-cluster Goggles annotation
  • storage in MongoDB database
SLIDE 48

Outlook and Future Work

  • Further integration of UIMA and previous perception systems (COP)
  • entity resolution framework (going from clusters/classification to object instances)
  • Integration of / interfacing to KnowRob, CRAM
  • leverage Machine Learning for control flow, ensemble methods
SLIDE 49

Peer-reviewed Conference Publications (1/2)

[1] Real-time Compression of Point Cloud Streams (Julius Kammerl, Nico Blodow, Radu Bogdan Rusu, Suat Gedikli, Michael Beetz, Eckehard Steinbach), In IEEE International Conference on Robotics and Automation (ICRA), 2012.
[2] Autonomous Semantic Mapping for Robots Performing Everyday Manipulation Tasks in Kitchen Environments (Nico Blodow, Lucian Cosmin Goron, Zoltan-Csaba Marton, Dejan Pangercic, Thomas Rühr, Moritz Tenorth, Michael Beetz), In 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2011.
[3] General 3D Modelling of Novel Objects from a Single View (Zoltan-Csaba Marton, Dejan Pangercic, Nico Blodow, Jonathan Kleinehellefort, Michael Beetz), In 2010 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2010.
[4] Perception and Probabilistic Anchoring for Dynamic World State Logging (Nico Blodow, Dominik Jain, Zoltan-Csaba Marton, Michael Beetz), In 10th IEEE-RAS International Conference on Humanoid Robots, 2010.
[5] Model-based and Learned Semantic Object Labeling in 3D Point Cloud Maps of Kitchen Environments (Radu Bogdan Rusu, Zoltan Csaba Marton, Nico Blodow, Andreas Holzbach, Michael Beetz), In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2009.
[6] Fast Geometric Point Labeling using Conditional Random Fields (Radu Bogdan Rusu, Andreas Holzbach, Nico Blodow, Michael Beetz), In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2009.
[7] Close-range Scene Segmentation and Reconstruction of 3D Point Cloud Maps for Mobile Manipulation in Human Environments (Radu Bogdan Rusu, Nico Blodow, Zoltan Csaba Marton, Michael Beetz), In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2009.
[8] Fast Point Feature Histograms (FPFH) for 3D Registration (Radu Bogdan Rusu, Nico Blodow, Michael Beetz), In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Kobe, Japan, May 12-17, 2009.
[9] Partial View Modeling and Validation in 3D Laser Scans for Grasping (Nico Blodow, Radu Bogdan Rusu, Zoltan Csaba Marton, Michael Beetz), In 9th IEEE-RAS International Conference on Humanoid Robots (Humanoids), 2009.
[10] The Assistive Kitchen – A Demonstration Scenario for Cognitive Technical Systems (Michael Beetz, Freek Stulp, Bernd Radig, Jan Bandouch, Nico Blodow, Mihai Dolha, Andreas Fedrizzi, Dominik Jain, Uli Klank, Ingo Kresse, Alexis Maldonado, Zoltan Marton, Lorenz Mösenlechner, Federico Ruiz, Radu Bogdan Rusu, Moritz Tenorth), In IEEE 17th International Symposium on Robot and Human Interactive Communication (RO-MAN), Muenchen, Germany, 2008. (Invited paper.)

SLIDE 50

Peer-reviewed Conference Publications (2/2)

[11] Action Recognition in Intelligent Environments using Point Cloud Features Extracted from Silhouette Sequences (Radu Bogdan Rusu, Jan Bandouch, Zoltan Csaba Marton, Nico Blodow, Michael Beetz), In IEEE 17th International Symposium on Robot and Human Interactive Communication (RO-MAN), Muenchen, Germany, 2008.
[12] Functional Object Mapping of Kitchen Environments (Radu Bogdan Rusu, Zoltan Csaba Marton, Nico Blodow, Mihai Emanuel Dolha, Michael Beetz), In Proceedings of the 21st IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Nice, France, September 22-26, 2008.
[13] Aligning Point Cloud Views using Persistent Feature Histograms (Radu Bogdan Rusu, Nico Blodow, Zoltan Csaba Marton, Michael Beetz), In Proceedings of the 21st IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Nice, France, September 22-26, 2008.
[14] Learning Informative Point Classes for the Acquisition of Object Model Maps (Radu Bogdan Rusu, Zoltan Csaba Marton, Nico Blodow, Michael Beetz), In Proceedings of the 10th International Conference on Control, Automation, Robotics and Vision (ICARCV), Hanoi, Vietnam, December 17-20, 2008.
[15] Persistent Point Feature Histograms for 3D Point Clouds (Radu Bogdan Rusu, Zoltan Csaba Marton, Nico Blodow, Michael Beetz), In Proceedings of the 10th International Conference on Intelligent Autonomous Systems (IAS-10), Baden-Baden, Germany, 2008.
[16] Towards 3D Object Maps for Autonomous Household Robots (Radu Bogdan Rusu, Nico Blodow, Zoltan-Csaba Marton, Alina Soos, Michael Beetz), In Proceedings of the 20th IEEE International Conference on Intelligent Robots and Systems (IROS), 2007.

SLIDE 51

Journal and Workshop Articles

Journal Articles

[17] Combined 2D-3D Categorization and Classification for Multimodal Perception Systems (Zoltan Csaba Marton, Dejan Pangercic, Nico Blodow, Michael Beetz), In The International Journal of Robotics Research, Sage Publications, 2011.
[18] Towards 3D Point Cloud Based Object Maps for Household Environments (Radu Bogdan Rusu, Zoltan Csaba Marton, Nico Blodow, Mihai Dolha, Michael Beetz), In Robotics and Autonomous Systems Journal (Special Issue on Semantic Knowledge in Robotics), volume 56, 2008.

Workshop Papers

[19] Inferring Generalized Pick-and-Place Tasks from Pointing Gestures (Nico Blodow, Zoltan-Csaba Marton, Dejan Pangercic, Thomas Ruehr, Moritz Tenorth, Michael Beetz), In IEEE International Conference on Robotics and Automation (ICRA), Workshop on Semantic Perception, Mapping and Exploration, 2011.
[20] CAD-model recognition and 6DOF pose estimation using 3D cues (Aitor Aldoma, Markus Vincze, Nico Blodow, David Gossow, Suat Gedikli, Radu Bogdan Rusu, Gary R. Bradski), In IEEE International Conference on Computer Vision Workshops, ICCV 2011 Workshops, Barcelona, Spain, November 6-13, 2011.
[21] Making Sense of 3D Data (Nico Blodow, Zoltan-Csaba Marton, Dejan Pangercic, Michael Beetz), In Robotics: Science and Systems Conference (RSS), Workshop on Strategies and Evaluation for Mobile Manipulation in Household Environments, 2010.
[22] CoP-Man – Perception for Mobile Pick-and-Place in Human Living Environments (Michael Beetz, Nico Blodow, Ulrich Klank, Zoltan Csaba Marton, Dejan Pangercic, Radu Bogdan Rusu), In Proceedings of the 22nd IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) Workshop on Semantic Perception for Mobile Manipulation, 2009. (Invited paper.)
[23] Interpretation of Urban Scenes based on Geometric Features (Radu Bogdan Rusu, Zoltan Csaba Marton, Nico Blodow, Michael Beetz), In Proceedings of the 21st IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) Workshop on 3D Mapping, Nice, France, September 26, 2008. (Invited paper.)
[24] Autonomous Mapping of Kitchen Environments and Applications (Zoltan Csaba Marton, Nico Blodow, Mihai Dolha, Moritz Tenorth, Radu Bogdan Rusu, Michael Beetz), In Proceedings of the 1st International Workshop on Cognition for Technical Systems, Munich, Germany, 6-8 October, 2008.

SLIDE 52

Thank you

Thanks for your attention!

SLIDE 53

Autonomous Exploration and 3D Mapping

SLIDE 54

3D Mapping of indoor environments

SLIDE 55

Next Best View given “window voxels”

[Figure: visibility of voxel v_i from sensor position s, parameterized by angles ϑ, φ and distance range d_min to d_max]

[Figure: possible sensor positions to see voxel v]

"Window voxels" lie between free and unknown space. Stacked costmaps with a visibility kernel yield good scanning poses.
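The core visibility constraint can be sketched as a per-voxel distance-range test (the full version adds the angular constraints ϑ, φ; the d_min/d_max values here are assumptions):

```python
# A voxel v is a candidate for observation from sensor position s if its
# distance lies inside the sensor's working range (angle checks omitted).
import math

def in_range(s, v, d_min=0.5, d_max=5.0):
    return d_min <= math.dist(s, v) <= d_max

print(in_range((0, 0, 0), (1.0, 1.0, 0.0)))  # True (~1.41 m)
print(in_range((0, 0, 0), (0.1, 0.1, 0.1)))  # False (too close)
```

Accumulating this test over all window voxels for each candidate sensor pose is what the stacked costmaps implement efficiently.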

SLIDE 56

Point Cloud Interpretation - Floor, Ceiling

SLIDE 57

Point Cloud Interpretation - Walls, Vertical and Horizontal Planes

SLIDE 58

Point Cloud Interpretation - Fixtures, Doors, Drawers


Region growing from handles using median intensity and median average distance (work with Dejan Pangercic, Zoltan Marton)

SLIDE 59

Door and Drawer Hypotheses Validation through Interaction

Work by Thomas Ruehr.

SLIDE 60

Door and Drawer Hypotheses Validation through Interaction 2

SLIDE 61

Final (Manually Augmented) Map

Full Video: http://www.youtube.com/watch?v=T15ycSmNOFY