Making sense of 3D data. Nico Blodow (blodow@cs.tum.edu), Intelligent Autonomous Systems, TUM, Germany. PowerPoint presentation, June 14, 2012.



SLIDE 1

Making sense of 3D data

Nico Blodow blodow@cs.tum.edu Intelligent Autonomous Systems, TUM, Germany June 14, 2012

SLIDE 2

Motivation

Central question in many 3D perception applications:

How can we – at all times – know what is going on around us?

SLIDE 3

Motivation

Central question in many 3D perception applications:

How can we – at all times – know what is going on around us?

Focus of my work:

Dynamic Scene Perception and Spatio-temporal Memory for Robot Manipulation

SLIDE 4

Motivation

In service robotics especially, we have little to no control over the environment:

Wide range of objects:

  • textured, non-textured
  • 3D objects, flat objects (cutlery, paper, ...)
  • indistinguishable objects (12 identical cups)
  • state of objects (my cup, empty/full milk carton, ...)
  • clutter, occlusions
SLIDE 5

Motivation

In service robotics especially, we have little to no control over the environment:

Wide range of object locations:

  • table top
  • containers (cupboards, drawers. . . )
  • fridge
SLIDE 6

Motivation

In service robotics especially, we have little to no control over the environment:

Other problems:

  • humans interfere with task / objects
  • large universe of objects
  • ever changing universe of objects
  • lighting
  • . . .
SLIDE 7

Motivation

Many approaches:

  • environment mapping, room / furniture classification
  • table extents and positions, object catalog, container contents
  • object detection, reconstruction and classification
  • object identity resolution, tracking, etc.

Key Challenges

  • data throughput
  • dynamic environments
  • humans
  • hard constraints on processing times

This means: we need fast as well as general algorithms

SLIDE 8

Outline

1 GPU-Accelerated depth image processing

pcl::cuda Kinect Results

2 Point Cloud Compression

Octree Octree-based PC Compression Detail Component Compression

3 Unstructured Information Management Architecture

Next Best View Room and furniture mapping

SLIDE 9

Past

Current strategies for optimization:

  • downsampling = much less data
  • spatial locators / tree structures
  • ignoring some problems (online processing, humans)
  • reordering points for cache optimization
  • "framedropping" / using slow scanners – problem: Kinect

While these are all good and valid strategies (we can reach processing speeds in the range of seconds), our target is < 30 ms.

The Kinect produces VGA × 5 bytes @ 30 Hz = 44 MB/s! ⇒ GPGPU programming
SLIDE 10

pcl::cuda

  • focus on real-time point cloud processing
  • implemented in thrust, a CUDA template library similar to the STL
  • biggest problem: data transfer between Host (= CPU) and Device (= GPU)
  • therefore: all algorithms should be implemented on the GPU to minimize performance hits
  • input data: Kinect Bayer image + depth image (or of course anything from pcl::io)
SLIDE 11

pcl::cuda

  • pcl::cuda::io: deals with IO, projection of depth data to 3D, GPU memory transfer methods, Kinect "dediscretization", subcloud extraction, etc.
  • pcl::cuda::nn: neighborhood search; depth-image-based neighborhood search
  • pcl::cuda::features: infrastructure for feature estimation, several implementations for normal estimation
  • pcl::cuda::sampleconsensus: robust estimation techniques and models; RANSAC and (novel, parallel) MultiSAC estimators, novel optimized plane estimator
SLIDE 12

pcl::gpu

  • itseez’s reimplementation in “pure” CUDA
  • kinfu — Kinect Fusion reimplementation
  • features — normals, spin images, PFH, FPFH, VFH etc.
  • octree search structures
SLIDE 13

Improving Kinect Data

e.g. Wall (top down view)

SLIDE 14

Improving Kinect Data

Kinect data discretized in disparity
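A small sketch of why this matters (baseline and focal length are typical illustrative values, not from the slides): depth is inversely proportional to disparity, so one integer disparity step corresponds to a depth step that grows roughly quadratically with distance.

```python
# Depth from disparity: z = f * b / d. With disparity quantized to integer
# steps, the induced depth quantization step grows with distance.
# f (focal length in pixels) and b (baseline in meters) are assumed values.
f_px = 580.0
b_m = 0.075

def depth(disparity_px):
    return f_px * b_m / disparity_px

near_step = depth(100) - depth(101)  # depth step at ~0.44 m
far_step = depth(10) - depth(11)     # depth step at ~4.3 m
print(round(near_step, 4), round(far_step, 4))
```

At a few meters the quantization step reaches tens of centimeters, which is why flat walls appear as discrete depth "slices" in the top-down views.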

SLIDE 15

Improving Kinect Data

Normal estimation (and all other feature computations) will have errors

SLIDE 16

Improving Kinect Data

If we knew the true geometry, we could compute whether the measured (red) point could have been sampled from that surface (purple point)

SLIDE 17

Improving Kinect Data

We don’t know the model (except for e.g. in RANSAC), but we can assume it to be smooth

SLIDE 18

Improving Kinect Data

→ same parameters:

SLIDE 19

MultiSAC plane estimation

  • Replace the 3-point sample for plane estimation with 1 point + (smooth/oversmooth) normal
  • leads to a lower number of iterations k = log(1 − p) / log(1 − (1 − ε)^s)

  1 create a batch of plane hypotheses on the GPU by sampling 1 point each
  2 iterate (CPU) over k plane hypotheses, computing inliers on the GPU
  3 after accepting a model, each model created from one of its inliers can be invalidated easily
  4 compare the plane equations of the accepted model with all other valid models; only recompute inliers when necessary
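The iteration-count formula can be evaluated directly; a minimal sketch (confidence p and outlier ratio ε are chosen for illustration):

```python
import math

def ransac_iterations(p=0.99, eps=0.5, s=3):
    """k = log(1 - p) / log(1 - (1 - eps)^s): iterations needed to draw
    at least one all-inlier sample with confidence p at outlier ratio eps,
    where s is the minimal sample size."""
    return math.ceil(math.log(1 - p) / math.log(1 - (1 - eps) ** s))

# 3-point plane samples vs. 1 point + normal:
print(ransac_iterations(s=3))  # 35
print(ransac_iterations(s=1))  # 7
```

Shrinking the sample size from 3 to 1 cuts the required iterations by roughly a factor of five at these settings, which is the point of the 1-point + normal parameterization.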

SLIDE 20

Performance on NVIDIA GTX 560

  • CUDA yields remarkable speedup for highly parallel tasks

Example:

  • openni_camera driver in ROS: 70% CPU usage
  • OpenNIGrabber in PCL: 30% CPU usage
  • Our Solution: 3% CPU usage, 3% GPU usage.

                                      CPU (OpenMP)      CUDA
    Disparity to Cloud + smoothing    25 − 35 ms        2 − 2.5 ms
    Normal Estimation                 250 − 1000 ms     0.5 ms
    Fast Normal Estimation            2.5 − 3.5 ms      < 0.15 ms
    Surface Orientation Segmentation  1 s               ≈ 100 ms
    Multiple Plane Estimation         > 10 s¹           50 − 200 ms

¹ possibly much longer
SLIDE 21

Using the semantic map in perception

SLIDE 22

Harnessing OpenGL + CUDA interoperability

Using semantic maps for real time semantic segmentation

Figure panels: normal space, depth image, and mask from the sensor's point of view (< 1 ms); semantic map (normal space), distances between Kinect data and semantic map, distances filtered (≈ 1 ms).
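The filtering step can be pictured as a per-pixel distance threshold between the live depth image and the depth image rendered from the semantic map (pure-Python sketch; the threshold value is an assumption, and the real pipeline runs this on the GPU via the OpenGL/CUDA interop described above):

```python
# Per-pixel comparison of measured depth against the rendered semantic map.
# Pixels deviating by more than tau are "not explained by the map",
# i.e. candidate foreground/objects.
def segment(measured, rendered, tau=0.02):
    return [abs(m - r) > tau for m, r in zip(measured, rendered)]

measured = [1.00, 1.01, 0.80, 1.015]  # live Kinect depth (m)
rendered = [1.00, 1.00, 1.00, 1.00]   # depth rendered from the semantic map (m)
print(segment(measured, rendered))    # [False, False, True, False]
```

Only the pixel 20 cm in front of the mapped wall survives the filter, so downstream processing touches a small fraction of the frame.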

SLIDE 23

Outline

1 GPU-Accelerated depth image processing

pcl::cuda Kinect Results

2 Point Cloud Compression

Octree Octree-based PC Compression Detail Component Compression

3 Unstructured Information Management Architecture

Next Best View Room and furniture mapping

SLIDE 24

Motivation

[Diagram: Point Cloud Stream → Compression → Network → Decompression → Point Cloud Stream]

Goals/Motivation:

  • Efficient for real-time processing
  • General compression approach for unstructured point clouds (varying size, resolution, density, point ordering)

  • Exploit spatial sparseness of point clouds
  • Exploit temporal redundancies in point cloud streams
  • Keep introduced coding distortion below sensor noise

(Work with Julius Kammerl)

SLIDE 25

Background

(image © NVIDIA Research)

  • Hierarchical tree data structures can efficiently describe sparse 3D information
  • The focus on real-time compression favors an octree-based point cloud compression approach
  • Octree structures enable fast spatial decomposition
SLIDE 26

Octree-based Encoding

Serialized Octree: 00000100 01000001 00011000 00100000 00000100 01000001 00011000 00100000

  • The root node describes a cubic bounding box which encapsulates all points
  • Child nodes recursively subdivide the point space
  • Nodes have up to eight children ⇒ byte encoding
  • Point encoding by serializing high-resolution octree structures!
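The byte encoding of a node can be sketched as one occupancy bit per child (the bit ordering is an assumption; the example values happen to match the first bytes shown above):

```python
# One byte per octree node: bit i is set iff child i contains points.
def child_byte(occupied_children):
    b = 0
    for i in occupied_children:
        b |= 1 << i
    return b

print(format(child_byte({2}), "08b"))     # 00000100
print(format(child_byte({0, 6}), "08b"))  # 01000001
```

Serializing these bytes in a fixed traversal order reproduces the structure exactly, so the decoder can rebuild the tree (and thus the voxelized points) from the byte stream alone.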
SLIDE 27

Temporal Encoding

Temporally adjacent point clouds often correlate strongly:

Serialized Octree A: 00000100 01000001 00011000 00100000 00000100 01000001 00011000 00100000 00000100 01000010 00011000 00000010
Serialized Octree B: 00000100 01000010 00011000 00000010

Differentially encode octree structures using XOR:

XOR Encoded Octree B: 00000000 00000011 00000000 00000010

  • Gain: reduced entropy of the serialized binary data!
  • Compression using a fast range coder (a fixed-point version of an arithmetic entropy coder)
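The XOR step itself is tiny; a sketch with made-up byte values (decoding works because XOR is its own inverse):

```python
# Differential encoding of two serialized octrees: identical subtrees
# XOR to zero bytes, which the entropy coder then compresses very well.
def xor_encode(frame_a: bytes, frame_b: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(frame_a, frame_b))

a = bytes([0b00000100, 0b01000001, 0b00011000, 0b00100000])
b = bytes([0b00000100, 0b01000010, 0b00011000, 0b00100010])

diff = xor_encode(a, b)
print([format(x, "08b") for x in diff])
# ['00000000', '00000011', '00000000', '00000010']

assert xor_encode(a, diff) == b  # decoding = XOR with the previous frame
```

Nonzero bytes mark exactly the nodes whose child occupancy changed between frames, which is also what the change-detection demo below exploits.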

SLIDE 28

Results

  • Experimental results of octree-based point cloud compression
SLIDE 29

Results

  • Data rate comparison between regular octree compression (gray) and differential octree compression (black) at 1 mm³ resolution
SLIDE 30

Demo - change detection

Applications:

  • 3D (not just 2.5D!) video streaming
  • Real-time spatial change detection based on XOR comparison of the octree structure
SLIDE 31

Detail encoding

  • Challenge: with increased octree resolution, complexity grows exponentially
  • Solution: limit the octree resolution and encode point detail coefficients
  • Enables a trade-off between complexity and compression performance
  • Also applicable to point components (color, normals, etc.)
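One way to picture the detail coefficients (1D sketch; the voxel size and quantization step are assumptions): store the coarse voxel index plus a quantized offset from the voxel center.

```python
# Encode a coordinate as (voxel index, quantized detail coefficient);
# voxel_size bounds the octree resolution, step bounds the point precision.
def encode(x, voxel_size=0.009, step=0.001):
    v = int(x // voxel_size)
    center = (v + 0.5) * voxel_size
    return v, round((x - center) / step)

def decode(v, detail, voxel_size=0.009, step=0.001):
    return (v + 0.5) * voxel_size + detail * step

v, d = encode(0.0123)
print(v, d, decode(v, d))  # reconstruction error is at most step / 2
```

Coarsening the octree shrinks the expensive tree serialization while the cheap per-point coefficients preserve precision, which is exactly the trade-off the slide describes.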
slide-32
SLIDE 32

Compression Pipeline

Encoding Pipeline: Point Cloud → Octree Structure → Point Component Encoding (Voxel Avg. + Detail Coefficients) → Point Detail Encoding (Position Detail Coefficients) → Entropy Encoding → Binary Serialization → Compressed PC

Decoding Pipeline: Compressed PC → Entropy Decoding → Point Detail Decoding → Point Component Decoding → Octree Structure → Point Cloud

SLIDE 33

Results

  • Experimental results for point detail encoding at an octree resolution of 9 mm³
  • Enables fast real-time encoding with high point precision
  • Constant run-time with Octree + Point Detail Coding
  • (Byproducts: octree-based search operations, downsampling, point density analysis, change detection, occupancy maps)
SLIDE 34

Demo - point cloud compression

  • Point cloud compression demo
SLIDE 35

Outline

1 GPU-Accelerated depth image processing

pcl::cuda Kinect Results

2 Point Cloud Compression

Octree Octree-based PC Compression Detail Component Compression

3 Unstructured Information Management Architecture

Next Best View Room and furniture mapping

SLIDE 36

Plugging everything together

SLIDE 37

Dealing with multimodal data

Robust robot systems have to solve really hard tasks:

  • harness various state-of-the-art methods from different fields
  • data sources (images, point clouds, web stores, ...) and data formats can be very specialized
  • UIMA (IBM Watson) is designed to deal with unstructured information created by different experts
  • → can this be transferred from the text domain to 2D/3D data and ontological knowledge?
SLIDE 38

UIMA — Concepts (1/4)

(image courtesy of http://uima.apache.org)

  • Unstructured information processing and management
  • The Common Analysis Structure (CAS) holds the original raw data and can hold multiple Subjects of Analysis (SOFAs), multiple Views, and Annotations
  • Collection Readers: generate a CAS with the initial document (web scrapers, sensor streams, ontologies, etc.)
SLIDE 39

UIMA — Concepts (2/4)

(image courtesy of http://uima.apache.org)

  • Analysis Engines (primitive or aggregate) can enrich the CAS with Annotations on the different Views/SOFAs
  • Annotations are stored in Index Repositories for access and search
  • All data structures adhere to a language-agnostic hierarchical type system
SLIDE 40

UIMA — Concepts (3/4)

(image courtesy of http://uima.apache.org)

  • Analysis Engines (= ensembles of experts) are controlled by flow controllers → (task/scene/resource adaptation, ...)
  • CAS Consumers process the annotated CAS for storage or reuse (e.g., the interface to KnowRob!)
  • A MongoDB database collects long-term memory, object models, environment models, etc.
SLIDE 41

UIMA — Concepts (4/4)

(image courtesy of http://uima.apache.org)

  • Collection Engines can be spawned from Java, C++, or the command line from a CPE Descriptor
  • The CPM is responsible for CAS management, failure recovery, and scale-out
SLIDE 42

UIMA — Type System for Perception

  • subtypes of Identifiable: storable in MongoDB
  • a Scene consists of Clusters taken at a certain time
  • Clusters contain a point cloud and a list of Annotations
  • "semantic" type hierarchy
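A hypothetical sketch of that type hierarchy (field names and structure are assumptions based on the slide, not the actual IAS-UIMA types):

```python
# Everything derives from Identifiable so it can be stored in MongoDB;
# a Scene groups the Clusters segmented at one point in time.
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Identifiable:
    doc_id: str = ""  # MongoDB document id

@dataclass
class Annotation(Identifiable):
    kind: str = ""    # e.g. a feature, plane, or Goggles annotation

@dataclass
class Cluster(Identifiable):
    cloud: List[Tuple[float, float, float]] = field(default_factory=list)
    annotations: List[Annotation] = field(default_factory=list)

@dataclass
class Scene(Identifiable):
    timestamp: float = 0.0
    clusters: List[Cluster] = field(default_factory=list)

scene = Scene(timestamp=1339632000.0,
              clusters=[Cluster(annotations=[Annotation(kind="plane")])])
print(scene.clusters[0].annotations[0].kind)  # plane
```

Because every type descends from Identifiable, any Scene, Cluster, or Annotation can be persisted and later retrieved by id, which is what makes the spatio-temporal memory queryable.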

SLIDE 44

Example: Robot Google Goggles

  • 3D sensors can effectively segment object hypotheses (but the resolution is too low)
  • leverage Google Goggles' multimodal image analysis
  • the corresponding (higher-res) camera image region is uploaded to Goggles
  • Response: a list of things (of various types) detected in the object image
SLIDE 45

Example: Robot Google Goggles

SLIDE 46

Example: Robot Google Goggles

Goggles response types: User Submitted, Logo / Brand, Similar Image (+ original site), Text + Translation, Product Link (+ Barcodes, Landmarks, etc.)

Segmentation / Clustering: Kinect-based Tabletop Object Detection

SLIDE 47

Current IAS-UIMA Capabilities

  • German-Deli scraping
  • Kinect/PCD File input
  • ROS → UIMA interface
  • GPU-based normal estimation, clustering
  • per-cluster feature estimation (all PCL features)
  • per-cluster Goggles annotation
  • storage in MongoDB database
SLIDE 48

Outlook and Future Work

  • Further integration of UIMA and previous perception systems (COP)
  • entity resolution framework (going from clusters/classification to object instances)
  • Integration of / interfacing to KnowRob, CRAM
  • leverage Machine Learning for control flow, ensemble methods
SLIDE 49

Peer-reviewed Conference Publications (1/2)

[1] Real-time Compression of Point Cloud Streams (Julius Kammerl, Nico Blodow, Radu Bogdan Rusu, Suat Gedikli, Michael Beetz, Eckehard Steinbach), In IEEE International Conference on Robotics and Automation (ICRA), 2012.
[2] Autonomous Semantic Mapping for Robots Performing Everyday Manipulation Tasks in Kitchen Environments (Nico Blodow, Lucian Cosmin Goron, Zoltan-Csaba Marton, Dejan Pangercic, Thomas Rühr, Moritz Tenorth, Michael Beetz), In 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2011.
[3] General 3D Modelling of Novel Objects from a Single View (Zoltan-Csaba Marton, Dejan Pangercic, Nico Blodow, Jonathan Kleinehellefort, Michael Beetz), In 2010 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2010.
[4] Perception and Probabilistic Anchoring for Dynamic World State Logging (Nico Blodow, Dominik Jain, Zoltan-Csaba Marton, Michael Beetz), In 10th IEEE-RAS International Conference on Humanoid Robots, 2010.
[5] Model-based and Learned Semantic Object Labeling in 3D Point Cloud Maps of Kitchen Environments (Radu Bogdan Rusu, Zoltan Csaba Marton, Nico Blodow, Andreas Holzbach, Michael Beetz), In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2009.
[6] Fast Geometric Point Labeling using Conditional Random Fields (Radu Bogdan Rusu, Andreas Holzbach, Nico Blodow, Michael Beetz), In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2009.
[7] Close-range Scene Segmentation and Reconstruction of 3D Point Cloud Maps for Mobile Manipulation in Human Environments (Radu Bogdan Rusu, Nico Blodow, Zoltan Csaba Marton, Michael Beetz), In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2009.
[8] Fast Point Feature Histograms (FPFH) for 3D Registration (Radu Bogdan Rusu, Nico Blodow, Michael Beetz), In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Kobe, Japan, May 12-17, 2009.
[9] Partial View Modeling and Validation in 3D Laser Scans for Grasping (Nico Blodow, Radu Bogdan Rusu, Zoltan Csaba Marton, Michael Beetz), In 9th IEEE-RAS International Conference on Humanoid Robots (Humanoids), 2009.
[10] The Assistive Kitchen – A Demonstration Scenario for Cognitive Technical Systems (Michael Beetz, Freek Stulp, Bernd Radig, Jan Bandouch, Nico Blodow, Mihai Dolha, Andreas Fedrizzi, Dominik Jain, Uli Klank, Ingo Kresse, Alexis Maldonado, Zoltan Marton, Lorenz Mösenlechner, Federico Ruiz, Radu Bogdan Rusu, Moritz Tenorth), In IEEE 17th International Symposium on Robot and Human Interactive Communication (RO-MAN), Muenchen, Germany, 2008. (Invited paper.)

SLIDE 50

Peer-reviewed Conference Publications (2/2)

[11] Action Recognition in Intelligent Environments using Point Cloud Features Extracted from Silhouette Sequences (Radu Bogdan Rusu, Jan Bandouch, Zoltan Csaba Marton, Nico Blodow, Michael Beetz), In IEEE 17th International Symposium on Robot and Human Interactive Communication (RO-MAN), Muenchen, Germany, 2008.
[12] Functional Object Mapping of Kitchen Environments (Radu Bogdan Rusu, Zoltan Csaba Marton, Nico Blodow, Mihai Emanuel Dolha, Michael Beetz), In Proceedings of the 21st IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Nice, France, September 22-26, 2008.
[13] Aligning Point Cloud Views using Persistent Feature Histograms (Radu Bogdan Rusu, Nico Blodow, Zoltan Csaba Marton, Michael Beetz), In Proceedings of the 21st IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Nice, France, September 22-26, 2008.
[14] Learning Informative Point Classes for the Acquisition of Object Model Maps (Radu Bogdan Rusu, Zoltan Csaba Marton, Nico Blodow, Michael Beetz), In Proceedings of the 10th International Conference on Control, Automation, Robotics and Vision (ICARCV), Hanoi, Vietnam, December 17-20, 2008.
[15] Persistent Point Feature Histograms for 3D Point Clouds (Radu Bogdan Rusu, Zoltan Csaba Marton, Nico Blodow, Michael Beetz), In Proceedings of the 10th International Conference on Intelligent Autonomous Systems (IAS-10), Baden-Baden, Germany, 2008.
[16] Towards 3D Object Maps for Autonomous Household Robots (Radu Bogdan Rusu, Nico Blodow, Zoltan-Csaba Marton, Alina Soos, Michael Beetz), In Proceedings of the 20th IEEE International Conference on Intelligent Robots and Systems (IROS), 2007.

SLIDE 51

Journal and Workshop Articles

Journal Articles

[17] Combined 2D-3D Categorization and Classification for Multimodal Perception Systems (Zoltan Csaba Marton, Dejan Pangercic, Nico Blodow, Michael Beetz), In The International Journal of Robotics Research, Sage Publications, 2011.
[18] Towards 3D Point Cloud Based Object Maps for Household Environments (Radu Bogdan Rusu, Zoltan Csaba Marton, Nico Blodow, Mihai Dolha, Michael Beetz), In Robotics and Autonomous Systems Journal (Special Issue on Semantic Knowledge in Robotics), volume 56, 2008.

Workshop Papers

[19] Inferring Generalized Pick-and-Place Tasks from Pointing Gestures (Nico Blodow, Zoltan-Csaba Marton, Dejan Pangercic, Thomas Ruehr, Moritz Tenorth, Michael Beetz), In IEEE International Conference on Robotics and Automation (ICRA), Workshop on Semantic Perception, Mapping and Exploration, 2011.
[20] CAD-model recognition and 6DOF pose estimation using 3D cues (Aitor Aldoma, Markus Vincze, Nico Blodow, David Gossow, Suat Gedikli, Radu Bogdan Rusu, Gary R. Bradski), In IEEE International Conference on Computer Vision Workshops, ICCV 2011 Workshops, Barcelona, Spain, November 6-13, 2011.
[21] Making Sense of 3D Data (Nico Blodow, Zoltan-Csaba Marton, Dejan Pangercic, Michael Beetz), In Robotics: Science and Systems Conference (RSS), Workshop on Strategies and Evaluation for Mobile Manipulation in Household Environments, 2010.
[22] CoP-Man – Perception for Mobile Pick-and-Place in Human Living Environments (Michael Beetz, Nico Blodow, Ulrich Klank, Zoltan Csaba Marton, Dejan Pangercic, Radu Bogdan Rusu), In Proceedings of the 22nd IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) Workshop on Semantic Perception for Mobile Manipulation, 2009. (Invited paper.)
[23] Interpretation of Urban Scenes based on Geometric Features (Radu Bogdan Rusu, Zoltan Csaba Marton, Nico Blodow, Michael Beetz), In Proceedings of the 21st IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) Workshop on 3D Mapping, Nice, France, September 26, 2008. (Invited paper.)
[24] Autonomous Mapping of Kitchen Environments and Applications (Zoltan Csaba Marton, Nico Blodow, Mihai Dolha, Moritz Tenorth, Radu Bogdan Rusu, Michael Beetz), In Proceedings of the 1st International Workshop on Cognition for Technical Systems, Munich, Germany, 6-8 October, 2008.

SLIDE 52

Thank you

Thanks for your attention!

SLIDE 53

Autonomous Exploration and 3D Mapping

SLIDE 54

3D Mapping of indoor environments

SLIDE 55

Next Best View given “window voxels”

[Figure: visibility of voxel v_i from sensor position s, parameterized by angles ϑ, φ and distance range d_min to d_max]

[Figure: possible sensor positions to see voxel v]

"Window voxels" lie between free and unknown space. Stacked costmaps with a visibility kernel yield good scanning poses.
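The core visibility constraint can be sketched as a per-voxel distance-range test (the full version adds the angular constraints ϑ, φ; the d_min/d_max values here are assumptions):

```python
# A voxel v is a candidate for observation from sensor position s if its
# distance lies inside the sensor's working range (angle checks omitted).
import math

def in_range(s, v, d_min=0.5, d_max=5.0):
    return d_min <= math.dist(s, v) <= d_max

print(in_range((0, 0, 0), (1.0, 1.0, 0.0)))  # True (~1.41 m)
print(in_range((0, 0, 0), (0.1, 0.1, 0.1)))  # False (too close)
```

Accumulating this test over all window voxels for each candidate sensor pose is what the stacked costmaps implement efficiently.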

SLIDE 56

Point Cloud Interpretation - Floor, Ceiling

SLIDE 57

Point Cloud Interpretation - Walls, Vertical and Horizontal Planes

SLIDE 58

Point Cloud Interpretation - Fixtures, Doors, Drawers


Region growing from handles using median intensity and median average distance (work with Dejan Pangercic, Zoltan Marton)

SLIDE 59

Door and Drawer Hypotheses Validation through Interaction

Work by Thomas Ruehr.

SLIDE 60

Door and Drawer Hypotheses Validation through Interaction 2

SLIDE 61

Final (Manually Augmented) Map

Full Video: http://www.youtube.com/watch?v=T15ycSmNOFY