Fast Multipole Methods in Arbitrary Dimensions with Chenhan Yu - PowerPoint PPT Presentation
Fast Multipole Methods in Arbitrary Dimensions with Chenhan Yu James Levitt Severin Riez Bill March Bo Xiao GEORGE BIROS padas.ices.utexas.edu Problem statement and contributions FMM for kernel matrices given points in high D
Fast Multipole Methods in Arbitrary Dimensions with Chenhan Yu James Levitt Severin Riez Bill March Bo Xiao GEORGE BIROS padas.ices.utexas.edu
Problem statement and contributions • FMM for kernel matrices given points in high D • FMM for SPD matrices—no points given • Four components - Matrix permutations to expose low-rank structure — O(N) - Compress blocks — O(N logN) - Fast Matvec — O(N) - HPC implementation (MPI async + ARM, x86/KNL, GPUs) 2
Motivation: Kernel classification 3
Motivation: arbitrary SPD matrices • Hessian matrix for large scale optimization • Schur-complement operators for computing inverse graph Laplacians • Factorization of dense covariance matrices No available geometry information (i.e., points) 5
Two algorithms • ASKIT: Algebraically Skeletonized Kernel Independent Treecode • GOFMM: Geometry Oblivious Fast Multipole Method 6
Highlights ASKIT CFD:12B/3D ~ 700 GB Kernels: 1B/128D ~ 1TB Malhotra, Gholami, & B’ SC14 March, Yu, Xiao, B. KDD’15 7
Highlights GOFMM 8
Achieving O(N log a N) complexity HIERARCHICAL NYSTROM ENSEMBLE NYSTROM MATRICES 9
Sparse + low-rank 10
C9XicdVJda9RAFJ3EqjV+bfXRl8F kQpLku7W3QdhUR8E Su4t AJy2Ryszt0MokzE+0S8lf6JL76e/w3 v0odqFeSDhz7 knd85NWilpXRj+8fwbOzdv3d69E9y9d/ Bw87eo6+2rI2AiShVaU5SbkFJDRMn YKTygAvUgXH6dnbZf34OxgrS/3FLSpICj7TMpeCO0xNOxcNW4k0qeLirA3YafA6oJSlMJO6SQvujDxvMfNh2kR S5/TkDKG5xAh5uK4XZ8Z6Owf/+W1Gpue6LJn2X9Vc1uDJUE7 XTDXjTqR6OYhr04PhwMDxAcDqLBsE+jXriKLtnE0XTPG7OsFHUB2gnFrT2NwsolDTdOCgWoWluo8Kp8BqcINS/AJs3Kg5Y+w0xG89Lgox1dZa92NLywdlGkyMQp53ZL VfwTSdLyrJ2LaV2+TBp K5qB1qsv5fXirqSLpdDM2lAOLVAwIWRODIVc264cLjCgBnQ8EOURcF1ts9yXki1yCDntXINs/kG4h0t4I+gZ27esIobqTN0o23QCKy9A/TFwEc 7lMFhjtcPrNypt m9Q7Q80tj6f/BJO6NetHnuDt+szF/lzwhT8kLEpFXZEzekyMyIcLb8fa9A6/vn/sX/k/ 15rqe5uex2Qr/N9/AWX95/Y=</latexit> <latexit sha1_base64="MaAHAKwpmXsK4CO80ODxZcw9WHU=">A <latexit sha1_base64="MaAHAKwpmXsK4CO80ODxZcw9WHU=">A C9XicdVJda9RAFJ3EqjV+bfXRl8F kQpLku7W3QdhUR8E Su4t AJy2Ryszt0MokzE+0S8lf6JL76e/w3 v0odqFeSDhz7 knd85NWilpXRj+8fwbOzdv3d69E9y9d/ Bw87eo6+2rI2AiShVaU5SbkFJDRMn YKTygAvUgXH6dnbZf34OxgrS/3FLSpICj7TMpeCO0xNOxcNW4k0qeLirA3YafA6oJSlMJO6SQvujDxvMfNh2kR S5/TkDKG5xAh5uK4XZ8Z6Owf/+W1Gpue6LJn2X9Vc1uDJUE7 XTDXjTqR6OYhr04PhwMDxAcDqLBsE+jXriKLtnE0XTPG7OsFHUB2gnFrT2NwsolDTdOCgWoWluo8Kp8BqcINS/AJs3Kg5Y+w0xG89Lgox1dZa92NLywdlGkyMQp53ZL VfwTSdLyrJ2LaV2+TBp K5qB1qsv5fXirqSLpdDM2lAOLVAwIWRODIVc264cLjCgBnQ8EOURcF1ts9yXki1yCDntXINs/kG4h0t4I+gZ27esIobqTN0o23QCKy9A/TFwEc 7lMFhjtcPrNypt m9Q7Q80tj6f/BJO6NetHnuDt+szF/lzwhT8kLEpFXZEzekyMyIcLb8fa9A6/vn/sX/k/ 15rqe5uex2Qr/N9/AWX95/Y=</latexit> <latexit sha1_base64="MaAHAKwpmXsK4CO80ODxZcw9WHU=">A C9XicdVJda9RAFJ3EqjV+bfXRl8F kQpLku7W3QdhUR8E Su4t AJy2Ryszt0MokzE+0S8lf6JL76e/w3 v0odqFeSDhz7 knd85NWilpXRj+8fwbOzdv3d69E9y9d/ Bw87eo6+2rI2AiShVaU5SbkFJDRMn YKTygAvUgXH6dnbZf34OxgrS/3FLSpICj7TMpeCO0xNOxcNW4k0qeLirA3YafA6oJSlMJO6SQvujDxvMfNh2kR S5/TkDKG5xAh5uK4XZ8Z6Owf/+W1Gpue6LJn2X9Vc1uDJUE7 XTDXjTqR6OYhr04PhwMDxAcDqLBsE+jXriKLtnE0XTPG7OsFHUB2gnFrT2NwsolDTdOCgWoWluo8Kp8BqcINS/AJs3Kg5Y+w0xG89Lgox1dZa92NLywdlGkyMQp53ZL VfwTSdLyrJ2LaV2+TBp K5qB1qsv5fXirqSLpdDM2lAOLVAwIWRODIVc264cLjCgBnQ8EOURcF1ts9yXki1yCDntXINs/kG4h0t4I+gZ27esIobqTN0o23QCKy9A/TFwEc 7lMFhjtcPrNypt m9Q7Q80tj6f/BJO6NetHnuDt+szF/lzwhT8kLEpFXZEzekyMyIcLb8fa9A6/vn/sX/k/ 15rqe5uex2Qr/N9/AWX95/Y=</latexit> <latexit sha1_base64="MaAHAKwpmXsK4CO80ODxZcw9WHU=">A <latexit sha1_base64="r/IpZ RUlDGDMOsHvgtPVTYtm+0=">A CuXicdVFNj9MwEHXD1xK+unDkYlGBEIcqyW6XVuJQAQckhFgkyq60jirHmbRmHSfYDlBZ+X/8Bf4EVzgyTVvBSjCSrac3b8bjN1mtpHVR9L0X Lp85eq1vevhjZu3bt/p79/9YKvGCJiJSlXmNOMWlNQwc9IpOK0N8DJTcJKdv1jnTz6DsbLS792qhrTkCy0LKbhDat7P Oua+Ex cd6G7CyklGWwkNpnJXdGfm2ReT3 cdzSRx1IWsrYhkx2ZLIjGej8TyVLw3beH0TDeHIYTxIaDZPkaDQ+QHA0ikfjQxoPoy4GZBvH8/3elOWVaErQTihu7Vkc1S713DgpFGDXxkKN4/IFnCHUvASb+u4fLX2ITE6LyuDRjnbs3xWel9auygyVO XSXuhWKPik07VknfunpH FOPVS140DLTbvFY2irqJrg2kuDQinVgi4MBJHpmLJDRcO1xAyAxq+iKosuc6fsIKXUq1yKHijnGe2 EL8owVcpl64pWc1N1Ln6Ebr0QjMvQT0xcAbHO5tDY 7XC zcqFb390her4zlv4fzJLhZBi/SwbT51vz98h98oA8JjF5SqbkFTkmMyLIN/KD/CS/gmdBFiyDjxtp0NvW3CMXIrC/Aa7 2Og=</latexit> <latexit sha1_base64="r/IpZ RUlDGDMOsHvgtPVTYtm+0=">A CuXicdVFNj9MwEHXD1xK+unDkYlGBEIcqyW6XVuJQAQckhFgkyq60jirHmbRmHSfYDlBZ+X/8Bf4EVzgyTVvBSjCSrac3b8bjN1mtpHVR9L0X Lp85eq1vevhjZu3bt/p79/9YKvGCJiJSlXmNOMWlNQwc9IpOK0N8DJTcJKdv1jnTz6DsbLS792qhrTkCy0LKbhDat7P Oua+Ex cd6G7CyklGWwkNpnJXdGfm2ReT3 cdzSRx1IWsrYhkx2ZLIjGej8TyVLw3beH0TDeHIYTxIaDZPkaDQ+QHA0ikfjQxoPoy4GZBvH8/3elOWVaErQTihu7Vkc1S713DgpFGDXxkKN4/IFnCHUvASb+u4fLX2ITE6LyuDRjnbs3xWel9auygyVO XSXuhWKPik07VknfunpH FOPVS140DLTbvFY2irqJrg2kuDQinVgi4MBJHpmLJDRcO1xAyAxq+iKosuc6fsIKXUq1yKHijnGe2 EL8owVcpl64pWc1N1Ln6Ebr0QjMvQT0xcAbHO5tDY 7XC zcqFb390her4zlv4fzJLhZBi/SwbT51vz98h98oA8JjF5SqbkFTkmMyLIN/KD/CS/gmdBFiyDjxtp0NvW3CMXIrC/Aa7 2Og=</latexit> <latexit sha1_base64="r/IpZ RUlDGDMOsHvgtPVTYtm+0=">A CuXicdVFNj9MwEHXD1xK+unDkYlGBEIcqyW6XVuJQAQckhFgkyq60jirHmbRmHSfYDlBZ+X/8Bf4EVzgyTVvBSjCSrac3b8bjN1mtpHVR9L0X Lp85eq1vevhjZu3bt/p79/9YKvGCJiJSlXmNOMWlNQwc9IpOK0N8DJTcJKdv1jnTz6DsbLS792qhrTkCy0LKbhDat7P Oua+Ex cd6G7CyklGWwkNpnJXdGfm2ReT3 cdzSRx1IWsrYhkx2ZLIjGej8TyVLw3beH0TDeHIYTxIaDZPkaDQ+QHA0ikfjQxoPoy4GZBvH8/3elOWVaErQTihu7Vkc1S713DgpFGDXxkKN4/IFnCHUvASb+u4fLX2ITE6LyuDRjnbs3xWel9auygyVO XSXuhWKPik07VknfunpH FOPVS140DLTbvFY2irqJrg2kuDQinVgi4MBJHpmLJDRcO1xAyAxq+iKosuc6fsIKXUq1yKHijnGe2 EL8owVcpl64pWc1N1Ln6Ebr0QjMvQT0xcAbHO5tDY 7XC zcqFb390her4zlv4fzJLhZBi/SwbT51vz98h98oA8JjF5SqbkFTkmMyLIN/KD/CS/gmdBFiyDjxtp0NvW3CMXIrC/Aa7 2Og=</latexit> <latexit sha1_base64="r/IpZ RUlDGDMOsHvgtPVTYtm+0=">A CuXicdVFNj9MwEHXD1xK+unDkYlGBEIcqyW6XVuJQAQckhFgkyq60jirHmbRmHSfYDlBZ+X/8Bf4EVzgyTVvBSjCSrac3b8bjN1mtpHVR9L0X Lp85eq1vevhjZu3bt/p79/9YKvGCJiJSlXmNOMWlNQwc9IpOK0N8DJTcJKdv1jnTz6DsbLS792qhrTkCy0LKbhDat7P Oua+Ex cd6G7CyklGWwkNpnJXdGfm2ReT3 cdzSRx1IWsrYhkx2ZLIjGej8TyVLw3beH0TDeHIYTxIaDZPkaDQ+QHA0ikfjQxoPoy4GZBvH8/3elOWVaErQTihu7Vkc1S713DgpFGDXxkKN4/IFnCHUvASb+u4fLX2ITE6LyuDRjnbs3xWel9auygyVO XSXuhWKPik07VknfunpH FOPVS140DLTbvFY2irqJrg2kuDQinVgi4MBJHpmLJDRcO1xAyAxq+iKosuc6fsIKXUq1yKHijnGe2 EL8owVcpl64pWc1N1Ln6Ebr0QjMvQT0xcAbHO5tDY 7XC zcqFb390her4zlv4fzJLhZBi/SwbT51vz98h98oA8JjF5SqbkFTkmMyLIN/KD/CS/gmdBFiyDjxtp0NvW3CMXIrC/Aa7 2Og=</latexit> C9XicdVJda9RAFJ3EqjV+bfXRl8F kQpLku7W3QdhUR8E Su4t AJy2Ryszt0MokzE+0S8lf6JL76e/w3 v0odqFeSDhz7 knd85NWilpXRj+8fwbOzdv3d69E9y9d/ Bw87eo6+2rI2AiShVaU5SbkFJDRMn YKTygAvUgXH6dnbZf34OxgrS/3FLSpICj7TMpeCO0xNOxcNW4k0qeLirA3YafA6oJSlMJO6SQvujDxvMfNh2kR S5/TkDKG5xAh5uK4XZ8Z6Owf/+W1Gpue6LJn2X9Vc1uDJUE7 XTDXjTqR6OYhr04PhwMDxAcDqLBsE+jXriKLtnE0XTPG7OsFHUB2gnFrT2NwsolDTdOCgWoWluo8Kp8BqcINS/AJs3Kg5Y+w0xG89Lgox1dZa92NLywdlGkyMQp53ZL VfwTSdLyrJ2LaV2+TBp K5qB1qsv5fXirqSLpdDM2lAOLVAwIWRODIVc264cLjCgBnQ8EOURcF1ts9yXki1yCDntXINs/kG4h0t4I+gZ27esIobqTN0o23QCKy9A/TFwEc 7lMFhjtcPrNypt m9Q7Q80tj6f/BJO6NetHnuDt+szF/lzwhT8kLEpFXZEzekyMyIcLb8fa9A6/vn/sX/k/ 15rqe5uex2Qr/N9/AWX95/Y=</latexit> Hierarchical matrices, basic idea K 11 � K 12 K 21 K 22 0 K 11 � � K 12 0 = + K 22 K 21 0 0 11
Constructing the approximation 12
Idea I: far-field —> low rank x i x j x w 13
Idea II: Near/Far field split 14
Idea III: recursion 1 2 3 4 15
Challenges in high-dimensions • Constructing the far-field approximations polynomial in ambient-D • Near-far field decomposition polynomial in ambient-D • No scalable algorithms (other than Nystrom) • Nystrom method assumes low rank provably not the case with increasing N 16
Randomized linear algebra — Nystrom method • Low-rank decomposition of G • Random sampling of O(s) points, s: target rank • Work • Error 17
ASKIT • Randomized Linear Algebra — far field approximation • Parallel binary trees — permutation, partitioning • Nearest neighbors — pruning and sampling • Treecode / FMM SISC’15,16 • MPI / OpenMP / SIMD / GPU acceleration ACHA’15 • Inspired by KDD’15 Ying & B. & Zorin’03 SC’15 Haiko & Martinsson & Tropp’11 Drineas & Kahan & Mahoney'06 IPDPS’15,16,17 18
Far-field s-rank approximation • SVD is too expensive — use sampling • Sample rows leverage, norm, range-space • Interpolative decomposition • ASKIT: approximate norm adaptive sampling using nearest-neighbors + adaptive rank selection 19
Skeletonization (far field approximation) 20
Evaluation 21
Evaluation 22
Complexity and error • Work off-diagonal • Error • Nystrom diagonal 23
Parallel complexity Points per MPI task n = N Tree depth D = log N p s Tree construction ≤ ( t s + t w ) log 2 p log N + ( t w log p ) ( d + k ) n ⇣ n ⌘ s 3 s + log p Skeletonization ≤ t f Evaluation ≤ t s p + ( t w + t f ) d k s D n 24
Summary of ASKIT features • Binary tree for matrix permutation • Approximate randomized nearest neighbors • Nearest neighbors for skeletonization • Bottom-up recursive low-rank approximation • Top-down pass for fast evaluation • Adaptive sampling and rank selection 25
Gaussian 3D, 1M points 64D/20D intr, 1M points 26
Kernel acceleration 27
Nystrom vs ASKIT (8M/784D) NYSTROM ASKIT 28
Kernel regression scaling MNIST dataset for OCR strong scaling, 8M points d=784 29
Recommend
More recommend
Explore More Topics
Stay informed with curated content and fresh updates.