Tensor Low-Rank Reconstruction for Semantic Segmentation - PowerPoint PPT Presentation



SLIDE 1


SLIDE 2

Tensor Low-Rank Reconstruction for Semantic Segmentation

Wanli Chen1, Xinge Zhu1, Ruoqi Sun2, Junjun He2,3, Ruiyu Li4, Xiaoyong Shen4, Bei Yu1

1CSE Department, Chinese University of Hong Kong 2Shanghai Jiao Tong University 3Shenzhen Institutes of Advanced Technology 4SmartMore


SLIDE 3

Introduction

Object Context Object Context information plays an indispensable role in the success of semantic segmentation.


SLIDE 4

Introduction

[Figure: the non-local attention block. The input feature X is embedded by θ, φ, and γ; the reshaped θ(X) and φ(X) are multiplied and passed through a softmax to form the 2D similarity matrix, which weights γ(X) to produce the attention output A.]

Non-local attention based methods have become the mainstream of semantic segmentation.
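A minimal NumPy sketch may make the block concrete. The shapes are assumptions (X as C×H×W), and the 1×1-conv embeddings θ, φ, γ are replaced by the identity for brevity:

```python
import numpy as np

def non_local_2d(X):
    """Toy non-local block: X is (C, H, W); theta/phi/gamma omitted."""
    C, H, W = X.shape
    x = X.reshape(C, H * W)                       # flatten spatial positions
    sim = x.T @ x                                 # (HW, HW) 2D similarity matrix
    sim = np.exp(sim - sim.max(axis=1, keepdims=True))
    attn = sim / sim.sum(axis=1, keepdims=True)   # row-wise softmax
    out = x @ attn.T                              # weighted context per position
    return out.reshape(C, H, W)

Y = non_local_2d(np.random.default_rng(0).standard_normal((4, 3, 3)))
print(Y.shape)  # (4, 3, 3)
```

Note the O((HW)² · C) cost of the similarity matrix, which is what motivates cheaper alternatives later in the talk.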


SLIDE 5

Introduction

[Figure: the same non-local block as before, annotated with the two options for forming the similarity matrix: spatial attention over the H×W positions, or channel attention over the C channels.]

Spatial or channel attention? A dilemma in non-local self-attention based approaches.


SLIDE 6

Introduction

Architecture of DANet [1], which contains two parallel streams of non-local attention (spatial and channel).


SLIDE 7

Introduction

Can we obtain spatial and channel attention simultaneously?

◮ Better context representation.
◮ Smaller computational cost.


SLIDE 8

Our Proposed RecoNet

Tensor Reconstruction Network (RecoNet).

[Figure: the pipeline. (a) The input image passes through a CNN; (b) the Tensor Generation Module (TGM) pools the feature and runs Generator-C, Generator-H, and Generator-W to produce the channel, height, and width context fragments of shapes C×1×1, 1×H×1, and 1×1×W; (c) the Tensor Reconstruction Module (TRM) combines and sums the fragments into a reconstructed feature, which is concatenated with the CNN feature and upsampled; (d) final prediction.]

The pipeline of our framework. Two major components are involved: the Tensor Generation Module (TGM) and the Tensor Reconstruction Module (TRM). TGM performs the low-rank tensor generation, while TRM achieves the high-rank tensor reconstruction via CP decomposition theory.


SLIDE 9

Our Proposed RecoNet

Tensor canonical-polyadic decomposition (CP decomposition). Assume we have 3r vectors along the C/H/W directions: v_i^c ∈ ℝ^{C×1×1}, v_i^h ∈ ℝ^{1×H×1}, and v_i^w ∈ ℝ^{1×1×W}, where i ∈ {1, …, r} and r is the tensor rank. These vectors are the CP-decomposed fragments of A ∈ ℝ^{C×H×W}; the tensor CP rank-r reconstruction is then defined as

A = Σ_{i=1}^{r} λ_i · v_i^c ⊗ v_i^h ⊗ v_i^w,    (1)

where λ_i weights each rank-1 term and ⊗ denotes the outer product.
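As a sanity check, Eq. (1) is easy to implement directly. The sketch below stores the 3r fragments as (r, C), (r, H), and (r, W) arrays and assumes unit weights λ_i = 1 by default:

```python
import numpy as np

def cp_reconstruct(v_c, v_h, v_w, lam=None):
    """CP rank-r reconstruction: A = sum_i lam[i] * v_c[i] x v_h[i] x v_w[i].

    v_c: (r, C), v_h: (r, H), v_w: (r, W) -- the 3r CP fragments.
    Returns the reconstructed (C, H, W) tensor.
    """
    lam = np.ones(v_c.shape[0]) if lam is None else lam
    # einsum forms each rank-1 outer product and sums over the rank index i
    return np.einsum('i,ic,ih,iw->chw', lam, v_c, v_h, v_w)

rng = np.random.default_rng(0)
r, C, H, W = 4, 8, 5, 6
A = cp_reconstruct(rng.standard_normal((r, C)),
                   rng.standard_normal((r, H)),
                   rng.standard_normal((r, W)))
print(A.shape)  # (8, 5, 6)
```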


SLIDE 10

Tensor Generation Module

[Figure: Tensor Generation Module. Three parallel generators, each of the form Pool → 1×1 Conv → Sigmoid: the Channel Generator pools the input feature to C×1×1, the Height Generator to 1×H×1, and the Width Generator to 1×1×W, each emitting r fragments (rank r) as the channel, height, and width features.]

Tensor Generation Module. Channel Pool, Height Pool, and Width Pool are all global average pooling.
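The pool → conv → sigmoid path of each generator can be sketched as below. The per-rank mixing matrices Wc/Wh/Ww stand in for the 1×1 convolutions; this parameterization is an illustrative assumption, not the paper's exact implementation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def tgm(X, Wc, Wh, Ww):
    """One Tensor Generation Module pass over feature X of shape (C, H, W).

    Wc: (r, C, C), Wh: (r, H, H), Ww: (r, W, W) -- assumed stand-ins for
    the per-rank 1x1 convolutions of each generator.
    Returns r fragments per direction: (r, C), (r, H), (r, W).
    """
    c_pool = X.mean(axis=(1, 2))   # Channel Pool -> (C,)
    h_pool = X.mean(axis=(0, 2))   # Height Pool  -> (H,)
    w_pool = X.mean(axis=(0, 1))   # Width Pool   -> (W,)
    v_c = sigmoid(np.einsum('rij,j->ri', Wc, c_pool))
    v_h = sigmoid(np.einsum('rij,j->ri', Wh, h_pool))
    v_w = sigmoid(np.einsum('rij,j->ri', Ww, w_pool))
    return v_c, v_h, v_w

rng = np.random.default_rng(0)
C, H, W, r = 8, 5, 6, 4
v_c, v_h, v_w = tgm(rng.standard_normal((C, H, W)),
                    rng.standard_normal((r, C, C)),
                    rng.standard_normal((r, H, H)),
                    rng.standard_normal((r, W, W)))
print(v_c.shape, v_h.shape, v_w.shape)  # (4, 8) (4, 5) (4, 6)
```

The sigmoid keeps every fragment entry in (0, 1), so the fragments behave like per-direction attention weights.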


SLIDE 11

Tensor Reconstruction Module

[Figure: sub-attention maps A_1, A_2, …, A_r, each reconstructed from one triple of fragments (v_i^c, v_i^h, v_i^w), are summed to form the full attention map A.]

Tensor Reconstruction Module (TRM). The pipeline of TRM consists of two main steps: sub-attention map generation and global context reconstruction. The processing from top to bottom (↓) indicates the sub-attention map generation from the three dimensions (channel / height / width). The processing from left to right (A_1 + A_2 + · · · + A_r = A) denotes the global context reconstruction from low rank to high rank.
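The two TRM steps map directly onto array broadcasting: each rank's fragments expand to a rank-1 sub-attention map A_i, and the running sum A_1 + · · · + A_r is the full attention map, applied elementwise to the feature. This is an illustrative sketch under assumed shapes (any scaling or residual connection is omitted):

```python
import numpy as np

def trm(X, v_c, v_h, v_w):
    """X: (C, H, W); fragments v_c: (r, C), v_h: (r, H), v_w: (r, W)."""
    A = np.zeros_like(X)
    for i in range(v_c.shape[0]):
        # (C,1,1) * (1,H,1) * (1,1,W) broadcasts to a (C,H,W) sub-attention map
        A_i = (v_c[i][:, None, None]
               * v_h[i][None, :, None]
               * v_w[i][None, None, :])
        A += A_i          # low-rank -> high-rank accumulation
    return A * X          # full attention map gates the feature elementwise
```

With r = 1 and all-ones fragments the attention map is all ones, so the feature passes through unchanged.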


SLIDE 12

Visualization

[Figure: the full attention map A visualized as the sum A_1 + A_2 + · · · + A_r; individual sub-attention maps highlight background or foreground regions, and their sum forms the full attention map.]


SLIDE 13

Results on PASCAL-VOC12 w/o COCO-pretrained model

Per-class IoU (%):

| Class  | FCN [2] | PSPNet [3] | EncNet [4] | APCNet [5] | CFNet [6] | DMNet [7] | RecoNet |
|--------|---------|------------|------------|------------|-----------|-----------|---------|
| aero   | 76.8    | 91.8       | 94.1       | 95.8       | 95.7      | 96.1      | 93.7    |
| bike   | 34.2    | 71.9       | 69.2       | 75.8       | 71.9      | 77.3      | 66.3    |
| bird   | 68.9    | 94.7       | 96.3       | 84.5       | 95.0      | 94.1      | 95.6    |
| boat   | 49.4    | 71.2       | 76.7       | 76.0       | 76.3      | 72.8      | 72.8    |
| bottle | 60.3    | 75.8       | 86.2       | 80.6       | 82.8      | 78.1      | 87.4    |
| bus    | 75.3    | 95.2       | 96.3       | 96.9       | 94.8      | 97.1      | 94.5    |
| car    | 74.7    | 89.9       | 90.7       | 90.0       | 90.0      | 92.7      | 92.6    |
| cat    | 77.6    | 95.9       | 94.2       | 96.0       | 95.9      | 96.4      | 96.5    |
| chair  | 21.4    | 39.3       | 38.8       | 42.0       | 37.1      | 39.8      | 48.4    |
| cow    | 62.5    | 90.7       | 90.7       | 93.7       | 92.6      | 91.4      | 94.5    |
| table  | 46.8    | 71.7       | 73.3       | 75.4       | 73.0      | 75.5      | 76.6    |
| dog    | 71.8    | 90.5       | 90.0       | 91.6       | 93.4      | 92.7      | 94.4    |
| horse  | 63.9    | 94.5       | 92.5       | 95.0       | 94.6      | 95.8      | 95.9    |
| mbike  | 76.5    | 88.8       | 88.8       | 90.5       | 89.6      | 91.0      | 93.8    |
| person | 73.9    | 89.6       | 87.9       | 89.3       | 88.4      | 90.3      | 90.4    |
| plant  | 45.2    | 72.8       | 68.7       | 75.8       | 74.9      | 76.6      | 78.1    |
| sheep  | 72.4    | 89.6       | 92.6       | 92.8       | 95.2      | 94.1      | 93.6    |
| sofa   | 37.4    | 64.0       | 59.0       | 61.9       | 63.2      | 62.1      | 63.4    |
| train  | 70.9    | 85.1       | 86.4       | 88.9       | 89.7      | 85.5      | 88.6    |
| tv     | 55.1    | 76.3       | 73.4       | 79.6       | 78.2      | 77.6      | 83.1    |
| mIoU   | 62.2    | 82.6       | 82.9       | 84.2       | 84.2      | 84.4      | 85.6    |


SLIDE 14

Computational Cost

Table: Computational cost and GPU occupation of TGM+TRM. FLOPs = floating point operations. We use tensor rank r = 64 for evaluation.

| Method         | Channel | FLOPs   | GPU Memory |
|----------------|---------|---------|------------|
| Non-Local [8]  | 512     | 19.33G  | 88.00MB    |
| APCNet [5]     | 512     | 8.98G   | 193.10MB   |
| RCCA [9]       | 512     | 5.37G   | 41.33MB    |
| A²Net [10]     | 512     | 4.30G   | 25.00MB    |
| AFNB [11]      | 512     | 2.62G   | 25.93MB    |
| LatentGNN [12] | 512     | 2.58G   | 44.69MB    |
| EMAUnit [13]   | 512     | 2.42G   | 24.12MB    |
| TGM+TRM        | 512     | 0.0215G | 8.31MB     |
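Reading the FLOPs column relative to TGM+TRM gives the scale of the saving; this is pure arithmetic on the table values above:

```python
# FLOPs (in GFLOPs) copied from the table above, all at 512 channels.
flops = {"Non-Local": 19.33, "APCNet": 8.98, "RCCA": 5.37, "A2Net": 4.30,
         "AFNB": 2.62, "LatentGNN": 2.58, "EMAUnit": 2.42, "TGM+TRM": 0.0215}
base = flops["TGM+TRM"]
for name, g in sorted(flops.items(), key=lambda kv: -kv[1]):
    print(f"{name:>10s}: {g / base:7.1f}x the cost of TGM+TRM")
```

Even the cheapest prior unit in the table (EMAUnit) costs over a hundred times more FLOPs than TGM+TRM.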


SLIDE 15

Contact

Thanks for watching! Please feel free to contact me. E-mail: 1155137828@link.cuhk.edu.hk WeChat: ChenWanLi11410579


SLIDE 16

References

[1] J. Fu, J. Liu, H. Tian, Z. Fang, and H. Lu, "Dual attention network for scene segmentation," arXiv preprint arXiv:1809.02983, 2018.
[2] J. Long, E. Shelhamer, and T. Darrell, "Fully convolutional networks for semantic segmentation," in Proc. CVPR, 2015, pp. 3431–3440.
[3] H. Zhao, J. Shi, X. Qi, X. Wang, and J. Jia, "Pyramid scene parsing network," in Proc. CVPR, 2017, pp. 2881–2890.
[4] H. Zhang, K. Dana, J. Shi, Z. Zhang, X. Wang, A. Tyagi, and A. Agrawal, "Context encoding for semantic segmentation," in Proc. CVPR, 2018, pp. 7151–7160.
[5] J. He, Z. Deng, L. Zhou, Y. Wang, and Y. Qiao, "Adaptive pyramid context network for semantic segmentation," in Proc. CVPR, 2019, pp. 7519–7528.
[6] H. Zhang, H. Zhang, C. Wang, and J. Xie, "Co-occurrent features in semantic segmentation," in Proc. CVPR, 2019, pp. 548–557.
[7] J. He, Z. Deng, and Y. Qiao, "Dynamic multi-scale filters for semantic segmentation," in Proc. ICCV, 2019, pp. 3562–3572.
[8] X. Wang, R. Girshick, A. Gupta, and K. He, "Non-local neural networks," in Proc. CVPR, 2018, pp. 7794–7803.
[9] Z. Huang, X. Wang, L. Huang, C. Huang, Y. Wei, and W. Liu, "CCNet: Criss-cross attention for semantic segmentation," in Proc. ICCV, 2019, pp. 603–612.
[10] Y. Chen, Y. Kalantidis, J. Li, S. Yan, and J. Feng, "A²-Nets: Double attention networks," in Proc. NIPS, 2018, pp. 352–361.
[11] Z. Zhu, M. Xu, S. Bai, T. Huang, and X. Bai, "Asymmetric non-local neural networks for semantic segmentation," in Proc. ICCV, 2019, pp. 593–602.
[12] S. Zhang, X. He, and S. Yan, "LatentGNN: Learning efficient non-local relations for visual recognition," in Proc. ICML, 2019, pp. 7374–7383.
[13] X. Li, Z. Zhong, J. Wu, Y. Yang, Z. Lin, and H. Liu, "Expectation-maximization attention networks for semantic segmentation," in Proc. ICCV, 2019, pp. 9167–9176.