
ADVANCES IN TRINITY OF AI: DATA, ALGORITHMS & COMPUTE Anima - PowerPoint PPT Presentation



  1. ADVANCES IN TRINITY OF AI: DATA, ALGORITHMS & COMPUTE Anima Anandkumar Bren Professor at Caltech Director of ML Research at NVIDIA

  2. TRINITY FUELING ARTIFICIAL INTELLIGENCE • ALGORITHMS: OPTIMIZATION, SCALABILITY, MULTI-DIMENSIONALITY • DATA: COLLECTION, AGGREGATION, AUGMENTATION • INFRASTRUCTURE (FULL STACK FOR ML): APPLICATION SERVICES, ML PLATFORM, GPUS

  3. DATA • COLLECTION: ACTIVE LEARNING, PARTIAL LABELS.. • AGGREGATION: CROWDSOURCING MODELS.. • AUGMENTATION: GENERATIVE MODELS, SYMBOLIC EXPRESSIONS..

  4. ACTIVE LEARNING Goal: reach SOTA with a smaller dataset. • Active learning well analyzed in theory. • In practice, used only with small classical models. Can it work at scale with deep learning? [Figure: loop between unlabeled data and labeled data]
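The loop on this slide can be sketched as pool-based uncertainty sampling. The nearest-centroid classifier, 2-D Gaussian data, and least-confidence scoring below are illustrative assumptions, not the talk's exact setup:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy pool: two well-separated Gaussian blobs in 2-D.
X = np.vstack([rng.normal(-2, 1, (200, 2)), rng.normal(2, 1, (200, 2))])
y = np.array([0] * 200 + [1] * 200)

def fit_centroids(X, y):
    """'Train' a nearest-centroid classifier on the labeled set."""
    return np.stack([X[y == c].mean(axis=0) for c in (0, 1)])

def confidence(X, centroids):
    """Confidence = margin between the distances to the two centroids."""
    d = np.linalg.norm(X[:, None, :] - centroids[None], axis=2)
    return np.abs(d[:, 0] - d[:, 1])

# Seed with one label per class, then query the least-confident points.
labeled = [0, 200]
for _ in range(10):                      # 10 rounds, 1 query per round
    centroids = fit_centroids(X[labeled], y[labeled])
    conf = confidence(X, centroids)
    conf[labeled] = np.inf               # never re-query labeled points
    labeled.append(int(conf.argmin()))   # least-confidence query

centroids = fit_centroids(X[labeled], y[labeled])
pred = np.linalg.norm(X[:, None, :] - centroids[None], axis=2).argmin(axis=1)
acc = (pred == y).mean()
print(len(labeled), round(acc, 2))
```

Each round, the point the current model is least sure about is sent for labeling; the labeled set grows far more slowly than the pool it is drawn from.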

  5. TASK: NAMED ENTITY RECOGNITION

  6. RESULTS NER task on the largest open benchmark (OntoNotes). Active learning heuristics: • Least confidence (LC) • Max. normalized log-probability (MNLP) [Figure: test F1 score vs. % of words annotated, English and Chinese, comparing LC, MNLP, and random sampling against the best deep and shallow models trained on full data] Deep active learning matches: • SOTA with just 25% of the data on English, 30% on Chinese. • The best shallow model (on full data) with 12% of the data on English, 17% on Chinese.

  7. TAKE-AWAYS • Uncertainty sampling works; normalizing for sentence length helps when labeled data is scarce. • With active learning, deep models beat shallow ones even in the low-data regime. • With active learning, SOTA is achieved with far fewer samples.
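The LC and MNLP heuristics from slide 6 differ only in how per-token probabilities are combined; a minimal sketch with made-up token probabilities:

```python
import math

def least_confidence(token_probs):
    """LC: 1 - product of per-token probabilities of the predicted sequence.
    Longer sentences look less confident simply because the product shrinks."""
    return 1.0 - math.prod(token_probs)

def mnlp(token_probs):
    """MNLP: negated length-normalized (mean) log-probability, so higher
    still means 'more uncertain' but the length bias is removed."""
    return -sum(math.log(p) for p in token_probs) / len(token_probs)

short = [0.7, 0.7]       # short sentence, genuinely uncertain tokens
long = [0.95] * 20       # long sentence, confident at every token

# LC prefers to query the long, confident sentence; MNLP corrects this
# and queries the short sentence the model is actually unsure about.
print(least_confidence(short) < least_confidence(long))
print(mnlp(short) > mnlp(long))
```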

  8. ACTIVE LEARNING WITH PARTIAL FEEDBACK • Hierarchical class labeling: annotators answer binary questions (e.g. "dog?" → dog / non-dog), yielding partial labels. • Labor is proportional to the # of binary questions asked. • Actively pick informative questions. [Figure: images narrowed down to partial labels by a sequence of binary questions]

  9. RESULTS ON TINY IMAGENET (100K SAMPLES) [Figure: accuracy vs. # of questions for ALPF-ERC, AQ-ERC, AL-ME, and Uniform; the strategies differ in whether data and questions are selected actively] • Yields 8% higher accuracy at 30% of the questions (w.r.t. Uniform). • Obtains full annotation with 40% fewer binary questions.

  10. TWO TAKE-AWAYS • Don’t annotate from scratch: select questions actively based on the learned model. • Don’t sleep on partial labels: re-train the model from partial labels.
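One way to "select questions actively" over a label hierarchy is to ask the binary question that minimizes the expected entropy of the model's posterior. The tiny hierarchy and posterior values below are assumptions for illustration:

```python
import math

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

def expected_posterior_entropy(posterior, question):
    """A binary question is a subset of classes ('is the label in this set?').
    Returns the expected entropy of the renormalized posterior after the answer."""
    p_yes = sum(p for c, p in posterior.items() if c in question)
    h = 0.0
    for answer_set, p_ans in ((question, p_yes),
                              (set(posterior) - question, 1 - p_yes)):
        if p_ans > 0:
            h += p_ans * entropy([posterior[c] / p_ans for c in answer_set])
    return h

# Model's current posterior over leaf classes for one image (assumed values).
posterior = {"husky": 0.4, "beagle": 0.3, "tabby": 0.2, "siamese": 0.1}
# Candidate questions from the hierarchy: internal nodes and leaves.
questions = {"dog?": {"husky", "beagle"}, "cat?": {"tabby", "siamese"},
             "husky?": {"husky"}, "tabby?": {"tabby"}}

best = min(questions,
           key=lambda q: expected_posterior_entropy(posterior, questions[q]))
print(best)
```

Minimizing expected posterior entropy is the same as maximizing expected information gain; with this particular posterior, directly asking about the most likely leaf happens to win over the coarser "dog?"/"cat?" splits.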

  11. CROWDSOURCING: AGGREGATION OF CROWD ANNOTATIONS Majority rule • Simple and common. • Wasteful: ignores the varying quality of different annotators. Annotator-quality models • Can improve accuracy. • Hard: quality must be estimated without ground truth.

  12. PROPOSED CROWDSOURCING ALGORITHM Input: noisy crowdsourced annotations. Repeat: • Compute the posterior of the ground-truth labels given the annotator-quality model. • Train the model with a weighted loss, using the posterior as weights. • Use the trained model to infer ground-truth labels. • MLE: update annotator quality using the inferred labels.
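The loop above is EM-like. A minimal sketch in the spirit of Dawid–Skene, with binary labels, a per-worker accuracy parameter in place of the trained network, and synthetic data (all simplifying assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
n_items, n_workers = 300, 5
true = rng.integers(0, 2, n_items)               # hidden ground truth
quality = np.array([0.9, 0.85, 0.8, 0.6, 0.55])  # hidden per-worker accuracy

# Each worker labels every item, correct with probability = their quality.
ann = np.where(rng.random((n_items, n_workers)) < quality,
               true[:, None], 1 - true[:, None])

q_hat = np.full(n_workers, 0.7)  # initial guess at worker quality
for _ in range(20):
    # E-step: posterior P(true = 1 | annotations, q_hat), uniform class prior.
    log1 = np.log(np.where(ann == 1, q_hat, 1 - q_hat)).sum(axis=1)
    log0 = np.log(np.where(ann == 0, q_hat, 1 - q_hat)).sum(axis=1)
    post1 = 1 / (1 + np.exp(log0 - log1))
    # M-step: MLE of each worker's accuracy against the soft labels.
    q_hat = (ann * post1[:, None] + (1 - ann) * (1 - post1[:, None])).mean(axis=0)

labels = (post1 > 0.5).astype(int)
acc = (labels == true).mean()
print(acc, np.round(q_hat, 2))
```

The estimated `q_hat` recovers the ordering of worker qualities without ever seeing ground truth, and the posterior-weighted aggregation beats an unweighted majority vote by down-weighting the unreliable workers.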

  13. LABELING ONCE IS OPTIMAL: BOTH IN THEORY AND PRACTICE Theorem: under a fixed budget, generalization error is minimized with a single annotation per sample. Assumptions: • The best predictor is accurate enough (under no label noise). • Simplified case: all workers have the same quality. • Prob. of a worker being correct > 83%. [Figure: MS-COCO dataset, fixed budget of 35k annotations; accuracy vs. no. of workers per sample, with single annotation beating majority rule by 5%]
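A quick way to see the trade-off: repeated labeling raises label quality with sharply diminishing returns while cutting the number of samples linearly. The 35k budget is from the slide; the 0.9 worker accuracy is an assumed value:

```python
from math import comb

def majority_accuracy(p, k):
    """Probability that the majority of k independent workers
    (each correct with probability p) gives the right label."""
    return sum(comb(k, i) * p**i * (1 - p)**(k - i)
               for i in range(k // 2 + 1, k + 1))

BUDGET, P = 35_000, 0.9
for k in (1, 3, 5):
    print(k, BUDGET // k, round(majority_accuracy(P, k), 3))
```

Tripling the annotations per sample buys roughly seven points of label accuracy but discards two thirds of the training samples; when workers are already accurate (the >83% condition above), the extra samples matter more than the cleaner labels.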

  14. DATA AUGMENTATION 1: GENERATIVE MODELING (GAN) Merits • Captures statistics of natural images. • Learnable. Peril • Feedback is real vs. fake: different from prediction. • Introduces artifacts.
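A full GAN is beyond a snippet, but the augmentation pattern itself can be shown with a class-conditional Gaussian as a stand-in generator (an assumption, not the talk's method): fit a generative model on the real data, sample synthetic examples, and train on the union.

```python
import numpy as np

rng = np.random.default_rng(3)

# Small real training set: two classes of 2-D points.
X = np.vstack([rng.normal(-1, 0.5, (20, 2)), rng.normal(1, 0.5, (20, 2))])
y = np.array([0] * 20 + [1] * 20)

def fit_generator(X, y):
    """Fit a per-class Gaussian; a GAN would play this role at scale."""
    return {c: (X[y == c].mean(axis=0), X[y == c].std(axis=0)) for c in (0, 1)}

def augment(gen, n_per_class):
    """Sample synthetic labeled examples from the fitted generator."""
    Xs, ys = [], []
    for c, (mu, sd) in gen.items():
        Xs.append(rng.normal(mu, sd, (n_per_class, 2)))
        ys.append(np.full(n_per_class, c))
    return np.vstack(Xs), np.concatenate(ys)

gen = fit_generator(X, y)
X_syn, y_syn = augment(gen, 100)
X_aug = np.vstack([X, X_syn])            # real + synthetic training set
y_aug = np.concatenate([y, y_syn])
print(X_aug.shape, y_aug.shape)
```

The "peril" on the slide applies here too: the generator only reproduces the statistics it learned, so any artifacts or bias in the fitted model are inherited by the augmented training set.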

  15. PREDICTIVE VS. GENERATIVE MODELS Predictive models learn p(y | x); generative models learn p(x | y). One model to do both? • SOTA prediction comes from CNN models. • Which class of generative models p(x | y) yields CNN models for p(y | x)?
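The bridge between the two directions is Bayes' rule, p(y | x) ∝ p(x | y) p(y). A tiny 1-D check with assumed class-conditional Gaussians:

```python
import numpy as np

prior = np.array([0.5, 0.5])        # p(y) over two classes

def likelihood(x):
    """p(x | y): unit-variance 1-D Gaussians with class means -1 and +1."""
    means = np.array([-1.0, 1.0])
    return np.exp(-0.5 * (x - means) ** 2) / np.sqrt(2 * np.pi)

def posterior(x):
    """Classification from a generative model via Bayes' rule."""
    unnorm = likelihood(x) * prior
    return unnorm / unnorm.sum()

print(np.round(posterior(0.8), 3))  # x near the class-1 mean favors class 1
```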

  16. NEURAL RENDERING MODEL (NRM) • Design joint priors for latent variables by reverse-engineering CNN predictive architectures. [Figure: generative rendering from the object category y through intermediate latent variables down to the image x]

  17. NEURAL RENDERING MODEL (NRM) [Figure: CNN inference runs image → unpooled feature map → pooled feature map → rectified feature map → class probabilities (0.5 dog, 0.2 cat, 0.1 horse, …); NRM generation runs in reverse: class (1.0 dog) → rendered template → upsampled template (choose render location) → masked template (select to render or not) → image]
