The Daala Video Codec Project: Next-next Generation Video


SLIDE 1

Mozilla & The Xiph.Org Foundation

The Daala Video Codec Project Next-next Generation Video

Timothy B. Terriberry

SLIDE 2

  • Patents are no longer a problem for free software
    – We can all go home

SLIDE 3

  • Except... not quite

SLIDE 4

Carving out Exceptions in OIN

(Table 0 contains one Xiph codec: FLAC)

SLIDE 5

Why This Matters

  • Encumbered codecs are a billion dollar toll-tax on communications
    – Every cost from codecs is repeated a million fold in all multimedia software
  • Codec licensing is anti-competitive
    – Licensing regimes are universally discriminatory
    – An excuse for proprietary software (Flash)
  • Ignoring licensing creates risks that can show up at any time
    – A tax on success

SLIDE 6

The Royalty-Free Video Challenge

  • Creating good codecs is hard
    – But we don’t need many
    – The best implementations of patented codecs are already free software
  • Network effects decide
    – Where RF is established, non-free codecs see no adoption (JPEG, PNG, FLAC, …)
  • RF is not enough
    – People care about different things
    – Must be better on all fronts

SLIDE 7

We Did This for Audio

SLIDE 8

The Daala Project

  • Goal: Better than HEVC without infringing IPR
  • Need a better strategy than “read a lot of patents”
    – People don’t believe you
    – Analysis is error-prone

  • Try to stay far away from the line, but...
  • One mistake can ruin years of development effort
  • See: H.264 Baseline

SLIDE 9

Strategy

  • Look for some elements common to broad classes of patents
    – Only need to avoid one element in a patent claim to be able to say “we don’t do that”
  • Replace with fundamentally different techniques
    – Higher risk/higher reward than incremental changes
    – Can avoid vast swaths of IPR
    – Creates new challenges others haven’t solved

  • Still have to read a lot of patents

SLIDE 10

Fundamentally Different

  • Identified four key areas we can avoid
    – “Displaced Frame Difference” (motion compensation)
    – Adaptive loop filters (deblocking)
    – Spatial prediction (“intra”)
    – Binary arithmetic coding (specifically, context modeling)

SLIDE 11

Displaced Frame Difference

  • Motion Compensation
    – Copy blocks from an already encoded frame (offset by a motion vector)
    – Subtract from the current frame
    – Code the residual

[Figure: Input ⊖ Reference frame = Residual]
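
The three steps above can be sketched in a few lines of Python. This is only an illustration of the conventional approach the slide describes, not Daala code: the function names, the list-of-rows frame layout, and the integer-pixel motion vector are all assumptions, and there is no bounds checking or sub-pixel interpolation.

```python
def motion_compensate(reference, x0, y0, size, mv):
    """Copy a size x size block from the reference frame, offset by mv = (dx, dy)."""
    dx, dy = mv
    return [[reference[y0 + dy + j][x0 + dx + i] for i in range(size)]
            for j in range(size)]


def displaced_frame_difference(current, reference, x0, y0, size, mv):
    """Subtract the motion-compensated prediction from the current block."""
    pred = motion_compensate(reference, x0, y0, size, mv)
    return [[current[y0 + j][x0 + i] - pred[j][i] for i in range(size)]
            for j in range(size)]


# Hypothetical 4x4 frames: the current frame is the reference shifted right by one pixel.
reference = [[10 * y + x for x in range(4)] for y in range(4)]
current = [[reference[y][max(x - 1, 0)] for x in range(4)] for y in range(4)]

good = displaced_frame_difference(current, reference, 1, 0, 2, (-1, 0))
bad = displaced_frame_difference(current, reference, 1, 0, 2, (0, 0))
```

With the matching motion vector the residual block is all zeros, which is why a good predictor leaves so little to code; with a zero vector every residual sample is nonzero.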

SLIDE 12

Displaced Frame Difference

  • The “displaced frame difference” (DFD) is the term of art for that residual
  • Not in and of itself patentable!
    – At least, not anymore...
  • But found as one element of nearly all patent claims on motion compensation

SLIDE 13

What We Do Instead

  • “Perceptual” Vector Quantization
  • Based on work in Opus designed to preserve energy (film grain, fine details, etc.)

SLIDE 14

Perceptual Vector Quantization

  • Separate “gain” (energy) from “shape” (spectrum)
    – Vector = Magnitude × Unit Vector (point on sphere)
  • Potential advantages
    – Can give each piece different rate allocations
      • Preserve energy (contrast) instead of low-passing
    – Free “activity masking”
      • Can throw away more information in regions of high contrast (relative error is smaller)
      • The “gain” is what we need to know to do this!
    – Better representation of coefficients
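
A minimal sketch of the gain/shape split may help here. It assumes plain Python lists for coefficient vectors, and the helper names are hypothetical; the "sign only" shape below is a deliberately crude stand-in for a real shape codebook, chosen only to show that the reconstructed energy follows the coded gain no matter how coarsely the shape is coded.

```python
import math


def gain_shape(vec):
    """Split a vector into its magnitude (gain) and a unit vector (shape)."""
    gain = math.sqrt(sum(c * c for c in vec))
    if gain == 0.0:
        return 0.0, [0.0] * len(vec)
    return gain, [c / gain for c in vec]


def reconstruct(gain, shape):
    """Renormalize a (possibly coarsely quantized) shape and scale it by the gain."""
    norm = math.sqrt(sum(c * c for c in shape))
    if norm == 0.0:
        return [0.0] * len(shape)
    return [gain * c / norm for c in shape]


coeffs = [3.0, 4.0]
gain, shape = gain_shape(coeffs)

# Illustrative coarse shape: keep only the sign of each component.
coarse_shape = [math.copysign(1.0, c) for c in shape]
decoded = reconstruct(gain, coarse_shape)
decoded_energy = math.sqrt(sum(c * c for c in decoded))
```

The decoded vector points in a slightly wrong direction, but its magnitude still matches the original gain, which is the "preserve energy instead of low-passing" property the slide refers to.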

SLIDE 15

What does PVQ have to do with DFDs?

  • Subtracting and coding a residual loses energy preservation
    – The “gain” no longer represents the energy of the original signal
  • But we still want to use predictors
    – They do a really good job of reducing what we need to code

SLIDE 16

What Does Prediction Really Do?

  • Prediction changes the probability of points near the predictor
    – Highly probable things are cheap to code
    – With DFDs, “highly probable” means “near zero”
  • Predicting gains is easy
    – Subtract gain of predictor
  • Enumerating points on a sphere near an arbitrary point (to model probabilities) is hard
    – Solution: Transform the space so we can single out points near the predictor

SLIDE 17

2-D Projection Example

[Figure: Input vector]

  • Input

SLIDE 18

2-D Projection Example

[Figure: Prediction and Input vectors]

  • Input + Prediction

SLIDE 19

2-D Projection Example

[Figure: Prediction and Input vectors]

  • Input + Prediction
  • Compute Householder Reflection

SLIDE 20

2-D Projection Example

[Figure: Prediction and Input vectors]

  • Input + Prediction
  • Compute Householder Reflection
  • Apply Reflection

SLIDE 21

2-D Projection Example

[Figure: Prediction and Input vectors, with the angle θ between them]

  • Input + Prediction
  • Compute Householder Reflection
  • Apply Reflection
  • Compute & code angle

SLIDE 22

2-D Projection Example

  • Input + Prediction
  • Compute Householder Reflection
  • Apply Reflection
  • Compute & code angle
  • Code other dimensions

[Figure: Prediction and Input vectors, with the angle θ between them]

SLIDE 23

What does this accomplish?

  • Creates another “intuitive” parameter, θ
    – “How much like the predictor are we?”
    – θ = 0 → use predictor exactly
  • Remaining N-1 dimensions are coded with VQ
    – We know their magnitude is gain*sin(θ)
  • Instead of subtraction (translation), we’re scaling and reflecting
    – Whatever else you can say, this is nothing like computing a DFD
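
The reflect-then-measure-angle steps from the 2-D example can be sketched as follows. This is an assumption-laden illustration rather than Daala's implementation: the function names are hypothetical, and reflecting the prediction onto the axis of its largest component is one reasonable construction chosen here for numerical stability. The input is assumed to be nonzero.

```python
import math


def householder_reflect(x, r):
    """Reflect x with the Householder reflection that maps prediction r onto one axis."""
    m = max(range(len(r)), key=lambda i: abs(r[i]))  # axis of largest |r[i]|
    s = 1.0 if r[m] >= 0 else -1.0
    norm_r = math.sqrt(sum(c * c for c in r))
    v = [c / norm_r for c in r]
    v[m] += s  # reflection normal; r ends up along -s * e_m
    vv = sum(c * c for c in v)
    vx = sum(a * b for a, b in zip(v, x))
    z = [xi - 2.0 * vx / vv * vi for xi, vi in zip(x, v)]
    return z, m, s


def gain_theta(x, r):
    """Return (gain, theta, remaining N-1 components) of x relative to prediction r."""
    z, m, s = householder_reflect(x, r)
    gain = math.sqrt(sum(c * c for c in x))
    cos_theta = max(-1.0, min(1.0, -s * z[m] / gain))
    theta = math.acos(cos_theta)
    rest = [c for i, c in enumerate(z) if i != m]
    return gain, theta, rest


def norm(vec):
    return math.sqrt(sum(c * c for c in vec))


same = gain_theta([3.0, 4.0], [3.0, 4.0])            # input equals the prediction
ortho = gain_theta([1.0, 0.0], [0.0, 1.0])           # input orthogonal to the prediction
g, t, rest = gain_theta([1.0, 2.0, 3.0, 4.0], [2.0, 1.0, 0.0, 3.0])
```

When the input equals the prediction, θ comes out as zero and nothing is left in the other dimensions; in general the leftover N-1 components have magnitude gain*sin(θ), as the slide states.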

SLIDE 24

And it works!

[Plot: PSNR for PVQ vs. Scalar Quantization (flat quantization, no activity masking)]
[Plot: FastSSIM for turning on activity masking]

SLIDE 25

Other Differences...

SLIDE 26

Loop Filters

  • “Loop filters” filter block edges to remove blocking artifacts
    – Adaptive: filter strength depends on the amount of difference across the block edge
    – Not invertible
  • Simple filters used in H.263 (and Theora!)
    – Very simple to keep CPU cost low
  • Since H.264 there’s been an explosion of complex filter designs
    – And patents

SLIDE 27

Lapped Transforms

  • Non-adaptive, invertible deblocking post-filter
  • Encoder applies the inverse (a blocking filter)
  • Technique dates back to the 90’s

[Figure: Prefilter stage (P) feeding the DCT blocks; IDCT blocks feeding the Postfilter stage (P⁻¹)]
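
To make "non-adaptive and invertible" concrete, here is a toy pre/post filter pair. It is not Daala's lapping filter: the 2-tap smoothing kernel, the constant K, and the function names are assumptions picked so the inverse is easy to verify by hand. The postfilter blends the two samples that straddle each block edge, and the prefilter is its exact matrix inverse.

```python
K = 0.25  # assumed blend weight; any K != 0.5 keeps the 2x2 filter invertible


def postfilter(samples, block_size):
    """Decoder side: smooth the two samples that straddle each block edge."""
    out = list(samples)
    for e in range(block_size, len(out), block_size):
        a, b = out[e - 1], out[e]
        out[e - 1] = (1 - K) * a + K * b
        out[e] = K * a + (1 - K) * b
    return out


def prefilter(samples, block_size):
    """Encoder side: exact inverse of postfilter (exaggerates edge differences)."""
    out = list(samples)
    scale = 1.0 / (1.0 - 2.0 * K)
    for e in range(block_size, len(out), block_size):
        a, b = out[e - 1], out[e]
        out[e - 1] = scale * ((1 - K) * a - K * b)
        out[e] = scale * ((1 - K) * b - K * a)
    return out


signal = [10.0, 10.0, 10.0, 10.0, 20.0, 20.0, 20.0, 20.0]
blocky = prefilter(signal, 4)
restored = postfilter(blocky, 4)
```

The prefilter widens the step at the block edge and the postfilter undoes it exactly, with no dependence on the signal content. That is the contrast with an adaptive loop filter, which cannot be inverted.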

SLIDE 28

Blocking Filter

  • Prefilter makes things blocky

SLIDE 29

Spatial (Intra) Prediction

  • Predict a block from its causal neighbors
  • Explicitly code a direction along which to copy
  • Extend boundary of neighbors into new block along this direction
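
For reference, the conventional pixel-domain scheme described above looks roughly like this for the two simplest directions. The function name, argument layout, and direction strings are hypothetical, and real codecs support many more angles; this is the technique Daala is steering away from, shown only to make the bullets concrete.

```python
def intra_predict(top, left, size, direction):
    """Extend already-decoded neighbor pixels into a size x size block."""
    if direction == "vertical":
        # Copy the row above the block straight down.
        return [list(top[:size]) for _ in range(size)]
    if direction == "horizontal":
        # Copy the column left of the block straight across.
        return [[left[j]] * size for j in range(size)]
    raise ValueError("unsupported direction: " + direction)


top = [1, 2, 3, 4]       # pixels directly above the block
left = [5, 6, 7, 8]      # pixels directly to the left of the block
vertical = intra_predict(top, left, 4, "vertical")
horizontal = intra_predict(top, left, 4, "horizontal")
```

The encoder would then signal which direction it chose and code whatever the prediction missed.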

SLIDE 30

Intra Prediction with Lapped Transforms

  • We can’t copy pixels until we undo the lapping
    – We can’t undo the lapping until we’ve predicted those pixels
  • Don’t copy pixels: copy transform coefficients
    – Currently just horizontal and vertical directions
    – Chroma (color) predicted from luma (brightness)
  • Not as good, but we try to make up for it elsewhere (e.g., lapping itself)

SLIDE 31

Binary Arithmetic Coding

  • Code only binary decisions
    – Actual cost in bits depends on probability
    – Very cheap to code 1 symbol
    – Need to code a lot of symbols (not parallelizable)
  • Probability modeling
    – Simple 1-byte lookup tables
  • Non-binary values
    – Various schemes for converting to binary decisions (“binarization”)
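
A short sketch of two of these points: the ideal cost of a binary decision is -log2 of its probability, and a non-binary value has to be turned into a string of such decisions first. Unary binarization is used here only because it is the simplest scheme; the function names and the single fixed probability are assumptions, and real coders adapt a separate probability per context.

```python
import math


def bit_cost(probability):
    """Ideal cost in bits of coding an event with the given probability."""
    return -math.log2(probability)


def unary_binarize(value):
    """Binarize a non-negative integer as `value` ones followed by a zero."""
    return [1] * value + [0]


def binarized_cost(value, p_one):
    """Total ideal cost of the binary decisions that make up `value`."""
    return sum(bit_cost(p_one if b else 1.0 - p_one) for b in unary_binarize(value))


bins = unary_binarize(3)
cost_even = binarized_cost(3, 0.5)      # every decision is a coin flip
cost_skewed = binarized_cost(0, 0.125)  # a likely outcome costs well under one bit
```

Coding the value 3 takes four sequential binary decisions, which illustrates the "need to code a lot of symbols" point.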

SLIDE 32

Non-Binary Arithmetic Coding

  • Code values with up to 16 possibilities
    – Equivalent to 4 binary decisions
    – More expensive, but not 4x more expensive
      • A lot of overheads are per-symbol
    – Effectively parallel!
  • One byte cannot model 16 probabilities
    – Use, e.g., expected value plus distribution shape (Laplace, Exponential) and compute on the fly
  • Convert things to hex, not binary!
    – Often combine multiple values into one symbol
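
One way to picture "expected value plus distribution shape, computed on the fly" is to derive a 16-entry cumulative frequency table from a single tracked mean. The sketch below assumes a geometric (discrete exponential) shape, a 15-bit frequency total, and hypothetical function names; it is not Daala's actual model. The last symbol absorbs the tail, and every symbol is forced to keep a nonzero frequency so it stays codable.

```python
def exp_cdf16(mean, total=32768):
    """Cumulative frequencies for 16 symbols under a geometric model with this mean."""
    if mean <= 0:
        raise ValueError("mean must be positive")
    r = mean / (1.0 + mean)  # geometric ratio: P(X = k) = (1 - r) * r**k
    cdf = []
    prev = 0
    for k in range(15):
        c = int(round(total * (1.0 - r ** (k + 1))))  # P(X <= k), scaled
        # Keep the table strictly increasing and leave room for later symbols.
        c = min(max(c, prev + 1), total - (15 - k))
        cdf.append(c)
        prev = c
    cdf.append(total)  # symbol 15 takes all remaining probability
    return cdf


def pack_flags(bits):
    """Combine four binary decisions into one hex symbol (0..15)."""
    if len(bits) != 4:
        raise ValueError("expected exactly four flags")
    sym = 0
    for i, bit in enumerate(bits):
        sym |= (1 if bit else 0) << i
    return sym


table = exp_cdf16(2.0)
peaky = exp_cdf16(0.001)
symbol = pack_flags([1, 0, 1, 1])
```

Only the mean needs to be stored and adapted; the 16 probabilities are regenerated when needed, and one coded symbol replaces four binary decisions.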

SLIDE 33

How Are We Doing?

SLIDE 34

PSNR-HVS-M Results on 19 Sequences

SLIDE 35

FastSSIM Results on 19 Sequences

SLIDE 36

Are We Compressed Yet?

  • https://arewecompressedyet.com/
    – Will run metrics on any git commit (we’re happy to add your repository, just ask)
    – Amazon EC2 instances, so results in a few minutes
    – Details on setup at https://wiki.xiph.org/AreWeCompressedYet

SLIDE 37

Daala Demo Pages

  • https://people.xiph.org/~xiphmont/demo/
    – Next Generation Video: Introducing Daala, Part 1
    – Introducing Daala, Part 2: Frequency Domain Intra Prediction
    – Introducing Daala, Part 3: Time/Frequency Resolution Switching
    – Introducing Daala, Part 4: Chroma from Luma
    – Daala, Part 5: Painting Images for Fun and Profit
    – Daala, Part 6: Perceptual Vector Quantization
    – Daala Progress Update 20141223: Still Images

SLIDE 38

Questions?