SLIDE 1

Introduction to CRFs Isabelle Tellier 02-08-2013

SLIDE 2

Plan

  • 1. What is annotation for?
  • 2. Linear and tree-shaped CRFs
  • 3. State of the Art
  • 4. Conclusion
SLIDE 3
  • 1. What is annotation for?

What is annotation?
– inputs can be either texts, trees, or any structure built on finite vocabulary items
– to annotate such a structure = to associate to each of its items an output label belonging to another finite vocabulary
– the structure is given and preserved

SLIDE 4
  • 1. What is annotation for?

Examples of text annotations
– POS ("part of speech") labeling: item = "word", annotation = morphosyntactic label (Det, N, etc.) in the text
– named entities (NE), IE: item = "word", annotation = type (D for Date, E for Event, P for Place...) + position in the NE (B for Begin, I for In, O for Out)

  In  2016  the  Olympic  Games  will  take  place  in  Rio  de  Janeiro
  O   DB    O    EB       EI     O     O     O      O   PB   PI   PI

– segmentation of a text into "chunks", phrases, clauses...
– segmentation of a document into sections (e.g. distinguish Title, Menus, Adverts, etc. in a Web page)
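The B/I/O position letters above make the typed entity spans recoverable from the per-token labels. A minimal sketch (pure Python; the function name `extract_entities` is ours, not from the slides) that regroups the labels of the example sentence into entities:

```python
def extract_entities(tokens, labels):
    """Regroup tokens into (type, text) entities from labels such as
    'DB' (Date-Begin), 'DI' (Date-In) and 'O' (outside).
    Assumes every 'I' label follows a 'B' or 'I' of the same type."""
    entities, current = [], None          # current = (type, [words])
    for tok, lab in zip(tokens, labels):
        if lab == "O":
            if current:
                entities.append(current)
            current = None
        elif lab.endswith("B"):           # a new entity starts here
            if current:
                entities.append(current)
            current = (lab[0], [tok])
        elif current:                     # an 'I' label: continue the open entity
            current[1].append(tok)
    if current:
        entities.append(current)
    return [(t, " ".join(words)) for t, words in entities]
```

On the slide's sentence this recovers ("D", "2016"), ("E", "Olympic Games") and ("P", "Rio de Janeiro").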

SLIDE 5
  • 1. What is annotation for?

Examples of text annotations
– text alignment for automatic translation:

  J'1 aime2 le3 chocolat4
  I1 like2 chocolate3

– correspondence matrices (marking which source and target words align: J'–I, aime–like, chocolat–chocolate) are projected into couples of annotations: each word is annotated with the indices of the words it is aligned to

SLIDE 6
  • 1. What is annotation for?

Examples of tree annotations

[figure: syntactic tree of "Sligos va prendre pied au Royaume-Uni .", with nodes SENT, NP, VN, VP, PP annotated with the functions SUJ, PRED, OBJ, MOD]

– syntactic functions, SRL (Semantic Role Labeling: agent, patient...) of a syntactic tree
– label = value of an attribute in an XML node

SLIDE 7
  • 1. What is annotation for?

Examples of tree annotations

[figure, left: an HTML tree (HTML, BODY, DIV, TABLE, TR, TD, A, SPAN, #text, @href nodes); right: the same tree labeled with editing operations]

– on the left: an HTML tree
– on the right: a labeling with editing operations
– DelN, DelST: delete a Node / a SubTree
– channel, item, title, link, description: rename a node

SLIDE 8
  • 1. What is annotation for?

Examples of tree annotations
– execution of the editing operations

[figure: the HTML tree after the editing operations, reduced to the nodes channel, item, title, link, description]

– implemented application: generation of RSS feeds from HTML pages
– other possible application: extraction of portions of Web pages

SLIDE 9
  • 1. What is annotation for?

Summary
– many tasks can be considered as annotation tasks
– for this, you need to specify:
  – the nature of the input items
  – the relationships between items: the order relations of the input structure (sequence, tree...)
  – the nature of the annotations and their meaning
  – the relationships between annotations
  – the relationships between the items and their corresponding annotations
– pre-treatments and post-treatments are often necessary

SLIDE 10

Plan

  • 1. What is annotation for?
  • 2. Linear and Tree-shaped CRFs
  • 3. State of the Art
  • 4. Conclusion
SLIDE 11
  • 2. Linear and Tree-shaped CRFs

Basic notions
– classical notations: x is the input, y its annotation (of the same structure)
– x and y are decomposed into random variables: x = {X1, X2, ..., Xn} and y = {Y1, Y2, ..., Yn}
– a graphical model defines dependencies between the random variables in a graph
– in a generative model (HMM, PCFG), there are oriented dependencies from the Yi to the Xj
– in contrast, in a discriminative model (CRF), it is possible to compute p(y|x) directly, without knowing p(x)
– learning: find the best possible parameters for p(y|x) from annotated examples (x, y) by maximizing the likelihood
– annotation: for a new x, compute ŷ = argmax_y p(y|x)

SLIDE 12
  • 2. Linear and Tree-shaped CRFs

Basic properties of CRFs
– define a non-oriented graph on the variables Yi (implicitly: every variable X is connected)
– CRFs are Markovian discriminative models: p(Yi|X) only depends on X and on the Yj (j ≠ i) such that Yi and Yj are connected
– CRFs are defined by (Lafferty, McCallum & Pereira 01):

  p(y|x) = (1/Z(x)) ∏_{c ∈ C} exp( ∑_k λk fk(yc, x, i) )

– C is the set of cliques of the graph
– yc: the values of y on the clique c
– Z(x): a normalization factor
– the fk are user-provided features
– the λk are the parameters of the model (weights for the fk)
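The definition above can be checked on a toy linear chain by computing Z(x) by brute force, i.e. summing the exponentiated scores over every possible label sequence. A minimal sketch (function names and the toy feature are ours, not from the slides); real implementations compute Z(x) by dynamic programming over the cliques instead of this exponential enumeration:

```python
import math
from itertools import product

def score(y, x, features, weights):
    """Sum of weighted feature values over all positions i.
    Each feature follows the slide's signature f(y_prev, y_cur, x, i)."""
    return sum(w * f(y[i - 1] if i > 0 else None, y[i], x, i)
               for i in range(len(x))
               for f, w in zip(features, weights))

def p_y_given_x(y, x, labels, features, weights):
    """p(y|x) = exp(score(y, x)) / Z(x), with Z(x) computed by
    enumerating all |labels|^len(x) sequences (demo only)."""
    Z = sum(math.exp(score(yp, x, features, weights))
            for yp in product(labels, repeat=len(x)))
    return math.exp(score(y, x, features, weights)) / Z

# hypothetical binary feature: current word "the" tagged Det
f1 = lambda y_prev, y_cur, x, i: 1.0 if x[i] == "the" and y_cur == "Det" else 0.0
```

By construction, the probabilities of all label sequences for a given x sum to 1, and raising λ1 pushes probability mass toward sequences that tag "the" as Det.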

SLIDE 13
  • 2. Linear and Tree-shaped CRFs

The usual graph for linear CRFs

  Y1 – ... – Yi−1 – Yi – Yi+1 – ... – YN

– the features can use any information in x combined with any information in yc
– examples of features fk(yi−1, yi, x, i) at position i:
  * fk(yi−1, yi, x, i) = 1 if xi−1 ∈ {the, a} and yi−1 = Det and yi = N, = 0 otherwise
  * fk′(yi−1, yi, x, i) = 1 if {Mr, Mrs, Miss} ∩ {xi−3, ..., xi−1} ≠ ∅ and yi = NE, = 0 otherwise
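The two example features above can be written directly as indicator functions. A sketch (function names are ours) following the slide's signature f(yi−1, yi, x, i):

```python
def f_det_n(y_prev, y_cur, x, i):
    # fires when the previous word is "the"/"a", tagged Det, and the current tag is N
    return 1 if i > 0 and x[i - 1] in {"the", "a"} and y_prev == "Det" and y_cur == "N" else 0

def f_title_ne(y_prev, y_cur, x, i):
    # fires when one of Mr/Mrs/Miss occurs among the 3 preceding words
    # and the current tag is NE
    return 1 if set(x[max(0, i - 3):i]) & {"Mr", "Mrs", "Miss"} and y_cur == "NE" else 0
```

Note that both features read the observation x freely but only touch the clique (yi−1, yi) on the label side, as the linear-chain graph requires.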

SLIDE 14
  • 2. Linear and Tree-shaped CRFs

Generate features from the labeled examples

  x : La   bonne  soupe  fume  .
  y : Det  Adj    N      V     ponct

Definition of features in software tools:
– define a pattern (any shape on x, at most clique-width on y)
– corresponding instance: f1(yi−1, yi, x, i) = 1 if (xi = La) AND (yi = Det), = 0 otherwise

SLIDE 15
  • 2. Linear and Tree-shaped CRFs

Generate features from the labeled examples

  x : La   bonne  soupe  fume  .
  y : Det  Adj    N      V     ponct

Associated feature: f2(yi−1, yi, x, i) = 1 if (xi = bonne) AND (yi = Adj), = 0 otherwise

SLIDE 16
  • 2. Linear and Tree-shaped CRFs

Generate features from the labeled examples

  x : La   bonne  soupe  fume  .
  y : Det  Adj    N      V     ponct

Associated feature: f4(yi−1, yi, x, i) = 1 if (xi−1 = La) AND (yi−1 = Det) AND (xi = bonne) AND (yi = Adj), = 0 otherwise

SLIDE 17
  • 2. Linear and Tree-shaped CRFs

Transform an HMM into a linear CRF

[figure: an HMM with states Det, Adj, N, V_intr; emissions Det: la 2/3, une 1/3; Adj: bonne 1/2, grande 1/2; N: bonne 1/3, soupe 2/3; V_intr: fume 4/5, soupe 1/5; transition probabilities 1/3, 1/3, 2/3, 2/3, 1]

– f1(yi, x, i) = 1 if yi = Det and xi = la (= 0 otherwise), λ1 = log(2/3)
– f2(yi−1, yi, x, i) = 1 if yi−1 = Det and yi = Adj (= 0 otherwise), λ2 = log(1/3) (for a missing transition, λ = −∞)
– the computation of p(y|x) is the same in both cases
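The HMM-to-CRF mapping above amounts to taking one binary feature per transition and per emission, weighted by the log of the corresponding HMM probability. A minimal sketch (the dictionary encoding is ours):

```python
import math

def hmm_to_crf_weights(transitions, emissions):
    """Map HMM probabilities to CRF feature weights: each transition
    P(y | y_prev) and emission P(w | y) becomes one binary feature whose
    weight is the log-probability; a missing transition gets -inf."""
    weights = {}
    for (y_prev, y), p in transitions.items():
        weights[("trans", y_prev, y)] = math.log(p) if p > 0 else float("-inf")
    for (y, w), p in emissions.items():
        weights[("emit", y, w)] = math.log(p) if p > 0 else float("-inf")
    return weights
```

With the slide's numbers, the feature (Det, la) gets λ = log(2/3) and the transition (Det, Adj) gets λ = log(1/3), so exponentiating and multiplying the fired weights recovers exactly the HMM's probability computation.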

SLIDE 18
  • 2. Linear and Tree-shaped CRFs

Possible graphs for trees

[figure: two possible clique graphs over the same annotated tree, with node labels ⊥, SUJ, PRED, OBJ, MOD]

SLIDE 19
  • 2. Linear and Tree-shaped CRFs

Implementations
– learning step by maximizing the (penalized) log-likelihood by gradient descent (L-BFGS):

  log( ∏_{(x,y) ∈ S} p(y|x) ) = ∑_{(x,y) ∈ S} log p(y|x) + penalty...

– annotation by Viterbi (linear chains), inside-outside (trees), message passing (general case)...
– computation in K ∗ N ∗ |Y|^c (c: the size of the largest clique)
– implementations available: Mallet, GRMM, CRFSuite, CRF++, Wapiti, XCRF (for trees of clique-width 3), Factorie
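The Viterbi decoding mentioned above can be sketched for a linear chain: instead of enumerating all |Y|^N label sequences, it keeps, for each label, the best-scoring sequence ending in that label. A minimal pure-Python version (the `local_score` callback is our simplification of the weighted feature sum ∑_k λk fk(yi−1, yi, x, i)):

```python
def viterbi(x, labels, local_score):
    """Exact argmax_y of sum_i local_score(y_prev, y, x, i) for a linear
    chain, in O(N * |Y|^2) time; y_prev is None at position 0."""
    # best[y] = (score, path) of the best sequence ending in label y
    best = {y: (local_score(None, y, x, 0), [y]) for y in labels}
    for i in range(1, len(x)):
        best = {y: max(((s + local_score(yp, y, x, i), path + [y])
                        for yp, (s, path) in best.items()),
                       key=lambda t: t[0])
                for y in labels}
    return max(best.values(), key=lambda t: t[0])[1]
```

The libraries listed above implement the same recursion (plus forward-backward sums for training) over their compiled feature tables.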

SLIDE 20

Plan

  • 1. What is annotation for?
  • 2. Linear and tree-shaped CRFs
  • 3. State of the Art
  • 4. Conclusion
SLIDE 21
  • 3. State of the Art

Use of CRFs for labeling tasks
– NE recognition (McCallum & Li, 2003)
– IE from tables (Pinto & al., 2003)
– POS labeling (Altun & al., 2003)
– shallow parsing (Sha & Pereira, 2003)
– SRL for trees (Cohn & Blunsom, 2005)
– tree transformation (Gilleron & al., 2006)
– non-linguistic uses: image labeling/segmenting, RNA alignment...

SLIDE 22
  • 3. State of the Art

Extensions about the graph
– add dependencies in the graph: skip-chain CRFs, dynamic (multi-level) CRFs...
– use CRFs for syntactic parsing (Finkel & al., 2008)
– build the tree structure of a CRF (Bradley & Guestrin, 2010)
– CRFs for general graphs (grid-shaped for images)

How to build the features
– nearly always binary
– feature induction (McCallum, 2003)
– allow the integration of external knowledge... (cf. further)
– more general features may be more effective (Pu & al., 2010)

SLIDE 23
  • 3. State of the Art

About the learning step
– unsupervised or semi-supervised CRFs (difficult, not very effective)
– add an L1 penalty to the likelihood to select the best features (Lavergne & Yvon, 2010)
– add constraints at different possible levels (features, likelihood, labels...): LREC 2012 tutorial (Druck & al., 2012)
– MCMC inference methods

SLIDE 24
  • 3. State of the Art

Linguistic interest
– sequential vs. direct complex labeling?
– how to integrate linguistic knowledge?
  – as external constraints
  – as additional labeled input data
  – as features

SLIDE 25

Plan

  • 1. What is annotation for?
  • 2. Linear and tree-shaped CRFs
  • 3. State of the Art
  • 4. Conclusion
SLIDE 26

Conclusion

Interests
– very effective for many tasks
– allow the integration of many distinct sources of information
– many available easy-to-use libraries

Weaknesses
– does not support unsupervised/semi-supervised learning well
– not very incremental
– still high learning complexity with large cliques or a large label vocabulary