Characterizing Discrepancies in Reported Acreage between the Census of Agriculture and June Agricultural Survey



slide-1
SLIDE 1

Characterizing Discrepancies in Reported Acreage between the Census of Agriculture and June Agricultural Survey

Michael E. Bellow
Heather Ridolfo

National Agricultural Statistics Service
United States Department of Agriculture

DC-AAPOR/WSS Summer Conference
Aug. 3, 2015
Washington, DC

slide-2
SLIDE 2

Outline

  • Background
  • Methods
  • Results

– Descriptive Graphics
– Logistic Regression

  • Summary and Implications
slide-3
SLIDE 3

Research Question

What factors were most influential on the large discrepancies in reported acreage operated between the 2012 JAS and COA?

slide-4
SLIDE 4

Background

  • 2007 Classification Error Survey (CES)

– Misclassification (farms classified as non-farms and vice versa)
– Substantial acreage discrepancies between Census of Agriculture (COA) and June Agricultural Survey (JAS) for land-related variables (e.g., total acres operated)

  • Re-interviews conducted on 147 operations found that acreage discrepancies were due to:

– Actual changes in acreage over period between JAS and COA
– Reporting errors
– Change in respondents

  • 2012: large acreage discrepancies found again

slide-5
SLIDE 5

Definition of Total Acres Operated

Total Acres Operated = (Acres owned) + (Acres rented/leased from others) – (Acres rented/leased to others)
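The definition above can be written as a one-line function. This is a minimal sketch; the function and argument names are illustrative assumptions, not NASS variable names, and the sample figures are hypothetical.

```python
def total_acres_operated(owned: float,
                         rented_from_others: float,
                         rented_to_others: float) -> float:
    """Owned acres plus acres rented in, minus acres rented out."""
    return owned + rented_from_others - rented_to_others


# Hypothetical operation: owns 300 acres, rents in 150, rents out 50.
print(total_acres_operated(300, 150, 50))  # 400
```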

slide-6
SLIDE 6

June Agricultural Survey (JAS)

  • Area frame based sample survey conducted annually in June
  • Sampling unit is segment (generally 1 square mile), divided into tracts
  • Data collected on U.S. crops, livestock, grain storage capacity, type and size of farm for tracts within sampled segments
  • Two-week data collection period (first half of the month)
  • Face-to-face interviewing
slide-7
SLIDE 7

Census of Agriculture (COA)

  • Complete enumeration of U.S. farms and ranches conducted every 5 years
  • Data collected on land use and ownership, operator characteristics, income, expenditures and farming practices for the previous year
  • Multiple frame (area and list)
  • Primarily mail survey
slide-8
SLIDE 8

Combined JAS/COA Data Set

  • JAS records matched to corresponding records in two COA datasets (unedited and edited)
  • Total number of matched records = 25,983
  • Some COA records were linked to multiple JAS records, each reporting data for the entire operation
  • Some JAS records were linked to multiple COA records (mainly ‘split’ operations)

slide-9
SLIDE 9

Adjusted Percent Difference (APD)

APD = 100*(COA-JAS)/(COA+100) (if COA>JAS)
    = 100*(JAS-COA)/(JAS+100) (otherwise)

Example:

COA    JAS    %Diff    APD
7      5      29       1.9
700    500    29       25
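The APD formula and the two example rows can be reproduced with a short sketch. The function names are illustrative. The slide does not define %Diff, so the sketch assumes it is the absolute difference relative to the larger of the two reports, which matches the 29 shown for both rows.

```python
def percent_diff(coa: float, jas: float) -> float:
    """Unadjusted percent difference (ASSUMED: relative to the larger report)."""
    larger = max(coa, jas)
    if larger == 0:
        return 0.0
    return 100 * abs(coa - jas) / larger


def adjusted_percent_diff(coa: float, jas: float) -> float:
    """APD from the slide: the larger report plus 100 is the denominator."""
    larger = max(coa, jas)
    return 100 * abs(coa - jas) / (larger + 100)


# The two example rows from the slide.
for coa, jas in [(7, 5), (700, 500)]:
    print(coa, jas,
          round(percent_diff(coa, jas)),
          round(adjusted_percent_diff(coa, jas), 1))
```

Adding 100 to the denominator damps the measure for very small operations: a 2-acre gap on a 7-acre report gives an APD near 2, while the same relative gap on a 700-acre report gives 25.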

slide-10
SLIDE 10

Exploratory Data Analysis

  • Records for which APD of total acres operated is 25 or higher defined to be discrepant
  • 23% of operations (nationwide) identified as discrepant
  • Dependent variable in logistic regression is binary for acreage discrepancy (1 if discrepant, 0 otherwise)
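The binary dependent variable described above can be derived from paired acreage reports as follows. This is a sketch only: the record values are hypothetical and the helper names are assumptions, while the cutoff of 25 comes from the slide.

```python
DISCREPANCY_THRESHOLD = 25.0  # APD cutoff stated on the slide


def apd(coa: float, jas: float) -> float:
    """Adjusted percent difference of total acres operated."""
    larger = max(coa, jas)
    return 100 * abs(coa - jas) / (larger + 100)


def is_discrepant(coa: float, jas: float,
                  threshold: float = DISCREPANCY_THRESHOLD) -> int:
    """Return 1 if the record is discrepant (APD >= threshold), else 0."""
    return int(apd(coa, jas) >= threshold)


# Hypothetical (COA, JAS) total-acres pairs, not taken from the study data.
records = [(7, 5), (700, 500), (120, 118), (40, 400)]
flags = [is_discrepant(coa, jas) for coa, jas in records]
print(flags)                           # [0, 1, 0, 1]
print(100 * sum(flags) / len(flags))   # 50.0 (percent discrepant)
```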

slide-11
SLIDE 11

Explanatory Variables

  • Farm type (crop vs. livestock)
  • Land rented from others (acres)
  • Land rented to others (acres)
  • Number of operators
  • Operator tenure (years operating farm)
  • Average drought level during JAS (county level)
  • Mode of COA data collection (face-to-face, CATI, etc.)
  • Time between JAS and COA (days)
slide-12
SLIDE 12

Drought Intensity Data Set

  • Obtained from Univ. of Nebraska’s National Drought Mitigation Center (NDMC)
  • Drought Monitor Classification Scheme (DMCS):

– six levels of drought ranging from ‘none’ to ‘exceptional’, recorded weekly at county level from May 29 – June 25, 2012
– data sets give percent of county’s area classified to each drought level
– overall county level average drought level computed from data
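The slides do not say how the county-level average was computed. One plausible reading, sketched below purely as an assumption, codes the six levels as 0 (‘none’) through 5 (‘exceptional’), takes an area-weighted mean level for each week, and then averages across weeks. The weekly percentages are hypothetical.

```python
from statistics import mean


def weekly_drought_level(pct_by_level: list[float]) -> float:
    """Area-weighted mean level for one week.

    pct_by_level[k] is the percent of county area at level k,
    with k = 0 ('none') ... 5 ('exceptional'). ASSUMED coding.
    """
    total = sum(pct_by_level)
    if total == 0:
        raise ValueError("percentages sum to zero")
    return sum(level * pct for level, pct in enumerate(pct_by_level)) / total


def average_drought_level(weeks: list[list[float]]) -> float:
    """Mean of the weekly area-weighted levels (ASSUMED aggregation)."""
    return mean(weekly_drought_level(week) for week in weeks)


# Hypothetical county: drought spreading and intensifying week over week.
weeks = [
    [100, 0, 0, 0, 0, 0],
    [50, 50, 0, 0, 0, 0],
    [0, 50, 50, 0, 0, 0],
    [0, 0, 50, 50, 0, 0],
]
print(average_drought_level(weeks))  # 1.125
```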
slide-13
SLIDE 13

Effect of Data Editing

Data Set                      JAS/Unedited COA    JAS/Edited COA
No. Records                   25,983              25,983
Discrepant Records            6,601 (25.4%)       5,958 (22.9%)
Discrepant Records Edited     1,351 (20.5%)       -
Discrepancies Resolved        -                   745 (55%)
Non-Discrepancies Broken      -                   102 (11%)
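The counts in this table reconcile with simple arithmetic: the unedited discrepant count, minus discrepancies resolved by editing, plus non-discrepancies broken by editing, equals the edited discrepant count. The sketch below checks this. The denominators used for the percentages are inferences that happen to reproduce the reported figures; the slide does not state them, and the base for the 11% is not recoverable from the slide.

```python
matched_records = 25_983
discrepant_unedited = 6_601
discrepant_edited = 5_958
discrepant_records_edited = 1_351
discrepancies_resolved = 745
non_discrepancies_broken = 102


def pct(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Net effect of editing on the discrepant count.
net = discrepant_unedited - discrepancies_resolved + non_discrepancies_broken
assert net == discrepant_edited

print(pct(discrepant_unedited, matched_records))            # 25.4
print(pct(discrepant_edited, matched_records))              # 22.9
print(pct(discrepant_records_edited, discrepant_unedited))  # 20.5
print(pct(discrepancies_resolved, discrepant_records_edited))  # 55.1
```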

slide-14
SLIDE 14

Preliminary Findings (From Exploratory Data Analysis)

Independent Variable             More Discrepancy If:
Farm type (crop or livestock)    Livestock farm
Number of operators              Multiple operators
Operator tenure                  Newer operators
Drought level during JAS         Higher drought level
Mode of COA data collection      Phone/CATI
Time between JAS and COA         Longer time
slide-15
SLIDE 15
slide-16
SLIDE 16
slide-17
SLIDE 17
slide-18
SLIDE 18
slide-19
SLIDE 19
slide-20
SLIDE 20

Logistic Regression

  • Goal – model probability of discrepancy as function of independent (explanatory) variables
  • Wald Chi-Square Statistic – used to test whether regression parameter estimate for a given independent variable is significantly different from zero
  • Odds Ratio – measures strength of association between dependent variable and a given independent variable
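For a single coefficient, both quantities follow directly from the estimate and its standard error. The sketch below uses the standard textbook formulas (Wald statistic as the squared ratio of estimate to standard error, odds ratio as the exponentiated estimate); the coefficient and standard error are hypothetical, and the slides do not say which software or interval method the authors used.

```python
import math


def wald_summary(beta: float, se: float, z_crit: float = 1.96) -> dict:
    """Wald chi-square (1 df), p-value, odds ratio and Wald-type 95% CI."""
    wald = (beta / se) ** 2
    # Upper tail of a chi-square with 1 df equals erfc(sqrt(w / 2)).
    p_value = math.erfc(math.sqrt(wald / 2))
    return {
        "wald_chi_square": wald,
        "p_value": p_value,
        "odds_ratio": math.exp(beta),
        "ci_95": (math.exp(beta - z_crit * se), math.exp(beta + z_crit * se)),
    }


# Hypothetical estimate, not taken from the study.
result = wald_summary(beta=0.2, se=0.1)
for name, value in result.items():
    print(name, value)
```

An odds ratio above 1 means the odds of a discrepancy rise with the variable; a confidence interval that contains 1 corresponds to a non-significant Wald test at the matching level.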

slide-21
SLIDE 21

Results of Logistic Regression

                        Wald Test                  Odds Ratio
Independent Variable    Chi-Square    P-Value      Value    95% Confidence Interval
Livestock Farm*         20.1          <.0001       1.148    [1.081-1.219]
No. Operators           2.6           0.11         1.027    [0.994-1.06]
Operator Tenure         2.19          0.14         1.002    [1.0-1.004]
Avg. Drought Level      86.1          <.0001       1.137    [1.107-1.169]
Mode = Phone/CATI*      6.9           0.009        1.198    [1.047-1.371]
Mode = EDR (Web)*       0.31          0.58         0.973    [0.883-1.072]
Mode = FTF/CAPI*        0.26          0.61         0.963    [0.832-1.114]
Days (JAS to COA)       12.0          0.0005       1.001    [1.001-1.002]

* - binary variable
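As a rough consistency check on the table, the Wald statistic can be approximately recovered from a reported odds ratio and its interval. The sketch assumes the intervals are Wald-type, that is exp(estimate ± 1.96 × SE), which the slides do not state, and the published values are rounded, so only approximate agreement should be expected.

```python
import math


def implied_wald(odds_ratio: float, ci_low: float, ci_high: float,
                 z_crit: float = 1.96) -> float:
    """Wald chi-square implied by an odds ratio and a Wald-type 95% CI."""
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    return (beta / se) ** 2


# (odds ratio, CI low, CI high, reported chi-square) from the table.
rows = {
    "Livestock Farm": (1.148, 1.081, 1.219, 20.1),
    "Avg. Drought Level": (1.137, 1.107, 1.169, 86.1),
    "Mode = Phone/CATI": (1.198, 1.047, 1.371, 6.9),
}
for name, (odds, low, high, reported) in rows.items():
    print(f"{name}: implied {implied_wald(odds, low, high):.1f}, "
          f"reported {reported}")
```

The method is too sensitive to rounding for rows whose odds ratio is very close to 1, such as Operator Tenure and Days (JAS to COA), so those rows are left out.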

slide-22
SLIDE 22

Summary and Future Work

  • Six explanatory variables found to be significant in logistic regression based on Wald chi-square test
  • Of those variables, livestock farm, average drought level and phone/CATI showed most influence in terms of which farms have discrepancies and which do not
  • Next phase of research effort

– explore explanatory variables further
– probe largest outliers
– investigate odd patterns (e.g., 60+ records with COA total land = 1, JAS total land > 100)
– data mining techniques (classification trees, clustering)?
slide-23
SLIDE 23

Acknowledgments

  • Denise Abreu
  • Mark Apodaca
  • Mark Gorsak
  • Noemi Guindin
  • Thomas Jacob
  • Andrea Lamas
  • Jaki McCarthy
slide-24
SLIDE 24

Questions/Comments?

Michael E. Bellow, USDA/NASS
Sampling and Estimation Research Section
Mike.Bellow@nass.usda.gov

Heather Ridolfo, USDA/NASS
Survey Methodology and Technology Section
Heather.Ridolfo@nass.usda.gov