Using Stata for data management and reproducible research
Christopher F Baum
Boston College and DIW Berlin
NCER, Queensland University of Technology, March 2014
Christopher F Baum (BC / DIW) Using Stata NCER/QUT, 2014 1 / 138
Overview of the Stata environment
Overview of the Stata environment Portability
Overview of the Stata environment Stata’s user interface
Overview of the Stata environment Using the Do-File Editor
Overview of the Stata environment The help system
Overview of the Stata environment Stata’s update facility
Overview of the Stata environment Extensibility
Working with the command line Stata command syntax
Working with the command line Command template
Working with the command line The varlist
Working with the command line The exp clause
Working with the command line The if and in clauses
Working with the command line The using clause
Working with the command line The options clause
Working with the command line Programmability of tasks
Working with the command line Local macros and scalars
Working with the command line forvalues and foreach
Working with the command line Prefix commands
Working with the command line The by prefix
Data management: principles of organization and transformation Missing values
Data management: principles of organization and transformation Missing data handling
Data management: principles of organization and transformation Display formats
format varname %9.2f
format date %tm
Data management: principles of organization and transformation Variable labels
label variable varname "text"
Data management: principles of organization and transformation Value labels
label define sexlbl 0 male 1 female
label values sex sexlbl
Data management: principles of organization and transformation Generating new variables
generate large = (pop > 5000000) & !mi(pop)
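The !mi(pop) term matters because Stata treats numeric missing values as larger than any number, so pop > 5000000 on its own evaluates to true when pop is missing. A minimal sketch of the effect, assuming the auto dataset that ships with Stata (it is not part of the original slides), whose rep78 variable contains some missing values; highrep is a hypothetical variable name:

```stata
sysuse auto, clear

* Missing values sort above every number, so this count includes them
count if rep78 > 3

* Excluding missing values gives the count that was probably intended
count if rep78 > 3 & !mi(rep78)

* Alternative: leave the indicator missing, rather than 0, where rep78 is missing
generate byte highrep = (rep78 > 3) if !mi(rep78)
tabulate highrep, missing
```

Whether the indicator should be 0 or missing for observations with missing data depends on the analysis; the & form yields 0, the if form yields missing.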
Data management: principles of organization and transformation Generating new variables
generate raceid = .
if (race == "Black") replace raceid = 2
else if (race == "White") replace raceid = 3
Data management: principles of organization and transformation Generating new variables
generate raceid = 2 if race == "Black"
replace raceid = 3 if race == "White"
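The if command, written before a command, evaluates its expression once, using the first observation, whereas the if qualifier, written after a command, is evaluated for every observation. A minimal sketch of the difference, assuming a three-observation toy dataset and the hypothetical variable names raceid1 and raceid2, none of which appear in the original slides:

```stata
clear
input str5 race
"White"
"Black"
"White"
end

* if command: race is read as race[1] ("White"), so the test is false
* and no observation is changed
generate raceid1 = .
if (race == "Black") replace raceid1 = 2

* if qualifier: the condition is checked observation by observation
generate raceid2 = 2 if race == "Black"
replace raceid2 = 3 if race == "White"

list race raceid1 raceid2
```

In the listing, raceid1 stays missing throughout, while raceid2 is coded for every observation.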
Data management: principles of organization and transformation Functions for generate, replace
generate byte newengland = ///
    inlist(state, "CT", "ME", "MA", "NH", "RI", "VT")
generate byte middleage = inrange(age, 35, 49)
generate byte middleage = inrange(age, 35, 49) if !mi(age)
Data management: principles of organization and transformation Functions for generate, replace
generate ind2d = int(SIC/100)
generate code34 = mod(SIC, 100)
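As a worked illustration of these two functions, assume a hypothetical four-digit SIC code of 3711 (the value is chosen only for illustration): dividing by 100 and truncating keeps the first two digits, and the remainder after division by 100 keeps the last two.

```stata
* int() truncates toward zero: 3711/100 = 37.11, so this displays 37
display int(3711/100)

* mod() returns the remainder: 3711 - 37*100, so this displays 11
display mod(3711, 100)
```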
Data management: principles of organization and transformation Functions for generate, replace
generate endqtr = cond( mod(month, 3) == 0, ///
    "Filing month", "Non-filing month")
Data management: principles of organization and transformation Functions for generate, replace
. use census2c
. generate size=irecode(pop, 1000, 4000, 8000, 20000)
. label define popsize 0 "<1m" 1 "1-4m" 2 "4-8m" 3 ">8m"
. label values size popsize
. tabstat pop, stat(mean min max) by(size)

Summary for variables: pop
     by categories of: size

    size |      mean       min       max
---------+------------------------------
     <1m |   744.541   511.456   947.154
    1-4m |   2215.91   1124.66  3107.576
    4-8m |  5381.751   4075.97  7364.823
     >8m |  12181.64  9262.078  17558.07
---------+------------------------------
   Total |  5142.903   511.456  17558.07
Data management: principles of organization and transformation String-to-numeric conversion and vice versa
. encode state, gen(stid)
. list state stid, sep(0)

             state            stid
  1. Massachusetts   Massachusetts
  2. New Hampshire   New Hampshire
  3.       Vermont         Vermont
  4.    New Jersey      New Jersey
  5.      Michigan        Michigan
  6.       Arizona         Arizona
  7.        Alaska          Alaska

. summarize stid

    Variable |  Obs   Mean   Std. Dev.   Min   Max
-------------+------------------------------------
        stid |    7      4    2.160247     1     7
Data management: principles of organization and transformation The egen command
egen newvar = zap(oldvar)
Data management: principles of organization and transformation The egen command
. use census2c
. bysort size: egen avgpop = mean(pop)
. generate popratio = 100 * pop / avgpop
. format popratio %7.2f
. list state pop avgpop popratio if size == 0, sep(0)

             state     pop    avgpop   popratio
  1.  Rhode Island   947.2   744.541     127.21
  2.       Vermont   511.5   744.541      68.69
  3.                 652.7   744.541      87.67
  4.                 690.8   744.541      92.78
  5. New Hampshire   920.6   744.541     123.65
Data management: principles of organization and transformation Time series calendar
Data management: principles of organization and transformation Time series operators
webuse lutkepohl, clear
regress consumption L(1/4).consumption
regress consumption L(-4/4).income
regress D.consumption L.consumption
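The lag and difference operators in these commands are shorthand that Stata expands at run time. A minimal sketch of the longhand equivalents, assuming the same dataset and variable names as above and that the data are already declared as time series; dcons is a hypothetical variable name introduced only for illustration:

```stata
webuse lutkepohl, clear

* L(1/4).consumption stands for the first through fourth lags
regress consumption L1.consumption L2.consumption ///
    L3.consumption L4.consumption

* D.consumption is the first difference, consumption minus its first lag
generate double dcons = consumption - L.consumption
regress dcons L.consumption
```

Because the operators respect the declared time variable, a lag across a gap in the calendar is set to missing rather than taken from the previous row.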
Reading external data import delimited
import delimited using filename
Reading external data import excel
import excel using weo_201204_FR.xls, firstrow clear
Reading external data import excel
import excel using weo_201204_FR.xls, describe
import excel iso year NGDPPC PCPI using weo_201204_FR.xls, cellrange(A2:AW39) clear
Reading external data infile
infile price mpg displacement using auto
Reading external data infile
infile str3 country price mpg displacement using auto2
Reading external data Stat/Transfer
Writing external data
Writing external data export delimited and file
Writing external data export excel
Writing external data postfile and post
Combining data sets append
Combining data sets merge
Combining data sets Match merge
Reconfiguring data sets
Reconfiguring data sets collapse
Reconfiguring data sets reshape