Measuring Poverty and Wellbeing in Developing Countries

Channing Arndt and Finn Tarp

Print publication date: 2016

Print ISBN-13: 9780198744801

Published to Oxford Scholarship Online: January 2017

DOI: 10.1093/acprof:oso/9780198744801.001.0001



Appendix B User Guide to Estimating First-Order Dominance Software (EFOD)


Source:
Measuring Poverty and Wellbeing in Developing Countries
Author(s):

Channing Arndt

Kristi Mahrt

Publisher:
Oxford University Press

B.1 Introduction

This user guide presents the Estimating First-Order Dominance (EFOD) software developed to implement the first-order dominance (FOD) approach presented in Arndt et al. (2012). The FOD approach is a straightforward method of conducting multidimensional welfare comparisons between population groups based on a set of binary welfare indicators where individuals or households are either deprived or not deprived in each welfare dimension. The approach imposes no restrictions on the nature of the social welfare function or on the relative importance of each indicator. Rather, it relies simply on the idea that it is better to be not deprived than deprived in any indicator. FOD is well suited for comparing welfare performance across time and space. Binary FOD indicators can be defined directly from the non-monetary welfare modules of censuses or household surveys by specifying thresholds that distinguish between outcomes that are considered deprivations and those that are not. Thus, the method is highly relevant to policy, as FOD indicators and indicator thresholds can be chosen to correspond to specific public policy goals.

EFOD comprises a series of Stata and GAMS code files that perform four key steps: dataset preparation and software initialization, bootstrap sampling, FOD comparisons, and processing results. The code is flexible: it allows FOD to be implemented for up to seven binary welfare indicators across multiple time periods, levels of area aggregation, and population groups. Populations are specified in terms of areas and time periods. Areas may be specified for multiple aggregate levels such as the nation, urban/rural areas, provinces, or regions. The data can be classified into multiple population groups, referred to as categories. Categories might include groupings such as households, children, or women.

This chapter outlines the technical aspects of implementing the EFOD software. Refer to Chapter 3 for a presentation of the FOD methodology and Chapter 4 for a discussion of applying the FOD methodology in practice. Section B.2 outlines data and software requirements. Section B.3 presents a step-by-step overview of the code stream, including required inputs. Finally, section B.4 discusses possibilities of extending FOD comparisons beyond areas within a single country.

B.2 Requirements

B.2.1 Software

The EFOD package is implemented in both Stata and GAMS. While an intermediate skill level in Stata is necessary, only a basic understanding of GAMS is needed. The Stata code was produced using Stata 12; however, the code will run in Stata 11 or higher.1 The GAMS code will run on versions 22.7 and later.2

B.2.2 Data

Table B1. Incoming data

| Variable | Description | Notes |
|---|---|---|
| ID (optional) | Household or individual ID | Not required |
| Time period | Survey or census year (numeric) | When analysing only one survey year, the time period variable can be created within EFOD. |
| Category | Population groups (numeric) | When analysing one population group, the category variable can be created within EFOD. |
| Indicators | Welfare indicators coded such that 0 = deprived, 1 = not deprived | Up to seven indicators may be defined for each category. |
| Area aggregates | A separate variable must be included for each area aggregate, e.g. urban/rural, states, zones, etc. | Area values should be numeric but do not need to be consecutive. The national aggregate variable can be created within EFOD. |
| Weights | Appropriate weights for each category | For example, in a survey: household weight = hhsize * sample weight; individual weight = sample weight. If the data contains both household and individual categories, the weight should be appropriately specified for each category. |
| Gender (optional) | Gender of the household head or gender of the individual (numeric) | The gender variable is used to produce descriptive statistics and is not used in FOD comparisons. |

Source: See text

The EFOD code begins with the Stata dataset FOD_input.dta, which contains the FOD indicators and other variables described in Table B1. The dataset must be structured with one observation per unit of analysis (household and/or individual). No observation can have a missing value in any of the FOD indicators. See Chapter 14 for a discussion of how this requirement can influence indicator choices. If multiple survey years are included, the years are stacked with the time period variable distinguishing the years. Similarly, if multiple population categories are included, categories are stacked with a category variable distinguishing each population.
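The requirement that every indicator is present and binary can be checked before the data reach EFOD. The following is a minimal sketch in Python (not part of EFOD, which is written in Stata and GAMS); the function name and the sample records are hypothetical, and only the variable names d1, d2, t, and c follow Table B3.

```python
# Illustrative pre-check of records destined for FOD_input.dta.
# check_indicators and the sample rows are hypothetical, not EFOD code.

def check_indicators(rows, indicators):
    """Return (row_index, indicator) pairs that are missing or not coded 0/1."""
    problems = []
    for i, row in enumerate(rows):
        for name in indicators:
            if row.get(name) not in (0, 1):
                problems.append((i, name))
    return problems

rows = [
    {"t": 1, "c": 1, "d1": 1, "d2": 0},
    {"t": 1, "c": 1, "d1": None, "d2": 1},  # missing indicator: not allowed
    {"t": 2, "c": 1, "d1": 0, "d2": 2},     # value outside {0, 1}
]

problems = check_indicators(rows, ["d1", "d2"])
print(problems)
```

An empty list would indicate that every observation satisfies the no-missing-values requirement for the listed indicators.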

When choosing area aggregates and population categories, it is useful to consider how EFOD makes welfare comparisons. Spatial FOD comparisons are made between all areas for each time period, population category, and bootstrap iteration. Temporal FOD comparisons are conducted between time periods for a given area, population category, and bootstrap iteration. FOD comparisons are never made between population categories; rather, the software is capable of making spatial and temporal comparisons independently for different categories in a single execution of EFOD.

Consideration of sample sizes is crucial in choosing areas of aggregation, population categories, and the number of FOD indicators. The smallest area evaluated must be no smaller than the area for which the survey is designed to be statistically representative. Following the survey structure will most likely ensure adequate sample sizes for households, but not necessarily for population groups. Samples in each area, for each category, in each year should be no smaller than approximately 400 households or individuals. Furthermore, samples are divided into subsamples of households or individuals falling into each combination of welfare outcomes; the number of FOD indicators must therefore also be balanced against sample size.
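The trade-off between the number of indicators and sample size follows from simple arithmetic: k binary indicators produce 2^k possible combinations of outcomes. The sketch below, in Python and purely illustrative (the function name is hypothetical), shows how thinly a sample of 400 is spread on average as k grows.

```python
# Illustrative arithmetic only: average observations per welfare combination.

def average_cell_size(sample_size, n_indicators):
    """Return (number of combinations, average observations per combination)."""
    combos = 2 ** n_indicators
    return combos, sample_size / combos

for k in (3, 5, 7):
    combos, per_cell = average_cell_size(400, k)
    print(k, "indicators:", combos, "combinations,", per_cell, "observations each on average")
```

Actual cell sizes will be far less even than this average suggests, since some combinations are rare.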

B.2.3 Directories

Table B2. EFOD folders

| Subfolders | Initial contents | EFOD created files |
|---|---|---|
| new | all Stata do-files and GAMS code files | |
| work | FOD_input.dta | intermediate datasets |
| out | initially empty | output tables |
| rep | initially empty | logs from each code file |
| in (optional) | raw data | |

Source: See text

The code stream requires a specific directory structure. The subdirectories of EFOD and their functions are described in Table B2. The subdirectory new contains the EFOD code stream. The user must create and transfer the input dataset FOD_input.dta to the work subdirectory.3 The code stream creates rep and out as necessary.

B.3 EFOD Code Stream

EFOD comprises four parts. The files are numbered in the order they are called in the code stream. The first set of Stata do-files prepares the data, defines global macros that allow the remaining do-files to run without modification, and produces descriptive tables (000_master.do to 018_Table_descriptive2.do). The second set of do-files conducts the bootstrap sampling and transforms the data into shares of the population falling into each combination of welfare indicator outcomes (020_boot_control.do to 024_Table_shares.do). The third set of files contains batch files and the GAMS code that conducts the FOD linear program (030_FOD_base.bat to 038_temporal.inc). The final set of do-files processes the FOD outcomes to produce a collection of spatial, temporal, and ranking tables (040_FOD_data.do to 046_Table_Rank.do).

The remainder of this section discusses the code stream file by file. For ease of reference, each subsection begins with a list of relevant Stata and GAMS files as well as an overview of required modifications. The code is structured such that the user must make very few modifications within the main code stream. All required modifications occur in the first three do-files (000_master.do, 010_data.do, 012_initialization.do) and in the GAMS control files (030_FOD_base.gms, 030_FOD_base.bat). The entire code stream, including the GAMS code, can be run from 000_master.do. Once the initial set of do-files is executed, most do-files may be run individually.

B.3.1 Initialization

000_master.do, 010_data.do, 012_initialization.do, 013_global_reset.do, 014_globals.do, 016_Table_descriptive1.do, 018_Table_descriptive2.do.

  • Modify 000_master.do to define the global macros path and cty.

  • Modify 010_data.do to define variables described in Table B3.

  • Modify 012_initialization.do to define global macros described in Table B4.

B.3.1.1 Master Stata Do-File

Once all required modifications are complete, the entire code stream, including GAMS files, can be executed from the master do-file, 000_master.do. As this file lists each Stata and GAMS file in the order they are executed, 000_master.do also serves as a table of contents of all do-files and their functions. To get started, the global path must be defined to point to the EFOD directory. The global cty is used to identify the country or survey and can be set as desired. This master file provides the option to pause the code stream three times to verify that global names and indicators are specified correctly. In the first execution of EFOD with new data or new initialization values, this feature is recommended and the line ‘pause on’ should be activated. In subsequent runs, this feature may not be useful and could be changed to ‘pause off’.

B.3.1.2 Input Datasets

Table B3. Variables created in 010_data.do

| Description | Variable | Examples | Values | Notes |
|---|---|---|---|---|
| Time periods | t | years | 1, 2, 3,… Must use consecutive values starting with 1. | Denotes the survey or census year. |
| Population categories | c | households, women, children | 1, 2, 3,… Must use consecutive values starting with 1. | Denotes subgroups in the population to be analysed independently. |
| Welfare indicators | d1, d2, d3,…d7 | water, sanitation, housing, education, health | 0 = deprived, 1 = not deprived | Welfare indicators may vary by category. |
| Survey strata | strata1, strata2,… | urban/rural, regions, provinces | Numeric; need not be consecutive. | Define the survey strata variable(s). |
| Survey cluster | cluster | primary sampling unit | Numeric; need not be consecutive. | Define the survey cluster variable. This variable is not used with census data. |
| Aggregate areas | area1, area2, area3… | nation, urban/rural, region, provinces | Numeric; need not be consecutive (e.g. 1 = urban, 2 = rural). | Define the aggregate areas to be analysed in the FOD comparisons. |
| Sample weights | weight | weight * hhsize, weight | Numeric | Weights may vary with household (weight * hhsize) and individual categories (weight). In a full census the individual weight is 1 and the household weight is hhsize. |
| Gender (optional) | gender | gender of hh head, gender of individual | 0 = male, 1 = female | Used in descriptive statistics but not in FOD comparisons. |

Source: See text

The do-file, 010_data.do, transforms the incoming dataset, FOD_input.dta, to conform to variable formats used in the code stream and saves a new dataset, FOD_data_$cty.dta. The file should be modified as needed to ensure that FOD_data_$cty.dta has the proper format. Refer to Table B3 for a description of possible modifications. The extent to which this do-file must be modified depends on the state of FOD_input.dta. It is useful to note that it is possible to define a different set of indicators for each population category. Population categories may include groups of households or individuals and therefore it may be necessary to define weights differently by category. For example, when working with survey data, the appropriate weight for individuals is the sample weight, whereas the sample weight multiplied by household size may be preferred for households.

B.3.1.3 Global Macros

Table B4. Globals specified in 012_initialization.do

| Description | Global | Purpose | Values |
|---|---|---|---|
| Bootstrap iterations | its | Set the number of bootstrap iterations. The default value is 100. | numeric |
| Naming variables in the tables | yearlist | Specify how years will be called in the tables. Because years are numbered, this global gives the years a name for tables. The years must be listed in the order of the year variable, t. | e.g. 1998, 2000, 2010… |
| | catlist | Specify how categories will be called in the tables. Because categories are numbered, this global gives the categories a name for tables. The category names must be listed in the order of the category variable, c. | e.g. households, women, children |
| | deplist1, deplist2,… | Specify how indicators will be called in the tables. Because indicators are numbered, this global gives the indicators a name for tables. A different deplist must be specified for each category even if the indicators are the same for every category. The indicator names must be listed in the order of the indicator variables d1,…, d7. | e.g. water, housing, education |
| | areaname1, areaname2, areaname3 | Corresponding to the area aggregation variables, specify how each area will be called in the tables. For example, if the aggregation is the nation, this global will list Nation. If the aggregation is province, the global will list all province names. The order of area names must correspond to the numeric sequence of areas within area variables. | Nation; Urban Rural; Western, Northern, Central… |
| Areas to be included in each FOD table | rankkeeplist | List the area aggregates by their globals to indicate which levels will be included in the FOD rank table. | $area1, $area2, $area3… |
| | shkeeplist | List the area aggregates by their globals to indicate which levels will be included in the shares of welfare combinations tables. | $area1, $area2, $area3… |
| | spatkeeplist | List the area aggregates by their globals to indicate which levels will be included in the spatial FOD tables. | $area1, $area2, $area3… |
| | tempkeeplist | List the area aggregates by their globals to indicate which levels will be included in the temporal FOD tables. | $area1, $area2, $area3… |
| Descriptive 1 table | urban | Specify the area variable that defines urban/rural. | area2, area3… |
| | gender_switch | Specify if conducting gender analysis. | 0 = no gender, 1 = gender |
| Survey/census structure | datatype | For bootstrapping, specify whether the data is from a survey or a census. | 1 = survey, 2 = census |
| | stratalist1, stratalist2 | For each year, 1, 2,…, list the survey strata specified by the variables strata1, strata2,…. Multiple strata may be listed for each year. This variable is used with census data to guide bootstrap sampling and is determined by the user. | strata1, strata2… |
| | minstrata | If using census data, a minimum sample size can be set to force bootstrap samples of each stratum to be the same size. This global is optional. | no greater than the population of the smallest stratum |
| GAMS processors | GAMS | Specify the number of processors that will be used running FOD in GAMS. | 1–4 |

Source: See text

The initialization file, 012_initialization.do, defines global macros used throughout the code stream. The user must carefully specify global macros to allow the remaining do-files to run without further modification. Global macros are described in Table B4.

The global macros used to define area names, categories, years, and deprivations must be set with care. It is particularly important for the names to be listed in sequential order exactly corresponding to the numeric order of the relevant variable values.4 For instance, suppose there are three population categories with the category variables valued 1, 2, 3, where 1 refers to households, 2 refers to children, and 3 refers to women. Then the category global must be specified in the corresponding order:

global catlist households children women

The do-file 014_globals.do automatically generates additional globals. This file does not need to be modified. In addition, this file saves a log that lists variable values with the corresponding names specified in the globals yearlist, catlist, areaname1, areaname2,…, and deplist. It is advisable to view these lists in the results window, if ‘pause on’ is activated, or in the log file, rep/014_globals.log, to verify that the naming globals are properly specified.
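The positional pairing of names and numeric codes can be illustrated outside Stata. The sketch below is in Python and is not EFOD code; the function label_codes is hypothetical and simply mimics the kind of name-to-value list that the text says is written to rep/014_globals.log.

```python
# Illustrative only: pair a name list with consecutive codes 1, 2, 3, ...
# label_codes is a hypothetical helper, not part of EFOD.

def label_codes(names, codes):
    """Map codes 1..n to names by position; reject non-consecutive codes."""
    expected = list(range(1, len(names) + 1))
    if sorted(set(codes)) != expected:
        raise ValueError("codes must be exactly 1..%d" % len(names))
    return dict(zip(expected, names))

catlist = "households children women".split()
labels = label_codes(catlist, codes=[1, 2, 3, 1, 2])
print(labels)
```

Listing the names in a different order would not raise an error, which is why inspecting the resulting pairs is worthwhile: the tables would simply carry the wrong labels.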

B.3.2 Descriptive Statistics

016_Table_descriptive1.do and 018_Table_descriptive2.do.

Descriptive statistics do-files generate sample sizes and weighted means of the welfare indicators. A comma-separated text file is created for each population category. Indicator means are interpreted as the share of the population not deprived in each indicator. The first set of tables generates means and sample sizes for the nation, urban and rural areas, and if specified, gender. The second set of tables generates means and sample sizes for every area included in the analysis, with the areas organized by area aggregates.

Weighted means are produced early in the code stream, allowing analysts to scrutinize the data before moving forward with bootstrapping and FOD comparisons. When ‘pause on’ is activated, Stata pauses at the end of 016_Table_descriptive1.do and 018_Table_descriptive2.do, allowing the user to examine means in the results window or text files before proceeding.
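As an illustration of what a weighted indicator mean represents, the following Python sketch (not EFOD code) computes the share of the population not deprived in one indicator for a household category, using the household weight from Table B1 (hhsize * sample weight). All numbers are invented for the example.

```python
# Illustrative only: weighted mean of a binary indicator (1 = not deprived).

def weighted_mean(values, weights):
    """Weighted mean of values; with 0/1 values this is the share not deprived."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

d1 = [1, 0, 1]                  # hypothetical indicator values for three households
sample_weight = [100, 200, 100] # hypothetical survey sample weights
hhsize = [4, 5, 2]              # hypothetical household sizes

# Household-category weight as described in Table B1.
hh_weight = [s * n for s, n in zip(sample_weight, hhsize)]

share_not_deprived = weighted_mean(d1, hh_weight)
print(share_not_deprived)
```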

B.3.3 Bootstrapping and Compiling Shares

020_boot_control.do, 022_shares.do, 024_Table_shares.do.

The do-files 020_boot_control.do and 022_shares.do work together to conduct bootstrap sampling and generate shares of an area’s population that fall into each combination of welfare indicators for every category, year, and iteration. 022_shares.do is executed from 020_boot_control.do and cannot be run independently. The do-file 020_boot_control.do cycles through several loops and sub-loops. The exterior loop cycles through the bootstrap iterations where the number of iterations is defined by the global its. It begins with the zero iteration, which contains the actual survey data, and then continues by drawing bootstrap samples in iteration one through the final iteration.5

Within the given iteration, 022_shares.do loops through years and population categories with a sub-loop through all areas to calculate the proportion of a given area’s population attaining each possible combination of welfare outcomes. For a given area, time period, and category, the shares across all combinations of indicators sum to one. When the time and category loops are complete in a given iteration, Stata returns to 020_boot_control.do, where that iteration’s share data is appended to the data file work/boot_final_$cty.dta. This process continues through all iterations until work/boot_final_$cty.dta contains shares for all areas in each time period and category, for every iteration. This file is also saved as a text file, work/data_bs_100.csv, that will be imported into GAMS for FOD comparisons.
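The structure of this step (iteration zero on the actual data, followed by resampled iterations, each reduced to shares that sum to one) can be sketched as follows. This is an illustration in Python, not the EFOD implementation: it resamples observations with simple replacement and ignores strata and clusters, which the EFOD bootstrap is set up to take into account, and the function names are hypothetical.

```python
import random
from collections import defaultdict

# Illustrative only: simple (unstratified) bootstrap of combination shares.

def combination_shares(records):
    """records: list of (indicator_tuple, weight). Returns {combination: share}."""
    totals = defaultdict(float)
    for combo, weight in records:
        totals[combo] += weight
    grand_total = sum(totals.values())
    return {combo: w / grand_total for combo, w in totals.items()}

def bootstrap_shares(records, iterations, seed=0):
    """Iteration 0 uses the actual data; iterations 1..n use resampled data."""
    rng = random.Random(seed)
    results = [combination_shares(records)]
    for _ in range(iterations):
        sample = [rng.choice(records) for _ in records]
        results.append(combination_shares(sample))
    return results

# Hypothetical observations for one area, year, and category (two indicators).
records = [((1, 0), 2.0), ((1, 1), 1.0), ((0, 0), 1.0), ((1, 0), 1.0)]
results = bootstrap_shares(records, iterations=3)
print(results[0])
```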

Table B5. Combination of welfare indicators, table_shares_1.csv

| Water | Sanit | House | Educ | Info | National 2004 | National 2010 | National_change |
|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 6.8726 | 6.8797 | −13.2620 |
| 0 | 0 | 0 | 0 | 1 | 6.1608 | 5.7849 | 0.8490 |
| 0 | 0 | 0 | 1 | 0 | 3.6046 | 4.2612 | −0.0889 |
| 0 | 0 | 0 | 1 | 1 | 5.6834 | 6.5246 | 3.4989 |
| 0 | 0 | 1 | 0 | 0 | 0.2131 | 0.1921 | −0.1853 |
| 0 | 0 | 1 | 0 | 1 | 0.6247 | 0.8597 | −0.1757 |
| 0 | 0 | 1 | 1 | 0 | 0.3519 | 0.3719 | 0.1551 |
| 0 | 0 | 1 | 1 | 1 | 2.0807 | 2.4545 | 1.3803 |
| 0 | 1 | 0 | 0 | 0 | 0.0000 | 0.0167 | −0.0521 |
| 0 | 1 | 0 | 0 | 1 | 0.0862 | 0.0513 | 0.0141 |
| 0 | 1 | 0 | 1 | 0 | 0.0000 | 0.0318 | 0.0199 |
| 0 | 1 | 0 | 1 | 1 | 0.0034 | 0.0366 | 0.0194 |
| 0 | 1 | 1 | 0 | 0 | 0.0452 | 0.0023 | −0.0078 |
| 0 | 1 | 1 | 0 | 1 | 0.0526 | 0.0762 | 0.0599 |
| 0 | 1 | 1 | 1 | 0 | 0.0032 | 0.0682 | 0.0292 |
| 0 | 1 | 1 | 1 | 1 | 0.5169 | 1.1511 | 0.9863 |
| 1 | 0 | 0 | 0 | 0 | 15.1621 | 12.2475 | −14.6200 |
| 1 | 0 | 0 | 0 | 1 | 14.9753 | 12.1967 | 3.9844 |
| 1 | 0 | 0 | 1 | 0 | 8.3381 | 7.6366 | −0.1913 |
| 1 | 0 | 0 | 1 | 1 | 13.1270 | 12.8563 | 7.3944 |
| 1 | 0 | 1 | 0 | 0 | 1.0865 | 1.2374 | −0.2660 |
| 1 | 0 | 1 | 0 | 1 | 3.4522 | 2.8978 | −1.2036 |
| 1 | 0 | 1 | 1 | 0 | 1.4996 | 1.6262 | 0.1844 |
| 1 | 0 | 1 | 1 | 1 | 10.1295 | 10.2820 | 3.7224 |
| 1 | 1 | 0 | 0 | 0 | 0.0353 | 0.1238 | −0.1554 |
| 1 | 1 | 0 | 0 | 1 | 0.2773 | 0.1765 | 0.0974 |
| 1 | 1 | 0 | 1 | 0 | 0.0353 | 0.1065 | 0.0924 |
| 1 | 1 | 0 | 1 | 1 | 0.1165 | 0.5966 | 0.5338 |
| 1 | 1 | 1 | 0 | 0 | 0.1068 | 0.2910 | 0.2122 |
| 1 | 1 | 1 | 0 | 1 | 0.6369 | 0.8045 | 0.3230 |
| 1 | 1 | 1 | 1 | 0 | 0.3204 | 0.5954 | 0.4939 |
| 1 | 1 | 1 | 1 | 1 | 4.4019 | 7.5626 | 6.1580 |

Source: Based on calculations in Arndt et al. (2014) using the 2004/5, 2010 TDHS (National Bureau of Statistics and Macro 2005, 2011)

Table B6. Number of deprivations, table_shares_1_num.csv

| num_dep | National 2004 | National 2010 | National_change | Rural 2004 | Rural 2010 | Rural_change | Urban 2004 | Urban 2010 | Urban_change |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 4.4019 | 7.5626 | 6.1580 | 0.8586 | 1.3157 | 1.0155 | 15.6553 | 28.3526 | 23.1706 |
| 1 | 11.7202 | 13.4295 | 6.0592 | 4.9436 | 7.3434 | 4.7315 | 33.2425 | 33.6845 | 10.0390 |
| 2 | 20.6382 | 20.5898 | 8.2659 | 19.1846 | 21.3244 | 12.9707 | 25.2549 | 18.1451 | −7.7582 |
| 3 | 31.2266 | 29.0360 | 6.8762 | 35.5143 | 34.2169 | 11.5780 | 17.6090 | 11.7936 | −8.7273 |
| 4 | 25.1405 | 22.5024 | −14.0973 | 30.7850 | 27.2588 | −13.8055 | 7.2138 | 6.6730 | −14.6567 |
| 5 | 6.8726 | 6.8797 | −13.2620 | 8.7139 | 8.5409 | −16.4902 | 1.0246 | 1.3512 | −2.0673 |

Source: Based on calculations in Arndt et al. (2014) using the 2004/5, 2010 TDHS (National Bureau of Statistics and Macro 2005, 2011)

Before proceeding to FOD, the do-file 024_Table_shares.do creates tables by category containing shares of the population by combinations of welfare indicators and by number of deprivations in the static sample. The combinations of welfare indicator tables are only generated for areas specified by the global shkeeplist. Table B5 is a sample table, which displays the combinations of welfare indicators at the national level. Number of deprivations measures the total number of deprivations faced by a household or an individual. The share of the population with a given number of deprivations is equal to the sum of the shares of all welfare combinations with that number of deprivations. For example, supposing five indicators are in focus and recalling that 0 denotes deprived, the share of the population with one deprivation is equal to the sum of the shares of the population with welfare combinations (0 1 1 1 1), (1 0 1 1 1), (1 1 0 1 1), (1 1 1 0 1), and (1 1 1 1 0). The total shares across all numbers of deprivations for a given area (in a specific year and category) sum to one (100 per cent in the tables). The deprivations tables are presented in a long and a short form. Table B6 provides an example of a number of deprivations table. The long form includes all areas while the short form includes only areas specified by the global shkeeplist.
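The link between the two tables can be checked directly. The Python sketch below (illustrative, not EFOD code) takes the National 2004 column of Table B5, counts deprivations in each combination as the number of indicators coded 0 (1 = not deprived, as in Table B3), and sums shares by that count. The totals agree with the National 2004 column of Table B6 up to rounding in the fourth decimal.

```python
from itertools import product

# National 2004 column of Table B5 (per cent), in the table's row order:
# (0,0,0,0,0), (0,0,0,0,1), ..., (1,1,1,1,1).
shares_2004 = [
    6.8726, 6.1608, 3.6046, 5.6834, 0.2131, 0.6247, 0.3519, 2.0807,
    0.0000, 0.0862, 0.0000, 0.0034, 0.0452, 0.0526, 0.0032, 0.5169,
    15.1621, 14.9753, 8.3381, 13.1270, 1.0865, 3.4522, 1.4996, 10.1295,
    0.0353, 0.2773, 0.0353, 0.1165, 0.1068, 0.6369, 0.3204, 4.4019,
]

# product() yields the 32 combinations in the same binary counting order.
combos = list(product((0, 1), repeat=5))

by_num_dep = {n: 0.0 for n in range(6)}
for combo, share in zip(combos, shares_2004):
    num_dep = 5 - sum(combo)  # each 0 is one deprivation
    by_num_dep[num_dep] += share

for n in range(6):
    print(n, round(by_num_dep[n], 4))
```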

B.3.4 FOD

030_FOD_base.bat, 030_FOD_base.gms, 031_process1.bat, 031_process1.gms, 032_process2.bat, 032_process2.gms, 033_process3.bat, 033_process3.gms, 034_process4.bat, 034_process4.gms, 036_spatial.inc, 038_temporal.inc.

  • Modify 030_FOD_base.bat and 030_FOD_base.gms

FOD comparisons are conducted entirely in a linear program executed by GAMS. The file 030_FOD_base.gms uses the dataset data_bs_100.csv and several include files to create required variables, equations, and parameters and save them to a base file. A file for each processor, 031_process1.gms to 034_process4.gms, then executes 036_spatial.inc and 038_temporal.inc to conduct the FOD comparisons using the base file.
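To give a feel for what a single FOD comparison decides, the sketch below uses a different but equivalent test from the linear program that EFOD runs in GAMS: distribution A first-order dominates distribution B if A places no more population mass than B on every lower set of outcomes (a set that contains, with each outcome, all outcomes that are weakly worse in every indicator). This Python code is an illustration only, not the EFOD method. It enumerates all subsets of outcomes, so it is practical only for two or three indicators, and it treats identical distributions as non-dominating, which is an assumption of this sketch.

```python
from itertools import combinations, product

# Illustrative brute-force FOD check via lower sets (not the EFOD linear program).

def dominates(a, b, n_indicators, tol=1e-12):
    """True if share distribution a first-order dominates b.

    a, b: dicts mapping outcome tuples (1 = not deprived) to population shares.
    """
    outcomes = list(product((0, 1), repeat=n_indicators))

    def weakly_worse(x, y):
        return all(xi <= yi for xi, yi in zip(x, y))

    strictly_better_somewhere = False
    for size in range(1, len(outcomes)):  # proper, non-empty subsets
        for subset in combinations(outcomes, size):
            members = set(subset)
            # Skip subsets that are not lower sets.
            if any(weakly_worse(x, y) and x not in members
                   for y in members for x in outcomes):
                continue
            mass_a = sum(a.get(o, 0.0) for o in members)
            mass_b = sum(b.get(o, 0.0) for o in members)
            if mass_a > mass_b + tol:
                return False
            if mass_a < mass_b - tol:
                strictly_better_somewhere = True
    return strictly_better_somewhere

# Hypothetical two-indicator distributions.
area_a = {(1, 1): 0.5, (1, 0): 0.2, (0, 1): 0.2, (0, 0): 0.1}
area_b = {(1, 1): 0.3, (1, 0): 0.2, (0, 1): 0.2, (0, 0): 0.3}
area_c = {(1, 0): 1.0}
area_d = {(0, 1): 1.0}

print(dominates(area_a, area_b, 2), dominates(area_b, area_a, 2))
print(dominates(area_c, area_d, 2), dominates(area_d, area_c, 2))
```

In the second pair neither distribution dominates the other, which corresponds to an indeterminate outcome in the FOD tables.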

The user can execute the GAMS code in three ways. First, FOD can be shelled directly from 000_master.do in Stata. Second, the user can manually execute FOD in GAMS IDE. Third, the user can execute FOD from a command window. FOD involves a large number of comparisons that increases with the number of areas, survey years, population categories, and bootstrap iterations. In order to reduce processing time, the FOD comparisons are divided by bootstrap iteration and executed using up to four processors. It is possible to assign iterations to fewer processors depending on hardware capabilities. The process time can be lengthy, even when taking advantage of four processors, and can vary from minutes to several hours.

B.3.5 FOD Tables

040_FOD_data.do, 042_Table_FODspat.do, 044_Table_FODtemp.do, 046_Table_Rank.do.

Depending on the number of processors utilized, GAMS saves up to four spatial (res_spat1.csv…) and four temporal (res_temp1.csv…) text files to the work directory. The Stata do-file 040_FOD_data.do appends these files and creates two datasets (work/res_spat.dta and work/res_temp.dta). From these datasets, three collections of tables are created that present temporal results, spatial results, and area rankings.

B.3.5.1 Spatial FOD Tables

The do-file 042_Table_FODspat.do creates spatial FOD tables for static and bootstrapped samples by area, category, and period. FOD results are averaged across bootstrap iterations and are interpreted as the probability of domination. A table is produced for static (Table B7) and bootstrap results (Table B8). Within each table, a blank cell indicates an indeterminate outcome between the row and column area. In static tables, a ‘1’ indicates the row (column) area dominates (is dominated by) the column (row) area. In bootstrap tables, values indicate the estimated probability that the row (column) area dominates (is dominated by) the column (row) area (probability is defined as total number of iterations where a domination outcome occurs divided by the total number of bootstrap iterations). The row (column) average yields the average number of times or the average probability that the row (column) area dominates (is dominated by) all other areas for the static and bootstrap cases, respectively.
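The two calculations described here, the probability of domination and the row average, are simple ratios. The Python sketch below is illustrative only; the function names are hypothetical and the per-iteration flags are invented. The row average example uses the non-blank entries of the National row in Table B8, divided by the ten other areas in that table.

```python
# Illustrative only: bootstrap probability of domination and row average.

def domination_probability(flags):
    """Share of bootstrap iterations in which the domination outcome occurred."""
    return sum(flags) / len(flags)

def row_average(entries, n_other_areas):
    """Average over all other areas; blank (indeterminate) cells count as zero."""
    return sum(entries) / n_other_areas

# Hypothetical outcome of 100 iterations for one pair of areas.
flags = [True] * 5 + [False] * 95
probability = domination_probability(flags)

# Non-blank entries of the National row in Table B8.
national_row = [1, 0.05, 0.41, 0.26, 0.04, 0.33]
national_average = row_average(national_row, n_other_areas=10)

print(probability, round(national_average, 3))
```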

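Table B7. Spatial FOD results (static), FOD_spat_1_1_static.csv

| Area | National | Rural | Urban | Central | Eastern | Lake | Northern | S_Highlands | Southern | Western | Zanzibar | Average |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| National | | 1 | | | | | | | | | | 0.1 |
| Rural | | | | | | | | | | | | 0 |
| Urban | 1 | 1 | | 1 | 1 | 1 | 1 | 1 | 1 | 1 | | 0.9 |
| Central | | | | | | | | | | | | 0 |
| Eastern | 1 | 1 | | 1 | | 1 | | 1 | 1 | 1 | | 0.7 |
| Lake | | | | | | | | | | | | 0 |
| Northern | | 1 | | | | 1 | | | | 1 | | 0.3 |
| S_Highlands | | | | | | | | | | | | 0 |
| Southern | | | | | | | | | | | | 0 |
| Western | | | | | | | | | | | | 0 |
| Zanzibar | 1 | 1 | | 1 | | 1 | | 1 | 1 | 1 | | 0.7 |
| Average | 0.3 | 0.5 | 0 | 0.3 | 0.1 | 0.4 | 0.1 | 0.3 | 0.3 | 0.4 | 0 | 0.2455 |

Source: Based on calculations in Arndt et al. (2014) using the 2004/5, 2010 TDHS (National Bureau of Statistics and Macro 2005, 2011)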

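Table B8. Spatial FOD results (bootstrap), FOD_spat_1_1_boot.csv

Columns, in order: National, Rural, Urban, Central, Eastern, Lake, Northern, S_Highlands, Southern, Western, Zanzibar, Average. Blank cells are not shown; each row lists its non-blank entries in column order.

| Area | Non-blank entries | Average |
|---|---|---|
| National | 1, 0.05, 0.41, 0.26, 0.04, 0.33 | 0.209 |
| Rural | 0.01, 0.01 | 0.002 |
| Urban | 1, 1, 0.96, 0.74, 1, 0.95, 1, 1, 0.99, 0.16 | 0.88 |
| Central | 0.17, 0.09, 0.04, 0.25, 0.1 | 0.065 |
| Eastern | 0.79, 0.97, 0.56, 0.79, 0.22, 0.86, 0.91, 0.69, 0.01 | 0.58 |
| Lake | 0.04, 0.01, 0.01 | 0.006 |
| Northern | 0.13, 0.57, 0.06, 0.47, 0.3, 0.07, 0.35 | 0.195 |
| S_Highlands | 0.18, 0.02, 0.05, 0.03, 0.03 | 0.031 |
| Southern | | 0 |
| Western | 0.03, 0.03, 0.02, 0.01 | 0.009 |
| Zanzibar | 0.3, 0.77, 0.42, 0.06, 0.42, 0.06, 0.38, 0.83, 0.4 | 0.364 |

Column averages: National 0.222, Rural 0.473, Urban 0, Central 0.211, Eastern 0.08, Lake 0.326, Northern 0.123, S_Highlands 0.286, Southern 0.314, Western 0.289, Zanzibar 0.017, overall 0.2128.

Source: Based on calculations in Arndt et al. (2014) using the 2004/5, 2010 TDHS (National Bureau of Statistics and Macro 2005, 2011)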

Bootstrap sampling introduces variation to the results and therefore small values should be interpreted with caution. For instance, Table B8 indicates that the nation dominates Central with a probability of 0.05, which is likely too small to conclude that the nation outperforms Central.

B.3.5.2 Temporal FOD Tables

The do-file 044_Table_FODtemp.do creates temporal and net temporal FOD tables for each category. Temporal tables present static and bootstrap results for all year combinations for each area. As in spatial analysis, bootstrapped FOD results are averaged across iterations and are interpreted as the probability of domination. Temporal results are presented in two ways. First, FOD_temp_$cat.csv presents static and bootstrap temporal outcomes for both later years dominating earlier years and earlier years dominating later years (Table B9). Second, FOD_net_temp_$cat.csv presents net temporal domination, which measures the difference in the probabilities of later years dominating earlier years and earlier years dominating later years (Table B10). In the case of no welfare regression, net results are equivalent to the results for later years dominating earlier years.

In static temporal columns, a ‘1’ indicates that a given year dominated the other year, while a blank cell indicates the given year did not dominate the other year. When both years have a blank entry, FOD was indeterminate. In the net temporal table, ‘1’ indicates the later year dominated the earlier year; a blank cell indicates FOD was indeterminate; and ‘−1’ indicates the earlier year dominated the later year. The static temporal and static net temporal tables contain the same information; the difference lies only in the presentation.

In the bootstrap temporal columns, entries indicate the probability that a given year dominates the other year. A blank indicates the year did not dominate in any iteration and ‘1’ indicates that year dominated in every iteration. When both years have a blank entry, FOD was indeterminate in all cases. In the net temporal table, positive probabilities indicate that the later year dominated in more iterations than the earlier year, and negative probabilities indicate that the earlier year dominated in more iterations than the later year. A net result of 0.2 could mean that the later year dominated in 20 per cent of iterations, the earlier year never dominated, and 80 per cent of the iterations were indeterminate. Or, for example, it could mean that the later year dominated in 60 per cent of the iterations, and the earlier year dominated in 40 per cent of the iterations. The exact scenario should be determined by the user. Similarly, a blank cell could indicate that the outcome was indeterminate in every iteration or that each year dominated with the same frequency. Thus, in cases with frequent backsliding, the full temporal table provides a more complete story than the net temporal table.

Table B9. Temporal FOD results, FOD_temp_1.csv

Columns, in order: stat_1992 1996, stat_1992 2004, stat_1996 1992, stat_1996 2004, stat_2004 1992, stat_2004 1996, boot_1992 1996, boot_1992 2004, boot_1996 1992, boot_1996 2004, boot_2004 1992, boot_2004 1996. Blank cells are not shown; each row lists its non-blank entries in column order.

| Area | Non-blank entries |
|---|---|
| National | 1, 1, 1, 0.3, 0.99, 0.95 |
| Rural | 1, 1, 0.28, 0.71, 0.45 |
| Urban | 1, 0.22, 0.11, 0.03 |
| Central | 0.14, 0.14, 0.13 |
| Eastern | 1, 0.35, 0.42, 0.18 |
| Lake | 1, 0.62, 0.17 |
| Northern | 1, 1, 0.05, 0.02, 0.68, 0.85 |
| S_Highlands | 1, 1, 0.01, 0.13, 0.67, 0.46 |
| Southern | 1, 1, 0.06, 0.55, 0.69 |
| Western | 1, 0.27, 0.23, 0.12 |
| Zanzibar | 1, 1, 1, 0.22, 0.99, 0.86 |

Source: Based on calculations in Arndt et al. (2014) using the 2004/5, 2010 TDHS (National Bureau of Statistics and Macro 2005, 2011)

Table B10. Net temporal FOD results, FOD_net_temp_1.csv

Columns, in order: stat_1996 1992, stat_2004 1992, stat_2004 1996, boot_1996 1992, boot_2004 1992, boot_2004 1996. Blank cells are not shown; each row lists its non-blank entries in column order.

| Area | Non-blank entries |
|---|---|
| National | 1, 1, 1, 0.3, 0.99, 0.95 |
| Rural | 1, 1, 0.28, 0.71, 0.45 |
| Urban | 1, 0.22, 0.11, 0.03 |
| Central | 0.14, 0.14, 0.13 |
| Eastern | 1, 0.35, 0.42, 0.18 |
| Lake | 1, 0.62, 0.17 |
| Northern | 1, 1, −0.03, 0.68, 0.85 |
| S_Highlands | 1, 1, 0.12, 0.67, 0.46 |
| Southern | 1, 1, 0.06, 0.55, 0.69 |
| Western | 1, 0.27, 0.23, 0.12 |
| Zanzibar | 1, 1, 1, 0.22, 0.99, 0.86 |

Source: Based on calculations in Arndt et al. (2014) using the 2004/5, 2010 TDHS (National Bureau of Statistics and Macro 2005, 2011)

For example, in Table B9, 1996 dominates 1992 and 1992 dominates 1996 with positive probability in both Northern and Southern Highlands. In Table B10, the static and bootstrap net domination results are the same as those in Table B9 except in these two areas, where the small probabilities of 1992 dominating 1996 are netted out. In Northern, the net bootstrap result of −0.03 is the 0.02 probability that 1996 dominates 1992 minus the 0.05 probability that 1992 dominates 1996; in Southern Highlands, the net result of 0.12 is 0.13 minus 0.01.

B.3.5.3 Area Rankings

Table B11. FOD rankings, table_rank_1.csv

| Area | Net Domination 2004 | PNet Domination 2004 | Rank 2004 | Net Domination 2010 | PNet Domination 2010 | Rank 2010 | Change |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Eastern | 518 | 0.74 | 1 | 504 | 0.72 | 1 | 0 |
| Zanzibar | 208 | 0.2971429 | 2 | 144 | 0.2057143 | 2 | 0 |
| S_Highlands | −154 | −0.22 | 6 | −17 | −0.0242857 | 3 | −2 |
| Northern | 38 | 0.0542857 | 3 | −54 | −0.0771429 | 4 | 1 |
| Lake | −97 | −0.1385714 | 5 | −80 | −0.1142857 | 5 | −2 |
| Southern | −62 | −0.0885714 | 4 | −83 | −0.1185714 | 6 | −2 |
| Western | −191 | −0.2728571 | 7 | −143 | −0.2042857 | 7 | 1 |
| Central | −260 | −0.3714286 | 8 | −271 | −0.3871429 | 8 | 4 |

Source: Based on calculations in Arndt et al. (2014) using the 2004/5, 2010 TDHS (National Bureau of Statistics and Macro 2005, 2011)

Area ranking tables use spatial bootstrap FOD results to compare areas based on the net probability of domination, which measures the average probability that an area dominates all other areas minus the average probability of the same area being dominated by all other areas. If the same areas are presented in the spatial and rank tables, the probability of net domination is equivalent to the spatial ranking row average minus the column average. The do-file 046_Table_Rank.do produces separate tables for each category. See Table B11 for an example of ranking outcomes.
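The row-average-minus-column-average calculation can be sketched as follows. This is a minimal Python illustration, not the code in 046_Table_Rank.do; the function names, the three area labels, and the probabilities in `prob` are all hypothetical values chosen for the example. It assumes that entry `prob[i][j]` holds the share of bootstrap iterations in which area i dominates area j, and that averages are taken over the other areas only.

```python
def net_domination(prob):
    """Net probability of domination for each area.

    prob[i][j] is assumed to be the share of bootstrap iterations in which
    area i dominates area j (diagonal entries are ignored).
    """
    n = len(prob)
    scores = []
    for i in range(n):
        # Row average: how often area i dominates each other area.
        dominates = sum(prob[i][j] for j in range(n) if j != i) / (n - 1)
        # Column average: how often area i is dominated by each other area.
        dominated = sum(prob[j][i] for j in range(n) if j != i) / (n - 1)
        scores.append(dominates - dominated)
    return scores


def rank_areas(names, scores):
    """Rank areas from highest (rank 1) to lowest net domination."""
    order = sorted(range(len(names)), key=lambda k: scores[k], reverse=True)
    return {names[k]: position + 1 for position, k in enumerate(order)}


# Hypothetical spatial bootstrap results for three made-up areas.
names = ["A", "B", "C"]
prob = [
    [0.0, 0.8, 0.6],  # A dominates B in 80% of iterations, C in 60%
    [0.0, 0.0, 0.2],  # B never dominates A, dominates C in 20%
    [0.1, 0.0, 0.0],  # C dominates A in 10%, never dominates B
]

scores = net_domination(prob)
ranks = rank_areas(names, scores)
print(scores)
print(ranks)
```

In Table B11, each PNet value equals the corresponding Net value divided by 700 (for example, 518/700 = 0.74). This would be consistent with seven comparator areas and 100 bootstrap iterations, although that reading is an inference rather than something stated in this guide.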

Ranking results should be interpreted carefully. Because bootstrapping results may vary from one execution of FOD to the next, rankings may be sensitive to small perturbations. The difference in net domination scores is often insufficiently large to distinguish between differences in welfare outcomes and variability introduced through random bootstrapping. For example, in Table B11, the difference in net domination between Lake and Southern in 2010 is extremely small. However, even the difference between Northern and Lake may not be robust to bootstrap variation.
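One way to see which rankings are fragile is to list the gaps between adjacently ranked areas. The Python sketch below does this for the 2010 net domination values in Table B11; the helper `adjacent_gaps` is a hypothetical name introduced here, and the guide gives no threshold for what counts as a robust gap, so the output is meant only to flag pairs that deserve caution.

```python
# 2010 net domination values as reported in Table B11.
net_2010 = {
    "Eastern": 504,
    "Zanzibar": 144,
    "S_Highlands": -17,
    "Northern": -54,
    "Lake": -80,
    "Southern": -83,
    "Western": -143,
    "Central": -271,
}


def adjacent_gaps(net):
    """Return (higher area, lower area, gap) for each pair of neighbouring ranks."""
    ranked = sorted(net.items(), key=lambda item: item[1], reverse=True)
    return [(upper[0], lower[0], upper[1] - lower[1])
            for upper, lower in zip(ranked, ranked[1:])]


gaps = adjacent_gaps(net_2010)
for upper, lower, gap in gaps:
    print(f"{upper} vs {lower}: gap of {gap}")
```

Lake and Southern are separated by only 3, and Northern and Lake by 26, compared with 360 between Eastern and Zanzibar. Rerunning EFOD with several different seeds and checking whether the narrow orderings persist would be a natural follow-up.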

B.4 Alternative Specifications

Thus far, the language in this description of the EFOD software has been geared towards welfare analysis of areas over time. However, EFOD is flexible and can be applied to alternative specifications. This section provides three examples of possible variations (see note 6).

  • To this point, the discussion has focused on analysis within a single country. Alternatively, welfare comparisons can be made internationally. With comparable indicators, areas could be specified as individual countries yielding spatial FOD comparisons between countries and temporal FOD comparisons for each country.

  • Thus far, population groups have been discussed independently of each other. However, FOD comparisons can be made between populations if the analyst defines the area parameters to specify population groups instead of areas. See Chapter 14 where Mahrt and Masumbu specify FOD comparisons in Zambia by rural economic activity and urban housing cost areas. One area variable would now classify the different population groups, similar to the category variable in the standard format. If areas are still of interest, additional area variables can also be used to compare the population groups to aggregate areas such as urban/rural.

  • In analysis focused on a single population group, say households, the category variable could be used to specify different sets of indicators. In this context, the category variable would serve merely to signal each set of indicators rather than defining different populations. For example, category one could include a set of health indicators, category two could contain a set of shelter indicators, and category three could contain a set of education indicators. For a given set of indicators, spatial FOD comparisons would be made across areas and temporal analysis over time for each area. FOD analysis within each indicator category would be conducted independently, thus highlighting the relative performance of areas for each set of indicators.

Notes:

(1) Versions prior to Stata 11 will require modifications such as reverting to the old merge syntax.

(2) Note, EFOD cannot run with the free GAMS licence; a licence must be obtained.

(3) Note, Stata creates and deletes a temporary folder called temp.

(4) If the area variables have value labels, it is easy to find the relationship between area names and area numbers using the Stata commands tabulate or label list lblname, where lblname is the variable’s value label. The command describe is useful to determine the name of the value label.

(5) Note, bootstrap samples are drawn randomly, and without intervention each execution of EFOD could produce a different set of bootstrap samples. However, the capacity to replicate results is desirable and is possible by specifying a ‘seed’. On a given machine and with a given set of input data, the seed forces the same bootstrap samples to be drawn, allowing results to be reproduced.

(6) Note that the FOD code stream requires area 1 to specify the entire population, and area 1 must continue to do so under these alternative specifications.