Measuring Poverty and Wellbeing in Developing Countries

Channing Arndt and Finn Tarp

Print publication date: 2016

Print ISBN-13: 9780198744801

Published to Oxford Scholarship Online: January 2017

DOI: 10.1093/acprof:oso/9780198744801.001.0001



Appendix A User Guide to Poverty Line Estimation Analytical Software—PLEASe

Channing Arndt

Ulrik Beck

M. Azhar Hussain

Kristi Mahrt

Kenneth Simler

Finn Tarp

Oxford University Press

A.1 Introduction

This technical guide presents the Poverty Line Estimation Analytical Software (PLEASe). PLEASe comprises a flexible set of Stata and GAMS codes designed to estimate regional poverty lines using household budget survey data. In this approach, the estimation of absolute poverty lines is rooted in the cost of basic needs method, which forms the core PLEASe code stream. Specifically, poverty lines are based on the typical consumption patterns (food bundles and prices) of the reference population (relatively poor households). The cost of food bundles, which attain minimum caloric needs, at prices paid by relatively poor households, yields a food poverty line. The total poverty line is determined by the sum of the food poverty line and the cost of non-food items for households with total consumption levels close to the food poverty line.

Some key aspects of the default PLEASe approach merit mentioning. First, the typical consumption pattern of the reference population, poor households, is estimated using an iterative procedure to identify which households are deemed poor. Second, the approach recognizes the value of accounting for differences in regional and temporal consumption patterns. Thus, the approach allows poverty lines to be estimated in multiple spatial domains based on flexible consumption bundles that vary over time and space. Third, revealed preference tests are evaluated to determine whether regional and temporal consumption bundles represent a consistent level of utility. Finally, if these revealed preference conditions fail, a minimum cross-entropy methodology is employed to adjust consumption bundles to satisfy constraints. The reader is referred to Chapters 2 and 4 for a more detailed discussion.

This guide aims to provide the information needed to apply PLEASe to poverty line estimations in a multitude of country settings. The goal is not to give the user prepackaged software, but to provide a launching point such that, with relevant modifications to data, parameters, or the code stream, the software can be appropriately adapted to accommodate country-specific circumstances. With slight modifications, it is straightforward to implement a large array of approaches. For example, while the PLEASe code stream was designed to estimate poverty lines based on flexible utility-consistent regional consumption baskets, the code can be modified to accommodate alternative approaches, including but not limited to regional baskets fixed over time, a single national consumption basket priced at national prices, or a single national basket priced at regional levels. In addition, revealed preference constraints can be imposed on flexible baskets spatially, spatially and temporally, or not at all.

This appendix focuses on specific details of understanding and implementing the PLEASe code stream. After presenting data and software requirements in section A.2, section A.3 presents the code stream step by step, including required inputs. Section A.4 provides guidelines to assembling necessary datasets. Lastly, section A.5 provides some final thoughts.

A.2 Requirements

A.2.1 Software

The PLEASe package is executed in Stata and GAMS. Strong Stata skills are a distinct advantage; only a basic understanding of GAMS is needed. The Stata code was produced using Stata version 12; however, the code will run in Stata 11.¹ The GAMS code will run on versions 22.7 and later.

A.2.2 Data

Estimating poverty lines using the PLEASe software requires disaggregated household budget survey data—specifically, consumption expenditures for food at the household and product level and non-food expenditures at the household level. For food items, quantities are also necessary for estimating food prices (unit costs). If consumption quantities are not reported, local prices must be obtained from an alternative source, such as community price surveys. Additional required data include household-level survey data (survey periods, regions, household size, and household weights), individual data (age, sex, and the presence of a child’s mother in the household), fertility rates by age and urban/rural area, and calories per gram of food items. Greater detail on compiling datasets is provided in section A.4.

A.3 Description of the Code Stream

This section presents the specifics of the code stream and inputs necessary to adapt the software to individual country cases. While more substantial changes to the code may be desired to adapt the software to specific conditions in each country, this section focuses on basic requirements. The beginning of each subsection lists the relevant code files as well as an overview of required modifications.

A.3.1 Directory Structure

The PLEASe directory consists of a master directory containing a subdirectory of the code files as well as subdirectories for each survey year. The provided PLEASe code contains the code subdirectory named new. The user must create subdirectories containing the input data for each survey year. Survey directories can be named as desired. Within each survey directory all input data will be provided by the user in the subdirectory in. The initial structure is as follows:

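PLEASe directory/
    new/                 (code files)
    survey directory/    (one per survey year, named by the user)
        in/              (user-provided input data)
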
The PLEASe code will create several subdirectories as needed. After the initial execution the directory structure will appear as follows:

PLEASe directory/
    new/
    survey directory/
        in/
        work/
        out/
            t_plus1/
        rep/

Where work contains working datasets, out contains final results, and rep contains logs from each Stata do-file. The subdirectory out/t_plus1 contains data necessary for revealed preference tests in the subsequent time period.

A.3.2 Initialization

000_boom.do, 010_initial_$year.do, 060_in_2_work.

  • In 000_boom.do, three global macros must be specified: the file path to the PLEASe directory, the name of the subdirectory of the survey year being analysed, and if applicable, the subdirectory of the prior survey year.

  • In 010_initial_$year.do, global macros used throughout the code stream are defined.

Table A1. Global macros

| Global macro | Description | Values |
|---|---|---|
|  | file path to PLEASe | file path |
|  | name of the directory containing files for the year of analysis | directory name |
|  | name of the directory containing files for the previous year of analysis. If only one year or the first of a series of years is being analysed, this global is left blank. | directory name |
|  | code that will determine how food products will be selected in the code stream | food_cat = 1 |
|  | number of spatial domains | 1, 2, 3… |
| tpi_bottom | percentile that defines the relatively poor for the TPI |  |
| product_tpi_switch | specifies how food products in the TPI will be chosen | 0, top food items; 1, specified set of food items |
| product_tpi_n | if product_tpi_switch = 0, specifies the number of food items in the TPI basket | number of food items |
| product_tpi | if product_tpi_switch = 1, specifies code that will select the set of products |  |
|  | number of TPI regions | 1, 2, 3, 4… |
|  | time unit for TPI adjustments |  |
|  | number of TPI periods | 1, 2, 3, 4… |
| bottom | initial percentile for determining the relatively poor |  |
| it_n | number of iterations | 1, 2, 3… |
|  | controls whether the relatively poor are determined by regional or national bottom percentiles | 0, spatial domain; 1, nation |
| pass | denotes the round of iterations (automatically set within 100_iterate.do) | 1, 2,…$it_n |
| no_temp_rev | revealed pref.: specifies whether temporal revealed preference constraints will be checked | "*", no temporal; " ", temporal |
|  | revealed pref.: GAMS revealed preference file | spatial only: 250_spat_consistent.bat; spatial and temporal: 255_spat_temp_consistent.bat |

Source: See text

The Stata do-file, 000_boom.do, is the master file from which the entire code stream can be executed. This file, therefore, also functions as a table of contents of all Stata and GAMS code files. The code stream relies on a number of global macros that are set in 000_boom.do and 010_initial_$year.do. These globals allow customization of specific aspects of the code without the need to directly modify individual Stata do-files. Every global should be reviewed and set accordingly. See Table A1 for more detailed descriptions of the globals.

A.3.3 Consumption Aggregates

Assembling and fine-tuning consumption data to conform to the PLEASe format is time-consuming and requires care. It is certainly an important task in implementing PLEASe. However, the steps needed to prepare the data are specific to each survey and therefore cannot be standardized. The do-files used to compile consumption data from Mozambique household surveys are provided for reference purposes only and are not incorporated into the code stream. Rather, for each survey a new set of do-files must be created.

A.3.4 Working Datasets


File 060_in_2_work preserves original user-provided datasets by creating a set of working datasets that are saved in the work directory. This file also merges the consumption dataset, cons_nom_in.dta, with household data and produces two convenient datasets for later use, cons_nom.dta and cons_nom_trans.dta. The difference between these two datasets hinges on the availability of transaction-level data. Some surveys report food consumption at the transaction level, i.e. consumption values and quantities are reported separately each time a household acquires a particular food item. Other surveys only report the total value and quantity of each product consumed during the recall period. If consumption values are available at the transaction level, cons_nom.dta is collapsed to a single observation per product, per household. The dataset cons_nom_trans.dta retains transaction-level data for food product pricing. These datasets are essentially the same if food consumption is not available at the transaction level, though cons_nom_trans.dta keeps only food products while cons_nom.dta keeps food and non-food products. After the temporal price index is created, consumption in cons_nom_trans.dta is temporally adjusted for price calculations.

A.3.5 Caloric Needs and Content


The do-file 070_calpp.do calculates the average per person daily caloric requirements in each spatial domain. Using individual-level data contained in indata.dta, caloric needs are set according to sex and age, with adjustments for the probability of breastfeeding and pregnancy. We employ age and urban/rural-dependent fertility rates from other statistical sources to estimate caloric needs for women. Individual caloric requirements contained in this do-file are based on international standards for moderately active individuals and are applicable to all countries (WHO 1985). However, fertility rates are country-specific and must be provided by the user in fert_rate.dta.

To account for pregnancy, we assume that pregnant women need 285 additional calories in the last trimester of pregnancy. Since one trimester is three months, or one fourth of a year, the probability that a given woman is in the third trimester of pregnancy is the relevant fertility rate divided by four. Applying the probability of pregnancy to all women is appropriate as food poverty line calculations are based on average caloric needs in a spatial domain rather than individual needs. The resulting caloric requirement for women is thus a standard requirement of 2100 plus 285 times the probability of being in the third trimester of pregnancy.

Caloric needs are also adjusted to account for the additional 500 calories per day required by breastfeeding mothers. The assumption that all children under six months of age whose mothers live in the household are breastfed allows the breastfeeding caloric requirement to be added to the caloric requirements of all children under one. Assuming that 60 per cent of children under one are less than six months old, we add 300 calories to the daily requirements of all children under one whose mothers reside in the household. As with fertility rates, these assumptions are appropriate as average caloric requirements by spatial domain are used in food poverty line calculations. If information on the presence of the mother in the household is not available, one approach is to assume that all children under six months are breastfed and set the variable motherhh to one for all children.
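The pregnancy and breastfeeding adjustments described above can be illustrated with a short Python sketch. It is not part of the PLEASe code stream: the function names, the fertility rate of 0.2, and the infant base requirement of 800 calories are hypothetical values chosen for illustration, while the 2100, 285, 500, and 60 per cent figures come from the text.

```python
# Illustrative sketch only; not the PLEASe implementation.
PREGNANCY_KCAL = 285          # extra kcal/day in the third trimester (from the text)
BREASTFEEDING_KCAL = 500      # extra kcal/day for breastfeeding mothers (from the text)
SHARE_UNDER_SIX_MONTHS = 0.6  # share of under-ones younger than six months (from the text)


def woman_requirement(base_kcal, fertility_rate):
    """Base requirement plus the expected third-trimester supplement."""
    # One trimester is a quarter of a year, so the probability of being in
    # the third trimester is the annual fertility rate divided by four.
    prob_third_trimester = fertility_rate / 4
    return base_kcal + PREGNANCY_KCAL * prob_third_trimester


def infant_requirement(base_kcal, mother_in_household):
    """Requirement of a child under one, carrying the expected breastfeeding supplement."""
    if mother_in_household:
        return base_kcal + BREASTFEEDING_KCAL * SHARE_UNDER_SIX_MONTHS
    return base_kcal


def average_requirement(requirements):
    """Unweighted mean requirement across individuals in a spatial domain."""
    return sum(requirements) / len(requirements)


# Hypothetical inputs: a fertility rate of 0.2 and an infant base of 800 kcal.
woman = woman_requirement(2100, 0.2)
infant = infant_requirement(800, mother_in_household=True)
domain_average = average_requirement([woman, infant])
```

Because only the domain average enters the food poverty line, applying expected values to every woman and every child under one gives the same average as assigning the full supplement to the affected individuals.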

A.3.6 Per Capita Consumption


For convenience, the dataset conpc.dta is created which contains per capita nominal food, non-food, and total household consumption. This file also creates the share of food consumption of total expenditures and per capita calorie consumption. In the next step, the intra-survey temporal price index is used to generate temporally adjusted per capita consumption variables in conpc.dta.

A.3.7 The Temporal Price Index


The intra-survey temporal price index (TPI) allows temporal adjustment of consumption values to account for seasonality of prices and associated variations in purchasing power. TPI calculations involve four key steps. The first step identifies households with per capita nominal household consumption in the bottom X percentile as specified by the global tpi_bottom. Consumption at this percentile is used as a cut-off to define the relatively poor throughout the TPI calculations.

Second, a TPI food basket is identified that contains the most important food items in each TPI region. This step can be accomplished in one of two ways, which the user specifies with the global product_tpi_switch. By default, food items with the highest weighted expenditure shares among the relatively poor are selected. The number of items in the food basket is determined by the global product_tpi_n. Alternatively, the user may specify the global product_tpi to include particular food products.

Third, unit value prices for TPI food basket items are calculated. Before computing unit prices, we discard the top and bottom 5 per cent of household-level prices for each region and product combination to limit the influence of these potential outliers. Then, using sample and quantity weighting, household-level consumption quantities and expenditures for each product, region, and time period are aggregated. From these regional aggregates, unit prices are calculated.

Finally, we determine the consumption share of each item in the food basket of relatively poor households. Because the TPI basket comprises a subset of all food consumption, average regional product shares of food basket items are scaled so that shares sum to one in each region. Using these shares as weights, we calculate a price index of food basket prices by region and quarter. Then for each region we normalize the index by dividing each quarterly index value by the value of an arbitrary quarter.

The last step in this do-file temporally adjusts food expenditures in the data files conpc.dta and cons_nom_trans.dta. Note that only food expenditures are deflated with the TPI. Total real consumption is therefore the sum of TPI-adjusted food consumption and nominal non-food consumption. At this point, all temporal deflation is complete.
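The share rescaling, index construction, and deflation steps can be sketched in Python for a single region. This is a simplified reading of the description above, not the PLEASe Stata code: the function names and the example shares and prices are hypothetical, and the index is taken literally as a share-weighted price level divided by its value in a base period.

```python
def rescale_shares(shares):
    """Rescale basket expenditure shares so that they sum to one."""
    total = sum(shares.values())
    return {item: share / total for item, share in shares.items()}


def temporal_price_index(shares, prices_by_period, base_period):
    """Share-weighted price level per period, normalized to the base period."""
    weights = rescale_shares(shares)
    level = {
        period: sum(weights[item] * prices[item] for item in weights)
        for period, prices in prices_by_period.items()
    }
    return {period: value / level[base_period] for period, value in level.items()}


def real_total_consumption(nominal_food, nominal_nonfood, tpi_value):
    """Only food is deflated; non-food enters at its nominal value."""
    return nominal_food / tpi_value + nominal_nonfood


# Hypothetical two-item basket covering 40 per cent of food spending.
shares = {"maize": 0.3, "rice": 0.1}
prices = {1: {"maize": 10.0, "rice": 20.0}, 2: {"maize": 12.0, "rice": 20.0}}
tpi = temporal_price_index(shares, prices, base_period=1)
```

In this example the basket costs 12 per cent more in the second quarter than in the base quarter, so food spending recorded in that quarter is divided by 1.12 before being added to nominal non-food spending.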

A.3.8 Consumption Statistics


This file provides initial descriptive statistics of the consumption aggregate that may be useful both in understanding the consumption aggregate and in troubleshooting poverty line estimation. Among the values reported is per capita calorie consumption based on reported food consumption, which can help identify calorie under-reporting. Calorie under-reporting may be an issue for a variety of reasons such as a more diverse array of food consumed than the survey food recall lists account for or a failure to report food consumed outside the home. Calorie totals may also fall short of actual calorie consumption due to inaccurate mapping from reported consumption to actual calories consumed. A mismatch between per capita consumption and per capita calorie consumption may signal a problem with the consumption aggregate or the reported calories per gram, or it may identify an underlying shortfall in the survey data. See DNEAP (2010) for a detailed discussion of calorie under-reporting in Mozambique.

A.3.9 Estimating Poverty Lines with an Iterative Procedure


Food prices, baskets, and the resulting poverty lines are calculated for relatively poor households using an iterative procedure to ensure that poverty lines are based on the consumption patterns of poor households. In a preliminary iteration, relatively poor households are identified as those with temporally adjusted per capita consumption in the bottom X per cent, where the global macro bottom specifies a national cut-off X. The consumption patterns of these households yield food prices, food baskets, and poverty lines for each spatial domain. As regional poverty lines reflect regional variations in the cost of attaining the same standard of living, it is possible to calculate a spatial price index with which (already temporally deflated) per capita household consumption is spatially deflated. Spatially adjusted poverty lines applied to real consumption yield poverty headcount rates. These poverty headcount rates provide updated spatial-domain-specific cut-off percentiles, and together with real per capita consumption, form the basis for identifying relatively poor households in the subsequent iteration. With an updated set of relatively poor households, food baskets, and food prices, a new set of poverty lines and poverty headcounts are calculated. This process continues until the poverty rates converge. Convergence generally occurs within five iterations. The number of iterations is set with the global it_n.

The do-file, 100_iterate.do, runs the iterative process by first selecting relatively poor households in each iteration and then executing four subsequent do-files that calculate food prices (110_price_unit.do), food bundles (120_food_basket_flex.do), poverty lines (130_povline_flex.do), and the spatial price index and Foster–Greer–Thorbecke (FGT) class of poverty measures (140_povmeas.do) (Foster et al. 1984). The process is executed for a preliminary iteration (called iteration 0) and the subsequent 1 through it_n iterations.
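The feedback between the headcount and the reference group can be sketched in Python for a single spatial domain with unweighted households. This is a stylized illustration rather than the PLEASe logic: `estimate_line` is a hypothetical stand-in for the price, basket, and poverty line steps, and the toy rule used in the example (mean consumption of the relatively poor plus 2) has no substantive meaning.

```python
def headcount(consumption, line):
    """Share of observations with consumption strictly below the line."""
    return sum(1 for c in consumption if c < line) / len(consumption)


def iterate_headcounts(consumption, estimate_line, initial_share, n_iter):
    """Run iteration 0 plus n_iter further passes; return the headcount of each pass."""
    ordered = sorted(consumption)
    share = initial_share
    history = []
    for _ in range(n_iter + 1):
        # The relatively poor are the bottom `share` of the distribution.
        k = max(1, round(share * len(ordered)))
        relatively_poor = ordered[:k]
        # Hypothetical stand-in for the price, basket, and poverty line steps.
        line = estimate_line(relatively_poor)
        # The resulting headcount defines the next pass's bottom percentile.
        share = headcount(ordered, line)
        history.append(share)
    return history


# Toy data: ten households with consumption 1..10 and an initial cut-off of 60 per cent.
toy_consumption = [float(x) for x in range(1, 11)]
toy_rule = lambda poor: sum(poor) / len(poor) + 2.0
history = iterate_headcounts(toy_consumption, toy_rule, initial_share=0.6, n_iter=2)
```

In this toy case the headcount moves from 0.5 to 0.4 and then stays at 0.4, which is the kind of convergence the iterative procedure looks for.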

A.3.9.1 Food Prices


This do-file generates unit food prices by spatial domain based on the consumption of relatively poor households. After discarding the top and bottom 5 per cent of household-level prices, several methods are employed to calculate the price of each product in each spatial domain. By default, PLEASe uses the value share weighted mean price per gram; however, alternative price specifications are possible and are calculated in this do-file. Prices are recalculated in each iteration using the updated set of relatively poor households.
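One way to read the default price for a single product in a single spatial domain is sketched below in Python. The function name and data are hypothetical, and the sketch assumes that trimming is done on unweighted household-level prices and that each remaining price is weighted by its share of the retained expenditure; the PLEASe do-file may differ in these details, for example by also applying sample weights.

```python
def trimmed_value_weighted_price(observations, trim=0.05):
    """Value-share weighted mean price per gram after trimming both tails.

    observations: list of (value, grams) pairs for one product in one domain.
    """
    # Household-level unit prices, sorted from lowest to highest.
    priced = sorted((value / grams, value) for value, grams in observations if grams > 0)
    drop = int(trim * len(priced))
    kept = priced[drop:len(priced) - drop]
    total_value = sum(value for _, value in kept)
    # Each retained price is weighted by its share of retained expenditure.
    return sum(price * value / total_value for price, value in kept)


# Hypothetical data: 18 purchases at 2 per gram plus one very high and one very low price.
sample = [(10.0, 5.0)] * 18 + [(10.0, 1.0), (10.0, 100.0)]
price = trimmed_value_weighted_price(sample)
```

With 20 observations, one price is dropped from each tail, so the two outliers do not affect the result.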

A.3.9.2 Flexible Food Bundle


Spatial-domain-specific flexible food baskets include the bundle of most commonly consumed food products by the relatively poor. The dataset is restricted to relatively poor households and food products with quantities, calorie information, and prices based on at least ten observations.² Food expenditures on items such as restaurant meals are often reported without quantities or lack calorie data and in these instances are not used in poverty line calculations. By spatial domain, each product’s weighted share of total food expenditures among relatively poor households is determined. The food basket contains those products comprising the top 90 per cent of food expenditures. The rationale for restricting analysis to the top 90 per cent is that the bottom 10 per cent tends to contain a great number of food items typically consumed by relatively few households. It is appealing to exclude such items and limit the consumption bundle to items consumed by a larger share of poor households. The process of calculating food bundles is repeated in each iteration using the reselected group of relatively poor households.
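The 90 per cent cut-off can be sketched in Python as a cumulative-share selection. The function name and the shares are hypothetical, and the treatment of the product that crosses the threshold (included here) is an assumption rather than something the text specifies.

```python
def select_basket(expenditure_shares, coverage=0.90):
    """Return products, ranked by share, until cumulative coverage is reached."""
    ranked = sorted(expenditure_shares.items(), key=lambda kv: kv[1], reverse=True)
    basket, cumulative = [], 0.0
    for product, share in ranked:
        if cumulative >= coverage:
            break
        # Assumption: the product that crosses the threshold is kept.
        basket.append(product)
        cumulative += share
    return basket


# Hypothetical expenditure shares among relatively poor households in one domain.
shares = {
    "maize": 0.5,
    "cassava": 0.25,
    "beans": 0.125,
    "fish": 0.0625,
    "salt": 0.04,
    "sugar": 0.0225,
}
basket = select_basket(shares)
```

Here the first four products already account for 93.75 per cent of food expenditures, so salt and sugar are left out of the basket.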

A.3.9.3 Flexible Poverty Lines and Poverty Measures

130_povline_flex.do, 140_povmeas.do.

For each spatial domain, the food poverty line represents the cost of meeting regional daily per person calorie requirements with each food basket item contributing according to its regional share of total consumption. The food basket represents 90 per cent of expenditures, which is assumed to meet 95 per cent of calorie requirements. Once the food poverty line is derived, it is scaled to reflect 100 per cent of food expenditures and therefore the cost of meeting 100 per cent of the regional calorie requirement.

Figure A1. Extra household weights used to estimate non-food expenditure

The non-food poverty line is the weighted average of non-food expenditures for households with total per capita expenditures within 20 per cent of the food poverty line. A triangular weighting scheme is used to give greater weight to those with expenditures closest to the food poverty line (Figure A1). The total poverty line is simply the sum of the food and non-food poverty lines.
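The non-food calculation can be sketched in Python as follows. The exact weights shown in Figure A1 are not reproduced; the sketch assumes a weight that falls linearly from one at the food poverty line to zero at a distance of 20 per cent, multiplied by the household sample weight. Function names and data are hypothetical.

```python
def triangular_weight(total_pc, food_line, band=0.20):
    """Weight of 1 at the food poverty line, falling linearly to 0 at +/- band."""
    distance = abs(total_pc - food_line) / food_line
    return max(0.0, 1.0 - distance / band)


def nonfood_poverty_line(households, food_line, band=0.20):
    """Weighted mean non-food spending of households near the food poverty line.

    households: iterable of (total_pc, nonfood_pc, sample_weight) tuples.
    """
    numerator = denominator = 0.0
    for total_pc, nonfood_pc, sample_weight in households:
        weight = sample_weight * triangular_weight(total_pc, food_line, band)
        numerator += weight * nonfood_pc
        denominator += weight
    return numerator / denominator


# Hypothetical households: (total per capita, non-food per capita, sample weight).
households = [(100.0, 30.0, 1.0), (110.0, 40.0, 1.0), (90.0, 20.0, 1.0), (150.0, 90.0, 1.0)]
food_line = 100.0
nonfood_line = nonfood_poverty_line(households, food_line)
total_line = food_line + nonfood_line
```

The household with total consumption of 150 lies outside the 20 per cent band and receives zero weight, so it does not influence the non-food poverty line.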

Finally, this do-file executes 140_povmeas.do to calculate the spatial price index and FGT poverty measures. The poverty headcount defines the next round’s bottom percentile for determining the relatively poor. Though poverty lines are calculated for specific spatial domains, the FGT poverty measures can be calculated for any area by finding the average number of households in that area with real per capita consumption falling below the relevant spatial domain’s poverty line. This file outputs poverty rates for the nation, rural/urban, TPI regions, spatial domains, strata, and a regional variable, news.

The do-file, 140_povmeas.do, is also executed later in the code stream to calculate the fixed (based on the previous time period’s entropy-adjusted food basket and current period prices) and flexible entropy-adjusted spatial price indices and poverty measures.
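For reference, the FGT class of measures (Foster et al. 1984) can be written as a short Python function. This is a generic sketch rather than the code in 140_povmeas.do: it assumes a single poverty line applied to real per capita consumption, and the weights would typically be household weights multiplied by household size. The example data are hypothetical.

```python
def fgt(consumption, weights, line, alpha):
    """Foster-Greer-Thorbecke measure: alpha=0 headcount, 1 poverty gap, 2 squared gap."""
    total_weight = sum(weights)
    accumulated = 0.0
    for c, w in zip(consumption, weights):
        if c < line:
            # Normalized shortfall from the poverty line, raised to alpha.
            accumulated += w * ((line - c) / line) ** alpha
    return accumulated / total_weight


# Hypothetical real per capita consumption with equal weights and a line of 100.
consumption = [50.0, 80.0, 120.0, 200.0]
weights = [1.0, 1.0, 1.0, 1.0]
p0 = fgt(consumption, weights, 100.0, 0)
p1 = fgt(consumption, weights, 100.0, 1)
p2 = fgt(consumption, weights, 100.0, 2)
```

When spatial domains have different poverty lines, the same function can be applied to spatially deflated consumption with a common line, or evaluated per domain and combined using population weights.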

A.3.10 Preparing Data for Revealed Preference Conditions

200_rev_pref_spat_1.do, 210_rev_pref_temp_1.do, 220_rev_pref_temp_2.do.

Files 200 to 220 prepare datasets containing prices, quantities, poverty lines, regional caloric requirements, and calories per gram from the current survey period and the previous period, when applicable, for use in revealed preference tests. Refer to Chapters 2 and 4 for discussions of revealed preference tests and the entropy adjustment procedure. Recall that revealed preference conditions are expressed in the following three constraints, where i indexes food products; r and its alias, s, represent the set of spatial domains; and p1,p2,q1,q2 represent prices and quantities in the first and second time period.
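One reading of the three constraints that is consistent with the descriptions in sections A.3.10.1 and A.3.10.2 is sketched below. The notation is an assumption based on the symbols defined above, with superscripts denoting the time period, and should be checked against Chapters 2 and 4.

```latex
\begin{align}
\sum_i p^2_{ir}\, q^2_{ir} &\le \sum_i p^2_{ir}\, q^2_{is} \quad \forall\, r, s \tag{A.1}\\
\sum_i p^2_{ir}\, q^1_{ir} &\ge \sum_i p^2_{ir}\, q^2_{ir} \quad \forall\, r \tag{A.2}\\
\sum_i p^1_{ir}\, q^2_{ir} &\ge \sum_i p^1_{ir}\, q^1_{ir} \quad \forall\, r \tag{A.3}
\end{align}
```

Under this reading, each bundle is the least-cost option at the prices of its own region and period: no other region's bundle (A.1) and no other period's bundle (A.2 and A.3) is cheaper at those prices.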




A.3.10.1 Spatial Revealed Preferences

The do-file 200_rev_pref_spat.do exports product codes, prices, quantities, calories per gram, regional caloric requirements, and the food poverty line from the last iteration for spatial revealed preference tests and entropy corrections in GAMS. This do-file also conducts initial spatial revealed preference tests (constraint A.1). These tests simply provide an initial look at spatial revealed preference outcomes. Spatial revealed preference tests used in analysis and entropy adjustments are conducted in the GAMS files 250_spat_consistent.gms and 250_spat_consistent.bat.

In testing spatial revealed preference conditions, we can effectively compare the food poverty line in region r to a food poverty line calculated using the region r prices and the region s food basket. To ensure comparability, the poverty lines must reflect the same calorie target. Product codes, prices, quantities, calories per gram, regional calorie requirements, and food poverty lines are exported for analysis in GAMS.
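The spatial comparison can be sketched in Python as a matrix of bundle costs. This is an illustration, not the Stata or GAMS code: the function names and data are hypothetical, and the sketch assumes that every product in every bundle has a price in every domain and that all bundles have already been scaled to the same calorie target.

```python
def cost_matrix(prices, bundles):
    """cost[r][s]: cost of domain s's bundle valued at domain r's prices."""
    return {
        r: {s: sum(prices[r][item] * qty for item, qty in bundles[s].items()) for s in bundles}
        for r in prices
    }


def spatial_violations(prices, bundles, tol=1e-9):
    """Pairs (r, s) where s's bundle is cheaper at r's prices than r's own bundle."""
    costs = cost_matrix(prices, bundles)
    return [
        (r, s)
        for r in costs
        for s in costs[r]
        if s != r and costs[r][s] < costs[r][r] - tol
    ]


# Hypothetical prices per unit and bundle quantities for two spatial domains.
prices = {"north": {"maize": 2, "rice": 5}, "south": {"maize": 4, "rice": 3}}
consistent = {"north": {"maize": 10, "rice": 2}, "south": {"maize": 4, "rice": 8}}
inconsistent = {"north": {"maize": 10, "rice": 2}, "south": {"maize": 3, "rice": 4}}

ok = spatial_violations(prices, consistent)
bad = spatial_violations(prices, inconsistent)
```

In the second case the southern bundle costs 26 at northern prices while the northern food poverty line is 30, so the pair (north, south) is flagged. The diagonal entries `costs[r][r]` are the domains' own food poverty lines.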

A.3.10.2 Temporal Revealed Preferences

The do-files 210_rev_pref_temp_1.do and 220_rev_pref_temp_2.do prepare data from the previous survey period and the current survey period to test the temporal revealed preference constraints (constraints A.2 and A.3). These do-files are only executed when conducting intertemporal comparisons as determined by the global, no_temp_rev. Both files use match_t1_t2.dta to harmonize product codes between the two surveys, enabling prices and quantities to be matched between surveys.

The do-file, 210_rev_pref_temp_1.do, addresses constraint A.2. The right-hand side of the equation is simply the current survey food poverty line. To calculate the left side, we calculate a fixed poverty line using previous period quantities and current period prices. However, several adjustments are required. First, because quantities in each period are scaled to meet regional calorie requirements, it is necessary to account for the fact that regional calorie requirements are likely to be different between survey periods. Specifically, we scale previous period quantities to reflect current period caloric requirements. Second, because not all items in the previous food basket were consumed in the current period, we account for missing products when scaling the poverty line to meet 100 per cent of expenditures. This fixed poverty line is exported for use in GAMS.

The do-file, 220_rev_pref_temp_2.do, prepares data to evaluate revealed preference constraint A.3. The right-hand side of constraint A.3 is the previous period poverty line. Previous period prices, after harmonizing product codes, and poverty lines are exported for use in GAMS.

A.3.11 Fixed Poverty Lines and Poverty Measures

230_povline_fix.do, 140_povmeas.do.

Do-file 230_povline_fix.do calculates total poverty lines from the fixed food poverty lines. The fixed poverty lines were derived in 210_rev_pref_temp_1.do and reflect previous period consumption bundles evaluated at current period prices. This file is only executed when conducting intertemporal comparisons as determined by the global, no_temp_rev. The total poverty line in the fixed case is calculated differently than in the flexible cases (e.g. in 130_povline_flex.do and 270_povline_flex_ent.do). As previous period non-food consumption evaluated at current period prices is not available, the previous period food poverty line to total poverty line ratio is used to derive the total poverty line. Finally, 140_povmeas.do is executed to calculate spatial price indices and FGT poverty measures.
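A short Python sketch of the ratio approach follows. It assumes the ratio is applied by holding the previous period's food share of the total poverty line constant, which is one natural reading of the text; the function name and figures are hypothetical.

```python
def fixed_total_poverty_line(fixed_food_line, prev_food_line, prev_total_line):
    """Scale the fixed food poverty line by the previous period's food-to-total ratio."""
    food_to_total = prev_food_line / prev_total_line
    return fixed_food_line / food_to_total


# Hypothetical values: previous food line 8, previous total line 10, fixed food line 12.
total_fixed = fixed_total_poverty_line(12.0, 8.0, 10.0)
```

With a previous food-to-total ratio of 0.8, a fixed food poverty line of 12 implies a fixed total poverty line of 15.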

A.3.12 Revealed Preference Tests and Entropy Adjustments

250_spat_consistent.gms, 250_spat_consistent.bat, 255_spat_temp_consistent.gms, 255_spat_temp_consistent.bat.

  • 250_spat_consistent.bat (spatial revealed preferences only) or 255_spat_temp_consistent.bat (spatial and temporal revealed preference constraints) must be adapted such that the first line points to the directory containing the GAMS software. This allows Stata to initiate the GAMS files because GAMS is then on the DOS path.

The GAMS file 250_spat_consistent.gms tests spatial revealed preference constraints while 255_spat_temp_consistent.gms tests spatial and temporal revealed preference constraints. Both files perform entropy adjustments to resolve any utility inconsistency among food bundles, thereby producing utility-consistent consumption bundles and food poverty lines. These files may be run directly in GAMS or may be shelled from Stata via 250_spat_consistent.bat or 255_spat_temp_consistent.bat. As noted, in order for the bat files to work, the first line must be modified to point to the directory that contains the GAMS software so that GAMS is on the DOS path.

Poverty lines are calculated to meet specific calorie requirements that vary by time and space. To ensure comparability in revealed preferences, poverty lines must reflect the same calorie target. This is accomplished in two ways. First, poverty lines based on the previous period’s quantities ($\sum_i p^1_{ir} q^1_{ir}$ and $\sum_i p^2_{ir} q^1_{ir}$) are scaled to reflect a calorie target of 2150. Second, poverty lines based on the current period’s quantities, which are endogenous to the model, are not scaled. Rather, in entropy maximization, $q^2_{ir}$ and $q^2_{is}$ are constrained to satisfy revealed preferences while attaining a calorie target of 2150. Once entropy adjustments have been made, current period quantities and food poverty lines are scaled to satisfy each spatial domain’s caloric requirements.

The output file povline_rp_inout.csv merits a brief explanation. This file contains two matrices in which rows represent row spatial-domain-specific food bundles (quantities) and columns represent column spatial-domain-specific prices. The values in the matrix identify the row spatial domain’s food bundle evaluated at each column spatial domain’s prices. The diagonal values are the row/column spatial domain’s food poverty lines. The top matrix presents pre-entropy values. Reading down a column, any value less than that spatial domain’s poverty line (the diagonal value) violates revealed preferences. If the bundles represent the same level of utility, a rational consumer would choose the least-cost basket. Pairs are mutually consistent when revealed preferences are satisfied for region A as compared to region B and vice versa. The bottom matrix presents post-entropy quantities evaluated at each region’s prices. All values satisfy revealed preferences and all pairs are mutually consistent.

A.3.13 Flexible Entropy-Adjusted Poverty Lines and Poverty Measures

260_povline_ent_flex.do, 140_povmeas.do.

Do-file 260_povline_ent_flex.do closely follows 130_povline_flex.do to obtain entropy-corrected, utility-consistent food poverty lines and the associated non-food poverty and total poverty lines. It executes 140_povmeas.do to calculate spatial price indices and FGT poverty measures.

A.3.14 Preparing for the Next Survey


This do-file saves data for intertemporal comparisons in subsequent surveys. Previous period data is required for temporal revealed preference tests and for calculating fixed food bundles and poverty lines. Files needed for revealed preference tests in the next survey are saved in the folder out/t_plus1. In the last iteration of price calculations, all prices are saved to out/t_plus1/price_unit_t1.dta. This do-file saves quantities, prices, calories per person, product-level food expenditure shares, food poverty line ratios, and the food poverty line to out/t_plus1/food_pov_t1.dta.

A.3.15 Output

All summary statistics produced during the execution of PLEASe can be accessed in do-file logs in the rep folder. Intermediate working datasets are saved in work. Descriptive statistics, food baskets, the FGT poverty measures and poverty lines derived from the final-iteration flexible bundles, the entropy-adjusted utility-consistent flexible bundles, and the fixed bundles are saved in out. Files in out are saved as comma-separated text files and can be opened in Excel.

A.3.16 Executing PLEASe

The entire code stream, including the GAMS code, can be run from 000_boom.do. Once the initialization file is executed, most do-files may also be run individually. The iterative process, do-files 100 to 140, is run through 100_iterate.do. After the iterative process is run once, do-files 110 to 140 may be run individually. Note the postscript on many file and variable names refers to the iteration. Specifically, the global macro pass indicates the iterative pass where 0 indicates the initial iteration.

A.4 Compiling Data

This section provides general descriptions of the required data as well as guidelines for compiling data. Because each survey differs both in the data provided and in the structure of the raw datasets, these are merely guidelines, and the analyst must be knowledgeable about the household survey and other supplemental data. Tables A2–A7 provide specific information about the structure of each user-provided dataset.

A.4.1 Household Data

Table A2. Household characteristics and interview details

Variables:
primary sampling unit
bootstrap weights: set = 1 for all households
interview survey quarter (survquar): must be sequential
interview survey month (survmon): must be sequential
household sample weight
household ID
household size
geographical stratification variable used in the survey sample design
rural indicator (0,1): 1 if rural, 0 if urban
regions used for temporal price index calculations (reg_tpi): 1,2,3…
spatial domains for which poverty lines are constructed (spdomain): 1,2,3…
geographical regions such as north, east, central, west, south (set = 1 for all areas if not relevant)

Dataset name: hhdata.dta

One record per household

Both survmon and survquar may be included in the dataset. It is also possible to include only one of them.

When selecting regions for TPI calculations, consider that the TPI is calculated by region and survey quarter/month using the consumption patterns of the relatively poor. If the TPI region is too small, it is possible for an area to have no relatively poor households in a given time period.

If no TPI calculations are necessary, set reg_tpi, survquar, and/or survmon equal to one for all observations.

Source: See text

The dataset hhdata.dta contains household characteristics and survey information, including regions, survey periods, sample strata, household size, and household weights. Required variables are described in Table A2.

Regions for defining poverty lines (spdomain) should be chosen based on statistically representative areas, with an aim to preserve urban and rural areas, to preserve homogeneous regions in terms of prices and preferences, and to maintain a minimum number of households per region (DNEAP 2004). Because poverty lines are based on the consumption patterns of the poor, each spatial domain should include no fewer than 200 households, and ideally spatial domains should include significantly more poor households.

The intra-survey temporal price index is calculated by region (reg_tpi) and by survey quarter (survquar) or month (survmon). Therefore, TPI regions are likely to need to be more aggregated than those used for defining poverty lines in order to maintain a minimum sample size. If the TPI region is too small, it is possible for an area to have few relatively poor households in a given time period. Furthermore, because sampling is not necessarily representative by survey period, care must be taken to ensure that a sufficient number of relatively poor households are present within each region, in each period. If a survey is conducted in a single quarter, it is possible to avoid the intra-survey TPI adjustments by setting all TPI regions and all survey periods to one. The resulting TPI equals one and thus no temporal adjustments are made.
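The sample-size concern above can be checked before running the code stream. The following is a minimal sketch in Python (not part of PLEASe, which is written in Stata and GAMS). The household records, the flag for relatively poor households, and the threshold are all hypothetical and serve only to show the cell count by reg_tpi and survquar.

```python
from collections import Counter

# Hypothetical household records: (reg_tpi, survquar, is_relatively_poor).
households = [
    (1, 1, True), (1, 1, True), (1, 2, False), (1, 2, True),
    (2, 1, True), (2, 2, False), (2, 2, False),
]

MIN_POOR = 1  # illustrative threshold only; choose one suited to the survey


def thin_tpi_cells(records, min_poor):
    """Return (reg_tpi, survquar) cells with fewer than min_poor relatively poor households."""
    cells = {(reg, per) for reg, per, _ in records}
    poor = Counter((reg, per) for reg, per, is_poor in records if is_poor)
    return sorted(cell for cell in cells if poor[cell] < min_poor)


thin = thin_tpi_cells(households, MIN_POOR)
print(thin)
```

Any cell returned by this check suggests that the TPI regions should be aggregated further or that monthly periods should be replaced by quarters.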

A.4.2 Individual Data

Table A3. Individual demographics

Variables:
household ID
sex (1,2): 1 = male, 2 = female
age
motherhh (0,1): 1 if a child’s mother lives in the household. If this variable is not available, set the value to 1 for all children.

Dataset name: indata.dta

One observation for each member of each surveyed household

If data on the presence of mothers in households is not available, set motherhh to 1 for all children.

Source: See text

The dataset indata.dta contains the demographics of individual household members needed to calculate spatial-domain-specific caloric requirements. Specifically, three variables are defined for each household member: sex, age, and an indicator identifying whether a child’s mother lives in the household (motherhh). The information from this dataset is used to estimate average per person caloric requirements in each spatial domain. If information on the presence of a child’s mother in the household is not available, set motherhh to one for all children. See Table A3 for details on variable names and formatting.

A.4.3 Fertility Rates

Table A4. Fertility rates

Variables:
sex: 2 = female
rural indicator (0,1): 0 = urban, 1 = rural
age
birth rates

Dataset name: fert_rate.dta

One observation for every relevant age by urban and rural areas.

Source: See text

Data on birth rates are used to adjust average regional caloric needs. The standard code requires the dataset fert_rate.dta, containing fertility rates by urban/rural area, and age. The user must provide this data. See Table A4 for dataset format details.

A.4.4 Food Calories

Table A5. Caloric content of food items (calories per gram)

Variables:
food product code (consistent with cons_nom_in.dta)
product description (consistent with cons_nom_in.dta)
caloric content of food product: calories per gram
source of calorie information, e.g. FAO, web page, etc. (optional)

Dataset name: calperg.dta

One observation for each food product.

Make sure the food product code is correct for each survey year.

Source: See text

The dataset calperg.dta provides the caloric content of commonly consumed food items. These data are needed for items in each region’s food consumption bundle. Ideally, no item should be dropped from the analysis due to missing caloric data, and attempts should be made to update the data as necessary (see note 3). The unit of measurement is calories per gram of a given food item. See MPF/UEM/IFPRI (1998) for information on compiling food calorie data. In addition to national departments of health or agriculture, possible sources include West et al. (1987, 1988), the US Department of Agriculture (1998), and the US Department of Health, Education, and Welfare (1968). See Table A5 for dataset format details.
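Because food quantities are recorded in kilograms and caloric content in calories per gram, the two must be combined with a factor of 1,000. The sketch below, in Python for illustration only, shows that arithmetic and lists items that lack caloric data. The product codes, caloric values, and quantities are hypothetical and are not taken from any survey.

```python
# Hypothetical calories-per-gram table keyed by food product code (cf. calperg.dta).
cal_per_gram = {101: 3.6, 102: 3.4}

# Hypothetical daily household consumption: (product code, kilograms per day).
daily_food = [(101, 0.5), (102, 0.25), (999, 0.1)]


def daily_calories(consumption, calories):
    """Sum calories per day; also return product codes lacking caloric data."""
    total = 0.0
    missing = []
    for code, kg in consumption:
        if code in calories:
            total += kg * 1000 * calories[code]  # kg -> grams, then grams -> calories
        else:
            missing.append(code)
    return total, missing


total, missing = daily_calories(daily_food, cal_per_gram)
print(total, missing)
```

Codes reported as missing are candidates for updating the caloric data rather than for dropping the item.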

A.4.5 Product Codes

Table A6. Product code matching

Variables:
product code in the current survey, t2 (product)
product code in the previous survey, t1 (product_t1)

Dataset name: match_t1_t2.dta

One observation for each product, product_t1 combination.

The dataset should contain one line for each item in the most disaggregated survey.

Source: See text

The dataset match_t1_t2.dta contains product codes for the current survey and the previous survey. It is necessary to match product codes in order to match product prices and quantities across surveys. Due to different product code aggregations across surveys, a single product code in one survey may correspond to multiple product codes in the other survey. The code stream is equipped to handle this possibility. The dataset should have one line for each item in the most disaggregated survey. For example, if the current survey only records consumption for the category grains but the previous survey is disaggregated with categories maize, millet, sorghum, the product code for grain should be entered three times under the variable product with the corresponding product codes for maize, millet, and sorghum in product_t1. See Table A6 for variable and format details.
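The grains example can be sketched as follows. This is an illustrative Python fragment, not PLEASe code, and the numeric product codes are hypothetical. It expands a one-to-many mapping into one row per product, product_t1 combination.

```python
# Hypothetical product codes: the current survey (t2) records only "grains" (code 10),
# while the previous survey (t1) records maize (11), millet (12), and sorghum (13).
t2_to_t1 = {10: [11, 12, 13], 20: [20]}


def build_match_rows(mapping):
    """Return one (product, product_t1) row per pairing, as in match_t1_t2.dta."""
    return [(product, product_t1)
            for product, t1_codes in sorted(mapping.items())
            for product_t1 in t1_codes]


rows = build_match_rows(t2_to_t1)
for row in rows:
    print(row)
```

The grains code appears three times under product, once for each disaggregated code in product_t1, while a product coded identically in both surveys appears once.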

A.4.6 Consumption

Table A7. Total value and quantity of consumed products (food and non-food)

Variables:
household ID
product code
product description
food indicator (0,1): 0 = non-food, 1 = food
product categories: 1,2,…,12, e.g. COICOP codes
quantity of food product consumed, daily values in kilograms
total household expenditure on given product (food and non-food), daily values

Dataset name: cons_nom_in.dta

One observation per household per product per transaction (if possible) for food products.

One observation per household per food product is also acceptable when transaction-level data is not available.

One observation per product per household for non-food products.

All consumption values are nominal.

Convert all food quantities measured in non-kilogram units to kilograms. This includes food items not measured in weight units, such as litres. Retain documentation of the conversion factors.

Include food consumption for food items even when quantities are not available. Though items without quantity data cannot be included in poverty line calculations, they should be included in total consumption.

Source: See text

The dataset cons_nom_in.dta provides consumption values for all food and non-food items and quantity data for food items, by household and product (see Table A7 for dataset format). Not all surveys collect quantities of food consumed. In such cases, prices must be obtained from other sources, such as community surveys, in order to calculate quantities (see note 4). If available, transaction-level data should be reported for each food item. Specifically, a separate observation should be entered each time the household acquires a food item. Many surveys do not provide transaction-level food expenditures and quantities. While transaction-level data is not required, it is useful in price calculations. Both quantities and expenditures should be converted to daily values. The quantity of food items consumed must be converted to kilograms.
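The two conversions described above, to kilograms and to daily values, can be sketched as follows. This Python fragment is illustrative only. The conversion factors, the assumed 30-day month, and the unit and recall labels are hypothetical and must be replaced by values documented for the survey at hand.

```python
# Illustrative conversion factors to kilograms; real factors must come from
# the survey documentation (the litre factor here assumes a density of 1 kg/l).
KG_PER_UNIT = {"kg": 1.0, "g": 0.001, "litre": 1.0}
DAYS_PER_RECALL = {"day": 1, "week": 7, "month": 30}  # assumed month length


def to_daily(quantity, unit, expenditure, recall):
    """Convert a reported quantity and expenditure to daily kilograms and daily value."""
    days = DAYS_PER_RECALL[recall]
    return quantity * KG_PER_UNIT[unit] / days, expenditure / days


daily_kg, daily_value = to_daily(3500, "g", 70.0, "week")
print(daily_kg, daily_value)
```

Keeping the factors in explicit tables makes it easier to document them and to trace unit errors later.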

Potentially the most important and messiest aspect of assembling data is carefully checking consumption data for errors and outliers. Unit errors can have surprisingly large impacts on poverty lines. Errors can occur both in the reported unit and in the conversion to standard units (e.g. kilograms or grams). Consumption values and quantities should be scrutinized and cleaned to eliminate the undue influence of outliers.
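One simple screen for unit errors compares each household's unit value (expenditure divided by kilograms) with the median for the same product. The sketch below is in Python for illustration and is not the cleaning procedure used in PLEASe. The data and the factor of 10 are hypothetical assumptions.

```python
from statistics import median

# Hypothetical observations for one product: (household ID, expenditure, kilograms).
obs = [(1, 10.0, 1.0), (2, 22.0, 2.0), (3, 9.0, 1.0), (4, 12.0, 0.001), (5, 30.0, 3.0)]


def flag_unit_value_outliers(records, factor=10.0):
    """Flag households whose unit value (expenditure per kg) is more than
    `factor` times above or below the product median. The factor is an
    illustrative assumption, not a PLEASe default."""
    unit_values = {hh: value / kg for hh, value, kg in records if kg > 0}
    mid = median(unit_values.values())
    return sorted(hh for hh, uv in unit_values.items()
                  if uv > factor * mid or uv < mid / factor)


flagged = flag_unit_value_outliers(obs)
print(flagged)
```

In this example the flagged household reports a quantity that looks like grams entered as kilograms, the kind of unit error described above. Flagged records should be inspected rather than dropped automatically.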

This section provides an overview of general guidelines for assembling consumption data; see Deaton and Zaidi (2002) for a thorough discussion of consumption aggregates. In addition, the PLEASe code provides the do-files employed to transform raw Mozambique survey data to cons_nom_in.dta. These files are provided for reference only, since survey formats vary significantly from country to country and sometimes within a country over time. Nonetheless, the actual code as well as the documentation within each file may be informative.

While ideally all household consumption would be included in the household consumption aggregate, practically some categories of consumption are purposely excluded. Imputing consumption values of home-produced services and public goods and services is impractical and these categories of consumption are excluded. Notably the value of free public education is excluded (see Chapter 2).

A.4.6.1 Food Consumption

Food consumption includes all food consumed by all members of the household, including food consumed away from home. Sources of food consumption include purchased food and meals, home-produced food, and food gifts, subsidies, remittances, and in-kind payments.

Not all surveys report values for in-kind and home-produced food consumption. When self-reported prices are not available, prices must be obtained from an additional source, such as community surveys. If possible, the distinction between farm-gate prices and market prices should be considered (Deaton and Zaidi 2002).

A.4.6.2 Non-Durable and Semi-Durable Non-Food Consumption

Non-food consumption incorporates the consumption of non-durable, semi-durable, and durable goods, and rent. This section addresses non-durable and semi-durable goods, which are goods purchased and consumed over a relatively short period of time. This is a broad category and includes everything from laundry soap and fuel, to clothing and housewares, to internet services, health insurance, and private education. Values of non-food items may be collected for a number of recall periods and must be converted to daily values. Quantities consumed are generally not collected and are not used in the PLEASe analysis.

Care must be taken to distinguish between household consumption and income. The purchase of financial assets, interest payments, rents received, and debt payments are not included in consumption. However, the purchase of financial services is included. Likewise, taxes, fees, levies, and fines are deductions from income and are not included in household consumption.

In-kind gifts could be excluded from giving households’ consumption to avoid double counting (Deaton and Zaidi 2002). In-kind gifts and subsidies (e.g. housing, transportation to work, and education) are included in the receiving households’ consumption. Remittances are considered income and are not included in either the giving or receiving households’ consumption.

Households are treated separately from home businesses and farms, and thus business expenses and assets are excluded from consumption. While consumption of home-produced goods is included, consumption of home-produced services is excluded due to the difficulty of valuing such services.

A.4.6.3 Durable Goods

Durable goods require special treatment because their consumption value is not reflected in the purchase transaction but in the value of their use over many years. The approach adopted here is to include the use value of the durable good determined by the value of investing the durable expenditure in the market. It is necessary to specify a relevant interest rate and, for each durable good, a depreciation rate. Depreciation rates can be estimated based on the expected life of each durable good. Durable goods bought more than one year ago, ‘old’ durables, are, by default, valued at half the buying price. The sample do-file, 033_durables.do, provides an example of how to impute use values. In this file, the following formula is specified:


where val_old is the value of durables purchased more than a year ago, val_new is the value of durables purchased in the last twelve months, r is the interest rate, and dep_rt is the product-specific depreciation rate (Deaton and Zaidi 2002).
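A minimal sketch of a use-value calculation consistent with these definitions is given below, in Python for illustration. It assumes the user-cost form discussed by Deaton and Zaidi (2002), in which the annual flow of services equals the current value of the good times the sum of the interest and depreciation rates, and it applies the default of valuing old durables at half the buying price. The exact expression in 033_durables.do may differ, and the function name, the old_share parameter, and the numbers are hypothetical.

```python
def durable_use_value(val_old, val_new, r, dep_rt, old_share=0.5):
    """Annual use value of a durable good under the assumptions stated above.

    Old durables are counted at `old_share` of their purchase price; the flow
    of services is the current value times (interest rate + depreciation rate).
    """
    current_value = old_share * val_old + val_new
    return current_value * (r + dep_rt)


annual = durable_use_value(val_old=1000.0, val_new=200.0, r=0.05, dep_rt=0.15)
daily = annual / 365  # consumption data are expressed as daily values
print(annual, daily)
```

Dividing by 365 puts the imputed use value on the same daily basis as the other consumption items.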

A.4.6.4 Housing Rent

When households pay rent, the actual paid rent is the household’s consumption expenditure for housing. For others, the housing consumption expenditure may need to be imputed. Hedonic regression analysis is applied to estimate rental values for households without either an actual paid rent or a self-imputed rent. Separate regressions should be carried out for rural and urban households. Right-hand-side variables are selected in part by data availability. Generally, household size, house ownership, and dummies for strata can be included. Other variables include dwelling characteristics, e.g. roof material, solidity of walls, sanitation standard, water source, and the type of energy used in the kitchen and for lighting. The sample do-file 034_rent.do provides an example of one method of specifying the hedonic regression.

A.4.7 Previous Survey Output

Finally, when more than one survey round is being compared, data from the previous round are required for temporal revealed preference constraints. These data are produced automatically in the code stream and retrieved automatically from the previous period’s directory, out/t_plus1/.

A.5 Final Thoughts

This appendix is designed to orient the user within the code stream. It is not possible or desirable to provide discussion of every detail within the code. Users are encouraged to carefully look through each file. When running with data, users are encouraged to insert pause or stop commands (typing stop just produces an error) in order to examine the datasets in memory and to better understand what has happened with each step. While this is a time-consuming process, it is certainly more rapid than writing the code.

Users should also expect a long iterative process of interrogating the data, identifying errors, and adapting the code stream to country circumstances. In our experience, this process is never simple or easy. Errors are common, with unit errors in data being particularly ubiquitous but certainly not the only errors likely to be encountered. Nonsensical results are a good sign that something is wrong. With experience, users will develop search methods for locating errors. Once the error is located and understood, fixing it is normally relatively straightforward.

The most pernicious errors are those that only mildly influence results or only influence results under special circumstances. While substantial efforts have been expended to produce clean code, the potential presence of errors is not excluded. Users employ PLEASe at their own risk.

Finally, it is our hope that neither the PLEASe code nor the associated manuals will remain static. Experience is an excellent, if stern, teacher. The PLEASe code and associated manual are offered in the spirit of allowing future analysts to stand on the shoulders of current analysts. There is no doubt that the existing package can be improved. Our hope is that the package enhances the quality of future analysis and that, in the process, the package itself is improved.


(1) Versions prior to Stata 11 will require modifications such as reverting to old merge syntax.

(2) The food_basket_missing data files identify those items with sufficient expenditure levels to fall in the top 90 per cent of consumption but excluded from the food basket for lack of calorie or price data.

(3) The output file food_basket_missing_$it_n.csv reports food items that should be included in the food basket but are dropped due to missing calories.

(4) Prices should be supplied at the most local level possible. At a bare minimum, prices should be supplied for each region in which poverty lines are constructed.