## Martha Nussbaum and Amartya Sen

Print publication date: 1993

Print ISBN-13: 9780198287971

Published to Oxford Scholarship Online: November 2003

DOI: 10.1093/0198287976.001.0001



# The Relativity of the Welfare Concept

Chapter: The Relativity of the Welfare Concept (p. 362)

Source: The Quality of Life

Publisher: Oxford University Press

# 1 Introduction

In most sciences there are phenomena that are understood only partially or not at all. Nevertheless, if we take these basic phenomena for granted, it is frequently possible to build on them a theory that explains more complicated phenomena. The basic phenomena are called the primitive concepts of that science. When science progresses we see not only an outward movement, which tries to explain and understand newly observed phenomena, but also an inward movement, in which an attempt is made to explain phenomena hitherto taken as primitive concepts. Unavoidably this leads to the definition of more basic concepts as the primitive concepts of the theory. Two outstanding present-day examples of this scientific evolution are the developments in physics, where the atom is no longer the primitive concept now that it has become possible to dissect it into ever smaller particles, and in medicine and biology, where we are discovering the secrets of genetics.

In economics we have the same problem: some basic concepts are needed to build a theory on, but those concepts themselves are not well understood, or are not even measurable for the time being. A prime example is the welfare or utility concept, which is taken to be a primitive concept. As the concept is also used as a basic concept in sociology and psychology, understanding it may be seen as a common task of the social sciences. The main objective of this paper is to contribute to that understanding of the welfare concept.

Section 2 sketches the mainstream approach in economic literature. In Section 3 we consider the measurability problem and suggest a measurement method. Some results are reported, which suggest measurability in a certain sense. In Section 4 we consider the differences between respondents and try to explain those differences by relatively simple regression equations. Section 5 introduces some more complex models, which include past and future expectations as co‐determinants. In Section 6 we consider the social filter model, which incorporates the social reference group. In Section 7 we study a cardinal utility framework. Section 8 is the conclusion.

The nature of this presentation is non‐mathematical, following the general style of this volume. Hence, some matters will not be exactly defined or described. The references, where the reader may find a more exact presentation, are given in the text.

The main message of this paper is, then, that from attitude questions an ordinal but interpersonally comparable individual welfare concept may be constructed which is ‘operationally meaningful’ in the sense of Samuelson (1947). At the end of the paper we suggest an attractive cardinalization of this utility index. We do not touch on the problem of whether and how we can define a social welfare function using the individual welfare concept described here as a building block.

# 2 The Economic Mainstream Approach to Utility

The attitude of economics towards utility has always been ambiguous. On the one hand, the concept was absolutely needed in order to develop a positive and a normative theory of economic behaviour. On the other hand, economists have felt very uneasy with the concept as its measurability is doubtful. As such it does not seem to be an operational concept (Samuelson, 1947). How can a science be based on non‐measurable concepts?

At a non‐scientific level welfare or well‐being is a well‐known concept. It is an evaluation by the individual of his situation. We know from introspection and observation that it is quite possible to evaluate situations in terms of feeling ‘well’ or feeling ‘badly’. It follows that an intrapersonal comparison of situations is possible. Now this evaluation is done in terms of verbal labels, which is not a good point of departure for the formation of a quantitative theory. It has, however, been demonstrated that many esoteric things may be evaluated on a numerical scale: for example, the quality of wine, a musical performance, or products in commodity tests. The evaluation of school results is frequently done in terms of a scale from 0 to 10, where the numbers are explicitly translated, with 10 standing for ‘excellent’, 9 for ‘very good’, 8 for ‘good’, and so forth. When we are first confronted with such numerical evaluations they look strange and unfamiliar. Once we have some experience with this type of rating, they become ingrained in our value pattern and we begin to think in those numerical terms. Hence, we do not reject the idea that more general situations may be evaluated by human beings in terms of numbers on a numerical scale just as well as in terms of verbal labels on a verbal scale. This may apply to the welfare concept as well.

This was also the position of the classical economists like Edgeworth (1881) and Cohen Stuart (1889). Edgeworth assumed that welfare positions could be described by the consumption levels $x_1, \ldots, x_n$ of $n$ commodities $X_1, \ldots, X_n$, denoted for short by the vector $x$. Then he assumed that an individual was able to evaluate each situation $x$ by a number $U(x)$, called the utility attached to that situation. Consumer behaviour was then basically a search for the welfare position $x$ with the highest utility, given the constraint that total expenditures $p_1 x_1 + \ldots + p_n x_n$ will not exceed a given income $y$, where $p_1, \ldots, p_n$ stand for the prices of the different commodities. In this way demand for goods could be described as a function of prices and income. In this analysis utility is just a tool of analysis. The fact that individuals try to improve or even to optimize their behaviour according to some criterion is more or less a tautology. If this were not the case, we could expect purely random behaviour, which is not observed in practice. As a result of this analysis we can also evaluate income levels $y$ by assigning to them the utility value $U$ corresponding to the optimal consumption pattern that can be reached at given prices $p$ and income $y$. That value $U$ depends on $y$ and $p$, and it is nowadays called the indirect utility function $V(y, p)$. If prices are taken as fixed we denote it by $V(y)$, and it is then also called the utility function of income.
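
The consumer problem and the indirect utility function described above can be made concrete with a small numerical sketch. The Cobb‐Douglas (log‐linear) utility function, the two prices, the budget shares, and all function names below are illustrative assumptions, not part of Edgeworth's analysis; the closed‐form demand used is the standard one for this particular utility function.

```python
import math


def cobb_douglas_demand(income, prices, weights):
    """Utility-maximizing bundle for U(x) = sum_i a_i * ln(x_i), with sum(a_i) = 1.

    For this (assumed) utility function the consumer spends the share a_i
    of income on commodity i, so x_i = a_i * y / p_i.
    """
    return [a * income / p for a, p in zip(weights, prices)]


def utility(bundle, weights):
    """Direct utility U(x) of a consumption bundle (assumed log-linear form)."""
    return sum(a * math.log(x) for a, x in zip(weights, bundle))


def indirect_utility(income, prices, weights):
    """V(y, p): the utility of the best bundle affordable at prices p and income y."""
    return utility(cobb_douglas_demand(income, prices, weights), weights)


# Hypothetical numbers, chosen only for illustration.
prices = [2.0, 5.0]
weights = [0.4, 0.6]
income = 100.0

bundle = cobb_douglas_demand(income, prices, weights)
spent = sum(p * x for p, x in zip(prices, bundle))

print("optimal bundle:", bundle)
print("expenditure:", spent)
print("V(y, p):", indirect_utility(income, prices, weights))
```

Holding the prices fixed and varying only `income` traces out the utility function of income $V(y)$ for this assumed utility function.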

This brings the second problem to the fore. Cohen Stuart was looking for a tool to construct a just taxation model, as he realized that, although there is a case for taxing all citizens by an equal amount as they get the same services in the public sector, in some way the pain caused by taxation is not the same for everyone. It is easier for a millionaire to pay $1,000 than for someone with an annual income of $10,000. This points to progressive taxation. Then V(y) is a measuring rod by which the tax pain may be equalized. Let us assume that we tax someone with $10,000 by $500; then the pain inflicted will be V(10,000) − V(9,500) = A. If we wish to inflict the same pain on someone with $20,000 to begin with, we have to tax him by T with V(20,000) − V(20,000 − T) = A.

There are two problems that are rather basic in this approach. The first problem is whether equal differences in the value of the utility function imply equal pain differences for the individual. If we return to the evaluation by verbal labels, this question may be translated into the question whether the fall from ‘very good’ to ‘good’ is equivalent to the fall from ‘good’ to ‘amply sufficient’, these being the usual translations in Dutch schools of the grades 9, 8, and 7. We cannot solve this problem. We cannot say that the differences imply equal utility jumps, but neither can we say that they do not. The reason for this is that we lack a measuring rod to measure utility itself, say, a ‘utility meter’ (see also Suppes and Winet, 1954). All we can do is observe correlates, which we assume to be strongly related to the latent non‐cognitive concept utility.

The second problem arises if we accept that equal utility differences imply equal pain differences. Then we still have to answer the question whether two individuals have the same utility function of income and whether the fact that two individuals attach the same utility value to the same income implies that they feel equally satisfied with their income.
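
Cohen Stuart's equal‐pain calculation from the taxation example above can be sketched numerically once a utility function of income is assumed. In the sketch below the logarithmic form V(y) = ln y is purely an assumption for illustration (the text does not commit to any functional form), and the function name `equal_sacrifice_tax` and the bisection solver are hypothetical helpers.

```python
import math


def equal_sacrifice_tax(V, income, target_loss, tol=1e-9):
    """Solve V(income) - V(income - T) = target_loss for T by bisection.

    Assumes V is increasing, so the utility loss grows with T, and that the
    target loss can be reached by some T below the full income.
    """
    lo, hi = 0.0, income - 1e-6
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if V(income) - V(income - mid) < target_loss:
            lo = mid  # tax too small: loss is still below the target
        else:
            hi = mid  # tax large enough: shrink from above
    return (lo + hi) / 2


# Pain A inflicted by taxing an income of 10,000 by 500, under the assumed V = ln.
pain = math.log(10_000) - math.log(9_500)

# Tax T on an income of 20,000 that inflicts the same pain A.
tax = equal_sacrifice_tax(math.log, 20_000, pain)
print(f"A = {pain:.6f}, T = {tax:.2f}")
```

Under the logarithmic assumption the equal‐pain tax on an income of 20,000 is 1,000, the same 5 per cent share as for the lower income. A different assumed V gives a different T, which is why the two problems discussed in the text matter for any practical use of the method.
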
Again in terms of verbal labels: does the fact that two individuals call the same income level ‘good’ imply that they feel equally satisfied or dissatisfied about their welfare position? Here too we have to confess our agnostic position. It follows that the utility function approach cannot be applied for intra‐ or interpersonal welfare comparisons without a reasonable measurement method and/or the willingness to accept some unproven assumptions as a matter of working convention. You may call it an ‘act of faith’. But such acts are performed by all individuals, mostly unconsciously. There is an infinity of social conventions that serve to replace metaphysical notions by observed correlates. It would be hard to imagine how any being or society could function without such conventions. Obviously the convention is just a working assumption, which is invalidated when its consequences do not conform to our expectations of real phenomena.

The measurability problem was recognized by Pareto (1904). It led him to the conclusion that the assumption of utility maximization is a useful device to explain consumer behaviour but that a utility function in the context of the description of consumer demand is actually only needed to describe indifference curves in commodity space, indicating that people are indifferent between various consumption patterns and prefer more to less. He did not regard it as necessary for the explanation of the consumer problem to assume that utility differences are comparable, or, more technically, no cardinal utility had to be assumed. Pareto did not state that the idea of cardinal utility was nonsense, only that the assumption of cardinal utility was superfluous for dealing with the consumer problem.

Robbins (1932), a man of tremendous influence in English and American literature, went much further in denying the existence of a measurable cardinal utility function and proclaiming henceforth the impossibility of measuring such a concept.
Mainstream economics accepted this verdict for a long time. This position had a significant impact on the state of the art in economics. All welfare comparisons were forbidden, except for the assumption that, if an individual A has not less of anything than an individual B, he cannot be worse off than B. It follows, then, that a social allocation of goods over individuals can be improved if nobody gets less and at least someone gets more in the new allocation. Clearly this denied any foundation to normative economics, which has to be based on the evaluation of individual situations and the evaluation of the state of society as a whole by an aggregate of some sort of individual welfare evaluations.

It is nevertheless widely felt that it is one of the basic tasks of economists to measure inequality and to advise on methods to reduce social inequality. In this light it is untenable to maintain the position that welfare situations cannot be compared by some kind of utility function. In fact, economists have developed economic theories on inequality (Atkinson, 1970; Sen, 1973), taxation, uncertainty (Arrow, 1964), and economic growth that are either implicitly or explicitly based on a cardinal utility concept, including intra‐ and interpersonal utility comparisons. At the same time colleagues, or even these economists themselves, have professed their refusal to accept cardinality of some sort. Those studies are based on a postulate of the type: ‘Let us assume that individuals have a common utility function $U(\cdot)$ and that there is a social welfare function of the type $W = W(U_1, \ldots, U_m)$’, where $m$ is the number of individuals in society. This being postulated, one proceeds without further discussion or doubts on this basic postulate.

This leads to a rather schizophrenic situation in economics, as some authors, while painstakingly ordinal on the one hand, accept cardinality in normative studies in the way pictured above if the need arises.
The only interpretation of this behaviour is that a theory is constructed which is applicable under the proviso that the basic postulate has been verified or taken for granted as a primitive concept. In a sense it is building the first floor when the foundations are not yet laid. It may also be seen as accepting the reality of scientific method that one has to accept some primitive concepts and assumptions in order to get anywhere. The more one is willing to accept, the more specific will be the resulting theory. Nevertheless, it would be very nice if we could find some credible method of getting evidence on utility.

# 3 A Measurement Method

Individuals evaluate their situation in terms of ‘good’ and ‘bad’. This idea actually involves three elements. Situations have to be described by means of observable variables $X_1, \ldots, X_k$, which assume values $x_1, \ldots, x_k$ on a domain $X$. The situation has to be evaluated by a welfare (or utility) function assigning a welfare value $U$ to the situation described by the $k$‐vector $x$; the welfare values are then elements of an evaluation set $U$.

The first question is how we would like to characterize our situations. What is the choice of variables $X_1, \ldots, X_k$ that are required to describe our welfare position? Obviously an exact description would require an infinite set of variables like income, consumption bundle, number of working hours, family life, the weather, and even the political system. In our analysis, however, we shall confine ourselves to one variable to begin with, namely, family income, denoted by $y$. This does not imply that we believe that it provides a perfect characterization, but we use it as a start. We take it that $y$ varies from 0 to $\infty$, that is, $X$ is the positive semi‐axis.

Which values $U(y)$ can assume is a much more problematic question. As we argued before, it does not seem obvious that welfare positions are evaluated by numbers on a numerical scale.
Theoretically it may be possible, but individuals do not think in numerical values. They think in verbal labels like ‘good’ or ‘bad’. It follows that it is more natural to assume that the welfare function $U(\cdot)$ assumes values on the set of verbal labels. That set is denoted by $U$.

A question which is now crucial is whether different people assign the same emotional value to the same verbal label. The verbal labels are after all assumed to reflect emotional values, which are described in verbal language. The reason we are not sure is that emotions cannot be measured in an exact way. There are, however, experiments where individuals have been asked to translate such verbal labels into figures on a 0–10 scale or to draw lines of a specific length, where the convention was that ‘very bad’ corresponded to zero length and ‘excellent’ to, say, 8 centimetres. A consistent response pattern was found, which suggests that those verbal labels have roughly the same connotation for most individuals. These experiments are described in Saris (1988) (see especially van Doorn and van Praag; see also Van Praag, 1989).

Another argument, which is of a more philosophical nature, is the following. Human language is a transmitter of information between members of the language community. Hence, words are symbols of concepts and things that must have about the same meaning for two individuals who communicate in that language. Obviously we cannot prove beyond all doubt that the word ‘table’ has the same meaning for all English people, but on the other hand it does not seem far‐fetched to assume that this is roughly the case. Otherwise, language would be no means of communication, and it is precisely that which is the raison d'être of a language. This is also in line with Sen's (1982: 9) statement on empirical economic methodology, where he refers to the predilection among economists for observable behaviour:
> One reason for the tendency in economics to concentrate only on ‘revealed preference’ relations is a methodological suspicion regarding introspective concepts. Choice is seen as solid information, whereas introspection is not open to observation . . . Even as behaviorism this is peculiarly limited since verbal behavior (or writing behavior, including response to questionnaires) should not lie outside the scope of the behaviorist approach.

A third indication that it is not odd to assume that verbal labels in $U$ do approximately mean the same for all members of the language community can be constructed by posing the so‐called income evaluation question (IEQ), which runs as follows:

Please try to indicate what you consider to be an appropriate amount for each of the following cases. Under my/our conditions I would call a net household income per week/month/year of:

- about ............. very bad
- about ............. bad
- about ............. insufficient
- about ............. sufficient
- about ............. good
- about ............. very good

Please enter an answer on each line, and underline the period you refer to.

At first sight this attitude question, which I developed (van Praag, 1971), looks somewhat awkward. It would have seemed more natural to specify income levels first and to ask the respondents for their corresponding verbal evaluations. The problem with that is that different respondents have different incomes, one being a millionaire and one being a poor man. An income sequence of $10,000, $20,000, etc., would therefore be evaluated very differently by a poor man and by a millionaire, to whom those income levels would make no difference. The millionaire would not be able to distinguish a real difference between such petty amounts.

A typical response, quoted from a British respondent in 1979 for this IEQ, is the following:

- about £25 ....... very bad
- about £35 ....... bad
- about £45 ....... insufficient
- about £70 ....... sufficient
- about £120 ...... good
- about £160 ...... very good

(Figures are for household income per week.)

Let us denote such a response sequence for respondent $n$ by the vector $c_n = (c_{1n}, \ldots, c_{6n})$. We call those amounts the income standards of the individual concerned. The dimension of this vector, that is, the number of levels supplied as stimuli to the respondent, could vary. In practice six levels work rather well in the sense that people are willing and able to answer, but that limitation is only suggested by practice. Similarly the monotonic ordering of the levels is useful to calibrate the answers and to make the answers comparable between respondents, but any other ordering of the stimuli is also conceivable.

As already hinted at, the responses vary between individuals. It follows that there is no one uniform opinion on what is a ‘good’ income, etc. This does not indicate, however, that the verbal labels represent different things to different people. Let us denote the mean of the six levels by $m_n$, so that

$$m_n = \frac{1}{6} \sum_{i=1}^{6} c_{in}$$

It may then be expected that the mean response will vary between individuals. However, if the proportional deviation pattern were constant, say ‘good’ always corresponds to 20 per cent above the mean and ‘bad’ to 20 per cent below the mean, then this regularity would strongly suggest that people translate the verbal labels on the same emotional scale.

As always when studying income, it is advisable to study relative income differences rather than absolute differences. Relative differences are studied most easily by looking at the logarithm of the answers. This implies that all responses are translated onto a logarithmic scale and that we shall consider the vector $\ln(c) = [\ln(c_1), \ldots, \ln(c_6)]$. It follows then that equal log‐differences stand for equal income proportions.
The hypothesis that the difference $\ln(c_i) - \ln(c_j)$ is equal over respondents, that is, that the verbal labels $i$ and $j$ give rise to the same proportional response, has to be rejected as well. However, let $\mu$ stand for the mean of the logarithmic answers and $\sigma$ for the standard deviation of the log‐answers about their mean $\mu$; then we find that the standardized response $u_{in} = [\ln(c_{in}) - \mu_n]/\sigma_n$ is practically constant (van der Sar, van Praag, and Dubnoff, 1988). Table 1 refers to a sample of about 500 American respondents. We see that $u_1$ has an average value of −1.291 and that the sample dispersion over individuals about that value is 0.236.

Table 1 Average u‐levels and Sample Deviations

| Label | $u_i$ | $\sigma(u_i)$ |
|---|---|---|
| 1 | −1.291 | 0.236 |
| 2 | −0.778 | 0.190 |
| 3 | −0.260 | 0.241 |
| 4 | 0.259 | 0.239 |
| 5 | 0.760 | 0.190 |
| 6 | 1.311 | 0.229 |

This table is very interesting. First, although there is variation among respondents, it could not be explained by personal characteristics of the respondents; in other words, the observed variation is purely random. Second, the dispersion is roughly the same at each level. This implies that the response variation is not level‐specific. Third, and this is the most interesting aspect, the values are nearly symmetrical about zero.

All this seems to imply that, for given $\mu$ and $\sigma$, the values $u_i$ are roughly predictable except for a random disturbance. It follows that for given $\mu_n$ and $\sigma_n$ also $\ln(c_{in}) = \mu_n + u_i \sigma_n$ is predictable. This predictability is evidence that the emotional content of the set of stimuli is about the same for all respondents. It follows that we feel justified in treating the individual responses as meaningful. In the following sections we shall try to explain (in a statistical sense) the differences in the values of $\mu$ and $\sigma$ by personal variables. If we succeed in that explanation we will actually be able to identify the determinants of why people derive different welfare from a fixed amount of income.
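
The log‐standardization can be illustrated with the British response quoted earlier. This is a minimal sketch: the function name `log_standardize` is hypothetical, and the use of the sample standard deviation (divisor n − 1) is an assumption, since the text does not state which divisor is used.

```python
import math
import statistics

LABELS = ["very bad", "bad", "insufficient", "sufficient", "good", "very good"]

# Weekly household income standards (in pounds) of the British respondent, 1979.
answers = [25, 35, 45, 70, 120, 160]


def log_standardize(amounts):
    """Return (mu, sigma, u) for one respondent's answers to the IEQ.

    mu and sigma are the mean and standard deviation of the log-answers;
    u holds the standardized responses u_i = (ln c_i - mu) / sigma.
    """
    logs = [math.log(c) for c in amounts]
    mu = statistics.mean(logs)
    sigma = statistics.stdev(logs)  # assumption: sample std, divisor n - 1
    u = [(x - mu) / sigma for x in logs]
    return mu, sigma, u


mu, sigma, u = log_standardize(answers)
print(f"mu = {mu:.3f}, sigma = {sigma:.3f}")
for label, value in zip(LABELS, u):
    print(f"{label:>12}: {value:+.3f}")
```

The printed u‐values for this single respondent can be set against the sample averages in Table 1. The relation $\ln(c_{in}) = \mu_n + u_i \sigma_n$ holds by construction, so the original amounts are recovered as `math.exp(mu + u_i * sigma)`.
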
But before doing so, let us pose the question whether the values $u_1, \ldots, u_6$ may be considered as the numerical welfare levels assigned to the amounts $c_1, \ldots, c_6$ by the respondent. Or, phrased differently, are the values $u$ the numerical translations of the verbal labels ‘bad’, ‘good’, etc.? The answer is yes and no.

The answer is yes, as we find statistically that log‐standardization of the response always yields roughly the same u‐values. So it makes sense to connect the label ‘sufficient’ with 0.259 in Table 1. Obviously that value has no emotional connotation unless we use this scaling frequently. Think of the academic grading A–E in Britain or the grading on a 0–20 scale usual in Belgian universities. Those gradings are also completely arbitrary, but they have an emotional connotation for people who are used to them. Another example is temperature measurement in Celsius or Fahrenheit degrees.

The answer is no, as the log‐standardization used above is an arbitrary procedure. We may continue by taking the exponential of $u$, and we find a new scale defined on the positive semi‐axis. Hence there are more value schemes that may serve as a translation of the verbal labels. However, the primary step of log‐standardization seems essential, as we thereby discard any effect of personal respondent characteristics. Here again there is no mathematical certainty that another transformation could not be made that would have statistically the same effect of discarding the respondent's personal characteristics, but I can only report that we were unable to find such an alternative.

Hence, the basis of our method is the idea that the individual is able to evaluate income levels in terms of verbal labels like ‘bad’ or ‘good’ and so on. A question that is now crucial is whether different people assign the same emotional value to the same verbal label. The assumption that they do is crucial for the method, and it has not been tested.
Even stronger, the assumption is untestable per se. It is a primitive assumption, like many others in science, for example, in physics. It is maintained as long as it does not run counter to empirical evidence. In this section we give strong empirical and philosophical evidence that this assumption should not be rejected, but we do not claim in the paper (nor elsewhere) that we have shown its truth. Actually, the problem of whether words have the same meaning to people in the same language community is fundamental to the significance of language outside any specific context. Take the words ‘red’ and ‘blue’. Two members of a language community will not disagree whether an object is ‘red’ or ‘blue’; however, this is not evidence that both individuals have the same physical sensation or internal perception of the two colours. It may well be that person A has the same internal sensation corresponding to ‘red’ as person B experiences when he is seeing ‘blue’. In other words, there are two elements involved, namely, internal sensations on the one hand and verbal labels on the other. Obviously, aspects of internal sensations can frequently be observed by means of external signs like heart‐beat frequency or electrical activity in the brain, but even then we cannot say that we have measured the internal sensation itself. We can only assume that there is a measurable phenomenon which ‘stands for’ or is ‘correlated with’ the sensation. The sensation itself is unmeasurable, unless we agree by convention that the measurable phenomenon completely describes the sensation. Equating the metaphysical concept to the measurable outcome of an experiment is always a convention. 
We have to accept primitive assumptions of this kind in science everywhere, and one of the basic assumptions concerning a language community (or even the definition of a language community) is that verbal labels have roughly the same (emotional) meaning, that is, a common interpretation, to all members of the language community. This is not ‘an act of faith’ but just a working assumption, towards which I feel morally neutral. Without making such assumptions we are doomed to sterile solipsism. I am inclined to maintain this assumption until credible counter‐evidence is presented.

# 4 Interpersonal Differences Explained: Virtual and True Standards

In the previous section we reported that the answers to the IEQ, that is, the income standards of the individual, are pretty much constant over individuals if we standardize for $\mu$ and $\sigma$. Our next task will thus be either to discover a systematic relationship between $\mu$ and $\sigma$ and some objectively measurable characteristics of the individual or to look directly for a relationship between the income standards and those individual characteristics. By ‘systematic’ we mean that differences may be explained by intuitively understandable models, preferably of a simple type. In this paper we shall follow the second road and concentrate on the explanation of the standards themselves, although we shall consider the value of $\sigma$, that is, the spread in the levels, in Section 6.

Our problem is now to find out what factors determine the money values of the standards. On the one hand we have our intuition; on the other hand we have a host of samples with Dutch and other data on which we may test the hypothesized relationships. We shall now report in a non‐technical way on a number of such results, which have been described elsewhere in great detail (see bibliography).

The first factor that will influence the response is clearly the current income of the respondent or the respondent's household.
Let current income be denoted by $y_c$; then the idea would be that $c = c(y_c)$. The expectation would be that people with a higher income will also regard higher income levels as being ‘good’, ‘bad’, etc., than respondents with a low income. That would imply that the standards reported would be increasing functions of current income. The functional relationship between $c_i$ and $y_c$ is sketched in Figure 1, where both variables are measured on a logarithmic scale. Empirically it is found that the relationship is then approximately linear. Actually, the relationship can be sketched for each level $i = 1, \ldots, 6$ separately. In Figure 1 the lines corresponding to two levels are sketched. A special case is that where the lines are horizontal. In that case there is uniformity of opinion between respondents with different incomes about the income standard. Then it is a generally accepted standard.

Figure 1 Current Income and Virtual Income Standards

We said above that the respondent's current income would clearly be the first factor to think of in respect of influencing the response. But is that so clear after all? There seem to be two strands of opinion. The first is that of the traditional economist. In economics the assumption of a common preference structure and a unique utility function of income is the basic point of departure. This would imply that there can be no individual variation on what a ‘good’ income is. The second is that of the psychologist or rather the psychophysicist (Helson, 1964; Stevens, 1975). It is well known from measurement experiments on the individual perceptions of the brightness of light or the volume of sound that perceptions as to what is a ‘lot’ and what is not are heavily influenced by the environment before the experiment. Respondents refer to the situation they have presently in mind as their ‘anchor’‐situation.
If we assume that the income evaluation question is a similar psychophysical experiment with ‘income’ as its subject‐matter, it is fairly natural to assume that respondents are heavily influenced by their own current income, which plays the role of an anchor in this case.

However, let statistical analysis enlighten us as to the value of either assumption. It is possible to estimate the slope of the lines in Figure 1 empirically. If the lines are horizontal we are in the position of the traditional economist. A second extreme position would be that where the slope of the lines corresponds to 45 degrees. In that case a 10 per cent increase in income would cause an increase of the standard by the same ratio. It would then be impossible to make the individual better off by an increase in current income. His standards would increase pari passu. In the latter situation we may call standards purely relative. In reality the slope coefficient is estimated to be about 0.6. Hence, income standards are neither purely absolute nor purely relative (see also Hagenaars and van Praag, 1985).

The fact that the slope coefficient is estimated at 0.6 indicates that people with different incomes have different standards with respect to what level represents a ‘good’ income. In other words, contrary to what is frequently assumed, there is not one social norm with respect to income; rather, each individual has his or her own standards. This obviously presents a major difficulty when we try to evaluate the welfare situation of individuals or of society as a whole. The problem is: according to whose standards should we evaluate? The poor citizen believes that nearly everybody is fairly rich, while the rich man believes that nearly everyone is poor (according to his standards). The phenomenon of people shifting their norms with their income I have called preference drift (van Praag, 1971), and the value of the slope coefficient the preference drift ratio.
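
What a slope of 0.6 means in practice can be sketched numerically. The sketch assumes that the approximately linear log‐log relationship holds exactly, that is, $\ln c_i = a_i + \beta \ln y_c$ with $\beta = 0.6$; the intercept value and the function names are hypothetical and serve only as illustration.

```python
import math

PREFERENCE_DRIFT = 0.6  # slope coefficient reported in the text


def virtual_standard(current_income, intercept, beta=PREFERENCE_DRIFT):
    """Income standard c implied by ln c = intercept + beta * ln y_c (assumed exact)."""
    return math.exp(intercept + beta * math.log(current_income))


def standard_growth(income_growth, beta=PREFERENCE_DRIFT):
    """Proportional rise in a standard when current income rises by income_growth."""
    return (1 + income_growth) ** beta - 1


# The two extreme positions and the estimated case, for a 10 per cent income rise.
for beta in (0.0, PREFERENCE_DRIFT, 1.0):
    print(f"beta = {beta:.1f}: standard rises by {standard_growth(0.10, beta):.2%}")
```

With a slope of zero the standard does not move (the traditional economist's position); with a slope of one it rises by the full 10 per cent (purely relative standards); with the estimated slope it rises by a little under 6 per cent, so part of the income increase, but only part, is absorbed by the shifting norm.
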
The existence of preference drift is a disturbing factor for social policy. First, the top of society may have a different view on social distribution than the majority of the population, the rank and file. Second, we may expect a difference in the evaluation ex ante and ex post of social changes. Ex ante a wage increase may look marvellous, but ex post the standard has shifted upwards and people evaluate their wage increase as being relatively minor. Such a phenomenon will obviously create frustration.

We call the standards used by different individuals virtual standards. They are called virtual, because the individual's evaluation of all incomes, especially other incomes than his current income, will change when his current income changes.

Is it possible to find out how people evaluate their own current income? Let us look back at Figure 1. We draw it again in Figure 2, where we add the 45‐degree line. Consider the line corresponding to the verbal label ‘bad’ and its point of intersection A with the 45‐degree line. To the left of A people evaluate their own current income as worse than ‘bad’. To the right of A they evaluate it as better than ‘bad’. It follows that people with an income equal to the projection of A on the horizontal axis evaluate their own income as being ‘bad’. We call that income level the true income standard corresponding to the verbal label ‘bad’, say y*bad. Similarly the point of intersection B with the ‘good’ line determines the true standard y*good. We notice that we would not find one true standard if there were no point of intersection, or more than one.

# 5 Compensating Equivalence Scales

The previous results have been found for a sample of respondents that are not differentiated with respect to personal characteristics except for their current income y c.
In this section we consider the question whether people with different personal characteristics and/or living in a different environment will have different virtual and true standards. If so (and intuition suggests that this is not improbable), we are interested in the quantitative relationship between these differences and the resulting differences in income standards. If people need different amounts to feel equally happy in terms of their own income evaluation, this will lead automatically to the derivation of compensating equivalence scales. We shall consider two examples: (a) the welfare implications of differences in family size, (b) the welfare implications of differences in climate.

Figure 2 Virtual and True Standards

It is generally recognized that it makes a difference whether one has to support a small or a large family from a fixed amount of income. Let us characterize family size quite simply by the number of household members to be supported out of household income. That number is denoted by fs. It is obvious that we may think of more elaborate definitions that take into account the ages as well, but that is outside the scope of this article (see e.g. Kapteyn and van Praag, 1976). Again the way in which we can try to discover the empirical relationship between fs and the response on the IEQ is to estimate the line in Figure 1 for various values of fs. In Figure 3 we sketch the ‘good’ line for households with fs = 2 and fs = 4. Not unexpectedly, the line for households with four members is situated at a higher level than that for two‐person households. The difference between y*good(fs = 2) and y*good(fs = 4) is just the income difference needed to get the two households to the same welfare level. It has generally been found (see e.g. Kapteyn and van Praag, 1976) that a 10 per cent increase in household size has to be compensated by an income increase of 2.5 per cent. Family size elasticity thus equals 2.5/10, or 25 per cent.
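As an illustration, the elasticity of 25 per cent can be turned into a compensation factor between two household sizes. The sketch below assumes that the elasticity is constant over the whole range of fs (a log‐linear reading of the text, not a result stated in it); the function name is hypothetical.

```python
FAMILY_SIZE_ELASTICITY = 0.25  # value reported in the text


def compensation_factor(fs_from, fs_to, elasticity=FAMILY_SIZE_ELASTICITY):
    """Income multiple that keeps a household at the same welfare level
    when its size changes from fs_from to fs_to (constant elasticity assumed)."""
    return (fs_to / fs_from) ** elasticity


if __name__ == "__main__":
    # From a two-person to a four-person household.
    print(round(compensation_factor(2, 4), 4))
    # A 10 per cent larger household needs roughly 2.5 per cent more income.
    print(round(compensation_factor(1.0, 1.1), 4))
```

On this reading a four‐person household would need about 19 per cent more income than a two‐person household to reach the same income standard.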
Several observations are appropriate at this point. First, we notice that the compensation rule does not depend on the specific income level at the point of departure; neither is the rule utility‐specific: the compensation factor does not depend on the welfare level considered. The log‐linear specification, as depicted in Figure 1, was not dictated by a theory, but was simply the best‐fitting specification within a class of non‐complex functional specifications. So we do not exclude the possibility of there being better‐fitting specifications which yield utility‐ or income‐specific compensation factors. Second, we found for a number of roughly comparable societies (see van Praag and van der Sar, 1988) values of roughly the same order of magnitude, although the elasticity value of 25 per cent is certainly not an empirical law. We may think of rural societies where children are primarily a production instead of a consumption good for the household. In such societies we would expect a lower value for family size elasticity; it might even be negative. Indeed, in a recent Pakistani sample we find that self‐employed people feel better off with more children than with fewer.

Figure 3 True Standards For a Three‐Person and Four‐Person Household

Looking back on this analysis, we see that the result is twofold. First, it gives an empirical insight into the welfare differences corresponding to differences in family size; this is a positive result. Second, it yields a compensation factor according to which welfare differences due to differences in family size may be compensated by income changes. This is a normative result.

Now we shall consider an analogous analysis dealing with the influence of climatic differences on welfare (see van Praag, 1988). The naïve approach would be to define a variable called climate denoted by C and to hypothesize a relationship between climate C and the income standards.
We have to create a sample of households that exhibits not only variation with respect to current income and family size, but also with respect to climate. It will then be possible to estimate that relation. Once this has been estimated, we can define climate elasticity in a similar way as we defined family size elasticity.

The only problem here is the definition of the climate variable C. What we need is a climate index. There is more than one relevant variable. First, we have temperature, either measured as an annual average or as a maximum or minimum per year. But the hours of sunshine may also matter. And anybody who knows a dry climate like California will be aware of the fact that rain, measured in centimeters per year, is also a relevant variable, as is air humidity. Finally, altitude, windiness, and especially the chilliness of the wind, may be important as well. In short, climate is a multi‐dimensional phenomenon. The problem is then how to define C.

A practical start was to experiment with some alternative selections of climate variables and to end up with a best‐fitting and intuitively interpretable estimated equation. We used a sample of about 10,000 European households, surveyed among the ‘old’ members of the European Community. This guaranteed a climatic variation from Berlin to the Channel Islands and from the north of Denmark to the south of Italy.

Table 2 Climate Compensation Factors for some European sites

| Site | Compensation factor |
| --- | --- |
| Paris | 1 |
| Berlin | 1.11 |
| Copenhagen | 1.1 |
| London | 1.08 |
| Amsterdam | 0.99 |
| Rome | 0.95 |
| Sicily | 0.94 |
| Nice | 0.91 |
| Channel Isl. | 0.87 |

We ended up with three variables that seemed relevant for the description of a climate in this context. Those three climatic variables are TEMP, standing for the average annual temperature, HUM, standing for average humidity, and PREC, standing for precipitation.
The composite climate index C is then estimated by

$Display mathematics$

In Table 2 we give the resulting climate compensation factors for some European sites, having set the climate index of Paris at 100. Using our estimate of the climate influence on income standards we see from Table 2 that one needs 11 per cent more income in Berlin than in Paris to reach the same income standard.

We notice that this exercise has brought us two results. First, we have estimated the climate effect on income evaluation. It follows that we are able to work out the effect of a change in temperature, humidity, or precipitation on the evaluation of income, a positive result. Second, we have found a normative result, namely, what compensation factor would be needed in terms of income to neutralize a change in climate. Moreover, we have come more or less unexpectedly to the definition of a climate index which is an aggregate of three dimensions of climate.

In this section we used a relatively simple method to estimate the effects of differences in personal conditions on income evaluation. In the first instance, we estimated the effect of differences in family size. The method is rather unorthodox, as it uses responses to attitude questions as the basic observations. The effect itself is intuitively fairly obvious, and it is investigated elsewhere in the economic literature by observing consumer behavior under the hypothesis that equal purchasing behavior implies an equal preference structure, and hence (although this is not necessarily true) that people with the same consumption pattern evaluate their welfare situation equally.

The second example deals with a much more esoteric case. Climate is not an individual variable but rather a public good. It is part of the environment, like public health, safety in the street, etc.
In the second analysis we analysed its effect, while simultaneously constructing an aggregate index which best reflects climate differences in the framework of this problem. Obviously there is no reason why the same method should not be viable to estimate the money value to individuals of changes in the environment, health, or public goods.

In the scope of this paper we have found that there are traceable welfare differences between individuals, which are caused by specific external factors. It should be remembered that we consider here a narrow welfare concept, as it refers only to that part of welfare which is related to money income. In the following section we shall consider a more difficult model, where we consider the influence of past and anticipated income on the evaluation of present income.

# 6 The Impact of the Past and the Future on Present Income Evaluation¹

In the previous models we stressed the dependency of income standards on the concept of an ‘anchor income’, which we defined to be (net) current income. Although the empirical results are intuitively plausible and statistically of good quality, we have to admit that the choice of current income is a rather rough one, dictated by the circumstances. It is well known that income fluctuates a great deal even for regular employees, and that apart from these more or less random fluctuations, income over life is not constant but will follow a first rising and then falling profile, with a maximum somewhere near the age of 40, although this obviously depends on the job and schooling of the individual. This relation between income and age is frequently called an earnings profile (see Mincer, 1958). It follows that we may doubt whether the income level of a specific individual at a specific moment in time is the best operationalization of the anchor‐income. Are we not looking for a sort of ‘permanent income’ in the sense of Friedman (1957) to use as an anchor? Let us denote that concept by y π.
Let us assume we know the earnings profile of an individual; it is the sequence . . . y −1, y 0, y 1, y 2 . . . where the moment zero is located at present. Then we assume that the permanent income must be a weighted average of the individual's earnings profile. For instance, assuming only three periods of interest: the past with income y p, the present with income y o and the future with income y f, we may define the permanent income concept by taking a weighted average of the log‐incomes where the weights W p, W o, W f, adding up to one, reflect the relative impact of the past, the present, and future expectations on the formation of the permanent income concept. More specifically, we assume

$$\ln y_\pi = W_p \ln y_p + W_o \ln y_o + W_f \ln y_f, \qquad W_p + W_o + W_f = 1.$$

The weight W p may be called the memory weight and W f the anticipation weight. It is evident that this concept may be refined by the distinction of more than three periods. In fact we may consider time as a continuous variable; then the weight distribution is described by a continuous density function over the time axis.

We estimated this weight pattern by explaining the observed income standards not by current income alone, as done before, but by a weighted average of past, present, and future income levels, where the past and future incomes have been calculated by applying the previously estimated earnings profile. We specified a specific weight pattern in such a way that it depends also on the age of the individual, so as to reflect the intuitive fact that people's time horizon, both backwards and forwards, varies with age. It turned out that the weight pattern could be estimated. It is depicted graphically for three typical ages in Figure 4.

Figure 4 Time‐Discounting Density Functions For Various Ages

We see two remarkable things. First, the distribution is not symmetrical about the present. Second, the top at μτ is for young and old people situated in the past and for people in midlife in the near future.
The shape of the density also varies with age, becoming very peaked at midlife. Table 3 (van Praag and van Weeren, 1988) shows the values of μτ, W p, W o, and W f for various ages. We notice explicitly that the weights estimated refer only to the formation of the permanent income concept. The weight of the past might well turn out much greater if we were studying the formation of other standards, say, strength of religious attitudes; similarly, the weight of the past might be much less important (than for income) if we were studying standards regarding the length of girls' skirts. The weight distribution estimated has to be considered as specific for the phenomenon studied, in this case income standards.

Table 3 Values of μτ, W p, W o, W f for Various Ages

| Age | μτ | W p | W o | W f |
| --- | --- | --- | --- | --- |
| 20 | −1.32280 | 0.71557 | 0.18098 | 0.10345 |
| 30 | −0.31780 | 0.39848 | 0.47742 | 0.12409 |
| 40 | 0.27360 | 0.00135 | 0.80874 | 0.18992 |
| 50 | 0.45140 | 0.00000 | 0.69937 | 0.30063 |
| 60 | 0.21560 | 0.00041 | 0.90787 | 0.09172 |
| 70 | −0.43380 | 0.45750 | 0.47642 | 0.06608 |

In this paper we cannot dwell on the methodological problems posed by the psychological interpretation of these results. We do, however, get the result that the weights are age‐dependent. It follows that income standards increase if past income increases. This implies that people have higher standards if their past earnings were higher. Likewise standards rise if expectations for the future increase. Moreover, one sees that a specific change in an individual's earnings has a different impact on his standards depending on his age and consequently his distribution of memory and anticipation weights.

This implies that young populations and old populations will have different income standards, given the same distribution of present incomes. It also implies that the same distribution is differently perceived in terms of welfare, according to whether one arrives at that distribution from ‘below’, in a situation of steady growth, or from ‘above’ in a situation of steady decline.
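The growth versus decline argument can be illustrated numerically. The sketch below applies the three‐period definition of permanent income (a weighted average of past, present, and future log‐incomes) with the weights of Table 3. The income paths are hypothetical; in the original study past and future incomes were derived from an estimated earnings profile, which is not reproduced here.

```python
import math

# (W_p, W_o, W_f) per age, copied from Table 3.
WEIGHTS = {
    20: (0.71557, 0.18098, 0.10345),
    30: (0.39848, 0.47742, 0.12409),
    40: (0.00135, 0.80874, 0.18992),
    50: (0.00000, 0.69937, 0.30063),
    60: (0.00041, 0.90787, 0.09172),
    70: (0.45750, 0.47642, 0.06608),
}


def permanent_income(age, y_past, y_now, y_future):
    """Weighted geometric mean of past, present, and expected future income."""
    w_p, w_o, w_f = WEIGHTS[age]
    log_y = (w_p * math.log(y_past)
             + w_o * math.log(y_now)
             + w_f * math.log(y_future))
    return math.exp(log_y)


if __name__ == "__main__":
    # Same present income, reached from below (growth) or from above (decline).
    for age in (20, 50):
        growth = permanent_income(age, 20_000, 30_000, 40_000)
        decline = permanent_income(age, 40_000, 30_000, 20_000)
        print(age, round(growth), round(decline))
```

With these hypothetical paths the 20‐year‐old, whose weight lies mostly on the past, has the lower anchor on the growth path, while the 50‐year‐old, who puts no weight on the past, has the higher anchor on the growth path because of the better prospects.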
We may observe again that the analysis yields a positive and a normative result. First, it describes the impact of income changes over time on income standards. Second, it becomes possible to find equivalent income profiles which yield either momentarily or permanently the same welfare. Finally, we found as a by‐product an interesting quantification of the memory and anticipation process, in so far as it concerns income perception and evaluation. This is properly speaking a product of experimental psychology, which sheds light on the perception of time by individuals at different stages of life. We shall resist the temptation to look into it any further at this point.

# 7 The Social Reference Process

At this point it will be sufficiently clear that the evaluation of income varies a lot among individuals and that that variation may be explained to a considerable extent by observable variables related to the respondent. We saw that the main determinants were own current income, family size, climate, and income history and expectations about future income. Up to now all explanations referred to separate response levels, and we have not considered the standard deviation of the log‐response, denoted by σ. In this section we will take the whole response pattern into consideration.

Apart from the individual determinants already considered, it is frequently thought that the question as to whether an income is good cannot be decided outside of a context. The context is then the incomes of other individuals in the individual's social reference group. If I know practically no one with an income of more than $50,000, then I will consider that income extremely good. On the other hand, if all my social peers earn more than that amount, I will consider the same amount a very bad income. This suggests that the verbal labels will correspond to quantiles in the income distribution of the respondent's social reference group.
The label ‘good’, for instance, will correspond to the 80 per cent quantile. The fact that different social reference groups with different respondents give different answers will then reflect the fact that different people have different income distributions in mind. The response pattern will then be a discrete image of the income distribution of the respondent's social reference group. Let us be more specific now. Let us denote the density function of the income distribution in the population by f(y) and the income distribution in the social reference group of individual n by f n(y). We then define the social filter function φn(y) by the relation f n(y) = φn(y)f(y). If we interpret the value f(y) as the fraction of the income bracket y in the population and f n(y) as the corresponding fraction in the individual's social reference group, the factor φ gets an interesting interpretation. If φ equals one, the bracket y has equal importance in the social reference group and in the objective income distribution. If φ is larger than one, the individual assigns more than proportionate weight to that bracket, and if φ is smaller than one that bracket is less weighted in the social reference group than corresponds with its share in the objective income distribution.
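A discrete sketch may help to fix ideas about the relation f n(y) = φn(y)f(y). The income brackets, population shares, and raw weights below are hypothetical. The renormalisation step is an assumption of this sketch, added so that the reference‐group shares sum to one; it is not taken from the estimation procedure used in the study.

```python
def reference_group_shares(population_shares, raw_weights):
    """Apply a social filter to a discrete income distribution.

    Returns (group_shares, filter_values), where filter_values[k] plays the
    role of phi for bracket k: group_shares[k] = filter_values[k] * population_shares[k].
    """
    weighted = [w * f for w, f in zip(raw_weights, population_shares)]
    total = sum(weighted)
    group_shares = [x / total for x in weighted]
    filter_values = [g / f for g, f in zip(group_shares, population_shares)]
    return group_shares, filter_values


def focal_bracket(filter_values):
    """Index of the bracket where the filter reaches its top."""
    return max(range(len(filter_values)), key=filter_values.__getitem__)


if __name__ == "__main__":
    population = [0.20, 0.30, 0.30, 0.15, 0.05]  # hypothetical bracket shares
    weights = [0.5, 1.0, 1.5, 2.0, 1.0]          # hypothetical raw filter
    shares, phi = reference_group_shares(population, weights)
    print([round(s, 3) for s in shares])
    print([round(p, 3) for p in phi], focal_bracket(phi))
```

Brackets with a filter value above one are over‐represented in the individual's reference group relative to the population, and a filter that is equal to one everywhere reproduces the objective distribution.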

Using this idea I estimated this social filter function. It is depicted in Figure 5, together with the objective income distribution. We call the income level where the filter function reaches its top the focal point of the filter. If the filter is very peaked, the individual looking through that filter suffers from social myopia to a large extent. If the filter is flat, there is no social myopia. In that case the individual's social reference group is just society as a whole.

Figure 5 The Objective Income Density and the Social Filter Function

The social filter function differs between individuals. These differences may be partly attributed to objectively observable differences in individual characteristics. These effects have been estimated on the basis of a sample of more than 500 American respondents and they give insight into the width of the social reference group, or, to put it differently, the situation of the social focal point and the myopia of different individuals. The main results were that
1. social myopia becomes less if people are better educated;

2. social myopia increases with experience, where we have to be aware that experience and age are strongly correlated;

3. the social focal point varies positively with the income of the respondent;

4. in general the social focal point is situated at a higher income than one's own: people are looking upwards.

Sociologists will not be very surprised by these results. Nevertheless, there are some points to be made. The first is that the definition of a social reference group is not given a priori in terms of who belongs to it and who does not, but the group itself is estimated from the data. Second, we observe that in most literature each member of the social reference group gets the same weight in influencing the individual, that is, zero or one, while in this model the weight varies continuously according to the social filter mechanism.²

Unfortunately we do not have the space to dwell on the technicalities of the estimation procedure, for which we refer to van Praag (1981), van Praag and Spit (1982), or van der Sar and van Praag (1988).

# 8 Cardinality or Not?

Up to now our analysis has been cast in ordinal welfare terms. We observed that individuals assign different verbal labels to the same income levels depending on their personal circumstances and outlook on society. Or, to put it differently, there are differences between what people call a good income and those differences may be quantitatively explained and predicted. The mechanisms discovered conform pretty well with our intuitive feelings and with other findings in the social sciences.

The fundamental question of normative welfare economics is whether we can compare the welfare positions of different individuals in society in the sense that we can evaluate trade‐offs. One person gets less and another gets more: what is the net result for society? Does the gain of the second person outweigh the welfare loss of the first person? If we would like to answer that question we must require that such welfare losses and gains can be measured per individual and that they may be compared between the two individuals. It seems to me that these requirements are only realized if we adopt conventions of measurement and comparison.

Let us make a further excursion to the physical concept of temperature. Whether the change from 15°C to 20°C represents the same change in temperature for an individual as from 20°C to 25°C is impossible to answer. If we assume that it does, it is a pure convention. If we carry out psychophysical experiments, where we measure transpiration or ask the test person at what temperature above 20°C he feels that the change is equal to the first change from 15°C to 20°C, we can construct subjective measures of temperature perception, but such measures present the same problem. They only represent comparable changes if we agree by convention that they may be compared. And similar remarks can be made with respect to interpersonal comparability.

The upshot is that there is no natural measure, but that:

1. measures have to be accepted by convention;

2. a measurement method defines an empirical concept;

3. theoretical concepts are of a metaphysical nature;

4. a measurement method of an empirical concept is acceptable, if the empirical concept, thus defined, behaves as the theoretical concept it is supposed to measure;

5. in case of insufficient conformity between theory and empirical concept either the theory has to be modified or the measurement method and the empirical concept it has been based on have to be modified. Most scientific progress basically consists of a reshaping of theory and/or empirical concepts to improve insufficient conformity between them.

Where do we have to situate our own research, briefly outlined in the previous sections? I believe that it may be situated as follows. We have defined a measurement method and found an empirical concept: the virtual income standards, the values u 1, . . . , u 6, the average log‐response μ and the standard deviation σ of log‐responses. We have been able to derive interesting empirical laws about those concepts. We did not formally say which theoretical concept we attempt to measure. At this moment we are typically in the situation that we have found an empirical phenomenon in search of a theoretical metaphysical counterpart.

We could call that metaphysical concept welfare W. Then we would have found that someone's W increases with income and that W is essentially a function of income and some individually determined characteristics X. We would find that W(y i; X) = u i (i = 1, . . . , 6) where y i stands for the response on the IEQ. We would like to equate the empirical W with the metaphysical concept of welfare.³ As long as W(.) empirically behaves as the welfare concept should behave, I do not see much problem with sticking to that convention. However, we certainly cannot prove that we are measuring the metaphysical concept of welfare. On the other hand, it cannot be disproved either.

There is one thing which makes us reluctant to accept a value W that varies between −∞ and +∞ as a measure of welfare. As human beings we are unable to differentiate our feelings on an unbounded scale. All evaluations and ratings we know of are on finite bounded scales like 1–10 or A–E. This prompts us to normalize welfare between zero and one. So we define

$Display mathematics$
where F(.) is a distribution function, increasing from zero to one, and where the parameters μ and σ are the mean and the standard deviation of the log‐answers on the IEQ. We saw in Table 1 that the values of u were practically symmetrical about zero. If we consider them as quantiles of the normal distribution, we see that they roughly correspond with equal probability jumps. That is, N(u i) = (2i − 1)/12. This does not hold exactly, but to a striking extent. Now, there is a theoretical argument that this is not an accident (see van Praag, 1971; Kapteyn, 1977). The person who responds to the IEQ does this with a certain response strategy. His objective is to give the most informative response. This is clearly not done if all response levels are so near to each other that all income levels would roughly correspond to the same welfare level. On the contrary, the response is given in such a way that the welfare differences between the levels are maximal. This is realized by choosing the six levels in such a way that each level corresponds with the midpoint of one sixth of the interval (0,1), the range of W(.).⁴ This implies that F(.) should be taken to be the normal distribution function. Welfare taken as a function of y instead of ln(y) is then described by a log‐normal distribution function.
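The equal‐interval argument can be checked numerically. The sketch below computes the standard normal quantiles at the midpoints (2i − 1)/12 and evaluates welfare as a log‐normal distribution function. It assumes that the standardised response is u = (ln y − μ)/σ; the values of μ and σ in the demonstration are hypothetical, and the function names are illustrative only.

```python
import math
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def equal_interval_u_values(levels=6):
    """Standard normal quantiles at the midpoints (2i - 1) / (2 * levels)."""
    return [STANDARD_NORMAL.inv_cdf((2 * i - 1) / (2 * levels))
            for i in range(1, levels + 1)]


def welfare(income, mu, sigma):
    """Welfare on (0, 1) as a log-normal distribution function of income."""
    return STANDARD_NORMAL.cdf((math.log(income) - mu) / sigma)


if __name__ == "__main__":
    print([round(u, 3) for u in equal_interval_u_values()])
    mu, sigma = math.log(30_000), 0.5  # hypothetical individual parameters
    for y in (15_000, 30_000, 60_000):
        print(y, round(welfare(y, mu, sigma), 3))
```

The six quantiles are symmetrical about zero by construction, and an income equal to exp(μ) is evaluated at exactly one half.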

In view of our earlier interpretation of the IEQ response as a description of the income distribution of the individual's social reference group, we come now to the equality: welfare evaluation of y = percentage below that income in the social reference group.⁵ If we accept this cardinalization, we may formulate welfare comparisons. This cardinalization has a certain plausibility. In van Praag (1968) I formulated the same cardinalization but without any empirical corroboration like the one given here. There I gave other theoretical arguments, which I believe still to be valid, but which I will leave out of the present context.

The same cardinal applications may be based on any other functional increasing specification of F(.). However, the attractive identification of the u‐values then gets lost.

# 9 Conclusion

In this paper I have outlined a method and results to get some idea of how individuals evaluate income levels. We saw that this is possible by a fairly simple and intuitively plausible set of questions, the so‐called IEQ. The result can only be regarded as ordinal welfare measurement, when we assume that verbal labels have the same emotional connotation to different respondents. If we are willing to apply a plausible cardinalization, such that welfare differences between levels are equalized, we have also found a cardinal welfare measure, useful for normative intra‐ and interpersonal welfare comparisons.

Obviously the method has to be corroborated still further. Moreover, it may be applied to the measurement of standards for other concepts as well, for example, wealth, amount of education, age, expenditures on specific commodities. Some work has been done in that direction (see for instance van Praag, Dubnoff, and van der Sar, 1985, 1988).

I believe that this is a new and fruitful avenue for tackling welfare comparison problems in the sense of positive and normative science.

Bibliography

Arrow, K. J. (1964). ‘The Role of Securities in the Optimal Allocation of Risk‐Bearing’, Review of Economic Studies, 31, 91–6.

Atkinson, A. B. (1970). ‘On the Measurement of Inequality’, Journal of Economic Theory, 2, 244–63.

Cohen Stuart, A. J. (1889). Bijdrage tot de Theorie der Progressieve Inkomstenbelastingen. The Hague: Nijhoff.

Edgeworth, F. Y. (1881). Mathematical Psychics. London: Paul.

Friedman, M. (1957). A Theory of the Consumption Function. Princeton, NJ: Princeton University Press.

Hagenaars, A. J. M., and van Praag, B. M. S. (1985). ‘A Synthesis of Poverty Line Definitions’, Review of Income and Wealth, 31, 139–53.

Helson, H. (1964). Adaptation‐Level Theory: An Experimental and Systematic Approach to Behaviour. New York: Harper.

Kapteyn, A. (1977). ‘A Theory of Preference Formation’. Ph.D. thesis, Leyden University, Leyden.

—— and van Praag, B. M. S. (1976). ‘A New Approach to the Construction of Family Equivalence Scales’, European Economic Review, 7, 313–35.

—— Wansbeek, T. J. and Buyze, J. (1978). ‘The Dynamics of Preference Formation’, Economics Letters, 1, 93–7.

Mincer, J. (1958). ‘Investment in Human Capital and Personal Income Distribution’, Journal of Political Economy, 66, 281–302.

Pareto, V. (1904). Manuel d'économie politique. Paris: Giard and Brière.

Rainwater, L. (1974). What Money Buys: Inequality and the Social Meaning of Income. New York: Basic Books.

Robbins, L. (1932). An Essay on the Nature and Significance of Economic Science, 1st edn. London: Macmillan.

Samuelson, P. A. (1974). Foundations of Economic Analysis. Cambridge, Mass.: Harvard University Press.

Saris, W. E. (1988). Sociometric Research, ed. W. E. Saris and I. N. Gallhofer. London: Macmillan.

Sen, A. K. (1973). On Economic Inequality. Oxford: Clarendon Press.

—— (1982). Choice, Welfare and Measurement. Cambridge, Mass.: M. I. T. Press.

Simon, H. A. (1979). Models of Thought. New Haven, Conn.: Yale University Press, ch. 1.

Stevens, S. S. (1975). Psychophysics: Introduction to its Perceptual, Neural and Social Prospects. New York: John Wiley.

Suppes, P., and Winet, M. (1954). ‘An Axiomatization of Utility based on the Notion of Utility Differences’, Management Science, 1, 259–70.

van der Sar, N. L., van Praag, B. M. S., and Dubnoff, S. (1988). ‘Evaluation Questions and Income Utility’, in B. Munier (ed.), Risk, Decision and Rationality. Dordrecht: Reidel, 77–96.

van Doorn, L., and van Praag, B. M. S. (1988). ‘The Measurement of Income Satisfaction’, in W. E. Saris and I. N. Gallhofer (eds.), Sociometric Research. Houndmills: Macmillan Press, 230–46.

van Praag, B. M. S. (1968). Individual Welfare Functions and Consumer Behavior: A Theory of Rational Irrationality. Amsterdam: North Holland, 235.

—— (1971). ‘The Welfare Function of Income in Belgium: An Empirical Investigation’, European Economic Review, 2, 337–69.

van Praag, B. M. S. (1981). ‘Reflections on Theory of Individual Welfare Functions’, in American Statistical Association, 1981 Proceedings of the Business and Economic Statistics Section, Washington DC.

—— (1988). ‘Climate Equivalence Scales: An Application of a General Method’, European Economic Review, 32, 1019–24.

—— (1989). ‘Cardinal and Ordinal Utility: An Integration of the Two Dimensions of the Welfare Concept’, forthcoming in Journal of Econometrics.

—— Dubnoff, S. and van der Sar, N. L. (1985). ‘From Judgements to Norms: Measuring the Social Meaning of Income, Age and Education’, report 8509/E, Econometric Institute, Erasmus University, Rotterdam.

—— and Spit, J. S. (1982). ‘The Social Filter Process and Income Evaluation: An Empirical Study in the Social Reference Mechanism’, report 82.08, Center for Research in Public Economics, Leyden University.

—— and van der Sar, N. L. (1988). ‘Household Cost Functions and Equivalence Scales’, Journal of Human Resources, 23/2, 193–210.

—— and van Weeren, J. (1988). ‘Memory and Anticipation Processes and their Significance for Social Security and Income Inequality’, in S. Maital (ed.), Applied Behavioural Economics. Brighton: Wheatsheaf Books, ii. 731–51.

## Notes:

I am grateful to Dr Siddiq R. Osmani for his constructive critique which led to some revisions in this paper.

(1) This section is based on van Praag and van Weeren, 1988.

(2) The reader will notice that I have used terms that suggest an optical filter. Indeed we may think of the social filter mechanism as looking through a lens at society. Some social groups will be magnified, while others become less important than they should. Another analogue of this mechanism is the Bayesian model, where the objective distribution acts as a priori density, the filter as the sample likelihood, and the income distribution of the reference group as the a posteriori density.

(3) As in this whole paper the welfare concept is a partial concept: it is only related to income. The concepts of happiness, satisfaction, and welfare in the general sense are wider concepts.

(4) Exactly speaking we have roughly W [ln (y i)] = (2i − 1)/12.

(5) This statement has been formulated as a basic hypothesis by Kapteyn (1977) and by Kapteyn, Wansbeek, and Buyze (1978).