(p.525) Appendix 3 Glossary
Instrumental reinforcers are stimuli that, if their occurrence, termination, or omission is made contingent upon the making of an action, alter the probability of the future emission of that action (Gray 1975, Mackintosh 1983, Dickinson 1980, Lieberman 2000). Rewards and punishers are instrumental reinforcing stimuli. The notion of an action here is that an arbitrary action, e.g. turning right vs turning left, will be performed in order to obtain the reward or avoid the punisher, so that there is no pre-wired connection between the response and the reinforcer. Some stimuli are primary (unlearned) reinforcers (e.g., the taste of food if the animal is hungry, or pain), while others may become reinforcing by learning, because of their association with such primary reinforcers, thereby becoming ‘secondary reinforcers’. This type of learning may thus be called ‘stimulus–reinforcer association learning’, and occurs via a stimulus–stimulus associative learning process.
A positive reinforcer (such as food) increases the probability of emission of a response on which it is contingent; the process is termed positive reinforcement, and the outcome is a reward.
A negative reinforcer (such as a painful stimulus) increases the probability of emission of a response that causes the negative reinforcer to be omitted (as in active avoidance) or terminated (as in escape), and the procedure is termed negative reinforcement.
Punishment refers to procedures in which the probability of an action is decreased. Punishment thus describes procedures in which an action decreases in probability if it is followed by a painful stimulus, as in passive avoidance. Punishment can also be used to refer to a procedure involving the omission or termination of a reward (‘extinction’ and ‘time out’ respectively), both of which decrease the probability of responses (Gray 1975, Mackintosh 1983, Dickinson 1980, Lieberman 2000).
A punisher when delivered acts instrumentally to decrease the probability of responses on which it is contingent, or when not delivered (escaped from or avoided) acts as a negative reinforcer in that it then increases the probability of the action on which its non-delivery is contingent. Note that my definition of a punisher, which is similar to that of an aversive stimulus, is of a stimulus or event that can either decrease the probability of actions on which it is contingent, or increase the probability of actions on which its non-delivery is contingent. The term punishment is restricted to situations where the probability of an action is being decreased.
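The definitions above amount to a small lookup table: the kind of stimulus (reward or punisher) and what the action does to it (delivery, omission, or termination) together determine the name of the procedure and whether the probability of the action rises or falls. The following Python sketch is one way to write that table down. The names `Stimulus`, `Contingency`, `PROCEDURES`, and `classify` are illustrative choices made here, not terms from the text, and the +1/-1 values encode only the direction of the change, not its size.

```python
from enum import Enum


class Stimulus(Enum):
    REWARD = "reward"        # affectively positive stimulus, e.g. food
    PUNISHER = "punisher"    # affectively negative stimulus, e.g. pain


class Contingency(Enum):
    DELIVERED = "delivered"    # the action produces the stimulus
    OMITTED = "omitted"        # the action prevents the stimulus
    TERMINATED = "terminated"  # the action ends an ongoing stimulus


# (stimulus, contingency) -> (procedure name, direction of change in
# the probability of the action: +1 increases, -1 decreases).
PROCEDURES = {
    (Stimulus.REWARD, Contingency.DELIVERED): ("positive reinforcement", +1),
    (Stimulus.REWARD, Contingency.OMITTED): ("punishment (extinction)", -1),
    (Stimulus.REWARD, Contingency.TERMINATED): ("punishment (time out)", -1),
    (Stimulus.PUNISHER, Contingency.DELIVERED): ("punishment (passive avoidance)", -1),
    (Stimulus.PUNISHER, Contingency.OMITTED): ("negative reinforcement (active avoidance)", +1),
    (Stimulus.PUNISHER, Contingency.TERMINATED): ("negative reinforcement (escape)", +1),
}


def classify(stimulus, contingency):
    """Return (procedure name, direction) for a stimulus/contingency pair."""
    return PROCEDURES[(stimulus, contingency)]


if __name__ == "__main__":
    for (stimulus, contingency), (name, direction) in PROCEDURES.items():
        change = "increases" if direction > 0 else "decreases"
        print(f"{stimulus.value} {contingency.value}: {name}; action probability {change}")
```

Read this way, the term punishment covers every row in which the direction is negative, whichever stimulus is involved, while a punisher appears in rows of both directions.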
Emotions are states elicited by reinforcers, where the states have the set of functions described in Chapter 3. My argument is that an affectively positive or ‘appetitive’ stimulus (which produces a state of pleasure) acts operationally as a reward, which when delivered acts instrumentally as a positive reinforcer, or when not delivered (omitted or terminated) acts to decrease the probability of responses on which it is contingent. Conversely, I argue that an affectively negative or aversive stimulus (which produces an unpleasant state) acts operationally as a punisher, which when delivered acts instrumentally to decrease the probability of responses on which it is contingent, or when not delivered (escaped from or avoided) acts as a negative reinforcer in that it then increases the probability of the action on which its non-delivery is contingent (see note 51).
Classical conditioning or Pavlovian conditioning. When a conditioned stimulus (CS) (such as a tone) is paired with a primary reinforcer or unconditioned stimulus (US) (such as a painful stimulus), then there are opportunities for a number of types of association to be formed. Some of these involve ‘classical conditioning’ or ‘Pavlovian conditioning’, in which no action is performed that affects the contingency between the conditioned stimulus and the unconditioned stimulus. Typically an unconditioned response (UR), for example an alteration of heart rate, is produced by the US, and will come to be elicited by the CS as a conditioned response (CR). These responses are typically autonomic (such as the heart beating faster), or endocrine (for example the release of adrenaline (epinephrine in American usage) by the adrenal gland). In addition, the organism may learn to perform an instrumental response with the skeletal muscles in order to alter the probability that the primary reinforcer will be obtained. In our example, the experimenter might alter the contingencies so that when the tone sounded, if the organism performed a response such as pressing a lever, then the painful stimulus could be avoided. In the instrumental learning situation there are still opportunities for many classically conditioned responses, including emotional states such as fear, to occur. The associative processes involved in classical conditioning, and the influences that these processes may have on instrumental performance, are described in Section 4.6.1.
Motivated behaviour occurs when an animal will perform an instrumental (i.e. arbitrary operant) response to obtain a reward or to escape from or avoid a punisher. If this criterion of an arbitrary operant response is not met, and only a fixed response can be performed, then the term drive can be used to describe the state of the animal when it will work to obtain or escape from the stimulus.
Fitness is the reproductive potential of genes. Through the process of natural selection and reproduction, fit genes are selected for the next generation.
Long-term potentiation (LTP) is the increase in synaptic strength that can occur during learning. It is typically associative, depending on conjunctive presynaptic activity and postsynaptic depolarization.
Long-term depression (LTD) is the decrease in synaptic strength that can occur during learning. It is typically associative, occurring when the presynaptic activity is low and the postsynaptic depolarization is high (heterosynaptic long-term depression), or when the presynaptic activity is high, and the postsynaptic activity is only moderate (homosynaptic long-term depression) (see Fig. A.5).
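The conditions given for long-term potentiation and for the two forms of long-term depression can be summarized as a decision rule on presynaptic and postsynaptic activity. The Python sketch below is only a qualitative illustration. It assumes both activities are scaled to the range 0 to 1, and the thresholds `pre_high`, `post_moderate`, and `post_high` are arbitrary values chosen here to separate ‘low’, ‘moderate’, and ‘high’, not values from the text. The ‘no change’ outcome for the remaining cases is likewise an assumption of the sketch.

```python
def plasticity(pre, post, pre_high=0.5, post_moderate=0.3, post_high=0.7):
    """Classify the direction of synaptic change for one synapse.

    pre  -- presynaptic activity, assumed scaled to 0..1
    post -- postsynaptic depolarization, assumed scaled to 0..1
    The three thresholds are illustrative assumptions.
    """
    pre_active = pre >= pre_high
    if post >= post_high:
        # Strong postsynaptic depolarization: potentiate active inputs,
        # depress inactive ones.
        return "LTP" if pre_active else "heterosynaptic LTD"
    if pre_active and post >= post_moderate:
        # Active input but only moderate postsynaptic activity.
        return "homosynaptic LTD"
    # Assumption of this sketch: all other cases leave the synapse unchanged.
    return "no change"


if __name__ == "__main__":
    for pre, post in [(0.9, 0.9), (0.1, 0.9), (0.9, 0.5), (0.1, 0.1)]:
        print(f"pre={pre}, post={post}: {plasticity(pre, post)}")
```

The rule returns a label rather than a weight change because the glossary entries specify only the direction of the change, not its magnitude.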
(51) Note that my definition of a punisher, which is similar to that of an aversive stimulus, is of a stimulus or event that can either decrease the probability of actions on which it is contingent, or increase the probability of actions on which its non-delivery is contingent. The term punishment is restricted to situations where the probability of an action is being decreased.