When a stimulus is removed and future frequencies of behavior are maintained or increase, what is it known as?
Reinforcement, punishment, and extinction

Reinforcement and punishment, the core tools of operant conditioning, are either positive (a stimulus is delivered following a response) or negative (a stimulus is withdrawn following a response). This creates a total of four basic consequences, along with a fifth procedure known as extinction (i.e., no change in consequences following a response). It is important to note that organisms are not spoken of as being reinforced, punished, or extinguished; it is the response that is reinforced, punished, or extinguished. Furthermore, reinforcement, punishment, and extinction are not terms restricted to the laboratory: naturally occurring consequences, not always delivered by people, can also reinforce, punish, or extinguish behavior.
Four contexts of operant conditioning: here the terms "positive" and "negative" are not used in their popular sense; rather, "positive" refers to the addition of a stimulus and "negative" to its subtraction. What is added or subtracted may serve as either reinforcement or punishment. Hence "positive punishment" can be a confusing term, as it denotes the addition of an aversive stimulus (such as spanking or an electric shock), a procedure that may seem very negative in the lay sense. The four procedures are:

Positive reinforcement: a stimulus is added following a response, and the response becomes more frequent.
Negative reinforcement: a stimulus is removed following a response, and the response becomes more frequent.
Positive punishment: a stimulus is added following a response, and the response becomes less frequent.
Negative punishment: a stimulus is removed following a response, and the response becomes less frequent.
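This 2x2 taxonomy can be expressed as a small lookup. The following is a hypothetical helper written for this page; the function name and argument values are assumptions made for the sake of the example, not standard terminology from any library.

```python
def classify_consequence(stimulus_change: str, frequency_change: str) -> str:
    """Classify a consequence per the four contexts of operant conditioning.

    stimulus_change: "added", "removed", or "none" (no change in consequences)
    frequency_change: "increases" or "decreases" (future frequency of the response)
    """
    if stimulus_change == "none":
        return "extinction"  # fifth procedure: no consequence follows the response
    if frequency_change == "increases":
        # "positive" = stimulus added, "negative" = stimulus subtracted
        return "positive reinforcement" if stimulus_change == "added" else "negative reinforcement"
    return "positive punishment" if stimulus_change == "added" else "negative punishment"
```

For instance, `classify_consequence("removed", "increases")` returns "negative reinforcement", the answer to the question that opens this page.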
Thorndike's law of effect

Operant conditioning, sometimes called instrumental conditioning or instrumental learning, was first extensively studied by Edward L. Thorndike (1874-1949), who observed the behavior of cats trying to escape from home-made puzzle boxes.[4] When first constrained in the boxes, the cats took a long time to escape. With experience, ineffective responses occurred less frequently and successful responses occurred more frequently, enabling the cats to escape in less time over successive trials. In his law of effect, Thorndike theorized that successful responses, those producing satisfying consequences, were "stamped in" by the experience and thus occurred more frequently, while unsuccessful responses, those producing annoying consequences, were "stamped out" and subsequently occurred less frequently. In short, some consequences strengthened behavior and some consequences weakened it. Through this procedure Thorndike produced the first known learning curves. B. F. Skinner (1904-1990) formulated a more detailed analysis of operant conditioning based on reinforcement, punishment, and extinction. Following the ideas of Ernst Mach, Skinner rejected the mediating structures required by Thorndike's "satisfaction" and constructed a new conceptualization of behavior without any such references. While experimenting with home-made feeding mechanisms, Skinner invented the operant conditioning chamber, which allowed him to measure rate of response as a key dependent variable using a cumulative record of lever presses or key pecks.[5]

Operant conditioning vs. fixed action patterns

Skinner's construct of instrumental learning is contrasted with what the Nobel Prize-winning biologist Konrad Lorenz termed "fixed action patterns": reflexive, impulsive, or instinctive behaviors. These behaviors were said by Skinner and others to exist outside the parameters of operant conditioning but were considered essential to a comprehensive analysis of behavior.
In dog training, particularly the training of working dogs, detection dogs, and the like, the stimulation of these fixed action patterns via the dog's prey drive and predatory instincts is the key to producing very difficult yet consistent behaviors, and in most cases does not involve operant, classical, or any other kind of conditioning.[citation needed] Although evolutionary processes shaped these fixed action patterns, the patterns themselves have remained stable because of their survival function. According to the laws of operant conditioning, a behavior that is rewarded every single time will extinguish at a faster rate, while intermittent reinforcement produces more stable rates of behavior that are relatively more resistant to extinction. Thus, in detection dogs, a correct behavior of indicating a "find" must at first always be rewarded, for example with a tug toy or a ball throw, for initial acquisition of the behavior. Thereafter, fading procedures are introduced, in which the rate of reinforcement is "thinned" (not every response is reinforced), switching the dog to an intermittent schedule of reinforcement, which is more resistant to instances of non-reinforcement. Nevertheless, some trainers are now using the prey drive to train pet dogs and find that they get far better results than when they use only the principles of operant conditioning,[citation needed] which, according to Skinner and his students Keller and Marian Breland (who invented clicker training), break down when strong instincts are at play.[6]

Criticisms

Thorndike's law of effect specifically requires that a behavior be followed by satisfying consequences for learning to occur. There are, however, cases in which learning can be shown to occur without good or bad effects following the behavior.
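One toy way to see why intermittent reinforcement resists extinction is a discrimination-based sketch: the subject "gives up" only once the run of unreinforced responses clearly exceeds anything it experienced during training. Under continuous reinforcement, the very first unrewarded response is novel; under an intermittent schedule, long dry runs were normal. The model, the function names, and the quitting rule below are illustrative assumptions, not an established result from the cited literature.

```python
import random

def longest_unreinforced_run(p_reinforce, n_trials, rng):
    """Longest run of consecutive unreinforced responses seen during training."""
    run = best = 0
    for _ in range(n_trials):
        if rng.random() < p_reinforce:
            run = 0          # reinforced: the run of "misses" resets
        else:
            run += 1
            best = max(best, run)
    return best

def responses_before_quitting(p_reinforce, n_training=200, seed=0):
    """Toy model: responding persists in extinction until the unreinforced
    run exceeds anything experienced during training."""
    rng = random.Random(seed)
    threshold = longest_unreinforced_run(p_reinforce, n_training, rng)
    return threshold + 1

# Continuous reinforcement (p=1.0): extinction is discriminated immediately.
# Intermittent reinforcement (e.g. p=1/3): long unreinforced runs were
# ordinary during training, so responding persists well into extinction.
```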
For instance, a number of experiments examining the phenomenon of latent learning[7][8][9][10] showed that a rat need not receive a satisfying reward (food, if hungry; water, if thirsty) in order to learn a maze; the learning becomes apparent as soon as the desired reward is introduced. However, views claiming that such research invalidates theories of operant conditioning are molecular to a fault. If the rat has a history of "searching behavior" being reinforced in novel environments, that behavior will occur in new environments as well. This is especially plausible in a species that scavenges for food and has thus likely inherited a propensity for searching behavior to be sensitive to reinforcement. Behaving during initial extinction trials as the organism did during reinforcement trials is not proof of latent learning, as behavior is a function of the history of the individual organism and its genetic endowment and is never controlled by future consequences. That an organism continues to respond during unreinforced trials is well established in studies of intermittent schedules of reinforcement.[11] A different experiment, in humans, showed that "punishing" the correct behavior may actually cause it to be taken more frequently (i.e., stamp it in).[12] Subjects were given a number of pairs of holes on a large board and required to learn which hole of each pair to poke a stylus through. Subjects who received an electric shock for punching the correct hole learned which hole was correct more quickly than subjects who received an electric shock for punching the incorrect hole. This cannot, however, accurately be described as punishment if it increases the probability of the behavior.
Biological correlates of operant conditioning

The first scientific studies identifying neurons that responded in ways suggesting they encode conditioned stimuli came from work by Rusty Richardson and Mahlon deLong.[13][14] They showed that nucleus basalis neurons, which release acetylcholine broadly throughout the cerebral cortex, are activated shortly after a conditioned stimulus, or after a primary reward if no conditioned stimulus exists. These neurons are equally active for positive and negative reinforcers, and have been demonstrated to cause plasticity in many cortical regions.[15] Evidence also exists that dopamine is activated at similar times. The dopamine pathways encode positive reward only, not aversive reinforcement, and they project much more densely onto frontal cortex regions. Cholinergic projections, in contrast, are dense even in posterior cortical regions such as the primary visual cortex. A study of patients with Parkinson's disease, a condition attributed to the insufficient action of dopamine, further illustrates the role of dopamine in positive reinforcement.[16] It showed that while off their medication, patients learned more readily with aversive consequences than with positive reinforcement; patients on their medication showed the opposite, with positive reinforcement proving the more effective form of learning when the action of dopamine is high.

Factors that alter the effectiveness of consequences

When using consequences to modify a response, the effectiveness of a consequence can be increased or decreased by various factors. These factors can apply to either reinforcing or punishing consequences.
Most of these factors exist for biological reasons. The biological purpose of the principle of satiation is to maintain the organism's homeostasis. When an organism has been deprived of sugar, for example, the taste of sugar is a highly effective reinforcer. However, as the organism reaches or exceeds its optimum blood-sugar level, the taste of sugar becomes less effective, perhaps even aversive. The principles of immediacy and contingency exist for neurochemical reasons. When an organism experiences a reinforcing stimulus, dopamine pathways in the brain are activated. This network of pathways "releases a short pulse of dopamine onto many dendrites, thus broadcasting a rather global reinforcement signal to postsynaptic neurons."[17] This allows recently activated synapses to increase their sensitivity to efferent signals, thereby increasing the probability of occurrence of the responses that immediately preceded the reinforcement. These responses are, statistically, the most likely to have been the behavior responsible for achieving the reinforcement. But when the application of reinforcement is less immediate or less contingent (less consistent), the ability of dopamine to act upon the appropriate synapses is reduced.

Operant variability

Operant variability is what allows a response to adapt to new situations. Operant behavior is distinguished from reflexes in that its response topography (the form of the response) is subject to slight variations from one performance to another. These slight variations can include small differences in the specific motions involved, in the amount of force applied, and in the timing of the response. If a subject's history of reinforcement is consistent, such variations will remain stable because more successful variations are more likely to be reinforced than less successful ones.
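One way to picture the immediacy principle is as exponentially decaying credit assignment: the shorter the gap between a response and the reinforcement signal, the more credit that response receives. The decay constant, function name, and exponential form below are illustrative assumptions, not a model taken from the cited work.

```python
import math

def reinforcement_credit(response_times, reward_time, tau=1.0):
    """Assign each earlier response a credit that decays exponentially
    with the delay between that response and the reinforcement."""
    return {t: math.exp(-(reward_time - t) / tau)
            for t in response_times if t <= reward_time}

credit = reinforcement_credit([1.0, 4.0, 4.9], reward_time=5.0)
# The response at t=4.9 (nearly immediate) earns far more credit than
# the one at t=1.0 -- delayed reinforcement acts only weakly.
```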
However, behavioral variability can itself be altered by certain controlling variables.[18] An extinction burst will often occur when an extinction procedure has just begun. It consists of a sudden and temporary increase in the frequency of the response, followed by the eventual decline and extinction of the behavior targeted for elimination. Take, as an example, a pigeon that has been reinforced to peck an electronic button. During its training history, every time the pigeon pecked the button it received a small amount of bird seed as a reinforcer. So, whenever the bird is hungry, it pecks the button to receive food. However, if the button is turned off, the hungry pigeon will first try pecking it just as it has in the past. When no food is forthcoming, the bird will likely try again... and again, and again. After a period of frantic activity in which its pecking yields no result, the pigeon's pecking will decrease in frequency. The evolutionary advantage of the extinction burst is clear. In a natural environment, an animal that persists in a learned behavior, even when it does not produce immediate reinforcement, might still have a chance of producing reinforcing consequences if it tries again. Such an animal would be at an advantage over one that gives up too easily. Extinction-induced variability serves a similar adaptive role. When extinction begins, and if the environment allows for it, an initial increase in response rate is not the only thing that can happen. Imagine a bell curve. The horizontal axis represents the different variations possible for a given behavior; the vertical axis represents the response's probability in a given situation. Response variants in the middle of the bell curve, at its highest point, are the most likely, because those responses, according to the organism's experience, have been the most effective at producing reinforcement.
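The bell-curve account of extinction-induced variability can be sketched by sampling a response's topography (say, the force applied) from a normal distribution whose spread grows with consecutive unreinforced trials. The mean, baseline spread, and broadening rate are made-up parameters for illustration only.

```python
import random
import statistics

def sample_response_force(unreinforced_trials, rng, mean=10.0, base_sd=1.0):
    """Draw one response variant; extinction broadens the distribution."""
    sd = base_sd * (1 + 0.5 * unreinforced_trials)  # assumed broadening rate
    return rng.gauss(mean, sd)

rng = random.Random(42)
before = [sample_response_force(0, rng) for _ in range(1000)]   # stable reinforcement
during = [sample_response_force(10, rng) for _ in range(1000)]  # 10 failed trials
# Under extinction the same behavior is emitted in far more variable forms:
# extreme variants (very hard pushes, unusual grips) become much more likely.
```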
The more extreme forms of the behavior lie at the lower ends of the curve, to the left and to the right of the peak, where their probability of expression is low. A simple example is a person inside a room opening a door to exit: the response is the opening of the door, and the reinforcer is the freedom to exit. The person does not open the door in exactly the same way every time. Rather, each time they open it a little differently: sometimes with less force, sometimes with more; sometimes with one hand, sometimes with the other; sometimes more quickly, sometimes more slowly. Because of the physical properties of the door and its handle, there is a certain range of successful responses that are reinforced. Now imagine that the person tries to open the door and it won't budge. This is when extinction-induced variability occurs. The bell curve of probable responses begins to broaden, with more extreme forms of behavior becoming more likely. The person might now try opening the door with extra force, repeatedly twist the knob, hit the door with their shoulder, maybe even call for help or climb out a window. This is how extinction causes variability in behavior: new variations arise that might prove successful. For this reason, extinction-induced variability is an important part of the operant procedure of shaping.

Avoidance learning

Avoidance training belongs to negative reinforcement schedules. The subject learns that a certain response will result in the termination or prevention of an aversive stimulus. Two kinds of experimental settings are commonly used: discriminated and free-operant avoidance learning. In discriminated avoidance learning, a warning stimulus signals the upcoming aversive stimulus, and responding during the warning prevents its delivery. In free-operant avoidance learning, no discrete stimulus is used to signal the occurrence of the aversive stimulus.
Rather, the aversive stimuli (usually shocks) are presented without explicit warning stimuli. Two crucial time intervals determine the rate of avoidance learning. The first is the S-S interval (shock-shock interval): the time that passes between successive presentations of the shock (unless the operant response is performed). The second is the R-S interval (response-shock interval), which specifies the length of the interval following an operant response during which no shocks are delivered. Note that each time the organism performs the operant response, the R-S interval begins anew.

Two-process theory of avoidance

This theory was originally established to explain learning in discriminated avoidance learning. It assumes that two processes take place:

a) Classical conditioning of fear. During the first trials of training, the organism experiences both the CS and the aversive US (escape trials). The theory assumes that during these trials classical conditioning takes place through the pairing of the CS with the US. Because of the aversive nature of the US, the CS comes to elicit a conditioned emotional reaction (CER): fear. In classical conditioning, presenting a CS conditioned with an aversive US disrupts the organism's ongoing behavior.

b) Reinforcement of the operant response by fear reduction. Because the CS signaling the aversive US has itself become aversive by eliciting fear in the organism, reducing this unpleasant emotional reaction serves to motivate the operant response. The organism learns to make the response during the CS, thus terminating the aversive internal reaction the CS elicits.

An important aspect of this theory is that the term "avoidance" does not really describe what the organism is doing. It does not "avoid" the aversive US in the sense of anticipating it. Rather, the organism escapes an aversive internal state caused by the CS.
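The S-S and R-S timing rules can be made concrete with a short scheduler. This is a sketch under the added assumption that the first shock arrives one S-S interval after the session starts; the function and parameter names are invented for the example.

```python
def shock_times(ss, rs, response_times, horizon):
    """Shock delivery times under a free-operant avoidance schedule.

    ss: S-S interval (time between successive shocks, absent a response)
    rs: R-S interval (shock-free period restarted by each response)
    """
    shocks = []
    next_shock = ss  # assumed: first shock one S-S interval into the session
    for r in sorted(response_times):
        while next_shock <= r:          # shocks delivered before this response
            shocks.append(next_shock)
            next_shock += ss
        next_shock = r + rs             # each response restarts the R-S interval
    while next_shock <= horizon:
        shocks.append(next_shock)
        next_shock += ss
    return shocks

# With ss=5 and rs=20, responses at t=3 and t=12 keep the subject
# shock-free until t=32; with no responses, shocks come every 5 units.
```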
Verbal behavior

In 1957 Skinner published Verbal Behavior, a theoretical extension of the work he had pioneered since 1938. This work extended the theory of operant conditioning to human behavior previously assigned to the areas of language and linguistics, among others. Verbal Behavior is the logical extension of Skinner's ideas, in which he introduced new functional relationship categories such as intraverbals, autoclitics, mands, tacts, and the controlling relationship of the audience. All of these relationships were based on operant conditioning and relied on no new mechanisms despite the introduction of new functional categories.

Four-term contingency

Modern behavior analysis, the discipline directly descended from Skinner's work, holds that behavior is explained in four terms: an establishing operation (EO), a discriminative stimulus (Sd), a response (R), and a reinforcing stimulus (Srein or Sr for reinforcers, sometimes Save for aversive stimuli).[19]

Operant hoarding

Operant hoarding refers to the choice made by a rat, on a compound schedule called a multiple schedule, that maximizes its rate of reinforcement in an operant conditioning context. More specifically, rats were shown to allow food pellets to accumulate in a food tray by continuing to press a lever on a continuous reinforcement schedule instead of retrieving the pellets. Retrieval of the pellets always instituted a one-minute period of extinction during which no additional food pellets were available, but those accumulated earlier could be consumed. This finding appears to contradict the usual finding that rats behave impulsively in situations offering a choice between a smaller food object right away and a larger food object after some delay. See schedules of reinforcement.[20]
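The four-term contingency lends itself to a simple record type. The field names below are taken from the terms in the text; the class itself is just an illustrative data structure, not part of any behavior-analysis library.

```python
from dataclasses import dataclass

@dataclass
class FourTermContingency:
    establishing_operation: str   # EO, e.g. food deprivation
    discriminative_stimulus: str  # Sd, e.g. a lever light turning on
    response: str                 # R, e.g. a lever press
    reinforcing_stimulus: str     # Sr, e.g. delivery of a food pellet

lever_pressing = FourTermContingency(
    establishing_operation="23 hours of food deprivation",
    discriminative_stimulus="lever light on",
    response="lever press",
    reinforcing_stimulus="food pellet",
)
```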
When a stimulus is added and future frequencies of behavior are maintained or increase, what is it known as?
Reinforcers. A consequence stimulus that increases the target behavior it follows is referred to as a reinforcer. There are two categories of reinforcers: unconditioned and conditioned.
What is the removal of a stimulus that results in an increase in behavior?
Negative reinforcement. In an attempt to increase the likelihood of a behavior occurring in the future, an operant response is followed by the removal of an aversive stimulus.
What is the process of increasing the frequency of a behavior?
Reinforcement, either positive or negative, works by increasing the likelihood of a behavior. Punishment, on the other hand, refers to any event that weakens or reduces the likelihood of a behavior.
What increases the frequency of a particular behavior through the removal of a particular stimulus?
Reinforcement is used to increase the probability that a specific behavior will occur in the future by delivering or removing a stimulus immediately after the behavior. Put another way, reinforcement, done correctly, results in the behavior occurring more frequently in the future.