We’ve all heard it. “When do the rewards stop?”
The knee-jerk reaction by many, especially on social media, is to cave, to placate, to give the client at least some of what they want.
“Variable reinforcement, skip rewards. It will actually make the behaviour stronger!”
Skipping reinforcements (rewards) does make a behaviour more resistant to extinction. Think of constant pay as a soda pop machine. Put in coin. Get a soda. Broken machine? You walk off pretty fast. You’re unlikely to put in another coin, at least not today.
Variable reinforcement is like a slot machine. Put in coin after coin hoping to win. If the machine does not pay out, our behaviour of feeding the machine is slow to extinguish. We don’t expect to get paid each time. We try, try again.
Similarly, if a dog gets paid only some of the time for a skill, they try, try again. More work for less pay. Less likely to quit. The behaviour is more impervious to extinction.
Sounds awesome. But it can bite you in the butt.
Imagine the dog learning a basic sit. The family drills the skill. Sit, yes, treat. They get a nice nano-second sit. They want to ditch the food (or toys). We offer variable reinforcement as an option. Nano-second sits become more resistant to extinction. When we fail to pay, the dog gives us another, and another.
Except that nano-second sits are rather useless. The dog can’t sit to have their paws wiped, their leash put on or wait for the light at the intersection. We need longer sits to make them useful. We have not finished the behaviour – only started it.
We start to train a stay. The dog is paid for one second holds. Nano-second sits are no longer paid. We want to extinguish the short sits in favour of the longer ones. Extinguish. Important word here because variable reinforcement makes extinction difficult.
We turned short sits into a slot machine, making it HARD to extinguish when we gave the human the thumbs up to skip reinforcements. The usual end result is:
“Sit. One Mississ……butt shuffle, nose nudge at hand, reposition, sit, yip, reposition.”
Before the second is up the dog either yips, barks, jumps or gets up to re-sit. Some eventually give up and walk off. They just tried 20 short sits and failed. Why would the dog give something new, like duration, a try after so much failure? How frustrating for the dog. Error rates are way too high. Before you know it, the dog would rather sniff grass than keep working.
You probably now have a client that is flummoxed at what to do with the pestering. Or worse, a client upset that their dog is now pestering, barking, nudging.
If our goal is 30 seconds, we want to extinguish short sits in favour of longer ones. One second extinguished in favour of two. Then two extinguished in favour of three. And so on. Until we reach the end goal of any skill, there is an element of extinction constantly in play. Continuous reinforcement makes things easy to extinguish.
Until a skill is completed, ease of extinction is a GOOD thing. Reinforcing right answers is the way of saying correct. Failing to reinforce extinguishes. Go variable before the behaviour is done, polished and under stimulus control and you just made everything a whole lot harder for the dog and for the client.
Some might say that they can go variable early and still get a long sit. I’m sure that’s true. You can muscle through it, given enough effort. I know that I could split to quarter-second criteria increases. It’s fixable.
I’m just not sure it’s fair to the dog or client to have them do extra, unnecessary challenges. I’m not sure why anyone would reject the easy way in favour of the hard, the frustrating, the more time consuming way.
And let’s face it, sits are easy to teach. Try shaping a complex skill with thirty, forty or fifty splits where each step is inappropriately put on a variable schedule. Every increase in difficulty was made harder than it needed to be. The variable schedule sucks the dog into repeating steps instead of extinguishing them quickly and easily.
Here’s the reality about that desire to get rid of reinforcements. I don’t get it. If you’re pushing the bar higher, asking more of the dog, they will be getting fewer reinforcements. A dog that does a two minute sit only gets one reinforcement for two minutes of behaviour. A dog that knows and loves doing skills can do chains and sequences. For example, a whole obstacle course for one reinforcement. The reinforcements thin as you raise the bar. You can preserve cookie (or toy) versus none all the way to excellence AND reinforcement use will decrease on its own.
The short answer to getting rid of reinforcements is…
“Show me polished behaviours. Then we can talk. Until then, pay right responses. Keep your bar moving. Remember that until then, ease of extinction is a massive benefit not to be overlooked.”
There is also a problem with your use of the term ‘reinforcement’. By definition a “consequence” is only a ‘reinforcement’ if it increases the rate of that behaviour being offered. So one really cannot HAVE a continuous rate of “reinforcement”. What you are using is a maintenance rate of rewards/payment. A bit like housework 😦 You need to continue to do housework if you want your house neat and clean.
Continuous rates of rewards get dogs and children who will NOT cooperate unless they are sure the reward is coming. You get dogs who decide “no rewards, then I won’t do it”, or worse, “Nah, I think I’d rather run away/refuse to do the washing up”. You end up increasing the reward value over time and having dogs and children who bargain with you.
Just, as you say, if you stopped getting paid for your employment, then you’d stop going to work. Your pay cheque has NOT reinforced your going to work; you’ve made a conscious decision that the payoff is worth the effort. You’d probably also at some time decide to change jobs if you don’t get a pay rise.
You know I’ll have a yeah..but..to that right? 🙂
If you combine a fast rate of reinforcement, continuous reinforcement, you create a positive association to the cue. The task and the cue become reinforcing.
So if you do it well, you get difficult to extinguish, persists despite disruption and distractions and a positive association so they want to do it.
Holy moly…that’s a lot of benefits from a high ROR continuous reinforcement thing.
If you’d like to look it up, try Nevin for behavioural momentum and then S(R-O) associations.
And no interval scalloping like with a variable schedule.
Continuous reinforcement at high rates of reinforcement RULE. 😉
Sorry. I DO know what I am talking about, aka ‘been there, done that’. Apart from being ‘qualified’ (Certificate IV in Behavioural Dog Training) and very well read, I have been training dogs in dog clubs since 1973. I also did ‘Learning Theory’ with my Diploma of Education (postgraduate level). I own many books, both technical and ‘popular’.
Continuous reinforcement is NOT the best way to train. Certainly variable reinforcement (it is not always the same thing) is more important. The rewards offered as a reinforcement should be something the dog wants at that time and in that situation.
I WILL look up ‘Nevin’ but suspect that he is not actually referring to Behavioural Dog Training. Remember that when we are training our dogs, they are not so much ‘learning’ (a skill/knowledge) as we are putting a behaviour on cue.
Please do look it up. Domjan’s textbook goes into behavioural momentum briefly but nicely. And all the benefits of resistance to extinction and the different type of Pavlovian association that is triggered.
Principles of Learning and Behavior is the textbook. Quite popular in university programs on animal learning theory.
When you trigger strong behavioural momentum, you don’t have the issues you are concerned about.
I’m not disagreeing that straight variable vs continuous, without the fast ROR and S(R-O) association, would make it seem that variable is a clearly better option. There’s that pesky and wonderful exception.
HA! What if we do NOT want what we teach to be extinguished??
What if we want a behaviour in a situation where we cannot reinforce?
I have never ever liked the analogy of ‘working for a wage/salary’. Our dogs are social animals and we are trying to teach them manners and appropriate behaviours. Just as we try to teach our kids. I never ‘paid’ my kids to say thank you, to ask permission, to not shoplift, to go to school, to help around the house. They got random non-contingent ‘rewards’ and because humans are social animals these worked to maintain behaviours. So too it should be with our dogs.
Variable rewards and random schedules of reinforcement, Rule OK!
You want an S(R-O) association to form. Which also happens with continuous reinforcement. High rate of reinforcement is necessary. Creates a behaviour that is almost impossible to extinguish, despite distractions and even if no reinforcement is present.
TOTALLY and easily doable.
Great points well expressed. I have never understood the desire to randomly treat less either. After all, the more work we do for a wage, the more we expect to get paid. What happens is that as we repeat the parts of our job, we get better and more efficient at it, still for the same wage. So what you describe, going from the basic sit to the sustained sit is the dog getting better at a job. The pay stays the same but the effort is seamlessly increased without adding stress.
Exactly! Why compromise clarity (treat yes, no treat no) when you still can use that clarity to get better behaviour!