We’ve all heard it. “When do the rewards stop?”
The knee-jerk reaction by many, especially on social media, is to cave, to placate, to give the client at least some of what they want.
“Variable reinforcement, skip rewards. It will actually make the behaviour stronger!”
Skipping reinforcements (rewards) does make a behaviour more resistant to extinction. Think of constant pay as a soda pop machine. Put in coin. Get a soda. Broken machine? You walk off pretty fast. You’re unlikely to put in another coin, at least not today.
Variable reinforcement is like a slot machine. Put in coin after coin hoping to win. If the machine does not pay out, our behaviour of feeding the machine is slow to extinguish. We don’t expect to get paid each time. We try, try again.
Similarly, if a dog gets paid only some of the time for a skill, they try, try again. More work for less pay. Less likely to quit. The behaviour is more impervious to extinction.
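For readers who like to see the soda machine and the slot machine side by side, here is a minimal sketch in Python. It is a toy model, not a description of how real learning works: it assumes the learner keeps trying, once all pay has stopped, until the dry streak is longer than any dry streak it met during training. The function names, the 30% payout rate, the 200 training trials and the seed are all made up for illustration.

```python
import random

def longest_unpaid_run(pay_probability, trials, rng):
    """Train for `trials` repetitions and return the longest streak of unpaid ones."""
    longest = current = 0
    for _ in range(trials):
        if rng.random() < pay_probability:
            current = 0                      # paid: the dry streak resets
        else:
            current += 1                     # unpaid: the dry streak grows
            longest = max(longest, current)
    return longest

def tries_before_quitting(pay_probability, trials=200, seed=1):
    """Toy assumption: after rewards stop entirely, the learner keeps trying
    until the dry streak is longer than any it met during training."""
    rng = random.Random(seed)
    return longest_unpaid_run(pay_probability, trials, rng) + 1

soda_machine = tries_before_quitting(pay_probability=1.0)  # pays every time
slot_machine = tries_before_quitting(pay_probability=0.3)  # pays now and then
print("soda machine:", soda_machine, "tries before quitting")
print("slot machine:", slot_machine, "tries before quitting")
```

Under this assumption the always-paid learner quits after a single unpaid try, while the sometimes-paid learner keeps going for longer, which is the same pattern described above.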
Sounds awesome. But it can bite you in the butt.
Imagine the dog learning a basic sit. The family drills the skill. Sit, yes, treat. They get a nice nano-second sit. They want to ditch the food (or toys). We offer variable reinforcement as an option. Nano-second sits become more resistant to extinction. When we fail to pay, the dog gives us another, and another.
Except that nano-second sits are rather useless. The dog can’t sit to have their paws wiped, their leash put on or wait for the light at the intersection. We need longer sits to make them useful. We have not finished the behaviour – only started it.
We start to train a stay. The dog is paid for one second holds. Nano-second sits are no longer paid. We want to extinguish the short sits in favour of the longer ones. Extinguish. Important word here because variable reinforcement makes extinction difficult.
When we gave the human the thumbs up to skip reinforcements, we turned short sits into a slot machine, making them HARD to extinguish. The usual end result is:
“Sit. One Mississ……butt shuffle, nose nudge at hand, reposition, sit, yip, reposition.”
Before the second is up the dog either yips, barks, jumps or gets up to re-sit. Some eventually give up and walk off. They just tried 20 short sits and failed. Why would the dog give something new, like duration, a try after so much failure? How frustrating for the dog. Error rates are way too high. Before you know it, the dog would rather sniff grass than keep working.
You probably now have a client who is flummoxed about what to do with the pestering. Or worse, a client upset that their dog is now pestering, barking, nudging.
If our goal is 30 seconds, we want to extinguish short sits in favour of longer ones. One second extinguished in favour of two. Then two extinguished in favour of three. And so on. Until we reach the end goal of any skill, there is an element of extinction constantly in play. Continuous reinforcement makes things easy to extinguish.
Until a skill is completed, ease of extinction is a GOOD thing. Reinforcing right answers is the way of saying correct. Failing to reinforce extinguishes. Go variable before the behaviour is done, polished and under stimulus control and you just made everything a whole lot harder for the dog and for the client.
Some might say that they can go variable early and still get a long sit. I’m sure that’s true. You can muscle through it, given enough effort. I know that I could split to quarter-second criteria increases. It’s fixable.
I’m just not sure it’s fair to the dog or client to have them do extra, unnecessary challenges. I’m not sure why anyone would reject the easy way in favour of the hard, the frustrating, the more time consuming way.
And let’s face it, sits are easy to teach. Try shaping a complex skill with thirty, forty or fifty splits where each step is inappropriately put on a variable schedule. Every increase in difficulty is made harder than it needs to be. The variable schedule sucks the dog into repeating steps instead of extinguishing them quickly and easily.
Here’s the reality about that desire to get rid of reinforcements. I don’t get it. If you’re pushing the bar higher, asking more of the dog, they will be getting fewer reinforcements. A dog that does a two-minute sit only gets one reinforcement for two minutes of behaviour. A dog that knows and loves doing skills can do chains and sequences: for example, a whole obstacle course for one reinforcement. The reinforcements thin as you raise the bar. You can keep the clear contrast of cookie (or toy) for right answers versus nothing for wrong ones all the way to excellence AND reinforcement use will decrease on its own.
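The thinning is plain arithmetic, and a small sketch in Python makes it visible. It assumes every correct sit is paid exactly once and ignores the time between repetitions. The criteria ladder below is a hypothetical example, not a recommended training plan.

```python
def rewards_per_minute(criterion_seconds):
    """One reward per completed sit of the given length, every sit paid."""
    return 60 / criterion_seconds

# Hypothetical criteria ladder, in seconds, from a first hold to a two-minute sit.
ladder = [1, 2, 5, 10, 30, 60, 120]
for seconds in ladder:
    rate = rewards_per_minute(seconds)
    print(f"{seconds:>3} second sit: {rate:g} rewards per minute of sitting")
```

Every right answer is still paid, yet the pay per minute of behaviour falls from sixty at a one-second sit to one reward per two minutes at the end of the ladder.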
The short answer to getting rid of reinforcements is…
“Show me polished behaviours. Then we can talk. Until then, pay right responses. Keep your bar moving. Remember that until then, ease of extinction is a massive benefit not to be overlooked.”