[esnr] Re: AW: Re: an interesting week for neurofeedback

  • From: "reitsma" <b.reitsma@xxxxxxxxxx>
  • To: <esnr@xxxxxxxxxxxxx>
  • Date: Mon, 11 Oct 2004 20:23:05 +0200

Dear Group,
 
Interesting doesn't always mean important.
Please read David Kaiser's comments on this topic.
 
 
Newsweek featured a short article on neurofeedback this week.
Unfortunately, what caught the eye of the science writer was less
science than entertainment. They featured once again a system which
plays a videogame with brainwaves. Neurofeedback needs to train the
brain, not entertain it. Too many people feel the need to motivate the
client to return with ephemeral rewards -- buzz and blips -- instead of
the solid rewards of mental health.
A few years ago a NASA-derived company, or so they called themselves,
had a similar setup, and they went belly up pretty quickly. Why? For
whatever reasons they told their shareholders, but I suspect any system
that simply alters the gameplay of videogames cannot train the brain, at
least not efficiently. Why? Because it isn't operant conditioning (OC);
at the very least it mightily avoids the primary goal of OC, which is
discrimination.
Operant conditioning increases the tendency of one and only one response
to a stimulus at the expense of all others. This is done by rewarding one
behavior above all others, to the detriment of its near-neighbors.
When a pigeon is trained to peck a light, pecking the wall nearby,
flapping a wing, or nuzzling the food magazine -- none of these
behaviors is rewarded with a pellet drop. If they were, the bird brain
would continue his or her non-goal behavior. In neurofeedback, we want,
say, increases in SMR, and need to reward only that. But with an
elaborate videogame going on, essentially whatever brainwaves occur
in response to any visual stimulation are being conditioned. It's jumping
on the videogame addiction bandwagon to get clients into the office. We
routinize poor children's brains into these inflexible states by allowing
such immersion in these videogames. How can math and reading compete,
with their slower, less frequent reward schedules? Skinner realized all
of this decades ago, using rats and pigeons, but a quick primer on
operant conditioning might help for those who missed it.
Operant conditioning works by associating reward with desired behaviors.
Optimally we should place electrodes into the pleasure center of the
hypothalamus and turn on the juice whenever a targeted behavior is
performed and turn it off when it isn't. This on/off dichotomy is
reflected in discrete exercises (not so much in continuous reward games
like the ones often featured by magazines). 
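To make that on/off dichotomy concrete, here is a minimal sketch (in
Python) of a discrete reward gate for a feedback loop. It is
illustrative only: the SMR threshold value and the read_smr_amplitude,
play_tone, and stop_tone functions are stand-ins I am assuming, not any
particular system's interface.

    import time

    SMR_THRESHOLD = 5.0  # microvolts; an assumed criterion, not a standard

    def feedback_loop(read_smr_amplitude, play_tone, stop_tone):
        # Reward is strictly ON while the criterion is met, OFF otherwise.
        rewarding = False
        while True:
            amplitude = read_smr_amplitude()   # hypothetical amplifier read
            if amplitude >= SMR_THRESHOLD and not rewarding:
                play_tone()                    # criterion met: reward on
                rewarding = True
            elif amplitude < SMR_THRESHOLD and rewarding:
                stop_tone()                    # criterion lost: reward off
                rewarding = False
            time.sleep(0.05)                   # poll about 20 times a second

Nothing happens in between: the reward carries no animation, no
build-up, no residue.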
 
Animation, or any visual excitement prior to or after the criterion
behavior is performed, undermines conditioning, as it rewards
non-criterion behavior, whatever the brain is doing at that moment. 
Continuous reward information is only useful during shaping, when a
person has difficulty eliciting or maintaining a desired behavior, and
then only as reward for successive approximations to the goal behavior.
But once this obstacle is overcome, once a person can reliably perform
the behavior requested of him or her, continuous reward will weaken the
association between stimulus and response. As I said already, the goal
of operant conditioning is discrimination. Discrimination emerges out of
generalization by means of FOCAL association, strong linkages
between response and reward.
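Shaping by successive approximations can be sketched the same way: a
criterion that ratchets toward the goal as the client succeeds. The
numbers below are assumptions for illustration, not clinical parameters.

    class ShapingThreshold:
        # Reward criterion that steps toward the goal after each run of
        # consecutive successes (successive approximations).
        def __init__(self, start=2.0, goal=5.0, step=0.5, run_needed=10):
            self.threshold, self.goal, self.step = start, goal, step
            self.run_needed, self.run = run_needed, 0

        def record(self, success):
            # Call once per trial; returns the criterion for the next one.
            self.run = self.run + 1 if success else 0
            if self.run >= self.run_needed and self.threshold < self.goal:
                self.threshold = min(self.goal, self.threshold + self.step)
                self.run = 0
            return self.threshold

Once the criterion sits at the goal and the client meets it reliably,
continuous feedback should give way to the discrete rewards argued for
above.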
BF Skinner figured this all out 50 years ago: punishment and
reinforcement, both positive and negative, reinforcement schedules,
contingencies, informative signals, noninformative ones, primary
reinforcers, secondary reinforcers, spontaneous recovery, shaping,
extinction curves. Positive reinforcement is when an appetitive stimulus
(a rewarding one like food) is provided. Negative reinforcement, despite
the oxymoronic name, is also a good thing: an aversive stimulus is
removed (and the behavior thus rewarded). We drop a coin into a vending
machine and receive an item: that's positive reinforcement. We fasten
our seat belts when we get into the car to stop an annoying buzzer:
that's negative reinforcement. I know the wording is perverse, but it's
Skinnerian. There is also negative punishment (withholding a positive
reward) and positive punishment (providing an aversive one), although I
don't think, of all the aspects of learning theory we have, that
scientists have fully understood all the components and aftermath of
punishment. I tell my students how the environment includes the punisher
for punishment but not for reinforcement, so that the behavioral
tendency is increased universally with reward but diminished only in the
presence of the punisher, or ones like him or her, for punishment. Maybe
I'm wrong, but I have yet to read a convincing case for the converse
equality of reinforcement and punishment.
The most effective reinforcement schedule for task acquisition is to
reward every instance. Unfortunately, this is also the schedule that is
easiest to snuff out once the reward is withheld. So if you reward
your kids every time they clean their rooms, for instance, once you stop
paying them, they will stop cleaning. But if you work on a partial
reinforcement schedule, and reward them after every third or fourth
cleaning, they will continue to work much longer after the reward has
stopped. This is called resistance to extinction, and it's one goal of
neurofeedback, because the bells and blips will not be available forever
to the client.
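In code, a continuous schedule is just a fixed-ratio schedule with a
ratio of one; this toy class (my own illustration, nothing from the
literature) rewards every Nth success:

    class FixedRatioSchedule:
        # Deliver a reward after every `ratio` successful responses.
        # ratio=1 is continuous reinforcement; ratio=3 rewards every
        # third room-cleaning, and so on.
        def __init__(self, ratio):
            self.ratio, self.count = ratio, 0

        def respond(self, success):
            if not success:
                return False               # no response, no progress
            self.count += 1
            if self.count >= self.ratio:
                self.count = 0
                return True                # reward this response
            return False                   # success, but no reward yet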
There are four partial reinforcement schedules: variable ratio and
variable interval, and fixed ratio and fixed interval. The variable
ratio (VR) schedule is the best. More than any other reinforcement
schedule, VR speeds task acquisition, and tasks acquired with it resist
extinction. VR built Las Vegas out there in the desert, and it is quite
visible in slot machines. Under VR, the individual is rewarded on
average after some amount of behavior. In slot machines VR may be set
to every 50 or 100 pulls. As long as the payoff is appropriate for the
schedule rate, an individual will repeat the behavior indefinitely.
Slot machines layer multiple VR schedules on top of each other to
produce behavior that is quickly acquired and extremely resistant to
extinction. Large payoffs occur infrequently (but predictably in a
statistical sense) and smaller payoffs occur frequently (again,
predictably in a statistical sense) -- predictably in the sense that a
$1 million payoff occurs on average after every 100 million pulls or
so.
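A variable ratio schedule is just as easy to sketch: reward each
response with probability 1/N, so rewards arrive on average every N
responses but never predictably. The layered payoffs below are numbers
I made up to mimic the slot-machine structure, not actual casino
parameters.

    import random

    class VariableRatioSchedule:
        # Reward with probability 1/mean_ratio, i.e. on average once
        # every mean_ratio responses (geometric inter-reward intervals).
        def __init__(self, mean_ratio, payoff):
            self.p, self.payoff = 1.0 / mean_ratio, payoff

        def respond(self):
            return self.payoff if random.random() < self.p else 0

    # Layered VR schedules, slot-machine style: frequent small payoffs,
    # rare large ones.
    layers = [VariableRatioSchedule(10, 5), VariableRatioSchedule(100, 100)]
    winnings_for_one_pull = sum(layer.respond() for layer in layers)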
All games should incorporate multiple reinforcement schedules through
multiple layers of reward, be it screen completion, bonus scores, or
sudden task completion. More important than the schedule is reward
delivery, which should be discrete, like the hypothalamic shocks to
rats I mentioned above. Those rats literally died of starvation because
they stimulated their brains rather than sought food. That is the best
advertisement and evidence of the power of discrete rewards. Had the
shocks come about regardless of their actions, with peaks and valleys
of activity perhaps as they came near the bar they were supposed to
press, then food consumption would have been an option. But they all
died, because the rewards were discrete and focal.
Finally, one last aspect of operant conditioning is the stage of
immediate consolidation, the sensory pause after a reinforcement has
been given, in order to strengthen the linkages between response and
reward (and presumably to weaken the linkages with other behaviors in
this context). Thirty years ago, Sterman, Clemente, Marczynski (a decade
later) and colleagues quite clearly revealed the presence of a
consolidation period immediately after response and reward. Few if any
learning theorists seemed to be aware of their discovery, however. I
reviewed the operant conditioning literature and, except for those
mentioned above, I found nothing about immediate consolidation, probably
because outside of this field few scientists investigate the EEG during
operant conditioning.
In 1981 Ted Marczynski and colleagues identified how blocked
consolidation led to slower learning in cats. Kaiser (1994) documented
this process in humans, perhaps for the first time. Learning is a
two-step discrete process that involves sampling of the environment
followed by consolidation of associations. This consolidation period is
evident in one's EEG as a post-response synchronization, i.e., a
dominant-frequency burst after response and reinforcement. The beep and
visual reward signal to the client that the desired behavior was
performed; now it is time to consolidate. The next couple of seconds are
spent strengthening the internal linkages, an unconscious process that
can be derailed when the environment prods the client for more
behaviors. When there is no break in training, either the client makes
one him or herself, or the client continues to sample the environment
even after the behavioral criterion is met, which essentially informs
them that the goal behavior was not the goal.
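In feedback software this argues for an enforced quiet interval:
deliver the discrete reward, then freeze the display and accept nothing
for a couple of seconds. A minimal sketch; the two-second window and
the function names are my assumptions, not parameters from the studies
cited above.

    import time

    CONSOLIDATION_SECONDS = 2.0  # assumed pause length

    def reward_and_pause(play_reward, blank_display, resume_display):
        # Deliver a discrete reward, then hold a quiet interval so the
        # post-response consolidation is not derailed by new stimulation.
        play_reward()                      # beep and visual reward
        blank_display()                    # no prodding for more behavior
        time.sleep(CONSOLIDATION_SECONDS)  # let consolidation proceed
        resume_display()                   # sampling may begin again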
In my study, subjects who failed to alpha burst missed the material
they needed to process, as shown by their misses on later recognition
tests. It's strange that so little is understood about this part of
learning, despite Skinner's work.
But of course Skinner didn't use electrodes.
 
Ben Reitsma
 
 
