Sunday, 26 February 2017

Evidence for the Kinematic Specification of Dynamics

Gibson’s most detailed analysis of the KSD problem came from work on the perception of dynamic occlusion (viewing one surface become progressively hidden behind another as they move; Gibson, Kaplan, Reynolds & Wheeler, 1969; Kaplan, 1969). As one surface goes behind another, the sensations coming from the rear surface stop hitting the retina; they disappear. However, what is perceived is the progressive occlusion of a persisting surface; it is not disappearing, it is going out of view. Gibson and his students identified the kinematic pattern of transformation of the optic array that was specific to occlusion and distinguished it from the pattern specific to a surface actually going out of existence. In the former case, optical texture from the rear surface is progressively deleted over time from the optic array at an edge as it goes in behind the closer surface, and that texture progressively accretes as it comes back into view. In the latter case, there are a variety of transformations depending on how the surface is disappearing (melting vs being eaten, etc). Each event creates a specific optical pattern, but these patterns are not identical to the underlying dynamics. Observers, however, readily and easily perceive and report the underlying dynamics, not the optical patterns. Additional evidence that people are perceiving the dynamics comes from work in multiple object tracking (Scholl & Pylyshyn, 1999). People can track multiple moving targets over time, and can continue to do so even if the objects move in and out of view, but only if they do so in an occlusion event. If the objects go out of view by imploding, tracking goes to chance. In the occlusion case, the visual attention system continues to perceive a persisting object and can often pick it back up when it returns to view. In the imploding case, this system perceives that the object has ceased to exist, and it no longer tracks it.

Around the time Gibson was working, other research began to point to the fact that when presented with kinematic information, people used it to perceive the underlying dynamic (if there was one), and not the motions per se. Michotte (1963) experimentally investigated the perception of cause and effect in collisions. There was a long-standing philosophical stance (Hume) that claimed people must infer (dynamical) cause and effect from (kinematic) motions by learning that one thing tends to lead to another. In a collision, for example, the motions are that one ball moves and comes into contact with another, and the second then begins to move. In our current framing, the underlying causal structure is dynamical (a launching event) but the information is kinematic (the pattern of motions), and for Hume, this entails inference. In a long series of experiments, Michotte demonstrated exquisite perceptual sensitivity to the underlying dynamics. For example, when there was no delay between the balls coming into contact and the second one moving, people perceived the dynamical event of the first ball ‘launching’ the second. Adding small delays creates a different underlying dynamic and people’s perceptions changed accordingly. The key result from dozens of studies was that the motions were never perceived as such; only the underlying dynamics, and with exquisite sensitivity (see Scholl & Tremoulet, 2000 for a review). 

Event Perception

This work evolved into the field known as event perception (Johansson, von Hofsten & Jansson, 1980) which explicitly tackles the question of how we use information for dynamical events using point light displays (PLDs). Johansson (1950) had invented and used PLDs to demonstrate that patterns of 2D motion that were projections of coherent 3D motion were perceived as the 3D event, and he extended the technique to study the perception of biological motion too (Johansson, 1973).

Point-light displays are made by recording (or simulating) all the moving parts of the dynamic event to be studied (e.g. the joints of a person walking). The experimenter can then play back the recorded motion using just dots on a screen, without any of the ‘pictorial’ information usually present in the event (e.g. clothing, facial features, hair, or any other features that might serve as cues for judgments about the walking person), and this allows the experimenter to ask what people are able to perceive about the dynamic event using only this kinematic information.

Johansson and those who followed quickly discovered that these displays are a rich source of information about event dynamics. Once moving, the collection of dots immediately perceptually resolves into the event (‘that is a person walking’) and the displays enable many accurate judgments to be made about the event. Some examples include perceiving the size of a moving animal at varying distances (Jokisch & Troje, 2003), perceiving the weight of objects lifted by others (Runeson & Frykholm, 1981) or where an object was thrown (Zhu & Bingham, 2014). These displays also support the perception of a wide variety of social properties, from a person’s identity (e.g. Loula et al, 2005) to their gender (e.g. Barclay et al, 1978; Troje, 2003); from social dominance (e.g. Montepare & Zebrowitz-McArthur, 1988) to intent to deceive (Runeson & Frykholm, 1983) and vulnerability to attack (Gunns et al, 2002). See Blake and Shiffrar (2007) for a recent review.

Humans are extremely perceptually sensitive to these patterns from very early in life and this sensitivity is specific to patterns caused by coherent dynamical events. For example, infants as young as 4 months old prefer to look at point light displays of human motion over equally complex random motion or even inverted point light displays of human motion (e.g. Bertenthal, 1993; Fox & McDaniel, 1982) and there is even evidence for this kind of preference in 2-day-old babies (Simion, Regolin & Bulf, 2008). We therefore come ready to be quickly competent in the detection of information.

Kinematic information and dynamical events are not identical, however, so the relation between them must still be learned. While infants can readily discriminate between point-light displays and show preferences for dynamically organised patterns, they are not immediately able to use these patterns as information for those dynamics. Wickelgren & Bingham (2001) presented 8-month-old infants with PLDs of events (rolling balls, splashing and occlusion) running either forwards or backwards in time. The events were chosen because they are asymmetric in time and only happen one way round in the real world (making the backwards events ‘impossible’ events). Infants who habituated to the events running forwards recovered their looking times for events running backwards and vice versa (showing they could tell the difference between the displays). However, there was no difference in habituation rates between forwards versus backwards events (showing they were not sensitive to the underlying dynamics which determined whether the event was possible or impossible).

Optic Flow

When observers move relative to their environments, optical texture moves in ways that are specific to and therefore informative about the dynamics of that motion. This optical motion is optic flow. Analytically, optic flow is represented as a spherical field of vectors centred on the observer’s point of observation, with each vector representing the direction and magnitude of the optical motion of a textured element in the scene. As we shall see, this field is richly informative about a variety of dynamical events. 
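
To make the vector-field description concrete, here is a minimal sketch of a single flow vector, assuming a static texture element and an observer who is only translating (no eye or head rotation). The function name `flow_vector` and the example numbers are illustrative assumptions, not taken from any of the cited work.

```python
import numpy as np

def flow_vector(point, velocity):
    """Optical (angular) velocity of one static texture element.

    point    : 3D position of the element relative to the point of observation
    velocity : the observer's translational velocity (same units per second)
    Returns the rate of change of the unit direction to the element,
    i.e. one vector of the spherical flow field (radians per second).
    Assumption: pure translation, no rotation of the eye or head.
    """
    point = np.asarray(point, dtype=float)
    velocity = np.asarray(velocity, dtype=float)
    distance = np.linalg.norm(point)
    direction = point / distance
    # Relative to the observer the element moves with -velocity. Only the part
    # perpendicular to the line of sight changes the element's direction.
    tangential = -velocity + np.dot(velocity, direction) * direction
    return tangential / distance

# Hypothetical example: walking straight ahead (+z) at 1 m/s, with elements
# 10 m away at different angles from the heading direction.
heading_velocity = np.array([0.0, 0.0, 1.0])
for angle_deg in (0, 45, 90, 135, 180):
    a = np.radians(angle_deg)
    element = 10.0 * np.array([np.sin(a), 0.0, np.cos(a)])
    speed = np.linalg.norm(flow_vector(element, heading_velocity))
    print(f"{angle_deg:3d} deg from heading: {speed:.4f} rad/s")
```

Under these assumptions the magnitude of each vector works out to (speed × sin(angle from heading)) / distance, so it is zero along the heading direction and largest at 90° to it.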

Self-Motion from Optic Flow

Higher order relations between elements in global optic flow specify all elements of the dynamics of self-motion (e.g. heading, direction, and speed) as well as many critical elements of the dynamics of the scene being moved through. 

  • An observer locomoting through a scene will experience optic flow emanating from a source coincident with the direction of heading; this is referred to as the focus of expansion (FOE). The angular velocity of texture elements increases smoothly from zero at the FOE to a maximum 90° from the heading direction, and then decreases smoothly back to zero at a focus of contraction (FOC) behind the observer and directly opposite the FOE. A simple control law then becomes possible; you can control heading by moving so that the FOE coincides with the required location.
  • The direction components of this higher order flow pattern are invariant with the distance from the point of observation to the textured elements of the world creating the optical texture, and these therefore specify locomotion on a heading in all possible scenes. 
  • Triangulating any pair of flow vectors provides an estimate of the location of the FOE even when you are not looking at it. This estimate improves with more pairs and so the dense flow field specifies heading no matter the observer’s current fixation. (Gibson would often joke that he could still see where he was going while driving even if he wasn’t looking out the front windscreen.)
  • The velocity component of the vectors does vary with distance from the observer; in a given direction, optical velocity is inversely proportional to distance (motion parallax), so nearer texture flows faster than more distant texture. This relation provides ordinal distance perception (relative depth perception). 
  • The optic flow we detect does not simply come from a rigid translation, however. Eye rotation adds a second component to the vector field that is independent of our locomotion. One practical upshot is that the FOE of the detected flow field is shifted away from the actual direction of heading, meaning that heading must be recovered. The flow field itself contains information to solve this problem; in an actually 3D environment, the independent contributions of observer translation and eye rotation to the detected flow field are specified by different elements of the field. This is combined (in some instances) with non-visual sources of information about, for example, the eye movements causing the rotation, and these allow heading to be perceived (Warren, 2008). 
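
The triangulation idea in the list above can be sketched numerically. This is an illustration only, assuming a pinhole projection onto an image plane, a rigid static scene and pure observer translation (so the eye-rotation problem in the last bullet is not handled). The names `translational_flow` and `estimate_foe` are hypothetical helpers, not functions from the cited literature.

```python
import numpy as np

def translational_flow(points, depths, velocity):
    """Image-plane flow for a translating observer and a static scene.

    points   : (N, 2) image coordinates (x, y) = (X/Z, Y/Z)
    depths   : (N,) depth Z of each texture element
    velocity : observer translation (tx, ty, tz)
    """
    tx, ty, tz = velocity
    x, y = points[:, 0], points[:, 1]
    u = (x * tz - tx) / depths
    v = (y * tz - ty) / depths
    return np.column_stack([u, v])

def estimate_foe(points, flow):
    """Least-squares intersection of the lines along which the flow vectors lie."""
    # A line through p with direction (u, v) has normal n = (-v, u); the FOE e
    # should satisfy n . e = n . p for every vector in the field.
    normals = np.column_stack([-flow[:, 1], flow[:, 0]])
    offsets = np.sum(normals * points, axis=1)
    foe, *_ = np.linalg.lstsq(normals, offsets, rcond=None)
    return foe

rng = np.random.default_rng(0)
points = rng.uniform(-1.0, 1.0, size=(200, 2))   # where texture projects
depths = rng.uniform(2.0, 50.0, size=200)        # arbitrary scene layout
velocity = (0.2, 0.0, 1.0)                       # true FOE at (0.2, 0.0)

flow = translational_flow(points, depths, velocity)
print("FOE from the whole field:", estimate_foe(points, flow))

# Use only texture well off to one side, i.e. not looking where you are going.
side = points[:, 0] < -0.5
print("FOE from a side view:   ", estimate_foe(points[side], flow[side]))
```

Because the depths are random, the sketch also illustrates the invariance claim: depth changes the length of each vector but not the line it lies on, so the estimated FOE does not depend on scene layout.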

There are multiple research programmes investigating the details of when and how optic flow is used to control locomotion (Gibson & Crooks, 1938; Warren & Hannon, 1988 etc; Wann & Land, 2000; Wilkie & Wann, 2003, etc; Fajen and Warren, etc; Rushton, etc). 

Other-Motion from Optic Flow

Self-motion is specified by properties of global optic flow. The motion of other objects and organisms in the environment is specified by properties of local optic flow. Anything that is moving independently of the observer creates a patch of optic flow that is discontinuous with the global pattern. Specifically, the vectors representing the texture elements along the edge of the object have either a different direction or different magnitude or both compared to their neighbours. The location of the discontinuity specifies the edge of the object, and the dynamics of the object’s motion is specified by the pattern of accretion and deletion of optical texture from surfaces behind the object as it moves across those more distant surfaces. There are a variety of local flow variables that specify a variety of dynamical properties of moving others.

  • Biological or not? The dynamics of animate vs inanimate objects can be distinguished primarily by the fact that inanimate objects must move in accordance with conservation laws (e.g. conservation of energy). Animals routinely violate these laws by continuously injecting energy into their movements to, for example, resist the force of gravity. Observers are highly sensitive to motions that violate these laws (Twardy & Bingham, 2002). The way organisms achieve these violations has specific kinematic consequences, and as reviewed above, we are able to use those kinematics as information for a wide variety of animal dynamics (reviewed in Blake & Shiffrar, 2007). 
  • Where is the object going? The direction of an object’s motion relative to an observer is specified by relations between the local and global flow. If the object’s optical size increases (i.e. it is progressively occluding other flow) the object is approaching. If the rate of this increasing optical size is symmetrical across the local flow, the object is heading right for your eyes. If the rate is slower to one side of the object, the object will miss you on that side by an amount proportional to the rate difference. If the optical size is decreasing, the object is moving away from you; and so on. 
  • When will it arrive? This is one of the most important questions about the motion of an approaching object, and the study of time-to-contact (TTC) is an appropriately busy field. Lee (1976) was the first to consider this problem informationally, and proposed the variable tau (τ). Tau is defined as the ratio of an object’s angular size to the rate of change of angular size, and the ratio is equal to the time-to-contact under certain conditions (Hoyle, 1957). Those conditions (specifically constant approach velocity) make τ’s scope quite narrow, so observers do not typically rely on it. There are a variety of other sources of information, however (see Tresilian, 1999 for a review of visual sources; Shaw, McGowan & Turvey, 1990 also identified an auditory variable). 
  • How can I intercept it? Moving objects are not always heading towards you, and sometimes you want to intercept that object. This requires information about both space and time, and hence variables such as τ aren’t especially useful (Bingham & Zaal, 2004). The most widely studied interception task is the famous outfielder problem (Chapman, 1968). The task is to visually guide oneself to be in the right place at the right time to intercept a ball. Prediction (Saxberg, 1987) does not work. It is too unstable because of the many variables that affect a ball’s trajectory (Adair, 1990; McBeath, Nathan, Bahill & Baldwin, 2008) and fielders show no sign of either typically using or even being able to use prediction at all (McBeath, Shaffer & Kaiser, 1995; Shaffer & McBeath, 2005). The relevant dynamics are those of projectile motion, and this creates two information-based options: Linear Optical Trajectory (McBeath et al, 1995) and Optical Acceleration Cancellation (Chapman, 1968; Fink, Foo & Warren, 2009). Moving so as to make the optical motion of the ball follow either strategy ensures the fielder arrives at the right place at the right time without ever knowing where or when that will be. (These are examples of prospective control: moving with respect to a currently available information variable so as to achieve a future state). 
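
Two of the variables in the list above are easy to illustrate numerically. The sketch below is idealised: for τ it assumes a sphere approaching the eye head-on at constant velocity, and for Optical Acceleration Cancellation it assumes drag-free projectile motion, which real fly balls do not follow. The function names and all the numbers are illustrative assumptions.

```python
import numpy as np

def tau(angular_size, dt):
    """Ratio of angular size to its rate of change, from evenly sampled sizes."""
    rate = np.gradient(angular_size, dt)
    return angular_size / rate

def tan_elevation(t, observer_x, vx, vy, g=9.81):
    """Tangent of the ball's elevation angle seen from ground position observer_x.

    Assumption: ideal projectile launched from x = 0 at ground level.
    """
    x = vx * t
    y = vy * t - 0.5 * g * t**2
    return y / (observer_x - x)

# --- tau: ball of radius 0.1 m approaching from 20 m at a constant 5 m/s ---
dt = 0.01
t = np.arange(0.0, 3.0, dt)
distance = 20.0 - 5.0 * t
angular_size = 2.0 * np.arctan(0.1 / distance)
tau_values = tau(angular_size, dt)
true_ttc = distance / 5.0
print("tau vs actual time-to-contact at t = 1 s:", tau_values[100], true_ttc[100])

# --- OAC: optical acceleration seen from the landing point and from elsewhere ---
vx, vy, g = 15.0, 20.0, 9.81
flight_time = 2.0 * vy / g
landing_x = vx * flight_time
ts = np.linspace(0.1, 0.7 * flight_time, 50)
step = ts[1] - ts[0]

at_landing = np.diff(tan_elevation(ts, landing_x, vx, vy), 2) / step**2
too_deep = np.diff(tan_elevation(ts, landing_x + 10.0, vx, vy), 2) / step**2
print("max |optical acceleration| at the landing point:", np.abs(at_landing).max())
print("max |optical acceleration| 10 m too deep:       ", np.abs(too_deep).max())
```

In this idealised case the tangent of the elevation angle rises at a constant rate only when viewed from the landing point, so a fielder who moves to cancel any optical acceleration ends up there without ever computing the landing point. This shows the geometry only; it is not a model of how fielders actually move.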

KSD in the Acoustic Array

While research on information is dominated by vision, there are examples from other modalities. Warren & Verbrugge (1984) demonstrated people could use acoustic information to perceive bouncing and breaking events, while Warren, Kim & Husney (1987) investigated the acoustic perception of elasticity in the control of a bounce pass. Button & Davids (2004) reviewed research on the acoustic perception of time-to-contact and for the control of interception, and found strong evidence that there is such information and that people use it. A related field (although not typically explicitly grounded in the kinematic specification of dynamics) is the extensive investigation of echolocation in animals such as bats and dolphins (see Thomas, 2004), and orca whales (e.g. Au, Horne & Jones, 2010). There is also growing interest in studying echolocation by humans (see Kolarik, Cirstea, Pardhan & Moore, 2014). These studies are all working to show how animals use sound to track dynamical properties of their environments (e.g. the identity, location and motion of prey or predator animals). Finally, Gaver (1993) has proposed a detailed ecological framework for studying ‘everyday listening’.

No Information, No Stable Behaviour

One final strand of evidence for the central role ecological information plays in behaviour comes from research looking at what happens when that information is absent.

During locomotion, steps are prepared and assembled slightly ahead of time on the basis of prospective (typically visual) information about the upcoming surface (e.g. Fajen, 2013; Matthis, Barton & Fajen, 2015; Matthis & Fajen, 2014). This prospective control allows for stride alterations to be made efficiently and in plenty of time. One change that requires altering your stride is any change in friction; failing to account for an increase or decrease in friction can make you trip or slip, respectively. However, it turns out there is no prospective information about upcoming changes in friction. There cannot be, because friction is not a property of the upcoming surface alone. It is a dynamical property of the interaction between two surfaces (here the foot and the surface) and simply does not exist ahead of time. Because it does not exist ahead of time, it cannot create information ahead of time, and the result is that you cannot adjust your stride with respect to a change in friction until you come in contact with the surface, when it is often too late. Friction-related falls therefore account for about half of falls in the USA and the types of injuries (broken collarbones, wrists) indicate reactive responses to the slip (Courtney, Sorock, Manning, Collins, & Holbein-Jenny, 2001).

Two studies have investigated the prospective perception of friction in the context of locomotion. Joh, Adolph, Campbell & Eppler (2006) showed that people seem to be using shine to judge slipperiness, but shine perception was unreliable and influenced by factors which don't impact on friction (such as surface distance and colour) and shine does not actually co-vary reliably with slipperiness anyway. Joh, Adolph, Narayanan & Dietz (2007) had adults stand on flat low, medium and high friction surfaces, and make judgements about whether they could walk down the same surface once it sloped. The results across four experiments were broadly consistent; participants were extremely poor at identifying when they could safely traverse the sloped surface. Not only were judgements highly variable, but worse, the errors were in the least useful direction; participants systematically underestimated their ability to walk down high friction slopes, and overestimated their ability to walk down low friction surfaces.

Friction is a dynamical property that only exists when two surfaces are in contact, and there is therefore no information present prior to that contact. In the absence of this prospective information, steps must be adjusted at the moment when friction changes, rather than ahead of time, and the result is highly unstable and unsafe behaviour. The absence of information for a property makes stable behaviour with respect to that property impossible (another example is what happens during white-out conditions, in which there is light but no structure in light and therefore no information, leading people to get lost or crash). 


  1. "Friction is a dynamical property that only exists when two surfaces are in contact, and there is therefore no information present prior to that contact."

    I'm familiar with some of the literature cited, but this still strikes me as odd. In practice, people anticipate slipperiness on a regular basis. I guess the question is whether it is "specified" or if there are just "cues", right? For example, in a grocery store, a person with a mop and a much shinier swath of ground indicate slipperiness. I might be hesitant to say "specify" slipperiness, but people definitely notice, and their perception-action systems adjust accordingly.

    P.S. Have you been getting my emails? Did your email change?

    1. Check the Adolph papers mentioned here. There is no information but people do try and use cues, except none of them are actually any good. In addition, because they aren't actual information, they cannot support online action control, only action selection (e.g. selecting to move slower but unable to make you pick a specific slow speed)

      I'll reply now :)