POMDP-Based Collision Avoidance Systems for Unmanned Aerial Vehicles


The Traffic Alert and Collision Avoidance System (TCAS) is a family of airborne devices that operate independently of the aircraft navigation equipment and of the ground-based air traffic control (ATC) system, and that provide collision avoidance protection for a broad spectrum of aircraft types. TCAS reduces the risk of midair collisions by providing two types of advisories to the pilots: Traffic Advisories (TAs) assist the pilot in visually acquiring the intruder aircraft, and Resolution Advisories (RAs) recommend escape maneuvers in the vertical dimension to increase or maintain the existing vertical separation between aircraft. Depending on the equipment carried by the aircraft in an encounter, TCAS can provide three levels of protection: TAs only; TAs and RAs; and TAs and coordinated RAs.

An unmanned aerial vehicle (UAV) is an unpiloted aircraft. UAVs are currently used mostly in military applications, as well as in a small but increasing number of civil applications such as firefighting and reconnaissance support in natural disasters. The goal of this research is to design collision avoidance systems for UAVs so that they can be incorporated safely into the airspace.

Similar to TCAS, a UAV collision avoidance system (let us call it UCAS) should function independently of the ground systems used to provide ATC services, and it should take the dynamics and performance limits of the aircraft into consideration. The rest of the design considerations for UCAS, however, are generally very different. TCAS is essentially an advisory system for pilots: it relies on the pilot to execute the advisories (which must be selected carefully so that they do not confuse the pilot), and it must account for a few seconds of delay in their execution. UCAS, on the other hand, is completely in charge of controlling the UAV during an encounter, and it can execute any selected collision avoidance action without delay. A UAV can also maneuver more aggressively than a human-piloted aircraft; therefore, UCAS can choose to exert higher accelerations and/or turn rates.

There are also other considerations, such as the cost-effectiveness of the sensor equipment onboard the UAV. Specifically, we focus on three sensor types with different specifications: a UAV can be equipped with a TCAS sensor (omni-directional; provides range and angular measurements), a radar (limited field of view; provides range and angular measurements), or an electro-optical/infrared sensor (limited field of view; provides angular measurements only). In this research, we also assess the level of protection provided by the different sensor types by comparing them with each other and with an ideal sensor that has no measurement errors and never issues false-negative or false-positive readings.
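The three sensor types differ along two axes: field of view and which quantities they measure. A minimal sketch of how these specifications might be tabulated, where all numeric field-of-view values and the `SensorSpec` structure itself are illustrative assumptions rather than the actual hardware parameters:

```python
from dataclasses import dataclass

@dataclass
class SensorSpec:
    """Illustrative sensor specification (hypothetical structure).
    A fov_deg of 360 means omni-directional coverage."""
    name: str
    fov_deg: float
    measures_range: bool
    measures_angles: bool

# Field-of-view values below are placeholders, not real hardware figures.
SENSORS = [
    SensorSpec("TCAS", 360.0, measures_range=True, measures_angles=True),
    SensorSpec("radar", 120.0, measures_range=True, measures_angles=True),
    SensorSpec("EO/IR", 60.0, measures_range=False, measures_angles=True),
]
```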


To the outside world, UCAS can be considered a black box that assumes control of the UAV during encounters with other aircraft. The input to this controller is the state of the UAV (position, velocity, and any other useful data that is available) together with periodic sensor data, and the output is a series of velocity and/or acceleration commands that steer the UAV away from potential hazards. Internally, we can think of a reasonable controller as four logical components: the first interprets the input and tries to deduce information such as where the intruder plane might be and how fast it is moving; the second estimates the motions of the UAV and the intruder plane (using kinematics and aircraft dynamics); the third generates escape plans; and the fourth evaluates those plans, picks the best one, and decides which command to issue.

Such a controller can be articulated in detail as a Partially Observable Markov Decision Process (POMDP), and in this research we use POMDPs as our primary tool to model UCAS. A POMDP model consists of finite sets of states, actions, and observations, together with an observation model, a transition model, and a reward model. The solution to a problem modeled as a POMDP is called a policy: a mapping from a probability distribution over the states to the control action that maximizes the expected reward. There is a straightforward correspondence between the four components of the controller and a POMDP specification. The observation model describes the input and how to interpret it (component 1); it can also easily encode the probabilities of false-negative and false-positive sensor readings. The transition model contains the kinematics and dynamics information needed to estimate the motion of the UAV and the intruder plane (component 2), and by following the policy (components 3 and 4), the controller decides on the next command to issue.
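The interplay between the observation and transition models can be sketched as a standard POMDP belief update: the transition model predicts how the state distribution evolves (component 2), and the observation model weights that prediction by the likelihood of the received sensor reading (component 1). The matrix sizes and random contents below are placeholders, not the actual model dimensions:

```python
import numpy as np

# Hypothetical discretized POMDP: T[a] is an |S| x |S| transition matrix,
# O[a] an |S| x |Z| observation matrix. Sizes are illustrative only.
n_states, n_actions, n_obs = 100, 5, 20
rng = np.random.default_rng(0)

def normalize(m, axis):
    return m / m.sum(axis=axis, keepdims=True)

T = normalize(rng.random((n_actions, n_states, n_states)), axis=2)
O = normalize(rng.random((n_actions, n_states, n_obs)), axis=2)

def belief_update(b, a, z):
    """Bayesian filter over the discretized states: predict with the
    transition model, then correct with the observation likelihood."""
    predicted = b @ T[a]                  # motion prediction (component 2)
    corrected = predicted * O[a][:, z]    # sensor correction (component 1)
    return corrected / corrected.sum()

b = np.full(n_states, 1.0 / n_states)     # uniform initial belief
b = belief_update(b, a=2, z=7)            # one step of filtering
```

A policy then maps the belief `b` (rather than a single known state) to the action with the highest expected reward.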

POMDPs have a few limitations. To model a problem as a POMDP, one needs to discretize the states, actions, and observations. This sometimes prevents the controller from behaving optimally (for example, the magnitude of an acceleration/deceleration command must be chosen from a predefined finite set). The curse of dimensionality then prevents us from easily switching to a finer discretization, and it also makes richly detailed states and observations computationally very costly. As a result, we are also pursuing methods other than POMDPs to model UCAS. One promising method that we are especially interested in is space-time formulations.
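The curse of dimensionality can be made concrete with a back-of-the-envelope count: with a fixed number of bins per dimension, the discretized state count grows exponentially with each added state variable. The numbers below are illustrative, not the actual model sizes:

```python
# With a fixed number of bins per state variable, the total number of
# discretized states multiplies with every added dimension.
bins_per_dim = 20  # illustrative resolution, not the real discretization

def state_count(dims, bins=bins_per_dim):
    return bins ** dims

for dims in (2, 3, 5):
    print(f"{dims} dimensions -> {state_count(dims):,} states")
# 2 dimensions -> 400 states; 5 dimensions -> 3,200,000 states
```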

In our research, we collaborate with two researchers from MIT Lincoln Laboratory: Dr. James K. Kuchar and Dr. Mykel J. Kochenderfer. They provide domain expertise on subjects such as aircraft dynamics and sensor characteristics, and they give us access to their simulation system, where we can test the designed controllers in various real encounter scenarios.

Work Done and Future Work

We started with simplified POMDP models for UCAS in order to gain a better understanding of the problem. Our initial POMDPs had simpler sensors and observation models, and two-dimensional state spaces (where each state contained only the relative vertical and horizontal ranges to the intruder plane). The transition models incorporated simple kinematics, and our controllers issued vertical velocity commands to control the UAV. We mostly used a random walk to model the intruder's behavior. Since our initial models were not detailed enough to be tested on Lincoln Laboratory's simulation system, we developed graphical simulation tools of our own to analyze and debug our designs.

Over time, we carefully selected new features and added them one by one to our models, while keeping the sizes of the observation and state spaces reasonable. We currently have realistic models for all three sensor types. Our state space is five-dimensional (in addition to the ranges, each state contains estimates of the intruder plane's vertical and horizontal velocities and information about the UAV's vertical velocity), and the transition models can represent realistic kinematics and dynamics. We modified our models to issue acceleration commands rather than velocity commands, since this is a more natural way of controlling an airplane. To model the behavior of the intruder plane, we now have a parameterized random walk model, which can be configured to dynamically apply biases toward certain motion directions. We implemented a software interface between our controller and Lincoln Laboratory's simulation system, and we are able to test our controllers on that system using real encounter scenarios.
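A parameterized, biased random walk of this kind can be sketched in a few lines. The function below tracks only the intruder's vertical velocity; the parameter names, units, and default values are all illustrative assumptions, not the configuration used in the actual models:

```python
import numpy as np

def biased_random_walk_step(vz, bias=0.0, sigma=1.0, vz_max=10.0, rng=None):
    """One step of a parameterized random walk over the intruder's vertical
    velocity (m/s). `bias` skews the drift toward climbing (positive) or
    descending (negative); `sigma` sets the per-step noise; the result is
    clipped to the intruder's assumed performance envelope [-vz_max, vz_max].
    All values are placeholders for illustration."""
    rng = rng or np.random.default_rng()
    vz_next = vz + bias + rng.normal(0.0, sigma)
    return float(np.clip(vz_next, -vz_max, vz_max))
```

Setting `bias=0` recovers the unbiased random walk of the earlier models; a time-varying `bias` can represent an intruder that tends to climb or descend during the encounter.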

We are currently working on a few different aspects of the problem. The sizes of the POMDP models usually increase each time a new feature is added (which, in turn, increases the time taken by the controller to issue a command). Therefore, we are experimenting with the idea of preprocessing POMDP models to speed up computations (our main interest here is time efficiency rather than memory efficiency). Also, as new features are added, we can express other important criteria in our POMDP models. One such criterion is minimizing deviation from flight plan while avoiding collisions during encounters. We are working on different reward models that achieve this effect.
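One simple way such a reward model can balance the two criteria is to pay a large penalty when separation falls below a collision threshold and a small per-step cost proportional to deviation from the flight plan. All constants and names below are assumptions for the sketch, not the values used in our models:

```python
def reward(separation_m, deviation_m,
           collision_radius_m=150.0,    # illustrative near-miss threshold
           collision_penalty=-1000.0,   # dominates all other terms
           deviation_weight=-0.01):     # small cost per meter off plan
    """Illustrative reward trading off collision risk against deviation
    from the flight plan (hypothetical constants)."""
    r = deviation_weight * deviation_m
    if separation_m < collision_radius_m:
        r += collision_penalty
    return r
```

Because the collision penalty dwarfs the deviation cost, the policy avoids the intruder first and minimizes deviation only among the safe maneuvers.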

In all our models, we process sensor readings and convert them into vertical and horizontal range data (an estimate of where the intruder plane might be), and our controllers issue only vertical velocity or acceleration commands, which affect just the altitude of the UAV. One of our important long-term research goals is to move from this two-dimensional representation to a three-dimensional one and to take full advantage of it to better avoid collisions. To avoid the resulting sudden increase in the size of the state space, we are considering clever representations such as polar coordinates.
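The appeal of a polar representation is that the intruder's relative position becomes a range plus two angles, and the range axis can then be discretized coarsely at long distances, where fine resolution matters least. A minimal conversion sketch (the function and its coarse-binning rationale are illustrative, not the actual scheme):

```python
import math

def to_polar(dx, dy, dz):
    """Convert a relative intruder position (Cartesian offsets) into
    (range, bearing, elevation). In a discretized state space, the range
    axis can use wider bins far away, keeping the 3-D state count down."""
    r = math.sqrt(dx * dx + dy * dy + dz * dz)
    bearing = math.atan2(dy, dx)
    elevation = math.asin(dz / r) if r > 0 else 0.0
    return r, bearing, elevation
```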

We also foresee that we will have to deal with additional important criteria (such as avoiding more than one intruder plane) and eventually the size of our POMDPs will increase beyond reasonable limits. Therefore, another goal for us is to pursue other models and techniques for our controllers (such as space-time formulations instead of a transition model, and randomized kinodynamic planning techniques instead of hard-to-compute policies). We have briefly experimented with simple space-time formulations using two-dimensional representations, and we will continue our research to develop a generic space-time formulation using a three-dimensional representation.


The Lincoln Laboratory portion of this work was supported under Air Force contract number FA8721-05-C-0002. The MIT CSAIL portion of this work was supported by the Office of the Director of Defense Research and Engineering. Interpretations, opinions, and conclusions are those of the authors and do not reflect the official position of the United States government.