Information system of the Friedrich-Alexander-Universität Erlangen-Nürnberg © Config eG

How far is far? Evaluation, Visualization, and Interpretation of RNNs on Physically Correct Movements (i2M00434)

Type of work:
Master's thesis
Feigl, Tobias
Lehrstuhl für Informatik 2 (Programmiersysteme)
E-Mail: tobias.feigl@fau.de

Mutschler, Christopher
Lehrstuhl für Maschinelles Lernen und Datenanalytik
Telefon +49 (0) 911 / 58061 3253, E-Mail: christopher.mutschler@fau.de

Philippsen, Michael
Lehrstuhl für Informatik 2 (Programmiersysteme)
Telefon +49-9131-85-27625, Fax +49-9131-85-28809, E-Mail: michael.philippsen@fau.de

Description of the work:
Background. In many applications, the ability to accurately predict the trajectory of objects such as cars or pedestrians is of great value. For example, autonomous cars must judge the future trajectories of cars, pedestrians, and other nearby people to drive safely. In particular, the unpredictable intent of pedestrians often changes their movement quickly and abruptly. It is therefore difficult to describe these changes in a physically correct way, and accurate prediction of future positions becomes an elaborate task. As a consequence, established model-based methods such as the Kalman filter (KF) have difficulty obtaining accurate position estimates. The task is further complicated by nonlinear noise in the sensor data that represents the motion. Classical methods cannot judge whether the sensor data represents a real movement or noise, as these models usually rely either on their internal state or on the sensor measurements. In general, this causes the position estimates of such model-based approaches to deviate from the actual position.
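To illustrate the kind of model-based estimator meant here, the following is a minimal one-dimensional constant-velocity Kalman filter. It is a generic textbook sketch, not the thesis implementation, and all parameter values are illustrative; note how the internal state explicitly stores position and velocity, which is exactly the physical model an RNN would have to learn implicitly.

```python
import numpy as np

def kalman_1d(measurements, dt=1.0, q=0.01, r=0.5):
    """Filter noisy 1-D position measurements with a constant-velocity model."""
    F = np.array([[1.0, dt], [0.0, 1.0]])    # state transition: x' = x + v*dt
    H = np.array([[1.0, 0.0]])               # only position is measured
    Q = q * np.eye(2)                        # process noise covariance
    R = np.array([[r]])                      # measurement noise covariance
    x = np.array([[measurements[0]], [0.0]]) # state: [position, velocity]
    P = np.eye(2)                            # state covariance
    estimates = []
    for z in measurements:
        # predict: propagate the state through the motion model
        x = F @ x
        P = F @ P @ F.T + Q
        # update: correct with the noisy position measurement
        y = np.array([[z]]) - H @ x
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)
        x = x + K @ y
        P = (np.eye(2) - K @ H) @ P
        estimates.append(x[0, 0])
    return np.array(estimates)
```

With a matching motion model and low process noise, the filtered positions track the true trajectory much more closely than the raw measurements; the difficulties described above arise precisely when the motion violates this hand-crafted model.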
Since artificial neural networks (ANNs) provide compelling results over a wide range of tasks [1, 2, 3, 6], it is reasonable to study their applicability to positioning tasks. Thus, it can be determined whether and how it is possible to predict future positions of an object based on its past trajectory. A special type of ANN, the recurrent neural network (RNN), processes time series and sequential data. At each time step, an RNN produces an output vector as well as a hidden state. This hidden state is fed back into the network at the next time step and serves as a fixed-size compressed memory of the previous and current state. In this way, the network can take the temporal context of the data into account. This memory effect has significant advantages over conventional ANNs, as many interesting relationships in the input data only emerge when multiple data points are considered over time. A variety of specific recurrent "cells" have been proposed, such as the Elman network (EN) [7], the long short-term memory (LSTM) of Hochreiter and Schmidhuber [10], or the gated recurrent unit (GRU) of Cho et al. [5]. The EN recalculates the hidden state at each time step as a weighted combination of the input and the previous hidden state; this combination is then fed through an activation function. LSTM and GRU differ in that they use so-called "gates" to modify their hidden state. Gating was introduced to counteract the fact that in an EN, as in other deep ANNs, the gradient for long-term dependencies vanishes or explodes. Beyond these basic cell designs, various architectural variations (e.g., encoder-decoder networks [5] or attention mechanisms [3]) have been proposed.
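The Elman recurrence described above fits in a few lines. The following sketch uses random, untrained placeholder weights (names such as `elman_step` and `run_elman` are ours); it only shows how the hidden state is recomputed at every step from the input and the previous hidden state.

```python
import numpy as np

def elman_step(x, h_prev, W_xh, W_hh, b_h):
    """One Elman step: weighted combination of input and previous hidden
    state, fed through a tanh activation."""
    return np.tanh(x @ W_xh + h_prev @ W_hh + b_h)

def run_elman(inputs, hidden_size=4, seed=0):
    """Run an untrained Elman cell over a (T, input_size) sequence and
    return the (T, hidden_size) trace of hidden states."""
    rng = np.random.default_rng(seed)
    in_size = inputs.shape[1]
    W_xh = rng.normal(0, 0.1, (in_size, hidden_size))   # input-to-hidden
    W_hh = rng.normal(0, 0.1, (hidden_size, hidden_size))  # hidden-to-hidden
    b_h = np.zeros(hidden_size)
    h = np.zeros(hidden_size)        # fixed-size compressed memory
    states = []
    for x in inputs:                 # one iteration per time step
        h = elman_step(x, h, W_xh, W_hh, b_h)  # state is fed back each step
        states.append(h)
    return np.array(states)
```

The trace of hidden states returned here is precisely the object that the interpretability analyses described later operate on.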
RNNs have previously been applied to trajectory prediction and have surpassed the baseline methods of prior work, such as those by Feigl et al. [9] or Altché et al. [1]. However, researchers and engineers have not fully understood how RNNs model motion physics internally. For example, a KF uses its internal state to explicitly model a position from the speed and acceleration of the physical components. While complex RNNs surpass the prediction accuracy and dynamics of a KF, there is no understanding of how motion physics is represented in their hidden state. In addition, it is unknown whether the network performs a computation, interpolates the data, or simply builds a "lookup table". Such interpretability is important in many applications, especially when explanatory power is required to demonstrate the reliability, safety, and quality of a network's results and thereby avoid potential accidents. There is some preliminary work on the explainability of RNNs: Arras et al. [2] used layer-wise relevance propagation to measure the influence of individual words on the outcome of sentiment analysis. This makes it possible to measure the relevance of certain inputs for a classification task, but it cannot be directly extended to the continuous regression examined in this thesis. Karpathy et al. [11] recorded the activations of individual cells and of the network to visualize the hidden state. They found cells with interpretable activations, e.g., a cell that correlates with the length of a sentence, but also many cells without interpretable activations. This thesis extends some of their visualizations and techniques, potentially leading to further understanding. To the best of our knowledge, there is no publicly available research that interprets and explains RNNs that model movements.

Goal and research focus. The goal of this work is to provide insights into the functionality and processing of RNNs for simulated motion physics. In order to interpret RNNs on movement data, the thesis candidate first creates a simulator for physically correct movement. This simulation will create realistic trajectories with adjustable parameters. These parameters will include, for example, average speed, average acceleration, fast and abrupt orientation changes, and the extent of Gaussian and "random walk" noise (as is typical for sensor data [8]) in the measurements. The simulator can thus generate physically correct movement data, for example for pedestrians, cyclists, or cars. In addition, the simulation allows the thesis candidate to obtain the ground truth values, i.e., the labels for each physical state: its noiseless position, speed, and acceleration at every time step. This knowledge enables the candidate to investigate both the objective and subjective interpretability of RNN processing.
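A toy version of such a simulator might look as follows. The parameter names and values here are illustrative assumptions on our part; the actual simulator and its parameter set are part of the thesis work. The sketch produces both the noiseless ground-truth trajectory and measurements corrupted by white Gaussian noise plus a drifting random-walk bias.

```python
import numpy as np

def simulate_trajectory(n_steps=200, dt=0.1, speed=1.4, turn_std=0.2,
                        meas_std=0.05, walk_std=0.01, seed=0):
    """Toy 2-D trajectory: returns (ground_truth, noisy_measurements),
    each of shape (n_steps, 2)."""
    rng = np.random.default_rng(seed)
    heading = 0.0
    pos = np.zeros(2)
    bias = np.zeros(2)                      # random-walk sensor bias
    truth, measured = [], []
    for _ in range(n_steps):
        heading += rng.normal(0, turn_std)  # abrupt orientation changes
        vel = speed * np.array([np.cos(heading), np.sin(heading)])
        pos = pos + vel * dt                # noiseless ground-truth position
        bias += rng.normal(0, walk_std, 2)  # drifting bias ("random walk")
        noise = rng.normal(0, meas_std, 2)  # white Gaussian noise
        truth.append(pos.copy())
        measured.append(pos + bias + noise)
    return np.array(truth), np.array(measured)
```

Because the generator itself produces the noiseless positions, the ground-truth labels for every time step come for free, which is exactly what the interpretability analyses below rely on.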
Second, the thesis candidate will conduct a comprehensive grid search for models of the RNN family and their derivatives. The search spans the number of parameters, the number of layers, the cell design, and the connectivity of the networks. With the help of the high-performance deep learning cluster of Fraunhofer IIS, this work provides new insights into the performance and quality of the networks. The candidate will compare the models with two baseline methods: a traditional KF by Feigl et al. [9] and a temporal convolutional network (TCN), as described by Bai et al. [4]. Additionally, the thesis candidate will vary the inputs and outputs of a subset of the models to evaluate the importance of auxiliary inputs (such as velocity and acceleration). Based on these results, the thesis candidate will attempt to derive general statements about structures and parameters that enable RNNs to perform well for the given task. While the main focus is on the training and evaluation of the simulated data, the thesis will evaluate a subset of models on real-world data that were previously acquired at Fraunhofer IIS. The candidate will use the real-world data to validate the results based on the simulation.
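The shape of such a grid search can be sketched as follows. The value ranges and the `evaluate` stand-in are illustrative assumptions, not the thesis setup; in the real sweep, `evaluate` would train one configuration on the simulated data and return its validation error.

```python
from itertools import product

# Hypothetical grid: cell design x depth x width.
GRID = {
    "cell": ["elman", "lstm", "gru"],
    "layers": [1, 2, 3],
    "hidden_size": [32, 64, 128],
}

def grid_search(evaluate):
    """Exhaustively evaluate every configuration in GRID and return the
    best one together with its error."""
    keys = list(GRID)
    best_cfg, best_err = None, float("inf")
    for values in product(*(GRID[k] for k in keys)):
        cfg = dict(zip(keys, values))
        err = evaluate(cfg)          # stand-in for train + validate
        if err < best_err:
            best_cfg, best_err = cfg, err
    return best_cfg, best_err
```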
Third, the subjective interpretability of RNNs in the processing of motion data is examined. Using the ground truth data from the simulation, the thesis will investigate whether there are interpretable cell activations that correspond to, i.e., correlate with, the current position, velocity, or acceleration. This is based on the work of Karpathy et al. [11], but adapted to the movement modeling task at hand. Based on this information, the thesis will try to conclude whether the models learn a meaningful physical model of the underlying data or whether they use a different, uninterpretable representation. This thesis will also analyze how the model uses both the network structure (i.e., the structure of the recurrent cell, in particular the "gates" used by LSTM and GRU) and the entire network topology for its predictions. To achieve these goals, the candidate will develop and apply various techniques for visualizing and interpreting hidden states and network structure, based on work by Karpathy et al. [11], as well as dependency measures including correlation measures, transfer entropy [12], mutual information flow [13], regression analysis, and graph layouts, and will assess their applicability to this task.
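The simplest objective variant of this check, correlating each hidden unit's activation time series with a ground-truth physical quantity, can be sketched as follows (function name and interface are ours, in the spirit of Karpathy et al. [11]):

```python
import numpy as np

def most_correlated_unit(hidden_states, quantity):
    """Pearson-correlate each hidden unit with a ground-truth signal.

    hidden_states: (T, H) activation trace; quantity: (T,) ground-truth
    signal such as speed. Returns (unit index, its correlation)."""
    h = hidden_states - hidden_states.mean(axis=0)   # center each unit
    q = quantity - quantity.mean()                   # center the signal
    denom = h.std(axis=0) * q.std() * len(q)
    denom = np.where(denom == 0, 1.0, denom)         # guard constant units
    corr = (h * q[:, None]).sum(axis=0) / denom      # per-unit Pearson r
    best = int(np.argmax(np.abs(corr)))
    return best, corr[best]
```

A unit with |r| near 1 would be a candidate "speed cell" in the subjective analysis; the absence of any such unit would suggest a distributed or uninterpretable representation.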
Fourth, the thesis candidate will examine how accurately and how far into the future RNN motion models can predict, i.e., extrapolate. This is especially interesting in situations of sudden changes in direction, where a reliable and robust model should "get back on track" as quickly as possible. The thesis candidate will provide a comparative analysis of how some of the above models deal with sudden changes in movement.

Expected results and contributions:

  • Simulator that is customizable and generates physically correct and interpretable motion data;

  • Evaluation of different architectures of the RNN family for the prediction of simulated movement data;

  • Visualization and interpretation of how RNNs model physical movements;

  • Visualizations and metrics that describe why some architectures outperform others in terms of motion data;

  • Subjective and objective metrics describing the ability of motion models to predict future trajectories, especially in the case of abrupt movement changes.


  • [1] Florent Altché and Arnaud de La Fortelle. “An LSTM network for highway trajectory prediction”. In: IEEE 20th International Conf. on Intelligent Transportation Systems (ITSC), Auckland, New Zealand, pp. 353–359, 2017.

  • [2] Leila Arras, Grégoire Montavon, Klaus-Robert Müller, and Wojciech Samek. “Explaining Recurrent Neural Network Predictions in Sentiment Analysis”. In: Proc. 8th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, Copenhagen, Denmark, pp. 159–168, 2017.

  • [3] Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. “Neural Machine Translation by Jointly Learning to Align and Translate”. In: 3rd Int. Conf. on Learning Representations (ICLR), San Diego, CA, USA, 2015.

  • [4] Shaojie Bai, J. Zico Kolter, and Vladlen Koltun. “An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling”. arXiv:1803.01271, 2018.

  • [5] Kyunghyun Cho, Bart van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. “Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation”. In: Proc. 2014 Conf. on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, pp. 1724–1734, 2014.

  • [6] Douglas Eck and Jürgen Schmidhuber. “Finding temporal structure in music: blues improvisation with LSTM recurrent networks”. In: Proc. 12th IEEE Workshop on Neural Networks for Signal Processing (NNSP), Martigny, Valais, Switzerland, pp. 747–756, 2002.

  • [7] Jeffrey L. Elman. “Finding Structure in Time”. Cognitive Science 14(2): 179–211, 1990.

  • [8] Naser El-Sheimy, Haiying Hou, and Xiaoji Niu. “Analysis and Modeling of Inertial Sensors Using Allan Variance”. In: IEEE Trans. Instrumentation and Measurement 57(1): 140–149, 2008.

  • [9] Tobias Feigl, Thorsten Nowak, Michael Philippsen, Thorsten Edelhäußer, and Christopher Mutschler. “Recurrent Neural Networks on Drifting Time-of-Flight Measurements”. In: Proc. 2018 Int. Conf. on Indoor Positioning and Indoor Navigation (IPIN), Nantes, France, pp. 206–212, 2018.

  • [10] Sepp Hochreiter and Jürgen Schmidhuber. “Long Short-Term Memory”. In: Neural Computation 9(8): 1735–1780, 1997.

  • [11] Andrej Karpathy, Justin Johnson, and Fei-Fei Li. “Visualizing and Understanding Recurrent Networks”. arXiv:1506.02078, 2015.

  • [12] Thomas Schreiber. “Measuring Information Transfer”. In: Phys. Rev. Lett. 85, pp. 461–464, 2000.

  • [13] Ravid Shwartz-Ziv and Naftali Tishby. “Opening the Black Box of Deep Neural Networks via Information”. arXiv: 1703.00810, 2017.

The thesis has already been completed.
Student: Lukas Schmidt
Submitted: December 31, 2019
