How far is far? Evaluation, Visualization, and Interpretation of RNNs on Physically Correct Movements (i2M00434)
- Type of thesis:
- Master Thesis
- Feigl, Tobias
Lehrstuhl für Informatik 2 (Programmiersysteme)
Lehrstuhl für Maschinelles Lernen und Datenanalytik
Phone +49 (0) 911 / 58061 3253, e-mail: firstname.lastname@example.org
Lehrstuhl für Informatik 2 (Programmiersysteme)
Phone +49-9131-85-27625, fax +49-9131-85-28809, e-mail: email@example.com
- Description of the thesis:
- Background. In many applications, the ability to accurately predict the trajectory of objects such as cars or
pedestrians is of great value. For example, autonomous cars must judge the future trajectories of nearby cars,
pedestrians, and other people to drive safely. In particular, the unpredictable intent of pedestrians often
changes their movement quickly and abruptly, which makes it difficult to describe these changes in a physically
correct way. In such situations, accurately predicting future positions is a demanding task, and established
methods such as the Kalman filter (KF) struggle to obtain accurate position estimates. This becomes even
harder because the sensor data that represents the motion suffers from nonlinear noise. Classical methods
cannot judge whether the sensor data represents real movement or noise, as these models usually rely either
on their internal state or on the sensor measurements. As a result, the position estimates of such
model-based approaches deviate from the actual position.
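To make this baseline concrete, the model-based approach can be sketched as a minimal one-dimensional constant-velocity Kalman filter. The motion model, the noise levels, and the pedestrian scenario below are illustrative assumptions, not parameters taken from the cited work:

```python
import numpy as np

def kalman_cv_step(x, P, z, dt=0.1, q=0.5, r=1.0):
    """One predict/update step of a 1-D constant-velocity Kalman filter.

    x: state [position, velocity], P: state covariance,
    z: noisy position measurement. q and r are illustrative noise levels.
    """
    F = np.array([[1.0, dt], [0.0, 1.0]])      # constant-velocity motion model
    H = np.array([[1.0, 0.0]])                 # only the position is measured
    Q = q * np.array([[dt**3 / 3, dt**2 / 2],  # process noise (white accel.)
                      [dt**2 / 2, dt]])
    R = np.array([[r]])                        # measurement noise

    # Predict: the filter trusts its internal physical model ...
    x = F @ x
    P = F @ P @ F.T + Q
    # Update: ... and blends in the measurement via the Kalman gain.
    y = z - H @ x
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ y
    P = (np.eye(2) - K @ H) @ P
    return x, P

# A pedestrian walking at 1 m/s, observed through noisy position measurements.
rng = np.random.default_rng(0)
x, P = np.zeros(2), np.eye(2)
for t in range(1, 101):
    true_pos = 1.0 * t * 0.1
    z = np.array([true_pos + rng.normal(0.0, 1.0)])
    x, P = kalman_cv_step(x, P, z)
print(round(float(x[0]), 2))
```

The filter's internal state explicitly carries position and velocity; a sudden, unmodeled change of motion (e.g., a pedestrian turning abruptly) violates the constant-velocity assumption and degrades exactly this estimate.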
Since artificial neural networks (ANNs) provide compelling results over a wide range of tasks [1, 2, 3, 6], it is
reasonable to study their applicability to positioning tasks. This makes it possible to determine whether and
how future positions of an object can be predicted from its past trajectory. A special type of ANN, the
Recurrent Neural Network (RNN), processes time series and sequential data. At each time step, an RNN
produces an output vector as well as a hidden state. This hidden state is fed back into the network at the next time
step, yielding a fixed-size compressed memory of the previous and current state. In this way, the network can
predict and include the context of the data. The memory effect has significant advantages over conventional ANNs,
as many interesting relationships in the input data only emerge when multiple data points are considered over time. A
variety of specific recurrent "cells" have been proposed, such as the Elman network (EN) [7], the long
short-term memory (LSTM) of Hochreiter and Schmidhuber [10], or the Gated Recurrent Unit (GRU) of Cho et
al. [5]. The EN recalculates the hidden state at each time step as a weighted combination of the input and the
previous hidden state; this combination is then fed through an activation function. LSTM and GRU differ
in that they use so-called "gates" to modify their hidden state, which has been suggested to counteract the
vanishing or exploding gradient for long-term dependencies in an EN, similar to other deep ANNs. Beyond
these basic cell designs, various architectural variations (e.g., encoder-decoder networks [5] or attention
methods [3]) have been proposed.
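The Elman update described above fits in a few lines; the weight shapes, the tanh activation, and the random values below are standard choices used purely for illustration:

```python
import numpy as np

def elman_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One Elman (simple RNN) step: the new hidden state is a weighted
    combination of the input and the previous hidden state, passed
    through a tanh activation."""
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

rng = np.random.default_rng(0)
n_in, n_hidden = 3, 4
W_xh = rng.normal(size=(n_hidden, n_in)) * 0.1
W_hh = rng.normal(size=(n_hidden, n_hidden)) * 0.1
b_h = np.zeros(n_hidden)

# Unroll over a short input sequence; the hidden state carries the memory.
h = np.zeros(n_hidden)
for x_t in rng.normal(size=(5, n_in)):
    h = elman_step(x_t, h, W_xh, W_hh, b_h)
print(h.shape)  # (4,)
```

LSTM and GRU cells replace this single update with several gated updates, but the feedback of `h` from one step to the next is the same mechanism.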
RNNs have previously been applied to trajectory prediction and have surpassed the baseline methods of prior
work, such as those of Feigl et al. [9] or Altché et al. [1]. However, researchers and engineers have not yet fully
understood how RNNs model motion physics internally. For example, a KF uses its internal state to explicitly
model a position from the speed and acceleration of the physical components. While complex RNNs surpass the
prediction accuracy and dynamics of a KF, there is no understanding of how motion physics is represented in their
hidden state. In addition, it is unknown whether the network performs a computation, interpolates the data, or simply
builds a "lookup table". Such interpretability and understanding are important in many applications, especially
when explanatory power is required to demonstrate the reliability, safety, and quality of a network's results,
thereby avoiding potential accidents. There is some preliminary work on the explainability of
RNNs: Arras et al. [2] used layer-wise relevance propagation to measure the influence of individual words on
the outcome of sentiment analysis. This makes it possible to measure the relevance of certain inputs for a classification task,
but it cannot be directly extended to the continuous regression examined in this thesis. Karpathy et al. [11]
recorded the activations of certain cells and of the network to visualize the hidden state. They found cells with
interpretable activations, e.g., a cell that correlates with the length of a sentence, but also many
cells without interpretable activations. This thesis builds on some of their visualizations and techniques,
potentially leading to further understanding. To the best of our knowledge, there is no publicly available research
that interprets and explains RNNs that model movements.
Goal and research focus. The goal of this work is to provide insights into the functionality and processing of
RNNs for simulated motion physics. In order to interpret RNNs on movement data, the thesis candidate first
creates a simulator for physically correct movement. This simulation will create realistic trajectories with
adjustable parameters, including, for example, average speed, average acceleration, fast and
abrupt orientation changes, and the extent of Gaussian and "random walk" noise in the measurements (as is
typical for sensor data [8]). The simulator can thus generate physically correct movement data, for
example for pedestrians, cyclists, or cars. In addition, the simulation gives the thesis candidate access to the
ground-truth values, i.e., the labels for each physical state: its noiseless position, speed, and acceleration at
every time step. This knowledge enables the candidate to investigate both the objective and subjective
interpretability of RNN processing.
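A toy version of such a simulator might look as follows; all parameters (walking speed, turn probability, noise levels) are hypothetical placeholders for the adjustable parameters described above:

```python
import numpy as np

def simulate_trajectory(n_steps=200, dt=0.1, speed=1.4, turn_prob=0.05,
                        meas_sigma=0.3, walk_sigma=0.02, seed=0):
    """Toy 2-D trajectory simulator in the spirit described above:
    constant speed with occasional abrupt heading changes, observed
    through Gaussian plus random-walk ("drift") measurement noise.
    Returns ground-truth positions, velocities, and noisy measurements."""
    rng = np.random.default_rng(seed)
    heading = 0.0
    pos = np.zeros(2)
    drift = np.zeros(2)
    truth_pos, truth_vel, meas = [], [], []
    for _ in range(n_steps):
        if rng.random() < turn_prob:                   # abrupt orientation change
            heading += rng.uniform(-np.pi / 2, np.pi / 2)
        vel = speed * np.array([np.cos(heading), np.sin(heading)])
        pos = pos + vel * dt
        drift = drift + rng.normal(0.0, walk_sigma, 2)    # random-walk noise
        z = pos + drift + rng.normal(0.0, meas_sigma, 2)  # plus Gaussian noise
        truth_pos.append(pos.copy())
        truth_vel.append(vel)
        meas.append(z)
    return np.array(truth_pos), np.array(truth_vel), np.array(meas)

truth_pos, truth_vel, meas = simulate_trajectory()
```

Because the simulator produces both the noisy measurements and the noiseless ground truth, every hidden-state analysis later in the thesis can be checked against known positions, velocities, and headings.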
Second, the thesis candidate will conduct a comprehensive grid search over models of the RNN family and
their derivatives. The search space covers the number of parameters, the number of layers, the cell design, and the
connectivity of the networks. With the help of the high-performance deep learning cluster of Fraunhofer IIS, this
work provides new insights into the performance and quality of the networks. The candidate will compare the
models with two baseline methods: a traditional KF as used by Feigl et al. [9] and a temporal convolutional network
(TCN) as described by Bai et al. [4]. Additionally, the thesis candidate will vary the inputs and outputs of a
subset of the models to evaluate the importance of auxiliary inputs (such as velocity and acceleration). Based on
these results, the thesis candidate will attempt to derive general statements about structures and parameters that
enable RNNs to perform well on the given task. While the main focus is on training and evaluation with the
simulated data, the thesis will also evaluate a subset of models on real-world data previously acquired at
Fraunhofer IIS. The candidate will use the real-world data to validate the results obtained on the simulation.
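The grid search could be organized as in the following sketch. The search space is hypothetical, and `evaluate` is a placeholder stub standing in for an actual training and validation run on the cluster:

```python
import itertools

# Hypothetical search space; the actual ranges depend on the thesis setup.
search_space = {
    "cell": ["elman", "lstm", "gru"],
    "layers": [1, 2, 3],
    "hidden_units": [32, 64, 128],
}

def evaluate(config):
    """Placeholder for training a model on simulated trajectories and
    returning its validation error (lower is better)."""
    # A dummy score so the sketch runs; replace with real training.
    return len(config["cell"]) + config["layers"] / config["hidden_units"]

# Exhaustively enumerate all configurations and keep the best-scoring one.
best = min(
    (dict(zip(search_space, values))
     for values in itertools.product(*search_space.values())),
    key=evaluate,
)
print(best)
```

With 3 x 3 x 3 = 27 configurations this is trivially exhaustive; the full search described above also varies connectivity and parameter counts, which multiplies the grid accordingly.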
Third, the subjective interpretability of RNNs in the processing of motion data is examined. Using the
ground-truth data from the simulation, the thesis will investigate whether there are interpretable cell activations
that correspond to, i.e., correlate with, the current position, velocity, or acceleration. This is
based on the work of Karpathy et al. [11], but adapted to the movement modeling task at hand. Based on this
information, the thesis will try to conclude whether the models learn a meaningful physical model of the
underlying data or whether they use a different, uninterpretable representation. The thesis will also analyze how
the model uses both the network structure (i.e., the structure of the recurrent cell, in particular the "gates" used
by LSTM and GRU) and the entire network topology for its predictions. To achieve these goals, the candidate
will develop and utilize various techniques for visualizing and interpreting hidden states and structures, based on
the work of Karpathy et al. [11], as well as dependency measures including correlation measures, transfer entropy [12],
mutual information flow [13], regression analysis, and graph layouts, and will assess their applicability to this task.
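As one example of such a dependency measure, the correlation between individual hidden-unit activations and a ground-truth signal can be computed as sketched below; the "hidden states" here are fabricated purely to illustrate the analysis:

```python
import numpy as np

def activation_correlations(hidden_states, target):
    """Pearson correlation between each hidden unit's activation trace
    and a ground-truth signal (e.g., speed) over a trajectory.

    hidden_states: (T, n_units), target: (T,). Returns (n_units,)."""
    h = hidden_states - hidden_states.mean(axis=0)
    t = target - target.mean()
    denom = np.sqrt((h**2).sum(axis=0) * (t**2).sum())
    return (h * t[:, None]).sum(axis=0) / denom

# Synthetic check: unit 0 tracks the target, unit 1 is pure noise.
rng = np.random.default_rng(0)
T = 500
target = np.sin(np.linspace(0, 6, T))
hidden = np.stack([target + 0.1 * rng.normal(size=T),
                   rng.normal(size=T)], axis=1)
corr = activation_correlations(hidden, target)
```

Units with high absolute correlation are candidates for an interpretable physical meaning; transfer entropy and mutual information extend this idea to nonlinear dependencies that plain correlation misses.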
Fourth, the thesis candidate will examine how accurately and how far into the future RNN motion models can
predict, i.e., extrapolate. This is especially interesting in situations with sudden changes of direction,
where a reliable and robust model should "get back on track" as quickly as possible. The thesis candidate will
provide a comparative analysis of how some of the above models deal with sudden changes in movement.
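A simple way to quantify "how far is far" is to measure the prediction error as a function of the horizon. The sketch below does this for a naive constant-velocity extrapolator, a hypothetical stand-in for a trained RNN, on a trajectory with one abrupt turn:

```python
import numpy as np

def horizon_errors(positions, max_horizon=10):
    """Mean error of constant-velocity extrapolation as a function of the
    prediction horizon. positions: (T, 2) ground-truth trajectory."""
    vel = np.diff(positions, axis=0)   # per-step displacement
    errors = []
    for k in range(1, max_horizon + 1):
        # Predict k steps ahead by linearly extrapolating the last step.
        pred = positions[1:-k] + k * vel[:-k]
        errors.append(np.mean(np.linalg.norm(pred - positions[1 + k:], axis=1)))
    return np.array(errors)

# A trajectory that moves along x, then turns 90 degrees and moves along y.
t1 = np.stack([np.arange(50), np.zeros(50)], axis=1).astype(float)
t2 = t1[-1] + np.stack([np.zeros(50), np.arange(1, 51)], axis=1)
traj = np.concatenate([t1, t2])
errs = horizon_errors(traj)
```

On the straight segments the error is zero for every horizon; all error mass comes from the turn, and it grows with the horizon because the extrapolator keeps heading in the old direction. Plotting the same curve for a trained RNN shows how quickly it "gets back on track" after such a change.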
Expected results and contributions:
- A customizable simulator that generates physically correct and interpretable motion data;
- An evaluation of different architectures of the RNN family for the prediction of simulated movement data;
- A visualization and interpretation of how RNNs model physical movements;
- Visualizations and metrics that describe why some architectures outperform others on motion data;
- Subjective and objective metrics describing the ability of motion models to predict future trajectories, especially under abrupt movement changes.
[1] Florent Altché and Arnaud de La Fortelle. "An LSTM network for highway trajectory prediction". In: IEEE 20th International Conf. on Intelligent Transportation Systems (ITSC), Yokohama, Japan, pp. 353–359, 2017.
[2] Leila Arras, Grégoire Montavon, Klaus-Robert Müller, and Wojciech Samek. "Explaining Recurrent Neural Network Predictions in Sentiment Analysis". In: Proc. 8th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, Copenhagen, Denmark, pp. 159–168, 2017.
[3] Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. "Neural Machine Translation by Jointly Learning to Align and Translate". In: 3rd Int. Conf. on Learning Representations (ICLR), San Diego, CA, USA, 2015.
[4] Shaojie Bai, J. Zico Kolter, and Vladlen Koltun. "An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling". arXiv:1803.01271, 2018.
[5] Kyunghyun Cho, Bart van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. "Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation". In: Proc. 2014 Conf. on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, pp. 1724–1734, 2014.
[6] Douglas Eck and Jürgen Schmidhuber. "Finding temporal structure in music: blues improvisation with LSTM recurrent networks". In: Proc. 12th IEEE Workshop on Neural Networks for Signal Processing (NNSP), Martigny, Valais, Switzerland, pp. 747–756, 2002.
[7] Jeffrey L. Elman. "Finding Structure in Time". Cognitive Science 14(2): 179–211, 1990.
[8] Naser El-Sheimy, Haiying Hou, and Xiaoji Niu. "Analysis and Modeling of Inertial Sensors Using Allan Variance". In: IEEE Trans. Instrumentation and Measurement 57(1): 140–149, 2008.
[9] Tobias Feigl, Thorsten Nowak, Michael Philippsen, Thorsten Edelhäußer, and Christopher Mutschler. "Recurrent Neural Networks on Drifting Time-of-Flight Measurements". In: Proc. 2018 Int. Conf. on Indoor Positioning and Indoor Navigation (IPIN), Nantes, France, pp. 206–212, 2018.
[10] Sepp Hochreiter and Jürgen Schmidhuber. "Long Short-Term Memory". In: Neural Computation 9(8): 1735–1780, 1997.
[11] Andrej Karpathy, Justin Johnson, and Fei-Fei Li. "Visualizing and Understanding Recurrent Networks". arXiv:1506.02078, 2015.
[12] Thomas Schreiber. "Measuring Information Transfer". In: Phys. Rev. Lett. 85, pp. 461–464, 2000.
[13] Ravid Shwartz-Ziv and Naftali Tishby. "Opening the Black Box of Deep Neural Networks via Information". arXiv:1703.00810, 2017.
The thesis has already been completed.
Student: Lukas Schmidt