A comprehensive survey on Deep Learning for Anomaly Detection


This post was originally published by Guansong Pang at Towards Data Science

Through this review, we identify several exciting opportunities, some of which are described below.

Exploring Anomaly-supervisory Signals

Informative supervisory signals are key for deep anomaly detection to learn expressive representations of normality/abnormality or anomaly scores, and to reduce false positives. While a wide range of unsupervised or self-supervised supervisory signals has been explored for learning such representations, a key issue with these formulations is that their objective functions are generic and not optimized specifically for anomaly detection. Current anomaly measure-dependent feature learning approaches help address this issue by imposing constraints derived from traditional anomaly measures. However, these constraints can have inherent limitations of their own, e.g., the implicit assumptions built into the anomaly measures. It is therefore critical to explore new sources of anomaly-supervisory signals that lie beyond widely used formulations such as data reconstruction and GANs, and that make only weak assumptions about the anomaly distribution. Another possibility is to develop domain-driven anomaly detection by leveraging domain knowledge, such as application-specific knowledge of anomalies and/or expert rules, as the supervision source.
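To make the idea of expert rules as a supervision source concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the transaction-style fields (`amount`, `hour`, `failed_logins`), the three rules, and the `min_votes` threshold are hypothetical and do not come from the survey. It only shows how a handful of rules could be turned into weak anomaly labels that a detection model might later train on.

```python
from typing import Callable, Dict, List

Record = Dict[str, float]
Rule = Callable[[Record], bool]

# Hypothetical expert rules for a made-up transaction domain (illustrative only).
RULES: List[Rule] = [
    lambda r: r["amount"] > 10_000,      # unusually large amount
    lambda r: r["hour"] < 5,             # activity in the small hours
    lambda r: r["failed_logins"] >= 3,   # repeated failed logins
]


def weak_anomaly_label(record: Record, rules: List[Rule] = RULES,
                       min_votes: int = 2) -> int:
    """Return 1 (weakly labeled anomaly) if at least `min_votes` rules fire, else 0."""
    votes = sum(1 for rule in rules if rule(record))
    return 1 if votes >= min_votes else 0


records = [
    {"amount": 120.0, "hour": 14, "failed_logins": 0},
    {"amount": 25_000.0, "hour": 3, "failed_logins": 0},
    {"amount": 80.0, "hour": 2, "failed_logins": 5},
    {"amount": 15_000.0, "hour": 11, "failed_logins": 0},
]

# Weak labels derived purely from the rules; they are noisy by construction.
weak_labels = [weak_anomaly_label(r) for r in records]
print(weak_labels)
```

Requiring agreement between several rules trades recall for precision; labels produced this way are inexact, so they are best treated as weak signals rather than ground truth.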

Deep weakly-supervised anomaly detection aims to leverage deep neural networks to learn anomaly-informed detection models from weakly-supervised anomaly signals, e.g., partially, inexactly, or inaccurately labeled anomaly data. This labeled data provides important knowledge about anomalies and can be a major driving force for lifting detection recall rates. One exciting opportunity is to use a small number of accurately labeled anomaly examples to enhance detection models, since such examples are often available in real-world applications, e.g., intrusions or frauds flagged by deployed detection systems or end-users and verified by human experts. However, since anomalies can be highly heterogeneous, there can be unknown/novel anomalies that lie beyond the span of the given anomaly examples. Thus, one important direction here is unknown anomaly detection, in which we aim to build detection models that generalize from the limited labeled anomalies to unknown anomalies.
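One way a few labeled anomalies can shape an anomaly-scoring model is through a margin-style objective. The sketch below is a hedged illustration in the spirit of deviation-type losses from the literature, not a method this post specifies: unlabeled examples are treated as (mostly) normal and their scores are pulled toward a reference distribution, while labeled anomalies are pushed at least `margin` standard deviations above it. The function name, the standard-normal reference, and the default `margin` are assumptions of this example, and the scores are passed in directly rather than produced by a network.

```python
import random
import statistics
from typing import List


def deviation_loss(scores: List[float], labels: List[int], margin: float = 5.0,
                   n_ref: int = 5000, seed: int = 0) -> float:
    """Mean margin-style loss over a batch of anomaly scores.

    labels: 1 = labeled anomaly, 0 = unlabeled (assumed mostly normal).
    """
    # Reference scores drawn from a standard normal prior (an assumption).
    rng = random.Random(seed)
    ref = [rng.gauss(0.0, 1.0) for _ in range(n_ref)]
    mu, sigma = statistics.fmean(ref), statistics.stdev(ref)

    total = 0.0
    for score, label in zip(scores, labels):
        dev = (score - mu) / sigma  # deviation from the reference, in std units
        if label == 1:
            total += max(0.0, margin - dev)  # anomalies should exceed the margin
        else:
            total += abs(dev)                # unlabeled data should stay near the reference mean
    return total / len(scores)


labels = [0, 0, 0, 1]
good_scores = [0.1, -0.2, 0.0, 6.0]  # anomaly clearly separated
bad_scores = [3.0, -2.5, 4.0, 0.0]   # anomaly indistinguishable from the rest

print(deviation_loss(good_scores, labels))
print(deviation_loss(bad_scores, labels))
```

A scoring function that separates the labeled anomaly from the unlabeled data incurs a much smaller loss than one that does not, which is the signal a network would be trained on in a real setup.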

Detecting anomalies that belong to the same classes as the given anomaly examples can be as important as detecting novel/unknown anomalies. Thus, another important direction is to develop data-efficient anomaly detection, or few-shot anomaly detection, in which we aim to learn highly expressive representations of the known anomaly classes given only limited anomaly examples. It should be noted that the limited anomaly examples may come from different anomaly classes and thus exhibit completely different manifold/class features. This scenario is fundamentally different from general few-shot learning, in which the limited examples are class-specific and assumed to share the same manifold/class structure.

Large-scale Normality Learning

Large-scale unsupervised/self-supervised representation learning has achieved tremendous success in enabling downstream learning tasks. This is particularly important for learning tasks in which it is difficult to obtain sufficient labeled data, such as anomaly detection. The goal is to learn transferable pre-trained representation models from large-scale unlabeled data in an unsupervised/self-supervised mode and then fine-tune detection models in a semi-supervised mode. Self-supervised classification-based anomaly detection methods may provide some initial sources of supervision for normality learning. However, precautions must be taken to ensure that (i) the unlabeled data is free of anomaly contamination and/or (ii) the representation learning methods are robust w.r.t. possible anomaly contamination. This is because most methods implicitly assume that the training data is clean and does not contain any noise/anomaly instances. This robustness is important in both the pre-training and the fine-tuning stages. Additionally, anomalies and datasets in different domains vary significantly, so large-scale normality learning may need to be domain/application-specific.
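The contamination concern can be illustrated with a deliberately tiny example. The sketch below is a toy stand-in, not a representation learning method: it estimates a scalar "normal profile" from unlabeled data that contains a few anomalies, once naively and once with iterative trimming of the points that fit worst. The function name `trimmed_center`, the `n_drop` parameter, and the data are all assumptions made for illustration; in a deep model the analogous idea would be to down-weight or discard the highest-loss training samples.

```python
import statistics
from typing import List


def trimmed_center(values: List[float], n_drop: int = 2, n_iter: int = 5) -> float:
    """Estimate the center of the normal data while ignoring the `n_drop` worst-fitting points."""
    center = statistics.median(values)  # robust starting point
    n_keep = max(1, len(values) - n_drop)
    for _ in range(n_iter):
        # Keep only the points closest to the current estimate, then refit.
        kept = sorted(values, key=lambda v: abs(v - center))[:n_keep]
        center = statistics.fmean(kept)
    return center


clean = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 10.1,
         9.9, 10.0, 10.2, 9.8, 10.0, 10.1, 9.9, 10.0, 10.0]
contaminated = clean + [55.0, 60.0]  # two anomalies hidden in the "unlabeled" data

naive = statistics.fmean(contaminated)             # pulled toward the anomalies
robust = trimmed_center(contaminated, n_drop=2)    # stays near the normal data

print(naive, robust)
```

The naive estimate drifts well away from the normal data, while the trimmed estimate stays close to it; the catch is that `n_drop` encodes an assumption about the contamination rate, which is rarely known in practice.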

If you find this summary of the survey paper interesting and helpful, you can read the full paper [1] for details.

[1] Guansong Pang, Chunhua Shen, Longbing Cao, Anton van den Hengel. “Deep Learning for Anomaly Detection: A Review”. 2020. arXiv preprint: 2007.02500.
