Entropy application in the Stock Market


This post was originally published by Marco Cerliani at Towards Data Science

Many definitions and formulations of entropy exist. What they share is that entropy measures information, surprise, or uncertainty about the possible outcomes of an experiment. Shannon entropy, in particular, is the variant used most frequently in statistics and machine learning, so it is the focus of our attention here.

Surprise and uncertainty are everyday concepts in financial markets, so using entropy as an instrument to explore the market is an appealing idea. We expect to reveal a meaningful relationship between this new measure and the volatility of asset prices over time.

Given our aims, it is worth introducing the approach and considerations presented in this work. Its authors introduced the concept of Structural Entropy and used it to monitor a correlation-based network over time, with an application to financial markets.

For our analysis, we use daily closing prices from a dataset collected on Kaggle. It contains 32 stocks, from different market sectors, that were traded continuously from 2000 to 2018. For each stock in the dataset, we derive a time series of its daily log-returns. Log-differencing the prices (log-returns) tends to produce stationary, approximately normally distributed signals suitable for our purposes.
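The log-return derivation can be sketched in a few lines of pandas. The tickers and price values below are hypothetical stand-ins for the Kaggle data, which is not reproduced here.

```python
import numpy as np
import pandas as pd

# Toy closing prices standing in for the Kaggle dataset (hypothetical values).
prices = pd.DataFrame(
    {"AAPL": [100.0, 101.5, 99.8, 102.3],
     "XOM":  [50.0, 50.5, 49.9, 50.2]},
    index=pd.date_range("2000-01-03", periods=4, freq="B"),
)

# Daily log-returns: r_t = log(p_t) - log(p_{t-1}).
log_returns = np.log(prices).diff().dropna()
```

The first row is dropped because differencing leaves no return for the first trading day.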

We start exploring the presence of turbulence in the data with conventional measurements. Volatility is a statistical measure of the dispersion of returns for a given security or market index. It reflects the level of uncertainty or risk associated with the size of changes in the market. A high volatility level corresponds to a wide range of fluctuations in stock prices, meaning that the price of an asset can change dramatically over a short period in either direction. A lower volatility means that the asset's value does not fluctuate dramatically and tends to be steadier.
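A common way to compute this is a rolling standard deviation of the log-returns, aggregated across stocks with a median, which matches the per-stock and aggregate curves in the figure below. The simulated returns and the 30-day window are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Simulated daily log-returns for 4 stocks (a stand-in for the 32 real series).
returns = pd.DataFrame(rng.normal(0.0, 0.01, size=(500, 4)),
                       columns=["A", "B", "C", "D"])

window = 30  # trading days (assumed window length)
# Per-stock rolling volatility: std of returns within each window (the "blue" lines).
rolling_vol = returns.rolling(window).std()
# Cross-sectional median as a single aggregate market signal (the "red" line).
median_vol = rolling_vol.median(axis=1)
```

The first `window - 1` values are NaN because a full window of returns is not yet available.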

Figure: individual (blue) and median-aggregated (red) statistics generated with a sliding-window approach.

In our case, high-volatility periods are registered in the early years of the new millennium (the dot-com bubble), after 2008 (the global financial crisis), and in some subsequent periods.

One particularly interesting idea is the representation of financial markets as correlation-based networks. Here, network nodes are financial assets and network edges are interactions between them, where an interaction is typically measured by the magnitude of price correlations over time. Representing financial markets as networks is valuable for identifying turbulence or structural breaks.

Given such a network and its community structure, Structural Entropy is a measure that quantifies the level of structural diversity in the network. In this framework, it refers to the heterogeneity of nodes, under the premise that nodes sharing functionality or attributes are more densely connected to each other than to the rest of the network.

To calculate the Structural Entropy in a given time range, we need to follow a defined workflow:

  • Measure the pairwise Pearson correlations of the return series, yielding an N×N symmetric matrix.
  • Build an adjacency matrix representing the network edges. A standard approach is to apply a threshold to the correlation matrix to decide which entries become edges.
  • Apply a community detection algorithm to the adjacency matrix (connected components in our case).
  • Use the resulting labels (a vector of integers) from the clustering procedure to compute the classical Shannon entropy. More specifically, we compute the entropy of the cluster-size frequencies. The resulting value is the Structural Entropy of the network.
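The steps above can be sketched as a single function. The function name and the correlation threshold are assumptions (the appropriate threshold depends on the data); the pipeline itself follows the workflow just described.

```python
import numpy as np
from scipy.sparse.csgraph import connected_components
from scipy.stats import entropy

def structural_entropy(returns_window, threshold=0.5):
    """Structural Entropy of one window of log-returns (threshold is an assumed value).

    returns_window: array of shape (T, N), one column per stock.
    """
    corr = np.corrcoef(returns_window, rowvar=False)   # N x N Pearson correlation matrix
    adj = (np.abs(corr) >= threshold).astype(int)      # threshold -> adjacency matrix
    np.fill_diagonal(adj, 0)                           # drop self-loops
    _, labels = connected_components(adj, directed=False)  # community detection
    _, counts = np.unique(labels, return_counts=True)  # cluster-size frequencies
    return entropy(counts / counts.sum(), base=2)      # Shannon entropy in bits

# Usage: two perfectly correlated pairs -> two clusters of size 2 -> 1 bit of entropy.
rng = np.random.default_rng(1)
x, y = rng.normal(size=100), rng.normal(size=100)
h = structural_entropy(np.column_stack([x, x, y, y]), threshold=0.9)
```

A fully connected market (one giant cluster) gives entropy 0, while N isolated stocks give the maximum, log2(N).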

Putting these steps together in a sliding-window procedure, we can monitor the dynamics of the system of interest over time. The result is a single new time series of Structural Entropy values.
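A minimal self-contained sketch of the sliding-window procedure follows, using simulated returns; the window length, threshold, and data are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from scipy.sparse.csgraph import connected_components
from scipy.stats import entropy

def structural_entropy(window_values, threshold=0.5):
    # Correlation matrix -> thresholded adjacency -> connected components -> Shannon entropy.
    corr = np.corrcoef(window_values, rowvar=False)
    adj = (np.abs(corr) >= threshold).astype(int)
    np.fill_diagonal(adj, 0)
    _, labels = connected_components(adj, directed=False)
    _, counts = np.unique(labels, return_counts=True)
    return entropy(counts / counts.sum(), base=2)

rng = np.random.default_rng(0)
returns = pd.DataFrame(rng.normal(0.0, 0.01, size=(300, 8)))  # simulated log-returns

window = 60  # assumed window length, in trading days
se = pd.Series(
    [structural_entropy(returns.values[t - window:t]) for t in range(window, len(returns) + 1)],
    index=returns.index[window - 1:],  # each value dated at the window's last day
)
```

Each point of `se` summarizes the network structure of the preceding `window` trading days, so the series can be plotted alongside the rolling volatility for comparison.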
