Data analysis without programming

towards-data-science

This post was originally published by Victor Silva at Towards Data Science

To start, I will download historical data about the SPY from Yahoo Finance. To exactly reproduce this experiment, you will need to use data from January 28th, 1993 to October 3rd, 2020, you can download data here. If you use data from other sources or other periods, you might obtain slightly different results.

Screenshot of the SPY data in Excel

After opening the .csv file that you just downloaded, your data should look like the one in the image on the left. We will only use one field from this data: Adj Close. Feel free to delete columns A, B, C, D, E, and G by clicking on the header and clicking delete.

The next step is to calculate the percentual price change (R). The formula to do so is R = (Close-Open)/Open.

Excel implementation of percentual price change

I have included the excel formula in the image on the left. In lay terms, we are calculating how much the price changed from yesterday to today.

Reproduce your formula for the whole column B, calculating the values for 6971 days of data that we have collected. Next, we will identify the unique values contained in the collection of changes that we have just computed. We will copy and paste the whole of column B into column E, and we will use the “Remove duplicates” tool, under the Data menu.

Unique change values of the SPY over the years.

We already have some interesting insight about the SPY price changes. We know that the maximum that it has changed in a single day was +15%. We also found that the maximum devaluation was -11%. These insights bring an interesting question: “How often does each of these changes happen?”. If answered, this question can give us even more insights about what the daily change patterns of the SPY are each day.

Fortunately, Excel gives us the formula to count how many of each of these values have occurred. The =COUNTIF(RANGE, CONDITION) formula counts how many times in the RANGE, the condition was met.

Counting how many times the value -11% happened

I demonstrate the usage of the COUNTIF formula in the image on the left. There, I count the occurrences of -11%.

If we apply the formula to all values, we will have the counts for each value in the data that we have just collected. We can compute the percentual frequency of each event. With that data, we can produce a histogram plot.

Price change occurrence probabilities

Now we have even more insights:

  • There is a nearly 50% probability that the price will not change at all.
  • The probability of the price going up by 1% is higher than the probability of the price going down by 1%

There is room for error in this analysis: the probability of price increasing by 1% is an isolate probability. That means that we have just calculated the changes in the price increasing by exactly 1%. More often, we are interested in knowing then the price will be increasing by at least 1%. Similarly, we are interested in knowing when the price will decrease by at least 1%.

To calculate these aggregate values, we can simply sum the probabilities of the associated events. For example, to calculate the probability of the price decreasing by at least 10% we can sum the probability of the price decreasing 11% and the probability of the price decreasing by 10%. If we fill the chart with those calculations, we create a Spreadsheet demonstrating Isolate and Aggregate probabilities.

We see that the actual probability of the price decreasing by any percentage is 23.3%, while the probability of the price increasing by any percentage is 28.4%. We obtained these values by summing up the probabilities of change smaller than zero and the probabilities of change greater than 0, respectively. To improve our data visualization, we have also applied a colour scale from the Conditional Formatting menu.

Spread the word

This post was originally published by Victor Silva at Towards Data Science

Related posts