We have already seen how time series models such as ARIMA can be used to make time series forecasts. While these models can achieve a high degree of accuracy, they have one major shortcoming – they do not account for “shocks”, or sudden changes in a time series. Let’s see how we can potentially alleviate this problem using a model known as the Kalman Filter.
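To make the idea concrete, here is a minimal one-dimensional Kalman filter sketch in Python, assuming a simple random-walk state model; the noise variances and the example data (including the hypothetical "shock" value) are illustrative, not from any real dataset.

```python
import numpy as np

def kalman_filter_1d(observations, process_var=1e-2, obs_var=1.0):
    """Minimal 1-D Kalman filter for a random-walk state model.

    process_var and obs_var are assumed noise variances; in practice
    they would be tuned or estimated from the data.
    """
    estimate, error = observations[0], 1.0  # initialise from the first observation
    estimates = [estimate]
    for z in observations[1:]:
        # Predict: a random walk, so the state carries over with added noise.
        error += process_var
        # Update: blend prediction and new observation by the Kalman gain.
        gain = error / (error + obs_var)
        estimate += gain * (z - estimate)
        error *= (1 - gain)
        estimates.append(estimate)
    return np.array(estimates)

noisy = np.array([10.0, 10.2, 9.9, 15.0, 10.1])  # 15.0 acts as a "shock"
print(kalman_filter_1d(noisy))
```

Because the Kalman gain weighs each new observation against the running estimate, a sudden shock moves the filtered value only partway toward the outlier rather than being ignored entirely.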
Panel data, along with cross-sectional and time series data, is one of the main data types that we encounter when working with regression analysis.
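A quick illustration of the distinction, sketched in Python with pandas and an invented toy dataset: a panel observes the same entities over multiple periods, so slicing it one way gives a cross-section and the other way gives a time series.

```python
import pandas as pd

# Toy panel: the same firms observed over multiple years (values are made up).
panel = pd.DataFrame({
    "firm":  ["A", "A", "B", "B"],
    "year":  [2015, 2016, 2015, 2016],
    "sales": [100, 110, 200, 190],
}).set_index(["firm", "year"])

# One year's slice is a cross-section; one firm's slice is a time series.
cross_section = panel.xs(2015, level="year")
time_series = panel.xs("A", level="firm")
print(cross_section)
print(time_series)
```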
It is often the case that a dataset contains significant outliers – observations that fall far outside the range of the majority of observations in the dataset. Let us see how we can use robust regression to deal with this issue.
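As a rough sketch of the idea in Python, the following fits a line by iteratively reweighted least squares with Huber weights, so a gross outlier is downweighted rather than allowed to drag the fit. The data are synthetic and this is a simplified estimator, not a production implementation.

```python
import numpy as np

def huber_irls(x, y, delta=1.345, iters=50):
    """Robust line fit via iteratively reweighted least squares (Huber weights).

    delta is the usual Huber tuning constant; this is a simple sketch,
    not a substitute for a tested robust-regression library.
    """
    X = np.column_stack([np.ones_like(x), x])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # start from the OLS fit
    for _ in range(iters):
        resid = y - X @ beta
        scale = np.median(np.abs(resid)) / 0.6745 or 1.0  # robust scale estimate
        r = resid / scale
        # Huber weights: 1 inside the threshold, downweighted outside it.
        w = np.where(np.abs(r) <= delta, 1.0, delta / np.abs(r))
        beta = np.linalg.solve(X.T @ np.diag(w) @ X, X.T @ np.diag(w) @ y)
    return beta

x = np.arange(10, dtype=float)
y = 2.0 * x + 1.0
y[9] += 50.0  # one gross outlier
print(huber_irls(x, y))  # the slope stays near 2 despite the outlier
```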
A variance-covariance matrix displays the variance of each variable along its diagonal, while the off-diagonal entries show the covariances between every pair of variables.
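A small Python example with made-up data makes the layout explicit:

```python
import numpy as np

# Three observations of two variables; np.cov expects one variable per row.
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])  # perfectly correlated with x

cov = np.cov(np.vstack([x, y]))  # 2x2 variance-covariance matrix
print(cov)
# cov[0, 0] and cov[1, 1] are the variances of x and y (1.0 and 4.0);
# cov[0, 1] = cov[1, 0] is their covariance (2.0).
```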
Sentiment analysis is a useful way of gauging group opinion on a certain topic at a particular point in time.
Using social media data, let us see how we can use the twitteR library to stream tweets from Twitter and conduct a sentiment analysis to determine current sentiment on gold prices.
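The twitteR library handles the R side of streaming; since pulling live tweets needs API credentials, here is just the scoring step sketched in Python, using hypothetical word lists and invented example tweets – real analyses would use a curated sentiment lexicon.

```python
# Hypothetical positive/negative word lists for illustration only.
POSITIVE = {"bullish", "gain", "rally", "strong"}
NEGATIVE = {"bearish", "loss", "crash", "weak"}

def sentiment_score(tweet):
    """Score one tweet: +1 per positive word, -1 per negative word."""
    words = tweet.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

tweets = [
    "Gold looks bullish after a strong rally",
    "Expecting a crash in gold, very bearish",
]
scores = [sentiment_score(t) for t in tweets]
print(scores)  # positive tweets score above zero, negative ones below
```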
In conducting probability analysis, the two parameters that determine the chance of an event happening are N (the number of trials) and λ (lambda – the probability of occurrence in a single trial). When we talk about a cumulative binomial probability distribution, we mean that the greater the number of trials, the higher the probability of observing at least one occurrence of the event.
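The relationship can be sketched directly in Python (writing p for the per-trial probability that the text calls λ): the probability of at least one success is 1 − (1 − p)^N, which rises with N, and the cumulative distribution sums the binomial probability mass up to a given count.

```python
from math import comb

def prob_at_least_one(n, p):
    """P(at least one success in n independent trials, per-trial probability p)."""
    return 1.0 - (1.0 - p) ** n

def binomial_cdf(k, n, p):
    """Cumulative binomial probability P(X <= k)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

# The chance of at least one occurrence grows with the number of trials.
for n in (5, 10, 20):
    print(n, round(prob_at_least_one(n, 0.1), 3))
```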
The purpose of the plyr and dplyr libraries in R is to make data manipulation quick and intuitive.
As we’ve seen in a previous post, there are various methods of wrangling and summarising data in R. However, wouldn’t it be great if we had some libraries that can greatly simplify this process for us?
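plyr and dplyr are R libraries, but the split-apply-combine pattern they simplify is easy to show with a rough pandas analogue in Python – the dataset here is invented, and the dplyr pipeline in the comment is just the equivalent R idiom.

```python
import pandas as pd

# Toy dataset, analogous to what one might summarise in R with
#   df %>% group_by(region) %>% summarise(mean_sales = mean(sales))
df = pd.DataFrame({
    "region": ["east", "east", "west", "west"],
    "sales":  [10, 30, 20, 40],
})

summary = df.groupby("region", as_index=False)["sales"].mean()
print(summary)
```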
ARIMA (Autoregressive Integrated Moving Average) is a major tool used in time series analysis to attempt to forecast future values of a variable based on its past values. For this particular example, I use a stock price dataset of Johnson & Johnson (JNJ) from 2006 to 2016, and use the aforementioned model to conduct price forecasting on this time series.
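To give a feel for the autoregressive component at the heart of ARIMA, here is a deliberately simplified AR(1) fit-and-forecast sketch in Python on synthetic data – it stands in for the JNJ series and is not a full ARIMA implementation, which would also handle differencing and moving-average terms.

```python
import numpy as np

def fit_ar1(series):
    """Fit y[t] = c + phi * y[t-1] by ordinary least squares."""
    y_lag, y = series[:-1], series[1:]
    X = np.column_stack([np.ones_like(y_lag), y_lag])
    c, phi = np.linalg.lstsq(X, y, rcond=None)[0]
    return c, phi

def forecast_ar1(series, steps, c, phi):
    """Iterate the fitted recurrence forward to forecast future values."""
    out, last = [], series[-1]
    for _ in range(steps):
        last = c + phi * last
        out.append(last)
    return out

# Synthetic AR(1) series standing in for a (stationary) price series.
rng = np.random.default_rng(0)
y = [50.0]
for _ in range(200):
    y.append(5.0 + 0.9 * y[-1] + rng.normal(scale=0.5))
y = np.array(y)

c, phi = fit_ar1(y)
print(round(phi, 2), forecast_ar1(y, 3, c, phi))
```

Since the data were generated with an autoregressive coefficient of 0.9, the fitted phi should land close to that value, and the forecasts simply roll the recurrence forward from the last observation.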
PostgreSQL is a commonly used relational database system for creating and managing large amounts of data effectively.
Here, you will see how to:
- create a PostgreSQL database using the Linux terminal
- connect the PostgreSQL database to R using the “RPostgreSQL” library, and to Python using the “psycopg2” library
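For the Python side, the connection step can be sketched as follows; the database parameters here are hypothetical, and the actual `psycopg2.connect` call (shown commented out) requires the psycopg2 library and a running PostgreSQL server.

```python
# Hypothetical connection parameters -- adjust to your own setup.
DB_PARAMS = {"dbname": "mytestdb", "user": "postgres",
             "host": "localhost", "port": 5432}

def make_dsn(params):
    """Build a libpq-style connection string from a parameter dict."""
    return " ".join(f"{key}={value}" for key, value in params.items())

# With psycopg2 installed and the server running, connecting would look like:
#   import psycopg2
#   conn = psycopg2.connect(make_dsn(DB_PARAMS))
print(make_dsn(DB_PARAMS))
```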
Functions are used to simplify a series of calculations.
For instance, suppose there exists an array of numbers, each of which we wish to add to another value. Instead of carrying out a separate calculation for each number in the array, it is much easier to create a function that does this for us automatically.
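In Python, that function might look like this (the name and example values are just for illustration):

```python
def add_to_each(numbers, value):
    """Add `value` to every element of `numbers` in one call."""
    return [n + value for n in numbers]

# One function call replaces four separate additions.
print(add_to_each([1, 2, 3, 4], 10))  # [11, 12, 13, 14]
```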