24 Jan 21

Example of prediction (forecasting) recipe

Let us assume we want to predict a country’s Gross National Income (GNI) in dollars. What kind of data do we need for the algorithm to work properly (i.e. to identify patterns) and make a prediction?

In the CSV (or Excel) file we created, we selected certain agricultural products to help us predict a country’s GNI. We considered that the production of goods is related to GNI and we attempted to build a model based on that. We did not know in advance whether a particular product would be directly related to GNI and whether it would play a greater role in forecasting.

Steps:

1. Go to the Alpha + Omega (A+Ω) page and select the forecasting recipe (after registering on the platform).

This image has an empty alt attribute; its file name is image-8-1170x541.png — We add at the blank spaces the values we want our model to take into account for the prediction.

2. Create a project.

3. Upload the CSV/Excel file.

4. See all file data on A+Ω.

5. Click next and select the column for which you want to make a prediction. Let’s suppose you want to predict the GNI for the year 2022. Your file does not contain data on 2022.

6. Insert hypothetical estimates for data in the remaining columns of the file for the year 2022.

7. Click on predict Gross National Income in dollars to view the result of the forecast.

Factors affecting the quality of the model and, subsequently, the result of the forecast include:

• The number of instances. The previous section listed very few (only 6) examples, however, if these 6 are found to be representative of the “truth” then the model will make correct predictions.

• The number of Features (the columns in our file). The more, the better.

• The type of Features. At least some of the Features in our file need to be representative of the “truth”. We usually do not know which ones they are; the Α+Ω recipe identifies them automatically.

• The selection of our file’s Features in relation to what we want to predict. For example, it seems reasonable to assume that grain production is more closely related to a country’s GNI than the annual number of marriages in a country.

•Predictions are not easy to make; they require experimentation and experience. Perhaps the most important aspect of forecasting is data modeling (the type of data we choose as Features/columns in a file):

• The number of columns needed: 2-3 columns will usually not suffice unless we are dealing with a very simple forecast. For example, if we want to predict the number of deaths caused by a pandemic, the number of cases and deaths reported in the previous days is not enough.

• The way data is linked to columns: We need columns that inform us about deaths that occurred in the previous days, e.g. “number of deaths on the previous day, number of intubations on the previous day, number of cases on the previous day,” etc. In other words, we need to adopt a time series approach. – And of course, the more data we have, the better.

If the result of our forecast still troubles us, then:

• We try to add extra Features.

• We add more instances/rows

• or we decide that we cannot make a satisfactory forecast based on the data we have at our disposal, which means we may have to change our topic or seek help from a specialist. The Α+Ω team assists users in modeling data and the topic under investigation efficiently.

Try our free data journalism tools

Try Α + Ω

Do you have a new data journalism project idea but you don’t know from where to start?

Back to: Data Journalism