21 Apr 20

Most common mistakes with Excel / CSV files

When using the Alpha+Omega (A+Ω) application to analyze data:

  1. Make sure you upload Excel or CSV files (NO TSV or other types of files).
  2. For CSV files, make sure all values are separated with commas (not dots or semicolons).
  3. Make sure your file has values in all cells. If you do not know a number, or a cell in the file you downloaded is empty, just put 0. If it is text (non-numeric data), put NOT APPLICABLE (or unknown). If the file has empty cells, A+Ω will not be able to analyze your data properly and give you back a reliable result.
  4. Your Excel file needs to have only one sheet, not more.
  5. The column descriptions should only be in the first row of the file.
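Point 3 can also be handled with a short script instead of by hand. The following is only a minimal sketch in Python, not part of A+Ω: the function names are made up for illustration, and it assumes a comma-separated file whose first row holds the column descriptions. A column is treated as numeric when every non-empty cell in it can be read as a number.

```python
import csv
import io

NUMERIC_FILL = "0"
TEXT_FILL = "NOT APPLICABLE"


def is_number(value):
    """Return True if the text can be read as a plain number such as 12 or 3.5."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def fill_empty_cells(csv_text):
    """Return CSV text in which every empty cell has been given a value."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, records = rows[0], rows[1:]

    # A column counts as numeric when every non-empty cell in it is a number.
    numeric = []
    for col in range(len(header)):
        values = [r[col].strip() for r in records if col < len(r) and r[col].strip()]
        numeric.append(bool(values) and all(is_number(v) for v in values))

    filled = [header]
    for record in records:
        # Pad short rows so that every row has one cell per column.
        record = record + [""] * (len(header) - len(record))
        filled.append([
            cell if cell.strip()
            else (NUMERIC_FILL if col < len(numeric) and numeric[col] else TEXT_FILL)
            for col, cell in enumerate(record)
        ])

    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(filled)
    return out.getvalue()
```

For example, `fill_empty_cells("name,age,city\nAnna,34,\nBen,,Athens\n")` puts NOT APPLICABLE in the empty city cell and 0 in the empty age cell.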

A+Ω will work with this kind of file:

[Image: example of a file that A+Ω can analyze]

A+Ω will NOT work with this kind of file:

[Image: example of a file that A+Ω cannot analyze]

6. If all the above conditions are met but you still get an error and A+Ω cannot proceed with the analysis, check your CSV / Excel file again. Values might have been mistakenly split across two rows instead of one. The best way to check whether your file meets the above conditions is to open it with Google Sheets. It will be easier to go over the file and spot a mistake in the cells.
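For long files, a row that was split in two can be hard to spot by eye. As a rough sketch (the function name is hypothetical, and it assumes a comma-separated file with the column descriptions in the first row), this Python snippet lists every row whose number of cells differs from the header's:

```python
import csv
import io


def find_broken_rows(csv_text):
    """Return (line_number, cell_count) for each row that does not match the header."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    problems = []
    for row in reader:
        if not row:
            continue  # skip completely blank lines
        if len(row) != len(header):
            # reader.line_num is the line of the file on which this row ended.
            problems.append((reader.line_num, len(row)))
    return problems
```

A record that was broken after its second value shows up as two consecutive short rows, which tells you where to look in the spreadsheet.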

7. Make sure all numerical data in a column that includes commas or dots is written in the same way, and NOT 9,821.41 in one cell and 4.789,5 in the cell below.
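A small script can flag columns that mix the two ways of writing numbers. This is a sketch under stated assumptions, not how A+Ω itself reads numbers: the names `number_style` and `mixed_style_columns` are invented here, and values such as 1,234 that could be read either way are deliberately left unclassified.

```python
import re

NUMBER_LIKE = re.compile(r"[+-]?\d[\d.,]*")


def number_style(value):
    """Classify how a number is written.

    Returns "dot" when the dot is the decimal mark (9,821.41),
    "comma" when the comma is the decimal mark (4.789,5),
    and None when the text is not a number or the style cannot be told.
    """
    text = value.strip()
    if not NUMBER_LIKE.fullmatch(text):
        return None
    last_dot, last_comma = text.rfind("."), text.rfind(",")
    if last_dot == -1 and last_comma == -1:
        return None  # plain integer, fits either style
    if last_dot != -1 and last_comma != -1:
        # With both separators present, the later one is the decimal mark.
        return "dot" if last_dot > last_comma else "comma"
    sep = "." if last_dot != -1 else ","
    digits_after = len(text) - text.rfind(sep) - 1
    if text.count(sep) == 1 and digits_after == 3:
        return None  # 1,234 or 1.234 could be read either way
    if text.count(sep) > 1:
        # A repeated separator can only be grouping thousands.
        return "comma" if sep == "." else "dot"
    return "dot" if sep == "." else "comma"


def mixed_style_columns(header, records):
    """Return the names of columns that contain both number styles."""
    mixed = []
    for col, name in enumerate(header):
        styles = {number_style(r[col]) for r in records if col < len(r)}
        if {"dot", "comma"} <= styles:
            mixed.append(name)
    return mixed
```

Here `header` is the first row of the file as a list and `records` is the list of remaining rows, for example as produced by Python's `csv.reader`.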

8. Make sure your dataset contains a maximum of 10,000 lines (records) and a maximum of 100 columns.
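These limits are easy to check before uploading. The sketch below is illustrative only: the function name is hypothetical, and it assumes that the header row does not count as a record, which the limits above do not state explicitly.

```python
import csv
import io

MAX_RECORDS = 10_000
MAX_COLUMNS = 100


def check_size(csv_text):
    """Return a list of messages describing which size limits are exceeded."""
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    problems = []
    if not rows:
        return problems
    record_count = len(rows) - 1  # assumption: the header row is not a record
    column_count = max(len(row) for row in rows)
    if record_count > MAX_RECORDS:
        problems.append(f"{record_count} records, maximum is {MAX_RECORDS}")
    if column_count > MAX_COLUMNS:
        problems.append(f"{column_count} columns, maximum is {MAX_COLUMNS}")
    return problems
```

An empty list means the file is within both limits.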

9. When you upload an Excel file to A+Ω, you may sometimes see the calculation formula and not the result.

In that case, it is better to save your file as CSV. For the moment we do not support parsing of formulas. But when you save the file as CSV, each Excel formula is replaced by its result.

Be careful! Sometimes, although you choose to save an Excel file as CSV, when you open it again you realize that it was not saved as a proper CSV: the values are separated with semicolons instead of commas (which is another type of file). The best way to save an Excel file as CSV is through Google Sheets.
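If you already have a file that was saved with semicolons, it can also be converted with a few lines of Python. This is a minimal sketch with made-up function names. It guesses the separator from the first line only, which is an assumption that will not hold for every file.

```python
import csv
import io


def detect_delimiter(csv_text):
    """Guess the separator by counting semicolons and commas in the first line."""
    first_line = csv_text.splitlines()[0]
    return ";" if first_line.count(";") > first_line.count(",") else ","


def to_comma_separated(csv_text):
    """Rewrite semicolon-separated text as comma-separated text."""
    delimiter = detect_delimiter(csv_text)
    rows = csv.reader(io.StringIO(csv_text), delimiter=delimiter)
    out = io.StringIO()
    # The writer puts quotes around any value that itself contains a comma,
    # so a number written as 4,5 stays in one cell.
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()
```

Text that is already comma-separated passes through unchanged apart from normalized line endings.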

10. A+Ω cannot compute a correlation between numbers and words. When you use the correlation recipe, try to correlate values of the same type.
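Before running the correlation recipe, you can check whether two columns hold the same type of values. The snippet below is only a sketch of such a check, with hypothetical function names. It says nothing about how A+Ω computes the correlation itself.

```python
def is_number(value):
    """Return True if the text can be read as a plain number such as 12 or 3.5."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def column_type(values):
    """Return "number" if every value is numeric, "text" if none is, else "mixed"."""
    flags = [is_number(v) for v in values]
    if all(flags):
        return "number"
    if not any(flags):
        return "text"
    return "mixed"


def same_type(column_a, column_b):
    """Return True when both columns hold one and the same type of values."""
    type_a, type_b = column_type(column_a), column_type(column_b)
    return type_a == type_b and type_a != "mixed"
```

A column reported as "mixed" usually points to a stray word or an unusually written number somewhere in a numeric column.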

11. A journalist using the recipes needs to understand that, for a reliable result, they need as much data as possible. But even if the journalist finds more data and runs the recipe again, the recipe will not necessarily work better. Strange? No! It is exactly the same as when a journalist has dedicated a lot of time to an investigation and has NOT found the desired evidence / results. They then need to decide whether to look for more sources or to stop. If they decide to search for more evidence, there is no guarantee that they will find something. But if they do not search further, they will certainly not find anything.

To get a reliable result, especially from the prediction model, you need an Excel / CSV file with a lot of data, that is to say, at least 100 rows of information. Why? Imagine the algorithm of the prediction recipe as a cook:

When you give good ingredients (data) to the cook (the algorithm of the prediction recipe of A+Ω) in order to make a delicious dish (a good prediction result), a prerequisite is that the cook has cooked many times in their life (these are the rows in your file: each row is a dish once made and how tasty it was). Each dish needs some ingredients, and in the end it is up to whoever tastes it to say how delicious it was. Now, if you give the same ingredients to two cooks (two prediction recipes), and the first has cooked 100 dishes in their life and the second only ten, who is more likely to prepare the better dish?
