I want to go back over statistics starting from zero, and also check how to code it with Python and pandas and graph it.
The basics are the mean, the median, and the mode.
Mean (Average)
The mean helps us find a central tendency of a data set by providing a single value that summarizes the typical value of the data set. It can also help compare two data sets (for example, comparing the average income of two countries).
Problems with the Mean
One of the main problems is that if one value in our data set is very different from the rest (an outlier), it pulls the mean toward it and can give us the wrong idea of what is typical.
Example:
Let's say we are checking the number of episodes in anime. We get the following data set of 10 random anime:
12, 12, 13, 12, 12, 12, 13, 13, 12, 12
The average is 12.3 episodes per anime (123 episodes / 10 anime), so roughly 12.
However, if we add one piece of data to the list:
12, 12, 13, 12, 12, 12, 13, 13, 12, 12, 1049
The average becomes about 106.5 episodes per anime (1172 episodes / 11 anime).
With one outlying data point, we can get the wrong idea of the data set.
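To check the arithmetic, here is a minimal sketch using Python's built-in statistics module (not pandas yet). The variable names episodes and with_outlier are just picked for this example.

```python
from statistics import mean

# Episode counts of the 10 anime from the example
episodes = [12, 12, 13, 12, 12, 12, 13, 13, 12, 12]

# Same list with the single long-running outlier added
with_outlier = episodes + [1049]

mean_without = mean(episodes)      # 123 / 10
mean_with = mean(with_outlier)     # 1172 / 11

print(round(mean_without, 1))  # 12.3
print(round(mean_with, 1))     # 106.5
```

One extra value out of eleven is enough to move the mean from about 12 to over 100.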
Median
The median is the middle value of a sorted data set. There are two cases to consider:
If there is an odd number of terms, the median is the middle value of the sorted list of numbers.
If there is an even number of terms, we find the two middle values of the sorted list and take the average of those two values.
Example
Odd number of terms:
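12, 12, 13, 12, 12, 12, 13, 13, 12, 12, 1049. Sorted, this is 12, 12, 12, 12, 12, 12, 12, 13, 13, 13, 1049. The middle value is the 6th of the 11 terms, so in this case the median is 12.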
Even number of terms:
12, 12, 13, 12, 12, 12, 13, 13, 12, 12. Sorted, this is 12, 12, 12, 12, 12, 12, 12, 13, 13, 13. In this case, we have two numbers in the middle (the 5th and 6th terms), 12 and 12.
Therefore, the median is (12 + 12) / 2 = 12.
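The two cases can be sketched in plain Python. manual_median is a helper name I made up for this example; it is compared against median from the standard statistics module, which should give the same answers.

```python
from statistics import median

def manual_median(values):
    """Median by hand: sort, then take the middle (or average the two middles)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd number of terms: a single middle value
        return ordered[mid]
    # Even number of terms: average of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_list = [12, 12, 13, 12, 12, 12, 13, 13, 12, 12, 1049]
even_list = [12, 12, 13, 12, 12, 12, 13, 13, 12, 12]

print(manual_median(odd_list), median(odd_list))
print(manual_median(even_list), median(even_list))
```

Unlike the mean, the median stays at 12 even with the 1049 outlier in the list.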
Mode
The mode is the value that appears most often in the list.
Example
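12, 12, 13, 12, 12, 12, 13, 13, 12, 12, 1049
If we make a frequency table:
12 appears 7 times
13 appears 3 times
1049 appears 1 time
12 is repeated 7 times, more than any other value, so that is our mode.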
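A frequency table like this can be built with collections.Counter from the standard library. This is only a rough sketch, and names such as frequency and most_common_value are chosen for the example.

```python
from collections import Counter
from statistics import mode

episodes = [12, 12, 13, 12, 12, 12, 13, 13, 12, 12, 1049]

# Count how many times each value appears
frequency = Counter(episodes)

# Print the frequency table, smallest value first
for value, count in sorted(frequency.items()):
    print(value, count)

# The mode is the value with the highest count
most_common_value, most_common_count = frequency.most_common(1)[0]
print(most_common_value, most_common_count)

# The statistics module gives the same answer directly
print(mode(episodes))
```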
These are the first three concepts. In the next post I will look into Python and pandas to explore a data set.