Imagine trying to make decisions based on a massive dataset with thousands of numbers. Without a way to break down and summarize that data, it is nearly impossible to draw meaningful insights. According to a report from Statista, over 149 zettabytes of data was generated in 2024 and it’s expected to reach 394 zettabytes by 2028.
Surprisingly, much of the generated data goes unorganized and underanalyzed. This is where descriptive statistics comes in, offering a powerful method to organize, summarize, and present data in a way that makes sense. In this article, we will explore what descriptive statistics is, discuss its different types, and highlight its applications to give you a better understanding of its practical use.
What is Descriptive Statistics?
Descriptive statistics is a branch of statistics that involves summarizing, organizing, and presenting data meaningfully and concisely. It focuses on describing and analyzing a dataset’s main features and characteristics without making any generalizations or inferences about a larger population.
The primary goal of descriptive analysis is to provide a clear and concise summary of the data, enabling researchers or analysts to gain insights and understand patterns, trends, and distributions within the dataset. This summary typically includes measures such as central tendency (e.g., mean, median, mode), dispersion (e.g., range, variance, standard deviation), and shape of the distribution (e.g., skewness, kurtosis).
Descriptive statistics also involves a graphical representation of data through charts, graphs, and tables, which can further aid in visualizing and interpreting the information. Common graphical techniques include histograms, bar charts, pie charts, scatter plots, and box plots.
By employing descriptive statistics, researchers can effectively summarize and communicate the key characteristics of a dataset, facilitating a better understanding of the data and providing a foundation for further statistical analysis or decision-making processes.
Visual Ways to Represent Descriptive Statistics
Descriptive statistics can be made easier to understand by using visual tools. These methods help in spotting trends, patterns, and distributions within the data. Two common ways to represent data visually are:
Frequency Distribution Tables
Frequency distribution tables show how often particular values or ranges of values occur in a dataset. They range from simple tables that list individual values with their frequencies to grouped tables that combine values into intervals.
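As a minimal sketch of both kinds of table, the Python snippet below counts individual values and then counts values per interval. The function names and the sample numbers are hypothetical, chosen only for illustration.

```python
from collections import Counter


def frequency_table(values):
    """Return (value, frequency) pairs sorted by value."""
    return sorted(Counter(values).items())


def grouped_frequency_table(values, start, width):
    """Count values per interval [lower, lower + width), beginning at `start`."""
    bins = Counter((v - start) // width for v in values)
    return [
        ((start + k * width, start + (k + 1) * width), bins[k])
        for k in sorted(bins)
    ]


# Hypothetical survey ratings and test scores, made up for illustration.
ratings = [3, 5, 4, 3, 2, 5, 3]
scores = [71, 75, 82, 88, 90, 94]

for value, count in frequency_table(ratings):
    print(value, count)

for (lower, upper), count in grouped_frequency_table(scores, start=70, width=10):
    print(f"{lower}-{upper}", count)
```

Intervals that contain no values are simply left out of the grouped table in this sketch.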
Graphs and Charts
Graphs and charts present the same information visually, showing percentages, frequencies, and distributions. Examples include bar charts, pie charts, and scatter plots, all of which simplify complex data for better understanding and analysis.
Types of Descriptive Statistics
Data collected through observation, surveys, or experiments is initially in its raw form, which is referred to as ungrouped data. Once this raw data is organized into intervals or categories, it becomes grouped data.
To analyze such data, two primary types of descriptive statistics are used: measures of central tendency and measures of dispersion. These methods identify overall trends and reveal how values are distributed.
1. Measures of Central Tendency
Measures of Central Tendency describe where the center or average of a dataset lies. The three key measures are mean, median, and mode.
Mean
The mean or average is calculated by summing all data points and dividing by the total number of values.
For ungrouped data, the formula is:
Mean (𝑥̄) = Σx / n
Where:
- Σx = sum of all data values
- n = total number of values
For grouped data, where frequencies are involved:
Mean (𝑥̄) = Σ(f × x) / Σf
Where:
- f = frequency of each class
- x = midpoint of each class
- Σf = total frequency
Because each class is represented by its midpoint, this gives an estimate of the mean when only the class intervals and their frequencies are available.
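The two mean formulas can be sketched in Python as follows. The function names and the class data are hypothetical; the grouped version assumes you already have each class midpoint and its frequency.

```python
def mean_ungrouped(values):
    """Mean = sum of all values / number of values."""
    return sum(values) / len(values)


def mean_grouped(midpoints, frequencies):
    """Mean = sum(f * x) / sum(f), with x the class midpoint."""
    weighted_total = sum(f * x for f, x in zip(frequencies, midpoints))
    return weighted_total / sum(frequencies)


# Hypothetical classes 0-10, 10-20, 20-30 with frequencies 2, 5, 3.
midpoints = [5, 15, 25]
frequencies = [2, 5, 3]

print(mean_ungrouped([4, 8, 6]))             # 6.0
print(mean_grouped(midpoints, frequencies))  # 16.0
```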
Median
The median is the middle value in a sorted dataset.
For ungrouped data:
- If n is odd: Median = value at position (n + 1) / 2
- If n is even: Median = average of values at positions n/2 and (n/2 + 1)
For grouped data, the formula is:
Median = L + [(N/2 – CF) / f] × h
Where:
- L = lower boundary of median class
- N = total frequency
- CF = cumulative frequency before the median class
- f = frequency of median class
- h = class width
This helps estimate the central position within grouped data.
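A minimal Python sketch of both median calculations is shown below. It assumes equal-width classes described by their lower boundaries; the function names and the sample classes are hypothetical.

```python
def median_ungrouped(values):
    """Middle value of the sorted data (average of the two middle values if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def median_grouped(lower_bounds, frequencies, width):
    """Median = L + ((N/2 - CF) / f) * h for equal-width classes."""
    total = sum(frequencies)   # N
    cumulative = 0             # CF: cumulative frequency before the current class
    for lower, freq in zip(lower_bounds, frequencies):
        if cumulative + freq >= total / 2:   # this is the median class
            return lower + ((total / 2 - cumulative) / freq) * width
        cumulative += freq
    raise ValueError("frequencies must not be empty")


# Hypothetical classes 0-10, 10-20, 20-30, 30-40.
lower_bounds = [0, 10, 20, 30]
frequencies = [5, 8, 12, 5]

print(median_ungrouped([4, 1, 3, 2]))                    # 2.5
print(median_grouped(lower_bounds, frequencies, width=10))
```

Here N = 30, so the median class is 20-30 (cumulative frequency reaches 25 there), giving 20 + ((15 – 13) / 12) × 10, or about 21.67.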
Mode
The mode is the value that appears most frequently. For ungrouped data, it’s simply the most repeated number.
In grouped data:
Mode = L + [(f₁ – f₀) / (2f₁ – f₀ – f₂)] × h
Where:
- L = lower boundary of modal class
- f₁ = frequency of modal class
- f₀ = frequency of class before modal class
- f₂ = frequency of class after modal class
- h = class width
This formula estimates the mode by interpolating within the modal class, the class with the highest frequency.
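The sketch below illustrates both cases in Python. `statistics.multimode` from the standard library returns every most-frequent value in ungrouped data; `mode_grouped` is a hypothetical helper that applies the grouped formula, assuming equal-width classes and treating a missing neighbouring class as having frequency 0.

```python
import statistics


def mode_grouped(lower_bounds, frequencies, width):
    """Mode = L + ((f1 - f0) / (2*f1 - f0 - f2)) * h for equal-width classes."""
    i = max(range(len(frequencies)), key=frequencies.__getitem__)  # modal class index
    f1 = frequencies[i]
    f0 = frequencies[i - 1] if i > 0 else 0                     # class before
    f2 = frequencies[i + 1] if i + 1 < len(frequencies) else 0  # class after
    denominator = 2 * f1 - f0 - f2
    if denominator == 0:
        raise ValueError("formula is undefined when f1, f0 and f2 are all equal")
    return lower_bounds[i] + ((f1 - f0) / denominator) * width


# Ungrouped data: every value tied for the highest count is returned.
print(statistics.multimode([1, 2, 2, 3, 3]))  # [2, 3]

# Hypothetical classes 0-10, 10-20, 20-30, 30-40.
print(mode_grouped([0, 10, 20, 30], [5, 8, 12, 5], width=10))
```

With these made-up frequencies the modal class is 20-30, so the estimate is 20 + (4 / 11) × 10, or about 23.64.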
2. Measures of Dispersion
Measures of dispersion describe how spread out the data values are. Averages alone can be misleading, so dispersion shows how much the values vary around the center.
Range
The range is the simplest form of dispersion and is calculated as:
Range = Maximum value – Minimum value
While useful for a quick glance, it only considers the extremes.
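Variance
Variance measures the average squared deviation from the mean. For ungrouped data:
Population Variance (σ²) = Σ(x – 𝑥̄)² / n
Sample Variance (s²) = Σ(x – 𝑥̄)² / (n – 1)
Divide by n when the data is the entire population, and by n – 1 when it is a sample used to estimate the population variance.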
For grouped data:
Variance = Σf(x – 𝑥̄)² / Σf
This tells how tightly values cluster around the mean.
Standard Deviation
Standard deviation is the square root of the variance, making it easier to interpret because it returns to the original units:
Standard Deviation (σ or s) = √Variance
A smaller standard deviation indicates values are close to the mean, while a larger one shows they are more spread out.
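Range, variance, and standard deviation can be computed with Python's standard `statistics` module, as in the sketch below. The data values and the `variance_grouped` helper are hypothetical; `pvariance` and `pstdev` divide by n, while `variance` and `stdev` divide by n – 1.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values, mean = 5

value_range = max(data) - min(data)               # Maximum - Minimum
population_variance = statistics.pvariance(data)  # sum((x - mean)^2) / n
sample_variance = statistics.variance(data)       # sum((x - mean)^2) / (n - 1)
population_sd = statistics.pstdev(data)           # sqrt(population variance)
sample_sd = statistics.stdev(data)                # sqrt(sample variance)


def variance_grouped(midpoints, frequencies):
    """Variance = sum(f * (x - mean)^2) / sum(f), with x the class midpoint."""
    total = sum(frequencies)
    mean = sum(f * x for f, x in zip(frequencies, midpoints)) / total
    return sum(f * (x - mean) ** 2 for f, x in zip(frequencies, midpoints)) / total


print(value_range, population_variance, population_sd)
print(round(sample_variance, 3), round(sample_sd, 3))
print(variance_grouped([5, 15, 25], [2, 5, 3]))
```

For this data the squared deviations sum to 32, so the population variance is 32 / 8 = 4 and the population standard deviation is 2, while the sample variance is 32 / 7.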
Mean Deviation
Mean deviation is the average of the absolute deviations from a central value (such as the mean or median).
For ungrouped data:
Mean Deviation = Σ|x – A| / n
Where A is usually the mean, median, or mode.
For grouped data:
Mean Deviation = Σf|x – A| / Σf
It’s less sensitive to extreme values than variance.
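A short Python sketch of the mean deviation formulas follows. The function names and data are hypothetical; the central value A is passed in explicitly so the same function works for the mean, median, or mode.

```python
import statistics


def mean_deviation(values, center):
    """Mean Deviation = sum(|x - A|) / n."""
    return sum(abs(x - center) for x in values) / len(values)


def mean_deviation_grouped(midpoints, frequencies, center):
    """Mean Deviation = sum(f * |x - A|) / sum(f), with x the class midpoint."""
    total = sum(f * abs(x - center) for f, x in zip(frequencies, midpoints))
    return total / sum(frequencies)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values

print(mean_deviation(data, statistics.mean(data)))    # about the mean (5)
print(mean_deviation(data, statistics.median(data)))  # about the median (4.5)
print(mean_deviation_grouped([5, 15, 25], [2, 5, 3], center=16))
```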
Quartile Deviation
Quartile deviation, also known as the semi-interquartile range, focuses on the spread of the middle 50% of data:
Quartile Deviation = (Q₃ – Q₁) / 2
Where Q₁ = first quartile and Q₃ = third quartile
It’s useful when ignoring outliers is necessary for a more reliable analysis.
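The quartile deviation can be sketched with `statistics.quantiles` (Python 3.8+), which returns the three quartile cut points when `n=4`. Note that several conventions for computing quartiles exist; this sketch uses the function's default "exclusive" method, so other tools may give slightly different values on the same data. The function name and data are hypothetical.

```python
import statistics


def quartile_deviation(values):
    """Quartile Deviation = (Q3 - Q1) / 2, using the default quantile method."""
    q1, _q2, q3 = statistics.quantiles(values, n=4)
    return (q3 - q1) / 2


data = [1, 2, 3, 4, 5, 6, 7]  # hypothetical values

print(statistics.quantiles(data, n=4))  # the three quartiles Q1, Q2, Q3
print(quartile_deviation(data))
```

For this data the default method gives Q₁ = 2 and Q₃ = 6, so the quartile deviation is 2.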
Descriptive Statistics Examples
Now that you’re familiar with the types of descriptive statistics, here are two worked examples to make them concrete.
Example 1: Exam Scores
Suppose you have the following scores of 20 students on an exam:
85, 90, 75, 92, 88, 79, 83, 95, 87, 91, 78, 86, 89, 94, 82, 80, 84, 93, 88, 81
To calculate descriptive statistics:
- Mean: Add up all the scores and divide by the number of scores. Mean = (85 + 90 + 75 + 92 + 88 + 79 + 83 + 95 + 87 + 91 + 78 + 86 + 89 + 94 + 82 + 80 + 84 + 93 + 88 + 81) / 20 = 1720 / 20 = 86
- Median: Arrange the scores in ascending order and find the middle value. With 20 scores, the median is the average of the 10th and 11th values. Median = (86 + 87) / 2 = 86.5
- Mode: Identify the score(s) that appear(s) most frequently. Mode = 88 (it appears twice)
- Range: Calculate the difference between the highest and lowest scores. Range = 95 – 75 = 20
- Variance: Calculate the average of the squared differences from the mean. Variance = [(85-86)^2 + (90-86)^2 + … + (81-86)^2] / 20 = 614 / 20 = 30.7
- Standard Deviation: Take the square root of the variance. Standard Deviation = √30.7 ≈ 5.54
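The exam-score calculations can be cross-checked with Python's standard `statistics` module. This is a minimal sketch: the variable names are illustrative, and the population formulas (`pvariance`, `pstdev`) are used because the steps above divide by the number of scores.

```python
import statistics

scores = [85, 90, 75, 92, 88, 79, 83, 95, 87, 91,
          78, 86, 89, 94, 82, 80, 84, 93, 88, 81]

summary = {
    "mean": statistics.mean(scores),          # 86
    "median": statistics.median(scores),      # 86.5
    "mode": statistics.mode(scores),          # 88
    "range": max(scores) - min(scores),       # 20
    "variance": statistics.pvariance(scores), # 614 / 20 = 30.7
    "std_dev": statistics.pstdev(scores),     # about 5.54
}

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```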
Example 2: Monthly Income
Consider a sample of 50 individuals and their monthly incomes:
$2,500, $3,000, $3,200, $4,000, $2,800, $3,500, $4,500, $3,200, $3,800, $3,500, $2,800, $4,200, $3,900, $3,600, $3,000, $2,700, $2,900, $3,700, $3,500, $3,200, $3,600, $4,300, $4,100, $3,800, $3,600, $2,500, $4,200, $4,200, $3,400, $3,300, $3,800, $3,900, $3,500, $2,800, $4,100, $3,200, $3,600, $4,000, $3,700, $3,000, $3,100, $2,900, $3,400, $3,800, $4,000, $3,300, $3,100, $3,200, $4,200, $3,400
To calculate descriptive statistics:
- Mean: Add up all the incomes and divide by the number of incomes. Mean = ($2,500 + $3,000 + … + $3,400) / 50 = $174,500 / 50 = $3,490
- Median: Arrange the incomes in ascending order and find the middle value. With 50 incomes, the median is the average of the 25th and 26th values, which are both $3,500. Median = $3,500
- Range: Calculate the difference between the highest and lowest incomes. Range = $4,500 – $2,500 = $2,000
- Variance: Calculate the average of the squared differences from the mean. Variance = [($2,500-$3,490)^2 + ($3,000-$3,490)^2 + … + ($3,400-$3,490)^2] / 50 = 12,325,000 / 50 = 246,500 (in squared dollars). Dividing by 50 treats the data as the whole population; the sample formula would divide by 49 instead.
- Standard Deviation: Take the square root of the variance. Standard Deviation = √246,500 ≈ $496.49
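The income figures can be checked the same way. The sketch below wraps the calculations in a hypothetical `describe` helper and reports both the population and sample standard deviation, since the text calls the 50 incomes a sample.

```python
import statistics

incomes = [
    2500, 3000, 3200, 4000, 2800, 3500, 4500, 3200, 3800, 3500,
    2800, 4200, 3900, 3600, 3000, 2700, 2900, 3700, 3500, 3200,
    3600, 4300, 4100, 3800, 3600, 2500, 4200, 4200, 3400, 3300,
    3800, 3900, 3500, 2800, 4100, 3200, 3600, 4000, 3700, 3000,
    3100, 2900, 3400, 3800, 4000, 3300, 3100, 3200, 4200, 3400,
]


def describe(values):
    """Return the descriptive statistics used in the worked example."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "range": max(values) - min(values),
        "population_variance": statistics.pvariance(values),  # divides by n
        "population_sd": statistics.pstdev(values),
        "sample_variance": statistics.variance(values),       # divides by n - 1
        "sample_sd": statistics.stdev(values),
    }


for name, value in describe(incomes).items():
    print(f"{name}: {value:,.2f}")
```

A quick sanity check on any result like this: the standard deviation can never exceed the range, so a value larger than $2,000 here would signal an arithmetic slip.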
These calculations provide descriptive statistics that summarize the central tendency and dispersion of the data in these examples.
Univariate vs. Bivariate Statistics
Let’s learn more about the two forms of descriptive statistics:
Univariate Descriptive Statistics
Univariate descriptive statistics examines only one variable at a time and does not compare variables. Instead, it lets the researcher describe each variable individually. Patterns in this kind of data can be described using the following:
- Measures of central tendency (mean, mode, and median)
- Data dispersion (standard deviation, variance, range, minimum, maximum, and quartiles)
- Tables of frequency distribution
- Pie graphs
- Frequency polygons and histograms
- Bar graphs
Bivariate Descriptive Statistics
Bivariate descriptive statistics analyzes two variables together to see whether they are correlated. By convention, when the data is laid out in a table, the columns represent the independent variable and the rows represent the dependent variable.
Bivariate data has numerous real-world applications. For example, estimating when a natural event will occur is quite valuable. Sometimes something as simple as plotting one variable against the other on a two-dimensional plane makes it easier to see what the data is telling you. For example, a scatterplot of the time between eruptions at Old Faithful against the duration of each eruption shows the link between the two.
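One common way to put a number on how strongly two variables move together is the Pearson correlation coefficient. The article does not name a specific measure, so treat this as one illustrative choice. The sketch below computes it by hand; the function name and the paired values are hypothetical and are not real Old Faithful measurements.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: covariance of x and y scaled by their spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    co_deviation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return co_deviation / math.sqrt(ss_x * ss_y)


# Made-up paired observations, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

print(round(pearson_r(x, y), 3))
```

Values near +1 or –1 indicate a strong linear relationship, and values near 0 indicate little or none. The function assumes both variables actually vary, since a constant variable would make the denominator zero.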
What’s the Difference Between Descriptive Statistics and Inferential Statistics?
So, what’s the difference between the two statistical forms? We’ve already touched on this: descriptive statistics does not draw conclusions or make predictions beyond the data, whereas inferential statistics does.
Inferential statistics takes a random sample from a population and uses it to describe and make inferences about the entire population. For instance, after asking 50 people whether they liked the movie they had just seen, inferential statistics would build on those answers to estimate how the moviegoing population in general would respond.
Therefore, if you stood outside that movie theater and surveyed 50 people who had just seen Rocky 20: Enough Already! and 38 of them disliked it (76 percent), you could extrapolate that roughly 76% of the rest of the movie-watching world will dislike it too, even though you haven’t the means, time, or opportunity to ask all those people.
Simply put: Descriptive statistics gives you a clear picture of what your current data shows. Inferential statistics makes projections based on that data.
Conclusion
Descriptive statistics is one of the easiest and most effective ways to get a quick overview of any dataset. It doesn’t predict or explain the reasons behind the numbers; it simply helps you describe what the data shows. Whether you’re working in business, education, healthcare, or any other field, descriptive analysis plays a key role in helping you analyze and communicate data clearly.
Frequently Asked Questions
1. What is the main purpose of descriptive statistics?
The primary objective of descriptive statistics is to effectively summarize and describe the main features of a dataset, providing an overview of the data and helping to identify patterns and relationships within it.
2. Can Descriptive Statistics be used to make inferences or predictions?
Descriptive statistics do not involve making inferences or predictions beyond the data itself. Statistical inference methods are needed to make inferences or predictions about a larger population, which go beyond descriptive statistics and involve estimating parameters and testing hypotheses.
3. Why is descriptive statistics important?
Descriptive statistics is important because it allows us to summarize and describe data meaningfully. It helps us understand a dataset’s main features and characteristics, identify patterns and trends, and gain insights from the data.
4. What are the limitations of descriptive statistics?
Descriptive statistics only summarizes the data at hand; it does not make predictions. Additionally, if the dataset contains errors or extreme outliers, measures such as the mean and range can give a misleading summary.