Power BI DAX Integrated Analysis to Fix the Average Trap (Part 1-1): Using MEDIANX and MODEX to Find the 'Truth the Mean Hides'
The first step in data analysis is usually calculating the Mean (Average). This simple, intuitive figure is commonly used as a benchmark for setting KPIs (Key Performance Indicators) or strategic planning. However, this average has a fatal flaw: the 'Average Trap.'
Because the mean is calculated by dividing the total sum of data by the count, its center can be significantly distorted by a few Outliers (extreme values). Especially in a business environment, a handful of VIP customers or one-off, huge transactions can unrealistically inflate the overall average, leading to the error of obscuring the realistic spending level of the majority of customers.
To correct this average distortion and find the true center of the data, the following five key analysis methodologies are used:
- Median and Mode: Used to diagnose distortion by outliers and grasp realistic purchasing power levels (The focus of this analysis).
- Trimmed Mean: Used to correct the center by removing outliers.
- Standard Deviation: Used to measure the data's variability and risk.
- Geometric Mean: Used to correct for analyzing compounded growth and rate data.
- Weighted Mean: Used to adjust data influence based on importance.
In this first part, we will use the MEDIANX (Median) and MODEX (Mode DAX Pattern) functions in Power BI to identify the data's point of balance and the purchasing power of the majority of customers—the truth hidden by the arithmetic mean—and find the strategic center through multiple columns.
Our first column will delve deeper into why the median should be used instead of the arithmetic mean, leveraging Power BI DAX.
(After reading this, you can check next contents below)
Power BI DAX Integrated Analysis to Fix the Average Trap (Part 1-2): Median-driven Segmentation Strategy using Power BI's Box and Whisker Chart
The data we'll use is the purchase amount (in thousands of KRW) recorded for 25 customers at a fashion retail store. This dataset is structured as follows:
Our goal is to visually confirm the impact of just one outlier on the overall average and clearly demonstrate why this mean is unsuitable as a KPI.
We will first calculate the overall arithmetic mean using a DAX measure and integrate it into the individual customer purchase chart for a visual comparison. This process takes place in the Power BI environment.
When you chart the individual customer purchase amounts, a filter context is applied per Customer ID. Therefore, to create a 'constant average line' that spans the entire chart without change, a DAX measure that ignores the filter context is necessary.
![Airthmetic Mean Purchase = AVERAGEX(All('Sales'), [Purchase Amount (K)] Airthmetic Mean Purchase = AVERAGEX(All('Sales'), [Purchase Amount (K)]](https://blogger.googleusercontent.com/img/a/AVvXsEiF2giEvxb1_LE9P4GmpzlJzrkRO-uKlyMEJD_i0oSwue10WrZada3CR56b4EOhXqIYC42SZNbAXDOakAz7WEZoRSCSun2wzqN2QfxPszbn1Wnfv7mxx5t_r2tP_2rbyLV4Di8s-OXbrT6zTln2GYMskfwteKga5xEXfNDiTO32iS7D0cdlgwfdFmkU8fZ5=w640-h135-rw)
This measure consistently returns the arithmetic mean for the entire table, regardless of the Customer ID filter context.
We will integrate the individual customer purchase amounts and the average line into a single chart using the arithmetic mean measure.
The chart shows a horizontal line representing 12.76K drawn above the individual customer bars. Through this chart, we can clearly identify the following average distortion phenomenon.
The chart clearly demonstrates the 'Average Trap,' where the mean, inflated to 12.76K by the single outlier of 80K, completely fails to reflect the purchasing level of the majority of customers.
In the previous analysis, we visually confirmed the 'Average Trap,' where the arithmetic mean was inflated to 12.76K by the 80K outlier, failing to reflect the reality of the majority of customers. Now, we use the Median to correct this distortion and find the data's true balance center.
The median is the value that lies exactly in the middle when the data is arranged in order of size. This value is completely unaffected by the magnitude of extreme values, ensuring that 50% of the total data is less than or equal to this value, and the other 50% is greater than or equal to it.
The DAX function used to find the median is MEDIANX. Like the arithmetic mean, it must be calculated over the entire table using the ALL function to avoid being affected by the chart's filter context.
This measure correctly returns 9K, which is the positional center of the data, excluding the extreme value of 80K.
Now, we will integrate both the arithmetic mean (12.76K) and the newly calculated median (9K)—two central tendency lines—into the individual customer purchase amount chart.
The two horizontal lines drawn on the chart, 12.76K and 9K, show the clear truth about where the data is centered.
Strategic Application: Setting a threshold based on the median is key to creating incentives that the majority of customers can feel are attainable. For example, setting a reward threshold at 9K and offering extra benefits for purchases above that can effectively encourage regular customers who spent 8K to increase their purchase to 9K, maximizing the sales conversion rate.
The median of 9K, found using the MEDIANX function, is the balance center where 50% of all customers' spending lies. This is the value that should be used as the basis for setting strategy.
We have thoroughly explored how to find and utilize the median with Power BI DAX. In the next article, we will further detail the use of MEDIANX and MODEX to find the "Truth the Mean Hides," specifically by utilizing the 'Box and Whisker with Points chart' for visualization, analyzing data by segment, and developing corresponding action plans.
A Comprehensive Exploration into Forecast Accuracy %
Standard Deviation (Part 1): Measuring Data Volatility and Using the Insights for Better Strategy
Standard Deviation (Part 2): Strategic Limitations and Complementary Perspectives in Standard Deviation Analysis
DAX CALENDAR Function Deep Dive and Practical Usage Guide
1. Example Data and Results
The data we'll use is the purchase amount (in thousands of KRW) recorded for 25 customers at a fashion retail store. This dataset is structured as follows:
- Table Name: 'Sales'
- Key Field: Customer ID, Purchase Amount (K)
Our goal is to visually confirm the impact of just one outlier on the overall average and clearly demonstrate why this mean is unsuitable as a KPI.
2. Analyzing Average Distortion
We will first calculate the overall arithmetic mean using a DAX measure and integrate it into the individual customer purchase chart for a visual comparison. This process takes place in the Power BI environment.
2.1. Measure to Calculate the Total Arithmetic Mean
When you chart the individual customer purchase amounts, a filter context is applied per Customer ID. Therefore, to create a 'constant average line' that spans the entire chart without change, a DAX measure that ignores the filter context is necessary.
- We use ALL('Sales') to ignore the current filter (e.g., Customer ID) and calculate the average of the Purchase Amount (K) for all rows in the 'Sales' table.
- The AVERAGEX function is used to iterate over the entire set of rows returned by ALL and calculate the average of the Purchase Amount (K).
This measure consistently returns the arithmetic mean for the entire table, regardless of the Customer ID filter context.
2.2. Power BI Visual Setup
We will integrate the individual customer purchase amounts and the average line into a single chart using the arithmetic mean measure.
- Chart Type: Combo Chart (Line and Clustered Column Chart).
- X-Axis: Assign the Customer ID field.
- Column Values (Bars): Assign Purchase Amount (K) (the actual individual customer purchase amounts).
- Line Values (Line): Assign the DAX measure defined above, Arithmetic Mean Purchase.
2.3. Visual Interpretation: Proving the 'Average Trap'
The chart shows a horizontal line representing 12.76K drawn above the individual customer bars. Through this chart, we can clearly identify the following average distortion phenomenon.
The chart clearly demonstrates the 'Average Trap,' where the mean, inflated to 12.76K by the single outlier of 80K, completely fails to reflect the purchasing level of the majority of customers.
- Unrealistic Average Line: The average line (12.76K) is positioned significantly higher than the bars for the most frequent customer segment. This visually proves that the realistic purchasing power level of the majority of customers falls short of the average target.
- Bias from Outlier: The VIP customer's 80K bar is overwhelmingly far from most other bars, indicating that this single extreme value is the main culprit pushing the average line (12.76K) away from the realistic data center (the median).
3. Median (MEDIANX) Analysis: Setting a Realistic Threshold for Reward Programs
In the previous analysis, we visually confirmed the 'Average Trap,' where the arithmetic mean was inflated to 12.76K by the 80K outlier, failing to reflect the reality of the majority of customers. Now, we use the Median to correct this distortion and find the data's true balance center.
The median is the value that lies exactly in the middle when the data is arranged in order of size. This value is completely unaffected by the magnitude of extreme values, ensuring that 50% of the total data is less than or equal to this value, and the other 50% is greater than or equal to it.
3.1. Defining the DAX Measure: Calculating the Median (MEDIANX)
The DAX function used to find the median is MEDIANX. Like the arithmetic mean, it must be calculated over the entire table using the ALL function to avoid being affected by the chart's filter context.
- The MEDIANX function returns the median of a table or column. It includes the iterator (X) capability to calculate the median of an expression within a table context.
- ALL('Sales') ensures the median is calculated for the entire 'Sales' table, ignoring the Customer ID filter context.
This measure correctly returns 9K, which is the positional center of the data, excluding the extreme value of 80K.
3.2. Power BI Visualization: Comparing Two Central Metrics (Mean vs. Median)
Now, we will integrate both the arithmetic mean (12.76K) and the newly calculated median (9K)—two central tendency lines—into the individual customer purchase amount chart.
- Visual Element: Use the Line and Clustered Column Chart.
- X-Axis: Customer ID.
- Column Values (Bars): Purchase Amount (K).
- Line Values (Lines):
- Arithmetic Mean Purchase
- Median Purchase <-- Newly added
3.3. Visual Interpretation: Identifying the True Central Tendency and Strategic Use
The two horizontal lines drawn on the chart, 12.76K and 9K, show the clear truth about where the data is centered.
- The Reality of the Median (9K): The median line is drawn in the area where the majority of customer bars (7K, 8K, 9K, etc.) are most concentrated. This means the median has successfully captured the data's true point of balance, free from the influence of the outlier, representing the actual central purchasing power level spent by 50% of all customers.
- Separation of Strategic Benchmarks: The large 3.76K gap between the arithmetic mean (12.76K) and the median (9K) clearly quantifies the extent of the average distortion. The larger this gap, the stronger the evidence that using the arithmetic mean as a KPI or strategic benchmark is risky.
- Setting the Reward Threshold: For strategic decision-making, we must use the median (9K) as a realistic benchmark instead of the unrealistic 12.76K. This value becomes the optimal threshold that the average customer can reasonably reach while being nudged to increase their purchasing power to the next level.
Strategic Application: Setting a threshold based on the median is key to creating incentives that the majority of customers can feel are attainable. For example, setting a reward threshold at 9K and offering extra benefits for purchases above that can effectively encourage regular customers who spent 8K to increase their purchase to 9K, maximizing the sales conversion rate.
The median of 9K, found using the MEDIANX function, is the balance center where 50% of all customers' spending lies. This is the value that should be used as the basis for setting strategy.
4. Wrapping Up
We have thoroughly explored how to find and utilize the median with Power BI DAX. In the next article, we will further detail the use of MEDIANX and MODEX to find the "Truth the Mean Hides," specifically by utilizing the 'Box and Whisker with Points chart' for visualization, analyzing data by segment, and developing corresponding action plans.
<Other posts on the blog>
The Hidden Hero of Data Analysis: The Mode (Part 1) : Moving Beyond the "Average Trap" to Read Your "Real Customers"
Power BI DAX Integrated Analysis to Fix the Average Trap (Part 1-2): Median-driven Segmentation Strategy using Power BI's Box and Whisker Chart
The Hidden Hero of Data Analysis: The Mode (Part 1) : Moving Beyond the "Average Trap" to Read Your "Real Customers"
Power BI DAX Integrated Analysis to Fix the Average Trap (Part 1-2): Median-driven Segmentation Strategy using Power BI's Box and Whisker Chart
A Comprehensive Exploration into Forecast Accuracy %
Standard Deviation (Part 1): Measuring Data Volatility and Using the Insights for Better Strategy
Standard Deviation (Part 2): Strategic Limitations and Complementary Perspectives in Standard Deviation Analysis
DAX CALENDAR Function Deep Dive and Practical Usage Guide
Comments
Post a Comment