Power BI Table Tips : Why VALUES is More Critical Than DISTINCT? The Starting Point of DAX Context Thinking and Virtual Table Design

When you begin to delve deeper into Power BI, an inevitable question arises at least once:

"Aren't both VALUES and DISTINCT simply used for removing duplicates?"

"The results look practically identical, so why does everyone use VALUES as the default?"

"Honestly, at first glance, using DISTINCT doesn't seem to cause any major issues, does it?"

In fact, when I first encountered DAX, I did not noticeably perceive the difference between the two either. This is because in simple sample datasets, the outputs appear almost identical. However, the moment real-world production data is introduced, the narrative changes entirely. Especially when:

  • Discrepancies occur between ERP and online mall product codes,
  • The registration of a new product SKU is delayed, or
  • Data from external channels arrives ahead of schedule.

At these exact moments, the distinction between VALUES and DISTINCT ceases to be a mere grammatical variance; it transforms into a fundamental issue of "how accurately the report reflects reality." In this article, based on an actual real-world scenario of a "missing master record," we will examine one by one why seasoned DAX practitioners utilize VALUES as the de facto default standard.

1. Example Data

1) Product Master Table (d_Product)

Currently, there are three registered products.

ProductKey ProductName Category Price
101Apple Vision ProElectronics3,500
102Beats Studio ProElectronics350
201Patagonia TorrentshellApparel180

2) Sales Table (f_Sales)

To simulate a production environment, an example has been constructed where the referential integrity of the data model is broken. The final row contains data with a ProductKey of 999, which does not exist in the Product Master (d_Product).

Date ProductKey Store SalesAmount
2025-05-10102New York350
2025-12-25201Los Angeles180
2026-05-01101San Francisco3,500
2026-05-02201Chicago360
2026-05-20999Amazon US100,000,000

The item with ProductKey = 999 generated actual revenue, but it remains unregistered in the master table. In production environments, this happens significantly more often than one might expect. For instance:

  • A product is created in the online storefront first, but its ERP registration is delayed,
  • The SKU architecture of an international channel differs from the headquarters' code base, or
  • A data synchronization batch job fails.

In actual retail or consumer goods fields, it is quite common to see tens of millions or even hundreds of millions of won in revenue concentrated within these Blank rows. While it initially looks like a simple data error, this exact scenario demonstrates the difference between VALUES and DISTINCT most dramatically.

3) Modeling

Data modeling : Product and sales

2. Filter Context and Evaluation Mechanism Simulation

When these two functions generate a virtual table, the phenomena that occur inside the data model contrast sharply as follows:

1) VALUES('d_Product'[Category]) Evaluation Mechanism

Because a ProductKey = 999 exists in the sales table but is missing from the master, the DAX engine detects that the referential integrity of the relationship is broken and automatically injects a (Blank) row into the virtual table.


VALUES('d_Product'[Category])

  • • Output Result: {"Electronics", "Apparel", (Blank)}
  • • Why does a Blank appear?
    In the 2026 sales data, ProductKey = 999 exists. However, it is absent from the product master. Consequently, DAX reasons: "Sales exist, but there is no matching product information. Since I must not hide data, I should preserve it by routing it to a Blank region." Therefore, it automatically generates a (Blank) row. This mechanism is profoundly critical.
  • Key Characteristic: VALUES is not a primitive deduplication function. It tracks the current filter context and the relationships between tables, serving as a 'mirror reflecting the reality of the data model' by preserving unmapped, omitted data.

2) DISTINCT('d_Product'[Category]) Evaluation Mechanism

In this case, highly dangerous distortions can occur depending on the scenario. Specifically, based on the internal measure architecture:

  • Omitted sales might be entirely excluded, or
  • Data might vanish from specific Total calculations altogether.

What makes this more perilous is that the report appears entirely normal on the surface.

  • • Output Result: {"Electronics", "Apparel"}
  • Fatal Pitfall: It completely ignores the existence of the $100,000,000 revenue belonging to ProductKey = 999. It severs the relationship between tables and the flow of real-world data, performing a static operation relying solely on the definitions within the master table.

3. Side Effect Analysis in Production Measures

When writing an iterator function (SUMX) to calculate the sum of sales by category using the virtual table as a backbone, the report results distort based on the function chosen:

Side Effect Analysis in Production Measures : Values


Side Effect Analysis in Production Measures : Distinct


1) Results Using VALUES (The Standard Report)


Results Using VALUES (The Standard Report)


VALUES does not conceal the data with broken integrity; instead, it explicitly displays the $100,000,000 as a (Blank) row on the report layout.

table result : Category sales using VALUES DAX


Analyst-Perspective Advantage: "By confirming that $100,000,000 in revenue is aggregated under a (Blank) row in the report, the analyst immediately recognizes that an omission occurred during the new product master registration process, gaining an early opportunity to correct the system error."

2) Results Using DISTINCT (Distorted Report: Silent Distortion)


Results Using DISTINCT (Distorted Report: Silent Distortion)


If DISTINCT constructs the virtual table framework inside SUMX, a 'Silent Distortion' phenomenon occurs where data completely vanishes depending on the nature of the internal measure.


table result : Category sales using DISTINCT DAX


  • Phenomenon: The sum of the internal rows and the bottom total mathematically align perfectly. (e.g., Apparel(540) + Electronics(3,850) = Total(4,390))
  • Risk: Because the row sums and the total reconcile seamlessly on the surface of the report, it is easy to mistakenly assume the formula is accurate. However, the $100,000,000 that should be included in the total revenue has completely evaporated from the entire report. This is the most dangerous type of distortion because it entirely masks the data integrity error, preventing decision-makers from realizing an omission has even occurred.

(※ Note: If CALCULATE or filter-removal functions become complexly entangled within internal measures, this can escalate into a 'discrepant report' where the sum of the internal rows and the bottom total tear apart from each other.)

4. The Core Essence a Modeler Must Remember

The difference between these two functions is not a matter of syntactic preference or style. It is a fundamental difference in the philosophy of treating a data model.

1) VALUES: "The Reality of Live, Breathing Data"

VALUES thoroughly respects the currently active filter context, tracks the relationships between tables, and preserves unmapped, omitted data. In other words, it is a mirror that directly reflects the 'reality' of the data model as viewed by the DAX engine.

2) DISTINCT: "A Static Dictionary of Columns Independent of Reality"

DISTINCT holds zero interest in filters or relationship integrity. It mechanically extracts only the unique list of values written within the designated column. It is a static function that relies solely on the definition (dictionary) of the source table rather than the flow of reality data.

5. Production Judgment Guide and Summary

1) Begin Most Virtual Table Frameworks with VALUES

When establishing a 'row-based' foundation or performing iterative calculations for a virtual table in production, you should engrave VALUES into your mind as the absolute default to remain safe.

  • When performing category-by-category operations with X-derived iterator functions such as SUMX or AVERAGEX.
  • When providing a virtual row list as the first argument of a FILTER function.
  • When designing sophisticated summary tables using ADDCOLUMNS or SUMMARIZE.

Standard Production Pattern:

SUMX(

    VALUES('d_Product'[Category]),

    [SalesAmount]

)

2) When Exactly Do We Use DISTINCT?

DISTINCT is selected intentionally when you need a pure list of domain values while deliberately blocking the data model's relationship integrity or the current filter flow.

  • When independently creating a pure unique list of master codes for use in a slicer table.
  • When you want to purely count the number of unique values existing in a specific column (DISTINCTCOUNT), completely independent of the relationship between two tables.

Wrapping up

Choosing between VALUES and DISTINCT is not a trivial syntax preference. It is a philosophical choice: "Will you let the DAX engine reflect the reality of the current model, or will you force it to see an isolated column?"

  • VALUES: Observes reality data. (Preserves the current filter, table relationships, omitted data, and data integrity)
  • DISTINCT: Observes the column in isolation. (Simple deduplication, a static dictionary disconnected from the model's reality)

Cultivating the habit of thinking of VALUES before DISTINCT when designing virtual tables is a critical cognitive pivot. This small shift in perspective is a shortcut to becoming an expert, allowing you to instantly pierce through the reasons why totals distort, why filter propagation breaks, and why virtual table lineage matters.

```

Comments

Popular posts from this blog

DAX CALENDAR Function Deep Dive and Practical Usage Guide

Standard Deviation (Part 2): Strategic Limitations and Complementary Perspectives in Standard Deviation Analysis

Standard Deviation (Part 1): Measuring Data Volatility and Using the Insights for Better Strategy