varSamp

This page contains information on the varSamp and varSampStable ClickHouse functions.

varSamp

Calculates the sample variance of a data set.

Syntax

varSamp(expr)

Parameters

  • expr: An Expression representing the data set for which you want to calculate the sample variance.

Returned value

Returns a Float64 value representing the sample variance of the input data set.

Implementation details

The varSamp() function calculates the sample variance using the following formula:

∑(x - mean(x))^2 / (n - 1)

Where:

  • x is each individual data point in the data set.
  • mean(x) is the arithmetic mean of the data set.
  • n is the number of data points in the data set.
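As a rough illustration of the formula above, the following Python sketch computes the sample variance directly from its definition. The function name sample_variance is hypothetical and not part of ClickHouse, and raising an error for fewer than two points is a choice made for this sketch only. It mirrors the arithmetic, not the engine's implementation.

```python
def sample_variance(values):
    """Sample variance from the definition: sum((x - mean)^2) / (n - 1)."""
    n = len(values)
    if n < 2:
        # Choice made for this sketch; it does not describe ClickHouse behavior.
        raise ValueError("sample variance needs at least two data points")
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return squared_deviations / (n - 1)


# Same values as the example table on this page.
data = [10.5, 12.3, 9.8, 11.2, 10.7]
result = sample_variance(data)
print(result)  # approximately 0.865
```

For these five values the mean is 10.9, the squared deviations sum to 3.46, and 3.46 / 4 = 0.865.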

The function assumes that the input data set represents a sample from a larger population. If you want to calculate the variance of the entire population (when you have the complete data set), you should use the varPop() function instead.
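To make the sample versus population distinction concrete, here is a small sketch using Python's standard statistics module. It is only an analogy: statistics.variance divides by n - 1 like varSamp, and statistics.pvariance divides by n like varPop.

```python
import statistics

# Same values as the example table on this page.
data = [10.5, 12.3, 9.8, 11.2, 10.7]

# Sample variance: sum of squared deviations divided by (n - 1).
sample_var = statistics.variance(data)

# Population variance: the same sum divided by n.
population_var = statistics.pvariance(data)

print(sample_var)      # approximately 0.865
print(population_var)  # approximately 0.692
```

The sample variance is always the larger of the two for the same data, because the same sum of squared deviations is divided by a smaller number.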

This function uses a numerically unstable algorithm. If you need numerical stability in calculations, use the slower but more stable varSampStable function.
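To show what numerical instability can look like in practice, the sketch below compares a textbook single-pass "sum of squares" shortcut with a two-pass computation. This is an assumption-laden illustration: the page does not state which algorithm varSamp uses, and both function names are hypothetical.

```python
def naive_sample_variance(values):
    """Single-pass 'sum of squares' shortcut.

    Textbook example of an unstable formula; NOT claimed to be what
    ClickHouse does internally.
    """
    n = 0
    total = 0.0
    total_sq = 0.0
    for x in values:
        n += 1
        total += x
        total_sq += x * x
    # Subtracting two huge, nearly equal numbers loses most significant digits.
    return (total_sq - total * total / n) / (n - 1)


def two_pass_sample_variance(values):
    """Two-pass computation straight from the definition."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


# Small spread around a large offset: the true sample variance is 30.
data = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]

print(two_pass_sample_variance(data))  # 30.0
print(naive_sample_variance(data))     # far from 30 due to cancellation
```

The data has a small spread around a very large mean, which is where the shortcut formula breaks down: the squares are so large that Float64 can no longer represent the small difference between them.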

Example

Query:

CREATE TABLE example_table
(
id UInt64,
value Float64
)
ENGINE = MergeTree
ORDER BY id;

INSERT INTO example_table VALUES (1, 10.5), (2, 12.3), (3, 9.8), (4, 11.2), (5, 10.7);

SELECT varSamp(value) FROM example_table;

Response:

0.8650000000000091

The exact sample variance of these values is 0.865; the trailing digits in the response come from floating-point rounding.

varSampStable

Calculates the sample variance of a data set using a numerically stable algorithm.

Syntax

varSampStable(expr)

Parameters

  • expr: An Expression representing the data set for which you want to calculate the sample variance.

Returned value

The varSampStable function returns a Float64 value representing the sample variance of the input data set.

Implementation details

The varSampStable function calculates the sample variance using the same formula as the varSamp function:

∑(x - mean(x))^2 / (n - 1)

Where:

  • x is each individual data point in the data set.
  • mean(x) is the arithmetic mean of the data set.
  • n is the number of data points in the data set.

The difference between varSampStable and varSamp is numerical stability. varSampStable uses an algorithm that limits the accumulation of floating-point rounding errors, which matters most for large data sets or data with a wide range of values. The trade-off is speed: varSampStable is slower than varSamp.
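One well-known algorithm with this property is Welford's online method, sketched below in Python. This is an illustration of the general idea only: the page does not say which algorithm varSampStable implements, and the function name welford_sample_variance is hypothetical.

```python
def welford_sample_variance(values):
    """Welford's online algorithm for the sample variance.

    Shown as one example of a numerically stable method; NOT claimed to be
    the algorithm ClickHouse uses for varSampStable.
    """
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in values:
        n += 1
        delta = x - mean
        mean += delta / n
        # Uses the deviation before and after the mean update.
        m2 += delta * (x - mean)
    if n < 2:
        # Choice made for this sketch; it does not describe ClickHouse behavior.
        raise ValueError("sample variance needs at least two data points")
    return m2 / (n - 1)


# Small spread around a large offset: the true sample variance is 30.
print(welford_sample_variance([1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]))  # 30.0

# Same values as the example table on this page.
print(welford_sample_variance([10.5, 12.3, 9.8, 11.2, 10.7]))  # approximately 0.865
```

The method works with deviations from a running mean instead of raw squares, so it never subtracts two very large, nearly equal quantities. It also needs only one pass over the data.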

Like varSamp, the varSampStable function assumes that the input data set represents a sample from a larger population. If you want to calculate the variance of the entire population (when you have the complete data set), you should use the varPopStable function instead.

Example

Query:

CREATE TABLE example_table
(
id UInt64,
value Float64
)
ENGINE = MergeTree
ORDER BY id;

INSERT INTO example_table VALUES (1, 10.5), (2, 12.3), (3, 9.8), (4, 11.2), (5, 10.7);

SELECT varSampStable(value) FROM example_table;

Response:

0.865

This query calculates the sample variance of the value column in example_table using the varSampStable function. The sample variance of the values [10.5, 12.3, 9.8, 11.2, 10.7] is 0.865. The varSamp function may return a result that differs in the last digits (0.8650000000000091 in the earlier example) because it accumulates more floating-point rounding error.