U-SQL 入門⑥ ~ 集約関数 ~

U-SQL にはビルトインの集約関数が提供されています。これらの集約関数は SELECT 文の中で使用できます。

集約関数の一覧

ANY_VALUE グループの中から任意の 1 つの値を取得する
ARRAY_AGG グループごとに SQL.ARRAY 値を生成する
AVG 平均
COUNT 個数のカウント
MAP_AGG Key, Value 値を取り SQL.MAP 値を生成する
MAX 最大値
MIN 最小値
SUM 合計
STDEV 標本標準偏差
STDEVP 標準偏差
VAR 標本分散
VARP 母分散

集約関数の実行例

@rs =
	SELECT
		Region,
		COUNT() AS NumSessions,
		SUM(Duration) AS TotalDuration,
		AVG(Duration) AS AvgDuration,
		MAX(Duration) AS MaxDuration,
		MIN(Duration) AS MinDuration,
		STDEV(Duration) AS StdevDuration,
		STDEVP(Duration) AS StdevPDuration,
		VAR(Duration) AS VarianceDuration,
		VARP(Duration) AS VariancePDuration
	FROM @searchlog
	GROUP BY Region;

その他集約関数

ANY_VALUE

ANY_VALUE 関数はグループの中から任意の 1 つの値を取得します。
以下ドキュメントにあるサンプルクエリを実行した結果です。
https://msdn.microsoft.com/en-us/azure/data-lake-analytics/u-sql/any-value-u-sql
クエリ

@employees = 
 SELECT * FROM ( VALUES
     (1, "Noah",   "Engineering", 100, 10000),
     (2, "Sophia", "Engineering", 100, 20000),
     (3, "Liam",   "Engineering", 100, 30000),
     (4, "Emma",   "HR",          200, 10000),
     (5, "Jacob",  "HR",          200, 10000),
     (6, "Olivia", "HR",          200, 10000),
     (7, "Mason",  "Executive",   300, 50000),
     (8, "Ava",    "Marketing",   400, 15000),
     (9, "Ethan",  "Marketing",   400, 10000) )
 AS T(EmpID, EmpName, DeptName, DeptID, Salary);

@result =
    SELECT DeptName,
           ANY_VALUE(EmpName) AS ArbitraryEmployee
    FROM @employees
    GROUP BY DeptName; 

OUTPUT @result
TO "/Output/ReferenceGuide/any_value/exampleB.csv"
USING Outputters.Csv(outputHeader: true);

実行結果

DeptName ArbitraryEmployee
Engineering Noah
Executive Mason
HR Emma
Marketing Ava
ARRAY_AGG

ARRAY_AGG 関数はグループごとに SQL.ARRAY 値を生成します。
以下ドキュメントにあるサンプルクエリを実行した結果です。
https://msdn.microsoft.com/en-us/azure/data-lake-analytics/u-sql/array-agg-u-sql
クエリ

@employees = 
    SELECT * FROM 
        ( VALUES
        ("Noah",   "cell:030-0074321"),
        ("Noah",   "office:030-0076545"),
        ("Sophia", "cell:(5) 555-4729"),
        ("Sophia", "office:(5) 555-3745"),
        ("Liam",   "cell:(5) 555-3932"),
        ("Amy",    "cell:(171) 555-7788"),
        ("Amy",    "office:(171) 555-6750"), 
        ("Amy",    "home:(425) 555-6238"),
        ("Justin", "cell:0921-12 34 65"),
        ("Justin", "office:0921-12 34 67"),
        ("Emma",   (string)null),
        ("Jacob",  ""),
        ("Olivia", "cell:88.60.15.31"),
        ("Olivia", "office:88.60.15.32"),
        ("Mason",  "cell:(91) 555 22 82"),
        ("Mason",  "office:(91) 555 91 99"), 
        ("Mason",  "home:(425) 555-2819"),
        ("Ava",    "cell:91.24.45.40"),
        ("Ava",    "office:91.24.45.41"),
        ("Ethan",  "cell:(604) 555-4729"),
        ("Ethan",  "office:(604) 555-3745"),
        ("David",  "cell:(171) 555-1212"),
        ("Andrew", "cell:(1) 135-5555"),
        ("Andrew", "office:(1) 135-4892"),
        ("Jennie", "cell:(5) 555-3392"),
        ("Jennie", "office:(5) 555-7293")
        ) AS T(EmpName, PhoneNumber);

@result =
    SELECT EmpName,
           string.Join(", ", ARRAY_AGG(PhoneNumber)) AS PhoneNumbers
    FROM @employees
    WHERE !string.IsNullOrEmpty(PhoneNumber)
    GROUP BY EmpName;

OUTPUT @result
TO "/Output/ReferenceGuide/Aggregate/array_agg/exampleA.csv"
USING Outputters.Csv(outputHeader: true);

実行結果

EmpName PhoneNumbers
Amy cell:(171) 555-7788, office:(171) 555-6750, home:(425) 555-6238
Andrew office:(1) 135-4892, cell:(1) 135-5555
Ava cell:91.24.45.40, office:91.24.45.41
David cell:(171) 555-1212
Ethan cell:(604) 555-4729, office:(604) 555-3745
Jennie office:(5) 555-7293, cell:(5) 555-3392
Justin cell:0921-12 34 65, office:0921-12 34 67
Liam cell:(5) 555-3932
Mason cell:(91) 555 22 82, office:(91) 555 91 99, home:(425) 555-2819
Noah cell:030-0074321, office:030-0076545
Olivia office:88.60.15.32, cell:88.60.15.31
Sophia office:(5) 555-3745, cell:(5) 555-4729
MAP_AGG

MAP_AGG 関数は Key, Value 値を取り SQL.MAP 値を生成します。
以下ドキュメントにあるサンプルクエリを実行した結果です。
https://msdn.microsoft.com/en-us/azure/data-lake-analytics/u-sql/map-agg-u-sql
クエリ

@employees = 
    SELECT * FROM 
        ( VALUES
        ("Noah",   "cell",   "030-0074321"),
        ("Noah",   "office", "030-0076545"),
        ("Sophia", "cell",   "(5) 555-4729"),
        ("Sophia", "office", "(5) 555-3745"),
        ("Liam",   "cell",   "(5) 555-3932"),
        ("Amy",    "cell",   "(171) 555-7788"),
        ("Amy",    "office", "(171) 555-6750"), 
        ("Amy",    "home",   "(425) 555-6238"),
        ("Justin", "cell",   "0921-12 34 65"),
        ("Justin", "office", "0921-12 34 67"),
        ("Emma",   (string)null, (string)null),
        ("Jacob",  "", ""),
        ("Olivia", "cell",   "88.60.15.31"),
        ("Olivia", "office", "88.60.15.32"),
        ("Mason",  "cell",   "(91) 555 22 82"),
        ("Mason",  "office", "(91) 555 91 99"), 
        ("Mason",  "home",   "(425) 555-2819"),
        ("Ava",    "cell",   "91.24.45.40"),
        ("Ava",    "office", "91.24.45.41"),
        ("Ethan",  "cell",   "(604) 555-4729"),
        ("Ethan",  "office", "(604) 555-3745"),
        ("David",  "cell",   "(171) 555-1212"),
        ("Andrew", "cell",   "(1) 135-5555"),
        ("Andrew", "office", "(1) 135-4892"),
        ("Jennie", "cell",   "(5) 555-3392"),
        ("Jennie", "office", "(5) 555-7293")
        ) AS T(EmpName, PhoneType, PhoneNumber);

@result =
    SELECT EmpName,
           String.Join(",", MAP_AGG(PhoneType, PhoneNumber).Select(p => String.Format("{0}:{1}", p.Key, p.Value))).Trim() AS PhoneNumbers
    FROM @employees
    WHERE !string.IsNullOrWhiteSpace(PhoneType)
    GROUP BY EmpName;

OUTPUT @result
TO "/Output/ReferenceGuide/Aggregate/map_agg/exampleA.csv"
USING Outputters.Csv(outputHeader: true);

実行結果

EmpName PhoneNumbers
Amy cell:(171) 555-7788,home:(425) 555-6238,office:(171) 555-6750
Andrew cell:(1) 135-5555,office:(1) 135-4892
Ava cell:91.24.45.40,office:91.24.45.41
David cell:(171) 555-1212
Ethan cell:(604) 555-4729,office:(604) 555-3745
Jennie cell:(5) 555-3392,office:(5) 555-7293
Justin cell:0921-12 34 65,office:0921-12 34 67
Liam cell:(5) 555-3932
Mason cell:(91) 555 22 82,home:(425) 555-2819,office:(91) 555 91 99
Noah cell:030-0074321,office:030-0076545
Olivia cell:88.60.15.31,office:88.60.15.32
Sophia cell:(5) 555-4729,office:(5) 555-3745

この記事はこちらの公式ドキュメントを参考にしたものです。
https://msdn.microsoft.com/en-us/azure/data-lake-analytics/u-sql/aggregate-functions-u-sql