U-SQL 入門⑩ ~ 結合 ~
U-SQL では以下の結合が利用可能です。すべての JOIN において行セットの変数を指定することも SELECT 文を記述することもできます。
- CROSS JOIN
- INNER JOIN
- OUTER JOIN
- SEMIJOIN
- ANTISEMIJOIN
CROSS JOIN
行セットのデカルト積を返します。
@employees = SELECT * FROM (VALUES ("Rafferty", (int?) 31) , ("Jones", (int?) 33) , ("Heisenberg", (int?) 33) , ("Robinson", (int?) 34) , ("Smith", (int?) 34) , ("Williams", (int?) null)) AS E(EmpName, DepID); @departments = SELECT * FROM (VALUES ((int) 31, "Sales") , ((int) 33, "Engineering") , ((int) 34, "Clerical") , ((int) 35, "Marketing")) AS D(DepID, DepName); @rs_cross = SELECT e.EmpName, e.DepID, d.DepName FROM @employees AS e CROSS JOIN @departments AS d WHERE ((d.DepName == "Engineering") || (d.DepName == "Marketing")); OUTPUT @rs_cross TO "/Output/ReferenceGuide/Joins/CrossJoins/ExampleA.csv" USING Outputters.Csv(outputHeader : true);
出力結果
EmpName | DepID | DepName |
---|---|---|
Rafferty | 31 | Engineering |
Jones | 33 | Engineering |
Heisenberg | 33 | Engineering |
Robinson | 34 | Engineering |
Smith | 34 | Engineering |
Williams | Engineering | |
Rafferty | 31 | Marketing |
Jones | 33 | Marketing |
Heisenberg | 33 | Marketing |
Robinson | 34 | Marketing |
Smith | 34 | Marketing |
Williams | Marketing |
INNER JOIN
それぞれの行セットの指定したキーのカラムの値が一致するデータだけを返します。こちらの例では INNER JOIN にサブクエリを指定しています。
@employees = SELECT * FROM (VALUES ("Rafferty", (int?) 31) , ("Jones", (int?) 33) , ("Heisenberg", (int?) 33) , ("Robinson", (int?) 34) , ("Smith", (int?) 34) , ("Williams", (int?) null)) AS E(EmpName, DepID); @departments = SELECT * FROM (VALUES ((int) 31, "Sales") , ((int) 33, "Engineering") , ((int) 34, "Clerical") , ((int) 35, "Marketing")) AS D(DepID, DepName); @rs_inner = SELECT e.DepID AS EmpDepID, d.DepID, e.EmpName, d.DepName FROM @employees AS e INNER JOIN (SELECT (int?) DepID AS DepID, DepName FROM @departments) AS d ON e.DepID == d.DepID; OUTPUT @rs_inner TO "/output/rsInnerJoin.csv" USING Outputters.Csv(outputHeader : true);
出力結果
EmpDepID | DepID | EmpName | DepName |
---|---|---|---|
31 | 31 | Rafferty | Sales |
33 | 33 | Jones | Engineering |
33 | 33 | Heisenberg | Engineering |
34 | 34 | Robinson | Clerical |
34 | 34 | Smith | Clerical |
OUTER JOIN
指定したキーでそれぞれの行セットを結合します。キーが一致しない場合でも、指定した側のデータは維持されます。
LEFT OUTER JOIN の例
@employees = SELECT * FROM (VALUES ("Rafferty", (int?) 31) , ("Jones", (int?) 33) , ("Heisenberg", (int?) 33) , ("Robinson", (int?) 34) , ("Smith", (int?) 34) , ("Williams", (int?) null)) AS E(EmpName, DepID); @departments = SELECT * FROM (VALUES ((int?) 31, "Sales") , ((int?) 33, "Engineering") , ((int?) 34, "Clerical") , ((int?) 35, "Marketing")) AS D(DepID, DepName); @rs_leftouter = SELECT e.EmpName, d.DepName FROM @employees AS e LEFT OUTER JOIN @departments AS d ON e.DepID == d.DepID; OUTPUT @rs_leftouter TO "/output/rsLeftOuterJoin.csv" USING Outputters.Csv(outputHeader : true);
出力結果
EmpName | DepName |
---|---|
Rafferty | Sales |
Jones | Engineering |
Heisenberg | Engineering |
Robinson | Clerical |
Smith | Clerical |
Williams |
※RIGHT OUTER JOIN 、FULL OUTER JOIN も同様にサポートされます。
SEMIJOIN
SEMIJOIN は SELECT * FROM A WHERE A.Key IN (SELECT B.Key FROM B) のようなフィルターのような動作になります。LEFT SEMIJOIN と RIGHT SEMIJOIN の 2 つがあります。
LEFT SEMIJOIN の例
@employees = SELECT * FROM (VALUES ("Rafferty", (int?) 31) , ("Jones", (int?) 33) , ("Heisenberg", (int?) 33) , ("Robinson", (int?) 34) , ("Smith", (int?) 34) , ("Williams", (int?) null)) AS E(EmpName, DepID); @departments = SELECT * FROM (VALUES ((int?) 31, "Sales") , ((int?) 33, "Engineering") , ((int?) 34, "Clerical") , ((int?) 35, "Marketing")) AS D(DepID, DepName); @emps_in_valid_dept = SELECT e.EmpName, e.DepID FROM @employees AS e LEFT SEMIJOIN @departments AS d ON e.DepID == d.DepID; OUTPUT @emps_in_valid_dept TO "/output/rsLeftSemiJoinEmployeesInValidDept.csv" USING Outputters.Csv(outputHeader : true);
出力結果
EmpName | DepID |
---|---|
Rafferty | 31 |
Jones | 33 |
Heisenberg | 33 |
Robinson | 34 |
Smith | 34 |
ANTISEMIJOIN
ANTISEMIJOIN は SEMIJOIN の反対の動作をします。SELECT * FROM A WHERE A.key NOT IN (SELECT B.key FROM B) のように、マッチしない行を返します。
RIGHT ANTISEMIJOIN の例
@employees = SELECT * FROM (VALUES ("Rafferty", (int?) 31) , ("Jones", (int?) 33) , ("Heisenberg", (int?) 33) , ("Robinson", (int?) 34) , ("Smith", (int?) 34) , ("Williams", (int?) null)) AS E(EmpName, DepID); @departments = SELECT * FROM (VALUES ((int) 31, "Sales") , ((int) 33, "Engineering") , ((int) 34, "Clerical") , ((int) 35, "Marketing")) AS D(DepID, DepName); @depts_without_emps = SELECT d.DepName, d.DepID FROM @employees AS e RIGHT ANTISEMIJOIN (SELECT (int?) DepID AS DepID, DepName FROM @departments) AS d ON e.DepID == d.DepID; OUTPUT @depts_without_emps TO "/output/rsRightAntiSemiJoinDeptsWithoutEmployees.csv" USING Outputters.Csv(outputHeader : true);
出力結果
DepName | DepID |
---|---|
Marketing | 35 |
この記事はこちらの公式ドキュメントを参考にしたものです。
https://msdn.microsoft.com/en-us/azure/data-lake-analytics/u-sql/u-sql-select-selecting-from-joins