U-SQL 入門⑩ ~ 結合 ~

U-SQL では以下の結合が利用可能です。すべての JOIN において行セットの変数を指定することも SELECT 文を記述することもできます。

  • CROSS JOIN
  • INNER JOIN
  • OUTER JOIN
  • SEMIJOIN
  • ANTISEMIJOIN

CROSS JOIN

行セットのデカルト積を返します。

@employees = SELECT *
               FROM (VALUES
                      ("Rafferty", (int?) 31)
                    , ("Jones", (int?) 33)
                    , ("Heisenberg", (int?) 33)
                    , ("Robinson", (int?) 34)
                    , ("Smith", (int?) 34)
                    , ("Williams", (int?) null)) AS E(EmpName, DepID);

@departments = SELECT *
                FROM (VALUES
                       ((int) 31, "Sales")
                     , ((int) 33, "Engineering")
                     , ((int) 34, "Clerical")
                     , ((int) 35, "Marketing")) AS D(DepID, DepName);

@rs_cross = 
    SELECT e.EmpName, e.DepID, d.DepName
    FROM @employees AS e
         CROSS JOIN @departments AS d
    WHERE ((d.DepName == "Engineering") || (d.DepName == "Marketing"));

OUTPUT @rs_cross
TO "/Output/ReferenceGuide/Joins/CrossJoins/ExampleA.csv"
USING Outputters.Csv(outputHeader : true);

出力結果

EmpName DepID DepName
Rafferty 31 Engineering
Jones 33 Engineering
Heisenberg 33 Engineering
Robinson 34 Engineering
Smith 34 Engineering
Williams Engineering
Rafferty 31 Marketing
Jones 33 Marketing
Heisenberg 33 Marketing
Robinson 34 Marketing
Smith 34 Marketing
Williams Marketing

INNER JOIN

それぞれの行セットの指定したキーのカラムの値が一致するデータだけを返します。こちらの例では INNER JOIN にサブクエリを指定しています。

@employees = SELECT *  
               FROM (VALUES   
                      ("Rafferty", (int?) 31)  
                    , ("Jones", (int?) 33)  
                    , ("Heisenberg", (int?) 33)  
                    , ("Robinson", (int?) 34)  
                    , ("Smith", (int?) 34)  
                    , ("Williams", (int?) null)) AS E(EmpName, DepID);  

@departments = SELECT *  
                FROM (VALUES  
                       ((int) 31, "Sales")  
                     , ((int) 33, "Engineering")  
                     , ((int) 34, "Clerical")  
                     , ((int) 35, "Marketing")) AS D(DepID, DepName);  

@rs_inner =   
    SELECT e.DepID AS EmpDepID, d.DepID, e.EmpName, d.DepName       
    FROM @employees AS e  
         INNER JOIN (SELECT (int?) DepID AS DepID, DepName FROM @departments) AS d   
         ON e.DepID == d.DepID;  

OUTPUT @rs_inner   
TO "/output/rsInnerJoin.csv"  
USING Outputters.Csv(outputHeader : true);  

出力結果

EmpDepID DepID EmpName DepName
31 31 Rafferty Sales
33 33 Jones Engineering
33 33 Heisenberg Engineering
34 34 Robinson Clerical
34 34 Smith Clerical

OUTER JOIN

指定したキーでそれぞれの行セットを結合します。キーが一致しない場合でも、指定した側のデータは維持されます。

LEFT OUTER JOIN の例
@employees = SELECT *
               FROM (VALUES
                      ("Rafferty", (int?) 31)
                    , ("Jones", (int?) 33)
                    , ("Heisenberg", (int?) 33)
                    , ("Robinson", (int?) 34)
                    , ("Smith", (int?) 34)
                    , ("Williams", (int?) null)) AS E(EmpName, DepID);

@departments = SELECT *
                FROM (VALUES
                       ((int?) 31, "Sales")
                     , ((int?) 33, "Engineering")
                     , ((int?) 34, "Clerical")
                     , ((int?) 35, "Marketing")) AS D(DepID, DepName);

@rs_leftouter =
    SELECT e.EmpName, d.DepName
    FROM @employees AS e
         LEFT OUTER JOIN @departments AS d
         ON e.DepID == d.DepID;

OUTPUT @rs_leftouter
TO "/output/rsLeftOuterJoin.csv"
USING Outputters.Csv(outputHeader : true); 

出力結果

EmpName DepName
Rafferty Sales
Jones Engineering
Heisenberg Engineering
Robinson Clerical
Smith Clerical
Williams

※RIGHT OUTER JOIN 、FULL OUTER JOIN も同様にサポートされます。

SEMIJOIN

SEMIJOIN は SELECT * FROM A WHERE A.Key IN (SELECT B.Key FROM B) のようなフィルターのような動作になります。LEFT SEMIJOIN と RIGHT SEMIJOIN の 2 つがあります。

LEFT SEMIJOIN の例
@employees = SELECT *  
               FROM (VALUES   
                      ("Rafferty", (int?) 31)  
                    , ("Jones", (int?) 33)  
                    , ("Heisenberg", (int?) 33)  
                    , ("Robinson", (int?) 34)  
                    , ("Smith", (int?) 34)  
                    , ("Williams", (int?) null)) AS E(EmpName, DepID);  

@departments = SELECT *  
                FROM (VALUES  
                       ((int?) 31, "Sales")  
                     , ((int?) 33, "Engineering")  
                     , ((int?) 34, "Clerical")  
                     , ((int?) 35, "Marketing")) AS D(DepID, DepName);  

@emps_in_valid_dept =  
    SELECT e.EmpName, e.DepID  
    FROM @employees AS e  
         LEFT SEMIJOIN @departments AS d  
         ON e.DepID == d.DepID;  

OUTPUT @emps_in_valid_dept   
TO "/output/rsLeftSemiJoinEmployeesInValidDept.csv"  
USING Outputters.Csv(outputHeader : true);

出力結果

EmpName DepID
Rafferty 31
Jones 33
Heisenberg 33
Robinson 34
Smith 34

ANTISEMIJOIN

ANTISEMIJOIN は SEMIJOIN の反対の動作をします。SELECT * FROM A WHERE A.key NOT IN (SELECT B.key FROM B) のように、マッチしない行を返します。

RIGHT ANTISEMIJOIN の例
@employees = SELECT *  
               FROM (VALUES   
                      ("Rafferty", (int?) 31)  
                    , ("Jones", (int?) 33)  
                    , ("Heisenberg", (int?) 33)  
                    , ("Robinson", (int?) 34)  
                    , ("Smith", (int?) 34)  
                    , ("Williams", (int?) null)) AS E(EmpName, DepID);  

@departments = SELECT *  
                FROM (VALUES  
                       ((int) 31, "Sales")  
                     , ((int) 33, "Engineering")  
                     , ((int) 34, "Clerical")  
                     , ((int) 35, "Marketing")) AS D(DepID, DepName);  

@depts_without_emps =  
    SELECT d.DepName, d.DepID  
    FROM @employees AS e  
         RIGHT ANTISEMIJOIN (SELECT (int?) DepID AS DepID, DepName FROM @departments) AS d   
         ON e.DepID == d.DepID;  

OUTPUT @depts_without_emps   
TO "/output/rsRightAntiSemiJoinDeptsWithoutEmployees.csv"  
USING Outputters.Csv(outputHeader : true);

出力結果

DepName DepID
Marketing 35

この記事はこちらの公式ドキュメントを参考にしたものです。
https://msdn.microsoft.com/en-us/azure/data-lake-analytics/u-sql/u-sql-select-selecting-from-joins