Thursday, May 18, 2017

Almost Outer Join in Wave Dataflow

A full outer join is not available in Wave, augment is almost a left outer join. If we utilize repeated augments, append and some other operations we can accomplish what is typically required from full outer join functionality. This has not been tested for joining on keys that would have multi-values.

We are going to baseline this with something in Salesforce that makes sense as full outer join, however I would like to caution that this type of output may be confusing to end users as to why data does not exist in certain columns for certain records. For this example we will join on the Lead and Opportunity objects.

The code for this very basic example will be at the end of this article. The nodes in the dataflow are numbered to match what is referenced in the diagrams. It might be helpful to paste the code into the text editor of your choice to review side by side with this article.

1) Starting with this simplified diagram or as we already know, it is common that some Leads do not convert to Opportunities and that some Opportunities may not start as leads.



2) This is a summary view detailing all the of the steps we are going to use and referencing where sometimes we will use Lead Id and at other times the Opportunity Id. Along with a quick indication of where we will use the append operation. There is a lot going on in this image and we will break it down into the individual components over the next several steps.


3) The first 2 digest nodes acquire the key information that we want to include in the final result. As and example for leads we might want to know when it was created and for opportunities, what is the current amount.


4) The middle column of information has not yet been acquired. To simplify things as much as possible, we are only going to acquire the minimal amount of Lead and Opportunity data to create this middle join column. For the 3rd dataflow node we focus on just the minimal Lead information. We include “IsDeleted” because we need to include at least one field besides ID. The field must also exist on the Opportunity dataset for when we want to later append. Append requires the same set of columns in both datasets.


5) We are later going to append the minimal Opportunity dataset so to prevent replication in this column, we will truncate the data to only Leads that do not have Opportunities.


6) We now acquire the minimal Opportunity information and we need to add the field “ConvertedOpportunityId” so that we can append these together. The append operation requires the same columns in both datasets. We fill this "ConvertedOpportunityId" column with empty data. Dataset dimension compression of the same value is typically very efficient.


7) Now we augment the Lead information for Leads that do not have Opportunities matching Lead.Id == Lead.Id.


8) Then we augment Leads that have converted to Opportunities. The subtlety for this augment is that the Id of the green dataset is an OpportunityId so we need to match on Lead.ConvertedOpportunityId == Opportunity.Id.



9) Columns match, so at this point in time we can append the data together and we will then have augmented all the Lead information we wanted.


10) We then augment the opportunity information and our outer join is complete.



And the code used to accomplish this example:

{
   "LeadDigest_01": {
       "comment": "This is the full set of lead data to be included in the join",
       "action": "sfdcDigest",
       "parameters": {
           "fields": [
               {
                   "name": "Id"
               },
               {
                   "name": "Name"
               },
               {
                   "name": "ConvertedOpportunityId"
               },
               {
                   "name": "AnnualRevenue"
               },
               {
                   "name": "Project_Budget_Amount__c"
               }
           ],
           "object": "Lead"
       }
   },
   "OpportunityDigest_02": {
       "comment": "This is the full set of opportunity data to be included in the join",
       "action": "sfdcDigest",
       "parameters": {
           "fields": [
               {
                   "name": "Id"
               },
               {
                   "name": "Name"
               },
               {
                   "name": "Amount"
               }
           ],
           "object": "Opportunity"
       }
   },

   "MinimalLeadDigest_03": {
       "comment": "This is the minimal amount required to be able to use the append to help accomplish the outer join",
       "action": "sfdcDigest",
       "parameters": {
           "fields": [
               {
                   "name": "Id"
               },
               {
                   "name": "ConvertedOpportunityId",
                   "defaultValue": "undefined"
               },
               {
                   "name": "IsDeleted"
               }
           ],
           "object": "Lead"
       }
   },
   "TransformFilterLeads_04": {
       "comment": "As a workaround we have to append leads and opps with common fields and we don't want to append leads that converted as that will create duplicate rows",
       "action": "filter",
       "parameters": {
           "filter": "ConvertedOpportunityId:EQ:undefined",
           "source": "MinimalLeadDigest_03"
       }
   },    
   "MinimalOpportunityDigest_05": {
       "comment": "The minimal amount to pull from opportunity for the append",
       "action": "sfdcDigest",
       "parameters": {
           "fields": [
               {
                   "name": "Id"
               },
               {
                   "name": "IsDeleted"
               }
           ],
           "object": "Opportunity"
       }
   },
   "AddDerivedFieldToOpp_06": {
       "comments": "Append will want everything the same in the 2 datasets, so we will add a 'ConvertedOpportunityId' field to make the metadatamatch",
       "action": "computeExpression",
       "parameters": {
           "source": "MinimalOpportunityDigest_05",
           "mergeWithSource": true,
           "computedFields": [ {
               "name": "ConvertedOpportunityId",
               "type": "Text",
               "label": "unused",
               "saqlExpression": "\"undefined\""
           }]
       }
   },

   "AugmentInLeadDataToFilteredLeads_07": {
       "comment": "We can add the lead data prior to append so it will be in the same order",
       "action": "augment",
       "parameters": {
           "left": "TransformFilterLeads_04",
           "left_key": [
               "Id"
           ],
           "relationship": "Lead",
           "right": "LeadDigest_01",
           "right_key": [
               "Id"
           ],
           "right_select": [
               "Name",
               "AnnualRevenue",
               "Project_Budget_Amount__c",
               "ConvertedOpportunityId"
           ]
       }
   },

   "AugmentInLeadDataToOpps_08": {
       "comment": "We can add the lead data prior to append so it will be in the same order",
       "action": "augment",
       "parameters": {
           "left": "AddDerivedFieldToOpp_06",
           "left_key": [
               "Id"
           ],
           "relationship": "Lead",
           "right": "LeadDigest_01",
           "right_key": [
               "ConvertedOpportunityId"
           ],
           "right_select": [
               "Name",
               "AnnualRevenue",
               "Project_Budget_Amount__c",
               "ConvertedOpportunityId"
           ]
       }
   },

   "LeadOppAppend_09": {
       "comment": "This will now be an outer join, converted leads will have opp id's, opps that did not originate from a lead are included and the rows of leads that have not converted are included, the id field will be lead id's and opp id's",
       "action": "append",
       "parameters": {
           "sources": [
               "AugmentInLeadDataToFilteredLeads_07",
               "AugmentInLeadDataToOpps_08"
           ]
       }
   },

   "AugmentInOppData_10": {
       "comment": "Now we can add in the opp data - we could not augment lead data after the append which would have simplified and reduced one of the augments above, but you have to augment lead data on 'id' and 'convertedoppid' and augment doesn't like duplicate Lead.Name fields",
       "action": "augment",
       "parameters": {
           "left": "LeadOppAppend_09",
           "left_key": [
               "Id"
           ],
           "relationship": "Opportunity",
           "right": "OpportunityDigest_02",
           "right_key": [
               "Id"
           ],
           "right_select": [
               "Name",
               "Amount"
           ]
       }
   },

   "LeadRegister": {
       "action": "sfdcRegister",
       "parameters": {
           "alias": "Base_Lead",
           "name": "Base Lead",
           "source": "LeadDigest_01"
       }
   },
   "OppRegister": {
       "action": "sfdcRegister",
       "parameters": {
           "alias": "Base_Opportunity",
           "name": "Base Opportunity",
           "source": "OpportunityDigest_02"
       }
   },
   "FullRegister_11": {
       "action": "sfdcRegister",
       "parameters": {
           "alias": "LeadOppFullJoin",
           "name": "LeadOpp Full Join",
           "source": "AugmentInOppData_10"
       }
   }
}