Salesforce
Important Capabilities
| Capability | Status | Notes | 
|---|---|---|
| Data Profiling | ✅ | Only table-level profiling is supported, via the profiling.enabled config field | 
| Detect Deleted Entities | ❌ | Not supported yet | 
| Domains | ✅ | Supported via the domain config field | 
| Platform Instance | ✅ | Can correspond to a Salesforce organization | 
Prerequisites
In order to ingest metadata from Salesforce, you will need:
- Salesforce username, password, and security token, OR
- Salesforce instance URL and access token/session ID (suitable for one-shot ingestion only, since the access token typically expires after 2 hours of inactivity)
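The two credential sets above can be checked before a run with a small helper. This is a minimal sketch, not part of the connector: `missing_auth_fields` is a hypothetical name, and the field names are the ones used in the recipe configuration (`auth`, `username`, `password`, `security_token`, `instance_url`, `access_token`).

```python
from typing import Dict, List

# Fields each auth mode needs, taken from the prerequisites list above.
REQUIRED_FIELDS: Dict[str, List[str]] = {
    "USERNAME_PASSWORD": ["username", "password", "security_token"],
    "DIRECT_ACCESS_TOKEN": ["instance_url", "access_token"],
}


def missing_auth_fields(config: Dict[str, object]) -> List[str]:
    """Return the credential fields that are absent or empty (hypothetical helper)."""
    # The recipe's `auth` field defaults to USERNAME_PASSWORD.
    auth = str(config.get("auth", "USERNAME_PASSWORD"))
    if auth not in REQUIRED_FIELDS:
        raise ValueError(f"unknown auth type: {auth}")
    return [name for name in REQUIRED_FIELDS[auth] if not config.get(name)]


if __name__ == "__main__":
    token_config = {
        "auth": "DIRECT_ACCESS_TOKEN",
        "instance_url": "https://mydomain.my.salesforce.com/",
    }
    print(missing_auth_fields(token_config))
```

An empty list means the chosen auth mode has everything it needs; the check says nothing about whether the credentials are valid or unexpired.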
The account used to access Salesforce requires the following permissions for this integration to work:
- View Setup and Configuration
- View All Data
Integration Details
This plugin extracts Salesforce Standard and Custom Objects and their details (fields, record count, etc.) from a Salesforce instance. It uses the Python library simple-salesforce to authenticate and to call the Salesforce REST API to retrieve these details.
REST API Resources used in this integration
- Versions
- Tooling API Query on objects EntityDefinition, EntityParticle, CustomObject, CustomField
- Record Count
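To make the three resources above concrete, the sketch below builds the request URLs for each of them. It is illustrative only: the API version `v54.0`, the helper names, and the sample SOQL query are assumptions, and the connector's actual queries are not shown here.

```python
from typing import Iterable
from urllib.parse import urlencode

API_VERSION = "v54.0"  # assumption: any version listed by the Versions resource


def versions_url(instance_url: str) -> str:
    """Versions resource: lists the API versions the instance supports."""
    return f"{instance_url.rstrip('/')}/services/data/"


def tooling_query_url(instance_url: str, soql: str) -> str:
    """Tooling API Query resource for a SOQL statement."""
    base = f"{instance_url.rstrip('/')}/services/data/{API_VERSION}/tooling/query/"
    return f"{base}?{urlencode({'q': soql})}"


def record_count_url(instance_url: str, object_names: Iterable[str]) -> str:
    """Record Count resource for a comma-separated list of objects."""
    base = f"{instance_url.rstrip('/')}/services/data/{API_VERSION}/limits/recordCount"
    return f"{base}?{urlencode({'sObjects': ','.join(object_names)})}"


if __name__ == "__main__":
    url = "https://mydomain.my.salesforce.com/"
    print(versions_url(url))
    # Illustrative query only, not necessarily what the connector sends.
    print(tooling_query_url(url, "SELECT QualifiedApiName FROM EntityDefinition"))
    print(record_count_url(url, ["Account", "Lead"]))
```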
Concept Mapping
This ingestion source maps the following Source System Concepts to DataHub Concepts:
| Source Concept | DataHub Concept | Notes | 
|---|---|---|
| Salesforce | Data Platform | |
| Standard Object | Dataset | subtype "Standard Object" | 
| Custom Object | Dataset | subtype "Custom Object" | 
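As an illustration of the mapping above, the sketch below shows the dataset URN an object would get under DataHub's general URN layout. It assumes this connector applies that layout unchanged and prefixes the object name with the platform instance when one is set; `dataset_urn` is a hypothetical helper, not the connector's code.

```python
from typing import Optional


def dataset_urn(
    object_name: str,
    platform: str = "salesforce",
    platform_instance: Optional[str] = None,
    env: str = "PROD",
) -> str:
    """Build a DataHub-style dataset URN for a Salesforce object (sketch)."""
    # Assumption: the platform instance, when set, prefixes the dataset name.
    name = f"{platform_instance}.{object_name}" if platform_instance else object_name
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


if __name__ == "__main__":
    print(dataset_urn("Account", platform_instance="mydomain-dev-ed"))
    print(dataset_urn("Lead"))
```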
Caveats
- This connector has only been tested with Salesforce Developer Edition.
- This connector currently supports only table-level profiling (row and column counts). Row counts are approximate, as returned by the Salesforce RecordCount REST API.
- This integration does not support ingesting Salesforce External Objects.
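The table-level profile mentioned in the caveats can be pictured as a row count plus a column count per object. The sketch below derives such a profile from a RecordCount-shaped payload; the sample values are made up, the output keys `rowCount` and `columnCount` are an assumed naming, and `table_profiles` is a hypothetical helper.

```python
import json
from typing import Dict, Optional

# Made-up payload in the shape returned by the Record Count resource.
SAMPLE_RESPONSE = (
    '{"sObjects": [{"count": 42, "name": "Account"}, {"count": 7, "name": "Lead"}]}'
)


def table_profiles(
    record_count_json: str, field_counts: Dict[str, int]
) -> Dict[str, Dict[str, Optional[int]]]:
    """Combine approximate row counts with per-object field counts (sketch)."""
    payload = json.loads(record_count_json)
    profiles: Dict[str, Dict[str, Optional[int]]] = {}
    for entry in payload.get("sObjects", []):
        name = entry["name"]
        profiles[name] = {
            "rowCount": entry["count"],  # approximate, per the caveat above
            "columnCount": field_counts.get(name),  # None if fields are unknown
        }
    return profiles


if __name__ == "__main__":
    print(table_profiles(SAMPLE_RESPONSE, {"Account": 65, "Lead": 50}))
```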
CLI based Ingestion
Install the Plugin
pip install 'acryl-datahub[salesforce]'
Starter Recipe
Check out the following recipe to get started with ingestion! See below for full configuration options.
For general pointers on writing and running a recipe, see our main recipe guide.
pipeline_name: my_salesforce_pipeline
source:
  type: "salesforce"
  config:
    instance_url: "https://mydomain.my.salesforce.com/"
    username: user@company
    password: password_for_user
    security_token: security_token_for_user
    platform_instance: mydomain-dev-ed
    domain:
      sales:
        allow:
          - "Opportunity$"
          - "Lead$"
    object_pattern:
      allow:
        - "Account$"
        - "Opportunity$"
        - "Lead$"
sink:
  type: "datahub-rest"
  config:
    server: "http://localhost:8080"
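The `allow` entries in the recipe above are regular expressions. The sketch below shows one plausible reading of how allow/deny filtering behaves, assuming patterns are matched from the start of the object name (as with Python's `re.match`) and that deny patterns take precedence; `is_allowed` is a hypothetical helper, not the connector's implementation.

```python
import re
from typing import Iterable


def is_allowed(
    name: str,
    allow: Iterable[str] = (".*",),
    deny: Iterable[str] = (),
    ignore_case: bool = True,
) -> bool:
    """Return True if `name` passes the allow/deny regex filter (sketch)."""
    flags = re.IGNORECASE if ignore_case else 0
    # Assumption: a deny match always wins over an allow match.
    if any(re.match(pattern, name, flags) for pattern in deny):
        return False
    return any(re.match(pattern, name, flags) for pattern in allow)


if __name__ == "__main__":
    allow = ["Account$", "Opportunity$", "Lead$"]
    for obj in ["Account", "AccountHistory", "lead", "Contact"]:
        print(obj, is_allowed(obj, allow=allow))
```

Under these assumptions, the trailing `$` keeps `Account$` from also admitting objects such as `AccountHistory`.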
Config Details
Note that a . is used to denote nested fields in the YAML recipe.
| Field | Description | 
|---|---|
| access_token string | Access token for instance url | 
| auth Enum | Authentication method, one of USERNAME_PASSWORD or DIRECT_ACCESS_TOKEN. Default: USERNAME_PASSWORD | 
| ingest_tags boolean | Ingest Tags from source. This will override Tags entered from UI. Default: False | 
| instance_url string | Salesforce instance URL, e.g. https://MyDomainName.my.salesforce.com | 
| is_sandbox boolean | Connect to a Sandbox instance of Salesforce. Default: False | 
| password string | Password for Salesforce user | 
| platform string | Default: salesforce | 
| platform_instance string | The instance of the platform that all assets produced by this recipe belong to | 
| security_token string | Security token for Salesforce username | 
| username string | Salesforce username | 
| env string | The environment that all assets produced by this connector belong to. Default: PROD | 
| domain map(str,AllowDenyPattern) | Maps each domain key (any string, such as "sales") to allow/deny regex patterns that select the objects assigned to that domain. Multiple domain keys can be specified. Default: {} | 
| domain.key.allow array(string) | |
| domain.key.deny array(string) | |
| domain.key.ignoreCase boolean | Whether to ignore case sensitivity during pattern matching. Default: True | 
| object_pattern AllowDenyPattern | Regex patterns for Salesforce objects to filter in ingestion. Default: {'allow': ['.*'], 'deny': [], 'ignoreCase': True} | 
| object_pattern.allow array(string) | |
| object_pattern.deny array(string) | |
| object_pattern.ignoreCase boolean | Whether to ignore case sensitivity during pattern matching. Default: True | 
| profile_pattern AllowDenyPattern | Regex patterns for profiles to filter in ingestion, allowed by the object_pattern. Default: {'allow': ['.*'], 'deny': [], 'ignoreCase': True} | 
| profile_pattern.allow array(string) | |
| profile_pattern.deny array(string) | |
| profile_pattern.ignoreCase boolean | Whether to ignore case sensitivity during pattern matching. Default: True | 
| profiling SalesforceProfilingConfig | Default: {'enabled': False} | 
| profiling.enabled boolean | Whether profiling should be done. Only table-level profiling is supported at this stage. Default: False | 
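The dotted field names in the table above correspond to nesting in the recipe's `config` block. As a small sketch (the `dotted_paths` helper is hypothetical), the following flattens a nested config dictionary into the same dotted notation:

```python
from typing import Any, Dict, List


def dotted_paths(config: Dict[str, Any], prefix: str = "") -> List[str]:
    """List every leaf field of a nested config using dotted names (sketch)."""
    paths: List[str] = []
    for key, value in config.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict) and value:
            # Descend into non-empty mappings; their keys become path segments.
            paths.extend(dotted_paths(value, path))
        else:
            paths.append(path)
    return paths


if __name__ == "__main__":
    config = {
        "profiling": {"enabled": True},
        "object_pattern": {"allow": ["Account$"], "ignoreCase": True},
        "domain": {"sales": {"allow": ["Lead$"]}},
    }
    for path in dotted_paths(config):
        print(path)
```

Here `domain.sales.allow` is an instance of the `domain.key.allow` row, with `sales` as the domain key.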
The JSONSchema for this configuration is inlined below.
{
  "title": "SalesforceConfig",
  "description": "Any source that is a primary producer of Dataset metadata should inherit this class",
  "type": "object",
  "properties": {
    "env": {
      "title": "Env",
      "description": "The environment that all assets produced by this connector belong to",
      "default": "PROD",
      "type": "string"
    },
    "platform_instance": {
      "title": "Platform Instance",
      "description": "The instance of the platform that all assets produced by this recipe belong to",
      "type": "string"
    },
    "auth": {
      "default": "USERNAME_PASSWORD",
      "allOf": [
        {
          "$ref": "#/definitions/SalesforceAuthType"
        }
      ]
    },
    "username": {
      "title": "Username",
      "description": "Salesforce username",
      "type": "string"
    },
    "password": {
      "title": "Password",
      "description": "Password for Salesforce user",
      "type": "string"
    },
    "security_token": {
      "title": "Security Token",
      "description": "Security token for Salesforce username",
      "type": "string"
    },
    "instance_url": {
      "title": "Instance Url",
      "description": "Salesforce instance url. e.g. https://MyDomainName.my.salesforce.com",
      "type": "string"
    },
    "is_sandbox": {
      "title": "Is Sandbox",
      "description": "Connect to Sandbox instance of your Salesforce",
      "default": false,
      "type": "boolean"
    },
    "access_token": {
      "title": "Access Token",
      "description": "Access token for instance url",
      "type": "string"
    },
    "ingest_tags": {
      "title": "Ingest Tags",
      "description": "Ingest Tags from source. This will override Tags entered from UI",
      "default": false,
      "type": "boolean"
    },
    "object_pattern": {
      "title": "Object Pattern",
      "description": "Regex patterns for Salesforce objects to filter in ingestion.",
      "default": {
        "allow": [
          ".*"
        ],
        "deny": [],
        "ignoreCase": true
      },
      "allOf": [
        {
          "$ref": "#/definitions/AllowDenyPattern"
        }
      ]
    },
    "domain": {
      "title": "Domain",
      "description": "Regex patterns for tables/schemas to describe domain_key domain key (domain_key can be any string like \"sales\".) There can be multiple domain keys specified.",
      "default": {},
      "type": "object",
      "additionalProperties": {
        "$ref": "#/definitions/AllowDenyPattern"
      }
    },
    "profiling": {
      "title": "Profiling",
      "default": {
        "enabled": false
      },
      "allOf": [
        {
          "$ref": "#/definitions/SalesforceProfilingConfig"
        }
      ]
    },
    "profile_pattern": {
      "title": "Profile Pattern",
      "description": "Regex patterns for profiles to filter in ingestion, allowed by the `object_pattern`.",
      "default": {
        "allow": [
          ".*"
        ],
        "deny": [],
        "ignoreCase": true
      },
      "allOf": [
        {
          "$ref": "#/definitions/AllowDenyPattern"
        }
      ]
    },
    "platform": {
      "title": "Platform",
      "default": "salesforce",
      "type": "string"
    }
  },
  "additionalProperties": false,
  "definitions": {
    "SalesforceAuthType": {
      "title": "SalesforceAuthType",
      "description": "An enumeration.",
      "enum": [
        "USERNAME_PASSWORD",
        "DIRECT_ACCESS_TOKEN"
      ]
    },
    "AllowDenyPattern": {
      "title": "AllowDenyPattern",
      "description": "A class to store allow deny regexes",
      "type": "object",
      "properties": {
        "allow": {
          "title": "Allow",
          "description": "List of regex patterns to include in ingestion",
          "default": [
            ".*"
          ],
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "deny": {
          "title": "Deny",
          "description": "List of regex patterns to exclude from ingestion.",
          "default": [],
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "ignoreCase": {
          "title": "Ignorecase",
          "description": "Whether to ignore case sensitivity during pattern matching.",
          "default": true,
          "type": "boolean"
        }
      },
      "additionalProperties": false
    },
    "SalesforceProfilingConfig": {
      "title": "SalesforceProfilingConfig",
      "type": "object",
      "properties": {
        "enabled": {
          "title": "Enabled",
          "description": "Whether profiling should be done. Supports only table-level profiling at this stage",
          "default": false,
          "type": "boolean"
        }
      },
      "additionalProperties": false
    }
  }
}
Code Coordinates
- Class Name: datahub.ingestion.source.salesforce.SalesforceSource
- Browse on GitHub
Questions
If you've got any questions on configuring ingestion for Salesforce, feel free to ping us on our Slack.