Salesforce Data Integration Primer - ETL and SSIS

If you are working with Salesforce, especially if you are building out your first instance, you will likely need to populate Salesforce with at least some of your existing data. That could be from a legacy CRM or from other business systems you wish to enrich your Salesforce data with. Once your Salesforce instance is loaded and launched to production you will likely have ongoing integration needs where you will be sending data to Salesforce from on-prem databases and loading those same databases from Salesforce. While Salesforce does offer some decent reporting capabilities it is likely you'll want to bring your Salesforce data down to your ODS or Data Warehouse where you can join it with the rest of your business data to gain the most meaningful insights.

While there are other ways you can integrate data to and from Salesforce, we are focusing on the Business Intelligence wheelhouse. For example, if you wanted to cascade data in real-time from other source systems to Salesforce, you could build that out in custom development. For different reason you may not need that kind of custom solution and a scheduled, incremental data load would suffice.

Some Considerations

Here are some considerations you will likely wrestle with as you get started and some good to know facts before you tackle your first Salesforce integration project. This is a wide-angle view--we will go into more detail later in this guide. 

  • The different types of integration tools we can use to load and extract data from Salesforce.

  • Data integration is performed by accessing Salesforce APIs. Salesforce API endpoints have a standard and bulk processing mode. Use the bulk API to load large amounts of data without consuming too many API calls. Use the standard API to process small batches and properly report on errors.

  • You have a limited number of APIs you can consume each day and this is dependent upon how many users you have licensed in Salesforce.

  • There is a canned report in Salesforce that will tell you how many APIs each user has consumed each day for the last seven days. This is helpful to track how many APIs your data integration process is consuming.

  • When updating data in Salesforce you must use the Salesforce Internal Id (Id). If you want to upsert (insert/update) data to a table in Salesforce you need to create an External Id that is unique in the target table.

  • The lookup data type in Salesforce can only reference another field using the Salesforce Internal Id (Id). You cannot use an External Id or other field.

  • Get familiar with the way Person is modeled in Salesforce. You can have the Individual model or the Person Account model. How Person is modeled will affect how you relate data in Salesforce.

  • In the person account model, contact records are stored in the account table as well as the contact table. In order to relate account contacts to their parent account in the account table, you need to use the AccountContactRelation table in Salesforce.

  • Get familiar with the Setup section in the Salesforce application, this is where you can learn a lot about your environment. Included here are things like the object model in a nice customizable visualization, bulk data load tracking and history, error logs, and feature settings. Make sure you have access to Setup in Salesforce.

Integration Tools

As you start down the road of data integration with Salesforce its good to know what tools you have available to you. Below is an overview of some of the more common tooling for data integration with Salesforce.

Data Loader - The first and most common data integration tool you will hear about when getting started with Salesforce is the Salesforce Data Loader. This is a free tool Salesforce provides that allows users to load files to Salesforce and also export Salesforce data as a file. While you cannot automate work through the Data Loader tool, I would probably install it anyway and get familiar with it. There is a good chance that somebody in your organization will use Data Loader at some point so it's good to know what it can do.

SSIS Integration - Out of the box, SQL Server Integration Services (SSIS) does not support connectivity to Salesforce but there are a number of companies providing tools that will give you Salesforce connectivity inside of SSIS. Most add-in packages for Salesforce include Salesforce data connectors, Source, Destination, and Lookup components which provide all the integration points you will need for your project. Products we have tested and like are SentryOne SSIS Task Factory, CozyRoc SSIS, Devart Data Flow Components, and ZappySys SSIS Components. The cost of these tools are relatively cheap, $300 - $2000 for a server license. The client tools are free though so you can use and test any of these on your workstation. Of the add-ins mentioned above we liked the SentryOne SSIS Task Factory best.

Because of its ubiquity, community support, free license with SQL Server, and ease of use, SSIS can be a great option. Coupled with some cheap Salesforce add-ins, the cost of integration tooling barely registers a blip on the technology budget radar. If you don't already have Microsoft SQL server in your organization, you might find better options out there.

Apex - If you don't have a data integration or BI developer on staff you can always turn to your app devs for your integration needs. Apex is an object-oriented programming language used exclusively for Salesforce. Apex allows you to execute flow and transaction control statements in Salesforce in conjunction with calls to the API. There are some things, like complex workflows that might require some custom development with your app dev team, but day to day integration tasks are probably better kept with your BI and data teams.

iPaaS - Integration Platform as a Service (iPaaS) is new to a lot of organizations but if you already make use of one--Boomi, Mulesoft, Jitterbit, et al, then you already know what it can do. Without getting too deep in what iPaaS is and what it can do for you, I will say that it can be a great tool for building data integration workflows with Salesforce as well as building complex data integration workflows across multiple disparate systems. Out of the box iPaaS tools have everything you need to integrate with Salesforce. iPaaS isn't cheap though, so if your only business use case is Salesforce integration there are probably better options out there.

Skyvia - As a standalone integration tool, Skyvia can help you will all integrations related to Salesforce. Skyvia is produced by Devart a company that also produces the Devart SSIS components for SSIS. I don't have a lot of experience with this tool so I don't have much to say about it. All I know is that this company is based out of Ukraine and so reaching them during business hours in the west can be a challenge. Aside from that, the feedback on this product is pretty good and worth a look if it makes sense for your environment. Pricing is based on number of records processed and starts at $4800 for 10 million records processed per month.

Capstorm - Is a Salesforce data backup and recovery tool that can also be used to replicate data in and out of Salesforce. I don't think you would use this in most common data integration scenarios, but it does have its uses. Worth a look if you also plan on implementing a robust backup solution--it is a leader in that space.

Now that we know what Salesforce integration options are available to us, let's move on to some of the more nuanced discussions around getting data in and out of Salesforce.

Data Integration and Salesforce APIs

The way you integrate data with Salesforce is through one of the Salesforce APIs. You will use either the standard API (REST or SOAP) or the bulk API and it's important you understand the difference between them. It doesn't matter what integration tooling you use, you still connect to the same APIs and you still have the same constraints of each of those. When thinking about common, schedule-based data integration processes you need to choose how you will connect to Salesforce, with the standard or bulk API.

The reason why this is so important is because of the way Salesforce licenses API calls to its system. For each user license you have in Salesforce you are allowed 1000 API calls per day. That includes the API calls those users are making through the Salesforce application. If you are pushing the limits of your daily API cap, the standard API might not be the way to go for your organization. The standard APIs are restricted to much smaller batch sizes compared to the bulk API resulting in higher API counts per data integration operation. The bulk API allows for much larger batch sizes, 5k + per batch. This is an important consideration when you are getting in to data integration with Salesforce and guess what, those API counts are not the only thing you need to be concerned with when using the different APIs.

There is another key difference between these types of APIs that you need to be aware of. When working with the standard API you are basically getting row by row feedback for your data integration operations so if a record fails and the job fails with it, you get that feedback right away whether that is in Data Loader, SSIS, Apex, or iPaaS. The bulk API is different though, instead of going row by row the bulk API produces a large file per batch and then submits that to Salesforce for processing. That means that once you have submitted that batch through the Bulk API failures will not be reported back to the application you submitted them with. In Salesforce there is a section where you can see the status of your bulk API jobs, they can be found here: Setup>Platform Tools>Environments>Jobs>Bulk Data Load Jobs. This is important to note because data loads using the bulk API can feel like a black box unless you know where to look for status on the load you submitted. This is also where you can find errors related to your data loads so you know what needs to be corrected.

It might make more sense to use the bulk API for large loads and use the standard for everything else. If you can build a good incremental workflow this should set you up to use the standard API most of the time. It might just be for those initial loads that you would use the bulk API. Your mileage may vary but these are things you want to have on your radar. Also, don't forget to track down the canned Salesforce report on API usage. This will show you the API usage by user by day for the last seven days.

DML Operations in Salesforce

When it comes time to actually update the data in Salesforce there are several rules you need to follow. There are different DML operations you can access in Salesforce such as insert, update, delete, upsert, and merge. One thing to note about merge, this is not the same as the merge in SQL Server so if you are looking to update/insert in the same operation you can use the upsert option in Salesforce. There are some other nuances about the different DML types you should know.

If you are updating data in Salesforce you need to do that on a Salesforce Internal Id. A Salesforce internal Id is what Salesforce uses to uniquely identify a row in a table in their database-- this as an auto generated identity column, each time a row is created a new Salesforce internal Id is created to represent uniqueness in the table. The only way you can update (not upsert) a Salesforce table is by using this Salesforce internal Id, you cannot update data without it. You will probably end up with some kind of workflow that brings Salesforce data in to your local environment, uses an External Id to join to you other source data, and then uses the Salesforce Internal Id you brought down to do an update, Or…

If you need to upsert (update/insert) data you can't do this on the Salesforce internal Id, for this you will need a Salesforce external Id. A Salesforce external Id is a way for your business to uniquely identify a row in a Salesforce table using a business key, a value that has meaning to the business. You will select and or create a field in the desired table in Salesforce and designate that field an ExternalId, and while it is not required I would also recommend setting the value as unique because you can't upsert on the table if the value is not unique. Once you have the unique ExternalId configured for your target table you can start upserting to the table.

Establishing Data Relations in Salesforce

Loading data is only the beginning, you need to make sure all the data you are loading is properly relating across the Salesforce instance. One of the areas you might get hung up on early is the Person model in Salesforce. Salesforce has different ways you can model Person - the Individual and the Person Account model. I would encourage you to go out and read more about this, Salesforce has some good documentation around it. What I would like to talk about here is the Person Account model, how that works and how you build relationships with the Person Account model.

In the Person Account model, contacts are stored in both the Account table as well as the Contact table. When you load an account contact to the account table, a contact record is also created in the Contact table in Salesforce automatically. In the Account table you can have many different types of account so let's assume we are working with two types for this example, an Account Household and Account Household Contact (members of the household). Once we load both the Account Household and the Household Contact next we need to tell Salesforce they are related and to do that we need to use the AccountContactRelation table in Salesforce. The AccountContactRelation table is a bridge table/junction object in Salesforce that connects a contact record to an account record using the Salesforce Internal Ids of both the account and the contact. This is the primary way you will relate contacts and accounts in Salesforce in the Person Account model.

Other ways you can relate data in Salesforce is to create or make use of lookup fields. A lookup field is a data type in Salesforce that expects a Salesforce Internal Id on which to lookup/relate the value for. Lookup fields are only compatible with Salesforce Internal Ids, they will not work with an External Id or other field types. Imagine you have an Account type of Registered Investment Advisor (RIA) and you have Advisors who work for this RIA. If you want to relate the Advisor to the RIA you might have a lookup field named Advisor in the Account table that is a lookup data type. This uses the Salesforce Internal Id for the Advisor which provides a lookup for the Advisor record in the RIA record thus building a kind of relationship between the two record types.

You can also build your own junction objects for many-to-many relationships but for one-to-many you can use the lookup fields.

Setup and Troubleshooting

Inevitably you are going to run in to issues during your Salesforce integration project so you need to know where to look. The best resource is in the Setup interface in Salesforce which can be reached by clicking the gear in the top right corner of your Salesforce application and clicking the Setup option. Make sure you have the correct permissions to see this. Once you are in the Setup section you should have access to the Following:

Schema Builder - This is where you can find the relational model (object model) for your Salesforce instance. This has a nice customizable interface that will give you greater insight in to the database relations in Salesforce. Can be found here:

o    Setup>Platform Tools> Objects and Fields>Schema Builder

Object Manager - This is where you can view the fields and attributes configured for Salesforce tables (objects). This can be found here:

o    Setup>Platform Tools> Objects and Fields>Object Manager

Bulk Data Load Jobs - If you are running bulk API jobs you will want to know more about the execution of those jobs and to do that you will need to go to the Setup section that shows active jobs and a history of all jobs run including an errors in those jobs. These can be found here:

o    Setup>Platform Tools>Environments>Jobs>Bulk Data Load Jobs

API Usage Report  - This is something you will need to keep an eye on. Here are instructions for viewing or creating this report: https://help.salesforce.com/articleView?id=000326126&type=1&mode=1

Conclusion

This should provide a good primer to get you up to speed with data integration in Salesforce. Once you know all the rules and how to make it work, it really is quite easy.