Over the past few years, the technology industry has exploded, with a new record-breaking IPO seemingly every year. Businesses are not slowing down either. Corporate spending keeps increasing as companies allocate more and more of their budgets to strategic initiatives and digital transformation.
Gartner forecasts that worldwide IT spending will reach $4.2 trillion in 2021, an 8.6 percent increase over 2020, with enterprise software accounting for about 14.2 percent of that total.
Gone are the days of expensive hardware. Nearly every company today uses a variety of SaaS tools that have replaced traditional back-end systems. For example, sales teams typically use a customer relationship management (CRM) system, finance teams use enterprise resource planning (ERP) systems, human resources teams use human resource information systems (HRIS), and support teams use customer support (CS) tools. Common examples include HubSpot, Salesforce, NetSuite, SAP, Workday, Gainsight, and Zendesk, and the list could go on and on.
What is SaaS?
Simply put, SaaS (software as a service) is a software delivery and licensing model in which the software runs on the vendor's servers rather than on a company's own internal servers. SaaS tools are usually built on top of major cloud service providers (e.g., AWS, Azure, and GCP), and in most cases the software is provided through a subscription or license. Users typically access SaaS tools through a web browser, logging in with a username and password, instead of installing software on their own computers. With SaaS tools, you don’t have to worry about the infrastructure or maintenance needed to keep the service running.
SaaS tools also tend to have lower upfront costs, since most vendors offer consumption-based or flat-rate pricing. In addition, the largest SaaS companies constantly roll out new features and updates to improve their products and win more customers. SaaS tools have one big problem, though. Every SaaS application holds its own unique dataset, so each new tool a company adopts creates another data silo. Because each tool is built for a specific purpose, most of them do not integrate easily with other tools.
What is SaaS integration?
SaaS integrations emerged as a way to combat the problem of data silos. Simply put, a SaaS integration connects a SaaS application to another cloud-based application or to an on-premises program through an API. Once the integration is in place, both apps can request and share data with each other. The APIs themselves, however, only provide the blueprint for connecting applications.
To actually link two tools, a company has to build the integration, which is why building data pipelines often takes up more of a data engineering team's time than anything else, even though that is not usually what the team was set up to do. The work is slow and difficult because a mid-sized company often runs hundreds of applications and SaaS tools. And since most SaaS vendors constantly roll out updates and new features, these integrations and pipelines break easily. Even worse, the data usually arrives in its raw, untransformed state, so it is not usable as-is.
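To make the brittleness concrete, here is a minimal sketch of a hand-built point-to-point pipeline. The source field names (FirstName, Email, ARR__c), the FIELD_MAP mapping, and the destination shape are all hypothetical; real CRM and marketing APIs use their own schemas. The pipeline renames raw fields into the destination's schema, cleans up a raw value, and fails loudly when a vendor update drops or renames a field it depends on.

```python
from typing import Any

# Hypothetical mapping from a source CRM's raw field names to the
# field names a destination tool expects. Real vendor schemas differ.
FIELD_MAP = {
    "FirstName": "first_name",
    "Email": "email",
    "ARR__c": "annual_recurring_revenue",
}


class SchemaDriftError(Exception):
    """Raised when a source payload no longer has the fields the pipeline expects."""


def transform_record(raw: dict[str, Any]) -> dict[str, Any]:
    # Fail loudly if a vendor update removed or renamed a field we depend on.
    missing = [name for name in FIELD_MAP if name not in raw]
    if missing:
        raise SchemaDriftError(f"source payload is missing fields: {missing}")
    record = {dest: raw[src] for src, dest in FIELD_MAP.items()}
    # Raw values usually need cleanup before they are usable downstream.
    record["email"] = str(record["email"]).strip().lower()
    return record


def run_pipeline(raw_records: list[dict[str, Any]]) -> list[dict[str, Any]]:
    return [transform_record(raw) for raw in raw_records]


print(run_pipeline([{"FirstName": "Ada", "Email": " ADA@Example.com ", "ARR__c": 1200}]))
# [{'first_name': 'Ada', 'email': 'ada@example.com', 'annual_recurring_revenue': 1200}]
```

Every pair of tools needs its own version of this mapping, and every vendor schema change can raise SchemaDriftError, which is where the maintenance burden comes from.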
SaaS Integration Solutions
To overcome these problems, companies have turned to a variety of data integration solutions:
1. iPaaS (Integration Platform as a Service)
iPaaS solutions transfer data directly between applications and perform little or no transformation along the way. They usually provide a visual interface for building integrations. If a SaaS vendor exposes an API endpoint, an iPaaS solution can push data to it or pull data from it.
In general, iPaaS solutions perform actions when a trigger condition is met. Simply put, when an event occurs in one system, the event data is passed to another application via an API or a webhook, and that application then performs one or more predefined actions.
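The trigger-action pattern can be sketched in a few lines. This is not any iPaaS vendor's API: the Workflow class, the event names, the webhook payload shape, and the create_onboarding_task action are hypothetical stand-ins for what a visual builder would configure.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

Event = dict[str, Any]


@dataclass
class Workflow:
    """Hypothetical trigger-action workflow, like one built in an iPaaS visual editor."""

    trigger: Callable[[Event], bool]
    actions: list[Callable[[Event], None]] = field(default_factory=list)

    def handle(self, event: Event) -> bool:
        # Run every action only when the trigger condition matches the incoming event.
        if not self.trigger(event):
            return False
        for action in self.actions:
            action(event)
        return True


# Stand-in for a destination app; a real action would call that app's API.
onboarding_tasks: list[dict[str, str]] = []


def create_onboarding_task(event: Event) -> None:
    onboarding_tasks.append({"email": event["email"], "task": "Kick off onboarding"})


closed_won = Workflow(
    trigger=lambda e: e.get("type") == "deal.updated" and e.get("stage") == "closed_won",
    actions=[create_onboarding_task],
)

# Webhook payloads as they might arrive from a CRM (shape is hypothetical).
closed_won.handle({"type": "deal.updated", "stage": "closed_won", "email": "ada@example.com"})
closed_won.handle({"type": "deal.updated", "stage": "negotiation", "email": "bob@example.com"})
print(onboarding_tasks)
# [{'email': 'ada@example.com', 'task': 'Kick off onboarding'}]
```

Each workflow carries one event from A to B and shares no data model with any other workflow, which is why a large collection of them gets hard to maintain.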
Essentially, all iPaaS solutions work the same way: they send data from point A to point B, or vice versa. Because these point-to-point connections amount to large collections of workflows, they can become even harder to maintain than the in-house integrations built by data engineering teams.
Additionally, iPaaS solutions are designed to handle only simple data, so companies cannot easily send more complex information, such as ARR (annual recurring revenue) or product usage data, to combine with other data in a marketing platform like Marketo or HubSpot. In essence, iPaaS solutions are imperative: they need to be told exactly what to do and how to do it. Data integrity should be a must.
2. CDPs (Customer Data Platforms)
In its simplest form, a CDP collects customer data from different sources, combines it into a single repository, and then sends that information to different destinations. CDPs provide much more functionality than iPaaS solutions.
CDPs give marketers and growth teams the ability to aggregate and combine data from different sources to create segments based on user behavior and attributes and then sync those segments directly back into third-party tools to build personalized experiences without having to rely on data engineering teams.
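A toy version of what a CDP does might look like the following: merge records from several tools into one profile per customer, then build a segment from behavior and attributes. The source data and field names are hypothetical, and keying profiles on a normalized email is a simplifying assumption; real CDPs use more elaborate identity resolution and would then sync the segment to a destination tool.

```python
from collections import defaultdict
from typing import Any

# Hypothetical records from three different tools.
crm = [
    {"email": "Ada@Example.com", "plan": "free"},
    {"email": "bob@example.com", "plan": "pro"},
]
web_analytics = [
    {"email": "ada@example.com", "event": "pricing_page_view"},
    {"email": "bob@example.com", "event": "pricing_page_view"},
    {"email": "ada@example.com", "event": "pricing_page_view"},
]
support_desk = [{"email": "bob@example.com", "open_tickets": 2}]


def unify(*sources: list[dict[str, Any]]) -> dict[str, dict[str, Any]]:
    # Merge records into one profile per customer, keyed on a normalized email.
    profiles: dict[str, dict[str, Any]] = defaultdict(lambda: {"events": []})
    for source in sources:
        for record in source:
            profile = profiles[record["email"].strip().lower()]
            for name, value in record.items():
                if name == "event":
                    profile["events"].append(value)  # behavior
                elif name != "email":
                    profile[name] = value  # attributes
    return dict(profiles)


profiles = unify(crm, web_analytics, support_desk)

# Segment: free-plan users who viewed the pricing page at least twice.
upsell_segment = sorted(
    email
    for email, profile in profiles.items()
    if profile.get("plan") == "free" and profile["events"].count("pricing_page_view") >= 2
)
print(upsell_segment)
# ['ada@example.com']
```

Note how the profile is hardwired around a single customer with attributes and events; anything that does not fit that shape has no natural home.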
CDPs were created specifically for marketing and exist to eliminate the friction that often exists between marketing and data teams. However, CDPs typically rely on very limited, predefined data models centered on users and accounts, while every company has its own unique entities (such as subscriptions, gigs, products, playlists, or artists) that these models cannot easily represent.
CDPs also do not integrate well with other technologies. For example, when it comes to data transformation, organizations are limited to the native integrations and product features built into the CDP and cannot use other tools on top of it.
3. ETL (Extract, Transform, Load)
ETL is a relatively old data integration process that dates back to the 1970s. An ETL tool extracts data from first-party databases and third-party sources, transforms it to meet the needs of analysts and data scientists, and then loads it into a data warehouse.
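A minimal ETL sketch follows, using Python's built-in sqlite3 module as a stand-in for the warehouse. The order records and the cents-to-dollars transformation are illustrative assumptions. The key point is the ordering: the data is reshaped in application code before anything is loaded.

```python
import sqlite3
from typing import Any


def extract() -> list[dict[str, Any]]:
    # Stand-in for reading from a first-party database or a third-party API.
    return [
        {"order_id": 1, "customer": "ADA@example.com", "amount_cents": 1999},
        {"order_id": 2, "customer": "bob@example.com", "amount_cents": 500},
    ]


def transform(rows: list[dict[str, Any]]) -> list[tuple[int, str, float]]:
    # Reshape the data for analysts *before* it reaches the warehouse.
    return [(r["order_id"], r["customer"].lower(), r["amount_cents"] / 100) for r in rows]


def load(conn: sqlite3.Connection, rows: list[tuple[int, str, float]]) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount_usd REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(extract()))

(total,) = warehouse.execute("SELECT SUM(amount_usd) FROM orders").fetchone()
print(round(total, 2))  # 24.99
```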
Tools like Informatica popularized this data integration methodology, but it causes a problem: the data ends up stuck in the data warehouse. Data warehouses are good for deriving insights and generating reports, but they are far less practical for using that data to build targeted campaigns and personalized customer experiences in the way CDPs and iPaaS solutions are used.
4. ELT (Extract, Load, Transform)
All of these problems have given rise to a new approach: ELT (extract, load, transform). It has been largely enabled by innovation in cloud data warehousing, where solutions like Snowflake and BigQuery have become very efficient and reliable for analytics. ELT tools like Fivetran have made it much easier for companies to move data from different sources into a data warehouse.
As a cloud-native SaaS solution, Fivetran provides nearly 200 prebuilt connectors for various data sources and SaaS applications. These connectors handle the “E” and “L” of ELT, automating the entire data pipeline process for engineers. Meanwhile, dbt (data build tool) has revolutionized the “T” in ELT with a tool that runs on top of the data warehouse and transforms data with SQL.
With dbt, companies can create reusable data models to organize and transform their data. ELT is best understood as the approach that powers the modern data warehouse: combining ELT with the warehouse changed the entire data stack by eliminating data silos.
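By contrast, in ELT the raw data is loaded first and the “T” runs inside the warehouse as SQL. This sketch again uses sqlite3 as a stand-in warehouse. The raw_subscriptions table and the customer_arr model are hypothetical, and the view only mimics the idea of a dbt model; in an actual dbt project the SELECT would live in its own .sql model file and dbt would materialize it.

```python
import sqlite3

warehouse = sqlite3.connect(":memory:")

# Extract + Load: copy raw records into the warehouse untouched.
warehouse.execute(
    "CREATE TABLE raw_subscriptions (account_id TEXT, plan TEXT, monthly_price REAL, status TEXT)"
)
warehouse.executemany(
    "INSERT INTO raw_subscriptions VALUES (?, ?, ?, ?)",
    [
        ("acme", "pro", 100.0, "active"),
        ("acme", "addon", 20.0, "active"),
        ("globex", "pro", 100.0, "canceled"),
        ("initech", "starter", 25.0, "active"),
    ],
)

# Transform: a reusable model defined in SQL, in the spirit of a dbt model.
warehouse.execute(
    """
    CREATE VIEW customer_arr AS
    SELECT account_id, SUM(monthly_price) * 12 AS arr
    FROM raw_subscriptions
    WHERE status = 'active'
    GROUP BY account_id
    """
)

for row in warehouse.execute("SELECT account_id, arr FROM customer_arr ORDER BY account_id"):
    print(row)
# ('acme', 1440.0)
# ('initech', 300.0)
```

Because the raw table is kept, the model can be redefined later without re-extracting anything from the source systems.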
However, in getting rid of data silos, the data warehouse has itself become, in effect, a data silo. Data warehouses are useful for creating dashboards and reports, which are usually built with business intelligence tools like Power BI, Looker, and Tableau. But neither ELT nor data warehousing addresses the problem SaaS integrations are really meant to solve: pushing data back into the tools that non-technical business users work in.
The new model (Reverse ETL)
This is exactly the problem that Reverse ETL solves. “Reverse ETL is the process of copying data from a cloud data warehouse (such as Amazon Redshift, Google BigQuery, Snowflake, Azure Synapse, etc.) to operational systems of record, including but not limited to SaaS tools used for growth, marketing, sales, and support.” Whereas ETL and ELT read from sources and write to the warehouse, Reverse ETL reads from the warehouse and writes back to operational tools.
A data warehouse already contains data from every source across the organization, so it makes sense to treat it as the single source of truth. With Reverse ETL, companies can take their existing data models (rate of change, lifetime value, workspaces created, etc.) and sync them directly to their destination of choice in near real time. Syncs can be run manually or on a schedule (for example, every few minutes, hourly, or daily).
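A minimal Reverse ETL sketch: read a model from the warehouse (sqlite3 again stands in), compare it with what was sent on the previous run, and push only new or changed rows to the destination. The account_health table, its columns, and the FakeCrm destination are hypothetical; a real sync would call the destination's API with batching, rate limiting, and retries, and a scheduler would decide when each run happens.

```python
import sqlite3
from typing import Any

Row = dict[str, Any]


class FakeCrm:
    """Hypothetical destination; a real one would wrap the SaaS tool's API."""

    def __init__(self) -> None:
        self.records: dict[str, Row] = {}
        self.upsert_calls = 0

    def upsert(self, key: str, fields: Row) -> None:
        self.upsert_calls += 1
        self.records.setdefault(key, {}).update(fields)


def read_model(conn: sqlite3.Connection) -> dict[str, Row]:
    # Read the warehouse model that acts as the single source of truth.
    cursor = conn.execute("SELECT email, lifetime_value, workspaces_created FROM account_health")
    return {
        email: {"lifetime_value": ltv, "workspaces_created": workspaces}
        for email, ltv, workspaces in cursor
    }


def sync(conn: sqlite3.Connection, destination: FakeCrm, last_synced: dict[str, Row]) -> dict[str, Row]:
    current = read_model(conn)
    for key, fields in current.items():
        # Push only rows that are new or changed since the previous run.
        if last_synced.get(key) != fields:
            destination.upsert(key, fields)
    # The snapshot becomes the baseline for the next run (manual or scheduled).
    return current


warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE account_health "
    "(email TEXT PRIMARY KEY, lifetime_value REAL, workspaces_created INTEGER)"
)
warehouse.executemany(
    "INSERT INTO account_health VALUES (?, ?, ?)",
    [("ada@example.com", 1440.0, 3), ("bob@example.com", 300.0, 1)],
)

crm = FakeCrm()
state = sync(warehouse, crm, {})  # first run pushes both rows
warehouse.execute("UPDATE account_health SET workspaces_created = 4 WHERE email = 'ada@example.com'")
state = sync(warehouse, crm, state)  # second run pushes only Ada's change
print(crm.upsert_calls)  # 3
```

Diffing against the last snapshot keeps API calls proportional to what changed rather than to the size of the model. This sketch keeps the snapshot in memory and ignores rows deleted from the model; a production tool would persist the state and handle deletions.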
This removes friction between data engineers and business teams: data engineers can finally focus on the work they were hired for, and business teams can access the data they need in their native tools. Democratizing warehouse data in this way extends a single source of truth to every operational system and SaaS application.
Every data integration solution should be declarative, with teams describing the outcome they want rather than scripting each step, to ensure alignment across teams and keep everyone working toward the same goals.
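To illustrate what declarative means here: a team writes down what should be true (which model feeds which destination, keyed on what, with which field mapping), and a generic engine works out how to produce it. The SPEC format and the plan_sync function below are hypothetical and do not reflect any product's configuration language.

```python
from typing import Any

# Declarative: a hypothetical spec that states the desired outcome, not the steps.
SPEC = {
    "model": "account_health",
    "destination": "crm.contacts",
    "key": "email",
    "field_map": {"lifetime_value": "LTV", "workspaces_created": "Workspaces"},
}


def plan_sync(spec: dict[str, Any], rows: list[dict[str, Any]]) -> list[dict[str, Any]]:
    # A generic engine turns any spec of this shape into destination records.
    # "model" and "destination" would tell it where to read and write; omitted here.
    key = spec["key"]
    return [
        {key: row[key], **{dest: row[src] for src, dest in spec["field_map"].items()}}
        for row in rows
    ]


rows = [{"email": "ada@example.com", "lifetime_value": 1440.0, "workspaces_created": 3}]
print(plan_sync(SPEC, rows))
# [{'email': 'ada@example.com', 'LTV': 1440.0, 'Workspaces': 3}]
```

Changing what gets synced means editing a spec that any team can read and review, rather than rewriting the steps of a workflow.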