Connecting people with their data
The soft side of Tably tech
User Friendly Setup Flows
If you want people to use your tool for data work, they will need to connect your tool to their data. You can build the most beautiful and performant tool in the world, but if people can’t connect it to their data, they will not use it.
There are two extremes when it comes to connecting data tools together. At one extreme is dbt, which takes the “configuration as code” approach and targets the data-engineer persona. At the other extreme is Fivetran, which provides graphical setup flows for connecting tools together and holds people's hands throughout the process.
Below is a screenshot of Fivetran’s Amazon RDS PostgreSQL setup flow. It is mostly the same as their stand-alone PostgreSQL flow, but it includes screenshots of each step that needs to be taken in the AWS UI. This is the standard that we should be aiming for with our own setup flows.
Why not just use Fivetran?
At Tably, we want to create an experience that is as reassuring as Fivetran’s import flows. We want to be able to connect as many people to their data as possible, as seamlessly as possible, and with as few surprises and as little technical jargon as possible.
We don’t just want to provide a lowest-common-denominator connection to all data sources though. We want to serve the long tail of people’s data needs, but there will be some sources of data that deserve special attention. When we build the most beautiful and performant tool in the world, we want to have beautiful and performant ways to connect people to their data.
This means we need to be able to develop even higher quality connectors than Fivetran. We are best placed to do this if we control the setup flows that people use to connect to their data. This gives us the freedom to swap in our own implementation, even after things have been set up.
Serving the long tail
If we’re not using Fivetran, what are we using?
For some services that are critical to people’s success with our platform, we have our own in-house connectors.
To serve the long tail of disparate places where people’s data lives, we are standing on the shoulders of giants and using Airbyte. Airbyte defines a unified API for connecting to various data sources, and provides connectors for over a hundred different services. These connectors are implemented as Docker containers, which makes them easy to sandbox. They are also open source, which makes them easy to customize.
What should our setup flows do?
Each Airbyte connector provides us with a standardized description of the values required for setting things up. This takes the form of a JSONSchema document.
It is possible to go directly from JSONSchema to a web form, using something like the react-jsonschema-form library. Below is a screenshot of the Airbyte PostgreSQL schema, as rendered by the react-jsonschema-form playground. It is a good first attempt, but it doesn’t understand Airbyte-specific annotations like airbyte_secret. It also doesn’t handle the tunnel_method field properly, because different Airbyte connectors use oneOf in several different ways, and the way that tunnel_method is encoded doesn’t match what react-jsonschema-form expects. Airbyte also embeds HTML inside some of its description fields, but this is not shown here.
The main problem with this form though is that it isn’t helpful. It doesn’t contain any of the richer setup instructions that Fivetran’s example does. You have to link out to the Airbyte documentation for that.
It also asks for many pieces of information that are not needed for connecting to PostgreSQL in 80% of cases. Can we do more work behind the scenes to help out? Can we work out which fields cover 80% of people’s needs, ask only for those, and then opportunistically try to connect to people’s data? This would let us show the advanced options only to people who need to provide them.
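One way to picture this "essentials first" idea is the sketch below. It is only an illustration under stated assumptions, not a description of how Tably works: Field, NextStep, next_step and the try_connect callback are all hypothetical names, and a real flow would pass the values people typed in rather than just field names.

```rust
/// Hypothetical description of one setup field. `essential` marks the
/// fields we guess are enough for most people to connect.
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: &'static str,
    essential: bool,
}

/// What the setup flow should do after the first connection attempt.
#[derive(Debug, PartialEq)]
enum NextStep {
    /// The essential fields were enough.
    Connected,
    /// The attempt failed, so reveal these advanced fields.
    ShowAdvanced(Vec<&'static str>),
}

/// Try to connect using only the essential fields. Only when that fails
/// do we fall back to showing the advanced options.
/// `try_connect` is a stand-in for a real connection check.
fn next_step(fields: &[Field], try_connect: impl Fn(&[&'static str]) -> bool) -> NextStep {
    let essential: Vec<&'static str> = fields
        .iter()
        .filter(|f| f.essential)
        .map(|f| f.name)
        .collect();

    if try_connect(&essential) {
        NextStep::Connected
    } else {
        let advanced = fields
            .iter()
            .filter(|f| !f.essential)
            .map(|f| f.name)
            .collect();
        NextStep::ShowAdvanced(advanced)
    }
}
```

Calling next_step with a host field marked essential and tunnel_method marked advanced would only surface tunnel_method if the first attempt fails. Which fields count as essential is exactly the per-connector judgment call the questions above are asking about.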
How do we make a flow that is even more helpful than Fivetran’s RDS flow?
Another thing to call out here is that we will be doing this for over a hundred different services, and we will probably want to give helpful hints for specific situations (like how Fivetran has done for PostgreSQL). We probably don’t want to be editing “HTML inside JSONSchema” for over a hundred forms, and we don’t want to inflict this task on technical writers either. We also probably don’t need the full expressiveness that JSONSchema gives us. We want our technical writers to be editing something that more closely maps to what people will see on tably.com.
Following the rule of least power, if we can restrict our descriptions to a format that more closely models our domain, we will be able to provide a better experience for our technical writers. Our frontend code will also be working with more constrained types, with fewer runtime edge-cases.
What we want is a Domain Specific Language.
Building our Domain Specific Language
We started by defining a set of types that closely match the forms we want to produce. We soon had a test suite that could translate the JSONSchema from each of our Airbyte connectors into our intermediate EditableForm representation. Once we had that, we also made each EditableForm generate configurations that it thought were valid, and checked them back against the JSONSchema. This gave us confidence that our types were as simple as they could be, while covering all of the cases that we were seeing in Airbyte.
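The round-trip check described above can be sketched roughly as follows. This is a deliberately tiny model and not the real EditableForm: Property, SchemaType, Widget, FormField, Value and the three functions are all assumptions made for illustration, and the schema is reduced to a flat list of string and integer properties rather than real JSONSchema.

```rust
// Hypothetical, simplified stand-ins; none of these are Tably's or Airbyte's real types.

/// The slice of JSONSchema this sketch understands.
#[derive(Debug, Clone, Copy, PartialEq)]
enum SchemaType {
    String,
    Integer,
}

/// One property from a connector's schema.
#[derive(Debug, Clone)]
struct Property {
    name: String,
    ty: SchemaType,
    airbyte_secret: bool,
    required: bool,
}

/// The input the form shows for a field.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Widget {
    Text,
    Password,
    Number,
}

#[derive(Debug, Clone, PartialEq)]
struct FormField {
    name: String,
    widget: Widget,
    required: bool,
}

#[derive(Debug, Clone, PartialEq)]
struct EditableForm {
    fields: Vec<FormField>,
}

#[derive(Debug, Clone, PartialEq)]
enum Value {
    Str(String),
    Int(i64),
}

/// Translate schema properties into the form representation.
fn to_form(props: &[Property]) -> EditableForm {
    let fields = props
        .iter()
        .map(|p| FormField {
            name: p.name.clone(),
            widget: match (p.ty, p.airbyte_secret) {
                (SchemaType::String, true) => Widget::Password,
                (SchemaType::String, false) => Widget::Text,
                (SchemaType::Integer, _) => Widget::Number,
            },
            required: p.required,
        })
        .collect();
    EditableForm { fields }
}

/// Generate a configuration that the form believes is valid
/// (placeholder values for every required field).
fn sample_config(form: &EditableForm) -> Vec<(String, Value)> {
    form.fields
        .iter()
        .filter(|f| f.required)
        .map(|f| {
            let value = match f.widget {
                Widget::Text | Widget::Password => Value::Str("example".to_string()),
                Widget::Number => Value::Int(1),
            };
            (f.name.clone(), value)
        })
        .collect()
}

/// Check a configuration back against the original schema properties.
fn validate(props: &[Property], config: &[(String, Value)]) -> Result<(), String> {
    for p in props {
        let found = config.iter().find(|(name, _)| *name == p.name);
        match (found, p.ty) {
            (None, _) if p.required => return Err(format!("missing {}", p.name)),
            (None, _) => {}
            (Some((_, Value::Str(_))), SchemaType::String) => {}
            (Some((_, Value::Int(_))), SchemaType::Integer) => {}
            (Some(_), _) => return Err(format!("wrong type for {}", p.name)),
        }
    }
    Ok(())
}
```

The property being tested is that validate(props, &sample_config(&to_form(props))) succeeds for every connector's schema. If the form types lose information the schema needs, the generated configuration fails validation and the gap shows up in the test suite.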
In parallel with this, we worked on building out the UI components for this representation. Because the types mapped closely to our domain, we could easily maintain a 1:1 relationship between domain types and frontend components.
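To illustrate what a 1:1 relationship between domain types and components can look like, here is a small sketch. The FormElement variants and the component names (TextInput, SecretInput, ChoicePicker) are invented for this example, and the real frontend is presumably not written in Rust. The point is only that an exhaustive match over a closed set of domain types leaves no element without a component.

```rust
/// Hypothetical domain types; the variant names are illustrative only.
#[derive(Debug, Clone, PartialEq)]
enum FormElement {
    Text { label: String },
    Secret { label: String },
    Choice { label: String, options: Vec<String> },
}

/// Render each element with exactly one (hypothetical) frontend component.
/// The match is exhaustive, so adding a new variant without also adding
/// a component for it is a compile error.
fn render(element: &FormElement) -> String {
    match element {
        FormElement::Text { label } => {
            format!("<TextInput label=\"{}\" />", label)
        }
        FormElement::Secret { label } => {
            format!("<SecretInput label=\"{}\" />", label)
        }
        FormElement::Choice { label, options } => format!(
            "<ChoicePicker label=\"{}\" options=\"{}\" />",
            label,
            options.join(",")
        ),
    }
}
```

A field such as tunnel_method would map to something like the Choice variant here, so the frontend never has to interpret raw oneOf encodings at runtime.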
For our DSL's text format, we decided to use Markdown-within-YAML. It’s not perfect, but it was easy to hack together quickly thanks to serde, and we know that we can switch to another format easily (by parsing our .yaml files into the EditableForm type and re-serializing them to the new format). Even if we had stuck with JSONSchema, Markdown-within-YAML is much nicer to edit than HTML-within-JSON, so this is already a win. The tooling for YAML is also pretty good. By using schemars to describe our types, we can get a lot of help from our text editor. We also made a tool that live-renders the resulting form as you edit the text. After all of this work, the developer experience for someone who wants to improve our setup flows is pretty good.
Where are we now?
We now have the machinery in place to help people connect to their data, from over a hundred different places, and we can genuinely assist them as they do so. We have Fivetran to thank for setting the bar high here, and we have Airbyte to thank for helping us reach it.
We also remain in control of the journey, so we can continue to raise the bar.
The work of writing over a hundred setup guides can now start, and we have taken care to genuinely assist the people doing this work.
If you would like to try out what we’ve built, please sign up for early access.