6 Comments

Substantive walkthrough, thanks for sharing your notes! The competitive dynamics here remind me of Alex Rampell's "distribution vs. innovation," most notably with emerging point solutions like governance/observability. Data warehouse players (Snowflake, BigQuery, etc.) have a natural advantage in being the "starting point" of many organizations' data platforms. So to the extent that they can bundle/acquire these features faster than new players can get distribution, they could be strongly positioned as the long-term e2e winner. Time will tell.

This is a great read. I've given some thought to the lateral movement of tools through personas in the enterprise data landscape, but I couldn't have articulated it so well.

Thanks for sharing.

This is one of the best articles I’ve ever read in this space. Knowledgeable. Exemplary.

Hi Leigh. I found this article fascinating and aligned in so many ways with our view here at Syncari. Have you taken a look at what we're doing with our complete data management platform?

Leigh, lately we have seen an explosion of articles about data infrastructure coming from VCs, and I like yours a lot. You address the "jobs to be done" for each persona, and that's very compatible with our thinking at Keboola.

A couple of notes:

- I like Simon Wardley a lot: his Wardley Mapping and his work on value chains, stages of evolution, and the different practices at each stage. https://medium.com/wardleymaps

His work explains very well why we have so many tools today, what's going to happen next (and your post is probably the start of it), and how the market is going to move.

- Several years ago, Looker's CEO talked about three epochs of data infrastructure: 1. old vertical systems like Oracle; 2. cloud-based "millions of tools and a different tool for everything"; and the natural evolution to 3. a platform that lets users actually use the tools they want, while operating them automatically for them along with governance, security, etc.

We at Keboola strongly subscribe to this view. When we started the Keboola platform several years ago, we based it on our own pains and experiences as an agency doing data engineering, infrastructure setup, cloud migrations, and analytics for our clients. We initially wanted to go totally DevOps, API-first, and microservice-based (and that's all good), but we quickly realized it wasn't going to cut it. It can be a great approach if we just want to geek out, but it won't satisfy all the personas involved in the whole "data journey".

It started to become apparent with the DataOps manifesto that there are more personas in data than in dev, and they have different skills, needs, and also wants (we sometimes tend to forget about those).

What we set up as our guiding principles:

- We should not try to build everything ourselves; we need to set up a platform that others can build upon (we now have 1,400 apps from 400 developers, and 75% of connectors and "reverse ETL" writers are built by our community).

- Our users shouldn't wait for us to ship something in two years; they should be able to run most things themselves and develop new components. This has opened up use cases enormously and allowed our users to guide how THEY want to use data. Like Jeff Miller, ex-VP at McDonald's, who started with the whole "data -> insights -> actions" idea three years ago and put us on the path toward automations and "reverse ETL".

- Users (humans and software) should be abstracted from the actual infrastructure and shouldn't "worry" about it. It's the job of the platform, not the users, to provide it, run it, scale it, handle exceptions, etc.

- For every job there needs to be a complete audit trail, and each job needs to be immutable!

- The platform should take care of data warehousing (lake -> structured), management, and lineage.

- We shouldn't force how people work, only support them and make the UX great for them.
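To illustrate the "complete audit trail, immutable jobs" principle above: a minimal sketch in Python of an append-only job log whose records can't be edited after the fact, and where tampering with history is detectable via hash chaining. This is just one way to realize the idea, not Keboola's actual implementation; all names here (`JobRecord`, `AuditTrail`) are hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass, asdict

@dataclass(frozen=True)  # frozen = a record can't be mutated after creation
class JobRecord:
    job_id: str
    component: str
    params: dict
    status: str
    prev_hash: str  # digest of the previous record chains the trail together

    def digest(self) -> str:
        payload = json.dumps(asdict(self), sort_keys=True).encode()
        return hashlib.sha256(payload).hexdigest()

class AuditTrail:
    """Append-only log: records are only ever added, never edited or deleted."""

    def __init__(self):
        self._records: list[JobRecord] = []

    def append(self, job_id: str, component: str, params: dict, status: str) -> JobRecord:
        prev = self._records[-1].digest() if self._records else "genesis"
        rec = JobRecord(job_id, component, params, status, prev_hash=prev)
        self._records.append(rec)
        return rec

    def verify(self) -> bool:
        # Any change to an earlier record breaks the hash chain from then on.
        expected = "genesis"
        for rec in self._records:
            if rec.prev_hash != expected:
                return False
            expected = rec.digest()
        return True
```

With this structure, an "activity center" style feature only has to walk the trail: the records themselves guarantee that what it reports is what actually ran.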

These principles have served us well and allowed our users and our platform to grow from ETL/ELT to data management, transformations, and collaboration; adding data science capabilities first with "just RStudio", then Python and Julia; then "reverse use cases"; and automation of processes.

When we started working for large clients like Ceska Erste bank and they came to us with security, governance, and observability requests, we just connected them to our metadata API and voila... It was large clients who brought us to the understanding that we needed to provide an "activity center" that helps them:

- get observability of pipelines and user actions

- automatically detect security incidents and push the data to their own systems

- get one-click $ attribution per use case/person/department
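The $ attribution point above is essentially a roll-up over job metadata. A minimal sketch, assuming hypothetical job records with runtime and a per-minute price (the field names and numbers are invented for illustration, not Keboola's metadata schema):

```python
from collections import defaultdict

# Hypothetical job metadata records, e.g. as returned by a metadata API.
# Costs are in integer cents to avoid floating-point rounding.
jobs = [
    {"use_case": "churn-model", "department": "marketing", "runtime_min": 12, "cost_cents_per_min": 5},
    {"use_case": "finance-report", "department": "finance", "runtime_min": 30, "cost_cents_per_min": 5},
    {"use_case": "churn-model", "department": "marketing", "runtime_min": 8, "cost_cents_per_min": 5},
]

def attribute_cost(jobs: list[dict], key: str) -> dict:
    """Roll up job cost by any dimension (use_case, department, ...)."""
    totals: dict = defaultdict(int)
    for job in jobs:
        totals[job[key]] += job["runtime_min"] * job["cost_cents_per_min"]
    return dict(totals)

print(attribute_cost(jobs, "department"))  # {'marketing': 100, 'finance': 150}
```

Because the grouping key is just a parameter, the same aggregation serves per-use-case, per-person, or per-department views.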

Today these seem obvious and easy to do, because we thought about the architecture from the beginning. But we could NEVER have predicted the use cases our users would bring to us.

At the beginning of this year we opened up Keboola to the general public (www.keboola.com) and gave everyone 300 minutes of free time each month. Now we have several thousand companies using our platform, and the joy is seeing companies like RohlikGroup that start as small startups on the free program and grow to unicorn size within a couple of years, alongside large companies like Erste or Innogy using the same platform.

Sorry about the longer comment; I will stop here. Your topic is great and I could talk about the different aspects of data pipelines and their usage for hours, but that's probably for a separate blog post :))))) Cheers, Pavel Dolezal :)

Thank you for this deep analysis. It supports my thesis behind building Nexla: the need for a converged platform to enable an unprecedented scale of data operations. That requires a new architectural approach at the base, which is why the many tools you list that are pushing incremental innovation will never get there.
