Lessons from a long year of consulting in data and AI (2024 edition)

A short report from experiences in management boards, workshops and the trenches

Dec 19, 2024

Several weeks ago, I saw the field notes by

, in which he shared his experience in data this year. I decided to do the same.

These experiences come from many projects, workshops, interviews, conferences, and formal and informal conversations with people in all kinds of roles, at organizations of different maturities. It’s not a representative sample, but there are valuable patterns. Since this post can seem quite negative (it’s just how I think, credit to Nassim Taleb), I will use the sandwich method: the bad things are sandwiched between the good.

I spent quite some time on trains this year; this gave me time to think and reflect.

The good: tools are nice

When I started in the early 2010s, we were still working with Hadoop, and deploying a model required a lot of custom work. Now we have nice things: MLflow, Databricks, AWS SageMaker. Documentation is (mostly) great, and it has never been easier to do “full-stack data science,” so much so that many have started to question whether this field, at least in traditional businesses, has turned into software engineering.

The bad: we keep reinventing the wheel

For people with over 10 years of experience, working in data feels like Groundhog Day. We have seen the same topics over and over again, regurgitated with a shiny new name. As a CTO, I was fortunate to be involved with many software projects beyond data, and I saw many patterns we could borrow. In the world of data, we try so hard to reinvent the wheel instead of taking what works and adjusting it for our context.

If I could sum up: 2024 was the year in data and AI when we reinvented the wheel.

Worse: We keep ignoring the fundamentals

The most important thing is having good-quality data, but we do everything possible to avoid tackling this issue. Building a modern data platform or having a super-easy-to-use sandbox environment for your data scientists will not move the needle enough. People with domain knowledge are also still too far away from the implementation. The list goes on.

Even more bad: it’s harder to get nuggets of value from data and AI content

We all try to stay on top of what is going on and what the new frameworks and tools are. Social media used to have higher-quality content, but in the last year, it has gone south. My go-to places to get useful, relevant, and honest information are Substack, O’Reilly, and the various newsletters that were active before this hype cycle.

Really bad: we still don’t explain what exactly we DO as data people

As a CTO, I was rarely asked to explain what my frontend/backend/BI teams were doing. But data and AI - it was a different story. We are still discussing how to measure the success of such projects and what value our expensive teams are providing. We have to do a better job at this.

But still, sparks of value: there are real, valuable, generative AI use cases

Now, let’s end on a positive note! This year, I saw so many real-world generative AI use cases with clear value being developed and deployed. My initial skepticism has melted away, but it has made me even more focused on getting the fundamentals right.

With that, I wish you and your loved ones a happy holiday season! Let’s learn the lessons of this exciting year and make 2025 even better!

-Boyan

P.S. Finally, I have something very cool in store, that can perhaps help with many of these issues. See you next year!

Thinking Data