Those who know me know I obsess over figuring out the right architecture and tech stack for a project.
Even for small projects and prototypes, I like to put a little thought into it.
Most people pick the tools they know, the stuff that’s trending, or whatever is in popular tutorials.
At my firm, we have a culture of building prototypes to test ideas and then taking the most promising ones to production.
Serious dev time is expensive. We want to focus it on ideas that:
- Show real traction
- Are technically feasible
- Can cut costs or increase revenue
I love building prototypes. But it’s frustrating to start over later because I picked the wrong tools.
In some parts of the stack you can get away with simple, disposable tools; in others you can't.
For example, if you need a user interface, use Streamlit or Gradio. They are easy to replace later. Building a React app takes too long, and a prototype isn't meant to prove you can build a frontend.
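For instance, a minimal Streamlit sketch like the following is enough to put a UI in front of users. The app name and CSV upload are made up for illustration:

```python
# streamlit_app.py (run with: streamlit run streamlit_app.py)
# A throwaway prototype UI; the uploaded CSV's schema is whatever the user provides.
import pandas as pd
import streamlit as st

st.title("Prototype: data explorer")

uploaded = st.file_uploader("Upload a CSV", type="csv")
if uploaded is not None:
    df = pd.read_csv(uploaded)
    st.write(f"{len(df)} rows loaded")
    st.dataframe(df.describe())
```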
But changing libraries or tools in the core business logic is very hard.
That’s why I rarely use libraries like LangChain or LangGraph. I know I will have to throw them away later.
So here are some rules I try to follow when figuring out a tech stack for a project.
Rules:
1. No more than 3 hours to get it running (a low bar).
2. Good documentation, community support, and active development.
3. (Extensible and Flexible) or (Narrow and Deep).
4. Pick burning problems.
5. Runs the same on my machine as in production.
No more than 3 hours to get it running
If I cannot set up a toy example in under 3 hours, there is no way I am going to build something serious with it.
Either the documentation is lacking or I lack the technical chops to do it.
So pick something else.
Over time, you will pick up on what works and what doesn’t.
Also you will get better, so you will be able to do it faster next time.
Good documentation, community support and active development
If the library has no documentation, or the maintainer does not respond to issues and feature requests on GitHub, move on.
It's only a matter of time until you hit an edge case you can't solve, and then you're screwed.
(Extensible and Flexible) or (Narrow and Deep)
There are two extremes that I am looking for:
Extensible and Flexible:
- Escape hatches for edge cases.
- Clear patterns for adding new features.
- Well-defined extension points.
- Modular architecture that allows swapping components.
DuckDB is the epitome of this. I have not found an analytical use case that I cannot solve with it, or with the ecosystem around it.
Data ingestion from raw files to a database. ✓ Querying APIs. ✓ Migrating data between data lakes. ✓ Local analytics. ✓
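As a rough sketch of what that looks like in practice (file paths and column names below are placeholders, not from a real project):

```python
# Minimal DuckDB sketch (pip install duckdb); paths and columns are placeholders.
import duckdb

con = duckdb.connect()  # in-memory; pass a file path for a persistent database

# Ingest a raw CSV directly into a table, schema inferred automatically
con.execute("CREATE TABLE events AS SELECT * FROM read_csv_auto('raw/events.csv')")

# Query Parquet files in place, no separate load step
top = con.execute("""
    SELECT user_id, count(*) AS n
    FROM 'lake/*.parquet'
    GROUP BY user_id
    ORDER BY n DESC
    LIMIT 10
""").fetchall()
print(top)
```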
Narrow and Deep:
- Does one thing and does it well.
- No unnecessary complexity.
- Battle-tested in its specific domain.
- Clear boundaries and responsibilities.
- Minimal dependencies.
The only two libraries I use for working with LLMs are Outlines and Instructor. Both do one thing and do it well: Structured outputs.
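As a sketch of what that looks like with Instructor (the Pydantic schema and model name are illustrative, not a recommendation):

```python
# Structured outputs with Instructor (pip install instructor openai).
import instructor
from openai import OpenAI
from pydantic import BaseModel

class Invoice(BaseModel):  # example schema, made up for illustration
    vendor: str
    total: float

client = instructor.from_openai(OpenAI())

invoice = client.chat.completions.create(
    model="gpt-4o-mini",  # any supported model works here
    response_model=Invoice,  # Instructor validates the response against this model
    messages=[{"role": "user", "content": "ACME Corp billed us $1,200 last week."}],
)
print(invoice.vendor, invoice.total)
```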
Pick burning problems
You want to adopt new tech for burning problems: things that are overpriced, blocking scale, or causing customer pain.
Runs the same on my machine as in production
This is crucial for speeding up development and avoiding surprises:
- Container support out of the box
- Clear deployment documentation
- Minimal environment-specific configuration
- Reproducible builds
- Easy local development setup
DuckDB runs everywhere because it's in-process. Spark can't say the same.
Docker Compose makes it easy to run complete multi-service applications on your machine.
I want to avoid the “it works on my machine” syndrome.
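One pattern that helps here, as a sketch: run the exact production image locally. This example uses testcontainers-python and Postgres, which are my picks for illustration, not something prescribed by the rules above:

```python
# Run the same Postgres image locally as in production.
# Assumes: pip install "testcontainers[postgres]" sqlalchemy, plus a running Docker daemon.
from sqlalchemy import create_engine, text
from testcontainers.postgres import PostgresContainer

with PostgresContainer("postgres:16") as pg:  # pin the same tag you deploy
    engine = create_engine(pg.get_connection_url())
    with engine.connect() as conn:
        print(conn.execute(text("SELECT version()")).scalar())
```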
You can bend each rule, but only for good reasons (and you should document them).
Kubernetes is one of my favorite tools, but local development is challenging at best. So when I want to use it, I should have a good reason.
Bonus: How can you find new tech?
How are other people solving the same problem?
Don’t look at what people are talking about on Twitter, Reddit, or Hacker News.
Look at your peers.