▲Launch HN: Baselit (YC W23) – Automatically Reduce Snowflake Costs

48 points by sahil_singla 12 days ago | 30 comments

ukd1 11 days ago [-]

How does this differ from https://espresso.ai ?

karamazov 11 days ago [-]

Chiming in, I'm one of the founders of Espresso AI - we do both query optimization and warehouse optimization, both of which are hands-off. In particular we're beta-testing a fully-automated solution for query optimization (it's taken a lot of engineering!).

Based on the responses here I think we're a superset of where baselit is today, but I could be wrong.

sahil_singla 11 days ago [-]

Would love to see how you’re doing warehouse optimization. Is there a demo video I can look at?

sahil_singla 11 days ago [-]

They are more focused on query optimizations whereas we do warehouse optimizations. We are inclined towards warehouse optimizations due to it being completely hands-off.

ukd1 11 days ago [-]

cool - they're kinda complimentary?

altdataseller 11 days ago [-]

Espresso does warehouse too so they’re competitors

sahil_singla 11 days ago [-]

Yeah, kind of.

Though I'm not exactly sure how their product works, I saw from the landing page that it's broadly focused on query optimization.

We've done a lot of experimentation with query optimizations, both with and without LLMs, and we don't think it's possible to build a fully automated solution. However, a workflow solution might be feasible.

mustansirm 11 days ago [-]

Not a Snowflake user, but I'm curious as to your business model. What barriers are there to prevent Snowflake from reverse engineering your work and including it as part of their native experience? Is the play here an eventual aquisition?

jaggederest 11 days ago [-]

It has been my experience working on similar projects for cutting down e.g. aws spend that the primary billers often have a really hard time accepting or incorporating bill-reducing features. All the incentives they have are geared to want increased spend, regardless of the individual preferences of any members of the company, and so that inertia is really hard to overcome.

sahil_singla 11 days ago [-]

That resonates with what we have heard from our customers.

sahil_singla 11 days ago [-]

Our belief is that building a good optimization tool is not aligned with Snowflake's interests. Instead they seem to be more focused on enabling new use cases and workloads for their customers (their AI push, for example, with Cortex). On the other hand, helping Snowflake users cut down costs is our singular focus.

fock 11 days ago [-]

or to phrase it differently: what kind of market is this, where big companies are herded into tarpits of SaaS which apparently have exactly the same problems as running it the old way had (namely inefficient usage of ressource). Just now you have to pay some symbiotic start-up instead of hiring some generic performance-person.

bluelightning2k 11 days ago [-]

It's not really in their interests?

candiddevmike 12 days ago [-]

What happened to your other idea?

mritchie712 11 days ago [-]

not OP, but for us, LLM's just aren't good enough yet to write analytical SQL queries (and they may never be good enough using pure SQL). Some more context here: https://news.ycombinator.com/item?id=40300171

sahil_singla 11 days ago [-]

+1. We came to a similar conclusion when we were working on this idea.

datadrivenangel 12 days ago [-]

Productizing cost optimization experience! Great to see more options in this space, as so many companies are surprised by the costs of cloud.

For the warehouse size experimentation, how do you value processing time?

sahil_singla 11 days ago [-]

We optimize warehouse sizes for a dbt project as a whole. Users can set a maximum project runtime as one of the parameters for experimentation. The optimization honors this max runtime while tuning warehouse sizes for individual models.

michaelmior 11 days ago [-]

How does this differ from Keebo?

https://keebo.ai/

sahil_singla 11 days ago [-]

We are different from Keebo in the way we approach warehouse optimization. Keebo seems to dynamically change the size of a warehouse - we have found that to be somewhat risky, especially when it's downsizing. Performance can take a big hit in this case. So we've approached this problem in two ways:

1. Route queries to the right-sized warehouse instead of changing the size of a particular warehouse itself. This is part of our dbt optimizer module. This ensures that performance stays within acceptable limits while optimizing for costs.

2. Baselit's Autoscaler optimally manages the scaling out of a multi-cluster warehouse depending on the load, which is more cost effective than upsizing the warehouse.

gregw2 11 days ago [-]

Does anyone support this sort of optimization for AWS Redshift?

I built some lambdas that looked at Queuelength and turned off Redshift Concurrency Scaling for WLM queues to mitigate costs for less critical afternoon workloads but it was always cruder than I wanted.

iknownthing 12 days ago [-]

Does it use AI?

sahil_singla 12 days ago [-]

No AI yet - all algorithms are deterministic under the hood. Although we are considering tinkering with LLMs for query optimization, as part of our roadmap.

mritchie712 11 days ago [-]

We (https://www.definite.app/) were also working on AI for SQL generation. I can see why you pivoted, it doesn't really work! Or at least well enough to displace existing BI solutions.

edit: context below is mostly irrelevant to snowflake cost optimization, but relevant if you're interested in the AI for SQL idea...

I'm pretty hard headed though, so we kept going with it and the solution we've found is to run the entire data stack for our customers. We do ETL, spin up a warehouse (duckdb), a semantic layer (cube.dev) and BI (dashboards / reports).

Since we run the ETL, we know exactly what all the data means (e.g. we know what each column coming from Stripe really means). All this metadata flows into our semantic layer.

LLM's aren't great at writing SQL, but they're really good at writing semantic layer queries. This is for a couple reasons:

1. better defined problem space (you're not feeding the LLM irrelevant context from a sea of tables)

2. the query format is JSON, so we can better control the LLM's output

3. the context is richer (e.g. instead of table and column names, we can provide rich, structured metadata)

This also solves the Snowflake cost issue from a different angle... we don't use it. DuckDB has the performance of Snowflake for a fraction of the cost. It may not scale as well, but 99% of companies don't need the sort of scale Snowflake pitches.

brunoa-ca 6 days ago [-]

It may be worth having a look at Raia (https://raia.live)

I'm actively developing this product.

One of the things I added is something called "Assistant Profiles". Given the fact that you know the DB structure, you can create a custom Assistant Profile and adjust it to fit the underlying DB better, which improves the results quite a lot

You can then expand the connection to other external systems and automate a lot of the analysis processes your users may have

I'm happy to work with you to make it work for your use case

iknownthing 11 days ago [-]

Kind of surprised to hear that given the number of companies I've seen pitching natural language to SQL queries.

ericzakariasson 10 days ago [-]

was also doing that last year with outfinder.co, but like the others said, it's really hard

redwood 11 days ago [-]

Can you clarify what you mean by "they're really good at writing semantic layer queries"?

Re JSON query format: you mean that's what you're using?

mritchie712 11 days ago [-]

Yes, the queries for the semantic layer we're using are in JSON. Here's an example query:

    {
      "measures": ["stories.count"],
      "dimensions": ["stories.category"],
      "filters": [
        {
          "member": "stories.isDraft",
          "operator": "equals",
          "values": ["No"]
        }
      ],
      "timeDimensions": [
        {
          "dimension": "stories.time",
          "dateRange": ["2015-01-01", "2015-12-31"],
          "granularity": "month"
        }
      ],
      "limit": 100,
      "offset": 50,
      "order": {
        "stories.time": "asc",
        "stories.count": "desc"
      },
      "timezone": "America/Los_Angeles"
    }

kwillets 11 days ago [-]

I was thinking about an AI to feed you the proper Snowflake sales pitch each time a query runs expensive or fails a benchmark. At my previous org it could replace several headcount.

Loading comments...

ukd1 11 days ago [-]

How does this differ from https://espresso.ai ?

karamazov 11 days ago [-]

Based on the responses here I think we're a superset of where baselit is today, but I could be wrong.

sahil_singla 11 days ago [-]

Would love to see how you’re doing warehouse optimization. Is there a demo video I can look at?

sahil_singla 11 days ago [-]

They are more focused on query optimizations whereas we do warehouse optimizations. We are inclined towards warehouse optimizations due to it being completely hands-off.

ukd1 11 days ago [-]

cool - they're kinda complimentary?

altdataseller 11 days ago [-]

Espresso does warehouse too so they’re competitors

sahil_singla 11 days ago [-]

Yeah, kind of.

Though I'm not exactly sure how their product works, I saw from the landing page that it's broadly focused on query optimization.

mustansirm 11 days ago [-]

jaggederest 11 days ago [-]

sahil_singla 11 days ago [-]

That resonates with what we have heard from our customers.

sahil_singla 11 days ago [-]

fock 11 days ago [-]

bluelightning2k 11 days ago [-]

It's not really in their interests?

candiddevmike 12 days ago [-]

What happened to your other idea?

mritchie712 11 days ago [-]

sahil_singla 11 days ago [-]

+1. We came to a similar conclusion when we were working on this idea.

datadrivenangel 12 days ago [-]

Productizing cost optimization experience! Great to see more options in this space, as so many companies are surprised by the costs of cloud.

For the warehouse size experimentation, how do you value processing time?

sahil_singla 11 days ago [-]

michaelmior 11 days ago [-]

How does this differ from Keebo?

https://keebo.ai/

sahil_singla 11 days ago [-]

2. Baselit's Autoscaler optimally manages the scaling out of a multi-cluster warehouse depending on the load, which is more cost effective than upsizing the warehouse.

gregw2 11 days ago [-]

Does anyone support this sort of optimization for AWS Redshift?

iknownthing 12 days ago [-]

Does it use AI?

sahil_singla 12 days ago [-]

No AI yet - all algorithms are deterministic under the hood. Although we are considering tinkering with LLMs for query optimization, as part of our roadmap.

mritchie712 11 days ago [-]

We (https://www.definite.app/) were also working on AI for SQL generation. I can see why you pivoted, it doesn't really work! Or at least well enough to displace existing BI solutions.

edit: context below is mostly irrelevant to snowflake cost optimization, but relevant if you're interested in the AI for SQL idea...

Since we run the ETL, we know exactly what all the data means (e.g. we know what each column coming from Stripe really means). All this metadata flows into our semantic layer.

LLM's aren't great at writing SQL, but they're really good at writing semantic layer queries. This is for a couple reasons:

1. better defined problem space (you're not feeding the LLM irrelevant context from a sea of tables)

2. the query format is JSON, so we can better control the LLM's output

3. the context is richer (e.g. instead of table and column names, we can provide rich, structured metadata)

brunoa-ca 6 days ago [-]

It may be worth having a look at Raia (https://raia.live)

I'm actively developing this product.

You can then expand the connection to other external systems and automate a lot of the analysis processes your users may have

I'm happy to work with you to make it work for your use case

iknownthing 11 days ago [-]

Kind of surprised to hear that given the number of companies I've seen pitching natural language to SQL queries.

ericzakariasson 10 days ago [-]

was also doing that last year with outfinder.co, but like the others said, it's really hard

redwood 11 days ago [-]

Can you clarify what you mean by "they're really good at writing semantic layer queries"?

Re JSON query format: you mean that's what you're using?

mritchie712 11 days ago [-]

Yes, the queries for the semantic layer we're using are in JSON. Here's an example query:

    {
      "measures": ["stories.count"],
      "dimensions": ["stories.category"],
      "filters": [
        {
          "member": "stories.isDraft",
          "operator": "equals",
          "values": ["No"]
        }
      ],
      "timeDimensions": [
        {
          "dimension": "stories.time",
          "dateRange": ["2015-01-01", "2015-12-31"],
          "granularity": "month"
        }
      ],
      "limit": 100,
      "offset": 50,
      "order": {
        "stories.time": "asc",
        "stories.count": "desc"
      },
      "timezone": "America/Los_Angeles"
    }

kwillets 11 days ago [-]

I was thinking about an AI to feed you the proper Snowflake sales pitch each time a query runs expensive or fails a benchmark. At my previous org it could replace several headcount.