> OpenAI helper: If you have an OpenAI account, the interface can query OpenAI to help you understand your query plan and optimize your SQL queries
How good are LLMs at optimizing queries? Do they just give basic advice like "try adding an index there" or can they do more?
I guess it would need to cross-reference the EXPLAIN ANALYZE results with at least the DDL of the tables involved (including partitions, indexes, triggers, etc.), and in some cases the sizes of the tables and their read/write rates, to be able to "understand" and "reason" about what's going on and offer meaningful advice (e.g. proposing design changes for the tables).
I don't see how a generic LLM would be able to do that, but maybe another type of model trained for this task and with access to all the information it needs?
ellisv 21 hours ago [-]
In my experience, LLMs are a lot better than my backend engineers who don't even try but not that much better than someone who's skimmed the docs.
Typically they suggest 1) adding indexes and 2) refactoring the query. If you only provide the query then the model isn't aware of what indexes already exist. LLMs make assumptions about your data model that often don't hold (i.e. 3NF). Sometimes you have to give the LLM the latest docs because it's unaware of performance improvements and features for newer versions.
In my view, it's better to look at the query plan and think yourself about how to improve the query but I also recognize that most people writing queries aren't going to do that.
There are other tools (RIP OtterTune) that can tune database parameters. Not something I really see any generative model helping with.
sgarland 50 minutes ago [-]
> In my view, it's better to look at the query plan and think yourself about how to improve the query but I also recognize that most people writing queries aren't going to do that.
That would require people to a. Learn SQL / relational algebra, one of the easiest languages in the world b. Read docs.
gopalv 9 hours ago [-]
> LLMs are a lot better than my backend engineers who don't even try but not that much better than someone who's skimmed the docs.
LLMs are basically at the skill-set of "Googled something and tried it", which for a lot of basic things is mostly what everyone does.
If you can loop back the results of the trial & error back to the model, then it does do a pretty good simulation of someone feeling their way through a problem.
The loop is what makes the approach effective, but a model by itself cannot complete the process - it at least needs an agent which can try the recommendation (or a human who will pull the lever).
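To make the loop concrete, here is a toy sketch of "an agent which can try the recommendation" (not how any real tool does it): the model's suggestions are just candidate SQL strings, SQLite's EXPLAIN QUERY PLAN serves as a cheap stand-in cost signal, and a candidate is only kept if it returns the same rows as the baseline.

```python
import sqlite3

def full_scans(conn, sql):
    """Deterministic feedback signal: count full-table SCAN steps in
    SQLite's plan (a SEARCH step means an index was used)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return sum(r[3].startswith("SCAN") for r in rows)  # r[3] is the detail column

def feedback_loop(conn, baseline, candidates):
    """The model proposes rewrites; this loop 'pulls the lever': a candidate
    is kept only if it returns the same rows and its plan looks cheaper."""
    expected = sorted(conn.execute(baseline).fetchall())
    best, best_cost = baseline, full_scans(conn, baseline)
    for sql in candidates:
        try:
            if sorted(conn.execute(sql).fetchall()) != expected:
                continue  # subtly different results: reject outright
        except sqlite3.Error:
            continue      # suggestion doesn't even run: reject
        cost = full_scans(conn, sql)
        if cost < best_cost:
            best, best_cost = sql, cost
    return best

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(100)])
conn.execute("CREATE INDEX idx_x ON t (x)")

slow = "SELECT x FROM t WHERE x + 0 = 5"   # expression defeats the index
fast = "SELECT x FROM t WHERE x = 5"       # sargable rewrite: index SEARCH
best = feedback_loop(conn, slow, ["SELECT nonsense", fast])
```

The result-equality check is doing real work here: it is exactly the guard against "seemingly equivalent queries that return subtly different results" mentioned elsewhere in the thread.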
swasheck 4 minutes ago [-]
yes. my experience is that an LLM will sometimes suggest an interesting approach that breaks my pattern of thought enough to get me to a better solution. but the ones i primarily use (claude sonnet and chatgpt 4o) also seem to love suggesting CTEs which help readability but it depends on the RDBMS (and other things) as to whether or not they offer any real performance benefit.
that’s a long way of saying that LLMs are a tool in this space but not yet a full solution, in my opinion and experience
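On the CTE point: whether a CTE is "just readability" really does depend on the RDBMS and version. In PostgreSQL, for example, CTEs were an optimization fence before version 12; since 12 they are inlined unless you opt out (table and column names below are invented):

```sql
-- PostgreSQL 12+: the CTE is inlined into the outer query, so the predicate
-- is pushed down and an index on orders(status) can be used.
WITH recent AS (
    SELECT * FROM orders
)
SELECT count(*) FROM recent WHERE status = 'open';

-- Opt back into the pre-12 behavior: compute the CTE once, no pushdown.
WITH recent AS MATERIALIZED (
    SELECT * FROM orders
)
SELECT count(*) FROM recent WHERE status = 'open';
```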
sgarland 51 minutes ago [-]
My main issue with the use of LLMs for problems you’re unfamiliar with is precisely this. Trial and error is a fine way to work through a problem; you learn well from failure. But if you’re copy/pasting from something else the entire time, you’re not the one failing, and you’re not the one reasoning about why something didn’t work.
People do not learn this way, full stop. Anyone who thinks they are should try to write down, on paper, their understanding of a subject they think they have learned.
ComputerGuru 22 hours ago [-]
They’re non-deterministic and YMMV by design. No one can answer that question. It might save you a hundred million with some genius optimization it lucked into or, more likely, it’ll give you a seemingly equivalent query that actually returns subtly different results (easy to do with query optimizations when eg you rely on inherent/default sort order for joins with groups).
akx 21 hours ago [-]
They can be useful. I just recently used Claude to make a query 10x faster, though I did have to back-and-forth a bit, so you do still need to know what you're doing.
parthdesai 21 hours ago [-]
Claude has been really good at all things Postgres related
bertrandhartwig 9 hours ago [-]
I tested numerous LLM models locally with Ollama using PgAssistant. The one that gave me the best results was codestral:latest (15B). However, ChatGPT is truly a cut above the rest.
notatoad 5 hours ago [-]
i've had great luck with it. i can't say i'm any sort of database expert, but it's a lot better at this than me.
i was especially impressed that it could not only suggest changes to the queries or indexes to add, but it suggested a query be split into multiple queries at the application level which worked incredibly well.
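The "split at the application level" pattern is easy to show with a toy schema (SQLite here purely to make the sketch runnable; names invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts   (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors VALUES (1, 'ann'), (2, 'bob');
    INSERT INTO posts   VALUES (10, 1, 'intro'), (11, 1, 'part 2'), (12, 2, 'hello');
""")

# Single-query version: one join the planner must get right in one shot.
joined = conn.execute("""
    SELECT p.title, a.name
    FROM posts p JOIN authors a ON a.id = p.author_id
    ORDER BY p.id
""").fetchall()

# Split version: fetch the posts, then only the authors actually referenced,
# and stitch the rows back together in application code.
posts = conn.execute("SELECT id, author_id, title FROM posts ORDER BY id").fetchall()
wanted = sorted({author_id for _, author_id, _ in posts})
marks = ",".join("?" * len(wanted))
names = dict(conn.execute(
    "SELECT id, name FROM authors WHERE id IN (%s)" % marks, wanted).fetchall())
split = [(title, names[author_id]) for _, author_id, title in posts]
```

When the join is expensive or the planner mis-estimates it, two small indexed lookups like this can beat the single query, at the cost of a little glue code.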
victorbjorklund 20 hours ago [-]
LLMs are really, really good at creating queries. Shaving off the last ms from a complex query? Yeah, I'm sure an experienced query-optimization expert might beat it.
But you can get very far by letting the LLM run a few queries to gather info about the database and its use.
joshstrange 21 hours ago [-]
They won’t understand your schema and make the wrong assumptions _but_ as long as you are checking the results and can understand the final query they can be very helpful.
Obviously there are tricks to let them better understand your schema but even using all of those it’s going to make the wrong assumptions about some columns and how they are used.
bertrandhartwig 7 hours ago [-]
When pgAssistant sends the prompt to the LLM, it provides the DDL of the tables involved in the query (obtained via pg_dump), the EXPLAIN ANALYZE output of the query, and of course, the query itself.
Believe me, LLMs understand the schema very well, and in the vast majority of cases, they suggest relevant indexes and/or query rewrites.
That being said, they are still LLMs and inherently non-deterministic.
But the goal of pgAssistant is not to compete with database experts: it was built to help developers before they ask a DBA for help.
VWWHFSfQ 22 hours ago [-]
I've used them off and on for basic stuff like "tell me what this query does". They're usually pretty good at that. Sometimes it will make suggestions that are meaningful improvements, most of the time not. Unless you specifically ask about something like "would this be better done with a left join lateral/correlated subquery?" But you kinda have to already know what you're going for and then it can be helpful.
> Whoa... that's a lot of data for a README! But demos are pretty important, so I guess it's worth it. (https://github.com/nexsol-technologies/pgassistant/tree/main...)

I think it's probably just a large file, as I remember thinking the same thing but it was loaded when I returned to the browser tab with the readme later on.
dewey 23 hours ago [-]
Ah, you are right. I'm on a slow connection right now and it's a 22MB gif.
That takes the GIF down to 1338973 bytes (1.3M) with (to my eyes) little loss of readability.
mjevans 18 hours ago [-]
ffmpeg -h encoder=libaom-av1
ffmpeg -i pgassistant.gif.raw.gif -c:v libaom-av1 -crf 20 -cpu-used 8 -movflags '+faststart' -pix_fmt yuv420p pgassistant.gif.av1-20.mp4
ffmpeg -i pgassistant.gif.raw.gif -c:v libaom-av1 -crf 55 -cpu-used 8 -movflags '+faststart' -pix_fmt yuv420p pgassistant.gif.av1-55.mp4

  264516 Feb 12 11:44 pgassistant.gif
22782965 Feb 12 11:46 pgassistant.gif.raw.gif
 2120322 Feb 12 11:55 pgassistant.gif.av1-20.mp4
  245780 Feb 12 11:56 pgassistant.gif.av1-55.mp4

A place to start from at least; note the 264516-byte gif is what's currently on the landing page, with the wget command to grab the raw file.
kirici 16 hours ago [-]
Is there any benefit in using libaom-av1 over libsvtav1 nowadays?
mjevans 14 hours ago [-]
Licensing, patents (as far as I've heard anyway).
NEARLY everything can use AV1 and you don't need your clients to install a licensed codec if their OS didn't happen to include one. https://caniuse.com/av1 Far more than https://caniuse.com/hevc
ai-christianson 22 hours ago [-]
Submit a PR?
zimpenfish 20 hours ago [-]
I'm trying but it turns out that it's impossible to submit a useful PR with a video embed in `README.md` because the only way it works is by uploading the video through the web editor[0] and using a static user-images.githubusercontent.com URL.
[0] Which puts it under your account, obvs., and is therefore not that helpful for a PR.
ai-christianson 20 hours ago [-]
Maybe just open an issue and provide the ffmpeg command then
This reminds me of a less feature-rich version of PgAnalyze, but it's always nice to have alternative OSS options for this sort of optimization work.
scosman 20 hours ago [-]
I’ve wanted to make this for years. I have a doc of saved pg_stat_statements and other queries.
znpy 21 hours ago [-]
honest question: does anybody know a good book/tutorial/source/whatever on becoming a postgresql DBA ?
most of the material i see is written for people that want to write applications that work with postgresql, not on how to proficiently manage postgresql itself.
oa335 21 hours ago [-]
2nd Quadrant (bought by Enterprise DB several years ago) had a bunch of excellent YouTube tutorials on common tasks. Some of them may still be available on YouTube.
EDB has a fairly comprehensive set of self-paced trainings for sale. I went through them and thought they were really good.
The Postgres documentation is excellent, though, and although it is long, reading through it carefully should give you almost all the information you need for most database tasks: https://www.postgresql.org/docs/current/admin.html
Greg Smith wrote a book about postgres high performance that does go into the internals a bit and how to analyze performance problems. If you want to be a DBA, this will probably wind up needing to be in your back pocket at some point.
andatki 9 hours ago [-]
What sorts of skills are you looking to build? Since this post has a lot of comments about query optimization, I’d suggest query optimization is more in the realm of developers and not DBAs. Schema, query, and index design. DBAs might be more focused on replication/HA, security, fleet management, backups, DR.
Happy to help with more targeted recommendations!
znpy 23 minutes ago [-]
> What sorts of skills are you looking to build?
I work as a system engineer / devops engineer / infrastructure engineer. I often need to keep postgresql up and running, ideally in a smooth and performant way.
I don't like tool-specific tutorials or storytelling-based articles.
I'd like to learn the core postgresql things (terminology, tasks, operations) so that then i can evaluate autonomously what tool to pick and why, rather than following the trend of the month. And of course, being able to actually do "the things".
Ideally, the next step in this progression would be being able to understand postgresql performance and how to tune it (again, in a holistic manner).
Per my understanding, this is the kind of skills that a DBA should have. Some of it overlaps with the skills of a decent developer (i understand that).
justinclift 4 hours ago [-]
This is a bit older now, but it seems to cover the timeless basics: https://postgrespro.com/community/courses/DBA1
There is also a great book on the internals (very long and in-depth, but not unreadable if you already have a good foundation): https://edu.postgrespro.com/postgresql_internals-14_en.pdf
They probably have newer versions; this is just what's in my bookmarks.
I'll take a closer look at the material, which already looks good. It's sad that there doesn't seem to be anything past "DBA1" (that is, no DBA2 or DBA3).
edoceo 21 hours ago [-]
The PostgreSQL docs are great and then watch the presentations from their conferences - very cool stuff.
Then practice, docker makes this easy.
sgarland 58 minutes ago [-]
Your screenshot for issues found shows indices being created with CREATE INDEX. If someone runs that on a prod DB with a large table, they’re gonna have a bad time. You should have it suggest CREATE INDEX CONCURRENTLY.
Also, if I’m being honest, it’s a little off-putting to see a tool giving advice on Postgres and not differentiating between those two.
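For reference, the difference (object names invented): plain CREATE INDEX holds a lock that blocks writes to the table for the whole build, while CONCURRENTLY trades build speed and some caveats for availability.

```sql
-- Blocks INSERT/UPDATE/DELETE on orders for the duration of the build:
CREATE INDEX idx_orders_customer_id ON orders (customer_id);

-- Builds without blocking writes; slower, cannot run inside a transaction
-- block, and leaves an INVALID index to drop if the build fails:
CREATE INDEX CONCURRENTLY idx_orders_customer_id ON orders (customer_id);
```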
j45 50 minutes ago [-]
Always lots to learn, appreciate the insight.
It’s interesting that Postgres doesn’t seem to default to some of these settings, since they are clearly advisable in most cases.
sgarland 19 minutes ago [-]
I’ve long held that Postgres is more powerful than MySQL in terms of flexibility and options, but you really ought to have thoroughly read and understood the manual to run it in prod. In comparison, MySQL does a fantastic job out of the box (modulo buffer pool sizing and sync_array_size, and tbf Postgres is equally bad at the former, and the latter doesn’t matter until larger scale), but is somewhat limited in what you can do with it (emphasis on somewhat – it’s still quite adequate for nearly anyone).
That’s why I’m forever salty that all of the tech influencers and bloggers hyped up Postgres so much. Yes, it’s a great RDBMS, but the frontend devs eagerly jumping into it haven’t the slightest clue what they’re getting themselves into.
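For what it's worth, the sizing default both engines get wrong out of the box is the buffer cache; the values below are illustrative starting points under the usual "dedicated database server" assumption, not recommendations.

```
# postgresql.conf -- shared_buffers defaults to only 128MB; a common
# starting point on a dedicated server is roughly 25% of RAM:
shared_buffers = 4GB

# my.cnf ([mysqld] section) -- innodb_buffer_pool_size also defaults to 128MB:
innodb_buffer_pool_size = 8G
```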
https://gist.github.com/cpursley/c8fb81fe8a7e5df038158bdfe0f...