> OpenAI helper: If you have an OpenAI account, the interface can query OpenAI to help you understand your query plan and optimize your SQL queries
How good are LLMs at optimizing queries? Do they just give basic advice like "try adding an index there" or can they do more?
I guess it would need to cross-reference the EXPLAIN ANALYZE results with at least the DDL of the tables involved (including partitions, indexes, triggers, etc.), and in some cases the sizes of the tables and their read/write rates, to be able to "understand" and "reason" about what's going on and offer meaningful advice (e.g. proposing design changes for the tables).
I don't see how a generic LLM would be able to do that, but maybe another type of model trained for this task and with access to all the information it needs?
ellisv 21 hours ago [-]
In my experience, LLMs are a lot better than my backend engineers who don't even try but not that much better than someone who's skimmed the docs.
Typically they suggest 1) adding indexes and 2) refactoring the query. If you only provide the query then the model isn't aware of what indexes already exist. LLMs make assumptions about your data model that often don't hold (i.e. 3NF). Sometimes you have to give the LLM the latest docs because it's unaware of performance improvements and features for newer versions.
In my view, it's better to look at the query plan and think yourself about how to improve the query but I also recognize that most people writing queries aren't going to do that.
There are other tools (RIP OtterTune) that can tune database parameters. Not something I really see any generative model helping with.
sgarland 50 minutes ago [-]
> In my view, it's better to look at the query plan and think yourself about how to improve the query but I also recognize that most people writing queries aren't going to do that.
That would require people to a. Learn SQL / relational algebra, one of the easiest languages in the world b. Read docs.
gopalv 9 hours ago [-]
> LLMs are a lot better than my backend engineers who don't even try but not that much better than someone who's skimmed the docs.
LLMs are basically at the skill-set of "Googled something and tried it", which for a lot of basic things is mostly what everyone does.
If you can loop back the results of the trial & error back to the model, then it does do a pretty good simulation of someone feeling their way through a problem.
The loop is what makes the approach effective, but a model by itself cannot complete the process - it at least needs an agent which can try the recommendation (or a human who will pull the lever).
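To make the loop concrete, here is a toy sketch of "an agent which can try the recommendation" (not how any real tool does it): the model's suggestions are just candidate SQL strings, SQLite's EXPLAIN QUERY PLAN serves as a cheap stand-in cost signal, and a candidate is only kept if it returns the same rows as the baseline.

```python
import sqlite3

def full_scans(conn, sql):
    """Deterministic feedback signal: count full-table SCAN steps in
    SQLite's plan (a SEARCH step means an index was used)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return sum(r[3].startswith("SCAN") for r in rows)  # r[3] is the detail column

def feedback_loop(conn, baseline, candidates):
    """The model proposes rewrites; this loop 'pulls the lever': a candidate
    is kept only if it returns the same rows and its plan looks cheaper."""
    expected = sorted(conn.execute(baseline).fetchall())
    best, best_cost = baseline, full_scans(conn, baseline)
    for sql in candidates:
        try:
            if sorted(conn.execute(sql).fetchall()) != expected:
                continue  # subtly different results: reject outright
        except sqlite3.Error:
            continue      # suggestion doesn't even run: reject
        cost = full_scans(conn, sql)
        if cost < best_cost:
            best, best_cost = sql, cost
    return best

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(100)])
conn.execute("CREATE INDEX idx_x ON t (x)")

slow = "SELECT x FROM t WHERE x + 0 = 5"   # expression defeats the index
fast = "SELECT x FROM t WHERE x = 5"       # sargable rewrite: index SEARCH
best = feedback_loop(conn, slow, ["SELECT nonsense", fast])
```

The result-equality check is doing real work here: it is exactly the guard against "seemingly equivalent queries that return subtly different results" mentioned elsewhere in the thread.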
swasheck 4 minutes ago [-]
yes. my experience is that an LLM will sometimes suggest an interesting approach that breaks my pattern of thought enough to get me to a better solution. but the ones i primarily use (claude sonnet and chatgpt 4o) also seem to love suggesting CTEs which help readability but it depends on the RDBMS (and other things) as to whether or not they offer any real performance benefit.
that’s a long way of saying that LLMs are a tool in this space but not yet a full solution, in my opinion and experience
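On the CTE point: whether a CTE is "just readability" really does depend on the RDBMS and version. In PostgreSQL, for example, CTEs were an optimization fence before version 12; since 12 they are inlined unless you opt out (table and column names below are invented):

```sql
-- PostgreSQL 12+: the CTE is inlined into the outer query, so the predicate
-- is pushed down and an index on orders(status) can be used.
WITH recent AS (
    SELECT * FROM orders
)
SELECT count(*) FROM recent WHERE status = 'open';

-- Opt back into the pre-12 behavior: compute the CTE once, no pushdown.
WITH recent AS MATERIALIZED (
    SELECT * FROM orders
)
SELECT count(*) FROM recent WHERE status = 'open';
```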
sgarland 51 minutes ago [-]
My main issue with the use of LLMs for problems you’re unfamiliar with is precisely this. Trial and error is a fine way to work through a problem; you learn well from failure. But if you’re copy/pasting from something else the entire time, you’re not the one failing, and you’re not the one reasoning about why something didn’t work.
People do not learn this way, full stop. Anyone who thinks they are should try to write down, on paper, their understanding of a subject they think they have learned.
ComputerGuru 22 hours ago [-]
They’re non-deterministic and YMMV by design. No one can answer that question. It might save you a hundred million with some genius optimization it lucked into or, more likely, it’ll give you a seemingly equivalent query that actually returns subtly different results (easy to do with query optimizations when eg you rely on inherent/default sort order for joins with groups).
akx 21 hours ago [-]
They can be useful. I just recently used Claude to make a query 10x faster, though I did have to back-and-forth a bit, so you do still need to know what you're doing.
parthdesai 21 hours ago [-]
Claude has been really good at all things Postgres related
bertrandhartwig 9 hours ago [-]
I tested numerous LLM models locally with Ollama using PgAssistant. The one that gave me the best results was codestral:latest (15B). However, ChatGPT is truly a cut above the rest.
notatoad 5 hours ago [-]
i've had great luck with it. i can't say i'm any sort of database expert, but it's a lot better at this than me.
i was especially impressed that it could not only suggest changes to the queries or indexes to add, but it suggested a query be split into multiple queries at the application level which worked incredibly well.
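The "split at the application level" pattern is easy to show with a toy schema (SQLite here purely to make the sketch runnable; names invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts   (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors VALUES (1, 'ann'), (2, 'bob');
    INSERT INTO posts   VALUES (10, 1, 'intro'), (11, 1, 'part 2'), (12, 2, 'hello');
""")

# Single-query version: one join the planner must get right in one shot.
joined = conn.execute("""
    SELECT p.title, a.name
    FROM posts p JOIN authors a ON a.id = p.author_id
    ORDER BY p.id
""").fetchall()

# Split version: fetch the posts, then only the authors actually referenced,
# and stitch the rows back together in application code.
posts = conn.execute("SELECT id, author_id, title FROM posts ORDER BY id").fetchall()
wanted = sorted({author_id for _, author_id, _ in posts})
marks = ",".join("?" * len(wanted))
names = dict(conn.execute(
    "SELECT id, name FROM authors WHERE id IN (%s)" % marks, wanted).fetchall())
split = [(title, names[author_id]) for _, author_id, title in posts]
```

When the join is expensive or the planner mis-estimates it, two small indexed lookups like this can beat the single query, at the cost of a little glue code.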
victorbjorklund 20 hours ago [-]
LLMs are really, really good at creating queries. Shaving off the last ms from a complex query? Yeah, I'm sure an experienced query-optimization expert might beat it.
But you can get very far by letting the LLM run a few queries to gather info about the database and its use.
joshstrange 21 hours ago [-]
They won’t understand your schema and make the wrong assumptions _but_ as long as you are checking the results and can understand the final query they can be very helpful.
Obviously there are tricks to let them better understand your schema but even using all of those it’s going to make the wrong assumptions about some columns and how they are used.
bertrandhartwig 7 hours ago [-]
When pgAssistant sends the prompt to the LLM, it provides the DDL of the tables involved in the query (obtained via pg_dump), the EXPLAIN ANALYZE output of the query, and of course, the query itself.
Believe me, LLMs understand the schema very well, and in the vast majority of cases, they suggest relevant indexes and/or query rewrites.
That being said, they are still LLMs and inherently non-deterministic.
But the goal of pgAssistant is not to compete with database experts: it was built to help developers before they ask a DBA for help.
VWWHFSfQ 22 hours ago [-]
I've used them off and on for basic stuff like "tell me what this query does". They're usually pretty good at that. Sometimes it will make suggestions that are meaningful improvements, most of the time not. Unless you specifically ask about something like "would this be better done with a left join lateral/correlated subquery?" But you kinda have to already know what you're going for and then it can be helpful.
> Whoa... that's a lot of data for a README! But demos are pretty important, so I guess it's worth it. (https://github.com/nexsol-technologies/pgassistant/tree/main...)

I think it's probably just a large file, as I remember thinking the same thing but it was loaded when I returned to the browser tab with the readme later on.
dewey 23 hours ago [-]
Ah, you are right. I'm on a slow connection right now and it's a 22MB gif.
That takes the GIF down to 1338973 bytes (1.3M) with (to my eyes) little loss of readability.
mjevans 18 hours ago [-]
ffmpeg -h encoder=libaom-av1
ffmpeg -i pgassistant.gif.raw.gif -c:v libaom-av1 -crf 20 -cpu-used 8 -movflags '+faststart' -pix_fmt yuv420p pgassistant.gif.av1-20.mp4
ffmpeg -i pgassistant.gif.raw.gif -c:v libaom-av1 -crf 55 -cpu-used 8 -movflags '+faststart' -pix_fmt yuv420p pgassistant.gif.av1-55.mp4

  264516 Feb 12 11:44 pgassistant.gif
22782965 Feb 12 11:46 pgassistant.gif.raw.gif
 2120322 Feb 12 11:55 pgassistant.gif.av1-20.mp4
  245780 Feb 12 11:56 pgassistant.gif.av1-55.mp4

A place to start from at least; note the 264516-byte gif is what's currently on the landing page, with the wget command to grab the raw file.
kirici 16 hours ago [-]
Is there any benefit in using libaom-av1 over libsvtav1 nowadays?
mjevans 14 hours ago [-]
Licensing, patents (as far as I've heard anyway).
NEARLY everything can use AV1 and you don't need your clients to install a licensed codec if their OS didn't happen to include one. https://caniuse.com/av1 Far more than https://caniuse.com/hevc
ai-christianson 22 hours ago [-]
Submit a PR?
zimpenfish 20 hours ago [-]
I'm trying but it turns out that it's impossible to submit a useful PR with a video embed in `README.md` because the only way it works is by uploading the video through the web editor[0] and using a static user-images.githubusercontent.com URL.
[0] Which puts it under your account, obvs., and is therefore not that helpful for a PR.
ai-christianson 20 hours ago [-]
Maybe just open an issue and provide the ffmpeg command then
This reminds me of a less feature-rich version of PgAnalyze, but it's always nice to have alternative OSS options for this sort of optimization work.
scosman 20 hours ago [-]
I’ve wanted to make this for years. I have a doc of saved pg_stat_statements and other queries.
znpy 21 hours ago [-]
honest question: does anybody know a good book/tutorial/source/whatever on becoming a postgresql DBA ?
most of the material i see is written for people that want to write applications that work with postgresql, not on how to proficiently manage postgresql itself.
oa335 21 hours ago [-]
2nd Quadrant (bought by Enterprise DB several years ago) had a bunch of excellent YouTube tutorials on common tasks. Some of them may still be available on YouTube.
EDB has a fairly comprehensive set of self-paced trainings for sale. I went through them and thought they were really good.
The Postgres documentation is excellent, though, and although it is long, reading through it carefully should give you almost all the information you need for most database tasks: https://www.postgresql.org/docs/current/admin.html
Greg Smith wrote a book about postgres high performance that does go into the internals a bit and how to analyze performance problems. If you want to be a DBA, this will probably wind up needing to be in your back pocket at some point.
andatki 9 hours ago [-]
What sorts of skills are you looking to build? Since this post has a lot of comments about query optimization, I’d suggest query optimization is more in the realm of developers and not DBAs. Schema, query, and index design. DBAs might be more focused on replication/HA, security, fleet management, backups, DR.
Happy to help with more targeted recommendations!
znpy 23 minutes ago [-]
> What sorts of skills are you looking to build?
I work as a system engineer / devops engineer / infrastructure engineer. I often need to keep postgresql up and running, ideally in a smooth and performant way.
I don't like tool-specific tutorials or storytelling-based articles.
I'd like to learn the core postgresql things (terminology, tasks, operations) so that then i can evaluate autonomously what tool to pick and why, rather than following the trend of the month. And of course, being able to actually do "the things".
Ideally, the next step in this progression would be being able to understand postgresql performance and how to tune it (again, in a holistic manner).
Per my understanding, this is the kind of skills that a DBA should have. Some of it overlaps with the skills of a decent developer (i understand that).
justinclift 4 hours ago [-]
This is a bit older now, but it seems to cover the timeless basics: https://postgrespro.com/community/courses/DBA1
There is also a great book on the internals (very long and in-depth, but not unreadable if you already have a good foundation): https://edu.postgrespro.com/postgresql_internals-14_en.pdf
They probably have newer versions; this is just what's in my bookmarks.
I'll take a closer look at the material, which already looks good. It's sad that there doesn't seem to be anything past "DBA1" (that is, no DBA2 or DBA3).
edoceo 21 hours ago [-]
The PostgreSQL docs are great and then watch the presentations from their conferences - very cool stuff.
Then practice, docker makes this easy.
sgarland 58 minutes ago [-]
Your screenshot for issues found shows indices being created with CREATE INDEX. If someone runs that on a prod DB with a large table, they’re gonna have a bad time. You should have it suggest CREATE INDEX CONCURRENTLY.
Also, if I’m being honest, it’s a little off-putting to see a tool giving advice on Postgres and not differentiating between those two.
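For reference, the difference (object names invented): plain CREATE INDEX holds a lock that blocks writes to the table for the whole build, while CONCURRENTLY trades build speed and some caveats for availability.

```sql
-- Blocks INSERT/UPDATE/DELETE on orders for the duration of the build:
CREATE INDEX idx_orders_customer_id ON orders (customer_id);

-- Builds without blocking writes; slower, cannot run inside a transaction
-- block, and leaves an INVALID index to drop if the build fails:
CREATE INDEX CONCURRENTLY idx_orders_customer_id ON orders (customer_id);
```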
j45 50 minutes ago [-]
Always lots to learn, appreciate the insight.
It’s interesting that Postgres doesn’t seem to default to some of these settings, since they are clearly advisable in most cases.
sgarland 19 minutes ago [-]
I’ve long held that Postgres is more powerful than MySQL in terms of flexibility and options, but you really ought to have thoroughly read and understood the manual to run it in prod. In comparison, MySQL does a fantastic job out of the box (modulo buffer pool sizing and sync_array_size, and tbf Postgres is equally bad at the former, and the latter doesn’t matter until larger scale), but is somewhat limited in what you can do with it (emphasis on somewhat – it’s still quite adequate for nearly anyone).
That’s why I’m forever salty that all of the tech influencers and bloggers hyped up Postgres so much. Yes, it’s a great RDBMS, but the frontend devs eagerly jumping into it haven’t the slightest clue what they’re getting themselves into.
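For what it's worth, the sizing default both engines get wrong out of the box is the buffer cache; the values below are illustrative starting points under the usual "dedicated database server" assumption, not recommendations.

```
# postgresql.conf -- shared_buffers defaults to only 128MB; a common
# starting point on a dedicated server is roughly 25% of RAM:
shared_buffers = 4GB

# my.cnf ([mysqld] section) -- innodb_buffer_pool_size also defaults to 128MB:
innodb_buffer_pool_size = 8G
```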
https://gist.github.com/cpursley/c8fb81fe8a7e5df038158bdfe0f...