Analyzing database trends through 1.8M Hacker News headlines (camelai.com)
156 points by vercantez 3 days ago | 79 comments
codeulike 13 hours ago [-]
MS SQL Server not even mentioned. This tells us there is a whole world almost totally omitted from discussion on HN: "Enterprise"
thewebguyd 13 hours ago [-]
Oracle isn't in there either, which goes to show how much of a bubble HN actually is considering MSSQL and Oracle are #1 and #2 in market share.
oefrha 2 hours ago [-]
Well, if you analyze programming language trends through 1.8M Hacker News headlines you’d find Rust is the most popular language and C/C++ are barely even used.
bob1029 3 hours ago [-]
Nor is DB2. A non-trivial amount of HN's personal wealth is being tracked with this technology right now.
olavgg 3 hours ago [-]
I would not call HN a bubble. Enterprises often have unqualified people making "expensive" decisions.
mirzap 4 hours ago [-]
They are perhaps #1 and #2 in the "enterprise" market share, but in no way are they overall #1 and #2. Not even close. Which web app or startup uses them?
codeulike 3 hours ago [-]
> Which web app or startup uses them?

Well with that question you neatly define the bubble that you inhabit.

https://db-engines.com/en/ranking ranks Oracle at number 1 and MS SQL Server at number 3, their method being a broad range of statistics based on job offers and web search statistics.

bdangubic 4 hours ago [-]
Stack Overflow
fmsf 3 hours ago [-]
Postgres: 51% MS SQL: 27% Oracle: 10%

https://survey.stackoverflow.co/2024/technology#most-popular...

mirzap 4 hours ago [-]
One. Continue. For each you mention, I can think of 10 other well-known web apps that don't use them. 90% of the web doesn't use those 2. That's the fact.
morkalork 12 hours ago [-]
I used MS SQL and Oracle at my last job, but what's there to say about them? They've been around forever, are stable, and get all the same table-stakes feature updates as everyone else. Start-ups avoid them like the plague because they're so damn expensive, and you won't be running either of them on your phone or an embedded device the way you can with SQLite.
hinterlands 8 hours ago [-]
I do think it's an SFBA / generational bubble. We have plenty of boring, expensive software projects that someone will always bring up in a HN thread. For example, every time there's a thread on PCB design, you have some folks talking about Cadence. What's there to say about Cadence? Well, first and foremost, it costs a lot. Otherwise, it lets you design PCBs. But there are people here who pay for it, use it, and want to talk about it.
xyzzy123 7 hours ago [-]
Right but having access to a Cadence license is considered "elite" (it means you are a Real Engineer), while having to use mssql server means you're kind of a schlub (who probably has to work for a real business, that makes money but is super boring, with no equity, among people who don't understand any of this status hierarchy at all).
diggan 2 hours ago [-]
> This tells us there is a whole world almost totally omitted from discussion on HN

It doesn't though, all it tells you is that it's missing from the headlines in the submissions.

"Enterprise" is discussed on HN too, but inside submissions that aren't exclusively about MS Sql Server. Try searching for some terms on the Algolia HN search, order by date and filter by comments and you'll find the subthreads/submissions where it's discussed :)

fullstackchris 13 hours ago [-]
There is a reason it is not even mentioned
cheesekunator 7 hours ago [-]
And what's that reason .... ?
Imustaskforhelp 1 hours ago [-]
I almost knew that PostgreSQL would be the winner just because of how much people recommend it here or literally anywhere. Postgres is cool.

My personal favourites, depending on the situation, are Postgres (technically Supabase is Postgres too), SQLite, DuckDB, (Valkey?)

I'm just curious: what are your favourite options, and why?

Imustaskforhelp 1 hours ago [-]
I really wanted to try the "chat with HN data" option: https://camelai.com/hackernews/

But I am stuck at the Cloudflare Turnstile challenge, and when I click on it and it passes, it shows "error occurred, try again".

So frustrating, since I was so curious.

xnx 17 hours ago [-]
More unsolicited feedback: month-by-month is kind of noisy. You might use a 3-month moving average to smooth it a little and make the trend clearer.
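To make the smoothing suggestion concrete, here is a minimal Python sketch of a trailing 3-month average. The monthly counts are made up for illustration and `moving_average` is my own helper name; it is not how the post computes anything.

```python
def moving_average(values, window=3):
    """Trailing moving average.

    The first window-1 points are averaged over however many
    values are available so far, so the output has the same
    length as the input.
    """
    smoothed = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        chunk = values[start:i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Hypothetical monthly headline counts for one database (made-up numbers).
monthly_mentions = [12, 30, 9, 27, 15, 33]
print(moving_average(monthly_mentions))
```

A trailing window lags the raw series a little; a centered window would track turning points better, at the cost of leaving the most recent month without a full window.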
conradkay 17 hours ago [-]
There's an online playground with the data here: https://play.clickhouse.com/

Wrote up this query:

  SELECT
    db_name,
    sum(if(type = 'comment', 1, 0)) AS comment_mentions,
    sum(if(type = 'story', 1, 0)) AS post_mentions,
    count(*) AS total_mentions,
    sum(score) AS total_score
  FROM hackernews
  ARRAY JOIN
    extractAll(replaceAll(LOWER(text), ' ', ''), '(sqlite|postgres|mysql|mongodb|redis|clickhouse|mariadb|oracle|sqlserver|duckdb)') AS db_name
  WHERE toYear(time) >= 2022
  GROUP BY
    db_name
  ORDER BY
    post_mentions DESC;
Imustaskforhelp 1 hours ago [-]
Very interesting. Where does play.clickhouse.com get its Hacker News data from, though? I don't see a link to the source it fetches from.

Does play.clickhouse contain all the HN data so that we can play with it?

Aachen 19 hours ago [-]
Is MariaDB included in MySQL? I see no mention of it in the post, but MySQL trending downwards would make sense as people upgrade and switch over. Besides, of course, novelty wearing off, as posited for all engines further down the post.
evanelias 19 hours ago [-]
> Is MariaDB included in MySQL?

I was wondering the same, but I'm not sure if it would make a major change in the graphs. MySQL and MariaDB have both been unpopular on Hacker News for many years. Submissions on either topic rarely get much traction, which then leads to fewer submissions.

> MySQL trending downwards would make sense as people upgrade and switch over.

No, most large MySQL users are still using MySQL; there hasn't been a widespread migration to MariaDB. They're both actively developed and have grown in slightly different directions. Among corporations, MySQL's usage still far outstrips MariaDB's. Lately MariaDB has better product velocity though, and their commercial enterprise finally seems to have stable footing.

Aachen 14 hours ago [-]
> there hasn't been a widespread migration to MariaDB

I don't think I even knew I was running MariaDB at first, or perhaps I only noticed in passing that apt was dropping in MariaDB when I installed mysql. If you upgraded Debian some time ago, I'm pretty certain you were automatically migrated, so anyone running that (or, presumably, one of the derivatives like Ubuntu) would have migrated knowingly or unknowingly, hence my assumption.

evanelias 14 hours ago [-]
Sure, it's a common point of confusion specifically because a few major Linux distros did that. But SREs / DBAs / DBREs will generally take a much more rigorous approach to database version upgrades. Companies just don't tend to upgrade their important databases in that fashion, and ditto for operating systems if they self-host.

And then there's all the users of managed cloud database offerings (RDS, Cloud SQL, etc) who definitely don't accidentally switch database vendors in that manner. Google Cloud doesn't even offer managed MariaDB, and Azure is retiring their managed MariaDB product.

Also keep in mind MariaDB hasn't been fully drop-in compatible with MySQL for over a decade. They've increasingly diverged in features and minor syntax differences over time.

Just to be clear, I'm not bashing MariaDB; I quite like it as a database. But there are a lot of misconceptions in FOSS circles about the relative usage levels of MariaDB vs MySQL.

tonymet 17 hours ago [-]
is anyone seriously using it? even their own brand facepile is pretty weak
evanelias 16 hours ago [-]
MariaDB is widely used, including by some extremely high-traffic sites like Wikipedia [1], as well as some quite large multinational businesses [2].

It may not be as widespread as MySQL, but that's no surprise; despite HN's disdain, MySQL is still one of the most widely-used open source databases in existence.

[1] https://wikitech.wikimedia.org/wiki/MariaDB

[2] https://mariadb.com/resources/customer-stories/

Aachen 14 hours ago [-]
Their what now?
Tepix 19 hours ago [-]
SQLite seems to be growing recently, which matches my perception, but it's not listed among the growing databases. Weird.
vercantez 18 hours ago [-]
Yeah I found a mistake in the analysis. I'm updating the post to reflect SQLite's popularity.
vercantez 18 hours ago [-]
SQLite is now reflected in the growth table
kwillets 18 hours ago [-]
Snowflake seems to have peaked; 2023 was hellish dealing with roomfuls of inexperienced devs and even architects convinced it was the fastest, cheapest thing ever.
redwood 12 hours ago [-]
Well, as pointed out above, Oracle and SQL Server don't even show up, so this simply does not reflect enterprise, and Snowflake and Databricks both lean enterprise.
vercantez 17 hours ago [-]
UPDATE: Added a weighted average analysis based on story points and comments. SQLite ranks highest in points per story and Redis ranks highest in comments per post. Also added SQLite to the growth table. I had accidentally deleted this row in the original post.
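For readers wondering what a per-story weighting could look like, here is a rough Python sketch that averages points and comments per story for each database. The rows and the names `db_a` / `db_b` are made up, and `per_story_averages` is a hypothetical helper; this is an assumption about the general shape of such a metric, not the query used in the post.

```python
from collections import defaultdict

# Hypothetical rows: (database, story points, comment count). Made-up numbers.
stories = [
    ("db_a", 300, 120),
    ("db_a", 100, 40),
    ("db_b", 150, 200),
    ("db_b", 50, 100),
]


def per_story_averages(rows):
    """Return average points and average comments per story for each database."""
    totals = defaultdict(lambda: [0, 0, 0])  # [story count, points, comments]
    for db, points, comments in rows:
        entry = totals[db]
        entry[0] += 1
        entry[1] += points
        entry[2] += comments
    return {
        db: {"points_per_story": p / n, "comments_per_story": c / n}
        for db, (n, p, c) in totals.items()
    }


for db, stats in per_story_averages(stories).items():
    print(db, stats)
```

Averages like these are sensitive to a single viral story, so a median or a minimum story count per database may be worth considering alongside them.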
98codes 17 hours ago [-]
Interesting to see SQL Server not listed here, am curious whether it didn't have enough signal, or suffered from being a two-word product, with "SQL" being far too generic on its own.
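One way around the two-word problem is to match a small set of aliases per product instead of the bare word "SQL". A minimal Python sketch follows, assuming a hand-written alias table; the patterns, the `ALIASES` table and the `databases_in` helper are hypothetical, and I don't know how the post's own matching works.

```python
import re

# Hypothetical alias table: each regex maps to one canonical product name.
ALIASES = {
    r"\b(?:ms\s*sql(?:\s*server)?|sql\s*server)\b": "SQL Server",
    r"\bpostgres(?:ql)?\b": "PostgreSQL",
    r"\bsqlite\b": "SQLite",
    r"\bmysql\b": "MySQL",
}
COMPILED = [(re.compile(p, re.IGNORECASE), name) for p, name in ALIASES.items()]


def databases_in(title):
    """Return the set of canonical database names mentioned in a headline."""
    return {name for pattern, name in COMPILED if pattern.search(title)}


# Made-up headlines for illustration.
print(databases_in("Migrating from MS SQL Server to Postgres"))
print(databases_in("Show HN: A SQL linter"))  # bare "SQL" matches nothing
```

The word boundaries keep "SQL" on its own from counting as SQL Server, at the cost of missing spellings that are not in the table.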
jiggawatts 13 hours ago [-]
I also don’t remember SAP HANA, Oracle, or DB2 being mentioned even once here, but believe me, along with MSSQL these account for most of the top ten database deployments worldwide.

Something that I’ve been thinking about a lot recently is that all of the proprietary vendors are quietly strangling their flagship products.

Free and open source database engines were always “nipping at their heels” but weren’t a serious threat for decades. Only other proprietary engines were.

Now that PostgreSQL has more features than SQL Server and better performance, it’s a serious competitor.

But Microsoft is holding MSSQL’s face under water with core-based licensing. It means that per dollar you get dozens of times less compute available for your data than with open-source systems. That ratio is growing exponentially, because they haven’t redone their pricing in… ever.

Oracle and DB2 are being similarly choked off at the same rate, so looking left and right at their direct competition their respective product managers haven’t noticed the problem, which is akin to Fuji and Kodak raising film prices in lockstep just as digital photography is taking off.

We’re entering the era of “kilocores”: single servers becoming available that have over a thousand cores. You can’t imagine what per-core licensing costs for something like that!

PS: I saw a similar dynamic play out in the network space with load balancers and “web accelerators” like NetScaler sold “by bandwidth” with a starter SKU as small as 2 Mbps. I kept trying to politely explain to the reps that the smallest cloud VMs can cheerfully put out 10 Gbps, and hence their product is a 5,000x decelerator. They eventually listened to someone and made it bandwidth-unlimited. Too late. Everyone uses NGINX now.

redwood 12 hours ago [-]
When you're addicted to bad revenue, it's very hard to compress it.
jiggawatts 11 hours ago [-]
It's a repeating problem across many industries.

Proprietary compilers and developer tooling were similarly strangled, and have been completely replaced by free/open tools in all but a few niche areas such as embedded, hard realtime, and circuit design.

RadiozRadioz 17 hours ago [-]
It is also less mentioned on the site in general, owing to it being a proprietary Microsoft product in an audience of people who primarily go for Free / Open Source non-Microsoft products.

There are some people here who are interested in corporate Europe or <insert Microsoft foothold place/industry here>, but most are aligned with Silicon Valley hackers.

pythonaut_16 12 hours ago [-]
Someone else mentioned it already, but what is there to talk about with SQL Server (and Oracle)? Like I'm sure there's plenty someone could write about but generally it's pay Microsoft so it's their problem.

Whereas something like Postgres has a plethora of forks and tools built around it; because it's open source, devs can actually do interesting things to solve their problems.

bix6 7 hours ago [-]
Any commentary on DuckDB from users? I keep hearing about it but am not a user myself. Is it a fad or here to stay?
zurfer 18 hours ago [-]
Unfortunately, I only got data until 2022, but here is a similar overview with a few more charts and sentiment analysis: https://eu.getdot.ai/share/f3f0853d-fa91-4301-8fb2-52821b65e...

Will try to update it with some more recent data later.

Aachen 19 hours ago [-]
The data query tool linked at the bottom of the post doesn't work for me. Cloudflare shows error 600010, whatever that means. Nice that there is "no login required" but if it did, or allowed that option, maybe it wouldn't need an algorithm to decide whether my traffic is abusive because you could block abusive accounts instead
jtbaker 19 hours ago [-]
https://camelai.com/hackernews/? Worked for me.
sega_sai 19 hours ago [-]
I am getting an infinite loop of 'Verify you are human'....
vercantez 18 hours ago [-]
We use Cloudflare Turnstile. Sometimes it blocks some VPNs. Very rarely it blocks some browsers.
tea-lover 12 hours ago [-]
It does it "very rarely" if you only care about the most populated & richest areas of the world. It also blocks clients from the neglected "global south" all the time. FWIW, I too am stuck in a captcha loop, and these days I usually just bounce when I see Cloudflare captcha instead of trying to fight it. In your logs it probably looks like bot traffic.
Aachen 14 hours ago [-]
Not on a VPN. Guess I can't use this browser then? So much for HTML/ECMAScript being standards anyone could implement...
Aachen 14 hours ago [-]
Yep, that one. Praise that the algorithm likes you!
123yawaworht456 18 hours ago [-]
>a ClickHouse database of every HN story

I remember downloading it a few years ago, but the bookmark I have is dead. where is it now? is it still public?

xnx 18 hours ago [-]
Here: https://play.clickhouse.com/play?user=play#U0VMRUNUIG1heCh0a...

It's really fantastic. Continuously updated and fast anonymous queries. Big kudos to ClickHouse.

jabart 18 hours ago [-]
Still public, still chews through millions to billions of rows in seconds. Their Cloud version has some Cloud-specific features. A few vendors have built custom things on top of it, or custom builds off the open source project, too.
xnx 18 hours ago [-]
Would be great to share the queries. Are these results weighted for story points and/or number of comments?
vercantez 18 hours ago [-]
Purely based on headline occurrence, but weighting by story points and comments is a great idea. I'll update the blog, thanks.
vercantez 17 hours ago [-]
Updated with weighted analysis.
esafak 15 hours ago [-]
How are you handling sanitization? Anything interesting?
bellareed 18 hours ago [-]
If your curiosity inspires you to dig deeper into the data, our "chat with hacker news" free tool is available. No login required: https://camelai.com/hackernews/
chickenzzzzu 18 hours ago [-]
the funniest thing about this graph is that it proves there was a raw drop off in all popularities in the last 2 years, which of course directly coincides with the great layoffening that has been happening for almost 3 years now.

this shows that people are definitely rotating out of "web technologies" in general, not because they aren't useful, but because the money isn't there anymore.

perhaps a large chunk have switched to AI hype trains, and it would be interesting to compare raw results of different AI headlines, but i suspect maybe 30% of people have left tech altogether.

redwood 18 hours ago [-]
I think it's attention and mindshare going to AI
chickenzzzzu 18 hours ago [-]
we would have to look at raw numbers, like, perhaps web tech is just "flat", not declining.

but my suspicion without evidence is that the gross number of people in the industry is actually dropping, though it should be increasing.

bellareed 18 hours ago [-]
This would be an interesting request to directly ask the data. Which you can do using our "chat with hacker news data" free tool: https://camelai.com/hackernews/

No login required.

chickenzzzzu 18 hours ago [-]
i went there on mobile and asked two questions. it went pretty well from a UI and response quality perspective. the data they showed me didn't show any obvious trends, but i suspect it's because i didn't specify a long enough list of technologies, and that some general terms were included like "machine learning" and "llm" which had an effect of hiding the trends i was looking for.

a great start and much more enjoyable than writing the sql or for loops myself :)

chickenzzzzu 18 hours ago [-]
thank you very much for suggesting that and for making it available without a login :)
xnx 18 hours ago [-]
Confusingly, I just came across the unrelated https://www.camel-ai.org/ today.
bellareed 18 hours ago [-]
Sooo confusing. We've debated changing our name but can't bring ourselves to break up with our cute camel logo lol.
gushie 17 hours ago [-]
I would have suggested HumpAI if hump didn't have another meaning that might attract users you didn’t intend :)
bellareed 17 hours ago [-]
XD lmao
esafak 15 hours ago [-]
First one to http://camel.ai/ wins :)
bellareed 13 hours ago [-]
I had a semi-viral tweet about my attempt to buy camel.ai. A useless domain squatter wants $40k for it! https://x.com/isabella_patane/status/1820987472287080867
jeffbee 6 hours ago [-]
Is it just me, or is it weird that BigQuery is mentioned, but Bigtable and Spanner are not? The article presents a grab-bag of database concepts that do not seem related. BigQuery and PostgreSQL are just fundamentally different things.

It all makes me wonder what is the biggest "dark" database, the one nobody on HN wants to talk about, but it's out there serving the most transactions.

nsbk 19 hours ago [-]
Some of the insights match my personal experience and preferences. At $dayjob we're migrating from Mongo to TimescaleDB (now TigerData ¯\_(ツ)_/¯), which is basically a PostgreSQL extension for time series data, and we couldn't be happier. We are getting better performance and massive storage savings.

On the analytics side of things we are starting to use DuckDB for some development efforts, but we are keen on potentially replacing some or all of our Snowflake usage with DuckDB.

throw_m239339 19 hours ago [-]
Can you tell me what scenarios you used MongoDB for? Because I'm still curious about why would anyone use MongoDB after all these years.
CoastalCoder 3 hours ago [-]
> I'm still curious about why would anyone use MongoDB after all these years.

Because MongoDB is webscale.

nsbk 19 hours ago [-]
It is the main database for a huge Rails app. They adopted Mongo right when its popularity started to decline. I always thought it was a very poor choice since the day I joined.

It is an especially bad choice considering that a lot of the data stored in it is IoT-like and the system creates a single document per event :facepalm:

beembeem 17 hours ago [-]
I'm sorry to hear about your bad experience. From your comment I take it that you weren't using a time-series collection to store data in mdb which uses industry-standard compression techniques?
RS-232 19 hours ago [-]
No SQLite?
vercantez 18 hours ago [-]
Mistake in the analysis. Fixing now.
vercantez 18 hours ago [-]
Fixed.
b0a04gl 16 hours ago [-]
[dead]
markwclancy 14 hours ago [-]
Absolute drivel. Comparing operational/transactional databases like MongoDB and Postgres to analytics/columnar datastores like Redshift and Snowflake is meaningless. You might as well say "...the popularity of hammers is way up, with screwdrivers appearing to be in decline...". If this is the type of data analysis that AI is supporting, we're all in trouble.