Hacker Next
Ask HN: Weird archive.today behavior?
80 points by rabinovich 9 hours ago | 21 comments
rafram 2 hours ago
Remember when Archive.is/today used to send Cloudflare DNS users into an endless captcha loop because the creator had some kind of philosophical disagreement with Cloudflare? Not the first time they’ve done something petty like this.
dunder_cat 3 hours ago
Hmm. If it is an attempt at a DDoS attack, it's probably not very fruitful:

  >$ resolvectl query gyrovague.com

  gyrovague.com: 192.0.78.25                     -- link: eno1
                 192.0.78.24                     -- link: eno1
Looking up the first IP address on https://bgp.he.net/ip/192.0.78.25 shows that AS2635 (https://bgp.he.net/AS2635) is announcing 192.0.78.0/24. AS2635 is owned by https://automattic.com, aka wordpress.com. I assume that, for a managed environment at their scale, this is just another Wednesday for them.
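The prefix check itself is easy to reproduce offline. A minimal sketch using Python's standard `ipaddress` module, assuming the two addresses from the `resolvectl` output above; it only tests containment in 192.0.78.0/24 and does not query any BGP data, so the AS2635 attribution still has to come from a looking glass such as bgp.he.net. `in_announced_prefix` is a hypothetical helper name.

```python
import ipaddress

# Prefix copied from the bgp.he.net lookup quoted above (announced by AS2635).
ANNOUNCED_PREFIX = ipaddress.ip_network("192.0.78.0/24")


def in_announced_prefix(addr: str, prefix=ANNOUNCED_PREFIX) -> bool:
    """Return True if addr lies inside the given announced prefix."""
    return ipaddress.ip_address(addr) in prefix


# Addresses copied from the resolvectl output above.
for addr in ("192.0.78.25", "192.0.78.24"):
    print(addr, in_announced_prefix(addr))
```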
arcfour 2 hours ago
I believe they're trying to get the blog suspended (automatically?), hence the cache busting: suddenly chewing through more resources than normal might do the trick even if it doesn't actually take the site offline.
dunder_cat 3 hours ago
It occurred to me while reading the article that I could also have just checked the TLS cert. The cert I was served presents "Common Name tls.automattic.com". Still, maybe someone will discover bgp.he.net via this :-)
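A minimal Python sketch of that cert check, using only the standard library's `ssl` and `socket` modules. `fetch_cert` and `common_name` are hypothetical helper names; the actual network call is left in a comment so the snippet runs offline, and what it returns depends on whatever certificate the server presents at the time.

```python
import socket
import ssl
from typing import Optional


def fetch_cert(host: str, port: int = 443, timeout: float = 5.0) -> dict:
    """Open a verified TLS connection and return the peer certificate as a dict."""
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()


def common_name(cert: dict) -> Optional[str]:
    """Pull the subject Common Name out of a getpeercert()-style dict, if present."""
    # "subject" is a tuple of RDNs; each RDN is a tuple of (key, value) pairs.
    for rdn in cert.get("subject", ()):
        for key, value in rdn:
            if key == "commonName":
                return value
    return None


# Needs network access, so it is left as a comment:
# print(common_name(fetch_cert("gyrovague.com")))
```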
catlifeonmars 2 hours ago
> maybe someone will discover bgp.he.net via this

I did, thank you!

justsomehnguy 9 minutes ago
Add https://bgp.tools to the list
mike_d 2 hours ago
It is using the ?s= parameter, which causes WordPress to run a search for a random string. This can result in high CPU usage, which I believe is one of the DoS vectors that works on hosted WordPress.
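From the site operator's side, this pattern should stand out in access logs: lots of requests whose `s` value never repeats. A rough heuristic sketch in Python; the function names, the 0.9 threshold and the 100-request minimum are all assumptions for illustration, not anything WordPress itself does.

```python
from collections import Counter
from urllib.parse import parse_qs, urlsplit


def search_terms(paths):
    """Yield every value of the WordPress search parameter `s` in the given request paths."""
    for path in paths:
        for term in parse_qs(urlsplit(path).query).get("s", []):
            yield term


def looks_like_cache_busting(paths, threshold=0.9, min_requests=100):
    """Hypothetical heuristic: flag traffic where nearly every search term is seen only once.

    threshold and min_requests are illustrative assumptions, not tuned values.
    """
    counts = Counter(search_terms(paths))
    total = sum(counts.values())
    if total < min_requests:
        return False  # too little search traffic to judge
    one_off = sum(1 for n in counts.values() if n == 1)
    return one_off / total >= threshold


# Usage with made-up log paths:
unique_terms = [f"/?s={i:08x}" for i in range(200)]
ordinary = ["/?s=archive"] * 150
print(looks_like_cache_busting(unique_terms))  # True
print(looks_like_cache_busting(ordinary))      # False
```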
eli 2 hours ago
Well that is a very silly way to punish the author of an article you don’t want people to know about.
crazysim 55 minutes ago
"It’s a testament to their persistence that they’re managed to keep this up for over 10 years, and I for one will be buying Denis/Masha/whoever a well deserved cup of coffee."

https://gyrovague.com/2023/08/05/archive-today-on-the-trail-...

And that's from a post where the author seems cool with whoever is running archive.today.

ideasphere 1 hour ago
https://news.ycombinator.com/item?id=45922875

“Behind the complaints: Our investigation into the suspicious pressure on Archive.today”

sbdaman 3 hours ago
Given it's set to generate random pages on the site, is there even any possible explanation for this that isn't sketchy?
mediumdeviation 2 hours ago
It's not random; setting the query string to a new value on every fetch is a cache-busting technique. It's trying to prevent the browser from caching the page, presumably to increase bandwidth usage.
gertop 5 minutes ago
It's trying to prevent the server from caching the search. Thousands of different searches will cause high CPU load, and WordPress might decide to suspend the blog.
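To see why a fresh query string defeats caching, here is a toy Python model of a cache keyed on the full URL. That keying is an assumption: real CDNs and page caches may normalize or ignore query strings. `UrlKeyedCache` and `render` are hypothetical names, and example.com stands in for any site.

```python
class UrlKeyedCache:
    """Toy cache whose key is the full URL, query string included."""

    def __init__(self):
        self.store = {}
        self.hits = 0
        self.misses = 0

    def get(self, url, render):
        if url in self.store:
            self.hits += 1
        else:
            self.misses += 1
            self.store[url] = render(url)  # stands in for expensive backend work, e.g. a search
        return self.store[url]


def render(url):
    return f"page for {url}"


# Same URL every time: only the first request reaches the backend.
stable = UrlKeyedCache()
for _ in range(100):
    stable.get("https://example.com/?s=fixed", render)

# A fresh ?s= value on every request: every request is a miss.
busted = UrlKeyedCache()
for i in range(100):
    busted.get(f"https://example.com/?s=q{i}", render)

print(stable.hits, stable.misses)  # 99 1
print(busted.hits, busted.misses)  # 0 100
```

Which layer gets bypassed (the browser, as the first reply suggests, or the server-side cache, as the second suggests) decides whether the cost lands on bandwidth or on CPU; the model only shows the keying effect.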
nativeit 2 hours ago
I just tried in my browser (Firefox on Ubuntu) and got the same result. Deeply curious.
Barbing 2 hours ago
Is it worth blocking that URL for users of the archive site, then, to avoid adding to the burden?
internetter 2 hours ago
There's really no interpretation of this that isn't malicious. Not to defend this behaviour whatsoever, but I'm not entirely surprised by it. The only real value of archive.is lies in its paywall-bypassing abilities and, presumably, the large swaths of residential proxies that let it archive sites that archive.org can't. Only somebody with some degree of lawlessness would operate such a project.
jijijijij 37 minutes ago
Not excusing this malicious behavior, but I have to say, the blog post in question is a major dick move, too. I got quite the impression of a passive-aggressive undertone, and there is clearly bittersweet irony in collecting and "archiving" an archiver's personal information from long-ago traces. Maybe it's all some feud between two dicks, with some backstory untold. Maybe the blog author wanted some information gone from archive.today but was denied.
Brybry 2 hours ago
It's not just for paywall bypassing. Sometimes there are archive.today snapshots that aren't in the Wayback Machine (though I think your overall point about lawlessness still stands).

For example, some NASA debris hit a guy's house in Florida and made the news. [1] Some news sites linked to a Twitter post he made with the images, but he later deleted the post. [2]

The Wayback Machine has a ton of snapshots of the Twitter post but none of them render for me. [3]

But archive.today's snapshot works great. [4]

[1] https://www.bbc.com/news/articles/c9www02e49zo

[2] https://xcancel.com/Alejandro0tero/status/176872903149342722...

[3] https://web.archive.org/web/20240715000000*/https://twitter....

[4] https://archive.md/obuWr

mediumdeviation 3 hours ago
Pretty sure that blog is hosted on WordPress.com infrastructure, so it's not like the blog owner would even notice unless it generates so much traffic that WordPress itself notices.

That said, I don't think there are many non-malicious explanations for this. I would suggest writing to HN at hn@ycombinator.com to see about blocking submissions from the domain.

self_awareness 29 minutes ago
And that's how advertising works, folks. If someone wants a website dead, I want to know more about it.
ventegus 8 hours ago
They might need to tweak a single word. Streisand readers won’t have a clue which.

Save the page now and compare a week later.
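One hypothetical way to do that comparison, assuming you have saved the page's text twice: Python's standard `difflib` can list exactly which words changed. `changed_words` is an illustrative helper name, and the sample strings are made up.

```python
import difflib


def changed_words(old: str, new: str):
    """Return (removed, added) word lists between two saved copies of a page's text."""
    old_words, new_words = old.split(), new.split()
    removed, added = [], []
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            removed.extend(old_words[i1:i2])
        if tag in ("replace", "insert"):
            added.extend(new_words[j1:j2])
    return removed, added


# Made-up snapshots for illustration.
print(changed_words("the cat sat on the mat", "the cat sat on the rug"))  # (['mat'], ['rug'])
```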
