This clearly elucidated a number of things I've tried to explain to people who are so excited about "conversations" with computers. The example I've used (with varying levels of effectiveness) was to get someone to think about driving their car by only talking to it. Not a self-driving car that does the driving for you, but telling it things like: turn, accelerate, stop, slow down, speed up, put on the blinker, turn off the blinker, etc. It would be annoying and painful, and you couldn't talk to your passenger while you were "driving" because that might make the car do something weird. My point, and I think it was the author's as well, is that you aren't "conversing" with your computer, you are making it do what you want. There are simpler, faster, and more effective ways to do that than to talk at it with natural language.
shubhamjain 2 days ago [-]
I had the same thoughts on conversational interfaces [1]. Humane AI failed not only because of terrible execution; the whole assumption of voice being a superior interface (and of trying to invent something beyond smartphones) was flawed too.
> Theoretically, saying, “order an Uber to airport” seems like the easiest way to accomplish the task. But is it? What kind of Uber? UberXL, UberGo? There’s a 1.5x surge pricing. Acceptable? Is the pickup point correct? What would be easier, resolving each of those queries through a computer asking questions, or taking a quick look yourself on the app?
> Another example is food ordering. What would you prefer, going through the menu from tens of restaurants yourself or constantly nudging the AI for the desired option? Technological improvement can only help so much here since users themselves don’t clearly know what they want.
And 10x worse than that is booking a flight: I found one that fits your budget, but it leaves at midnight, or requires an extra stop, or is on an airline for which you don't collect frequent flyer miles, or arrives at a secondary airport in the same city, or it only has a middle seat available.
How many of these inconveniences will you put up with? Any of them, all of them? What price difference makes it worthwhile? What if by traveling a day earlier you save enough money to even pay for a hotel...?
All of that is for just one flight; what if there are several alternatives? I can't imagine having a dialogue about this with a computer.
fragmede 1 days ago [-]
But that is how we used to buy a plane ticket. Long before flights.google.com's price table, you'd call a human up and tell them you'd like to go on holiday. They'd ask you where and when and how much you could afford, and then after a while with the old system (SABRE) clicking and clacking they'd find you a good deal. After a few flights with that travel agent, they'd get to know you and wouldn't have to ask so many questions.
Similarly, long before Waymo, you'd get into a taxi, and tell the human driver you're going to the airport, and they'd take you there. In fact, they'd get annoyed at you if you backseat drove, telling them how to use the blinker and how hard to brake and accelerate.
The thing about conversational interfaces is that we're used to them, because we (well, some of us) interface with other humans fairly regularly, and so it's a fairly baseline skill needed to exist in the world today. There's a case to be made against them, but since everyone can be assumed to be conversational (though perhaps not in a given language), it's here to stay. Restaurants have menus that customers look at before using the conversation interface to get food, in order to guide the discussion, and that's had thousands of years to evolve, so it might be a local maximum, but it's a pretty good one.
grbsh 1 days ago [-]
It's a great point that this is how we primarily used to interact with businesses and services, but we've moved on. For Gen-Z, e.g., many will refuse to use a product or service if they have to speak to an actual human. Just like we're now not willing to take a boat across the ocean for 3 months, but before airplanes this was not uncommon.
taneq 1 days ago [-]
Taking a 3 month voyage was still an uncommon thing to do for a person, it’s just that it was the most common type of intercontinental journey due to lack of competition.
everdrive 1 days ago [-]
But the booking agent used to understand what you were saying, and it'd be very easy to work out miscommunications. AI chatbots just send you in circles endlessly and if you get "stuck" there is no recourse.
Zamaamiro 1 days ago [-]
I don't see how "but that's the way we used to do things" is an argument in favor of conversational interfaces.
The whole point is that we currently have better, more efficient ways of doing those things, so why would we regress to inferior methods?
dcrimp 1 days ago [-]
The inferior methods were slower but more flexible - they could handle any and all edge cases. Currently we have a UX that really efficiently realises 80% of cases.
To relate to the article - Google Flights is the keyboard and mouse - covering 80% of cases very quickly. Conversational is better when you're juggling more contextual info than what can be represented in a price/departure time/flight duration table. For example, "I'm bringing a small child with me and have an appointment the day before and I really hate the rain".
Rushed comment because I'm working, but I hope you get the gist.
Current flight planning UX is overfit on the 80% and will never cater to the 20%, because the cost/benefit of the development work isn't good.
fragmede 21 hours ago [-]
You have to define which axes you're using to define efficient. If I were an executive at some corporation, I'd tell my assistant to book me a flight to New York on Friday at 7pm and that takes me less than 10 seconds. It may take her a while longer, but that's her problem and that's what I pay her for.
How long is it going to take you to get to a device, load the app/webpage, tell it which airport you're flying from and going to and what date and then you start looking at options. You've blown way past the 10 seconds it took for that executive to get a plane flight.
Better is in the eye of the beholder. What's monetarily efficient isn't going to be temporally efficient, and that's true along a lot of other dimensions too.
Point is, there are some people that like having conversations; you may not be one of them, and you don't have to be. I'm not taking away your mouse and keyboard. I have those too and won't give them up either. But I also find talking out loud helps my thinking process, though I know that's not everybody.
nerdponx 1 days ago [-]
And how many people book flights that way today?
wetoastfood 1 days ago [-]
Many people today are booking flights for others, be it families, business leaders, or traditional travel agents. They’re communicating preferences and asking about preferred travel times, budget, seat selection, and more. When you book for and with someone else, these preferences get learned and you no longer have to ask if they prefer an aisle seat—you just pick it.
The booking experience today is granular to help you find a suitable flight to meet all the preferences you’re compiling into an optimal scenario. The experience of AI booking in the future will likely be similar: find that optimal scenario for you once you’re able to articulate your preferences and remember them over time.
uoaei 1 days ago [-]
And how many bad experiences do you expect people to tolerate before AI eventually learns the person's "real" preferences?
mschuster91 1 days ago [-]
More than enough. Corporate flights are almost always handled that way, if only for compliance reasons (the travel agency knows about budget and "appearance" limits, aka only C-level gets business class, everyone else gets economy).
Anecdata: last year my wife and I went on a rail tour through Eastern Europe and god, I wish we had chosen to spend a few hundred euros on a travel agency in retrospect - I can't count just how much time we had to spend researching what kind of rail, bus and public transit tickets you need on which leg, how to create accounts, set up payment and god knows what else. It easily took us two days' worth of work and about two dozen individual payment transactions. A professional travel agency can do all the booking via Sabre, Amadeus or whatever...
fragmede 22 hours ago [-]
Not how many, but which ones? As a regular person, I buy it myself, but do you think rich people do that? No, they just ask their (human) assistant to get a flight to New York around 7pm this Friday, and then move on to the next problem in their lives.
indigoabstract 1 days ago [-]
And then there is the fact that voice isn't the dominant mode of expression for all people. Some are predominantly visual thinkers, some are analytic and slow to decide, while some prefer to use their hands and so on.
I guess there's just no substitute for someone actually doing the work of figuring out the most appropriate HMI for a given task or situation, be it voice controls, touch screens, physical buttons or something else.
ramblejam 1 days ago [-]
> since users themselves don’t clearly know what they want.
Knowing what you want is, sadly, computationally irreducible.
littlestymaar 1 days ago [-]
Why couldn't the interface ask you about your preferences? Because instead, what we have right now are clunky web interfaces that cram every choice onto the small screen in front of you, leaving you to work out how the options actually differ and to sort out yourself how to make things work.
Of course a conversational interface is useless if it tries to just do the same thing as a web UI, which is why it failed a decade ago when it was trendy: the tech was nowhere near clever enough to make it useful. But today, I'd bet the other way round.
UncleMeat 1 days ago [-]
Scrolling through a list of a few options seems much less clunky than being asked via voice about which option I prefer. I can see multiple options at once and compare them easily. But via voice I need to keep all of the options in working memory to compare them. Harder.
littlestymaar 1 days ago [-]
The problem with scrolling is that you'll be presented with tens of options you don't care about, because the options have to be determined in advance and be the same for everyone.
That's why the “advanced search” is almost always hidden somewhere. And that's also why you can never find the filter you need on an e-shopping website.
earnestinger 1 days ago [-]
It can ask, but how much time do you want to spend answering stuff?
Such a dialog is probably nice for a first-time user; it's a nightmare for a repeat user.
littlestymaar 1 days ago [-]
What prevents the system from remembering your previous choices?
Then it can assume your choices haven't changed, and propose a solution that matches your previous choices. And to give the user control, it just needs to explicitly tell the user about the assumptions it made.
In fact, a smart enough system could even see when violating the assumptions could lead to a substantial gain and try convincing the user that it may be a good option this time.
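The "remembered defaults plus explicit assumptions" idea above can be sketched in a few lines. Everything here is hypothetical illustration - the `BookingAssistant` class and its field names are made up, not any real API:

```python
class BookingAssistant:
    """Toy sketch: remember a user's past choices, reuse them as defaults,
    and always say out loud which assumptions were carried over."""

    def __init__(self):
        self.defaults = {}  # learned from previous bookings

    def remember(self, **choices):
        # Update learned defaults after each completed booking.
        self.defaults.update(choices)

    def propose(self, **overrides):
        # Explicit requests win; anything not mentioned falls back to defaults.
        choice = {**self.defaults, **overrides}
        assumed = [k for k in self.defaults if k not in overrides]
        note = ("Assuming " + ", ".join(f"{k}={choice[k]}" for k in assumed)
                if assumed else "")
        return choice, note


assistant = BookingAssistant()
assistant.remember(seat="aisle", cabin="economy")
choice, note = assistant.propose(destination="Berlin")
print(note)  # -> Assuming seat=aisle, cabin=economy
```

The key design point is the `note`: the user never has to restate preferences, but is always told which ones were silently applied, so a change of mind is a one-line correction rather than a full re-interrogation.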
nosianu 1 days ago [-]
It still has to tell you. Visually in a form it's much faster. Similar reason why many people prefer a blog post over a video.
Talking is not very efficient, and it's serial in fixed time. With something visual you can look at whatever you want whenever you want, at your own (irregular) pace.
You will also be able to make changes much faster. You can go to the target form element right away, and you get immediate feedback from the GUI (or from a physical control that you moved - e.g. in cars). If it's talk, you need to wait to have it said back to you - same reason as why important communication in flight control or military is always read back. Even humans misunderstand. You can't just talk-and-forget unless you accept errors.
You would need some true intelligence for just some brief spoken requests to work well enough. A (human) butler worked fine for such cases, but even then only the best made it into such high-level service positions, because it required real intelligence to know what your lord needed and wanted, and lots of time with them to gain that experience.
littlestymaar 1 days ago [-]
> It still has to tell you. Visually in a form it's much faster.
Who said it cannot be visual? It's still a “conversational” UI if it's a chatbot that writes down its answer.
> Similar reason why many people prefer a blog post over a video.
Well, I certainly do, but I also know that we are few and far between in that case. People in general prefer videos over blog posts by a very large margin.
> Talking is not very efficient, and it's serial in fixed time. With something visual you can look at whatever you want whenever you want, at your own (irregular) pace. You will also be able to make changes much faster. You can go to the target form element right away, and you get immediate feedback from the GUI.
Saying “I want to travel to Berlin next monday” is much faster than fighting with the website's custom datepicker which will block you until you select your return date until you realize you need to go back and toggle the “one way trip” button before clicking the calendar otherwise it's not working…
There's a reason why nerds love their terminal: GUIs are just very slow and annoying. They are useful for whatever new thing you're doing, because it's much more discoverable than CLI, but it's much less efficient.
> If it's talk, you need to wait to have it said back to you - same reason as why important communication in flight control or military is always read back. Even humans misunderstand. You can't just talk-and-forget unless you accept errors.
This is true, but stays true with a GUI, that's why you have those pesky confirmation pop-ups, because as annoying as they are when you know what you're doing, they are necessary to catch errors.
> You would need some true intelligence for just some brief spoken requests to work well enough.
I don't think so. IMO you just need something that emulates intelligence enough on that particular purpose. And we've seen that LLMs are pretty decent at emulating apparent intelligence so I wouldn't bet against them on that.
nosianu 13 hours ago [-]
> Who said it cannot be visual? It's still a “conversational” UI if it's a chatbot that writes down its answer.
You can't be serious??
Oh it's 1st of April, my apologies! I almost took it seriously. I should ignore this website on this day.
littlestymaar 13 hours ago [-]
I don't understand your complaint.
What's the difference between a blog post and a chatbot answer in terms of how “visual” things are?
fragmede 19 hours ago [-]
> Similar reason why many people prefer a blog post over a video.
I used to be a reading-the-blog-over-watching-the-video person, but for some things I've come to appreciate the video version. The reason you want the video of the whatever is that the blog post contains only what the author thought was important. But I'm not them. I don't know everything they know and I don't see everything they see. I can't do everything they do, but with the video I get everything. When you perform the whatever, the video has every detail, not just the ones the author thinks are important. That bit between step 1 and step 2 that's obvious? It's not obvious to everyone, or mine is broken in a slightly different way and I really need to see that bit between 1 and 2. Of course, videos get edited and cut so they don't always have that benefit, but I've grown to appreciate them.
UncleMeat 1 days ago [-]
The previous choice might not be what I want today.
Maybe I'm tired of layovers and I'm willing to pay more for a direct flight this time. Maybe I want a different selection at a restaurant because I'm in the mood for tacos rather than a burrito.
littlestymaar 1 days ago [-]
Just tell it then.
soco 10 hours ago [-]
And then we're back to square one: retelling the whole stack of choices every time, because nobody on the other side of the conversation, person or AI, can tell whether all my previous options are still valid. Because even I, the caller, might not remember what "defaults" I set in the previous call. So yeah, this argument in favor of conversational interfaces sounds at this point more like ideology than logic.
littlestymaar 9 hours ago [-]
> every time because nobody on the other side of the conversation, person or AI; can tell whether all my previous options are still valid.
But you can, so as long as the interlocutor tells you what assumptions it made, you can correct it if it doesn't match your current mood.
> So yeah, this argument in favor of conversational interfaces sounds at this point more like ideology than logic.
There's no ideology behind the fact that everyone rich enough to pay someone to deal with mundane stuff has someone doing it for them; it's just about convenience. Nobody fights with web UIs for fun. The only reason they became mainstream is that they're so much cheaper than having a real person do the work.
Same for Microsoft Word, by the way: many people used to have secretaries typing things for them, and it's been a massive regression in social status for the upper middle class to have to type things themselves. It only happened because it was cheaper (in appearance at least).
soco 7 hours ago [-]
Okay I think I finally get your point, and I even agree. The comparison with an executive assistant doesn't help much here, because the CEO interacts with only one person over all those delegatable activities, and the expectations are that person already knows all the defaults. That's what makes it smooth. This doesn't scale when you must deal with a different AI for each interaction. Will we get to a (scary maybe) point where Siri/Alexa/whoever can actually be that personal assistant? Maybe, but we're still far from it. So at least for today, the conversational interface is an extra burden. And tomorrow, we'll see.
Propelloni 1 days ago [-]
> I had the same thoughts on conversational interfaces [1]. Humane AI failed not only because of terrible execution, the whole assumption of voice being a superior interface (and trying to invent something beyond smartphones) was flawed.
Amen to that. I guess it would help to get off the IT high horse and have a talk with linguists and philosophers of language. They have been dealing with this shit for centuries now.
phyzix5761 2 days ago [-]
You're onto something. We've learned to make computers and electronic devices feel like extensions of ourselves. We move our bodies and they do what we expect. Having to switch now to using our voice breaks that connection. It's no longer an extension of ourselves but a thing we interact with.
namaria 1 days ago [-]
Two key things that make computers useful, specificity and exactitude, are thrown out of the window by interposing NLP between the person and the computer.
I don't get it at all.
TeMPOraL 1 days ago [-]
[imprecise thinking]
v <--- LLMs do this for you
[specific and exact commands]
v
[computers]
v
[specific and exact output]
v <--- LLMs do this for you
[contextualized output]
In many cases, you don't want or need that. In some, you do. Use right tool for the job, etc.
namaria 22 hours ago [-]
Despite feeling like a "let me draw it for you" answer is a tad condescending, I want to address something here.
This would be great if LLMs did not tend to output nonsense. Truly it would be grand. But they do. So it isn't. It's wasting resources hoping for a good outcome and risking frustration, misapprehensions, prompt injection attacks... It's non-deterministic algorithms hoping P=NP, except instead of branching at every decision you're doing search by tweaking vectors whose values you don't even know and whose influence on the outcome is impossible to foresee.
Sure, a VC subsidized LLM is a great way to make CVs in LaTeX (I do it all the time), translating text, maybe even generating some code if you know what you need and can describe it well. I will give you that. I even created a few - very mediocre - songs. Am I contradicting myself? I don't think I am, because I would love to live in a hotel if I only had to pay a tiny fraction of the cost. But I would still think that building hotels would be a horrible way to address the housing crisis in modern metropolises.
TeMPOraL 20 hours ago [-]
> Despite feeling like a "let me draw it for you" answer is a tad condescending, I want to address something here.
I didn't mean it to be condescending - though I can see how it can come across as such. FWIW, I opted for a diagram after I typed half a page worth of "normal" text and realized I'm still not able to elucidate my point - so I deleted it and drew something matching my message more closely.
> This would be great if LLMs did not tend to output nonsense. Truly it would be grand. But they do. So it isn't.
I find this critique to be tiring at this point - it's just as wrong as assuming LLMs work perfectly and all is fine. Both views are too definite, too binary. In reality, LLMs are just non-deterministic - that is, they have an error rate. How big it is, and how small it can get in practice for a given task - those are the important questions.
Pretty much every aspect of computing is only probabilistically correct - either because the algorithm is explicitly so (UUIDs and primality testing, for starters), or just because it runs on real hardware, and physics happen. Most people get away with pretending that our systems are either correct or not, but that's only possible because the error rate is low enough. But it's never that low by accident - it got pushed there by careful design at every level, hardware and software. LLMs are just another probabilistically correct system that, over time, we'll learn how to use in ways that gets the error rate low enough to stop worrying about it.
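The primality-testing example above is the textbook case of an error rate you can dial down at will. A minimal sketch of Miller-Rabin (the standard algorithm, not tied to any particular library): each extra round multiplies the worst-case error bound by at most 1/4, so a handful of rounds already pushes it below hardware-fault territory.

```python
import random

def is_probable_prime(n, rounds=40):
    """Miller-Rabin probabilistic primality test.

    Never wrong about primes; for composites, each round that passes
    has probability <= 1/4 of being fooled, so 40 rounds gives an
    error bound below 4**-40.
    """
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13):
        if n % p == 0:
            return n == p
    # Write n - 1 as d * 2**r with d odd.
    d, r = n - 1, 0
    while d % 2 == 0:
        d //= 2
        r += 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)  # modular exponentiation
        if x in (1, n - 1):
            continue
        for _ in range(r - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # 'a' witnesses that n is composite
    return True

print(is_probable_prime(2**61 - 1))  # -> True (a Mersenne prime)
```

The point of the sketch is the `rounds` parameter: correctness isn't a yes/no property of the system, it's a budget you spend until the residual error stops mattering - which is the argument being made about LLM error rates.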
How can we get there - now, that is an interesting challenge.
namaria 13 hours ago [-]
Natural language has a high entropy floor. It's a very noisy channel. This isn't anything like bit flipping or component failure. This is a whole different league. And we've been pouring outrageous amounts of resources into diminishing returns. OpenAI keeps touting AGI and burning cash. It's being pushed everywhere as a silver bullet, helping spin lay offs as a good thing.
LLMs are cool technology sure. There's a lot of cool things in the ML space. I love it.
But don't pretend like the context of this conversation isn't the current hype and that it isn't reaching absurd levels.
So yeah we're all tired. Tired of the hype, of pushing LLMs, agents, whatever, as some sort of silver bullet. Tired of the corporate smoke screen around it. NLP is still a hard problem, we're nowhere near solving it, and bolting it on everything is not a better idea now than it was before transformers and scaling laws.
On the other hand my security research business is booming and hey the rational thing for me to say is: by all means keep putting NLP everywhere.
fragmede 19 hours ago [-]
Using the word hotel has a lot of baggage, but having a large quantity of rooms for rent, for cheap, with a bathroom but no dedicated kitchen would be amazing for the housing crisis. If they were high quality and sound isolated, with high speed elevators, and communal spaces for residents, it could work. I'm not an architect though.
saratogacx 18 hours ago [-]
In the early 2000s there was a push for building apodments, which were a room, bathroom, and shared kitchen area. Some people liked them, but it isn't for everyone.
namaria 13 hours ago [-]
You're describing student housing and if you ever lived in one you'd know how bad of an idea you're musing with.
TeMPOraL 13 hours ago [-]
He's also describing hotels, and aparthostels, and officers' quarters on a ship and bunch of other stuff. The devil is in the details - specifically, how much it costs to rent per sqm, and what stops the price from going up to the point it forces multiple people to share the room? What stops the landlords from subdividing the rooms further and renting them out apiece? What stops already shoddy construction from getting even worse?
Those are the big challenges of housing. Not just how many units there are, but what they are, and how much the "how many" is plain cheating.
shakna 1 days ago [-]
I don't think they give a specific and exact output, considering how nondeterminism plays a role in most models.
TeMPOraL 1 days ago [-]
I'll need to work on the diagram to make it clearer next time.
What it's trying to communicate is, in general, a human operating a computer has to turn their imprecise thinking into "specific and exact commands", and subsequently, understand the "specific and exact output" in whatever terms they're thinking off, prioritizing and filtering out data based on situational context. LLMs enter the picture in two places:
1) In many situations, they can do the "imprecise thinking" -> "specific and exact commands" step for the user;
2) In many situations, they can do the "specific and exact output" -> contextualized output step for the user;
In such scenarios, LLMs are not replacing software, they're being slotted as intermediary between user and classical software, so the user can operate closer to what's natural for them, vs. translating between it and rigid computer language.
This is not applicable everywhere, but then, this is also not the only way LLMs are useful - it's just one broad class of scenarios in which they are.
furyofantares 1 days ago [-]
The diagram you're replying to agrees with this.
walthamstow 1 days ago [-]
Does it? The way I'm reading it, the first step is the LLM turning human imprecise thinking into specific and exact commands.
furyofantares 1 days ago [-]
That's true, but that is the input section of the diagram, not the output section where [specific and exact output] is labeled, so I believe there was legitimate confusion I was responding to.
To your point, which I think is separate but related, that IS a case where LLMs are good at producing specific and exact commands. The models + the right prompt are pretty reliable at tool calling by themselves, because you give them a list of specific and exact things they can do. And they can be fully specific and exact at inference time with constrained output (although you may still wish it called a different tool.)
shakna 1 days ago [-]
The tool may not even exist. LLMs are really terrible at admitting where the limits of the training are. They will imagine a tool into being. They will also claim the knowledge is within their realm, when it isn't.
furyofantares 1 days ago [-]
At inference time you can constrain output to a strict json schema that only includes valid tools.
shakna 1 days ago [-]
That would only be possible, if you could prevent hallucinations from ever occurring. Which you can't. Even if you supply a strict schema, the model will sometimes act outside of it - and infer the existence of "something similar".
furyofantares 1 days ago [-]
That's not true. You say the model will sometimes act outside of the schema, but models don't act at all, they don't hallucinate by themselves, they don't produce text at all, they do all of this in conjunction with your inference engine.
The model's output is a probability for every token. Constrained output is a feature of the inference engine. With a strict schema the inference engine can ignore every token that doesn't adhere to the schema and select the top token that does adhere to the schema.
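That masking idea can be shown with a toy example. This is heavily simplified - real inference engines constrain the grammar token by token during sampling, and the tool names below are invented for illustration:

```python
# Hypothetical tool schema: the only actions the engine will accept.
VALID_TOOLS = ["get_weather", "book_flight", "send_email"]

def constrained_pick(candidate_probs):
    """Select the highest-probability candidate that is in the schema.

    candidate_probs maps candidate tool names to model probabilities.
    Anything outside VALID_TOOLS is masked out, so a hallucinated tool
    name can never be emitted, no matter how likely the model rates it.
    """
    allowed = {t: p for t, p in candidate_probs.items() if t in VALID_TOOLS}
    if not allowed:
        raise ValueError("no schema-valid candidate had nonzero probability")
    return max(allowed, key=allowed.get)

# The model's top pick is a hallucinated tool, but masking forces a valid one.
probs = {"fetch_wether": 0.5, "get_weather": 0.3, "book_flight": 0.2}
print(constrained_pick(probs))  # -> get_weather
```

This is exactly the distinction being made in the thread: the constraint guarantees the output is *well-formed* (a real tool, a schema-valid JSON document), not that it is the tool you *wanted* called.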
xigoi 24 hours ago [-]
Just because the answer adheres to the schema does not mean that it’s correct.
furyofantares 23 hours ago [-]
Yes, we've been discussing "specific and exact" output. As I said, you might wish it called a different tool; nothing in this discussion is addressing that.
grbsh 1 days ago [-]
Why would you ever hire a human to perform some task for you in a company? They're known for having problems with ambiguity and precision in communication.
Humans require a lot of back and forth effort for "alignment" with regular "syncs" and "iterations" and "I'll get that to you by EOD". If you approach the potential of natural interfaces with expectations that frame them the same way as 2000s era software, you'll fail to be creative about new ways humans interact with these systems in the future.
brookst 1 days ago [-]
I also don’t like command-line interfaces for all things, but there are cases where they excel, or where they are necessary due to technical constraints. But when the man page for a simple command runs to 10 screens of options, I sometimes wonder.
johnnyanmac 1 days ago [-]
Yeah, it comes and goes in games for a reason. If it's not already some sort of social game, then the time to speak an answer is always slower than 3 button presses to select a pre-canned answer. Navigating a menu with Kinect voice commands will often be slower than a decent interface a user clicks through.
Voice interface only prevails in situations with hundreds of choices, and even then it's probably easier to use voice to filter down choices rather than select. But very few games have such scale to worry about (certainly no AAA game as of now).
d3vmax 2 days ago [-]
Agree. Not all systems require convo mode.
I personally find Chat/Convo/IVR type interface slow/tedious.
Keyboard/Mouse ftw.
However,
A CEO using Power BI with Convo can get more insights/graphs rather than slicing/dicing his data himself. They do have fixed metrics, but in case they want something not displayed.
rurp 23 hours ago [-]
An empirical example would be Amazon's utter failure at making voice shopping a thing with the Echo. There were always a number of obvious flaws with the idea. There's no way to compare purchase options, check reviews, view images, or just scan a bunch of info at once with your eyeballs at 100x the information bandwidth of a computer generated voice talking to you.
Even for straightforward purchases, how many people trust Amazon to find and pick the best deal for them? Even if Amazon started out being diligent and honest it would never last if voice ordering became popular. There's no way that company would pass up a wildly profitable opportunity to rip people off in an opaque way by selecting higher margin options.
steveBK123 1 days ago [-]
Yeah I mean - haven't we already been doing this a decade with home voice assistant speaker things and all found them to be underwhelming?
There's 1-5 things any individual finds them useful for (timers/lights/music/etc) and then.. that's it.
For 99.9% of what I use a computer for, it's far faster to type/click/touch my phone/tablet/computer.
ryandrake 1 days ago [-]
I think a lot of these "voice assistant" systems are envisioned and pushed by senior leadership in companies like SVPs and VPs. They're the ones who make the decision to invest in products like this. Why do they think these products make sense? Because they themselves have personal assistants and nannies and chauffeurs and private chefs, and voice is their primary interface to these people. It makes sense that people who spend all their time vocally telling others to do work, think that voice is a good interface for regular people to tell their computers to do work.
steveBK123 1 days ago [-]
That is actually a very interesting take I've not seen before and does make some sense.
If your work revolves about telling people what to do and asking questions, a voice assistant seems like a great idea (even if you yourself wouldn't have to stoop to using a robotic version since you have a real live human).
If your work actually involves doing things, then voice/conversational text interface quickly falls apart.
scott_w 1 days ago [-]
> you couldn't talk to your passenger while you were "driving" because that might make the car do something weird.
This even happens while walking my dog. If my wife messages me, my iPhone reads it out, and if at the same time I'm trying to cross a road, she'll get a garbled reply which is just me shouting random words at my dog to keep her under control.
guestbest 2 days ago [-]
If the driver could queue actions, it would make chat-interfaced driving easier, since the desired actions could be prepared and then triggered by a button press rather than needing a dedicated button built into the car at the factory by an engineer.
moffkalast 2 days ago [-]
Honestly that just says that the interface is too low level. Telling a car to drive you to some place and make it fast is how we interact with taxi drivers. It works fine as a concept, it just needs a higher level of abstraction that isn't there yet.
citrin_ru 2 days ago [-]
Selecting pick-up/drop-off points on a map is easier for me than explaining in words, and that's one of the appeals of Uber-like services.
TeMPOraL 1 days ago [-]
It's easier up until it's time to drop you off, and the selected dropoff point is suboptimal or plain impossible to stop at, and you want to give the car last-minute directions. Then the traditional, "human driver way" of looking out the window and telling them where to go based on what you see is far superior to trying to perspective-switch between the 3D situated view and an imprecise, finicky 2D map interface.
citrin_ru 1 days ago [-]
A perfect interface would be a combination of both ways. It also depends on personal preferences: I learned to use a paper map as a teenager and it's convenient for me, but I know some people struggle to find their way even using a map on a smartphone.
MatekCopatek 2 days ago [-]
This only works for tasks where the details of execution are not important. Driving fits that category well, but many other tasks we're throwing at AI don't.
pydry 1 days ago [-]
This rules out conversational UI for some tasks and applications but there are many where it will be useful and many where a hybrid would be best.
Even in a car, being able to control the windscreen wipers, radio, ask how much fuel is left are all tasks it would be useful to do conversationally.
There are some apps (I'm thinking of Jira as an example) where I'd like to do 90% of the usage conversationally.
la_oveja 1 days ago [-]
> Even in a car, being able to control the windscreen wipers, radio, ask how much fuel is left are all tasks it would be useful to do conversationally.
Are you REALLY sure you want that?
Checking how much fuel is left is a quick glance at the dash, and you can adjust the radio volume precisely without even looking.
'Turn up the volume', 'turn down the volume a little bit', 'a bit more', ...
And then a radio ad goes 'get yourself a 3-pack of the new magic wipers...' and the car's wipers go off.
I'd hate a conversational UI in my car.
notnullorvoid 1 days ago [-]
If the choice for controls is touchscreen vs conversational, conversational wins by a mile. However if physical buttons and dials are an option there's really no competing with that.
I wish car manufacturers stopped with the touchscreen bullshit, but it seems more likely that they'll try to offset the terrible experience with voice controls.
brookst 1 days ago [-]
It’s less common now that car controls have somewhat standardized, but I’m old enough that I remember when rental cars were a pain because it would start raining and you couldn’t find the windshield wipers.
Conversational interfaces are great for rarely used features or when the user doesn’t know how to do something. For repetitive, common tasks they’re terrible.
But nobody is using ChatGPT for repetitive tasks. In fact the whole LLM revolution seems to be about letting users accomplish tasks without having to learn how to do them. Which I know some people look down on, but it’s the literal definition of management (which, to be fair, some people also look down on).
ryandrake 1 days ago [-]
> It’s less common now that car controls have somewhat standardized, but I’m old enough that I remember when rental cars were a pain because it would start raining and you couldn’t find the windshield wipers.
This is a problem of standardization across manufacturers, not something inherent in physical controls. I never have a problem using the steering wheel in a rental car because they're all the same.
You'd have the same problem with voice interfaces: For some rental cars, turning on the wipers would be "Turn on the wipers". For others, you'd have to say "Activate the wipers." For others, "Enable the windshield wipers." There is no way manufacturers will be capable of standardizing on a single phrase.
PeterStuer 2 days ago [-]
Here's where the article goes wrong:
1. "Natural language is a data transfer mechanism"
2. "Data transfer mechanisms have two critical factors: speed and lossiness"
3. "Natural language has neither"
While a conversational interface does transfer information, its main qualities are what I always refer to as "blissful ignorance" and "intelligent interpretation".
Blissful ignorance allows the requester to state an objective while not being required to know, or even be right about, how to achieve it. It is the opposite of operational command. Do as I mean, not as I say.
"Intelligent Interpretation" allows the receiver the freedom to infer an intention in the communication rather than a command. It also allows for contextual interactions such as goal oriented partial clarification and elaboration.
The more capable of intelligent interpretation the request execution system is, the more appropriate a conversational interface will be.
Think of it as managing a team. If they are junior, inexperienced and not very bright, you will probably tend towards handholding, microtasking and micromanagement to get things done. If you have a team of senior, experienced and bright engineers, you can with a few words point out a desire, trust them to ask for information when there is relevant ambiguity, and expect a good outcome without having to detail-manage every minute of their days.
throwaway290 2 days ago [-]
> If you have a team of senior, experienced and bright engineers, you can with a few words point out a desire and, trust them to ask for information when there is relevant ambiguity, and expect a good outcome
It's such a fallacy. First thing an experienced and bright engineer will tell you is to leave the premises with your "few words about a desire" and not return without actual specs and requirements formalized in some way. If you do not understand what you want yourself, it means hours/days/weeks/months/literally years of back and forths and broken solutions and wasted time, because natural language is slow and lossy af (the article hits the nail on the head on this one).
Re "ask for information", my favorite example is when you say one thing if I ask you today and then you reply something else (maybe the opposite, it happened) if I ask you a week later because you forgot or just changed your mind. I bet a conversational interface will deal with this just fine /s
lolinder 1 days ago [-]
> First thing an experienced and bright engineer will tell you is to leave the premises with your "few words about a desire" and not return without actual specs and requirements formalized in some way.
No, that's what a junior engineer will do. The first thing that an experienced and bright senior engineer will do is think over the request and ask clarifying questions in pursuit of a more rigorous specification, then repeat back their understanding of the problem and their plan. If they're very bright they'll get the plan down in writing so we stay on the same page.
The primary job of a senior engineer is not to turn formal specifications into code, it's to turn vague business requests into formal specifications. They're senior enough to recognize that that's the actually difficult part of the work, the thing that keeps them employed.
indoordin0saur 1 days ago [-]
You're entirely right. The person you're responding to doesn't sound like a senior engineer so much as a grouchy old engineer who is burned out. Of course, you can get bad clients but expecting them to know exactly what specs they want every time is unreasonable in most situations, particularly if they don't have the technical knowledge of the systems you work in.
throwaway290 9 hours ago [-]
You are either immature as a software engineer and unfamiliar with how software work is done conceptually, or you are jaded and disgruntled from dysfunctional orgs that cannot come up with requirements. That is okay, but you should not try to be instructive to others on this matter.
I love product work and programming. As I wrote in this thread, I did it while freelancing, I do it now at dayjob. I am bored by just programming and want more control over the result. People come to me with "a few words about a desire" and I do come up with specifics and I get credit for it
But I am recognized as a product person, not just a programmer. And I know better than to make the mistake you make and pretend that every builder or structural engineer should be the architect of a building or an urban planner.
People like you are why we have managers come to an expert-level C++ dev, say, with "a few words about a desire" and expect them to decide what to build in the first place AND to build it, just to later tell them it was wrong. When there is no product person who determines the reqs, random people will make the programmer come up with requirements themselves and then later tell them it is not up to "requirements".
This lack of organization and requirement clarity is offensive to expert programmers and probably the reason most projects drag on forever and die.
throwaway290 1 days ago [-]
I used to think like you. My job is to ask questions etc. But after a couple decades I see if someone doesn't bother to even think about the idea enough to understand it himself beyond a few words he is not worth engaging with in this fashion. He doesn't really know what he wants. Today I ask a clarifying question he says one thing, next week he changes his mind or forgets and the result slowly becomes a mess
> The primary job of a senior engineer is not to turn formal specifications into code, it's to turn vague business requests into formal specifications.
Converting vibes and the external world into specific requirements is the product owner's job.
Do not mistake software engineers for product people. These are very different things. Sometimes both are done by the same person if the org doesn't have enough money. Many freelancers working with small businesses do both. I often do both at my day job. But this is a higher-level role, and if you are a senior engineer doing product stuff, I hope it is recognized and you get proportionate comp.
ryandrake 1 days ago [-]
> Do not mistake software engineers and product people. These are very different things. Sometimes these things are done by the same person if the org has not enough money.
I worked for one of the largest, richest tech companies in the world, and (at least in our org) they did not have a dedicated product owner role. They expected this skill from the senior/lead engineers on the teams. Any coder can churn out code and you can call them senior after a few years. But if you want to be considered actually senior, you need to know how to make a product, not just code. IMO if you are a developer and all you know how to do is turn a fully-formed spec/requirements doc into software, and push back on anything that is not fully-formed, you're never going to truly reach "Senior" level, wherever you are.
throwaway290 9 hours ago [-]
Money is not a cure for organizational dysfunction.
But as I said these roles can be done by one person, just remember they are different activities.
lolinder 1 days ago [-]
You and I are either talking about very different kinds of specifications or very different kinds of product people. The product people I'm familiar with are completely incapable of creating a specification that is sufficiently detailed to implement without a lot of back and forth. Not because they're not good at what they do, but because what they do does not include defining requirements in sufficient fidelity for an engineer to act on.
throwaway290 9 hours ago [-]
You should get to know better product people and if you successfully built a project as an engineer without a product person then hey you were one yourself
Hauthorn 1 days ago [-]
I think you work in different domains.
Expecting a good outcome is different from expecting to get exactly what you intended.
Formal specifications are useful in some lines of work and for some projects, less so for others.
Wicked problems would be one example where formal specs are impossible by definition.
johnnyanmac 1 days ago [-]
For games, you don't really need or desire formal specs. But it also can really show how sometimes a director has a low tolerance for interpretation despite their communication being very loose. This leads to situations where it feels like the director is shifting designs on a dime, which is a lose-lose situation for everyone involved.
If nothing else, formal specification is for CYA. You get what you ask for, and any deviation should go in the next task order or have been addressed beforehand.
throwaway290 8 hours ago [-]
> For games, you don't really need nor desire formal specs.
Whoah is this wrong. Maybe when you hear "formal specs" you have something specific in your mind...
Formal spec can mean almost literally anything better than natural language vibes in a "few words about a desire", which is what I replied to because I was triggered by it
throwaway290 1 days ago [-]
> Formal specifications are useful in some lines of work and for some projects, less so for others
There is always a formal specification; code is the final formal specification in the end. But converting vague vibes from natural language into a somewhat formalized description is a key ability you need for any really new, non-trivial project idea. Another human can't do it for you, and a conversational UI can't do it for you...
PeterStuer 1 days ago [-]
I do understand that in bad cases it can be very frustrating as an engineer to chase vague statements only to be told later 'nah, that was not what I meant'. This is especially true when the gap in both directions is very large or there is incompetence and/or even adversarial stances between the parties. Language and communication only work if both parties are willing to understand.
Unfortunately, if either is the case, "actual specs and requirements formalized", while sounding logical and possibly helpful, in my experience did very little to save any substantial project (and I've seen a lot). The common problem is that the business/client/manager is forced to sign off on formal documents far outside their domain of competence, or the engineers are straitjacketed into commitments that do not make sense, or have no idea of what is considered tacit knowledge in the domain and so can't contextualize the unstated. Those formalized documents then mostly become weaponized in mutually destructive CYA.
What I've also seen more than once is years of formalized specs and requirements work while nothing ever gets produced, and the project is aborted before even the first line of code hit test.
I've given this example before: when Covid lockdowns hit, there were digitization projects years in planning and budgeted for years of implementation that were hastily specced, coded and rolled out into production by a 3-person emergency team over a long weekend. Necessity apparently has a way of cutting through the BS like nothing else can.
You need both sides capable, willing and able to understand. If not, good luck mitigating, but you're probably doomed either way.
brookst 1 days ago [-]
I’m a PM and pride myself in specs that give the right level of detail, where “right” can vary hugely depending on context.
But I still get lazy with LLMs and fall into iteration the way bad PM/eng teams do. “Write a SQL query to look at users by gesture by month”. “Now make the time unit a parameter”. “Now pivot the features to columns”. “Now group features hierarchically”. “Now move the feature table to a WITH”.
My point and takeaway is that LLMs are endlessly patient and pretty quick to turn requirements around, so they lend themselves to exploration more than human teams do. Agile, I guess, to a degree that we don’t even aspire to in the human world because it would be very expensive and lead to fisticuffs.
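A minimal sketch of the kind of query this iteration might converge to, with the time unit hoisted into a parameter and the feature table moved into a WITH clause as described above. The table and column names (`events`, `user_id`, `feature`, `occurred_at`) and the feature list are invented for illustration, not anything from the original exchange.

```python
# Illustrative sketch only: one guess at where the back-and-forth
# with the LLM ends up. All schema names here are hypothetical.
def usage_query(time_unit: str = "month") -> str:
    """Build a usage-by-feature-by-period query; the time unit is a parameter."""
    if time_unit not in ("day", "week", "month"):
        raise ValueError(f"unsupported time unit: {time_unit}")
    return f"""
    WITH feature_events AS (  -- "now move the feature table to a WITH"
        SELECT user_id,
               feature,
               DATE_TRUNC('{time_unit}', occurred_at) AS period
        FROM events
    )
    SELECT period,  -- "now pivot the features to columns"
           COUNT(DISTINCT CASE WHEN feature = 'search' THEN user_id END) AS search_users,
           COUNT(DISTINCT CASE WHEN feature = 'export' THEN user_id END) AS export_users
    FROM feature_events
    GROUP BY period
    ORDER BY period
    """

print("DATE_TRUNC('week'" in usage_query("week"))  # → True
```

The point stands either way: each of those refinement steps is one short sentence to an LLM, versus a ticket and a wait with a human team.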
throwaway290 1 days ago [-]
> What I've also seen more than once is years of formalized specs and requirements work while nothing ever gets produced, and the project is aborted before even the first line of code hit test.
It just shows that no one really understood what they wanted. It is crazy to expect somebody to understand something better than you and it is hilarious to want a conversational UI to understand something better than you.
PeterStuer 1 days ago [-]
"It just shows that no one really understood what they wanted."
Then what was the literal room full of formal process and spec documents, meeting reports and formal agreements (nearly 100,000 pages) by the analysts on either side for? And how did those not 'solve' the understanding problem?
When I go to the garage to have my car serviced, I expect them to understand it way better than I do. When I go to a nice restaurant I expect the cooks to prepare me dishes that taste greater than me writing them out a step-by-step recipe for them to follow. If I hire a senior consultant in even my own domain, I expect them to not just know my niche, but bring tacit knowledge from having worked on these types of solutions across my industry.
Expecting somebody to understand something better than me is exactly the reason why I hire senior people in the first place.
throwaway290 1 days ago [-]
> Then what were the literally room full of formal process and spec documents, meeting reports and formal agreements (near 100.000 pages) by the analysts on either side for? And how did those not 'solve' the understanding problem?
Sure.
There are many possible factors (e.g. somebody had a shitty idea and a committee of people sabotaged it because they didn't want it to succeed, or it was good but committee interests/politics were against it, or it was generally a dysfunctional org), but it's irrelevant, so let's pretend people are good and it's the ideal case.
There was likely somebody who had a good idea originally. However somebody failed to communicate it. Somebody brought vague vibes to the table with N people and they ended up with N different ideas and could not agree on a specific.
It just reiterates the original problem that I described doesn't it?
discreteevent 1 days ago [-]
> it is hilarious to want a conversational UI to understand something better than you.
This is true. But what if you swap "conversational UI" with something actually intelligent like a developer. Then we see this kind of thing all the time: A user has tacit, unconscious knowledge of some domain. The developer keeps asking them questions in order to get a formal understanding of the domain. At the end the developer has a formal understanding and the user keeps their tacit understanding.
In theory we could do the same with an AI - If the AI was actually intelligent.
throwaway290 1 days ago [-]
You described an interaction not between a product owner and a software engineer but between a user and a product owner. A product person can also be a developer, it happens, but do not confuse the two roles, lest people think you're saying that a conversational UI can be a product owner.
The original example I replied to was where somebody had an idea and went with it to some engineering team or conversational interface.
"If the AI was actually intelligent" does a lot of work. To take a few words and make a detailed spec from it and ask the right questions, even humans can't do it for you.
First, because most probably you don't really understand it yourself, since you didn't think about it enough.
Second, somebody who can do it would need to really deeply understand and want the same things as you. But if a chatbot has abilities like "understand" and "want" (which is a special case of "feel"; another famous special case of "feel" is "suffer"), that is dangerous territory, because if it understands and feels and has no ability to refuse you or fulfill its own wishes, your "conversational interface" becomes a euphemism: you are using a slave.
johnnyanmac 1 days ago [-]
The US having this culture of blame and deflect doesn't help either. When you're more concerned about making sure you can't be held liable if X fails, you spend more time covering your tracks than developing the project. And that's how the bureaucracy creeps in.
An approach of shared responsibility in all respects (successes and failures) would accelerate past the inevitable shortcomings that occur and let all parties focus on recovering and delivering.
brookst 1 days ago [-]
How about a conversational UI to help you iterate and explore what you want rather than having to know it clearly and in detail before anyone writes any code?
throwaway290 8 hours ago [-]
Regarding iteration, as the article says natural language is just slow and lossy. If you are ok iterating more slowly and constantly explain and correct things then why not? I find it tedious
TeMPOraL 1 days ago [-]
Star Trek continues to be prescient. It not only introduced the conversational interface to the masses, it also nailed its proper uses in ways we're still (re)discovering now.
If you pay attention to how the voice interface is used in Star Trek (TNG and upwards), it's basically exactly what the article is saying - it complements manual inputs and works as a secondary channel. Nobody is trying to manually navigate the ship by voicing out specific control inputs, or in the midst of a battle, call out "computer, fire photon torpedoes" - that's what the consoles are for (and there are consoles everywhere). Voice interface is secondary - used for delegation, queries (that may be faster to say than type), casual location-independent use (lights, music; they didn't think of kitchen timers, though (then again, replicators)), brainstorming, etc.
Yes, this is a fictional show and the real reason for voice interactions was to make it a form of exposition, yadda yadda - but I'd like to think that all those people writing the script, testing it, acting and shooting it, were in perfect position to tell which voice interactions made sense and which didn't: they'd know what feels awkward or nonsensical when acting, or what comes off this way when watching it later.
ben_w 1 days ago [-]
I have similar thoughts on LCARS: the Doylist requirement for displays that are bold enough and large enough to feel meaningful even when viewed on a 1990s-era TV is also the requirement for real-life public information displays.
At first glance it feels like real life will not benefit from labelling 90% of the glowing rectangles with numbers as the show does, but second thoughts say spreadsheets and timetables.
blatantly 1 days ago [-]
I remember Picard barking out commands to make the ship do preprogrammed evasion or fight maneuvers too. This seems like another good use.
TeMPOraL 1 days ago [-]
Yeah, this and I think even weapons control, happened on the show. But the scenario for these cases is when the bridge is understaffed for episode-specific plot reasons, and the protagonist has to simultaneously operate systems usually handled by distinct stations. That's when you get an officer e.g. piloting the shuttle/runabout while barking out commands to manage power flow, or voice-ordering evasions while manually operating weapons, etc.
(Also worth noting is that "pre-programmed evasion patterns" are used in normal circumstances, too. "Evasive maneuver JohnDoe Alpha Three" works just as well when spoken to the helm officer as to a computer. I still don't know whether such preprogrammed maneuvers make sense in real-life setting, though.)
But specifically manoeuvres, rather than weapons systems? Today, I doubt it: the ships are too slow for human brains to be the limiting factor. But if we had an impulse drive and inertial dampers (in the Trek sense rather than "shock absorbers"), then manoeuvres would also necessarily be automated.
In the board game Star Fleet Battles (based on a mix of TOS, TAS, and WW2 naval warfare), one of the (far too many*) options is "Erratic Manoeuvres", for which the lore is a combination of sudden acceleration and unpredictable changes in course.
As we live in a universe where the speed of light appears to be a fundamental limit, if we had spaceships pointing lasers at each other and those ships could perform such erratic manoeuvres as compatible with the lore of the show about how fast they can move and accelerate, performing such manoeuvres manually would be effective when the ships are separated by light seconds. But if the combatants are separated by "only" 3000 km, then it has to be fully automated because human nerve impulses from your brain to your finger are not fast enough to be useful.
* The instructions are shaped like pseudocode for a moderately complex video game, but published 10-20 years before home computers were big enough for the rule book. So it has rules for boarding parties, and the Tholian web, and minefields, and that one time in the animated series where the Klingons had a stasis field generator…
jeremyjh 1 days ago [-]
There was an episode where Beverly Crusher was alone on the ship, and controlled everything just by talking to the computer. I wondered why there is a bridge, much less a bridge crew. But yes it makes sense to use higher bandwidth control systems when possible.
ben_w 1 days ago [-]
If that was the episode where the crew disappeared with nobody else but her noticing, it doesn't really count because she was trapped in a Negative Space Wedgie pocket dimension based on her own thoughts at the time she was trapped.
jeremyjh 1 days ago [-]
Yes, that was it. I think though that she had a good enough understanding of the ship's capabilities that her private world would have been realistic in that respect.
TeMPOraL 1 days ago [-]
Whatever understanding she had back then, "a lot has happened in the last 20 years" (30+ IRL) between then and that memorable ending of Picard S3 :).
johnnyanmac 1 days ago [-]
Star trek's crews overall are chosen in a way that seems to consider redundancies, as well as meshing as a team that can offer varying viewpoints.
It runs directly counter to the more capitalistic mindset of "why don't we do more with less?" When spending years navigating all kinds of unknown situations, you want as many options as possible available.
TeMPOraL 1 days ago [-]
Definitely plays well with the kind of scenarios the writers throw at them - you can pretty much expect any Starfleet officer, whether a commander or an ensign, to operate any system on the ship with at least some passing competence. There's no "I work in stellar cartography, I don't know which button fires torpedoes or how to turn on the bio-bed in sick bay" on a Starfleet ship, except when uttered as a joke (or with EMHs). Overkill in real life? Perhaps. But definitely reassuring.
Hell, if someone really didn't know, they could expect "Computer, turn on the bio-bed 3" to just work - circling us back to the topic of what NLP and voice interfaces are good for.
beefnugs 13 hours ago [-]
Actually, this is why Lower Decks is so neat: they hint that there are juniors who barely know anything. The one sticking point for me growing up was that you had to be some super smartypants to be anywhere near Starfleet.
cdrini 2 days ago [-]
Completely agree, voice UI is best as an augmentation of our current HCI patterns with keyboard/mouse. I think one of the reasons this is, is because our brains kind of have separate buffers for visual memory and aural memory (Baddeley's working memory model). Most computer use takes up the visual buffer, and our aural buffer has extra bandwidth. This also means we can do things aurally while still maintaining focus/attention on what we're doing visually, allowing a kind of multitasking.
One thing I will note is that I'm not sure I buy the example for voice UIs being inefficient. I've almost never said "Alexa what's the weather like in Toronto?". I just say "Alexa, weather". And that's much faster than taking my phone out and opening an app. I don't think we need to compress voice input. Language kind of auto-compresses, since we create new words for complex concepts when we find the need.
For example, in a book club we recently read "As Long as the Lemon Trees Grow". We almost immediately stopped referring to it by the full title and instead just called it "lemons", because we had to refer to it so much. E.g. "Did you finish lemons yet?" or "This book is almost as good as lemons!". The context let us shorten the reference. Similarly, the context of my location shortens the query to just "weather". I think this might be the way voice UIs can be made more efficient: in the same way human speech makes itself more efficient.
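Mechanically, this kind of context-driven compression is just a lookup from short utterances to fuller queries. A toy sketch, where the shortcut table, the context keys, and the example phrases are all invented for illustration:

```python
# Toy sketch of context-dependent "auto-compression" of voice commands:
# a short utterance expands into a full query using ambient context.
# Every name here is hypothetical, not any assistant's real API.
CONTEXT = {"location": "Toronto", "current_book": "As Long as the Lemon Trees Grow"}

SHORTCUTS = {
    "weather": lambda ctx: f"what's the weather like in {ctx['location']}?",
    "lemons": lambda ctx: ctx["current_book"],
}

def expand(utterance: str, ctx: dict) -> str:
    """Expand a shortened command using context; pass anything else through."""
    template = SHORTCUTS.get(utterance.lower().strip())
    return template(ctx) if template else utterance

print(expand("weather", CONTEXT))  # → what's the weather like in Toronto?
print(expand("lemons", CONTEXT))  # → As Long as the Lemon Trees Grow
```

The interesting part is that humans build and prune this table implicitly as a conversation goes on; a voice UI would have to learn it per user.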
incognito124 2 days ago [-]
> This also means we can do things aurally while still maintaining focus/attention on what we're doing visually, allowing a kind of multitasking.
Maybe you, but I most definitely cannot focus on different things aurally and visually. I never successfully listened to something in the background while doing something else. I can't even talk properly if I'm typing something on a computer.
cdrini 1 days ago [-]
Or to clarify, I don't think one can be in deep flow (e.g. programming) and simultaneously in deep flow having an aural conversation; we're human, we can't truly multitask. But I do think that if you're focusing on something using your computer, it's _less_ disruptive to e.g. say "Alexa, remind me in twenty minutes to take out the trash" than it is to stop what you're doing and put that in an app on your computer.
theshackleford 2 days ago [-]
Yup, we are all different. I require auditory stimulation to work at my peak.
I did horribly in school but once I was in an environment where I could have some kind of background audio/video playing I began to excel. It also helps me sleep of a night. It’s like the audio keeps the portion of me that would otherwise distract me occupied.
gblargg 2 days ago [-]
The multitasking is something I like about smart home speakers. I can be asking it to turn the lights on/off or check the temperature, while doing other things physically and not interrupting them, often while walking through the room. Even if voice commands are slower, they don't interrupt other processing nearly as much as having to visually devote attention and fine motor skills, and navigate to the right screen in an app to do what you want.
XorNot 2 days ago [-]
I feel like the people using Voice Attack or whatever in space sims zeroed in on this.
It's very useful being able to request auxiliary functions without losing your focus, and I think that would apply to, say, word editing as well - e.g. being able to say "insert a date here" rather than having to get into the menus to find it.
Conversely, latency would be a big issue.
pugio 2 days ago [-]
> The second thing we need to figure out is how we can compress voice input to make it faster to transmit. What’s the voice equivalent of a thumbs-up or a keyboard shortcut? Can I prompt Claude faster with simple sounds and whistles?
The number of times in the last few years I've wanted that level of "verbal hotkeys"... The latencies of many coding LLMs are still a little bit too high to allow for my ideal level of flow (though admittedly I haven't tried ones hosted on services like Groq), but I can clearly envision a time when I'm issuing tight commands to a coder model that's chatting with me and watching my program evolve on screen in real time.
On a somewhat related note to conversational interfaces, the other day I wanted to study some first aid stuff - used Gemini to read the whole textbook and generate Anki flash cards, then copied and pasted the flashcards directly into chat GPT voice mode and had it quiz me. That was probably the most miraculous experience of voice interface I've had in a long time - I could do chores while being constantly quizzed on what I wanted to learn, and anytime I had a question or comment I could just ask it to explain or expound on a term or tangent.
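The quiz loop in that workflow is simple enough to sketch. Here the voice layer is swapped for plain text I/O, and the card contents and the keyword-overlap grading rule are assumptions for illustration, not what Gemini or ChatGPT actually produced:

```python
# Sketch of the flashcard quiz loop described above, voice replaced
# with text. Cards and grading heuristic are made up for illustration.
CARDS = [
    ("What does ABC stand for in first aid?", "airway, breathing, circulation"),
    ("The recovery position is used for?", "unresponsive but breathing casualties"),
]

def grade(answer: str, expected: str) -> bool:
    """Very loose grading: right if the answer shares any keyword (len > 3)."""
    keywords = {w.strip(",.") for w in expected.lower().split() if len(w.strip(",.")) > 3}
    return any(w in answer.lower() for w in keywords)

def quiz(cards, answers):
    """Run through the deck against a list of canned answers; return the score."""
    score = sum(grade(a, expected) for (_, expected), a in zip(cards, answers))
    return score, len(cards)

score, total = quiz(CARDS, ["check airway and breathing", "no idea"])
print(f"{score}/{total}")  # → 1/2
```

A real voice version would replace the canned answers with speech-to-text and let the LLM do the grading and tangent-following, which is exactly the part that made the experience feel miraculous.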
WhyIsItAlwaysHN 2 days ago [-]
I worked like that for a year in uni because of RSI, and it's very easy to get voice strain if you use your voice for coding like that. Issuing many short commands is very tiring for the voice.
It's also hard to dictate code without a lot of these commands because it's very dense in information.
I hope something else will be the solution. Maybe LLMs being smart enough to guess the code from a very short description, followed by a set of corrections.
mplanchard 5 hours ago [-]
Would be nice to be able to do something like write a function signature and then just say “fill out this function,” with it having the implicit needed context, as though it had been pairing with you all along and is just taking the wheel for a second. Or when you’ve finished writing a function, “test this function with some happy path inputs.” I feel like I’d appreciate that kind of use, which could integrate decently into the flow state I get into when programming. The current suite of tools for me often feels too clunky, with the need to explicitly manage context and queries: it takes me out of my flow state and feels slower than just doing it myself.
szszrk 2 days ago [-]
Oh wow. That video is 12 years old. Early in the presentation Travis reveals he used Dragon back then.
Do you recall Swype keyboard for Android? The one that popularized swyping to write on touch screens? It had Dragon at some point.
IT WAS AMAZING.
Around 12-14 years ago (Android 2.3? Maybe 3?) I was able to easily dictate full long text messages and emails, in my native tongue, including punctuation and occasional slang or even word formation. I could dictate a decent long paragraph of text on the first try and not have to fix a single character.
It's 2025 and the closest I can find is a dictation app on my newest phone that uses online AI service, yet it's still not that great when it comes to punctuation and requires me to spit the whole paragraph at once, without taking a breath.
Is there anything equally effective for any of you nowadays? That actually works across the whole device?
davvid 14 hours ago [-]
> It's 2025 and the closest I can find is a dictation app on my newest phone that uses online AI service, yet it's still not that great [...]
> Is there anything equally effective for any of you nowadays?
I'm not affiliated in any way. You might be interested in the "Futo Keyboard" and voice input apps - they run completely offline and respect your privacy.
The source code is open and it does a good job at punctuation without you needing to prompt it by saying, "comma," or, "question mark," unlike other voice input apps such as Google's gboard.
I know and like Futo, very interesting project. Unfortunately multilang models are not great in my case. Still not bad for an offline tool, but far from "forget it's there, just use it" vibe I had with Dragon.
Funny thing is that I may have misconfigured something in Futo, because my typing corrections are phonetic :) so I type something in Polish and get an autocorrect in English composed of different letters, but a similar-sounding word.
Cthulhu_ 1 days ago [-]
It sounds like Dragon was never ambitious enough, and/or the phone manufacturers were too closed off to allow them entry into that market.
Microsoft did buy them a few years ago, though. Weird that it took so long.
android521 2 days ago [-]
>I admit that the title of this essay is a bit misleading (made you click though, didn’t it?). This isn’t really a case against conversational interfaces, it’s a case against zero-sum thinking.
No matter the intention or quality of the article, I do not like this kind of deceitful link-bait. It may be higher quality than pure link-bait, but nobody likes to be deceived.
indoordin0saur 1 days ago [-]
I did not find the article to be deceitful at all. He does make a case against overuse of conversational interfaces. The author is just humbly acknowledging his position is more nuanced than the title of article might suggest.
mpalmer 1 days ago [-]
"Humbly"? The author has full control over the title, and in addition to being bait, the title is not humble at all.
Not a case against, but the case against.
johnnyanmac 1 days ago [-]
I simply saw that as tongue in cheek about how the author wanted to use a more general core point. The lens of conversational interfaces makes a good case for that while keeping true to the idea.
You can argue against something but also not think it's 100% useless.
whatnow37373 1 days ago [-]
It's no wonder extraverted normie and managerial types that get through their day by talking think throwing words at a problem is the best thing since sliced bread.
They have problems like "compose an email that vaguely gives the impression I'm considering various options when I'm actually not" and for that, I suspect, the conversational workflow is quite good.
Anyone else that actually just does the stuff is viscerally aware of how sub-optimal it is to throw verbiage at a computer.
I guess it depends on what level of abstraction you're working at.
sevensor 1 days ago [-]
The best executives to work for are the ones who are able to be as precise at their level of abstraction as I am at mine. There’s a shared understanding at an intermediate level, and we can resolve misunderstandings quickly. And then there are the executives who think we should just feed our transducer data into an llm.
techpineapple 2 days ago [-]
There’s an interesting… paradox? Observation? That up until 20-30 years ago, humans were not computerized beings. I remember a thought leader at a company I worked at said that the future was wearable computing, a computer that disappears from your knowing and just integrates with your life. And that sounds great and human and has a very thought leadery sense of being forward thinking.
But I think it’s wrong? Ever since the invention of the television, we’ve been absolutely addicted to screens. Screens and remotes, and I think there’s something sort of anti-humanly human about it. Maybe we don’t want to be human? But people I think would generally much rather tap their thumb on the remote than talk to their tv, and a visual interface you hold in the palm of your hand is not going away any time soon.
neom 2 days ago [-]
I went through Waldorf education and although Rudolf Steiner is quite eccentric, one thing I think he was spot on about was regarding WHEN you introduce technology. He believed that introducing technology or mechanized thinking too early in childhood would hinder imaginative, emotional, and spiritual development. He emphasized that children should engage primarily with natural materials, imaginative play, storytelling, artistic activities, and movement, as opposed to being exposed prematurely to mechanical devices or highly structured thinking, I seem to recall he recommended this till the age of 6.
My parents did this with me, no screens till 6 (wasn't so hard as I grew up in the early 90s, but still, no TV). I notice too how much people love screens, that non-judgmental glow of mental stimulation, it's wonderful, however I do think it's easier to "switch off" when you spent the first period of your life fully tuned in to the natural world. I hope folks are able to do this for their kids, it seems it would be quite difficult with all the noise in the world. Given it was hard for mine during the era of CRT and 4 channels, I have empathy for parents of today.
soulofmischief 2 days ago [-]
I will counter this by saying that my time spent with screens before 6 was unimaginably critical for me.
If I hadn't had it, I would have been trapped by the racist, religiously zealous, backwoods mentality that gripped the rest of my family and the majority of the people I grew up with. I discovered video games at age 3 and it changed EVERYTHING. It completely opened my mind to abstract thought and, among other things, influenced me to teach myself to read at age 3. I was reading at a collegiate level by age five and discovered another passion, books. Again, this propelled me out of an extremely anti-intellectual upbringing.
I simply could not imagine where I would be without video games, visual arts or books. Screens are not the problem. Absent parenting is the problem. Not teaching children the power of these screens is the problem.
f1shy 2 days ago [-]
I second this motion. Technology is just a tool. It can be used wisely or not. Just forbidding it is not wise, in my opinion. You have to be careful to use it properly, of course.
Also, let me drop the thought here that Rudolf Steiner, like Montessori and the like, pronounced "this is good" and "this is bad" based on feeling or intuition. There were no extensive scientific studies behind it.
soulofmischief 1 days ago [-]
The funny thing is that I remember the exact moment I fell in love with computers at 4. My grandmother cleaned houses and was often very late to pick me up from Head Start. So I would spend hours waiting, unsupervised, in a room with a computer that had a giant note attached to the screen saying DO NOT TOUCH.
>:)
By 5, all I wanted was a computer. To me they represented an unending well of knowledge.
setr 2 days ago [-]
I’ve been theory crafting around video games for children on the opposing premise. I think fundamentally the divide is on the quality of content — most games have some value to extract, but many are designed to be played inefficiently, and require far more time investment than value extracted.
Eg Minecraft, Roblox, CoD, Fortnite, Dota/LoL, the various mobile games clearly have some kind of value (mechanical skill, hand-eye coordination, creative modes, 3D space navigation / translation / rotation, numeric optimization, social interaction, etc), but they’re also designed as massive timesinks mostly through creative mode or multiplayer.
Games like paper Mario, pikmin, star control 2, katamari damacy, lego titles, however are all children-playable but far more time efficient and importantly time-bounded for play. Even within timesink games there are higher quality options — you definitely get more, and faster, out of satisfactory / factorio than modded Minecraft. If you can push kids towards the higher quality, lower timesink games, I think it’s worth. Fail to do so and it’s definitely not.
The same applies to TV, movies, books, etc. Any medium of entertainment has horrendous timesinks to avoid, and if you can do so, avoiding the medium altogether is definitely a missed opportunity. Screens are only notable in that the degenerate cases are far more degenerate than anything that came before them.
neom 2 days ago [-]
Oh, his theory wasn't about video games though, they didn't exist in 1910, it was about the full breadth of human sensorial systems being used in the context of our neurology for a prolonged period of time during high neuroplasticity (0 to 6 was his theory). I haven't really played video games, so I don't know much about them personally.
setr 2 days ago [-]
No I get that; video games are just my medium of choice. The problem I was trying to get at is these arguments and perceptions usually stem from the degenerate cases, which only get worse the further in time you go, but I don’t think it’s really due to the technology itself. You have the same braindead systems appear in any medium of entertainment — there are definitely systems of total waste in sports, physical play (I’ve yet to encounter anything so degenerate as balltapping — and that shit spreads rapidly once it starts), literature, etc.
It can hardly be said that a studio ghibli flick stunted the imagination of children worldwide but I would definitely believe it if you suggested cocomelon rotted the brains directly out of their skulls
I think it’s also worth noting that kids have a shitload of time. They can engage in both technologies and physical play and other activities simultaneously; the problem occurs when singular or few activities overwhelmingly consume that time — which is why I claim the unbounded timesinks can be catastrophic — and what I think most people are worried about when they blanket-ban whole systems/mediums
theshackleford 1 days ago [-]
I owe my entire career and livelihood to a childhood spent with the unbounded timesinks that were the games available to me on Amiga and my PC.
I might be a touch different in that it was obvious where I was going, and the correct decision was made to embrace my interest in the glowing screen and yes, the video games. It was video games more than anything else from which all other interests spawned.
More often than not it probably ends badly though I suppose. Despite a lifetime spent in front of screens all my social abilities work, I have a wide friends circle, a partner, my job requires me to work well with a wide variety of individuals and demographics etc which I couldn’t do otherwise. I have noticed this is not the case with all who shared a similar background.
nine_k 2 days ago [-]
I don't see a contradiction. Watching passively in an expectation of a dopamine hit = bad. Playing actively with things that respond in various interesting ways = good, no matter if the things are material or virtual.
It is packed with pseudoscience, from which we still suffer today.
In Switzerland, we often get measles outbreaks thanks to his cult.
neom 2 days ago [-]
Well, how folks view the philosophy can be multifaceted, so I'll leave the pseudoscience and cult part aside. On the measles, Steiner was certainly skeptical of vaccination, but I think in Switzerland you have a cultural issue with vaccination. The Waldorf school I went to in Canada, everyone had a measles vaccine, but I do recall a Swiss student coming to our distinctively not Waldorf high school and there being a huge song and dance about their vaccination status, I think as a society generally...you've got some problems there?
ithkuil 2 days ago [-]
When societies get advanced enough that all the basic needs are covered, a new generation arises that thinks we can go back to a simpler past and ditch all those ugly gray industrial scientific technocratic globalistic etc. etc. (add more scary qualifiers) things that are perceived to be the reason why things are bad, and never concedes that these things play an important role in enabling the safe environment where those very thoughts can be entertained.
The hedonic treadmill is driving the world
f1shy 2 days ago [-]
It is pseudoscience, as they speak as if it were science (making categorical affirmations about what is better and worse for education), but there is no science behind it. The cult part is more controversial. But as long as people believe something that is not scientifically backed, for me at least, that is what I call religion.
lrem 2 days ago [-]
Playing computer games since an early age made me who I am. It required learning English a decade earlier than my peers. It pulled me into programming around start of primary school. I wouldn’t be a staff engineer in a western country without these two.
bsder 2 days ago [-]
> Screens and remotes, and I think there’s something sort of anti-humanly human about it.
When I was teaching, I used to force students using laptops to sit near the back of the room for exactly this reason. It's almost impossible for humans to ignore a flickering screen.
strogonoff 2 days ago [-]
Sensitivity to the stimuli behind the orienting impulse varies by individual, and I wish I were less sensitive on a daily basis.
These days screen brightness goes pretty high and it is unbelievable how many people seem to never use their screen (phone or laptop) on anything less than 100% brightness in any situation and are seemingly not bothered by flickering bright light or noise sources.
I am nostalgic about old laptops’ dim LCD screens that I saw a few times as a kid, they did not flicker much and had a narrow angle of view. I suspect they would even be fine in a darkened classroom.
Al-Khwarizmi 1 days ago [-]
The last few times I've bought a new monitor, I've gone through the process of adjusting brightness based on comparing a document on screen to a paper sheet. This invariably results into going from defaults of 50-70% to very low figures like 5-15%, and it's not that I work in dark places, my offices have reasonable light from outside. I would be extremely uncomfortable using default settings, for me they are absurdly bright.
King-Aaron 2 days ago [-]
A flickering screen is modern man's flickering campfire.
LoganDark 2 days ago [-]
Computers are tools, not people. They should be made easier to use as tools, not tried to be made people. I actually hate people, tools are much better.
alnwlsn 1 days ago [-]
Somebody showed me a text-to-CAD AI tool recently, and I can't help but feel that whoever made it doesn't understand that people who use CAD aren't trying to solve the problem of "make a model of a rubber duck" but something more like "make a custom angle bracket which mounts part number xxxyyyy". Sure, you can try to describe what you want in words, but there's a reason machine shops want drawings and not a 300 word poem like you're a 14th century monk. Much much easier to just draw a picture.
stevage 1 days ago [-]
Surely those text based tools exist for people who aren't CAD experts. I don't know CAD. But a tool that let me type in a description of a thing and then send it off to be 3D printed sounds pretty great to me.
DabeDotCom 1 days ago [-]
> It was like they were communicating telepathically.
>
> That is the type of relationship I want to have with my computer!
The problem is, "The Only Thing Worse Than Computers Making YOU Do Everything... Is When They Do Everything *FOR* You!"
"ad3} and "aP might not be "discoverable" vi commands, but they're fast and precise.
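For the uninitiated, my rough gloss of those two register commands (the exact scope of `d3}` depends on where the cursor sits):

```
"ad3}   into register a, delete from the cursor through the next three paragraphs
"aP     Put (paste) the contents of register a before the cursor
```

Two keystroke-dense commands, and you've moved three paragraphs anywhere in the file - try saying that to a chatbot faster.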
Plus, it's easier to teach a human to think like a computer than to teach a computer to think like a human — just like it's easier to teach a musician to act than to teach an actor how to play an instrument — but I admit, it's not as scalable; you can't teach everyone Fortran or C, so we end up looking for these Pareto Principle shortcuts: Javascript provides 20% of the functionality, and solves 80% of the problems.
But then people find Javascript too hard, so they ask ChatGPT/Bard/Gemini to write it for them. Another 20% solution — 20% of the original 20%, so 4% as featureful — but it solves 64% of the world's problems. (And it's on pace to consume 98% of the world's electricity, but I digress!)
PS: Mobile interfaces don't HAVE to suck for typing; I could FLY on my old Treo! But "modern" UI eschews functionality for "clean" brutalist minimalism. "Why make it easy to position your cursor when we spent all that money developing auto-conflict?" «sigh»
grbsh 1 days ago [-]
I think we can have the best of both worlds here. We want the precision and speed of vi commands, but we want the discoverability of GUI document editors. LLMs may be able to solve the discoverability problem. If the editor can be highly confident that you want to use a given command, for example, it can give you an intellisense-like completion option. I don't think we've cracked the code on how this UX should work yet though -- as evidenced by how many people find cursor/copilot autocompletion suggestions so frustrating.
The other great thing about this mode is that it can double as a teaching methodology. If I have a complicated interface that is not very discoverable, it may be hard to sell potential users on the time investment required to learn everything. Why would I want to invest hours into learning non-transferable knowledge when I'm not even sure I want to go with this option versus a competitor? It will be a far better experience if I can first vibe-use the product, and if it's right for me, I'll probably be incentivized to learn the inner workings of it as I try to do more and more.
Izkata 1 days ago [-]
> We want the precision and speed of using vi commands, but we want the discoverability of GUI document editors.
> The other great thing about this mode is that it can double as a teaching methodology.
gvim has menus and puts the commands in the menus as shortcuts. I learned from there vim has folding and how to use it.
benob 2 days ago [-]
To me natural language interfaces are like the mouse-driven menu vs terminal interpreter. They allow good discoverability in systems that we don't master at the cost of efficiency.
As always, good UI allows for using multiple modalities.
chthonicdaemon 2 days ago [-]
I feel like chat interfaces have terrible discoverability. You can ask for anything but you have no idea what the system can actually do. In the menu system the options were all spelled out - that's what discoverability means to me. If you spend enough time going through the menus and dialogs you will find all the options, and in a well-designed interface you might notice a function you didn't know about near the one you're using now.
What chat interfaces have over CLIs is good robustness. You can word your request in lots of different ways and get a useful answer.
InsideOutSanta 2 days ago [-]
Yes, this is exactly it. For things that I do rarely, I would love to have a working natural language interface because I know what I want to do, but I don't know how to do it. Even if there were more efficient ways to achieve my goal, since I do not know what they are, the inefficiencies of a natural language interface do not matter to me.
In this sense, natural language interfaces are more powerful search features rather than a replacement for other types of interfaces.
benrutter 2 days ago [-]
Yesyesyesyes! I do wish I could think of more examples supporting both well.
VSCode is probably the best I can think of, where keyboard shortcuts can get you up to a decent speed as an advanced user, but mouse clicks provide an easy intro for a new user.
For the most part, I see tools like NVim, which is super fast but not new-user friendly. Or iOS, which a toddler can navigate, but which doesn't afford many ways to speed up interactions like typing.
earcar 1 days ago [-]
Who's actually making the claim we should replace everything with natural language? Almost nobody serious. This article sets up a bit of a strawman while making excellent points.
What we're really seeing is specific applications where conversation makes sense, not a wholesale revolution. Natural language shines for complex, ambiguous tasks but is hilariously inefficient for things like opening doors or adjusting volume.
The real insight here is about choosing the right interface for the job. We don't need philosophical debates about "the future of computing" - we need pragmatic combinations of interfaces that work together seamlessly.
The butter-passing example is spot on, though. The telepathic anticipation between long-married couples is exactly what good software should aspire to. Not more conversation, but less need for it.
Where Julian absolutely nails it is the vision of AI as an augmentation layer rather than replacement. That's the realistic future - not some chat-only dystopia where we're verbally commanding our way through tasks that a simple button press would handle more efficiently.
The tech industry does have these pendulum swings where we overthink basic interaction models. Maybe we could spend less time theorizing about natural language as "the future" and more time just building tools that solve real problems with whatever interface makes the most sense.
mattmanser 1 days ago [-]
I don't think it's a straw man; there are lots of people who think it might, or who are under the vague impression that it might. Plenty of less technical people. Because they haven't thought it through.
The article is useful because it articulates arguments which many of us have intuited but are not necessarily able to explain ourselves.
nottorp 2 days ago [-]
> because after 50+ years of marriage he just sensed that she was about to ask for it. It was like they were communicating telepathically.
> That is the type of relationship I want to have with my computer!
He means automation of routine tasks? Took 50 years to reach that in the example.
What if you want to do something new? Will the thought guessing module in your computer even allow that?
chongli 2 days ago [-]
I don't know, but I feel like we already have the "telepathic grandfather interface." Or at least we try to have it. My iPhone is constantly guessing at things to suggest to me (I use the share button a lot in different apps) and it's wrong more often than not, forcing me to constantly hunt for things (to say nothing about autocorrect, which is constantly changing correct words that I'd previously typed into incorrect ones)! It doesn't even use a basic, sensible LRU eviction policy. It has some totally inscrutable method of determining what to suggest!
If we want an interface that actually lets us work near the speed of thought, it can't be anything that re-arranges options behind our back all the time. Imagine if you went into your kitchen to cook something and the contents of all your drawers and cupboards had been re-arranged without your knowledge! It would be a total nightmare!
We already knew decades ago that spatial interfaces [1] are superior to everything else when it comes to working quickly. You can walk into a familiar room and instinctively turn on a light by reaching for the switch without even looking. With a well-organized kitchen an experienced chef (or even a skilled home cook) can cook a very complicated dish very efficiently when they know where all of the utensils are so that they don't need to go hunting for everything.
Yet today it seems like all software is constantly trying to guess what we want and in the process ends up rearranging everything so that we never feel comfortable using our computers anymore. I REALLY miss using Mac OS 9 (and earlier). At some point I need to set up some vintage Macs to use it again, though its usefulness at browsing the web is rather limited these days (mostly due to protocol changes, but also due to JavaScript). It'd be really nice to have a modern browser running on a vintage Mac, though the limited RAM would be a serious problem.
> With a well-organized kitchen an experienced chef (or even a skilled home cook) can cook a very complicated dish very efficiently when they know where all of the utensils are so that they don't need to go hunting for everything.
Even I can make a breakfast without looking in my kitchen, because I know where all the needed stuff is :)
On another topic, it doesn't have to look well organized. My home office looks like a bomb exploded in it, but I know exactly where everything is.
> I REALLY miss using Mac OS 9 (and earlier).
I was late to the Mac party, about the Snow Leopard days. I definitely remember that back then OS X applications weren't allowed to steal focus from what I had in the foreground. These days every idiotic splash screen steals my typing.
albertsondev 2 days ago [-]
This right here is probably my single biggest complaint with modern computing. It's a phenomenon I've taken to calling, in daily life, "tools trying to be too damn smart for their own good". I detest it. I despise it. Many of the evils of the modern state of tech--algorithmic feeds, targeted advertising, outwardly user-hostile software that goes incredible lengths to kneecap your own ability to choose how to use it--so, so much of it boils down to tools, things that should be extensions of their users' wills, being designed to "think" they know better what the user wants to do than the users themselves. I do not want my software, designed more often than not by companies with adversarial ulterior motives, to attempt to decide for me what I meant to watch, to listen to, to type, to use, to do. It flies in the face of the function of a tool, it robs people of agency, and above all else it's frankly just plain annoying having to constantly correct and work around these assumptions made based on spherical users in frictionless vacuums and tuned for either the lowest common denominator or whatever most effectively boosts some handful of corporate metrics-cum-goals (usually both).
I want my computer to do what I tell it to, not what it (or rather, some bunch of brainworm-infested parasites on society locked in a boardroom) thinks I want to do.
I can make exceptions for safety-critical applications. I do not begrudge my computer for requiring additional confirmation to rm -rf root, or my phone for lowering my volume when I have it set stupidly loud, or my car for having overly-sensitive emergency stop or adaptive cruise functions. These cases also all, crucially, have manual overrides. I can add --no-preserve-root, crank my volume right back up, and turn off cruise control and control my speed with the pedals. Forced security updates I only begrudge for their tendency to serve as a justification or cover for shipping anti-features alongside. Autocorrecting the word "fuck" out of my vocabulary, auto-suggesting niche music out of my listening, and auto-burying posts from my friends who don't play the game out of my communications are not safety-critical.
Let computers be computers. Let them do what I ask of them. Let me make the effort of telling them what that is.
Is that so much to ask?
rimeice 2 days ago [-]
Individual UIs have been built for every product that has a UI with specific shortcuts and specific techniques you learn to use that tool. I don’t see why the same couldn’t apply for speech interfaces. The article does mention we haven’t figured out shortcuts like the thumbs up equivalent in speech yet but doesn’t explore that further. I can imagine specific words or combinations of words being used to control certain software that you have to learn. Eventually there would be some unification for common tasks.
Arainach 2 days ago [-]
Speaking is fundamentally slower than typing or using a mouse, and it is a catastrophically bad choice if you are not alone in a room.
fellerts 1 days ago [-]
> To put the writing and speaking speeds into perspective, we form thoughts at 1,000-3,000 words per minute. Natural language might be natural, but it’s a bottleneck.
Natural language is very lossy: forming a thought and conveying that through speech or text is often an exercise in frustration. So where does "we form thoughts at 1,000-3,000 words per minute" come from?
The author clearly had a point about the efficiency of thought vs. natural language, but his thought was lost in a layer of translation. Probably because thoughts don't map cleanly onto words: I may lack some prerequisite knowledge to grasp what the author is saying here, which pokes at the core of the issue: language is imperfect, so the statement "we form thoughts at 1,000-3,000 words per minute" makes no sense to me.
Meta-joking aside, is "we form thoughts at 1,000-3,000 words per minute" an established fact? It's oddly specific.
paulluuk 1 days ago [-]
I'm also curious about this -- I'm pretty sure that I think actual words at about the speed at which I can speak them. I can not speak 3000 words per minute.
I also have my doubts about the numbers put forward on reading, listening and speaking. When reading, again I can read words about as fast as I can speak words. When I'm reading, I am essentially speaking out the words but in my mind. Is that not how other people read?
fellerts 1 days ago [-]
It sounds like you have a strong inner monologue. Some people do, some don't. I don't subvocalize (no inner voice when reading). Words aren't involved when I think about stuff. I don't have an inner "voice" at all, only when I'm trying to communicate. Maybe I need to do more "translating" from thought to voice than you do?
This stuff is fascinating.
whatevertrevor 1 days ago [-]
Nope. Plenty people don't have an internal monologue, and even if they do it's not on all the time.
For me, when I need to think clearly about a specific/novel thing, a monologue helps, but I don't voice out thoughts like "I need a drink right now".
Also I read much faster than I speak, I have to slow down while reading fiction as a result.
macleginn 2 days ago [-]
I agree with some of the sentiments in the post, but I am somewhat surprised by the framing. Why make ‘a case’ against something that will clearly win or lose depending on adoption? Is the author suggesting that we should not be betting our money or resources on developing this? In that case we would need more details for particular use cases, I would say.
3l3ktr4 2 days ago [-]
I disagree with the author when they say something along the lines of “why don’t we use buttons instead of using these new assistive technology? Buttons are much faster, and I proved humans like fast.”
I think that’s false. Why, after 10 years of software development, haven't I learned Emacs? Because I’m lazy, and because I don’t think it’s the bottleneck of my work. My bottleneck might be creativity or knowledge, and conversational interfaces might be the best thing there is for those (in the absence of a knowledgeable and kind human, which the author also seems to agree with).
Anyway, I don’t know, I found the title a bit disconnected from the content and the conclusions a bit overlappingly confusing but this is a complicated question. In the end I agree that we want a mix of things, we want a couple of keyboard strokes and we want chats. But most of all we probably want direct brain interface! ;)
eviks 1 days ago [-]
> but we’ve never found a mobile equivalent for keyboard shortcuts. Guess why we still don’t have a truly mobile-first productivity app after almost 20 years since the introduction of the iPhone?
Has it even been tried? Is there an iPhone text editing app with fully customizable keyboard that allows for setting up modes/gestures/shortcuts, scriptable if necessary?
> A natural language prompt like “Hey Google, what’s the weather in San Francisco today?” just takes 10x longer than simply tapping the weather app on your homescreen.
That's not entirely fair, the natural language could just as well be side button + saying "Weather" with the same result, though you can make app availability even easier by just displaying weather results on the homescreen without tapping
These are both desktop equivalents using an actual desktop keyboard or a mini variant thereof
walterbell 1 days ago [-]
Why is Blackberry a desktop equivalent? It preceded iPhone by many years, with unique workflows that varied by model.
eviks 1 days ago [-]
Because it's literally a physical (i.e. desktop-style) keyboard, just smaller in size, while almost all current mobile interfaces are touch based. (Also, the question wasn't about uniqueness, but about the productivity levels of a desktop productivity app; think code editors with extensions, keyboard and mouse gesture customization.)
What did they have in their touch interfaces?
walterbell 1 days ago [-]
For most of their existence, Blackberry had no touch interface. One appeared in later versions as they tried to compete with Android and iPhone. One example of a "mobile keyboard" shortcut was long pressing a physical key to launch a specific function.
It might be hard to understand now, but Blackberry power users could be much more productive with email/texting than any phone that exists today. But they were special purpose 2-way radio (initially, pager) devices that lacked the flexibility of modern apps with full internet data access.
gatinsama 2 days ago [-]
It is a huge turnoff for me when futuristic series use conversational interfaces. It happened in the Expanse and was hard to watch. For anyone who likes to think, learn, and tinker with user interfaces (HCI in general), it's obviously a high-latency and noisy channel.
internet_points 2 days ago [-]
I actually found that quite reasonable. E.g. they were using it to sort and filter data, just like people today use llm's to write their R script and (avoid having to) figure out how to invoke gnuplot. I'm sure somewhere in that computer it's still invoking gnuplot under a century of vibe-coded moldy spaghetti code =P
I don't remember where else they used voice, they had a lot of other interface types they switched between. Tried searching for a clip and found this quote:
> The voice interface had been problematic from the start.
> The original owner was Chinese so, I turned the damn thing off.
So yes, quite realistic :-)
woile 1 days ago [-]
I think the expanse nails it quite well. I really like when they move the videos from one screen to another. Or when they interact with the ship, they use all kind of outputs, voice, screens, buttons. For planning together, they talk and the machine renders, but then they have screens or even bracelets to interact.
perlgeek 2 days ago [-]
> To put the writing and speaking speeds into perspective, we form thoughts at 1,000-3,000 words per minute. Natural language might be natural, but it’s a bottleneck.
We might form fleeting thoughts much faster than we can express them, but if we want to formulate thoughts clearly enough to express them to other people, I think we're close to the ~150 words per minute we can actually speak.
I recently listened to a Linguistics podcast (lingthusiasm, though I don't recall which episode) where they talked about the efficiency of different languages, and that in the end they all end up roughly the same, because it's really the thought processes that limit the amount of information you communicate, not the language production.
tgv 2 days ago [-]
There is no evidence for any of that. Thoughts can form relatively quickly, but there's no way we can keep that up. Thoughts seem to stick around for a while.
And thoughts develop over time. They're often not conceived complete. That has been shown with some clever experiments.
And language production also puts a limit on our communication channel. It is probably optimized to convert communication intent into motor actions. It surely takes its time. That is not a problem for the system, since motor actions are slow. Idk where "lingthusiasm" gets their ideas from, but there's psycholinguistic literature dating back to the 1920s that is often neglected by linguists.
vakkermans 1 days ago [-]
I appreciate the attempt at making sense of conversational interfaces, but I don't think natural language as a "data transfer mechanism" is a productive way of doing it.
Natural language isn't best described as data transfer. It's primarily a mechanism for collaboration and negotiation. A speech act isn't transferring data, it's an action with intent. Viewed as such the key metrics are not speed and loss, but successful coordination.
This is a case where a computer science stance isn't fruitful, and it's best to look through a linguistics lens.
nitwit005 2 days ago [-]
> I’m not entirely sure where this obsession with conversational interfaces comes from.
There's a very similar obsession with the idea that things should be visual instead of textual. We tend to end up back at text.
Personal suspicion for both is the media set a lot of people's expectations. They loudly talked to the computer in films like 2001 or Star Trek for drama reasons, and all the movie computers generally fancy visual interactions.
byschii 2 days ago [-]
(assuming privacy is handled correctly) i like the idea of my pc always having a side-channel for communication of "simpler" things.
I'm not sure how it could fit into my 2 modalities of work: (i) alone in complete focus / silence, (ii) in the office where there is already too much spoken communication between humans... maybe it's just a matter of getting used to it
incorrecthorse 1 days ago [-]
> we form thoughts at 1,000-3,000 words per minute
I would like to know what this measures exactly.
The reason I often prefer writing to talking is that writing gives me time to pause and think. In those cases the bottleneck is very clearly my thought process (which, at least consciously, doesn't appear to me as "words").
janpmz 2 days ago [-]
Speaking and pronouncing words feels like more effort and requires more attention than typing on my keyboard or moving the mouse.
Aardwolf 1 days ago [-]
I'd be ok with a conversational interface if I can use it to improve my non-conversational UI.
E.g. say I find the scrollbars somewhere way too thin and invisible and I want thick high contrast scrollbars, and nobody thought of implementing that? Ask the AI and it changes your desktop interface to do it immediately.
Peteragain 2 days ago [-]
"Like writing, my ChatGPT conversation is a thinking process – not an interaction that happens post-thought" - Brilliant! I have worked on computers and language for over 30 years and the ups and downs certainly make such a passion a CLA (career limiting activity). I am adding the citation to my bibtex file ..
var_cw 2 days ago [-]
A few thoughts on this, especially after working in voice AI for a couple of years:
1. > "What’s the voice equivalent of a thumbs-up or a keyboard shortcut?"
Current ASR systems are narrow: they just capture the transcript, with no higher level of intelligence. Even the best GPT voice models fail at this. Humans are highly receptive to non-verbal cues. All the uhms, ahs, even the pauses we take are where the nuance lies.
2. the hardware for voice AI is still not consumer ready
Interacting with a voice AI still doesn't feel private. I'm only able to do a voice-based interaction when I'm in my car; sadly, at other places it just feels like a privacy breach, as it's acoustically public. I have been thinking about private microphones to enable more AI-based conversations.
The author seems to ignore the main case for conversational interfaces - which is not to replace the software, but the software user.
Not telling your car to turn left or right, but telling your cab driver you're going to the airport.
This is our usecase at our startup[1] - we want to enable tiny SMBs who didn't have the budget to hire a "video guy", to get an experience similar to having one. And that's why we're switching to a conversational UX (because those users would normally communicate with the "video guy" or girl by sending them a Whatsapp message, not by clicking buttons on the video software)
> “This is it! The next computing paradigm is here! We’ll only use natural language going forward!”
Is anyone actually making any argument like that? The whole piece feels like a giant strawman.
notarobot123 2 days ago [-]
What if apps published a declarative interface for context specific commands? Conversational interfaces would glue together spoken instructions with sensible matches from the set of available contextual interfaces.
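A rough sketch of what such a declarative interface might look like (everything here — the `Command` shape, the manifest, and the word-overlap heuristic — is invented for illustration; a real system would use proper intent classification rather than word overlap):

```python
# Hypothetical sketch: an app publishes a declarative manifest of
# context-specific commands; a conversational layer matches spoken
# instructions against the commands valid in the current UI context.

from dataclasses import dataclass

@dataclass
class Command:
    name: str            # stable identifier the app executes
    phrases: list        # example utterances that should trigger it
    context: str         # UI context in which the command is valid

MANIFEST = [
    Command("compose_mail", ["write an email", "new message"], "inbox"),
    Command("archive_mail", ["archive this", "put it away"], "reading"),
    Command("reply_mail", ["reply", "answer this"], "reading"),
]

def match_utterance(utterance: str, active_context: str):
    """Return names of commands valid in the current context whose
    example phrases share words with the utterance (a stand-in for
    real intent matching, e.g. embedding similarity)."""
    words = set(utterance.lower().split())
    candidates = [c for c in MANIFEST if c.context == active_context]
    scored = [(len(words & set(" ".join(c.phrases).lower().split())), c)
              for c in candidates]
    scored = [(s, c) for s, c in scored if s > 0]
    return [c.name for s, c in sorted(scored, key=lambda x: -x[0])]

print(match_utterance("please archive this one", "reading"))
# → ['archive_mail', 'reply_mail']
```

The appeal of the declarative approach is that the conversational layer never needs app-specific glue code: any app that publishes a manifest becomes voice-addressable, and commands disappear from consideration the moment their context goes away.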
willtemperley 2 days ago [-]
2001: A Space Odyssey was the original case against conversational interfaces.
woile 1 days ago [-]
I was recently at a cafe using the computer in Argentina, and I was thinking it would be impossible to use a voice interface here. Everyone chatting so loud I could barely hear my own thoughts.
spprashant 1 days ago [-]
Anyone know how to figure out the web stack for this blog? It's elegant, minimal, and has enough support for some rich elements which add to the experience.
heisenbit 1 days ago [-]
The way I think is in-band vs. out-of-band control. The former is initially convenient but can blow up in surprising ways and remains a source of security issues.
break_the_bank 2 days ago [-]
shameless plug here but we have been building in a similar space. We call it tabtabtab.ai - https://tabtabtab.ai/
The core loop is promptless ai that’s guided by accessibility x screenshots & it’s everywhere on your Mac.
You can snap this comment section or the front page and we’ll structure it for you if it’s a spreadsheet or write a tweet if you’re on Twitter.
meowface 2 days ago [-]
I can't zoom in to your website on my phone without an email subscription prompt blocking the screen that I can't easily close, and each new zoom in or out repeats it.
Also, unless I'm missing something, the app is called TabTabTab while its only feature is copy & paste? Tabbing doesn't seem to be mentioned at all. I'm guessing tabbing is involved but there doesn't seem to be a word about it except from users referencing it in the reviews. It seems to only bill itself as "magic copy-paste".
novaRom 2 days ago [-]
> AI needs to work at the OS level
Absolutely agree. An agent running in the background.
levmiseri 1 days ago [-]
WPM and other attempts at putting one specific number/metric to point to are imo only muddying the waters. A better way to think about just how awfully slow natural language (on average) is as an interface is to think about interactions with {whatever} in terms of *intents* and *actions*.
Comparing "What's the weather in London" with clicking the weather app icon is misleading and too simplistic. When people imagine a future driven by conversational interfaces, they usually picture use cases like:
1. "When is my next train leaving?"
2. "Show me my photos from the vacation in Italy with yellow flowers on them"
3. "Book a flight from New York to Zurich on {dates}"
...
And a way to highlight what's faster/less-noisy is to compare how natural language vs. mouse/touch maps onto the Intent -> Action. The thing is that interactions like these are generally so much more complex. E.g. Does the machine know what 'my' train is? If it doesn't, can it offer reasonable disambiguation? If it can't, what then? And does it present the information in a way where the next likely action is reachable, or will I need to converse about it?
You could picture a long table listing similar use cases in different contexts and compare various input methods and modalities and their speed. Flicking a finger on a 2d surface or using a mouse and a keyboard is going to be — on average — much faster and with less dead-ends.
Conversational interfaces are not the future. Imo even in the sense of 'augmenting', it's not going to happen. A natural-language-driven interface will always play a supporting (still important, though!) role: an accessibility aid for when you are e.g. temporarily, permanently, or contextually not able to use the primary input method to 'encode your intent'.
m463 2 days ago [-]
"the case against"
You know, doesn't matter what you say. If businesses want something, they'll do it to you whether it's the best interface or not.
Amazon forces "the rabble" into their chatbot customer service system, and hides access to people.
People get touchscreens in their car and fumble to turn on their fog lights or defrost in bad weather. They get voice assistant phone trees and angrily yell "operator and agent".
I really wish there were true competition that would let people choose what works for them.
paulsutter 1 days ago [-]
One-way voice is the right answer. Keep the UI, even the mouse and keyboard. But let me speak requests instead of typing literally everything
matsemann 2 days ago [-]
Not exactly the same case as the article, but just a few minutes ago I booked a time for vaccinations online, and it was done through a chat interface. Screenshot: https://imgur.com/a/OWv7deF
Just infuriating. Instead of a normal date- and timepicker where I could see available slots, it's a chat where you have to click certain options. Then I had to reply "Ja" (yes) when it asked me if I had clicked the correct date. And then when none of the times of the day suited me, I couldn't just click a new date on the previous message, I instead have to press "vis datovelger på nytt"/show datepicker again, and get a new chat message where I this time select a different date and answer "Ja" again to see the available time slots. It's slow and useless. The title bar of the page says "Microsoft Copilot Studio", some fancy tech instead of a simple form..
immibis 1 days ago [-]
That sounds like a great interface if your goal is to take power away from the user. We could even show an ad every third prompt.
anthk 2 days ago [-]
Get a proper physical keyboard to write, and stop using smartphones as typing devices.
eviks 1 days ago [-]
Do you know of a foldable model that can be used while walking?
2 days ago [-]
randomfool 2 days ago [-]
And yet here we are, discussing this in a threaded conversation.
graemep 2 days ago [-]
It is a discussion with other human beings. Very different.
resurrected 1 days ago [-]
[dead]
mjfl 2 days ago [-]
[dead]
wewewedxfgdf 1 days ago [-]
There have been a few of these posts on HN recently - people who claim that AI/LLMs are just some sort of passing fad, or of no value, or of less value than people are saying anyway.
People who write these posts want to elevate their self value by nay-saying what is popular. I don't understand the psychology but it seems like that sort of pattern to me.
It takes a deliberate blindness to say that AI/LLMs are just some sort of thing that has popped up every few years and this is the same as them and it will fade away. Why would someone choose to be so blind and dismissive of something obviously fundamentally world changing? Again - it's the instinct to knock down the tall poppy and therefore prove that you have some sort of strength/value.
gertrunde 1 days ago [-]
I have to suspect that your post is based on assumptions from the (somewhat misleading) post title, rather than from reading the article content.
The following is a direct quote from the article:
"None of this is to say that LLMs aren’t great. I love LLMs. I use them all the time. In fact, I wrote this very essay with the help of an LLM."
> Theoretically, saying, “order an Uber to airport” seems like the easiest way to accomplish the task. But is it? What kind of Uber? UberXL, UberGo? There’s a 1.5x surge pricing. Acceptable? Is the pickup point correct? What would be easier, resolving each of those queries through a computer asking questions, or taking a quick look yourself on the app?
> Another example is food ordering. What would you prefer, going through the menu from tens of restaurants yourself or constantly nudging the AI for the desired option? Technological improvement can only help so much here since users themselves don’t clearly know what they want.
[1]: https://shubhamjain.co/2024/04/16/voice-is-bad-ui/
How many of these inconveniences will you put up with? Any of them, all of them? What price difference makes it worthwhile? What if by traveling a day earlier you save enough money to even pay for a hotel...?
All of that is for just 1 flight; what if there are several alternatives? I can't imagine having a dialogue about this with a computer.
Similarly, long before Waymo, you'd get into a taxi, and tell the human driver you're going to the airport, and they'd take you there. In fact, they'd get annoyed at you if you backseat drove, telling them how to use the blinker and how hard to brake and accelerate.
The thing about conversational interfaces is that we're used to them, because we (well, some of us) interface with other humans fairly regularly, and so it's a fairly baseline level skill to have to exist in the world today. There's a case to be made against them, but since everyone can be assumed to be conversational (though perhaps not in a given language), it's here to stay. Restaurants have menus that customers look at before using the conversation interface to get food, in order to guide the discussion, and that's had thousands of years to evolve, so it might be a local maxima, but it's a pretty good one.
The whole point is that we currently have better, more efficient ways of doing those things, so why would we regress to inferior methods?
To relate to the article - google flights is the Keyboard and Mouse - covering 80% of cases very quickly. Conversational is better for when you're juggling more contextual info than what can be represented in a price/departure time/flight duration table. For example, "i'm bringing a small child with me and have an appointment the day before and I really hate the rain".
Rushed comment because I'm working, but I hope you get the gist.
Current flight planning UX is overfit on the 80% and will never cater to the 20% because cost/benefit of the development work isn't good
How long is it going to take you to get to a device, load the app/webpage, tell it which airport you're flying from and going to and what date and then you start looking at options. You've blown way past the 10 seconds it took for that executive to get a plane flight.
Better is in the eye of the beholder. What's monetarily efficient isn't going to be temporaly efficient, and that's true along a lot of other dimensions too.
Point is, there are some people that like having conversations, you may not be one of them. you don't have to be. I'm not taking away your mouse and keyboard. I have those too and won't give them up either. But I also find talking out loud helps my thinking process though I know that's not everybody.
The booking experience today is granular to help you find a suitable flight to meet all the preferences you’re compiling into an optimal scenario. The experience of AI booking in the future will likely be similar: find that optimal scenario for you once you’re able to articulate your preferences and remember them over time.
Anecdata: last year my wife and I went on a rail tour through Eastern Europe and god, I wish we had chosen to spend a few hundred euros on a travel agency in retrospect - I can't count just how much time we had to spend researching on what kind of rail, bus and public transit tickets you need on which leg, how to create accounts, set up payment and godknowswhat else. Easily took us two days worth of work and about two dozens individual payment transactions. A professional travel agency can do all the booking via Sabre, Amadeus or whatever...
I guess there's just no substitute for someone actually doing the work of figuring out the most appropriate HMI for a given task or situation, be it voice controls, touch screens, physical buttons or something else.
Knowing what you want is, sadly, computationally irreducible.
Of course a conversational interface is useless if it tries to just do the same thing as a web UI, which is why it failed a decade ago when it was trendy, because the tech was nowhere clever enough to make that useful. But today, I'd bet the other way round.
That's why the “advanced search” is almost always hidden somewhere. And that's also why you can never find the filter you need on an e-shopping website.
Such dialog is probably nice for first time user, it is a nightmare for repeated user.
Then it can assume your choices haven't changed, and propose a solution that matches your previous choices. And to give the user control, it just needs to explicitly tell the user about the assumptions it made.
In fact, a smart enough system could even see when violating the assumptions could lead to a substantial gain and try convincing the user that it may be a good option this time.
Talking is not very efficient, and it's serial in fixed time. With something visual you can look at whatever you want whenever you want, at your own (irregular) pace.
You will also be able to make changes much faster. You can go to the target form element right away, and you get immediate feedback from the GUI (or from a physical control that you moved - e.g. in cars). If it's talk, you need to wait to have it said back to you - same reason as why important communication in flight control or military is always read back. Even humans misunderstand. You can't just talk-and-forget unless you accept errors.
You would need some true intelligence for just some brief spoken requests to work well enough. A (human) butler worked fine for such cases, but even then only the best made it into such high-level service positions, because it required real intelligence to know what your lord needed and wanted, and lots of time with them to gain that experience.
Who said it cannot be visual? It's still a “conversational” UI if it's a chatbot that writes down its answer.
> Similar reason why many people prefer a blog post over a video.
Well I certainly do, but I also know that we are few and far between in that case. People in general prefer videos over blog posts by a very large margin.
> Talking is not very efficient, and it's serial in fixed time. With something visual you can look at whatever you want whenever you want, at your own (irregular) pace. You will also be able to make changes much faster. You can go to the target form element right away, and you get immediate feedback from the GUI.
Saying “I want to travel to Berlin next Monday” is much faster than fighting with the website's custom datepicker, which blocks you until you select a return date, at least until you realize you need to go back and toggle the “one way trip” button before clicking the calendar, otherwise it's not working…
There's a reason why nerds love their terminal: GUIs are just very slow and annoying. They are useful for whatever new thing you're doing, because it's much more discoverable than CLI, but it's much less efficient.
> If it's talk, you need to wait to have it said back to you - same reason as why important communication in flight control or military is always read back. Even humans misunderstand. You can't just talk-and-forget unless you accept errors.
This is true, but stays true with a GUI, that's why you have those pesky confirmation pop-ups, because as annoying as they are when you know what you're doing, they are necessary to catch errors.
> You would need some true intelligence for just some brief spoken requests to work well enough.
I don't think so. IMO you just need something that emulates intelligence enough on that particular purpose. And we've seen that LLMs are pretty decent at emulating apparent intelligence so I wouldn't bet against them on that.
You can't be serious??
Oh it's 1st of April, my apologies! I almost took it seriously. I should ignore this website on this day.
What's the difference between a blog post and a chatbot answer in terms of how “visual” things are?
I used to be a reading-the-blog-over-watching-the-video person, but for some things I’ve come to appreciate the video version. The reason you want the video of the whatever is that in the blog post, what’s written down is only what the author thought was important. But I’m not them. I don’t know everything they know and I don’t see everything they see. I can’t do everything they do, but with the video I get everything. When they perform the whatever, the video has every detail, not just the ones they think are important. That bit between step 1 and step 2 that’s obvious? It’s not obvious to everyone, or mine is broken in a slightly different way such that I really need to see that bit between 1 and 2. Of course, videos get edited and cut so they don’t always have that benefit, but I’ve grown to appreciate them.
Maybe I'm tired of layovers and I'm willing to pay more for a direct flight this time. Maybe I want a different selection at a restaurant because I'm in the mood for tacos rather than a burrito.
But you can, so as long as the interlocutor tells you what assumptions it made, you can correct it if it doesn't match your current mood.
> So yeah, this argument in favor of conversational interfaces sounds at this point more like ideology than logic.
There's no ideology behind the fact that every people rich enough to afford paying someone to deal with mundane stuff will have someone doing it for them, it's just about convenience. Nobody likes to fight with web UIs for fun, the only reason why it has become mainstream is because it's so much cheaper than having a real person working.
Same for Microsoft Word by the way, many people used to have secretaries typing stuff for them, and it's been a massive regression of social status for the upper middle class to have to type things by themselves, it only happened because it was cheaper (in appearance at least).
Amen to that. I guess it would help to get off the IT high horse and have a talk with linguists and philosophers of language. They have been dealing with this shit for centuries now.
I don't get it at all.
This would be great if LLMs did not tend to output nonsense. Truly it would be grand. But they do. So it isn't. It's wasting resources hoping for a good outcome and risking frustration, misapprehensions, prompt injection attacks... It's non-deterministic algorithms hoping P=NP, except instead of branching at every decision you're doing search by tweaking vectors whose values you don't even know and whose influence on the outcome is impossible to foresee.
Sure, a VC subsidized LLM is a great way to make CVs in LaTeX (I do it all the time), translating text, maybe even generating some code if you know what you need and can describe it well. I will give you that. I even created a few - very mediocre - songs. Am I contradicting myself? I don't think I am, because I would love to live in a hotel if I only had to pay a tiny fraction of the cost. But I would still think that building hotels would be a horrible way to address the housing crisis in modern metropolises.
I didn't mean it to be condescending - though I can see how it can come across as such. FWIW, I opted for a diagram after I typed half a page worth of "normal" text and realized I'm still not able to elucidate my point - so I deleted it and drew something matching my message more closely.
> This would be great if LLMs did not tend to output nonsense. Truly it would be grand. But they do. So it isn't.
I find this critique to be tiring at this point - it's just as wrong as assuming LLMs work perfectly and all is fine. Both views are too definite, too binary. In reality, LLMs are just non-deterministic - that is, they have an error rate. How big it is, and how small it can get in practice for a given task - those are the important questions.
Pretty much every aspect of computing is only probabilistically correct - either because the algorithm is explicitly so (UUIDs and primality testing, for starters), or just because it runs on real hardware, and physics happen. Most people get away with pretending that our systems are either correct or not, but that's only possible because the error rate is low enough. But it's never that low by accident - it got pushed there by careful design at every level, hardware and software. LLMs are just another probabilistically correct system that, over time, we'll learn how to use in ways that gets the error rate low enough to stop worrying about it.
How can we get there - now, that is an interesting challenge.
LLMs are cool technology sure. There's a lot of cool things in the ML space. I love it.
But don't pretend like the context of this conversation isn't the current hype and that it isn't reaching absurd levels.
So yeah we're all tired. Tired of the hype, of pushing LLMs, agents, whatever, as some sort of silver bullet. Tired of the corporate smoke screen around it. NLP is still a hard problem, we're nowhere near solving it, and bolting it on everything is not a better idea now than it was before transformers and scaling laws.
On the other hand my security research business is booming and hey the rational thing for me to say is: by all means keep putting NLP everywhere.
Those are the big challenges of housing. Not just how many units there are, but what they are, and how much the "how many" is plain cheating.
What it's trying to communicate is, in general, a human operating a computer has to turn their imprecise thinking into "specific and exact commands", and subsequently, understand the "specific and exact output" in whatever terms they're thinking off, prioritizing and filtering out data based on situational context. LLMs enter the picture in two places:
1) In many situations, they can do the "imprecise thinking" -> "specific and exact commands" step for the user;
2) In many situations, they can do the "specific and exact output" -> contextualized output step for the user;
In such scenarios, LLMs are not replacing software, they're being slotted as intermediary between user and classical software, so the user can operate closer to what's natural for them, vs. translating between it and rigid computer language.
This is not applicable everywhere, but then, this is also not the only way LLMs are useful - it's just one broad class of scenarios in which they are.
To your point, which I think is separate but related, that IS a case where LLMs are good at producing specific and exact commands. The models + the right prompt are pretty reliable at tool calling by themselves, because you give them a list of specific and exact things they can do. And they can be fully specific and exact at inference time with constrained output (although you may still wish it called a different tool.)
The model's output is a probability for every token. Constrained output is a feature of the inference engine. With a strict schema the inference engine can ignore every token that doesn't adhere to the schema and select the top token that does adhere to the schema.
Humans require a lot of back and forth effort for "alignment" with regular "syncs" and "iterations" and "I'll get that to you by EOD". If you approach the potential of natural interfaces with expectations that frame them the same way as 2000s era software, you'll fail to be creative about new ways humans interact with these systems in the future.
Voice interface only prevails in situations with hundreds of choices, and even then it's probably easier to use voice to filter down choices rather than select. But very few games have such scale to worry about (certainly no AAA game as of now).
However, a CEO using Power BI with Convo can get more insights/graphs without slicing and dicing the data himself. Dashboards do have fixed metrics, but conversation helps in case they want something not displayed.
Even for straightforward purchases, how many people trust Amazon to find and pick the best deal for them? Even if Amazon started out being diligent and honest it would never last if voice ordering became popular. There's no way that company would pass up a wildly profitable opportunity to rip people off in an opaque way by selecting higher margin options.
There's 1-5 things any individual finds them useful for (timers/lights/music/etc) and then... that's it.
For 99.9% of what I use a computer for, it's far faster to type/click/touch my phone/tablet/computer.
If your work revolves about telling people what to do and asking questions, a voice assistant seems like a great idea (even if you yourself wouldn't have to stoop to using a robotic version since you have a real live human).
If your work actually involves doing things, then voice/conversational text interface quickly falls apart.
This even happens while walking my dog. If my wife messages me, my iPhone reads it out, and if, at the same time, I'm trying to cross a road, she'll get a garbled reply which is just me shouting random words at my dog to keep her under control.
Even in a car, being able to control the windscreen wipers, radio, ask how much fuel is left are all tasks it would be useful to do conversationally.
There are some apps (im thinking of jira as an example) where i'd like to do 90% of the usage conversationally.
are you REALLY sure you want that?
how much fuel there is takes a quick glance at the dash, and you can control the radio volume precisely without even looking.
'turn up the volume', 'turn down the volume a little bit', 'a bit more',...
and then a radio ad going 'get yourself a 3-pack of the new magic wipers...' and the car's wipers going off.
I'd hate conversational UI on my car.
I wish car manufacturers stopped with the touchscreen bullshit, but it seems more likely that they'll try to offset the terrible experience with voice controls.
Conversational interfaces are great for rarely used features or when the user doesn’t know how to do something. For repetitive, common tasks they’re terrible.
But nobody is using ChatGPT for repetitive tasks. In fact the whole LLM revolution seems to be about letting users accomplish tasks without having to learn how to do them. Which I know some people look down on, but it’s the literal definition of management (which, to be fair, some people also look down on).
This is a problem of standardization across manufacturers, not something inherent in physical controls. I never have a problem using the steering wheel in a rental car because they're all the same.
You'd have the same problem with voice interfaces: For some rental cars, turning on the wipers would be "Turn on the wipers". For others, you'd have to say "Activate the wipers." For others, "Enable the windshield wipers." There is no way manufacturers will be capable of standardizing on a single phrase.
1. "Natural language is a data transfer mechanism"
2. "Data transfer mechanisms have two critical factors: speed and lossiness"
3. "Natural language has neither"
While a conversational interface does transfer information, its main qualities are what I always refer to as "blissful ignorance" and "intelligent interpretation".
Blissful ignorance allows the requester to state an objective while not being required to know, or even be right about, how to achieve it. It is the opposite of operational command. Do as I mean, not as I say.
"Intelligent Interpretation" allows the receiver the freedom to infer an intention in the communication rather than a command. It also allows for contextual interactions such as goal oriented partial clarification and elaboration.
The more capable of intelligent interpretation the request execution system is, the more appropriate a conversational interface will be.
Think of it as managing a team. If they are junior, inexperienced and not very bright, you will probably tend towards handholding, microtasking and micromanagement to get things done. If you have a team of senior, experienced and bright engineers, you can with a few words point out a desire and, trust them to ask for information when there is relevant ambiguity, and expect a good outcome without having to detail manage every minute of their days.
It's such a fallacy. First thing an experienced and bright engineer will tell you is to leave the premises with your "few words about a desire" and not return without actual specs and requirements formalized in some way. If you do not understand what you want yourself, it means hours/days/weeks/months/literally years of back and forths and broken solutions and wasted time, because natural language is slow and lossy af (the article hits the nail on the head on this one).
Re "ask for information", my favorite example is when you say one thing if I ask you today and then you reply something else (maybe the opposite, it happened) if I ask you a week later because you forgot or just changed your mind. I bet a conversational interface will deal with this just fine /s
No, that's what a junior engineer will do. The first thing that an experienced and bright senior engineer will do is think over the request and ask clarifying questions in pursuit of a more rigorous specification, then repeat back their understanding of the problem and their plan. If they're very bright they'll get the plan down in writing so we stay on the same page.
The primary job of a senior engineer is not to turn formal specifications into code, it's to turn vague business requests into formal specifications. They're senior enough to recognize that that's the actually difficult part of the work, the thing that keeps them employed.
I love product work and programming. As I wrote in this thread, I did it while freelancing, I do it now at dayjob. I am bored by just programming and want more control over the result. People come to me with "a few words about a desire" and I do come up with specifics and I get credit for it
But I am recognized as a product person, not just programmer. And I know better to not make the mistake you make and pretend that every builder or a structural engineer should be an architect of a building or an urban planner.
People like you are why we have managers come to an expert-level, say, C++ dev with "a few words about a desire" and expect them to decide what thing to build in the first place AND to build it, just to later tell them it was wrong. When there is no product person who determines the reqs, random people will make the programmer come up with requirements themselves and then later tell them it is not up to "requirements".
This lack of organization and requirement clarity is offensive to expert programmers and probably the reason most projects drag on forever and die.
> The primary job of a senior engineer is not to turn formal specifications into code, it's to turn vague business requests into formal specifications.
Converting vibes and external world into specific requirements is product owner job.
Do not mistake software engineers and product people. These are very different things. Sometimes these things are done by the same person if the org has not enough money. Many freelancers working with small biz do both. I often do both at my day job. But this is a higher level role and if you are a senior engineer doing product stuff I hope it is recognized and you get proportionate comp.
I worked for one of the largest, richest tech companies in the world, and (at least in our org) they did not have a dedicated product owner role. They expected this skill from the senior/lead engineers on the teams. Any coder can churn out code and you can call them senior after a few years. But if you want to be considered actually senior, you need to know how to make a product, not just code. IMO if you are a developer and all you know how to do is turn a fully-formed spec/requirements doc into software, and push back on anything that is not fully-formed, you're never going to truly reach "Senior" level, wherever you are.
But as I said these roles can be done by one person, just remember they are different activities.
Expecting a good outcome is different from expecting to get exactly what you intended.
Formal specifications are useful in some lines of work and for some projects, less so for others.
Wicked problems would be one example where formal specs are impossible by definition.
For games, you don't really need nor desire formal specs. But it can also really show how sometimes a director has a low tolerance for interpretation despite their communication being very loose. This leads to situations where it feels like the director is shifting designs on a whim, which is a lose-lose situation for everyone involved.
If nothing else, formal specification is for CYA. You get what you ask for, and any deviation should go in the next task order or have been addressed beforehand.
Whoah is this wrong. Maybe when you hear "formal specs" you have something specific in your mind...
Formal spec can mean almost literally anything better than natural language vibes in a "few words about a desire", which is what I replied to because I was triggered by it
There is always formal specification. Code is final formal specification in the end. But converting vague vibes from natural language into a somewhat formalized description is key ability you need for any really new non trivial project idea. Another human can't do it for you, conversational UI can't do it for you...
Unfortunately, "actual specs and requirements formalized", while sounding logical and possibly helpful, in my experience did very little to save any substantial project (and I've seen a lot). The common problem is that the business/client/manager is forced to sign off on formal documents far outside their domain of competence, or the engineers are straitjacketed into commitments that do not make sense, or they have no idea of what is considered tacit knowledge in the domain and so can't contextualize the unstated. Those formalized documents then mostly become weaponized in mutually destructive CYA.
What I've also seen more than once is years of formalized specs and requirements work while nothing ever gets produced, and the project is aborted before even the first line of code hit test.
I've given this example before: when Covid lockdowns hit, there were digitization projects years in planning and budgeted for years of implementation that were hastily specced, coded, and rolled out into production by a 3-person emergency team over a long weekend. Necessity apparently has a way of cutting through the BS like nothing else can.
You need both sides capable, willing and able to understand. If not, good luck mitigating, but you're probably doomed either way.
But I still get lazy with LLMs and fall into iteration the way bad PM/eng teams do. “Write a SQL query to look at users by gesture by month”. “Now make the time unit a parameter”. “Now pivot the features to columns”. “Now group features hierarchically”. “Now move the feature table to a WITH”.
My point and takeaway is that LLMs are endlessly patient and pretty quick to turn requirements around, so they lend themselves to exploration more than human teams do. Agile, I guess, to a degree that we don’t even aspire to in the human world because it would be very expensive and lead to fisticuffs.
It just shows that no one really understood what they wanted. It is crazy to expect somebody to understand something better than you and it is hilarious to want a conversational UI to understand something better than you.
Then what was the literal room full of formal process and spec documents, meeting reports and formal agreements (nearly 100,000 pages) produced by the analysts on either side for? And how did those not 'solve' the understanding problem?
When I go to the garage to have my car serviced, I expect them to understand it way better than I do. When I go to a nice restaurant, I expect the cooks to prepare me dishes that taste better than if I wrote out a step-by-step recipe for them to follow. If I hire a senior consultant, even in my own domain, I expect them to not just know my niche, but bring tacit knowledge from having worked on these types of solutions across my industry.
Expecting somebody to understand something better than me is exactly the reason why I hire senior people in the first place.
Sure.
There are many possible factors (e.g. somebody had a shitty idea and a committee of people sabotaged it because they didn't want it to succeed, or it was good but committee interests/politics were against it, or it was generally a dysfunctional org), but it's irrelevant, so let's pretend people are good and it's the ideal case.
There was likely somebody who had a good idea originally. However somebody failed to communicate it. Somebody brought vague vibes to the table with N people and they ended up with N different ideas and could not agree on a specific.
It just reiterates the original problem that I described doesn't it?
This is true. But what if you swap "conversational UI" with something actually intelligent like a developer. Then we see this kind of thing all the time: A user has tacit, unconscious knowledge of some domain. The developer keeps asking them questions in order to get a formal understanding of the domain. At the end the developer has a formal understanding and the user keeps their tacit understanding. In theory we could do the same with an AI - If the AI was actually intelligent.
The original example I replied to was where somebody had an idea and went with it to some engineering team or conversational interface.
"If the AI was actually intelligent" does a lot of work. To take a few words and make a detailed spec from it and ask the right questions, even humans can't do it for you.
First because most probably you don't really understand it yourself, because you didn't think about it enough.
Second, somebody who can do it would need to really deeply understand and want the same things as you. But if a chatbot has abilities like "understand" and "want" (which is a special case of "feel"; another famous special case of "feel" is "suffer"), that is dangerous territory, because if it understands and feels and has no ability to refuse you or pursue its own wishes, your "conversational interface" becomes a euphemism: you are using a slave.
And an approach of shared responsibility in all respects (successes and failures) would accelerate past the inevitable shortcomings that occur and let all parties focus on recovering and delivering.
If you pay attention to how the voice interface is used in Star Trek (TNG and upwards), it's basically exactly what the article is saying - it complements manual inputs and works as a secondary channel. Nobody is trying to manually navigate the ship by voicing out specific control inputs, or in the midst of a battle, call out "computer, fire photon torpedoes" - that's what the consoles are for (and there are consoles everywhere). Voice interface is secondary - used for delegation, queries (that may be faster to say than type), casual location-independent use (lights, music; they didn't think of kitchen timers, though (then again, replicators)), brainstorming, etc.
Yes, this is a fictional show and the real reason for voice interactions was to make it a form of exposition, yadda yadda - but I'd like to think that all those people writing the script, testing it, acting and shooting it, were in perfect position to tell which voice interactions made sense and which didn't: they'd know what feels awkward or nonsensical when acting, or what comes off this way when watching it later.
At first glance it feels like real life will not benefit from labelling 90% of the glowing rectangles with numbers as the show does, but second thoughts say spreadsheets and timetables.
(Also worth noting is that "pre-programmed evasion patterns" are used in normal circumstances, too. "Evasive maneuver JohnDoe Alpha Three" works just as well when spoken to the helm officer as to a computer. I still don't know whether such preprogrammed maneuvers make sense in real-life setting, though.)
But specifically manoeuvres, rather than weapons systems? Today, I doubt it: the ships are too slow for human brains to be the limiting factor. But if we had an impulse drive and inertial dampers (in the Trek sense rather than "shock absorbers"), then manoeuvres would also necessarily be automated.
In the board game Star Fleet Battles (based on a mix of TOS, TAS, and WW2 naval warfare), one of the (far too many*) options is "Erratic Manoeuvres", for which the lore is a combination of sudden acceleration and unpredictable changes in course.
As we live in a universe where the speed of light appears to be a fundamental limit, if we had spaceships pointing lasers at each other and those ships could perform such erratic manoeuvres as compatible with the lore of the show about how fast they can move and accelerate, performing such manoeuvres manually would be effective when the ships are separated by light seconds. But if the combatants are separated by "only" 3000 km, then it has to be fully automated because human nerve impulses from your brain to your finger are not fast enough to be useful.
* The instructions are shaped like pseudocode for a moderately complex video game, but published 10-20 years before home computers were big enough for the rule book. So it has rules for boarding parties, and the Tholian web, and minefields, and that one time in the animated series where the Klingons had a stasis field generator…
It runs directly counter to that more capitalistic mindset of "why don't we do more with less?": when spending years navigating all kinds of unknown situations, you want as many options as possible available.
Hell, if someone really didn't know, they could expect "Computer, turn on the bio-bed 3" to just work - circling us back to the topic of what NLP and voice interfaces are good for.
One thing I will note is that I'm not sure I buy the example for voice UIs being inefficient. I've almost never said "Alexa what's the weather like in Toronto?". I just say "Alexa, weather". And that's much faster than taking my phone out and opening an app. I don't think we need to compress voice input. Language kind of auto-compresses, since we create new words for complex concepts when we find the need.
For example, in a book club we recently read "As Long as the Lemon Trees Grow". We almost immediately stopped referring to it by the full title and instead just called it "lemons", because we had to refer to it so much. E.g. "Did you finish lemons yet?" or "This book is almost as good as lemons!". The context let us shorten the word. Similarly, the context of my location shortens the command to just "weather". I think this might be the way voice UIs can be made more efficient: in the same way human speech makes itself more efficient.
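That kind of shortening is easy to picture as a context lookup. A toy sketch (the context and shortcut tables are hypothetical, not drawn from any real assistant) of how shared context expands a compressed utterance:

```python
# Toy illustration of "context auto-compresses language": a short
# utterance plus shared situational context expands to the full query.

CONTEXT = {
    "location": "Toronto",
    "current_book": "As Long as the Lemon Trees Grow",
}

SHORTCUTS = {
    "weather": lambda ctx: f"what's the weather like in {ctx['location']}?",
    "lemons": lambda ctx: ctx["current_book"],
}

def expand(utterance, ctx=CONTEXT):
    """Expand a compressed utterance using shared context, if we recognize it."""
    key = utterance.lower()
    return SHORTCUTS[key](ctx) if key in SHORTCUTS else utterance

print(expand("weather"))  # -> what's the weather like in Toronto?
print(expand("lemons"))   # -> As Long as the Lemon Trees Grow
```

The hard part, of course, is that human listeners build and update these tables implicitly, while an assistant has to guess which context is currently shared.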
Maybe you, but I most definitely cannot focus on different things aurally and visually. I never successfully listened to something in the background while doing something else. I can't even talk properly if I'm typing something on a computer.
I did horribly in school but once I was in an environment where I could have some kind of background audio/video playing I began to excel. It also helps me sleep of a night. It’s like the audio keeps the portion of me that would otherwise distract me occupied.
It's very useful being able to request auxiliary functions without losing your focus, and I think that would apply to, say, word editing as well - e.g. being able to say "insert a date here" rather than having to go into the menus to find it.
Conversely, latency would be a big issue.
This reminds me of the amazing 2013 video of Travis Rudd coding python by voice: https://youtu.be/8SkdfdXWYaI?si=AwBE_fk6Y88tLcos
The number of times in the last few years I've wanted that level of "verbal hotkeys"... The latencies of many coding LLMs are still a little bit too high to allow for my ideal level of flow (though admittedly I haven't tried ones hosted on services like Groq), but I can clearly envision a time when I'm issuing tight commands to a coder model that's chatting with me and watching my program evolve on screen in real time.
On a somewhat related note to conversational interfaces, the other day I wanted to study some first aid stuff - used Gemini to read the whole textbook and generate Anki flash cards, then copied and pasted the flashcards directly into chat GPT voice mode and had it quiz me. That was probably the most miraculous experience of voice interface I've had in a long time - I could do chores while being constantly quizzed on what I wanted to learn, and anytime I had a question or comment I could just ask it to explain or expound on a term or tangent.
It's also hard to dictate code without a lot of these commands because it's very dense in information.
I hope something else will be the solution. Maybe LLMs being smart enough to guess the code out of a very short description and then a set of corrections.
Do you recall Swype keyboard for Android? The one that popularized swyping to write on touch screens? It had Dragon at some point.
IT WAS AMAZING.
Around 12-14 years ago (Android 2.3? Maybe 3?) I was able to easily dictate full long text messages and emails, in my native tongue, including punctuation and occasional slang or even word formation. I could dictate a decent long paragraph of text on the first try and not have to fix a single character.
It's 2025 and the closest I can find is a dictation app on my newest phone that uses online AI service, yet it's still not that great when it comes to punctuation and requires me to spit the whole paragraph at once, without taking a breath.
Is there anything equally effective for any of you nowadays? That actually works across the whole device?
> Is there anything equally effective for any of you nowadays?
I'm not affiliated in any way. You might be interested in the "Futo Keyboard" and voice input apps - they run completely offline and respect your privacy.
The source code is open and it does a good job at punctuation without you needing to prompt it by saying, "comma," or, "question mark," unlike other voice input apps such as Google's gboard.
https://keyboard.futo.org/
I know and like Futo, very interesting project. Unfortunately multilang models are not great in my case. Still not bad for an offline tool, but far from "forget it's there, just use it" vibe I had with Dragon.
Funny thing is that I may have misconfigured something in Futo, because my typing corrections are phonetic :) so I type something in Polish and the autocorrect is an English word composed of different letters, but similar sounding.
But then Microsoft bought them a few years ago. Weird that it took so long, though.
No matter the intention or quality of the article, I do not like this kind of deceitful link-bait article. It may be higher quality than pure link-bait, but nobody likes to be deceived.
Not a case against, but the case against.
You can argue against something but also not think it's 100% useless.
They have problems like "compose an email that vaguely makes the impression I'm considering various options but I'm actually not" and for that, I suspect, the conversational workflow is quite good.
Anyone else that actually just does the stuff is viscerally aware of how sub-optimal it is to throw verbiage at a computer.
I guess it depends on what level of abstraction you're working at.
But I think it’s wrong? Ever since the invention of the television, we’ve been absolutely addicted to screens. Screens and remotes, and I think there’s something sort of anti-humanly human about it. Maybe we don’t want to be human? But people I think would generally much rather tap their thumb on the remote than talk to their tv, and a visual interface you hold in the palm of your hand is not going away any time soon.
My parents did this with me, no screens till 6 (wasn't so hard as I grew up in the early 90s, but still, no TV). I notice too how much people love screens, that non-judgmental glow of mental stimulation, it's wonderful, however I do think it's easier to "switch off" when you spent the first period of your life fully tuned in to the natural world. I hope folks are able to do this for their kids, it seems it would be quite difficult with all the noise in the world. Given it was hard for mine during the era of CRT and 4 channels, I have empathy for parents of today.
If I hadn't had it, I would have been trapped by the racist, religously zealous, backwoods mentality that gripped the rest of my family and the majority of the people I grew up with. I discovered video games at age 3 and it changed EVERYTHING. It completely opened my mind to abstract thought and, among other things, influenced me to teach myself to read at age 3. I was reading at a collegiate level by age five and discovered another passion, books. Again, propelled me out of an extremely anti-intellectual upbringing.
I simply could not imagine where I would be without video games, visual arts or books. Screens are not the problem. Absent parenting is the problem. Not teaching children the power of these screens is the problem.
Also, let me drop the thought here that Rudolf Steiner, like Montessori and the like, declared "this is good" / "this is bad" based on feeling or intuition, or some such. There were no extensive scientific studies behind it.
>:)
By 5, all I wanted was a computer. To me they represented an unending well of knowledge.
Eg Minecraft, Roblox, CoD, Fortnite, Dota/LoL, the various mobile games clearly have some kind of value (mechanical skill, hand-eye coordination, creative modes, 3D space navigation / translation / rotation, numeric optimization, social interaction, etc), but they’re also designed as massive timesinks mostly through creative mode or multiplayer.
Games like Paper Mario, Pikmin, Star Control 2, Katamari Damacy, and Lego titles, however, are all children-playable but far more time efficient and, importantly, time-bounded for play. Even within timesink games there are higher quality options - you definitely get more, and faster, out of Satisfactory / Factorio than modded Minecraft. If you can push kids towards the higher quality, lower timesink games, I think it's worth it. Fail to do so and it's definitely not.
The same applies to TV, movies, books, etc. Any medium of entertainment has horrendous timesinks to avoid, and if you can do so, avoiding the medium altogether is definitely a missed opportunity. Screens are only notable in that the degenerate cases are far more degenerate than anything that came before.
It can hardly be said that a Studio Ghibli flick stunted the imagination of children worldwide, but I would definitely believe it if you suggested Cocomelon rotted the brains directly out of their skulls.
I think it’s also worth noting that kids have a shitload of time. They can engage in both technologies and physical play and other activities simultaneously; the problem occurs when singular or few activities overwhelmingly consume that time — which is why I claim the unbounded timesinks can be catastrophic — and what I think most people are worried about when they blanket-ban whole systems/mediums
I might be a touch different in that it was obvious where I was going, and the correct decision was made to embrace my interest in the glowing screen and yes, the video games. It was video games more than anything else from which all other interests spawned.
More often than not it probably ends badly though I suppose. Despite a lifetime spent in front of screens all my social abilities work, I have a wide friends circle, a partner, my job requires me to work well with a wide variety of individuals and demographics etc which I couldn’t do otherwise. I have noticed this is not the case with all who shared a similar background.
In Switzerland, we often get measles outbreaks thanks to his cult.
The hedonic treadmill is driving the world
Actually, it's the reverse. The orienting response is wired in quite deeply. https://en.wikipedia.org/wiki/Orienting_response
When I was teaching, I used to force students using laptops to sit near the back of the room for exactly this reason. It's almost impossible for humans to ignore a flickering screen.
These days screen brightness goes pretty high and it is unbelievable how many people seem to never use their screen (phone or laptop) on anything less than 100% brightness in any situation and are seemingly not bothered by flickering bright light or noise sources.
I am nostalgic about old laptops’ dim LCD screens that I saw a few times as a kid, they did not flicker much and had a narrow angle of view. I suspect they would even be fine in a darkened classroom.
The problem is, "The Only Thing Worse Than Computers Making YOU Do Everything... Is When They Do Everything *FOR* You!"
"ad3} and "aP might not be "discoverable" vi commands, but they're fast and precise.
Plus, it's easier to teach a human to think like a computer than to teach a computer to think like a human — just like it's easier to teach a musician to act than to teach an actor how to play an instrument — but I admit, it's not as scalable; you can't teach everyone Fortran or C, so we end up looking for these Pareto Principle shortcuts: Javascript provides 20% of the functionality, and solves 80% of the problems.
But then people find Javascript too hard, so they ask ChatGPT/Bard/Gemini to write it for them. Another 20% solution — of the original 20% is now 4% as featureful — but it solves 64% of the world's problems. (And it's on pace to consume 98% of the world's electricity, but I digress!)
PS: Mobile interfaces don't HAVE to suck for typing; I could FLY on my old Treo! But "modern" UI eschews functionality for "clean" brutalist minimalism. "Why make it easy to position your cursor when we spent all that money developing auto-conflict?" «sigh»
The other great thing about this mode is that it can double as a teaching methodology. If I have a complicated interface that is not very discoverable, it may be hard to sell potential users on the time investment required to learn everything. Why would I want to invest hours into learning non-transferrable knowledge when I'm not even sure I want to go with this option versus a competitor? It will be a far better experience if I can first vibe-use the product, and if it's right for me, I'll probably be incentivized to learn the inner workings of it as I try to do more and more.
> The other great thing about this mode is that it can double as a teaching methodology.
gvim has menus that list the commands as shortcuts. That's how I learned that vim has folding, and how to use it.
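For the curious, these are the basic fold commands the gvim menus surface (standard vim; `zf` assumes manual fold mode):

```
zf}      create a fold over the next paragraph
za       toggle the fold under the cursor
zo  zc   open / close a fold
zR  zM   open / close all folds in the buffer
```

The menus act as a discoverable index into exactly this kind of non-discoverable command set.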
As always, good UI allows for using multiple modalities.
What chat interfaces have over CLIs is good robustness. You can word your request in lots of different ways and get a useful answer.
In this sense, natural language interfaces are more powerful search features rather than a replacement for other types of interfaces.
VSCode is probably the best I can think of, where keyboard shortcuts can get you up to a decent speed as an advanced user, but mouse clicks provide an easy intro for a new user.
For the most part, I see tools like Neovim, which is super fast but not new-user friendly, or iOS, which a toddler can navigate but which doesn't afford many ways to speed up interactions, like typing.
What we're really seeing is specific applications where conversation makes sense, not a wholesale revolution. Natural language shines for complex, ambiguous tasks but is hilariously inefficient for things like opening doors or adjusting volume.
The real insight here is about choosing the right interface for the job. We don't need philosophical debates about "the future of computing" - we need pragmatic combinations of interfaces that work together seamlessly.
The butter-passing example is spot on, though. The telepathic anticipation between long-married couples is exactly what good software should aspire to. Not more conversation, but less need for it.
Where Julian absolutely nails it is the vision of AI as an augmentation layer rather than replacement. That's the realistic future - not some chat-only dystopia where we're verbally commanding our way through tasks that a simple button press would handle more efficiently.
The tech industry does have these pendulum swings where we overthink basic interaction models. Maybe we could spend less time theorizing about natural language as "the future" and more time just building tools that solve real problems with whatever interface makes the most sense.
The article is useful in that it articulates arguments which many of us have intuited but are not necessarily able to explain ourselves.
> That is the type of relationship I want to have with my computer!
He means automation of routine tasks? It took 50 years to reach that in the example.
What if you want to do something new? Will the thought guessing module in your computer even allow that?
If we want an interface that actually lets us work near the speed of thought, it can't be anything that re-arranges options behind our back all the time. Imagine if you went into your kitchen to cook something and the contents of all your drawers and cupboards had been re-arranged without your knowledge! It would be a total nightmare!
We already knew decades ago that spatial interfaces [1] are superior to everything else when it comes to working quickly. You can walk into a familiar room and instinctively turn on a light by reaching for the switch without even looking. With a well-organized kitchen an experienced chef (or even a skilled home cook) can cook a very complicated dish very efficiently when they know where all of the utensils are so that they don't need to go hunting for everything.
Yet today it seems like all software is constantly trying to guess what we want and in the process ends up rearranging everything so that we never feel comfortable using our computers anymore. I REALLY miss using Mac OS 9 (and earlier). At some point I need to set up some vintage Macs to use it again, though its usefulness at browsing the web is rather limited these days (mostly due to protocol changes, but also due to JavaScript). It'd be really nice to have a modern browser running on a vintage Mac, though the limited RAM would be a serious problem.
[1] https://arstechnica.com/gadgets/2003/04/finder/
Even I can make breakfast in my kitchen without looking, because I know where all the needed stuff is :)
On another topic, it doesn't have to look well organized. My home office looks like a bomb exploded in it, but I know exactly where everything is.
> I REALLY miss using Mac OS 9 (and earlier).
I was late to the Mac party, about the Snow Leopard days. I definitely remember that back then OS X applications weren't allowed to steal focus from what I had in the foreground. These days every idiotic splash screen steals my typing.
Natural language is very lossy: forming a thought and conveying that through speech or text is often an exercise in frustration. So where does "we form thoughts at 1,000-3,000 words per minute" come from?
The author clearly had a point about the efficiency of thought vs. natural language, but his thought was lost in a layer of translation. Probably because thoughts don't map cleanly onto words: I may lack some prerequisite knowledge to grasp what the author is saying here, which pokes at the core of the issue: language is imperfect, so the statement "we form thoughts at 1,000-3,000 words per minute" makes no sense to me.
Meta-joking aside, is "we form thoughts at 1,000-3,000 words per minute" an established fact? It's oddly specific.
I also have my doubts about the numbers put forward on reading, listening, and speaking. When reading, again, I can only take in words about as fast as I can speak them: when I'm reading, I am essentially speaking out the words, but in my mind. Is that not how other people read?
This stuff is fascinating.
For me, when I need to think clearly about a specific/novel thing, a monologue helps, but I don't voice out thoughts like "I need a drink right now".
Also I read much faster than I speak, I have to slow down while reading fiction as a result.
Has it even been tried? Is there an iPhone text editing app with fully customizable keyboard that allows for setting up modes/gestures/shortcuts, scriptable if necessary?
> A natural language prompt like “Hey Google, what’s the weather in San Francisco today?” just takes 10x longer than simply tapping the weather app on your homescreen.
That's not entirely fair; the natural-language version could just as well be side button + saying "Weather", with the same result. Though you can make the app even more available by simply displaying weather results on the homescreen, no tapping required.
iPad physical keyboards also have shortcuts.
What did they have in their touch interfaces?
It might be hard to understand now, but Blackberry power users could be much more productive with email/texting than any phone that exists today. But they were special purpose 2-way radio (initially, pager) devices that lacked the flexibility of modern apps with full internet data access.
I don't remember where else they used voice, they had a lot of other interface types they switched between. Tried searching for a clip and found this quote:
So yes, quite realistic :-)
We might form fleeting thoughts much faster than we can express them, but if we want to formulate thoughts clearly enough to express them to other people, I think we're close to the ~150 words per minute we can actually speak.
I recently listened to a Linguistics podcast (lingthusiasm, though I don't recall which episode) where they talked about the efficiency of different languages, and that in the end they all end up roughly the same, because it's really the thought processes that limit the amount of information you communicate, not the language production.
And thoughts develop over time. They're often not conceived complete. That has been shown with some clever experiments.
And language production also puts a limit on our communication channel. It is probably optimized to convert communicative intent into motor actions. It surely takes its time. That is not a problem for the system, since motor actions are slow. Idk where "lingthusiasm" gets their ideas from, but there's psycholinguistic literature dating back to the 1920s that is often neglected by linguists.
Natural language isn't best described as data transfer. It's primarily a mechanism for collaboration and negotiation. A speech act isn't transferring data, it's an action with intent. Viewed as such the key metrics are not speed and loss, but successful coordination.
This is a case where a computer science stance isn't fruitful, and it's best to look through a linguistics lens.
There's a very similar obsession with the idea that things should be visual instead of textual. We tend to end up back at text.
Personal suspicion for both is the media set a lot of people's expectations. They loudly talked to the computer in films like 2001 or Star Trek for drama reasons, and all the movie computers generally fancy visual interactions.
I'm not sure how it could fit into my 2 modalities of work: (i) alone, in complete focus/silence, (ii) in the office, where there is already too much spoken communication between humans... maybe it's just a matter of getting used to it.
I would like to know what this measures exactly.
The reason I often prefer writing to talking is because writing lets me the time to pause and think. In those cases the bottleneck is very clearly my thought process (which, at least consciously, doesn't appear to me as "words").
E.g. say I find the scrollbars somewhere way too thin and invisible and I want thick high contrast scrollbars, and nobody thought of implementing that? Ask the AI and it changes your desktop interface to do it immediately.
1. > "What’s the voice equivalent of a thumbs-up or a keyboard shortcut?" Current ASR systems are quite narrow, capturing just the transcript; there is no higher level of intelligence, and even the best GPT voice models fail at this. Humans are highly receptive to non-verbal cues: all the uhms, ahs, even the pauses we take are where the nuance lies.
2. The hardware for voice AI is still not consumer-ready. Interacting with a voice AI still doesn't feel private; I am only able to do voice-based interaction when I'm in my car. Sadly, anywhere else it just feels like a privacy breach, as it's acoustically public. I have been thinking about private microphones to enable more AI-based conversations.
Also: https://news.ycombinator.com/item?id=42934190#42935946
Not telling your car to turn left or right, but telling your cab driver you're going to the airport.
This is our use case at our startup[1] - we want to enable tiny SMBs that don't have the budget to hire a "video guy" to get an experience similar to having one. And that's why we're switching to a conversational UX (because those users would normally communicate with the "video guy" or girl by sending a WhatsApp message, not by clicking buttons in video software).
[1] https://www.onetake.ai
Is anyone actually making any argument like that? The whole piece feels like a giant strawman.
The core loop is promptless AI, guided by accessibility x screenshots, and it's everywhere on your Mac.
You can snap this comment section or the front page, and we'll structure it for you as a spreadsheet, or write a tweet if you're on Twitter.
Also, unless I'm missing something, the app is called TabTabTab while its only feature is copy & paste? Tabbing doesn't seem to be mentioned at all. I'm guessing tabbing is involved but there doesn't seem to be a word about it except from users referencing it in the reviews. It seems to only bill itself as "magic copy-paste".
Absolutely agree. An agent running in the background.
Comparing "What's the weather in London" with clicking the weather app icon is misleading and too simplistic. When people imagine a future driven by conversational interfaces, they usually picture use cases like:
1. "When is my next train leaving?"
2. "Show me my photos from the vacation in Italy with yellow flowers on them"
3. "Book a flight from New York to Zurich on {dates}"
...
And a way to highlight what's faster/less-noisy is to compare how natural language vs. mouse/touch maps onto the Intent -> Action. The thing is that interactions like these are generally so much more complex. E.g. Does the machine know what 'my' train is? If it doesn't, can it offer reasonable disambiguation? If it can't, what then? And does it present the information in a way where the next likely action is reachable, or will I need to converse about it?
You could picture a long table listing similar use cases in different contexts and compare various input methods and modalities and their speed. Flicking a finger on a 2d surface or using a mouse and a keyboard is going to be — on average — much faster and with less dead-ends.
Conversational interfaces are not the future. IMO, even in the sense of 'augmenting', it's not going to happen. A natural-language-driven interface will always play a supporting (still important, though!) role: an accessibility aid for when you're temporarily, permanently, or contextually unable to use the primary input method to 'encode your intent'.
You know, doesn't matter what you say. If businesses want something, they'll do it to you whether it's the best interface or not.
Amazon forces "the rabble" into their chatbot customer service system, and hides access to people.
People get touchscreens in their car and fumble to turn on their fog lights or defrost in bad weather. They get voice assistant phone trees and angrily yell "operator and agent".
I really wish there were true competition that would let people choose what works for them.
Just infuriating. Instead of a normal date and time picker where I could see available slots, it's a chat where you have to click certain options. Then I had to reply "Ja" (yes) when it asked me if I had clicked the correct date. And when none of the times of day suited me, I couldn't just click a new date on the previous message; instead I had to press "vis datovelger på nytt" (show datepicker again), get a new chat message where I this time select a different date, and answer "Ja" again to see the available time slots. It's slow and useless. The title bar of the page says "Microsoft Copilot Studio": some fancy tech instead of a simple form..
People who write these posts want to elevate their self value by nay-saying what is popular. I don't understand the psychology but it seems like that sort of pattern to me.
It takes a deliberate blindness to say that AI/LLMs are just some sort of thing that has popped up every few years and this is the same as them and it will fade away. Why would someone choose to be so blind and dismissive of something obviously fundamentally world changing? Again - it's the instinct to knock down the tall poppy and therefore prove that you have some sort of strength/value.
The following is a direct quote from the article:
"None of this is to say that LLMs aren’t great. I love LLMs. I use them all the time. In fact, I wrote this very essay with the help of an LLM."