▲Launch HN: Cuckoo (YC W25) – Real-time AI translator for global teams

81 points by yonghee 119 days ago | 29 comments

ryooit 119 days ago [-]

This looks super useful for global teams dealing with technical discussions! I’m curious about how Cuckoo handles domain-specific jargon beyond what’s included in uploaded reference documents. For instance, in fields like AI/ML or DevOps, terminology evolves rapidly, and even human interpreters sometimes struggle with nuanced technical meanings.

Does Cuckoo adapt dynamically to new terms within a conversation, or does it require preloading domain knowledge beforehand? Also, how do you ensure accuracy in cases where direct translation doesn’t capture the intended meaning (e.g., idiomatic phrases or cultural context differences)?

Excited to see how this evolves!

yonghee 119 days ago [-]

Hey Ryoo, thanks for the question.

Right now, we have a set of “industry presets” where we have preloaded keywords and context for different industries (GPU, LLM, GPT for AI, for example).

Over time, we want our users to build upon these preset terms, for example, automatically adding the terms mentioned in different meetings. There is a challenge here—how do we add terms that may be mispronounced or that the LLM may have mixed up? I think having the context of their conversation and their base documents for these conversations could definitely help.

donbox 119 days ago [-]

Congrats on the launch! Does it support Hinglish (Hindi + English)? I'm a native Hindi speaker, but I found the Hindi in the demo video hard to follow. Many Hindi speakers, especially in everyday conversations, naturally mix Hindi and English — often using English for technical or difficult terms. Would love to know if the model handles that kind of language blend

yonghee 119 days ago [-]

My batchmate was saying the same thing, actually. He was trying to use only Hindi for the demo sakes and it was almost difficult for him to explain his product without referring to English.

We do our best to deal with langauge changes. For example, when talking bio, almost half of the sentence is in English terms and Cuckoo does pretty well in that context as well!

joeevans1000 119 days ago [-]

Plot twist!

119 days ago [-]

joshdavham 119 days ago [-]

Congrats on the launch!

Also question: your writing makes you seem quite bilingual and fluent in English. Given this, would you consider yourself a user of your own product? Do you often find yourself needing to use it? It strikes me that the main users would be people who struggle with English specifically. Though I guess with recent innovations in China, potentially more English speakers will start needing to translate from Chinese.

yonghee 119 days ago [-]

I love this question!

Yes, I am bilingual. I was fortunate enough to study both in Korea and Canada.

I use our product every day when I’m meeting with customers in Japan and China. We joke that we are our very first customers. Personally, it’s best when I get to meet them in person and use our in-person meeting feature since I get to see their reactions.

I would say half of our users are fluent in English since they mostly work for U.S. companies. The other half would be people in Korea, Japan, China, and more who need the language support.

throw_m239339 116 days ago [-]

Congrats.

I remember 10+ years ago there were a few Kickstarter promising this kind of product but with an hardware device. Obviously they were all fraudulent back then, but it is definitely in the realm of possibility today.

dleeftink 119 days ago [-]

Love it! Friction is part of (language) learning, so hopefully some doses will remain down the line.

yonghee 119 days ago [-]

Cuckoo x Duolingo - the best combo!

aresant 119 days ago [-]

This is awesome!

Is there a consumer version available?

Or is there a company focused on that side of the business?

yonghee 119 days ago [-]

While we are focusing on business use cases, we are seeing a few invidivuals sign up for their own uses.

Email me at yonghee@cuckoo.so so I can help you out with first few months!

DavidaGinter 118 days ago [-]

Interesting! Do you have an API (or plan to have one)? could be very relevant to use in our platform, which combines integrations requested by our users

yonghee 118 days ago [-]

Not an official API for now but maybe we can help out. Mail me! (yonghee@cuckoo.so)

mandeepj 119 days ago [-]

I see you are using these two words - Interpreter and Translator - interchangeably! They aren't same; there's a big difference between them [0].

From your demo, I gather you are a translator, which is a big let-off for me. Reading and understanding text is much slower than just listening. Also, spoken words are just 30ish% of the overall communication. I'm afraid while your users would be busy in reading translated text, they'd lose out on other vital communication cues like hand gestures, facial expressions etc.

Is real-time audio interpretation in the pipeline?

[0] - https://www.google.com/search?q=translator+vs+interpreter&oq...

yonghee 119 days ago [-]

I agree with your observation about "Interpreter" vs. "Translator."

When we first started this project, we referred to it as an "interpreter." However, after speaking with human interpreters and considering their feedback, we settled on "real-time translation". We might have left some of our past on the internet tho..

As with everything, there are both advantages and limitations to text-based translations. Here are a few:

Limitations:

- Some people may find it challenging to follow gestures and expressions while reading.

- In more one-way scenarios, such as presentations and webinars, hearing the speaker’s voice often feels more natural.

Pros:

- Many users actually prefer text because it allows them to hear the speaker’s original voice and pick up on nuances.

- Having a written record enables post-meeting summaries and the opportunity to repurpose transcripts into other materials, such as blog posts, custom user manuals, JIRA notes, and more using AI.

- There are also technical constraints with voice-to-voice translations, which currently tend to be turn-based rather than real-time (streaming) - not ideal for exchange of ideas.

That said, we are excited to see how the TTS and STT technologies evolve and are looking forward to experimenting with “interpretation” in the future!

bityard 119 days ago [-]

> Reading and understanding text is much slower than just listening.

Speak for yourself! I read _much_ faster than listening to someone saying the same thing. This is why I can't stand subtitles on videos, movies, and tv shows. Because of how my brain works, I can't help but read the text. And when it's there, I'm done reading the person's line when they are only 25-50% through speaking it. So it "feels" like I'm watching a show where everyone repeats the last half of every sentence.

> Is real-time audio interpretation in the pipeline?

When I saw the headline, I assumed the product was doing real-time translation and voice cloning in one. Now _that_ would be an interesting use of AI. (Google and others have been doing real-time voice recognition and text translation for years.)

mandeepj 118 days ago [-]

> This is why I can't stand subtitles on videos, movies, and tv shows. Because of how my brain works, I can't help but read the text. And when it's there, so it "feels" like I'm watching a show where everyone repeats the last half of every sentence.

Ha! Translation is done in real time, but subtitles are not!! Were you thinking they are processed the same way? That's your confusion.

> I'm done reading the person's line when they are only 25-50% through speaking it.

How can an AI system translate someone when they haven't even spoken those words yet? Please check the title of the Post - it's a real-time system.

> I read _much_ faster than listening to someone saying the same thing.

Everyone reads and/or speaks at a different speed. You can pause a movie, but not a meeting, during the first time. You don't have to make any critical decisions while consuming entertainment, but on the contrary, at work, you might have to listen, process, understand, and connect the dots into various other subsystems and conclude how they may or may not affect your standing. At the end, might have to challenge the speaker or add to what they are saying. A lot of variables.

yonghee 119 days ago [-]

We are also excited about real-time translation + voice cloning (like having your K-pop stars speaking your language with their voices!) This is actually something we explored previously. The tech is there but we weren't sure of the the user experience, especially in terms of latency.

Maybe we'll have this for Cuckoo 2.0!

nicolinox 118 days ago [-]

for this check www.palabra.ai

givemeethekeys 119 days ago [-]

> Reading and understanding text is much slower than just listening.

While watching their demo video, I had no trouble reading and interpreting the translated English at the speed the conversations were going on. There's a chance that some speakers would speak much quicker, but I think this software covers the vast majority of use-cases.

Real-time translation is a great start. I'm sure these models can be tweaked over time for better interpretation, especially given that they learn based on context.

There is another aspect to this: the small pauses forced by technology will give people just enough time to think, which is welcome in a business meeting.

Full disclosure: I am not a user or a customer, but this looks like it is something I would one day want to use if the opportunity presents itself.

yonghee 119 days ago [-]

I think it’s similar to Netflix subtitles—some people prefer subtitles and dislike voice-overs, while others opt for the dubbed versions.

I also believe that as the meeting progresses, it feels more natural, and participants become aware of the translator. (Interestingly, they often start speaking more clearly and using fuller sentences, just as they would with human interpreters!)

Thanks for your comment. I hope you give it a try!

carlosjobim 119 days ago [-]

In a business setting, the only communication that has any legal standing is the spoken word and the written word.

yonghee 119 days ago [-]

That’s a great point. An added benefit of our approach is that it provides a written version of the conversation in multiple languages.

(Of course, some users may prefer to remove the conversation entirely for data security and privacy reasons.)

mandeepj 119 days ago [-]

> In a business setting, the only communication that has any legal standing is the spoken word and the written word.

Legal, huh? How many indictments have you seen come out of a business meeting? The expression representation is very much part of a work environment. Someone may not say a word while hearing a crazy idea, but they'll certainly roll an eye.

carlosjobim 119 days ago [-]

A business is a legal construct for organizing work, it's nothing more. And that's always present. You can't tell your boss that he should have known you weren't going to do the task you were asked to, because you rolled your eyes in the meeting. If you didn't communicate it verbally or in writing, then you haven't communicated it at all.

mandeepj 111 days ago [-]

I don't understand why some of the folks in this thread are so confused! It's just basic English.

> You can't tell your boss that he should have known you weren't going to do the task you were asked to, because you rolled your eyes in the meeting. If you didn't communicate it verbally or in writing, then you haven't communicated it at all.

We are talking about "expressions" and just "expression" in a meeting; that's it. You don't talk about those things which you had mentioned in a meeting where someone is presenting on a topic.

_rm 119 days ago [-]

*Phone calls* also please.

yonghee 119 days ago [-]

Yes, definitely.

Loading comments...

ryooit 119 days ago [-]

Excited to see how this evolves!

yonghee 119 days ago [-]

Hey Ryoo, thanks for the question.

Right now, we have a set of “industry presets” where we have preloaded keywords and context for different industries (GPU, LLM, GPT for AI, for example).

donbox 119 days ago [-]

yonghee 119 days ago [-]

My batchmate was saying the same thing, actually. He was trying to use only Hindi for the demo sakes and it was almost difficult for him to explain his product without referring to English.

We do our best to deal with langauge changes. For example, when talking bio, almost half of the sentence is in English terms and Cuckoo does pretty well in that context as well!

joeevans1000 119 days ago [-]

Plot twist!

119 days ago [-]

joshdavham 119 days ago [-]

Congrats on the launch!

yonghee 119 days ago [-]

I love this question!

Yes, I am bilingual. I was fortunate enough to study both in Korea and Canada.

I would say half of our users are fluent in English since they mostly work for U.S. companies. The other half would be people in Korea, Japan, China, and more who need the language support.

throw_m239339 116 days ago [-]

Congrats.

dleeftink 119 days ago [-]

Love it! Friction is part of (language) learning, so hopefully some doses will remain down the line.

yonghee 119 days ago [-]

Cuckoo x Duolingo - the best combo!

aresant 119 days ago [-]

This is awesome!

Is there a consumer version available?

Or is there a company focused on that side of the business?

yonghee 119 days ago [-]

While we are focusing on business use cases, we are seeing a few invidivuals sign up for their own uses.

Email me at yonghee@cuckoo.so so I can help you out with first few months!

DavidaGinter 118 days ago [-]

Interesting! Do you have an API (or plan to have one)? could be very relevant to use in our platform, which combines integrations requested by our users

yonghee 118 days ago [-]

Not an official API for now but maybe we can help out. Mail me! (yonghee@cuckoo.so)

mandeepj 119 days ago [-]

I see you are using these two words - Interpreter and Translator - interchangeably! They aren't same; there's a big difference between them [0].

Is real-time audio interpretation in the pipeline?

[0] - https://www.google.com/search?q=translator+vs+interpreter&oq...

yonghee 119 days ago [-]

I agree with your observation about "Interpreter" vs. "Translator."

As with everything, there are both advantages and limitations to text-based translations. Here are a few:

Limitations:

- Some people may find it challenging to follow gestures and expressions while reading.

- In more one-way scenarios, such as presentations and webinars, hearing the speaker’s voice often feels more natural.

Pros:

- Many users actually prefer text because it allows them to hear the speaker’s original voice and pick up on nuances.

- Having a written record enables post-meeting summaries and the opportunity to repurpose transcripts into other materials, such as blog posts, custom user manuals, JIRA notes, and more using AI.

- There are also technical constraints with voice-to-voice translations, which currently tend to be turn-based rather than real-time (streaming) - not ideal for exchange of ideas.

That said, we are excited to see how the TTS and STT technologies evolve and are looking forward to experimenting with “interpretation” in the future!

bityard 119 days ago [-]

> Reading and understanding text is much slower than just listening.

> Is real-time audio interpretation in the pipeline?

mandeepj 118 days ago [-]

Ha! Translation is done in real time, but subtitles are not!! Were you thinking they are processed the same way? That's your confusion.

> I'm done reading the person's line when they are only 25-50% through speaking it.

How can an AI system translate someone when they haven't even spoken those words yet? Please check the title of the Post - it's a real-time system.

> I read _much_ faster than listening to someone saying the same thing.

yonghee 119 days ago [-]

Maybe we'll have this for Cuckoo 2.0!

nicolinox 118 days ago [-]

for this check www.palabra.ai

givemeethekeys 119 days ago [-]

> Reading and understanding text is much slower than just listening.

Real-time translation is a great start. I'm sure these models can be tweaked over time for better interpretation, especially given that they learn based on context.

There is another aspect to this: the small pauses forced by technology will give people just enough time to think, which is welcome in a business meeting.

Full disclosure: I am not a user or a customer, but this looks like it is something I would one day want to use if the opportunity presents itself.

yonghee 119 days ago [-]

I think it’s similar to Netflix subtitles—some people prefer subtitles and dislike voice-overs, while others opt for the dubbed versions.

Thanks for your comment. I hope you give it a try!

carlosjobim 119 days ago [-]

In a business setting, the only communication that has any legal standing is the spoken word and the written word.

yonghee 119 days ago [-]

That’s a great point. An added benefit of our approach is that it provides a written version of the conversation in multiple languages.

(Of course, some users may prefer to remove the conversation entirely for data security and privacy reasons.)

mandeepj 119 days ago [-]

> In a business setting, the only communication that has any legal standing is the spoken word and the written word.

carlosjobim 119 days ago [-]

mandeepj 111 days ago [-]

I don't understand why some of the folks in this thread are so confused! It's just basic English.

We are talking about "expressions" and just "expression" in a meeting; that's it. You don't talk about those things which you had mentioned in a meeting where someone is presenting on a topic.

_rm 119 days ago [-]

*Phone calls* also please.

yonghee 119 days ago [-]

Yes, definitely.