Hinglish In, English Notes Out: How Code-Mixing Is Handled

By Patient Square Team · 6 min read

A lot of Indian consults aren't in Hindi or in English. They're in both, mid-sentence, with the drug name in English and the rest in Hindi. That's code-mixing, and it breaks ordinary transcription tools. AI Scribe by Patient Square captures the mixed speech and hands back a clean clinical English note. The rule underneath is simple: the input can be Hinglish, the output is always English. Here's a worked example, using a different case from the one our Hindi and languages post walks through.

Key takeaways

  • The rule, stated plainly: input is multilingual (Hindi, Hinglish, regional), output is clean clinical English. Every note, no exceptions.
  • Code-switching mid-sentence is a known weak point for tools trained on one language. The English drug name inside a Hindi sentence is where they fail.
  • The note is the record; the audio isn't. Visit audio is processed in memory and discarded once the note is drafted.
  • This post uses a fresh case (a monsoon fever consult), distinct from the diabetes example in the main languages post.

  • 20+ Indian languages captured, code-mixing detected automatically
  • English: note output language, every time, whatever the spoken mix
  • 0 transcripts or recordings kept: audio discarded once the note is drafted

What does a code-mixed consult sound like, really?

Here's a different scene from the one in our main languages post. A monsoon-season fever case, the kind every Indian OPD sees by the dozen from July. The exchange below is an illustration, not a real patient:

Doctor: "Kitne din se fever hai? Aur body pain ke saath?"

Patient: "Teen din se, doctor. Tez bukhar aata hai, phir thoda kam ho jaata hai. Body pain bahut hai, aur aankhon ke peeche dard hota hai. Platelet ka kuch problem to nahi?"

Doctor: "Dengue ka season hai, isliye main ek CBC aur NS1 antigen test likh raha hoon. Abhi sirf paracetamol lijiye fever ke liye, koi aspirin ya ibuprofen nahi. Fluids zyada rakhiye. Agar platelet count kam aaya ya bleeding dikhe to turant aana."

Three turns, half a dozen language switches, and a clinical decision braided through it: a duration, a symptom cluster, two named investigations, a drug to take, two drugs to avoid, a red-flag safety net. A general Hindi transcriber would garble "CBC", "NS1 antigen", "paracetamol", "aspirin", "ibuprofen", and "platelet". A general English transcriber would drop the Hindi entirely and lose the history. Neither leaves you with a record you could sign.

What does the note look like afterward?

Clean clinical English, structured, ready to review:

S: Three days of intermittent high-grade fever with severe body ache and retro-orbital pain. Patient anxious about platelet count. Monsoon season.
O: Febrile illness, clinical assessment pending labs.
A: Acute febrile illness, dengue fever to be ruled out.
P: Order CBC and NS1 antigen. Paracetamol for fever; avoid aspirin and ibuprofen. Advise increased oral fluids. Return immediately if platelet count falls or bleeding appears.
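For readers who think in data structures, the note above maps onto a small structured record. This is a minimal sketch only: the `SoapNote` class and its field names are assumptions made for illustration, not Patient Square's actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SoapNote:
    """Hypothetical SOAP record; the field names are assumptions for illustration."""
    subjective: str
    objective: str
    assessment: str
    plan: tuple[str, ...]  # one entry per plan item

    def render(self) -> str:
        """Render the note as four labelled lines: S, O, A, P."""
        return "\n".join([
            f"S: {self.subjective}",
            f"O: {self.objective}",
            f"A: {self.assessment}",
            "P: " + " ".join(self.plan),
        ])


# The monsoon fever note from this post, expressed as a record.
note = SoapNote(
    subjective=(
        "Three days of intermittent high-grade fever with severe body ache "
        "and retro-orbital pain. Patient anxious about platelet count. Monsoon season."
    ),
    objective="Febrile illness, clinical assessment pending labs.",
    assessment="Acute febrile illness, dengue fever to be ruled out.",
    plan=(
        "Order CBC and NS1 antigen.",
        "Paracetamol for fever; avoid aspirin and ibuprofen.",
        "Advise increased oral fluids.",
        "Return immediately if platelet count falls or bleeding appears.",
    ),
)
print(note.render())
```

Keeping the plan as separate items rather than one string makes it easier to check each instruction individually during review.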

The patient asked about platelets in Hindi. The chart records "retro-orbital pain," "NS1 antigen," and "avoid aspirin and ibuprofen" in English. That inversion, Hinglish conversation in, English record out, is the whole job. It's what makes the note useful to a pathologist running the CBC, a TPA processing a claim, or a court that won't read Hinglish.

Hinglish, spoken in: "teen din se tez bukhar... aankhon ke peeche dard... platelet ka problem to nahi?"

English, note out: dengue to rule out; order CBC + NS1 antigen; paracetamol, avoid aspirin/ibuprofen; return if platelets fall

What's the rule, stated plainly?

Input multilingual, output English. That's it, and it's worth saying without hedging.

You and your patient can speak Hindi, Hinglish, Tamil, Bengali, Marathi, whatever puts the consultation at ease. The scribe captures that. But everything it produces (the SOAP note, the ICD-10 suggestions, and the prescription draft) is always generated in clean clinical English. We don't produce a Hindi-language note, and we're explicit about that, because the record has a different audience than the conversation. The conversation belongs to the patient. The record goes to the referral specialist, the insurer, and the medico-legal file, all of which run in English.

AI Scribe by Patient Square is an ambient AI medical scribe that listens during the visit and hands back a structured SOAP note, ICD-10 suggestions, and a prescription draft, ready to review and sign about two minutes after the visit. The structuring and the language conversion happen in the same step. You don't get a Hinglish transcript to clean up afterward. You get an English note to read and sign.

Why do ordinary tools choke on Hinglish?

Because they were trained to listen in one language, and a real consult doesn't cooperate.

Code-switching, changing language within a single utterance, is a recognised hard problem in speech recognition. Models trained predominantly on one language lose accuracy on the "other" language the moment a speaker flips, and that's exactly where the clinically load-bearing words live: the English drug name in the Hindi sentence, the lab abbreviation, the dose. Research on Hindi-English code-switched speech, including efforts to build dedicated code-switched corpora, exists precisely because off-the-shelf monolingual models handle this so poorly.
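To make "switch point" concrete, here is a toy sketch that tags each token of a romanised Hinglish utterance and counts the language changes. The word lists, `tag_tokens`, and `count_switches` are all assumptions invented for this illustration. Real code-switched speech recognition works on audio with trained models, not fixed lexicons, and this is not how AI Scribe is implemented.

```python
import re

# Tiny hand-made word lists, assumed purely for illustration.
HINDI_WORDS = {
    "kitne", "din", "se", "hai", "aur", "ke", "saath", "teen", "tez",
    "bukhar", "ka", "kam", "aaya", "to", "turant", "aana", "abhi", "sirf",
}
ENGLISH_WORDS = {
    "fever", "body", "pain", "platelet", "count", "paracetamol",
    "aspirin", "ibuprofen", "fluids", "bleeding", "test",
}


def tag_tokens(utterance: str) -> list[tuple[str, str]]:
    """Tag each token as 'hi', 'en', or 'unk' using the toy word lists."""
    tagged = []
    for token in re.findall(r"[a-z0-9]+", utterance.lower()):
        if token in HINDI_WORDS:
            lang = "hi"
        elif token in ENGLISH_WORDS:
            lang = "en"
        else:
            lang = "unk"
        tagged.append((token, lang))
    return tagged


def count_switches(tagged: list[tuple[str, str]]) -> int:
    """Count changes between 'hi' and 'en', skipping unknown tokens."""
    known = [lang for _, lang in tagged if lang != "unk"]
    return sum(1 for prev, cur in zip(known, known[1:]) if prev != cur)


print(count_switches(tag_tokens("Kitne din se fever hai?")))                 # 2
print(count_switches(tag_tokens("Platelet count kam aaya to turant aana")))  # 1
```

Even a five-word question switches language twice, and the English token at the switch ("fever", "platelet count") is the clinically important one.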

So "supports Hindi" on a vendor's feature page is not the same claim as "handles a real Hinglish consult." Capturing clean Hindi audio and correctly transcribing "platelet count kam aaya to turant aana" are different engineering problems. The second one is the one that matters in your OPD, and it's the one we built for. If a vendor only advertises the first, test the second on day one of a trial.

Does this work when the room is noisy or the signal drops?

Both are everyday conditions in a real Indian OPD, so the scribe has to handle both.

A crowded room with a relative answering half the questions is the hard case for any scribe, and it's where transcription quality separates products. And in tier-2 and tier-3 clinics, the connection itself isn't reliable. Capture works offline with on-device encryption and syncs when the signal returns, so a dead tower during a monsoon-fever rush doesn't cost you the note. The language handling is identical offline; connectivity changes when the note is ready, not what it understands.
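The capture-offline, sync-later behaviour described above resembles a store-and-forward queue. The sketch below is a generic illustration under stated assumptions: `PendingQueue`, `send`, and the `network` flag are hypothetical names, the on-device encryption step is deliberately left out, and nothing here reflects Patient Square's actual implementation.

```python
from collections import deque
from typing import Callable


class PendingQueue:
    """Hypothetical store-and-forward buffer: hold chunks until an upload succeeds."""

    def __init__(self, send: Callable[[bytes], bool]) -> None:
        self._send = send          # returns True when the upload succeeded
        self._pending = deque()    # chunks waiting for a connection, oldest first

    def capture(self, chunk: bytes) -> None:
        """Always accept a chunk locally, whether or not the network is up."""
        self._pending.append(chunk)

    def flush(self) -> int:
        """Try to upload in order; stop at the first failure. Returns the number sent."""
        sent = 0
        while self._pending:
            if not self._send(self._pending[0]):
                break              # still offline: keep the chunk for the next attempt
            self._pending.popleft()
            sent += 1
        return sent

    def __len__(self) -> int:
        return len(self._pending)


network = {"up": False}   # stand-in for real connectivity state
delivered = []


def send(chunk: bytes) -> bool:
    """Hypothetical uploader that succeeds only while the network is up."""
    if not network["up"]:
        return False
    delivered.append(chunk)
    return True


queue = PendingQueue(send)
queue.capture(b"chunk-1")
queue.capture(b"chunk-2")
print(queue.flush(), len(queue))   # 0 2  (offline: nothing sent, nothing lost)

network["up"] = True
print(queue.flush(), len(queue))   # 2 0  (signal back: everything delivered in order)
```

A chunk is removed only after its upload succeeds, so a dropped connection delays the upload without losing anything.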

And the audio doesn't linger. Visit audio is processed in memory and discarded the moment the note is drafted, which is also the right answer under the DPDP Act 2023. There's no Hindi transcript and no recording archived anywhere. The full posture is on our security page.

How do you test the Hinglish handling before you trust it?

On your own patients, with your own accents. A demo on a scripted sentence proves nothing.

Take the 7-day free trial, no card, and run it on a real OPD session, the messier the better: a relative interrupting, a regional accent, a sentence that starts in Hindi and ends with a drug name in English. Then read the English notes and ask whether they're clean enough to sign without a rewrite, and whether the drug names and lab values survived the language switch intact. That's the only test that tells you anything. If the India rate card is your starting question instead, every number there is in rupees with GST shown.
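If you want to make the "did the drug names survive" check repeatable across a trial, a few lines of scripting can help. This is a minimal sketch under assumptions: `missing_terms` is a hypothetical helper, the note text is pasted in by hand, and a simple word match is no substitute for actually reading the note.

```python
import re


def missing_terms(note: str, expected_terms: list[str]) -> list[str]:
    """Return the expected terms that do not appear in the note as whole words."""
    missing = []
    for term in expected_terms:
        pattern = r"\b" + re.escape(term) + r"\b"
        if not re.search(pattern, note, flags=re.IGNORECASE):
            missing.append(term)
    return missing


# Terms you heard yourself say during the consult.
expected = ["CBC", "NS1 antigen", "paracetamol", "aspirin", "ibuprofen"]

good_note = (
    "P: Order CBC and NS1 antigen. Paracetamol for fever; "
    "avoid aspirin and ibuprofen. Advise increased oral fluids."
)
garbled_note = "P: Order CVC and NS1 antigen. Paracetamol for fever; avoid aspirin."

print(missing_terms(good_note, expected))     # []
print(missing_terms(garbled_note, expected))  # ['CBC', 'ibuprofen']
```

An empty list only tells you the terms are present. Whether "avoid" stayed attached to the right drug still needs a human read.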

When you want to see the Hinglish handling on your own consults, book a demo and bring your hardest code-mixed case.

FAQ

Common questions

What is code-mixing in a medical consult?

Code-mixing, or Hinglish in north India, is switching languages within a single sentence: a Hindi clause, an English drug name, a Hindi clause again. It is how a huge share of Indian consults are actually conducted. AI Scribe by Patient Square captures the mixed speech and writes the note in clean clinical English, so the record is usable even though the conversation was not in one language.

Will my note be in Hinglish or proper English?

Clean clinical English, always. This is the core rule: the input can be Hindi, Hinglish, or a regional language, but the output note is English. You and your patient speak however is comfortable; the SOAP note, ICD-10 suggestions, and prescription draft come out in the English your referrals, insurers, and records actually use.

Why is code-mixing harder than plain Hindi for transcription tools?

Most speech models are trained on one language at a time, so when a speaker flips mid-sentence the model mangles the other half. The English drug name buried in a Hindi sentence is exactly where ordinary tools fail. Handling a real Hinglish consult is a different engineering problem from supporting Hindi audio, and many vendors only do the latter.

Does the scribe keep the Hindi transcript too?

No. Visit audio is processed in memory and discarded once the note is drafted, so there is no Hindi transcript and no audio archive sitting around. What remains is the structured English note you reviewed and signed. A raw transcript in any language is not a medical record; a clean English note is.

Which regional languages besides Hindi are covered?

English and Hindi plus 20-plus Indian languages, including Tamil, Telugu, Bengali, Marathi, Gujarati, Kannada, Malayalam, and Punjabi, with code-mixed speech detected automatically. Whatever the spoken mix, the note is produced in clean clinical English.

Sources

  1. HiACC: a Hindi-English code-switched speech corpus (research on code-switched ASR difficulty).
  2. Irving G, et al. International variations in primary care physician consultation time: 67 countries. BMJ Open, 2017.
