Comparing Large Language Models and Traditional Machine Translation Tools for Translating Medical Consultation Summaries: Quantitative Pilot Feasibility Study

Background: Translation of medical consultation summaries is essential for equitable health care communication in culturally and linguistically diverse populations. While machine translation (MT) tools and large language models (LLMs) are widely accessible, their feasibility and safety for health care contexts remain underexplored. Objective: This pilot study investigates the feasibility and limitations of using LLMs and traditional MT tools to translate medical consultation summaries from English into the most common languages other than English spoken in Australia—Arabic, Chinese (simplified written form), and Vietnamese. Methods: Two simulated summaries—a simple patient-facing summary and a complex clinician-oriented interprofessional letter—were translated using 3 LLMs (GPT-4o, Llama-3.1, and Gemma-2) and 3 MT tools (Google Translate, Microsoft Bing Translator, and DeepL). Translations were benchmarked against professional third-party interpreter translations using Bilingual Evaluation Understudy, Character-level F-score, and Metric for Evaluation of Translation with Explicit Ordering metrics. Results: The translation performance varied across languages, tools, and summary complexity when assessed using automatic evaluation metrics. Traditional MT tools outperformed LLMs on surface-level metrics, while LLMs showed relative strengths in semantic similarity for Vietnamese and Chinese. Arabic translations improved with complex input, suggesting morphological advantages. The metric-based evaluation highlighted feasibility but also risks, particularly in Chinese clinical contexts. Conclusions: This pilot study provides formative evidence of opportunities and limitations in applying artificial intelligence translation for health care communication. Findings underscore the importance of human oversight; domain-specific evaluation metrics; and further formative and clinical research to guide the safe, equitable use of artificial intelligence translation tools.
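Of the three automatic metrics named in the Methods, the character-level F-score is the easiest to illustrate. The sketch below is a simplified version for illustration only, not the tooling used in the study. The function names `char_ngrams` and `chrf`, the choice to ignore whitespace, and the defaults (character n-grams up to length 6, beta of 2) are all assumptions.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams, ignoring whitespace (a simplifying assumption)."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def chrf(hypothesis: str, reference: str, max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified character n-gram F-score in [0, 1] (illustrative, hypothetical helper)."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # skip n-gram orders longer than either string
        overlap = sum((hyp & ref).values())  # clipped count of shared n-grams
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)  # mean precision over n-gram orders
    r = sum(recalls) / len(recalls)        # mean recall over n-gram orders
    if p + r == 0:
        return 0.0
    # F-beta: beta > 1 weights recall more heavily than precision
    return (1 + beta**2) * p * r / (beta**2 * p + r)


if __name__ == "__main__":
    machine = "the patient has a fever"
    human = "the patient has fever"
    print(round(chrf(machine, human), 3))
```

Because the score works on characters rather than words, it needs no word segmentation, which is one reason character-level metrics are often applied to languages such as Chinese. A high score still reflects only surface overlap with the reference, not clinical accuracy.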

STAT+: Trump goes soft on insurance, and a medical underwriting chart

This is the online version of STAT’s weekly email newsletter Health Care Inc.

We watched the Artemis II astronauts splash down safely last week. A reminder that legitimately amazing things can still happen. Parachute your thoughts here: bob.herman@statnews.com.

Tough talk, soft stance

A few months ago, President Trump confidently said he would be meeting with the country’s largest health insurance companies to pressure them to lower their premiums. The message was just that — a message to give the appearance that Trump officials were willing to crack down on health insurers, which have been at the center of Americans’ disdain of the health care system for decades.

Continue to STAT+ to read the full story…

Why opinion on AI is so divided

This story originally appeared in The Algorithm, our weekly newsletter on AI.

In an industry that doesn’t stand still, Stanford’s AI Index, an annual roundup of key results and trends, is a chance to take a breath. (It’s a marathon, not a sprint, after all.)

This year’s report, which dropped today, is full of striking stats. A lot of the value comes from having numbers to back up gut feelings you might already have, such as the sense that the US is gunning harder for AI than everyone else: It hosts 5,427 data centers (and counting). That’s more than 10 times as many as any other country.  

There’s also a reminder that the hardware supply chain the AI industry relies on has some major choke points. Here’s perhaps the most remarkable fact: “A single company, TSMC, fabricates almost every leading AI chip, making the global AI hardware supply chain dependent on one foundry in Taiwan.” One foundry! That’s just wild.

But the main takeaway I have from the 2026 AI Index is that the state of AI right now is shot through with inconsistencies. As my colleague Michelle Kim put it today in her piece about the report: “If you’re following AI news, you’re probably getting whiplash. AI is a gold rush. AI is a bubble. AI is taking your job. AI can’t even read a clock.” (The Stanford report notes that Google DeepMind’s top reasoning model, Gemini Deep Think, scored a gold medal in the International Math Olympiad but is unable to read analog clocks half the time.)

Michelle does a great job covering the report’s highlights. But I wanted to dwell on a question that I can’t shake. Why is it so hard to know exactly what’s going on in AI right now?  

The widest gap seems to be between experts and non-experts. “AI experts and the general public view the technology’s trajectory very differently,” the authors of the AI Index write. “Assessing AI’s impact on jobs, 73% of U.S. experts are positive, compared with only 23% of the public, a 50 percentage point gap. Similar divides emerge with respect to the economy and medical care.”

That’s a huge gap. What’s going on? What do experts know that the public doesn’t? (“Experts” here means US-based researchers who took part in AI conferences in 2023 and 2024.)

I suspect part of what’s going on is that experts and non-experts base their views on very different experiences. “The degree to which you are awed by AI is perfectly correlated with how much you use AI to code,” a software developer posted on X the other day. Maybe that’s tongue-in-cheek, but there’s definitely something to it.

The latest models from the top labs are now better than ever at producing code. Because technical tasks like coding have right or wrong results, it is easier to train models to do them, compared with tasks that are more open-ended. What’s more, models that can code are proving to be profitable, so model makers are throwing resources at improving them.

This means that people who use those tools for coding or other technical work are experiencing this technology at its best. Outside of those use cases, you get more of a mixed bag. LLMs still make dumb mistakes. This phenomenon has become known as the “jagged frontier”: Models are very good at doing some things and less good at others.

The influential AI researcher Andrej Karpathy also had some thoughts. “Judging by my [timeline] there is a growing gap in understanding of AI capability,” he wrote in reply to that X post. He noted that power users (read: people who use LLMs for coding, math, or research) not only keep up to date with the latest models but will often pay $200 a month for the best versions. “The recent improvements in these domains as of this year have been nothing short of staggering,” he continued.

Because LLMs are still improving fast, someone who pays to use Claude Code will in effect be using a different technology from someone who tried using the free version of Claude to plan a wedding six months ago. Those two groups are speaking past each other.

Where does that leave us? I think there are two realities. Yes, AI is far better than a lot of people realize. And yes, it is still pretty bad at a lot of stuff that a lot of people care about (and it may stay that way). Anyone making bets about the future on either side should bear that in mind.

Exercise boosts recovery in depression and psychosis, and improves cognition and quality of life; clinicians can now use the 5A model to make movement a routine psychiatric tool.

Neural Mechanism Underlying Sensory Behavior Revealed in C. elegans

Animal behavior reflects a complex interplay between an animal’s brain and its sensory surroundings. In a new study published in Nature Neuroscience titled “Neural sequences underlying directed turning in Caenorhabditis elegans,” researchers from Massachusetts Institute of Technology (MIT) have shown how neural circuits within C. elegans nematode worms respond to odors and generate movement as they pursue favorable versus unfavorable smells. The results inform understanding of the basic principles of the sensory nervous system for therapeutic applications.

“Across the animal kingdom, there are just so many remarkable behaviors,” said Steven Flavell, PhD, associate professor at the Picower Institute at MIT, Howard Hughes Medical Institute (HHMI) investigator, and corresponding author of the study. “With modern neuroscience tools, we are finally gaining the ability to map their mechanistic underpinnings.” 

Whether moving toward a food source or away from a predator, animals must integrate sensory stimuli to navigate to favorable locations. The neural circuits for navigation are tasked with generating directed movement while simultaneously integrating sensory input to update behavior. Understanding how neural circuits select, execute and adapt sensory-guided navigation behaviors uncovers basic principles of how nervous systems are organized to integrate sensory information and control behavior. 

In C. elegans, the authors identified error-correcting turns during navigation and used whole-brain calcium imaging and cell-specific perturbations to determine their neural underpinnings. Defined neurons activated in a stereotyped order during each turn. Distinct neurons in this sequence respond to the spatial distribution of attractive and aversive olfactory cues, anticipate upcoming turn directions and drive movement, linking key features of this sensorimotor behavior across time. 

“One thing that really excited us about this study is that we were able to see what a sensorimotor arc looks like at the scale of a whole nervous system: all the bits and pieces, from responses to the sensory cue until the behavioral response is implemented,” Flavell said.  

The electrical activity of more than 100 neurons was tracked during sensory-guided movement. Notably, C. elegans has only 302 neurons in total. Rather than moving randomly, the worms executed turns with advantageous timing and at well-chosen angles.

The activity of SAA neurons was crucial for integrating odor detection with planned movement and predicted the direction of upcoming turns. Several neurons showed different activity patterns depending on where the odors were located and whether the worm was moving forward or in reverse.

Additionally, the neuromodulator tyramine was essential for turning and shifting gears. When the worms moved in reverse, tyramine from the neuron RIM enabled other neurons in the sequence to change their activity appropriately to execute the turns. In several experiments, the scientists knocked out RIM tyramine, which disrupted both the navigation behaviors and the sequence of neural activity.

The post Neural Mechanism Underlying Sensory Behavior Revealed in <i>C. elegans</i> appeared first on GEN – Genetic Engineering and Biotechnology News.

Co-Design of a Depression Self-Management Tool for Adolescent and Young Adult Cancer Survivors: Rapid Qualitative Analysis of Interview Feedback on a Prototype

<strong>Background:</strong> Over 2.1 million adolescent and young adult cancer survivors (AYACS) live in the United States. Recent estimates suggest that up to one-third of AYACS experience major depressive disorder. Although several efficacious evidence-based interventions are available to manage symptoms of depression, these interventions are often inaccessible to AYACS who have many competing commitments. Digital mental health tools hold promise for this population; however, only a few have been tailored to meet the unique needs of AYACS, and findings to date have yielded mixed results. <strong>Objective:</strong> This study aims to obtain feedback from AYACS on a mid-fidelity prototype of a depression self-management tool being tailored for AYACS. <strong>Methods:</strong> Individuals with a history of cancer diagnosed at age 12 or older who were between the ages of 15 and 39 and had completed primary treatment were identified through a review of medical records from a comprehensive cancer center in the Southeastern United States. Potentially eligible participants were contacted by study staff to conduct additional screening and obtain informed consent via REDCap (Research Electronic Data Capture; Vanderbilt University). Upon enrollment, participants provided demographic and clinical information, as well as their availability for an interview. The principal investigator (KMI) conducted semistructured individual interviews with consented AYACS. Most of the interview was dedicated to showing participants the mid-fidelity prototype of the tool, explaining how the prototype might work, and requesting targeted feedback. Demographic and clinical characteristics, as well as some aspects of feedback on the prototype, were summarized using descriptive statistics. Interviews were audio- and video-recorded and transcribed. The transcriptions underwent rapid qualitative analysis guided by the Rigorous and Accelerated Data Reduction technique. 
<strong>Results:</strong> A total of 14 AYACS (n=9, 64%, female; n=9, 64%, white; ages 15-38) completed an individual interview. Participant preferences for mood tracking, content presentation, user input, and duration of use were captured qualitatively but analyzed quantitatively. For example, most participants (n=10, 71%) indicated that they preferred a mood-tracking option that included emojis and would be willing to track their mood at least once per day (n=11, 79%). Participant preferences captured qualitatively fell into 4 themes: (1) features to promote user engagement (eg, the use of gamification); (2) tailored content presentation (eg, authenticity in the portrayal of the cancer experience); (3) perceived usability (eg, simplifying user input); and (4) interface design (eg, implementing a coherent design theme and color scheme). <strong>Conclusions:</strong> Findings indicated that AYACS highly value personalization, flexibility, and peer support in digital interventions. Based on insights obtained during individual interviews, a working prototype was developed by reprogramming an existing digital tool. Qualitative and quantitative findings informed modifications to the existing digital tool. The working prototype will next undergo evaluation as part of a pilot full-factorial trial.

STAT+: GSK advancing ovarian cancer drug mo-rez

It’s been a minute since I’ve wished you a good morning. Morning!

We’ve got some big news on Revolution Medicines’ pancreatic cancer treatment. But don’t miss GSK’s move to push an ovarian cancer ADC into five Phase 3 trials after striking early data. And Spyre Therapeutics released some competitive ulcerative colitis results. 

Artemis II astronauts spark a psychiatrist’s look at “joy trains,” music, and meaning: practical ways teams rekindle joy and prevent burnout.

Want to understand the current state of AI? Check out these charts.

If you’re following AI news, you’re probably getting whiplash. AI is a gold rush. AI is a bubble. AI is taking your job. AI can’t even read a clock. The 2026 AI Index from Stanford University’s Institute for Human-Centered Artificial Intelligence, AI’s annual report card, comes out today and cuts through some of that noise. 

Despite predictions that AI development may hit a wall, the report says that the top models just keep getting better. People are adopting AI faster than they picked up the personal computer or the internet. AI companies are generating revenue faster than companies in any previous technology boom, but they’re also spending hundreds of billions of dollars on data centers and chips. The benchmarks designed to measure AI, the policies meant to govern it, and the job market are struggling to keep up. AI is sprinting, and the rest of us are trying to find our shoes.

All that speed comes at a cost. AI data centers around the world can now draw 29.6 gigawatts of power, enough to run the entire state of New York at peak demand. Annual water use from running OpenAI’s GPT-4o alone may exceed the drinking water needs of 12 million people. At the same time, the supply chain for chips is alarmingly fragile. The US hosts most of the world’s AI data centers, and one company in Taiwan, TSMC, fabricates almost every leading AI chip. 

The data reveals a technology evolving faster than we can manage. Here’s a look at some of the key points from this year’s report. 

The US and China are nearly tied

In a long, heated race with immense geopolitical stakes, the US and China are almost neck and neck on AI model performance, according to Arena, a community-driven ranking platform that allows users to compare the outputs of large language models on identical prompts. In early 2023, OpenAI had a lead with ChatGPT, but this gap narrowed in 2024 as Google and Anthropic released their own models. In February 2025, R1, an AI model built by the Chinese lab DeepSeek, briefly matched the top US model, ChatGPT. As of March 2026, Anthropic leads, trailed closely by xAI, Google, and OpenAI. Models from Chinese labs like DeepSeek and Alibaba lag only modestly. With the best AI models separated in the rankings by razor-thin margins, they’re now competing on cost, reliability, and real-world usefulness.

Chart of the performance of top models on the Arena by select providers, showing Arena scores from May 2023 to January 2026, with all models trending upward. The scores are tightly packed: US-based Anthropic, xAI, Google, and OpenAI lead Alibaba, DeepSeek, and Mistral (in that order). Meta trails the pack.

The index notes that the US and China have different AI advantages. While the US has more powerful AI models, more capital, and an estimated 5,427 data centers (more than 10 times as many as any other country), China leads in AI research publications, patents, and robotics. 

As competition intensifies, companies like OpenAI, Anthropic, and Google no longer disclose their training code, parameter counts, or data-set sizes. “We don’t know a lot of things about predicting model behaviors,” says Yolanda Gil, a computer scientist at the University of Southern California who coauthored the report. This lack of transparency makes it difficult for independent researchers to study how to make AI models safer, she says.

AI models are advancing super fast

Despite predictions that development will plateau, AI models keep getting better and better. By some measures, they now meet or exceed the performance of human experts on tests that aim to measure PhD-level science, math, and language understanding. SWE-bench Verified, a software engineering benchmark for AI models, saw top scores jump from around 60% in 2024 to almost 100% in 2025. In 2025, an AI system produced a weather forecast on its own.  

“I am stunned that this technology continues to improve, and it’s just not plateauing in any way,” says Gil.

Line chart of select AI Index technical performance benchmarks versus human performance, showing that skills such as image classification, English language understanding, multitask language understanding, visual reasoning, medium-level reading comprehension, and multimodal understanding and reasoning surpassed the human baseline in or before 2025, with autonomous software engineering, mathematical reasoning, and agent multimodal computer use trending toward the human baseline by 2026.

However, AI still struggles in plenty of other areas. Because the models learn by processing enormous amounts of text and images rather than by experiencing the physical world, AI exhibits “jagged intelligence.” Robots are still in their early days and succeed in only 12% of household tasks. Self-driving cars are farther along: Waymos are now roaming across five US cities, and Baidu’s Apollo Go vehicles are shuttling riders around in China. AI is also expanding into professional domains like law and finance, but no model dominates the field yet. 

But the way we test AI is broken

These reports of progress should be taken with a grain of salt. The benchmarks designed to track AI progress are struggling to keep up as models quickly blow past their ceilings, the Stanford report says. Some are poorly constructed—a popular benchmark that tests a model’s math abilities has a 42% error rate. Others can be gamed: when models are trained on benchmark test data, for example, they can learn to score well without getting smarter. 

Because AI is rarely used the same way it’s tested, strong benchmark performance doesn’t always translate to real-world performance. And for complex, interactive technologies such as AI agents and robots, benchmarks barely exist yet. 

AI companies are also sharing less about how their models are trained, and independent testing sometimes tells a different story from what they report. “A lot of companies are not releasing how their models do in certain benchmarks, particularly the responsible-AI benchmarks,” says Gil. “The absence of how your model is doing on a benchmark maybe says something.” 

AI is starting to affect jobs

Within three years of going mainstream, AI is now used by more than half of people around the world, a rate of adoption faster than that of the personal computer or the internet. An estimated 88% of organizations now use AI, and four in five university students use it.

It’s early days for deployment, and AI’s impact on jobs is hard to measure. Still, some studies suggest AI is beginning to affect young workers in certain professions. According to a 2025 study by economists at Stanford, employment for software developers aged 22 to 25 has fallen nearly 20% since 2022. The decline cannot be pinned on AI alone, as broader macroeconomic conditions could also be to blame, but AI appears to be playing a part.

Two line charts showing normalized headcount trends by age group from 2021 through 2025. On the left, for software developers, the early-career (age 22-25) cohort drops rapidly after a peak in September 2022, while other age groups keep rising, albeit less steeply. On the right, customer support agents show a similar trend, although the decline for the early-career group is less steep than for software developers.

Employers say that hiring may continue to tighten. According to a 2025 survey conducted by McKinsey & Company, a third of organizations expect AI to shrink their workforce in the coming year, particularly in service and supply chain operations and software engineering. AI is boosting productivity by 14% in customer service and 26% in software development, according to research cited by the index, but such gains are not seen in tasks requiring more judgment. Overall, it’s still too early to understand the bigger economic impact of AI. 

People have complicated feelings about AI 

Around the world, people feel both optimistic and anxious about AI: 59% of people think that it will provide more benefits than drawbacks, while 52% say that it makes them nervous, according to an Ipsos survey cited in the index. 

Notably, experts and the public see the future of AI very differently, according to a Pew survey. The biggest gap is around the future of work: While 73% of experts think that AI will have a positive impact on how people do their jobs, only 23% of the American public thinks so. Experts are also more optimistic than the public about AI’s impact on education and medical care, but they agree that AI will hurt elections and personal relationships.

Bar chart of US perceptions of AI's societal impact, contrasting US adults with AI experts. The percentage of AI experts saying that AI will have a positive impact in the next 20 years is 2-3 times higher than that of US adults. Experts are most optimistic about medical care, with 84% predicting a positive outcome (versus 44% of US adults). The greatest difference is for jobs, with experts polling at 73% and US adults at 23%. Both groups have a similar expectation of a positive outcome for AI in elections (11% of experts and 9% of adults).

Among all countries surveyed, Americans trust their government least to regulate AI appropriately, according to another Ipsos survey. More Americans worry federal AI regulation won’t go far enough than worry it will go too far. 

Governments are struggling to regulate AI

Governments around the world are struggling to regulate AI, but there were some minor successes last year. The EU AI Act’s first prohibitions, which ban the use of AI in predictive policing and emotion recognition, took effect. Japan, South Korea, and Italy also passed national AI laws. Meanwhile, the US federal government moved toward deregulation, with President Trump issuing an executive order seeking to prevent states from regulating AI.

Despite this federal action, state legislatures in the US passed a record 150 AI-related bills. California enacted landmark legislation, including SB 53, which mandates safety disclosures and whistleblower protections for developers of AI models. New York passed the RAISE Act, requiring AI companies to publish safety protocols and report critical safety incidents.

Line chart showing the number of AI-related bills passed into law across all US states from 2016 to 2025; the count increases sharply in 2023 and peaks at 150 bills in 2025.

But for all the legislative activity, Gil says, regulation is running behind the technology because we don’t really understand how it works. “Governments are cautious to regulate AI because … we don’t understand many things very well,” she says. “We don’t have a good handle on those systems.”