The emergence of the web data infrastructure layer for AI

AI is booming. New use cases are emerging each day. To capitalize on the technology’s potential, enterprises require data at scale. In many cases, though, the relevant information is blocked or unstructured, which limits its use by AI models. 

To understand this challenge, consider the foundation of the web itself. The web was not designed for the automated discovery and retrieval that new AI applications demand. Overcoming this inherent design constraint requires infrastructure.

The next frontier in AI may depend on a new web data infrastructure layer that can enable models to discover and map this ever-expanding digital realm. This layer must be able to navigate hundreds of millions of existing web domains and billions of new URLs created each week, delivering real-time information and overcoming technical barriers.

“The data suggests there’s far more data out there,” says Or Lenchner, CEO of Bright Data, a web data collection platform. “Think of the universe: It’s out there, but you don’t know what you don’t know.”

Enabling access to fresh, relevant, and trustworthy data

While early AI breakthroughs were driven by scaling training data and model size, organizations are now encountering a fundamental bottleneck: They need to keep pace with the dynamic, unstructured, and constantly evolving nature of web data in order to ground outputs in current and verifiable information. AI performance increasingly depends not just on model architecture but on a system’s compute, networking, retrieval, and data engineering capabilities—that is, the system’s ability to quickly and reliably retrieve data that is fresh, relevant, and trustworthy.

Traditional model training relies on snapshots of information collected at a particular point in time. Training AI on such static data is no longer sufficient. To track fluctuations such as competitor pricing, consumer sentiment, and market trends, companies need a constant feed of new information, pulling data in real time along with relevant context. Their infrastructure must therefore be able to handle millions of simultaneous interactions across websites that vary by geography, language, format, and access rules.

“If it can’t retrieve real-time information, it lacks context,” Lenchner says. “In a business setting, that’s not acceptable anymore. Stale answers lead to bad decisions and disappointed consumers.”

Speed is not merely a matter of convenience; it’s a matter of necessity. Today’s organizations operate in environments where prices, inventory, markets, security threats, and customer behavior change continuously. Delayed data retrieval can reduce the usefulness of an otherwise sophisticated model.

Using live, high-quality web data can also reduce AI hallucinations because the model has a more relevant knowledge base. This builds user trust. In fact, one survey found that 56% of AI practitioners said businesses need access to real-time web data to improve trust in AI outputs. To ensure the model runs efficiently and effectively, the information must also be pared down to the appropriate essentials. 

Despite the introduction of retrieval-augmented generation (RAG), where models pull in external data at the moment of a query, many AI systems still struggle to deliver outputs that are current, contextually relevant, and trustworthy in operational settings. According to Gartner, 60% of AI projects that are not supported by AI-ready data—accurate, structured, organized, and contextualized—will be abandoned by the end of the year. 

This is because large-scale retrieval alone does not solve the problem. As Lenchner puts it, “You need to retrieve data at scale, but also in real time. Latency becomes an issue because of the end user who is waiting for the output.” 
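
To make the freshness requirement concrete, here is a minimal Python sketch of one way a retrieval step could discard stale pages before ranking what remains. It is an illustration under stated assumptions, not a description of Bright Data's platform or of any particular RAG library: the Document type, the select_fresh_context function, the keyword-overlap scoring, and the max_age cutoff are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Document:
    """A retrieved web page (hypothetical structure, for illustration only)."""
    url: str              # where the page came from
    text: str             # extracted page text
    fetched_at: datetime  # when the page was retrieved


def select_fresh_context(query, documents, max_age, now=None, top_k=2):
    """Return up to top_k documents that are recent enough and share words with the query."""
    now = now or datetime.now(timezone.utc)
    query_terms = set(query.lower().split())
    scored = []
    for doc in documents:
        # Freshness check comes first: a stale snapshot is dropped
        # no matter how relevant its text looks.
        if now - doc.fetched_at > max_age:
            continue
        # Crude relevance score: number of query words that appear in the page.
        overlap = len(query_terms & set(doc.text.lower().split()))
        if overlap > 0:
            scored.append((overlap, doc.fetched_at, doc))
    # Rank by relevance, breaking ties in favor of the most recently fetched page.
    scored.sort(key=lambda item: (item[0], item[1]), reverse=True)
    return [doc for _, _, doc in scored[:top_k]]
```

A production system would likely replace the keyword overlap with embedding similarity and put a latency budget on the fetch itself; the point of the sketch is only that age is checked before relevance.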

Accessing fresh, AI-ready data at scale introduces technical and structural challenges. In practice, many enterprise systems combine public web retrieval with APIs, licensed datasets, and proprietary internal data in their AI applications. Integrating these fragmented sources into a timely and usable knowledge layer requires specialized capabilities. Some research has found that 97% of AI organizations depend on real-time web data infrastructure, but 90% feel boxed in by various restrictions. Companies are increasingly developing technical approaches to navigate these constraints.

Lenchner draws this metaphor: “Think of the trained model as intelligence and relevant data as knowledge. A powerful intelligence layer sitting on top of a hollow knowledge layer is like a genius who knows nothing—useless in practice. Intelligence and knowledge have to come together.”

The promise of new infrastructure

A new layer of web data infrastructure can address this developing need for stronger AI inputs by enabling data discovery, real-time access, and tailoring to a specific context. As Lenchner describes it, “It’s all about collecting data at scale, super-low latency, without being blocked.”

Rather than relying on increased computing power, this type of platform emulates human browsing behavior to access available content and transform raw code into structured data feeds. It can work with websites that might not interact with traditional scraping tools, such as those heavy in JavaScript, or with aggressive antibot software. 

As Lenchner explains, “It’s basically having infrastructure that can mimic a web user with identifying information—IP address, location, and 1,000 more parameters. And at scale. Think of doing that 80 billion times a day for millions of websites. And every single time, you are looking exactly as the website expects you to look.”

Of course, continuous retrieval introduces new data governance challenges. To address them, platforms can enforce strict compliance protocols aligned with global privacy frameworks, such as the EU’s General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA). They can also be limited to openly accessible, public information, avoiding paywalls or private logins. Any networks used can be vetted and consent-based, and incentives can be provided to owners of IP addresses. In this way, systems can be designed to comply with tightening regulation.

Such complex capabilities do not come easy. “When this is critical infrastructure for a company,” Lenchner says, “doing it in-house becomes a full-time engineering problem that competes with the actual AI work.” Addressing this complexity requires organizations to commit significant resources, leading many to seek specialized platforms designed specifically for data retrieval, orchestration, and observability.

Infrastructure for the real world

Real-time data retrieval is changing what AI systems can do inside organizations. For example, a retail company can use public information to enable a dynamic pricing engine, and global brands can track trademark infringements. 

As the ecosystem matures, organizations that invest in this emerging data infrastructure layer will be better positioned to build AI systems that are more responsive, reliable, and aligned with real-world conditions—AI systems that can continuously adapt using current web data. Over time, the distinction between AI models and the infrastructure that feeds them may even begin to disappear.

As Lenchner says, “The world is changing. And everything that is happening in the world is being uploaded to the public web. The amount of new data that is being generated is growing and accelerating.”

To learn more from Bright Data, read the Data for AI 2026 report.

This content was produced by Insights, the custom content arm of MIT Technology Review. It was not written by MIT Technology Review’s editorial staff. It was researched, designed, and written by human writers, editors, analysts, and illustrators. This includes the writing of surveys and collection of data for surveys. AI tools that may have been used were limited to secondary production processes that passed thorough human review.

Opinion: How I used public radio to recruit 20,000 participants for a peer-reviewed study on walking breaks

Journalists don’t usually appear in the byline of peer-reviewed scientific papers. But recently, I received an email I’d been waiting on for nearly three years: A prestigious journal had accepted the findings from a study I helped lead with more than 20,000 participants across all 50 states. It was published Tuesday evening in the British Journal of Sports Medicine.

My team at NPR had joined forces with physiologist Keith Diaz’s team at Columbia University Medical Center to test his lab findings. Specifically, we invited people to try taking movement breaks every 30 minutes, every hour, or every two hours. Our goal was to test whether short walking breaks, which have been shown to offset some of the damage of our sedentary, screen-bound lives, were actually feasible out in the real world.

Read the rest…

STAT+: Want high-quality generic drugs? One expert has ideas on how consumers can trust their supply

For many years, generic drugs have accounted for roughly 90% of the prescriptions doled out to Americans thanks to their lower cost. Yet reliable supplies have been an issue due to inconsistent quality — more than 60% of the generic shortages have been attributed to quality concerns, according to the Food and Drug Administration. Numerous manufacturers, many based in India, have been cited for violating manufacturing protocols that led to product recalls and, sometimes, bans on sending drugs to the U.S.

But Kevin Schulman, a professor and deputy director of the Clinical Excellence Research Center at the Stanford University School of Medicine, believes a solution is within reach. Schulman — who has also worked with an independent lab called Valisure that found impurities in some widely used medicines — argues the FDA should encourage testing by independent, accredited laboratories.

We recently spoke with him about the subject. This is an edited version of our conversation.

Continue to STAT+ to read the full story…

Clinicians empower schizophrenia patients with shared decisions and flexible treatment options (oral, long-acting injectable, or transdermal) to improve adherence, trust, and remission potential.

When OCD Is Loud, Trust Your Higher Power

by Annabella Hagen, LCSW

When I met Marie, she shared how faith and her connection with a Higher Power had always been important in her life. Her parents taught her that faith could be an anchor during hard times.

But Marie also had a genetic predisposition to obsessive compulsive disorder (OCD). When doubts and fears began to take over, she slowly lost confidence that she could ever feel peace again. Without knowing it, the more she tried to reason with the thoughts, fight them, or seek reassurance, the stronger they became.

Her OCD changed themes as she grew up. The voice within whispered different fears at different times:

“You may hurt the kids you’re babysitting.”
“You caused your granny’s pneumonia because you didn’t wash your hands well enough.”
“Am I going blind?”
“Why do these ugly images come into my head in sacred places? I must stop them.”

She tried to “fix” her doubts. But the more she focused on them, the more they grew. They distracted her from what mattered most — including her relationship with her Higher Power. She blamed herself for not feeling close to God. She felt ashamed and spiritually broken.

Many people with OCD blame themselves for their unwanted thoughts. They panic.

“Why would I think this?”
“What does this say about me?”
“Am I a terrible person?”

No matter what Marie did, she could not find certainty. She could not get enough reassurance. She wished she could control her thoughts and feelings. Because she couldn’t, she became very hard on herself. Her self-compassion slowly disappeared.

But here is something important: every human being — whether they have OCD or not — experiences disturbing thoughts, images, or impulses at times. Research going back decades, including studies like Rachman and de Silva (1978), shows that intrusive thoughts are common in the general population.

The difference is not the content of the thoughts. The difference is how often they come, how intense they feel, and how much distress they cause.

When someone without OCD has a strange thought, they may feel uncomfortable and say, “That was weird,” and move on.

But someone with OCD feels a strong need to solve the doubt. They may analyze it, argue with it, pray repeatedly, seek reassurance, or try to push it away. Without realizing it, these efforts make the thoughts louder and more frequent. This is how the OCD cycle grows.

Understanding this can bring hope. It means the problem is not your faith. It is the pattern.

And the good news is that OCD is not only genetic or neurological. It is also behavioral. That means you can learn to respond differently!

Thoughts and feelings are like the weather. They come and go. When we fight them or try to control them, they often stay longer.

You can learn to let them be.

Through Exposure and Response Prevention (ERP), you can practice moving toward what matters most — your faith, your family, your values — even when doubt is present. Instead of trying to silence the thoughts, you can choose not to follow the urge to fix them.

The first step is awareness.

You may already notice the unwanted thoughts. But can you notice how you respond?

Ask yourself gently:

  • Do I try to get rid of emotional pain right away?
  • Do I avoid situations because they trigger anxiety and doubts?
  • When I feel an urge, do I automatically act on it?
  • Can I see that thoughts are just thoughts, not facts?

These small moments of awareness begin to weaken the cycle.

As you practice new responses, you can begin shaping new pathways in your brain. Slowly, you can move closer to the connection with your Higher Power that you have been longing for.

Thoughts come and go. What matters most is what you choose to do.

You can act in faith and trust your Higher Power, even when the OCD voice is loud. That voice feels powerful, but it is not your identity. It does not define your relationship with God.

Change takes time. It takes practice. But it is possible. And it is worth it!

And you can find your way back!

Remember, although OCD may try to use your faith as a weapon, your faith is not the problem; the disorder is. OCD is a health condition that seeks certainty where faith invites trust.

If you find yourself in a cycle of “loud” thoughts and repetitive compulsions—like over-praying, seeking constant reassurance, or fearing you’ve lost your connection to the divine—know that healing is possible.

To help more individuals like Marie navigate these challenges, the International OCD Foundation has released a comprehensive new brochure specifically for people of faith.

Download the “OCD is Not What You Think It Is” Brochure here or visit the Faith & OCD Resource Page to find more specialized support and information.

The post When OCD Is Loud, Trust Your Higher Power appeared first on International OCD Foundation.

The Download: record-breaking subsea tunnels and flexible data centers

This is today’s edition of The Download, our weekday newsletter that provides a daily dose of what’s going on in the world of technology.

Inside the world’s deepest and longest subsea road tunnel

—Niall Firth

I’m currently around 1,000 feet beneath the North Sea, in a dark, dank cave. It smells weird. And I’m increasingly aware of the pressure from millions of tons of seawater just above my head.

I’m under the iconic fjords of Norway to visit what will soon become the world’s longest and deepest subsea road tunnel—an exceptional engineering feat that will carry drivers deep beneath the North Sea.

I’m here to understand how you make a 16.6-mile highway that sits 1,280 feet below the sea at its deepest point. And also—at a time when it can feel hard to get anything done—to reassure myself that ambitious engineering is still possible. That we can still make things. 

Step inside Norway’s Rogfast tunnel and see how engineers are making it happen.

This story is from the next edition of our magazine, which is all about engineering. Subscribe now to get a copy when it lands on Wednesday!

Want to get a data center online quickly? Give it some flex.

The AI boom is putting unprecedented pressure on the electric grid. But rather than rushing to build new power plants, companies could find part of the solution right under our noses—or, more precisely, in the transmission lines under our feet and above our heads.

If data centers can limit the power they draw during high-demand stretches, they won’t need to wait for big infrastructure upgrades or build their own off-grid generation.

The idea of flexibility isn’t entirely foreign to grid operators. But a new generation of software could make the process faster, smarter, and more precise for the AI era.

Find out how the challenge of powering AI could lead to a smarter, more flexible grid.

—Amos Zeeberg

The must-reads

I’ve combed the internet to find you today’s most fun/important/scary/fascinating stories about technology.

1 SK Hynix has overtaken Samsung as South Korea’s most valuable company
It’s also now the world’s most valuable memory chipmaker. (Reuters $)
+ And one of the biggest beneficiaries of the global AI boom. (BBC)
+ AI’s need for memory chips is set to skyrocket device prices. (WSJ $)

2 Trump says he no longer views Anthropic as a national security threat
“Well, not now, but a week ago, maybe,” he told The Axios Show. (Axios)
+ He praised the response of Anthropic CEO Dario Amodei. (Reuters $)
+ Anthropic’s IPO outcome could depend on the midterms. (WSJ $)
+ A culture war tactic against Anthropic has backfired. (MIT Technology Review)

3 SpaceX has received the lowest possible ESG rating
Index provider MSCI gave the company a triple C. (Financial Times $)
+ Russia got the same score after invading Ukraine. (Business Times)
+ Elon Musk previously called ESG metrics the “Devil Incarnate.” (CNBC)

4 A Tesla on Autopilot allegedly crashed into a Texas home and killed a woman
The driver said his Tesla Model 3 was in self-driving mode. (NYT $)
+ Tesla’s AI trainers don’t trust its self-driving tech. (Reuters $)

5 Polymarket reportedly paid creators to post fake betting videos
Clips showed them winning big on bets they would have really lost. (WSJ $)
+ Polymarket bets on an Iran deal are fueling insider-trading fears. (Bloomberg $)

6 Physicists have proposed that black holes don’t exist
They may be something much stranger: “gravastars.” (404 Media)
+ This is the first ever photo of a black hole. (MIT Technology Review)

7 A daring space rescue mission is set to launch this week
A spacecraft will try to lift an observatory into a safer orbit. (Space)
+ We’re putting more stuff into space than ever. (MIT Technology Review)

8 Nothing’s next budget phone has been cancelled due to “RAMageddon”
The company said memory prices pushed costs too high. (The Verge $)
+ Buying a used phone makes more sense than ever. (Wired $)

9 A viral doomsday scenario aims to pierce Europe’s AI complacency
It envisions the US and China tearing Europe into pieces. (Guardian)

10 Scientists have invented a way to brew espresso with ultrasonic waves
No hot water required. (Wired $)

Quote of the day

“Even before we start reaping the benefits of AI in our devices, we are already paying the bill.” 

—Francisco Jeronimo, an analyst at IDC, tells CNBC that consumers are covering the costs of the ongoing memory shortage.

One More Thing

How mobile money supercharged Kenya’s sports betting addiction

As the lorry he’d flagged down lurched through Kenya’s western highlands, Bill Kirwa’s Infinix smartphone dinged with a notification. The bet of 3,500 shillings he’d placed with mobile money—then worth approximately $35—had just turned into nearly $8,500.

Kirwa, now 26, put the windfall to good use, purchasing a car that enabled him to drive for Wasili, an Uber-style ride-hailing service. But he continued gambling, and over time, his losses mounted. In just a few years, he’s effectively erased his big win.  

Kirwa’s experience is hardly unique. Across Africa, the rapid spread of smartphones and mobile money has fueled an explosion in online gambling. But nowhere is the craze as acute as it is in Kenya. Find out why.

—Jonathan W. Rosen

We can still have nice things

A place for comfort, fun, and distraction to brighten up your day. (Got any ideas? Drop me a line.)

+ A clever Bengal cat has seemingly learned to understand English—and talk back.
+ This list of the 100 greatest bird names lovingly captures the quirks of avian taxonomy.
+ Darth Vader’s weird chestplate transforms into a cassette player in these reworked Star Wars clips.
+ Trace the history and evolution of heavy metal music through the interactive genres and playlists of Map of Metal.