AI Technology Trends

Boosting AI Agent Speed and Energy Efficiency

June 25, 2026

Key Takeaways

MIT and Microsoft researchers introduced Murakkab, an intelligent system that automates the design and deployment of agentic AI workflows.
By accepting high‑level intent descriptions from developers, Murakkab selects optimal models, tools, hardware configurations, and resource allocations on the fly.
In tests on video Q&A and code‑generation workloads, Murakkab used ≈35 % of the computation, ≈27 % of the energy, and < 25 % of the cost of traditional methods while preserving performance.
The system can also trade off accuracy for efficiency (for example, cutting energy use by >10× with only a 2 % accuracy loss), demonstrating its adaptability to user‑specified priorities.
Future work will scale Murakkab to larger clusters and more complex workflows, aiming to bring resource‑optimal agentic AI to major cloud platforms.

Introduction to Agentic Workflows and Their Inefficiencies

Agentic workflows are AI‑driven software systems that stitch together multiple models and external tools—such as databases, Python scripts, or specialized accelerators—to solve multi‑step tasks like video question answering or automated code generation. As these workflows grow in complexity, they increasingly become the backbone of cloud‑provider services. However, the current practice of hard‑coding every technical decision—model selection, tool chaining, hardware allocation, and trade‑off balancing—creates a fragmented design process. This rigidity often leads to over‑provisioning of computational units, wasted energy, and unnecessary expense, especially when new, more efficient models emerge after deployment.

The Motivation Behind Murakkab

Recognizing the “configuration conundrum” that plagues developers, a team from MIT’s Electrical Engineering and Computer Science department and Microsoft Azure set out to build a system that could intelligently streamline the entire lifecycle of agentic workflows. Lead author Gohar Chaudhry, an EECS graduate student, explained the problem succinctly:

“Agentic workflows are getting very complicated and quickly becoming the backbone of what cloud providers are doing. Energy usage is a huge concern, so we need to be very careful about how efficient these workflows are. It is very easy to over‑allocate resources, wasting energy and money. Enabling a cloud provider to intelligently make these workflows more resource‑optimal is a win for everyone involved.”

The resulting prototype, named Murakkab (an Urdu word meaning “a composition of things”), aims to remove the manual burden of configuration while continuously optimizing for user‑defined objectives such as cost, speed, or energy consumption.

How Murakkab Transforms Developer Intent into Optimized Workflows

Murakkab’s core innovation lies in its ability to accept high‑level, plain‑language specifications from developers instead of requiring exhaustive low‑level details. For example, a developer might simply state:

“Create a video Q&A application that extracts key frames, generates a transcript, and then answers user queries about the video.”

From this intent, Murakkab automatically:

Selects the best‑fit models and tools from a repository of available AI components.
Determines execution ordering, deciding which steps can run in parallel versus sequentially to maximize throughput.
Configures hardware resources (CPU, GPU, memory, etc.) in real time, adapting to the cloud provider’s current capacity and the workload’s demands.
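
The ordering step can be illustrated with a small sketch. It is not Murakkab's planner, whose internals the article does not describe. It only assumes that the video Q&A intent can be expressed as a dependency graph, and the step names below are hypothetical labels for the three stages in the example. Python's standard `graphlib` module then groups steps that have no dependency on one another into stages that could run in parallel.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency graph for the video Q&A example (an assumption,
# not Murakkab's internal representation). Each key is a step; each value
# is the set of steps that must finish before it can start.
workflow = {
    "extract_key_frames": set(),
    "generate_transcript": set(),
    "answer_query": {"extract_key_frames", "generate_transcript"},
}


def parallel_stages(deps):
    """Group steps into stages; steps within one stage can run concurrently."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every step whose inputs are done
        stages.append(ready)
        sorter.done(*ready)
    return stages


stages = parallel_stages(workflow)
print(stages)
# [['extract_key_frames', 'generate_transcript'], ['answer_query']]
```

Frame extraction and transcription share a stage because neither needs the other's output, while answering the query waits for both.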

Because these decisions are made dynamically, the system remains resilient to future advancements. As Chaudhry notes,

“The platform makes configuration decisions dynamically over time, so if a new model or GPU accelerator comes out tomorrow, the developer doesn’t need to worry about that.”

Runtime Optimization for Cloud‑Deployed Workflows

When a cloud provider instantiates a Murakkab‑generated workflow for a specific customer request, the system does not stop at static planning. It continuously monitors runtime metrics—latency, throughput, power draw—and re‑allocates resources to satisfy the user’s constraints. If a user prioritizes accuracy but still requires a response within a certain latency window, Murakkab will shift compute toward more precise (though potentially heavier) models while trimming excess capacity elsewhere.
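
A minimal sketch of this kind of constraint-driven choice follows. It assumes each candidate configuration has already been profiled for accuracy, latency, and energy. The `Config` type, the `pick_config` helper, the configuration names, and all numbers are made up for illustration and do not come from the Murakkab evaluation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Config:
    """One hypothetical model-plus-hardware option with profiled metrics."""
    name: str
    accuracy: float   # fraction of correct answers, 0..1
    latency_s: float  # end-to-end response time in seconds
    energy_j: float   # energy per request in joules


def pick_config(candidates: List[Config], max_latency_s: float) -> Optional[Config]:
    """Return the most accurate option within the latency budget.

    Ties on accuracy are broken in favour of lower energy. Returns None
    when no candidate meets the budget.
    """
    feasible = [c for c in candidates if c.latency_s <= max_latency_s]
    if not feasible:
        return None
    return max(feasible, key=lambda c: (c.accuracy, -c.energy_j))


# Illustrative numbers only (assumptions, not measured results).
candidates = [
    Config("small-model-on-cpu", accuracy=0.80, latency_s=4.0, energy_j=50.0),
    Config("medium-model-on-gpu", accuracy=0.88, latency_s=2.0, energy_j=120.0),
    Config("large-model-on-gpu", accuracy=0.93, latency_s=6.0, energy_j=400.0),
]

chosen = pick_config(candidates, max_latency_s=5.0)
print(chosen.name)  # medium-model-on-gpu
```

The largest model is the most accurate but misses the 5-second budget, so the sketch falls back to the most accurate option that fits. A real system would repeat this decision as runtime metrics change.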

This adaptive capability also gives cloud providers a global view of multiple workloads, enabling them to share computational resources more efficiently across tenants. Chaudhry highlights this benefit:

“Our system also gives cloud providers visibility into multiple workloads, so the provider can share computational resources in the most efficient manner while satisfying the constraints of users.”
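
To make the idea of sharing resources across tenants concrete, the sketch below packs workloads onto as few GPUs as possible with first-fit decreasing, a standard bin-packing heuristic. This is a swapped-in textbook technique, not the allocation method Murakkab uses, which the article does not specify. The tenant names and GPU fractions are hypothetical.

```python
def pack_first_fit_decreasing(demands, capacity=1.0):
    """Assign workloads to GPUs using the first-fit decreasing heuristic.

    demands maps a workload name to the fraction of one GPU it needs.
    Returns a list of GPUs, each a list of the workload names placed on it.
    """
    gpus = []  # each entry: {"free": remaining capacity, "workloads": [...]}
    # Place the largest demands first, which tends to waste less capacity.
    for name, need in sorted(demands.items(), key=lambda kv: kv[1], reverse=True):
        if need > capacity:
            raise ValueError(f"{name} needs more than one GPU")
        for gpu in gpus:
            if gpu["free"] >= need - 1e-9:  # small tolerance for float error
                gpu["workloads"].append(name)
                gpu["free"] -= need
                break
        else:
            # No existing GPU has room, so bring up a new one.
            gpus.append({"free": capacity - need, "workloads": [name]})
    return [gpu["workloads"] for gpu in gpus]


# Hypothetical per-tenant GPU demands (assumptions for illustration).
demands = {
    "tenant_a_video_qa": 0.6,
    "tenant_b_codegen": 0.5,
    "tenant_c_video_qa": 0.4,
    "tenant_d_codegen": 0.3,
}

placement = pack_first_fit_decreasing(demands)
print(placement)
# [['tenant_a_video_qa', 'tenant_c_video_qa'], ['tenant_b_codegen', 'tenant_d_codegen']]
```

Giving each tenant a dedicated GPU would use four devices here. With visibility into all four workloads, the same demand fits on two.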

Empirical Results: Substantial Savings Without Performance Loss

To validate Murakkab, the researchers evaluated it on a suite of representative agentic workloads, including video question answering and automated code generation. The outcomes were striking:

Computation: Murakkab consumed only ≈35 % of the compute units required by baseline, hand‑tuned workflows.
Energy: Energy usage dropped to ≈27 % of that required by traditional approaches.
Cost: Monetary cost fell to < 25 % of the conventional baseline.

Importantly, these reductions came without sacrificing task performance. When users are willing to give up a little accuracy, the gains can be larger still: in one illustrative case, the system lowered energy consumption by more than an order of magnitude (i.e., >10×) while incurring only a 2 % decrease in accuracy, a trade‑off many users would find acceptable given the efficiency gains.
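
The reported figures are stated as fractions of the baseline, so it can help to convert them into reductions and improvement factors. The short sketch below does only that arithmetic on the numbers quoted above. The helper names are illustrative, and the cost and energy-trade-off inputs are the stated bounds (0.25 and 0.10), so their results are bounds as well.

```python
def reduction_percent(fraction_of_baseline):
    """Percentage saved when usage falls to the given fraction of baseline."""
    return round((1.0 - fraction_of_baseline) * 100, 1)


def improvement_factor(fraction_of_baseline):
    """How many times less of the resource is used compared with baseline."""
    return round(1.0 / fraction_of_baseline, 2)


# Fractions of baseline quoted in the article.
compute, energy, cost_bound, tradeoff_energy_bound = 0.35, 0.27, 0.25, 0.10

print(reduction_percent(compute), improvement_factor(compute))        # 65.0 2.86
print(reduction_percent(energy), improvement_factor(energy))          # 73.0 3.7
print(reduction_percent(cost_bound), improvement_factor(cost_bound))  # 75.0 4.0
print(reduction_percent(tradeoff_energy_bound))                       # 90.0
```

Because cost fell below 25 % of the baseline, the saving is more than 75 %. Likewise, a >10× energy reduction means energy use below 10 % of the baseline, a saving of more than 90 %.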

Chaudhry remarked on the unexpected depth of optimization Murakkab uncovered:

“The system was also able to identify an unexpectedly ideal configuration for a model that selects video frames, optimizing performance for a video Q&A task. This type of optimization would be nearly impossible for a developer to do manually.”

Future Directions and Broader Implications

Encouraged by these results, the team plans to extend Murakkab along several fronts:

Scaling to larger computing clusters and more intricate workflows that involve dozens of agents and heterogeneous hardware.
Supporting emerging AI modalities (e.g., multimodal foundation models) and newer accelerators as they become available.
Exploring cross‑workload optimization at the scale of major cloud platforms, where millions of agentic tasks run concurrently.

The overarching vision is to make agentic AI intrinsically resource‑optimal, thereby reducing the environmental footprint of cloud services while preserving—or even enhancing—the functionality that users expect. As Chaudhry succinctly puts it,

“There is a lot of potential to make these workflows more resource‑optimal so they consume far less energy, but we need to be thinking about this at the scale of major cloud platforms.”

Conclusion

Murakkab represents a shift from static, developer‑centric workflow design to a dynamic, intent‑driven, self‑optimizing approach. By abstracting away low‑level configuration choices and continuously adapting to both user priorities and runtime conditions, the system cut computation by roughly 65 %, energy by roughly 73 %, and cost by more than 75 % in the reported tests, while maintaining performance comparable to traditional, manually tuned pipelines. As agentic workflows become ever more central to cloud‑based AI services, innovations like Murakkab will be essential for achieving sustainable, scalable, and economically viable AI at cloud scale.

https://news.mit.edu/2026/improving-ai-agent-speed-and-energy-efficiency-0625
