As artificial intelligence moves beyond large data centres and hyperscale infrastructure, the need for scalable, energy-efficient, and decentralised AI computing is becoming increasingly important. Igneta D’Souza, Co-founder & Chief Product Officer, Ziroh Labs, in conversation with Abhineet Kumar, Elets News Network, discusses how Kompact AI is redefining AI inferencing through a CPU-first approach, enabling distributed AI deployment, fostering digital sovereignty, and creating a new category within the global AI infrastructure stack.
At a time when hyperscalers are doubling down on GPU clusters, Ziroh Labs is betting on CPU-first AI. What structural inefficiencies in current AI infrastructure does Kompact AI aim to solve?
Hyperscalers address problems at a very large scale, which is why they are called hyperscalers. If an application needs to provide AI inference to thousands of users at the same time, that kind of problem is generally solved by hyperscalers, and for it there is no alternative to GPUs. Hyperscalers therefore deploy GPUs on a massive scale and use them to serve concurrent inference to many thousands of users simultaneously.
Kompact AI, on the other hand, is trying to solve a different problem. Our goal with Kompact AI is to solve everyday problems with AI: for example, an AI assistant for a doctor in their chamber (clinic), a small model in a retail shop or a shopping mall, a very good model for learning, say, Japanese in a school, a small model that works in a factory setting, and so on.
In these problem statements, you do not have 10,000 users accessing the system at the same time. For example, there is no school in India where 10,000 students will learn Japanese at the same time; perhaps 100 students will, not 10,000.
There is no factory in India where 10,000 biscuit packets come off the line at once and each has to be checked for whether the biscuits are good or bad. Therefore, the problems we are solving with Kompact AI are more distributed and decentralised, and they can be solved using small language models because each requirement is narrow and specific.
A factory that wants to solve the problem of determining biscuit quality is only trying to determine biscuit quality. It is not trying to solve a different problem, like helping a student learn Japanese. Similarly, in a school, when a student is learning Japanese, his problem is not solving the biscuit problem at the factory.
Therefore, the goals of hyperscalers, the problem statements they are trying to solve, and the problem statements we are trying to solve with Kompact AI are different. But ultimately, the goal of each one of us, Kompact AI as well as hyperscalers, is only one thing: “How can you make AI pervasive, ubiquitous, and help people to uplift their lives?” Approaches differ because problem statements differ.
Running LLM inference on CPUs has traditionally faced latency and efficiency challenges. What architectural or algorithmic innovations allowed you to achieve 3x performance without compromising model quality?
We have designed several techniques that allow us to run a model on both CPUs and GPUs, retaining full quality while achieving very high throughput. We must emphasise that speed is only one dimension; the second important dimension of AI inference is quality.
High speed without quality makes no sense: inference has to be of high quality and run at adequate speed. In line with our answer to your first question, we are not targeting models with more than 32 billion parameters.
Nor are we targeting use cases with 10,000 concurrent users. Therefore, with the inventions we have made at Ziroh Labs and Kompact AI over the last two to three years, we are able to maintain full quality and provide high throughput.
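For a rough sense of scale behind the 32-billion-parameter ceiling, the memory needed just to hold a model's weights is parameters × bits per weight ÷ 8 bytes. The sketch below is a generic back-of-envelope calculation, not a description of how Kompact AI stores or runs models: it ignores the KV cache, activations and runtime overhead, and the precisions shown are illustrative assumptions, since the interview does not say which one Kompact AI uses.

```python
def weight_memory_gb(n_params: float, bits_per_weight: int) -> float:
    # Bytes needed for the weights alone (no KV cache, activations or runtime
    # overhead), expressed in decimal gigabytes (1 GB = 1e9 bytes).
    return n_params * bits_per_weight / 8 / 1e9


# Illustrative precisions only; not taken from the interview.
for bits in (32, 16, 8):
    print(f"32B parameters at {bits}-bit: {weight_memory_gb(32e9, bits):.0f} GB")
```

This prints 128 GB, 64 GB and 32 GB respectively, which gives a sense of the system memory a CPU host would need for a model at that ceiling before any other overheads are counted.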
How does Kompact AI optimise across model compression, memory management, and parallelisation to make CPUs competitive with GPU-based inference?
We have developed multiple proprietary techniques that extract the algebraic components of a model and utilise them to create a highly efficient execution plan for a CPU.
It is very important to realise that we work on the mathematical constructs of a model. These include grouped-query attention, the various kinds of feed-forward networks, the various kinds of normalisation techniques, the way these components are composed, and, of course, the composition of a model's different blocks.
So the work we primarily do is on algebraic structures. Models are designed from these algebraic structures, and we apply multiple techniques to optimise them.
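The interview does not describe Kompact AI's proprietary techniques. Purely to illustrate the kinds of constructs named above, the pure-Python sketch below implements standard grouped-query attention (several query heads sharing one key/value head) for a single query position, RMSNorm as one example of a normalisation technique, and a pre-norm residual connection as one common way blocks are composed. The choice of RMSNorm and pre-norm composition, and all function names, are assumptions for illustration; this is not Ziroh Labs' code or execution plan.

```python
import math
from typing import Callable, List

Vector = List[float]


def softmax(xs: Vector) -> Vector:
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def rms_norm(x: Vector, weight: Vector, eps: float = 1e-6) -> Vector:
    # RMSNorm: divide x by its root-mean-square, then scale by a learned weight.
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for v, w in zip(x, weight)]


def grouped_query_attention(
    q_heads: List[Vector],
    k_heads: List[List[Vector]],
    v_heads: List[List[Vector]],
) -> List[Vector]:
    # Attention for a single query position.
    # q_heads: n_q query vectors, one per query head.
    # k_heads / v_heads: n_kv heads, each a list of T key (or value) vectors.
    n_q, n_kv = len(q_heads), len(k_heads)
    if n_q % n_kv != 0:
        raise ValueError("query heads must be a multiple of KV heads")
    group_size = n_q // n_kv
    outputs = []
    for h, q in enumerate(q_heads):
        kv = h // group_size  # consecutive query heads share one KV head
        keys, values = k_heads[kv], v_heads[kv]
        scale = 1.0 / math.sqrt(len(q))
        weights = softmax([scale * sum(a * b for a, b in zip(q, k)) for k in keys])
        dim = len(values[0])
        outputs.append([sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)])
    return outputs


def pre_norm_residual(x: Vector, sublayer: Callable[[Vector], Vector], norm_weight: Vector) -> Vector:
    # One common way to compose blocks: x + sublayer(norm(x)).
    y = sublayer(rms_norm(x, norm_weight))
    return [a + b for a, b in zip(x, y)]


# Four query heads sharing two KV heads, each KV head holding two positions.
q = [[0.0, 0.0], [0.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
k = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]]
v = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]
print(grouped_query_attention(q, k, v))  # [[2.0, 3.0], [2.0, 3.0], [6.0, 7.0], [6.0, 7.0]]
```

Sharing each key/value head across a group of query heads shrinks the KV cache by the group size, which is one reason grouped-query attention is common in smaller models; the example shows heads 0 and 1 reading the first KV head and heads 2 and 3 reading the second.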
One point we want to make here is that the techniques we have developed are not just for CPUs; they can also be deployed on GPUs. When they are, the throughput you will observe will be higher than what you get today.
And we are very happy to share that in the coming months, within a couple of quarters, Kompact AI will also be available on GPUs.
In markets like India and other emerging economies, where costs and access to computing are major barriers, how do you see CPU-based AI reshaping enterprise and public-sector adoption?
In India and other emerging economies, the problems are broadly similar. There is a large population, and resources are almost nonexistent or very constrained. Solutions to such problems must therefore run on readily available systems and must not require new energy provisioning, so that they can be scaled quickly. Returning to our answer to the first question, Kompact AI is geared towards distributed and decentralised systems.
It’s not about creating a single monolithic AI application and deploying it to serve 1.4 billion people. Kompact AI is about creating small, elegant and efficient solutions that can be deployed to serve 1.4 billion people with their various needs, in a distributed, decentralised manner, without any centralisation.
Because Kompact AI is distributed and decentralised by design, it supports the localisation and sovereignty of AI by default.
Your collaboration with IIT Madras and IITM Pravartak suggests deep research integration. How critical is academia-industry collaboration in building globally competitive deep-tech products from India?
Yes, we are collaborating not only with IIT Madras, but also very closely with IIT Guwahati. There is a lot of work underway in this regard.
We feel that, because Kompact AI is a work of science and engineering, it is very important for us to be associated with institutions that focus heavily on science and deep engineering, which is why we work with them.
With the rapid evolution of edge AI and distributed computing, do you see Kompact AI playing a role beyond data centres, perhaps in edge devices, telecom networks, or industrial systems?
Yes, Kompact AI is designed precisely for those systems that you mentioned: edge devices, telecom networks, or industrial systems.
How do you position Ziroh Labs in the global AI infrastructure stack? Are you building an alternative to GPU-led ecosystems or enabling a hybrid compute future?
In the global AI infrastructure stack, we are actually creating a new category: a new inferencing architecture that is scalable, does not require new energy provisioning, and can be deployed in every nook and corner of the globe with very limited resources.