TL;DR: If AI is the core of your React Native app, don’t rely on wrappers like whisper.rn and llama.rn. They lag behind the native libraries, expose only a subset of the API, and you’ll spend more time on workarounds than building features. Go native for the AI layer.
Like many others, I've come to see AI as inevitable, and we are rapidly watching it become part of our daily lives. As developers we've been living with it for a while now, but I believe just recently we've all seen that there is no going back. It makes us faster. Maybe before we looked at it as an aid, a nice-to-have, something we could carry on without if it went away. We have the knowledge, we are capable engineers, we are good. But now I believe we're all seeing it as a must-have.
As the owner of a software development agency, I wasn't getting any projects that actually required me to dive deep into LLMs. Just one with minimal RAG usage, nothing crazy so far. Until last year, when a need in my own life made me dive in.
It was the need to record meetings with clients so I wouldn't miss requirements, or the details on bugs and features I sometimes lost while occasionally spacing out during calls. Nothing crazy, but I couldn't find a tool that gave me an offline, private, local LLM on a mobile device with transcription. I found some that transcribed locally and offline, but let's be honest, no one is reading through an hour or two's worth of text. I needed summaries and the ability to chat with my notes to find and recap things. So I set out to build the app myself. It would record audio, transcribe, summarize, extract key points and key dates, and let me chat with my notes. All local, all private, on-device LLM, no cloud, no subscriptions. So I built Viska.
I have been a hardcore React Native user since 2017. I picked it up immediately and never looked back. I was a fan and have spent nearly a decade utilizing it for every single mobile app project without exception, so naturally I was going to use it to build Viska. And I almost regret it.
Not because of React Native itself. React Native is fine. The problem is what sits between your JavaScript and the native APIs or “low level” external dependencies: wrappers. These are new times. For a regular RAG app hitting APIs like OpenRouter, or OpenAI or Anthropic directly, I wouldn't have batted an eye; that would have been a breeze. But we were building a fully on-device LLM experience that has to tap into all of the device's resources. We need speed, reliability, up-to-date libraries, and bleeding-edge updates are landing upstream constantly.
The Setup
Viska uses two main LLM-focused libraries:
- whisper.rn, a React Native wrapper around whisper.cpp
- llama.rn, a React Native wrapper around llama.cpp
This sounded great! I had everything I needed. whisper.cpp is the standard for transcription and I have it in React Native, and llama.cpp is the standard for local LLMs and I have it in React Native too. Great, right!?
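For context, this is roughly what the wrapper-based pipeline looks like from the JavaScript side. Treat it as a sketch, not copy-paste code: the option names are from memory and may differ between versions of whisper.rn and llama.rn, and the model paths are placeholders.

```ts
// Rough sketch of the wrapper-based pipeline: record -> transcribe -> summarize.
// Option names are approximate and may vary across whisper.rn / llama.rn versions.
import { initWhisper } from 'whisper.rn';
import { initLlama } from 'llama.rn';

async function transcribeAndSummarize(wavPath: string) {
  // Load the Whisper model (GGML file bundled or downloaded at runtime).
  const whisperContext = await initWhisper({
    filePath: 'file:///path/to/ggml-base.bin', // placeholder path
  });

  // whisper.rn only accepts WAV input -- see the format workaround below.
  const { promise } = whisperContext.transcribe(wavPath, { language: 'en' });
  const { result: transcript } = await promise;

  // Load the local LLM (GGUF file) for summarization / chat.
  const llamaContext = await initLlama({
    model: 'file:///path/to/model-q4_k_m.gguf', // placeholder path
    n_ctx: 4096,
    n_gpu_layers: 99, // GPU offload -- effectively iOS/Metal only, see below
  });

  const { text: summary } = await llamaContext.completion({
    prompt: `Summarize this meeting transcript with key points and key dates:\n\n${transcript}`,
    n_predict: 512,
  });

  return { transcript, summary };
}
```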
Wrappers Always Lag Behind
This is the core problem and everything else flows from it.
whisper.cpp gets a performance improvement. Great. Now you wait for the whisper.rn maintainer to update their bindings, test them, and publish a new version. That takes weeks, honestly sometimes months depending on the priority.
Meanwhile, you’re shipping a production app with a slower version of the engine or without a feature or improvement because there’s literally nothing you can do about it. You can’t just bump a dependency. The wrapper has to catch up first.
The AI space moves ridiculously fast. New quantization methods, new model architectures, performance optimizations, they land in whisper.cpp and llama.cpp weekly. The wrappers? They’re perpetually behind. It’s not the maintainers’ fault. It’s structural. A wrapper, by definition, reacts to upstream changes. It can never lead.
You Get the Subset, Not the Full API
This was the first issue I came across, and it was a brutal one. It took me weeks to work through. I had initially chosen react-native-nitro-sound for the audio recording side of things because I needed recording to be as native and as reliable as possible. When someone hits record on an important meeting, that recording better start immediately AND it better not crash. The user is counting on the app to keep recording while their phone sits there untouched. I can't guarantee that, but I'd better give it everything I've got to minimize the risk. I also added a crash guard, but that's a different story.
Anyway, whisper.cpp supports multiple audio formats: WAV, MP3, various encodings. whisper.rn? WAV only. And can you guess what react-native-nitro-sound doesn't support directly? Yeah. There are ways around it; on iOS, for example, I set the recording type to LPCM, which can be converted to WAV well enough to work with whisper.rn. But Android, oh boy, was that not the case. Nothing could be done about it because the output was not true WAV; it would simply not work. Impossible. I tried everything. After several painful days I just made Android use a completely different recording format, and even then I basically have to rewrite the metadata and the entire audio into WAV on device, without FFmpeg, because that's another React Native nightmare I'm not going to get into.
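The core trick behind that rewrite is nothing exotic: wrap the raw PCM samples in a standard 44-byte RIFF/WAVE header so the wrapper accepts the file. This isn't the exact code from Viska, just a minimal sketch assuming mono 16 kHz, 16-bit little-endian PCM; the real Android path also has to decode and resample first.

```ts
// Minimal sketch: wrap raw 16-bit PCM in a 44-byte WAV header so whisper.rn accepts it.
// Assumes mono, 16 kHz, 16-bit little-endian PCM input.
function pcmToWav(pcm: Uint8Array, sampleRate = 16000, channels = 1): Uint8Array {
  const bytesPerSample = 2; // 16-bit samples
  const blockAlign = channels * bytesPerSample;
  const byteRate = sampleRate * blockAlign;
  const buffer = new ArrayBuffer(44 + pcm.length);
  const view = new DataView(buffer);

  const writeString = (offset: number, s: string) => {
    for (let i = 0; i < s.length; i++) view.setUint8(offset + i, s.charCodeAt(i));
  };

  writeString(0, 'RIFF');
  view.setUint32(4, 36 + pcm.length, true); // RIFF chunk size
  writeString(8, 'WAVE');
  writeString(12, 'fmt ');
  view.setUint32(16, 16, true);             // fmt chunk size
  view.setUint16(20, 1, true);              // audio format: PCM
  view.setUint16(22, channels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, byteRate, true);
  view.setUint16(32, blockAlign, true);
  view.setUint16(34, 16, true);             // bits per sample
  writeString(36, 'data');
  view.setUint32(40, pcm.length, true);     // data chunk size

  new Uint8Array(buffer, 44).set(pcm);
  return new Uint8Array(buffer);
}
```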
And let’s be clear, this was not a bug. It’s how wrappers work. Every wrapper is an opinion about which parts of the native API matter. Sometimes that opinion doesn’t match what your app needs.
The Android GPU Problem
Once all that was done I jumped into llama.rn, and at first everything seemed smooth, until I started testing on Android. On my iPhone the LLM response was basically instant, and that iPhone has 8GB of RAM. When I tested on my Android phone, which has 16GB, TWICE as much, it was painfully slow! My gosh, what the hell, can I catch a break? I must have set something up wrong. I knew about Metal offloading on iOS, so I knew RAM isn't the only thing that matters, but twice as much RAM and I have to wait 3-5 seconds for a response?

Long story short, GPU access on Android is so fragmented across the ecosystem that there is no unified API to tap into. Manufacturers go different routes: Google uses its Tensor chip, Samsung is on Snapdragon. Makes me want to cry just thinking about it again. Anyway, whisper.cpp uses the latest GPU access on certain devices to offload work, but llama.rn doesn't, so I just had to sit this one out. Android will simply have to be slower for now, but I'm constantly keeping tabs.
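In practice the compromise looks something like the sketch below: ask for GPU offload where the wrapper supports it (Metal on iOS) and accept CPU-only inference on Android. The `n_gpu_layers` option mirrors llama.cpp's layer-offload setting; exact behavior depends on the llama.rn version, so treat this as an assumption rather than a guarantee.

```ts
// Sketch: offload layers to the GPU where the wrapper supports it (Metal on iOS),
// fall back to CPU on Android. Option semantics may vary by llama.rn version.
import { Platform } from 'react-native';
import { initLlama } from 'llama.rn';

async function loadLocalModel(modelPath: string) {
  return initLlama({
    model: modelPath,
    n_ctx: 4096,
    // iOS: push as many layers as possible to Metal.
    // Android: no unified GPU path exposed here, so stay on CPU for now.
    n_gpu_layers: Platform.OS === 'ios' ? 99 : 0,
  });
}
```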
Would You Do It Again?
If I were starting Viska today, I'd go native without hesitation. I didn't know how intense it would get, or how close to the bare metal I would need to go.
I mean that literally. Build the AI layer in Swift/Kotlin. Use whisper.cpp and llama.cpp directly. Take the hit on having platform-specific code for the part of your app that actually matters.
You can still use React Native for the UI if you want. But the AI engine, the thing your users are paying for, should talk directly to the native libraries. No middleman. No wrapper. No waiting.
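Concretely, that could mean exposing a thin custom native module whose Swift/Kotlin implementation calls whisper.cpp and llama.cpp directly, and keeping React Native purely for the UI. The sketch below is hypothetical, not a real library: the module name and method signatures are illustrative only.

```ts
// Hypothetical TurboModule spec for a "no middleman" AI engine.
// The Swift/Kotlin implementation behind it would link whisper.cpp and llama.cpp directly.
// NativeAIEngine and these method signatures are illustrative, not an existing package.
import type { TurboModule } from 'react-native';
import { TurboModuleRegistry } from 'react-native';

export interface Spec extends TurboModule {
  // Native side loads the GGML/GGUF models and owns the contexts.
  loadModels(whisperModelPath: string, llamaModelPath: string): Promise<boolean>;
  // Transcription runs entirely in native code against whisper.cpp.
  transcribe(audioPath: string, language: string): Promise<string>;
  // Completion runs against llama.cpp with whatever offloading the device supports.
  complete(prompt: string, maxTokens: number): Promise<string>;
}

export default TurboModuleRegistry.getEnforcing<Spec>('NativeAIEngine');
```

The trade-off is obvious: you get to bump whisper.cpp and llama.cpp on your own schedule, but you now maintain two native implementations of the same engine.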
“But that’s more work.” Yes. It is. You know what’s also more work? Debugging phantom crashes in a binding layer at 2 AM before a release. Waiting eight weeks for a wrapper update that unlocks a feature your competitor shipped last month. Writing workarounds for API limitations that don’t exist in the native library.
When Wrappers Are Fine
I’m not saying wrappers are always wrong. If AI is a minor feature in your app, maybe you’ve got a chat assistant tucked in a settings screen, or you’re doing some light text classification, a wrapper is probably fine. The convenience wins. The lag doesn’t matter as much because AI isn’t your differentiator.
But if AI is the product, if you’re keeping everything internal, if you need the latest and greatest, if it’s the reason someone downloads your app, you cannot afford a layer of indirection maintained on someone else’s schedule.
Reassess, Adjust, Now I Know
The AI ecosystem is moving way too fast for wrappers to keep up. That's not going to change. If anything, it's accelerating.
Every week there’s a new Whisper variant, a new competitive model, another company releasing something new, a new quantization scheme, a new LLaMA optimization. The native libraries absorb these almost immediately. The wrappers trail behind by weeks or months. And in a market where users compare your transcription quality to the app that updated last Tuesday, those weeks matter.
I shipped Viska with wrappers. It works. But it means I'll stay restricted, playing the waiting game no one should be playing with AI right now, and I just can't do that. Maybe I start moving everything to the native side. Maybe I rewrite fully native. One thing I know: in this era I can't rely on middlemen. I need to speak to the man himself.
Build accordingly.