Run LLaMA 3.2 Offline in React Native with Just a Few Lines of Code!
Imagine running advanced AI models like LLaMA 3.2 completely offline within your React Native app — no internet needed, no cloud dependencies. Thanks to ExecuTorch, this dream is now a reality! In this blog, we’ll guide you through integrating LLaMA 3.2 into your mobile app, ensuring privacy, speed, and simplicity.
What is ExecuTorch?
ExecuTorch, developed by Meta, allows PyTorch models to run efficiently on mobile devices and microcontrollers. By converting models into standalone binaries, ExecuTorch ensures that models run locally, enhancing privacy and reducing costs.
Why Use React Native ExecuTorch for Offline AI?
With React Native ExecuTorch, AI models like LLaMA 3.2 can run offline, directly on devices. This means:
- Complete Privacy: No cloud, no data sharing.
- Low Latency: Fast, local inference.
- Cost Efficiency: Save on cloud infrastructure costs.
Getting Started with LLaMA 3.2 Offline
Step 1: Installation
Install the library using your favorite package manager:
npm install react-native-executorch
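The library ships native code, so on iOS you will typically also need to install pods after adding the package (the standard React Native step, shown here for completeness):
cd ios && pod install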
Configure Metro to recognize model binaries:
// metro.config.js
defaultConfig.resolver.assetExts.push('pte');
defaultConfig.resolver.assetExts.push('bin');
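If your project doesn't already have a metro.config.js, a complete file might look like the sketch below. It assumes a bare React Native project using @react-native/metro-config; Expo projects would pull the default config from expo/metro-config instead.
// metro.config.js — minimal sketch for a bare React Native project (adjust for Expo)
const { getDefaultConfig } = require('@react-native/metro-config');

const defaultConfig = getDefaultConfig(__dirname);

// Let Metro bundle ExecuTorch model (.pte) and tokenizer (.bin) files as assets
defaultConfig.resolver.assetExts.push('pte');
defaultConfig.resolver.assetExts.push('bin');

module.exports = defaultConfig;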
Step 2: Running LLaMA 3.2 in React Native
1) Initialize the Model:
Load the model and tokenizer for offline use:
import { useLLM, LLAMA3_2_1B_URL } from 'react-native-executorch';
const llama = useLLM({
  modelSource: LLAMA3_2_1B_URL,
  tokenizer: require('../assets/tokenizer.bin'),
  contextWindowLength: 3,
});
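The contextWindowLength option controls how many messages from the current conversation the model takes into account when generating a reply; larger windows give more context but increase inference time and memory use on device.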
2) Send a Message:
await llama.generate('Tell me about ExecuTorch!');
3) Display the Response:
<Text>{llama.response}</Text>
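Putting the three steps together, a minimal chat screen could look like the sketch below. It relies only on the generate and response fields shown above; the ChatScreen component, the ask helper, and the input/layout wiring are illustrative, not part of the library.
import React, { useState } from 'react';
import { Button, Text, TextInput, View } from 'react-native';
import { useLLM, LLAMA3_2_1B_URL } from 'react-native-executorch';

export default function ChatScreen() {
  const [prompt, setPrompt] = useState('');

  // Load the model and tokenizer once; all inference then runs on-device
  const llama = useLLM({
    modelSource: LLAMA3_2_1B_URL,
    tokenizer: require('../assets/tokenizer.bin'),
    contextWindowLength: 3,
  });

  const ask = async () => {
    // llama.response holds the generated text and is rendered below
    await llama.generate(prompt);
  };

  return (
    <View style={{ padding: 16 }}>
      <TextInput value={prompt} onChangeText={setPrompt} placeholder="Ask LLaMA 3.2 something" />
      <Button title="Send" onPress={ask} />
      <Text>{llama.response}</Text>
    </View>
  );
}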
How Offline AI Transforms Mobile Apps
Running AI models offline in React Native brings several benefits:
- Data Security: User data stays on the device, ensuring privacy.
- Faster Performance: No network delays mean instant responses.
- Cost Savings: Avoid cloud service fees and reduce app costs.
Exporting Your Own LLaMA 3.2 Model
Need a custom model? Here’s how to export LLaMA 3.2 for offline use:
- Create an Account: Sign up on HuggingFace or the official LLaMA site.
- Select Your Model: Choose from various versions, including quantized models for better efficiency.
- Download Files: Get the model (consolidated.00.pth), parameters (params.json), and tokenizer.
- Rename Tokenizer: Rename tokenizer.model to tokenizer.bin.
- Run Export Script:
./build_llama_binary.sh --model-path /path/to/consolidated.00.pth --params-path /path/to/params.json
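Once the export finishes you'll have a .pte binary. Because Metro was configured to treat .pte and .bin files as assets, you can bundle the exported model with your app and point the hook at it instead of the hosted URL. A sketch is below; llama3_2.pte is just a placeholder name for your exported file.
const llama = useLLM({
  // Bundled, exported model instead of the hosted LLAMA3_2_1B_URL
  modelSource: require('../assets/llama3_2.pte'),
  tokenizer: require('../assets/tokenizer.bin'),
  contextWindowLength: 3,
});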
Conclusion
With React Native ExecuTorch, running AI models like LLaMA 3.2 offline has never been easier. Whether you’re building chatbots, translators, or other AI-driven apps, this solution ensures privacy, efficiency, and simplicity.
📲 Ready to revolutionize your mobile app? Start building smarter, faster, and offline!
Check out Software Mansion's live example repo to see ExecuTorch in action! 👀🦙
https://github.com/software-mansion/react-native-executorch/tree/main/examples/llama