by Sylvain Artois on Mar 12, 2025
I recently went through the process of getting Pixtral-12B running on an NVIDIA Jetson Orin AGX. Here’s how I did it, in case it’s helpful for others who want to run this model on similar hardware. I’m on JetPack 6.2.
First, I checked the Hugging Face Transformers documentation: Pixtral is served through the LLaVA classes, which have no TensorFlow implementation, so PyTorch was the way to go. Installing PyTorch on Jetson requires setting the CUDA version:
# Add to your shell config (~/.zshrc in my case)
export CUDA_VERSION=12.6
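Before going further, it’s worth a quick sanity check that the toolkit JetPack installed really is CUDA 12.6 (nvcc may not be on your PATH, hence the full path):
# Should report release 12.6 on JetPack 6.2
/usr/local/cuda/bin/nvcc --version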
The target PyTorch build (NVIDIA’s nv24.06 release or later) has a hard dependency on cuSPARSELt. NVIDIA provides an installation script, but it isn’t compatible with the CUDA 12.6 that ships with JetPack 6.2. I had to write a custom installer for cuSPARSELt using the most recent build:
#!/bin/bash
# Run as root (sudo): it writes into /usr/local/cuda and calls ldconfig.
set -ex

# cuSPARSELt license: https://docs.nvidia.com/cuda/cusparselt/license.html
mkdir -p tmp_cusparselt && cd tmp_cusparselt

# For Jetson Orin with CUDA 12.6 (JetPack 6.2)
CUSPARSELT_NAME="libcusparse_lt-linux-aarch64-0.7.1.0-archive"
echo "Downloading: ${CUSPARSELT_NAME}"
# -f makes curl fail on HTTP errors, so set -e aborts if the download breaks
curl -fL --retry 3 -O https://developer.download.nvidia.com/compute/cusparselt/redist/libcusparse_lt/linux-aarch64/${CUSPARSELT_NAME}.tar.xz

# Extract and install into the CUDA toolkit directories
tar xf ${CUSPARSELT_NAME}.tar.xz
cp -a ${CUSPARSELT_NAME}/include/* /usr/local/cuda/include/
cp -a ${CUSPARSELT_NAME}/lib/* /usr/local/cuda/lib64/

cd ..
rm -rf tmp_cusparselt
ldconfig
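A quick way to confirm the install landed where the linker will find it (if the grep comes up empty, check that /usr/local/cuda/lib64 is covered by your ld.so.conf):
# The shared library should now be registered with the dynamic linker
ldconfig -p | grep cusparseLt
# And the header should sit in the CUDA include directory
ls /usr/local/cuda/include/cusparseLt.h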
After creating a conda virtual environment, I installed PyTorch using the wheel NVIDIA builds specifically for Jetson. Note that it’s published under the JetPack 6.1 (v61) path, but JetPack 6.1 and 6.2 both ship CUDA 12.6, so it works fine:
pip install --no-cache https://developer.download.nvidia.cn/compute/redist/jp/v61/pytorch/torch-2.5.0a0+872d972e41.nv24.08.17622132-cp310-cp310-linux_aarch64.whl
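To check that the wheel actually sees the GPU:
# Should print the nv24.08 build string and True
python -c "import torch; print(torch.__version__); print(torch.cuda.is_available())"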
Then I installed Hugging Face Transformers:
pip install transformers
When I tested whether Transformers was working correctly:
python -c "from transformers import pipeline; print(pipeline('sentiment-analysis')('we love you'))"
I got an error telling me to downgrade NumPy (the Jetson PyTorch wheel is built against NumPy 1.x, and pip had pulled in NumPy 2.x). Pinning NumPy and installing Pillow fixed it:
pip install --upgrade numpy==1.24.4
pip install pillow
The test command worked successfully.
Finally, I created a Python script based on the usage example from the Hugging Face model page for Pixtral-12B. The copied example sends the inputs to CUDA but leaves the model on the CPU and never prints the result, so I patched both; I also loaded the weights in bfloat16 to halve memory use. With that, the model ran successfully.
# Adapted from https://huggingface.co/mistral-community/pixtral-12b
import torch
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "mistral-community/pixtral-12b"
# Load in bfloat16 to halve the memory footprint versus fp32, and move the
# model to the GPU so it matches the inputs below
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16
).to("cuda")
processor = AutoProcessor.from_pretrained(model_id)

# The Pixtral processor accepts image URLs directly
IMG_URLS = [
    "https://picsum.photos/id/237/400/300",
    "https://picsum.photos/id/231/200/300",
    "https://picsum.photos/id/27/500/500",
    "https://picsum.photos/id/17/150/600",
]
PROMPT = "<s>[INST]Describe the images.\n[IMG][IMG][IMG][IMG][/INST]"

# Cast the floating-point inputs (pixel values) to the model dtype;
# integer tensors like input_ids are only moved, not cast
inputs = processor(text=PROMPT, images=IMG_URLS, return_tensors="pt").to("cuda", torch.bfloat16)
generate_ids = model.generate(**inputs, max_new_tokens=500)
output = processor.batch_decode(
    generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False
)[0]
print(output)
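Loading the 12B weights takes a while and eats a large share of the Orin’s unified memory. If you want to keep an eye on it while the model loads, Jetson’s built-in tegrastats utility prints RAM and GPU utilization once a second from a second terminal:
# Ctrl+C to stop
sudo tegrastats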
That’s it! The setup takes some time, especially the cuSPARSELt installation, but it’s worth it to get this powerful multimodal model running on edge hardware.