Running Microsoft Phi-2 on Ollama and LlamaIndex Using an NVIDIA Tesla T4 GPU

Use Ollama + LlamaIndex + Kaggle GPU 🦙

Korkrid Kyle Akepanidtaworn
8 min read · Mar 29, 2024

Ollama Series’ Articles

Introducing Kaggle GPU

In my previous article, I explained how you can run LLMs remotely at no cost using Google Colab. This time, I want to shift the focus to Kaggle, a data science competition platform and online community of data scientists and machine learning practitioners owned by Google. As many of you may already know, Kaggle provides notebook editors with free access to NVIDIA Tesla P100 GPUs. These GPUs are valuable for training deep learning models, although they don’t accelerate most other workflows (for example, libraries like pandas and scikit-learn don’t benefit from GPU access). You can use up to 30 hours of GPU time per week, with individual sessions running for up to 9 hours.
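Before burning any of that weekly quota, it helps to confirm the notebook session actually sees a GPU. A minimal sketch, using only the Python standard library to probe for the NVIDIA driver tool `nvidia-smi` (the function name and approach are my own illustration, not from the article):

```python
import shutil
import subprocess


def gpu_available() -> bool:
    """Return True if `nvidia-smi` is on PATH and reports a healthy driver.

    In a Kaggle notebook with the GPU accelerator enabled, `nvidia-smi`
    is installed and exits with status 0; on a CPU-only session it is
    usually absent, so this cheap check avoids burning GPU quota blindly.
    """
    if shutil.which("nvidia-smi") is None:
        return False
    try:
        subprocess.run(["nvidia-smi"], check=True, capture_output=True)
        return True
    except subprocess.CalledProcessError:
        return False


print("GPU visible:", gpu_available())
```

Running the same check inside a session with the accelerator switched off is also a quick way to verify you are not accidentally on a GPU machine while doing CPU-only work.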

Here are some tips and tricks to get the most out of your GPU usage on Kaggle:

  1. Only turn on the GPU if you plan on using the GPU: GPUs are only helpful if your code takes advantage of GPU-accelerated libraries (e.g., TensorFlow or PyTorch). But…
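The first tip can be made concrete with a small device-selection guard: request `"cuda"` only when a CUDA-capable PyTorch install is actually present, and otherwise stay on `"cpu"`. This is a hedged sketch of my own (the helper `pick_device` is not from the article), written so it degrades gracefully even when PyTorch is not installed:

```python
import importlib.util


def pick_device() -> str:
    """Choose 'cuda' only when a CUDA-capable PyTorch is available.

    pandas and scikit-learn workloads gain nothing from the GPU, so
    defaulting to 'cpu' avoids wasting the weekly Kaggle GPU quota.
    """
    # Probe for torch without failing on CPU-only environments.
    if importlib.util.find_spec("torch") is not None:
        import torch

        if torch.cuda.is_available():
            return "cuda"
    return "cpu"


device = pick_device()
print("Selected device:", device)
```

In a deep learning notebook you would then pass `device` to your model and tensors (e.g., `model.to(device)` in PyTorch); in a pure pandas/scikit-learn notebook, the function simply confirms there is no reason to enable the accelerator at all.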

AI Specialized CSA @ Microsoft | Enterprise AI, GenAI, LLM, LLamaIndex, ML | GenAITechLab Fellow, MScFE at WorldQuant, MSDS at CU Boulder