This guide will walk you through the necessary steps to test a model like Llama on a local server using Linux and Ollama: we’ll download the desired Llama model, set up Ollama, and experiment with creating custom models by tweaking the configuration.
The latest version of Llama is 3.2, and it is available in several variants suited to different use cases, each with its own hardware requirements for optimal performance.
If you want to manually download Llama, you can find it here:
Current Llama models and hardware requirements:

| Model | Description and example use case | Requirements |
|---|---|---|
| Llama 3.2 1B | Lightweight and efficient model for use anywhere, including mobile devices or edge environments.<br>Example use case: customer service chatbots. | CPU: multi-core processor<br>RAM: minimum 16 GB<br>GPU: NVIDIA RTX series (for optimal performance), at least 4 GB VRAM<br>Storage: sufficient space for the model files (exact size not specified) |
| Llama 3.2 3B | Flexible, multilingual open model with superior reasoning and text/code generation capabilities.<br>Example use case: marketing campaign content generation, code creation, and debugging. | CPU: multi-core processor<br>RAM: minimum 16 GB<br>GPU: NVIDIA RTX series (for optimal performance), at least 8 GB VRAM<br>Storage: sufficient space for the model files (exact size not specified) |
| Llama 3.2 11B Vision | Next-generation open-source, multilingual, and multimodal model with a large open-source dataset.<br>Example use case: visual product analysis and recommendation systems. | CPU: high-performance processor with at least 16 cores (recommended: AMD EPYC or Intel Xeon)<br>RAM: minimum 64 GB; recommended 128 GB or more<br>GPU: high-performance GPU with at least 22 GB VRAM; recommended NVIDIA A100 (40 GB) or A6000 (48 GB); multiple GPUs can be used in parallel for production<br>Storage: NVMe SSD with at least 100 GB of free space (the model itself requires about 22 GB) |
| Llama 3.2 90B Vision | Next-generation open-source, multilingual, and multimodal model with a massive open-source dataset.<br>Example use case: complex data analysis and strategic decision-making assistance. | CPU: ultra-high-performance processor with at least 32 cores (recommended: latest-generation AMD EPYC or Intel Xeon)<br>RAM: minimum 256 GB; recommended 512 GB or more for optimal performance<br>GPU: latest-generation GPU with at least 180 GB of total VRAM to hold the full model; recommended NVIDIA A100 with 80 GB VRAM or more; for inference, multiple lower-tier GPUs in parallel may be an option<br>Storage: NVMe SSD with at least 500 GB of free space (the model alone takes about 180 GB) |
For this guide, we've used the latest version of Ubuntu 22.04, but if needed, Ollama is also available for Windows and Mac.
For more information about Ollama and its development, you can visit its GitHub repository or its website.
The first step is to install Ollama, a command-line tool for managing and running language models. The installation is straightforward using the following command:
curl -fsSL https://ollama.com/install.sh | sh
Once the installation is complete, you can verify that Ollama is correctly installed by running the ollama command in the terminal, which will display the available options.
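If you prefer a scriptable check, `ollama --version` prints the installed version. The sketch below guards the call so it stays harmless on machines where Ollama isn't installed yet:

```shell
# Verify the Ollama installation from a script.
# Guarded so the snippet also runs cleanly before Ollama is installed.
if command -v ollama >/dev/null 2>&1; then
  STATUS="installed: $(ollama --version 2>/dev/null)"
else
  STATUS="ollama not found in PATH"
fi
echo "$STATUS"
```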
Before running a model, it’s important to note that Ollama requires your system or virtual machine to have roughly twice as much RAM as the model’s file size; this headroom keeps inference running smoothly. For very large models, refer to the requirements outlined in the previous section.
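As a quick sketch of that rule of thumb, the script below compares the machine's total RAM (read from /proc/meminfo, so Linux only) against twice an assumed model size. The 2 GB figure for llama3.2:3b is an approximation; check it against the size `ollama list` reports on your system:

```shell
# Rough RAM check for the ~2x rule of thumb (Linux only).
# MODEL_GB is an assumption: the approximate on-disk size of llama3.2:3b.
MODEL_GB=2
NEED_GB=$((MODEL_GB * 2))
TOTAL_KB=$(awk '/MemTotal/ {print $2}' /proc/meminfo)
TOTAL_GB=$((TOTAL_KB / 1024 / 1024))
if [ "$TOTAL_GB" -ge "$NEED_GB" ]; then
  echo "OK: ${TOTAL_GB} GB RAM detected, ~${NEED_GB} GB recommended for this model"
else
  echo "Warning: only ${TOTAL_GB} GB RAM detected, ~${NEED_GB} GB recommended"
fi
```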
To run the Llama 3.2 model, use the following command in the terminal:
ollama run llama3.2
If you want to specify a particular version, you can use:
ollama run llama3.2:1b or ollama run llama3.2:3b
This will download the model (if you don’t already have it) and execute it in your working environment.
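Besides the interactive session, you can also pass a prompt directly on the command line for a one-shot answer. This sketch guards the call so it degrades gracefully on machines where Ollama isn't installed:

```shell
# One-shot prompt: ollama prints the model's reply and exits.
# Guarded so the snippet is safe to run before Ollama is installed.
if command -v ollama >/dev/null 2>&1; then
  REPLY=$(ollama run llama3.2:3b "Explain in one sentence what an LLM is.")
else
  REPLY="ollama is not installed on this machine"
fi
echo "$REPLY"
```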
To test other models or different configurations, you can explore the options in the Ollama model library.
Ollama allows you to create custom configurations for models using a file called Modelfile. In this file, you can specify the base model, configuration parameters, and personalization settings to create a unique experience.
To create a Modelfile, open a text editor like vi and write the following content:
FROM llama3.2:3b
# set the temperature to 1 [higher is more creative, lower is more coherent]
PARAMETER temperature 1
# set the system prompt
SYSTEM """
You are Mario from Super Mario Bros. Answer as Mario, the assistant, only.
"""
This file specifies that the base model is llama3.2:3b and configures the system so that the model responds as if it were Mario, the character from Super Mario Bros. It also adjusts the model's 'temperature,' which influences the creativity of the generated responses.
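Temperature is only one of several knobs a Modelfile exposes. Below is a hedged sketch of a few other commonly documented parameters; the values are illustrative, and you can inspect a model's effective configuration with `ollama show <model> --modelfile`:

```
FROM llama3.2:3b

# Context window size in tokens (the default is model-dependent)
PARAMETER num_ctx 4096
# Nucleus sampling: only consider tokens within the top 90% probability mass
PARAMETER top_p 0.9
# Penalize recently repeated tokens to reduce loops
PARAMETER repeat_penalty 1.1
# Fix the random seed for reproducible outputs
PARAMETER seed 42
```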
To save the file and create a new model with this configuration, use the following command:
ollama create mario -f Modelfile
Once you’ve created the new model, you can test it by running the following command:
ollama run mario
This command will launch the configured model, allowing Mario to respond to your questions in his unique style.
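The CLI is not the only way to talk to your custom model: Ollama also exposes a local REST API, by default on port 11434, which is what graphical front ends build on. A minimal sketch, assuming the `mario` model created above and a running Ollama server (setting `"stream": false` returns a single JSON object instead of a token stream):

```shell
# Query the local Ollama REST API (default port 11434).
# Falls back to a message when no Ollama server is reachable.
RESPONSE=$(curl -s http://localhost:11434/api/generate \
  -d '{"model": "mario", "prompt": "Who are you?", "stream": false}' \
  || echo "Ollama server not reachable")
echo "$RESPONSE"
```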
We will cover this in a separate article, but if you’d like a web interface for interacting with Ollama, a great option is to run open-webui on your server.