This guide will walk you through the necessary steps to test a model like Llama on a local server using Linux and Ollama: we’ll download the desired Llama model, set up Ollama, and experiment with creating custom models by tweaking the configuration.
The latest version of Llama is 3.2, and it is available in several variants suited to different use cases, each with its own hardware requirements for optimal performance.
If you want to manually download Llama, you can find it here:
Current Llama models and hardware requirements:

| Model | Description and example use case | Requirements |
|---|---|---|
| Llama 3.2 1B | Lightweight and efficient model for use anywhere, including mobile devices or edge environments.<br>Example use case: customer service chatbots. | CPU: multi-core processor<br>RAM: minimum 16 GB<br>GPU: NVIDIA RTX series (for optimal performance), at least 4 GB VRAM<br>Storage: sufficient space for the model files (exact size not specified) |
| Llama 3.2 3B | Flexible, multilingual open model with superior reasoning and text/code generation capabilities.<br>Example use case: marketing campaign content generation, code creation, and debugging. | CPU: multi-core processor<br>RAM: minimum 16 GB<br>GPU: NVIDIA RTX series (for optimal performance), at least 8 GB VRAM<br>Storage: sufficient space for the model files (exact size not specified) |
| Llama 3.2 11B Vision | Next-generation open-source, multilingual, and multimodal model with a large open-source dataset.<br>Example use case: visual product analysis and recommendation systems. | CPU: high-performance processor with at least 16 cores (recommended: AMD EPYC or Intel Xeon)<br>RAM: minimum 64 GB; recommended 128 GB or more<br>GPU: high-performance GPU with at least 22 GB VRAM; recommended NVIDIA A100 (40 GB) or A6000 (48 GB); multiple GPUs can be used in parallel for production<br>Storage: NVMe SSD with at least 100 GB of free space (the model itself requires about 22 GB) |
| Llama 3.2 90B Vision | Next-generation open-source, multilingual, and multimodal model with a massive open-source dataset.<br>Example use case: complex data analysis and strategic decision-making assistance. | CPU: ultra-high-performance processor with at least 32 cores (recommended: latest-generation AMD EPYC or Intel Xeon)<br>RAM: minimum 256 GB; recommended 512 GB or more for optimal performance<br>GPU: latest-generation GPU with at least 180 GB of total VRAM to hold the full model; recommended NVIDIA A100 with 80 GB VRAM or more; for inference, multiple lower-tier GPUs in parallel may be an option<br>Storage: NVMe SSD with at least 500 GB of free space (the model alone takes about 180 GB) |
For this guide, we've used the latest version of Ubuntu 22.04, but if needed, Ollama is also available for Windows and Mac.
For more information about Ollama and its development, you can visit its GitHub repository or its website.
The first step is to install Ollama, a command-line tool for managing and running language models. The installation is straightforward using the following command:
curl -fsSL https://ollama.com/install.sh | sh
Once the installation is complete, you can verify that Ollama is correctly installed by running the ollama command in the terminal, which will display the available options.
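If you prefer a scriptable check, `ollama --version` prints the installed version. The sketch below guards the call so it stays harmless on machines where Ollama isn't installed yet:

```shell
# Verify the Ollama installation from a script.
# Guarded so the snippet also runs cleanly before Ollama is installed.
if command -v ollama >/dev/null 2>&1; then
  STATUS="installed: $(ollama --version 2>/dev/null)"
else
  STATUS="ollama not found in PATH"
fi
echo "$STATUS"
```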
Before running a model, it’s important to note that Ollama requires your system or virtual machine to have roughly twice as much RAM as the model’s file size; this headroom keeps inference running smoothly. For very large models, refer to the requirements outlined in the previous section.
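As a quick sketch of that rule of thumb, the script below compares the machine's total RAM (read from /proc/meminfo, so Linux only) against twice an assumed model size. The 2 GB figure for llama3.2:3b is an approximation; check it against the size `ollama list` reports on your system:

```shell
# Rough RAM check for the ~2x rule of thumb (Linux only).
# MODEL_GB is an assumption: the approximate on-disk size of llama3.2:3b.
MODEL_GB=2
NEED_GB=$((MODEL_GB * 2))
TOTAL_KB=$(awk '/MemTotal/ {print $2}' /proc/meminfo)
TOTAL_GB=$((TOTAL_KB / 1024 / 1024))
if [ "$TOTAL_GB" -ge "$NEED_GB" ]; then
  echo "OK: ${TOTAL_GB} GB RAM detected, ~${NEED_GB} GB recommended for this model"
else
  echo "Warning: only ${TOTAL_GB} GB RAM detected, ~${NEED_GB} GB recommended"
fi
```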
To run the Llama 3.2 model, use the following command in the terminal:
ollama run llama3.2
If you want to specify a particular version, you can use:
ollama run llama3.2:1b or ollama run llama3.2:3b
This will download the model (if you don’t already have it) and execute it in your working environment.
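Besides the interactive session, you can also pass a prompt directly on the command line for a one-shot answer. This sketch guards the call so it degrades gracefully on machines where Ollama isn't installed:

```shell
# One-shot prompt: ollama prints the model's reply and exits.
# Guarded so the snippet is safe to run before Ollama is installed.
if command -v ollama >/dev/null 2>&1; then
  REPLY=$(ollama run llama3.2:3b "Explain in one sentence what an LLM is.")
else
  REPLY="ollama is not installed on this machine"
fi
echo "$REPLY"
```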
To test other models or different configurations, you can explore the options in the Ollama model library.
Ollama allows you to create custom configurations for models using a file called Modelfile. In this file, you can specify the base model, configuration parameters, and personalization settings to create a unique experience.
To create a Modelfile, open a text editor like vi and write the following content:
FROM llama3.2:3b
# set the temperature to 1 [higher is more creative, lower is more coherent]
PARAMETER temperature 1
# set the system prompt
SYSTEM """
You are Mario from Super Mario Bros. Answer as Mario, the assistant, only.
"""
This file specifies that the base model is llama3.2:3b and configures the system so that the model responds as if it were Mario, the character from Super Mario Bros. It also adjusts the model's 'temperature,' which influences the creativity of the generated responses.
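Temperature is only one of several knobs a Modelfile exposes. Below is a hedged sketch of a few other commonly documented parameters; the values are illustrative, and you can inspect a model's effective configuration with `ollama show <model> --modelfile`:

```
FROM llama3.2:3b

# Context window size in tokens (the default is model-dependent)
PARAMETER num_ctx 4096
# Nucleus sampling: only consider tokens within the top 90% probability mass
PARAMETER top_p 0.9
# Penalize recently repeated tokens to reduce loops
PARAMETER repeat_penalty 1.1
# Fix the random seed for reproducible outputs
PARAMETER seed 42
```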
To save the file and create a new model with this configuration, use the following command:
ollama create mario -f Modelfile
Once you’ve created the new model, you can test it by running the following command:
ollama run mario
This command will launch the configured model, allowing Mario to respond to your questions in his unique style.
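The CLI is not the only way to talk to your custom model: Ollama also exposes a local REST API, by default on port 11434, which is what graphical front ends build on. A minimal sketch, assuming the `mario` model created above and a running Ollama server (setting `"stream": false` returns a single JSON object instead of a token stream):

```shell
# Query the local Ollama REST API (default port 11434).
# Falls back to a message when no Ollama server is reachable.
RESPONSE=$(curl -s http://localhost:11434/api/generate \
  -d '{"model": "mario", "prompt": "Who are you?", "stream": false}' \
  || echo "Ollama server not reachable")
echo "$RESPONSE"
```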
We will cover this in a separate article, but if you’d like a web interface for interacting with Ollama, a great option is to run open-webui on your server.