Set Up and Run DeepSeek R1 in a Local Environment

DeepSeek R1 is a powerful conversational AI model, and integrating it with the Chainlit UI lets you create an interactive, user-friendly interface. Ollama makes it easy to run LLMs locally, so deploying and interacting with models like DeepSeek R1 stays simple.

Install Ollama

To download Ollama, visit the official Ollama GitHub Repository and select the version compatible with your operating system (Windows, macOS, or Linux). Alternatively, you can pull the Docker image if preferred.
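If you go the Docker route, the two commands below are a minimal sketch of the CPU-only setup from the Ollama Docker documentation; GPU support needs additional flags, so check the docs for your platform.
docker pull ollama/ollama
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama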
For Linux, open the terminal and run the command:
curl -fsSL https://ollama.com/install.sh | sh
For macOS and Windows, simply run the installer and follow the on-screen instructions to complete the installation.
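Whichever route you take, a quick way to confirm that the CLI is available is to check its version:
ollama --version
If this prints a version number, Ollama is installed and on your PATH.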

Run DeepSeek R1 using Ollama

Ollama simplifies model management. Choose the DeepSeek R1 variant that matches your local hardware; I am able to run the 8b model on an Apple M1 Pro with 16GB of memory. To set up DeepSeek R1, run:
ollama pull deepseek-r1:8b
This command fetches the DeepSeek R1 model and prepares it for local use.
To start an interactive session with DeepSeek R1, use the following command:
ollama run deepseek-r1:8b
This will launch the model, allowing you to interact with it directly in your terminal. Once the model is running, you can start typing prompts and receive responses in real-time.
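To see which models are already downloaded, run:
ollama list
Typing /bye at the prompt ends the interactive session.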

Integrate Chainlit UI

Install Chainlit using pip. Make sure you have Python installed.
pip install chainlit
Create a new Python script, for example app.py, and use it to connect Chainlit with DeepSeek R1 via Ollama. For a working example, refer to the Chainlit Cookbook on GitHub, which provides sample code and additional guidance for integrating Chainlit with DeepSeek R1 via Ollama. DeepSeek R1 streams its chain-of-thought between <think> and </think> tags, and the script below separates that reasoning from the final answer. To run the code below, you also need to install the OpenAI package, the same way you installed Chainlit.
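If you have not installed it yet, the command mirrors the Chainlit one:
pip install openai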
app.py
import time

from openai import AsyncOpenAI
import chainlit as cl

# Point the OpenAI client at the local Ollama server (OpenAI-compatible endpoint)
client = AsyncOpenAI(
    api_key="ollama",
    base_url="http://localhost:11434/v1/"
)


@cl.on_message
async def on_message(msg: cl.Message):
    start = time.time()

    stream = await client.chat.completions.create(
        model="deepseek-r1:8b",
        messages=[
            {"role": "system", "content": "You are a helpful assistant"},
            *cl.chat_context.to_openai()
        ],
        stream=True
    )

    thinking = False

    # Streaming the thinking: tokens between <think> and </think> go into a
    # collapsible "Thinking" step, everything else into the final answer
    async with cl.Step(name="Thinking") as thinking_step:
        final_answer = cl.Message(content="")

        async for chunk in stream:
            delta = chunk.choices[0].delta

            if delta.content == "<think>":
                thinking = True
                continue

            if delta.content == "</think>":
                thinking = False
                thought_for = round(time.time() - start)
                thinking_step.name = f"Thought for {thought_for}s"
                await thinking_step.update()
                continue

            if thinking:
                await thinking_step.stream_token(delta.content)
            else:
                await final_answer.stream_token(delta.content)

    await final_answer.send()
Once the script is ready, you can run the Chainlit application using the following command:
chainlit run app.py -w
This will launch the Chainlit interface in your browser at http://localhost:8000; the -w flag enables auto-reload, so the app restarts whenever you edit the script.

Integrate with Custom Applications

Ollama provides an API for integrating the model into custom applications. Use the following steps to set it up:
  • Start the Ollama Server: ollama serve
  • Call the API: Send HTTP requests to the Ollama server (default port: 11434) to interact with DeepSeek R1 programmatically. For example, using curl (a Python equivalent is sketched after this list):
    curl http://localhost:11434/api/generate -d '{ "model": "deepseek-r1:8b", "prompt": "Explain quantum computing in simple terms." }'
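As mentioned above, here is a rough Python equivalent of the curl call. It is a minimal sketch that assumes the requests package is installed and the Ollama server is running on the default port; with "stream" set to false, the server returns a single JSON object whose response field holds the generated text.
import requests

# Ask the local Ollama server to generate a completion with DeepSeek R1.
# "stream": False returns one JSON object instead of a stream of chunks.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "deepseek-r1:8b",
        "prompt": "Explain quantum computing in simple terms.",
        "stream": False,
    },
    timeout=300,
)
resp.raise_for_status()
print(resp.json()["response"])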
By following these steps, you can install Ollama, run DeepSeek R1 locally, and integrate it with your own applications.