It took 2 hours and 58 minutes to deploy the ideal AI programming assistant, Claude Code, and configure the local self-hosted model

Deploy Claude Code (by Anthropic) and connect it to a self-hosted large language model (e.g., Qwen, Llama series, etc.), completely bypassing Anthropic's official API, enabling secure offline/intranet development assistance.

Preface
🔧 1. Install the Claude Code CLI
⚙️ 2. Global Configuration File Setup
💻 3. VS Code Extension Integration
⚠️ 4. Common Issues and Solutions
✅ 5. Summary and Recommendations

Preface

Introduction to Claude Code

Claude Code is Anthropic’s intelligent programming assistant that supports code understanding, generation, debugging, and refactoring.
Through its OpenAI-compatible API interface, Claude Code can seamlessly integrate with any locally hosted LLM service that supports this protocol (e.g., llama.cpp, vLLM, Ollama, etc.)—without relying on Anthropic’s official API.

📖 Official documentation: https://code.claude.com/docs

Self-Hosted Large Language Models

The previous article, “Outperforming 235B-parameter models: Single-GPU private deployment of OpenClaw,” described how to deploy a local LLM service using llama.cpp. This guide uses that setup as the backend LLM API.

🔧 1. Install the Claude Code CLI

Method 1: via npm (ideal for developers/debugging)

npm install -g @anthropic-ai/claude-code

Method 2: Official native installer (recommended for production)

# macOS / Linux / WSL
curl -fsSL https://claude.ai/install.sh | bash

# Windows (PowerShell)
irm https://claude.ai/install.ps1 | iex

✅ Verify installation:

claude --version

⚙️ 2. Global Configuration File Setup

Claude Code identifies the backend model service through environment variables. The following three settings are required:

ANTHROPIC_BASE_URL must point to an OpenAI-compatible API endpoint (e.g., http://localhost:8000/v1)
ANTHROPIC_AUTH_TOKEN: If authentication is not required by your service, set it to any placeholder value (e.g., "not-needed")
ANTHROPIC_MODEL: Must exactly match the model ID registered in your local LLM service

Configuration file path

Platform	Path
Linux/macOS	`~/.claude/settings.json`
Windows	`%USERPROFILE%\.claude\settings.json`

Example configuration (`settings.json`)

{
  "env": {
    "ANTHROPIC_AUTH_TOKEN": "not-needed",
    "ANTHROPIC_BASE_URL": "http://10.0.0.1:8001",
    "ANTHROPIC_MODEL": "Qwen3.5-35B-A3B-UD-Q4_K_M",
    "ANTHROPIC_SMALL_FAST_MODEL": "Qwen3.5-35B-A3B-UD-Q4_K_M",
    "API_TIMEOUT_MS": "600000",
    "CLAUDE_AUTOCOMPACT_PCT_OVERRIDE": 80,
    "CLAUDE_CODE_DISABLE_EXPERIMENTAL_BETAS": 1,
    "CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC": 1
  }
}

ℹ️ Parameter explanations:

ANTHROPIC_BASE_URL: OpenAI-compatible API address of your local LLM service

ANTHROPIC_MODEL: Model name (must match the --model_alias or actual loaded model name on the server)

CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=80: Automatically compresses conversation history when context usage reaches 80%

For more options, see: Official Settings Documentation

Launch the CLI

Run in your project directory:

claude

CLI Interface Screenshot

💻 3. VS Code Extension Integration

1. Install the extension

Go to the VS Code Marketplace and install:
👉 Claude Code for VS Code

2. Extension configuration

💡 Note: The Claude Code for VS Code extension does NOT read ~/.claude/settings.json. You must configure it separately here!

Go to Settings → Extensions → Claude Code → Edit in settings.json, and add the following:

{
  "claudeCode.selectedModel": "Qwen3.5-35B-A3B-UD-Q4_K_M",
  "claudeCode.allowDangerouslySkipPermissions": true,
  "claudeCode.disableLoginPrompt": true,
  "claudeCode.preferredLocation": "panel",
  "claudeCode.environmentVariables": [
    {
      "name": "ANTHROPIC_AUTH_TOKEN",
      "value": "not-needed"
    },
    {
      "name": "ANTHROPIC_BASE_URL",
      "value": "http://10.0.0.1:8001/v1"
    }
  ]
}

⚠️ Security note:

allowDangerouslySkipPermissions: true allows AI to automatically modify files—only enable in trusted intranet environments

In production, keep permission prompts enabled to prevent accidental overwrites

3. Use the extension

Sidebar panel: Click the Claude icon in the left activity bar
In-editor: Right-click code → “Ask Claude”

⚠️ 4. Common Issues and Solutions

Error example

request (128640 tokens) exceeds the available context size (128000 tokens)

Solutions

✅ Solution 1: Enable auto-compression (recommended)

Already configured via:

"CLAUDE_AUTOCOMPACT_PCT_OVERRIDE": 80

Compression triggers at 80% context usage, but may still be insufficient for very large projects.

✅ Solution 2: Manually compact context

Type in the chat window:

/compact

Forces removal of non-essential history while preserving current code state.

✅ Solution 3: Increase model context window

When starting your local LLM service, explicitly specify a larger context, e.g.:

# llama.cpp example
./server -m ./models/qwen3-35b.Q4_K_M.gguf --ctx-size 131072

✅ Solution 4: Reduce input scope

Create a .claudeignore file in your project root to exclude irrelevant directories:
```
node_modules/
dist/
build/
venv/
*.log
*.bin
```
Avoid full-repo analysis in monolithic repositories

✅ 5. Summary and Recommendations

Aspect	Recommendation
Model selection	Prefer long-context models like Qwen3.5 or Llama-3-70B
Security	Deploy on intranet + disable external access + disable auto-file modification (unless trusted)
Performance	Use GPU acceleration (e.g., cuBLAS in llama.cpp) and quantized models (Q4_K_M offers good speed/accuracy balance)
Maintainability	Include `settings.json` and `.claudeignore` in your project template for consistency

🌐 Backend compatibility: Besides llama.cpp, you can also integrate with Ollama, vLLM, Text Generation WebUI, and other OpenAI-compatible backends.

📌 Final reminder: This solution completely bypasses Anthropic’s cloud services—all data remains local, meeting high-security and compliance requirements.
For automated deployment scripts or Dockerization, refer to community open-source projects.

✅ You now have everything needed to securely and efficiently use Claude Code in a private environment!

Agent AI Claude

john

The person is so lazy that he left nothing.

It took 2 hours and 58 minutes to deploy the ideal AI programming assistant, Claude Code, and configure the local self-hosted model

Table of Contents

Preface

Introduction to Claude Code

Self-Hosted Large Language Models

🔧 1. Install the Claude Code CLI

Method 1: via npm (ideal for developers/debugging)

Method 2: Official native installer (recommended for production)

⚙️ 2. Global Configuration File Setup

Configuration file path

Example configuration (`settings.json`)

Launch the CLI

💻 3. VS Code Extension Integration

1. Install the extension

2. Extension configuration

3. Use the extension

⚠️ 4. Common Issues and Solutions

Error example

Solutions

✅ Solution 1: Enable auto-compression (recommended)

✅ Solution 2: Manually compact context

✅ Solution 3: Increase model context window

✅ Solution 4: Reduce input scope

✅ 5. Summary and Recommendations

Article Comments（0）

It took 2 hours and 58 minutes to deploy the ideal AI programming assistant, Claude Code, and configure the local self-hosted model

Table of Contents

Preface

Introduction to Claude Code

Self-Hosted Large Language Models

🔧 1. Install the Claude Code CLI

Method 1: via npm (ideal for developers/debugging)

Method 2: Official native installer (recommended for production)

⚙️ 2. Global Configuration File Setup

Configuration file path

Example configuration (settings.json)

Launch the CLI

💻 3. VS Code Extension Integration

1. Install the extension

2. Extension configuration

3. Use the extension

⚠️ 4. Common Issues and Solutions

Error example

Solutions

✅ Solution 1: Enable auto-compression (recommended)

✅ Solution 2: Manually compact context

✅ Solution 3: Increase model context window

✅ Solution 4: Reduce input scope

✅ 5. Summary and Recommendations

Article Comments（0）

Example configuration (`settings.json`)