Merged
27 changes: 27 additions & 0 deletions .gitattributes
@@ -0,0 +1,27 @@
# Auto detect text files and perform LF normalization
* text=auto

# Python files
*.py text eol=lf

# Markdown files
*.md text eol=lf

# JSON files
*.json text eol=lf

# YAML files
*.yml text eol=lf
*.yaml text eol=lf

# Shell scripts
*.sh text eol=lf

# Configuration files
*.toml text eol=lf
*.cfg text eol=lf
*.ini text eol=lf

# Keep Windows batch files with CRLF
*.bat text eol=crlf
*.cmd text eol=crlf
43 changes: 43 additions & 0 deletions .github/workflows/ci.yml
@@ -0,0 +1,43 @@
name: CI

on:
push:
branches: [master, qwen2.5-coder]
pull_request:
branches: [master, qwen2.5-coder]

jobs:
test:
runs-on: ubuntu-latest

steps:
- uses: actions/checkout@v4

- name: Set up Python 3.12
uses: actions/setup-python@v5
with:
python-version: "3.12"
cache: 'pip'

- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install -e ".[dev]"

- name: Run linter
run: python -m ruff check .
continue-on-error: true

- name: Run tests with coverage
run: |
python -m pytest tests/ -v --tb=short --cov=src --cov-report=term-missing --cov-report=xml

- name: Upload coverage to Codecov
uses: codecov/codecov-action@v4
with:
token: ${{ secrets.CODECOV_TOKEN }}
files: ./coverage.xml
flags: unittests
name: ci-coverage
fail_ci_if_error: false
verbose: true
1 change: 1 addition & 0 deletions .gitignore
@@ -23,6 +23,7 @@ build/

# IDE
.vscode/
.pytest_cache/

# Jupyter
notebooks/.ipynb_checkpoints/
224 changes: 222 additions & 2 deletions README.md
@@ -53,8 +53,11 @@ Test Dataset + Model Predictions --> [benchmark.py] --> Metrics Report
- **`train_lora.py`** - LoRA fine-tuning using HuggingFace Trainer + PEFT. Supports
QLoRA (4-bit quantization) for training on 1-2 A100 GPUs.

- **`serve.py`** - FastAPI inference server that loads the fine-tuned model and
serves docstring generation via HTTP.
- **`serve.py`** - FastAPI inference server that uses the Ollama API to generate
docstrings. Supports multiple Qwen Coder models with model-specific configurations.

- **`models.py`** - Model configuration registry with sampling parameters for
Qwen 2.5 Coder and Qwen3 Coder variants.

### Evaluation (`src/evaluation/`)

@@ -87,6 +90,223 @@ python -m src.data.convert_seed \
--output-dir data/processed/python-method
```

## Serving

The FastAPI inference server provides HTTP endpoints for docstring generation using
ollama as the backend. The server uses a system prompt stored in
`src/training/prompts/system_prompt.md` to generate NumPy-style docstrings.

### Prerequisites

1. **Install ollama**: Make sure [ollama](https://ollama.ai/) is installed and running locally
2. **Pull a model**: Download one of the supported code models:
```bash
# Qwen 2.5 Coder (dense models)
ollama pull qwen2.5-coder:32b # Default, ~18GB Q4
ollama pull qwen2.5-coder:14b # Mid-size, ~8GB Q4
ollama pull qwen2.5-coder:7b # Fast, ~4GB Q4

# Qwen3 Coder (MoE model)
ollama pull qwen3-coder:30b-a3b # Best quality, ~18GB Q4, 256K context
```

### Starting the Server

Start the FastAPI server using uvicorn:

**Linux/macOS:**
```bash
# Using uvicorn directly
uvicorn src.training.serve:app --host 0.0.0.0 --port 8000

# Or run the module directly
python -m src.training.serve
```

**Windows (PowerShell):**
```powershell
uvicorn src.training.serve:app --host 0.0.0.0 --port 8000
```

The server will start on `http://localhost:8000` by default.

### Configuration

The server can be configured using environment variables:

- `OLLAMA_URL` - Ollama API endpoint (default: `http://localhost:11434/api/chat`)
- `OLLAMA_MODEL` - Model key or Ollama model name (default: `qwen2.5-coder-32b`)
- `REQUEST_TIMEOUT` - Request timeout in seconds (default: `120.0`)

**Linux/macOS:**
```bash
OLLAMA_MODEL=qwen3-coder-30b uvicorn src.training.serve:app --port 8000
```

**Windows (PowerShell):**
```powershell
$env:OLLAMA_MODEL="qwen3-coder-30b"; uvicorn src.training.serve:app --port 8000
```

**Windows (CMD):**
```cmd
set OLLAMA_MODEL=qwen3-coder-30b
uvicorn src.training.serve:app --port 8000
```
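For reference, the resolution of these variables can be sketched as a small helper. The function below is illustrative, not the actual `serve.py` code, though the variable names and defaults mirror the ones documented above.

```python
import os

# Illustrative sketch of how the server might resolve its settings; the
# environment variable names and defaults mirror the README, but this is
# not the project's actual implementation.
def load_config(env=None):
    """Read server settings from an environment mapping, with defaults."""
    env = os.environ if env is None else env
    return {
        "ollama_url": env.get("OLLAMA_URL", "http://localhost:11434/api/chat"),
        "model": env.get("OLLAMA_MODEL", "qwen2.5-coder-32b"),
        "timeout": float(env.get("REQUEST_TIMEOUT", "120.0")),
    }

# Passing an explicit mapping makes the resolution easy to test.
config = load_config({"OLLAMA_MODEL": "qwen3-coder-30b"})
```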

### Available Models

| Model Key | Ollama Model | Architecture | Memory (Q4) | Context | Description |
|-----------|--------------|--------------|-------------|---------|-------------|
| `qwen2.5-coder-32b` | `qwen2.5-coder:32b` | Dense | ~18GB | 32K | Default, balanced quality/speed |
| `qwen2.5-coder-14b` | `qwen2.5-coder:14b` | Dense | ~8GB | 32K | Mid-size, good performance |
| `qwen2.5-coder-7b` | `qwen2.5-coder:7b` | Dense | ~4GB | 32K | Fast inference |
| `qwen3-coder-30b` | `qwen3-coder:30b-a3b` | MoE | ~18GB | 256K | Best quality, 3.3B active params |

Each model has optimized sampling parameters:
- **Qwen 2.5 Coder**: temperature=0.7, top_p=0.9, top_k=40
- **Qwen3 Coder**: temperature=1.0, top_p=0.95, top_k=40 (per official recommendations)
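A registry like the one `models.py` provides might look roughly like this. The exact field names here are assumptions, but the values follow the table and sampling parameters above; `resolve_model` mirrors the documented behavior of accepting either a model key or a raw Ollama model name.

```python
# Hypothetical sketch of a model registry; field names are assumptions
# based on the README's model table, not the real models.py schema.
MODEL_REGISTRY = {
    "qwen2.5-coder-32b": {
        "ollama_model": "qwen2.5-coder:32b",
        "context_window": 32768,
        "temperature": 0.7, "top_p": 0.9, "top_k": 40,
    },
    "qwen3-coder-30b": {
        "ollama_model": "qwen3-coder:30b-a3b",
        "context_window": 262144,
        "temperature": 1.0, "top_p": 0.95, "top_k": 40,
    },
}

def resolve_model(name, default="qwen2.5-coder-32b"):
    """Accept either a registry key or a raw Ollama model name."""
    if name in MODEL_REGISTRY:
        return MODEL_REGISTRY[name]
    for cfg in MODEL_REGISTRY.values():
        if cfg["ollama_model"] == name:
            return cfg
    return MODEL_REGISTRY[default]
```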

### Model Selection

You can select a model in two ways:

1. **Environment variable** (applies to all requests):
```bash
OLLAMA_MODEL=qwen3-coder-30b uvicorn src.training.serve:app
```

2. **Per-request** (via API):
```bash
curl -X POST http://localhost:8000/generate \
-H "Content-Type: application/json" \
-d '{"code": "def add(x, y): return x + y", "model": "qwen3-coder-30b"}'
```
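The same per-request override can be issued from Python. The helpers below are illustrative, not part of the project; they assume only the documented `/generate` request and response schema, and `generate_docstring` requires the server to be running.

```python
import json
from urllib import request

def build_payload(code, model=None, max_new_tokens=None):
    """Assemble a /generate request body; optional fields are omitted."""
    payload = {"code": code}
    if model is not None:
        payload["model"] = model
    if max_new_tokens is not None:
        payload["max_new_tokens"] = max_new_tokens
    return payload

def generate_docstring(code, model=None,
                       url="http://localhost:8000/generate"):
    """POST to a running server and return the generated docstring."""
    body = json.dumps(build_payload(code, model=model)).encode()
    req = request.Request(url, data=body,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.loads(resp.read())["docstring"]
```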

### List Available Models

**Via CLI:**
```bash
python scripts/run_ollama.py --list-models
```

**Via API:**
```bash
curl http://localhost:8000/models
```

### API Endpoints

#### Health Check

Check if the service is healthy and ollama is accessible:

```bash
curl http://localhost:8000/health
```

**Response (200 OK):**
```json
{
"status": "healthy",
"service": "ollama",
"active_model": "Qwen 2.5 Coder 32B",
"ollama_model": "qwen2.5-coder:32b"
}
```

**Response (503 Service Unavailable):**
```json
{
"detail": "Service unhealthy: ollama is not running or not accessible"
}
```
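A client can gate work on the health check with a small predicate; this is an illustrative sketch based on the two responses shown above.

```python
# Interpret a /health response: healthy means HTTP 200 plus a "healthy"
# status field, per the documented response bodies.
def is_healthy(status_code, body):
    """Return True when /health reports a healthy Ollama backend."""
    return status_code == 200 and body.get("status") == "healthy"
```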

#### Generate Docstring

Generate a docstring for a Python function:

```bash
curl -X POST http://localhost:8000/generate \
-H "Content-Type: application/json" \
-d '{
"code": "def add(x, y):\n return x + y",
"max_new_tokens": 256
}'
```

**Request Body:**
- `code` (required): Python function code as a string
- `max_new_tokens` (optional): Maximum number of tokens to generate (uses model default if not specified)
- `model` (optional): Model key or Ollama model name to use for this request

**Response (200 OK):**
```json
{
"docstring": "\"\"\"Compute the sum of two numbers.\n\nParameters\n----------\nx : int\n First number.\ny : int\n Second number.\n\nReturns\n-------\nint\n Sum of x and y.\"\"\"",
"model": "qwen2.5-coder:32b"
}
```

**Response (500 Internal Server Error):**
```json
{
"detail": "Failed to generate docstring: <error message>"
}
```
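Since the `docstring` field arrives with its triple quotes included, a caller might splice it straight into the original source. The helper below is a hypothetical sketch, not project code, and only handles single-line `def` headers.

```python
def insert_docstring(func_source, docstring, indent="    "):
    """Splice a docstring (triple quotes included) after the def line.

    Simplification: assumes the signature fits on the first line.
    """
    lines = func_source.splitlines()
    header, body = lines[0], lines[1:]
    doc_lines = [indent + line for line in docstring.splitlines()]
    return "\n".join([header, *doc_lines, *body])
```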

#### List Models

Get available model configurations:

```bash
curl http://localhost:8000/models
```

**Response (200 OK):**
```json
{
"default": "qwen2.5-coder-32b",
"active": "qwen2.5-coder-32b",
"models": [
{
"key": "qwen2.5-coder-32b",
"name": "Qwen 2.5 Coder 32B",
"ollama_model": "qwen2.5-coder:32b",
"context_window": 32768,
"architecture": "dense",
"memory_q4": "~18GB",
"description": "Dense 32B model, good balance of quality and speed"
}
]
}
```
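One way a client might consume this response is to filter models by available VRAM. The helper below is illustrative and assumes the `memory_q4` strings follow the format shown above (e.g. `"~18GB"`).

```python
# Filter the /models response down to entries that fit a VRAM budget.
# Parsing "~18GB"-style strings is an assumption about the field format.
def models_within_budget(models_response, budget_gb):
    def gb(entry):
        return float(entry["memory_q4"].strip("~GB"))
    return [m["key"] for m in models_response["models"]
            if gb(m) <= budget_gb]
```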

### CLI Tool

The CLI tool allows testing docstring generation directly:

```bash
# Use default model
python scripts/run_ollama.py --user "def add(x, y): return x + y"

# Use specific model by key
python scripts/run_ollama.py --model-key qwen3-coder-30b --user "def foo(): pass"

# Use raw Ollama model name
python scripts/run_ollama.py --model qwen2.5-coder:7b --user "def bar(): pass"

# List available models
python scripts/run_ollama.py --list-models
```

### Testing

Run the test suite to verify the API endpoints:

```bash
pytest tests/test_serve.py tests/test_models.py -v
```

## Dataset

The seed dataset comes from the [NeuralCodeSum](https://github.com/wasiahmad/NeuralCodeSum)
11 changes: 11 additions & 0 deletions codecov.yml
@@ -0,0 +1,11 @@
comment:
layout: "reach,diff,flags,tree"
behavior: default
require_changes: false
require_base: no
require_head: yes

# Optional: configure thresholds or ignore patterns below
# coverage:
# precision: 2
# round: down
3 changes: 3 additions & 0 deletions pyproject.toml
@@ -24,12 +24,15 @@ dependencies = [
"safetensors",
"fastapi>=0.104.0",
"uvicorn>=0.24.0",
"requests>=2.31.0",
]

[project.optional-dependencies]
dev = [
"pytest>=7.0",
"pytest-cov>=4.0",
"ruff>=0.1.0",
"httpx>=0.24.0",
]

[tool.hatch.build.targets.wheel]