Contributing

Thanks for considering a contribution to SimpleVecDB! Your help makes this local-first vector database better.

Getting Started

Prerequisites

Python 3.10+
uv (recommended) or pip
Git

Local Setup

git clone https://github.com/coderdayton/simplevecdb.git
cd simplevecdb

# Install dependencies with development tools
uv sync

# Or with pip
pip install -e ".[dev]"

Project Structure

simplevecdb/
├── src/simplevecdb/
│   ├── core.py              # Main VectorDB class
│   ├── types.py             # Document, DistanceStrategy types
│   ├── config.py            # Configuration management
│   ├── embeddings/
│   │   ├── models.py        # Local embedding models
│   │   └── server.py        # FastAPI embedding server
│   └── integrations/
│       ├── langchain.py     # LangChain VectorStore wrapper
│       └── llamaindex.py    # LlamaIndex VectorStore wrapper
├── tests/                   # Unit, integration and performance tests
├── examples/                # RAG notebooks, demos
└── docs/                    # Documentation

Development Workflow

Running Tests

# All tests
pytest

# With coverage
pytest --cov=simplevecdb

# Specific test file
pytest tests/unit/test_search.py

Code Style

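Follow PEP 8 style conventions
Use type hints wherever possible, with modern syntax (list[str] instead of List[str], str | None instead of Optional[str])
Run a linter and formatter before committing (for example, ruff for linting and ruff format or black for formatting)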

Making Changes

Create a feature branch

git checkout -b feat/your-feature-name

Make your changes and commit with clear messages

git commit -m "feat: add cool feature"  # or fix:, docs:, etc.

Add/update tests for any new functionality
Run tests locally to ensure nothing breaks
Submit a pull request with a clear description
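If you want to catch malformed commit messages before pushing, a small check like the one below could be called from a local commit-msg Git hook (Git passes that hook the path of the file holding the proposed message). This is an optional, hypothetical helper, not something the project ships, and the list of commit types is an assumption based on the common Conventional Commits / Angular convention; this guide itself only names feat, fix and docs.

```python
import re

# Assumed type list (Conventional Commits / Angular style), not a project rule
COMMIT_TYPES = ("feat", "fix", "docs", "style", "refactor", "perf",
                "test", "build", "ci", "chore", "revert")

# type, optional "(scope)", optional "!" for breaking changes, then ": description"
HEADER_RE = re.compile(rf"^(?:{'|'.join(COMMIT_TYPES)})(?:\([\w.-]+\))?!?: \S")


def is_conventional(message: str) -> bool:
    """Check whether the first line of a commit message has the 'type: description' form."""
    header = message.splitlines()[0] if message else ""
    return bool(HEADER_RE.match(header))


print(is_conventional("feat: add cool feature"))
print(is_conventional("Added cool feature"))
```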

Areas for Contribution

High Priority

HNSW indexing: Faster approximate nearest neighbor search (waiting on sqlite-vec)
Advanced metadata filtering: Complex WHERE clause support (OR conditions, nested queries)
Documentation: Docstrings, guides, API docs

Medium Priority

Custom quantization: Support for custom quantization tables/centroids
Performance benchmarks: Add more comprehensive benchmarks (1M+ vectors)
Integration tests: Expand test coverage for LangChain/LlamaIndex

Lower Priority

GUI: Desktop app (Tauri-based)
Encryption: SQLCipher integration
Analytics: Query performance monitoring

Testing Guidelines

Write tests for all new features
Ensure tests pass locally before submitting PR
Aim for >80% code coverage
Test edge cases (empty vectors, large datasets, etc.)

Example test structure:

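from simplevecdb import VectorDB


def test_similarity_search_with_k():
    # An in-memory database keeps the test fast and isolated
    db = VectorDB(":memory:")
    collection = db.collection("default")
    collection.add_texts(["doc1", "doc2", "doc3"])
    results = collection.similarity_search("query", k=2)
    # Expect exactly k results, each paired with a float score
    assert len(results) == 2
    assert all(isinstance(score, float) for _, score in results)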
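Edge cases such as empty or degenerate input deserve their own small tests. The sketch below uses a hypothetical normalize helper (not a SimpleVecDB function) purely to show the pattern: one test per edge case, each asserting the failure mode explicitly. Plain try/except is used so the snippet runs without pytest; inside the test suite, pytest.raises(ValueError) is the more idiomatic form.

```python
import math


def normalize(vector: list[float]) -> list[float]:
    """Hypothetical helper: scale a vector to unit length."""
    if not vector:
        raise ValueError("cannot normalize an empty vector")
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


def test_normalize_rejects_empty_vector():
    try:
        normalize([])
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for an empty vector")


def test_normalize_rejects_zero_vector():
    try:
        normalize([0.0, 0.0, 0.0])
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for a zero vector")


def test_normalize_high_dimensional_vector():
    # Large input: the result should still have unit length
    result = normalize([1.0] * 4096)
    assert math.isclose(sum(x * x for x in result), 1.0)
```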

Documentation

Update docstrings for any API changes
Add examples in the examples/ directory for new features
Update README.md if adding major features
Use type hints to make APIs self-documenting

Performance Considerations

SimpleVecDB prioritizes simplicity over maximum performance
Benchmark large-scale operations (10k+ vectors)
Use NumPy efficiently for vector operations
Minimize database round-trips
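The guidelines above can be sketched together: score 10k vectors in one vectorized NumPy pass instead of a Python loop, time it, and write all rows with a single executemany inside one transaction instead of one INSERT and commit per row. This is an illustration under stated assumptions only: the vectors table and its schema are hypothetical and do not mirror SimpleVecDB's storage, and SimpleVecDB's own search path (which uses sqlite-vec) may compute scores differently.

```python
import sqlite3
import time

import numpy as np


def cosine_scores(query: np.ndarray, matrix: np.ndarray) -> np.ndarray:
    """Cosine similarity of every row of `matrix` against `query` in one pass."""
    q = query / np.linalg.norm(query)
    m = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    return m @ q


def insert_batched(conn: sqlite3.Connection, rows: list[tuple[int, bytes]]) -> None:
    """Insert all rows with one executemany call inside a single transaction."""
    with conn:  # commits once at the end instead of once per row
        conn.executemany("INSERT INTO vectors (id, embedding) VALUES (?, ?)", rows)


rng = np.random.default_rng(0)
vectors = rng.random((10_000, 384), dtype=np.float32)
query = rng.random(384, dtype=np.float32)

start = time.perf_counter()
scores = cosine_scores(query, vectors)
top_k = np.argsort(-scores)[:5]  # indices of the 5 best matches
print(f"scored {len(vectors)} vectors in {time.perf_counter() - start:.4f}s")

# Hypothetical table used only for this demonstration
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vectors (id INTEGER PRIMARY KEY, embedding BLOB)")
insert_batched(conn, [(i, v.tobytes()) for i, v in enumerate(vectors)])
```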

Debugging

Enable verbose logging:

import logging
logging.basicConfig(level=logging.DEBUG)
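Setting the root logger to DEBUG also turns on debug output from every third-party library. A more targeted variant keeps the root at WARNING and raises only SimpleVecDB's loggers to DEBUG. This sketch assumes SimpleVecDB's modules log through logging.getLogger(__name__), so their logger names start with "simplevecdb"; if they use different names, adjust the string accordingly.

```python
import logging

# Keep third-party libraries at WARNING so their debug output does not drown yours
logging.basicConfig(
    level=logging.WARNING,
    format="%(asctime)s %(name)s %(levelname)s: %(message)s",
)
# Assumption: SimpleVecDB modules use logging.getLogger(__name__),
# so child loggers like "simplevecdb.core" inherit this level
logging.getLogger("simplevecdb").setLevel(logging.DEBUG)
```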

Run the embedding server locally for testing:

simplevecdb-server
# Server runs at http://localhost:53287 by default
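Before pointing tests at the server, a quick readiness check can save confusing failures. The sketch below only uses the standard library and treats any HTTP response as "up". The /docs path is an assumption: FastAPI serves interactive docs there by default unless an app disables them, and because even a 404 counts as a response, any path on the server works for this check. The server_is_up helper is hypothetical.

```python
import urllib.error
import urllib.request


def server_is_up(url: str = "http://localhost:53287/docs", timeout: float = 2.0) -> bool:
    """Return True if anything answers HTTP at `url` within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # An HTTP error status (e.g. 404) still means a server is listening
        return True
    except OSError:
        # Connection refused, DNS failure or timeout: nothing is answering
        return False


if __name__ == "__main__":
    print("embedding server up:", server_is_up())
```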

Submitting a Pull Request

Ensure all tests pass: pytest
Keep commits clean and focused
Write a clear PR description explaining:
What problem does it solve?
How does it work?
Any breaking changes?
Link any related issues
Be patient — we'll review as soon as we can!

Questions?

Open a GitHub issue for bugs or feature requests
Reach out to @coderdayton on GitHub
Check existing issues before filing a duplicate

Thank you for contributing! Every bit helps make SimpleVecDB better for everyone. 🚀