Thanks for considering a contribution to SimpleVecDB; your help steadily improves this local-first vector database.
Getting Started
Prerequisites
- Python 3.10+
- uv (recommended) or pip
- Git
Local Setup
git clone https://github.com/coderdayton/simplevecdb.git
cd simplevecdb
# Install dependencies with development tools
uv sync
# Or with pip
pip install -e ".[dev]"
Project Structure
simplevecdb/
├── src/simplevecdb/
│   ├── core.py              # Main VectorDB class
│   ├── types.py             # Document, DistanceStrategy types
│   ├── config.py            # Configuration management
│   ├── embeddings/
│   │   ├── models.py        # Local embedding models
│   │   └── server.py        # FastAPI embedding server
│   └── integrations/
│       ├── langchain.py     # LangChain VectorStore wrapper
│       └── llamaindex.py    # LlamaIndex VectorStore wrapper
├── tests/                   # Unit, integration and performance tests
├── examples/                # RAG notebooks, demos
└── docs/                    # Documentation
Development Workflow
Running Tests
# All tests
pytest
# With coverage
pytest --cov=simplevecdb
# Specific test file
pytest tests/unit/test_search.py
Code Style
- Follow PEP 8 standards
- Use type hints wherever possible (Python 3.10+ syntax: list[str] instead of List[str])
- Run a linter or formatter before committing (consider using ruff or black)
Making Changes
- Create a feature branch:
  git checkout -b feat/your-feature-name
- Make your changes and commit with clear messages:
  git commit -m "feat: add cool feature" # or fix:, docs:, etc.
- Add/update tests for any new functionality
- Run tests locally to ensure nothing breaks
- Submit a pull request with a clear description
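If it helps to check commit subjects before pushing, the prefix convention above can be sketched as a small validator. This is only an illustration: the allow-list below contains just the three types this guide names (feat, fix, docs), the optional "(scope)" part is an assumption, and is_valid_commit_subject is a hypothetical helper rather than a tool shipped with the project.

```python
import re

# Hypothetical allow-list; this guide only names feat, fix, and docs explicitly.
ALLOWED_TYPES = ("feat", "fix", "docs")

# "<type>: <summary>", with an optional "(scope)" right after the type.
COMMIT_PATTERN = re.compile(
    r"^(" + "|".join(ALLOWED_TYPES) + r")(\([\w-]+\))?: \S.*$"
)


def is_valid_commit_subject(subject: str) -> bool:
    """Return True if the first line of a commit message has a known prefix."""
    return COMMIT_PATTERN.match(subject) is not None
```

Extend ALLOWED_TYPES if the project accepts other prefixes.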
Areas for Contribution
High Priority
- HNSW indexing: Faster approximate nearest neighbor search (waiting on sqlite-vec)
- Advanced Metadata filtering: Complex WHERE clause support (OR, nested queries)
- Documentation: Docstrings, guides, API docs
Medium Priority
- Custom Quantization: Support for custom quantization tables/centroids
- Performance benchmarks: Add more comprehensive benchmarks (1M+ vectors)
- Integration tests: Expand test coverage for LangChain/LlamaIndex
Lower Priority
- GUI: Desktop app (Tauri-based)
- Encryption: SQLCipher integration
- Analytics: Query performance monitoring
Testing Guidelines
- Write tests for all new features
- Ensure tests pass locally before submitting PR
- Aim for >80% code coverage
- Test edge cases (empty vectors, large datasets, etc.)
Example test structure:
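from simplevecdb import VectorDB


def test_similarity_search_with_k():
    # Use an in-memory database so the test leaves no files behind
    db = VectorDB(":memory:")
    collection = db.collection("default")
    collection.add_texts(["doc1", "doc2", "doc3"])

    # Ask for the top 2 matches; each result unpacks as (document, score)
    results = collection.similarity_search("query", k=2)
    assert len(results) == 2
    assert all(isinstance(score, float) for _, score in results)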
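Edge-case tests, as suggested in the guidelines above, might look roughly like the sketch below. To stay self-contained it tests a hypothetical pure-Python cosine_similarity helper instead of SimpleVecDB's own API, and treating empty or all-zero vectors as similarity 0.0 is an assumption made for the example. In a real pytest suite, pytest.raises would be the usual way to write the error check.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Hypothetical helper used only to illustrate edge-case tests."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Empty and all-zero vectors have no direction; treat similarity as 0.0.
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def test_empty_vectors_return_zero():
    assert cosine_similarity([], []) == 0.0


def test_zero_vector_returns_zero():
    assert cosine_similarity([0.0, 0.0], [1.0, 2.0]) == 0.0


def test_mismatched_lengths_raise():
    try:
        cosine_similarity([1.0], [1.0, 2.0])
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError")


def test_identical_vectors_score_one():
    assert math.isclose(cosine_similarity([1.0, 2.0], [1.0, 2.0]), 1.0)
```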
Documentation
- Update docstrings for any API changes
- Add examples in the examples/ directory for new features
- Update README.md if adding major features
- Use type hints to make APIs self-documenting
Performance Considerations
- SimpleVecDB prioritizes simplicity over maximum performance
- Benchmark large-scale operations (10k+ vectors)
- Use NumPy efficiently for vector operations
- Minimize database round-trips
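One way to read "minimize database round-trips" with SQLite is to batch writes into a single executemany call inside one transaction, rather than executing and committing row by row. The sketch below uses only the standard-library sqlite3 module. The docs table and the insert_rows_batched helper are hypothetical and do not reflect SimpleVecDB's actual schema or internals.

```python
import sqlite3


def insert_rows_batched(
    conn: sqlite3.Connection, rows: list[tuple[str, str]]
) -> int:
    """Insert all rows in one transaction and return the table's row count."""
    # "with conn" commits once on success and rolls back on error.
    with conn:
        conn.executemany(
            "INSERT INTO docs (text, metadata) VALUES (?, ?)", rows
        )
    return conn.execute("SELECT COUNT(*) FROM docs").fetchone()[0]


# Hypothetical table layout, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE docs (id INTEGER PRIMARY KEY, text TEXT, metadata TEXT)"
)
rows = [(f"doc{i}", "{}") for i in range(10_000)]
total = insert_rows_batched(conn, rows)
```

Timing this against a loop that calls execute and commit once per row is a simple benchmark for changes that touch the write path.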
Debugging
Enable verbose logging:
import logging
logging.basicConfig(level=logging.DEBUG)
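Setting the root logger to DEBUG also turns on debug output from every other library. A narrower variant is sketched below. It assumes the package logs under the "simplevecdb" logger name (the common logging.getLogger(__name__) convention), which this guide does not confirm.

```python
import logging

# Keep other libraries at WARNING so their output does not drown the logs.
logging.basicConfig(
    level=logging.WARNING,
    format="%(asctime)s %(name)s %(levelname)s %(message)s",
)

# Assumption: the package logs under the "simplevecdb" logger name.
logging.getLogger("simplevecdb").setLevel(logging.DEBUG)
```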
Run the embedding server locally for testing:
simplevecdb-server
# Server runs at http://localhost:53287 by default
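Before running tests that depend on the embedding server, it can be useful to confirm that something is listening on the port. The sketch below only opens a TCP connection to the default address stated above and makes no assumptions about the server's HTTP routes. is_server_listening is a hypothetical helper, not part of SimpleVecDB.

```python
import socket


def is_server_listening(
    host: str = "localhost", port: int = 53287, timeout: float = 1.0
) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False
```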
Submitting a Pull Request
- Ensure all tests pass: pytest
- Keep commits clean and focused
- Write a clear PR description explaining:
  - What problem does it solve?
  - How does it work?
  - Any breaking changes?
- Link any related issues
- Be patient; we'll review as soon as we can!
Questions?
- Open a GitHub issue for bugs or feature requests
- Reach out to @coderdayton on GitHub
- Check existing issues before filing a duplicate
Thank you for contributing! Every bit helps make SimpleVecDB better for everyone. 🚀