Force-pushed from d65d1e6 to 704cf22.
Signed-off-by: Jing Gong <jing.gong@memverge.com>
Force-pushed from 704cf22 to e0e65b5.
Wrong Helm release field casing causes empty rendered namespaces on most resources.
Thank you @denny-zhao, I corrected them.
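For context, Helm's built-in objects are capitalized (`.Release`, `.Values`, `.Chart`); a lowercase reference such as `.release.namespace` typically renders as an empty string rather than failing, which matches the empty namespaces reported above. A minimal illustration (the exact field names in this chart are assumptions):

```yaml
# Broken: lowercase built-in object, renders as an empty string
metadata:
  namespace: {{ .release.namespace }}

# Fixed: Helm built-in objects use capitalized casing
metadata:
  namespace: {{ .Release.Namespace }}
```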
There was a problem hiding this comment.
Pull request overview
This PR adds a comprehensive Helm chart for deploying the MemMachine stack on Kubernetes, addressing issue #1006. The chart provides a production-ready deployment configuration that supports both in-cluster and external database options, making it flexible for various deployment scenarios.
Changes:
- Added Helm chart structure under `deployments/helm/` with Chart.yaml, values.yaml, templates, and a comprehensive README
- Implemented a flexible deployment model supporting optional in-cluster PostgreSQL (with pgvector) and Neo4j, or external database connectivity
- Provided extensive documentation with usage examples for OpenAI, Ollama, and other LLM backends
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 19 comments.
| File | Description |
|---|---|
| deployments/helm/Chart.yaml | Helm chart metadata defining chart version 0.1.0 and app version v0.2.6 |
| deployments/helm/values.yaml | Default configuration values for storage, databases, and MemMachine app with LLM/embedder settings |
| deployments/helm/templates/secrets.yaml | Kubernetes secrets for PostgreSQL, Neo4j, and OpenAI API credentials |
| deployments/helm/templates/pvc.yaml | PersistentVolumeClaim definitions for Neo4j, PostgreSQL, and MemMachine data |
| deployments/helm/templates/postgres-deployment.yaml | PostgreSQL deployment with pgvector extension |
| deployments/helm/templates/postgres-service.yaml | ClusterIP service for internal PostgreSQL access |
| deployments/helm/templates/neo4j-deployment.yaml | Neo4j deployment with APOC and Graph Data Science plugins |
| deployments/helm/templates/neo4j-service.yaml | ClusterIP service exposing Neo4j Bolt, HTTP, and HTTPS ports |
| deployments/helm/templates/memmachine-deployment.yaml | Main MemMachine application deployment with init containers for database readiness |
| deployments/helm/templates/memmachine-service.yaml | NodePort service exposing MemMachine API externally |
| deployments/helm/templates/memmachine-configmaps.yaml | ConfigMaps for application configuration (configuration.yml) and environment variables (.env) |
| deployments/helm/README.md | Comprehensive documentation covering architecture, values reference, prerequisites, and usage examples |
```yaml
containers:
  - name: memmachine
    image: {{ .Values.memmachine.image }}:{{ .Values.memmachine.tag }}
    imagePullPolicy: {{ .Values.memmachine.pullPolicy }}
    ports:
      - containerPort: 8080
    env:
      - name: OPENAI_API_KEY
        valueFrom:
          secretKeyRef:
            name: memmachine-secrets
            key: OPENAI_API_KEY
      - name: POSTGRES_PASSWORD
        valueFrom:
          secretKeyRef:
            name: postgres-secret
            key: POSTGRES_PASSWORD
      - name: NEO4J_USER
        valueFrom:
          secretKeyRef:
            name: neo4j-secret
            key: NEO4J_USER
      - name: NEO4J_PASSWORD
        valueFrom:
          secretKeyRef:
            name: neo4j-secret
            key: NEO4J_PASSWORD
      - name: LOG_FILE
        value: "/app/data/memmachine.log"
      - name: MEMORY_CONFIG
        value: {{ .Values.memmachine.config.memoryConfigPath }}
    volumeMounts:
      - name: config-volume
        mountPath: /app/configuration.yml
        subPath: configuration.yml
      - name: env-volume
        mountPath: /app/.env
        subPath: .env
      - name: data-volume
        mountPath: /app/data
```
The memmachine deployment lacks readiness and liveness probes. Without these, Kubernetes cannot determine if the application is healthy and ready to serve traffic. Consider adding a readinessProbe that checks the /api/v2/health endpoint and a livenessProbe to ensure the pod is restarted if the application becomes unresponsive. This is especially important given the init containers only check TCP connectivity, not application readiness.
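A minimal probe stanza along these lines might look as follows; the `/api/v2/health` path and port 8080 come from the comment and deployment above, while the timing values are assumptions to be tuned:

```yaml
# Sketch only: assumes the app serves HTTP health checks on containerPort 8080
readinessProbe:
  httpGet:
    path: /api/v2/health
    port: 8080
  initialDelaySeconds: 15
  periodSeconds: 10
  timeoutSeconds: 3
  failureThreshold: 6
livenessProbe:
  httpGet:
    path: /api/v2/health
    port: 8080
  initialDelaySeconds: 60
  periodSeconds: 20
  timeoutSeconds: 3
  failureThreshold: 3
```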
```yaml
- name: NEO4J_server_memory_heap_max__size
  value: {{ .Values.neo4j.heap.max }}
- name: NEO4J_PLUGINS
  value: '{{ .Values.neo4j.plugins | toJson }}'
```
The Neo4j deployment lacks readiness and liveness probes. Neo4j can take significant time to start, especially when downloading plugins (APOC, Graph Data Science). Without a readiness probe, the init container in the memmachine deployment may succeed before Neo4j is actually ready to serve queries. Add a readinessProbe that checks the Bolt endpoint or uses an HTTP probe against the /db/data/ endpoint.
Suggested change:

```yaml
value: '{{ .Values.neo4j.plugins | toJson }}'
readinessProbe:
  tcpSocket:
    port: 7687
  initialDelaySeconds: 60
  periodSeconds: 10
  timeoutSeconds: 2
  failureThreshold: 6
livenessProbe:
  tcpSocket:
    port: 7687
  initialDelaySeconds: 120
  periodSeconds: 20
  timeoutSeconds: 2
  failureThreshold: 3
```
The memmachine deployment lacks resource requests and limits. Without these, the pod can consume unlimited CPU and memory, potentially affecting other workloads in the cluster. Additionally, Kubernetes cannot make informed scheduling decisions without resource requests. Consider adding configurable resource limits via values.yaml, with reasonable defaults (e.g., requests: cpu: 500m, memory: 512Mi; limits: cpu: 2000m, memory: 2Gi).
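One way to wire this up, using the defaults suggested in the comment (the `memmachine.resources` values key is an assumption, not part of the current chart):

```yaml
# values.yaml (hypothetical new key, defaults from the review comment)
memmachine:
  resources:
    requests:
      cpu: 500m
      memory: 512Mi
    limits:
      cpu: 2000m
      memory: 2Gi
```

The deployment template could then render it under the container spec with something like `{{- toYaml .Values.memmachine.resources | nindent 10 }}`, so operators can override both requests and limits per environment.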
```yaml
image: busybox
command: ['sh', '-c', 'until nc -zv {{ .Values.postgres.host }} {{ .Values.postgres.port }}; do echo waiting for postgres; sleep 2; done']
- name: wait-for-neo4j
  image: busybox
```
The init containers use the "busybox" image without specifying a version tag. This can lead to unpredictable behavior if the image changes. Additionally, the "busybox" image lacks the "nc" command in recent versions. Consider using "busybox:1.36" or a more reliable image like "alpine:3.18" for the init containers to ensure consistent behavior across deployments.
Suggested change:

```yaml
image: busybox:1.36
command: ['sh', '-c', 'until nc -zv {{ .Values.postgres.host }} {{ .Values.postgres.port }}; do echo waiting for postgres; sleep 2; done']
- name: wait-for-neo4j
  image: busybox:1.36
```
### Via deployment manager (multi-tenant setups)

The `deploy_cli.py` script in the parent directory handles NodePort allocation, namespace creation, and service registry automatically:

```bash
# Deploy with Ollama backend
python deploy_cli.py deploy-memmachine-ollama ollama-30000

# Deploy with OpenAI backend
python deploy_cli.py deploy-memmachine-openai --openaiApiKey sk-...

# Deploy with vLLM backend
python deploy_cli.py deploy-memmachine-vllm vllm-chat-31000 vllm-embedder-31003
```

---
The README references a "deploy_cli.py" script in the parent directory, but this script does not exist in the repository. Either this script needs to be added as part of this PR, or this section should be removed from the README to avoid confusing users who follow the documentation.
Suggested change: remove the "Via deployment manager (multi-tenant setups)" section and its examples.
```yaml
configuration.yml: |
  logging:
    path: /app/data/memmachine.log
    level: info
```
The logging level in configuration.yml is hardcoded to "info" (line 25) while FAST_MCP_LOG_LEVEL uses the configurable value from values.yaml (line 114). For consistency and flexibility, consider making the configuration.yml logging level also configurable via values.yaml using {{ .Values.memmachine.config.loggingLevel | lower }} to match the environment variable pattern.
Suggested change:

```yaml
level: {{ .Values.memmachine.config.loggingLevel | lower }}
```
```yaml
name: postgres-secret
key: POSTGRES_PASSWORD
- name: POSTGRES_INITDB_ARGS
  value: "--encoding=UTF-8 --lc-collate=C --lc-ctype=C"
```
The PostgreSQL deployment lacks readiness and liveness probes. Without these, the init containers in the memmachine deployment may succeed (TCP connection works) before PostgreSQL is actually ready to accept queries. Add a readinessProbe using pg_isready or a simple SQL query to ensure the database is fully initialized before dependent services connect.
Suggested change:

```yaml
value: "--encoding=UTF-8 --lc-collate=C --lc-ctype=C"
readinessProbe:
  exec:
    command:
      - /bin/sh
      - -c
      - pg_isready -U "$POSTGRES_USER" -d "$POSTGRES_DB"
  initialDelaySeconds: 10
  periodSeconds: 5
  timeoutSeconds: 5
  failureThreshold: 6
livenessProbe:
  exec:
    command:
      - /bin/sh
      - -c
      - pg_isready -U "$POSTGRES_USER" -d "$POSTGRES_DB"
  initialDelaySeconds: 30
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 3
```
```yaml
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: {{ .Values.pvcSize }}
  storageClassName: {{ .Values.storageClass }}
{{- end }}
{{- if .Values.postgres.enabled }}
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-pvc
  namespace: {{ $namespace }}
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: {{ .Values.pvcSize }}
  storageClassName: {{ .Values.storageClass }}
{{- end }}
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: memmachine-pvc
  namespace: {{ $namespace }}
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: {{ .Values.pvcSize }}
  storageClassName: {{ .Values.storageClass }}
```
All PVCs use ReadWriteMany (RWX) access mode, but this is not necessary for single-replica deployments. PostgreSQL and Neo4j both run with replicas: 1 and only need ReadWriteOnce (RWO) access. RWX requires more expensive storage classes (like NFS) and can limit deployment options. Consider using RWO for database PVCs and only using RWX for memmachine-pvc if there's a specific need for multi-pod access. This would make the chart compatible with more storage providers.
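A sketch of the suggested split, with a per-component access mode (the `pvc.accessMode` values keys are assumptions, not part of the current chart):

```yaml
# values.yaml (hypothetical): RWO for single-replica databases,
# RWX kept only where multi-pod access is actually required
postgres:
  pvc:
    accessMode: ReadWriteOnce
neo4j:
  pvc:
    accessMode: ReadWriteOnce
memmachine:
  pvc:
    accessMode: ReadWriteMany
```

Each PVC template would then reference its own value, e.g. `{{ .Values.postgres.pvc.accessMode }}` inside the `accessModes` list, instead of hardcoding ReadWriteMany.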
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Signed-off-by: Steve Scargall <37674041+sscargal@users.noreply.github.com>
Purpose of the change
Add a Helm chart under deployments/helm/ to deploy the full MemMachine stack (app, PostgreSQL with pgvector, Neo4j) on Kubernetes.
Support optional external databases if needed.
Fixes/Closes
Fixes #1006
How Has This Been Tested?
Tested configurations:
- memmachine + postgres + neo4j
- memmachine only
- `enabled: false` to skip the in-cluster deployment/service/PVC

Checklist
Maintainer Checklist