
support helm chart deployment #1137

Open: jgong wants to merge 3 commits into MemMachine:main from jgong:deploy-helm-charts

Conversation

@jgong (Contributor) commented Feb 24, 2026

Purpose of the change

Add a Helm chart under deployments/helm/ to deploy the full MemMachine stack (app, PostgreSQL with pgvector, Neo4j) on Kubernetes.
Support optional external databases if needed.


Fixes/Closes

Fixes #1006

Type of change


  • New feature (non-breaking change which adds functionality)

How Has This Been Tested?



  • End-to-end Test
  • Manual verification (list step-by-step instructions)

Test Results:

memmachine + postgres + neo4j

  1. Deploy memmachine with default values (set OPENAI_API_KEY in values.yaml before deployment)
(memmachine) jinggong@Jings-MBP helm % pwd
/Users/jinggong/memmachine/MemMachine/deployments/helm
(memmachine) jinggong@Jings-MBP helm % helm upgrade --install memmachine . --namespace memmachine --create-namespace
Release "memmachine" does not exist. Installing it now.
NAME: memmachine
LAST DEPLOYED: Tue Feb 24 01:05:48 2026
NAMESPACE: memmachine
STATUS: deployed
REVISION: 1
DESCRIPTION: Install complete
TEST SUITE: None
  2. Check helm status
(memmachine) jinggong@Jings-MBP helm % helm status memmachine --namespace memmachine
NAME: memmachine
LAST DEPLOYED: Tue Feb 24 01:05:48 2026
NAMESPACE: memmachine
STATUS: deployed
REVISION: 1
DESCRIPTION: Install complete
RESOURCES:
==> v1/ConfigMap
NAME                    DATA   AGE
memmachine-config       1      113s
memmachine-env-config   1      113s

==> v1/PersistentVolumeClaim
NAME             STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   VOLUMEATTRIBUTESCLASS   AGE
neo4j-pvc        Bound    pvc-228a7a61-653a-4d43-908f-389239a26e4c   5Gi        RWX            nfs-client     <unset>                 113s
postgres-pvc     Bound    pvc-b8160d62-94ef-4856-8d7b-6994d67e21b3   5Gi        RWX            nfs-client     <unset>                 113s
memmachine-pvc   Bound    pvc-a0aba3c1-b68b-44f8-8103-867a51a7865a   5Gi        RWX            nfs-client     <unset>                 113s

==> v1/Service
NAME                  TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE
memmachine-service    NodePort    10.105.250.14   <none>        80:31001/TCP                 113s
memmachine-neo4j      ClusterIP   10.104.76.114   <none>        7474/TCP,7473/TCP,7687/TCP   113s
memmachine-postgres   ClusterIP   10.104.153.81   <none>        5432/TCP                     113s

==> v1/Deployment
NAME                  READY   UP-TO-DATE   AVAILABLE   AGE
memmachine            1/1     1            1           113s
neo4j                 1/1     1            1           113s
memmachine-postgres   1/1     1            1           113s

==> v1/Pod(related)
NAME                                   READY   STATUS    RESTARTS   AGE
memmachine-5977cd6b95-cznz7            1/1     Running   0          113s
neo4j-694c45dff4-vtp5b                 1/1     Running   0          113s
memmachine-postgres-5c49d7bf95-v7jdx   1/1     Running   0          113s

==> v1/Secret
NAME                 TYPE     DATA   AGE
postgres-secret      Opaque   1      113s
memmachine-secrets   Opaque   1      113s
neo4j-secret         Opaque   3      113s


TEST SUITE: None
  3. Check pod status
jinggong@Jings-MBP ~ % kubectl get pods --namespace memmachine -o wide
NAME                                   READY   STATUS    RESTARTS   AGE     IP              NODE                          NOMINATED NODE   READINESS GATES
memmachine-5977cd6b95-cznz7            1/1     Running   0          3m20s   10.244.29.214   vmnet4-202.eng.memverge.com   <none>           <none>
memmachine-postgres-5c49d7bf95-v7jdx   1/1     Running   0          3m20s   10.244.25.46    vmnet4-205.eng.memverge.com   <none>           <none>
neo4j-694c45dff4-vtp5b                 1/1     Running   0          3m20s   10.244.184.20   vmnet4-204.eng.memverge.com   <none>           <none>
  4. Check memmachine app health
jinggong@Jings-MBP ~ % curl http://vmnet4-200.eng.memverge.com:31001/api/v2/health
{"status":"healthy","service":"memmachine","version":"0.2.6"}

memmachine only

  1. Configure external Postgres and Neo4j; set enabled: false to skip the in-cluster Deployment/Service/PVC
# Sample neo4j config
neo4j:
  enabled: false             # false = skip in-cluster Deployment/Service/PVC and use an external host
  host: 10.4.254.200         # external host; defaults to the internal service name when enabled: true
  port: 7687                 # default Bolt port; override if the external instance uses a different one
  image: neo4j:5.23-community
  auth: neo4j/memverge
  user: neo4j
  password: memverge

# Sample postgres config
postgres:
  enabled: false             # false = skip in-cluster Deployment/Service/PVC and use an external host
  host: 10.4.254.200         # external host; defaults to the internal service name when enabled: true
  port: 5432                 # default Postgres port; override if the external instance uses a different one
  image: pgvector/pgvector:pg16
  user: memmachine-ext
  password: memverge
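As context for how the enabled flags work, the in-cluster resources are expected to be wrapped in template guards roughly like the sketch below. This is an illustration of the standard Helm pattern, not the chart's verbatim contents; the resource body is abbreviated.

```yaml
# Sketch: an `enabled` flag gating an in-cluster resource.
# When postgres.enabled is false, nothing in this file is rendered,
# so no Deployment/Service/PVC is created for Postgres.
{{- if .Values.postgres.enabled }}
apiVersion: apps/v1
kind: Deployment
metadata:
  name: memmachine-postgres
  namespace: {{ .Release.Namespace }}
spec:
  replicas: 1
  selector:
    matchLabels:
      app: memmachine-postgres
  template:
    metadata:
      labels:
        app: memmachine-postgres
    spec:
      containers:
        - name: postgres
          image: {{ .Values.postgres.image }}
{{- end }}
```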
  2. Deploy memmachine and check helm status (there is no Postgres or Neo4j Deployment/Service/PVC)
(memmachine) jinggong@Jings-MBP helm % helm upgrade --install memmachine . -n memmachine --create-namespace 
Release "memmachine" has been upgraded. Happy Helming!
NAME: memmachine
LAST DEPLOYED: Tue Feb 24 01:57:50 2026
NAMESPACE: memmachine
STATUS: deployed
REVISION: 2
DESCRIPTION: Upgrade complete
TEST SUITE: None

(memmachine) jinggong@Jings-MBP helm % helm status memmachine --namespace memmachine
NAME: memmachine
LAST DEPLOYED: Tue Feb 24 01:57:50 2026
NAMESPACE: memmachine
STATUS: deployed
REVISION: 2
DESCRIPTION: Upgrade complete
RESOURCES:
==> v1/Pod(related)
NAME                          READY   STATUS     RESTARTS   AGE
memmachine-6d7b4475bb-tbtk4   0/1     Init:1/2   0          7m32s

==> v1/Secret
NAME                 TYPE     DATA   AGE
postgres-secret      Opaque   1      7m32s
memmachine-secrets   Opaque   1      7m33s
neo4j-secret         Opaque   3      7m33s

==> v1/ConfigMap
NAME                    DATA   AGE
memmachine-config       1      7m33s
memmachine-env-config   1      7m32s

==> v1/PersistentVolumeClaim
NAME             STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   VOLUMEATTRIBUTESCLASS   AGE
memmachine-pvc   Bound    pvc-a1bc7792-90f2-47fb-bb98-3315494ec229   5Gi        RWX            nfs-client     <unset>                 64s

==> v1/Service
NAME                 TYPE       CLUSTER-IP      EXTERNAL-IP   PORT(S)        AGE
memmachine-service   NodePort   10.107.59.243   <none>        80:31001/TCP   7m32s

==> v1/Deployment
NAME         READY   UP-TO-DATE   AVAILABLE   AGE
memmachine   0/1     1            0           7m32s


  3. Check pod status and verify the memmachine endpoint
jinggong@Jings-MBP ~ % kubectl get pod --namespace memmachine
NAME                          READY   STATUS    RESTARTS   AGE
memmachine-6d7b4475bb-tbtk4   1/1     Running   0          13m
jinggong@Jings-MBP ~ % curl http://vmnet4-200.eng.memverge.com:31001/api/v2/health
{"status":"healthy","service":"memmachine","version":"0.2.6"}

Checklist


  • I have signed the commit(s) within this pull request
  • My code follows the style guidelines of this project (See STYLE_GUIDE.md)
  • I have performed a self-review of my own code
  • I have commented my code
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have checked my code and corrected any misspellings

Maintainer Checklist

  • Confirmed all checks passed
  • Contributor has signed the commit(s)
  • Reviewed the code
  • Run, Tested, and Verified the change(s) work as expected



Signed-off-by: Jing Gong <jing.gong@memverge.com>
@jgong jgong force-pushed the deploy-helm-charts branch from 704cf22 to e0e65b5 on February 24, 2026 10:23
@denny-zhao commented:
Wrong Helm release field casing causes empty rendered namespaces on most resources.
Using {{ .Release.namespace }} (lowercase n) renders as empty, while the PVCs use {{ .Release.Namespace }}
correctly. This creates inconsistent manifests and can break helm template | kubectl apply flows (PVCs land in
one namespace; Deployments/Services/Secrets/ConfigMaps land in another or in the default).

Signed-off-by: Jing Gong <jing.gong@memverge.com>
jgong commented Feb 24, 2026

Thank you @denny-zhao, I corrected them to namespace: {{ .Release.Namespace }}.
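For anyone who hits the same rendering issue: Helm's built-in release object exposes the capitalized Namespace field, and a lowercase .Release.namespace silently renders as an empty string. A minimal before/after sketch:

```yaml
# Broken: .Release.namespace is not a built-in field, so this renders
# as `namespace: ""` and the resource falls into kubectl's default namespace.
metadata:
  namespace: {{ .Release.namespace }}

# Fixed: .Release.Namespace renders the value passed via --namespace.
metadata:
  namespace: {{ .Release.Namespace }}
```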

Copilot AI left a comment

Pull request overview

This PR adds a comprehensive Helm chart for deploying the MemMachine stack on Kubernetes, addressing issue #1006. The chart provides a production-ready deployment configuration that supports both in-cluster and external database options, making it flexible for various deployment scenarios.

Changes:

  • Added Helm chart structure under deployments/helm/ with Chart.yaml, values.yaml, templates, and comprehensive README
  • Implemented flexible deployment model supporting optional in-cluster PostgreSQL (with pgvector) and Neo4j, or external database connectivity
  • Provided extensive documentation with usage examples for OpenAI, Ollama, and other LLM backends

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 19 comments.

Summary per file:

  • deployments/helm/Chart.yaml: Helm chart metadata defining chart version 0.1.0 and app version v0.2.6
  • deployments/helm/values.yaml: Default configuration values for storage, databases, and the MemMachine app with LLM/embedder settings
  • deployments/helm/templates/secrets.yaml: Kubernetes secrets for PostgreSQL, Neo4j, and OpenAI API credentials
  • deployments/helm/templates/pvc.yaml: PersistentVolumeClaim definitions for Neo4j, PostgreSQL, and MemMachine data
  • deployments/helm/templates/postgres-deployment.yaml: PostgreSQL deployment with the pgvector extension
  • deployments/helm/templates/postgres-service.yaml: ClusterIP service for internal PostgreSQL access
  • deployments/helm/templates/neo4j-deployment.yaml: Neo4j deployment with APOC and Graph Data Science plugins
  • deployments/helm/templates/neo4j-service.yaml: ClusterIP service exposing Neo4j Bolt, HTTP, and HTTPS ports
  • deployments/helm/templates/memmachine-deployment.yaml: Main MemMachine application deployment with init containers for database readiness
  • deployments/helm/templates/memmachine-service.yaml: NodePort service exposing the MemMachine API externally
  • deployments/helm/templates/memmachine-configmaps.yaml: ConfigMaps for application configuration (configuration.yml) and environment variables (.env)
  • deployments/helm/README.md: Comprehensive documentation covering architecture, values reference, prerequisites, and usage examples


Comment on lines +33 to +72
      containers:
        - name: memmachine
          image: {{ .Values.memmachine.image }}:{{ .Values.memmachine.tag }}
          imagePullPolicy: {{ .Values.memmachine.pullPolicy }}
          ports:
            - containerPort: 8080
          env:
            - name: OPENAI_API_KEY
              valueFrom:
                secretKeyRef:
                  name: memmachine-secrets
                  key: OPENAI_API_KEY
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: postgres-secret
                  key: POSTGRES_PASSWORD
            - name: NEO4J_USER
              valueFrom:
                secretKeyRef:
                  name: neo4j-secret
                  key: NEO4J_USER
            - name: NEO4J_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: neo4j-secret
                  key: NEO4J_PASSWORD
            - name: LOG_FILE
              value: "/app/data/memmachine.log"
            - name: MEMORY_CONFIG
              value: {{ .Values.memmachine.config.memoryConfigPath }}
          volumeMounts:
            - name: config-volume
              mountPath: /app/configuration.yml
              subPath: configuration.yml
            - name: env-volume
              mountPath: /app/.env
              subPath: .env
            - name: data-volume
              mountPath: /app/data
Copilot AI commented Feb 24, 2026:
The memmachine deployment lacks readiness and liveness probes. Without these, Kubernetes cannot determine if the application is healthy and ready to serve traffic. Consider adding a readinessProbe that checks the /api/v2/health endpoint and a livenessProbe to ensure the pod is restarted if the application becomes unresponsive. This is especially important given the init containers only check TCP connectivity, not application readiness.
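A minimal sketch of what such probes could look like, using the /api/v2/health endpoint this PR's tests already curl and the container port 8080 from the deployment above. The probe types and timing values are illustrative assumptions, not part of the PR:

```yaml
          # Sketch only: timings are illustrative, tune for real startup behavior.
          readinessProbe:
            httpGet:
              path: /api/v2/health
              port: 8080
            initialDelaySeconds: 15
            periodSeconds: 10
            failureThreshold: 3
          livenessProbe:
            httpGet:
              path: /api/v2/health
              port: 8080
            initialDelaySeconds: 60
            periodSeconds: 20
            failureThreshold: 3
```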

            - name: NEO4J_server_memory_heap_max__size
              value: {{ .Values.neo4j.heap.max }}
            - name: NEO4J_PLUGINS
              value: '{{ .Values.neo4j.plugins | toJson }}'
Copilot AI commented Feb 24, 2026:
The Neo4j deployment lacks readiness and liveness probes. Neo4j can take significant time to start, especially when downloading plugins (APOC, Graph Data Science). Without a readiness probe, the init container in the memmachine deployment may succeed before Neo4j is actually ready to serve queries. Add a readinessProbe that checks the Bolt endpoint or uses an HTTP probe against the /db/data/ endpoint.

Suggested change:
          value: '{{ .Values.neo4j.plugins | toJson }}'
          readinessProbe:
            tcpSocket:
              port: 7687
            initialDelaySeconds: 60
            periodSeconds: 10
            timeoutSeconds: 2
            failureThreshold: 6
          livenessProbe:
            tcpSocket:
              port: 7687
            initialDelaySeconds: 120
            periodSeconds: 20
            timeoutSeconds: 2
            failureThreshold: 3

Comment on lines +33 to +72 (the same container block quoted above)
Copilot AI commented Feb 24, 2026:
The memmachine deployment lacks resource requests and limits. Without these, the pod can consume unlimited CPU and memory, potentially affecting other workloads in the cluster. Additionally, Kubernetes cannot make informed scheduling decisions without resource requests. Consider adding configurable resource limits via values.yaml, with reasonable defaults (e.g., requests: cpu: 500m, memory: 512Mi; limits: cpu: 2000m, memory: 2Gi).
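One way to wire this up, combining Copilot's suggested defaults with a configurable values.yaml block. The memmachine.resources key name and the nindent plumbing are illustrative, not part of the PR:

```yaml
# values.yaml (sketch; the memmachine.resources key is illustrative)
memmachine:
  resources:
    requests:
      cpu: 500m
      memory: 512Mi
    limits:
      cpu: 2000m
      memory: 2Gi

# memmachine-deployment.yaml (sketch): render whatever the user configured
#          resources:
#            {{- toYaml .Values.memmachine.resources | nindent 12 }}
```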

Comment on lines +26 to +30
          image: busybox
          command: ['sh', '-c', 'until nc -zv {{ .Values.postgres.host }} {{ .Values.postgres.port }}; do echo waiting for postgres; sleep 2; done']

        - name: wait-for-neo4j
          image: busybox
Copilot AI commented Feb 24, 2026:
The init containers use the "busybox" image without specifying a version tag. This can lead to unpredictable behavior if the image changes. Additionally, the "busybox" image lacks the "nc" command in recent versions. Consider using "busybox:1.36" or a more reliable image like "alpine:3.18" for the init containers to ensure consistent behavior across deployments.

Suggested change:
          image: busybox:1.36
          command: ['sh', '-c', 'until nc -zv {{ .Values.postgres.host }} {{ .Values.postgres.port }}; do echo waiting for postgres; sleep 2; done']

        - name: wait-for-neo4j
          image: busybox:1.36

Comment on lines +460 to +476
### Via deployment manager (multi-tenant setups)

The `deploy_cli.py` script in the parent directory handles NodePort allocation, namespace creation, and service registry automatically:

```bash
# Deploy with Ollama backend
python deploy_cli.py deploy-memmachine-ollama ollama-30000

# Deploy with OpenAI backend
python deploy_cli.py deploy-memmachine-openai --openaiApiKey sk-...

# Deploy with vLLM backend
python deploy_cli.py deploy-memmachine-vllm vllm-chat-31000 vllm-embedder-31003
```

---

Copilot AI commented Feb 24, 2026:
The README references a "deploy_cli.py" script in the parent directory, but this script does not exist in the repository. Either this script needs to be added as part of this PR, or this section should be removed from the README to avoid confusing users who follow the documentation.

Suggested change: delete the entire "Via deployment manager (multi-tenant setups)" section, including the deploy_cli.py examples.

  configuration.yml: |
    logging:
      path: /app/data/memmachine.log
      level: info
Copilot AI commented Feb 24, 2026:
The logging level in configuration.yml is hardcoded to "info" (line 25) while FAST_MCP_LOG_LEVEL uses the configurable value from values.yaml (line 114). For consistency and flexibility, consider making the configuration.yml logging level also configurable via values.yaml using {{ .Values.memmachine.config.loggingLevel | lower }} to match the environment variable pattern.

Suggested change:
      level: {{ .Values.memmachine.config.loggingLevel | lower }}

                  name: postgres-secret
                  key: POSTGRES_PASSWORD
            - name: POSTGRES_INITDB_ARGS
              value: "--encoding=UTF-8 --lc-collate=C --lc-ctype=C"
Copilot AI commented Feb 24, 2026:
The PostgreSQL deployment lacks readiness and liveness probes. Without these, the init containers in the memmachine deployment may succeed (TCP connection works) before PostgreSQL is actually ready to accept queries. Add a readinessProbe using pg_isready or a simple SQL query to ensure the database is fully initialized before dependent services connect.

Suggested change:
              value: "--encoding=UTF-8 --lc-collate=C --lc-ctype=C"
          readinessProbe:
            exec:
              command:
                - /bin/sh
                - -c
                - pg_isready -U "$POSTGRES_USER" -d "$POSTGRES_DB"
            initialDelaySeconds: 10
            periodSeconds: 5
            timeoutSeconds: 5
            failureThreshold: 6
          livenessProbe:
            exec:
              command:
                - /bin/sh
                - -c
                - pg_isready -U "$POSTGRES_USER" -d "$POSTGRES_DB"
            initialDelaySeconds: 30
            periodSeconds: 10
            timeoutSeconds: 5
            failureThreshold: 3

Comment on lines +16 to +51
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: {{ .Values.pvcSize }}
  storageClassName: {{ .Values.storageClass }}
{{- end }}
{{- if .Values.postgres.enabled }}
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-pvc
  namespace: {{ $namespace }}
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: {{ .Values.pvcSize }}
  storageClassName: {{ .Values.storageClass }}
{{- end }}
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: memmachine-pvc
  namespace: {{ $namespace }}
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: {{ .Values.pvcSize }}
  storageClassName: {{ .Values.storageClass }}
Copilot AI commented Feb 24, 2026:
All PVCs use ReadWriteMany (RWX) access mode, but this is not necessary for single-replica deployments. PostgreSQL and Neo4j both run with replicas: 1 and only need ReadWriteOnce (RWO) access. RWX requires more expensive storage classes (like NFS) and can limit deployment options. Consider using RWO for database PVCs and only using RWX for memmachine-pvc if there's a specific need for multi-pod access. This would make the chart compatible with more storage providers.
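One way to act on this would be a per-component accessModes value rendered into the PVC template. The key names below are illustrative, not from the chart:

```yaml
# values.yaml (sketch; key names are illustrative)
postgres:
  pvc:
    accessModes:
      - ReadWriteOnce   # single-replica database: RWO is sufficient

# pvc.yaml template (sketch): render the configured modes
# spec:
#   accessModes:
#     {{- toYaml .Values.postgres.pvc.accessModes | nindent 4 }}
```

This keeps the NFS-backed RWX setup from the test run working (users can still set ReadWriteMany) while letting clusters with only RWO-capable storage classes install the chart.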

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Steve Scargall <37674041+sscargal@users.noreply.github.com>
Successfully merging this pull request may close these issues:

[Feat]: Support k8s deploy with helm charts