Conversation
WalkthroughThis PR adds an optional Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes
Possibly related issues
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
Trivy has completed a full security repository scan ✅ You can find the analysis results for this PR branch on this overview.
|
There was a problem hiding this comment.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/writers/writer_postgres.py (1)
187-221: Align jobs INSERT column order with spec: placecountryaftercatalog_id.In the jobs INSERT,
countrycurrently appears beforecatalog_id:event_id, country, catalog_id, ...The PR objectives require inserting
countryaftercatalog_id. While named columns keep the query functionally correct, it’s better to match the requested order and what readers will expect.You can adjust the column and value order like this:
- INSERT INTO {table_jobs} - ( - event_id, - country, - catalog_id, - status, - timestamp_start, - timestamp_end, - message, - additional_info - ) + INSERT INTO {table_jobs} + ( + event_id, + catalog_id, + country, + status, + timestamp_start, + timestamp_end, + message, + additional_info + ) VALUES ( %s, %s, %s, %s, %s, %s, %s, %s )""", ( - message["event_id"], - job.get("country", ""), - job["catalog_id"], + message["event_id"], + job["catalog_id"], + job.get("country", ""), job["status"], job["timestamp_start"], job["timestamp_end"], job.get("message"), (json.dumps(job.get("additional_info")) if "additional_info" in job else None), ),This keeps the default behavior (
job.get("country", "")) while aligning the position with the spec.tests/writers/test_writer_postgres.py (1)
92-135: Update runs test indices to matchcountrybeing aftercatalog_idin jobs INSERT.Once
postgres_run_writeis updated to placecountryaftercatalog_id, the job parameter indices in this test should reflect:
- index 1 →
catalog_id- index 2 →
country- indices 6 and 7 remain
messageandadditional_info.Concretely:
- # Check first job - _job1_sql, job1_params = cur.executions[1] - assert job1_params[1] == "" - assert job1_params[2] == "c1" + # Check first job + _job1_sql, job1_params = cur.executions[1] + assert job1_params[1] == "c1" + assert job1_params[2] == "" @@ - # Check second job - _job2_sql, job2_params = cur.executions[2] - assert job2_params[1] == "bw" - assert job2_params[2] == "c2" + # Check second job + _job2_sql, job2_params = cur.executions[2] + assert job2_params[1] == "c2" + assert job2_params[2] == "bw" assert job2_params[6] == "err" assert json.loads(job2_params[7]) == {"k": "v"}This keeps the existing coverage while bringing the tests in line with the intended column order.
🧹 Nitpick comments (1)
conf/topic_runs.json (1)
42-45: Movecountrybelowcatalog_idto match the stated schema ordering.The issue description asks for
countryto be positioned aftercatalog_idinjobs.items.properties. Here it appears beforecatalog_id. While JSON property order is not semantic, aligning the order with the spec and DB inserts will reduce confusion.You could reorder as:
- "properties": { - "country": { - "type": "string", - "description": "The country the data is related to." - }, - "catalog_id": { + "properties": { + "catalog_id": { "type": "string", "description": "Identifier for the data definition (Glue/Hive) database and table name for example" + }, + "country": { + "type": "string", + "description": "The country the data is related to." },
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
conf/topic_dlchange.json(1 hunks)conf/topic_runs.json(1 hunks)src/writers/writer_postgres.py(5 hunks)tests/writers/test_writer_postgres.py(5 hunks)
🔇 Additional comments (3)
conf/topic_dlchange.json (1)
28-31: EDLAcountryschema addition looks correct and aligned with the spec.
countryis optional, typed as string, and positioned immediately aftertimestamp_eventwithout being added torequired, matching the stated objectives.src/writers/writer_postgres.py (1)
90-137: EDLA writer’scountrycolumn and default handling look correct.
countryis inserted immediately aftertimestamp_eventand usesmessage.get("country", "")so missing values fall back to an empty string as required. The parameter ordering in the VALUES tuple matches the column list and the updated tests.tests/writers/test_writer_postgres.py (1)
41-90: EDLA tests correctly cover both presence and defaulting ofcountry.The tests assert the new parameter count, verify
countryat index 6 when present, and confirm the empty-string default and index shifts when optional fields are omitted. This aligns tightly with the EDLA writer behavior and objectives.
Summary
This pull request adds support for a new
countryfield across the data ingestion pipeline. The main goal is to ensure that country information is properly captured, stored, and validated for both event and job records.Release Notes:
Closes #79
Summary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings.