PYTHON-5449 - Do not attach invalid document in exception message#2539
PYTHON-5449 - Do not attach invalid document in exception message#2539NoahStapp merged 7 commits intomongodb:masterfrom
Conversation
bson/errors.py
Outdated
There was a problem hiding this comment.
Please add a versionadded and changelog reference.
bson/errors.py
Outdated
There was a problem hiding this comment.
Does this need to be public?
There was a problem hiding this comment.
I can see this being a useful behavior for users, so making this public seems reasonable.
There was a problem hiding this comment.
Sorry, I was suggesting we make it private, not public.
There was a problem hiding this comment.
What's the reasoning for making it private? Security concerns if users misuse it? I find having the full document very useful for debugging and I imagine users do too.
There was a problem hiding this comment.
I agree the getter should be public but when would we expect a user to set the document? We try to be conservative with public apis and only add ones that have a strong justification.
There was a problem hiding this comment.
Oh whoops sorry I misread which method this was about. Totally agree, fixing.
| goto cleanup; | ||
| } | ||
| PyObject *new_msg = PyUnicode_FromFormat("Invalid document %s | %s", dict_str_utf8, msg_utf8); | ||
| PyObject *new_msg = PyUnicode_FromFormat("Invalid document: %s", msg_utf8); |
There was a problem hiding this comment.
For an invalid field in a nested doc, does this yield errors like "Invalid document: Invalid document: Invalid document: Invalid document: Invalid document: ...."?
There was a problem hiding this comment.
No, it works as expected:
encode({"t": {"f": Wrapper(1)}})
# Produces
Invalid document: cannot encode object: 1, of type: <class 'test.test_bson.TestBSON.test_doc_in_invalid_document_error_as_property.<locals>.Wrapper'>
| elements.append(_element_to_bson(key, value, check_keys, opts)) | ||
| except InvalidDocument as err: | ||
| raise InvalidDocument(f"Invalid document {doc} | {err}") from err | ||
| raise InvalidDocument(f"Invalid document: {err}", doc) from err |
There was a problem hiding this comment.
Shouldn't this InvalidDocument except clause include the encoding of _id too?
There was a problem hiding this comment.
That would be a pre-existing bug, no? Happy to fix if we identify why the code currently looks like this. Maybe we assume that _id will be valid if it reaches this far?
There was a problem hiding this comment.
Correct it would be a pre-existing bug. Could you open a ticket for it?
>>>> encode({'_id': set()})
Traceback (most recent call last):
File "<python-input-6>", line 1, in <module>
encode({'_id': set()})
File "/Users/shane/git/mongo-python-driver/bson/__init__.py", line 1053, in encode
return _dict_to_bson(document, check_keys, codec_options)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/shane/git/mongo-python-driver/bson/__init__.py", line 1006, in _dict_to_bson
elements.append(_name_value_to_bson(b"_id\x00", doc["_id"], check_keys, opts))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/shane/git/mongo-python-driver/bson/__init__.py", line 980, in _name_value_to_bson
raise InvalidDocument(f"cannot encode object: {value!r}, of type: {type(value)!r}")
bson.errors.InvalidDocument: cannot encode object: set(), of type: <class 'set'>vs the expected behavior:
>>>> encode({'_id': 1, 'foo': set()})
Traceback (most recent call last):
File "/Users/shane/git/mongo-python-driver/bson/__init__.py", line 1010, in _dict_to_bson
elements.append(_element_to_bson(key, value, check_keys, opts))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/shane/git/mongo-python-driver/bson/__init__.py", line 994, in _element_to_bson
return _name_value_to_bson(name, value, check_keys, opts)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/shane/git/mongo-python-driver/bson/__init__.py", line 980, in _name_value_to_bson
raise InvalidDocument(f"cannot encode object: {value!r}, of type: {type(value)!r}")
bson.errors.InvalidDocument: cannot encode object: set(), of type: <class 'set'>
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<python-input-7>", line 1, in <module>
encode({'_id': 1, 'foo': set()})
File "/Users/shane/git/mongo-python-driver/bson/__init__.py", line 1053, in encode
return _dict_to_bson(document, check_keys, codec_options)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/shane/git/mongo-python-driver/bson/__init__.py", line 1012, in _dict_to_bson
raise InvalidDocument(f"Invalid document {doc} | {err}") from err
bson.errors.InvalidDocument: Invalid document {'_id': 1, 'foo': set()} | cannot encode object: set(), of type: <class 'set'>
bson/_cbsonmodule.c
Outdated
There was a problem hiding this comment.
Does this need to be "_document" now?
There was a problem hiding this comment.
This also highlights that this PR needs to add a test to ensure .document is the expected dict.
There was a problem hiding this comment.
Interestingly the tests fail if this is _document.
There was a problem hiding this comment.
Are the two modified tests in this PR not sufficient to ensure that .document behaves as expected?
There was a problem hiding this comment.
This line is setting a class attribute on the InvalidDocument class so the logic is incorrect, eg:
InvalidDocument.document = dictThere was a problem hiding this comment.
Ah sorry I misunderstood what you meant. You're right that _error returns a class, not an instance.
There was a problem hiding this comment.
Fixed. Had to do some C API delving to figure this out 😅
| } | ||
| PyObject *new_msg = PyUnicode_FromFormat("Invalid document: %s", msg_utf8); | ||
| // Add doc to the error instance as a property. | ||
| PyObject *new_evalue = PyObject_CallFunctionObjArgs(InvalidDocument, new_msg, dict, NULL); |
There was a problem hiding this comment.
Need to check the new_msg == NULL case before we call PyObject_CallFunctionObjArgs here.
ShaneHarvey
left a comment
There was a problem hiding this comment.
I was skeptical of the reference counting in this error handling code so I created this script:
from bson import encode
from bson.errors import InvalidDocument
def main():
small_str = "x" * 1024
while True:
for _ in range (10000):
doc = {"data": small_str, "invalid": set()}
try:
encode(doc)
raise Exception("Shouldn't get here")
except InvalidDocument as exc:
pass
# assert exc.document == doc
if __name__ == "__main__":
main()Running it shows a memory leak both before and after these changes. Should we fix this leak in this PR or another?
bson/errors.py
Outdated
There was a problem hiding this comment.
_set_document can be removed.
|
Update, the leak already exists so I opened https://jira.mongodb.org/browse/PYTHON-5571. |
…ngodb#2539) (cherry picked from commit 266caf0)
This PR removes the full document from the error message and stores it in the
documentproperty on the error instead.