Add oscap_source_get_streaming_xmlTextReader() that creates an xmlTextReader directly from file contents or memory buffer without loading the full XML DOM first. For file-based sources, the file is read into a memory buffer and parsed with xmlReaderForMemory. For memory-based sources, the buffer is parsed directly. BZ2-compressed sources fall back to the existing DOM-based path. Also switch oscap_source_get_scap_type() and oscap_source_get_schema_version() to use the streaming reader, avoiding unnecessary DOM construction for document type detection and schema version extraction.
Switch oval_definition_model, oval_syschar_model, oval_variable_model, oval_directives_model, and oval_results_model import functions to use oscap_source_get_streaming_xmlTextReader() instead of oscap_source_get_xmlTextReader(). This avoids loading the full XML DOM into memory when importing OVAL documents, since the OVAL parsers only use streaming-compatible xmlTextReader API calls.
Instead of keeping cloned DOM trees for extracted DataStream components, serialize them to compact XML text buffers via xmlDocDumpMemory() and immediately free the cloned DOM. The component oscap_source is then created from the memory buffer using oscap_source_new_take_memory(). This reduces peak memory during SDS decomposition because serialized XML text is typically 3-5x smaller than its libxml2 DOM representation. The streaming xmlTextReader can also parse directly from these buffers without constructing an intermediate DOM.
Release the xmlDoc held by OVAL and XCCDF sources as soon as the corresponding object models have been built from them. In xccdf_session_load_oval(), call oscap_source_free_xmlDoc() on each OVAL source right after oval_definition_model_import_source(). In _xccdf_session_load_xccdf_benchmark(), free the XCCDF source DOM right after xccdf_benchmark_import_source(). This eliminates the window where both the XML DOM and the parsed object model coexist in memory during the loading phase.



Avoid keeping full libxml2 DOM trees in memory when only the parsed object model is needed. The Source DataStream XML (typically 20+ MB) was previously held as a DOM for the entire evaluation lifetime. Now it is parsed via a streaming xmlTextReader and freed early.
It includes the following changes: