Skip to content

Comments

Moves glob/wildcard matching into Fact.#323

Draft
Stringy wants to merge 3 commits intomainfrom
giles/wildcard-support
Draft

Moves glob/wildcard matching into Fact.#323
Stringy wants to merge 3 commits intomainfrom
giles/wildcard-support

Conversation

@Stringy
Copy link
Contributor

@Stringy Stringy commented Feb 23, 2026

Description

Host scanning now uses globs to only get inodes for the specific files matching the globs.

Prefix map is populated with the longest prefix for each glob e.g. /etc/**/*.conf -> /etc/
/home/user/.ssh/id_{rsa,dsa} -> /home/user/.ssh/id_

Kernel captures events based on inode first and then prefix match (this behavior is unchanged) and then userspace does a glob match on the path and host_path.

Somewhat relies on a chain of PRs in the main repo (in merge order):

stackrox/stackrox#19057
stackrox/stackrox#19063
stackrox/stackrox#19089

Checklist

  • Investigated and inspected CI test results
  • Updated documentation accordingly

Automated testing

  • Added unit tests
  • Added integration tests
  • Added regression tests

If any of these don't apply, please comment below.

Testing Performed

TODO(replace-me)
Use this space to explain how you tested your PR, or, if you didn't test it, why you did not do so. (Valid reasons include "CI is sufficient" or "No testable changes")
In addition to reviewing your code, reviewers must also review your testing instructions, and make sure they are sufficient.

For more details, ref the Confluence page about this section.

Host scanning now uses globs to only get inodes for the specific files
matching the globs.

Prefix map is populated with the longest prefix for each glob
e.g. /etc/**/*.conf -> /etc/
     /home/user/.ssh/id_{rsa,dsa} -> /home/user/.ssh/id_

Kernel captures events based on inode first and then prefix match (this
behavior is unchanged) and then userspace does a glob match on the path
and host_path.
Copy link
Contributor

@Molter73 Molter73 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome! Thanks for tackling this!!

Comment on lines 48 to 52
let len = if filename_prefix.len() > LPM_SIZE_MAX as usize {
LPM_SIZE_MAX as usize
} else {
filename.len()
filename_prefix.len()
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is on me for not knowing about this before, but:

let len = filename_prefix.len().min(LPM_SIZE_MAX as usize);

Comment on lines 34 to 45
// Take the start of the path until the first occurence of a wildcard
// character. This is used as a filter in the kernel in cases where
// the inode has failed to match. The full wildcard string is used
// for further processing in userspace.
let filename_prefix =
if let Some(wildcard_idx) = filename.chars().position(|c| "*?[]{}".contains(c)) {
&filename[..wildcard_idx]
} else {
// if there are no wildcards then the whole path can be
// the prefix
filename
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this work?

Suggested change
// Take the start of the path until the first occurence of a wildcard
// character. This is used as a filter in the kernel in cases where
// the inode has failed to match. The full wildcard string is used
// for further processing in userspace.
let filename_prefix =
if let Some(wildcard_idx) = filename.chars().position(|c| "*?[]{}".contains(c)) {
&filename[..wildcard_idx]
} else {
// if there are no wildcards then the whole path can be
// the prefix
filename
};
let filename_prefix = filename.split(['*', '?', '[', '{']).next().unwrap();

From the docs of str::split:

The [pattern] can be a `&str`, [`char`], a slice of [`char`]s, or a
function or closure that determines if a character matches.

If there are no matches the full string slice is returned as the only
item in the iterator.

Also, I don't think we need the closing patterns, finding one of those would mean we have a misformed glob expression (or we are somehow looking for the explicit character).

let mut new_paths = Vec::with_capacity(paths_config.len());
let mut builder = GlobSetBuilder::new();
for p in paths_config.iter() {
builder.add(Glob::new(&p.to_string_lossy())?);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably want to hard fail if a configured path is wrong, if we change the string at this point we might not match the strings configured by a user and we will not report things in there.

// using short circuiting to avoid calling is_match in all
// scenarios
if self.paths_globset.is_match(event.get_filename()) ||
self.paths_globset.is_match(event.get_host_path()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

host_path is set on the host_scanner component, it will always be empty coming out of the kernel. We might want to change the kernel side a bit to only include the inode of a file when a match happens, then we can change this check to be something like:

if !event.get_inode().empty() || self.paths_globset.is_match(event.get_filename()) {

In this scenario inode_key_t::empty() would check if the inode number and device are both 0.

let entry = entry.path();
self.scan_inner(&entry)
.with_context(|| format!("Failed to scan {}", entry.display()))?;
glob::glob(&path.to_string_lossy())?.try_for_each(|entry| match entry {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would a regular for loop be a bit more readable?

@Stringy Stringy requested a review from Molter73 February 23, 2026 15:27
Copy link
Contributor

@Molter73 Molter73 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look good, can we add at least one integration test with globs? Just to make sure it is working, we can always expand later.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants