Conversation
Host scanning now uses globs to only get inodes for the specific files
matching the globs.
Prefix map is populated with the longest prefix for each glob
e.g. /etc/**/*.conf -> /etc/
/home/user/.ssh/id_{rsa,dsa} -> /home/user/.ssh/id_
Kernel captures events based on inode first and then prefix match (this
behavior is unchanged) and then userspace does a glob match on the path
and host_path.
Molter73
left a comment
There was a problem hiding this comment.
Awesome! Thanks for tackling this!!
fact-ebpf/src/lib.rs
Outdated
| let len = if filename_prefix.len() > LPM_SIZE_MAX as usize { | ||
| LPM_SIZE_MAX as usize | ||
| } else { | ||
| filename.len() | ||
| filename_prefix.len() | ||
| }; |
There was a problem hiding this comment.
This one is on me for not knowing about this before, but:
let len = filename_prefix.len().min(LPM_SIZE_MAX as usize);
fact-ebpf/src/lib.rs
Outdated
| // Take the start of the path until the first occurence of a wildcard | ||
| // character. This is used as a filter in the kernel in cases where | ||
| // the inode has failed to match. The full wildcard string is used | ||
| // for further processing in userspace. | ||
| let filename_prefix = | ||
| if let Some(wildcard_idx) = filename.chars().position(|c| "*?[]{}".contains(c)) { | ||
| &filename[..wildcard_idx] | ||
| } else { | ||
| // if there are no wildcards then the whole path can be | ||
| // the prefix | ||
| filename | ||
| }; |
There was a problem hiding this comment.
Would this work?
| // Take the start of the path until the first occurence of a wildcard | |
| // character. This is used as a filter in the kernel in cases where | |
| // the inode has failed to match. The full wildcard string is used | |
| // for further processing in userspace. | |
| let filename_prefix = | |
| if let Some(wildcard_idx) = filename.chars().position(|c| "*?[]{}".contains(c)) { | |
| &filename[..wildcard_idx] | |
| } else { | |
| // if there are no wildcards then the whole path can be | |
| // the prefix | |
| filename | |
| }; | |
| let filename_prefix = filename.split(['*', '?', '[', '{']).next().unwrap(); |
From the docs of str::split:
The [pattern] can be a `&str`, [`char`], a slice of [`char`]s, or a
function or closure that determines if a character matches.
If there are no matches the full string slice is returned as the only
item in the iterator.
Also, I don't think we need the closing patterns, finding one of those would mean we have a misformed glob expression (or we are somehow looking for the explicit character).
fact/src/bpf/mod.rs
Outdated
| let mut new_paths = Vec::with_capacity(paths_config.len()); | ||
| let mut builder = GlobSetBuilder::new(); | ||
| for p in paths_config.iter() { | ||
| builder.add(Glob::new(&p.to_string_lossy())?); |
There was a problem hiding this comment.
We probably want to hard fail if a configured path is wrong, if we change the string at this point we might not match the strings configured by a user and we will not report things in there.
fact/src/bpf/mod.rs
Outdated
| // using short circuiting to avoid calling is_match in all | ||
| // scenarios | ||
| if self.paths_globset.is_match(event.get_filename()) || | ||
| self.paths_globset.is_match(event.get_host_path()) { |
There was a problem hiding this comment.
host_path is set on the host_scanner component, it will always be empty coming out of the kernel. We might want to change the kernel side a bit to only include the inode of a file when a match happens, then we can change this check to be something like:
if !event.get_inode().empty() || self.paths_globset.is_match(event.get_filename()) {In this scenario inode_key_t::empty() would check if the inode number and device are both 0.
fact/src/host_scanner.rs
Outdated
| let entry = entry.path(); | ||
| self.scan_inner(&entry) | ||
| .with_context(|| format!("Failed to scan {}", entry.display()))?; | ||
| glob::glob(&path.to_string_lossy())?.try_for_each(|entry| match entry { |
There was a problem hiding this comment.
Would a regular for loop be a bit more readable?
Molter73
left a comment
There was a problem hiding this comment.
Changes look good, can we add at least one integration test with globs? Just to make sure it is working, we can always expand later.
Description
Host scanning now uses globs to only get inodes for the specific files matching the globs.
Prefix map is populated with the longest prefix for each glob e.g. /etc/**/*.conf -> /etc/
/home/user/.ssh/id_{rsa,dsa} -> /home/user/.ssh/id_
Kernel captures events based on inode first and then prefix match (this behavior is unchanged) and then userspace does a glob match on the path and host_path.
Somewhat relies on a chain of PRs in the main repo (in merge order):
stackrox/stackrox#19057
stackrox/stackrox#19063
stackrox/stackrox#19089
Checklist
Automated testing
If any of these don't apply, please comment below.
Testing Performed
TODO(replace-me)
Use this space to explain how you tested your PR, or, if you didn't test it, why you did not do so. (Valid reasons include "CI is sufficient" or "No testable changes")
In addition to reviewing your code, reviewers must also review your testing instructions, and make sure they are sufficient.
For more details, ref the Confluence page about this section.