Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added parser for /proc/[pid]/environ #28

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

mexus
Copy link
Contributor

@mexus mexus commented Aug 26, 2017

Hi!

First of all, thanks for a great and simple library!
I though it'd be nice to contribute to the project so I found out that there is no parser for a /proc/[pid]/environ file and implemented it :)

Please consider to review & merge if it's okay.

Copy link
Owner

@danburkert danburkert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In terms of the API - what do you think about modeling it after VarOs, an iterator of (OsString, OsString)?

/// Parses the provided environ file.
fn environ_path<P: AsRef<Path>>(path: P) -> Result<BTreeMap<Vec<u8>, Vec<u8>>> {
let file_size = metadata(&path)?.len();
let mut buf = vec![0u8; file_size as usize];
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is racy with updates to the env, right? I think it would be better and simpler just to use File::read_to_end.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From man 5 proc:

If, after an execve(2), the process modifies its environment (e.g., by calling functions such as putenv(3) or modifying the environ(7) variable directly), this file  will not reflect those changes.

So the only race condition I can think of is the process is closed in between getting the size and reading the file.. But still you are right, will fix.

@mexus
Copy link
Contributor Author

mexus commented Aug 26, 2017

I totally forgot about VarOs! Shame on me, shame on me..

@mexus
Copy link
Contributor Author

mexus commented Aug 27, 2017

Updated the patch. Unfortunately VarOs can't be built but with std::env::vars_os :(. So I went with Vec<(OsString, OsString)>.

@danburkert
Copy link
Owner

Sorry for not being clear, I was suggesting creating a new type like std::env::VarOs (it could even be named procinfo::environ::VarOs). The advantage of an iterator is that it can be collected into a vec or a map easily.

@danburkert
Copy link
Owner

actually in terms of naming I think either of these would be fine:

procinfo::pid::Environ
procinfo::pid::VarOs

each has advantages, what do you think?

@mexus
Copy link
Contributor Author

mexus commented Aug 27, 2017

Ah, np. Personally I would rather go with procinfo::pid::Environ since it sounds easier to guess what it is (IMHO), but I'd say the choice is yours.
Btw, regardless of the naming, do you want it to be a full-featured struct or just a typedef? I don't see any advantages with a separate class but there might be some I just can't think of.

@danburkert
Copy link
Owner

danburkert commented Aug 27, 2017 via email

@danburkert
Copy link
Owner

danburkert commented Aug 27, 2017 via email

@mexus
Copy link
Contributor Author

mexus commented Aug 27, 2017

Well, I though about a simple type Environ = Vec<(OsString, OsString)>. Do you think it won't suffice? It seems to be working fine…

@danburkert
Copy link
Owner

danburkert commented Aug 27, 2017 via email

@mexus
Copy link
Contributor Author

mexus commented Aug 27, 2017

Anything else? :)

use libc::pid_t;

/// A list of environment variables and their values.
pub type Environ = Vec<(OsString, OsString)>;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was hoping there was a way to avoid collecting into an intermediate Vec, but I spent a while on it and didn't find an elegant way. Since the API's just exposing a vec, I think we should skip the type def and just return Vec<(OsString, OsString)>.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm. I'll think a bit more on this.

// Parse 'key=value" pair.
named!(
pair<&[u8], (&[u8], &[u8])>,
tuple!(take_until_and_consume!("="), take_until!("\0"))
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is going to have different behavior that the standard library when the key contains a leading '=' character (https://github.com/rust-lang/rust/blob/1.19.0/src/libstd/sys/unix/os.rs#L405-L408). Can this skip a character first to avoid this?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, please add a test case for this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not quite sure where did they get that "strategy" from. Let's say setenv function explicitly checks that the name doesn't have an equal sign inside: https://github.com/lattera/glibc/blob/a2f34833b1042d5d8eeb263b4cf4caaea138c4ad/stdlib/setenv.c#L301

Also https://stackoverflow.com/a/2821183/1449426 has nice answers with links and everything on the matter.

While it's definitely possible to support such a behaviour it will make the code to be more complex, and my humble opinion is that one should avoid unnecessary code complication.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nonetheless it appears to be a real issue. On my up-to-date linux machine (kernel 4.11) the output of env =FOO=BAR xxd /proc/self/environ shows that leading = characters can appear.

);
multi(input).map(|r| {
r.into_iter()
.map(|(x, y)| {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you move this map call into the pair function, the number of vectors constructed can be reduced from two to one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mm, I have there a tuple of two values. How can I go through them with a single call of the map? Although it definitely makes sense to apply the map right when the values are obtained (it makes the code to be more clear indeed).

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The full code context has disappeared (thanks GitHub), but I think the idea was that you could use a map! call inside the parsing code, instead of the Iterator::map call here.

@mexus
Copy link
Contributor Author

mexus commented Aug 27, 2017

Updated the patch again. This time I used the streams (Iterator) concept.

@mexus
Copy link
Contributor Author

mexus commented Aug 27, 2017

Oh man that old rust compiler makes me sick

@mexus mexus force-pushed the master branch 3 times, most recently from 5e4d63d to 4f09397 Compare August 27, 2017 13:09
@danburkert
Copy link
Owner

Oh man that old rust compiler makes me sick

Not sure what you mean by this, I was debating whether to ask you just to copy that code wholesale. The memchr solution is going to be significantly faster, but it's probably not worth bringing in a dependency for.

Updated the patch again. This time I used the streams (Iterator) concept.

The new code is allocating more vectors than the previous version, one for each entry.

@mexus
Copy link
Contributor Author

mexus commented Aug 27, 2017

The new code is allocating more vectors than the previous version, one for each entry

Yeah, you're totally right, I've overlooked it somehow :(

Okay, seems like I have to put some (probably a lot of) thought into the matter.

@danburkert
Copy link
Owner

As I mentioned, I thought your previous cut was very close. Here's the change from .map to map! which I was hinting at:

/// Parses the provided buffer.
fn parse_environ(input: &[u8]) -> IResult<&[u8], Vec<(OsString, OsString)>> {
    // Parse 'key=value" pair.
    named!(pair<&[u8], (OsString, OsString)>,
        map!(tuple!(take_until_and_consume!("="), take_until!("\0")),
             |(x, y): (&[u8], &[u8])| {
                 (OsString::from(OsStr::from_bytes(x)), OsString::from(OsStr::from_bytes(y)))
             }));

    terminated!(input,
        separated_list!(tag!("\0"), pair),
        tag!("\0")
    )
}

That brings it down from two to one vector allocated. Unfortunately this still isn't covering the leading = case, though. I think for that it won't be able to use take_until_and_consume! (but maybe there's some nom magic I'm not aware of).

match pair(&self.data) {
IResult::Done(leftover, parsed) => Some(Ok((leftover.to_vec(), parsed))),
/// An iterator over environment variables.
pub struct EnvironIter<'a> {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm in doubt about naming and whether the structure should be exported to a top-level (pid::EnvironIter) or not (I think not).

// Calculate position of the equal sign.
let pos = match src.iter().skip(1).position(|c| c == &b'=') {
Some(p) => p,
None => return IResult::Error(error_position!(nom::ErrorKind::Custom(0), src)),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a fishy part, not sure which error to return.

assert!(next.is_some());
let next = next.unwrap();
assert!(next.is_ok());
let next = next.unwrap();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At first I was doing this unwraps w/o is_* checks and it ended up in a mess when an actual error has happened.

@mexus
Copy link
Contributor Author

mexus commented Aug 27, 2017

That brings it down from two to one vector allocated

Pls have a look at yet another version. Sorry for keeping you busy with this.

@mexus
Copy link
Contributor Author

mexus commented Sep 1, 2017

Hi Dan,
did you happen to have a time to have a look at the latest patch?

Also I'd like to make things clear: do you want me to go with allocating a vector that contains all the pairs in one go, or do you like the latest streaming approach? I would rather go with iterators since it's the most idiomatic way as for me. I think I fixed all the extra allocations and manipulations so it should be fine given that I didn't miss anything this time.

@mexus
Copy link
Contributor Author

mexus commented Sep 6, 2017

Since so much time has passed you've probably forgotten how did it start so I squashed everything together.

Copy link
Owner

@danburkert danburkert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left comments assuming an Iterator based API. I still think it may be better just to use a vector API, since that way the parsing can be done up front, and a single error returned, instead of potentially returning an error for every pair.

}

/// An iterator over environment variables.
pub struct EnvironIter<'a> {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can Environ and EnvironIter be combined into a single type? I think it should be possible just using a Vec<u8> data field and a usize offset.

type Item = Result<(OsString, OsString)>;

fn next(&mut self) -> Option<Self::Item> {
if self.data.len() == 0 {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

prefer is_empty.



#[test]
fn test_iter() {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wrap all of the tests in a #[cfg(test)] mod tests { like other files, and put them all at the end of the file.

/// Executes `take_until_and_consume` and maps its result to OsString.
named_args!(
until<'a>(sep: &'a str)<OsString>,
map!(take_until_and_consume!(sep), from_bytes)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the body here is simple enough just to inline into pair.

@mexus
Copy link
Contributor Author

mexus commented Feb 17, 2018

Sorry, I totally forgot about this pr. I think I will reiterate it once again in a couple of days considering all the review comments.

@mexus
Copy link
Contributor Author

mexus commented Feb 17, 2018

I've kept EnvironIter in order to pass the data to a user as OsStrs instead of allocating OsStrings every time.

I've also put some examples in the test module that show how to easily collect all the name-value pairs into different collections (without having to handle errors for every given pair).

@mexus
Copy link
Contributor Author

mexus commented Sep 15, 2018

Hi @danburkert ! What do you think about my edits? Should I make some further amendments/improvements?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants