Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Indentation and completion improvements #15

Open
wants to merge 20 commits into
base: main
Choose a base branch
from

Conversation

alexsmith1612
Copy link

This contains various improvements to the indentation/alignment and
completion systems.

There are some things that I left in place but were obsoleted and
could potentially be removed (e.g. beancount-indent-transaction has
been superseded by the more general beancount-indent-directive) or for
which I added an option to change existing behavior but we could
potentially just only use the new behavior for (e.g. always aligning
numbers when indenting a line via beancount-tab-dwim--I'm not sure
why one wouldn't want to align numbers while indenting just in this
one case).

Another thing to note is that the new indentation logic assumes that
the only lines which (after potential leading whitespace) begin with a
timestamp are timestamped directives, so if/when postings with
timestamps are added to beancount this logic will need to be updated.

In general, there are a number of behavior decisions that I made that
may warrant discussion.

This probably could be broken up into multiple smaller PRs, so I can
resubmit as such if you'd like.

I'll also mention that, while I'm not new to programming in general,
I'm a complete novice at elisp, so I may have accidentally done some
non-canonical things or not followed standard elisp conventions/best
practices.

Thank you,
Alex

Now any directive, timestamped or not, will be indented to the
beginning of the line. Previously, if a directive followed a
transaction without an empty line separating them, the directive would
be indented as if it were part of the transaction.
Previously, metadata for a posting would be indented to the same level
as the posting
Set fill-paragraph-function to this more generic function instead of
to beancount-indent-transaction
- metadata indentation
- directive indentation
- open directive currency alignment
- price and balance directive number alignment
Previously the unit test invoked (goto-char) with a position (15) past the
end of the temporary buffer, and certainly not within the link. I
believe this was a simple typo, as position 5 is within the link in
the temporary buffer.
Previously, imenu was not being loaded when running beancount-test.el
in batch mode, causing the imenu-xxx tests to fail due to
undefined/void functions being invoked.
Previously there were certain completions (non-timestamped directives,
options, and timestamped directives) which would insert a space
afterward--different behavior from the rest of the completions.

Additionally this trailing space was unconditionally inserted into the
journal, even if the buffer already included the full
completion-with-trailing-space. This behavior was due to the space not
being included in the matching regexp, and thus not being included in
the (match-beginning) (match-end) bounds. If we do want to keep the
completion-adds-trailing-space behavior, the matching regexps should
be updated to include the optional trailing space.

Finally, allow completions on bare "^option " lines which do not yet
have an opening quote.
Now only match empty lines and lines beginning with potential
non-timestamped directives characters.

Previously pressing TAB at the beginning of a transaction would try to
complete non-timestamped directive.
Previously only tags and links that existed on their own line within a
transaction were able to be completed. Now a link or tag can be
completed anywhere, including on the transaction directive line, and
including when multiple tags or links already exist on a given line.

This also updates the thing-at-point defintion for beancount-link to
remove the arbitrary length limit while keeping (improving) the
worst-case time complexity O(n) where n is the length of the link at
the point.
Now pressing TAB while the point is at the beginning of a (potentially
incomplete) tag or link on its own line in a transaction attempts to
complete the tag or link instead of insertings an account.
@dnicolodi
Copy link
Collaborator

Thank you, this looks good. I had a quick look and there are a few things that I would prefer to be done in a different way. I'll be back with a detailed review ASAP.

Copy link
Collaborator

@dnicolodi dnicolodi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a first batch of comments. I need to look at other changes more in depth. There are several changes for which I don't understand the motivation.

@@ -178,6 +178,9 @@ from the open directive for the relevant account."
(defconst beancount-date-regexp "[0-9]\\{4\\}[-/][0-9]\\{2\\}[-/][0-9]\\{2\\}"
"A regular expression to match dates.")

(defconst beancount-timestamp-indent-regexp
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The purpose of these defconst is to make the code easier to read. beancount-timestamp-indent-regexp sounds very much like the name of a regexp that matches the indentation part of a timestamp, however, it is not this and as a consequence it makes the code harder to read. I would go without this defconst.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that beancount-timestap-indent-regexp is not the best name. Something like beancount-timestamp-relaxed-indentation-regexp or similar may be more clear to a reader.

My purpose for using a defconst here was because I use this regexp in multiple places, and IMO minimizing the number of times one repeats themselves in code is a good thing.

My assumption was also that defconst evaluates the expression only once at file load time, which would be a minor (and, admittedly, probably completely negligible) optimization when the expression is used in multiple different places throughout the file and in functions that get repeatedly executed throughout the duration of the program.

(concat "^" beancount-directive-base-regexp))

(defconst beancount-directive-indent-regexp
(concat "\\s-*" beancount-directive-base-regexp))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same response as above.

;; Consider: use "\\s-*[0-9]" instead of full timestamp regexp
((or (looking-at-p beancount-timestamp-indent-regexp)
(looking-at-p beancount-directive-indent-regexp))
0)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a general comment: beancount-mode assumes an opinionated style for the ledger. In this style directives are separated by empty lines and makes it possible to provide functionality without having to implement a complete Beancount syntax lexer and grammar in regular expressions.

Can you please provide an example where this change improves editing?

The choice of using the minimum possible context to decide the indentation level (ie only the first character in a date) is deliberate: it allows to indent lines that are only partially written. Why are you changing this?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a general comment: beancount-mode assumes an opinionated style for the ledger. In this style directives are separated by empty lines and makes it possible to provide functionality without having to implement a complete Beancount syntax lexer and grammar in regular expressions.

I think that's a fair approach, but IMO if being less opinionated on what beancount-mode properly handles is not too onerous of a development effort and does not significantly adversely affect performance, then what's the downside?

Personally, my ideal world would have beancount-mode able to behave much like bean-format, fixing all indentation where the intention is unambiguous. I'd also like an option to normalize all intraline whitespace (i.e. squash all consecutive spaces down to a single space except for when used for alignment), but I haven't looked deeply into this yet.

In general my philosophy with code that processes (potentially loosely) formatted data is to be lenient (within reason) with what you accept as input and strict with what you output.

I have been separating transactions by empty lines, which IMO looks the best and should probably be the preferred style, but I think other directives are better when tightly grouped together. That being said, this change is more-or-less unrelated to that (that is more relevant to beancount-find-directive-extents and friends).

Can you please provide an example where this change improves editing?

The primary driver of this change was to explicitly check for non-timestamped directives to have 0 indentation.

The choice of using the minimum possible context to decide the indentation level (ie only the first character in a date) is deliberate: it allows to indent lines that are only partially written. Why are you changing this?

The change from simply checking if the first character is a digit to checking if the line (after whitespace) began with a complete date was primarily to have "parallel structure" between the check for timestamped and non-timestamped directives for indentation purposes.

That being said, my change does not change the actual behavior for attempting to indent a line beginning with a partially completed date, as such lines would fall through to the default "0 indentation" case. And as you can see by my comment, I was not 100% convinced that "single-digit to full date" change was for the best, so I'm not going to put up a fight to keep the full date check (although I do think it's conceptually "cleaner", for lack of a better word).

I do, however, stand by the change to allow for arbitrary whitespace in front of the date/digit here, as this is a function whose purpose is to compute the appropriate indentation level of a line, and I do not think it makes sense to enforce that a line already have the appropriate indentation level when determining what its indentation level should be.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What the mode is trying to indent should be valid Beancount syntax. An indented date in front of a directive is not valid Beancount syntax.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that I fundamentally disagree with you here.

A function whose purpose is to properly indent a line but bases how to indent that line on how the line is already indented seems completely flawed to me.

While I'm not a huge fan of syntactically significant whitespace (probably because my primary language is C), I can understand it from a language spec perspective--trying to keep all files similarly formatted and legible is reasonable (although I think that doesn't matter as much with personal financial ledgers which are generally only viewed by one person or perhaps a small handful of people at most).

In the case of beancount, it's worth noting that the syntactically significant whitespace is completely unnecessary for unambiguously parsing a journal. I'm by no means suggesting that the beancount syntax should be changed to ignore leading whitespace, but I do want to point out the difference between beancount and, say, python, where changing the indentation of a line fundamentally changes that line's meaning.

For an editor whose goal is to try to make editing a language with syntactically significant whitespace as easy as possible, IMO there are essentially two reasonable choices for TAB/indentation behavior:

  1. TAB toggles through indentation levels, possibly with separate commands to specifically shift a line (or region) up or down by an indentation level. This is the behavior of python-mode, and IMO should only be the behavior for languages where it is impossible to unambiguously determine what the "proper" indentation level of a given line is, i.e. if changing the indentation level completely changes a statement's meaning.

  2. TAB indents the line (or region) as appropriate, regardless of the current indentation level. This is what beancount-mode currently does, with the exception of the case where a line happens to begin with 0 indentation. That distinction seems arbitrary to me, and IMO only serves to make editing more difficult.

The only reason the current behavior is even possible is because the beancount syntax only distinguishes between lines with leading whitespace and lines without leading whitespace, not caring about how much whitespace there is when whitespace is present. If there were multiple levels of syntactically significant indentation, then there would be no way for a given line to be automatically indented, as there wouldn't be a single correct indentation value for said line.

I'd also like to point out that despite your assertion that "what the mode is trying to indent should be valid Beancount syntax", the current indentation computation does not actually follow through with this in its behavior.

Consider the following snippet:

2021-01-01 * "Example"
Assets:Checking -1.00 USD
Expenses:Food 1.00 USD

This is clearly not valid Beancount syntax, as the postings are not indented. Despite this, trying to indent the "transaction" works as one would expect, changing it to valid Beancount syntax. (I use quotes around "transaction" here because, again, it is not valid syntax, and as far as Beancount is concerned it is not a transaction--it's not anything other than an error...).

Now I'd like you to consider to workflows, both of which I would consider reasonable yet are (at best) impeded by beancount-mode's behavior:

  1. Consider a brokerage account that holds both cash and a few different securities. As is often recommended, the user wants to have the one real account split into multiple sub-accounts by commodity type. It seems reasonable to me to want their open directives grouped together, without newlines separating them (and this is how Beancount's example journals are laid out):

    2021-01-01 open Assets:Brokerage:Cash               USD
    2021-01-01 open Assets:Brokerage:ABC                ABC
    2021-01-01 open Assets:Brokerage:XYZ                XYZ
    

    When inputting this in beancount-mode with electric-indent-mode enables (as is the default), one would type the first directive, then hit RET. Now the point is on the next line and indented. The user now types the second directive and hits RET. The second directive is still indented, despite this being invalid Beancount syntax. In order to get that second directive to be properly indented (i.e. to 0 indentation), the user would need to manually delete the indentation, either before or after inputting the directive itself. TAB doesn't help, because beancount-mode assumes that the correct indentation level is non-zero, despite the fact that it unambiguously is not. beancount-mode is actively working against the user, making it more difficult to edit their journal. The same process takes place for the next directive.

  2. Let's say the user has electric-indent-mode turned off, either because they simply don't like its behavior in the general case or to avoid beancount-mode from working against them as in the previous example. The workflow that they want to use is to input a complete transaction and then indent it all at once, so they first type

    2021-01-01 * "Example"
    Assets:Checking  -1.00 USD
    posting_metadata: foo
    Expenses:Food 1.00 USD
    posting_metadata: bar
    

    Then they try to indent the transaction either with M-q or highlighting the region and hitting TAB or C-M-\. If my two-level metadata indentation changes were accepted without allowing metadata to be at zero-indentation when computing what its proper indentation level should be, the result would be the same as it is currently:

    2021-01-01 * "Example"
      Assets:Checking                           -1.00 USD
      posting_metadata: foo
      Expenses:Food                              1.00 USD
      posting_metadata: bar
    

    since the metadata would not match against the metadata regexp, and would then fall through to the "is the previous line indented" condition. Clearly this is not an improvement over the existing behavior.

All of that is to say that I feel quite strongly that beancount-compute-indentation should completely ignore the current line's existing implementation when determining what its proper indentation level should be.

That got much longer-winded than I initially intended, so I apologize for the wall of text.

(concat "^\\s-+" beancount-metadata-base-regexp))

(defconst beancount-metadata-indent-regexp
(concat "\\s-*" beancount-metadata-base-regexp))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as in the previous similar instances. Furthermore, here I don't understand the rationale for the change: you match beancount-metadata-indent-regexp at the beginning of a line, thus the existing beancount-metadata-regexp would work just fine.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The important distinction between the two regexps is + being changed to *, i.e. allowing the case where a metadata line is not yet indented, and I think that sharing the common part of the two regexps is important for avoiding code repetition (particularly of a somewhat opaque "raw" regular expression rather than something that already builds upon other defined regexp constants).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A metadata line not starting with whitespace is not valid Beancount syntax. I understand being lenient with what you accept as input, but a line needs to be drawn somewhere. I am sure we should not be accepting invalid Beancount syntax and try to make sense of it.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my response above, but additionally:

I understand being lenient with what you accept as input, but a line needs to be drawn somewhere.

The line, IMO, should be drawn after "ignoring a given line's existing indentation when determining how it should be indented". This allows us to unambiguously determine the user's intent and make editing their journal more natural and intuitive rather than fighting them for seemingly no reason at all.

What benefit comes from not being lenient in this one particular case?

I am sure we should not be accepting invalid Beancount syntax and try to make sense of it.

As mentioned in the above response, beancount-mode already does this.

In addition to the given example about beancount-mode accepting and making sense of the invalid Beancount syntax of non-indented postings, beancount-mode assumes any line that begins with a number is a timestamped directive, so given a line

2

which is clearly invalid Beancount syntax, beancount-mode considers it to be a valid timestamped directive, at least as far as its indentation computation logic goes.

(looking-at-p beancount-timestamped-directive-regexp)
(looking-at-p beancount-posting-regexp))
(+ (current-indentation) beancount-transaction-indent)
(current-indentation))))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a nice improvement. I wanted to fix this since a long time. However, I think this could be simplified and made more robust:

((looking-at-p beancount-metadata-indent-regexp)
 (if (and (= (forward-line -1) 0)
          (looking-at-p beancount-posting-regexp))
     (* 2 beancount-transaction-indent))
  beancount-transaction-indent))

or something like this, ie double the indentation if the current line is a metadata line and the previous line is a posting. WDYT?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your suggestion does not work since postings can have multiple lines of metadata following them, for example, the test case that I extended:

2019-01-01 * \"Example\"
  transaction: metadata
  #foo
  ^bar
  also_transaction: metadata
  Expenses:Example                              1.00 USD
    posting: metadata
    also_posting: metadata
  Assets:Checking                              -1.00 USD
    second_posting: metadata

@@ -133,21 +133,71 @@ Return a list of substrings each followed by its face."
:tags '(indent regress)
Copy link
Collaborator

@dnicolodi dnicolodi May 11, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add the tests in the same commit that extends the functionality.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, will change.

@@ -34,6 +34,7 @@
(require 'subr-x)
(require 'outline)
(require 'thingatpt)
(require 'imenu)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

beancount.el does not use any functionality from imenu. The require should be in the tests only.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

imenu-generic-expression is defined in imenu.el and used in beancount.el, so my assumption that the require was needed in beancount.el in addition to (implicitly) beancount-tests.el.

If this is not the case, then I'll move it to beancount-tests.el.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

imenu-generic-expression is declared as a local variable in imenu.el, but you are updating the variable using setq here. If imenu.el has not been loaded yet, it will set the global value.

If you don't require imenu, I think you should set it locally, e.g. using setq-local...

FYI: In most cases, you would declare a variable from an external package using defvar without an initial value:

(defvar imenu-generic-expression)

See purcell/package-lint#205 for details.

@@ -33,6 +33,7 @@
(autoload 'ido-completing-read "ido")
(require 'subr-x)
(require 'outline)
(require 'org) ;; for org-level-x faces
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am pretty sure this is not needed, is it?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is needed if beancount-mode wants to use org's faces

Without this change, when I open my beancount journal, beancount mode loads` (along with outline-minor-mode), but the outline heading faces do not, and I receive the following error:

Invalid face reference: org-level-1 [52 times]

I assume that you load org.el somewhere else in your init file.

@@ -490,12 +490,14 @@ With an argument move to the next non cleared transaction."
;; that begin with something that looks like a
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand what this improves. If you don't want completion while editing a string, don't invoke the completion function. What am I missing?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem with "don't invoke the completion function" is that we use a tab-dwim function which either toggles outline sections, attempts to complete, or indents based on context.

My goal is to be able to hit TAB to indent while editing in strings regardless of their content (in particular if the string happens to have # or ^ in it), and IMO there's no case where a user would want beancount-mode to try to complete anything in a string (with the exception of options), since strings are more-or-less opaque to beancount.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see where this is coming from now: you enabled completion of tags and links also now when on a dedicated line. Very welcome fix BTW, thank you! This enables completion of the sames in string literals too.

I admit I see advantages in permitting completion of tag and links in strings too. It is common to mention links or tags in strings for many different reasons, and account names too. Having auto completion for them in strings could be seen as an handy feature. If you don't want completion just put hit space or close the double quotes before pressing tab. I am almost tempted to open up account completion to more contexts rather to restrict existing completion contexts further.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't feel strongly about not completing in strings to fight you too hard here, but I don't personally see a need for any kind of completion in strings in Beancount. I could see it being more useful in comments than in strings.

I do, however, feel strongly that beancount-collect should not grab matches that occur in strings (at least not in the generic case, perhaps there may be some exceptions in the future). In fact, I think that it should probably also ignore matches that occur in comments.

@@ -1,6 +1,7 @@
;;; beancount-test.el --- ERT for beancount-mode -*- lexical-binding: t -*-

;; Copyright 2019 Daniele Nicolodi <[email protected]>
;; Copyright 2019 Alex Smith <[email protected]>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2019?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops, that was a case of "thinking one thing but typing what I'm looking at". Will fix.

@alexsmith1612
Copy link
Author

Thank you for your (in progress) review, Daniele. I'll respond to your comments individually

;; Move before the start character if we hit it
(when (eq (char-before) start-char)
(backward-char))
(looking-at (concat start-char-string "[" beancount-tag-chars "]*")))))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cannot this based on (re-search-backward (concat "[^" beancount-tag-chars "]")) instead?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that is a better solution.

I was so caught up in a "don't use looking-backward" mindset that I seemed to have mentally excluded any sort of backwards regexp search and way overthought the solution.

(beancount-looking-at-tag-generic ?# "#"))

(defun beancount-looking-at-link ()
(beancount-looking-at-tag-generic ?^ "\\^"))
Copy link
Collaborator

@dnicolodi dnicolodi May 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This functions take the same argument twice. You can use string-to-char or char-to-string instead.

However, I would write this like

(defun bencount--looking-at-tag ()
  (save-excursion
    (re-search-backward (concat "[^" beancount-tag-chars "]"))
    (looking-at (concat "#[" beancount-tag-chars "]"))))

at which point having a generic function is the same lines of code than repeating it. It is probably also wise to put this kind of functions in what is conventionally the private namespace (indicated with the doubly dashed name). This has not been done consistently before, and needs to be fixed.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This functions take the same argument twice. You can use string-to-char or char-to-string instead.

This is not quite true, as in beancount-looking-at-link the string version of the character must be escaped so that it's not treated as a regexp anchor. I initially used char-to-string before realizing that it did not work for links.

However, I would write this like ... at which point having a generic function is the same lines of code than repeating it.

Agreed. Will change.

It is probably also wise to put this kind of functions in what is conventionally the private namespace (indicated with the doubly dashed name). This has not been done consistently before, and needs to be fixed.

I had considered this but wasn't sure whether to make just the private function in my changeset with the double dash namespace or if I should match the convention in (most of) the existing code, potentially updating the full file at once in a separate PR.

;; limit the search for the link to the 128 characters before and
;; after the point. This number is chosen arbitrarily.
(when (thing-at-point-looking-at (concat "\\^[" beancount-tag-chars "]+") 128)
(when (beancount-looking-at-link)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup! I just realized how dumb this function is.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants