-
Notifications
You must be signed in to change notification settings - Fork 33
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Indentation and completion improvements #15
base: main
Are you sure you want to change the base?
Indentation and completion improvements #15
Conversation
Now any directive, timestamped or not, will be indented to the beginning of the line. Previously, if a directive followed a transaction without an empty line separating them, the directive would be indented as if it were part of the transaction.
Previously, metadata for a posting would be indented to the same level as the posting
Set fill-paragraph-function to this more generic function instead of to beancount-indent-transaction
- metadata indentation - directive indentation - open directive currency alignment - price and balance directive number alignment
Previously the unit test invoked (goto-char) with a position (15) past the end of the temporary buffer, and certainly not within the link. I believe this was a simple typo, as position 5 is within the link in the temporary buffer.
Previously, imenu was not being loaded when running beancount-test.el in batch mode, causing the imenu-xxx tests to fail due to undefined/void functions being invoked.
Previously there were certain completions (non-timestamped directives, options, and timestamped directives) which would insert a space afterward--different behavior from the rest of the completions. Additionally this trailing space was unconditionally inserted into the journal, even if the buffer already included the full completion-with-trailing-space. This behavior was due to the space not being included in the matching regexp, and thus not being included in the (match-beginning) (match-end) bounds. If we do want to keep the completion-adds-trailing-space behavior, the matching regexps should be updated to include the optional trailing space. Finally, allow completions on bare "^option " lines which do not yet have an opening quote.
Now only match empty lines and lines beginning with potential non-timestamped directives characters. Previously pressing TAB at the beginning of a transaction would try to complete non-timestamped directive.
Previously only tags and links that existed on their own line within a transaction were able to be completed. Now a link or tag can be completed anywhere, including on the transaction directive line, and including when multiple tags or links already exist on a given line. This also updates the thing-at-point defintion for beancount-link to remove the arbitrary length limit while keeping (improving) the worst-case time complexity O(n) where n is the length of the link at the point.
Now pressing TAB while the point is at the beginning of a (potentially incomplete) tag or link on its own line in a transaction attempts to complete the tag or link instead of insertings an account.
Thank you, this looks good. I had a quick look and there are a few things that I would prefer to be done in a different way. I'll be back with a detailed review ASAP. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a first batch of comments. I need to look at other changes more in depth. There are several changes for which I don't understand the motivation.
@@ -178,6 +178,9 @@ from the open directive for the relevant account." | |||
(defconst beancount-date-regexp "[0-9]\\{4\\}[-/][0-9]\\{2\\}[-/][0-9]\\{2\\}" | |||
"A regular expression to match dates.") | |||
|
|||
(defconst beancount-timestamp-indent-regexp |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The purpose of these defconst
is to make the code easier to read. beancount-timestamp-indent-regexp
sounds very much like the name of a regexp that matches the indentation part of a timestamp, however, it is not this and as a consequence it makes the code harder to read. I would go without this defconst
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that beancount-timestap-indent-regexp
is not the best name. Something like beancount-timestamp-relaxed-indentation-regexp
or similar may be more clear to a reader.
My purpose for using a defconst
here was because I use this regexp in multiple places, and IMO minimizing the number of times one repeats themselves in code is a good thing.
My assumption was also that defconst
evaluates the expression only once at file load time, which would be a minor (and, admittedly, probably completely negligible) optimization when the expression is used in multiple different places throughout the file and in functions that get repeatedly executed throughout the duration of the program.
(concat "^" beancount-directive-base-regexp)) | ||
|
||
(defconst beancount-directive-indent-regexp | ||
(concat "\\s-*" beancount-directive-base-regexp)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same response as above.
;; Consider: use "\\s-*[0-9]" instead of full timestamp regexp | ||
((or (looking-at-p beancount-timestamp-indent-regexp) | ||
(looking-at-p beancount-directive-indent-regexp)) | ||
0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a general comment: beancount-mode
assumes an opinionated style for the ledger. In this style directives are separated by empty lines and makes it possible to provide functionality without having to implement a complete Beancount syntax lexer and grammar in regular expressions.
Can you please provide an example where this change improves editing?
The choice of using the minimum possible context to decide the indentation level (ie only the first character in a date) is deliberate: it allows to indent lines that are only partially written. Why are you changing this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a general comment: beancount-mode assumes an opinionated style for the ledger. In this style directives are separated by empty lines and makes it possible to provide functionality without having to implement a complete Beancount syntax lexer and grammar in regular expressions.
I think that's a fair approach, but IMO if being less opinionated on what beancount-mode properly handles is not too onerous of a development effort and does not significantly adversely affect performance, then what's the downside?
Personally, my ideal world would have beancount-mode able to behave much like bean-format
, fixing all indentation where the intention is unambiguous. I'd also like an option to normalize all intraline whitespace (i.e. squash all consecutive spaces down to a single space except for when used for alignment), but I haven't looked deeply into this yet.
In general my philosophy with code that processes (potentially loosely) formatted data is to be lenient (within reason) with what you accept as input and strict with what you output.
I have been separating transactions by empty lines, which IMO looks the best and should probably be the preferred style, but I think other directives are better when tightly grouped together. That being said, this change is more-or-less unrelated to that (that is more relevant to beancount-find-directive-extents
and friends).
Can you please provide an example where this change improves editing?
The primary driver of this change was to explicitly check for non-timestamped directives to have 0 indentation.
The choice of using the minimum possible context to decide the indentation level (ie only the first character in a date) is deliberate: it allows to indent lines that are only partially written. Why are you changing this?
The change from simply checking if the first character is a digit to checking if the line (after whitespace) began with a complete date was primarily to have "parallel structure" between the check for timestamped and non-timestamped directives for indentation purposes.
That being said, my change does not change the actual behavior for attempting to indent a line beginning with a partially completed date, as such lines would fall through to the default "0 indentation" case. And as you can see by my comment, I was not 100% convinced that "single-digit to full date" change was for the best, so I'm not going to put up a fight to keep the full date check (although I do think it's conceptually "cleaner", for lack of a better word).
I do, however, stand by the change to allow for arbitrary whitespace in front of the date/digit here, as this is a function whose purpose is to compute the appropriate indentation level of a line, and I do not think it makes sense to enforce that a line already have the appropriate indentation level when determining what its indentation level should be.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What the mode is trying to indent should be valid Beancount syntax. An indented date in front of a directive is not valid Beancount syntax.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that I fundamentally disagree with you here.
A function whose purpose is to properly indent a line but bases how to indent that line on how the line is already indented seems completely flawed to me.
While I'm not a huge fan of syntactically significant whitespace (probably because my primary language is C), I can understand it from a language spec perspective--trying to keep all files similarly formatted and legible is reasonable (although I think that doesn't matter as much with personal financial ledgers which are generally only viewed by one person or perhaps a small handful of people at most).
In the case of beancount, it's worth noting that the syntactically significant whitespace is completely unnecessary for unambiguously parsing a journal. I'm by no means suggesting that the beancount syntax should be changed to ignore leading whitespace, but I do want to point out the difference between beancount and, say, python, where changing the indentation of a line fundamentally changes that line's meaning.
For an editor whose goal is to try to make editing a language with syntactically significant whitespace as easy as possible, IMO there are essentially two reasonable choices for TAB/indentation behavior:
-
TAB toggles through indentation levels, possibly with separate commands to specifically shift a line (or region) up or down by an indentation level. This is the behavior of
python-mode
, and IMO should only be the behavior for languages where it is impossible to unambiguously determine what the "proper" indentation level of a given line is, i.e. if changing the indentation level completely changes a statement's meaning. -
TAB indents the line (or region) as appropriate, regardless of the current indentation level. This is what
beancount-mode
currently does, with the exception of the case where a line happens to begin with 0 indentation. That distinction seems arbitrary to me, and IMO only serves to make editing more difficult.
The only reason the current behavior is even possible is because the beancount syntax only distinguishes between lines with leading whitespace and lines without leading whitespace, not caring about how much whitespace there is when whitespace is present. If there were multiple levels of syntactically significant indentation, then there would be no way for a given line to be automatically indented, as there wouldn't be a single correct indentation value for said line.
I'd also like to point out that despite your assertion that "what the mode is trying to indent should be valid Beancount syntax", the current indentation computation does not actually follow through with this in its behavior.
Consider the following snippet:
2021-01-01 * "Example"
Assets:Checking -1.00 USD
Expenses:Food 1.00 USD
This is clearly not valid Beancount syntax, as the postings are not indented. Despite this, trying to indent the "transaction" works as one would expect, changing it to valid Beancount syntax. (I use quotes around "transaction" here because, again, it is not valid syntax, and as far as Beancount is concerned it is not a transaction--it's not anything other than an error...).
Now I'd like you to consider to workflows, both of which I would consider reasonable yet are (at best) impeded by beancount-mode
's behavior:
-
Consider a brokerage account that holds both cash and a few different securities. As is often recommended, the user wants to have the one real account split into multiple sub-accounts by commodity type. It seems reasonable to me to want their
open
directives grouped together, without newlines separating them (and this is how Beancount's example journals are laid out):2021-01-01 open Assets:Brokerage:Cash USD 2021-01-01 open Assets:Brokerage:ABC ABC 2021-01-01 open Assets:Brokerage:XYZ XYZ
When inputting this in
beancount-mode
withelectric-indent-mode
enables (as is the default), one would type the first directive, then hit RET. Now the point is on the next line and indented. The user now types the second directive and hits RET. The second directive is still indented, despite this being invalid Beancount syntax. In order to get that second directive to be properly indented (i.e. to 0 indentation), the user would need to manually delete the indentation, either before or after inputting the directive itself. TAB doesn't help, becausebeancount-mode
assumes that the correct indentation level is non-zero, despite the fact that it unambiguously is not.beancount-mode
is actively working against the user, making it more difficult to edit their journal. The same process takes place for the next directive. -
Let's say the user has
electric-indent-mode
turned off, either because they simply don't like its behavior in the general case or to avoidbeancount-mode
from working against them as in the previous example. The workflow that they want to use is to input a complete transaction and then indent it all at once, so they first type2021-01-01 * "Example" Assets:Checking -1.00 USD posting_metadata: foo Expenses:Food 1.00 USD posting_metadata: bar
Then they try to indent the transaction either with
M-q
or highlighting the region and hitting TAB orC-M-\
. If my two-level metadata indentation changes were accepted without allowing metadata to be at zero-indentation when computing what its proper indentation level should be, the result would be the same as it is currently:2021-01-01 * "Example" Assets:Checking -1.00 USD posting_metadata: foo Expenses:Food 1.00 USD posting_metadata: bar
since the metadata would not match against the metadata regexp, and would then fall through to the "is the previous line indented" condition. Clearly this is not an improvement over the existing behavior.
All of that is to say that I feel quite strongly that beancount-compute-indentation
should completely ignore the current line's existing implementation when determining what its proper indentation level should be.
That got much longer-winded than I initially intended, so I apologize for the wall of text.
(concat "^\\s-+" beancount-metadata-base-regexp)) | ||
|
||
(defconst beancount-metadata-indent-regexp | ||
(concat "\\s-*" beancount-metadata-base-regexp)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as in the previous similar instances. Furthermore, here I don't understand the rationale for the change: you match beancount-metadata-indent-regexp
at the beginning of a line, thus the existing beancount-metadata-regexp
would work just fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The important distinction between the two regexps is +
being changed to *
, i.e. allowing the case where a metadata line is not yet indented, and I think that sharing the common part of the two regexps is important for avoiding code repetition (particularly of a somewhat opaque "raw" regular expression rather than something that already builds upon other defined regexp constants).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A metadata line not starting with whitespace is not valid Beancount syntax. I understand being lenient with what you accept as input, but a line needs to be drawn somewhere. I am sure we should not be accepting invalid Beancount syntax and try to make sense of it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See my response above, but additionally:
I understand being lenient with what you accept as input, but a line needs to be drawn somewhere.
The line, IMO, should be drawn after "ignoring a given line's existing indentation when determining how it should be indented". This allows us to unambiguously determine the user's intent and make editing their journal more natural and intuitive rather than fighting them for seemingly no reason at all.
What benefit comes from not being lenient in this one particular case?
I am sure we should not be accepting invalid Beancount syntax and try to make sense of it.
As mentioned in the above response, beancount-mode
already does this.
In addition to the given example about beancount-mode
accepting and making sense of the invalid Beancount syntax of non-indented postings, beancount-mode
assumes any line that begins with a number is a timestamped directive, so given a line
2
which is clearly invalid Beancount syntax, beancount-mode
considers it to be a valid timestamped directive, at least as far as its indentation computation logic goes.
(looking-at-p beancount-timestamped-directive-regexp) | ||
(looking-at-p beancount-posting-regexp)) | ||
(+ (current-indentation) beancount-transaction-indent) | ||
(current-indentation)))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a nice improvement. I wanted to fix this since a long time. However, I think this could be simplified and made more robust:
((looking-at-p beancount-metadata-indent-regexp)
(if (and (= (forward-line -1) 0)
(looking-at-p beancount-posting-regexp))
(* 2 beancount-transaction-indent))
beancount-transaction-indent))
or something like this, ie double the indentation if the current line is a metadata line and the previous line is a posting. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your suggestion does not work since postings can have multiple lines of metadata following them, for example, the test case that I extended:
2019-01-01 * \"Example\"
transaction: metadata
#foo
^bar
also_transaction: metadata
Expenses:Example 1.00 USD
posting: metadata
also_posting: metadata
Assets:Checking -1.00 USD
second_posting: metadata
@@ -133,21 +133,71 @@ Return a list of substrings each followed by its face." | |||
:tags '(indent regress) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add the tests in the same commit that extends the functionality.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, will change.
@@ -34,6 +34,7 @@ | |||
(require 'subr-x) | |||
(require 'outline) | |||
(require 'thingatpt) | |||
(require 'imenu) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
beancount.el
does not use any functionality from imenu
. The require
should be in the tests only.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
imenu-generic-expression
is defined in imenu.el
and used in beancount.el
, so my assumption that the require was needed in beancount.el
in addition to (implicitly) beancount-tests.el
.
If this is not the case, then I'll move it to beancount-tests.el
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
imenu-generic-expression
is declared as a local variable in imenu.el
, but you are updating the variable using setq
here. If imenu.el
has not been loaded yet, it will set the global value.
If you don't require imenu
, I think you should set it locally, e.g. using setq-local
...
FYI: In most cases, you would declare a variable from an external package using defvar
without an initial value:
(defvar imenu-generic-expression)
See purcell/package-lint#205 for details.
@@ -33,6 +33,7 @@ | |||
(autoload 'ido-completing-read "ido") | |||
(require 'subr-x) | |||
(require 'outline) | |||
(require 'org) ;; for org-level-x faces |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am pretty sure this is not needed, is it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is needed if beancount-mode
wants to use org's faces
Without this change, when I open my beancount journal, beancount mode loads` (along with outline-minor-mode), but the outline heading faces do not, and I receive the following error:
Invalid face reference: org-level-1 [52 times]
I assume that you load org.el
somewhere else in your init file.
@@ -490,12 +490,14 @@ With an argument move to the next non cleared transaction." | |||
;; that begin with something that looks like a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand what this improves. If you don't want completion while editing a string, don't invoke the completion function. What am I missing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem with "don't invoke the completion function" is that we use a tab-dwim function which either toggles outline sections, attempts to complete, or indents based on context.
My goal is to be able to hit TAB to indent while editing in strings regardless of their content (in particular if the string happens to have #
or ^
in it), and IMO there's no case where a user would want beancount-mode
to try to complete anything in a string (with the exception of options), since strings are more-or-less opaque to beancount.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see where this is coming from now: you enabled completion of tags and links also now when on a dedicated line. Very welcome fix BTW, thank you! This enables completion of the sames in string literals too.
I admit I see advantages in permitting completion of tag and links in strings too. It is common to mention links or tags in strings for many different reasons, and account names too. Having auto completion for them in strings could be seen as an handy feature. If you don't want completion just put hit space or close the double quotes before pressing tab. I am almost tempted to open up account completion to more contexts rather to restrict existing completion contexts further.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't feel strongly about not completing in strings to fight you too hard here, but I don't personally see a need for any kind of completion in strings in Beancount. I could see it being more useful in comments than in strings.
I do, however, feel strongly that beancount-collect
should not grab matches that occur in strings (at least not in the generic case, perhaps there may be some exceptions in the future). In fact, I think that it should probably also ignore matches that occur in comments.
@@ -1,6 +1,7 @@ | |||
;;; beancount-test.el --- ERT for beancount-mode -*- lexical-binding: t -*- | |||
|
|||
;; Copyright 2019 Daniele Nicolodi <[email protected]> | |||
;; Copyright 2019 Alex Smith <[email protected]> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2019?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whoops, that was a case of "thinking one thing but typing what I'm looking at". Will fix.
Thank you for your (in progress) review, Daniele. I'll respond to your comments individually |
;; Move before the start character if we hit it | ||
(when (eq (char-before) start-char) | ||
(backward-char)) | ||
(looking-at (concat start-char-string "[" beancount-tag-chars "]*"))))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cannot this based on (re-search-backward (concat "[^" beancount-tag-chars "]"))
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that is a better solution.
I was so caught up in a "don't use looking-backward
" mindset that I seemed to have mentally excluded any sort of backwards regexp search and way overthought the solution.
(beancount-looking-at-tag-generic ?# "#")) | ||
|
||
(defun beancount-looking-at-link () | ||
(beancount-looking-at-tag-generic ?^ "\\^")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This functions take the same argument twice. You can use string-to-char
or char-to-string
instead.
However, I would write this like
(defun bencount--looking-at-tag ()
(save-excursion
(re-search-backward (concat "[^" beancount-tag-chars "]"))
(looking-at (concat "#[" beancount-tag-chars "]"))))
at which point having a generic function is the same lines of code than repeating it. It is probably also wise to put this kind of functions in what is conventionally the private namespace (indicated with the doubly dashed name). This has not been done consistently before, and needs to be fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This functions take the same argument twice. You can use string-to-char or char-to-string instead.
This is not quite true, as in beancount-looking-at-link
the string version of the character must be escaped so that it's not treated as a regexp anchor. I initially used char-to-string
before realizing that it did not work for links.
However, I would write this like ... at which point having a generic function is the same lines of code than repeating it.
Agreed. Will change.
It is probably also wise to put this kind of functions in what is conventionally the private namespace (indicated with the doubly dashed name). This has not been done consistently before, and needs to be fixed.
I had considered this but wasn't sure whether to make just the private function in my changeset with the double dash namespace or if I should match the convention in (most of) the existing code, potentially updating the full file at once in a separate PR.
;; limit the search for the link to the 128 characters before and | ||
;; after the point. This number is chosen arbitrarily. | ||
(when (thing-at-point-looking-at (concat "\\^[" beancount-tag-chars "]+") 128) | ||
(when (beancount-looking-at-link) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup! I just realized how dumb this function is.
This contains various improvements to the indentation/alignment and
completion systems.
There are some things that I left in place but were obsoleted and
could potentially be removed (e.g.
beancount-indent-transaction
hasbeen superseded by the more general
beancount-indent-directive
) or forwhich I added an option to change existing behavior but we could
potentially just only use the new behavior for (e.g. always aligning
numbers when indenting a line via
beancount-tab-dwim
--I'm not surewhy one wouldn't want to align numbers while indenting just in this
one case).
Another thing to note is that the new indentation logic assumes that
the only lines which (after potential leading whitespace) begin with a
timestamp are timestamped directives, so if/when postings with
timestamps are added to beancount this logic will need to be updated.
In general, there are a number of behavior decisions that I made that
may warrant discussion.
This probably could be broken up into multiple smaller PRs, so I can
resubmit as such if you'd like.
I'll also mention that, while I'm not new to programming in general,
I'm a complete novice at elisp, so I may have accidentally done some
non-canonical things or not followed standard elisp conventions/best
practices.
Thank you,
Alex