Indentation and completion improvements #15

alexsmith1612 · 2021-05-10T01:38:33Z

This contains various improvements to the indentation/alignment and
completion systems.

There are some things that I left in place but were obsoleted and
could potentially be removed (e.g. beancount-indent-transaction has
been superseded by the more general beancount-indent-directive) or for
which I added an option to change existing behavior but we could
potentially just only use the new behavior for (e.g. always aligning
numbers when indenting a line via beancount-tab-dwim--I'm not sure
why one wouldn't want to align numbers while indenting just in this
one case).

Another thing to note is that the new indentation logic assumes that
the only lines which (after potential leading whitespace) begin with a
timestamp are timestamped directives, so if/when postings with
timestamps are added to beancount this logic will need to be updated.

In general, there are a number of behavior decisions that I made that
may warrant discussion.

This probably could be broken up into multiple smaller PRs, so I can
resubmit as such if you'd like.

I'll also mention that, while I'm not new to programming in general,
I'm a complete novice at elisp, so I may have accidentally done some
non-canonical things or not followed standard elisp conventions/best
practices.

Thank you,
Alex

Now any directive, timestamped or not, will be indented to the beginning of the line. Previously, if a directive followed a transaction without an empty line separating them, the directive would be indented as if it were part of the transaction.

Previously, metadata for a posting would be indented to the same level as the posting

Set fill-paragraph-function to this more generic function instead of to beancount-indent-transaction

- metadata indentation - directive indentation - open directive currency alignment - price and balance directive number alignment

Previously the unit test invoked (goto-char) with a position (15) past the end of the temporary buffer, and certainly not within the link. I believe this was a simple typo, as position 5 is within the link in the temporary buffer.

Previously, imenu was not being loaded when running beancount-test.el in batch mode, causing the imenu-xxx tests to fail due to undefined/void functions being invoked.

Previously there were certain completions (non-timestamped directives, options, and timestamped directives) which would insert a space afterward--different behavior from the rest of the completions. Additionally this trailing space was unconditionally inserted into the journal, even if the buffer already included the full completion-with-trailing-space. This behavior was due to the space not being included in the matching regexp, and thus not being included in the (match-beginning) (match-end) bounds. If we do want to keep the completion-adds-trailing-space behavior, the matching regexps should be updated to include the optional trailing space. Finally, allow completions on bare "^option " lines which do not yet have an opening quote.

Now only match empty lines and lines beginning with potential non-timestamped directives characters. Previously pressing TAB at the beginning of a transaction would try to complete non-timestamped directive.

Previously only tags and links that existed on their own line within a transaction were able to be completed. Now a link or tag can be completed anywhere, including on the transaction directive line, and including when multiple tags or links already exist on a given line. This also updates the thing-at-point defintion for beancount-link to remove the arbitrary length limit while keeping (improving) the worst-case time complexity O(n) where n is the length of the link at the point.

Now pressing TAB while the point is at the beginning of a (potentially incomplete) tag or link on its own line in a transaction attempts to complete the tag or link instead of insertings an account.

dnicolodi · 2021-05-10T09:54:41Z

Thank you, this looks good. I had a quick look and there are a few things that I would prefer to be done in a different way. I'll be back with a detailed review ASAP.

dnicolodi

I added a first batch of comments. I need to look at other changes more in depth. There are several changes for which I don't understand the motivation.

dnicolodi · 2021-05-11T08:32:09Z

beancount.el

@@ -178,6 +178,9 @@ from the open directive for the relevant account."
 (defconst beancount-date-regexp "[0-9]\\{4\\}[-/][0-9]\\{2\\}[-/][0-9]\\{2\\}"
  "A regular expression to match dates.")

+(defconst beancount-timestamp-indent-regexp


The purpose of these defconst is to make the code easier to read. beancount-timestamp-indent-regexp sounds very much like the name of a regexp that matches the indentation part of a timestamp, however, it is not this and as a consequence it makes the code harder to read. I would go without this defconst.

I agree that beancount-timestap-indent-regexp is not the best name. Something like beancount-timestamp-relaxed-indentation-regexp or similar may be more clear to a reader.

My purpose for using a defconst here was because I use this regexp in multiple places, and IMO minimizing the number of times one repeats themselves in code is a good thing.

My assumption was also that defconst evaluates the expression only once at file load time, which would be a minor (and, admittedly, probably completely negligible) optimization when the expression is used in multiple different places throughout the file and in functions that get repeatedly executed throughout the duration of the program.

dnicolodi · 2021-05-11T08:32:57Z

beancount.el

+  (concat "^" beancount-directive-base-regexp))
+
+(defconst beancount-directive-indent-regexp
+  (concat "\\s-*" beancount-directive-base-regexp))


Same response as above.

dnicolodi · 2021-05-11T08:43:11Z

beancount.el

+     ;; Consider: use "\\s-*[0-9]" instead of full timestamp regexp
+     ((or (looking-at-p beancount-timestamp-indent-regexp)
+          (looking-at-p beancount-directive-indent-regexp))
+      0)


As a general comment: beancount-mode assumes an opinionated style for the ledger. In this style directives are separated by empty lines and makes it possible to provide functionality without having to implement a complete Beancount syntax lexer and grammar in regular expressions.

Can you please provide an example where this change improves editing?

The choice of using the minimum possible context to decide the indentation level (ie only the first character in a date) is deliberate: it allows to indent lines that are only partially written. Why are you changing this?

As a general comment: beancount-mode assumes an opinionated style for the ledger. In this style directives are separated by empty lines and makes it possible to provide functionality without having to implement a complete Beancount syntax lexer and grammar in regular expressions.

I think that's a fair approach, but IMO if being less opinionated on what beancount-mode properly handles is not too onerous of a development effort and does not significantly adversely affect performance, then what's the downside?

Personally, my ideal world would have beancount-mode able to behave much like bean-format, fixing all indentation where the intention is unambiguous. I'd also like an option to normalize all intraline whitespace (i.e. squash all consecutive spaces down to a single space except for when used for alignment), but I haven't looked deeply into this yet.

In general my philosophy with code that processes (potentially loosely) formatted data is to be lenient (within reason) with what you accept as input and strict with what you output.

I have been separating transactions by empty lines, which IMO looks the best and should probably be the preferred style, but I think other directives are better when tightly grouped together. That being said, this change is more-or-less unrelated to that (that is more relevant to beancount-find-directive-extents and friends).

Can you please provide an example where this change improves editing?

The primary driver of this change was to explicitly check for non-timestamped directives to have 0 indentation.

The choice of using the minimum possible context to decide the indentation level (ie only the first character in a date) is deliberate: it allows to indent lines that are only partially written. Why are you changing this?

The change from simply checking if the first character is a digit to checking if the line (after whitespace) began with a complete date was primarily to have "parallel structure" between the check for timestamped and non-timestamped directives for indentation purposes.

That being said, my change does not change the actual behavior for attempting to indent a line beginning with a partially completed date, as such lines would fall through to the default "0 indentation" case. And as you can see by my comment, I was not 100% convinced that "single-digit to full date" change was for the best, so I'm not going to put up a fight to keep the full date check (although I do think it's conceptually "cleaner", for lack of a better word).

I do, however, stand by the change to allow for arbitrary whitespace in front of the date/digit here, as this is a function whose purpose is to compute the appropriate indentation level of a line, and I do not think it makes sense to enforce that a line already have the appropriate indentation level when determining what its indentation level should be.

What the mode is trying to indent should be valid Beancount syntax. An indented date in front of a directive is not valid Beancount syntax.

I think that I fundamentally disagree with you here.

A function whose purpose is to properly indent a line but bases how to indent that line on how the line is already indented seems completely flawed to me.

While I'm not a huge fan of syntactically significant whitespace (probably because my primary language is C), I can understand it from a language spec perspective--trying to keep all files similarly formatted and legible is reasonable (although I think that doesn't matter as much with personal financial ledgers which are generally only viewed by one person or perhaps a small handful of people at most).

In the case of beancount, it's worth noting that the syntactically significant whitespace is completely unnecessary for unambiguously parsing a journal. I'm by no means suggesting that the beancount syntax should be changed to ignore leading whitespace, but I do want to point out the difference between beancount and, say, python, where changing the indentation of a line fundamentally changes that line's meaning.

For an editor whose goal is to try to make editing a language with syntactically significant whitespace as easy as possible, IMO there are essentially two reasonable choices for TAB/indentation behavior:

TAB toggles through indentation levels, possibly with separate commands to specifically shift a line (or region) up or down by an indentation level. This is the behavior of python-mode, and IMO should only be the behavior for languages where it is impossible to unambiguously determine what the "proper" indentation level of a given line is, i.e. if changing the indentation level completely changes a statement's meaning.

TAB indents the line (or region) as appropriate, regardless of the current indentation level. This is what beancount-mode currently does, with the exception of the case where a line happens to begin with 0 indentation. That distinction seems arbitrary to me, and IMO only serves to make editing more difficult.

The only reason the current behavior is even possible is because the beancount syntax only distinguishes between lines with leading whitespace and lines without leading whitespace, not caring about how much whitespace there is when whitespace is present. If there were multiple levels of syntactically significant indentation, then there would be no way for a given line to be automatically indented, as there wouldn't be a single correct indentation value for said line.

I'd also like to point out that despite your assertion that "what the mode is trying to indent should be valid Beancount syntax", the current indentation computation does not actually follow through with this in its behavior.

Consider the following snippet:

2021-01-01 * "Example" Assets:Checking -1.00 USD Expenses:Food 1.00 USD

This is clearly not valid Beancount syntax, as the postings are not indented. Despite this, trying to indent the "transaction" works as one would expect, changing it to valid Beancount syntax. (I use quotes around "transaction" here because, again, it is not valid syntax, and as far as Beancount is concerned it is not a transaction--it's not anything other than an error...).

Now I'd like you to consider to workflows, both of which I would consider reasonable yet are (at best) impeded by beancount-mode's behavior:

Consider a brokerage account that holds both cash and a few different securities. As is often recommended, the user wants to have the one real account split into multiple sub-accounts by commodity type. It seems reasonable to me to want their open directives grouped together, without newlines separating them (and this is how Beancount's example journals are laid out):

2021-01-01 open Assets:Brokerage:Cash USD 2021-01-01 open Assets:Brokerage:ABC ABC 2021-01-01 open Assets:Brokerage:XYZ XYZ

When inputting this in beancount-mode with electric-indent-mode enables (as is the default), one would type the first directive, then hit RET. Now the point is on the next line and indented. The user now types the second directive and hits RET. The second directive is still indented, despite this being invalid Beancount syntax. In order to get that second directive to be properly indented (i.e. to 0 indentation), the user would need to manually delete the indentation, either before or after inputting the directive itself. TAB doesn't help, because beancount-mode assumes that the correct indentation level is non-zero, despite the fact that it unambiguously is not. beancount-mode is actively working against the user, making it more difficult to edit their journal. The same process takes place for the next directive.

Let's say the user has electric-indent-mode turned off, either because they simply don't like its behavior in the general case or to avoid beancount-mode from working against them as in the previous example. The workflow that they want to use is to input a complete transaction and then indent it all at once, so they first type

2021-01-01 * "Example" Assets:Checking -1.00 USD posting_metadata: foo Expenses:Food 1.00 USD posting_metadata: bar

Then they try to indent the transaction either with M-q or highlighting the region and hitting TAB or C-M-\. If my two-level metadata indentation changes were accepted without allowing metadata to be at zero-indentation when computing what its proper indentation level should be, the result would be the same as it is currently:

2021-01-01 * "Example" Assets:Checking -1.00 USD posting_metadata: foo Expenses:Food 1.00 USD posting_metadata: bar

since the metadata would not match against the metadata regexp, and would then fall through to the "is the previous line indented" condition. Clearly this is not an improvement over the existing behavior.

All of that is to say that I feel quite strongly that beancount-compute-indentation should completely ignore the current line's existing implementation when determining what its proper indentation level should be.

That got much longer-winded than I initially intended, so I apologize for the wall of text.

dnicolodi · 2021-05-11T08:48:22Z

beancount.el

+  (concat "^\\s-+" beancount-metadata-base-regexp))
+
+(defconst beancount-metadata-indent-regexp
+  (concat "\\s-*" beancount-metadata-base-regexp))


Same comment as in the previous similar instances. Furthermore, here I don't understand the rationale for the change: you match beancount-metadata-indent-regexp at the beginning of a line, thus the existing beancount-metadata-regexp would work just fine.

The important distinction between the two regexps is + being changed to *, i.e. allowing the case where a metadata line is not yet indented, and I think that sharing the common part of the two regexps is important for avoiding code repetition (particularly of a somewhat opaque "raw" regular expression rather than something that already builds upon other defined regexp constants).

A metadata line not starting with whitespace is not valid Beancount syntax. I understand being lenient with what you accept as input, but a line needs to be drawn somewhere. I am sure we should not be accepting invalid Beancount syntax and try to make sense of it.

See my response above, but additionally:

I understand being lenient with what you accept as input, but a line needs to be drawn somewhere.

The line, IMO, should be drawn after "ignoring a given line's existing indentation when determining how it should be indented". This allows us to unambiguously determine the user's intent and make editing their journal more natural and intuitive rather than fighting them for seemingly no reason at all.

What benefit comes from not being lenient in this one particular case?

I am sure we should not be accepting invalid Beancount syntax and try to make sense of it.

As mentioned in the above response, beancount-mode already does this.

In addition to the given example about beancount-mode accepting and making sense of the invalid Beancount syntax of non-indented postings, beancount-mode assumes any line that begins with a number is a timestamped directive, so given a line

2

which is clearly invalid Beancount syntax, beancount-mode considers it to be a valid timestamped directive, at least as far as its indentation computation logic goes.

dnicolodi · 2021-05-11T08:51:55Z

beancount.el

+                (looking-at-p beancount-timestamped-directive-regexp)
+                (looking-at-p beancount-posting-regexp))
+            (+ (current-indentation) beancount-transaction-indent)
+          (current-indentation))))


This is a nice improvement. I wanted to fix this since a long time. However, I think this could be simplified and made more robust:

((looking-at-p beancount-metadata-indent-regexp) (if (and (= (forward-line -1) 0) (looking-at-p beancount-posting-regexp)) (* 2 beancount-transaction-indent)) beancount-transaction-indent))

or something like this, ie double the indentation if the current line is a metadata line and the previous line is a posting. WDYT?

Your suggestion does not work since postings can have multiple lines of metadata following them, for example, the test case that I extended:

2019-01-01 * \"Example\" transaction: metadata #foo ^bar also_transaction: metadata Expenses:Example 1.00 USD posting: metadata also_posting: metadata Assets:Checking -1.00 USD second_posting: metadata

dnicolodi · 2021-05-11T09:53:48Z

beancount-tests.el

@@ -133,21 +133,71 @@ Return a list of substrings each followed by its face."
  :tags '(indent regress)


Please add the tests in the same commit that extends the functionality.

Ok, will change.

dnicolodi · 2021-05-11T09:55:08Z

beancount.el

@@ -34,6 +34,7 @@
 (require 'subr-x)
 (require 'outline)
 (require 'thingatpt)
+(require 'imenu)


beancount.el does not use any functionality from imenu. The require should be in the tests only.

imenu-generic-expression is defined in imenu.el and used in beancount.el, so my assumption that the require was needed in beancount.el in addition to (implicitly) beancount-tests.el.

If this is not the case, then I'll move it to beancount-tests.el.

imenu-generic-expression is declared as a local variable in imenu.el, but you are updating the variable using setq here. If imenu.el has not been loaded yet, it will set the global value.

If you don't require imenu, I think you should set it locally, e.g. using setq-local...

FYI: In most cases, you would declare a variable from an external package using defvar without an initial value:

(defvar imenu-generic-expression)

See purcell/package-lint#205 for details.

dnicolodi · 2021-05-11T09:55:59Z

beancount.el

@@ -33,6 +33,7 @@
 (autoload 'ido-completing-read "ido")
 (require 'subr-x)
 (require 'outline)
+(require 'org) ;; for org-level-x faces


I am pretty sure this is not needed, is it?

It is needed if beancount-mode wants to use org's faces

Without this change, when I open my beancount journal, beancount mode loads` (along with outline-minor-mode), but the outline heading faces do not, and I receive the following error:

Invalid face reference: org-level-1 [52 times]

I assume that you load org.el somewhere else in your init file.

dnicolodi · 2021-05-11T10:00:16Z

beancount.el

@@ -490,12 +490,14 @@ With an argument move to the next non cleared transaction."
         ;; that begin with something that looks like a


I don't understand what this improves. If you don't want completion while editing a string, don't invoke the completion function. What am I missing?

The problem with "don't invoke the completion function" is that we use a tab-dwim function which either toggles outline sections, attempts to complete, or indents based on context.

My goal is to be able to hit TAB to indent while editing in strings regardless of their content (in particular if the string happens to have # or ^ in it), and IMO there's no case where a user would want beancount-mode to try to complete anything in a string (with the exception of options), since strings are more-or-less opaque to beancount.

I see where this is coming from now: you enabled completion of tags and links also now when on a dedicated line. Very welcome fix BTW, thank you! This enables completion of the sames in string literals too.

I admit I see advantages in permitting completion of tag and links in strings too. It is common to mention links or tags in strings for many different reasons, and account names too. Having auto completion for them in strings could be seen as an handy feature. If you don't want completion just put hit space or close the double quotes before pressing tab. I am almost tempted to open up account completion to more contexts rather to restrict existing completion contexts further.

I don't feel strongly about not completing in strings to fight you too hard here, but I don't personally see a need for any kind of completion in strings in Beancount. I could see it being more useful in comments than in strings.

I do, however, feel strongly that beancount-collect should not grab matches that occur in strings (at least not in the generic case, perhaps there may be some exceptions in the future). In fact, I think that it should probably also ignore matches that occur in comments.

dnicolodi · 2021-05-11T10:00:50Z

beancount-tests.el

@@ -1,6 +1,7 @@
 ;;; beancount-test.el --- ERT for beancount-mode -*- lexical-binding: t -*-

 ;; Copyright 2019 Daniele Nicolodi <[email protected]>
+;; Copyright 2019 Alex Smith <[email protected]>


Whoops, that was a case of "thinking one thing but typing what I'm looking at". Will fix.

alexsmith1612 · 2021-05-13T05:23:07Z

Thank you for your (in progress) review, Daniele. I'll respond to your comments individually

dnicolodi · 2021-05-13T22:06:50Z

beancount.el

+      ;; Move before the start character if we hit it
+      (when (eq (char-before) start-char)
+        (backward-char))
+      (looking-at (concat start-char-string "[" beancount-tag-chars "]*")))))


Cannot this based on (re-search-backward (concat "[^" beancount-tag-chars "]")) instead?

Yes, that is a better solution.

I was so caught up in a "don't use looking-backward" mindset that I seemed to have mentally excluded any sort of backwards regexp search and way overthought the solution.

dnicolodi · 2021-05-13T22:22:03Z

beancount.el

+  (beancount-looking-at-tag-generic ?# "#"))
+
+(defun beancount-looking-at-link ()
+  (beancount-looking-at-tag-generic ?^ "\\^"))


This functions take the same argument twice. You can use string-to-char or char-to-string instead.

However, I would write this like

(defun bencount--looking-at-tag () (save-excursion (re-search-backward (concat "[^" beancount-tag-chars "]")) (looking-at (concat "#[" beancount-tag-chars "]"))))

at which point having a generic function is the same lines of code than repeating it. It is probably also wise to put this kind of functions in what is conventionally the private namespace (indicated with the doubly dashed name). This has not been done consistently before, and needs to be fixed.

This functions take the same argument twice. You can use string-to-char or char-to-string instead.

This is not quite true, as in beancount-looking-at-link the string version of the character must be escaped so that it's not treated as a regexp anchor. I initially used char-to-string before realizing that it did not work for links.

However, I would write this like ... at which point having a generic function is the same lines of code than repeating it.

Agreed. Will change.

It is probably also wise to put this kind of functions in what is conventionally the private namespace (indicated with the doubly dashed name). This has not been done consistently before, and needs to be fixed.

I had considered this but wasn't sure whether to make just the private function in my changeset with the double dash namespace or if I should match the convention in (most of) the existing code, potentially updating the full file at once in a separate PR.

dnicolodi · 2021-05-13T22:22:35Z

beancount.el

-  ;; limit the search for the link to the 128 characters before and
-  ;; after the point. This number is chosen arbitrarily.
-  (when (thing-at-point-looking-at (concat "\\^[" beancount-tag-chars "]+") 128)
+  (when (beancount-looking-at-link)


Yup! I just realized how dumb this function is.

alexsmith1612 added 20 commits May 9, 2021 20:42

Indent posting metadata lines an additional level

7017a94

Previously, metadata for a posting would be indented to the same level as the posting

Align numbers in balance and price directives in addition to postings

fdfa9c2

Align open directive currencies to number alignment + 1

b4f14e4

Add customizable variable for minimum number alignment separation

80a3e8b

Add customizable variable to enable number alignment with tab-dwim

01c5a74

Add more generic beancount-indent-directive function

013e57a

Set fill-paragraph-function to this more generic function instead of to beancount-indent-transaction

Always call indent-region on TAB if the region is active

90c54d4

Update indentation tests

aff0255

- metadata indentation - directive indentation - open directive currency alignment - price and balance directive number alignment

Fix link-at-point-003 unit test

c232302

Previously the unit test invoked (goto-char) with a position (15) past the end of the temporary buffer, and certainly not within the link. I believe this was a simple typo, as position 5 is within the link in the temporary buffer.

Explicitly require imenu in beancount.el

ee36fcc

Previously, imenu was not being loaded when running beancount-test.el in batch mode, causing the imenu-xxx tests to fail due to undefined/void functions being invoked.

Require org for org-level-x faces

2489f46

Limit non-timestamped directives completions

7d07c48

Now only match empty lines and lines beginning with potential non-timestamped directives characters. Previously pressing TAB at the beginning of a transaction would try to complete non-timestamped directive.

Move posting completions after tag and link completions

c9725e3

Now pressing TAB while the point is at the beginning of a (potentially incomplete) tag or link on its own line in a transaction attempts to complete the tag or link instead of insertings an account.

Don't attempt completion inside strings except for options

c880cf6

Don't collect matches inside strings

4102a12

Add more completion tests

a721f15

Update copyright and author information

cc8ea35

dnicolodi reviewed May 11, 2021

View reviewed changes

dnicolodi reviewed May 13, 2021

View reviewed changes

TRSx80 mentioned this pull request Jul 27, 2024

Posting metadata indentation #55

Open

		@@ -133,21 +133,71 @@ Return a list of substrings each followed by its face."
		:tags '(indent regress)

		@@ -490,12 +490,14 @@ With an argument move to the next non cleared transaction."
		;; that begin with something that looks like a

Indentation and completion improvements #15

Are you sure you want to change the base?

Indentation and completion improvements #15

Conversation

alexsmith1612 commented May 10, 2021

dnicolodi commented May 10, 2021

dnicolodi left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dnicolodi May 11, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

alexsmith1612 commented May 13, 2021

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dnicolodi May 13, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dnicolodi May 11, 2021 •

edited

Loading

dnicolodi May 13, 2021 •

edited

Loading