Digging into xonsh history backends
Motivation
I want to write a per-directory-history xontrib for xonsh, like the one I used for zsh. The piece of information that I need to do this is missing from xonsh history entries right now: the working directory a historical command was executed in. jimhester/per-directory-history tracks this by hooking into zsh's history searching and creation commands and putting each command in two history files, a global one and one specifically for the directory the command was executed in. I want to write my xontrib in the most xonshious (my word for something that works with the xonsh philosophy) way, so I don't want to rip this implementation scheme from per-directory-history and jam it into xonsh where there's a better way. So I have to see where I have to collect, store, and read this metadata.
Some ideas I have so far are:
- add a new history backend that writes entries with the additional metadata of wherever the command was executed
- essentially the first idea, but instead of adding a new history backend, augment whatever history backend is in use with this new functionality (composition by way of monkey-patching)
- listen to some existing hooks/events for xonsh history lookup and additions and add the functionality there
Or some combination of the above, depending on what I find.
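To make the monkey-patching idea concrete, here is a minimal sketch. ToyHistory and record_cwd are hypothetical names of mine, not xonsh APIs, and the sketch simply assumes the backend receives each entry as a dict through append().

```python
import os


class ToyHistory:
    """Hypothetical stand-in for whatever history backend is active."""

    def __init__(self):
        self.entries = []

    def append(self, cmd):
        self.entries.append(cmd)


def record_cwd(history):
    """Wrap history.append so each entry also carries the working directory."""
    original_append = history.append  # bound method of this one instance

    def append_with_cwd(cmd):
        cmd = dict(cmd)  # copy, so the caller's dict is left untouched
        cmd.setdefault("cwd", os.getcwd())
        return original_append(cmd)

    history.append = append_with_cwd  # patch the instance, not the class
    return history


hist = record_cwd(ToyHistory())
hist.append({"inp": "ls\n", "rtn": 0})
print(hist.entries[0])
```

Patching the instance rather than the class keeps the change local to the one backend object in use, which is the appeal of this option: it would not care which backend class that object came from.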
This post documents my exploration of how history is implemented in xonsh.
I'll pay particular attention to how history backends work with the shell backend abstraction so that I can write a xontrib that is as agnostic as possible about the shell implementation in use (ptk, ptk2, readline, jupyter, etc.).
Reading the existing documentation first
So I figured that I should read any existing documentation first, since it's possible that:
- The xonsh docs already include a section that tells me how to do this or something close to it
- I might learn something that I realize is undocumented, and then I can contribute that back to the project docs
I found three documents dealing with history on xon.sh:
- Tutorial: History - explains the richer model of history that xonsh offers, and introduces history command usage
- Tutorial: Writing Your Own History Backend - walks through authoring a new history backend with a CouchDB-backed history backend and replacing the default history backend with this new one
- History API -
While each of these is good at doing what it says, notice that none of them discusses how history backends are instantiated or how history entries are constructed during shell execution. The History API docs come closest, but that's cheating because those docs are autogenerated from docstrings in the Python source for xonsh.
How history entries are managed
Since there is no smoking gun in the xonsh docs talking about how history backends are created and where the components of history entries come from, I decided I have to dig into the xonsh code now rather than later.
Rather than just explain when and how xonsh creates new history entries (which I will do some of), I also want to explain how I came to this understanding, since it's incredibly unlikely you're reading this doc just to learn how to write a clone of jimhester/per-directory-history.
xonsh has support for multiple history backends, as we know. It ships with 3 backend implementations: history.json.JsonHistory, history.sqlite.SqliteHistory, and history.dummy.DummyHistory.
These backends are implementations of the history backend abstraction history.base.History. history.base.History doesn't do anything useful on its own - it is just inherited by implementations and defines the things the xonsh shell expects a history backend to be able to do:
- append (add something to the history)
- flush (force whatever is in memory to persist to the backend's storage, such as disk)
- items (getting items for the current history session)
- all_items (getting... all the items)
- info (providing shell history info)
- run_gc (garbage collecting)
- It also allows list-like behavior via index access and slicing with __getitem__.
The fact that history backends implement history.base.History is our first clue into how xonsh history backends work. It means the xonsh shell does not interact directly with a concrete history backend, so the shell's code doesn't know what backend it's working with - this is handled by our good friend polymorphism. For understanding how history entries are created, this establishes some constraints on what a history backend can accept as input: if the central part of the xonsh shell's code is interacting with a unique history backend through a generic abstraction, that unique history backend cannot use input that isn't passed into the generic abstraction. In other words, the xonsh shell gives the same history item data structure to every history backend, no matter how special that history backend is, and if we want the history backend to be able to act on some other piece of data (such as the working directory the history item was executed in!), we have to alter that data structure.
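Here is a toy illustration of that constraint. None of these classes are xonsh code: ToyHistoryBase, ListHistory and record are hypothetical names, and only the append/items method names are borrowed from the list above.

```python
class ToyHistoryBase:
    """Toy abstraction: the only thing the caller knows about a backend."""

    def append(self, cmd):
        raise NotImplementedError

    def items(self):
        raise NotImplementedError


class ListHistory(ToyHistoryBase):
    """One concrete backend: keeps entries in a plain list."""

    def __init__(self):
        self._entries = []

    def append(self, cmd):
        self._entries.append(cmd)

    def items(self):
        return list(self._entries)


def record(hist, inp, rtn):
    """Caller side: builds the entry and talks only to the abstraction."""
    hist.append({"inp": inp, "rtn": rtn})


backend = ListHistory()
record(backend, "ls\n", 0)
entry = backend.items()[0]
print(sorted(entry))  # the keys the backend was actually given
```

However clever ListHistory is, it only ever sees the keys that record() chose to put in the dict. A cwd key has to be added on the caller's side (or by wrapping append) before any backend can store it.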
The history entry data structure
I had trouble finding where these entries were defined and where they were appended to the history backend, but I soon realized I could drop an ipdb break statement into my active history backend's append method (JsonHistory.append) and use the debugger's where command to get a stacktrace, leading me directly to where xonsh appends history to the backend. I started up my debuggified xonsh, ran a command, watched as it paused in ipdb, and got the traceback:
(Note that you should make sure $XONSH_DEBUG is on or, alternatively, install xonsh as an editable package, both to avoid amalgamation and to see your changes right away without re-running setup.py.)
eddie@eddie-ubuntu ~ $ echo 'hey'
hey
> /home/eddie/source/xonsh/xonsh/history/json.py(353)append()
352 import ipdb; ipdb.set_trace()
--> 353 self.buffer.append(cmd)
354 self._len += 1 # must come before flushing
ipdb> where
/home/eddie/.virtualenvs/xonsh/bin/xonsh(7)<module>()
5 __file__ = '/home/eddie/source/xonsh/scripts/xonsh'
6 with open(__file__) as f:
----> 7 exec(compile(f.read(), __file__, 'exec'))
/home/eddie/source/xonsh/scripts/xonsh(4)<module>()
2
3 from xonsh.main import main
----> 4 main()
/home/eddie/source/xonsh/xonsh/main.py(402)main()
401 args = premain(argv)
--> 402 return main_xonsh(args)
403 except Exception as err:
/home/eddie/source/xonsh/xonsh/main.py(431)main_xonsh()
430 try:
--> 431 shell.shell.cmdloop()
432 finally:
/home/eddie/source/xonsh/xonsh/ptk2/shell.py(194)cmdloop()
193 line = self.precmd(line)
--> 194 self.default(line)
195 except (KeyboardInterrupt, SystemExit):
/home/eddie/source/xonsh/xonsh/base_shell.py(375)default()
374 tee_out = tee.getvalue()
--> 375 self._append_history(inp=src, ts=[ts0, ts1], tee_out=tee_out)
376 self.accumulated_inputs += src
/home/eddie/source/xonsh/xonsh/base_shell.py(410)_append_history()
409 if hist is not None:
--> 410 hist.append(info)
411 hist.last_cmd_rtn = hist.last_cmd_out = None
> /home/eddie/source/xonsh/xonsh/history/json.py(353)append()
352 import ipdb; ipdb.set_trace()
--> 353 self.buffer.append(cmd)
354 self._len += 1 # must come before flushing
Maybe that wouldn't have been so hard to track down manually, but the traceback shows it directly: history is appended to in BaseShell.default() via a method called BaseShell._append_history().
So what kind of information is passed to _append_history?
def default(self, line):
    """Implements code execution."""
    line = line if line.endswith("\n") else line + "\n"
    src, code = self.push(line)
    if code is None:
        return
    events.on_precommand.fire(cmd=src)
    env = builtins.__xonsh__.env
    hist = builtins.__xonsh__.history  # pylint: disable=no-member
    ts1 = None
    enc = env.get("XONSH_ENCODING")
    err = env.get("XONSH_ENCODING_ERRORS")
    tee = Tee(encoding=enc, errors=err)
    try:
        ts0 = time.time()
        run_compiled_code(code, self.ctx, None, "single")
        ts1 = time.time()
        if hist is not None and hist.last_cmd_rtn is None:
            hist.last_cmd_rtn = 0  # returncode for success
    except XonshError as e:
        print(e.args[0], file=sys.stderr)
        if hist is not None and hist.last_cmd_rtn is None:
            hist.last_cmd_rtn = 1  # return code for failure
    except Exception:  # pylint: disable=broad-except
        print_exception()
        if hist is not None and hist.last_cmd_rtn is None:
            hist.last_cmd_rtn = 1  # return code for failure
    finally:
        ts1 = ts1 or time.time()
        tee_out = tee.getvalue()
        self._append_history(inp=src, ts=[ts0, ts1], tee_out=tee_out)
        self.accumulated_inputs += src
        if (
            tee_out
            and env.get("XONSH_APPEND_NEWLINE")
            and not tee_out.endswith(os.linesep)
        ):
            print(os.linesep, end="")
        tee.close()
        self._fix_cwd()
    if builtins.__xonsh__.exit:  # pylint: disable=no-member
        return True
In the finally block, we see inp is src, which, after digging around a bit into what happens above this call, appears to be the string that was typed into the command prompt, as opposed to code, which is the xonsh code that was compiled from this src and then run (successfully or not). Interestingly, this means we are typing in source code each time we enter text into the xonsh REPL, and xonsh is compiling/running it. The essential piece of a history entry is a bit of uncompiled source code (like ls -alh or import sys)!
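The same split between source text and compiled code exists in plain Python, which makes for a quick illustration. This sketch uses only the built-in compile() and exec(); xonsh's own push() and run_compiled_code() do more than this (plain compile() knows nothing about subprocess syntax like ls -alh), so treat it as an analogy rather than a description of xonsh internals.

```python
# The text typed at a prompt: just a string, and the part worth keeping in history.
src = "x = 6 * 7\n"

# The compiled form: a code object. "single" is the same mode name that
# appears in the run_compiled_code(...) call quoted above.
code = compile(src, "<toy-prompt>", "single")

ctx = {}         # execution context, playing the role of self.ctx
exec(code, ctx)  # run the compiled code against that context

print(type(src).__name__, type(code).__name__, ctx["x"])
```

A code object is awkward to store and meaningless to a reader, while the source string can be displayed, searched, and re-run, which fits with inp being the piece that ends up in history.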
Let's follow an ls command entry from the prompt through BaseShell.default() and the code that appends the entry to history.
The code block picks up just after I've entered the ls command at the prompt.
> /home/eddie/source/xonsh/xonsh/base_shell.py(348)default()
347 src, code = self.push(line)
--> 348 if code is None:
349 return
ipdb> code
<code object <module> at 0x7f3682be44b0, file "/home/eddie/.virtualenvs/xonsh/lib/python3.7/site-packages/xontrib/fzf-widgets.xsh", line 1>
ipdb> src
'ls\n'
Note that code is apparently wrapped in some non-ls xontrib code I have installed. I'm unsure exactly why right now.
But note that src is the ls command I typed in, followed by a newline.
Once we get down to the actual appending, we see that ts0 and ts1 are the start and end timestamps of the code's execution. tee_out is simply the output of the command.
--> 377 self._append_history(inp=src, ts=[ts0, ts1], tee_out=tee_out)
378 self.accumulated_inputs += src
379 if (
380 tee_out
381 and env.get("XONSH_APPEND_NEWLINE")
382 and not tee_out.endswith(os.linesep)
383 ):
384 print(os.linesep, end="")
385 tee.close()
386 self._fix_cwd()
387 if builtins.__xonsh__.exit: # pylint: disable=no-member
388 return True
389
ipdb> src
'ls\n'
ipdb> ts0
1560283660.1879137
ipdb> ts1
1560283660.3324323
Let's step into _append_history():
def _append_history(self, tee_out=None, **info):
    """Append information about the command to the history.

    This also handles on_postcommand because this is the place where all the
    information is available.
    """
    hist = builtins.__xonsh__.history  # pylint: disable=no-member
    info["rtn"] = hist.last_cmd_rtn if hist is not None else None
    tee_out = tee_out or None
    last_out = hist.last_cmd_out if hist is not None else None
    if last_out is None and tee_out is None:
        pass
    elif last_out is None and tee_out is not None:
        info["out"] = tee_out
    elif last_out is not None and tee_out is None:
        info["out"] = last_out
    else:
        info["out"] = tee_out + "\n" + last_out
    events.on_postcommand.fire(
        cmd=info["inp"], rtn=info["rtn"], out=info.get("out", None), ts=info["ts"]
    )
    if hist is not None:
        hist.append(info)
        hist.last_cmd_rtn = hist.last_cmd_out = None
It isn't the most exciting code. It is really just a matter of adding the command's return code and, if available, the output of the command, to the info (history entry) provided to the backend. As a funny aside, most of this method is a heuristic for deciding whether to use the tee output or last_cmd_out from the history backend, and last_cmd_out seems to be an unused property in at least all the built-in history backends. It would be interesting to know why it ever existed at all!
The crucial thing we learn here, though, is that info is effectively what we've been calling the history entry. It is the "packet" (concretely, a dict) of information that is appended to the history. It defines what our history backend can save, delete, search, manipulate, etc. So any additional information we would need to add for our history backend would have to be added to info.
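To check my reading of the output-merging branches in _append_history(), I find it helps to pull them out into a standalone function. merge_output and build_entry are hypothetical helpers written for this post, not part of xonsh; they only mirror the branching quoted above.

```python
def merge_output(tee_out, last_out):
    """Mirror the four branches: neither, tee only, last only, both."""
    tee_out = tee_out or None  # an empty string counts as "no output"
    if last_out is None and tee_out is None:
        return None
    if last_out is None:
        return tee_out
    if tee_out is None:
        return last_out
    return tee_out + "\n" + last_out


def build_entry(inp, ts, rtn, tee_out=None, last_out=None):
    """Assemble an info-style dict; 'out' is present only when there is output."""
    info = {"inp": inp, "ts": ts, "rtn": rtn}
    out = merge_output(tee_out, last_out)
    if out is not None:
        info["out"] = out
    return info


with_output = build_entry("ls\n", [1.0, 1.5], 0, tee_out="test\n")
without_output = build_entry("grep something test\n", [2.0, 2.5], 1, tee_out="")
print(with_output)
print(without_output)
```

The detail worth noticing is that a command that prints nothing produces an entry with no out key at all, rather than an out key holding an empty string, so anything reading entries should use .get("out").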
Let's take a look at the info object for two different cases. In the first, I'll call ls in a directory with exactly one empty regular file, test, and in the second I'll call grep something test in the same directory. The ls call will return successfully and the grep call will not (since test is empty, grep finds no match).
Calling ls:
{'inp': 'ls\n', 'ts': [1560285242.9592671, 1560285243.0506482], 'rtn': 0, 'out': 'test\n'}
Calling grep something test:
{'inp': 'grep something test\n', 'ts': [1560285373.0232306, 1560285373.125136], 'rtn': 1}
There you have it - all the information available to a history backend's append() method as far as I can tell.
Thoughts on where I should go
So I've been digging around in here to ultimately change what history items are loaded when a user interactively scrolls through the history, uses the history command, etc., with the aim of showing only the history items that are associated with the current working directory. To do that, I have to get cwd information into each history item.
To fast-forward a bit, I've now done that, and it's pretty simple, though it did require a change to the xonsh source code:
diff --git a/xonsh/base_shell.py b/xonsh/base_shell.py
index b7e9aff2..7088427f 100644
--- a/xonsh/base_shell.py
+++ b/xonsh/base_shell.py
@@ -393,6 +393,8 @@ class BaseShell(object):
"""
hist = builtins.__xonsh__.history # pylint: disable=no-member
info["rtn"] = hist.last_cmd_rtn if hist is not None else None
+ if builtins.__xonsh__.env.get("XONSH_STORE_CWD"):
+ info['cwd'] = os.getcwd()
tee_out = tee_out or None
last_out = hist.last_cmd_out if hist is not None else None
if last_out is None and tee_out is None:
Luckily, when I had asked whether such a xontrib as per-directory-history existed yet, xonsh creator Anthony Scopatz told me he'd be up for modifying the history mechanism to support this kind of xontrib, so we're good here.
The next question I had was how I could make history:
- aware of this new information
- optionally able to use this information by installing a xontrib
- hopefully prompt-backend agnostic
To make history aware of this new info, I had to alter the history backends - each history backend has a different way of handling the attributes of history items. I decided to experiment depth-first, hoping that if I got my history functionality working with JsonHistory, probably the most commonly used backend, I could either figure out how to get it working with other backends, or (less good) just make my xontrib available to people using the JsonHistory backend.
Next, I looked at where history strings are loaded by xonsh, so that I could start limiting the items loaded to those that matched by cwd. My thought was that if I found the point where history strings were loaded, then each time history was searched by the user, by whatever mechanism, I could filter out those that didn't match.
I thought that creating a history backend with an overridden method would help, since custom history backends are easily pluggable with the XONSH_HISTORY_BACKEND environment variable, thus making any solution that used a custom one pretty easily installable via a xontrib.
Unfortunately there was no clear and easy way to override the history backend functionality to filter out history entries on arbitrary criteria, so I added yet another thing to the xonsh source:
diff --git a/xonsh/history/json.py b/xonsh/history/json.py
index 50b6326b..7313cfc9 100644
--- a/xonsh/history/json.py
+++ b/xonsh/history/json.py
@@ -328,6 +328,7 @@ class JsonHistory(History):
self.tss = JsonCommandField("ts", self)
self.inps = JsonCommandField("inp", self)
self.outs = JsonCommandField("out", self)
+ self.cwds = JsonCommandField("cwd", self)
self.rtns = JsonCommandField("rtn", self)
def __len__(self):
@@ -382,10 +383,11 @@ class JsonHistory(History):
def items(self, newest_first=False):
"""Display history items of current session."""
if newest_first:
- items = zip(reversed(self.inps), reversed(self.tss))
+ items = zip(
+ reversed(self.inps), reversed(self.tss), reversed(self.cwds))
else:
- items = zip(self.inps, self.tss)
- for item, tss in items:
+ items = zip(self.inps, self.tss, self.cwds)
+ for item, tss, _ in items:
yield {"inp": item.rstrip(), "ts": tss[0]}
def all_items(self, newest_first=False, **kwargs):
@@ -413,10 +415,16 @@ class JsonHistory(History):
if newest_first:
commands = reversed(commands)
for c in commands:
- yield {"inp": c["inp"].rstrip(), "ts": c["ts"][0]}
+ if self._include_history_item(c):
+ yield {"inp": c["inp"].rstrip(), "ts": c["ts"][0]}
# all items should also include session items
yield from self.items()
+ def _include_history_item(self, item):
+ """Whether to include the history item.
+ Allows filtering history results by subclass."""
+ return True
+
In short, this diff just adds a method that checks whether a history item should be used, and in the default case (the JsonHistory base class), it simply allows all history items.
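The hook pattern in that diff can be tried in isolation. The sketch below is a stripped-down model with no xonsh or Prompt Toolkit imports: ToyJsonHistory and ToyPerDirectoryHistory are hypothetical classes, and the working directory is passed in explicitly (instead of calling os.getcwd()) so the result doesn't depend on where you run it.

```python
class ToyJsonHistory:
    """Hypothetical stand-in for a backend that reads stored entries."""

    def __init__(self, stored):
        self.stored = list(stored)

    def all_items(self):
        for entry in self.stored:
            if self._include_history_item(entry):
                yield {"inp": entry["inp"].rstrip(), "ts": entry["ts"][0]}

    def _include_history_item(self, item):
        """Default hook: keep every entry."""
        return True


class ToyPerDirectoryHistory(ToyJsonHistory):
    """Subclass that overrides only the hook."""

    def __init__(self, stored, cwd):
        super().__init__(stored)
        self.cwd = cwd

    def _include_history_item(self, item):
        # Entries written before cwd was stored have no "cwd" key and are skipped.
        return item.get("cwd") == self.cwd


stored = [
    {"inp": "ls\n", "ts": [1.0, 1.1], "cwd": "/home/eddie"},
    {"inp": "cd test\n", "ts": [2.0, 2.1], "cwd": "/home/eddie/test"},
    {"inp": "echo old\n", "ts": [0.5, 0.6]},  # no cwd recorded
]

everything = [e["inp"] for e in ToyJsonHistory(stored).all_items()]
local = [e["inp"] for e in ToyPerDirectoryHistory(stored, "/home/eddie").all_items()]
print(everything)
print(local)
```

One design question this surfaces: entries recorded before cwd tracking was switched on have no cwd key, so a strict equality check hides them in local mode. Whether that is the right behavior is a judgment call.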
In my xontrib, I created a custom history backend that performed the filtering I wanted:
class JsonPerDirectoryHistory(JsonHistory):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.use_local_history = True

    def _include_history_item(self, item):
        run_in_terminal(lambda: print(f'Got item {item}'))
        run_in_terminal(lambda: print(f'Use local history: {self.use_local_history}'))
        if self.use_local_history and item.get('cwd') and os.getcwd() == item.get('cwd'):
            run_in_terminal(lambda: print('Using item'))
            return True
        run_in_terminal(lambda: print('Not using item'))
        return False
Notice in this that there are some Prompt Toolkit-specific run_in_terminal print calls, which I've added just for very verbose logging while developing. I'd remove these when releasing this xontrib, assuming I keep support for non-Prompt Toolkit shells.
In the xontrib, I set a prompt_toolkit2 keybinding to switch this functionality on and off, and to tell the user what mode they've switched to:
import os
from sys import stdout
from prompt_toolkit import keys, print_formatted_text
from prompt_toolkit.application import run_in_terminal
from builtins import __xonsh__
from xonsh.history.json import JsonHistory
from xonsh.platform import ptk_shell_type


def toggle_per_directory_history():
    if isinstance(__xonsh__.history, JsonPerDirectoryHistory):
        hist = __xonsh__.history
        hist.use_local_history = not hist.use_local_history
        if hist.use_local_history:
            return 'local'
        else:
            return 'global'


@events.on_ptk_create
def custom_keybindings(bindings, **kw):
    def do_nothing(func):
        pass

    if ptk_shell_type() == 'prompt_toolkit2':
        binder = bindings.add
    else:
        binder = bindings.registry.add_binding

    key = ${...}.get('PER_DIRECTORY_HISTORY_TOGGLE')

    @binder(key)
    def switch_between_global_and_local_history(_):
        new_hist_type = toggle_per_directory_history()
        run_in_terminal(lambda: print(f'Switching to {new_hist_type} history.'))
Finally, I set my history backend to my custom one in my .xonshrc and turned on per-directory history:
from xontrib.per_directory_history import JsonPerDirectoryHistory
$XONSH_HISTORY_BACKEND = JsonPerDirectoryHistory
$XONSH_STORE_CWD = True
Initial results
Failure, mostly. Upon opening a gnome-terminal instance, I saw the debugging messages printed from my history backend, which was nice:
Got item {'cwd': '/home/eddie/source/xonsh', 'inp': 'ls\n', 'rtn': 0, 'ts': [1560543198.0524652, 1560543198.1038635]}
Use local history: True
Not using item
Got item {'cwd': '/home/eddie/source/xonsh', 'inp': 'z xonsh\n', 'rtn': 0, 'ts': [1560543197.440549, 1560543197.4461908]}
Use local history: True
Not using item
Got item {'cwd': '/home/eddie', 'inp': 'cd ..\n', 'rtn': 0, 'ts': [1560543047.8660042, 1560543047.8693159]}
Use local history: True
Using item
Got item {'cwd': '/home/eddie/test', 'inp': 'fancy mccheeese\n', 'rtn': 1, 'ts': [1560543038.735667, 1560543039.5000844]}
Use local history: True
Not using item
Got item {'cwd': '/home/eddie/test', 'inp': 'cd test\n', 'rtn': 0, 'ts': [1560543024.288666, 1560543024.2920816]}
Use local history: True
Not using item
Got item {'cwd': '/home/eddie', 'inp': 'ls\n', 'rtn': 0, 'ts': [1560543021.7835386, 1560543021.8042111]}
Use local history: True
Using item
Got item {'cwd': '/home/eddie', 'inp': 'ls\n', 'rtn': 0, 'ts': [1560543020.0776978, 1560543020.1177895]}
Use local history: True
Using item
Got item {'cwd': '/home/eddie/test', 'inp': 'cd test\n', 'rtn': 0, 'ts': [1560542986.1126633, 1560542986.1164727]}
Use local history: True
Not using item
My history backend was apparently being used to load existing history strings, and it was only returning those that matched the cwd, which, in a new gnome-terminal for me is /home/eddie. Notice how commands with cwd info that matches /home/eddie are the only history items being used.
Cool. So I decided to switch to another directory. If things are working as expected, I should be able to enter history commands, go back through the history, and only get commands for this new directory. Before entering any commands in this directory, I shouldn't see any history items!
But I did. I could scroll back through all the history items that my command output just said were being used. What's more, if I switched to yet another directory, I could access all the commands I'd entered in the current prompt session since opening it.
Why???
This may be specific to Prompt Toolkit, but I don't know yet. It appears xonsh uses history backends in conjunction with Prompt Toolkit like this:
- xonsh spins up a new shell instance in xonsh/shell.py when a new shell process is started
- xonsh creates a Prompt Toolkit instance and hands it all the history data up to this point, which, in the case of JsonHistory, is all the history in our JSON history files
- Prompt Toolkit runs the prompt
- Each time a command is entered, Prompt Toolkit feeds this command to the xonsh shell, which appends the command history to the history backend in use
- Independently, Prompt Toolkit maintains its own searchable/scrollable record of the history since the Prompt Toolkit instance was created
- The next time a shell is loaded, the previous history is given to Prompt Toolkit from the xonsh history backend.
Do you see my problem? I had indeed changed what history items are loaded by the history backend, but I hadn't changed anything about what the Prompt Toolkit history mechanism does once a shell is up and running.
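A toy model makes the mismatch easier to see. To be clear, this is not how Prompt Toolkit is implemented: ToyBackend and ToyPrompt are hypothetical classes that only encode the sequence of events as I understand it from the list above (load once at startup, then keep a separate in-memory record).

```python
class ToyBackend:
    """Stands in for the history backend, with per-directory filtering."""

    def __init__(self, stored):
        self.stored = list(stored)

    def append(self, entry):
        self.stored.append(entry)

    def all_items(self, cwd):
        return [e["inp"] for e in self.stored if e.get("cwd") == cwd]


class ToyPrompt:
    """Stands in for the prompt: loads history once, then tracks its own."""

    def __init__(self, backend, cwd):
        self.backend = backend
        self.lines = backend.all_items(cwd)  # filtered, but only at startup

    def accept(self, inp, cwd):
        self.lines.append(inp)  # the prompt's own record, never re-filtered
        self.backend.append({"inp": inp, "cwd": cwd})

    def scrollback(self):
        return list(reversed(self.lines))


backend = ToyBackend([
    {"inp": "ls", "cwd": "/home/eddie"},
    {"inp": "make", "cwd": "/home/eddie/test"},
])
prompt = ToyPrompt(backend, cwd="/home/eddie")  # shell starts in /home/eddie
prompt.accept("cd test", cwd="/home/eddie")     # now we are in /home/eddie/test

seen = prompt.scrollback()                       # what pressing "up" offers
wanted = backend.all_items("/home/eddie/test")   # what per-directory history should offer
print(seen, wanted)
```

In this model the filter in the backend is only consulted when the prompt is constructed, so changing directories later has no effect on what scrollback offers. Any real fix would have to reach the prompt's own record as well.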
Where do I go from here?
I am going to bring this post to a close since it contains lots of cool info about how xonsh history backends work, but I will pick back up on my own historical exploits in another post.