When you spend enough time on the command line, you start to notice that you use certain commands a lot. Things like cd, git, and sudo get used a lot when you use Linux as your daily driver. Inspired by this Reddit post, I decided to create a tool to visualize the data.

Streaming data

Since different shells will provide different history files, a command that parses different history files will be time-consuming to write. But most shells provide a history command that simply provides output like this:

 9989  conda activate data_analytics
 9990  ipython
 9991  make
 9992  make clean
 9993  make
 9994  bat
 9995  make
 9996  cd ~/work
 9997  cd ~/work/byui/instructor-tools/grade-trends
 9998  ls
 9999  eog output/grade_trend_all.png
10000  ipython

This can be streamed to a tool with a Unix pipe:

history | some-tool

Here’s how I did this in Python:

cmds = []
line = sys.stdin.readline()
while line:
    line = sys.stdin.readline().strip()
    matches = re.match(r"\s*[0-9]+  (.*)", line)
    if not matches:
        continue
    parts = matches.group(1).split(" ")
    cmd = parts.pop(0)

This works well enough, but in a *nix shell you can set environment variables for the duration of a command like this:

ENV_VAR=foo some-command

We don’t want to see environment variables in the output, so we need to skip those:

while re.match(r"[a-zA-Z_][a-zA-Z_0-9]*=.* ", cmd):
    cmd = parts.pop(0)

At this point, I realized that sudo “hides”, in some senses, the commands that are actually being executed. When I run sudo emacs, I’m not running sudo to run sudo; I’m running Emacs with elevated privileges, which is done with the sudo command. It might be nice to be able to strip the sudo out and see the underlying command. I added argument parsing with argparse and updated the loop that finds the actual command:

while re.match(r"[a-zA-Z_][a-zA-Z_0-9]*=.* ", cmd) or (
    args.strip_sudo and cmd == "sudo"
):
    cmd = parts.pop(0)

The full loop is now:

while line:
    line = sys.stdin.readline().strip()
    matches = re.match(r"\s*[0-9]+  (.*)", line)
    if not matches:
        continue
    parts = matches.group(1).split(" ")
    cmd = parts.pop(0)
    while re.match(r"[a-zA-Z_][a-zA-Z_0-9]*=.* ", cmd) or (
        args.strip_sudo and cmd == "sudo"
    ):
        cmd = parts.pop(0)
    if cmd:
        cmds.append(cmd)

Now we can count everything with the collections.Counter, and load the result into a pandas DataFrame:

df = pd.DataFrame(Counter(cmds).items(), columns=["command", "count"])

The number of commands to plot can be controlled with a flag:

n_largest = df.nlargest(args.n, ["count"])

Now we’re on to the plotting. Here’s the function signature:

def circle_plot(
    data: np.ndarray,
    labels: List[str],
    max_length: int=100,
    ylim_min: int=-50,
    cmap: str="viridis",
    label_padding: int=5,
    background_color: str="gray",
):
    """ Produce circular plot of data

    Args:
      data (np.ndarray): Data array to plot
      labels (List[str]): String labels for data
      max_length (int, optional): Maximum length of bars, defaults to 100
      ylim_min (int, optional):
        Minimum y-value of plot, used to tune how close the bottom of
        the bars are to each other, defaults to -50
      cmap (str, optional): Matplotlib colormap to use, defaults to 'viridis'
      label_padding (int, optional):
        Padding between labels and the end of bars, defaults to 5
      background_color (str, optional):
        Background color of plot, defaults to 'gray'
    """

We start by normalizing the data:

# Normalize data
if not isinstance(data, np.ndarray):
    data = np.array(data)

data_max: np.int64 = np.max(data)
data_min: np.int64 = np.min(data)
data_norm: np.ndarray = data.copy()
data_norm: np.ndarray = (data_norm - data_min) / (data_max - data_min)

Normalizing the data gives us values that are strictly in the interval [0, 1], mapping a value of 0 to 0, and the highest value to 1. This converts the visualization of each data point to be relative to the others. If there’s a command that’s used much more often than the next-closest count, we don’t want a lopsided plot with a single huge bar and lots of tiny bars.

Next, we compute the bars:

# Compute bar characteristics
bar_width: float = 2 * np.pi / len(data)
bar_angles: np.ndarray = np.arange(1, len(data) + 1) * bar_width

To plot the bars in a circle, we use a polar projection, which changes how the arguments to Axes.bar are handled, and then we plot the bars:

fig, ax = plt.subplots(subplot_kw={"projection": "polar"})

if isinstance(cmap, str):
	cmap = plt.get_cmap(cmap)

bars = ax.bar(
	bar_angles,
	data_norm * max_length,
	width=bar_width * 0.9,
	color=cmap(data_norm),
	zorder=10,
)

To get thing nicely scaled, a y-axis limit was empirically determined, and the ticks are removed for aesthetics:

ylim_max: float = max_length * 2.3

ax.set_ylim(ylim_min, ylim_max)
ax.set_yticks([])
ax.set_xticks([])

To plot the labels, we do some quick trigonometry and figure out where on the unit circle the bar angle will be. This lets us set the rotation and alignment.

for label, count, bar_angle, bar in zip(labels, data, bar_angles, bars):
	rot = np.degrees(bar_angle)
	if np.pi / 2 <= bar_angle <= 3 * np.pi / 2:
		rot += 180
		rot %= 360
		alignment = "right"
	else:
		alignment = "left"
	ax.text(
		bar_angle,
		bar.get_height() + label_padding,
		s=f"{label} - {count}",
		va="center",
		rotation=rot,
		rotation_mode="anchor",
		ha=alignment,
	)

Some additional aesthetics and returning the figure and axis:

ax.grid(False)
ax.set_facecolor(background_color)
fig.set_facecolor(background_color)
for spine in ax.spines.keys():
	ax.spines[spine].set_visible(False)

return fig, ax

The outputs of the script:

Running with the --strip-sudo flag:

The code is available here, if you’d like to try it out yourself!