
Tracking Keypresses

I wanted to record my keypresses so that I could analyze the statistics hiding in them. I wrote some code that reads directly from /dev/input to get this data, similar to the tool evtest. First, we create an event.rs file that implements a method to deserialize the event stream from any character device in /dev/input; on a 64-bit kernel each event is a 24-byte input_event struct holding a timestamp (seconds and microseconds), an event type, a code, and a value. Rust is the obvious choice here: code that runs on every keypress needs to be lean, and Python would add too much overhead.

use byteorder::{LittleEndian, ReadBytesExt};
use std::fs::File;
use std::io::{Cursor, Read, Result};

pub const SAVE_PATH: &str = "/home/kjc/keys.stat";

#[derive(Debug)]
pub struct Event {
    secs: u64,       // tv_sec of the kernel's timeval
    usec: u64,       // tv_usec (microseconds)
    event_type: u16, // EV_KEY is 1
    code: u16,       // which key
    value: i32,      // 1 = press, 0 = release, 2 = autorepeat
}

impl Event {
    pub fn read_from(file: &mut File) -> Result<Self> {
        // An input_event is 24 bytes on a 64-bit kernel; read_exact
        // guards against short reads, which a plain read() would accept.
        let mut buf = vec![0_u8; 24];
        file.read_exact(&mut buf)?;
        let mut rdr = Cursor::new(buf);
        Ok(Event {
            secs: rdr.read_u64::<LittleEndian>()?,
            usec: rdr.read_u64::<LittleEndian>()?,
            event_type: rdr.read_u16::<LittleEndian>()?,
            code: rdr.read_u16::<LittleEndian>()?,
            value: rdr.read_i32::<LittleEndian>()?,
        })
    }

    pub fn is_keypress(&self) -> bool {
        // EV_KEY with value 1 is a key-down event; value 0 is a release
        // and 2 an autorepeat, neither of which we want to count.
        self.event_type == 1 && self.value == 1
    }

    pub fn keycode(&self) -> u16 {
        self.code
    }

    pub fn timestamp(&self) -> f64 {
        self.secs as f64 + self.usec as f64 / 1e6
    }
}

Now, as I write this article, it occurs to me that it might have been more prudent to simply redirect the character device's output into a log file with a couple of shell commands and process the raw data into a usable form later. Well, too late for that. I've already written all the code, and it would be a waste to kill this darling.

In main.rs we can simply deserialize in a loop forever, discard everything that isn't a key press, and append the results to a log file. I briefly considered implementing some form of compression, but the log file is so small relative to my total disk space that it would have been worthless; it would only have increased the implementation's complexity. Premature optimization is the root of all evil, as they say.

use byteorder::{LittleEndian, WriteBytesExt};
use std::fs::File;
use std::fs::OpenOptions;

use keys::event;

// Reading from /dev/input typically requires root or membership in the
// input group.
const KEYBOARD_EVENT_PATH: &str = "/dev/input/by-path/platform-i8042-serio-0-event-kbd";

fn main() {
    let mut out_file = OpenOptions::new()
        .append(true)
        .create(true)
        .open(event::SAVE_PATH)
        .unwrap();
    let mut events_file = File::open(KEYBOARD_EVENT_PATH).unwrap();
    loop {
        match event::Event::read_from(&mut events_file) {
            Err(why) => panic!("{}", why),
            Ok(event) => {
                if event.is_keypress() {
                    // Each record is 10 bytes: an f64 timestamp followed
                    // by a u16 keycode.
                    out_file
                        .write_f64::<LittleEndian>(event.timestamp())
                        .unwrap();
                    out_file.write_u16::<LittleEndian>(event.keycode()).unwrap();
                }
            }
        }
    }
}

Now we can write a Python script to analyze that data in various ways. NumPy is convenient here: np.fromfile with a structured dtype slurps the whole log in one call, which is much faster than unpacking record by record with the struct module.

import numpy as np
from datetime import datetime

with open('/home/kjc/keys.stat', 'rb') as fp:
    # Each record is an f64 timestamp plus a u16 keycode, matching the
    # layout written by the Rust logger.
    data = np.fromfile(fp, dtype=[('timestamp', 'd'), ('keycode', 'H')])

start = datetime.fromtimestamp(np.min(data['timestamp']))
end = datetime.fromtimestamp(np.max(data['timestamp']))
days = (end - start).days
print(start.date(), '—', end.date(), f'({days} days)')

cnt = data.shape[0]
print(f'{cnt:,} keys pressed ({cnt // days} per day)')

# OUTPUT:
# 2024-10-19 — 2024-11-03 (14 days)
# 509,602 keys pressed (36400 per day)
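The keycodes, by the way, are the raw values from linux/input-event-codes.h (57 is KEY_SPACE, 28 is KEY_ENTER, 14 is KEY_BACKSPACE, and so on). As a quick sketch of what the data enables, we can tally the most-pressed keys; the small lookup table below is just a convenience for readability, not a full mapping:

counts = np.bincount(data['keycode'])

# A few names for readability; the full mapping is in the kernel header.
NAMES = {14: 'KEY_BACKSPACE', 28: 'KEY_ENTER', 30: 'KEY_A',
         42: 'KEY_LEFTSHIFT', 57: 'KEY_SPACE'}

for code in np.argsort(counts)[::-1][:10]:
    print(f"{NAMES.get(int(code), f'code {code}')}: {counts[code]:,}")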

We can also graph the deltas, the intervals between consecutive keypresses.

import matplotlib.pyplot as plt

# Intervals between consecutive keypresses, in seconds.
deltas = np.diff(data['timestamp'])
counts, _ = np.histogram(deltas, range=(0, 1), bins=128)
plt.plot(np.linspace(0, 1, 128), counts)
plt.show()

As you can see, I've been running this software for only 14 days; it would be interesting to trawl through the data once it grows. I could use a Markov-chain-style model to "mimic" myself typing any piece of text, replaying the inter-key latencies observed in the data. Perhaps I could also watch how my typing patterns change over time.
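Here is a minimal sketch of that latency-replay idea, reusing the data array loaded above. The two-second cutoff separating typing bursts from pauses is an arbitrary choice of mine, and key pairs never observed in the data fall back to sampling from the global pool:

rng = np.random.default_rng()

codes, times = data['keycode'], data['timestamp']
gaps = np.diff(times)
typing = gaps < 2.0  # crude cutoff: anything longer is a pause, not typing

# First-order model: bucket the observed delays by (previous key, next key).
buckets = {}
for prev, nxt, dt in zip(codes[:-1][typing], codes[1:][typing], gaps[typing]):
    buckets.setdefault((int(prev), int(nxt)), []).append(dt)

def sample_delay(prev, nxt):
    """Sample a plausible latency for pressing nxt right after prev."""
    pool = buckets.get((prev, nxt)) or gaps[typing]
    return float(rng.choice(pool))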

At roughly 36,400 keypresses per day and 10 bytes per record, I project that the log file will reach about 130 MB in a year; I'll consider implementing a compression scheme then. It would be a stimulating exercise. I have some ideas already: clearly we should store deltas rather than absolute timestamps. I would probably take some inspiration from Gorilla, Facebook's time series database, and maybe try an rANS entropy coder, which I haven't used before.
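As a first experiment in that direction (just the delta step, not Gorilla's delta-of-delta scheme or an entropy coder, and assuming millisecond resolution is good enough for this data), we can check how small the deltas actually are:

# Quantize timestamps to milliseconds and take deltas; small integers
# compress far better than raw f64 timestamps.
ms = np.round(data['timestamp'] * 1000).astype(np.int64)
deltas_ms = np.diff(ms)

# How much of the stream would fit in narrower integer widths?
for width in (8, 16, 32):
    frac = np.count_nonzero(deltas_ms < 2**width) / deltas_ms.size
    print(f'{frac:.1%} of deltas fit in {width} bits')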