Over the course of multiple previous snippets, I put together several SQL queries for analyzing my browsing history.
Today, I wanted to write a script to run several of these queries in sequence to assemble a nice summary of my daily browsing histories. Maybe I’ll dump the summary into my
[[Roam Research]]
database so I can cross-link ideas with websites I’ve visited.
As a first pass, I tried to throw together a quick bash script to string the queries together. It looked a bit like this:
date=$1
if [ -z $1 ];
then
date=$(date '+%Y-%m-%d');
fi
cp ~/Library/Application\ Support/Google/Chrome/Default/History ~/Default-History
cp ~/Library/Application\ Support/Google/Chrome/Profile\ 1/History ~/Profile-1-History
function query() {
sqlite3 ~/Default-History $1;
sqlite3 ~/Profile-1-History $1;
}
echo \# YouTube Videos Watched
echo
query << SQL
select DISTINCT REPLACE(REPLACE(urls.title, " - YouTube", ""), "(1) ", ""), urls.url, "$date%"
from urls
where urls.url like "%www.youtube.com/watch%"
and datetime(last_visit_time / 1000000 + (strftime('%s', '1601-01-01')), 'unixepoch')
LIKE "$date%"
limit 50;
SQL
# Additional queries ran here...
You’ll notice a few things:
- I want to run each query on two different Chrome histories.
- I want the script to be parameterized by the date, defaulting to today if no date is specified.
If you’re proficient in Bash, you may also notice something else; it doesn’t work. The query is only used properly when I run sqlite3 on the first Chrome History, but an empty query is passed to the second query.
Rather than debug my Bash script, I switched over to Python. I used Python Fire to automatically expose all my functions as commands so within minutes I had a working CLI.
Here’s what the Python script looks like:
import datetime
import os
import subprocess
import fire
def copy_histories():
check_output = subprocess.check_output
check_output([
'cp',
os.path.expanduser('~/Library/Application Support/Google/Chrome/Default/History'),
os.path.expanduser('~/Default-History')])
check_output([
'cp',
os.path.expanduser('~/Library/Application Support/Google/Chrome/Profile 1/History'),
os.path.expanduser('~/Profile-1-History')])
def run_query(query):
call = subprocess.call
call(['sqlite3', os.path.expanduser('~/Default-History'), query])
call(['sqlite3', os.path.expanduser('~/Profile-1-History'), query])
def run(date=None):
copy_histories()
date = date or datetime.datetime.now().strftime('%Y-%m-%d')
print('## YouTube Videos Watched')
print()
run_query(f"""
select DISTINCT REPLACE(REPLACE(urls.title, " - YouTube", ""), "(1) ", "")
from urls
where urls.url like "%www.youtube.com/watch%"
and datetime(last_visit_time / 1000000 + (strftime('%s', '1601-01-01')), 'unixepoch')
LIKE "{date}%"
limit 50;
""")
# Additional queries run here...
if __name__ == '__main__':
fire.Fire()
At the command line I run this with python historyquery.py run
or python historyquery.py run --date=2020-04-26
.
With Python Fire I was able to quickly switch to a syntax I was more comfortable with, but keep the benefits of working from the context of a shell. I can now much more quickly make improvements to the script since I can operate more efficiently working in Python.
Now I have a script I can run that will produce a nice summary of my internet browsing activity for any date I specify. Running it over recent dates provides a nice walk down memory lane.