Captain's Log

2026-02-28

opencode

tags: qwen3, llama.cpp, opencode, hackernews

Here is a nice little tutorial on how to set up opencode with a local model.

OpenCode + Llama.cpp Setup Guide

I found this guide via the Hacker News thread:

Qwen3.5 122B and 35B models offer Sonnet 4.5 performance on local computers

2026-02-16

Blocking YouTube Shorts

tags: youtube, ublock, chrome

I'm using Chrome on macOS with the uBlock Origin Lite extension.

Copy and paste the rules into the Custom Filters text box, then reload YouTube and the Shorts should disappear.

See:

uBlock filter list to hide all YouTube Shorts

2026-02-14

physarum-step-by-step

Finished my first project with Claude Code. Amazing tool, to be honest. I learned so much!

physarum-step-by-step

2026-02-13

physarum

tags: go, slime mold, visualization

Running a Go tool.

brew install go

git clone git@github.com:fogleman/physarum.git
cd physarum

go run cmd/physarum/main.go

Error: cmd/physarum/main.go:8:2: no required module provides package github.com/fogleman/physarum/pkg/physarum: go.mod file not found in current directory or any parent directory; see 'go help modules'

go mod init physarum
go mod tidy
go run ./cmd/physarum

Now it works.

It turns out this tool does not run a real-time simulation; instead it renders beautiful images.

2026-02-12

Simulating Physarum polycephalum slime mold

tags: generative algorithms, visualization, slime mold

Simulating Physarum polycephalum slime mold generates very cool pictures. See below.

Physarum Simulation Physarum transport model Coding Adventure: Ant and Slime Simulations
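To get a feel for how these pictures come about, here is a minimal sketch of the agent-based transport model the links above describe, simplified and in plain Python. Each agent has three sensors (left, front, right), turns toward the strongest trail signal, steps forward, and deposits trail; the trail map then diffuses (3x3 box blur) and decays. The function name `step` and all parameter values (45° sensor angle, sensor distance 3, decay 0.9, ...) are my own illustrative assumptions, not settings from any of the linked projects.

```python
import math
import random

# Simplified physarum model: agents sense a shared trail map, turn toward the
# strongest signal, move, and deposit; the trail then diffuses and decays.
# Parameter values are illustrative assumptions, not tuned settings.

def step(agents, trail, width, height, rng,
         sensor_angle=math.radians(45), sensor_dist=3.0,
         turn_angle=math.radians(45), step_size=1.0,
         deposit=1.0, decay=0.9):
    """Advance the simulation by one tick. agents: list of [x, y, heading]."""

    def sense(x, y, heading):
        # Trail value at a sensor placed sensor_dist ahead in direction `heading`
        sx = math.floor(x + sensor_dist * math.cos(heading)) % width
        sy = math.floor(y + sensor_dist * math.sin(heading)) % height
        return trail[sy][sx]

    # 1) Sense, rotate, move, deposit (agents update one after another)
    for agent in agents:
        x, y, h = agent
        left = sense(x, y, h - sensor_angle)
        front = sense(x, y, h)
        right = sense(x, y, h + sensor_angle)
        if front >= left and front >= right:
            pass  # strongest signal ahead: keep heading
        elif left > right:
            h -= turn_angle
        elif right > left:
            h += turn_angle
        else:
            h += rng.choice((-turn_angle, turn_angle))  # tie on both sides: turn randomly
        x = (x + step_size * math.cos(h)) % width   # wrap around the edges (torus)
        y = (y + step_size * math.sin(h)) % height
        agent[:] = [x, y, h]
        trail[math.floor(y) % height][math.floor(x) % width] += deposit

    # 2) Diffuse with a 3x3 box blur, then decay
    blurred = []
    for yy in range(height):
        row = []
        for xx in range(width):
            total = sum(trail[(yy + dy) % height][(xx + dx) % width]
                        for dy in (-1, 0, 1) for dx in (-1, 0, 1))
            row.append(decay * total / 9.0)
        blurred.append(row)
    trail[:] = blurred


if __name__ == "__main__":
    rng = random.Random(0)
    W, H = 64, 64
    trail = [[0.0] * W for _ in range(H)]
    agents = [[rng.uniform(0, W), rng.uniform(0, H), rng.uniform(0, 2 * math.pi)]
              for _ in range(300)]
    for _ in range(50):
        step(agents, trail, W, H, rng)
    # Crude ASCII render: '#' where the trail is dense
    peak = max(max(row) for row in trail)
    for row in trail[::4]:
        print("".join("#" if v > 0.3 * peak else "." for v in row[::2]))
```

With many more agents and steps (and an actual image instead of ASCII) the trails self-organize into the vein-like networks seen in the linked pictures.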

2026-02-08

72M Points of Interest

tags: venues, poi, duckdb

Interesting blog post about a dataset of 72M Points of Interest (POI), including their websites.

Mark not only presents the data but also shows how to examine it using DuckDB.

72M Points of Interest

2026-02-07

Debugging Python in vscode

Hopefully for the last time in my life: this is the correct launch.json for debugging a Python script inside VS Code.

{
  "version": "0.2.0",
  "configurations": [
    {
      "name": "Python: Current File",
      "type": "debugpy",
      "request": "launch",
      "program": "${file}",
      "console": "integratedTerminal",
      "env": {
        "PYTHONPATH": "${workspaceFolder}"
      }
    }
  ]
}

2026-02-05

docling

tags: todo, ocr, ibm, docling, pdf

Docling converts messy documents into structured data and simplifies downstream document and AI processing by detecting tables, formulas, reading order, OCR, and much more.

Its GitHub repo has over 52K stars!

How to use cli and python

2026-01-27

.sqliterc

tags: sqlite, dot file

The default settings of sqlite's CLI tool are not great for displaying data in the terminal, but sqlite does read a config file called .sqliterc from your home folder.

Here is mine:

.headers on
.mode column
.nullvalue NULL
.prompt "sqlite> "
.timer on

2026-01-23

postgres_fdw

tags: postgres, aws, rds

I have been using Postgres' dblink_connect for the longest time.

It turns out there may be an even better way: postgres_fdw. It is a supported extension, even on AWS RDS, and it lets you query remote tables with plain SQL.

An example:

CREATE EXTENSION postgres_fdw;

CREATE SERVER remote_pg
FOREIGN DATA WRAPPER postgres_fdw
OPTIONS (
    host 'other-db.abcdefg.us-east-1.rds.amazonaws.com',
    port '5432',
    dbname 'orders'
);

CREATE USER MAPPING FOR my_user
SERVER remote_pg
OPTIONS (
    user 'remote_user',
    password 'secret'
);

CREATE FOREIGN TABLE foreign_orders (
    id bigint,
    total numeric
)
SERVER remote_pg
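OPTIONS (schema_name 'public', table_name 'orders');

-- The remote table can now be queried like a local one
SELECT id, total FROM foreign_orders WHERE total > 100;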

A mental model for a binary classifier's confusion matrix

                    ACTUAL
               Positive    Negative
PREDICTED
Positive        TP           FP
Negative        FN           TN
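The same mental model in code: a small sketch that counts the four cells from lists of actual and predicted labels, and shows that precision reads along the "predicted positive" row while recall reads down the "actual positive" column. The function names and the toy labels are my own, purely for illustration (scikit-learn has its own `confusion_matrix`, whose layout differs from the table above).

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, FN, TN for a binary classifier."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:          # top row: PREDICTED Positive
            if actual == positive:
                tp += 1
            else:
                fp += 1
        else:                              # bottom row: PREDICTED Negative
            if actual == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def precision_recall(tp, fp, fn):
    """Precision: TP / (predicted positive row). Recall: TP / (actual positive column)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


counts = confusion_counts([1, 1, 0, 0, 1], [1, 0, 1, 0, 1])
print(counts)  # (2, 1, 1, 1)
print(precision_recall(*counts[:3]))
```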

2026-01-22

Dynamic Programming Tutorial

tags: dynamic programming, recursion

It's always fun to revisit dynamic programming. This tutorial is very well made.
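As a quick refresher (not taken from the tutorial), here is the classic coin-change problem solved top-down: plain recursion plus memoization via `functools.lru_cache`. The function name `min_coins` is my own. The coins (1, 3, 4) example shows why DP is needed: greedy would pick 4 + 1 + 1 for 6, while the optimum is 3 + 3.

```python
from functools import lru_cache


def min_coins(coins, amount):
    """Fewest coins that sum to `amount`, or None if it can't be done (top-down DP)."""
    coins = tuple(coins)

    @lru_cache(maxsize=None)          # memoize: each sub-amount is solved only once
    def solve(remaining):
        if remaining == 0:
            return 0
        best = None
        for coin in coins:
            if coin <= remaining:
                sub = solve(remaining - coin)   # optimal answer for the smaller problem
                if sub is not None and (best is None or sub + 1 < best):
                    best = sub + 1
        return best

    return solve(amount)


print(min_coins((1, 5, 10, 25), 63))  # 6  (25 + 25 + 10 + 1 + 1 + 1)
print(min_coins((1, 3, 4), 6))        # 2  (3 + 3), greedy would need 3 coins
print(min_coins((5,), 3))             # None
```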

How to build your own local AI stack on Linux with llama.cpp, llama-swap, LibreChat and more

tags: llama.cpp, llm, qwen, nvidia, cuda, article, huggingface, librechat, llama-swap

link

https://huggingface.co/Qwen/Qwen3-30B-A3B-Instruct-2507/tree/main

llama.cpp does not load the safetensors format directly; it uses the GGUF format, which is optimized for quick loading and saving of models and for running them efficiently on consumer hardware.

convert safetensor to gguf

Projects that convert to gguf:

Avoid downloading models in FP32 or FP16 precision, as these unquantized formats require a lot of memory, especially for very large models.

Instead, download quantized versions of the model in the GGUF format, because they use much less memory. A great starting point is the Q8_0 quantization level.
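A back-of-the-envelope way to see why quantization matters: weight memory is roughly parameters × bits per weight / 8. The sketch below assumes llama.cpp's block layouts for Q8_0 and Q4_0 (32 weights plus one 16-bit scale per block, i.e. 8.5 and 4.5 bits per weight) and only counts the weights, so actual usage is higher once the KV cache and context buffers are added. The "30B" figure is just a round number for a model like Qwen3-30B-A3B.

```python
# Approximate bits per weight. F32/F16 are exact; Q8_0 and Q4_0 store blocks of
# 32 weights plus one 16-bit scale per block (8.5 and 4.5 bits per weight).
BITS_PER_WEIGHT = {"F32": 32.0, "F16": 16.0, "Q8_0": 8.5, "Q4_0": 4.5}


def weights_gb(n_params, quant):
    """Rough size of the model weights in GB (1e9 bytes) for a given GGUF type.

    Ignores the KV cache, context buffers and file metadata, so real usage is higher.
    """
    return n_params * BITS_PER_WEIGHT[quant] / 8 / 1e9


for quant in BITS_PER_WEIGHT:
    print(f"30B params @ {quant}: ~{weights_gb(30e9, quant):.1f} GB")
```

So a ~30B model goes from roughly 60 GB at F16 to about 32 GB at Q8_0, which is the difference between "needs a server" and "fits on a big workstation".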

Quantization Types

tags: huggingface, llm

huggingface's quantization-types

OCR with layout

tags: ocr, layout, docling, doctags

Grounded text is text that is anchored to a specific visual region of an image, such as a bounding box. Grounding is crucial for vision-language models (VLMs) because it tells them where text is, not just what it says (e.g., which price tag belongs to which product in an image).

DocTags is the markup format produced by docling's document VLMs such as SmolDocling. It labels each page element (text, table, picture, code, formula, and so on) with its type and its location on the page, so the output captures both the document's structure and its layout for downstream tasks like conversion and retrieval.

docling document

2026-01-21

Unconventional PostgreSQL Optimizations

tags: postgres, sql

Unconventional PostgreSQL Optimizations

Magick

brew install imagemagick

magick ~/Downloads/image.jpeg -quality 50 ~/Downloads/image_small.jpeg

2026-01-20

ESPN Unofficial Public API

tags: sports, api, espn

Disclaimer: This is documentation for ESPN's undocumented public API. I am not affiliated with ESPN. Use responsibly and follow ESPN's terms of service.

public ESPN API

Backtesting.py

tags: python, finance, trading, simulation, pandas

Backtest trading strategies in Python. See backtesting.py

Also, pandas_market_calendars
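To remember what a backtest actually does under the hood, here is a hand-rolled toy, not backtesting.py's API: a simple moving-average (SMA) crossover strategy that goes all-in when the fast SMA is above the slow one and back to cash when it drops below. The function names, window sizes and the 1000 starting cash are hypothetical; it trades at the close of the signal bar and ignores fees and slippage, which a real framework would model.

```python
def sma(prices, n):
    """Simple moving average; None until n prices are available."""
    out = []
    for i in range(len(prices)):
        if i + 1 < n:
            out.append(None)
        else:
            out.append(sum(prices[i + 1 - n:i + 1]) / n)
    return out


def backtest_sma_cross(prices, fast=2, slow=4, cash=1000.0):
    """Return final equity of a long-only SMA crossover strategy (no fees, no slippage)."""
    fast_ma, slow_ma = sma(prices, fast), sma(prices, slow)
    shares = 0.0
    for price, f, s in zip(prices, fast_ma, slow_ma):
        if f is None or s is None:
            continue                      # not enough history yet
        if f > s and shares == 0:         # fast crosses above slow: buy with all cash
            shares, cash = cash / price, 0.0
        elif f < s and shares > 0:        # fast crosses below slow: sell everything
            cash, shares = shares * price, 0.0
    return cash + shares * prices[-1]     # mark any open position at the last price


print(backtest_sma_cross([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 2500.0
```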

2026-01-17

Docker Cheat Sheet

tags: docker, cheat sheet

docker cheat sheet

2026-01-15

Why DuckDB is my first choice for data processing

tags: duckdb, hackernews

Why DuckDB is my first choice for data processing

Ask HN: How are you doing RAG locally?

tags: RAG, hackernews, embedding

Ask HN: How are you doing RAG locally?
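The retrieval half of RAG boils down to "embed everything, rank by similarity, stuff the top hits into the prompt". Here is a toy sketch of that loop using word-count vectors and cosine similarity as a stand-in for a real embedding model; in a local setup you would swap `embed()` for an actual embedding model and send the prompt to a local LLM. All function names and the prompt wording are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy 'embedding': lowercase word counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, docs, k=2):
    """Return the k documents most similar to the query."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]


def build_prompt(query, docs, k=2):
    """Assemble a prompt that grounds the LLM's answer in the retrieved snippets."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


docs = [
    "DuckDB is an in-process SQL database",
    "llama.cpp runs GGUF models locally",
    "uBlock Origin can hide YouTube Shorts",
]
print(build_prompt("how do I run GGUF models with llama.cpp", docs, k=1))
```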

2026-01-14

turn off type checking

tags: vscode, python, pylance

During development, the constant type checking results in red squiggles all over the code. I find that really annoying and distracting.

In your .vscode/settings.json, just add the following line:

"python.analysis.typeCheckingMode": "off"

2026-01-10

OpenCode

tags: LLM, agent, vibe coding, todo

The open source AI coding agent.

OpenCode github

Machine Learning, Statistical Inference and Induction

tags: ml, article, todo

Machine Learning, Statistical Inference and Induction

The Q, K, V Matrices

tags: todo, transformer, attention

The Q, K, V Matrices
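What Q, K and V are used for fits in a few lines: scaled dot-product attention computes softmax(Q·Kᵀ / √d_k)·V, i.e. each query scores every key, the scores become weights, and the output is a weighted average of the value rows. This is a minimal pure-Python sketch with lists as matrices; in a real transformer Q, K and V are learned projections of the token embeddings and the computation is batched, multi-headed and often masked.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.

    Q: one row per query; K and V: one row per key/value (same number of rows).
    """
    d_k = len(K[0])
    output = []
    for q in Q:
        # How well this query matches every key
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)  # non-negative, sums to 1
        # Weighted average of the value rows
        output.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))])
    return output


# One query that points toward the first key, so the result leans toward V's first row
Q = [[1.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
print(attention(Q, K, V))
```

If all keys were identical, every weight would be equal and the output would simply be the mean of the value rows.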

Use Claude Code to Query 600 GB Indexes over Hacker News, ArXiv, etc.

tags: llm, search, todo, sql, fts

Show HN: Use Claude Code to Query 600 GB Indexes over Hacker News, ArXiv, etc.

Raymond Hettinger - Modern solvers: Problems well-defined are problems solved - PyCon 2019

tags: youtube, solvers, search, rl, python, tutorial

video doc
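The core idea of the talk, "describe the problem well and a generic solver does the rest", can be tried with a plain breadth-first search. This is not code from the talk, just a sketch in the same spirit: `bfs` only knows about a start state, a `moves` function and a goal test, and the classic two-jug puzzle (3 and 5 liter jugs, measure exactly 4 liters) is a hypothetical example problem.

```python
from collections import deque


def bfs(start, moves, is_goal):
    """Generic breadth-first search: shortest list of states from start to a goal, or None."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if is_goal(state):
            path = []
            while state is not None:      # walk back through the parents
                path.append(state)
                state = parents[state]
            return path[::-1]
        for nxt in moves(state):
            if nxt not in parents:        # first visit is via a shortest route
                parents[nxt] = state
                queue.append(nxt)
    return None


def jug_moves(state, caps=(3, 5)):
    """All states reachable in one move: fill, empty, or pour one jug into the other."""
    a, b = state
    ca, cb = caps
    pour_ab = min(a, cb - b)
    pour_ba = min(b, ca - a)
    return {
        (ca, b), (a, cb),              # fill either jug
        (0, b), (a, 0),                # empty either jug
        (a - pour_ab, b + pour_ab),    # pour a -> b
        (a + pour_ba, b - pour_ba),    # pour b -> a
    }


print(bfs((0, 0), jug_moves, lambda s: 4 in s))
```

Only the problem description (`jug_moves` and the goal test) is puzzle-specific; the same `bfs` solves any puzzle you can phrase as states and moves.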

2026-01-05

Strudel

tags: music, melody, strudel

Twinkle Twinkle Little Star

note(`
<c c g g a a g@2
 f f e e d d c@2
 g g f f e e d@2
 g g f f e e d@2
 c c g g a a g@2
  f f e e d d c@2>*4
`).sound('piano')

OpenML

tags: dataset, ml, sklearn

openml

fetch_openml

Vehicle Dataset

tags: dataset, kaggle, ml

dataset

2026-01-01

Comtrade

tags: python, comtrade

Python Comtrade

pyComtrade

2025-12-31

postgres extensions

tags: postgres, vector db, pgvector, FTS

pgvectorscale pg_textsearch

2025-12-29

create a password

tags: cli, sh

openssl rand -base64 24
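If openssl isn't around, Python's standard library does the same job: take 24 bytes from the `secrets` module (cryptographically secure randomness) and base64-encode them, which gives a 32-character string with 192 bits of randomness. The helper name `random_password` is my own; `secrets.token_urlsafe(24)` is a handy alternative when `+` and `/` would be a nuisance.

```python
import base64
import secrets


def random_password(n_bytes=24):
    """Roughly what `openssl rand -base64 24` prints: n_bytes random bytes, base64-encoded."""
    raw = secrets.token_bytes(n_bytes)  # cryptographically secure randomness
    return base64.b64encode(raw).decode("ascii")


pw = random_password()
print(pw, len(pw))  # 24 bytes -> 32 base64 characters
```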

Add markdown to Google Document

tags: markdown, Google Doc

In a Google Doc, go to Tools > Preferences and enable Markdown. You can now use Paste from Markdown.

2025-12-23

blog links

tags: blog, markdown

I have been searching for a good blog solution for a while and finally found it.

Material for mkdocs github workflow pymdown extension

Instant database clones with PostgreSQL 18

tags: postgresql, sql

blog hackernews

2025-12-22

color-science

tags: python, color

repo

2025-12-21

Hands-On ML with Scikit-Learn and PyTorch

tags: python, ml, sklearn, torch, pandas, matplotlib, book

There is a new edition of my favorite ML book, this time with PyTorch!

repo

numpy cheatsheet pandas cheatsheet matplotlib cheatsheet

Leap 71

tags: company, space, rocket, computational engineering

video company Lin Kayser

ResumeCV

tags: resume, yaml, pdf

A great module for creating good-looking resumes. The resume data lives in a YAML file, which makes it easy to feed into an LLM!

doc

2025-12-20

Jekyll

tags: website, blog, github, site generator

jekyll: https://jekyllrb.com/

Material for MkDocs

tags: site generator, python, blog, website

repo

tutorial

blog example How To Build and Deploy a Stunning Blog for FREE using Material for MkDocs

2025-12-19

Self hosting challenges and how to limit scraper bots

tags: scraper, bot, blog, vps, self hosting, hackernews

This article prompted me to research how to deal with excessive scraper bots when self-hosting an app or a blog.

I got hacked, my server started mining Monero this morning.

Anubis is a Web AI Firewall Utility that weighs the soul of your connection using one or more challenges in order to protect upstream resources from scraper bots. Also see this on hackernews.

Server-Side Rate Limiting (Web Server)

I asked Gemini for good ways to keep scraper bots out -> gemini chat

You can configure your web server to strictly limit how fast any single IP can download pages. This makes scraping painfully slow for bots, effectively discouraging them.

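If you use Nginx, add this to your nginx.conf. The limit_req_zone line belongs in the http block; the limit_req line goes in a location inside your server block:

http {
    # Track clients by IP in a 10 MB shared zone; allow 1 request/second each
    limit_req_zone $binary_remote_addr zone=one:10m rate=1r/s;

    server {
        location / {
            limit_req zone=one burst=10 nodelay;
            # ... rest of config
        }
    }
}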

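This limits every IP to roughly 1 request per second, with a burst of up to 10 extra requests served immediately. Real users browsing normally won't notice, but scrapers trying to fetch 100 pages at once will have most of their requests rejected (HTTP 503 by default).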
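To build intuition for what rate=1r/s with burst=10 nodelay does, here is a simplified leaky-bucket model for a single client IP. It is my own approximation of the idea behind nginx's limit_req, not nginx's actual implementation: the bucket drains at `rate` per second, each request adds 1, and a request that would push the level above `burst` is rejected.

```python
class LeakyBucket:
    """Simplified model of nginx limit_req (rate + burst, with nodelay) for one client IP."""

    def __init__(self, rate, burst):
        self.rate = rate      # requests per second that drain from the bucket
        self.burst = burst    # how many requests above the rate are tolerated
        self.excess = 0.0     # current fill level of the bucket
        self.last = None      # time of the last accepted request

    def allow(self, now):
        """Return True if a request at time `now` (seconds) is accepted, False if rejected."""
        if self.last is None:
            self.last = now
            return True
        # Drain the bucket for the time elapsed since the last accepted request, add this one
        excess = max(self.excess - self.rate * (now - self.last), 0.0) + 1
        if excess > self.burst:
            return False      # nginx would answer with an error status here (503 by default)
        self.excess, self.last = excess, now
        return True


# A scraper firing 100 requests at once vs. a reader clicking once per second
scraper = LeakyBucket(rate=1, burst=10)
print(sum(scraper.allow(0.0) for _ in range(100)))  # 11: the first request plus a burst of 10

reader = LeakyBucket(rate=1, burst=10)
print(all(reader.allow(float(t)) for t in range(60)))  # True
```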

"Hard" Limit via Custom Script (Advanced)

If you absolutely must ensure the server shuts down after a certain bandwidth limit (e.g., 1 TB), you have to script it yourself.

Option A: Bandwidth Speed Limit (tc or wondershaper)

You can cap your server's uplink speed. For example, if you cap your upload speed at 10 Mbps, the maximum theoretical outbound traffic in a 30-day month is about 3.2 TB, making it physically impossible to exceed a 20 TB monthly traffic limit.
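The 3.2 TB figure is simple arithmetic: 10 Mbps is 1.25 MB/s, times 2,592,000 seconds in 30 days. This small sketch (helper names are my own) does the calculation in both directions, using decimal units (1 TB = 10^12 bytes) and assuming the link is saturated around the clock.

```python
SECONDS_PER_DAY = 24 * 3600


def monthly_egress_tb(mbps, days=30):
    """Upper bound on outbound traffic (decimal TB) if a link capped at `mbps` runs flat out."""
    bytes_per_second = mbps * 1_000_000 / 8
    return bytes_per_second * days * SECONDS_PER_DAY / 1e12


def cap_for_budget_mbps(budget_tb, days=30):
    """Uplink cap (Mbps) that keeps even a saturated link under `budget_tb` per month."""
    return budget_tb * 1e12 * 8 / (days * SECONDS_PER_DAY) / 1_000_000


print(round(monthly_egress_tb(10), 2))   # 3.24 TB at 10 Mbps
print(round(cap_for_budget_mbps(1), 2))  # 3.09 Mbps for a 1 TB budget
```

So a hard 1 TB budget, like the one in the script below, would need a cap of roughly 3 Mbps if you relied on the speed limit alone.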

Option B: Auto-Shutdown Script

Install a tool like vnstat to monitor traffic, then write a simple script that checks usage and add it to crontab to run every hour (schedule 0 * * * *).

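#!/bin/bash
# Get the current month's TX (transmit/egress) in bytes.
# Assumes vnstat 2.x: "--oneline b" prints raw byte counts, and field 10 is monthly TX.
# (Parsing the human-readable output breaks once the unit switches from GiB to TiB.)
USAGE=$(vnstat --oneline b | cut -d';' -f10)

# Set limit to 1000 GiB (about 1 TB), expressed in bytes
LIMIT=$((1000 * 1024 * 1024 * 1024))

# Both values are whole numbers of bytes, so bash integer math is enough
if (( USAGE > LIMIT )); then
    echo "Limit exceeded. Shutting down network interface."
    # Choose one:
    # ip link set eth0 down   # Kills network
    # poweroff                # Shuts down server completely
fi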

htmx

tags: htmx, javascript, html, webapp, hackernews

Please just try HTMX

A quote:

Hey, I created htmx and while I appreciate the publicity, I’m not a huge fan of these types of hyperbolic articles. There are lots of different ways to build web apps with their own strengths and weaknesses. I try to assess htmx’s strengths and weaknesses here:
https://htmx.org/essays/when-to-use-hypermedia/

Also, please try unpoly:

It’s another excellent hypermedia oriented library

Edit: the article is actually not nearly as unreasonable as I thought based on the just-f*king-use template. Still prefer a chill vibe for htmx though.

See: unpoly

unidecode

tags: ascii, unicode, string, python

unidecode is a great library for a common problem: how do you turn Unicode text into a reasonable ASCII string?

For example:

unidecode('kožušček') -> 'kozuscek'
unidecode('30 \U0001d5c4\U0001d5c6/\U0001d5c1') -> '30 km/h'
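For comparison, here is a rough stdlib-only approximation (my own helper, not part of unidecode): Unicode NFKD normalization splits accented letters into base letter plus combining mark and maps "styled" letters to plain ones, and encoding to ASCII with `errors="ignore"` drops the rest. It handles both examples above, but characters without an ASCII decomposition are simply lost, which is exactly where unidecode earns its keep.

```python
import unicodedata


def to_ascii(text):
    """Rough stdlib-only stand-in for unidecode: NFKD-decompose, then drop what isn't ASCII."""
    # NFKD splits 'ž' into 'z' + a combining caron and maps styled letters
    # like MATHEMATICAL SANS-SERIF SMALL K to a plain 'k'
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(to_ascii("kožušček"))                            # kozuscek
print(to_ascii("30 \U0001d5c4\U0001d5c6/\U0001d5c1"))  # 30 km/h
print(to_ascii("straße"))                              # strae (unidecode gives 'strasse')
```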

2025-12-18

Pydantic AI

tags: pydantic, Python, AI, Agents

repo

Langchain course

tags: langchain, python, AI, Agents

course

Postgresql distinct

tags: postgres, sql

Great overview of how to use the distinct keyword in PostgreSQL.

https://hakibenita.com/the-many-faces-of-distinct-in-postgre-sql

A few code examples:

CREATE TEMP TABLE tmp_employee (
    id         INT,
    name       TEXT,
    department TEXT,
    salary     INT
)
;

INSERT INTO tmp_employee (id, name, department, salary) VALUES
(30, 'Sara Roberts',     'Accounting',               13845),
(4,  'Benjamin Brown',   'Business Development',      7386),
(3,  'Carolyn Carter',   'Engineering',               8366),
(20, 'Janet Hall',       'Human Resources',            2826),
(14, 'Chris Phillips',   'Legal',                     3706),
(10, 'James Cunningham', 'Legal',                     3706),
(11, 'Richard Bradley',  'Marketing',                11272),
(2,  'Richard Fox',      'Product Management',       13449),
(25, 'Evelyn Rodriguez', 'Research and Development', 10628),
(17, 'Benjamin Carter',  'Sales',                     6197),
(24, 'Jessica Elliott',  'Services',                 14542),
(7,  'Bonnie Robertson', 'Support',                  12674),
(8,  'Jean Bailey',      'Training',                 13230)
;

-- get all unique departments
SELECT DISTINCT department FROM tmp_employee;

DISTINCT ON

-- get the highest earners per department
-- use the employee id as the tiebreaker
SELECT DISTINCT ON (department)
    *
FROM
    tmp_employee
ORDER BY
    department,
    salary DESC,
    id ASC;
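The semantics of DISTINCT ON are easy to mimic outside the database: sort by the DISTINCT ON key plus the remaining ORDER BY columns, then keep the first row of each group. This Python sketch (the `distinct_on` helper is my own) uses a few rows from tmp_employee; note how the Legal tie at 3706 is resolved by the lower id, just like `id ASC` in the query. Python sorts strings by code point, which can differ from a Postgres collation for non-ASCII text.

```python
from itertools import groupby


def distinct_on(rows, key, order):
    """Mimic SELECT DISTINCT ON (key) ... ORDER BY key, order: keep the first row of each group."""
    ordered = sorted(rows, key=lambda r: (key(r), order(r)))
    return [next(group) for _, group in groupby(ordered, key=key)]


# (id, name, department, salary): a few rows from tmp_employee above
rows = [
    (14, "Chris Phillips", "Legal", 3706),
    (10, "James Cunningham", "Legal", 3706),
    (30, "Sara Roberts", "Accounting", 13845),
    (17, "Benjamin Carter", "Sales", 6197),
]

# ORDER BY department, salary DESC, id ASC
for row in distinct_on(rows, key=lambda r: r[2], order=lambda r: (-r[3], r[0])):
    print(row)
```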

IS DISTINCT FROM

IS DISTINCT FROM treats NULL as an ordinary value that can be compared, so the comparison always returns true or false, never NULL.

WITH old_data AS (
    SELECT 1 AS emp_id, 'Engineer' AS title UNION ALL
    SELECT 2, NULL UNION ALL
    SELECT 3, 'Manager'
),
new_data AS (
    SELECT 1 AS emp_id, 'Engineer' AS title UNION ALL
    SELECT 2, 'Analyst' UNION ALL
    SELECT 3, NULL
)
SELECT
    o.emp_id,
    o.title AS old_title,
    n.title AS new_title,
    o.title = n.title AS equals_operator,       -- returns NULL (not false) when one side is NULL
    o.title IS DISTINCT FROM n.title AS changed -- this works even when one side is NULL
FROM old_data o
JOIN new_data n USING (emp_id);
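Python's `==` already behaves like IS NOT DISTINCT FROM (`None == None` is True), so the difference is easy to miss. This little model of SQL's three-valued logic, with None standing in for NULL, reproduces the two result columns of the query above for the three employees. Both helper names are my own.

```python
def sql_equals(a, b):
    """SQL '=' under three-valued logic: NULL (None) if either side is NULL."""
    if a is None or b is None:
        return None
    return a == b


def is_distinct_from(a, b):
    """SQL 'IS DISTINCT FROM': NULLs compare like ordinary values, the result is never NULL."""
    if a is None and b is None:
        return False
    if a is None or b is None:
        return True
    return a != b


# (old_title, new_title) for emp_id 1, 2, 3 from the query above
pairs = [("Engineer", "Engineer"), (None, "Analyst"), ("Manager", None)]
for old, new in pairs:
    print(old, new, sql_equals(old, new), is_distinct_from(old, new))
```

In a WHERE clause a NULL result counts as "not true", so filtering on `o.title <> n.title` would silently skip employees 2 and 3, while `IS DISTINCT FROM` reports them as changed.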

ARRAY_AGG

Bonus!

Aggregate all values of each group into an array (use JSON_AGG if you want JSON instead).

SELECT
    department,
    ARRAY_AGG(name) AS employees
FROM
    tmp_employee
GROUP BY
    department
;