Data Feed #4: Vectors

Articles, diagrams, and news about data systems for engineers
data-feed
Author

Elias Nema

Published

March 3, 2023

Weekly Focus: Vectors

Using an open-source vector similarity extension for Postgres.¹

A detailed paper from SIGMOD’20³ describes how Alibaba designed and built an approximate nearest neighbor search extension for vector similarity. Great deep dive into page structures for ANN indexes, and the code is even available (though not maintained).

Using vector functions in SingleStore’s SQL, though it is not clear how well the system scales, since the example uses only 7,000 vectors.⁴
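To put the 7,000-vector figure in perspective, here is a minimal sketch of an exact (brute-force) cosine similarity search in plain Python. Everything in it is an assumption made for illustration: the random data, the 64-dimensional vectors, and the helper names `dot`, `normalize`, and `top_k`. It is not taken from the linked article and does not use any SingleStore function.

```python
import heapq
import math
import random


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    """Scale a vector to unit length so that dot product equals cosine similarity."""
    norm = math.sqrt(dot(v, v))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def top_k(query, vectors, k=5):
    """Return the k (score, index) pairs with the highest cosine similarity.

    Assumes every vector in `vectors` is already unit length.
    This scans every vector, so the cost grows linearly with the collection.
    """
    q = normalize(query)
    scored = ((dot(q, v), i) for i, v in enumerate(vectors))
    return heapq.nlargest(k, scored)


# Hypothetical data: 7,000 random 64-dimensional unit vectors.
random.seed(0)
DIM = 64
vectors = [normalize([random.gauss(0, 1) for _ in range(DIM)]) for _ in range(7000)]

# Query with a vector from the collection; it should be its own best match.
results = top_k(vectors[42], vectors, k=3)
for score, idx in results:
    print(idx, round(score, 4))
```

At this size a full scan is cheap even in an interpreted language, which is why a 7,000-vector example says little about scaling. The linear cost per query is what approximate nearest neighbor indexes are meant to avoid on much larger collections.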

An amazing use case (as far as I’m concerned, data swamps are real) of representing individual columns in the embedding space by utilising pre-trained transformer models, then using those vectors to find semantically similar data within your data.⁵

Learning

Use Apache Iceberg in a data lake to support incremental data processing.

Access Amazon Athena in your applications using the WebSocket API.

Guide to bitwise operators in CrateDB.

Grafana Labs webinars: Reduce MTTR, build beautiful Grafana dashboards, and more.

Anomaly detection on Prometheus metrics.

So you want Change Data Capture?

Patterns for enterprise data sharing at scale.

Deep Dive

A new lecture is out from the CMU Advanced Databases course on Parallel Hash Join Algorithms. If you are into data, you should have a very good reason for not watching this playlist.

Business

Amazing (as always) write-up about using TimescaleDB in the wild, and why compression is crucial for time-series databases.⁶

How Wiz used Amazon ElastiCache to improve performance and reduce costs.

How Delivery Hero uses Kubecost and Datadog to manage Kubernetes costs in the cloud.

An emerging buzzword for data platforms capable of both transactional and analytical workloads: “translytical.” A webinar from SingleStore describes precisely such platforms.⁷
