Commit Graph

63 Commits

Author SHA1 Message Date
Mohamed Bassem
347793ada2 deps: upgrade tesseract to v7 2025-12-26 23:59:52 +00:00
Mohamed Bassem
f5c32d940e feat: use reddit API for metadata extraction. Fixes #1853 #1883 2025-12-13 23:34:19 +00:00
Mohamed Bassem
c6f93b3b9b fix: migrate to metascraper-x from metascraper-twitter 2025-12-08 00:36:39 +00:00
Mohamed Bassem
e3cc546363 fix: better extraction for youtube thumbnails. #2204 2025-12-07 11:43:14 +00:00
Mohamed Bassem
86a4b39665 feat: Add automated bookmark backup feature (#2182)
* feat: Add automated bookmark backup system

Implements a comprehensive automated backup feature for user bookmarks with the following capabilities:

Database Schema:
- Add backupSettings table to store user backup preferences (enabled, frequency, retention)
- Add backups table to track backup records with status and metadata
- Add BACKUP asset type for storing compressed backup files
- Add migration 0066_add_backup_tables.sql

Background Workers:
- Implement BackupSchedulingWorker cron job (runs daily at midnight UTC)
- Create BackupWorker to process individual backup jobs
- Deterministic scheduling spreads backup jobs across 24 hours based on user ID hash
- Support for daily and weekly backup frequencies
- Automated retention cleanup to delete old backups based on user settings
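
The deterministic scheduling bullet above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from this commit: the FNV-1a hash, the per-minute slot granularity, and the names `hashUserId`, `backupMinuteOfDay`, and `backupDelayMs` are all hypothetical.

```typescript
// Hypothetical sketch: map a user ID to a stable minute-of-day slot so that
// backup jobs are spread across 24 hours instead of all firing at midnight.

const MINUTES_PER_DAY = 24 * 60;

// 32-bit FNV-1a hash (an assumed choice; any stable string hash would do).
function hashUserId(userId: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < userId.length; i++) {
    hash ^= userId.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply by FNV prime, keep unsigned
  }
  return hash >>> 0;
}

// The same user ID always yields the same slot in [0, 1439].
function backupMinuteOfDay(userId: string): number {
  return hashUserId(userId) % MINUTES_PER_DAY;
}

// Delay in milliseconds, measured from the daily midnight UTC cron run.
function backupDelayMs(userId: string): number {
  return backupMinuteOfDay(userId) * 60_000;
}
```

Because the slot depends only on the user ID, a scheduler built this way needs no stored state to decide when each user's backup should run.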

Export & Compression:
- Reuse existing export functionality for bookmark data
- Compress exports using Node.js built-in zlib (gzip level 9)
- Store compressed backups as assets with proper metadata
- Track backup size and bookmark count for statistics
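
A minimal sketch of the gzip step described above, using Node.js's built-in `zlib` module. The `ExportedBookmark` shape and the function names are assumptions for illustration; the real export format is not shown in this commit message, and a later bullet in the same commit notes a switch from gzip to zip.

```typescript
import { gzipSync, gunzipSync } from "node:zlib";

// Hypothetical export record; the project's actual export schema may differ.
interface ExportedBookmark {
  url: string;
  title: string;
}

// Serialize the export to JSON and compress it at the maximum gzip level (9).
function compressExport(bookmarks: ExportedBookmark[]): Buffer {
  const json = JSON.stringify({ bookmarks });
  return gzipSync(Buffer.from(json, "utf8"), { level: 9 });
}

// Reverse of compressExport, useful for verifying a stored backup.
function decompressExport(data: Buffer): ExportedBookmark[] {
  const json = gunzipSync(data).toString("utf8");
  return (JSON.parse(json) as { bookmarks: ExportedBookmark[] }).bookmarks;
}
```

The compressed buffer's byte length is what a "backup size" statistic could record, and `bookmarks.length` would give the bookmark count.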

tRPC API:
- backups.getSettings - Retrieve user backup configuration
- backups.updateSettings - Update backup preferences
- backups.list - List all user backups with metadata
- backups.get - Get specific backup details
- backups.delete - Delete a backup
- backups.download - Download backup file (base64 encoded)
- backups.triggerBackup - Manually trigger backup creation

UI Components:
- BackupSettings component with configuration form
- Enable/disable automatic backups toggle
- Frequency selection (daily/weekly)
- Retention period configuration (1-365 days)
- Backup list table with download and delete actions
- Manual backup trigger button
- Display backup stats (size, bookmark count, status)
- Added backups page to settings navigation

Technical Details:
- Uses Restate queue system for distributed job processing
- Implements idempotency keys to prevent duplicate backups
- Background worker concurrency: 2 jobs at a time
- 10-minute timeout for large backup exports
- Proper error handling and logging throughout
- Type-safe implementation with Zod schemas
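
The idempotency-key idea above can be illustrated with a small sketch. The key format, the epoch-aligned weekly buckets, and the in-memory `DedupingQueue` are assumptions made for this example; they do not describe the project's actual queue API.

```typescript
// Hypothetical sketch of idempotency keys for backup jobs.
type Frequency = "daily" | "weekly";

const MS_PER_DAY = 86_400_000;

// Build a key that is identical for every enqueue attempt within the same
// scheduling period, so retries and duplicate cron runs collapse into one job.
function backupIdempotencyKey(userId: string, frequency: Frequency, now: Date): string {
  const day = Math.floor(now.getTime() / MS_PER_DAY); // whole days since the Unix epoch (UTC)
  // Weekly buckets here are 7-day windows aligned to the epoch, not calendar weeks.
  const period = frequency === "daily" ? day : Math.floor(day / 7);
  return `backup:${userId}:${frequency}:${period}`;
}

// In-memory stand-in for a queue that ignores jobs with an already-seen key.
class DedupingQueue {
  private seen = new Set<string>();

  // Returns true if the job was accepted, false if it was a duplicate.
  enqueue(key: string): boolean {
    if (this.seen.has(key)) return false;
    this.seen.add(key);
    return true;
  }
}
```

With a scheme like this, a second enqueue for the same user on the same UTC day is rejected, while the next day's enqueue is accepted.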

* refactor: simplify backup settings and asset handling

- Move backup settings from separate table to user table columns
- Update BackupSettings model to use static methods with users table
- Remove download mutation in favor of direct asset links
- Implement proper quota checks using QuotaService.checkStorageQuota
- Update UI to use new property names and direct asset downloads
- Update shared types to match new schema

Key changes:
- backupSettingsTable removed, settings now in users table
- Backup downloads use direct /api/assets/{id} links
- Quota properly validated before creating backup assets
- Cleaner separation of concerns in tRPC models

* migration

* use zip instead of gzip

* fix drizzle

* fix settings

* streaming json

* remove more dead code

* add e2e tests

* return backup

* poll for backups

* more fixes

* more fixes

* fix test

* fix UI

* fix delete asset

* fix ui

* redirect for backup download

* cleanups

* fix idempotency

* fix tests

* add ratelimit

* add error handling for background backups

* i18n

* model changes

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-11-29 14:53:31 +00:00
Mohamed Bassem
cc8fee0d28 deps: upgrade hono and playwright 2025-11-16 12:32:44 +00:00
Mohamed Bassem
391af8a523 deps: Upgrade typescript to 5.9 2025-11-16 12:28:28 +00:00
Mohamed Bassem
b28cd03a4a refactor: Allow runner functions to return results to onComplete 2025-11-09 20:13:39 +00:00
Mohamed Bassem
b63a49fc39 fix: Stricter SSRF validation (#2082)
* fix: Stricter SSRF validation

* skip dns resolution if running in proxy context

* more fixes

* Add LRU cache

* change the env variable for internal hostnames

* make dns resolution timeout configurable

* upgrade ipaddr

* handle ipv6

* handle proxy bypass for request interceptor
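
As a rough illustration of one piece of SSRF validation, the sketch below rejects IPv4 addresses in private or special-use ranges. It is a simplified assumption-laden example, not the code from this commit: the function names are hypothetical, it covers IPv4 only, and it omits the DNS resolution, caching, IPv6, and proxy handling that the bullets above mention.

```typescript
// Hypothetical sketch: block IPv4 addresses that point at internal networks.
type Octets = [number, number, number, number];

// Parse dotted-decimal IPv4; returns null for anything malformed.
function parseIPv4(address: string): Octets | null {
  const parts = address.split(".");
  if (parts.length !== 4) return null;
  const values: number[] = [];
  for (const part of parts) {
    if (!/^\d{1,3}$/.test(part)) return null;
    const value = Number(part);
    if (value > 255) return null;
    values.push(value);
  }
  return values as Octets;
}

function isDisallowedIPv4(address: string): boolean {
  const octets = parseIPv4(address);
  if (octets === null) return true; // fail closed on unparseable input
  const [a, b] = octets;
  return (
    a === 0 ||                            // 0.0.0.0/8 "this network"
    a === 10 ||                           // 10.0.0.0/8 private
    a === 127 ||                          // 127.0.0.0/8 loopback
    (a === 169 && b === 254) ||           // 169.254.0.0/16 link-local
    (a === 172 && b >= 16 && b <= 31) ||  // 172.16.0.0/12 private
    (a === 192 && b === 168)              // 192.168.0.0/16 private
  );
}
```

A check like this is only meaningful when applied to the address a hostname actually resolves to, since a public-looking hostname can resolve to an internal IP.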
2025-11-02 17:19:28 +00:00
Mohamed Bassem
e43c7e0f1c deps: Upgrade metascraper plugins 2025-10-26 11:28:55 +00:00
Mohamed Bassem
6d234de8c9 deps: Upgrade metascraper-readability 5.49.6 2025-10-26 11:21:35 +00:00
Mohamed Bassem
851d3e292f fix: fix bundling liteque in the workers 2025-09-14 20:39:11 +00:00
Mohamed Bassem
8d32055485 refactor: Move callsites to liteque to be behind a plugin 2025-09-14 18:16:57 +00:00
Mohamed Bassem
be2646ec1d fix: Respect wal mode for the queue db 2025-08-30 17:08:41 +00:00
MohamedBassem
52d018c872 feat: Export prometheus metrics from the workers 2025-08-22 21:20:37 +03:00
MohamedBassem
b94896a0f8 refactor: Extract meilisearch as a plugin 2025-07-27 19:37:11 +01:00
MohamedBassem
77ae89b222 chore: More turbo fixes 2025-07-27 15:06:22 +01:00
MohamedBassem
8f1cb065d7 fix: Ensure that all packages are ESM packages 2025-07-27 14:36:16 +01:00
Mohamed Bassem
a441a67775 deps: Upgrade vite 2025-07-26 21:32:40 +00:00
Mohamed Bassem
2cce45b7ed fix: Run workers in prod without tsx. Fixes #1673 2025-07-19 14:35:48 +00:00
Mohamed Bassem
360ef9dbbe feat: Add proper proxy support. fixes #1265 2025-07-13 01:05:54 +00:00
Mohamed Bassem
6b77736b61 deps: Upgrade typescript to 5.8 2025-07-12 23:06:25 +00:00
Mohamed Bassem
f4436e1999 deps: Upgrade drizzle 2025-07-12 22:39:46 +00:00
Mohamed Bassem
9fb3ef6f6d fix: Prioritize crawling user added links over bulk imports. fixes #1717 2025-07-12 13:06:46 +00:00
David Woods
7cc4b08aab feat(workers): adding a local metascraper plugin for Reddit posts (#1302)
* chore: metascraper 5.x comes with its own types, including @types/metascraper is now redundant; also updating to latest versions of metascraper libraries

* feat (workers): creating a local metascraper plugin for Reddit posts

In the past, the preview images for bookmarks from Reddit links were
poorly chosen. Reddit does not use opengraph tags, so metascraper-images
simply looked for all images on the page and returned the first. This
tended to be the profile picture for the poster for the Reddit link.

This new plugin, using the existing metascraper framework, provides a
better selection of image for the bookmark when the URL domain is
'reddit'.

In addition, recent changes (I believe this was a side effect of adding
the metascraper-author and/or the metascraper-publisher plugins, but it
could also be related to the metascraper-readability plugin) broke what
used to be a good choice of bookmark title. Previously, titles looked
like 'Tinyauth just reached 1000 stars! : r/selfhosted' with both thread
title and subreddit mentioned. After this update, all Reddit posts now
have the same title: 'The heart of the internet'.

To return to the better format, this new metascraper-reddit plugin now
attempts to retrieve the better title from reddit URLs. Note that in
order to gain precedence in title selection, the 'metascraperReddit()'
inclusion in the crawlerWorkers.ts metascraper instantiation list had to
be moved above metascraperReadability().
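
The ordering requirement described above can be pictured with a simplified model of rule precedence, in which the first rule that returns a non-empty title wins. This is not metascraper's actual API: the `Page` shape, its `documentTitle` and `genericTitle` fields, and the rule names are hypothetical stand-ins for whatever each plugin extracts.

```typescript
// Simplified model of title-rule precedence; not metascraper's real interface.
interface Page {
  url: string;
  documentTitle?: string; // hypothetical: the title a Reddit-specific rule would extract
  genericTitle?: string;  // hypothetical: the title a generic rule would extract
}

type TitleRule = (page: Page) => string | undefined;

// Only produces a title for reddit.com URLs; otherwise defers to later rules.
const redditRule: TitleRule = (page) =>
  /^https?:\/\/([^/]+\.)?reddit\.com(\/|$)/.test(page.url) ? page.documentTitle : undefined;

// Produces a title for any page that has one.
const genericRule: TitleRule = (page) => page.genericTitle;

// Rules are tried in order; the first non-empty result is used.
function pickTitle(rules: TitleRule[], page: Page): string | undefined {
  for (const rule of rules) {
    const title = rule(page);
    if (title) return title;
  }
  return undefined;
}
```

In this model, placing `redditRule` before `genericRule` yields the thread title for Reddit URLs, while the reverse order lets the generic title win, which mirrors why the plugin had to be listed earlier.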

* chore: updated Hoarder in text to Karakeep

* chore: update metascraper versions

fix for metascraper types has been merged; the expect-error comment can
be removed

* chore: merge with master

---------

Co-authored-by: Mohamed Bassem <me@mbassem.com>
2025-06-22 21:14:43 +01:00
Mael
c70d64d4cd feat(workers): migrate from puppeteer to playwright (#1296)
* feat: convert to playwright

Convert crawling to use Playwright instead of Puppeteer.

- Update Dockerfile to include Playwright
- Update crawler worker to use Playwright API
- Update dependencies

* feat: convert from Puppeteer to Playwright for crawling

* feat: update docker-compose

* use separate browser context for better isolation

* skip chrome download in linux script

* readd the stealth plugin

---------

Co-authored-by: Mohamed Bassem <me@mbassem.com>
2025-06-22 18:08:21 +01:00
xuatz
d5e2973dce chore: migrate away from eslint to oxlint (#1642)
* chore: migrate away from eslint to oxlint

* revert turbo task name lint

* it seems like we can remove the seemingly default globals
2025-06-22 12:29:30 +01:00
Mohamed Bassem
f257a5ba95 deps: Upgrade readability to 0.6 & adblocker to 2.5.1 2025-04-21 02:18:13 +00:00
Mohamed Bassem
cf97bace33 feat: Add an MCP server for karakeep 2025-04-13 01:53:11 +00:00
MohamedBassem
755fc36e91 chore: Rename hoarder packages to karakeep 2025-04-12 19:37:40 +01:00
erik-nilcoast
b3417d87a0 feat(workers): Adds publisher and author og:meta tags to Bookmark (#1141) 2025-03-22 22:38:50 +00:00
Mohamed Bassem
84ba482b81 deps: Upgrade pdfjs and dompurify 2025-03-22 16:08:26 +00:00
Mohamed Bassem
59c444a503 fix: Revert the accidental upgrade of deps. #1107 2025-03-10 14:57:04 +00:00
dependabot[bot]
849cf4bdeb build(deps): bump dompurify from 3.0.9 to 3.2.4 (#1102)
Bumps [dompurify](https://github.com/cure53/DOMPurify) from 3.0.9 to 3.2.4.
- [Release notes](https://github.com/cure53/DOMPurify/releases)
- [Commits](https://github.com/cure53/DOMPurify/compare/3.0.9...3.2.4)

---
updated-dependencies:
- dependency-name: dompurify
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-03-09 21:14:21 +00:00
Ahmad Mujahid
e5cb9aa848 feat: Add PDF screenshot generation and display (#995)
* Updated pdf2json to 3.1.5

* Extract and store a screenshot from PDF files using pdf2pic

* Installing graphicsmagick and ghostscript

* Generate Missing PDF screenshot with tidyAssets worker for backward support

* Display PDF screenshot instead of the PDF in web if it exists.

* Display PDF screenshot in mobile app if exists.

* Updated pnpm-lock.yaml

* Removed console.log

* Revert the unnecessary changes in package.json

* Revert pnpm-lock changes

* Prevent rendering PDF files if the screenshot is not generated

* refactor: replace useEffect with useMemo for section initialization

* feat: show PDF file download button and handle large PDFs by defaulting to screenshot view

* feat: add file size to openapi spec

* feature: Add Assets preprocessing in fix mode to admin actions

* i18n: add reprocess_assets_fix_mode translation

* i18n: Add missing ar translations

* A bunch of fixes

* Fix openapi spec schema

---------

Co-authored-by: Mohamed Bassem <me@mbassem.com>
2025-02-17 09:25:16 +00:00
Mohamed Bassem
c0a4af7887 fix: Fix node22 error in worker container. Fixes #962 2025-02-02 15:47:21 +00:00
Mohamed Bassem
fd7011aff5 fix: Abort all IO when workers timeout instead of detaching. Fixes #742 2025-02-01 18:16:25 +00:00
Mohamed Bassem
0893446bed deps: Upgrade typescript to 5.7 2025-02-01 17:03:06 +00:00
Mohamed Bassem (aider)
17af22bb6d chore: add format:fix and lint:fix scripts to all packages 2024-12-31 12:33:48 +00:00
Mohamed Bassem
aff4e60952 deps: Upgrade drizzle-orm to 0.38.3 2024-12-29 21:29:58 +00:00
Mohamed Bassem
378ad9bc15 fix(workers): Don't block connection to chrome when failing to download adblock list. #674 2024-11-21 23:39:37 +00:00
Mohamed Bassem
10070c1752 fix: Feed refreshes were not getting re-enqueued for failed jobs 2024-11-09 13:35:28 +00:00
Mohamed Bassem
d34b538a49 feature: Schedule RSS feed refreshes every hour 2024-11-03 18:33:52 +00:00
Mohamed Bassem
cf1a25131f feature: Add support for subscribing to RSS feeds. Fixes #202 2024-11-03 17:09:47 +00:00
Mohamed Bassem
a746e9a38e deps: Extract the queue implementation into its own repo 2024-10-27 23:40:10 +00:00
Mohamed Bassem
3e727f7ba3 refactor: Move inference to the shared package 2024-10-26 20:07:16 +00:00
Mohamed Bassem
019b5d2f5e feature: Add OCR support for images. Fixes #296 2024-10-20 21:06:58 +00:00
Your Name
a822ff26ce fix(workers): Pin execa to avoid ERR_PACKAGE_PATH_NOT_EXPORTED error 2024-10-19 22:26:47 +00:00
MohamedBassem
31bcad82b8 deps: Upgrade metascraper for faster docker builds 2024-10-12 19:02:05 +00:00
MohamedBassem
1b09682685 feature: Allow customizing the inference's context length 2024-10-12 17:37:42 +00:00