Content is user-generated and unverified.

How to Build a Local AI-Powered Video/Photo Library Manager 🎬📸

TL;DR: I built a local-first media library manager with AI-powered analysis using Claude Desktop and "vibe coding." It handles thousands of photos and videos, with duplicate detection, metadata extraction, and AI tagging. Everything runs locally; the only cloud dependency is the optional Google AI backend for analysis.

What We're Building

This is a comprehensive local media library manager that gives you complete control over your photos and videos. Think of it as a self-hosted alternative to Google Photos, but with AI-powered analysis and advanced organizational features.

Key Features:

Local-first: Everything runs on your machine
AI-powered analysis: Automatic descriptions and tagging using local LLMs
Progressive rendering: Handles large libraries (20TB+) smoothly
Rich metadata extraction: EXIF data, video duration, GPS coordinates
Advanced search & filtering: Find any media instantly
Bulk operations: Edit/delete multiple files at once
Maintenance tools: Duplicate detection, thumbnail generation
Multiple view modes: List, table, and grid views
Rating system: 5-star ratings and favorites

Prerequisites

Hardware Requirements:

Mac (my setup: Mac Mini M4 with 64GB RAM)
Storage: External drives recommended for large libraries
Optional: NAS for 20TB+ collections

Software Requirements:

Python 3.13 (or 3.10+)
Claude Desktop (for the vibe coding experience)
FFmpeg (for video processing)
Ollama (optional, for local AI analysis)

Part 1: Environment Setup

1. Install System Dependencies

bash

# Install FFmpeg (macOS)
brew install ffmpeg

# Install Python (if not already installed)
brew install python@3.13

# Install Ollama (optional, for AI features)
brew install ollama

2. Project Structure Setup

Create your project directory:

bash

mkdir LocalVideo_Photo_LibraryManager
cd LocalVideo_Photo_LibraryManager

# Create Python virtual environment
python3 -m venv venv
source venv/bin/activate

3. Initial Directory Structure

LocalVideo_Photo_LibraryManager/
├── app/
│   ├── __init__.py
│   ├── main.py
│   ├── database/
│   │   ├── __init__.py
│   │   └── database.py
│   ├── models/
│   │   ├── __init__.py
│   │   ├── media_item.py
│   │   ├── photo_item.py
│   │   └── video_item.py
│   ├── services/
│   │   ├── __init__.py
│   │   ├── ai_service.py
│   │   ├── file_handler.py
│   │   ├── metadata_extractor.py
│   │   ├── prompts.py
│   │   └── thumbnail_service.py
│   └── ui/
│       ├── __init__.py
│       ├── bulk_selection.py
│       ├── combo_boxes.py
│       ├── maintenance.py
│       ├── media_views.py
│       ├── quick_filters.py
│       └── settings_dialog.py
├── thumbnails/
├── requirements.txt
├── .env
└── videos.db (created automatically)

Part 2: Core Dependencies

requirements.txt

# Core application dependencies
PySide6>=6.0.0
google-generativeai
python-dotenv
requests
python-dateutil

# Photo processing dependencies
Pillow>=9.0.0
exifread>=3.0.0
piexif>=1.1.3

# File management and safety
send2trash>=1.8.0

# Media processing
opencv-python>=4.5.0
numpy>=1.21.0

# Additional utilities
psutil>=5.8.0
tqdm>=4.62.0

Install dependencies:

bash

pip install -r requirements.txt

Environment Configuration (.env)

bash

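# Directory that contains both ffmpeg and ffprobe (adjust for your system).
# /opt/homebrew/bin is the Homebrew default on Apple Silicon;
# Homebrew on Intel Macs installs to /usr/local/bin instead.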
FFMPEG_PATH=/opt/homebrew/bin

# Optional: Google AI for advanced analysis
GOOGLE_AI_API_KEY=your_key_here

# Optional: Ollama for local AI
OLLAMA_HOST=http://localhost:11434

Part 3: Database Layer

app/database/database.py

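The SQLite database handles all metadata storage. Later sections import from app.database directly, so app/database/__init__.py needs to re-export these functions (for example: from .database import initialize_database):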

python

import sqlite3
import time
from pathlib import Path

def get_db_connection():
    """Establishes a connection to the SQLite database."""
    db_path = Path(__file__).parent.parent.parent / "videos.db"
    conn = sqlite3.connect(db_path)
    conn.row_factory = sqlite3.Row
    return conn

def initialize_database():
    """Creates the media_items table with all necessary columns."""
    conn = get_db_connection()
    cursor = conn.cursor()

    cursor.execute("""
        CREATE TABLE IF NOT EXISTS media_items (
            id INTEGER PRIMARY KEY,
            filename TEXT,
            file_path TEXT UNIQUE,
            creation_date TEXT,
            duration REAL,
            resolution TEXT,
            file_size INTEGER,
            camera_model TEXT,
            thumbnail_path TEXT,
            description TEXT,
            tags TEXT,
            category TEXT,
            user_comments TEXT,
            rating INTEGER,
            is_favorite BOOLEAN,
            prompt_type TEXT,
            media_type TEXT DEFAULT 'video',
            dimensions TEXT,
            iso INTEGER,
            aperture REAL,
            shutter_speed TEXT,
            gps_coordinates TEXT,
            file_hash TEXT
        )
    """)

    # Create performance indexes
    cursor.execute("CREATE INDEX IF NOT EXISTS idx_media_type ON media_items (media_type)")
    cursor.execute("CREATE INDEX IF NOT EXISTS idx_file_path ON media_items (file_path)")
    cursor.execute("CREATE INDEX IF NOT EXISTS idx_creation_date ON media_items (creation_date)")
    cursor.execute("CREATE INDEX IF NOT EXISTS idx_category ON media_items (category)")
    cursor.execute("CREATE INDEX IF NOT EXISTS idx_rating ON media_items (rating)")

    conn.commit()
    conn.close()
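
Later sections pass an add_media function into import_file and call get_all_media(), but neither is shown in this walkthrough. Here is a minimal sketch of what they might look like. The function bodies and the explicit conn parameter are assumptions made to keep the example self-contained. Returning plain dicts is deliberate: sqlite3.Row has no .get() method, and the filtering and duplicate code later on rely on item.get(...).

```python
import sqlite3

def add_media(conn, metadata):
    """Insert one metadata dict; skip files whose file_path is already stored."""
    columns = list(metadata)
    placeholders = ", ".join("?" for _ in columns)
    # Column names come from our own extractor, never from user input,
    # so interpolating them is acceptable here; values are always bound.
    sql = (f"INSERT OR IGNORE INTO media_items ({', '.join(columns)}) "
           f"VALUES ({placeholders})")
    with conn:  # commits on success, rolls back on error
        cursor = conn.execute(sql, [metadata[c] for c in columns])
    return cursor.rowcount == 1  # False when the path was already present

def get_all_media(conn):
    """Return every row as a plain dict, newest first.

    Requires conn.row_factory = sqlite3.Row, as set in get_db_connection().
    """
    rows = conn.execute(
        "SELECT * FROM media_items ORDER BY creation_date DESC"
    ).fetchall()
    return [dict(row) for row in rows]
```

INSERT OR IGNORE leans on the UNIQUE constraint on file_path, so re-importing the same folder does not create duplicate rows.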

Part 4: Media Processing Services

app/services/metadata_extractor.py

This service extracts EXIF data from photos and metadata from videos:

python

import os
import subprocess
from PIL import Image
from PIL.ExifTags import TAGS
import exifread
from datetime import datetime

class MetadataExtractor:
    def extract_photo_metadata(self, file_path):
        """Extract metadata from photo files."""
        metadata = {
            'filename': os.path.basename(file_path),
            'file_path': file_path,
            'file_size': os.path.getsize(file_path),
            'media_type': 'photo'
        }
        
        try:
            # Use PIL for basic metadata
            with Image.open(file_path) as img:
                metadata['dimensions'] = f"{img.width}x{img.height}"
                
                # Extract EXIF data
                exif_data = img.getexif()
                if exif_data:
                    for tag_id, value in exif_data.items():
                        tag = TAGS.get(tag_id, tag_id)
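                        if tag == "DateTime":
                            # Note: IFD0 "DateTime" is the last-modified time. The capture
                            # time (DateTimeOriginal) lives in the Exif sub-IFD, which
                            # Pillow exposes via exif_data.get_ifd(0x8769).
                            metadata['creation_date'] = str(value)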
                        elif tag == "Model":
                            metadata['camera_model'] = str(value)
                        # Add more EXIF tags as needed
                        
        except Exception as e:
            print(f"Error extracting photo metadata: {e}")
            
        return metadata

    def extract_video_metadata(self, file_path):
        """Extract metadata from video files using FFprobe."""
        metadata = {
            'filename': os.path.basename(file_path),
            'file_path': file_path,
            'file_size': os.path.getsize(file_path),
            'media_type': 'video'
        }
        
        try:
            ffprobe_path = os.path.join(os.getenv('FFMPEG_PATH', '/opt/homebrew/bin'), 'ffprobe')
            cmd = [
                ffprobe_path, '-v', 'quiet', '-print_format', 'json',
                '-show_format', '-show_streams', file_path
            ]
            
            result = subprocess.run(cmd, capture_output=True, text=True)
            if result.returncode == 0:
                import json
                data = json.loads(result.stdout)
                
                # Extract duration
                if 'format' in data and 'duration' in data['format']:
                    metadata['duration'] = float(data['format']['duration'])
                
                # Extract video resolution
                for stream in data.get('streams', []):
                    if stream.get('codec_type') == 'video':
                        width = stream.get('width')
                        height = stream.get('height')
                        if width and height:
                            metadata['resolution'] = f"{width}x{height}"
                        break
                        
        except Exception as e:
            print(f"Error extracting video metadata: {e}")
            
        return metadata
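
The schema has creation_date and gps_coordinates columns, but EXIF stores dates as YYYY:MM:DD HH:MM:SS and GPS positions as degrees, minutes, and seconds plus a hemisphere letter. The helpers below are a rough sketch of how those raw values could be normalized before storage. Both function names are hypothetical, and reading the raw values out of the file is left out.

```python
from datetime import datetime

def exif_datetime_to_iso(value):
    """Convert an EXIF date string to ISO 8601, or return None if malformed."""
    try:
        parsed = datetime.strptime(str(value).strip(), "%Y:%m:%d %H:%M:%S")
    except ValueError:
        return None
    return parsed.isoformat()

def gps_to_decimal(dms, ref):
    """Convert (degrees, minutes, seconds) plus a hemisphere ref to decimal degrees."""
    degrees, minutes, seconds = (float(part) for part in dms)
    decimal = degrees + minutes / 60 + seconds / 3600
    # South and West are negative by convention
    return -decimal if ref.upper() in ("S", "W") else decimal
```

ISO 8601 strings sort chronologically as plain text, which suits the idx_creation_date index.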

app/services/file_handler.py

Handles file operations and type detection:

python

import os
import hashlib
from pathlib import Path

# Supported file extensions
VIDEO_EXTENSIONS = ['.mp4', '.avi', '.mov', '.mkv', '.wmv', '.flv', '.webm', '.m4v']
# Note: Pillow cannot open .heic files on its own; a plugin such as pillow-heif is needed
IMAGE_EXTENSIONS = ['.jpg', '.jpeg', '.png', '.gif', '.bmp', '.tiff', '.webp', '.heic']

def get_media_type(file_path):
    """Determine if file is video or photo based on extension."""
    ext = Path(file_path).suffix.lower()
    if ext in VIDEO_EXTENSIONS:
        return 'video'
    elif ext in IMAGE_EXTENSIONS:
        return 'photo'
    else:
        return 'unknown'

def calculate_file_hash(file_path, chunk_size=8192):
    """Calculate MD5 hash for duplicate detection."""
    hash_md5 = hashlib.md5()
    try:
        with open(file_path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                hash_md5.update(chunk)
        return hash_md5.hexdigest()
    except Exception as e:
        print(f"Error calculating hash for {file_path}: {e}")
        return None

def import_file(file_path, ai_worker, add_media_func, prompt_type):
    """Import a single file into the library."""
    from .metadata_extractor import MetadataExtractor
    from .thumbnail_service import ThumbnailService
    
    extractor = MetadataExtractor()
    thumbnail_service = ThumbnailService()
    
    # Extract metadata based on file type
    media_type = get_media_type(file_path)
    if media_type == 'video':
        metadata = extractor.extract_video_metadata(file_path)
    elif media_type == 'photo':
        metadata = extractor.extract_photo_metadata(file_path)
    else:
        raise ValueError(f"Unsupported file type: {file_path}")
    
    # Generate thumbnail
    thumbnail_path = thumbnail_service.generate_thumbnail(file_path, media_type)
    metadata['thumbnail_path'] = thumbnail_path
    
    # Calculate file hash for duplicate detection
    metadata['file_hash'] = calculate_file_hash(file_path)
    
    # Add initial values
    metadata.update({
        'description': '',
        'tags': '',
        'category': '',
        'user_comments': '',
        'rating': 0,
        'is_favorite': False,
        'prompt_type': prompt_type
    })
    
    # Add to database
    add_media_func(metadata)
    
    # Start AI analysis if worker provided
    if ai_worker:
        ai_worker.start()

Part 5: AI Integration (Optional but Powerful)

app/services/ai_service.py

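This is where the magic happens. The AI service sends photos (or frames extracted from videos) to a vision model running in a local Ollama instance, and does the work on a background QThread so the UI stays responsive: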

python

import os
import tempfile
import subprocess
import base64
import json
import requests
from PySide6.QtCore import QThread, Signal
from .prompts import PROMPT_TEMPLATES

class OllamaConnector:
    """Connects to local Ollama instance for AI analysis."""
    
    def __init__(self, host="http://localhost:11434"):
        self.host = host

    def get_image_description(self, model_name, image_path, prompt):
        """Analyze image using vision model."""
        with open(image_path, "rb") as f:
            image_data = base64.b64encode(f.read()).decode('utf-8')

        payload = {
            "model": model_name,
            "prompt": prompt,
            "images": [image_data],
            "stream": False,
        }

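        # Vision models can be slow, but a timeout keeps a hung server
        # from blocking the worker thread forever
        response = requests.post(f"{self.host}/api/generate", 
                               json=payload, timeout=300)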
        response.raise_for_status()
        return response.json().get("response", "")

class AIWorker(QThread):
    """Background thread for AI analysis."""
    
    finished = Signal(bool, str, list)

    def __init__(self, media_path, creation_date, camera_model, 
                 num_frames=5, prompt_type="General",
                 vision_model="llava:7b", text_model="gemma3:4b"):
        super().__init__()
        self.media_path = media_path
        self.creation_date = creation_date
        self.camera_model = camera_model
        self.num_frames = num_frames
        self.prompt_type = prompt_type
        self.vision_model = vision_model
        self.text_model = text_model
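        # Honor OLLAMA_HOST from .env instead of always using the default
        self.ollama_connector = OllamaConnector(
            os.getenv("OLLAMA_HOST", "http://localhost:11434")
        )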

    def run(self):
        """Main AI analysis workflow."""
        try:
            description, tags = self.analyze_media()
            self.finished.emit(True, description, tags)
        except Exception as e:
            self.finished.emit(False, f"Error: {e}", [])

    def analyze_media(self):
        """Analyze media and return description and tags."""
        from .file_handler import get_media_type
        
        temp_dir = tempfile.mkdtemp()
        try:
            media_type = get_media_type(self.media_path)
            
            if media_type == "video":
                # Extract frames for analysis
                frame_paths = self.extract_video_frames(
                    self.media_path, temp_dir, self.num_frames
                )
            else:
                # For photos, analyze directly
                frame_paths = [self.media_path]

            # Get AI analysis
            prompt_template = PROMPT_TEMPLATES[self.prompt_type]["prompt"]
            prompt = prompt_template.format(
                creation_date=self.creation_date,
                camera_model=self.camera_model
            )

            descriptions = []
            for frame_path in frame_paths:
                desc = self.ollama_connector.get_image_description(
                    self.vision_model, frame_path, prompt
                )
                descriptions.append(desc)

            # Combine and summarize
            combined = " ".join(descriptions)
            summary, tags = self.extract_summary_and_tags(combined)
            
            return summary, tags
            
        finally:
            import shutil
            shutil.rmtree(temp_dir)

    def extract_video_frames(self, video_path, output_dir, num_frames=5):
        """Extract frames from video using FFmpeg."""
        ffmpeg_path = os.getenv('FFMPEG_PATH', '/opt/homebrew/bin')
        
        # Get video duration
        cmd = [f'{ffmpeg_path}/ffprobe', '-v', 'error', 
               '-show_entries', 'format=duration', 
               '-of', 'default=noprint_wrappers=1:nokey=1', video_path]
        duration = float(subprocess.check_output(cmd, text=True).strip())
        
        # Extract frames at regular intervals
        frame_interval = duration / (num_frames + 1)
        frame_paths = []
        
        for i in range(num_frames):
            timestamp = (i + 1) * frame_interval
            # Write JPEG so that -q:v (JPEG quality, 2 = high) actually applies
            output_path = os.path.join(output_dir, f"frame_{i:02d}.jpg")
            
            cmd = [f'{ffmpeg_path}/ffmpeg', '-ss', str(timestamp),
                   '-i', video_path, '-vframes', '1', '-q:v', '2', output_path]
            subprocess.run(cmd, check=True, capture_output=True)
            frame_paths.append(output_path)
            
        return frame_paths
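
The worker imports PROMPT_TEMPLATES from prompts.py and calls extract_summary_and_tags, neither of which appears in this walkthrough. Here is one possible shape for them, purely as an illustration. The template wording, the TAGS: output convention, and the parsing function are all assumptions; the real app may well summarize with the text model instead.

```python
# prompts.py (hypothetical contents)
PROMPT_TEMPLATES = {
    "General": {
        "prompt": (
            "Describe this image in two sentences. "
            "It was captured on {creation_date} with a {camera_model}. "
            "Finish with a line starting with 'TAGS:' followed by "
            "comma-separated keywords."
        )
    },
}

def parse_summary_and_tags(text):
    """Split model output into (summary, tags) using the assumed TAGS: convention."""
    summary_lines, tags = [], []
    for line in text.splitlines():
        if line.strip().upper().startswith("TAGS:"):
            raw = line.split(":", 1)[1]
            for tag in raw.split(","):
                tag = tag.strip().lower()
                if tag and tag not in tags:  # drop blanks and duplicates
                    tags.append(tag)
        else:
            summary_lines.append(line.strip())
    summary = " ".join(part for part in summary_lines if part)
    return summary, tags
```

This parser expects to see one response at a time. Joining several frame descriptions with spaces first would merge the TAGS: line with the following description, so parse each response before combining them.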

Part 6: User Interface with PySide6

app/main.py - Main Application

The heart of the application - this is where vibe coding with Claude really shines:

python

import sys
from PySide6.QtWidgets import (QApplication, QMainWindow, QWidget, 
                               QVBoxLayout, QPushButton, QFileDialog, 
                               QHBoxLayout, QLineEdit, QComboBox, 
                               QSplitter, QMessageBox)
from PySide6.QtCore import QSettings, Qt
from dotenv import load_dotenv

# Load environment variables
load_dotenv()

class MainWindow(QMainWindow):
    def __init__(self):
        super().__init__()
        self.setWindowTitle("AI Media Library Manager")
        self.setGeometry(100, 100, 1400, 900)
        
        # Initialize settings
        self.settings = QSettings("MediaLibraryManager", "Settings")
        
        # Initialize database
        from app.database import initialize_database
        initialize_database()
        
        # Set up UI
        self.setup_ui()
        self.load_media()
    
    def setup_ui(self):
        """Create the main user interface."""
        main_widget = QWidget()
        self.setCentralWidget(main_widget)
        
        # Create horizontal splitter for three-pane layout
        splitter = QSplitter(Qt.Horizontal)
        main_layout = QHBoxLayout(main_widget)
        main_layout.addWidget(splitter)
        
        # Left pane: Controls and filters
        left_pane = self.create_left_pane()
        splitter.addWidget(left_pane)
        
        # Center pane: Media grid/list
        center_pane = self.create_center_pane()
        splitter.addWidget(center_pane)
        
        # Right pane: Details editor
        right_pane = self.create_right_pane()
        splitter.addWidget(right_pane)
        
        # Set proportions
        splitter.setSizes([400, 650, 350])
    
    def create_left_pane(self):
        """Create the left control panel."""
        widget = QWidget()
        widget.setMaximumWidth(450)
        layout = QVBoxLayout(widget)
        
        # Import button
        import_btn = QPushButton("📁 Import Media")
        import_btn.clicked.connect(self.import_media_dialog)
        layout.addWidget(import_btn)
        
        # Search box
        self.search_input = QLineEdit()
        self.search_input.setPlaceholderText("Search media...")
        self.search_input.textChanged.connect(self.apply_filters)
        layout.addWidget(self.search_input)
        
        # Category filter
        self.category_combo = QComboBox()
        self.category_combo.addItem("All Categories")
        layout.addWidget(self.category_combo)
        
        # Add more filters, controls, etc.
        
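        return widget

    # create_center_pane(), create_right_pane(), load_media(),
    # import_media_dialog() and apply_filters() are not shown here.

if __name__ == "__main__":
    app = QApplication(sys.argv)
    window = MainWindow()
    window.show()
    sys.exit(app.exec())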

Progressive Rendering for Large Libraries

Progressive rendering keeps the UI responsive with large libraries. The data model behind it loads the full dataset once and caches filter results, so repeating a filter does not rescan every item:

python

class MediaViewModel:
    """Optimized data model for fast filtering and rendering."""
    
    def __init__(self):
        self.full_dataset = []
        self.filtered_data = []
        self.filter_cache = {}
    
    def load_data(self, data_loader_func):
        """Load data using provided function."""
        self.full_dataset = data_loader_func()
        self.filtered_data = self.full_dataset.copy()
        self.filter_cache.clear()  # cached results refer to the old dataset
        return len(self.full_dataset)
    
    def apply_filter(self, criteria):
        """Apply filters efficiently using caching."""
        cache_key = str(sorted(criteria.items()))
        
        if cache_key in self.filter_cache:
            self.filtered_data = self.filter_cache[cache_key]
            return len(self.filtered_data)
        
        # Apply actual filtering
        filtered = []
        for item in self.full_dataset:
            if self._matches_criteria(item, criteria):
                filtered.append(item)
        
        self.filtered_data = filtered
        self.filter_cache[cache_key] = filtered
        
        return len(filtered)
    
    def _matches_criteria(self, item, criteria):
        """Check if item matches filter criteria."""
        # Search text
        search_text = criteria.get('search_text', '').lower()
        if search_text:
            # "or ''" keeps NULL columns from being searched as the text "None"
            searchable = " ".join(
                str(item.get(key) or "") for key in ("filename", "description", "tags")
            ).lower()
            if search_text not in searchable:
                return False
        
        # Category filter
        category = criteria.get('category', 'All Categories')
        if category != 'All Categories' and item.get('category') != category:
            return False
        
        # Rating filter
        min_rating = criteria.get('min_rating', 0)
        if item.get('rating', 0) < min_rating:
            return False
        
        return True
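
The view model above covers filtering. The progressive part comes from handing the filtered list to the view in small batches instead of building every widget at once. Here is a minimal sketch of that batching, with a hypothetical function name and batch size. In a Qt app each batch would typically be scheduled from a timer so the event loop can handle input between batches; that wiring is omitted here.

```python
def iter_batches(items, batch_size=100):
    """Yield consecutive slices of items, each at most batch_size long."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

# Example: render 250 filtered items in three passes
filtered = [{"id": n} for n in range(250)]
batch_sizes = [len(batch) for batch in iter_batches(filtered)]
print(batch_sizes)  # → [100, 100, 50]
```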

Part 7: Advanced Features

Maintenance Tools

One of the standout features is the maintenance toolset:

python

from PySide6.QtWidgets import QWidget, QVBoxLayout, QPushButton, QMessageBox

class MaintenanceWidget(QWidget):
    """Comprehensive maintenance tools for library optimization."""
    
    def __init__(self):
        super().__init__()
        self.setup_ui()
    
    def setup_ui(self):
        layout = QVBoxLayout(self)
        
        # Duplicate Detection
        duplicate_btn = QPushButton("🔍 Find Duplicates")
        duplicate_btn.clicked.connect(self.find_duplicates)
        layout.addWidget(duplicate_btn)
        
        # Thumbnail Generation
        thumbnail_btn = QPushButton("🖼️ Generate Missing Thumbnails")
        thumbnail_btn.clicked.connect(self.generate_thumbnails)
        layout.addWidget(thumbnail_btn)
        
        # Date Repair
        date_repair_btn = QPushButton("📅 Repair Creation Dates")
        date_repair_btn.clicked.connect(self.repair_dates)
        layout.addWidget(date_repair_btn)
    
    def find_duplicates(self):
        """Find duplicate files using hash comparison."""
        from app.database import get_all_media
        from collections import defaultdict
        
        media_items = get_all_media()
        hash_groups = defaultdict(list)
        
        for item in media_items:
            if item.get('file_hash'):
                hash_groups[item['file_hash']].append(item)
        
        duplicates = {hash_val: items for hash_val, items in hash_groups.items() 
                     if len(items) > 1}
        
        if duplicates:
            self.show_duplicate_dialog(duplicates)
        else:
            QMessageBox.information(self, "No Duplicates", 
                                  "No duplicate files found!")

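Hashing every file is the expensive part of duplicate detection on a large library. One common shortcut, sketched here on the assumption that it suits this app, is to group files by size first and hash only the groups with more than one member, since files of different sizes cannot be identical. The function name is hypothetical; MD5 is used to match calculate_file_hash above.

```python
import hashlib
import os
from collections import defaultdict

def find_duplicate_files(paths, chunk_size=1024 * 1024):
    """Return groups of paths whose size and MD5 hash both match."""
    by_size = defaultdict(list)
    for path in paths:
        by_size[os.path.getsize(path)].append(path)

    duplicates = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a unique size cannot have a duplicate, so skip hashing
        by_hash = defaultdict(list)
        for path in same_size:
            digest = hashlib.md5()
            with open(path, "rb") as f:
                for chunk in iter(lambda: f.read(chunk_size), b""):
                    digest.update(chunk)
            by_hash[digest.hexdigest()].append(path)
        duplicates.extend(group for group in by_hash.values() if len(group) > 1)
    return duplicates
```

Since file_size is already stored in the database, the same idea could also be applied to the stored rows without touching the disk again.
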
Bulk Operations

Handle multiple files efficiently:

python

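from PySide6.QtCore import Signal
from PySide6.QtWidgets import QWidget, QMessageBox

class BulkSelectionWidget(QWidget):
    """Widget for bulk operations on selected media."""

    # Declared at class level so self.selection_changed.emit() works
    selection_changed = Signal()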
    
    def __init__(self):
        super().__init__()
        self.selected_items = set()
        self.setup_ui()
    
    def bulk_edit_selected(self):
        """Open bulk edit dialog for selected items."""
        if not self.selected_items:
            QMessageBox.warning(self, "No Selection", 
                              "Please select items first.")
            return
        
        dialog = BulkEditDialog(list(self.selected_items), self)
        if dialog.exec():
            changes = dialog.get_changes()
            self.apply_bulk_changes(changes)
    
    def apply_bulk_changes(self, changes):
        """Apply changes to all selected items."""
        from app.database import update_media_item
        
        for item_id in self.selected_items:
            update_media_item(item_id, changes)
        
        self.selection_changed.emit()  # Refresh UI
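
apply_bulk_changes calls update_media_item once per selected item. For large selections, applying everything in one transaction avoids a commit per row. Here is a possible sketch, assuming direct access to the SQLite connection. The function name and the whitelist of editable columns are assumptions, not the app's actual API.

```python
import sqlite3

# Columns a bulk edit may touch; anything else in `changes` is rejected
EDITABLE_COLUMNS = {"tags", "category", "user_comments", "rating", "is_favorite"}

def bulk_update(conn, item_ids, changes):
    """Apply the same column changes to many rows in a single transaction."""
    unknown = set(changes) - EDITABLE_COLUMNS
    if unknown:
        raise ValueError(f"Columns not editable in bulk: {sorted(unknown)}")
    if not changes or not item_ids:
        return 0
    columns = list(changes)
    assignments = ", ".join(f"{col} = ?" for col in columns)
    sql = f"UPDATE media_items SET {assignments} WHERE id = ?"
    values = [changes[col] for col in columns]
    rows = [(*values, item_id) for item_id in item_ids]
    with conn:  # one commit for the whole batch, rollback if any row fails
        cursor = conn.executemany(sql, rows)
    return cursor.rowcount  # total number of rows modified
```

The whitelist matters because column names are interpolated into the SQL text; only the values can be bound as parameters.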

Part 8: The Vibe Coding Experience

How I Built This with Claude Desktop

This entire application was built using "vibe coding" with Claude Desktop. Here's the process:

Start with a Vision: "I want a local media library manager"
Break into Components: Database, UI, AI integration, etc.
Iterative Development: Build one component, test, refine, repeat
Claude as Pair Programmer: Ask for specific implementations, optimizations, and bug fixes

Key Vibe Coding Moments:

Database Design:

"Create a SQLite schema for storing media metadata including EXIF data, ratings, and AI-generated descriptions"

Performance Optimization:

"The UI is slow with 10,000+ media items. Implement progressive rendering and caching"

AI Integration:

"Add local AI analysis using Ollama that can describe videos by extracting key frames"

UI Polish:

"Make the interface more modern with better spacing, icons, and responsive layout"

Claude's Superpowers for This Project:

Architecture Guidance: Helped design the modular structure
Code Generation: Wrote entire classes and functions
Debugging: Fixed performance bottlenecks and UI issues
Feature Implementation: Added complex features like duplicate detection
Optimization: Improved database queries and UI responsiveness

Part 9: Deployment and Scaling

Running the Application

bash

# Activate virtual environment
source venv/bin/activate

# Run the application from the project root, as a module,
# so that "from app.database import ..." resolves
python -m app.main

Scaling to Large Libraries (20TB+)

For massive media collections:

External Storage: Use external drives or NAS
Database Optimization: Enable WAL mode, add indexes
Thumbnail Management: Separate thumbnail storage
Batch Processing: Import folders in chunks
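
For the database item above, WAL mode is a single pragma. Here is a small sketch of how it could be switched on when opening the connection. The helper name and the synchronous setting are assumptions. WAL does not work reliably over network filesystems, so keep videos.db on a local disk even if the media itself lives on a NAS.

```python
import sqlite3

def open_tuned_connection(db_path):
    """Open a SQLite connection with write-ahead logging enabled."""
    conn = sqlite3.connect(db_path)
    conn.row_factory = sqlite3.Row
    # WAL lets readers keep working while a writer (e.g. an import) is active.
    # The pragma returns the journal mode actually in effect.
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    # NORMAL trades a little durability on power loss for fewer fsyncs.
    conn.execute("PRAGMA synchronous=NORMAL")
    return conn, mode
```

The journal mode is stored in the database file, so it stays in effect for later connections; checking the returned mode is a cheap way to confirm it took.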

Configuration for Different Hardware

The app adapts to different systems:

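Apple Silicon Macs (M1 and later): Homebrew installs FFmpeg to /opt/homebrew/bin, which is the default FFMPEG_PATH
Intel Macs: Homebrew installs to /usr/local/bin, so set FFMPEG_PATH accordingly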
Linux: Minor path adjustments needed
Windows: Requires Windows-specific FFmpeg paths
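
Most of the per-platform differences come down to where ffmpeg and ffprobe live. Instead of falling back to a hard-coded /opt/homebrew/bin, the lookup could try FFMPEG_PATH first and then the system PATH. This is a sketch with a hypothetical helper name, not the app's current behavior.

```python
import os
import shutil

def resolve_tool(name):
    """Return the full path to an executable such as 'ffprobe', or None."""
    configured_dir = os.getenv("FFMPEG_PATH")
    if configured_dir:
        # Search only the configured directory first
        found = shutil.which(name, path=configured_dir)
        if found:
            return found
    # Fall back to whatever is on the system PATH
    return shutil.which(name)
```

shutil.which also takes care of executable extensions on Windows, so the same call works there without appending .exe by hand.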

Part 10: What Makes This Special

1. Privacy-First Design

Everything runs locally
No cloud dependencies (except optional AI)
Complete control over your data

2. AI-Powered Intelligence

Automatic descriptions and tagging
Multiple analysis modes (landscape, portrait, night sky, etc.)
Uses local LLMs through Ollama by default, so analysis needs no external API calls

3. Performance Optimizations

Progressive rendering for large libraries
Smart caching and filtering
Database indexing for fast queries

4. Comprehensive Feature Set

EXIF data extraction
Video frame analysis
Duplicate detection
Bulk operations
Advanced search and filtering

5. Maintenance Tools

Library health checks
Thumbnail generation
Date repair utilities
Category auto-assignment

Getting Started: Your Journey

Phase 1: Basic Setup (Day 1)

Set up the environment and dependencies
Create the basic database schema
Build a simple file import function
Create basic UI with PySide6

Phase 2: Core Features (Week 1)

Add metadata extraction
Implement thumbnail generation
Create the main media view
Add search and filtering

Phase 3: Advanced Features (Week 2-3)

Integrate AI analysis
Add bulk operations
Implement maintenance tools
Optimize for large libraries

Phase 4: Polish (Week 4)

Improve UI/UX
Add keyboard shortcuts
Implement settings system
Add export/backup features

Tips for Success

1. Use Claude Desktop Effectively

Be specific about what you want
Ask for complete implementations
Request explanations for complex code
Use it for debugging and optimization

2. Start Simple

Begin with basic file import
Add features incrementally
Test each component thoroughly

3. Focus on Performance

Database indexes are crucial
Progressive loading for UI
Efficient file handling

4. Plan for Scale

Design for large libraries from the start
Use external storage appropriately
Consider batch processing

Conclusion

Building this media library manager was an incredible journey that showcased the power of Claude Desktop for complex application development. The combination of AI assistance and iterative "vibe coding" made it possible to create a sophisticated, feature-rich application that handles real-world use cases.

The result is a privacy-focused, AI-powered media manager that runs entirely on your local machine while providing enterprise-level features for organizing and managing large media collections.

Whether you're a hobbyist photographer with a few thousand photos or a professional with a 20TB+ archive, this application scales to meet your needs while keeping your data under your complete control.

Ready to Build Your Own?

Start with the basic setup, then let Claude Desktop guide you through each component. The key is to think in terms of features and let AI help with the implementation details. Before you know it, you'll have your own personalized media library manager that perfectly fits your workflow.

Happy coding! 🚀

Built with: Python 3.13, PySide6, SQLite, FFmpeg, Ollama, and lots of vibe coding with Claude Desktop
