
Fully local, natural language search over terabytes of media
Like Google Photos, but fully local. Turn the terabytes of video, audio, meetings, and files you work with into searchable memories, without uploading anything to the cloud. Clipto automatically tags people, dialogue, and scenes, so you can instantly find any moment buried in your media just by describing what you're looking for. It's fast too: on a MacBook Pro M5, Clipto indexed 2TB of videos in just 24 hours.
Clipto is a local, AI-driven search tool that indexes and tags terabytes of media, enabling users to find specific moments by describing them. It operates entirely on-device, ensuring data privacy without the need for cloud storage.
Overall, commenters express strong interest and appreciation for Clipto's local-first approach and natural language search capabilities.
<p>We've been honing Clipto's story for a few months. At the end of our last call, <a href="https://www.producthunt.com/@henry_kang" data-node-type="mention" data-mention-type="user" data-mention-id="henry_kang" target="_blank" rel="nofollow noopener noreferrer">@henry_kang</a> proved the value of the product.</p><p></p><p>He and his team were out in the desert, <em>testing Clipto remotely</em>: minimal reception, terabytes of footage sitting on his laptop, and he needed to find a specific shot for the launch video.</p><p></p><p>He searched for: <em>"the wide drone shot where the car enters the desert"</em>. </p><p>He didn't want "a cinematic moment." Not a "vibes" search.</p><p></p><p>He <em>knew</em> he had the clip, but in the pre-Clipto world it would have taken hours of video scrubbing to find it. </p><p></p><p><strong>He found that clip in seconds using natural language to search over his own media, fully local. </strong></p><p><strong>Just like Google Photos, but nothing lives in the cloud.</strong></p><p></p><p>This isn't an easy problem to solve. Henry has been pursuing this direction for over twenty years, ever since he began pushing the limits of computer vision at <a href="https://www.ri.cmu.edu/" target="_blank" rel="nofollow noopener noreferrer">CMU's Robotics Institute</a> (my alma mater, FYI). He started by indexing hundreds of images, then advanced to <em>millions of objects</em>, and watched recognition basically explode once memory scaled. </p><p></p><p>Clipto is in many respects the culmination of that work, pointed at your personal hard drive.</p><p></p><p>And it's quick: a modern M5 MacBook chews through ~2TB of video in about a day. Why not put yours through its paces?</p>
<p>Hi Product Hunt! I’m Henry, founder of Clipto.</p><p></p><p><strong>Clipto gives you the ability to search in natural language over terabytes of media in seconds.</strong></p><p><strong>Think: Google Photos, but fully local.</strong></p><p></p><p>Twenty years ago, at <a href="https://www.ri.cmu.edu/" target="_blank" rel="nofollow noopener noreferrer">CMU’s Robotics Institute</a>, I became obsessed with memory systems: <em>what if computers could actually remember what they’ve seen?</em></p><p></p><p>I trained robots to memorize <em>millions</em> of product images crawled from the Amazon catalog (the standard back then was to index hundreds of images at a time), and discovered that <em>they could use that memory to recognize almost anything they encountered!</em></p><p></p><p><strong>By pushing computers beyond their conventional limits, I had unlocked an explosion in machine intelligence.</strong></p><p></p><p>Years later, the problem has become personal.</p><p></p><p>Our computers are full of valuable raw footage, interviews, recordings, and more, but most of that data is still painfully hard to search, revisit, or reuse. <strong><em>We are data-rich, but knowledge-poor.</em></strong></p><p></p><p>That’s why I built Clipto. 
Clipto helps you find what matters inside terabytes of video, audio, meetings, and files, <em>instantly</em>, turning hours of repetitive work into seconds.</p><ul><li><p>Find the wide drone shot where the cars enter frame.</p></li><li><p>Find the exact moment the sandstorm arrives, buried in hours of footage.</p></li><li><p>And find what you <em>know</em> is in there, without suffering through hours of scrubbing.</p></li></ul><p>Clipto's memory system lives where your data already is: on your device, under your control, available anytime, even offline, so you can keep working wherever and whenever inspiration strikes.</p><p></p><p><strong>After two years of compressing, optimizing, distilling, and orchestrating AI models to run entirely on-device, we are ready to share it with the Product Hunt community.</strong></p><p></p><p>It’s still early, and it’s still compute-heavy. Right now, Clipto works best on higher-performance Apple Silicon Macs (M1 Pro/Max/Ultra and newer) with 24GB+ RAM. If you have a compatible Mac, we’d love for you to try it.</p><p></p><p><strong>To celebrate our launch, we're offering 1 month free to anyone who signs up this week with code PHLNCH.</strong></p><p></p><p>I’ll be here in the comments all day and would genuinely love to hear about the strategies you've developed to find the content diamonds in your digital rough.</p>
<p>This looks really interesting.</p><p></p><p>I'm curious about how deeply it understands media content.<br></p><p>Does it recognise things like camera angles, shot types (wide, medium, close-up), camera movements, transitions, B-roll, and multi-camera sequences?<br><br>It would be incredibly useful if I could search for something like "close-up shot of a person smiling" or "drone footage with a slow pan" and instantly find matching clips across my archive.<br></p><p>Would love to know how detailed the visual understanding gets beyond basic object and dialogue detection.</p>
<p>Interesting. Local-first stops being a privacy story the second you can find a clip on your own drive faster than you'd find it in cloud storage. Question - what happens to the index when I rename or move a file in Finder after indexing? Does Clipto watch the filesystem?</p>