Compare commits

2 Commits

Author SHA1 Message Date
Leopere 7968da9b60
Merge branch 'main' of git.nixc.us:colin/resume
ci/woodpecker/push/woodpecker Pipeline was successful
2025-10-19 16:34:06 -04:00
Leopere 467b7dcd1a
Completely simplify ScanSnap story page
- Removed all technical WebDAV details and jargon
- Made the URL http://192.168.0.119:9876 more prominent
- Simplified the page to focus on how to use the scanner service
- Added clear step-by-step instructions for connecting and scanning
- Removed unnecessary code examples and technical implementation details
2025-10-19 16:33:55 -04:00
1 changed file with 46 additions and 178 deletions


@@ -3,8 +3,8 @@
<head> <head>
<meta charset="UTF-8"> <meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0"> <meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="description" content="ScanSnap WebDAV Service - High-performance receipt digitization for buildersclub.ca"> <meta name="description" content="ScanSnap Scanner Service - High-performance receipt digitization for buildersclub.ca">
<title>ScanSnap WebDAV Service - Colin Knapp Portfolio</title> <title>ScanSnap Scanner Service - Colin Knapp Portfolio</title>
<link rel="icon" type="image/x-icon" href="../favicon.ico"> <link rel="icon" type="image/x-icon" href="../favicon.ico">
<link rel="stylesheet" href="../styles.css" integrity="sha256-Y+6RTuKMnPfNa1TjCQCcFhxwo0G+xNy7t1MaAvn5SuU="> <link rel="stylesheet" href="../styles.css" integrity="sha256-Y+6RTuKMnPfNa1TjCQCcFhxwo0G+xNy7t1MaAvn5SuU=">
<script src="../theme.js" integrity="sha256-+dDNTo7WAOmn2YC875+vn9oH4UkMwlVOGlARp2uq3A4="></script> <script src="../theme.js" integrity="sha256-+dDNTo7WAOmn2YC875+vn9oH4UkMwlVOGlARp2uq3A4="></script>
@@ -19,222 +19,90 @@
<a href="../index.html">← Back to Portfolio</a> <a href="../index.html">← Back to Portfolio</a>
</nav> </nav>
<h1>ScanSnap WebDAV Service for buildersclub.ca</h1> <h1>ScanSnap Scanner Service for buildersclub.ca</h1>
<div class="project-meta"> <div class="project-meta">
<p><strong>Timeframe:</strong> 2025-Present</p> <p><strong>Timeframe:</strong> 2025-Present</p>
<p><strong>Role:</strong> Full-Stack Developer & DevOps Engineer</p> <p><strong>Role:</strong> Developer</p>
<p><strong>Technologies:</strong> Python, WebDAV, WsgiDAV, macOS Integration</p> <p><strong>Technologies:</strong> Python, macOS Integration</p>
<p><strong>Client:</strong> <a href="https://buildersclub.ca" target="_blank" rel="noopener noreferrer">buildersclub.ca</a></p> <p><strong>Client:</strong> <a href="https://buildersclub.ca" target="_blank" rel="noopener noreferrer">buildersclub.ca</a></p>
</div> </div>
<hr> <hr>
<section class="project-overview"> <section class="project-overview">
<h2>The Challenge</h2> <h2>Scanner Service Overview</h2>
<p> <p>
Running a business means dealing with receipts. Lots of them. And for buildersclub.ca members juggling multiple projects, For buildersclub.ca members, I created a simple network scanner endpoint that makes digitizing receipts and documents fast and easy.
managing receipt documentation was becoming a serious time sink. Traditional scanning workflows involved multiple steps: The service is available at <a href="http://192.168.0.119:9876" target="_blank">http://192.168.0.119:9876</a> on the clubhouse network.
scan, save, organize, upload. Multiply that by dozens of receipts, and you're looking at hours of manual work every week.
</p>
<p>
I needed a solution that could handle the club's Fujitsu ScanSnap iX1500 scanner—a beast of a machine capable of digitizing
50 receipts at nearly one scan per second—but without the usual friction of file management systems.
</p> </p>
<div class="highlight-box"> <div class="highlight-box">
<h3>What We Built</h3> <h3>Key Features</h3>
<p>
A custom WebDAV server optimized specifically for high-speed document scanning. Load 50 receipts, hit scan,
and watch them all digitize in under a minute. Files are immediately accessible via macOS Finder (just like
a network drive), with automatic daily cleanup to prevent storage bloat. Zero maintenance required.
</p>
<ul> <ul>
<li><strong>Processing Speed:</strong> ~1 receipt per second</li> <li><strong>Fast Processing:</strong> ~1 receipt per second</li>
<li><strong>Batch Capacity:</strong> Up to 50 documents at once</li> <li><strong>Batch Capacity:</strong> Up to 50 documents at once</li>
<li><strong>File Access:</strong> Native Finder integration</li> <li><strong>Simple Access:</strong> Just press Command+K in Finder and enter the URL</li>
<li><strong>Cleanup:</strong> Automated daily at 3:00 AM</li> <li><strong>Automatic Cleanup:</strong> Files are automatically removed at 3:00 AM daily</li>
<li><strong>Network Protocol:</strong> WebDAV compliance classes 1 and 2</li> <li><strong>Zero Maintenance:</strong> No user management required</li>
</ul> </ul>
<p class="highlight-note"> <p class="highlight-note">
<strong>For buildersclub.ca members:</strong> <a href="http://192.168.0.119:9876" target="_blank">Access the scanner service here</a> (clubhouse network only) <strong>Access URL:</strong> <a href="http://192.168.0.119:9876" target="_blank">http://192.168.0.119:9876</a> (clubhouse network only)
</p> </p>
</div> </div>
</section> </section>
<section class="technical-story"> <section class="real-world-impact">
<h2>The Technical Journey</h2> <h2>How It Works</h2>
<h3>Simple Network Scanner Access</h3> <h3>Simple Setup</h3>
<p> <ol>
The system provides a straightforward network location where the ScanSnap scanner can send documents directly. <li>Connect to the clubhouse network</li>
Just connect with Command+K in Finder, enter the URL, and you have instant access to a network drive ready for scanning. <li>Press Command+K in Finder</li>
</p> <li>Enter <code>http://192.168.0.119:9876</code></li>
<p> <li>Click "Connect"</li>
This creates a seamless experience - load your documents, hit scan, and they're immediately available on your computer <li>The scanner folder appears in Finder</li>
without any additional steps or software. </ol>
</p>
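<p>
The Finder steps above mount the share interactively. As a minimal sketch of the same upload path from a script, assuming the endpoint accepts an unauthenticated HTTP PUT (typical for a WebDAV upload, but not confirmed by this page): <code>build_upload_request</code> and the example filename are hypothetical, and the request is only built here, not sent.
</p>

```python
from urllib.parse import quote, urljoin
from urllib.request import Request

# URL taken from the page; reachable only on the clubhouse network.
BASE_URL = "http://192.168.0.119:9876/"


def build_upload_request(filename: str, data: bytes, base_url: str = BASE_URL) -> Request:
    """Build (but do not send) an HTTP PUT that would upload one file.

    Hypothetical helper for illustration; it is not part of the service.
    """
    # Percent-encode the name so spaces and similar characters are URL-safe.
    target = urljoin(base_url, quote(filename))
    return Request(
        target,
        data=data,
        method="PUT",
        headers={"Content-Type": "application/pdf"},
    )


req = build_upload_request("receipt 001.pdf", b"%PDF-1.4 example bytes")
print(req.get_method(), req.full_url)
```

<p>
Passing <code>req</code> to <code>urllib.request.urlopen</code> while connected to the clubhouse network would perform the actual upload.
</p>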
<h3>Security Without the Headache</h3> <h3>Scanning Process</h3>
<p> <ol>
Here's the thing about receipt scanners: you want them to be fast and frictionless. Authentication dialogs kill that flow. <li>Load documents into the ScanSnap scanner</li>
But you also can't just leave a wide-open file server exposed to the internet. <li>Select the network folder as the destination</li>
</p> <li>Press scan</li>
<p> <li>Documents appear in the folder instantly</li>
The solution? Custom permissions at the protocol level. The scanner can upload files and delete them when needed, <li>Copy or move files as needed</li>
but it can't move, copy, or rename anything. More importantly, the service is completely isolated to its own directory—there's </ol>
literally no way for it to access files outside <code>~/scansnap-dav/scans</code>, even if someone tried to hack around it.
</p>
<pre><code>from wsgidav.dav_error import DAVError
from wsgidav.fs_dav_provider import FilesystemProvider


class ScanSnapProvider(FilesystemProvider):
    def create_collection(self, path):
        # No creating subdirectories
        raise DAVError(403, "Creating directories not allowed")

    def copy_resource(self, src_path, dest_path, depth):
        # No copying files around
        raise DAVError(403, "Copying not allowed")

    def move_resource(self, src_path, dest_path):
        # No moving or renaming
        raise DAVError(403, "Moving/renaming not allowed")</code></pre>
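<p>
The directory isolation described above can be illustrated with a small path check. This is a sketch of the general technique only, not code from the service or from WsgiDAV; <code>resolve_inside_root</code> is a hypothetical name, and the root simply reuses the <code>~/scansnap-dav/scans</code> path mentioned in the text.
</p>

```python
import os

# Assumed root, matching the directory named in the text.
SCANS_ROOT = os.path.expanduser("~/scansnap-dav/scans")


def resolve_inside_root(request_path: str, root: str = SCANS_ROOT) -> str:
    """Map a request path to a filesystem path, refusing anything outside root."""
    root = os.path.realpath(root)
    # Strip leading slashes so the join cannot discard the root entirely.
    candidate = os.path.realpath(os.path.join(root, request_path.lstrip("/")))
    # realpath has already collapsed ".." segments and followed symlinks,
    # so a shared prefix equal to root means the target stays inside it.
    if os.path.commonpath([root, candidate]) != root:
        raise PermissionError(f"path escapes scans directory: {request_path!r}")
    return candidate
```

<p>
With this check, a request for <code>/receipt.pdf</code> resolves to a file under the root, while <code>/../../etc/passwd</code> raises <code>PermissionError</code>.
</p>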
<p> <p>
For the clubhouse environment, this works perfectly. It's on a trusted network, accessible only to members, <strong>Note:</strong> All files are automatically deleted at 3:00 AM daily to keep the system clean.
and the restricted permissions mean there's no risk of accidentally messing up the file system. Make sure to copy important files to your own storage before then.
</p> </p>
<h3>The Storage Problem Nobody Thinks About</h3>
<p>
When you're scanning 50 receipts at a time, storage fills up fast. Even with PDF compression, you're looking at
several megabytes per scan session. Do that a few times a day, and suddenly you're managing gigabytes of receipt data.
</p>
<p>
The fix? Automatic cleanup. Every night at 3 AM, a Python scheduler wipes the scans directory clean. Receipts
are meant to be temporary anyway—scan them, grab what you need, move on. The cleanup runs silently in the background,
and members never have to think about storage management.
</p>
<pre><code>import os

import schedule


def cleanup_scans():
    scans_dir = os.path.expanduser("~/scansnap-dav/scans")
    for filename in os.listdir(scans_dir):
        file_path = os.path.join(scans_dir, filename)
        # Only remove regular files; leave any subdirectories alone
        if os.path.isfile(file_path):
            os.remove(file_path)


# Daily cleanup at 3:00 AM
schedule.every().day.at("03:00").do(cleanup_scans)</code></pre>
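<p>
The <code>schedule.every().day.at("03:00")</code> call above only registers the job; with the <code>schedule</code> library the process also has to call <code>schedule.run_pending()</code> in a loop for it to fire. As a standard-library-only sketch of the timing arithmetic behind a daily 3:00 AM run (<code>seconds_until</code> is a hypothetical helper; it uses naive local datetimes and ignores daylight-saving shifts):
</p>

```python
from datetime import datetime, timedelta


def seconds_until(hour: int, minute: int, now: datetime) -> float:
    """Seconds from `now` until the next occurrence of hour:minute."""
    target = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    # If today's slot has already passed (or is right now), use tomorrow's.
    if target <= now:
        target += timedelta(days=1)
    return (target - now).total_seconds()


# One hour before the cleanup slot.
print(seconds_until(3, 0, datetime(2025, 10, 19, 2, 0, 0)))  # 3600.0
```

<p>
A long-running process could sleep for <code>seconds_until(3, 0, datetime.now())</code>, run the cleanup, and repeat.
</p>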
</section> </section>
<section class="real-world-impact"> <section class="real-world-impact">
<h2>Real-World Impact</h2> <h2>Benefits</h2>
<h3>From Hours to Minutes</h3>
<p>
Before this system, processing a week's worth of receipts meant:
</p>
<ol>
<li>Scan receipts one by one (or in small batches)</li>
<li>Wait for files to save to the local machine</li>
<li>Open file manager and organize scans</li>
<li>Upload to cloud storage or accounting software</li>
<li>Clean up local copies to free up space</li>
</ol>
<p>
That's easily 20-30 minutes of manual work for a typical batch of receipts.
</p>
<p>
Now? Load the scanner hopper, hit scan, wait 60 seconds, grab the PDFs from Finder. Done. The time savings
are dramatic—what used to take half an hour now takes maybe two minutes.
</p>
<h3>The Numbers</h3>
<ul> <ul>
<li><strong>Time Reduction:</strong> 95% decrease in manual document processing</li> <li><strong>Time Savings:</strong> 95% reduction in document processing time</li>
<li><strong>Batch Efficiency:</strong> 50 receipts in under 60 seconds</li> <li><strong>Efficiency:</strong> Process 50 receipts in under 60 seconds</li>
<li><strong>Storage Overhead:</strong> Zero (automated cleanup handles everything)</li> <li><strong>Simplicity:</strong> No special software or training needed</li>
<li><strong>User Training Required:</strong> Literally just "Command+K, enter the URL"</li> <li><strong>Reliability:</strong> Automatic maintenance keeps the system running smoothly</li>
</ul> </ul>
<h3>Why It Works</h3>
<p> <p>
The beauty of this solution is its simplicity. There's no complex web interface, no database, no authentication system This simple solution dramatically reduces the time buildersclub.ca members spend on receipt management,
to maintain. It's just a WebDAV endpoint that does exactly what the scanner needs and nothing more. allowing them to focus on their projects instead of paperwork.
</p>
<p>
For buildersclub.ca members, it means one less thing to think about. Receipts get scanned, files are immediately
available, and storage never becomes an issue. The system just works, quietly and reliably, in the background.
</p>
</section>
<section class="technical-details">
<h2>Under the Hood</h2>
<h3>The Tech Stack</h3>
<ul>
<li><strong>Server Framework:</strong> WsgiDAV with Cheroot WSGI server</li>
<li><strong>Language:</strong> Python 3.13+</li>
<li><strong>Automation:</strong> Python schedule library for cleanup</li>
<li><strong>macOS Integration:</strong> launchd for auto-start on boot</li>
<li><strong>Protocol:</strong> WebDAV with macOS-specific optimizations</li>
</ul>
<h3>Key Configuration</h3>
<pre><code>config = {
    "host": "0.0.0.0",
    "port": 9876,
    "provider_mapping": {
        "/": ScanSnapProvider(scans_dir)
    },
    "hotfixes": {
        "emulate_win32_lastmod": True,
        "unquote_path_info": True,
        "win_accept_anonymous": True,
    },
    "property_manager": True,
    "lock_storage": True,
}</code></pre>
<h3>Security Considerations</h3>
<ul>
<li><strong>Network Scope:</strong> Clubhouse network only, no internet exposure</li>
<li><strong>File Isolation:</strong> Cannot access anything outside the scans directory</li>
<li><strong>Operation Restrictions:</strong> Upload, read, and delete only—no move/copy/rename</li>
<li><strong>Authentication:</strong> None required (trusted network environment)</li>
</ul>
</section>
<section class="lessons-learned">
<h2>Lessons Learned</h2>
<h3>Sometimes Simple is Better</h3>
<p>
I could have built a full web application with user accounts, file organization features, OCR processing,
automatic categorization, cloud sync... but none of that was actually needed. The scanner needed a place to
dump files quickly, and users needed to grab those files easily. Mission accomplished with a fraction of the complexity.
</p>
<h3>Simple Network Integration</h3>
<p>
The solution integrates directly with macOS Finder, making it immediately familiar to users without requiring
any special software or training. Connect once, and the scanner endpoint is always ready to receive your documents.
</p>
<h3>Automatic Cleanup Changes Everything</h3>
<p>
The daily cleanup feature turned this from a "nice to have" into a "set it and forget it" solution. Nobody
thinks about storage, nobody worries about running out of space, and the system stays lean indefinitely.
</p> </p>
</section> </section>
<hr> <hr>
<div class="project-links"> <div class="project-links">
<h3>Related Links</h3> <h3>Quick Links</h3>
<ul> <ul>
<li><a href="../index.html">← Back to Portfolio</a></li> <li><a href="../index.html">← Back to Portfolio</a></li>
<li><a href="https://buildersclub.ca" target="_blank" rel="noopener noreferrer">buildersclub.ca</a></li> <li><a href="https://buildersclub.ca" target="_blank" rel="noopener noreferrer">buildersclub.ca</a></li>
<li><strong>For Members:</strong> <a href="http://192.168.0.119:9876" target="_blank">Scanner Service Access</a> (clubhouse network)</li> <li><strong>Scanner Access:</strong> <a href="http://192.168.0.119:9876" target="_blank">http://192.168.0.119:9876</a> (clubhouse network)</li>
<li><a href="https://github.com/mar10/wsgidav" target="_blank" rel="noopener noreferrer">WsgiDAV Framework</a></li>
<li><a href="https://www.fujitsu.com/us/products/computing/peripheral/scanners/scansnap/" target="_blank" rel="noopener noreferrer">Fujitsu ScanSnap Scanners</a></li>
</ul> </ul>
</div> </div>
</div> </div>