# RSS Feed Monitor - Google Alerts

This repository contains validated Google Alert queries for monitoring repair-related discussions across Canadian platforms.

## ⚠️ START HERE

**✨ NEW: Production-Ready Reddit Alerts Available!**

Use `docs/google-alerts-reddit-tuned.md` for **validated, high-performance alerts** that produce regular, relevant results.

**Read `REDDIT_ALERTS_COMPLETE.md`** for test results showing 100% success rate and 10/10 relevant results.

## Files

### Documentation
- **`docs/google-alerts-reddit-tuned.md`** - ✨ **START HERE** - 25 production-ready alerts (100% validated)
- **`REDDIT_ALERTS_COMPLETE.md`** - ✨ **READ SECOND** - Complete test results and setup guide
- `docs/REDDIT_KEYWORDS.md` - Consumer language keyword conversion table
- `docs/google-alerts-broad.md` - Original 84 alerts (needs tuning)
- `docs/google-alerts.md` - Regional Reddit queries (61 alerts, low volume)
- `docs/PLAYWRIGHT_SCRAPING.md` - Guide to Playwright scraping with anti-detection
- `docs/PLAYWRIGHT_RECORDING.md` - Guide to recording alert setup with codegen

### Python Tools
- `scripts/validate_alerts.py` - Validator tool that checks queries and generates fixes
- `scripts/generate_broad_queries.py` - Generates location-based broad queries

### Playwright Tools (NEW)
- `scripts/human-behavior.js` - Human-like behavior library for bot detection avoidance
- `scripts/playwright-scraper.js` - Main scraper with Google search validation
- `scripts/validate-scraping.js` - Batch validator for testing multiple alerts
- `scripts/example-usage.js` - Usage examples and demonstrations
- `scripts/scraper-config.js` - Configuration for behavior fine-tuning
- `tests/alert-setup.spec.js` - Test documenting alert setup process

## Quick Start

### 1. Test Before You Create

**Copy this query and test in Google Search (NOT Alerts):**
```
"macbook repair" ("Toronto" OR "Mississauga" OR "Kitchener")
```

If you see 50+ results → the broad approach works ✅

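The test query in step 1 follows a simple pattern: one exact phrase followed by an OR block of quoted locations. The sketch below shows one way to assemble such a query and a matching Google Search URL for the manual check. The helper names `build_broad_query` and `search_url` are hypothetical and are not part of this repository's scripts.

```python
from urllib.parse import quote_plus


def build_broad_query(phrase: str, locations: list[str]) -> str:
    """Quote the phrase exactly and OR the quoted locations together."""
    location_block = " OR ".join(f'"{loc}"' for loc in locations)
    return f'"{phrase}" ({location_block})'


def search_url(query: str) -> str:
    """Return a Google Search URL for checking the query by hand."""
    return "https://www.google.com/search?q=" + quote_plus(query)


query = build_broad_query("macbook repair", ["Toronto", "Mississauga", "Kitchener"])
print(query)
print(search_url(query))
```

Opening the printed URL in a normal browser session is enough for the "50+ results" check; no automation is needed for this step.
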
### 2. Choose Your Strategy

- **Want results now?** Use `docs/google-alerts-broad.md` (recommended)
- **Want Reddit-only?** Use `docs/google-alerts.md` (may have low volume)
- **Not sure?** Read `docs/ALERT_STRATEGY.md`

### 3. Set Up Alerts

1. Open the file you chose
2. Find an alert (e.g., "Data Recovery - Ontario")
3. Copy the query block (everything inside ` ``` `)
4. Go to [Google Alerts](https://www.google.com/alerts)
5. Paste the query, then set `How often` to `As-it-happens` and `Deliver to` to `RSS feed`
6. Click `Create Alert`

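Once an alert delivers to an RSS feed, its entries can be read programmatically. The sketch below assumes the feed is Atom-formatted XML, which is how Google Alerts feeds have been served in practice, and parses a made-up sample document rather than a live feed because real feed URLs are account-specific. The function name `parse_entries` is hypothetical.

```python
import xml.etree.ElementTree as ET

ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Made-up sample standing in for a downloaded feed; real feeds carry more fields.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Google Alert - example</title>
  <entry>
    <title>Where to get a MacBook repaired in Toronto?</title>
    <link href="https://example.com/post/1"/>
    <published>2024-01-01T00:00:00Z</published>
  </entry>
</feed>"""


def parse_entries(feed_xml: str) -> list[dict]:
    """Extract title, link, and publish time from each Atom entry."""
    root = ET.fromstring(feed_xml)
    entries = []
    for entry in root.findall("atom:entry", ATOM_NS):
        link = entry.find("atom:link", ATOM_NS)
        entries.append({
            "title": entry.findtext("atom:title", default="", namespaces=ATOM_NS),
            "link": link.get("href", "") if link is not None else "",
            "published": entry.findtext("atom:published", default="", namespaces=ATOM_NS),
        })
    return entries


for item in parse_entries(SAMPLE_FEED):
    print(item["published"], item["title"], item["link"])
```
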
### Validating Queries

#### Python Validator (Static Analysis)

Run the validator to check query structure and limits:

```bash
python3 scripts/validate_alerts.py docs/google-alerts.md
```

To regenerate working queries from a broken file:

```bash
python3 scripts/validate_alerts.py docs/google-alerts.md --fix > docs/google-alerts-fixed.md
```

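Any static check first has to pull the queries out of the alert file. The sketch below illustrates one way to do that, assuming each query sits in its own fenced block as described in the setup steps. It is not the logic of `scripts/validate_alerts.py`; the sample text and the function name `extract_query_blocks` are hypothetical.

```python
import re

# Built indirectly so that this example can itself live inside a fenced block.
FENCE = "`" * 3

# Made-up excerpt in the assumed layout: a heading followed by a fenced query.
SAMPLE = f"""### Data Recovery - Ontario
{FENCE}
"data recovery" ("Toronto" OR "Ottawa")
{FENCE}
"""


def extract_query_blocks(markdown: str) -> list[str]:
    """Return the stripped contents of every fenced block in the text."""
    pattern = re.compile(
        rf"^{FENCE}[^\n]*\n(.*?)^{FENCE}\s*$",
        re.MULTILINE | re.DOTALL,
    )
    return [body.strip() for body in pattern.findall(markdown)]


print(extract_query_blocks(SAMPLE))
```
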
#### Playwright Validator (Live Testing) - NEW! 🚀

Test queries by actually searching Google with human-like behavior to avoid bot detection:

```bash
# Install dependencies first
npm install

# Test a single query
node scripts/playwright-scraper.js '"macbook repair" Toronto'

# Batch test multiple alerts from markdown file
node scripts/validate-scraping.js docs/google-alerts-broad.md --max 5

# Run example demonstrations
node scripts/example-usage.js 1
```

**Features:**
- 🤖 Realistic mouse movements with bezier curves and occasional overshooting
- 📜 Natural scrolling patterns with random intervals
- ⌨️ Human-like typing with variable speeds and occasional typos
- ⏱️ Random delays mimicking real user behavior
- 🎭 Randomized browser fingerprints to avoid detection

See `docs/PLAYWRIGHT_SCRAPING.md` for full documentation.

#### Recording Alert Setup Process 🎬

Use Playwright's codegen to record and document the alert setup workflow:

```bash
# Record a new alert setup process
npm run record:alert-setup
```

This opens an interactive browser in which you perform the alert setup steps while Playwright generates the corresponding test code. The recording documents the exact process for future reference.

See `docs/PLAYWRIGHT_RECORDING.md` for full documentation.

## Query Design

All queries follow these limits to ensure Google Alerts fires reliably:

- **≤8 site filters** per alert
- **≤18 OR terms** per keyword block
- **≤500 characters** total length
- **≤4 exclusion terms** (for example, `-job -entertainment -movie -music`)

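The limits above can be checked mechanically. The sketch below is an illustration only and does not reproduce `scripts/validate_alerts.py`. It assumes that a keyword block is a parenthesised group without `site:` filters and that an exclusion is any token starting with `-`; the function name `check_query` is hypothetical.

```python
import re

# Limits taken from the Query Design list.
MAX_SITE_FILTERS = 8
MAX_OR_TERMS = 18
MAX_LENGTH = 500
MAX_EXCLUSIONS = 4


def check_query(query: str) -> list[str]:
    """Return a list of limit violations; an empty list means the query passes."""
    problems = []

    site_filters = len(re.findall(r"\bsite:", query))
    if site_filters > MAX_SITE_FILTERS:
        problems.append(f"{site_filters} site filters (max {MAX_SITE_FILTERS})")

    # Assumption: a keyword block is a parenthesised group without site: filters.
    for block in re.findall(r"\(([^()]*)\)", query):
        if "site:" in block:
            continue
        terms = len(re.split(r"\s+OR\s+", block.strip()))
        if terms > MAX_OR_TERMS:
            problems.append(f"{terms} OR terms in one block (max {MAX_OR_TERMS})")

    if len(query) > MAX_LENGTH:
        problems.append(f"{len(query)} characters (max {MAX_LENGTH})")

    # Assumption: an exclusion is a "-" at the start of a token.
    exclusions = len(re.findall(r'(?:^|\s)-(?=[\w"])', query))
    if exclusions > MAX_EXCLUSIONS:
        problems.append(f"{exclusions} exclusion terms (max {MAX_EXCLUSIONS})")

    return problems


print(check_query('"data recovery" ("Toronto" OR "Ottawa") -job -movie'))
print(check_query('"repair" -job -entertainment -movie -music -game'))
```

The first query stays within every limit, while the second carries five exclusion terms and is reported as a violation.
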
## Regional Structure

Reddit-based alerts are split into 5 regions to stay within limits:

1. **Ontario-GTA**: kitchener, waterloo, CambridgeON, guelph, toronto, mississauga, brampton
2. **Ontario-Other**: ontario, londonontario, HamiltonOntario, niagara, ottawa
3. **Western**: vancouver, VictoriaBC, Calgary, Edmonton
4. **Prairies**: saskatoon, regina, winnipeg
5. **Eastern**: montreal, quebeccity, halifax, newfoundland

Each service type (Data Recovery, Laptop Repair, Console Repair, etc.) has 5 regional alerts.

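The regional split can be pictured as one site-filter block per region combined with the same keyword phrase. The sketch below uses the subreddit lists above and the `site:reddit.com/r/subreddit` filter format; the exact layout of the real queries in `docs/google-alerts.md` may differ, and the helper names `site_block` and `regional_queries` are hypothetical.

```python
# Subreddit names copied from the regional list.
REGIONS = {
    "Ontario-GTA": ["kitchener", "waterloo", "CambridgeON", "guelph",
                    "toronto", "mississauga", "brampton"],
    "Ontario-Other": ["ontario", "londonontario", "HamiltonOntario",
                      "niagara", "ottawa"],
    "Western": ["vancouver", "VictoriaBC", "Calgary", "Edmonton"],
    "Prairies": ["saskatoon", "regina", "winnipeg"],
    "Eastern": ["montreal", "quebeccity", "halifax", "newfoundland"],
}


def site_block(subreddits: list[str]) -> str:
    """OR together one site: filter per subreddit."""
    filters = " OR ".join(f"site:reddit.com/r/{name}" for name in subreddits)
    return f"({filters})"


def regional_queries(phrase: str) -> dict[str, str]:
    """Build one query per region for a single exact-match phrase."""
    return {
        region: f'"{phrase}" {site_block(subs)}'
        for region, subs in REGIONS.items()
    }


queries = regional_queries("data recovery")
for region, query in queries.items():
    print(f"{region}: {query}")
```

With this split, the largest region (Ontario-GTA) uses seven site filters, which stays under the eight-filter limit.
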
## Alert Categories

### Data Recovery (15 alerts)
- General data recovery
- HDD/SSD specialty recovery
- SD card/USB recovery

### Device Repair (25 alerts)
- Laptop/MacBook logic board repair
- GPU/Desktop board repair
- Console repair & refurbishment
- Smartphone repair
- iPad repair
- Connector (FPC) replacement

### Specialized Services (10 alerts)
- Key fob repair
- Microsolder/diagnostics
- Device refurbishment & trade-ins

### Non-Reddit Platforms (11 alerts)
- Kijiji/Used.ca classifieds
- Facebook Marketplace
- Craigslist
- Tech forums
- Discord communities
- Bulk/auction sourcing

## Troubleshooting

**No results coming through?**

1. Test the query in Google Search first (not in Alerts)
2. If Google Search shows results, the alert should work
3. If no results exist, the keywords may be too specific
4. Run `python3 scripts/validate_alerts.py docs/google-alerts.md` to check for limit violations

**Alert stopped working?**

Re-run validation and regenerate:

```bash
python3 scripts/validate_alerts.py docs/google-alerts.md --fix > docs/google-alerts-new.md
```

## Technical Notes

- Queries use exact-phrase matching (`"keyword"`) for precision
- The `-"ALERT_NAME:..."` marker was removed from all queries (it caused false negatives)
- Exclusions are limited to high-noise terms only
- Site filters use `site:reddit.com/r/subreddit` format (not full URLs)