# Canadian Repair RSS Feed Monitor - Development Guide

**For Developers:** Technical documentation for maintaining and extending the RSS feed generation system.

## 🏗️ Project Architecture

### Directory Structure

```
rss-feedmonitor/
├── README.md                # User-facing documentation
├── DEVELOPMENT.md           # This technical guide
├── .gitignore               # Git ignore rules
├── docs/                    # User documentation
│   ├── WORKFLOW.md
│   ├── QUICK_CREATE_GUIDE.md
│   ├── KEYWORD_OPTIMIZATION.md
│   ├── canadian-subreddits.md
│   └── canadian-repair-searches.md
├── scripts/                 # Python generation scripts
│   ├── generate_modular_rss_feeds.py
│   ├── generate_optimized_rss_feeds.py
│   ├── generate_practical_rss_feeds.py
│   ├── generate_rss_feeds.py
│   ├── extract_website_keywords.py
│   └── update_keywords_from_website.py
├── data/                    # Source data files
│   ├── repair_keywords.json
│   ├── canadian_subreddits.json
│   └── remaining_queries.txt
├── feeds/                   # Generated RSS feeds
│   ├── rss-feeds.json
│   ├── *.md                 # Generated RSS docs
│   └── *.opml               # RSS reader imports
└── archive/                 # Old/deprecated files
    ├── ALERT_CREATION_PROCESS.md
    └── ALL_SEARCH_LINKS_COMPLETE.txt
```

## 🔧 Development Setup

### Prerequisites

- Python 3.8+
- No external dependencies for core functionality
- Optional: `pyyaml` for advanced keyword extraction (used by `extract_website_keywords.py`)

### Installation

```bash
# Clone the repository
git clone <repository-url>
cd rss-feedmonitor

# No pip installs required for basic functionality
# Optional: pip install pyyaml (for extract_website_keywords.py)
```

## 📊 Data Sources

### repair_keywords.json

**Purpose:** Defines all repair keyword categories and search terms.

**Structure:**

```json
{
  "categories": {
    "iphone_repairs": {
      "name": "iPhone Repair Requests",
      "description": "...",
      "devices": ["iPhone", "iPhone 12", ...],
      "problems": ["repair", "fix", "broken", ...]
    }
  },
  "additional_keywords": {
    "urgency_indicators": ["emergency", "urgent", ...],
    "location_indicators": ["local", "near me", ...]
  }
}
```

### canadian_subreddits.json

**Purpose:** Defines Canadian subreddits with metadata.

**Structure:**

```json
{
  "priorities": {
    "critical": {
      "subreddits": [
        {
          "name": "toronto",
          "province": "ON",
          "population": "2.9M",
          "priority_score": 10
        }
      ]
    }
  }
}
```

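The loader the scripts actually use is not shown in this guide. As a minimal sketch, assuming the layout above (priority tiers under `priorities`, each holding a `subreddits` list), the tiers can be flattened into a single list ordered by `priority_score`. `flatten_subreddits` and `load_subreddits` are hypothetical names, and the added `priority` key on each record is an assumption for illustration.

```python
import json
from pathlib import Path


def flatten_subreddits(data):
    """Hypothetical helper: flatten priority tiers into one list,
    highest priority_score first. Each record is copied and tagged
    with the name of its tier under a new "priority" key."""
    records = []
    for tier, tier_data in data["priorities"].items():
        for sub in tier_data.get("subreddits", []):
            # Copy so the caller's dict is not mutated
            records.append({**sub, "priority": tier})
    # list.sort() is stable, so equal scores keep their file order
    records.sort(key=lambda r: r.get("priority_score", 0), reverse=True)
    return records


def load_subreddits(path="data/canadian_subreddits.json"):
    """Hypothetical loader; the default path assumes the repo root as cwd."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    return flatten_subreddits(data)
```

With the `toronto` entry above and the `newcity` entry from *Adding New Canadian Cities*, `[r["name"] for r in flatten_subreddits(data)]` gives `['toronto', 'newcity']`, because 10 sorts ahead of 5.
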
## 🛠️ RSS Generation Scripts

### generate_modular_rss_feeds.py (Primary)

**Purpose:** Main RSS feed generation script.

**Features:**

- Reads from the `data/` source files
- Generates both Markdown and OPML outputs
- Modular design for easy maintenance
- Handles keyword categorization automatically

**Usage:**

```bash
cd scripts
python3 generate_modular_rss_feeds.py
```

**Output:**

- `feeds/rss_feeds_[timestamp].md` - Human-readable RSS feed documentation
- `feeds/rss_feeds_[timestamp].opml` - RSS reader import file

### Keyword Update Scripts

#### update_keywords_from_website.py

**Purpose:** Manually update keywords from motherboardrepair.ca.

**Usage:**

```bash
cd scripts
python3 update_keywords_from_website.py
```

#### extract_website_keywords.py

**Purpose:** Extract keywords from the website's YAML/CSV files (requires `pyyaml`).

**Usage:**

```bash
pip install pyyaml
cd scripts
python3 extract_website_keywords.py
```

## 🔄 RSS Feed Generation Process

### 1. Keyword Processing

```python
# load_keywords() and build_search_query() are the script's own helpers (not shown here)

# Load keywords from data/repair_keywords.json
keywords = load_keywords()

# For each category (iphone_repairs, macbook_repairs, etc.)
for category, data in keywords["categories"].items():
    # Extract devices and problems
    devices = data["devices"]
    problems = data["problems"]

    # Generate search query: (device1 OR device2) AND (problem1 OR problem2)
    search_query = build_search_query(devices, problems)
```

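The real `build_search_query` is not listed in this guide. A minimal hypothetical version that produces the `(device1 OR device2) AND (problem1 OR problem2)` shape described above could look like this. Quoting every term, not only multi-word ones, is an assumption that keeps phrases such as `iPhone 12` together.

```python
def quote_term(term):
    """Wrap a keyword in double quotes so multi-word terms match as phrases."""
    # Strip embedded quotes so the phrase stays well-formed
    return '"' + term.replace('"', "") + '"'


def build_search_query(devices, problems):
    """Hypothetical sketch: (d1 OR d2 ...) AND (p1 OR p2 ...)."""
    device_group = " OR ".join(quote_term(d) for d in devices)
    problem_group = " OR ".join(quote_term(p) for p in problems)
    return f"({device_group}) AND ({problem_group})"


print(build_search_query(["iPhone", "iPhone 12"], ["repair", "broken"]))
# ("iPhone" OR "iPhone 12") AND ("repair" OR "broken")
```

The printed string is the same query used in the *Examples* section under *Reddit Search RSS Format*.
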
### 2. URL Generation

```python
import urllib.parse

# Reddit search RSS format: subreddit name, then the encoded query
base_url = "https://www.reddit.com/r/{}/search.rss?q={}&sort=new&type=link"

# URL encode the search query built in step 1
encoded_query = urllib.parse.quote(search_query)
# subreddit_name is a "name" from canadian_subreddits.json, e.g. "toronto"
rss_url = base_url.format(subreddit_name, encoded_query)
```

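Steps 1 and 2 combine every subreddit with every category, which is why the feed count is subreddits × categories. A minimal sketch of that pairing follows. `build_feeds` is a hypothetical name; the `subreddit` and `category_name` keys match the *Debug Mode* snippet later in this guide, while `search_query` and `rss_url` are assumed.

```python
import urllib.parse

BASE_URL = "https://www.reddit.com/r/{}/search.rss?q={}&sort=new&type=link"


def build_feeds(subreddit_names, category_queries):
    """Hypothetical helper: one feed per (subreddit, category) pair.

    category_queries maps a category's display name to its search
    query, e.g. the output of build_search_query for that category.
    """
    feeds = []
    for subreddit in subreddit_names:
        for category_name, query in category_queries.items():
            feeds.append({
                "subreddit": subreddit,
                "category_name": category_name,
                "search_query": query,
                # Percent-encode spaces, quotes and parentheses in the query
                "rss_url": BASE_URL.format(subreddit, urllib.parse.quote(query)),
            })
    return feeds
```

`len(build_feeds(subs, queries))` always equals `len(subs) * len(queries)`.
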
### 3. Output Generation

#### Markdown Output

- Hierarchical structure by priority/city/category
- Search queries and RSS URLs for each feed
- Device and problem breakdowns
- Implementation guidance

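The exact Markdown layout the script writes is not reproduced here. As a rough sketch, assuming each feed is a dict with hypothetical `priority`, `subreddit`, `category_name`, `search_query` and `rss_url` keys, the priority, then subreddit, then category hierarchy could be rendered like this (device/problem breakdowns and implementation guidance are left out).

```python
def render_markdown(feeds):
    """Hypothetical renderer: nest feeds by priority, then subreddit."""
    # Group while keeping input order (dicts preserve insertion order)
    tree = {}
    for feed in feeds:
        tree.setdefault(feed["priority"], {}).setdefault(feed["subreddit"], []).append(feed)

    lines = ["# Reddit Repair RSS Feeds", ""]
    for priority, subreddits in tree.items():
        lines += [f"## {priority.title()} Priority", ""]
        for subreddit, sub_feeds in subreddits.items():
            lines += [f"### r/{subreddit}", ""]
            for feed in sub_feeds:
                lines += [
                    f"#### {feed['category_name']}",
                    f"- **Search query:** `{feed['search_query']}`",
                    f"- **RSS URL:** {feed['rss_url']}",
                    "",
                ]
    return "\n".join(lines)
```
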
#### OPML Output

- XML format for RSS reader bulk import
- Nested outlines by priority/subreddit/category
- Feed URLs stored as XML attributes, with URL encoding and XML escaping (`&` becomes `&amp;`) applied

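A minimal sketch of the OPML side using only the standard library's `xml.etree.ElementTree`, assuming the same hypothetical feed dicts (`priority`, `subreddit`, `category_name`, `rss_url`). It nests outlines by priority, then subreddit, with one `type="rss"` outline per category. `build_opml` is a hypothetical name, not the script's function.

```python
import xml.etree.ElementTree as ET


def build_opml(feeds, title="Canadian Repair RSS Feeds"):
    """Hypothetical OPML builder: priority > subreddit > one rss outline per category."""
    opml = ET.Element("opml", version="2.0")
    head = ET.SubElement(opml, "head")
    ET.SubElement(head, "title").text = title
    body = ET.SubElement(opml, "body")

    tier_nodes = {}       # priority name -> outline element
    subreddit_nodes = {}  # (priority, subreddit) -> outline element
    for feed in feeds:
        tier = feed["priority"]
        if tier not in tier_nodes:
            tier_nodes[tier] = ET.SubElement(body, "outline", text=tier.title())
        key = (tier, feed["subreddit"])
        if key not in subreddit_nodes:
            subreddit_nodes[key] = ET.SubElement(
                tier_nodes[tier], "outline", text=f"r/{feed['subreddit']}"
            )
        # ElementTree escapes "&" in attribute values as "&amp;" when serializing
        ET.SubElement(
            subreddit_nodes[key],
            "outline",
            type="rss",
            text=feed["category_name"],
            title=feed["category_name"],
            xmlUrl=feed["rss_url"],
        )
    # Declaration written by hand so it always states UTF-8
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + ET.tostring(opml, encoding="unicode")
```

Write the result with `encoding="utf-8"` so the file matches its declaration.
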
## 📝 Adding New Keywords

### 1. Edit repair_keywords.json

Add the new category under `categories`, next to the existing entries (omitted here):

```json
{
  "categories": {
    "new_category": {
      "name": "New Device Repairs",
      "description": "New device type repair requests",
      "devices": ["Device1", "Device2"],
      "problems": ["issue1", "issue2", "issue3"]
    }
  }
}
```

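Before regenerating, it can help to catch a malformed category early. This is a hypothetical check, not part of the existing scripts, and it assumes every category needs the four fields shown above plus non-empty `devices` and `problems` lists.

```python
# Assumed schema: the four fields shown in the example above
REQUIRED_FIELDS = {"name": str, "description": str, "devices": list, "problems": list}


def validate_categories(data):
    """Hypothetical check: return readable problems in repair_keywords.json data."""
    errors = []
    for key, category in data.get("categories", {}).items():
        for field, expected_type in REQUIRED_FIELDS.items():
            if not isinstance(category.get(field), expected_type):
                errors.append(f"{key}: '{field}' missing or not a {expected_type.__name__}")
        # An empty list would leave an empty () group in the search query
        for field in ("devices", "problems"):
            if category.get(field) == []:
                errors.append(f"{key}: '{field}' is empty")
    return errors
```

Load `data/repair_keywords.json` with `json.load` and pass the result in; an empty list means nothing was flagged.
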
### 2. Regenerate RSS Feeds

```bash
cd scripts
python3 generate_modular_rss_feeds.py
```

## 🏙️ Adding New Canadian Cities

### 1. Edit canadian_subreddits.json

Add the subreddit to the appropriate priority tier (other tiers and entries omitted here):

```json
{
  "priorities": {
    "medium": {
      "subreddits": [
        {
          "name": "newcity",
          "province": "AB",
          "population": "500K",
          "priority_score": 5
        }
      ]
    }
  }
}
```

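A similar hypothetical check for `canadian_subreddits.json` can flag missing fields, names entered with an `r/` prefix, and a city listed in two tiers. The name pattern (letters, digits, underscores) is an assumption based on Reddit's usual naming rules. Names are compared case-insensitively because Reddit treats subreddit names that way in URLs.

```python
import re

# Assumption: subreddit names use only letters, digits and underscores
NAME_PATTERN = re.compile(r"[A-Za-z0-9_]+")
REQUIRED_FIELDS = ("name", "province", "population", "priority_score")


def check_subreddits(data):
    """Hypothetical check: return problems in canadian_subreddits.json data."""
    errors = []
    seen = {}  # lower-cased name -> tier where it first appeared
    for tier, tier_data in data.get("priorities", {}).items():
        for sub in tier_data.get("subreddits", []):
            name = sub.get("name", "")
            label = f"{tier}/{name or '?'}"
            for field in REQUIRED_FIELDS:
                if field not in sub:
                    errors.append(f"{label}: missing '{field}'")
            if name and not NAME_PATTERN.fullmatch(name):
                errors.append(f"{label}: use the bare name, e.g. 'toronto' not 'r/toronto'")
            key = name.lower()
            if key and key in seen:
                errors.append(f"{label}: duplicate of an entry in '{seen[key]}'")
            elif key:
                seen[key] = tier
    return errors
```
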
### 2. Regenerate RSS Feeds

```bash
cd scripts
python3 generate_modular_rss_feeds.py
```

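## 🔍 Reddit Search RSS Format

### URL Structure

```
https://www.reddit.com/r/[subreddit]/search.rss?q=[query]&sort=new&type=link
```

`sort=new` returns the newest matching posts first, and `type=link` limits results to posts rather than subreddits or users. Reddit's search API also accepts a `restrict_sr` parameter that limits results to the named subreddit; if a feed shows posts from other subreddits, check whether adding `restrict_sr=1` helps.

### Query Syntax

- **AND operations:** Use `AND` between the device and problem groups
- **OR operations:** Use `OR` within each device/problem group
- **Exact phrases:** Wrap multi-word terms in double quotes, e.g. `"iPhone 12"`
- **URL encoding:** All special characters (spaces, quotes, parentheses) must be URL-encoded
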
### Examples

```python
import urllib.parse

# iPhone repairs
query = '("iPhone" OR "iPhone 12") AND ("repair" OR "broken")'

# URL encoded
encoded = urllib.parse.quote(query)
url = f"https://www.reddit.com/r/toronto/search.rss?q={encoded}&sort=new&type=link"
```

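To confirm that a generated URL is encoded correctly, one option is to decode its `q` parameter with the standard library and compare it with the intended query. `decoded_query` is a hypothetical helper name.

```python
from urllib.parse import parse_qs, quote, urlsplit


def decoded_query(rss_url):
    """Extract and percent-decode the q= parameter from a search RSS URL."""
    params = parse_qs(urlsplit(rss_url).query)
    return params["q"][0]


query = '("iPhone" OR "iPhone 12") AND ("repair" OR "broken")'
url = f"https://www.reddit.com/r/toronto/search.rss?q={quote(query)}&sort=new&type=link"

# The decoded parameter should round-trip to the original query
print(decoded_query(url) == query)  # True
```

If the comparison fails for a URL taken from a generated file, the query was double-encoded or not encoded at all.
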
## 🧪 Testing RSS Feeds

### Manual Testing

1. Copy the RSS URL into a browser
2. Verify the feed loads and shows recent posts
3. Check that the search results match the expected keywords
4. Test the OPML import in an RSS reader

### Automated Testing

```bash
# Test feed validity (requires feedparser)
pip install feedparser
python3 -c "
import feedparser
feed = feedparser.parse('YOUR_RSS_URL')
print(f'Feed title: {feed.feed.title}')
print(f'Entries: {len(feed.entries)}')
print(f'Well-formed: {not feed.bozo}')
"
```

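feedparser checks one live feed at a time. For the generated OPML file, a standard-library sketch can confirm the XML parses and flag feed URLs that look wrong. `check_opml` is hypothetical, and the URL rules it applies (HTTPS, a path ending in `/search.rss`, no raw spaces) are assumptions based on the URL structure in this guide.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit


def check_opml(xml_bytes):
    """Hypothetical check: parse OPML and return (feed_count, problems).

    Raises xml.etree.ElementTree.ParseError if the XML is malformed.
    """
    root = ET.fromstring(xml_bytes)
    problems = []
    feeds = [o for o in root.iter("outline") if o.get("xmlUrl")]
    for outline in feeds:
        url = outline.get("xmlUrl")
        parts = urlsplit(url)
        if parts.scheme != "https" or not parts.path.endswith("/search.rss"):
            problems.append(f"unexpected feed URL: {url}")
        if " " in url:
            problems.append(f"unencoded space in URL: {url}")
    return len(feeds), problems
```

For example, `check_opml(Path("feeds/rss_feeds_[timestamp].opml").read_bytes())` with the real file name substituted.
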
## 🚀 Deployment

### Git Workflow

```bash
# Update source files
git add data/*.json
git commit -m "Update keywords/subreddits"

# Regenerate feeds
python3 scripts/generate_modular_rss_feeds.py

# Commit generated files
git add feeds/
git commit -m "Regenerate RSS feeds with updated data"

# Push changes
git push origin main
```

### Version Control Strategy

- **Source files** (`data/*.json`): Always commit changes
- **Generated files** (`feeds/*.md`, `feeds/*.opml`): Regenerate as needed; commit for distribution
- **Scripts**: Version-controlled; update as needed

## 🐛 Troubleshooting

### Common Issues

**RSS Feed Not Loading:**

- Verify the subreddit name is correct
- Check whether the subreddit restricts RSS access
- Ensure the query is properly URL-encoded

**No Search Results:**

- Simplify the search query (Reddit search has limitations)
- Check keyword spelling and relevance
- Verify the subreddit has active repair discussions

**OPML Import Issues:**

- Validate the XML structure
- Check for special characters in URLs
- Test with a single feed first

### Debug Mode

Add debug prints to the scripts:

```python
# In generate_modular_rss_feeds.py, after the feed list is built
print(f"Processing {len(feeds)} feeds...")
for feed in feeds[:5]:  # Debug first 5
    print(f"  {feed['subreddit']}: {feed['category_name']}")
```

## 📈 Performance Optimization

### Feed Count Management

- Current: 322 feeds (23 subreddits × 14 categories)
- Monitor RSS reader performance with high feed counts
- Consider priority-based feed generation for large deployments

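The feed count is simply subreddits × categories (23 × 14 = 322). A hypothetical helper that recomputes it from the two data files, optionally counting only some priority tiers, shows how much a priority-based cut would save. It assumes the JSON layouts shown under *Data Sources*.

```python
def expected_feed_count(subreddit_data, keyword_data, tiers=None):
    """Hypothetical helper: feeds = subreddits x categories.

    tiers, if given, limits the count to those priority tiers.
    """
    subreddits = [
        sub
        for tier, tier_data in subreddit_data["priorities"].items()
        if tiers is None or tier in tiers
        for sub in tier_data.get("subreddits", [])
    ]
    return len(subreddits) * len(keyword_data["categories"])
```

Passing `tiers={"critical"}` counts only the critical-tier subreddits.
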
### Update Frequency

- **Daily:** Regenerate feeds for the latest subreddit activity
- **Weekly:** Update keywords based on lead quality analysis
- **Monthly:** Add new cities/subreddits as the market expands

## 🔒 Security Considerations

- No API keys or authentication required (Reddit RSS is public)
- Source files contain only public subreddit information
- Generated RSS URLs are safe for public distribution
- No sensitive data is stored in the repository

## 📚 Related Documentation

- `docs/WORKFLOW.md` - User-facing workflow guide
- `docs/QUICK_CREATE_GUIDE.md` - Fast RSS feed creation
- `docs/canadian-repair-searches.md` - Search strategy details
- `README.md` - Project overview for users