# Canadian Repair RSS Feed Monitor - Development Guide
**For Developers:** Technical documentation for maintaining and extending the RSS feed generation system.
## 🏗️ Project Architecture
### Directory Structure
```
rss-feedmonitor/
├── README.md          # User-facing documentation
├── DEVELOPMENT.md     # This technical guide
├── .gitignore         # Git ignore rules
├── docs/              # User documentation
│   ├── WORKFLOW.md
│   ├── QUICK_CREATE_GUIDE.md
│   ├── KEYWORD_OPTIMIZATION.md
│   ├── canadian-subreddits.md
│   └── canadian-repair-searches.md
├── scripts/           # Python generation scripts
│   ├── generate_modular_rss_feeds.py
│   ├── generate_optimized_rss_feeds.py
│   ├── generate_practical_rss_feeds.py
│   ├── generate_rss_feeds.py
│   ├── extract_website_keywords.py
│   └── update_keywords_from_website.py
├── data/              # Source data files
│   ├── repair_keywords.json
│   ├── canadian_subreddits.json
│   └── remaining_queries.txt
├── feeds/             # Generated RSS feeds
│   ├── rss-feeds.json
│   ├── *.md (generated RSS docs)
│   └── *.opml (RSS reader imports)
└── archive/           # Old/deprecated files
    ├── ALERT_CREATION_PROCESS.md
    └── ALL_SEARCH_LINKS_COMPLETE.txt
```
## 🔧 Development Setup
### Prerequisites
- Python 3.8+
- No external dependencies for core functionality
- Optional: `pyyaml` for advanced keyword extraction
### Installation
```bash
# Clone the repository
git clone <repository-url>
cd rss-feedmonitor
# No pip installs required for basic functionality
# Optional: pip install pyyaml (for extract_website_keywords.py)
```
## 📊 Data Sources
### repair_keywords.json
**Purpose:** Defines all repair keyword categories and search terms

**Structure:**
```json
{
  "categories": {
    "iphone_repairs": {
      "name": "iPhone Repair Requests",
      "description": "...",
      "devices": ["iPhone", "iPhone 12", ...],
      "problems": ["repair", "fix", "broken", ...]
    }
  },
  "additional_keywords": {
    "urgency_indicators": ["emergency", "urgent", ...],
    "location_indicators": ["local", "near me", ...]
  }
}
```
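Every generated feed depends on this file, so a quick structural check before regenerating can catch typos early. The helper below is hypothetical and is not part of the current scripts. It checks only the keys shown above: each category needs a `name` and non-empty `devices` and `problems` lists of strings.

```python
import json


def validate_keywords(data):
    """Return a list of problems found in a repair_keywords.json structure.

    Hypothetical helper: checks only the keys documented in this guide.
    """
    categories = data.get("categories")
    if not isinstance(categories, dict) or not categories:
        return ["'categories' must be a non-empty object"]

    errors = []
    for key, category in categories.items():
        if not isinstance(category, dict):
            errors.append(f"{key}: category must be an object")
            continue
        if "name" not in category:
            errors.append(f"{key}: missing 'name'")
        for field in ("devices", "problems"):
            terms = category.get(field)
            if not isinstance(terms, list) or not terms:
                errors.append(f"{key}: '{field}' must be a non-empty list")
            elif any(not isinstance(t, str) or not t.strip() for t in terms):
                errors.append(f"{key}: '{field}' contains an empty or non-string term")
    return errors


def validate_file(path="data/repair_keywords.json"):
    """Load the keyword file (path relative to the repo root) and validate it."""
    with open(path, encoding="utf-8") as f:
        return validate_keywords(json.load(f))
```

An empty list from `validate_file()` means the structure matches what the generator expects. Any returned strings name the category and field to fix.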
### canadian_subreddits.json
**Purpose:** Defines Canadian subreddits with metadata

**Structure:**
```json
{
  "priorities": {
    "critical": {
      "subreddits": [
        {
          "name": "toronto",
          "province": "ON",
          "population": "2.9M",
          "priority_score": 10
        }
      ]
    }
  }
}
```
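The priority groups can be flattened into one list ordered by `priority_score`. This is useful, for example, to generate feeds for the highest-priority cities first. The sketch below is minimal and assumes only the fields shown above. `flatten_subreddits` and `load_subreddits` are hypothetical names, not functions in the current scripts.

```python
import json


def flatten_subreddits(data):
    """Return (priority_group, subreddit_dict) pairs, highest priority_score first."""
    rows = []
    for group, group_data in data.get("priorities", {}).items():
        for sub in group_data.get("subreddits", []):
            rows.append((group, sub))
    # sorted() stays stable with reverse=True, so equal scores keep their file order
    return sorted(rows, key=lambda row: row[1].get("priority_score", 0), reverse=True)


def load_subreddits(path="data/canadian_subreddits.json"):
    """Load the subreddit file (path relative to the repo root) and flatten it."""
    with open(path, encoding="utf-8") as f:
        return flatten_subreddits(json.load(f))
```

A subreddit missing `priority_score` is treated as score 0 and sorts last. This ordering is an assumption of the sketch, not documented behavior of the generator.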
## 🛠️ RSS Generation Scripts
### generate_modular_rss_feeds.py (Primary)
**Purpose:** Main RSS feed generation script

**Features:**
- Reads from data/ source files
- Generates both Markdown and OPML outputs
- Modular design for easy maintenance
- Handles keyword categorization automatically

**Usage:**
```bash
cd scripts
python3 generate_modular_rss_feeds.py
```
**Output:**
- `feeds/rss_feeds_[timestamp].md` - Human-readable RSS feed documentation
- `feeds/rss_feeds_[timestamp].opml` - RSS reader import file
### Keyword Update Scripts
#### update_keywords_from_website.py
**Purpose:** Manually update keywords from motherboardrepair.ca

**Usage:**
```bash
cd scripts
python3 update_keywords_from_website.py
```
#### extract_website_keywords.py
**Purpose:** Extract keywords from website YAML/CSV files (requires pyyaml)

**Usage:**
```bash
pip install pyyaml
cd scripts
python3 extract_website_keywords.py
```
## 🔄 RSS Feed Generation Process
### 1. Keyword Processing
```python
# load_keywords() and build_search_query() are helpers assumed to be
# provided by the generation script (not shown here)
# Load keywords from data/repair_keywords.json
keywords = load_keywords()
# For each category (iphone_repairs, macbook_repairs, etc.)
for category, data in keywords["categories"].items():
    # Extract devices and problems
    devices = data["devices"]
    problems = data["problems"]
    # Generate search query: (device1 OR device2) AND (problem1 OR problem2)
    search_query = build_search_query(devices, problems)
```
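`build_search_query` is not shown in this guide. Below is one possible implementation that follows the syntax in the Reddit Search RSS Format section. Each term is wrapped in double quotes, terms within a group are joined with `OR`, and the two groups are joined with `AND`. It is a sketch under those assumptions and may differ from what the script actually does.

```python
def build_search_query(devices, problems):
    """Return (device1 OR device2) AND (problem1 OR problem2), each term quoted.

    Hypothetical sketch of the helper used in the keyword-processing loop.
    """

    def quoted_group(terms):
        # Strip stray double quotes and whitespace so each term is one clean phrase
        cleaned = [t.replace('"', "").strip() for t in terms]
        phrases = [f'"{t}"' for t in cleaned if t]
        if not phrases:
            raise ValueError("each group needs at least one non-empty term")
        return "(" + " OR ".join(phrases) + ")"

    return f"{quoted_group(devices)} AND {quoted_group(problems)}"
```

For example, `build_search_query(["iPhone", "iPhone 12"], ["repair", "broken"])` produces the same query as the iPhone example under Query Syntax. Quoting every term keeps multi-word entries such as `iPhone 12` or `near me` together, and quoting single words should be harmless.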
### 2. URL Generation
```python
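import urllib.parse

# Reddit search RSS format: subreddit name and encoded query are filled in below
base_url = "https://www.reddit.com/r/{}/search.rss?q={}&sort=new&type=link"
subreddit_name = "toronto"  # example value
search_query = '("iPhone" OR "iPhone 12") AND ("repair" OR "broken")'  # example value
# URL-encode the search query (safe="" also encodes "/", which quote() keeps by default)
encoded_query = urllib.parse.quote(search_query, safe="")
rss_url = base_url.format(subreddit_name, encoded_query)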
```
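An alternative is to let the standard library encode every query parameter instead of formatting the string by hand. In this sketch, `build_feed_url` is a hypothetical name. It uses `quote_via=quote` so spaces become `%20`, matching the manual `quote()` approach; the default `quote_plus` would turn them into `+`.

```python
from urllib.parse import quote, urlencode


def build_feed_url(subreddit, search_query):
    """Return a Reddit search RSS URL for one subreddit and one query.

    urlencode() percent-encodes each parameter value; quote_via=quote encodes
    spaces as %20 rather than '+'.
    """
    params = {"q": search_query, "sort": "new", "type": "link"}
    return (
        f"https://www.reddit.com/r/{quote(subreddit, safe='')}/search.rss?"
        + urlencode(params, quote_via=quote)
    )
```

Keeping the parameters in a dict makes it easy to add or remove one without touching the string template. Reddit's API documentation also lists a `restrict_sr` parameter for subreddit searches. If results show up from outside the target subreddit, adding `"restrict_sr": "on"` to `params` is worth testing.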
### 3. Output Generation
#### Markdown Output
- Hierarchical structure by priority/city/category
- Search queries and RSS URLs for each feed
- Device and problem breakdowns
- Implementation guidance
#### OPML Output
- XML format for RSS reader bulk import
- Nested outlines by priority/subreddit/category
- RSS XML URLs with proper encoding
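The OPML structure described above can be sketched with the standard library's `xml.etree.ElementTree`, which also escapes `&` in feed URLs as `&amp;` automatically. This is a minimal illustration, not the script's actual output code. It assumes each feed is a dict with `subreddit` and `category_name` keys (as used in the debug snippet under Troubleshooting) plus hypothetical `priority` and `url` keys.

```python
import xml.etree.ElementTree as ET


def build_opml(feeds, title="Canadian Repair RSS Feeds"):
    """Build an OPML 2.0 document nesting feeds as priority > subreddit > category.

    Each feed is a dict with the keys "priority", "subreddit",
    "category_name" and "url" ("priority" and "url" are assumed names).
    """
    opml = ET.Element("opml", version="2.0")
    head = ET.SubElement(opml, "head")
    ET.SubElement(head, "title").text = title
    body = ET.SubElement(opml, "body")

    priority_nodes = {}
    subreddit_nodes = {}
    for feed in feeds:
        prio = feed["priority"]
        if prio not in priority_nodes:
            priority_nodes[prio] = ET.SubElement(body, "outline", text=prio)
        key = (prio, feed["subreddit"])
        if key not in subreddit_nodes:
            subreddit_nodes[key] = ET.SubElement(
                priority_nodes[prio], "outline", text=f"r/{feed['subreddit']}"
            )
        # Leaf outline: one RSS feed; ElementTree escapes "&" in xmlUrl as "&amp;"
        ET.SubElement(
            subreddit_nodes[key],
            "outline",
            type="rss",
            text=feed["category_name"],
            title=feed["category_name"],
            xmlUrl=feed["url"],
        )
    return ET.tostring(opml, encoding="unicode")
```

Serializing through an XML library rather than string concatenation avoids the "special characters in URLs" import problems listed under Troubleshooting.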
## 📝 Adding New Keywords
### 1. Edit repair_keywords.json
```json
{
  "categories": {
    "new_category": {
      "name": "New Device Repairs",
      "description": "New device type repair requests",
      "devices": ["Device1", "Device2"],
      "problems": ["issue1", "issue2", "issue3"]
    }
  }
}
```
### 2. Regenerate RSS Feeds
```bash
cd scripts
python3 generate_modular_rss_feeds.py
```
## 🏙️ Adding New Canadian Cities
### 1. Edit canadian_subreddits.json
```json
{
  "priorities": {
    "medium": {
      "subreddits": [
        {
          "name": "newcity",
          "province": "AB",
          "population": "500K",
          "priority_score": 5
        }
      ]
    }
  }
}
```
### 2. Regenerate RSS Feeds
```bash
cd scripts
python3 generate_modular_rss_feeds.py
```
## 🔍 Reddit Search RSS Format
### URL Structure
```
https://www.reddit.com/r/[subreddit]/search.rss?q=[query]&sort=new&type=link
```
### Query Syntax
- **AND operations:** Use `AND` between device and problem groups
- **OR operations:** Use `OR` within device/problem groups
- **Exact phrases:** Use `"quotes"` for multi-word terms
- **URL encoding:** All special characters must be URL-encoded
### Examples
```python
import urllib.parse

# iPhone repairs
query = '("iPhone" OR "iPhone 12") AND ("repair" OR "broken")'
# URL encoded
encoded = urllib.parse.quote(query)
url = f"https://www.reddit.com/r/toronto/search.rss?q={encoded}&sort=new&type=link"
```
## 🧪 Testing RSS Feeds
### Manual Testing
1. Copy RSS URL to browser
2. Verify feed loads and shows recent posts
3. Check that search results match expected keywords
4. Test OPML import in RSS reader
### Automated Testing
```bash
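# Test feed validity (requires feedparser)
pip install feedparser
python3 -c "
import feedparser
feed = feedparser.parse('YOUR_RSS_URL')
# bozo is set when feedparser hit a problem fetching or parsing the feed
if feed.bozo:
    print('Warning:', feed.get('bozo_exception'))
title = feed.feed.get('title', '(no title)')
print(f'Feed title: {title}')
print(f'Entries: {len(feed.entries)}')
"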
```
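For a dependency-free check of a feed you have already downloaded, the standard library's XML parser is enough. This sketch is a hypothetical helper, not part of the repository. It accepts both RSS 2.0 (`<item>`) and Atom (`<entry>`) documents, because Reddit's `.rss` endpoints may serve Atom rather than RSS 2.0.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"


def summarize_feed(xml_text):
    """Return (title, entry_count) for an RSS 2.0 or Atom document.

    Raises xml.etree.ElementTree.ParseError if the text is not well-formed XML,
    and ValueError if the root element is not a recognized feed type.
    """
    root = ET.fromstring(xml_text)
    if root.tag == f"{ATOM}feed":
        title = root.findtext(f"{ATOM}title", default="")
        count = len(root.findall(f"{ATOM}entry"))
    elif root.tag == "rss":
        channel = root.find("channel")
        if channel is None:
            raise ValueError("RSS document has no <channel>")
        title = channel.findtext("title", default="")
        count = len(channel.findall("item"))
    else:
        raise ValueError(f"unrecognized feed root element: {root.tag}")
    return title, count
```

`ET.fromstring` accepts `bytes` as well as `str`, so a response body read with `urllib.request.urlopen(...).read()` can be passed in directly. A zero entry count on a well-formed feed usually points to the query rather than the URL; see "No Search Results" under Troubleshooting.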
## 🚀 Deployment
### Git Workflow
```bash
# Update source files
git add data/*.json
git commit -m "Update keywords/subreddits"
# Regenerate feeds
python3 scripts/generate_modular_rss_feeds.py
# Commit generated files
git add feeds/
git commit -m "Regenerate RSS feeds with updated data"
# Push changes
git push origin main
```
### Version Control Strategy
- **Source files** (`data/*.json`): Always commit changes
- **Generated files** (`feeds/*.md`, `feeds/*.opml`): Regenerate as needed, commit for distribution
- **Scripts**: Version controlled, update as needed
## 🐛 Troubleshooting
### Common Issues
**RSS Feed Not Loading:**
- Verify the subreddit name is correct
- Check whether the subreddit restricts RSS access
- Ensure the search query is properly URL-encoded

**No Search Results:**
- Simplify the search query (Reddit search has limitations)
- Check keyword spelling and relevance
- Verify the subreddit has active repair discussions

**OPML Import Issues:**
- Validate the XML structure
- Check for unescaped special characters (such as `&`) in URLs
- Test with a single feed first
### Debug Mode
Add debug prints to scripts:
```python
# In generate_modular_rss_feeds.py (after the feed list is built)
print(f"Processing {len(feeds)} feeds...")
for feed in feeds[:5]:  # Debug first 5
    print(f"  {feed['subreddit']}: {feed['category_name']}")
```
## 📈 Performance Optimization
### Feed Count Management
- Current: ~322 feeds across 23 subreddits × 14 categories
- Monitor RSS reader performance with high feed counts
- Consider priority-based feed generation for large deployments
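The feed count is the number of selected subreddits multiplied by the number of keyword categories (23 × 14 = 322 today). Below is a minimal sketch of priority-based generation. It assumes a hypothetical `min_score` cutoff applied to the `priority_score` field from `canadian_subreddits.json`; the current script does not necessarily expose such an option.

```python
def select_subreddits(data, min_score):
    """Return subreddit names whose priority_score is at least min_score."""
    return [
        sub["name"]
        for group in data.get("priorities", {}).values()
        for sub in group.get("subreddits", [])
        if sub.get("priority_score", 0) >= min_score
    ]


def estimate_feed_count(subreddit_data, keyword_data, min_score=0):
    """One feed per (selected subreddit, keyword category) pair."""
    subreddits = select_subreddits(subreddit_data, min_score)
    return len(subreddits) * len(keyword_data.get("categories", {}))
```

Running `estimate_feed_count` with a few cutoffs before regenerating shows how much a priority threshold would shrink the OPML import before any RSS reader has to handle it.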
### Update Frequency
- **Daily:** Regenerate feeds for latest subreddit activity
- **Weekly:** Update keywords based on lead quality analysis
- **Monthly:** Add new cities/subreddits as market expands
## 🔒 Security Considerations
- No API keys or authentication required (Reddit RSS is public)
- Source files contain only public subreddit information
- Generated RSS URLs are safe for public distribution
- No sensitive data stored in repository
## 📚 Related Documentation
- `docs/WORKFLOW.md` - User-facing workflow guide
- `docs/QUICK_CREATE_GUIDE.md` - Fast RSS feed creation
- `docs/canadian-repair-searches.md` - Search strategy details
- `README.md` - Project overview for users