Description: I am looking for a skilled developer to create a tool that performs the following tasks:
Google Search Automation:
Perform hourly Google searches for a list of keywords (keyword list should be editable).
Identify the ‘Forums and Discussions’ SERP feature in the search results.
Extract all URLs from this feature for each keyword on the list.
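As a rough illustration of this extraction step: Google does not expose a stable class name for the ‘Forums and Discussions’ feature (its class names are obfuscated and rotate often), so one more robust approach is to locate the feature by its visible heading text. The heading string and HTML structure below are assumptions about the current SERP layout, not a guaranteed contract:

```python
from bs4 import BeautifulSoup

def extract_feature_urls(html, heading_text="Discussions and forums"):
    """Find a SERP feature by its visible heading and collect its links.

    heading_text is an assumption about the feature's current on-page label;
    adjust it if Google changes the wording.
    """
    soup = BeautifulSoup(html, "html.parser")
    for tag in soup.find_all(string=lambda s: s and heading_text.lower() in s.lower()):
        container = tag.find_parent("div")
        # Walk up until we reach a container that actually holds links
        while container is not None and not container.find_all("a", href=True):
            container = container.find_parent("div")
        if container:
            return [a["href"] for a in container.find_all("a", href=True)]
    return []
```

This avoids hard-coding an obfuscated class, but it is still a heuristic and should be re-validated whenever the SERP layout changes.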
Change Detection:
Compare the newly extracted URLs with the previously stored URLs.
Notify me of any changes in the URLs.
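The comparison in this step reduces to a set difference between the stored and freshly extracted URLs; the URLs below are made up for illustration:

```python
def diff_urls(old_urls, new_urls):
    """Return (added, removed) URL sets between two crawls."""
    old, new = set(old_urls), set(new_urls)
    return new - old, old - new

added, removed = diff_urls(
    ["https://reddit.com/r/a/1", "https://quora.com/q1"],
    ["https://reddit.com/r/a/1", "https://reddit.com/r/b/2"],
)
# A notification would be sent whenever either set is non-empty.
```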
Reddit Comment Scraping:
For any Reddit URLs found in the ‘Forums and Discussions’ section, navigate to the URL and scrape all comments.
Search for comments containing URLs starting with “(the beginning of specific URLs)”, with the ability to add multiple URL paths to the search.
If no such URLs are found, notify me of the failure to find the specific URL.
Option to add additional URL paths to look for in Reddit post comments.
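One way to implement the prefix matching described above is a small regex URL extractor combined with `str.startswith` over an editable prefix list. The prefixes and comment text here are illustrative placeholders:

```python
import re

URL_RE = re.compile(r"https?://\S+")

def find_target_urls(comment_text, prefixes):
    """Return URLs in a comment that start with any configured prefix."""
    return [u for u in URL_RE.findall(comment_text)
            if any(u.startswith(p) for p in prefixes)]

# Editable list of URL beginnings to look for:
prefixes = ["https://example.com/shop/", "https://example.com/deals/"]
hits = find_target_urls(
    "See https://example.com/shop/item1 and https://other.org/x",
    prefixes,
)
# hits == ['https://example.com/shop/item1']; an empty list would trigger
# the "not found" notification.
```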
Notification System:
Send notifications via email (or other preferred methods) when changes are detected or when a target URL is not found in Reddit comments.
Data Storage and Access:
Store the collected URLs in a database.
Provide access to the current list of URLs, sortable by keyword.
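A keyword-sortable view could be a simple query over the same table the sample script creates (the schema and sample rows here are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # use the real database file in practice
conn.execute("""CREATE TABLE IF NOT EXISTS serp_results
                (keyword TEXT, url TEXT,
                 timestamp DATETIME DEFAULT CURRENT_TIMESTAMP)""")
conn.executemany("INSERT INTO serp_results (keyword, url) VALUES (?, ?)",
                 [("widgets", "https://reddit.com/r/widgets/1"),
                  ("anvils", "https://reddit.com/r/anvils/2")])

def urls_by_keyword(conn):
    """Current URL list, sorted by keyword and then by recency."""
    return conn.execute(
        "SELECT keyword, url FROM serp_results "
        "ORDER BY keyword, timestamp DESC").fetchall()
```

The optional web interface would just render this result set.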
Requirements:
Experience with web scraping tools like Selenium and BeautifulSoup.
Knowledge of handling Google’s search results HTML structure.
Experience with scraping Reddit without using their API.
Ability to set up a database (SQLite, MySQL, etc.) for storing URLs.
Ability to implement a notification system (email, SMS, etc.) for changes.
Optionally, create a simple web interface for accessing and sorting the URLs.
Budget: TBD
Deadline: 4 weeks from the start date
Please include:
Your relevant experience and examples of similar projects.
Estimated time and cost for completion.
Any questions or clarifications you need.
Sample Python Script Outline without Reddit API
Here’s an outline of a Python script that scrapes Google results and Reddit comments directly using Selenium and BeautifulSoup:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup
import time
import smtplib
from email.mime.text import MIMEText
import sqlite3

# Database setup
conn = sqlite3.connect('serp_results.db')
cursor = conn.cursor()
cursor.execute('''CREATE TABLE IF NOT EXISTS serp_results
                  (keyword TEXT, url TEXT, timestamp DATETIME DEFAULT CURRENT_TIMESTAMP)''')
conn.commit()

def google_search(keyword):
    driver = webdriver.Chrome()
    driver.get('https://www.google.com')
    search_box = driver.find_element(By.NAME, 'q')
    search_box.send_keys(keyword)
    search_box.send_keys(Keys.RETURN)
    time.sleep(3)  # wait for results to load
    soup = BeautifulSoup(driver.page_source, 'html.parser')
    driver.quit()
    return soup

def extract_forums_and_discussions(soup):
    forums_and_discussions = []
    # Google's class names are obfuscated and change often; replace the
    # placeholder below with the current class of the SERP feature.
    for section in soup.find_all('div', {'class': 'YOUR_FORUMS_AND_DISCUSSIONS_CLASS'}):
        for link in section.find_all('a'):
            forums_and_discussions.append(link.get('href'))
    return forums_and_discussions[:4]  # keep the top 3-4 URLs

def check_for_changes(keyword, new_urls):
    cursor.execute('SELECT url FROM serp_results WHERE keyword=? ORDER BY timestamp DESC LIMIT 4', (keyword,))
    old_urls = [row[0] for row in cursor.fetchall()]  # unpack 1-tuples into strings
    if set(old_urls) != set(new_urls):
        send_notification(keyword, new_urls)

def send_notification(keyword, new_urls):
    msg = MIMEText(f'Changes detected for keyword {keyword}:\n' + '\n'.join(new_urls))
    msg['Subject'] = f'Keyword Alert: {keyword}'
    msg['From'] = 'your_email@example.com'
    msg['To'] = 'recipient@example.com'
    s = smtplib.SMTP('smtp.example.com')
    s.login('your_email@example.com', 'your_password')
    s.send_message(msg)
    s.quit()

def store_results(keyword, urls):
    for url in urls:
        cursor.execute('INSERT INTO serp_results (keyword, url) VALUES (?, ?)', (keyword, url))
    conn.commit()

def scrape_reddit_comments(reddit_url, target_urls):
    driver = webdriver.Chrome()
    driver.get(reddit_url)
    time.sleep(3)  # wait for page to load
    soup = BeautifulSoup(driver.page_source, 'html.parser')
    driver.quit()
    comments = soup.find_all('div', {'class': 'Comment'})
    for comment in comments:
        if any(url in comment.get_text() for url in target_urls):
            return True
    return False

def main():
    keywords = ['example keyword 1', 'example keyword 2']  # List of your keywords
    target_urls = ['https://example.com/your-path']  # URL prefixes to look for in Reddit comments
    while True:
        for keyword in keywords:
            soup = google_search(keyword)
            new_urls = extract_forums_and_discussions(soup)
            check_for_changes(keyword, new_urls)
            store_results(keyword, new_urls)
            for url in new_urls:
                if 'reddit.com' in url:
                    if not scrape_reddit_comments(url, target_urls):
                        send_notification(keyword, [f'No target URL found in comments for {url}'])
        time.sleep(3600)  # Run every hour

if __name__ == '__main__':
    main()
Key Changes and Additions
Reddit Comment Scraping:
Use Selenium to navigate to Reddit URLs and scrape the page content.
Use BeautifulSoup to parse the comments from the HTML.
No API Usage:
All interactions with Reddit are done via direct scraping.
APPLY FOR THIS JOB:
Company: Rhino Squad
Name: Mark Huntley
Email: