UAB stands for Useless Anyfactor Blog. This is a real project, so I apologize for not going into the details.
The client asked me to scrape a really ’90s-looking message board. That is unusual, because I thought messaging and forum activity was more or less centralized in Discord, Slack and private subreddits by now.
He identified a user who has been consistently successful in making calls and is very informative and descriptive about his process. So he asked me to create a script that would go through each of that user's messages and then:
- Scrape the message
- Scrape the data
- Scrape the current URL
- Scrape the message that he might have replied to
My assumption is that he is going to use this data for an event study to validate the user's trading prowess. There are at least a few thousand messages.
So, I was bored, and because there weren’t any security measures on the site, I just chose the most convenient thing, which was Selenium.
Selenium is a stupid easy tool to use if you are doing web scraping, but it is incredibly slow. I don’t normally use Selenium for scraping and automation these days. I usually reverse engineer sites because it is less resource consuming and faster. But like I said -
I was bored
So, login. Because I was “bored”, I didn’t automate the login process. Instead I just threw in an input statement that paused the script while I logged in manually.
Then I identified the elements, did some shenanigans with XPath, and bada boom, I had a script going.
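Roughly, that flow could be sketched like this. It is only an illustration under assumptions: the XPath strings, the field names, and the `login_url` / `message_urls` arguments are hypothetical placeholders, since the board's real markup and URLs aren't shown here.

```python
from dataclasses import dataclass

# Hypothetical XPaths: placeholders, not the real board's markup.
MESSAGE_XPATH = "//div[@class='message-body']"
DATA_XPATH = "//span[@class='message-meta']"
REPLIED_TO_XPATH = "//div[@class='quoted-message']"


@dataclass
class ScrapedMessage:
    url: str
    message: str
    data: str
    replied_to: str


def text_or_empty(driver, xpath):
    """Return the stripped text of the first match, or "" if nothing matches."""
    # "xpath" is the string value behind selenium's By.XPATH constant.
    # find_elements returns an empty list instead of raising when nothing matches,
    # which is handy for the optional replied-to message.
    matches = driver.find_elements("xpath", xpath)
    return matches[0].text.strip() if matches else ""


def scrape_message(driver, url):
    """Visit one message page and collect the four fields."""
    driver.get(url)
    return ScrapedMessage(
        url=driver.current_url,
        message=text_or_empty(driver, MESSAGE_XPATH),
        data=text_or_empty(driver, DATA_XPATH),
        replied_to=text_or_empty(driver, REPLIED_TO_XPATH),
    )


def run(login_url, message_urls):
    """Open a browser, wait for a manual login, then scrape each URL."""
    # Imported here so the helpers above can be used without a browser installed.
    from selenium import webdriver

    driver = webdriver.Chrome()
    try:
        driver.get(login_url)
        # The "lazy" login: the script blocks here until Enter is pressed.
        input("Log in in the browser window, then press Enter here... ")
        return [scrape_message(driver, url) for url in message_urls]
    finally:
        driver.quit()
```

Calling `run(...)` with a login page and a list of message URLs would return one `ScrapedMessage` per page. Every page is a full browser load, which is where the slowness comes from.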
Then I wrote everything to a CSV file.
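The CSV step might look something like the sketch below. The column names and the `append_rows` helper are assumptions made for illustration. It appends rather than overwrites, on the assumption that with a few thousand slow page visits you would want to keep whatever was already scraped if the run dies halfway.

```python
import csv
import os

# Hypothetical column names matching the four scraped items.
FIELDNAMES = ["url", "message", "data", "replied_to"]


def append_rows(path, rows):
    """Append dict rows to a CSV file, writing the header only once."""
    needs_header = not os.path.exists(path) or os.path.getsize(path) == 0
    # newline="" lets the csv module handle line endings, so multi-line
    # messages are quoted correctly instead of breaking the row layout.
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDNAMES)
        if needs_header:
            writer.writeheader()
        writer.writerows(rows)
```

For example, `append_rows("messages.csv", [{"url": "...", "message": "...", "data": "...", "replied_to": ""}])` could be called after each page or each small batch.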
Lessons Learned
- You shouldn’t use Selenium unless the scrape is relatively small (<1000 webpage visits).
- Bloated code adds up over time.
- If you are bored and tired, don’t start the project right away. Take some time. You might be building a foundation that pigeonholes you later.
- Avoid drinking coffee (!)
- Carefully assess the scope of the project before giving the quote, even if the client has a budget. You can’t renegotiate midway through the project just because it is taking forever.
Follow me on Twitter, where I am currently ghost banned
Or Reddit, where I talk about nonsense
Or be a champ & hire me or just visit the site anyway so I know I am getting some traffic from Medium
🔗 anyfactor.xyz
(This is my 30th article on Medium, yet I haven’t seen a single visit from here. Makes me wonder why I even bother doing this …)