UAB: Scraping Messages for Investing & Trading

Anyfactor
2 min read · Nov 27, 2020

UAB stands for Useless Anyfactor Blog. This is a real project, so I apologize for not going into the details.

The client asked me to scrape a message board that looks like it is straight out of the ’90s. That is very unusual, because I thought messaging and forum activity was more or less centralized in Discord, Slack and private subreddits by now.

He identified a user who has been consistently successful in making calls and is very informative and descriptive about his process. So he asked me to create a script that goes through each of that user's messages and then:

  1. Scrape the message
  2. Scrape the data
  3. Scrape the current URL
  4. Scrape the message that he might have replied to

I assume he is going to use this data for an event study to validate the user's trading prowess. There are at least a few thousand messages.

So, I was bored, and because there weren’t any security measures on the site, I just chose the most convenient tool, which was Selenium.

Selenium is a stupid easy tool to use for web scraping, but it is incredibly slow. These days I have mostly stopped using Selenium for scraping and automation. I usually reverse engineer sites instead, because it is faster and consumes fewer resources. But like I said:

I was bored

So, login. Because I was “bored”, I didn’t automate the login process. Instead, I just threw in an arbitrary input statement so the script would wait while I logged in manually.
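A minimal sketch of that manual-login pause, assuming Selenium 4 with Chrome. The function name `wait_for_manual_login` and the URL are made up for illustration and are not from the real project.

```python
def wait_for_manual_login(driver, login_url, prompt=input):
    """Open the login page, then block until the human confirms they are logged in."""
    driver.get(login_url)
    # The "arbitrary input statement": its return value is ignored, it only pauses.
    prompt("Log in in the browser window, then press Enter here... ")
    return driver.current_url


def main():
    # Imported here so the helper above can be reused without Selenium installed.
    from selenium import webdriver

    driver = webdriver.Chrome()
    try:
        # Hypothetical URL, standing in for the real board's login page.
        wait_for_manual_login(driver, "https://forum.example.com/login")
        # ... scraping would continue here, reusing the logged-in session ...
    finally:
        driver.quit()
```

Calling `main()` opens the browser and waits at the prompt. The pause works because the same browser session, cookies included, stays alive for everything the script does afterwards.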

Then I identified the elements, did some shenanigans with the XPath expressions and bada boom, I had a script going.
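Roughly, pulling the four items out of one message page with XPath could look like the sketch below. The XPath expressions and field names are hypothetical (the real board's markup isn't shown here), and treating item 2 of the list as the post's date is my assumption.

```python
XPATH = "xpath"  # same string value as selenium.webdriver.common.by.By.XPATH

# Hypothetical XPath expressions; the real ones depend on the board's HTML.
MESSAGE_XPATH = "//div[@class='message-body']"
DATE_XPATH = "//span[@class='message-date']"
REPLY_XPATH = "//div[@class='quoted-message']"


def text_or_empty(driver, xpath):
    """Return the text of the first match, or '' when nothing matches."""
    # find_elements returns an empty list instead of raising when there is no match.
    elements = driver.find_elements(XPATH, xpath)
    return elements[0].text.strip() if elements else ""


def scrape_message(driver):
    """Collect one record from the page the driver is currently on."""
    return {
        "message": text_or_empty(driver, MESSAGE_XPATH),
        "date": text_or_empty(driver, DATE_XPATH),  # assumed meaning of item 2
        "url": driver.current_url,
        "replied_to": text_or_empty(driver, REPLY_XPATH),  # '' if not a reply
    }
```

Using `find_elements` rather than `find_element` means a message that is not a reply just yields an empty string instead of crashing the run a few thousand pages in.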

Then I wrote everything to a CSV file.
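For the CSV step, a minimal sketch using only Python's standard `csv` module. The column names and the function name `write_messages_csv` are assumptions for illustration.

```python
import csv

# Assumed column names, one per scraped item.
FIELDNAMES = ["message", "date", "url", "replied_to"]


def write_messages_csv(rows, path):
    """Write an iterable of dicts to a CSV file and return how many rows were written."""
    count = 0
    # newline="" lets the csv module handle line endings, so messages that
    # contain line breaks stay inside a single quoted cell.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDNAMES)
        writer.writeheader()
        for row in rows:
            writer.writerow(row)
            count += 1
    return count
```

For a run of a few thousand pages, opening the file in append mode and writing each row as it is scraped would be the safer variant, since a crash midway would not lose the earlier rows.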

Lessons Learned

  • You shouldn’t use Selenium unless the scrape is relatively small (<1000 webpage visits).
  • Bloated code adds up over time.
  • If you are bored and tired, don’t jump into the project right away. Take some time. Otherwise you might build a foundation that pigeonholes you.
  • Avoid drinking coffee (!)
  • Carefully assess the scope of the project before giving a quote, even if the client has a budget. You can’t renegotiate midway through the project just because it is taking forever.

Follow me on Twitter, where I am currently ghost banned

Or Reddit, where I talk about nonsense

Or be a champ & hire me or just visit the site anyway so I know I am getting some traffic from Medium

🔗 anyfactor.xyz

(This is my 30th article on Medium, yet I haven’t seen a single visit from here. Makes me wonder why I even bother doing this …)
