This tutorial explains how to extract Facebook post data using Selenium and Python, without the Facebook Graph API. The reason we use Selenium instead of the Graph API is that Facebook can modify or disable access to any API endpoint at any time. One trigger for this was the Cambridge Analytica scandal, in which data obtained through the Facebook platform was abused.
The idea for this post came while I was consulting for a client on a sentiment analysis project. Basically, they want to collect online text data from sources like social media, blogs, and forums. So I tried using Selenium to build a sample automation tool that can act like a human. The Facebook Graph API offers something similar, but some of its endpoints, including permissions and public data, seem to have been disabled. If we rely solely on the API and Facebook cuts off access, it becomes very difficult to build a reliable web scraping tool.
What is Selenium again? Selenium is basically a tool for automating your browser: it lets you drive and monitor the browser as if a human were using it, imitating a user's actions. In this script, I instruct the browser to go to mobile.facebook.com > Login > fan page post. You can run the program from a Jupyter Notebook. Don't forget to download chromedriver and place it in the project directory.
```python
import time

from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.keys import Keys

usr = "<your_facebook_email_address>"
pwd = "<your_facebook_password>"
url = "https://mobile.facebook.com/story.php?story_fbid=10156391722455952&id=157851205951"

driver = webdriver.Chrome('/Users/zero/Documents/GitHub/SentimentAnalysis/chromedriver')
driver.get(url)
time.sleep(1)

# Dismiss the login prompt overlay if it appears
try:
    driver.find_element_by_xpath('//*[@id="viewport"]/div/div/div/div/div/a').click()
except NoSuchElementException:
    pass

# Fill in the login form and submit
elem = driver.find_element_by_id("m_login_email")
elem.send_keys(usr)
elem = driver.find_element_by_id("m_login_password")
elem.send_keys(pwd)
elem.send_keys(Keys.RETURN)

# Keep clicking "View more comments..." until the link no longer appears
hasLoadMore = True
while hasLoadMore:
    time.sleep(1)
    try:
        driver.find_element_by_xpath(
            '//*[@id="viewport"]/div/div/div/div/div/div/div/div'
            '/div/div/div/*[@class="async_elem"]/a').click()
    except NoSuchElementException:
        hasLoadMore = False

# Collect commenter names
users_list = []
users = driver.find_elements_by_class_name('_2b05')
for user in users:
    users_list.append(user.text)

# Collect comment texts, stripping each commenter's name from the text
texts_list = []
i = 0
texts = driver.find_elements_by_class_name('_2b06')
for txt in texts:
    texts_list.append(txt.text.split(users_list[i]))
    i += 1

comments_count = len(users_list)
for i in range(1, comments_count):
    user = users_list[i]
    text = texts_list[i]
    print("User ", user)
    print("Text ", text)
```
The above is not complete code. You may need to keep triggering the "View more comments…" button to load more comments until all of them are visible.
Also, you can save all the data into MongoDB for future analysis.
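A minimal sketch of that MongoDB step, assuming `pymongo` and a local `mongod` instance; the database and collection names here are made up for illustration:

```python
def build_documents(users, texts):
    """Pair each commenter with their comment text as MongoDB-ready documents."""
    return [{"user": u, "text": t} for u, t in zip(users, texts)]

def save_comments(docs, uri="mongodb://localhost:27017"):
    # pymongo is imported here so build_documents() works without it installed.
    from pymongo import MongoClient

    client = MongoClient(uri)
    # "sentiment_analysis" / "fb_comments" are example names, not from the tutorial.
    collection = client["sentiment_analysis"]["fb_comments"]
    result = collection.insert_many(docs)
    return result.inserted_ids
```

After the scraping loop finishes, something like `save_comments(build_documents(users_list, texts_list))` would persist every comment for later analysis.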
I'm not responsible for any legal action or damage that results from using this. It is for educational purposes only.