Like many people, I was appalled at the recent exposure of people's Facebook data. Although I had stopped using Facebook to post about myself over a year ago, I was still using it to comment on, and react to, friends' posts and photos.
When I stopped posting my own stuff on Facebook, I wrote a small script to delete every post I had ever made, so I knew that the posts part of my Facebook profile was already purged and clean. What I wanted to do now was remove all my previous activity.
I did not want to #DeleteFacebook, as some of my extended family still use it heavily, and they live far enough away that regular face-to-face visits are impractical. Also, there are anecdotal rumours that asking Facebook to delete your profile merely tombstones the account, and the data is never properly purged from their systems (anonymised or not).
I know I cannot close the barn door on any data of mine that's already out in the wild, but I can control any further scrapes of my Facebook data by manually removing as much of my Facebook Activity as I can. Unfortunately, and not unexpectedly, Facebook do not give you a simple way to do this.
There are several browser extensions that are available to do what I am attempting, but as a hobbyist coder, it's always more fun to explore how to do these things yourself...
This didn't need to be good or clever code, it just needed to work...
Having already established that, for me at least, scraping www.facebook.com would just be an exercise in frustration, I decided to use the Activity Log on the very basic version of Facebook, mbasic.facebook.com.
On this page you can see all your recent activity, but more importantly, you can see links to the years and months. It's these links we'll be using to walk the activity history. I created some code that told the WebBrowser control to navigate there and tell me when it's done.
The first step was to collect only the top-level links of the root page of the Activity Log that linked to the next layer of activity. Helpfully these links follow a standard format and always contain:
Once we've got a list of those, we navigate to each one. Again, we're only interested in links that contain the above string. Any new links we find (ones we've not seen before) are added to the list of pages we are interested in. In the basic version of Facebook, Activity links that are labelled with only a year tend to route to a similar page with the year broken down into further month links. If the code walks these two levels correctly, you should have a list of URLs that contains 12 month links for every year your Facebook account has existed.
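The walk described above can be sketched roughly as follows. This is an illustrative Python version, not the WebBrowser-control code I actually ran. `PERIOD_MARKER` is a made-up placeholder for the real link string, and `fetch_links` is a hypothetical callable standing in for "navigate to this page and hand back every link on it".

```python
from collections import deque
from typing import Callable, Iterable, List, Set

# Hypothetical placeholder: substitute the real string the period links contain.
PERIOD_MARKER = "PERIOD_MARKER"


def collect_period_links(
    root_url: str,
    fetch_links: Callable[[str], Iterable[str]],
    marker: str = PERIOD_MARKER,
) -> List[str]:
    """Walk outward from root_url, keeping each unseen link that contains marker."""
    seen: Set[str] = {root_url}
    found: List[str] = []
    queue = deque([root_url])
    while queue:
        page = queue.popleft()
        for link in fetch_links(page):
            if marker in link and link not in seen:
                seen.add(link)
                found.append(link)
                # Year pages lead on to month pages, so visit new finds too.
                queue.append(link)
    return found
```

Note that this sketch keeps the year links in the result as well as the month links, which is harmless as long as later steps simply visit every URL in the list.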
At this point we can start hunting down Comments, Reactions and Likes. The Delete button for these can be easily identified by the following string matches in the links in each page:
We take the list of URLs we collected earlier and direct the WebBrowser control to navigate to each one. Once there, we parse all the links in the page looking for the above string matches. If a link contains one of these strings, that link is added to a list of URLs that we call the DeleteList. Once all the year/month URLs have been navigated, we should have all our basic Activity collected for deletion.
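Building the DeleteList might look something like this in Python. Again this is only a sketch: the entries in `DELETE_MARKERS` are invented placeholders for the real match strings, `fetch_links` is a hypothetical stand-in for the WebBrowser navigation, and skipping duplicates is my own assumption rather than something the real pages necessarily require.

```python
from typing import Callable, Iterable, List, Sequence, Set

# Hypothetical placeholders: substitute the real Delete-link match strings.
DELETE_MARKERS = ("DELETE_MARKER_A", "DELETE_MARKER_B")


def build_delete_list(
    page_urls: Iterable[str],
    fetch_links: Callable[[str], Iterable[str]],
    markers: Sequence[str] = DELETE_MARKERS,
) -> List[str]:
    """Visit each year/month page and keep every link that matches a marker."""
    delete_list: List[str] = []
    seen: Set[str] = set()
    for page in page_urls:
        for link in fetch_links(page):
            # Assumption: the same Delete link may show up twice, so de-duplicate.
            if link not in seen and any(m in link for m in markers):
                seen.add(link)
                delete_list.append(link)
    return delete_list
```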
Now we tell the WebBrowser control to navigate to each of the URLs in the DeleteList. Each URL, because it sat behind the Delete button for an activity, causes Facebook to delete that specified Activity.
Additionally, I added a check for a link that had the text load more in it, as the basic version of Facebook doesn't show more than a few Activities per page. A busy month would result in several pages hidden behind nested load more links.
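Following the nested load more links can be sketched like this. It assumes each link arrives as a (text, href) pair from a hypothetical `fetch_anchors` callable, and that matching the link text case-insensitively is good enough. Both are assumptions made for illustration.

```python
from typing import Callable, Iterable, List, Set, Tuple

Anchor = Tuple[str, str]  # (link text, href)


def anchors_across_load_more(
    start_url: str,
    fetch_anchors: Callable[[str], Iterable[Anchor]],
    label: str = "load more",
) -> List[Anchor]:
    """Gather the links on a page plus those on every page behind 'load more'."""
    anchors: List[Anchor] = []
    visited: Set[str] = set()
    pending = [start_url]
    while pending:
        page = pending.pop()
        if page in visited:
            continue  # guard against a 'load more' link that loops back
        visited.add(page)
        for text, href in fetch_anchors(page):
            if label in text.lower():
                pending.append(href)  # another page of the same month
            else:
                anchors.append((text, href))
    return anchors
```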
I deliberately added a one second throttle into the WebBrowser loop code, so that the automated navigation of the URLs remained stable, as previous scraping work like this showed that the WebBrowser control cannot keep up with fast code loops and page data can be truncated or not loaded. Also, I didn't want to trigger any anti-scraping detection that Facebook may have. They probably don't, but it's better to be safe than sorry.
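The throttle itself is nothing fancy. A minimal Python sketch, with `navigate` as a hypothetical stand-in for telling the WebBrowser control to load a URL and wait for it to finish:

```python
import time
from typing import Callable, Iterable


def visit_throttled(
    urls: Iterable[str],
    navigate: Callable[[str], None],
    delay_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Navigate to each URL in turn, pausing after every page load."""
    count = 0
    for url in urls:
        navigate(url)
        count += 1
        sleep(delay_seconds)  # give the page time to settle before the next one
    return count
```

The `sleep` parameter is only there so the pause can be swapped out when testing; in normal use the default one second delay applies.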
Having already run through my code without actually navigating to the Delete pages, I was fairly sure that my code did what I needed to do. So I let it rip, and kissed goodbye to all my activity.
Or So I Thought...
After some time (more than an hour, less than three), I rechecked my Activity Log via mbasic.facebook.com and all seemed nice and empty. Hooray, success! To be absolutely sure, I switched to the full version of Facebook, and checked my Activity Log there. There were still hundreds of comments, likes, and reactions! Something, somewhere had gone wrong...
I could continue to bore you with all the investigatory steps I took to track down what was going on, but I won't, as you are probably bored already. It turns out that the basic version of your Activity Log does not surface all your activity. To see all your comments, or all your reactions, you need to set a filter by clicking on the big filter button and picking an activity type from the list. You know when you've done it as the top of the log shows the filter type:
In order to dig up every comment or reaction I ever made, I needed to repeat everything I'd done for each Activity type I wanted to erase. For me, as I wasn't a heavy user of most of Facebook's features, this boiled down to just these:
Yes, Facebook makes a note of every video you clicked on in your newsfeed. I hadn't realised this and stumbled over it as I was checking each Activity type. Personally, I think this meta-data is used heavily by the algorithms that analyse your profile for advertising. For once, Facebook does the honourable thing here, and there is a delete video history button. Other activities don't seem to have this option.
Anyway, once I'd re-run the code against each activity type, my activity feed was as clean as it could be. There are some activities that can't be deleted, such as changes to your profile information, but it's not much compared to what you can delete.
A couple of things I learnt on this journey:
I hope the breakdown of my attempt to erase my Facebook history without deleting my Facebook account has been an interesting read for you. If you want to ask questions, or simply throw abuse at me (please don't...), I can be found on Twitter.
See Also: Hacker News Comment Thread
- Jaruzel, March 2018