How I got my Bot Banned by Twitter

It started with a news article about a contest-winning Twitter bot written by Hunter Scott. Mr. Scott had followed up on a curiosity and was rewarded with a fairly impressive collection of stuff. What he observed was that a lot of accounts on Twitter were offering to give away prizes in exchange for retweets and favorites.

What did I expect?

I expected to get a few false positives. I also suspected that, since the article was published, the giveaway market on Twitter would be flooded with bots. Finally, I assumed that the Twitter API limit would be generous and that the API experience would be as good as the web experience.

I found out that it was fairly easy to run past your limit on the Twitter API when using Twitter4j. There are a few things that Twitter4j doesn’t do:

  1. Rate limit the requests
  2. Attempt to interpret non HTTP-OK results from the API. The exception returned by the API client was a generic TwitterException that carried the message that came back from Twitter. (Sometimes this was “You’ve hit your daily (status update limit|retweet limit|follow limit).”)
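
To make the second point concrete, here is a minimal Python sketch of what interpreting those messages could look like. It is not Twitter4j code: the exception class names and the `classify_error` helper are hypothetical, and the message wording is taken from my recollection above rather than from verified API output.

```python
import re

# Hypothetical exception types; none of these exist in Twitter4j.
class DailyLimitError(Exception):
    """Base class for 'daily limit reached' responses."""

class StatusUpdateLimitError(DailyLimitError):
    pass

class RetweetLimitError(DailyLimitError):
    pass

class FollowLimitError(DailyLimitError):
    pass

# Matches the three limit messages mentioned in the post.
_LIMIT_PATTERN = re.compile(r"daily (status update|retweet|follow) limit", re.IGNORECASE)

_LIMIT_ERRORS = {
    "status update": StatusUpdateLimitError,
    "retweet": RetweetLimitError,
    "follow": FollowLimitError,
}

def classify_error(message):
    """Return a specific exception for a known limit message, else a generic one."""
    match = _LIMIT_PATTERN.search(message)
    if match:
        return _LIMIT_ERRORS[match.group(1).lower()](message)
    return Exception(message)
```

With something like this, the calling code could catch `FollowLimitError` and stop following for the day while still retweeting, instead of treating every failure the same way.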

Additionally, I found out that when performing a query on the API, you don’t get results as reliable as the ones you get on the web. Typically, out of 18 results, only 8 would contain some combination of the words that I was looking for. I also found out that there were bots that looked for other bots like mine and started retweeting their content.

What did I use to write this bot?

To get better at using Akka and Scala, I used the language and the framework in combination with Twitter4j, Gradle, Postgres, and ScalikeJDBC.

What did I learn?

  • Twitter has a lot of bots on their platform, and they’re incredibly good at detecting the most basic bots (like mine was).
  • Twitter users hold a LOT of contests. Some of them lack applicants, and others have thousands.
  • Working with Twitter through a bot is a risky “dark art.” (There are lots of rumors about how to work with Twitter appropriately, even for legitimate business reasons.)
  • I learned more about Akka Routers. (They’re pretty cool, but can be a little difficult to tune)
  • [Later, when I started working with the Reddit API] I learned about the RateLimiter functionality in Guava. That’s some pretty cool stuff. I’m not sure that it would be very useful with Twitter4j, since the limits are a bit more granular with Twitter.
  • Scala/Akka is INCREDIBLY fast when you build out your application right. I burned through the Twitter API limit within a minute.
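
To illustrate the rate limiting idea with Twitter’s more granular limits in mind, here is a minimal Python sketch that keeps one limiter per action type. It is a hand-written token bucket, not Guava’s RateLimiter, and the class name, the `allowed` helper, and all of the rates and capacities are assumptions made up for illustration. They are not Twitter’s real limits.

```python
import time

class TokenBucket:
    """Allows bursts up to `capacity`, refilled at `rate_per_sec` tokens per second."""

    def __init__(self, rate_per_sec, capacity, clock=time.monotonic):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.clock = clock          # injectable so the bucket can be tested
        self.tokens = float(capacity)
        self.last = clock()

    def try_acquire(self):
        """Take one token if available; return False instead of blocking."""
        now = self.clock()
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# One bucket per action type. These numbers are invented for the example.
limiters = {
    "retweet": TokenBucket(rate_per_sec=1 / 60, capacity=5),
    "follow": TokenBucket(rate_per_sec=1 / 120, capacity=2),
}

def allowed(action):
    """Check the bucket that belongs to this kind of action."""
    return limiters[action].try_acquire()
```

Because each action type has its own bucket, running out of follows would not stop the bot from retweeting, which seems closer to how the daily limit messages were split up.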

So you were banned? What happened?

I believe I was banned because I was too aggressive at following and retweeting, and I got caught passing a threshold. It’s not too terribly surprising, since my application was a lot faster than I thought it would be. I never got the chance to set this up to run as a script. At the moment, the application behind the program has “Write restrictions.” (I’m not totally banned, but it severely cripples the functionality.)

False Positives:

One of the biggest false positives that I found was due to tweets about sports teams. For example, if someone were to post “RT: SportsTeam Jim’s Team had their fourth win,” my bot would pick up on that. I partially solved this issue with a keyword and username blacklist.
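
A blacklist check like that can be sketched in a few lines of Python. This is not the bot’s actual code (which was written in Scala). The function name and the example blacklist entries are hypothetical.

```python
import re

def is_blacklisted(tweet_text, username, keywords, usernames):
    """True if the author is blacklisted or the text contains a blacklisted word.

    `keywords` and `usernames` are sets of lowercase strings.
    """
    if username.lower() in usernames:
        return True
    # Compare whole words so that a keyword does not match inside a longer word.
    words = set(re.findall(r"[a-z0-9']+", tweet_text.lower()))
    return any(keyword in words for keyword in keywords)

# Example entries, made up for illustration.
KEYWORDS = {"playoffs", "touchdown"}
USERNAMES = {"sportsteam"}
```

For example, `is_blacklisted("RT: SportsTeam Jim's Team had their fourth win", "sportsteam", KEYWORDS, USERNAMES)` is rejected because of the username, even though none of the keywords appear in the text.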

My bot also picked up other people’s retweets of the same tweet. I resolved this by drilling down to the root tweet of each retweet. However, this wasn’t always possible. This also cut down on multiple attempts to reply to the same tweet.
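
Here is a rough Python sketch of that root lookup plus the de-duplication that falls out of it. Tweets are modeled as plain dictionaries, with a `retweeted_status` key (named after the field in Twitter’s JSON) pointing at the tweet that was retweeted. The function names are hypothetical, and a tweet without that link is simply treated as its own root.

```python
def root_tweet(tweet):
    """Follow retweet links until reaching a tweet that is not a retweet."""
    while tweet.get("retweeted_status") is not None:
        tweet = tweet["retweeted_status"]
    return tweet

def unique_roots(tweets):
    """Resolve every tweet to its root and keep each root only once, in order."""
    seen = set()
    roots = []
    for tweet in tweets:
        root = root_tweet(tweet)
        if root["id"] not in seen:
            seen.add(root["id"])
            roots.append(root)
    return roots
```

Acting only on the list returned by `unique_roots` means that ten retweets of one giveaway lead to a single action against the original tweet.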

Part of the reason for these false positives was that the search API produced partial matches. I resolved this by searching again through the tweets that were returned.
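
That second pass over the results might look something like the following Python sketch. The helper names and the required words in the example are assumptions. The idea is only that a tweet is kept when every word I asked for actually appears in its text.

```python
import re

def contains_all(text, required_words):
    """True if every required (lowercase) word appears as a whole word in text."""
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    return all(word in words for word in required_words)

def refilter(tweet_texts, required_words):
    """Drop search results that only partially match the query."""
    return [text for text in tweet_texts if contains_all(text, required_words)]
```

For instance, with `{"rt", "win"}` as the required words, “What a win for the team” would be dropped while “RT and follow to win!” would be kept.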

I would consider this next one to be a false positive, even though by definition it isn’t one. There were a lot of contests that revolved around pop stars. I resolved this with a word blacklist filter. (In particular, I had to blacklist gaga, 5OS, and bieber [shudder].)

Other observations:

  1. There are a ton of bots on Twitter that’ll automatically follow you back if you ever interact with their account.
  2. There are a lot of bots that’ll DM you if you follow their account. These got incredibly annoying because they pushed an advertisement directly to your message box.
  3. There is a guy who writes a ton of ebooks about the “Dog who _____” (Ate the airplane, burglar, drawing, etc…) [That’s one hungry dog].
  4. There are some bots that look for the “first tweet” message that Twitter encourages you to send. That was pretty shady.

Ok what did you win?!

I didn’t win very much. I won some “credit” in a free-to-play game, and I won 2 tickets to a CalTech vs UCSB basketball game. (I declined the tickets and didn’t take the “credit.”)

What would I do differently now?

  • I would expand the exception types that Twitter4j returns back. I would give more of a response of
  • I would have respected the API limits a bit more. (I would be a bit more conservative about the number of giveaways that I would follow.)
  • I would probably queue up all of the actions before doing them so that I would avoid false positives at the last step.
  • I would have extended Twitter4j’s rate limiting functionality.
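
The queueing idea from the list above could be sketched roughly like this in Python. Everything here is hypothetical (the `ActionQueue` class, the validator functions, and the `budget` parameter). The point is only that actions are collected first, checked one last time, and then executed up to a conservative cap.

```python
from collections import deque

class ActionQueue:
    """Collects (action, tweet) pairs and runs them later, after a final check."""

    def __init__(self, validators):
        self.validators = validators   # functions that take a tweet and return bool
        self.pending = deque()

    def enqueue(self, action, tweet):
        self.pending.append((action, tweet))

    def drain(self, execute, budget):
        """Run at most `budget` queued actions that pass every validator.

        Actions that fail validation are dropped. Returns how many were executed.
        """
        done = 0
        while self.pending and done < budget:
            action, tweet = self.pending.popleft()
            if all(check(tweet) for check in self.validators):
                execute(action, tweet)
                done += 1
        return done
```

Anything left in `pending` after a call to `drain` simply waits for the next run, which keeps the bot under whatever budget was chosen for that run.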

In Conclusion:

I don’t think I’ll continue development on this. I’m not sure that Twitter is going to be lenient about re-allowing my application, especially since I’m reluctant to share the source and I don’t have anything public-facing that they could inspect before reapproving it. I got enough value out of the experience alone to be satisfied.

Seven Databases in Seven Weeks: Postgres

After finishing “Coders at Work” (more on that in a future blog post), and having little experience with non-RDBMS databases, I picked up the book “Seven Databases in Seven Weeks” by Eric Redmond and Jim R. Wilson. The book appears to be of similar quality to its sibling, “Seven Languages in Seven Weeks” by Bruce Tate.

The book starts out with the Postgres database. At the time of writing, this database wasn’t as popular as MySQL; however, it does make a good baseline for comparison. It represents the “old guard” of databases. I found that the first half of the first week was not of much interest to me. However, the fuzzy search and full-text search extensions caught my attention. I have always been aware that these capabilities existed, but I never knew how they worked. Additionally, the downloadable source code helped with creating a testing environment right out of the box. The same was true for the “cube” extension/datatype. I found it very exciting to find out that you could do some rather interesting operations with multidimensional data and queries. I can’t claim that I’m an expert on using these features, but it’s rather nice to have some hands-on experience with them.

I don’t believe that this content was the greatest value of the book. What gave the book its greatest value was that investigating the cube package further led me to an online directory of the available extensions: the PostgreSQL Extension Network (PGXN). How exciting is it to find a directory of extensions to a fairly standard database that allows you to do some cool things? You can find extensions to interact with JSON data, store bitmaps, keep key/value data, add aggregation functions, compute weighted averages (this is a VERY interesting addition), and even attempt “connected regions” logic within data items. These are reusable components that others have created, and I found that I could get the database to perform these actions rather than code them myself.