From “The Zen Of Akka” By Konrad Malawski: Protip: Logging Formatting

I was watching the Zen Of Akka by Konrad Malawski, when he came across an interesting tidbit.

When writting out to log files, use the {} formatting options rather than string interpolation. If you’re doing debug logging, and it’s turned off: No formating is done. With String interpolation, the string is formated prior to executing the method.

That was something that I never really thought about, but he’s right!

 

Akka/Scala vs Groovy for a Periodic Process

Last fall I created two different processes that were responsible for a reoccurring task of pulling data and broadcasting it elsewhere. The Akka based project pulled the newest meetups from the meetup page and then rebroadcasted them via Reddit and Twitter. The Groovy-based application pulled posts from a forum and then created a summary post of the posts pulled.

Both applications were reoccurring and were only activated on a schedule. However, there were two different outcomes. The Groovy based jar was scheduled as a cronjob and exits when done. The Akka process is set up as a systemd service and remains up.

Last fall I created two different processes that were responsible for a reoccurring task of pulling data and broadcasting it elsewhere. The Akka based project pulled the newest meetups from the meetup page and then rebroadcasted them via Reddit and Twitter. The Groovy-based application pulled posts from a forum and then created a summary post of the posts pulled.

Both applications were reoccurring and were only activated on a schedule. However, there were two different outcomes. The Groovy based jar was scheduled as a cronjob and exits when done. The Akka process is set up as a systemd service and remains up.

How did it turn out?

Both of the solutions work and have required very little maintenance. When running, the Groovy process takes up less than 70mb of memory, and the Akka based process takes more than 200mb  of memory. [It’s showing as 0.3% memory usage on a 65gb machine] (It’s written in Scala, and brings in Akka) Nether of the processes are intense enough to make a noticeable effect on the CPU. The ultimate package size is the following: Akka- 42mb Groovy- 12mb. (That is a little deceptive as that that the Groovy process contains less 3rd party libraries, but the amount of libraries that Scala and Akka bring in are a lot).

Now it comes down to probably the biggest concern: The time it took to develop. It took a month of lunches to develop the Scala and Akka based application. It took so long because I had to find and get up to speed on the Scala-based libraries for working with Rest services (I had spun my wheels with different clients), Scalalikejdbc, and Twitter4j. I learned another lession: Don’t use SBT. It’s a pain to use when compared to Gradle or Maven. On top of all of that I had a very challenging lesson: Don’t mix Scala versions for dependencies that weren’t compiled against the Scala version that your application is using.

The Groovy-based application took three lunches to write. One lunch (~30-40min) to write the main logic, the rest for the packaging. (The worst part of this was researching on how to make a fat Jar for Groovy under Gradle).

What did I learn?

Akka is an amazing piece of technology. I like using it, and I like using Scala. However it turns out for a process that is meant to be periodic, using REST calls: you are far better off writing the process in Groovy, letting Linux handle scheduled execution and getting it done quicker.

When would I use Akka? I would use it for a system that would expect constant requests, and that it had to be highly reactive, may include complexity, and would expect a high throughput. (The REST service is a good example for this)

I gave a technical talk and so should you

Public speaking is a lot trickier than you’d think. It’s not hard, its mentally challenging, however it can be very rewarding. The majority of the population can talk about their interests to their coworkers, friends and family for a long period of time. However, they will struggle to talk to a large amount of people about what they know. It’s a shame, and it’s an odd problem to have and it’s a phobia that some have. It’s a little odd because the perceived threat isn’t a physical attack or something that threatens one’s life.

The common negative perception based on poor performance, making mistakes that are “attributed” to one’s self, and perception from others. The reality of the situation is that even if you do very poorly: It’s incredibly rewarding. In the upside, you may be invited to talk at other places, talk on other subjects, you may be considered a subject matter expert, get paid to give talks, etc. On the worse case of this: You may get stage fright and most people can understand that. Even in the worse case: You gained a huge amount of respect from the audience, and you learned that you have something you can work on and improve.

Recently I gave a quick talk in front of my local technical community. I was encouraged by the local Chicago Java Users group to give a lightning talk. Lightning talks are brief five minute talks on a particular subject. I’ve seen others do lightning talks before and they seem to breeze by fairly quickly. Additionally the format of this isn’t necessarily to show strong talks, but it’s desgined to get people up in front of an audience, and give them a taste at presenting. A friend of mine, Mark Thompson, encouraged me to go through with my talk. My talk was on the Spock Testing Framework and why people should use it. Prior to the talk I was a bit nervous; there were 60-80 people there. During the talk I felt that it didn’t go as smoothly as I thought it would in my head. However, the audience reaction was good, and my feedback was great. I watched the video later: Despite having a few quirks that should get ironed out, I think I did well for my first public technical talk.

Things I learned:

  1. Practice the talk before hand. I believe that I had a few issues during the talk because I hadn’t vocalized what I had planned to speak about
  2. Practice with the equipment that you’re given
  3. Make sure to have it recorded. This will give you valuable feedback later. Luckly for me, this was done by the CJUG crew.
  4. Always have others review your slides (Did that)
  5. Keep your slides as short as possible (I did ok on that)
  6. Dress for the part (Dress up): I could have done much better on this.

Curious about the talk or Spock? If so take a look at the video and feel free to respond with constructive criticism. I’m always willing to listen.

A side note about the talk: I did have something to do with coercing a few colleagues of mine to do a talk. They were: Todd Ginsberg (A short introduction to Akka)  and Mary Grygleski (An introduction into Mule ESB). They did a fantastic job as well!

How I got my Bot Banned by Twitter

It started with a news article about a contest-winning bot on twitter written by Hunter Scott. Mr. Scott had tested a curiosity and was rewarded by a fairly impressive collection of stuff. What Mr. Scott observed was that a lot of accounts on Twitter were offering to give away prizes for retweeting and favoring.

What did I expect?

I had expected to get a few false positive. Also, I suspected that since the article: the giveaway market on Twitter would be flooded with bots. I also assumed that the Twitter API limit would be generous and the API experience would be as good as the web experience.

 

I found out that it was fairly easy to run past your limit on the Twitter API and when using Twitter4j. Twitter4j on a few things it doesn’t:

 

  1. Rate limit the requests
  2. Attempt to interpret non HTTP-OK results from the API. The returned exception from the API client was a generic TwitterException and gave the message that came back from Twitter. (Sometimes being “You’ve hit your daily (status update limit|retweet limit|follow limit.”)

 

Additionally what I found out was that when performing a query on the API, you don’t get as reliant results as you do on the web. I typically found that out of 18 results, only 8 of them would contain some combinations of the words that I would be looking for. I also found out that there were bots who looked for other bots like mine and started retweeting content.

What did I use to write this bot?

To get better at using Akka and Scala, I used the language and the framework in combination with: Twitter4j, Gradle, Postgres, and ScalalikeJDBC.

What did I learn?

  • Twitter has a lot of bots on their platform and they’re incredibly good about detecting the most basic bots (like mine was).
  • Twitter users hold a LOT of contests. Some of them lack of applicants, and others have thousands.
  • Working with Twitter with a bot is a risky and “dark art” (There are lots of rumors about how to appropriately work with Twitter. (Even for legitimate business reasons))
  • I learned more about Akka Routers. (They’re pretty cool, but can be a little difficult to tune)
  • [Later when I started working with the Reddit API] I learned about using the RateLimiter functionality in Guava. That’s some pretty cool stuff. I’m not sure that it would be very useful in Twitter4j, as that the limits are a bit more granular with Twitter.
  • Scala/Akka is INCREDIBLY fast when you build out your application right. I burned through the Twitter API limit within a minute.

So you were banned? What happened?

I was banned because I believe that I was too aggressive at following and retweeting and I got caught by passing a threshold. It’s not too terribly surprising since my application was a lot faster than I thought it would be. I never got a chance to bring this on as a script. At the moment, the application behind the program has “Write restrictions.” (I’m not total banned, but it severely cripples the functionality)

False Positives:

One of the biggest false positives that I found was due to tweets about sport teams. For example if someone were to post “RT: SportsTeam Jim’s Team had their fourth win” my bot would pick up on that. I partially solved this issue via a keyword and username blacklist.

 

My bot also found the retweets of others that had tweeted. I resolved this by diving down to the root level of the retweet. However, this wasn’t always possible. This also cut down on multiple attempts to reply to the same tweet.

 

Part of the issue about finding these false positives was that the search API produced partial matches. This was resolved by researching through the tweets returned.

 

I would consider this to be a false positive, but it’s not by definition. There were a lot of contests that surrounded pop stars. I resolved this via a word blacklist filter. (In particular I had to blacklist gaga, 5OS, and bieber [shudder])

Other observations:

  1. There are a ton of bots on Twitter that’ll automatically follow you back if you ever interact with their account.
  2. There are a lot of bots that’ll DM you if follow their account. These got incredibly annoying because it pushed an advertisement directly to your message box.
  3. There is a guy who writes a ton of ebooks about the “Dog who _____” (Ate the airplane, burglar, drawing, etc…) [That’s one hungry dog].
  4. There are some bots that look for your “first tweet” message that twitter encourages for you to tweet. That was pretty shady.

Ok what did you win?!

I didn’t win very much. I won some “credit” on a free to play game, and I won 2 tickets to a CalTech vs UCSB basketball game. (I declined the tickets, and didn’t take the “credit”)

What would I do differently now?

  • I would expand the exception types that Twitter4j returns back. I would give more of a response of
  • I would have respected the API limits a bit more. (I would be a bit more conservative about the amount of giveaways that I would follow).
  • I would probably queue up all of the actions before doing them so that I would avoid false positives at the last step.
  • I would have extended Twitter4j’s rate limiting functionality.

In Conclusion:

I don’t think I’ll continue development on this. I’m a bit reluctant as that I’m not sure that Twitter is going to be lenient on reallowing my application, especially when I’m reluctant to share the source or that I don’t have a public facing thing that they could inspect prior to reapproving. I think I got a lot of value out of this just from the experience to be satisfied.