I made a new thing: serialization-checker

So I just made a new thing, and open sourced it.

It’s called the serialization checker. From the readme page it’s here to solve:

The root problem that led to this project’s creation is that REST typically uses JSON, and that JSON is Schemaless. This makes it difficult to create data objects to interact with services. In the case of connecting to a third-party REST service, you typically have lots of examples. This project helps you, the developer, iterate through the creation of the data objects.

Where can you find this?

Github page: https://github.com/monksy/serialization-checker

Your project:

resolvers += Resolver.bintrayRepo("monksy","maven")

libraryDependencies += "com.mrmonksy" %% "serialization-checker" % "0.1.3"

Or even it’s Bintray: https://bintray.com/monksy/maven/serialization-checker

What have I been learning lately/What Have I been reading lately (March/April 2018)

It’s been a while since I’ve done a blog post like this.

However, I’ve been busy and I have a few things to show for it:

Exciting Meetups Attended:

Projects That I’ve been working on:

    • Scala99
    • Temperature Sensor Data Generator
      • This is a utility project that will generate a series of data from sensors that closely resemble a day’s change in temperature.
      • This is to generate a large enough dataset to really demonstrate large-scale distributed processing
    • CQRS Framework
      • An extendable framework used to track events throughout a data object’s lifespan.
      • Using Shapeless
      • Currently on hold until I can fully wrap my head arround Shapeless
    • Sample Akka-HTTP Based Application for Inventory Pricing
      • This is a sample Akka HTTP Based application that responds to Time based requests for inventory.
      • Akka HTTP was a bit irritating to setup the routing.

Technologies I’m Learning Right Now

  • Apache Spark
  • The Play Web Framework
  • Amazon RedShift
  • Shapeless

Books Read:

Things I’ve Mastered/Dealt with Cooking

  • Sous Viding
    • Experimented with Octopus (They were a bit too small to get right, and this was done with Sous Viding)
    • The Perfect Steak and Crust on the outside
    • Dry Aging Ribeye Steaks
  • Keto
    • Lemon Bars
    • Tirmasu Fudge
    • I’m very close to making a stock

Books I’m Currently Reading

Topics I want to learn/read about

  • Optaplanner
  • More with IOT
    • I had a chance to work with a Seed WIO wifi based IOT board
    • I bought a Nano PI from FriendlyElec.
  • Cassandra
  • ElasticSearch
  • Going further indepth with Kafka
  • Akka Typed

 

You can Either Try or not (Scala: Try, Either)

One of the things that I recently discovered in Scala is Try and Ether statements. They’re extremely cool to work with!

To Try

In the order I learned about them, I’ll explain what a Try is. It is a method that wraps the incoming and handles all exceptions that come out of it. If an exception comes out of it the value that is represented by the Try will be returned as Failure([exceptionvalue]). If the result was a success then the object Success([result value]) is returned. Pretty cool right? This means you don’t have to setup a Try/Catch everytime that you have an exception thrown. Combined with pattern matching, this will allow for you to simplify your statements that have Try/Catches. Even better: Try:Success is an option.

An example of this is:

class demonstration {
  def getGoogleResults(keyword: String) = {
    var googleClientResult = null
    try {
      val googleClient = new GoogleClient(authKeys)
      googleClientResult = googleClient.query(keyword)
    } catch (exception: Exception) {
      System.err.println(s"Error: $exception")
      throw exception
    }
    System.out.println(s"It found   $ {googleClientResult.size}")
  }

  def getBetterGoogleResults(keyword:String) = {
    val googleClient = new GoogleClient(authKeys)
    val googleResults = Try(googleClient.query(keyword))

    googleResults.success.map(v => System.out.println(s"It found ${v.size}")
      googleResults.failure.map(e => {
      System.err.println(s"Error $e")
      throw e
    })
  }
}

The getGoogleResults method shows the Java style of doing this. The getBetterGoogleResults shows the shorter functional try way of going about this. It reduced a bit of boilerplate and made the code a lot easier to read. Something to note: success and failure coming off of the Try object are Options.

How does this work? (Either it does or it doesn’t)

Try is a case of a language feature called Either in Scala. Either is a way to return two possible non-common objects. In other words, you can return either a String or an Integer from a method. Either uses generics to type each case (the left or the right), and it handles the direction of what object that comes back.

Since there is a possibility of different types, it’s not possible to have a single method to get the results of the try. (That would be possible in Groovy!) You have to work with both possible types (as we showed with the success and failure responses in the example above). Luckily, the left and right values coming off of the Ether are Options, which makes this less painful.

Going back to the Try vs. Either reference: A try is and Either object defined as having a Success and Failure object. The other difference is that Try handles exceptions, and it has special names for success and failure added to the class.

References

Some of the things that I’ve been up to in the last few months (March-June)

Some of the things that I’ve been up to in the last few months:

  1. I saw Metallica for the first time a few days ago! That was pretty awesome. I would say that was either my third or second best concert that I’ve ever seen. The first best was Rammstein at the Open Air Festival last year.
  2. I learned about Try/Ether in Scala. That’s pretty awesome. Expect a blog post coming out about that
  3. I started doing the X Effect challenge for 50 squats a day over 49 days. So far I’m 40 days in and I’ve only missed 3 days. That’s pretty awesome.
  4. Wearing sunglasses when it’s sunny is a good idea. It helps to prevent your eyes from straining. I feel like I should have learned this a long time ago. Doh.
  5. I’ve been experimenting with turning my phone off while at work and during social events. This has helped my concentration and focus immensely. Although, it has been a learning experience with quick attempts to check the phone.
  6. I’ve taken a hiatus from side projects for the last 4 months. It’s been nice to have a life back and not worry about getting a huge project done.
  7. My parents came up to Chicago and we visited Davenport, IA. We got to see the Mississipi River. 
  8. I got a fantastic new camera: the Lumix LX10. I love the thing. It’s been taking some great pictures.
  9. I’ve recently started planting more house plants. I added more snake plants, bird of paradise, cactus, and an aloe plant. Outside: I replaced a large weed with an annual Lilly.
  10. I learned how to make side cars with the brandy that my friend Erik gave me for my birthday.
  11. I learned about how amazing Oaxacian tamales are. I had these at the Taste of Little Village Festival

I’m sure that I’ve been up to more, but that’s all that comes up to mind

From “The Zen Of Akka” By Konrad Malawski: Protip: Logging Formatting

I was watching the Zen Of Akka by Konrad Malawski, when he came across an interesting tidbit.

When writting out to log files, use the {} formatting options rather than string interpolation. If you’re doing debug logging, and it’s turned off: No formating is done. With String interpolation, the string is formated prior to executing the method.

That was something that I never really thought about, but he’s right!

 

Akka/Scala vs Groovy for a Periodic Process

Last fall I created two different processes that were responsible for a reoccurring task of pulling data and broadcasting it elsewhere. The Akka based project pulled the newest meetups from the meetup page and then rebroadcasted them via Reddit and Twitter. The Groovy-based application pulled posts from a forum and then created a summary post of the posts pulled.

Both applications were reoccurring and were only activated on a schedule. However, there were two different outcomes. The Groovy based jar was scheduled as a cronjob and exits when done. The Akka process is set up as a systemd service and remains up.

Last fall I created two different processes that were responsible for a reoccurring task of pulling data and broadcasting it elsewhere. The Akka based project pulled the newest meetups from the meetup page and then rebroadcasted them via Reddit and Twitter. The Groovy-based application pulled posts from a forum and then created a summary post of the posts pulled.

Both applications were reoccurring and were only activated on a schedule. However, there were two different outcomes. The Groovy based jar was scheduled as a cronjob and exits when done. The Akka process is set up as a systemd service and remains up.

How did it turn out?

Both of the solutions work and have required very little maintenance. When running, the Groovy process takes up less than 70mb of memory, and the Akka based process takes more than 200mb  of memory. [It’s showing as 0.3% memory usage on a 65gb machine] (It’s written in Scala, and brings in Akka) Nether of the processes are intense enough to make a noticeable effect on the CPU. The ultimate package size is the following: Akka- 42mb Groovy- 12mb. (That is a little deceptive as that that the Groovy process contains less 3rd party libraries, but the amount of libraries that Scala and Akka bring in are a lot).

Now it comes down to probably the biggest concern: The time it took to develop. It took a month of lunches to develop the Scala and Akka based application. It took so long because I had to find and get up to speed on the Scala-based libraries for working with Rest services (I had spun my wheels with different clients), Scalalikejdbc, and Twitter4j. I learned another lession: Don’t use SBT. It’s a pain to use when compared to Gradle or Maven. On top of all of that I had a very challenging lesson: Don’t mix Scala versions for dependencies that weren’t compiled against the Scala version that your application is using.

The Groovy-based application took three lunches to write. One lunch (~30-40min) to write the main logic, the rest for the packaging. (The worst part of this was researching on how to make a fat Jar for Groovy under Gradle).

What did I learn?

Akka is an amazing piece of technology. I like using it, and I like using Scala. However it turns out for a process that is meant to be periodic, using REST calls: you are far better off writing the process in Groovy, letting Linux handle scheduled execution and getting it done quicker.

When would I use Akka? I would use it for a system that would expect constant requests, and that it had to be highly reactive, may include complexity, and would expect a high throughput. (The REST service is a good example for this)

A few things that I would like to see in Docker

I’ve mentioned before that I’ve rather taken a liking to Docker. However, there are a few things that I wish that were better.

Docker Doc

Java doc did something extremely well. (Other than getting people to hate checkstyle) it made sure there there was a standard way to write documentation on your code base, and that it could be formatted in a proper, consistent way.

It would be great to see if there was a similar tool for Docker to produce consistent documentation on what the dockerfile produces. It would also be great to have registry support for this as well.

Some of the things that I would expect to see mentioned on there:

  1. Required environment variables for the application
  2. Documentation on maintenance to be done on the container (i.e. run particular methods via the exec command)
  3. Ports that should be shared
  4. What volumes are assumed to be there
  5. How the configuration should be set  (is it a file, etc).
  6. How to connect the application with other services (databases, message queues, etc)

Docker Templates

It would be great to generate best practice docker files based on if you’re working with a Tomcat application, or a standard Jar. This would get rid of the creation time needed to make a Dockerfile. Along with Dockerfile templates, it would be nice to see better Jenkins publishing support for Docker. However, that’s a subject for another day.

Docker Traits

Multiple inheritance has a lot of issues that Java avoids, and C++ deals with. The issue with this is avoided in Python (with mixins), and Scala/Groovy with the addition of traits. Traits are aways to side load content into the docker file, post the base container, without having to use inheritance. That means that you could create a Docker container with utilities rather than baking them into the base image.

Application Profiles

Applications typically have a limited set of configurations. They are either one time running applications or they’re web applications. There are sometimes deviations from this, but they’re not so common.

For the most part, one time running applications require a configuration set to go into the application and a location to write out the data. For web applications it mostly needs a port to be open, configuration to a data storage system (i.e. MySQL, MongoDB, etc), and configuration information.

The best way that I can forsee this being setup is to have a standard mount location for these folders outside of the container, and to have them set as required inputs for starting the container.

Another item on configuration. I would love to see the configuration copied from the container to its outside storage for the starter configuration. This would mean that the Docker Registry complex configuration would have a new mounted volume outside of the container there with a configuration ready to have a change made to it.

The Scalafication of Java Libraries

Scala is an interesting language – it’s JVM based and compiled down to its Java equivalent. This gives it the ability to work directly with Java libraries. As it currently stands, there are quite a few libraries that are available under the Scala platform, but using the Java based libraries can be less than optimal.

For example, when using the Twitter4j library to send a tweet, you have to perform all of the setup rituals as you would in Java. An example of this is as follows:

val twitterClient: Twitter = new TwitterFactory(TwitterScanner.buildFromConfig().build()).getInstance()
val creds = twitterClient.verifyCredentials()
twitterClient.updateStatus(“Test”)

Ok, that was a little deceptive, as that the TwitterClient provides a means to get a singleton and it ingests the credentials via the properties. However, for most cases where the user credentials, the singleton won’t be used. How would this be different if the library was Scalafied?

I would propose that the class would be modified to allow for the following:


Twitter.updateSatus(“This is a message”)

How is this possible? Assuming that the Twitter client connection was already defined in the scope, tt’s possible due to implicit parameters. Implicit parameters are an extension of a function to bring outside context into a function. They can behave in the same way that a class bringing in contextual data into a constructor and producing a context dependent class. (This is similar to asking for the connection string and credentials for a class that is responsible for dealing with a database).

The definition for the class as defined would be the following:

Object Twitter {
    def updateStatus(val message: String)(implicit val twitterConnectionRequest : TwitterClient) : Int = {
      ....
}
}

Ultimately, the functionality will be the same, but the end result is that the code is easier to read, and you’ll be able to focus on functionality versus setup. How does this work? The second set of parameters allows for the isolation/grouping of parameters. Since all of the parameters in the second set are implicit, Scala allows for the use of the method without the second set. Additionally Scala matches the implicit values from the scope of where the method is being called.

For example the following Java code for dealing with a VMware like environment:

public class MajorUtility {
    public String cloneVMAndStartSync(String vmIdToClone, VMWareClient vmc) {
        String result = "";
        VMWareMachine vmw = vmc.getVM(vmIdToClone);
        if (vmw != null) {
            vmc.stop();
            VMWareMachine newMachine = vmc.clone();
            while (!newMachine.isReady()) {
                Thread.sleep(1000);
            }
            result = newMachine.getId();     
        }
        return result;
    }
}

Would become:

object MajorUtility {
  def cloneVMAndStartSync(vmIdToClone: String)(implicit val vmWareClient: VMWareClient): Optional[String] = {
    var result: String = Future[String] => { "" }
    val vmw: VMWare = getVM(vmIdToClone)
    if (vmw.isPresent) {
      stopVm()
      result = cloneVM(vmIdToClone).onSuccess { case value => value.getId }
    }
    result.get()
  }
}

The major difference between the two is examples are the following:

  1. All of the functional calls to work with the VMware server are implicitly sent into the function.
  2. The functions called are statically imported. (Via object classes in Scala)
  3. The cloneVM Method returns a Future object rather than a state. If the state fails, then the future returns a “onFailure” callback. (Which you can handle, but that’s beyond the scope of this article.)
  4. The value of getting the ID is changed in the future. This means that this operation is kicked off and the value is to be retrieved later.
  5. The code is a bit easier to understand as that the cloneVm operation would return immediately and it would put the processing in the background.
  6. The use of Optional simplifies the null check. (This is the case in the return statement and in the return variable.)

From here, the equivalence of the Java libraries to Scala bring added benefits. It gives the possibility for the code to be more readable, easier to write, and allows for the developer to focus on the functionality needed.

How I got my Bot Banned by Twitter

It started with a news article about a contest-winning bot on twitter written by Hunter Scott. Mr. Scott had tested a curiosity and was rewarded by a fairly impressive collection of stuff. What Mr. Scott observed was that a lot of accounts on Twitter were offering to give away prizes for retweeting and favoring.

What did I expect?

I had expected to get a few false positive. Also, I suspected that since the article: the giveaway market on Twitter would be flooded with bots. I also assumed that the Twitter API limit would be generous and the API experience would be as good as the web experience.

 

I found out that it was fairly easy to run past your limit on the Twitter API and when using Twitter4j. Twitter4j on a few things it doesn’t:

 

  1. Rate limit the requests
  2. Attempt to interpret non HTTP-OK results from the API. The returned exception from the API client was a generic TwitterException and gave the message that came back from Twitter. (Sometimes being “You’ve hit your daily (status update limit|retweet limit|follow limit.”)

 

Additionally what I found out was that when performing a query on the API, you don’t get as reliant results as you do on the web. I typically found that out of 18 results, only 8 of them would contain some combinations of the words that I would be looking for. I also found out that there were bots who looked for other bots like mine and started retweeting content.

What did I use to write this bot?

To get better at using Akka and Scala, I used the language and the framework in combination with: Twitter4j, Gradle, Postgres, and ScalalikeJDBC.

What did I learn?

  • Twitter has a lot of bots on their platform and they’re incredibly good about detecting the most basic bots (like mine was).
  • Twitter users hold a LOT of contests. Some of them lack of applicants, and others have thousands.
  • Working with Twitter with a bot is a risky and “dark art” (There are lots of rumors about how to appropriately work with Twitter. (Even for legitimate business reasons))
  • I learned more about Akka Routers. (They’re pretty cool, but can be a little difficult to tune)
  • [Later when I started working with the Reddit API] I learned about using the RateLimiter functionality in Guava. That’s some pretty cool stuff. I’m not sure that it would be very useful in Twitter4j, as that the limits are a bit more granular with Twitter.
  • Scala/Akka is INCREDIBLY fast when you build out your application right. I burned through the Twitter API limit within a minute.

So you were banned? What happened?

I was banned because I believe that I was too aggressive at following and retweeting and I got caught by passing a threshold. It’s not too terribly surprising since my application was a lot faster than I thought it would be. I never got a chance to bring this on as a script. At the moment, the application behind the program has “Write restrictions.” (I’m not total banned, but it severely cripples the functionality)

False Positives:

One of the biggest false positives that I found was due to tweets about sport teams. For example if someone were to post “RT: SportsTeam Jim’s Team had their fourth win” my bot would pick up on that. I partially solved this issue via a keyword and username blacklist.

 

My bot also found the retweets of others that had tweeted. I resolved this by diving down to the root level of the retweet. However, this wasn’t always possible. This also cut down on multiple attempts to reply to the same tweet.

 

Part of the issue about finding these false positives was that the search API produced partial matches. This was resolved by researching through the tweets returned.

 

I would consider this to be a false positive, but it’s not by definition. There were a lot of contests that surrounded pop stars. I resolved this via a word blacklist filter. (In particular I had to blacklist gaga, 5OS, and bieber [shudder])

Other observations:

  1. There are a ton of bots on Twitter that’ll automatically follow you back if you ever interact with their account.
  2. There are a lot of bots that’ll DM you if follow their account. These got incredibly annoying because it pushed an advertisement directly to your message box.
  3. There is a guy who writes a ton of ebooks about the “Dog who _____” (Ate the airplane, burglar, drawing, etc…) [That’s one hungry dog].
  4. There are some bots that look for your “first tweet” message that twitter encourages for you to tweet. That was pretty shady.

Ok what did you win?!

I didn’t win very much. I won some “credit” on a free to play game, and I won 2 tickets to a CalTech vs UCSB basketball game. (I declined the tickets, and didn’t take the “credit”)

What would I do differently now?

  • I would expand the exception types that Twitter4j returns back. I would give more of a response of
  • I would have respected the API limits a bit more. (I would be a bit more conservative about the amount of giveaways that I would follow).
  • I would probably queue up all of the actions before doing them so that I would avoid false positives at the last step.
  • I would have extended Twitter4j’s rate limiting functionality.

In Conclusion:

I don’t think I’ll continue development on this. I’m a bit reluctant as that I’m not sure that Twitter is going to be lenient on reallowing my application, especially when I’m reluctant to share the source or that I don’t have a public facing thing that they could inspect prior to reapproving. I think I got a lot of value out of this just from the experience to be satisfied.