What Have I Been Learning Lately/What Have I Been Reading Lately (March/April 2018)

It’s been a while since I’ve done a blog post like this.

However, I’ve been busy and I have a few things to show for it:

Exciting Meetups Attended:

Projects That I’ve been working on:

    • Scala99
    • Temperature Sensor Data Generator
      • This is a utility project that generates a series of simulated sensor readings that closely resemble a day’s change in temperature.
      • The goal is to generate a dataset large enough to really demonstrate large-scale distributed processing.
    • CQRS Framework
      • An extendable framework used to track events throughout a data object’s lifespan.
      • Using Shapeless
      • Currently on hold until I can fully wrap my head around Shapeless
    • Sample Akka-HTTP Based Application for Inventory Pricing
      • This is a sample Akka HTTP based application that responds to time-based requests for inventory.
      • Setting up the routing in Akka HTTP was a bit irritating.

Technologies I’m Learning Right Now

  • Apache Spark
  • The Play Web Framework
  • Amazon RedShift
  • Shapeless

Books Read:

Things I’ve Mastered/Dealt with Cooking

  • Sous Viding
    • Experimented with Octopus (They were a bit too small to get right, and this was done with Sous Viding)
    • The Perfect Steak and Crust on the outside
    • Dry Aging Ribeye Steaks
  • Keto
    • Lemon Bars
    • Tiramisu Fudge
    • I’m very close to making a stock

Books I’m Currently Reading

Topics I want to learn/read about

  • Optaplanner
  • More with IoT
    • I had a chance to work with a Seeed Wio Wi-Fi based IoT board
    • I bought a NanoPi from FriendlyElec.
  • Cassandra
  • ElasticSearch
  • Going further in depth with Kafka
  • Akka Typed

 

Something I love about the builder pattern

It makes the construction of similar objects a lot easier.

For this example, let’s assume that a ComplexPerson object represents a human that has parents, and has attributes such as age, name, etc.

ComplexPerson.Builder jimJaneFamily = ComplexPerson.builder().getParent(jimReference, janeReference);

To create new children from the same template, you just reuse the “jimJaneFamily” builder object, adjust the attributes that differ, and call build. A new builder object isn’t necessary, because the modifiable fields are simply overwritten and the built object no longer depends on the builder once the build method has been called. Reusing the builder object looks a lot cleaner than a fresh initialization for every new object.

That’s pretty cool.
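
Here is a minimal sketch of that reuse. The fields, the method names (parents, name, age) and the use of plain strings for the parents are my assumptions for illustration, not the real ComplexPerson class. It only works because build() copies the builder’s current state into the new object.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical ComplexPerson: only the fields needed for the sketch.
final class ComplexPerson {
    final String name;
    final int age;
    final List<String> parents;

    private ComplexPerson(Builder b) {
        this.name = b.name;
        this.age = b.age;
        this.parents = b.parents;   // unmodifiable list, safe to share between siblings
    }

    static Builder builder() {
        return new Builder();
    }

    static final class Builder {
        private String name = "";
        private int age;
        private List<String> parents = Collections.emptyList();

        Builder parents(String... names) {
            // Copy the array so later changes by the caller cannot leak in.
            this.parents = Collections.unmodifiableList(Arrays.asList(names.clone()));
            return this;
        }

        Builder name(String name) { this.name = name; return this; }

        Builder age(int age) { this.age = age; return this; }

        // build() copies the builder's current state into a new object,
        // so the builder can be adjusted and reused afterwards.
        ComplexPerson build() {
            return new ComplexPerson(this);
        }
    }
}

final class BuilderReuseDemo {
    public static void main(String[] args) {
        ComplexPerson.Builder jimJaneFamily = ComplexPerson.builder().parents("Jim", "Jane");
        ComplexPerson alice = jimJaneFamily.name("Alice").age(7).build();
        ComplexPerson bob = jimJaneFamily.name("Bob").age(4).build();
        System.out.println(alice.name + " and " + bob.name + " share parents " + bob.parents);
    }
}
```

The catch with this style is that anything you forget to overwrite carries over from the previous child, so it suits cases where most attributes really are shared.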

(Transitive) Dependency Hell Pt2: How it could have been avoided

In the last blog post, I went over an issue where multiple logging systems built into the dependencies caused issues in the main application. All of this could have been resolved by delaying the decision on what logging implementation should be used.

The initial fix that would have made my debugging session easier

Trying to resolve issues with conflicting logging frameworks is a huge pain. It would be great if there were an option to show which logging framework was set up in the first place and how. If there is already documentation for this, I would love to know where it is.

What are JSRs

Java Specification Requests (JSRs) are specifications, created by the community and by companies, that define an interface for a feature in Java. For the most part they define only the interface; no concrete classes are included in the specification, and the implementation is supplied separately on the classpath. If all of the sub-dependencies had been using an up-to-date JSR for logging, this issue would never have come up. The only error that could have appeared would be “No concrete class for interface found.” A quick search for logging-based JSRs turned up JSR-47.

For example, the REST specification, JSR-311, ships an API JAR. If you didn’t know how JSRs work, you’d believe that the JAR includes the functionality to work with REST services. It doesn’t: it requires a JSR-311 compatible implementation/provider (such as Jersey). Another example of this is JPA and the Hibernate JPA implementation.

On a related note, it would be better if library providers (e.g. the Amazon AWS SDK) provided a specification/interface collection for their libraries. If that were the case, libraries would be coded against the interfaces, and the implementation would be chosen at the highest level (where the code is actually run). This would improve testing, because you’re now writing code against an interface rather than an implementation you have to mock out, and it would reduce dependency conflicts. The AWS SDK has had changes in its package structure and issues with deprecated methods.
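
To make the idea concrete, here is a rough sketch of what coding against an interface collection could look like. BlobStore, ReportArchiver and InMemoryBlobStore are names I made up for illustration; none of them come from the AWS SDK or any JSR.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical specification: the only type that library code is allowed to see.
interface BlobStore {
    void put(String key, String value);

    String get(String key);   // returns null when the key is absent
}

// Library code depends on the interface only, never on a vendor SDK class.
final class ReportArchiver {
    private final BlobStore store;

    ReportArchiver(BlobStore store) {
        this.store = store;
    }

    String archive(String reportId, String body) {
        String key = "reports/" + reportId;
        store.put(key, body);
        return key;
    }
}

// One possible implementation, chosen at the top level (or in a test).
final class InMemoryBlobStore implements BlobStore {
    private final Map<String, String> data = new HashMap<>();

    @Override
    public void put(String key, String value) {
        data.put(key, value);
    }

    @Override
    public String get(String key) {
        return data.get(key);
    }
}

final class ArchiverDemo {
    public static void main(String[] args) {
        BlobStore store = new InMemoryBlobStore();   // swap in a vendor-backed store where the code is run
        ReportArchiver archiver = new ReportArchiver(store);
        String key = archiver.archive("2018-04", "monthly totals");
        System.out.println(key + " -> " + store.get(key));
    }
}
```

Only the top-level application would need the vendor dependency on its classpath; ReportArchiver and its tests never see it.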

If you take nothing else out of these two blog posts, take this: programming against collections of interfaces is far preferable to playing whack-a-mole with multiple conflicting dependencies. Filling your POM with exclusions is no fun, and it’s incredibly risky.

(Transitive) Dependency Hell (In Java) Part 1

Conflicting transitive dependencies are like letting 20 children fight over one piece of candy. At the end, one kid may have the candy, leaving everyone else crying, fighting, and/or hurt. In the very best case the right dependency gets used; frequently that’s not the case, and the winner may change if the class loader feels like it. Before I get into what happened, I should mention that this happened nearly a year ago, and this retelling is entirely from memory.

How did this happen? I was working on a project that required bringing in Spring CLI for a small utility. Spring CLI made wrapping methods in a command line interface incredibly easy. Side note: having to build up a CLI by hand to do this is absurd and makes the task needlessly complicated. (I’m looking at many of the CLI options out there.)

My project depended on java.util.logging (JUL), Log4j (whose configuration was conveniently ignored), and SLF4J. Since the project was logging what we needed (and we were lucky that it was), the conflicts between SLF4J and Log4j2 were ignored. JUL used SLF4J as its default logger. After all, JUL is a weak logging framework that’ll jump in bed with the next available logging implementation. Each implementation has its own configuration format, and if the implementation in use doesn’t match the configuration that you have, your configuration is simply ignored.

The downside to bringing in Spring CLI is:

Spring CLI brings in Apache Commons Logging: Oh crap.

Even worse: when I started looking into this issue, it wasn’t clear which logger was actually being used to write out the information. That means that even changing what you believe is the correct configuration to increase the logging verbosity won’t work. To find out which logger was winning out, I had to debug deep into the internals of where the logging was going on. It wasn’t much fun.
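
In hindsight, a small probe like the following might have shortened that debugging session. It is a sketch that only uses the JDK: it checks whether the well-known entry-point classes of the common logging libraries are visible on the classpath and prints where each was loaded from. It can’t tell you which one wins at runtime, only which contenders are present. The class name LoggingClasspathProbe is my own.

```java
import java.security.CodeSource;
import java.util.LinkedHashMap;
import java.util.Map;

final class LoggingClasspathProbe {
    // Entry-point classes of common logging APIs and backends.
    static final String[] CANDIDATES = {
        "java.util.logging.Logger",
        "org.slf4j.LoggerFactory",
        "org.apache.commons.logging.LogFactory",
        "org.apache.logging.log4j.LogManager",
        "org.apache.log4j.Logger",
        "ch.qos.logback.classic.Logger"
    };

    // Maps each class that could be found to the location it was loaded from.
    static Map<String, String> probe() {
        Map<String, String> found = new LinkedHashMap<>();
        ClassLoader loader = LoggingClasspathProbe.class.getClassLoader();
        for (String name : CANDIDATES) {
            try {
                // initialize=false: look the class up without running its static initializers
                Class<?> type = Class.forName(name, false, loader);
                CodeSource source = type.getProtectionDomain().getCodeSource();
                found.put(name, source == null ? "(JDK runtime)" : String.valueOf(source.getLocation()));
            } catch (ClassNotFoundException | LinkageError e) {
                // Not on the classpath (or not loadable): skip it.
            }
        }
        return found;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String> entry : probe().entrySet()) {
            System.out.println(entry.getKey() + " <- " + entry.getValue());
        }
    }
}
```

Run it with the same classpath as the application; every extra line in the output is a logging library that some dependency dragged in.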

From here, it’s key to decide which implementation you want to work with. I decided to go with Log4j2, because the configuration was already written for it. The next step was to exclude the other implementations from coming in. Running the Maven dependency:tree goal shows where each one comes from; find all of the alternative implementations and get rid of them. If you have a dependency that ships as a fat jar, you’re out of luck on this one. Fortunately, I didn’t have this issue.

That still left the issue unresolved. It seemed like other implementations were leaking in and/or missing. Once you have eliminated the other dependencies, you’re going to have to use bridges to stand in for the now missing transitive dependencies. The following new dependencies were added:

Oy, this has now made the POM rather large, and it’s not very clear what is going on or why you need all of that. Most of the issues should be resolved by this. However, JCL was still being redirected incorrectly within the Spring CLI output. *sigh*

After hours of debugging, I found that Spring CLI used JUL to write out the logging statements, and the logging was still trying to go through Commons Logging (but not writing it out/respecting the preferred logging).

This was resolved by setting the default logging manager at startup to the Log4j manager via the property:

java.util.logging.manager to org.apache.logging.log4j.jul.LogManager
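
A sketch of how that could look in code, assuming the log4j-jul adapter is on the classpath when the real application starts. The class name JulToLog4jBootstrap is made up. The important part is the ordering: the property has to be set before anything touches java.util.logging, because the JDK reads it when its LogManager is first initialized.

```java
final class JulToLog4jBootstrap {
    static final String PROPERTY = "java.util.logging.manager";
    static final String LOG4J_JUL_MANAGER = "org.apache.logging.log4j.jul.LogManager";

    // Call this first thing in main, before any JUL logger is created.
    // Returns the previous value of the property, or null if it was unset.
    static String install() {
        return System.setProperty(PROPERTY, LOG4J_JUL_MANAGER);
    }

    public static void main(String[] args) {
        install();
        // ... start the real application here ...
        System.out.println(PROPERTY + "=" + System.getProperty(PROPERTY));
    }
}
```

Passing -Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager on the java command line has the same effect and avoids the ordering problem altogether.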

 

In the next blog post: I will discuss some of the ways that all of this could have been avoided.

 

My Second Chicago Java User’s Group: I didn’t know you could do that with Groovy!

Previously I blogged about my first experience giving a brief tech talk in front of my local Java User’s Group. I recall that I was incredibly nervous and ill-prepared. Well, I took the feedback from that talk, prepared better, and came ready to give this one. The talk was on how Groovy makes quite a few improvements over Java, and it was in the format of a lightning talk. At the moment the video is being processed, and it should be up within a week or two.

While you’re waiting for the video, I’ll leave you with the slides and the GitHub link.

The Scalafication of Java Libraries

Scala is an interesting language: it’s JVM based and compiles down to the same bytecode that Java does. This gives it the ability to work directly with Java libraries. As it currently stands, there are quite a few libraries written for Scala, but using Java based libraries from Scala can be less than optimal.

For example, when using the Twitter4j library to send a tweet, you have to perform all of the setup rituals as you would in Java. An example of this is as follows:

val twitterClient: Twitter = new TwitterFactory(TwitterScanner.buildFromConfig().build()).getInstance()
val creds = twitterClient.verifyCredentials()
twitterClient.updateStatus("Test")

Ok, that was a little deceptive, because the Twitter client provides a means to get a singleton that ingests the credentials via the properties. However, in most cases where user credentials are involved, the singleton won’t be used. How would this be different if the library was Scalafied?

I would propose that the class would be modified to allow for the following:


Twitter.updateStatus("This is a message")

How is this possible? Assuming that the Twitter client connection was already defined in scope, it’s possible due to implicit parameters. Implicit parameters are an extension of a function’s parameter list that brings outside context into the function. They behave much like a class that takes contextual data in its constructor and so becomes a context-dependent class. (This is similar to asking for the connection string and credentials in a class that is responsible for dealing with a database.)

The definition for the class as defined would be the following:

object Twitter {
  // twitterClient is picked up from the implicit scope of the caller.
  def updateStatus(message: String)(implicit twitterClient: TwitterClient): Int = {
    // ....
    ???
  }
}

Ultimately, the functionality will be the same, but the end result is that the code is easier to read, and you’ll be able to focus on functionality versus setup. How does this work? The second parameter list isolates/groups the contextual parameters. Since all of the parameters in the second list are implicit, Scala allows the method to be called without it. Scala then fills in the implicit values from the scope where the method is being called.

For example, take the following Java code for dealing with a VMware-like environment:

public class MajorUtility {
    // VMWareClient and VMWareMachine stand in for a VMware-like client API.
    public String cloneVMAndStartSync(String vmIdToClone, VMWareClient vmc)
            throws InterruptedException {
        String result = "";
        VMWareMachine vmw = vmc.getVM(vmIdToClone);
        if (vmw != null) {
            vmc.stop();
            VMWareMachine newMachine = vmc.clone();
            // Poll once a second until the clone is ready.
            while (!newMachine.isReady()) {
                Thread.sleep(1000);
            }
            result = newMachine.getId();
        }
        return result;
    }
}

Would become:

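import java.util.Optional
import scala.concurrent.{ExecutionContext, Future}

object MajorUtility {
  // getVM, stopVm and cloneVM are assumed to be imported from a Scalafied
  // client object; each one takes the VMWareClient implicitly.
  def cloneVMAndStartSync(vmIdToClone: String)
                         (implicit vmWareClient: VMWareClient, ec: ExecutionContext): Future[Optional[String]] = {
    val vmw: Optional[VMWareMachine] = getVM(vmIdToClone)
    if (vmw.isPresent) {
      stopVm()
      // cloneVM returns a Future; the ID is read once the clone has completed.
      cloneVM(vmIdToClone).map(newMachine => Optional.of(newMachine.getId))
    } else {
      Future.successful(Optional.empty[String]())
    }
  }
}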

The major differences between the two examples are the following:

  1. The client used for every call to the VMware server is passed into the function implicitly.
  2. The functions called are statically imported. (Via objects in Scala)
  3. The cloneVM method returns a Future object rather than a state. If the clone fails, the future’s “onFailure” callback is invoked. (Which you can handle, but that’s beyond the scope of this article.)
  4. The ID is retrieved inside the future. This means that the operation is kicked off and the value is to be retrieved later.
  5. The code is a bit easier to understand, because the cloneVM operation returns immediately and puts the processing in the background.
  6. The use of Optional simplifies the null check. (This is the case both for the lookup of the VM and for the returned value.)

From here, Scala equivalents of the Java libraries bring added benefits. They make it possible for the code to be more readable and easier to write, and they allow the developer to focus on the functionality needed.

Exit Codes: Why Java Gets it Wrong

Exit Codes

The standard protocol for command line tools in Unix is based on a few things: standard out, standard in, standard error, and the exit code. The exit code is the reason why the main function of a C program has int as its return type. That value is passed back to whatever executed the application (typically the shell). The expected values of an exit code are 0 for success and anything non-0 for failure. This gives the developer a way to communicate what went wrong very quickly.

Java is a weird beast in that regard. Because main returns void, a Java program reports a 0 exit code unless the JVM fails or an uncaught exception escapes main. This can be incredibly irritating when you want to create Java applications that are meant to be executed in a Unix environment or in a chained fashion (as the Unix philosophy intends applications to be run).

The workaround for returning a non-0 exit code is to call System.exit(<code>). This has 2 drawbacks. Firstly, it’s a very abrupt call and can introduce issues later down the line. (It could cause confusion as to why the application just stopped, similar to multiple return statements in a method.) Secondly, the shutdown request to the JVM is concerning: it doesn’t attempt to wind down any other running threads or give them a chance to finish before closing. For example, resources could remain unclosed or unfinished, temporary files may not be cleaned up, and network connections could be dropped. The only way to get a notification that this is happening is to set up a shutdown hook. (That is described in the documentation for System::exit.)
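
One way to soften both drawbacks is to keep System.exit in exactly one place. The sketch below is one possible arrangement, not a standard pattern from the JDK: the work happens in a run method that returns the exit code as an int, main is the only caller of System.exit, and a shutdown hook gives cleanup a single home. The class name and the code 2 for a usage error are my own choices.

```java
final class ExitCodeDemo {
    static final int OK = 0;
    static final int USAGE_ERROR = 2;

    // All real work happens here and reports its outcome as an int,
    // so it can be tested without terminating the JVM.
    static int run(String[] args) {
        if (args.length == 0) {
            System.err.println("usage: ExitCodeDemo <name>");
            return USAGE_ERROR;
        }
        System.out.println("Hello, " + args[0]);
        return OK;
    }

    public static void main(String[] args) {
        // Shutdown hooks run once System.exit is called; put cleanup here.
        Runtime.getRuntime().addShutdownHook(
                new Thread(() -> System.err.println("cleaning up")));
        int code = run(args);
        System.exit(code);   // the single exit point of the program
    }
}
```

From a shell, the program can then take part in chains such as `java ExitCodeDemo && echo ok || echo failed`.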

How I got my Bot Banned by Twitter

It started with a news article about a contest-winning bot on Twitter written by Hunter Scott. Mr. Scott had tested a curiosity and was rewarded with a fairly impressive collection of stuff. What Mr. Scott observed was that a lot of accounts on Twitter were offering to give away prizes for retweeting and favoriting.

What did I expect?

I had expected to get a few false positives. Also, I suspected that since the article came out, the giveaway market on Twitter would be flooded with bots. I also assumed that the Twitter API limit would be generous and that the API experience would be as good as the web experience.

 

I found out that it was fairly easy to run past your limit on the Twitter API when using Twitter4j. There are a few things that Twitter4j doesn’t do:

  1. Rate limit the requests
  2. Attempt to interpret non HTTP-OK results from the API. The exception returned from the API client was a generic TwitterException that carried the message that came back from Twitter. (Sometimes being “You’ve hit your daily (status update limit|retweet limit|follow limit).”)

 

Additionally, I found out that when performing a query on the API, you don’t get results as reliable as you do on the web. I typically found that out of 18 results, only 8 would contain some combination of the words that I was looking for. I also found out that there were bots that looked for other bots like mine and started retweeting their content.

What did I use to write this bot?

To get better at using Akka and Scala, I used the language and the framework in combination with: Twitter4j, Gradle, Postgres, and ScalikeJDBC.

What did I learn?

  • Twitter has a lot of bots on its platform, and it’s incredibly good at detecting the most basic bots (like mine was).
  • Twitter users hold a LOT of contests. Some of them lack applicants, and others have thousands.
  • Working with Twitter via a bot is a risky “dark art.” (There are lots of rumors about how to appropriately work with Twitter, even for legitimate business reasons.)
  • I learned more about Akka Routers. (They’re pretty cool, but can be a little difficult to tune.)
  • [Later when I started working with the Reddit API] I learned about using the RateLimiter functionality in Guava. That’s some pretty cool stuff. I’m not sure that it would be very useful with Twitter4j, because the limits are a bit more granular with Twitter.
  • Scala/Akka is INCREDIBLY fast when you build out your application right. I burned through the Twitter API limit within a minute.
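
For the “more granular” limits, something like a separate budget per action type is what I have in mind. The following is a plain-JDK sketch (it is not Guava’s RateLimiter, and not part of Twitter4j): each action gets its own allowance per time window, and the clock is injected so it can be tested. The class name, the action names and the numbers are invented; they are not Twitter’s real limits.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

final class ActionBudget {
    private final long windowMillis;
    private final LongSupplier clock;   // injectable so tests can control time
    private final Map<String, Integer> limits = new HashMap<>();
    private final Map<String, Integer> used = new HashMap<>();
    private long windowStart;

    ActionBudget(long windowMillis, LongSupplier clock) {
        this.windowMillis = windowMillis;
        this.clock = clock;
        this.windowStart = clock.getAsLong();
    }

    // Registers how many times an action may run per window.
    synchronized ActionBudget limit(String action, int maxPerWindow) {
        limits.put(action, maxPerWindow);
        return this;
    }

    // Returns true and consumes one unit if the action is still within budget.
    synchronized boolean tryAcquire(String action) {
        long now = clock.getAsLong();
        if (now - windowStart >= windowMillis) {
            used.clear();               // a new window has started: reset every counter
            windowStart = now;
        }
        int max = limits.getOrDefault(action, 0);   // unregistered actions are never allowed
        int count = used.getOrDefault(action, 0);
        if (count >= max) {
            return false;
        }
        used.put(action, count + 1);
        return true;
    }
}
```

A caller would check `budget.tryAcquire("retweet")` before each API call and skip or delay the action when it returns false.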

So you were banned? What happened?

I believe I was banned because I was too aggressive at following and retweeting, and I got caught by passing a threshold. It’s not too terribly surprising, since my application was a lot faster than I thought it would be. I never got a chance to bring this on as a script. At the moment, the application behind the program has “Write restrictions.” (I’m not totally banned, but it severely cripples the functionality.)

False Positives:

One of the biggest false positives that I found was due to tweets about sports teams. For example, if someone were to post “RT: SportsTeam Jim’s Team had their fourth win” my bot would pick up on that. I partially solved this issue via a keyword and username blacklist.

 

My bot also found retweets of tweets that others had posted. I resolved this by diving down to the root level of the retweet. However, this wasn’t always possible. This also cut down on multiple attempts to reply to the same tweet.
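
Roughly, the idea looks like the sketch below. TweetRef is a made-up stand-in for a tweet (an ID plus the tweet it retweets, if any), not a Twitter4j type, and the original bot was written in Scala rather than Java.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for a tweet: an id plus the tweet it retweets (null for an original).
final class TweetRef {
    final long id;
    final TweetRef retweetOf;

    TweetRef(long id, TweetRef retweetOf) {
        this.id = id;
        this.retweetOf = retweetOf;
    }
}

final class RetweetDeduper {
    private final Set<Long> seenRoots = new HashSet<>();

    // Follows the retweet chain down to the original tweet.
    static TweetRef rootOf(TweetRef tweet) {
        TweetRef current = tweet;
        while (current.retweetOf != null) {
            current = current.retweetOf;
        }
        return current;
    }

    // True the first time a root tweet is seen, false for every later sighting of it.
    boolean firstSighting(TweetRef tweet) {
        return seenRoots.add(rootOf(tweet).id);
    }
}
```

Acting only when firstSighting returns true is what keeps the bot from replying to the same giveaway several times.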

 

Part of the issue with finding these false positives was that the search API produced partial matches. This was resolved by searching again through the tweets that were returned.

 

I would consider this next one to be a false positive, but it’s not one by definition. There were a lot of contests that surrounded pop stars. I resolved this via a word blacklist filter. (In particular I had to blacklist gaga, 5OS, and bieber [shudder].)
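
Put together, the filtering steps above amount to something like the following sketch: re-check that every search word really is in the tweet, then reject blacklisted words and usernames. The class name GiveawayFilter and the example words are invented for illustration, and the real bot was written in Scala.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class GiveawayFilter {
    private final List<String> requiredWords;   // all expected in lowercase
    private final Set<String> blockedWords;     // lowercase
    private final Set<String> blockedUsers;     // lowercase

    GiveawayFilter(List<String> requiredWords, Set<String> blockedWords, Set<String> blockedUsers) {
        this.requiredWords = requiredWords;
        this.blockedWords = blockedWords;
        this.blockedUsers = blockedUsers;
    }

    boolean accepts(String username, String text) {
        if (blockedUsers.contains(username.toLowerCase(Locale.ROOT))) {
            return false;
        }
        // Split on anything that is not a letter or digit, so "RT:" becomes "rt".
        Set<String> words = new HashSet<>(
                Arrays.asList(text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")));
        for (String blocked : blockedWords) {
            if (words.contains(blocked)) {
                return false;
            }
        }
        // The search API returned partial matches, so insist on every word here.
        return words.containsAll(requiredWords);
    }
}
```

A filter built with the required words "rt" and "win" and the blocked word "team" would let “RT to WIN a new phone!” through and reject the sports tweet from the earlier example.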

Other observations:

  1. There are a ton of bots on Twitter that’ll automatically follow you back if you ever interact with their account.
  2. There are a lot of bots that’ll DM you if you follow their account. These got incredibly annoying because they pushed an advertisement directly to your message box.
  3. There is a guy who writes a ton of ebooks about the “Dog who _____” (Ate the airplane, burglar, drawing, etc…) [That’s one hungry dog].
  4. There are some bots that look for the “first tweet” message that Twitter encourages you to tweet. That was pretty shady.

Ok what did you win?!

I didn’t win very much. I won some “credit” on a free to play game, and I won 2 tickets to a CalTech vs UCSB basketball game. (I declined the tickets, and didn’t take the “credit”)

What would I do differently now?

  • I would expand the exception types that Twitter4j returns back. I would give more of a response of
  • I would have respected the API limits a bit more. (I would be a bit more conservative about the number of giveaways that I would follow.)
  • I would probably queue up all of the actions before doing them so that I would avoid false positives at the last step.
  • I would have extended Twitter4j’s rate limiting functionality.

In Conclusion:

I don’t think I’ll continue development on this. I’m not sure that Twitter is going to be lenient about re-allowing my application, especially when I’m reluctant to share the source and I don’t have anything public facing that they could inspect prior to reapproving. I think I got enough value out of the experience alone to be satisfied.