Akka/Scala vs Groovy for a Periodic Process

Last fall I created two different processes responsible for a recurring task: pulling data and broadcasting it elsewhere. The Akka-based project pulled the newest meetups from the meetup page and then rebroadcast them via Reddit and Twitter. The Groovy-based application pulled posts from a forum and then created a summary post of them.

Both applications were recurring and only activated on a schedule. However, they took two different shapes: the Groovy-based jar runs as a cron job and exits when done, while the Akka process is set up as a systemd service and stays resident.
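To make the contrast concrete, here is a minimal sketch of how such a resident Akka process can schedule its own work, using the classic Akka scheduler API. The actor and message names (MeetupPoller, Tick) and the one-hour interval are illustrative, not the actual code from my project:

```scala
import scala.concurrent.duration._
import akka.actor.{Actor, ActorSystem, Props}

// Hypothetical message and actor names, for illustration only.
case object Tick

class MeetupPoller extends Actor {
  def receive = {
    case Tick => // pull the newest meetups and rebroadcast them
  }
}

object Main extends App {
  val system = ActorSystem("meetup-broadcaster")
  val poller = system.actorOf(Props[MeetupPoller], "poller")
  import system.dispatcher // execution context for the scheduler
  // Fire a Tick immediately, then once an hour, for as long as the service runs.
  system.scheduler.schedule(0.seconds, 1.hour, poller, Tick)
}
```

The ActorSystem keeps its own threads alive, which is exactly why the process has to stay up (and hold its memory) between runs.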

How did it turn out?

Both of the solutions work and have required very little maintenance. When running, the Groovy process takes up less than 70 MB of memory, while the Akka-based process takes more than 200 MB (it shows as 0.3% memory usage on a 65 GB machine; it’s written in Scala and brings in Akka). Neither process is intense enough to have a noticeable effect on the CPU. The final package sizes are: Akka, 42 MB; Groovy, 12 MB. (That is a little deceptive, since the Groovy process pulls in fewer third-party libraries, while Scala and Akka bring in a lot.)

Now it comes down to probably the biggest concern: the time it took to develop. It took a month of lunches to develop the Scala and Akka based application. It took so long because I had to find and get up to speed on the Scala libraries for working with REST services (I spun my wheels with different clients), ScalikeJDBC, and Twitter4J. I learned another lesson: don’t use SBT. It’s a pain to use compared to Gradle or Maven. On top of all of that came a very challenging lesson: don’t mix Scala versions, i.e. don’t pull in dependencies that weren’t compiled against the Scala version your application is using.
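The version-mixing pitfall comes from Scala breaking binary compatibility between minor releases: library artifacts encode the Scala version in their names, and every dependency has to match your compiler. A hedged Gradle sketch (library and version numbers are illustrative):

```groovy
// build.gradle (sketch; versions are illustrative)
dependencies {
    compile 'org.scala-lang:scala-library:2.11.8'

    // The _2.11 suffix marks a build compiled against Scala 2.11 --
    // it must match the scala-library version above.
    compile 'org.scalikejdbc:scalikejdbc_2.11:2.4.0'

    // Mixing suffixes, e.g. 'org.scalikejdbc:scalikejdbc_2.10:2.4.0',
    // produces confusing binary-incompatibility errors at runtime.
}
```

sbt papers over this with its `%%` operator, but with Gradle or Maven you have to keep the suffixes consistent yourself.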

The Groovy-based application took three lunches to write: one lunch (~30-40 min) for the main logic, the rest for the packaging. (The worst part was researching how to make a fat jar for Groovy under Gradle.)
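For the curious, one common recipe for that fat jar looks like the following. This is a sketch, not my exact build file, and com.example.Main is a placeholder main class:

```groovy
// build.gradle (sketch): bundle the app plus all runtime dependencies
// into one runnable "fat" jar.
apply plugin: 'groovy'

jar {
    manifest {
        attributes 'Main-Class': 'com.example.Main'
    }
    // Unpack every runtime dependency into the jar next to our own classes.
    // (On newer Gradle, use configurations.runtimeClasspath and set a
    // duplicatesStrategy.)
    from {
        configurations.runtime.collect { it.isDirectory() ? it : zipTree(it) }
    }
}
```

After `gradle jar`, the result runs with a plain `java -jar`, which is what makes the cron-job approach so simple.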

What did I learn?

Akka is an amazing piece of technology. I like using it, and I like using Scala. However, it turns out that for a process meant to run periodically and make REST calls, you are far better off writing it in Groovy, letting Linux handle the scheduled execution, and getting it done quicker.
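The Linux half of that arrangement is a single crontab entry; the schedule and jar path below are placeholders:

```shell
# m h dom mon dow  command -- run the summarizer at 07:00 every day.
# The jar path is a placeholder; the process does its work and exits.
0 7 * * *  /usr/bin/java -jar /opt/forum-summary/forum-summary.jar
```

No scheduler code, no resident memory footprint between runs, and failures are visible in cron's mail/logs.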

When would I use Akka? I would use it for a system that expects constant requests, has to be highly reactive, may involve real complexity, and needs high throughput. (A REST service is a good example of this.)

I really like Docker

Two years ago I wrote an article about my disdain for the popularity of creating virtual machines to host applications. I was discouraged by the attitude of creating images that weren’t easy to rebuild, and felt that it encouraged bad practices around application setup and maintenance. It also seemed incredibly wasteful in terms of storage, memory, and CPU cycles to do a full OS emulation. I was a fan of the ideas behind OpenVZ and LXC at the time, but I didn’t know a lot about those options and didn’t go forward with them. However, through cheap hosting providers I learned about the downsides of oversubscribed OpenVZ hosts.

Then Docker came to popularity. It took the LXC/cgroups route and made it easier to use. Docker doesn’t attempt to virtualize the entire stack; it reproduces a Linux environment within the container and isolates the process running therein. It’s basically a sandbox for the filesystem, processes, and network: the benefits of a VM without the full hardware emulation.

Why do I like it? There are a few reasons:

It’s portable– Most of the internal structure of the container is based on prebuilt images. To start up an app, just pull it down from its online repository. For example, starting an instance of Couchbase is a matter of running the following command:

docker run -d --name db -p 8091-8094:8091-8094 -p 11210:11210 couchbase

Compare another NoSQL server, Riak: previously you had to fight with spotty platform support and Erlang installs. With Docker, all of this is configured in the container and doesn’t negatively affect the host OS. Portability means you no longer have to figure out a new install procedure if the product ships a Docker image. On top of that, you can configure where the persistent storage is located; the application inside Docker has no knowledge of where it’s stored, nor does it care. The same goes for the networking configuration.

It’s opensource/free– Even the Docker repo, registry, and base images are freely available via Docker Hub. When you want to move away from this model, you can reproduce it within your own environment. On top of this, the Docker registry is a Docker container itself, and it allows for versioning.

It’s social and collaborative– With the introduction of Docker Hub, you can build your images on existing images. If you want to upgrade the underlying infrastructure (say, Ubuntu 14.04 to a newer release), it’s just a matter of upgrading the base image in your Dockerfile. That allows for testing and debugging in an isolated and repeatable manner (as opposed to making golden images). The organizations responsible for the products, e.g. Couchbase and Ubuntu, maintain their own official images that are frequently updated.

It’s easy to track changes– Everything about the build of a Docker image is based on a declarative script known as a Dockerfile. There are a few nuances in the script (e.g., how a RUN command differs from an ENTRYPOINT), but it’s fairly easy to create, update, and track changes (via source control).
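A minimal Dockerfile sketch shows both points at once: the whole build is a short, diff-friendly text file, and RUN executes at image build time while ENTRYPOINT executes at container start. The package and paths here are placeholders:

```dockerfile
# Upgrading the base image is a one-line change (e.g., to ubuntu:16.04).
FROM ubuntu:14.04

# RUN executes at *build* time and bakes its result into the image.
RUN apt-get update && apt-get install -y openjdk-7-jre-headless

COPY app.jar /opt/app/app.jar

# ENTRYPOINT executes at *container start* time, every time.
ENTRYPOINT ["java", "-jar", "/opt/app/app.jar"]
```

Commit that file to source control and every change to the environment has an author, a date, and a diff.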

So far the only downside to Docker that I’ve seen has been more of a creator issue: there is a tendency to try to containerize everything in the application environment, including a container for the database and another for storage, both frequently hard-linked to the application container. I realize that is helpful where you need a particular version; however, I would rather have a single database install on the host OS shared between containers, with its security maintained separately.

I gave a technical talk and so should you

Public speaking is a lot trickier than you’d think. It’s not hard, but it is mentally challenging, and it can be very rewarding. Most people can talk about their interests to their coworkers, friends, and family for a long period of time, yet struggle to talk about what they know to a large group. It’s a shame, and it’s an odd phobia to have: the perceived threat isn’t a physical attack or anything that threatens one’s life.

The common negative perception is based on fear of a poor performance, of making mistakes that are attributed to one’s self, and of how others will perceive you. The reality is that even if you do very poorly, it’s incredibly rewarding. On the upside, you may be invited to talk at other places or on other subjects, be considered a subject matter expert, get paid to give talks, and so on. In the worst case, you get stage fright, and most people can understand that. Even then, you gain a huge amount of respect from the audience, and you learn that you have something you can work on and improve.

Recently I gave a quick talk in front of my local technical community. I was encouraged by the Chicago Java Users Group to give a lightning talk. Lightning talks are brief five-minute talks on a particular subject. I’ve seen others do lightning talks before, and they seem to breeze by fairly quickly. Additionally, the format isn’t necessarily meant to showcase strong talks; it’s designed to get people up in front of an audience and give them a taste of presenting. A friend of mine, Mark Thompson, encouraged me to go through with my talk, which was on the Spock testing framework and why people should use it. Prior to the talk I was a bit nervous; there were 60-80 people there. During the talk I felt that it didn’t go as smoothly as it had in my head. However, the audience reaction was good, and my feedback was great. Watching the video later, despite a few quirks that should get ironed out, I think I did well for my first public technical talk.

Things I learned:

  1. Practice the talk beforehand. I believe I had a few issues during the talk because I hadn’t vocalized what I had planned to speak about.
  2. Practice with the equipment that you’re given.
  3. Make sure to have it recorded. This will give you valuable feedback later. Luckily for me, this was done by the CJUG crew.
  4. Always have others review your slides. (Did that.)
  5. Keep your slides as short as possible. (I did OK on that.)
  6. Dress for the part (dress up): I could have done much better on this.

Curious about the talk or Spock? If so, take a look at the video and feel free to respond with constructive criticism. I’m always willing to listen.

A side note about the talk: I did have something to do with coercing a few colleagues of mine into giving talks: Todd Ginsberg (A Short Introduction to Akka) and Mary Grygleski (An Introduction to Mule ESB). They did a fantastic job as well!

I don’t like Virtual Machines

I don’t like virtual machines. Given that my current position and employer involve building, maintaining, supporting, optimizing, and selling solutions and services built on them, this statement is a bit ironic. I don’t hate the benefits the technology has given us; it’s amazing what it has provided. It’s also amazing how you can scale up a service without having to bring in, and maintain, lots of new hardware. It’s more efficient and cost-effective than the old way of doing things, and it has progressed the development of operating systems and drivers.

The problem I have with virtual machines involves what they are. Put simply, a VM is an emulation of a physical machine, run on a host by software known as a hypervisor. Often, many virtual machines run on the same box using systems like VMware ESX Server, Xen, or KVM. Very frequently the difference between a physical machine and a virtual machine is small; the differences show up with 3D/low-latency applications and with VMs that depend on hardware input that cannot be emulated (e.g., random number generation in VMs). After reading that, and considering the downside, something should stick out to you. It’s something that should make you feel uncomfortable.

For me, it’s the fact that we have many VMs running on the same box. Much of the processing, storage, and memory is consumed by redundant operating system processes and their associated files. It seems really inefficient to have 30 instances of Windows Server 2012 all running IIS at the same time. The alternative to this madness is containers. I like the idea, and I would love to get a chance to learn more about OpenVZ and LXC when I have more time. I like containers because they are sandboxed, managed environments that push the processing power onto the actual job being performed. It feels more efficient, and more in line with solving the problem rather than creating more infrastructure.

Prior to the virtualization era, we were encouraged to build grid services. This was great: you could throw a lot of machines at a problem and have them work in harmony. However, it didn’t work as well as we hoped, due to the immature tools and frameworks offered at the time. In place of grid computing, the next approach was to split the problem into individual processing units and just throw a lot of machines at it; after all, VMs are “nearly free.” This really doesn’t fix the problem. It just seems like we’re timesharing on a powerful server once again.

Seven Databases in Seven Weeks: Postgres

After finishing “Coders at Work” (more on that in a future blog post), and having little experience with non-RDBMS databases, I picked up “Seven Databases in Seven Weeks” by Eric Redmond. The book appears to be of similar quality to its sibling, “Seven Languages in Seven Weeks” by Bruce Tate.

The book starts out with the Postgres database. At the time of writing, it wasn’t as popular as MySQL, but it makes a good starting point as a baseline of comparison: it represents the “old guard” of databases. The first half of the first week was not of much interest to me. However, the fuzzy search and full-text search extensions caught my attention. I had always been aware that the capabilities existed, but I never knew how they worked, and the downloadable source code helped with creating a testing environment right out of the box. The same was true of the “cube” extension/datatype: I found it very exciting that you could do some rather interesting operations with multidimensional data and queries. I can’t claim that I’m an expert in using these features, but it’s rather nice to have some hands-on experience with them.
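For a flavor of what the cube extension does, here is a small hedged example (assuming the contrib extension is installed; the specific points are made up):

```sql
-- Enable the contrib extension (ships with Postgres; may need superuser).
CREATE EXTENSION IF NOT EXISTS cube;

-- Treat points as degenerate cubes and ask for the Euclidean distance.
SELECT cube_distance('(0,0)'::cube, '(3,4)'::cube);  -- 5

-- Real cubes take corner pairs; containment tests come for free.
SELECT cube_contains('(0,0),(10,10)'::cube, '(4,5)'::cube);  -- true
```

That's nearest-neighbor and region logic on multidimensional data without leaving SQL.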

I don’t believe that content was the greatest value of the book, though. What gives the book its greatest value is that investigating the cube package led me to an online directory of available extensions: the Postgres Extension Network. How exciting is it to find a directory of extensions that let a fairly standard database do some cool things? You can find extensions to interact with JSON data, store bitmaps, keep key/value data, add aggregation functions, compute weighted averages (a VERY interesting addition), and even attempt “connected regions” logic within data items. These are reusable components that others have created, and I found I could have the database perform these actions rather than code them myself.

First Thoughts: “Coders at Work” by Peter Seibel

I’ve just started to read the book Coders At Work. The book is a nice, recent collection of interviews from many big name developers. I’ve read other developer interview books before, but this one sticks out in an unusual way: with most “interview” books, the interview is either completely boring or incredibly interesting. In Coders At Work, the interviews have varied between amazing and neutral. I haven’t gotten to a bad interview yet.

A few things jumped out at me and made me think. Jamie Zawinski’s interview made me wonder about the value of developers without formal education in “today’s market.” Brad Fitzpatrick’s interview reminded me of the “I’ll build everything” but you “must know everything” attitudes. Douglas Crockford’s interview didn’t inspire me, but it did make me consider other issues within software development.

Jamie Zawinski’s interview was an amazing conversation with a guy who has many interests in learning and doing work. He is a self-taught LISP developer who can occasionally get very opinionated. I found his work experience with Netscape fascinating: as a user of the early versions of Netscape, I never knew all of the politics or construction going on behind the scenes. I also found it technically intriguing that the pre-3.0 mail reader within Netscape was not written in C++. I have a lot of respect for Mr. Zawinski for being able to identify a potential bias of his own; he appeared very introspective when asked about hiring new developers. He understood that he could distinguish people he found reputable, but not those who would make good candidates.

One thing that struck me as a bit off-putting about Mr. Zawinski was his rejection of automated unit testing. I feel that if it had been as easy in the 90s as it is today, software would be VERY different now.

Brad Fitzpatrick’s interview left me with mixed feelings about the guy. I’m not sure if he is someone you would want to work with, but he sounds like the kind of guy you would want to share war stories with over drinks. He has worked on many interesting projects, most notably LiveJournal, and is one of the early “Growth Hackers” (http://en.wikipedia.org/wiki/Growth_hacking). I like his recommendation that you should spend time reading others’ code. He fights the immediate urge to ignore it, and his approach sounds different from what I had expected. I expected his way of making suggestions on other people’s code to be antagonistic; instead, it was described as the following:

  1. Code copies are distributed to the audience – in digital and paper form

  2. The developer presents their code line by line

  3. Q&A time

  4. Suggestions / feedback from the audience

This struck me as different from my experience, where code reviews tend to be either technically or personally antagonistic (or both). This approach is more like proofreading a paper you just wrote, or audience-testing a book.

Two things really put me off about Mr. Fitzpatrick: one of the questions he asks in interviews, and his insistence on knowing everything. Mr. Fitzpatrick’s “famous” interview/programming question is a recycled question from his AP CS exam: write a class that handles large-number arithmetic (i.e., rewrite BigDecimal). It appears that he uses his own previous experience as the baseline for evaluation, and I got the feeling that it is a way for him to show superiority over others (over something he did years ago as a high school student). Second, he is incredibly insistent on knowing how every layer works. He ranted against high-level language developers because they didn’t know that a specific way of polling may not work on a specific deployment machine. He even ranted against those who deploy to a VM, because the VM’s virtual-to-native OS/hardware mapping has been abstracted away. I feel that in 98% of cases he’s picking up pennies in front of a steamroller.

I was not very thrilled with Douglas Crockford’s interview. Primarily because it dealt with Javascript, it was a little too high-level for my taste. While reading it, my mind went back to Mr. Fitzpatrick’s interview, and it made me wonder how you find the “best” tools. I find it incredibly difficult to stay abreast of all the languages and tools available. Recently, for example, I learned how (and why) Git, Jenkins (plus the automated unit/lint/reporting/checkstyle plug-ins), and deep Maven knowledge are really good things to know if you’re developing in Java.

When new languages, tools, and frameworks come around, I love to read about them and learn how they work (if they’re useful and interesting enough). However, time is limited: how do you identify the tools that would solve the most pressing need you have?  Prior to Jenkins, I built everything via an IDE. Why would I need an automated build tool? I’m the only developer on the project. Prior to Git, I used Subversion – it did everything I needed. Why would I want to make “sub-commits”? Prior to Maven, why would I want to have the build tool automatically deploy the WAR file or require that all unit tests pass before generating an executable? (I’m running unit tests all the time anyway.)

Later, thinking about the code-reading suggestion, I realized I’m not very happy with the code review tools I know about. ReviewBoard looks nice, but it’s only for Ruby. Should I write my own? Where are the existing tools for Java (ones that can also integrate with Maven and Jenkins)? Are the tools good? Have others already solved this? Is it worth setting up a code review tool just for this? These are questions I’m not sure how to answer.

Overall, I really enjoyed that this book covers many topics: personal projects, interview questions, and famous debugging stories. I do occasionally enjoy a story bragging about how the developer’s language or tool was miles ahead, but after reading about their accomplishments in a serial fashion, it just gets old. Perhaps interspersing their accounts in a more conversational form would have made this book more interesting, and easier to recommend.

Similarities of the individuals who were interviewed:

  1. All have a strong focus on one particular project

  2. Each interviewee has worked in many companies

  3. None of them focused on the reputation of the company they have worked for

  4. All have interesting debugging stories

Installing Maven on CentOS 5 or 6/RHEL

At the moment there is no RPM package or yum install available for the latest version of Maven on CentOS, so the user is left to install Maven manually. To overcome this, I created a script to install the latest version (3.1.1 at the moment). There are still many things that should be added to the script; they’re listed in the TODO section of the documentation and may be added later.

Instructions on how to run the script, and the script itself, may be found at: https://github.com/monksy/centos-maven-install

Who would benefit from a TrueCrypt API?

On my drive back from the client to my hotel, I was thinking: wouldn’t it be cool if Seafile could synchronize to encrypted containers? From there I started thinking about what would be required to pull this off. First, you would have to extend Seafile to support an attribute that tells it whether the storage location is available, and then you would need something that could manage the encrypted container.

This led me to another idea: giving TrueCrypt an API. At the moment (version 7.1a), it does not provide one. However, it does provide an interface (GUI on Windows, CLI+GUI on Linux) and a system-level driver. If we were able to extend TrueCrypt to accept commands via an API, just consider the possibilities:

  1. Other applications would be able to securely store your files
  2. Seafile could download and synchronize to a secure location
  3. Shell extensions could be written for file managers to package folders into TrueCrypt encrypted files

Ok, that isn’t as impressive of a list as I thought it would be. However, I am confident that there are more use cases out there.

There is a current attempt to do just this, by nightnic; however, it only targets the Windows platform. One of the bigger hindrances to making your own API for TrueCrypt is that the source isn’t very extensible. For a user to include custom API support, he or she would have to compile the source and patch in the API code. Additionally, from the looks of it, TrueCrypt has a legal document in front of the source that may scare away those who wish to extend it.

Review: “What Compsci textbooks don’t tell you: Real world code sucks”

I’ve been catching up on my reading queue. I’ve been so busy in the last few months that many articles have slipped by me into my backlog. One article I’ve been meaning to review is “What Compsci textbooks don’t tell you: Real world code sucks.” The author claims that textbooks should acknowledge the messy world of software development, where much code is less than stellar.

I agree with a few of the author’s points on what causes bad code and designs, but I believe the author misses the point when he relates them to the content of textbooks. Textbooks are meant to be condensed learning resources; they tend not to be fluffy and full of relatable content. With a textbook, one should be able to reliably consume the facts of the subject, not the current commentary of the industry surrounding it. Textbook code written to resemble real-world practice would be an inefficient way to deliver those facts. Beyond a brief mention, it would be silly and unprofessional for a computer science textbook to make snide remarks on real-world coding practices, gender politics/representation in the computer industry, how most technologies are not used fully, or any other off-topic rants. A good textbook will stand the test of time.

In short, I believe that Mr. Mandl, the author, would be better served by writing about industry and social trends in learning.

A Skill that All Technical People Can Use (System administrators, Database Administrators, Developers, etc)

Top-notch communication skills are vital when presenting detailed technical topics. Poor communication skills are read as a lack of interest, understanding, or motivation. I realize that perception isn’t always true, but making the correct impression on others is extremely important.

I’ve seen many of these “sins” in technical presentations in graduate school and at local technical group meetings, and I’ve run into some of these issues personally. A very good video illustrating what I’m talking about is “Package Management and Creation in Gentoo Linux.” I realize that it’s easier to criticize than to present, but once you are aware of some of the issues below, the presentation becomes aggravating. Donnie Berkholz, if you are reading this, I am not criticizing you personally.

  1. Rambling
  2. Use of “umm” and other fillers
  3. Poor posture
  4. Bad lighting
  5. Having a monotone voice (This isn’t a huge issue in the video)
  6. Mumbling
  7. Failure to list assumptions (What skills should your audience have)
  8. Failure to explain why the content you’re presenting is valuable to the audience
  9. Unorganized slides
  10. Lack of overall summary of the presentation at the beginning
  11. Lack of prior practice
  12. Lack of mentioning of where to find the slides online
  13. Going off topic within the presentation
  14. Lack of confidence in material
  15. Overloading the slide with lots of unneeded details
  16. Fanboyism (I’ve seen it in a few of my fellow students’ presentations in graduate school)
  17. Not tabling questions that go off topic (Not shown in the video)
  18. Low quality graphics (Not an issue in the video)
  19. Speaking too quickly (Not an issue in the video)

How Would Someone Improve Their Presentation Skills?

It’s nearly impossible for presenters to find all of their own issues. Presenting is one of those things that requires feedback from others.

  1. Take a class on public speaking at a local university, technical school, or even a library. Local Toastmasters organizations can help with this, too.
  2. Test your audience before and after the material. The higher the post-presentation score, the more they learned. For this to be a good measure, it requires two tests with similar material. For example, if Mr. Berkholz were to apply this, he would ask about keywords, behavior, etc. in multiple-choice form. This will not address issues with social cues, but it gives an overall indication of how effective the presentation was.
  3. Give out a general survey to be filled out at the end. This works well for a large conference with lots of speakers. Ask questions such as “The speaker appears to be an authority on the subject [strongly disagree, disagree, neutral, agree, strongly agree]” or “The subject material is interesting…”
  4. Ask for the opinion of someone outside the industry the content fits into. For example, in Donnie’s case, ask a psychologist to be your practice audience member.
  5. Involve audience participation throughout the entire presentation. I love this piece of advice: it turns a simple, short presentation into a full-length lecture, and it helps you understand the audience’s interests.
  6. Practice your presentation beforehand – a lot.
  7. Review your slides beforehand and prepare secondary screens. Fumbling around with unnecessary external programs during the presentation irritates your audience.