My Christmas Wishlist For Groovy/Java/Scala

This will be an ongoing post until Christmas. These are the things I’ve been wanting to see in Java/Groovy/Scala, whether in the languages, the tooling, or the ecosystem.

  • Better support/integration to show what lines of code haven’t been tested
  • Visualization of code execution within threads.
  • A consistent templating solution for generating projects. (Maven’s archetypes need a good cleaning.)
  • Better organization of available libraries. (GitHub repositories like awesome-java and awesome-groovy exist, but they’re not great.)
  • Better support for highlighting inefficient and duplicated bits of code.
    • E.g., suggesting a lambda/stream replacement for a classic for loop
    • Searches for logic that is often repeated, and suggestions for refactoring hotspots.
  • PMD and FindBugs do some of this today, but it would be nice to encourage library creators (Commons or Guava) to ship FindBugs/PMD rulesets of their own.
  • A basic starting template for standard applications, REST Apps, or Web apps
    • Should include testing, Cucumber, the right plugins, etc.
  • Lazy evaluation keyword in Groovy. (You can do that right now, but it relies on closures; Scala has a lazy keyword.)
  • Groovy: Lazy evaluation on collections (Built into the collections)
  • Groovy: Case Classes for Groovy (We’re close to it, but it’s not as concise)
  • Maven: Composable POMs
  • Scala: Better compatibility amongst versions, and better support in Gradle for handling the Scala versions of dependencies
  • Postgres/Hibernate: Support for dynamic data structure expansion.
    • This is handy when you have an extra column of type JSON or HStore.
    • Think of a weak, MongoDB-like ability to store dynamic documents within the explicitly defined structure of a PostgreSQL database
  • Scala: Support for Scala only libraries that don’t involve bundling the Scala runtime. (These would be compiled into your project)
  • All libraries: Distribute the API interface separately from the implementation. (This is necessary for avoiding transitive dependency conflicts.)

“I wanted to like _______, but….”

I’ve been guilty of this. “I wanted to like x, but…” is a trite phrase that lends nothing but negative feelings and projects an image of unpolished opinions.

I’ve used it in many of my Yelp reviews and probably a few reviews on this blog. I even knew it was bad at the time. I just didn’t understand why or what to say instead.

What should I have written?
I should describe the things that attracted me to the place or thing, and what my expectations were. Then I should do a turnabout and describe why it didn’t match those expectations.
Once you get out the reason why you were persuaded in a particular way, you can evaluate whether those initial reactions were even valid, or whether the thing or place really was as deceptive as you found it to be. In short, it’s nothing but descriptive writing. You’re not writing about an action that took place, nor are you telling a story; it’s all descriptive writing about your experience of a situation. I think I fell into this trap because school reinforced descriptive writing only as a substitute for frozen visual imagery, rather than as a way of describing a situation.

I would like to say thanks to Grammarly. It was the grammar-as-a-service tool that pointed out the awkwardly worded phrases and unnecessary statements that made my writing unclear.

Things I’ve learned so far this week (6 Dec)

It’s been a very long time since I’ve done one of these, so I thought I’d share. Not all of this was learned in the last week, but it’s all from the recent past.

  • When you do port forwarding with SSH, the default bind address for the forwarded port is 127.0.0.1. This is a security feature that prevents the port from being used by outsiders.
    • If you want to expose it, enable the GatewayPorts option in the sshd configuration. (See the sketch after this list.)
  • Groovy has special DSL support for Cucumber. It makes writing Cucumber steps easier and makes environment hooks simpler to manage.
  • The obvious one, but rarely done in practice: creating a proof of concept in code to get your core logic working is a lot better than building out a full app and trying to make decisions later.
  • Bash: Substring extraction is possible in a very concise and easy manner. (Like most things in Bash, there are magical characters for it.) The same syntax makes sub-arrays possible. (See the sketch after this list.)
  • Bash function parameters are, err, different. Unlike most C-like languages, Bash refers to them by position rather than by name. (I.e. $1, $2, …)
    • You should define your local variables in your function near the top
    • All variables are global unless you declare them as local in bash
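
Here is a quick sketch of the SSH forwarding and Bash items above. The hosts, ports, and names are hypothetical, purely for illustration:

# SSH local port forwarding binds to 127.0.0.1 by default:
ssh -L 8080:internal-host:80 user@jumpbox    # only this machine can reach port 8080
# For remote forwards (-R), set "GatewayPorts yes" in /etc/ssh/sshd_config
# on the server to let other machines connect to the forwarded port.

# Bash substring extraction: ${var:offset} and ${var:offset:length}
s="Hello, world"
echo "${s:7}"          # prints "world"
echo "${s:0:5}"        # prints "Hello"
arr=(a b c d e)
echo "${arr[@]:1:3}"   # prints "b c d" (the same syntax yields sub-arrays)

# Bash function parameters are positional, and variables are global by default
greet() {
  local name="$1"      # declare locals near the top; omit 'local' and it leaks globally
  echo "hello, ${name}"
}
greet "steven"         # prints "hello, steven"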

Other

  • Magic shows can be pretty cool (Thanks, Michael Carducci! [NFJS])
  • I need to see more fire-breathing shows. (Rammstein included lots of fire displays.)

Secure Services – It’s not just for REST or SOAP anymore

A note: I have meant to write this blog post for years. I am finally knocking it off of my todo list. Hurray!

In the beginning, there was the terminal, and it was great. After networking came around, the terminal was networked and called telnet. It was thought to be good, but it turned out to be a disaster once people realized it was not secure. Then came the secure shell (SSH), a secure replacement for telnet. Later the feature set of SSH expanded into:

  1. Port Forwarding (used as a jump box/service)
  2. Proxying network traffic
  3. X11 forwarding
  4. Secure shell environment
  5. Key based authentication
  6. Secure File Transfer (SFTP)

There are a lot of different uses for SSH, and you can configure it to do some pretty extraordinary things, but that is out of the scope of this blog post. A great reference on SSH can be found in this book.

One of the features that caught my attention is that it is possible to create services that live purely in the Unix environment and are incredibly secure. The attack surface is small, communication is encrypted, and your environment is sandboxed (well, as much as you make it).

Authorized Keys

Passwords are an incredibly low-effort way to unlock a system. They tend to be short, and they can be brute-forced. (Even worse, they frequently have a small space of combinations, as they are human-chosen.) Randomly generated keys with lots of bits avoid this issue. They were added to SSH to enable passwordless login and to avoid sharing passwords. All of the public keys are stored in the user’s authorized_keys file.

Within authorized_keys, each entry has the following format:

<key type> <base64-encoded public key> <comment>

Within each line, it is possible to extend or restrict the features (the shell, a command to run, environment variables, etc.) of that particular login. (See the sshd man page for more details.)

To build a secure service, use the command option. The section “Your first service” below walks through an example.
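
As a concrete sketch, an entry with options might look like the following. (The script path and options here are hypothetical, and the key is truncated.)

command="/usr/local/bin/report.sh",no-port-forwarding,no-X11-forwarding,no-pty ssh-rsa AAAAB3...iJu+ElF7 steven@server

With this entry, a login with that key can only run /usr/local/bin/report.sh; it cannot forward ports, forward X11, or allocate a terminal.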

The setup

  1. To set up an account, you are going to need a private key. Generate one with ssh-keygen without a password. (You can use a password, but it will make automation tough.)
  2. Add the entry into the authorized_keys file (~/.ssh/authorized_keys) with the ssh-copy-id command. (Ex. ssh-copy-id -i [location of your public key] [user running the command]@[server])
  3. Your ~/.ssh/authorized_keys file should have entries similar to the following:
    1. cat ~/.ssh/authorized_keys
    2. ssh-rsa A……..iJu+ElF7 steven@server
      1. See the Authorized Keys section above for an explanation of what this means.
  4. Test access to the service by sshing into the box as that user with that key. (Sample command: ssh -i [private key] user@server)
    1. The first time you connect to a server with SSH, you will get a message asking whether it is OK to connect to that particular box. (This is something you will need to handle if you are automating the process as well.)
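
Put together, the setup might look like the following. (The key file name, user, and server are placeholders.)

# 1. Generate a key pair with no passphrase
ssh-keygen -t rsa -b 4096 -N "" -f ~/.ssh/id_sitetest

# 2. Install the public key into the remote user's authorized_keys
ssh-copy-id -i ~/.ssh/id_sitetest.pub steven@server

# 3. Verify that key-based access works
ssh -i ~/.ssh/id_sitetest steven@server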

Your first service

Your first service will be an incredibly simple example. Open up the authorized_keys file that you modified in the setup section, and add the following command option in front of the ssh-rsa/ssh-dss portion of the new entry.

command="echo hello the time is `date`"

The authorized_keys entry should now look similar to the following:

command="echo hello the time is `date`" ssh-rsa A.......

Now you can make a call to the configured service as such:

ssh -i .ssh/id_sitetest steven@localhost

You will receive an output of:

hello the time is Fri Nov 11 22:28:37 CST 2016
Connection to localhost closed.

Congratulations: if you followed along with the previous instructions, you have created your first secure service. All of the input sent to the service and everything coming back from it is encrypted. You have control over the output format and over how input is taken in. The beauty of this service is that no terminal is left open, and only the defined command is run. Once the command is done, the session automatically closes.

Suggestions

How should one best develop services?

You should develop a bash script that replicates the functionality you would like the server to perform before setting it up to run over SSH. This gives you an isolated environment in which to test the process before it goes live; the SSH service setup is merely a layer above the script. (A minimal skeleton is sketched below.)
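
For example, a minimal skeleton for such a script might look like this. (The path is hypothetical; the logic mirrors the echo/date service above.)

#!/usr/bin/env bash
# /usr/local/bin/timeservice.sh -- test this standalone first, then point
# authorized_keys at it: command="/usr/local/bin/timeservice.sh" ssh-rsa AAAA...
set -euo pipefail
echo "hello the time is $(date)"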

How should I get input into the service?

You take in input just as you would in any bash script; I would suggest using the ‘read’ command for this. See this guide.

A note on this: always validate the input. An attack is unlikely (assuming the key is managed properly), but it never hurts to validate anyway.
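
Here is a minimal sketch of a service script that reads one line of input and validates it before responding. (The whitelist pattern is just an example; tailor it to your input.)

#!/usr/bin/env bash
set -euo pipefail
read -r name                 # read one line from the SSH client's stdin
re='^[A-Za-z ]+$'            # whitelist: letters and spaces only (example pattern)
if [[ ! "$name" =~ $re ]]; then
  echo "invalid input" >&2
  exit 1
fi
echo "hello ${name}, the time is $(date)"

The client can then pipe input over the encrypted channel:

echo "steven" | ssh -i .ssh/id_sitetest steven@localhost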

Is it webscale?

Honestly, I do not know the answer to this question. I suppose it is possible to drive this from the web, but I am not sure how stable it would be with lots of concurrent users. It is at most as scalable as the SSH service and the system underneath it.

I did a brief search to see if this was possible in JavaScript on the client side, and I could not find a source showing that it was.

Can you automate the use of these secure services?

Yes. However, when you create the key, you should never add a passphrase, as that requires manual interaction. A word to the wise: keys should have a defined life cycle and should be rotated periodically. Key management will be an issue in this situation.
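
For unattended use, an invocation like the following keeps SSH from ever prompting. (BatchMode makes SSH fail fast instead of asking for a password; the host key must already be present in known_hosts.)

echo "steven" | ssh -o BatchMode=yes -i ~/.ssh/id_sitetest steven@server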

No Fluff Just Stuff October 2016/Chicago

My workplace sent me to No Fluff Just Stuff to attend the conference. Some of the interesting bits about the conference were:

  • Garbage Collection in the VM
    • Led to the following questions:
      • Is there a way to do leak testing in unit tests? (VC++ provides heap-dump checking, and you can do it there)
      • Where is the memory indexed?
      • Is it possible to modify the code cache?
    • Ken Sipe is a great speaker and I managed to have lunch with him.
  • Tracer Bullet architecture
    • Advocates deliberate, incremental development (make sure things work on a smaller scale before escalating higher)
    • Partly a redefinition of the existing development change process, and partly a naming of existing work patterns
    • Advocated many of the circuit-breaker/fault-tolerance patterns that are already baked into Akka
  • Lambda Architecture
    • Mostly advocated running a more efficient datacenter using Mesos/DC/OS.
  • Metaprogramming and AOP with Groovy
    • This was a great talk on how closures can capture state
    • Went over the basics of extending the language through abstract syntax tree (AST) transforms
  • Encryption/Security
    • Mostly reviewed encryption algorithms and briefly talked about exploits
    • The talks on the exploits were interesting
    • Oddly enough, the conference talked a lot about the Heartbleed bug.
      • Sidenote: The events and context surrounding this bug are very strange.

Other notes

The conference loaned out iPads for keeping notes and rebroadcasting the sessions. I found this to be similar to having a laptop in the classroom: mostly it just led to distraction.

Things I would have liked to see more in the conference

  • More talks about Scala
    • There were many mentions of Groovy, but other than the metaprogramming/AOP talk they were just mentions
  • Discuss things like Akka or new methodologies
  • Try to avoid boilerplate talks on Spring
  • Label the sessions as Intro/Intermediate/Advanced

Akka/Scala vs Groovy for a Periodic Process

Last fall I created two different processes responsible for the recurring task of pulling data and broadcasting it elsewhere. The Akka-based project pulled the newest meetups from the Meetup page and rebroadcast them via Reddit and Twitter. The Groovy-based application pulled posts from a forum and created a summary post of what it pulled.

Both applications recur and are only activated on a schedule, but they ended up deployed in two different ways. The Groovy-based jar is scheduled as a cronjob and exits when done. The Akka process is set up as a systemd service and stays up.

How did it turn out?

Both of the solutions work and have required very little maintenance. When running, the Groovy process takes up less than 70 MB of memory, while the Akka-based process takes more than 200 MB. [It shows as 0.3% memory usage on a 65 GB machine.] (It’s written in Scala and brings in Akka.) Neither process is intense enough to make a noticeable dent in the CPU. The package sizes: Akka, 42 MB; Groovy, 12 MB. (That is a little deceptive, as the Groovy process contains fewer third-party libraries, but the number of libraries that Scala and Akka bring in is large.)

Now it comes down to probably the biggest concern: the time it took to develop. It took a month of lunches to develop the Scala/Akka-based application. It took so long because I had to find, and get up to speed on, the Scala-based libraries for working with REST services (I spun my wheels with different clients), ScalikeJDBC, and Twitter4J. I learned another lesson: don’t use SBT. It’s a pain to use compared to Gradle or Maven. On top of all of that came a very challenging lesson: don’t mix Scala versions, i.e., avoid dependencies that weren’t compiled against the Scala version your application is using.

The Groovy-based application took three lunches to write: one lunch (~30-40 min) for the main logic, the rest for packaging. (The worst part was researching how to build a fat JAR for Groovy under Gradle.)

What did I learn?

Akka is an amazing piece of technology. I like using it, and I like using Scala. However, it turns out that for a process meant to run periodically and make REST calls, you are far better off writing it in Groovy, letting Linux handle the scheduled execution, and getting it done quicker.

When would I use Akka? I would use it for a system that expects constant requests and high throughput, has to be highly reactive, and may contain real complexity. (A busy REST service is a good example.)

(Transitive) Dependency Hell Pt2: How it could have been avoided

In the last blog post, I went over an issue where multiple logging implementations pulled in by dependencies caused problems in the main application. All of it could have been avoided by delaying the decision of which logging implementation to use.

The initial fix that would have made my debugging session easier

Trying to resolve issues with conflicting logging frameworks is a huge pain. It would be great if there were an option to show which logging framework was picked up, and how it was initialized, in the first place. If documentation for this already exists, I would love to know where.

What are JSRs?

Java Specification Requests are community/corporate-created specifications that define an interface for a feature in Java. For the most part, they define only the interface; the implementation can be included on the classpath separately. No concrete classes are included in the specification. If all of the sub-dependencies had been written against an up-to-date logging JSR, this issue would never have come up; the only failure that could have shown up would be “no concrete class found for interface.” A quick search for logging-based JSRs turned up JSR 47.

For example, the REST specification, JSR 311, has an API JAR. If you didn’t understand JSRs, you’d believe that it includes the functionality to work with REST services. However, it doesn’t: it requires a JSR 311-compatible implementation/provider (such as Jersey). Another example of this is JPA and Hibernate’s JPA implementation.

On a related note, it would be better if library providers (e.g., the Amazon AWS SDK) shipped a specification/interface collection for their libraries. If that were the case, code would be written against the interfaces, and the implementation could be supplied at the highest level (where the code actually runs). This would improve testing, since you would write code against an interface rather than mocking out an implementation, and it would reduce dependency conflicts. The AWS SDK has had changes in its package structure, and issues with deprecated methods.

If you take nothing else from these two blog posts, take this: programming against collections of interfaces is far preferable to playing whack-a-mole with conflicting dependencies. Filling your POM with exclusions is no fun, and it’s incredibly risky.

(Transitive) Dependency Hell (In Java) Part 1

Conflicting transitive dependencies are like letting 20 children fight over one piece of candy: at the end, one kid may have the candy, leaving everyone else crying, fighting, and/or hurt. In the best case the right dependency gets used; frequently that’s not what happens, and the winner may change whenever the class loader feels like it. Before I get into the context of what happened, I should mention this happened nearly a year ago, and this retelling is entirely from memory.

How did this happen? I was working on a project that required bringing in Spring CLI for a small utility. Spring CLI made wrapping methods into CLI interfaces incredibly easy. Side note: having to hand-build a CLI interface for this is absurd and makes the task needlessly complicated. (I’m looking at many of the CLI options out there.)

My project had dependencies on java.util.logging (JUL), Log4j (whose configuration was conveniently ignored), and SLF4J. Since the project was logging what we needed (and we were lucky that it was), the conflicts between SLF4J and Log4j2 were ignored. JUL used SLF4J as its default backend; after all, JUL is a weak logging framework that’ll jump into bed with the next available logging implementation. Each implementation has its own configuration format, and if the implementation actually in use doesn’t match the configuration you wrote, your configuration is simply ignored.

The downside to bringing in Spring CLI: it brings in Apache Commons Logging. Oh crap.

The even worse thing about this: when I started looking into the issue, it was not at all clear which logger was actually being used to write out the information. That means that changing what you believe is the correct configuration to increase logging verbosity won’t work. To find out which logger was winning, I had to debug deep into the internals of where the logging was happening. It wasn’t much fun.

From here, it’s key to decide which implementation you want to work with. I went with Log4j2, since the configuration was written for it. The next step was to eliminate/exclude the competing dependencies. (If one of your dependencies is a fat jar, you’re out of luck here; fortunately, I didn’t have that issue.) Maven’s dependency:tree goal makes this manageable: find all of the alternative implementations and get rid of them.

That still left the issue unresolved: it seemed like other implementations were leaking in, and some were now missing. Once you have eliminated the competing dependencies, you have to use bridge libraries to stand in for the now-missing transitive dependencies. The new dependencies added were the Log4j2 bridge artifacts, along the lines of log4j-slf4j-impl (SLF4J to Log4j2), log4j-jcl (Commons Logging to Log4j2), and log4j-jul (JUL to Log4j2).
Oy, this has made the POM rather large and not very clear about what is going on or why you need all of it. For the most part, this resolves the issues. However, JCL output within Spring CLI was still being redirected incorrectly. *sigh*

After hours of debugging, I found that Spring CLI used JUL to write out its logging statements, and that logging was still trying to go through the Commons logger (which was neither writing it out nor respecting the preferred configuration).

This was resolved by setting the default logging manager to the Log4j manager at JVM startup, via the system property:

java.util.logging.manager=org.apache.logging.log4j.jul.LogManager
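
System properties like this are typically passed with -D when launching the JVM; assuming the application ships as a runnable JAR (the JAR name here is a placeholder), that looks like:

java -Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager -jar your-app.jar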


In the next blog post: I will discuss some of the ways that all of this could have been avoided.


In Support of Complex Software

We have some major communication and usability problems with software. I’m not referring to the lack of documentation (although quality documentation is needed), nor am I talking about UX. There are far too many choices of software product to use, with very little structure around how to implement one to solve your problem. For example, there has been a bit of a fight between Swarm, Kubernetes, and Mesos. Even though they have similar uses, each of them has only a vague description of what it actually does. When I’ve seen comparisons of the products, I tend to see complex routing strategies discussed rather than what the products have in common, or where one succeeds over the others (and fails in other ways).

What am I ranting about? A lot of products describe what they do, but the descriptions rarely make it easy for the reader to understand. All of the projects I mentioned earlier describe what they do [they tend to make sharing physical resources transparent for containers, and each has its strengths and weaknesses]. All of them claim to help you build a cluster, but each fails to describe how it does so at the infrastructure level, which is what would actually give implementers a shot at making an intelligent decision for their environment and purposes. They build up a picture in the user’s head that they will make the user’s scaling problems go away, only to let the user down. Unless users have already made their applications distributed (in the way the project expects), these technology choices won’t help them.

Silicon Valley has latched onto a harmful mindset, encouraged by the Lean Startup/MVP mentality, in which someone else decides what they believe you need, and only that gets built. Your actual needs from the product are dictated by someone else and are implemented only when they decide the features are going into the product. Complex products tend to be sidestepped out of insecurity about the people using them. It makes me think that the technology produced by Valley startups is akin to promoting math education while denying the existence of operations beyond addition and subtraction. Complex products can work and can be used to their fullest extent; the problem is the communication about the product. Rarely have I seen a complex product described simply enough up front that the user can explore its deeper avenues after achieving proficiency in the basics.

 

Changing Power Adapter Types

Recently I bought a Fitbit; it’s been a great device for competing with my friends on walking. I’ve also bought a Nexus 6P, another great product.

The downside to both of these: they’ve changed the connector used to charge the device. The Nexus 6P requires a USB-C cable, and the Fitbit requires a very weird connector. (For which, I’m sure, they charge $50 or more for a 6” cable.)

Here is where I’ve had success buying the cables: