Response to: Quality vs Quantity and the Ceramic Art Students

I’ve seen the following snippet copied and pasted quite a few times on Hacker News:

The ceramics teacher announced on opening day that he was dividing the class into two groups. All those on the left side of the studio, he said, would be graded solely on the quantity of work they produced, all those on the right solely on its quality. His procedure was simple: on the final day of class he would bring in his bathroom scales and weigh the work of the “quantity” group: fifty pound of pots rated an “A”, forty pounds a “B”, and so on. Those being graded on “quality”, however, needed to produce only one pot – albeit a perfect one – to get an “A”.

Well, came grading time and a curious fact emerged: the works of highest quality were all produced by the group being graded for quantity. It seems that while the “quantity” group was busily churning out piles of work – and learning from their mistakes – the “quality” group had sat theorizing about perfection, and in the end had little more to show for their efforts than grandiose theories and a pile of dead clay.

From: https://blog.codinghorror.com/quantity-always-trumps-quality/

Every Hacker News comment that I’ve seen quote it uses it as an attack against people who claim that you should consider quality when developing software. The problem I have with both the claim and the quote is that they completely miss the point. The context of the quote involves unskilled learners who are trying to build an understanding of the subject and get practical knowledge in it. Within that context, quantity has a lot of value. The students are learning hands-on, building muscle control/memory, and applying their knowledge.

Another issue I have with the quote is that it never states what this quality was. Was it still at the student level? Was it sellable? The quote also doesn’t account for what the students learned, or what they would do differently. The goals of education are different from the goals outside of education.

As a professional software engineer, you are paid to produce professional-quality software. Completing tasks just to show that you completed tasks demonstrates that you can close tickets. It produces no evidence that you can deliver quality software. The same is true of any engineer. Build 20 bridges, of which a couple fall, and you’re going to be considered a bad bridge engineer. All of your previous work is going to be called into question, too.

I realize that the knee-jerk response to the previous statement is: “But Steven! Over-engineering is bad, the business is going to fail because it can’t deliver/you can’t deliver like that/you have to have the experience to produce stuff and that comes from quantity.” To address those:

  1. Business is going to fail: There was poor planning and over-promising done here many times over. Blaming the tech team for the results of that over-promising is passing the blame along. Software takes time; we’ve got decades of experience to illustrate this. If quantity were the ultimate winner, we wouldn’t have unrealistic deadlines.
  2. You can’t deliver like that: You can. You have to start with quality early, do the appropriate planning, and write software in a way where you can reuse existing code. Reuse of quality code gives exponential returns. Sloppy code gives exponential losses. You can’t retrofit quality later, after most of the code is built on sloppy, unreliable, and untrusted code.
  3. You have to have the experience to produce stuff, and that comes from quantity: That statement is a bit of a strawman. Write the software, test it, and ship the code. Just don’t try to ship everything and then go back and rewrite everything because you didn’t do your due diligence the first time. Also, I never claimed that there is no forgiveness for misunderstandings. If you went with quality first, the reason you go back to earlier code is that you realized there was a misunderstanding (not because you just wanted to finish early). The extra benefit is that you learned something.

In Support of Complex Software

We have some major communication and usability issues with software. I’m not referring to the lack of documentation (although quality documentation is needed), nor am I talking about UX. There are too many choices of software product to use, but very little guidance on how to implement any of them to solve your problem. For example, there has been a bit of a fight between Swarm, Kubernetes, and Mesos. Even though they have similar uses, each of them has a vague description of what it actually does. When I’ve seen comparisons of the products, I tend to see complex routing strategies being discussed rather than what they have in common – or where one product succeeds over the others (and fails in other ways).

What am I ranting about? Well, a lot of products describe what they do, but the descriptions rarely make it easy for the reader to understand. All of the projects that I mentioned earlier describe what they do [they tend to make sharing physical resources transparent for containers, and each has its strengths and weaknesses]. All of the projects claim that they help to build a cluster, but each tends to fail at describing how it does so at the infrastructure level, which is what would actually give implementers a shot at making an intelligent decision for their environment/purposes. They tend to build up a picture in the user’s head that the product will make the user’s scaling problems go away, only for the user to be let down. Unless the users have already made their application distributed (in the way the project expects), these technology choices won’t help them.

Silicon Valley has latched onto a harmful mindset. It has been encouraged by the Lean Startup/MVP mentality, where someone else decides what they believe you need, and only that gets built. Your actual needs from the product are dictated by someone else, and are only implemented when they decide it’s going in the product. Complex products tend to be sidestepped due to insecurity about the people using them. It makes me think that the technology being produced by startups in the Valley is akin to promoting math education while being in denial of operations beyond addition and subtraction. Complex products can work, and can be used to their fullest extent. The problem lies in the communication about the product. Rarely have I seen a complex product described in a simple enough manner up front to make it easy for users to explore different avenues after they have achieved proficiency in the basics.

 

I really like Docker

Two years ago I wrote an article about my disdain for the popularity of creating virtual machines to host applications. I was discouraged by the attitude of creating images that weren’t easy to rebuild, and felt that it encouraged bad practices around application setup and maintenance. It also seemed incredibly wasteful in terms of storage, memory, and CPU cycles to do a full OS emulation. I was a fan of the idea of OpenVZ and LXC at the time. I didn’t know a lot about those options and didn’t go forward with them. However, through cheap hosting providers I learned about the downsides of oversubscribed OpenVZ hosts.

 

Then Docker rose to popularity. It took the LXC/cgroups route and made it easier to use. Docker doesn’t attempt to virtualize the entire stack, but it does attempt to reproduce the Linux environment within the container and to isolate the process running therein. It’s basically a sandbox for the filesystem, processes, and network. All of the benefits of a VM, but with none of the full hardware emulation.

Why do I like it? There are a few reasons:

It’s portable – Most of the internal structure of a container is based on pre-built images. To start up an app, just pull it down from its online repository. For example, starting up an instance of Couchbase is a matter of running the following command:

docker run -d --name db -p 8091-8094:8091-8094 -p 11210:11210 couchbase

In the context of another NoSQL server, Riak, you previously had to fight with bad platform support and Erlang installs. With Docker, all of this is configured in the container and doesn’t negatively affect the host OS. The portability of Docker means that you no longer have to figure out a new install procedure if the product ships as a Docker image. On top of that, you can configure where the persistent storage will be located. The application within the container will have no knowledge of where it’s stored, nor will it care. The same goes for the networking configuration.

It’s open source/free – Even the Docker repo, registry, and base images are freely available via Docker Hub. When you want to move away from this model, you can reproduce it within your own environment. On top of this, the Docker registry is a Docker container itself, and it allows for versioning.

It’s social and collaborative – With the introduction of Docker Hub, you can build your images on top of existing images. If you want to upgrade the underlying infrastructure (say, from Ubuntu 14.04 to a newer release), it’s just a matter of upgrading the base image in your Dockerfile. That allows for testing and debugging in an isolated and repeatable manner (as opposed to making golden images). The organizations responsible for creating the products, e.g. Couchbase and Ubuntu, have their own official images that are frequently updated.

It’s easy to track changes – Everything about the build of a Docker image is based on a declarative script known as a Dockerfile. There are a few nuances in the script (e.g. how a RUN instruction differs from an ENTRYPOINT). However, it’s fairly easy to create, update, and track changes (via source control).


So far the only downside to Docker that I’ve seen has been more of a creator issue: there is a tendency to try to containerize everything in the application environment. That includes one container for the database and another for the storage, both of which are frequently hard-linked to the application container. I realize that this is helpful for cases where you need a particular version; however, I would rather have a single database install on the host OS to share between containers, and maintain the security for that separately.

Awesome Projects That I’ve found Recently

Since I started learning Python and Ruby, I’ve found some interesting things.

  1. Automating Skype and MS Speech API with Python and PyWin32
  2. WkHTML2PDF (Webkit HTML to PDF)
  3. Opensource command line OCR Reader – Tesseract OCR
  4. The Bastards Book Of Ruby
  5. [Not Python or Ruby but once all of the bugs are gone, this will be awesome] Telesco.pe – Forum software in MeteorJS
  6. Ruby Version Manager – This is the only way that you should install Ruby

I don’t like Virtual Machines

I don’t like virtual machines. Given that my current position and employer involve building, maintaining, supporting, optimizing, and selling solutions/services, this statement is a bit ironic. I don’t hate the benefits that the technology has given us. What it has provided is amazing. It’s also amazing how you can scale up a service without having to bring in lots of new hardware and maintain that as well. It’s more efficient and cost-effective than the old way of doing things. It has advanced the development of operating systems and drivers.

The problem I have with virtual machines has more to do with what they are. Put very simply, a VM is an emulation of a physical machine that runs within another computer under software known as a hypervisor. Oftentimes, lots of virtual machines are run on the same box using systems like VMware ESX Server, Xen, or KVM. Very frequently there is very little difference between a physical machine and a virtual machine. The differences show up with 3D/low-latency applications and with VMs that depend on hardware input that cannot be emulated (e.g. random number generation on VMs). After reading that previous statement, and considering the downside, there should be something that sticks out to you. It’s something that should make you feel uncomfortable.

For me, what makes me very uncomfortable is the fact that we have many VMs running on the same box. Much of the processing, storage, and memory is consumed by redundant operating system processes and/or their associated files. It seems really inefficient to have 30 instances of Windows Server 2012 all running IIS at the same time. The alternative to this madness is the use of containers. I like the idea. I would love to learn more about OpenVZ and LXC when I get more time. I like containers because they are sandboxed, managed environments that push the processing power onto the actual job being performed. It feels more efficient, and more in line with solving the problem rather than creating more infrastructure.

Prior to the virtualization era, we were encouraged to build grid services. This was great: you could throw a lot of machines at a problem and have them work in harmony. However, that didn’t work as well as we hoped, due to the immature tools and frameworks offered at the time. As a replacement for grid computing, the next approach was to split the problem up into individual processing units and just throw a lot of machines at the problem. After all, VMs are “nearly free.” This really doesn’t fix the problem; it just seems like we’re timesharing on a powerful server once again.

Seven Databases in Seven Weeks: Postgres

After finishing “Coders at Work” (more on that in a future blog post), and having little experience with non-RDBMS databases, I picked up the book “Seven Databases in Seven Weeks” by Eric Redmond and Jim R. Wilson. The book appears to be of similar quality to its sibling, “Seven Languages in Seven Weeks” by Bruce Tate.

The book starts out with the Postgres database. At the time of writing, this database wasn’t as popular as MySQL; however, it does make a good starting point as a baseline for comparison. It represents the “old guard” of databases. I found that the first half of the first week was not of much interest to me. However, the fuzzy search and full-text search extensions caught my attention. I have always been aware that the capabilities existed, but I never knew how they worked. Additionally, the downloadable source code helped with creating a testing environment right out of the box. This was also the case for the “cube” extension/datatype. I found it very exciting to find out that you could do some rather interesting operations with multidimensional data and queries. I can’t claim that I’m an expert on using these features, but it’s rather nice to have some hands-on experience with them.

I don’t believe that this content was the greatest value of the book. What gave the book its greatest value was that investigating the cube package further led me to an online directory of the available extensions: the PostgreSQL Extension Network. How exciting is it to find a directory of extensions to a fairly standard database that allows you to do some cool things? You can find extensions to interact with JSON data, store bitmaps, keep key/value data, add aggregation functions, compute weighted averages (this is a VERY interesting addition), and even attempt “connected regions” logic within data items. These are reusable components that others have created, and I found that I could get the database to perform these actions rather than code them myself.

 

First Thoughts: “Coders at Work” by Peter Seibel

I’ve just started to read the book Coders At Work. The book is a nice, recent collection of interviews from many big name developers. I’ve read other developer interview books before, but this one sticks out in an unusual way: with most “interview” books, the interview is either completely boring or incredibly interesting. In Coders At Work, the interviews have varied between amazing and neutral. I haven’t gotten to a bad interview yet.

A few things jumped out at me and made me think. Jamie Zawinski’s interview made me wonder about the value of developers without a formal education in “today’s market.” Brad Fitzpatrick’s interview reminded me of the “I’ll build everything” and “you must know everything” attitudes. Douglas Crockford’s interview didn’t inspire me, but it did make me consider other issues within software development.

Jamie Zawinski’s interview was an amazing conversation with a guy who has many interests in learning and doing work. He is a self-taught Lisp developer who can occasionally get very opinionated. I found his work experience at Netscape fascinating. As a user of the early versions of Netscape, I never knew about all of the politics or construction going on behind the scenes. I also found it technically intriguing that the pre-3.0 mail reader within Netscape was not written in C++. I have a lot of respect for Mr. Zawinski for being able to identify a potential bias of his – he appeared very introspective when asked about hiring new developers. He understood that he could pick out people whom he found reputable, but not necessarily those who would make good candidates.

One of the things that struck me as a bit off-putting about Mr. Zawinski was his rejection of automated unit testing. I feel that if it had been as easy in the ’90s as it is today, software would be VERY different today.

Brad Fitzpatrick’s interview left me with mixed feelings about the guy. I’m not sure if he is a guy you would want to work with; however, he sounds like the kind of guy that you would want to share war stories with over drinks. He has worked on many interesting projects, mainly LiveJournal, and is one of the early “Growth Hackers {http://en.wikipedia.org/wiki/Growth_hacking}.” I like his recommendation that you should spend some time reading others’ code. He fights the immediate urge to ignore others’ code, and his approach sounds different from what I had expected: I expected his approach to making suggestions on other people’s code to be antagonistic. However, it was described as the following:

  1. Code copies are distributed to the audience – in digital and paper form

  2. The developer presents their code line by line

  3. Q&A time

  4. Suggestions / feedback from the audience

This struck me as different from my experience, where code reviews tend to be either technically or personally antagonistic (or both). This approach was more similar to proofreading a paper you just finished or audience-testing a book you just wrote.

The two things that really put me off about Mr. Fitzpatrick were one of the questions he asks in interviews and his insistence on knowing everything. First, Mr. Fitzpatrick’s “famous” interview/programming question was recycled from his “AP CS” exam. The question is to write a class that handles large-number arithmetic (rewrite BigDecimal). It appears that he uses his own previous experience as a baseline for evaluation. I also got the feeling that it is a way for him to “show superiority over others” (over something he did many years ago as a high school student). Second, he is incredibly insistent on knowing everything about how every layer works. He ranted against high-level language developers because they didn’t know that a specific way of polling may not work on a specific deployment machine. He even ranted about those who deploy to a VM, because the VM abstracts away the mapping from “virtual” to native OS/hardware. I feel that in 98% of cases he’s picking up pennies in front of a steamroller.

I was not very thrilled with Douglas Crockford’s interview. Primarily because it dealt with JavaScript, it was a little too high-level for my taste. While reading this interview, my mind went back to Mr. Fitzpatrick’s interview. It made me wonder about how you find the “best” tools. I find it incredibly difficult to keep abreast of all the languages and tools available. Recently, for example, I learned how – and why – Git, Jenkins (plus the automated unit/lint/reporting/checkstyle plug-ins), and deep Maven knowledge are really good things to know if you’re developing in Java.

When new languages, tools, and frameworks come around, I love to read about them and learn how they work (if they’re useful and interesting enough). However, time is limited: how do you identify the tools that would solve the most pressing need you have?  Prior to Jenkins, I built everything via an IDE. Why would I need an automated build tool? I’m the only developer on the project. Prior to Git, I used Subversion – it did everything I needed. Why would I want to make “sub-commits”? Prior to Maven, why would I want to have the build tool automatically deploy the WAR file or require that all unit tests pass before generating an executable? (I’m running unit tests all the time anyway.)

Later it made me think about the code reading suggestion, and I realized that I’m not very happy with the code review tools I know about. ReviewBoard looks nice, but as far as I can tell it is only for Ruby. Should I write my own? Where are the existing tools for Java (which can also integrate with Maven and Jenkins)? Are the tools good? Are there others out there that have solved this issue? Is it worth setting up a code review tool just for this? These are questions I’m not sure how to answer.

Overall, I really enjoyed that this book goes over many topics – personal projects, interview questions, and famous debugging stories. I do occasionally enjoy a story bragging about how the developer’s language or tool was miles ahead. However, after reading about their accomplishments in a serial fashion, it just gets old. Perhaps interspersing their accounts in a more conversational form would have made this book more interesting, and easier to recommend.

Similarities of the individuals who were interviewed:

  1. All have a strong focus on one particular project

  2. Each interviewee has worked in many companies

  3. None of them focused on the reputation of the company they have worked for

  4. All have interesting debugging stories

An Easier Way to Deal with Thread.sleep

If you’ve ever had to use the sleep method, you know how “painful” (well, a minor annoyance) it is to convert a number of minutes/hours/days into milliseconds. To make this easier, you can use Google to do the conversion for you. Granted, this is a rather minor thing in the grand scheme of development, but it is rather nice to see Thread.sleep(604800000) and be able to quickly get the answer of 7 days (rather than dividing that by 1000*60*60*24). To convert the milliseconds, search Google with the phrase: “604800000 milliseconds to days” (or hours, etc.). You can also reverse the statement to get the number of milliseconds in another unit of time (for example: “5 hours to milliseconds”).
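If you would rather keep the conversion in the code itself, the JDK can do the same arithmetic. The following is only a minimal sketch: TimeUnit and Duration are standard JDK classes, while the class name SleepDurations and its helper methods are made up for illustration.

```java
import java.time.Duration;
import java.util.concurrent.TimeUnit;

// Hypothetical helper class, named for this example only.
class SleepDurations {

    // Readable duration -> the millisecond count that Thread.sleep expects.
    static long daysToMillis(long days) {
        return TimeUnit.DAYS.toMillis(days);
    }

    // Raw millisecond literal -> whole days (any remainder is truncated).
    static long millisToDays(long millis) {
        return TimeUnit.MILLISECONDS.toDays(millis);
    }

    // Same idea using java.time.Duration instead of TimeUnit.
    static long hoursToMillis(long hours) {
        return Duration.ofHours(hours).toMillis();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(daysToMillis(7));            // 604800000
        System.out.println(millisToDays(604_800_000L)); // 7
        System.out.println(hoursToMillis(5));           // 18000000

        // TimeUnit also has its own sleep method, so the unit stays visible
        // at the call site. A short sleep is used here on purpose.
        TimeUnit.MILLISECONDS.sleep(10);
    }
}
```

Writing Thread.sleep(TimeUnit.DAYS.toMillis(7)) instead of the raw literal means the next reader does not have to do the division at all.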

Word of warning: I would never advise another developer to use sleep for anything longer than a minute or so. The numbers used above were just an example.

Installing Maven on CentOS 5 or 6/RHEL

At the moment there is no RPM package or yum install available for the latest version of Maven on CentOS, so the user is left to install Maven manually. To overcome this, I created a script that installs the latest version (currently 3.1.1). There are many things that should still be added to the script; they’re listed in the TODO section of the documentation, and those features may be added later.

Instructions on how to run the script, and the script itself, may be found at: https://github.com/monksy/centos-maven-install

My response to “7 Open Source Projects to cut your teeth on (and the ones to avoid)”

I’ve been meaning to write an article about a few of the communities in the open source world. However, I believe that the article “7 Open Source Projects to cut your teeth on (and the ones to avoid)” by Rikki of ITWorld has said what I wanted to. Some of the open source projects that I’ve had good/bad experiences contributing to have been:

Good:

Bad:

  • XBMC [They will not take bug reports or feature suggestions]
  • Tiny-Tiny RSS
  • OpenStack Folsom Install Guide [The official documentation doesn’t agree with some of the suggestions, and I’ve tried to point this out]

I understand that these tend to be non-work projects, and that it can take a lot of work to maintain a community. But it’s rather frustrating that people who attempt to chip in to help make the system/application/code better are treated rather roughly.