It’s really difficult to measure developer productivity. Lines of code, number of features, or number of bugs are all bad metrics. But intuitively we all know what productivity is. For me, being productive means getting things done as quickly as possible while minimizing wasted time. This post is about ways to improve productivity.
Speed and direction
Let’s start with an easy question. Why do we want to be productive?
Life is too short and time is always limited, so we want to do more in less time. And from my point of view, being productive does not mean being in a hurry. Being productive means choosing the best tools and the right approaches, being able to quickly determine the right direction, and investing time wisely. Direction is more important than speed. Many are going nowhere fast.
This post is about personal productivity. And it is mainly about developer productivity, which means that all the examples come from my experience and some of them are illustrated with code to make them clear for developers. But the same principles can be applied in other areas.
Automate it
Even though this post is for developers, it is worth saying that simple steps should be automated. Good developers try to avoid boring manual work as much as possible. If you perform the same sequence of steps several times, you probably stop thinking about it and do it automatically. Delegate it to the machine so it is done truly automatically. Don’t spend your time doing simple things. The machine doesn’t make stupid mistakes, it’s more predictable, and it saves your time, allowing you to think about complex things.
Some actions, unfortunately, can’t be delegated, but they can still be made faster. For example, we still need to type the code manually (it can’t be uploaded to the computer directly from our heads). Touch typing and hotkeys improve productivity dramatically. You don’t spend time moving your gaze between the keyboard and the monitor; you don’t spend time moving the mouse cursor to open some menu and select some item. It feels like your thoughts are translated into written code and actions without any effort, because you use your muscle memory. You’re focused on what you do instead of how you do it.
Productive Thinking
After the simple steps are automated, most of the time is spent on thinking. You think about something when you have no solution and need to find one. Or there are multiple solutions and you need to choose the best one. Sometimes it is not clear how to approach the task at all. Or you’re implementing some solution but feel that there is a better way, which requires some research. You could spend a lot of time thinking and making no progress. It makes you nervous, because time is always limited, and you start implementing something just to have something done. But then you realize that you went in the wrong direction and it would have been better to spend that time thinking. So, how do you make thinking productive?
Dīvide et imperā
There is the divide-and-conquer technique. The idea is to recursively split a big task into small subtasks until the subtasks are simple enough to solve. After the subtasks are solved, you combine the results to solve the initial task. A lot of approaches and methodologies are based on this idea: Lean, Agile, TRIZ, you name it. If you have something big and unmanageable, break it down into small manageable parts.
Why is this approach used everywhere? Because our minds are very limited. It’s difficult for us to keep everything in our heads, and you can’t focus on multiple levels at the same time. Breaking something big down into small parts allows you to stay focused.
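The textbook illustration of this idea is a sorting algorithm such as merge sort: split the array until the parts are trivially sorted, then combine the results. The class name and sample data below are just for illustration.

```java
import java.util.Arrays;

class MergeSortDemo {

    // Recursively split until a part is trivially solved (size <= 1),
    // then combine the solved parts by merging them.
    static int[] mergeSort(int[] a) {
        if (a.length <= 1) {
            return a;
        }
        int mid = a.length / 2;
        int[] left = mergeSort(Arrays.copyOfRange(a, 0, mid));
        int[] right = mergeSort(Arrays.copyOfRange(a, mid, a.length));
        return merge(left, right);
    }

    // Combining two already-sorted halves is the "easy" subproblem.
    static int[] merge(int[] l, int[] r) {
        int[] result = new int[l.length + r.length];
        int i = 0, j = 0, k = 0;
        while (i < l.length && j < r.length) {
            result[k++] = (l[i] <= r[j]) ? l[i++] : r[j++];
        }
        while (i < l.length) result[k++] = l[i++];
        while (j < r.length) result[k++] = r[j++];
        return result;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(mergeSort(new int[]{5, 2, 4, 1, 3})));
        // prints [1, 2, 3, 4, 5]
    }
}
```

The same recursive shape — split, solve, combine — applies whether the "task" is an array, a feature, or a whole project plan.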
Write it down
Thinking about a problem is a very unmanageable activity. You don’t have a solution and something is unclear, so you can’t estimate how much time you need to find a solution; you can’t even say whether you’ll find one at all. Let’s break it down. The first step in solving any problem is understanding the problem. Before trying to find a solution, I usually make sure that I understand why it is a problem and why I am going to solve it. These simple questions have saved me a lot of time. Very often I realized that the problem I was going to solve was a side effect of another problem, or not a problem at all. Here I described why it’s important to write everything down. In a nutshell, it allows you to offload everything from your head onto a piece of paper and free up space in your head for questions and ideas.
Plan and Prototype
As soon as I see the problem, I usually try to identify what I already know and what’s unclear. All questions and all unclear points are written down as well. This is how I break the problem down into smaller parts. I know what to do with some parts, but other parts are still unclear. At this stage, my goal is to identify the riskiest parts and deal with them first. There are different reasons why something is risky:
- I’ve never faced such a problem before. I need to find ways to solve it. This could include learning some theory, technologies, programming languages, etc.
- I know the theory but have never applied it in practice.
- I know how to solve the problem, but I’m not sure that this is what users expect to have. Or I know how to implement something, but I’m not sure that it will meet performance, memory, or other requirements and limitations.
I’ll talk about the first point later in the Learning new section because it’s a big topic. Sometimes the third type of problem can be solved by discussing solutions with other participants in product development. If someone has a better understanding of how the solution should look, it’s faster to discuss the solution than to implement it. In that case, I’d discuss it. But usually, problems 2 and 3 need to be implemented to make sure that you’re on the right track.

Sometimes the riskiest part is the last step in product development. For example, in the last step I’m going to use a database to persist some processing results, and I’m not sure that this database meets our performance requirements. Changing the database at the last step could lead to architectural changes, and that’s very expensive. To avoid this problem, the riskiest parts of the system should be checked before it’s too late. And the best check is to implement a prototype. So I would implement the persistence layer before I have real data. I can feed it generated fake data with the same size and schema as the future real data to make sure the implementation meets our performance requirements. So I kill two birds with one stone: I check my idea, and I get a performance test covering a part of the system.

This is how I prefer to implement anything, whether it’s a new complex system, a new feature, or just a small bug fix. Split it into small manageable parts, prioritize them, and implement them step by step.
Sometimes a task is not risky but could block someone else. Such tasks should be prioritized too.
10000-foot view
There is a good way to deal with complexity. It is called abstraction. For example, you don’t need to know how an internal combustion engine works to drive a car. A car is a steering wheel, pedals, a gearbox, mirrors, and four wheels that bring you from A to B. This is the abstraction of a car that most people use. Abstraction allows us to ignore unnecessary details. Developers also create abstractions. You don’t need to know how an SSD works at a low level to write a file. And in most programming languages, the same code writes a file to an HDD and an SSD. It’s really difficult to find a good abstraction. To find one, you should be able to generalize.

As soon as all the unclear parts are clarified, I try to get a bird’s-eye view of the whole picture. At this point, I make sure that there are no unclear parts left and check that I have translated all business requirements into technical language. There is always some language describing business requirements (called the ubiquitous language in DDD terms), and the technical language is the set of abstractions you created to describe business terms in a programming language. Check the requirements one by one, and check the levels of abstraction one by one. As soon as you see that all requirements are translated, it’s time to start thinking about testing and implementation.
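The file-writing example can be made concrete. This is a sketch using Java’s standard filesystem API; the temp-file name and helper method are arbitrary choices for the demo.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

class FileAbstractionDemo {

    // Write a string to a file and read it back through the high-level API.
    // The same code works whether the file lands on an HDD or an SSD:
    // the filesystem abstraction hides the storage details.
    static String roundTrip(String content) {
        try {
            Path file = Files.createTempFile("demo", ".txt");
            Files.writeString(file, content);
            String result = Files.readString(file);
            Files.delete(file);
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("Hello, abstraction"));
    }
}
```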
I’d like to add a note for perfectionists. You always think that maybe you missed something, and you try to avoid every possible problem. That’s not possible. As soon as you feel that your abstractions are good enough and you understand everything, switch to thinking about testing, because it’s also a part of the whole picture. Maybe at some step you’ll realize that you missed something. That’s ok. You can always Refactor it.
Why is it better to have the whole picture before you start coding? Because when you’re coding and hit some problem, you need to stop and think about it. That’s not productive. You switch from one activity to another, which always means you lose context. You were thinking about low-level details while coding, and then you realized you needed to go up a level because there was a conflict between the details and the abstractions. Even if you find a solution, it takes time to refresh the context and continue coding. In the worst case, you need to reimplement what you were implementing before, and then you lose even more time. Having the whole picture before coding allows you to avoid these problems. That’s its purpose.
Many try to get through this step as quickly as possible and begin to code. But the cost of an error at design time is much lower than at implementation time. If you find a mistake in your plan, you can change the plan quickly, because it exists only in your head and on a piece of paper. If you find an architectural problem at the last step of the project, it is very expensive to fix, because you have to redo a lot. Check your ideas while it is still easy to correct mistakes.
Coding. First things first
Start by creating the main entities. No implementation is needed: just classes, method signatures, and comments inside methods. This is how you dump the whole picture from your head into code very fast, and it allows you to quickly check your abstractions. Focus on abstractions at this stage. As soon as you have placeholders for the future implementation, you can review the picture one more time and make sure that your abstractions work.
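Such a skeleton might look like the sketch below. All the names here (UserRepository, UserService, greet) are hypothetical; the point is that signatures and TODO comments come first, and a one-line stub is enough to check that the pieces fit together.

```java
// Entities and signatures first, implementation later.
interface UserRepository {
    User findById(long id); // TODO: load from real storage
}

class User {
    final long id;
    final String name;

    User(long id, String name) {
        this.id = id;
        this.name = name;
    }
}

class UserService {
    private final UserRepository repository;

    UserService(UserRepository repository) {
        this.repository = repository;
    }

    String greet(long id) {
        // TODO: handle a missing user
        return "Hello, " + repository.findById(id).name;
    }
}

class SkeletonDemo {
    public static void main(String[] args) {
        // A lambda stub is enough to exercise the abstractions end to end.
        UserService service = new UserService(id -> new User(id, "stub"));
        System.out.println(service.greet(1));
        // prints Hello, stub
    }
}
```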
It’s applicable at all levels. Imagine that you need to combine some data required by the client into a DTO and send it. The DTO has 6 fields, and there are multiple services providing pieces of the data.
@Data
@Builder
public class UserDto {
    private final UserId userId;
    private final String firstName;
    private final String lastName;
    private final UserPic userPic;
    private final List<Orders> orders;
    private final List<Item> wishList;
}
You can fetch the field values from services one by one and set them in the DTO. But in this case, it’s very easy to forget something.
For example, it’s very easy to set five of the six fields one by one, forget userId, and have nothing warn you about it.
Another way is to copy all the fields into the method as placeholders. With a placeholder for every field, you’ll never forget to set one.
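The DTO above uses Lombok’s @Builder; below is a hand-rolled sketch of the same idea, trimmed to two of the fields for brevity. Every field gets its own line in the builder chain, so a forgotten field is visible as a missing line.

```java
class UserDtoBuilderDemo {

    static class UserDto {
        final String firstName;
        final String lastName;

        private UserDto(Builder b) {
            this.firstName = b.firstName;
            this.lastName = b.lastName;
        }

        static class Builder {
            private String firstName;
            private String lastName;

            Builder firstName(String v) { this.firstName = v; return this; }
            Builder lastName(String v) { this.lastName = v; return this; }

            UserDto build() { return new UserDto(this); }
        }
    }

    public static void main(String[] args) {
        // Copy all the fields into one chain: each field is a placeholder line.
        UserDto dto = new UserDto.Builder()
                .firstName("Alice")
                .lastName("Smith")
                .build();
        System.out.println(dto.firstName + " " + dto.lastName);
        // prints Alice Smith
    }
}
```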
This is how you go from interfaces and high-level design down to low-level implementation details. This way allows you to quickly check your abstractions. And since you’re implementing things layer by layer, you don’t need to switch your focus between low-level and high-level abstractions: you always work at one particular level, so it’s easier to stay focused.
Test it while it’s manageable
Everyone makes mistakes. Every piece of software has bugs. Our goal is to minimize them and catch the most critical ones at the early stages. Some developers think that writing tests is a waste of time, or at least that it has lower priority than code. I can’t be productive without tests. Unit, integration, end-to-end, and performance tests are the only way to make sure that my code works. If my code doesn’t work, if it has critical issues, then implementing it was a waste of time. I’m not a big fan of TDD, but I’m pretty sure that tests shouldn’t be postponed. They shouldn’t be the last step of development. When you have a big amount of code with multiple problems, fixing those problems can take more time than writing the code did. Tests allow me to change my code easily, because I know that if my tests pass, I didn’t break anything. Fixing code without tests is unpredictable: you fix something but break another part. You can get a combinatorial explosion of bugs. And sometimes it’s faster to reimplement everything from scratch than to fix all the bugs in the existing code.
And one more point. If you don’t have tests and you need to reproduce some problem, the only way is to run the system manually and perform all the steps causing the problem by hand. If you debug a problem, you’ll probably need to reproduce it multiple times. It’s boring and time-consuming. Tests allow you to automate the reproduction of bugs and the regression checks (so, Automate it).
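A minimal sketch of what such automation can look like, without any test framework. The joinNames function and the expected string are made up for illustration: the idea is that the manual reproduction steps are encoded once, and every later rerun is free.

```java
import java.util.List;

class RegressionCheckDemo {

    // Hypothetical function that once had a bug: join names with a comma.
    static String joinNames(List<String> names) {
        return String.join(", ", names);
    }

    public static void main(String[] args) {
        // The steps that used to be performed by hand are now code,
        // so reproducing the bug (or proving it's gone) costs nothing.
        String result = joinNames(List.of("Alice", "Bob"));
        if (!"Alice, Bob".equals(result)) {
            throw new AssertionError("regression: got '" + result + "'");
        }
        System.out.println("ok");
    }
}
```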
The bottom line is that you need a way to check that your code works. If you don’t know how to check it then it’s too early to write code. That’s actually the idea of TDD, from my point of view. And it’s not about the tests only. How would you check that your system works? How many resources does it use? How could it fail? What are the ways to prevent it?
All good?
As soon as I have the whole picture, it’s time to start implementing the system. It’s already split and prioritized, so I know the steps. After each step, it’s good to check that everything is ok: take a look at the whole picture again and check the direction. After each step I learn something new, and sometimes I realize that practice has diverged from the plan more than I expected. That’s why it’s very important to do a retrospective and correct the course.
Refactor it
We already talked about abstraction. Sometimes things change and abstractions leak. Bad abstractions make the code unreadable and unmanageable. I usually don’t spend much time making my abstractions perfect, because I know I’ll probably change them during development. It doesn’t mean that I have no vision of the future system components; as I described in Plan and Prototype, I have a vision of the system before I start implementing it. The bottom line here is that I don’t try to generalize too much. For example, I could spend a lot of time trying to create a very generic API for some microservice. I could try to avoid every possible problem and predict how the API will be used by different consumers. But after the API is ready and implemented, I’d see that some methods are not used and some consumers need another representation of the data. Requirements change very fast. It’s better to create something small, useful, manageable, and flexible than to try to avoid every problem and spend a lot of time creating a Swiss Army knife. In most cases, you’ll need to refactor it anyway.
So in most cases, it’s better to create a system from top to bottom. Create the high-level things first and go deeper. Start from the entry point of your system (or component). Implement methods to receive input and dummy methods to process it. Go deeper, creating interfaces and placeholders everywhere. As soon as the system looks good, start adding code to the empty methods and test them one by one. At some level, you may realize that you need something more from the higher levels, so go back and refactor them. This is how you can create a skeleton of the system very fast and test your abstractions. It helps you avoid wasting time implementing details that don’t fit your abstractions. So, in a nutshell, start from interfaces and postpone implementation. This is how you check your abstractions.
Learning new
The Red Queen said to Alice, "Here we must run as fast as we can, just to stay in place. And if you wish to go anywhere you must run twice as fast as that." I guess it was about software development and learning. As I said before, abstractions allow us to develop new things without caring about the details. Lots of abstractions are created every day, and we must run as fast as we can just to keep up with them. There is no silver bullet, so the more abstractions you know, the better the solution you can choose for your task. But sooner or later we face a problem that occurred at a low level, and the details become important. Then we must run twice as fast to understand how the abstraction works under the hood and fix the problem.
New things are less predictable
Say you’re a junior developer and you’ve just learned a cool new abstraction. The abstraction is called a map (or sometimes a dictionary). It allows you to find a value by key, and the lookup time complexity is constant, O(1), so it doesn’t depend on the size of the collection. How cool is that!
You started using it. You created your own class Key.
class Key {

    private int value;

    public Key(int value) {
        this.value = value;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Key)) return false;
        Key key = (Key) o;
        return value == key.value;
    }

    @Override
    public int hashCode() {
        return value;
    }

    public void setValue(int value) {
        this.value = value;
    }

    public int getValue() {
        return this.value;
    }
}
You implemented some Java code using this map. Everything was ok, but after some time you faced a strange problem: you’re trying to get a value by its key, but it’s not there anymore.
Map<Key, String> map = new HashMap<>();
Key key = new Key(42);
map.put(key, "Goodbye, mad world");
//a lot of code here
//but we omit it
//get value
log.debug("value is '{}'", map.get(key));
//has the map size been changed?
log.debug("map size is '{}'", map.size());
//do we still have the key in the map?
Key firstKey = map.keySet().iterator().next();
log.debug("check hash codes. equals? '{}'", firstKey.hashCode() == key.hashCode());
log.debug("is it the same key? '{}'", firstKey.equals(key));
//does map contain the key?
log.debug("map contains key? '{}'", map.containsKey(key));
You added some trace logging, and the output is really odd.
value is 'null'
map size is '1'
check hash codes. equals? 'true'
is it the same key? 'true'
map contains key? 'false'
What happened? The key is still in the map. But you can’t get the value by it.
Not ok Google
If it’s not your pet project, time is probably very limited. You need to fix this problem and move on. The fast and simple way is to just google it. Probably someone faced the same problem, fixed it, and described a solution. Why is this way not ok?
- There is no guarantee that you’ll find an answer.
- You have to spend time reading about different problems and trying to copy solutions into your code, and… see point 1.
- Even if you’re lucky and the answer is found and copied into your code, it’s very unlikely that you’ve learned anything. You just copied someone else’s solution.
But, come on, I really don’t have time to go deep trying to understand the problem’s root cause. I just need a solution. Everyone is waiting for the fix, my boss is not happy.
Sometimes this is the case. And I’ve seen developers spend hours googling and trying different solutions. I even know developers who think that they don’t need to read anything about a new language. They just try to use the new language to implement some project and google their way through any issues that occur.
As I described in Dīvide et imperā, if you understand a problem but don’t know how to solve it, try to break the problem into small manageable parts.
I don’t know how the map works under the hood. All I know is that I put a key-value pair into it and I can’t get the value by the key. Let’s start breaking the problem down.
What are the possible root causes?
- The map doesn’t work. It was implemented in the wrong way, and it’s not possible to get a value by a key.
- The map is ok in most cases, but I’ve found some corner case.
- The map is absolutely ok, but I used it in the wrong way.
- My use of the map is ok, but another part of the code (another thread running in parallel, for example) corrupts the map.
- My code is ok, but something is wrong with my environment, and it somehow impacts the map.
The more our junior developer knows, the more detailed the points become.
Now, this is no longer just "I have no idea what’s going on". Our junior developer has some theories to check. The theories can be prioritized and checked one by one. Point 1 looks like a good start, because it’s easy to check. Our junior developer puts another key-value pair into the map, and this time the new value is accessible by the new key. So the map works. We have fewer points to check.
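Checking theory 1 takes only a couple of lines. A sketch: if a fresh key-value pair behaves correctly, the map implementation itself can be ruled out, and the bug must be in how we use it.

```java
import java.util.HashMap;
import java.util.Map;

class MapSanityCheck {
    public static void main(String[] args) {
        Map<Integer, String> map = new HashMap<>();
        map.put(1, "value");
        // If this prints 'value', the map itself works:
        // theory 1 is eliminated, and the bug is in our usage.
        System.out.println(map.get(1));
        // prints value
    }
}
```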
Make it work
Ok. Now we have the map working. We know at least one happy path. And even if we have no documentation and we’re on a plane without access to Google, we can already add breakpoints in the map sources, run the debugger, and compare the happy and unhappy paths.
And this is one more reason to split big tasks into small ones. You start with a small part of the big thing. After you’ve made it work and added regression tests, you can incrementally add new things. If something doesn’t work, you can always go back through the history of commits in your version control system and check what has changed. The problem is in the changes. (And this is actually why you should commit after every small part of the job is done.)
This approach allows you to learn new technologies quickly. Even if you have no time to read documentation or books, a tutorial or just a working example is a good starting point. You run it, see how it works, and change it step by step to adapt it to your requirements.
Separate learning from coding
But ideally, it’s better to allocate dedicated time to learning new things. If you are at the beginning of learning some new technology, it’s not a good idea to add the technology to your project and try to make them work together. As soon as you get some problem, it’s harder to find the root cause, because the technology is now part of the project. Sometimes it’s difficult to tell whether the problem was caused by the new technology or by old bugs.
That’s why we have possible root causes 4 and 5. Our junior developer started using the map without a deep understanding of its internal details, and now the map is part of a big project. It’s not isolated, so it’s difficult to exclude the influence of the existing code from the possible root causes.
One more reason to spend some time learning a new technology before using it in a real project is that the probability of problems is very high. You’re not sure that you’re using the unfamiliar technology correctly, so it’s easy to make a mistake. Any mistake sends you googling or reading a book, a tutorial, or the source code to find a solution. So this is the same problem I discussed in 10000-foot view: you need to switch between coding and googling, you lose context every time, and you need to spend time recalling what you were doing. It’s not efficient.
Don’t fight with technologies
One more reason to learn a new technology before adding it to your project is… It may simply not be applicable to your project and your task. If you don’t know how the technology works, you can’t be sure that it does what you expect. Our deployment is slow, let’s use another CI/CD system. We have lots of data, let’s start using Hadoop. Our SQL database is very slow, we have to switch to a NoSQL database. Our NoSQL database makes our data inconsistent, we have to use an SQL database instead because it guarantees consistency. I’ve heard such proposals many, many times.
What happens after you integrate a new cool technology? Sooner or later you realize that it doesn’t fit your requirements. And you start configuring it to adapt, but since you chose the wrong technology, it’s not flexible enough to be bent to your needs. You could spend a lot of time searching for a way to adapt it, but there is no solution: it was the wrong choice. The only way out is to find a good replacement and spend time switching from the wrong technology to the right one.
Dive deep. Understanding makes you fast
So, our junior developer starts reading the Javadoc describing the hash map to learn how it works. There is a very important comment there.
/**
 * <p>Note: great care must be exercised if mutable objects are used as map
 * keys. The behavior of a map is not specified if the value of an object is
 * changed in a manner that affects <tt>equals</tt> comparisons while the
 * object is a key in the map.
 */
Hmmm, our Key class is mutable. It doesn’t control its own state: any external class can change the state by calling the setValue method.
Let’s check the code base.
Map<Key, String> map = new HashMap<>();
Key key = new Key(42);
map.put(key, "Goodbye, mad world");
//a lot of code here
//but we omit it
//This is the problem!!!!
key.setValue(0);
//a lot of code here
//but we omit it
//get value
log.debug("value is '{}'", map.get(key));
//has the map size been changed?
log.debug("map size is '{}'", map.size());
//do we still have the key in the map?
Key firstKey = map.keySet().iterator().next();
log.debug("check hash codes. equals? '{}'", firstKey.hashCode() == key.hashCode());
log.debug("is it the same key? '{}'", firstKey.equals(key));
//does map contain the key?
log.debug("map contains key? '{}'", map.containsKey(key));
Our junior developer removes the setValue method from the Key class, fixes all the calls to it, and the problem is fixed.
So, what happened? Instead of spending time googling, our hypothetical junior developer learned how HashMap works under the hood.
This is the implementation of containsKey method:
public boolean containsKey(Object key) {
    key = convertKey(key);
    int hashCode = hash(key);
    HashEntry entry = data[hashIndex(hashCode, data.length)]; // no local for hash index
    while (entry != null) {
        if (entry.hashCode == hashCode && isEqualKey(key, entry.key)) {
            return true;
        }
        entry = entry.next;
    }
    return false;
}
When you call HashMap.put, your key and value are stored in the data array as a HashEntry. The index in the array depends on the key’s hashCode. If the state of the key changes, you will never find it in the array, because the key’s hashCode has changed. In some cases, you could even find an absolutely different value associated with another key instead.
Now, it’s obvious why the key should be immutable. Should our developer relax? Yes and no :)
Just removing the setValue method doesn’t make our Key class completely immutable. Why?
That’s a good question. Actually, the best way to learn something is to be curious and ask questions.
What could be a HashMap key? Can you have a Thread as a key? Can another Map be a key? What if I pass null as a key? What if I remove a key while iterating over the hash map? How many keys can a HashMap contain? And so on…
To satisfy their curiosity, our hypothetical junior developer decided to read some books on Java. And the books say that to make an object immutable, it’s better to declare the class fields as final.
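A sketch of what the Key class could look like after reading those books: the class is final so it can’t be subclassed, the field is final so it is assigned exactly once, and there is no setter.

```java
import java.util.HashMap;
import java.util.Map;

// Immutable version of the Key class from the example above.
final class ImmutableKey {

    private final int value; // final: set once in the constructor, never changed

    ImmutableKey(int value) {
        this.value = value;
    }

    int getValue() {
        return value;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof ImmutableKey)) return false;
        return value == ((ImmutableKey) o).value;
    }

    @Override
    public int hashCode() {
        return value;
    }
}

class ImmutableKeyDemo {
    public static void main(String[] args) {
        Map<ImmutableKey, String> map = new HashMap<>();
        map.put(new ImmutableKey(42), "Hello, sane world");
        // The key's hash code can never change, so the lookup always succeeds.
        System.out.println(map.get(new ImmutableKey(42)));
        // prints Hello, sane world
    }
}
```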
In which case can the absence of this modifier break immutability? If you read the Java Memory Model specification, you can find an answer.
There could be a potential race condition, and final allows us to avoid it. This is a really rare case, and to reproduce it I had to use the -XX:-UseCompressedOops JVM option. That doesn’t mean we shouldn’t try to avoid such problems. On the contrary, we must make every effort to fix them. In fact, the problem with rarely reproducible issues is that they occur in production at the most inappropriate moment (during a demo, for example), while in your test environment they are very hard to reproduce and, therefore, very hard to fix.
I used the jcstress library to demonstrate the issue. The test creates two threads: the first one instantiates the key, and the second one reads the key’s value. This is repeated many times to hit all the possible cases and see how often each of them occurs.
@JCStressTest
@Description("Checking Key immutability.")
@Outcome(id = "[-42]", expect = Expect.ACCEPTABLE, desc = "The key has not yet been published. It's ok, the access is not synchronized.")
@Outcome(id = "[42]", expect = Expect.ACCEPTABLE, desc = "The key has been published and the value is visible. That's what we expect to see every time.")
@Outcome(id = "[0]", expect = Expect.ACCEPTABLE_INTERESTING, desc = "The key has been published but the default value is visible.")
@State
public class KeyIsNotImmutable {

    int x = 42;
    Key key;

    @Actor
    public void publish() {
        key = new Key(x);
    }

    @Actor
    public void consume(IntResult1 res) {
        Key localCopy = key;
        if (localCopy != null) {
            res.r1 = localCopy.hashCode();
        } else {
            res.r1 = -42;
        }
    }
}
Here are the results:
| Observed state | Occurrences | Expectation | Interpretation |
|---|---|---|---|
| [-42] | 155,903,168 | ACCEPTABLE | The key has not yet been published. It’s ok and not interesting. |
| [42] | 14,756,652 | ACCEPTABLE | The key has been published and the value is visible. That’s what we expect to see every time. |
| [0] | 1,630 | ACCEPTABLE_INTERESTING | The key has been published but the default value is visible. |
So, final guarantees that we never see the third case. Now we know why immutable objects should have final fields.
This is what I call curiosity-driven learning. You learn something new, you ask questions, and you run experiments to see how things work. The more you learn, the more questions you have. The more answers you find, the deeper your knowledge, and the better your productivity.
Ask questions, think, write code, and be productive!