Clean Code – Chapter 6: Objects and Data Structures

Tram Ho

There is a reason that we keep our variables private. We do not want anyone else to depend on them. We want to keep the freedom to change their type or implementation whenever we choose. Why, then, do so many developers automatically add getters and setters to their objects, exposing their private variables as if they were public?

Data Abstraction

Consider the difference between Listing 6-1 and Listing 6-2. Both represent the data of a point on the Cartesian plane, yet one exposes its implementation and the other hides it completely.

The nice thing about Listing 6-2 is that there is no way to tell whether the implementation uses rectangular or polar coordinates. It might be neither! And yet the interface still unmistakably represents a data structure. But it represents more than just a data structure: the methods enforce an access policy. You can read the individual coordinates independently, but you must set the coordinates together as a single atomic operation. Listing 6-1, on the other hand, is very clearly implemented in rectangular coordinates, and it forces us to manipulate those coordinates independently. This exposes the implementation. Indeed, it would expose the implementation even if the variables were private and we were using single-variable getters and setters. Hiding implementation is not just a matter of putting a layer of functions between the variables and the caller. Hiding implementation is about abstraction! A class does not simply push its variables out through getters and setters. Instead, it exposes abstract interfaces that allow its users to manipulate the essence of the data without having to know its implementation.
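The contrast described above can be sketched as follows. This is a minimal illustration under assumptions, not a reproduction of the listings: the names ConcretePoint and PolarBackedPoint are hypothetical, and the polar-backed class only demonstrates that callers of the interface cannot tell which representation sits behind it.

```java
// Concrete point: the rectangular representation is the public contract.
class ConcretePoint {
    public double x;
    public double y;
}

// Abstract point: reads are independent, writes set both coordinates at once.
interface Point {
    double getX();
    double getY();
    void setCartesian(double x, double y);
    double getR();
    double getTheta();
    void setPolar(double r, double theta);
}

// Hypothetical implementation that happens to store polar coordinates.
class PolarBackedPoint implements Point {
    private double r;
    private double theta;

    public double getX() { return r * Math.cos(theta); }
    public double getY() { return r * Math.sin(theta); }

    public void setCartesian(double x, double y) {
        this.r = Math.hypot(x, y);
        this.theta = Math.atan2(y, x);
    }

    public double getR() { return r; }
    public double getTheta() { return theta; }

    public void setPolar(double r, double theta) {
        this.r = r;
        this.theta = theta;
    }
}
```

Swapping PolarBackedPoint for a class that stores x and y directly would not change any code written against Point, which is the freedom the private variables were meant to protect.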

Consider Listing 6-3 and Listing 6-4. The first uses concrete terms to communicate the fuel level of a vehicle, while the second does so with the abstraction of a percentage. In the concrete case, you can be fairly certain that the methods are just accessors of variables. In the abstract case, you have no clue at all about the form of the data.

In both of the cases above, the second option is preferable. We do not want to expose the details of our data. Instead, we want to express our data in abstract terms.
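A sketch of the two vehicle interfaces might look like the following. The method names and the ElectricCar class are assumptions made for illustration; the point is only that a percentage says nothing about gallons, tanks, or batteries.

```java
// Concrete: the units and the underlying variables leak through the interface.
interface ConcreteVehicle {
    double getFuelTankCapacityInGallons();
    double getGallonsOfGasoline();
}

// Abstract: the caller learns how much fuel is left, not how it is stored.
interface AbstractVehicle {
    double getPercentFuelRemaining();
}

// Hypothetical implementation with no gasoline at all.
class ElectricCar implements AbstractVehicle {
    private final double batteryCapacityKwh;
    private final double chargeKwh;

    ElectricCar(double batteryCapacityKwh, double chargeKwh) {
        this.batteryCapacityKwh = batteryCapacityKwh;
        this.chargeKwh = chargeKwh;
    }

    public double getPercentFuelRemaining() {
        return 100.0 * chargeKwh / batteryCapacityKwh;
    }
}
```

An electric car could not implement ConcreteVehicle honestly, while it fits AbstractVehicle without any change to callers.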

Data / Object Anti-Symmetry

These two examples show the difference between objects and data structures. Objects hide their data behind abstractions and expose functions that operate on that data. Data structures expose their data and have no meaningful functions. Notice the complementary nature of the two definitions. They are virtual opposites. This difference may seem trivial, but it has far-reaching implications.

For example, consider the procedural shape example in Listing 6-5. The Geometry class operates on the three shape classes. The shape classes are simple data structures without any behavior. All of the behavior is in the Geometry class.

Object-oriented programmers might wrinkle their noses at this and complain that it is procedural, and they would be right. But the sneer may not be warranted. Consider what happens if a perimeter() function is added to Geometry. The shape classes are unaffected! Any other classes that depend on the shapes are also unaffected! On the other hand, if I add a new shape, I must change every function in Geometry to deal with it.
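A minimal sketch of the procedural version, assuming simplified shapes (the fields and the use of IllegalArgumentException are illustrative choices, not taken from the listing):

```java
// Shapes are plain data structures: public fields, no behavior.
class Square {
    public double side;
}

class Rectangle {
    public double height;
    public double width;
}

class Circle {
    public double radius;
}

// All behavior lives here. Each function must know every shape type.
class Geometry {
    public double area(Object shape) {
        if (shape instanceof Square) {
            Square s = (Square) shape;
            return s.side * s.side;
        } else if (shape instanceof Rectangle) {
            Rectangle r = (Rectangle) shape;
            return r.height * r.width;
        } else if (shape instanceof Circle) {
            Circle c = (Circle) shape;
            return Math.PI * c.radius * c.radius;
        }
        throw new IllegalArgumentException("Unknown shape: " + shape);
    }

    // Adding this function required no change to any shape class.
    public double perimeter(Object shape) {
        if (shape instanceof Square) {
            return 4 * ((Square) shape).side;
        } else if (shape instanceof Rectangle) {
            Rectangle r = (Rectangle) shape;
            return 2 * (r.height + r.width);
        } else if (shape instanceof Circle) {
            return 2 * Math.PI * ((Circle) shape).radius;
        }
        throw new IllegalArgumentException("Unknown shape: " + shape);
    }
}
```

Adding a Triangle, by contrast, would mean editing both area and perimeter, and every function added after them.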

Now consider the object-oriented solution in Listing 6-6. Here the area() method is polymorphic. No Geometry class is needed. So if I add a new shape, none of the existing functions are affected, but if I add a new function, all of the shapes must be changed!
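The object-oriented counterpart can be sketched like this, again with simplified, assumed fields:

```java
// Each shape hides its data and supplies its own area().
interface Shape {
    double area();
}

class Square implements Shape {
    private final double side;

    Square(double side) { this.side = side; }

    public double area() { return side * side; }
}

class Rectangle implements Shape {
    private final double height;
    private final double width;

    Rectangle(double height, double width) {
        this.height = height;
        this.width = width;
    }

    public double area() { return height * width; }
}

class Circle implements Shape {
    private final double radius;

    Circle(double radius) { this.radius = radius; }

    public double area() { return Math.PI * radius * radius; }
}
```

A new Triangle class would slot in without touching the existing ones, but adding perimeter() to Shape would force a change in every implementing class.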

Again, we see the complementary nature of these two definitions; they are virtual opposites! This exposes the fundamental dichotomy between objects and data structures:

  • Procedural code (code that uses data structures) makes it easy to add new functions without changing the existing data structures. OO code, on the other hand, makes it easy to add new classes without changing existing functions.
  • Procedural code makes it hard to add new data structures because all the functions must change. OO code makes it hard to add new functions because all the classes must change.

So the things that are hard for OO are easy for procedural code, and the things that are hard for procedural code are easy for OO! In any complex system, there will be times when we want to add new data types rather than new functions. For these cases, objects and OO are most appropriate. On the other hand, there will also be times when we want to add new functions rather than new data types. In that case, procedural code and data structures are more appropriate.

The Law of Demeter

There is a well-known heuristic called the Law of Demeter, which says that a module should not know about the innards of the objects it manipulates. As we saw in the previous section, objects hide their data and expose operations. This means that an object should not expose its internal structure through accessors, because doing so reveals, rather than hides, that structure. More precisely, the Law of Demeter says that a method f of a class C should only call the methods of the following:

  • C
  • An object created by f
  • An object passed as an argument to f
  • An object held in an instance variable of C

The method should not invoke methods on objects that are returned by any of the allowed functions. In other words, talk to friends, not to strangers.
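The four permitted kinds of call can be illustrated with a small sketch. All class and method names here (Car, Engine, Logger, drive) are hypothetical and chosen only to label each case; drive plays the role of f and Car the role of C.

```java
import java.util.ArrayList;
import java.util.List;

class Engine {
    private boolean running;

    void start() { running = true; }

    boolean isRunning() { return running; }
}

class Logger {
    private final List<String> lines = new ArrayList<>();

    void log(String line) { lines.add(line); }

    List<String> lines() { return lines; }
}

class Car {
    // An object held in an instance variable of C.
    private final Engine engine = new Engine();

    // The method f.
    void drive(Logger logger) {
        prepare();                                    // a method of C itself
        StringBuilder message = new StringBuilder();  // an object created by f
        message.append("engine started");
        engine.start();                               // an instance variable of C
        logger.log(message.toString());               // an argument to f
    }

    boolean isRunning() { return engine.isRunning(); }

    private void prepare() {
        // Placeholder for checks the car performs on itself.
    }
}
```

Note that drive never calls a method on something another call handed back in order to dig further into a collaborator; each collaborator is asked to do its own work.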

Train Wrecks

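Code that chains calls, such as ctxt.getOptions().getScratchDir().getAbsolutePath(), is often called a train wreck because it looks like a string of coupled train cars. Chains of calls like this are generally considered sloppy style and should be avoided. It is usually best to split them into separate statements, storing each intermediate result (the options, then the scratch directory, then its absolute path) in its own variable.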

Whether this violates the Law of Demeter depends on whether ctxt, Options, and ScratchDir are objects or data structures. If they are objects, then their internal structure should be hidden rather than exposed, and so knowledge of their innards is a clear violation of the Law of Demeter. On the other hand, if ctxt, Options, and ScratchDir are just data structures with no behavior, then they naturally expose their internal structure, and so Demeter does not apply. The use of accessor functions confuses the issue. If the code used plain field access instead, as in ctxt.options.scratchDir.absolutePath, we would probably not be asking about Demeter violations.
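The chained and split forms can be compared in a short sketch. The Context and Options classes below are assumed stand-ins built around java.io.File; only the shape of the calls matters.

```java
import java.io.File;

class Options {
    private final File scratchDir;

    Options(File scratchDir) { this.scratchDir = scratchDir; }

    File getScratchDir() { return scratchDir; }
}

class Context {
    private final Options options;

    Context(Options options) { this.options = options; }

    Options getOptions() { return options; }
}

class TrainWreckDemo {
    // The train wreck: three calls coupled into one expression.
    static String chained(Context ctxt) {
        return ctxt.getOptions().getScratchDir().getAbsolutePath();
    }

    // The same navigation, split into one step per statement.
    static String split(Context ctxt) {
        Options opts = ctxt.getOptions();
        File scratchDir = opts.getScratchDir();
        return scratchDir.getAbsolutePath();
    }
}
```

Splitting the chain makes the code easier to read and debug, but the calling function still knows that a context has options and that options hold a scratch directory, so the Demeter question is unchanged.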

This issue would be far less confusing if data structures simply had public variables and no functions, while objects had private variables and public functions. However, there are frameworks and standards that demand that even simple data structures have accessors and mutators.

Hybrids

This confusion sometimes leads to unfortunate hybrid structures that are half object and half data structure. They have functions that do significant things, and they also have either public variables or public accessors and mutators that, for all intents and purposes, make the private variables public. This tempts external functions to use those variables the way a procedural program would use a data structure.

Such hybrids make it hard to add new functions and also hard to add new data structures. They are the worst of both worlds. Avoid creating them. They indicate a muddled design whose authors are unsure of, or worse, unaware of, whether they need protection from new functions or from new types.

Hiding Structure

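What if ctxt, options, and scratchDir are objects with real behavior? Then, because objects are supposed to hide their internal structure, we should not be able to navigate through them. How, then, would we get the absolute path of the scratch directory? Two options come to mind: ask ctxt for the path directly through a single dedicated method, or ask ctxt for the scratch directory option through getScratchDirectoryOption() and then ask that result for its absolute path.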

The first option could lead to an explosion of methods in the ctxt object. The second presumes that getScratchDirectoryOption() returns a data structure, not an object. Neither option feels good. If ctxt is an object, we should be telling it to do something; we should not be asking it about its internals. So why did we want the absolute path of the scratch directory? What were we going to do with it? Many lines further down in the same module, the code builds a file name from that path and opens the file for output.

The mixture of different levels of detail is a bit troubling. Dots, slashes, file extensions, and File objects should not be so carelessly mixed together, nor mixed in with the enclosing code. Ignoring that, however, we see that the intent of getting the absolute path of the scratch directory was to create a scratch file of a given name. So, what if we told the ctxt object to do this?

That seems like a reasonable thing for an object to do! This allows ctxt to hide its internals and keeps the current function from violating the Law of Demeter by navigating through objects it should not know about.
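One way this might look is sketched below. The Context class, its constructor, and the method name createScratchFileStream are assumptions made for illustration; the idea is simply that the caller names the file it wants and never sees the directory.

```java
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

class Context {
    // The scratch directory stays private; no accessor exposes it.
    private final File scratchDir;

    Context(File scratchDir) { this.scratchDir = scratchDir; }

    // Hypothetical method: the caller says what it needs, and Context
    // decides where the file lives and how the path is assembled.
    BufferedOutputStream createScratchFileStream(String fileName) throws IOException {
        File outFile = new File(scratchDir, fileName);
        File parent = outFile.getParentFile();
        if (parent != null) {
            parent.mkdirs(); // make sure nested directories exist
        }
        return new BufferedOutputStream(new FileOutputStream(outFile));
    }
}
```

If the scratch location later moves, or stops being a directory on disk at all, only Context needs to change.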

Data Transfer Objects

The quintessential form of a data structure is a class with public variables and no functions. This is sometimes called a data transfer object, or DTO. DTOs are very useful structures, especially when communicating with databases, parsing messages, and the like. They often become the first in a series of translation stages that convert raw data in a database into objects in the application code. Somewhat more common is the "bean" form shown in Listing 6-7. Beans have private variables manipulated by getters and setters. The quasi-encapsulation of beans seems to make some OO purists feel better but usually provides no other benefit.
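The two forms can be put side by side in a brief sketch. The address fields are assumed for illustration and are not taken from the listing.

```java
// Plain DTO: public variables, no functions.
class AddressDto {
    public String street;
    public String city;
    public String zip;
}

// Bean form: the same data behind getters and setters.
class AddressBean {
    private String street;
    private String city;
    private String zip;

    public String getStreet() { return street; }
    public void setStreet(String street) { this.street = street; }

    public String getCity() { return city; }
    public void setCity(String city) { this.city = city; }

    public String getZip() { return zip; }
    public void setZip(String zip) { this.zip = zip; }
}
```

Both classes expose exactly the same information. The bean adds ceremony but no abstraction, which is why it still counts as a data structure rather than an object.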

Conclusion

Objects expose behavior and hide data. This makes it easy to add new kinds of objects without changing existing behaviors. It also makes it hard to add new behaviors to existing objects. Data structures expose data and have no significant behavior. This makes it easy to add new behaviors to existing data structures but hard to add new data structures to existing functions.

In any given system, we will sometimes want the flexibility to add new data types, and so we prefer objects for that part of the system. At other times we will want the flexibility to add new behaviors, and so in that part of the system we prefer data types and procedures. Good software developers understand these issues without prejudice and choose the approach that is best for the job at hand.


Source : Viblo