The Three Reasons for Data Encapsulation
Most object-oriented programming textbooks and courses explain the need for access protection and data encapsulation by claiming that it separates interface from implementation, and thus allows programmers to vary the implementation independently of the interface for optimization, backend changes, or other reasons. This is wrong, and this claim is the source of a lot of bad theorizing on a host of subjects including finality and the choice between interfaces and concrete classes.
It’s not that data encapsulation does not separate interface from implementation, or that separating interface from implementation is unimportant. It’s just that this is by far the least important reason for data encapsulation; and yet it’s the primary or even only reason that’s mentioned in most textbooks and courses. Most classes will never need more than one implementation. The other two reasons you want data encapsulation apply to essentially all classes and are vastly more important in practice.
Reason #1: Design By Contract
Data encapsulation allows programmers to enforce class invariants, preconditions, and postconditions. For example, in a Clock class it is reasonable to require that the hours field always be between 1 and 12, the minutes field be between 0 and 59, and the seconds also be between 0 and 59 inclusive. These are class invariants. By marking the fields private, we can check all access to these fields and make sure that no one ever sets them to illegal values. If someone tries, the method throws an exception and does not make the change:
    public class Clock {

        private int hours;   // 1-12
        private int minutes; // 0-59

        public Clock(int hours, int minutes) {
            if (hours < 1 || hours > 12) {
                throw new IllegalArgumentException("Hours must be between 1 and 12");
            }
            if (minutes < 0 || minutes > 59) {
                throw new IllegalArgumentException("Minutes must be between 0 and 59");
            }
            this.hours = hours;
            this.minutes = minutes;
        }

        public final void setHours(int hours) {
            if (hours < 1 || hours > 12) {
                throw new IllegalArgumentException("Hours must be between 1 and 12");
            }
            this.hours = hours;
        }

        public final void setMinutes(int minutes) {
            if (minutes < 0 || minutes > 59) {
                throw new IllegalArgumentException("Minutes must be between 0 and 59");
            }
            this.minutes = minutes;
        }

        // ...
    }
However, if the fields were public, there would be nothing to stop someone from bypassing all our carefully constructed checks with a simple statement like this:
    Clock c = new Clock(12, 3);
    c.hours = -56;
This is the first reason we need access protection. Marking things private enables us to write code safe in the knowledge that values will always be in bounds. It enables us to isolate bounds checking and other constraints in a few carefully designed methods. It produces more robust, reliable code.
Reason #2: Human Interface
The I in API stands for interface, and interfaces are for people, not just machines. APIs can be complex, confusing things. The less there is of an API, the better. The smaller and simpler the API is, the easier it is to learn and use, and the more likely it is that the API will be used correctly.
The more pieces you can mark private, the more pieces are hidden from client programmers. In a language like C or Fortran without access protection, essentially every function is available, whether it was intended for public use or not. Even if you don’t document or provide headers for the private functions, people will figure them out and start using them anyway. A good private is far more effective at keeping programmers’ hands out of your class’s internal parts. This helps you stop worrying about whose code you’ll break if you change something. This helps your users because they have less to learn and understand. It’s a win-win.
There’s a second way data encapsulation improves human interface, and it falls out of Reason #1. When a class enforces its own conditions and invariants, the client programmer has less to worry about. If they pass bad data to a method, they’ll get an immediate exception. Early and obvious failure is immensely preferable to creating a bad object and not noticing the problem till something else fails somewhere down the call chain sometime later.
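As a minimal sketch of that early failure, the following trims the Clock class above down to its hours field (the getHours() accessor and the FailFastDemo class are assumptions added for illustration, and the classes are package-private only so the sketch fits in one file). The bad value is rejected at the call that supplies it, and the object keeps its last legal state:

```java
// Trimmed-down version of the article's Clock; only the hours field is shown.
class Clock {
    private int hours; // invariant: always 1-12

    Clock(int hours) {
        setHours(hours); // safe to call from the constructor because the method is final
    }

    final void setHours(int hours) {
        if (hours < 1 || hours > 12) {
            throw new IllegalArgumentException("Hours must be between 1 and 12");
        }
        this.hours = hours;
    }

    // Accessor added for this sketch so the state can be inspected.
    int getHours() {
        return hours;
    }
}

class FailFastDemo {
    public static void main(String[] args) {
        Clock clock = new Clock(12);
        try {
            clock.setHours(-56); // fails here, at the point of the mistake
        } catch (IllegalArgumentException ex) {
            System.out.println("Rejected: " + ex.getMessage());
        }
        // The failed call did not modify the object.
        System.out.println("Hours still " + clock.getHours()); // prints "Hours still 12"
    }
}
```

The client never gets the chance to build a clock that reads -56 and discover the problem three method calls later.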
Reason #3: Separating Implementation from Interface
Finally, we get to the first thing most textbooks cover. Separating implementation from interface allows the implementation to vary independently of the interface. For instance, you can switch a class from storing persistent data in a file to storing it in a database or vice versa. Sometimes this is important; but be honest: how many times have you really needed to do this? Out of the thousands of classes you’ve written, how many have had significant changes in their internal data structures? Maybe 1 in 10? And in how many have you actually needed multiple simultaneous implementations? Maybe 1 in 100, or less?
Sure, it’s good that we can separate implementation from interface, and marking fields and as many methods as possible private does this; but don’t go overboard. In particular, don’t replace all your classes that can enforce preconditions, postconditions and invariants with interfaces that can’t.
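To make the last point concrete, here is a rough sketch, with all names (HourSource, BrokenHourSource, CheckedHours) invented for illustration. The interface can only state the 1 to 12 range in a comment, so any implementation is free to ignore it; the final concrete class checks the range itself, and nothing can subclass it to undo the check:

```java
// Hypothetical interface: the 1-12 range can only be stated in documentation.
interface HourSource {
    /** Documented contract: returns a value between 1 and 12. */
    int getHours();
}

// Nothing in the language stops an implementation from ignoring the contract.
class BrokenHourSource implements HourSource {
    @Override
    public int getHours() {
        return -56;
    }
}

// A final concrete class enforces the range in code, and cannot be subclassed.
final class CheckedHours implements HourSource {
    private final int hours;

    CheckedHours(int hours) {
        if (hours < 1 || hours > 12) {
            throw new IllegalArgumentException("Hours must be between 1 and 12");
        }
        this.hours = hours;
    }

    @Override
    public int getHours() {
        return hours;
    }
}

class InterfaceContractDemo {
    public static void main(String[] args) {
        HourSource broken = new BrokenHourSource();
        System.out.println(broken.getHours()); // prints -56; the compiler is satisfied

        HourSource checked = new CheckedHours(7);
        System.out.println(checked.getHours()); // prints 7
    }
}
```

Code written against HourSource has to trust every implementer; code written against CheckedHours can rely on the check.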
June 1st, 2006 at 12:39 pm
I agree with making things private when they shouldn’t be used by anyone else. But that doesn’t mean you need to have fewer interfaces. The “interface” of the system should be well known so that the complexity of using the interface becomes a question for the documentation: “what interface do I use to accomplish X?” Unless you have an entirely closed system (even to internal changes), this question will always come up.
“Finally, we get to the first thing most textbooks cover. Separating implementation from interface allows the implementation to vary independently of the interface. For instance, you can switch a class from storing persistent data in a file to storing it in a database or vice versa. Sometimes this is important; but be honest: how many times have you really needed to do this? Out of the thousands of classes you’ve written, how many have had significant changes in their internal data structures? Maybe 1 in 10? And in how many have you actually needed multiple simultaneous implementations? Maybe 1 in 100, or less?”
Any platform that has a plugin interface will need this kind of flexibility. I deal with this daily. Each plugin has its own schema of data. The domain knowledge in the plugin is entirely separated from the system interfaces.
Now, I am not sure if you are arguing that all “design by interface” is bad or are you just saying that “you need to be able to close the interface when you want to”.
I am firmly behind the latter point:
To your clock example. It looks like you have the class for an analog clock face. Down the road your Product Manager asks you to develop a digital clock face. Now there is a problem: digital clock faces usually have AM and PM indicators on North American clocks. Also imagine that you need to develop for Europe and need a digital clock that handles 24-hour time. How would you be able to do that while still maintaining the current class as it is implemented? You can’t subclass and override your final methods, so you would have to rewrite your clock class entirely to fit this change in. Suppose instead you had built an IClock interface that has “contracts” for the constructor Clock(hrs, min) as well as virtual setHours and setMinutes methods. Your “final” class would be “AnalogClock12HR”, and now you can build “DigitalClock12HR” as well as “DigitalClock24HR”. Now tell me how designing by interface does not let you perform data encapsulation. You can entirely close your class while letting the structure be flexible.
June 2nd, 2006 at 5:51 am
The 24-hour clock example was addressed in the previous post. In brief, you can add new methods that have different conditions; but under no circumstances is a subclass allowed to relax the postconditions of methods it overrides in the superclass. That’s a violation of polymorphism. If you do that, the subclass is no longer a proper instance of the superclass. Code that depends on getting a value between 1 and 12 back from the getHours() method will break when passed an instance of the subclass that returns values between 0 and 23.
This is almost as bad as going back and changing the original class so that its getHours() method returns a value between 0 and 23. You’ve broken compatibility, and you could do this because the class was not closed. Do you need to do this sometimes? Sure, but subclassing has not gained you anything. Don’t labor under the illusion that merely because you’ve subclassed instead of changing the base class, you haven’t broken anything. You absolutely have.
The solution is simple: add new methods to the subclass that do what you want. Do not override and break the existing methods, conditions and invariants.
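One way that advice could look in code is sketched below. The Clock24 subclass, its pm flag, and the getHours24() method are assumptions made up for this illustration, and Clock is cut down to a read-only hours field. The inherited getHours() stays final and keeps its 1 to 12 postcondition; the 24-hour value is exposed through a new method instead:

```java
// Cut-down base class: getHours() is final and always returns 1-12.
class Clock {
    private final int hours;

    Clock(int hours) {
        if (hours < 1 || hours > 12) {
            throw new IllegalArgumentException("Hours must be between 1 and 12");
        }
        this.hours = hours;
    }

    final int getHours() {
        return hours;
    }
}

// Hypothetical subclass: adds a 24-hour view without touching getHours().
class Clock24 extends Clock {
    private final boolean pm;

    Clock24(int hours24) {
        super(to12Hour(hours24));
        this.pm = hours24 >= 12;
    }

    // Validates the 0-23 input and maps it onto the base class's 1-12 range.
    private static int to12Hour(int hours24) {
        if (hours24 < 0 || hours24 > 23) {
            throw new IllegalArgumentException("Hours must be between 0 and 23");
        }
        int h = hours24 % 12;
        return h == 0 ? 12 : h;
    }

    // New method with its own postcondition: returns 0-23.
    int getHours24() {
        int h = getHours() % 12; // 12 o'clock maps to 0
        return pm ? h + 12 : h;
    }
}

class Clock24Demo {
    public static void main(String[] args) {
        Clock24 afternoon = new Clock24(15);
        System.out.println(afternoon.getHours());   // prints 3
        System.out.println(afternoon.getHours24()); // prints 15
    }
}
```

Existing code that receives a Clock24 through a Clock reference still sees only values from 1 to 12; only callers that know about Clock24 see the 0 to 23 range.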
October 17th, 2007 at 4:54 pm
I need to see a good explanation from someone about why you would protect a data member, and have a public getter AS WELL AS a setter, instead of just making the member public.
I’m a developer, and all of my schooling is in software. However, nearly everyone I work with is an EE major, and they are all 10 years older than I am. I realized that I have no explanation to give them of why it is better to protect your data, even if public functions allow changes that would give the same effect as leaving the members public in the first place.
Aaron
December 15th, 2009 at 12:16 am
Aaron,
Answering your comment is an act of necromancy, but your comment was, too, so:
If you make an object public instead of providing a get and set, then you can never change the object without affecting external users. You could say “so what” to that, but your software may undergo many revisions, including changes to the way data is organized. Right now the public object might be a stand-alone float; tomorrow you might want to put it into a record, etc. As long as you don’t rename the get and set, you can change your objects’ names without affecting others.
That is the real value of encapsulation: not “hiding” what you have or do from the eyes of other developers, just ensuring independence of development. It’s not a rule, it’s a benefit to be weighed against its costs.
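A rough illustration of the float-to-record change described above follows. Every name here (SensorV1, SensorV2, Sample, getReading, setReading) is made up for the example, and the two versions are given different class names only so they can sit side by side in one file; in practice the second would simply replace the first:

```java
// Version 1: the value is a stand-alone float behind a getter and setter.
class SensorV1 {
    private float reading;

    float getReading() {
        return reading;
    }

    void setReading(float reading) {
        this.reading = reading;
    }
}

// Version 2: the value has moved into a small record-like holder that also
// stores a timestamp. The accessor signatures are unchanged.
class SensorV2 {
    private static final class Sample {
        final float value;
        final long takenAtMillis;

        Sample(float value, long takenAtMillis) {
            this.value = value;
            this.takenAtMillis = takenAtMillis;
        }
    }

    private Sample latest = new Sample(0.0f, 0L);

    float getReading() {
        return latest.value;
    }

    void setReading(float reading) {
        latest = new Sample(reading, System.currentTimeMillis());
    }
}

class RepresentationChangeDemo {
    public static void main(String[] args) {
        SensorV1 before = new SensorV1();
        before.setReading(21.5f);
        System.out.println(before.getReading()); // prints 21.5

        SensorV2 after = new SensorV2();
        after.setReading(21.5f);                // same call, different storage
        System.out.println(after.getReading()); // prints 21.5
    }
}
```

Had the original field been public, every caller that wrote sensor.reading would have had to be edited when the field moved into Sample.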
March 30th, 2010 at 12:12 am
Hi J H,
You said, “If you make an object public instead of providing a get and set, then you can never change the object without affecting external users.” Can you please explain this in programming terms? I am not able to understand what you are trying to say.
Please explain a bit more.
Thanks
April 6th, 2010 at 9:59 pm
This is precisely why I dislike dependency injection – by forcing a public getter and setter, it essentially forces all of your fields to be public. True, one can enforce limits in the ‘set’ method, but nobody ever seems to bother. After all, who has the time to meddle with each of the overwhelming myriad of get and set methods generated by the IDE?
My personal preference is to enforce immutability when possible (by only setting from the constructor) and then to limit the number of fields with public ‘set’ methods to those strictly required.
Just my take on it, YMMV. 🙂
-= miles =-
August 27th, 2011 at 1:41 am
Sorry, I don’t have much time to say why because I’m in a hurry and just happened to read your post; nonetheless I feel I have to tell you that I strongly disagree with you. I’ll come back and explain myself better later. Cheers -instantempo
January 24th, 2013 at 5:12 pm
@miles zarathustra
Many dependency injection frameworks support injection through constructors or private fields. This may not have been true 3 years ago when you made your comment.
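Constructor injection does not need a framework at all to illustrate. The following is only a sketch in plain Java, with every name (MessageSink, Greeter, ConstructorInjectionDemo) invented for the example; it shows a dependency supplied once through the constructor and held in a private final field, with no public setter:

```java
// Hypothetical dependency: anything that can accept a message.
interface MessageSink {
    void send(String message);
}

final class Greeter {
    private final MessageSink sink; // assigned once in the constructor, never exposed

    Greeter(MessageSink sink) {
        if (sink == null) {
            throw new IllegalArgumentException("sink must not be null");
        }
        this.sink = sink;
    }

    void greet(String name) {
        sink.send("Hello, " + name);
    }
}

class ConstructorInjectionDemo {
    public static void main(String[] args) {
        // The caller (or a framework) chooses the dependency; Greeter stays closed.
        Greeter greeter = new Greeter(System.out::println);
        greeter.greet("miles"); // prints "Hello, miles"
    }
}
```

The field stays private and immutable, and the null check keeps the class's precondition in one place.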