OK, so I know what a code smell is, and the Wikipedia Article [1] is pretty clear in its definition:
In computer programming, code smell is any symptom in the source code of a computer program that indicates something may be wrong. It generally indicates that the code should be refactored or the overall design should be reexamined. The term appears to have been coined by Kent Beck on WardsWiki. Usage of the term increased after it was featured in Refactoring. Improving the Design of Existing Code.
I know it also provides a list of common code smells. But I thought it would be great if we could get clear list of not only what code smells there are, but also how to correct them.
Huge methods/functions. This is always a sure sign of impending failure.
Huge methods should be refactored into smaller methods and functions, with more generic uses.
See also these related questions on SO:
[1] http://stackoverflow.com/questions/20981/how-many-lines-of-code-is-too-manyCommented out code. I realize that this one is a lot of people favorite thing to do, but it is always wrong.
We are in an age when it is easy to setup a source code versioning system. If code is worth keeping, then check it in, it is saved now and forever. If you want to replace it with a new version delete it:
I once saw a function that had over a hundred lines of commented out code, when I removed the commented out code, it was only 2 lines long.
Sometimes we comment out code during debugging and development. This is OK, just make sure to delete it before it is checked in.
Duplicate or copy/pasted code. When you start seeing things that could easily be reused that aren't, something is very wrong.
Refactoring to make reusable assemblies is the only cure.
See the Once and Only Once principle, and the Don't Repeat Yourself principle.
Simian [1], a code similarity detection tool works wonders for fixing this.
duplo [2] is a good open source project that provides a free alternative.
[1] http://www.redhillconsulting.com.au/products/simianMethods with a ridiculous (e.g. 7+) amount of parameters. This usually means that there should be a new class introduced (which, when passed, is called an indirect parameter.)
Avoid abbreviations.
Variable names such as x
, xx
, xx2
, foo
etc (obviously if you are using Cartesian coordinates, 'x' is perfectly appropriate.) Rename.
Empty catch blocks. Especially if the exception being ignored is java.lang.Exception
/System.Exception
.
If a block of code could throw multiple exceptions, there should be multiple catches. Each handling the appropriate exception accordingly.
An empty catch might mean the developer doesn't have an understanding of logic in the try, and the empty catch was added to pass the compiler. At the very least they should contain some kind of logging logic.
throws
clause. - Mehrdad Afshari
Comments that focus on what the code does and not why. The code tells you what it does (okay, there are exceptions, but that is another smell). Try to use the comment to explain why.
nested ifs [1]
if (cond1) {
/* Something */
if (cond2) {
/* Something else */
if (cond3) {
/* Something else, again */
if (cond4) {
/* Something else, again */
if (cond5) {
/* Something else, again */
if (cond6) {
/* Something else, again */
if (cond7) {
/* Something else, again */
if (cond8) {
/* Something else, again */
if (cond9) {
/* Something else, again */
if (cond10) {
/* Something else, again */
if (cond11) {
/* Something else, again */
if (cond12) {
/* Something else, again */
if (cond13) {
/* Something else, again */
if (cond14) {
/* Something else, again */
if (cond15) {
/* Something else, again */
if (cond16) {
/* Something else, again */
if (cond17) {
/* Something else, again */
if (cond18) {
/* Something else, again */
if (cond19) {
/* Something else, again */
if (cond20) {
/* Something else, again */
if {
/* And goes on... */
a severe stench emanates when a horizontal scroll bar appears
[1] http://stackoverflow.com/questions/114342/what-are-code-smells-what-is-the-best-way-to-correct-them#115975If code has lots of numbers all the way through it will be a pain to change them and you may miss something. Those numbers might be documented or commented, but comments and code can very easily get out of sync with each other. When you next read the code will you remember what the number means or why it was chosen?
Fix this by replacing the numbers with constants that have meaningful names. But don't make the names too long. It's up to you whether to import these constants from another file or limit them to the immediate scope.
Similarly for excessive amounts of string literals in the code, either use well-named constants or read them from a resource file. This can also aid internationalisation/translation efforts.
99 chances out of 100 that this code is wrong. Either it's too complicated or just badly engineered.
Find the code author, make him to explain what the darn code does until he starts to cry "I wrote it yesterday, how can I remember what it does?! I would never write such code again! I promise!!!"
Refactor to make the code plain. Everything has a good name.
quicksort(input_list)
). - quanticle
Huge code blocks in switch
statements.
Move code to separate functions. Consider using a function table (dictionary).
Reusing Variables
Using the same variable to mean different things in different parts of the same thing, just to save a couple of bytes on the stack (or even just because you couldn't be bothered declaring a new variable). It's very confusing - don't do it!
Declare a new variable - the compiler is smart enough to place them in the same memory location for you if their uses don't overlap.
Example
var x = value;
if (x > y) doSomething();
x = someOtherValue;
if (x == 'Doh') doSomethingElse();
Premature optimization
If you read application level code littered with bit shifts instead of multiplication, and similar optimization tidbits, consider educating the author about the tradeoff between optimization and readability.
Getting and re-getting the same property.
e.g.
if(getValue() != null && getValue().length() > 0
&& !getValue().startWith("Hugo") ...)
Who knows what is going on inside getValue()? Could be something costly.
I would prefer:
String value = getValue();
if(value != null && value.length() > 0
&& !value.startsWith("Hugo") ...)
Edit
I would agree with those commentators who say leave it up to the JIT to optimise the call. This will work in most cases. However, this became a bit of a problem for me in a framework where some getters did not simply pick up a property but in fact performed a hash table lookup or even disk or network action (can't remember which now). I guess the JIT can't do much there.
My point is that you don't always know what the getter does. Perhaps a naming convention such as getX for properties and fetchX for other actions might help. I just think getting/fetching a value once and then working with a copy of the value avoids the problem (unless you really know what is going on in the getter, of course)
if (Foo(getValue())
. Now your code is clearer and there's only one query and no new local - Jay Bazuzi
object.X
when there are no conceptual side effects, object.GetX()
when there are side effects. - Coincoin
Overengineered design
This is common when introducing ten thousands frameworks and then handling everything indirectly, even for very simple chores.
So you have an ORM and a MVC framework, and several different layers in your application: DAO, BO, Entities, Renderers, Factories, at least a couple of contexts, interfaces crawl all over your packages, you have adapters even for two classes, proxies, delegate methods... to the point that there isn't even a single class which does not extend or implement something else.
In this case you'd better prune some dead branches: remove interfaces wherever they don't provide useful class interchangeability, remove man-in-the-middle classes, specialize too generic methods, throw in some generics/templates where you wrote your own classes that are only wrappers for collections and don't really add any value to your design.
NOTE: of course this does not apply to larger, ever changing applications where the layers and the intermediate objects are really useful to decouple stuff that otherwise would be handled by shotgun surgery.
They are a burden on the human mind.
I ran into a piece of code such as:
if( ! value != 1 )
Quite confusing to read! I suspect the original programmer was debugging and changing the logic made the program work; however was too lazy to properly change to:
if( value == 1 )
Else
is a negativeWhen you see else
you have to mentally negate the original
condition. If the original condition already includes a
negative, then you have to work extra hard. Negate the
condition and swap the conditional clauses.
If the else
clause is the "happy path", i.e. the non-
error case, then your brain has to work to follow the flow
of code. Use Guard Clauses instead.
Even single negatives require mental effort. So it's easier to read:
if (IsSummer())
than
if (!IsWinter())
Also
Methods with boolean boolean arguments tend to hurt readability. A common example I've seen is the following:
myfile = CreateFile("foo.txt", true);
In this example, it's reasonable to assume that this snippet creates a file called "foo.txt", but we have no idea what the "true" means. However, if the snippet was instead:
myfile = CreateTempFile("foo.txt");
You'd have a much better idea of what the coder intended. With that in mind, it's generally a good idea to break up methods with boolean arguments into two methods. So that
File CreateFile(String name, boolean isTemp);
becomes
File CreateFile(String name);
File CreateTempFile(String name);
You can also get arround this by creating an enum:
myFile = CreateFile("foo.txt", FileType.Temp);
CreateFile
doesn't necessarily create a file because it's actually only creating a file handle, then things become clearer. "Creating" a readonly file means: create a handle to an existing file that provides readonly access. (Incorrectly named methods is the root cause of many bugs!) - Craig Young
setVisible
which takes a single boolean parameter, and that is an example where it is perfectly fine and intuitively to use a boolean argument. Almost all other cases is a sign of a poor API where the implementer has prioritized making the function easy to write rather than easy to use. - hlovdal
Static variables or the design patterns variant: Singletons
How to refactor:
Find the largest scope in which the variable lives, and move it there as a non-static variable (if it is a singleton, convert it to a non-singleton and make an instance to pass around).
You should now get lots of compile errors (missing reference to the static variable/singleton). For each one, decide whether it makes best sense to inject the reference as a instance member in the class (either in the constructor, if it is a mandatory dependency, or in a setter-method, if it is an optional dependency) or to pass the reference in the method-call. Make the change. This will probably propagate outwards until you reach the outer scope where the variable now lives.
A dependency injection framework, such as Guice, can make this easier. The DI framework can be configured to create only one instance of the class and pass it as a constructor parameter to all classes which use it. In Guice you would annotate the singleton class with @Singleton [1] and that would be it.
[1] http://code.google.com/p/google-guice/wiki/ScopesThis is a very good sign of presence of types violating SRP (Single Responsibility Principle).
Analyze such types carefully, Often you can find that single class contains a bunch of unrelated methods - split it. Sometimes you can find that some method is used only once (often happens after refactoring), consider purge such methods.
Inconsistent naming of variables, methods, functions etc (mixing CamelCase
with hpwsHungarian
with everything_separated_with_underscores
)
Just a general comment about code smells:
Both of my answers have received several comments like this: "But sometimes XX is the right thing to do" or "If you always replace YY with ZZ you are going to end up with an overengineering pile of ...".
I think these remarks mistake the meaning of a code smell: a code smell is not the same as an error - if they were, we would probably just make the compiler find them and return an error.
A code smell is nothing more than something that suggests that here is a possible refactoring. Smells may be more or less strong, and it is usually impossible to make hard and fast rules about them.
Sometimes a method with six arguments may be the best solution, I don't think I would like a method with seven arguments, but I would oppose a coding standard that forbid them. In some applications, a static variable might make perfect sense, but I wouldn't like that a large application hid its entire internal dependency structure in a big clump of static variables.
To summarize: code smells are simple heuristics that indicate that you might want to consider refactoring and suggest a possible appropriate refactoring.
Primitive Obsession
Always using "int" and "double" where you should have a class such as "Money" or "OrderValue" so that you can apply different logic or rounding. It also ensure method parameters are better structured.
string
is the most common object of such obsession.
...
Judging by the comments, an example is called for. Here's one I saw a couple weeks ago.
We have an application that processes Word document content. All through our app, we had string parameters that took file paths to those documents. We found a bug in one place where, due to a poorly-named parameter, we were passing in a file name rather than a file path, causing the app to look for the file (and not find it) in the default application path.
The immediate problem was the poorly named parameter. But the real problem was passing strings around. The solution was to use a class like this:
public class FileLocation
{
readonly string fullPath;
public FileLocation(string fullPath)
{
this.fullPath = fullPath;
}
public string FullPath
{
get { return fullPath; }
}
}
Your first reaction may be that this is a class that hardly does anything. You're right; but that's not the point. The point is that now instead of
public void ProcessDocument(string file)
we have
public void ProcessDocument(FileLocation file)
As long as we're constructing FileLocation objects correctly in the one place that generates them, we can pass them through the rest of the app with no worries, because now we can't possibly confuse them with file names, or directories, or any other type of string. This is the most fundamental reason to correct primitive obsession.
Smell: Testing of "normal" code instead of exceptions.
Problem: The "normal operation" or most commonly/naturally executed code is put inside if bodies or attached as an else body to some error checking.
Solution: Unless it is impossible or there is a very good reason for not to, always test for exceptions and write your code so that the normal case is written without any extra indentation.
Examples:
void bad_handle_data(char *data, size_t length)
{
if (check_CRC(data, length) == OK) {
/*
* 300
* lines
* of
* data
* handling
*/
} else {
printf("Error: CRC check failed\n");
}
}
void good_handle_data(char *data, size_t length)
{
if (check_CRC(data, length) != OK) {
printf("Error: CRC check failed\n");
return;
}
/*
* 300
* lines
* of
* data
* handling
*/
}
void bad_search_and_print_something(struct something array[], size_t length, int criteria_1, int criteria_2, int criteria_3)
{
int i;
for (i=0; i<length; i++) {
if (array[i].member_1 == criteria_1) {
if (array[i].member_2 == criteria_2) {
if (array[i].member_3 == criteria_3) {
printf("Found macth for (%d,%d,%d) at index %d\n", criteria_1, criteria_2, criteria_3, i);
}
}
}
}
}
void good_search_and_print_something(struct something array[], size_t length, int criteria_1, int criteria_2, int criteria_3)
{
int i;
for (i=0; i<length; i++) {
if (array[i].member_1 != criteria_1) {
continue;
}
if (array[i].member_2 != criteria_2) {
continue;
}
if (array[i].member_3 != criteria_3) {
continue;
}
printf("Found macth for (%d,%d,%d) at index %d\n", criteria_1, criteria_2, criteria_3, i);
}
}
Rule of thumb: Never test the normal case, test exceptions.
Treating Booleans as magical, unlike other values. When I see
if (p)
return true;
else
return false;
I know I'm seeing a rookie coder and prepare myself accordingly. I strongly prefer
return p;
p
that evaluates to true, you should still just use return p == true;
. - Kristian Antonsen
return !!p;
. Or return Boolean(p);
if maximum readability is your aim. It's the if statement that I find smelly, not the soft truthiness of the response. (That's a separate discussion entirely, and one which can be played both ways in dynamically typed languages.) - pcorcoran
Testing a bool for equality with true
bool someVar
....
if(someVar == true)
{
doStuff();
}
The compiler will probably fix it, so it doesn't represent a huge problem in itself, but it indicates that whoever wrote it struggled with, or never learnt the basics, and your nose should be on extra high alert.
_Bool
type is guaranteed to be either 0 or 1. The standard definition of bool
in <stdbool.h>
expands to _Bool
, and true
and false
expand to 1
and 0
. - dreamlax
_Bool
, it can be any type that can be converted to an int. - JeremyP
<
, <=
, ==
, !=
etc always result in 0 or 1. Any assignment to a _Bool
type always result in 0 or 1 too. But, you are right: if
and while
always check for non-zero expression rather than a boolean type (which is different in C++, which does). - dreamlax
Any thread which depends on a sleep(n) call where n is not 0 (to release other threads).
For an example of why - see here: http://www.flounder.com/badprogram.htm#Sleep
Basically - the time you set for a sleep(n) call can always be wrong.
To avoid this sort of thing, coders should be using a more "waitFor a specific event" structure than "sleep for some arbitrary time waiting for other things" which nearly always can be wrong...
More of a pet peeve: not conveying role when naming variables.
For example:
User user1;
User user2;
instead of:
User sender;
User recipient;
Also, expressing a role with respect to the wrong context. Class attributes should be named with respect to their class.
Method parameters should be named with respect to their role within the method NOT the role of the passed arguments with respect to the calling code.
Inappropriate Intimacy
When classes access each other's fields or methods too much.
Some suggestions how to refactor/avoid:
Comments that precede code such as the following:
// The following is a bit hacky, but seems to work...
...or...
// Quick fix for <xxx>
Switch statements
Switch statements might be okay, but they are a smell that shows you should consider using inheritance or a State or Strategy pattern.
One example: (obvious, but I have seen something like this several time):
switch(GetType().ToString())
{
case "NormalCustomer":
// Something
break;
case "PreferredCustomer":
// Something else
break;
}
A bit less obvious:
switch(this.location.Type){
case Local:
// Something
break;
case Foreign:
// Something else
break;
}
How to refactor
Move the switch to a separate method.
Do you already have an appropriate inheritance hierarchy?
2a. If you do, you might only need to move the individual case-statements to individual subclasses.
2b. If not: introduce inheritance - either directly, or by using the State- or Strategy-pattern.
Edit: I am not arguing that you should replace all switches with polymorphism. Only that switches are a place where you should ask yourself whether polymorphism might make the code simpler. If replacing the switch would make the code more complex ("overengineered"), just don't replace it.
In languages that support OO, switching on type (of any kind) is a code smell that usually points to poor design. The solution is to derive from a common base with an abstract or virtual method (or a similar construct, depending on your language)
eg.
class Person
{
public virtual void Action()
{
// Perform default action
}
}
class Bob : Person
{
public override void Action()
{
// Perform action for Bill
}
}
class Jill : Person
{
public override void Action()
{
// Perform action for Jill
}
}
Then, instead of doing the switch statement, you just call childNode.Action()
Not directly code, but it smells when VCS [1] log messages are empty or has such pattern:
"Changed MyClass.java.".
Fix: write useful comments telling why the change has been done, not that it was done.
E.g.: "Fixed bug #7658: NPE when invoking display of user."
[1] http://en.wikipedia.org/wiki/Revision%5FcontrolCatching exceptions in the same method that threw them. This can indicate that exceptions are being used for control flow, which is a big no-no. Use a break or goto instead.
Steve McConnell's book "Code Complete: A Practical Handbook of Software Construction" is essentially a 900 page answer to this question. It's an outstanding book.
http://www.amazon.com/Code-Complete-Practical-Handbook-Construction/dp/0735619670 [1]
[1] http://rads.stackoverflow.com/amzn/click/0735619670Dumb comments or comments which are not updated when the code changes:
// Check all widgets (stops at 20 since we can never
// have more than 20 widgets).
for(int i = 0; i < 55 && widget[i]; i++)
processWidget(widget[i]);
Large classes, that have more than one responsibility. Should then be separated.
Reinventing the wheel.
For instance, I love doing code reviews and finding some brand-spanking-new shiny version of 3DES. (Happens more often than you'd think! Even in JavaScript!)
"Whaaat? We MUST encrypt the CC/pwd/etc! And 3DES is SOOO easy to implement!" It's always a challenge to find the subtle flaws that make their encryption trivially breakable...
How to correct it - quite simply, use the platform provided wheels. Or, if there is REALLY an ACTUAL reason not to use that, find a trusted, reviewed module already built by somebody who knows what he/she was doing.
In the above example, almost every modern language provides built-in libraries for strong encryption, much better than you can do on your own. Or you could use OpenSSL.
Same goes for other wheels, don't make it up on your own. It's stinky.
Presence of GOTO statement.
Usually it means that either algorithm too complicated or function control flow is screwed.
No general practice unfortunately. Each case should be analyzed individually.
Excessive Inheritance
Many newcomers to object-oriented programming immediately pick up on the idea of inheritance. It's intuitive because it is superficially isomorphic to the way we organize concepts into hierarchies. Unfortunately, it's often the case that this is the only abstraction that they end up using, so you end up with class hierarchies like:
Buzz
BuzzImpl
FizzBuzz
BatchFizzBuzz
BatchFizzBuzzWithFoo
BatchFizzBuzzWithBar
BatchFizzBuzzWithNeitherFooNorBar
FizzBuzzThatSendsEmail
BuckFizzBuzz
BuckFizzBuzzWithFoo
...etc.
To fix, use composition rather than inheritance. Instead of having FizzBuzz inherit from Buzz, have it take a Buzz in its constructor.
Can't believe this hasn't been provided: high cyclomatic complexity [1] for a method/function. Anything over 20 (with some exceptions like dispatcher methods) should have functions extracted out.
Source Monitor [2] is an excellent tool for gathering source code metrics like this.
[1] http://en.wikipedia.org/wiki/Cyclomatic%5FcomplexityThis is from " Practical Common Lisp [1]"
[1] http://www.gigamonkeys.com/book/practical-a-simple-database.htmlA friend of mine was once interviewing an engineer for a programming job and asked him a typical interview question: how do you know when a function or method is too big? Well, said the candidate, I don't like any method to be bigger than my head. You mean you can't keep all the details in your head? No, I mean I put my head up against my monitor, and the code shouldn't be bigger than my head.
Unused variables or fields.
Remove them. There are helper tools (different IDE [1]s, checkstyle [2], etc.) which may inform you if you have some.
[1] http://www.jetbrains.com/idea/Using magic numbers for return values.
int orderID = DAL.GetOrderIDForUser(1);
if(orderID == -1)
ShowNoOrdersPage();
else
ShowOrder(orderID);
It is just a matter of time before you end up with a -2 or -3.
Enum result = DAL.GetOrderIDForUser(1)
? - CMR
enum {NoOrders = -1}
- George V. Reilly
Putting in a "temporary" fix.
Temporary fixes have a funny way of becoming permanent because you never seem to have the time/inclination/memory to go back and fix them.
If you're going to fix something, fix it the right way the first time.
If it seems like it will be a huge undertaking to change it then maybe you need to re-evaluate why it needs to be changed. If it's unavoidable that it must be changed then put in the best fix that you can in the time allotted and assume that it will be permanent (because it will be).
Variable Scope too large.
Global variables make it hard to keep track of which function modified something and when.
Refactor so that variables exist only within a function. Functions should pass information to each other via arguments and return values.
Wrong spelling of class and method names. Look up in a dictionary if not sure or use plugin in your IDE to check spelling for you.
String concatenation, specially used for giant prepared SQL-statements, when there are lines like:
String select = "select ..."
// Lots of code here
select += "and c.customer_id = ? "
select += "and da.order_id in (?, ?, ?)"
And worse, then tracking the position of the index:
preparedSt.setInt(++cnt, 15);
preparedSt.setString(++cnt, 15);
Repeatedly accessing beans or value objects properties like this:
customer.getOrders().get(0).getPaymentDetail().getAmount();
Seeing nested 'ifs' like this when single-level if's or a switch statement will suffice:
if (cond1) {
/* Something */
} else {
if (cond2) {
/* Something else */
} else {
if (cond3) {
/* Something else, again */
} else {
/* And goes on... */
}
}
}
Using two boolean variables to refer to a simple, unique condition, that can be deduced just by one of them:
boolean finished, nextIsPending;
if (task.isRunning()) {
finished = false;
nextIsPending = true;
}
Gratuitous (unnecessary) use of multithreading.
I've taken over many multithreaded applications, and I never needed a code smell to know there was a problem. Multithreaded applications are usually incredibly unreliable, and fail frequently in impossible-to-reproduce ways. I can't count the number of times I've seen an application start the execution of a long-running task on a separate thread, and then go into polling mode on the main thread (usually using Thread.Sleep(n)) and wait for the long-running task to complete.
Classes or structs with lots of member variables.
A class or struct with more than about a dozen member variables probably hasn't been correctly factored into sub-components/classes.
eg:
class Person
{
string Name;
string AddressLine1;
string AddressLine2;
string AddressLine3;
string Addressline4;
string City;
string ZipCode;
string State;
string Country;
string SpouseName;
string ChildName1;
string ChildName2;
string ChildName3;
int Age;
// and on and on and on
}
Should be:
class Address
{
string[] AddressLines;
string ZipCode;
string State;
string Country;
}
class Person
{
string Name;
Address Address;
Person Spouse;
Person[] Children;
int Age;
}
And this is just one contrived example.
This is a general criticism of poorly written or poorly designed software.
Example usage: "I was disappointed when I saw the source code. Everything basically works, but that code smells".
There are a lot of reasons why software might qualify as "smelly". People have listed quite a few specifics here.. Things like having overly complicated data structures, global variables and goto statements. But while these are all symptoms of smelly code, the truth is that there isn't a hard and fast rule. In programming, any specific problem could be solved a handful of ways, but not every answer is as good as the next.
We value code that is easy to read. Most programmers will probably spend the majority of their time reading and editing existing code, even if it is code that they wrote themselves.
Along the same lines, reusable code is also considered valuable. This doesn't mean that code is copied and pasted.. It means that code has been organized into a logical system, allowing specific tasks to be performed by the same piece of code (with maybe just a few differences each time, like a value in each calculation).
We value simplicity. We should be able to make single changes to a program by editing code in one place, or by editing a specific module of code.
We value brevity.
Smelly code is hard to read, hard to reuse, hard to maintain, and is fragile. Small changes may cause things to break, and there is little value in the code beyond its one time use.
Code that simply "works" isn't very difficult to write. Many of us were writing simple programs as teenagers. On the other hand, a good software developer will create code that is readable, maintainable, reusable, robust, and potentially long-lived.
for index 0 to len-1
style looping over a list in languages where iterators exist.
for item in collection:
person ;) - tzot
Often makes program behavior unclear and complicated, can cause unpredicted side effects.
Consider returning new value rather than modify existing one i.e. instead of:
void PopulateList(ref List<Foo> foos);
use
List<Foo> GetListValues();
Example:
List<Foo> Foos; // returns List<T> to provide access to List.Count property
Often this leads to misuse of data structures and unwanted data modifications.
Consider providing as much data as needed.
IEnumerable<Foo> Foos; // Returns iterable collections of Foos.
int FooCount; // Returns number of Foo objects.
Excessive use of code outlining
It's the new big. People use code outlining to hide their behemoth classes or functions. If you need any outlining at all to read your code, it should be a warning sign.
Consider the following:
Magic Strings
I've seen Magic numbers mentioned here more than once and I wholeheartedly agree. One often overlooked smell I haven't seen mentioned is Magic Strings.
Example:
public string GetState()
{
return "TN";
}
Solution:
public string GetState()
{
return States.TN;
}
Always be weary of code that is working with hard coded string data just as you would with hard coded numeric data. Consider making the string a constant variable with a comment explaining what it represents as a minimum. There is a really good chance that this should actually be in an Enumeration.
Side note: It's not technically a smell but it greatly annoys me to see something like this as well:
if(0 == foo)
{
DoSomething();
}
Overuse of casting
Lots of casting between (potentially) unrelated types makes it difficult to tell what type variable pointed to actually is, and is usually a sign that someone is trying to be too clever with memory usage. Use a union if you really want to do this.
Interface overuse. Any time anyone thinks that interfaces are the answer to all their problems and decides to use the same pattern everywhere, there is something wrong at a low level.
Time to sit these people down and make them understand that they can code simple, clear classes without interfaces too.
Comments often make code smell. Steve McConnell said it before me, but
Summary: strive to move information from comments into code!
Smell: Operator overloaded to perform a non-intuitive function. e.g. operator^ used to convert a string to uppercase.
Problem: Very likely clients will use it incorrectly.
Solution: Convert to a function with a sensible name.
When a method contains out parameters, especially when there are more than one out parameter, consider returning a class instead.
// Uses a bool to signal success, and also returns width and height.
bool GetCoordinates( MyObject element, out int width, out int height )
Replace with a single return parameter, or perhaps a single out parameter.
bool GetCoordinates( MyObject element, out Rectangle coordinates )
Alternatively you could return a null reference. Bonus points if the class implements the Null Object pattern [1]. This allows you to get rid of the boolean as the class itself can signal a valid state.
Rectangle GetCoordinates( MyObject element )
Further, if it makes sense, have a specialised class for the return value. While not always applicable, if the return value is not a simple true/false for success then it may be more appropriate to return a composite of the returned object plus state. It makes caller's code easier to read and maintain.
class ReturnedCoordinates
{
Rectangle Result { get; set; }
CoordinateType CoordinateType { get; set; }
GetCoordinateState SuccessState { get; set; }
}
ReturnedCoordinates GetCoordinates( MyObject element )
Admittedly overuse of this can lead to further bad smells.
Note that [out] parameters are still useful, especially in the following Tester- [2] Doer [3] pattern.
// In the Int32 class.
bool TryParse(string toParse, out int result)
this is far more efficient than
// BAD CODE - don't do this.
int value = 0;
try
{
value = int.Parse(toParse);
}
catch {}
when you expect the input string toParse
is probably not valid.
In many cases, the presence of any [out] parameters indicates a bad smell. Out parameters make the code harder to read, understand, and maintain.
[1] http://exciton.cs.rice.edu/javaresources/DesignPatterns/NullPattern.htmConditional spree
Complex behaviour implemented by intricated if...then...elseif... blocks or long switch...case copy/pasted all over the class(es).
Suggested refactoring: Replace Conditional with Polymorphism [1].
NOTE: overusing this strategy is also "code smell".
NOTE2: I love the Kent Beck quotation from one of his books on Extreme Programming.
If it smells change it (Grandma Beck on childrens rearing).
(or something like that, I don't have the book handy right now).
EDIT: For a comprehensive list have you considered this post on Coding Horror [2]?
[1] http://www.refactoring.com/catalog/replaceConditionalWithPolymorphism.htmlThe first wiki ever (http://c2.com) has a lot of stuff about refactorings and code smells.
http://c2.com/cgi/wiki?CodeSmell
This is my main source of information about refactorings.
Side-effecting getters:
public Object getMyObject() {
doSomethingThatModifiesSomething();
return myObject;
}
Negated booleans.
Wrong:
Dim NotReady as Boolean
...
If Not NotReady Then
Right:
Dim Ready as Boolean
...
If Ready then
IsUnready
- jpbochi
Common error of novice programmers. Instead of handling I/O exceptions, they check file for existence.
Forget about File.Exists-like methods unless you use files as markers\locking objects. Always handle file I/O errors when trying to read/write some meaningful data.
In C#, invoking GC.Collect repeatedly, for any reason. This is a serious pet-peeve for me, and is almost always indicative of "Let the garbage collector handle it!" syndrome.
Yes, in C# we have a garbage-collector. No, it doesn't make up for messy allocation/deallocation logic.
To correct: disable the GC.Collect code, use perfmon, CLR memory counters, and find those leaks. Make sure that dispose is implemented properly, and called properly.
Use "using" statements whenever possible. Refactor if necessary to make "using" statements work for you.
They call it a self-tuning garbage collector for a reason, trust it to do it's job properly.
Inconsistent enum members.
I know 2 variations of this problem:
enum members that actually belong to different domains (good example is .NET BindingFlags).
Tricky enum members:
enum MathOp
{
Plus,
Minus,
Empty
}
This often happens when enum values used in UI.
Cure:
Exception handling blocks which say only:
/* We are SO SCREWED! */
Trying to be more precise in an error case by parsing the error message.
I often saw something like
try {
File f = new File("anyfile.txt");
if(file.isDirectory()) {
throw new IOException("File is a directory!");
}
file.open();
}
catch(IOException ex) {
if(ex.getMessage().indexOf("File is a directory")>=0) {
System.out.println("The file is a directory!");
}
else if(ex.getMessage().indexOf("File does not exist")>=0) {
System.out.println("The file does not exist!");
}
}
The strange thing is, if you change the error message, the behavior of the code will change ;-)
How to avoid:
Split the code-block and react to the errors individually. More to type but definitely worth it.
Smell: Long lines of code
My definition of a long line of code (because I'm a .NET developer), is any line of code that requires a horizontal scroll bar to be viewed in the editor in Visual Studio (without collapsing the toolbox or the Solution Explorer pane). The developer should visualise the poor programmer working at a low resolution, with a seemingly never ending horizontal scroll bar.
Example:
Dim cn As New SqlConnection(ConfigurationManager.ConnectionStrings("DatabaseConnectionString").ConnectionString)
Solution: Use New Lines
Break up your code into appropriately sized pieces, not bite sized, not generous, but just right.
Example:
Dim cn As New SqlConnection( _
ConfigurationManager.ConnectionStrings("DatabaseConnectionString") _
.ConnectionString)
How about code without indentation. I seen a friend of mine with a master's degree write software without indentation and using variables like x, x2 and y. He actually applied to a position at Microsoft and sent them a code sample like this. How fast do you think they tossed it in the garbage???
Code is for humans to read.
Please indent.
What would you do if you received un-indented code as a part of an interview?
Printing "Your password may (only|not) contain the characters...". Ever. Ditto for comments and most other non-key text fields. The existence of such restrictions is a sure sign that the data is being mishandled somewhere, unless the code is just preemptively being a pain in the user's ass.
To fix:
Unnamed boolean parameters to functions, especially when there is more than one. Here is such a function declaration (in a C-like pseudo language):
cookBreakfast(boolean withEggs, boolean withToast, boolean withJuiceNotMilk);
The function call is incomprehensible:
cookBreakfast(true, false, true);
Solution: use enums or named parameters instead. How this is done will be language dependent.
cookBreakfast(eEggsYes, eToastNo, eJuice);
or
cookBreakfast( withEggs => true, withToast => false, withJuiceNotMilk => true);
or
BreakfastOrder bo;
bo.withEggs = true; bo.withToast = false; bo.withJuiceNotMilk = true;
cookBreakast(bo);
Reassigning parameters in the method body is a bad smell. It's a source of confusion and can be a source of errors when the code is edited later.
Was the programmer trying to alter the caller's reference, or were they just lazy and unimaginative? Was it a mistake?
void Foo( MyClass x )
{
if( x.SomeProperty ) ....
// ...
if( someCondition ) { // yuck!
x = new MyClass(); // reassign's local reference x, parameter x is lost
}
// ...
DoSomething(x); // which x should this be?
}
To fix, create a new variable, or consider refactoring such that reassignment is not necessary.
Useless logging....
try {
}
catch (Exception e)
log.error(e.printStackTrace());
}
Instead try to think about what sort of error might occur, and put something useful in the logs. This could be something like, "Properties File Not Found" or "Unable To Connect To Database"
Try to catch specific errors, rather than a general exception so that when it fails, the program you wrote won't be immediately blamed. For example, if there is a connection error, put it in plain english, "Database Connection Error".
Better yet....handle the error in the code without necessarily making it to the catch block.
Assumptive code.
Code that assumes something has happened before it, or assumes that a value has been set. I know some compilers like to tell you about it but I have seen this very widespread in PHP in particular. An example.
if ( $foo == 'bar' ) {
$bar = true;
}
if ( $bar ) {
// code...
}
This becomes a huge problem when poor structure doesn't create objects. Then later code starts using the objects, or worse someone directly sets values into an object that doesn't exist and PHP helpfully creates a standard object, with the value in it. So later checks for is_object return true.
Solution.
If you are going to start using an object make sure that it actually exists.
$object->foo='bar';
Will create an object but it won't be the object that you think it is. Accessors are there for a reason. Use them when ever possible. This also removes the problem of assuming something is there to use as the script will error out and then it has to be fixed.
A common thing I see with new programmers is having 20 includes at the top of a header file. I found that the developer is trying to do too much in one class/file (depending on language) or they are calling everything in an assembly to simply use one object/method/whatever in it.
Another one that has raised its head lately. Three state booleans. Yep you heard me right, three state booleans. Database fields set to be a boolean but allow null. So you can get true, false and null from them. Then code that checks not only for value but also type. Since 3 state booleans don't exist this causes some major headaches.
I hate it when I see code like this:
if (a==b){
//Do This
if (c==d) {
//Do That
}
else
{
//Something Else
}
}
else
{
//Do This Again
if (c==d) {
//Do That Other Thing
}
else
{
//Something Else Again
}
}
especially if most of the code is common between the cases.
it looks much nicer if we convert the code to separate functions instead of copy paste, and write something simpler like:
dothis();
if (c==d)
if (a==b) dothat() else dothatotherthing();
else
dosomethingelse();
All we need to do is analyze the logic, and simplify the code. encapsulating code in small functions with descriptive names is always a good thing to do.
Lot of static methods
This means that those methods do not belong to the class in which they are. So move them to a relevant class.
Overuse of meaningless variable names
If you see a function that has lots of variables like "a, b, c, x, y, z", it can indicate that the function's logic was poorly thought out. Clear variable names tend to show that the author has a more clear understanding of the operation of the function, and also assist the reader's understanding. It should be refactored by renaming the variables to something more descriptive.
Methods that are exactly the same except for one parameter.
I recently picked up an applications to review that had 20 methods, every pair of methods were exactly the same except they were processing two different types of data...
I refactored this into a base class with the majority of the functionality and two child classess that only overrode the processing that was different between the two types of data.
Much easier to understand and if a change to the way things were processed was required I usually only had to make the change in one place in the base class.
C# (perhaps smelly in Java too): Use of collections of type object
To me this smells really funny and indicates that the purpose of a collection may have not been thought out very well. As far as I know, these should only crop up in implementations of something like a completely generic property bag paired with some helper method that performs an attempt to cast and retrieve.
Otherwise this usually indicates the objects going into the collection should implement a common interface which in turn would be the type of the collection elements.
Global variables
Normally most developers with even a thimble worth of knowledge stay away from them. But somewhere down the line, in some year, somebody inevitably adds one to short circuit some logic or just get a legacy system to work.. further down the line this causes issues in multithreaded systems which are caught much much later and almost too easily escape regression tests.
Lack of abstraction.
Description: the code is written such that everything happens on the same level. There is no separation between different concerns and no splitting into parts.
Key indicators: you suddenly find presentation code in the business layer, or you find business code in the data access layer. The line count of a feature is much too big for what it does. The code space looks 'flat' and you don't find yourself having to look up and down the chain of abstraction.
Fixing: refactoring is key as always. Define your abstraction layers; for instance data access, business logic and presentation. Then slowly begin pushing code into the right layer when you find it. Suddenly other code smells will show up in each abstraction layer (code duplication is common) making it possible to further simplify the code. It is very much possible to refactor such code into elegance.
Code Smell: A giant chain of classes descending from each other, and only the very last one is ever used.
Solution: Usually this says the design is waaay over-engineered. Usually the descendants can be rolled up into a simpler parent-child relationship.
Comments used to mark out unrelated or loosely related sections of code. Usually means that a file is trying to do too much and should be broken apart into separate files/classes.
//########### Code to do foo ###########
// 500 lines of code...
//########### Code to do bar ###########
// another 500 lines of unrelated code...
//########### Code to do baz ###########
// ...
Putting too much meaning into boolean parameters. Eg, the method that starts with:
public void Foo(bool isMonday)
{
int hoursToCheck = 24;
bool ignoreHeader = false;
string skipLinesContaining = "";
if (isMonday)
{
hoursToCheck = 12;
ignoreHeader = true;
skipLinesContaining = "USD";
}
...
}
The isMonday parameter is loaded with too much meaning, and the three implied parameters should be passed on their own.
The same smell manifests itself in enums and configuration settings as well. Be on the lookout for boolean-like parameters that have vague names that could imply many assumptions.
isEasterSunday
? - CMR
Voodoo code
Code that repeats a call do something more than necessary, just in case.
Or, my favorite example: putting try/catch blocks around code that can't possibly throw an exception. Again, just in case.
Smell: Database columns with names like value1, value2 ... valueN
Problem: Modleing a many-to-many relationship without a marriage table.
Solution: Create a marraige table to normalize the data model.
Many Helper Classes
Prolific use of helper type classes. I'm defining a helper class as one that contains a bunch of related methods that do some common task. They are typically used when you find yourself repeating the same type of code over and over again.
I believe that this either points to lazy coders not thinking about the best place to put the code in the existing API or a failure of the API itself not providing decent default behavior.
Too many classes, too many objects, too many event-like actions
Not every noun should be a class, and not every verb should be a method. If an object doesn't exist to capture and retain user input, be sure you really really need it.
Examples:
If you are given an array of x,y points, and you want to produce a graph, chances are all you need is a routine to draw it, not build it out of point and line objects.
If you have a tree structure, and there is a property of the tree that depends on properties of its subtrees, chances are all you need are functions that sweep through the tree, not a whole lot of event-handling to propogate change events from children upward.
Register and Unregistering anything smells because those are events whose purpose usually is to incrementally maintain correspondence between things. The rationale is usually to save cycles. Machines are incredibly fast these days. You could possibly just rebuild the structure in less time than you would ever notice, and it would take a whole lot less code and bugs. Another way is to get good at Diff-type algorithms to maintain correspondences.
Back pointers smell. Usually the reason is to save cycles. Then you need unit tests to try to prove that they never get inconsistent. If you design the data structure so there's almost no way you can change it to something inconsistent, then there's almost nothing to unit test.
Everybody loves object-oriented programming, right? But don't let that mean the more objects the better! Let it mean the fewer objects the better. Just because event- or message-based programming is a good way to handle certain problems, don't let that mean it's the ideal paradigm for everything. One may think they're saving cycles, but every cycle saved costs a 100 in managing complexity. Usually this turns into a creaky monstrosity that is never quite right because of messages being forgotten or duplicated or happening at the wrong times.
Comment written on a proposal of mine by my boss, a long time ago: KISS. "What's that mean?" I asked. "Keep It Simple, Stupid!" was the reply. Been living by that ever since.
Wow, most bad smells gone already. Ok in C++ how about delete this
, whereby an object commits ritual hari kari without letting anyone else know. I've seen this used variously for watcher and status monitoring routines, and it often smells pretty bad, notably when there are references to the same object elsewhere in the program that aren't informed of the objects demise.
The way I usually refactor this is to make the object pointer a member of another object with broader scope, e.g. the application object.
Input variables that are then modified within a routine. If you ever need to revert back to what was passed in, it has already been changed. I always set an input variable to some form of working variable. That way, you can always reference the original value and it keeps the code clean.
Catch blocks that simply do exception.printStackTrace()
Exceptions should be handled properly, not simply printed.
This applies to product-level code... I'd be more lax on this rule if it's for an internal tool.
Not checking for null arguments on publicly visible methods.
Explanation
Any method which is publicly visible assumes that all its arguments have a value.
Solution
Check for null arguments and either deal with them if it's possible or just throw an ArgumentNull exception.
Pattern Duplication
Not just copy/paste of lines but similarity in the methodology. For example, always setting up a transaction, calling arbitrary methods on objects, returning a list of objects, tidying up, etc. This could be refactored into a base class
Catching un-checked exceptions - i.e. NullPointerException
In contrast to the aforementioned "megaclasses", the opposite (nullclass?) are also smelly: classes that have absolutely no responsibility and are simply there for enterprisey generica. Same goes for methods that call the next method with the same arguments, which then call the next method with the same arguments, etc.
The solution, as with the megaclass, is to properly define proper responsibilities for each class, and purpose for each method. Try to flatten the class hierarchy as much as possible, adding complexity and abstractions only when absolutely necessary.
General error handling
In languages where exception handling is possible, typical bug is to have all exceptions caught. This means the developer is not aware what kind of errors could occur.
For example in PL/SQL 'EXCEPTION WHEN OTHERS' smells.
Too many dimensions for arrays, like:
double *********array = NULL;
That's definitely a call for better data handling.
Nicely disguised question. Voting smells up and down :)
Isn't this duplication of the refactoring book? Since all of this is already available in a nice place called http://refactoring.com/catalog/index.html with examples to boot..
Configurable constants included in the code
Suggest you read Refactoring: Improving the Design of Existing Code by Martin Fowler, Kent Beck, John Brant, and William Opdyke (Hardcover - Jul 8, 1999)
// This should never happen.
Catching an exception to just rethrow it, in the same form or another form.
It is very common (unfortunately) to meet this kind of code (Java) with 3 different ways to (not) deal with caught exceptions:
try {
...
} catch (ExceptionA e) {
throw e;
} catch (ExceptionB e) {
log exception
throw e;
} catch (ExceptionC e) {
throw new RuntimeExceptionC(e);
}
This code is plain useless: if you don't know what to do with an exception then you should let it pass through to the caller. Moreover, this bloats the code and hinders readability.
Note that in the third case (throw new RuntimeExceptionC(e);), we can argue in some specific cases where this might be useful (normally rare however).
Annother Double negative issue in C#
if (!!IsTrue) return value;
I have seen he double !
cause bugs in the past.
instead remove the !!
to avoid confusion
The Smell
When dependencies are directly instantiated within the dependent class, managing those dependencies rapidly becomes a maintenance nightmare. In addition, when dependency management is fully encapsulated within the dependent class, mocking those dependencies for unit testing is impossible, or at best extremely difficult (and requires things like full-trust reflection in .NET.)
Solution: Dependency Injection
The use of Dependency Injection (DI) [1] can be used to solve most dependency management issues. Rather than directly instantiating dependencies within the dependent class, dependencies are created externally and passed into the dependent class via a constructor, public setter properties, or methods. This greatly improves dependency management, allowing more flexible dependencies to be provides to a single class, improving code reuse. This allows proper unit testing by allowing mocks to be injected.
Better Solution: Inversion of Control Container
A better solution than simply using dependency injection is to make use of an Inversion of Control (IoC) [2] container [3]. Inversion of Control makes use of classes that support DI, in combination with extern dependency configuration, to provide a very flexible approach to wiring up complex object graphs. With an IoC container, a developer is able to create objects with complex hierarchical dependency requirements without needing to repeatedly and manually create all of the dependencies by hand. As IoC containers usually use external configuration to define dependency graphs, alternative dependencies may be provided as part of configuration and deployment, without the need to recompile code.
[1] http://en.wikipedia.org/wiki/Dependency%5FinjectionI'm thinking of this one I saw when trying to change my password somewhere:
NOTE: Using a colon (“:”) in your password can create problems when logging in to Banner Self Service. If your password includes a colon, please change it using the PWManager link below.
Solution: Don't be lazy. Sanitize user input properly.
Smell: query in a program that is either "SELECT *" or an insert without column names.
Problem: if the structure of the table changes, the code will break.
Solution: be explicit about what columns are being selected or inserted.
Allowing access to objects you own
Smell: long chains of .GetX().GetY().
Problem:
If you allow users to get access to the objects used by your class, either by making them public or via a get method then your just asking for trouble, someone could come along and do:
A a;
a.GetB().GetC().GetD().SetSize(43);
2 things are wrong with this.
Joe Bloggs can come along and suddenly change the size of D, in fact he can do almost anything he wants with D. This won't be a problem if you've taken that into consideration whilst writing A, but how many people check that kind of thing?
The users of your class can see and have access to how its implemented. If you want to change something, say implement C so that it uses a Q instead of D, then you'll break everyone's code that uses the class.
Solution: The fix depends on how your class will be used, but in both cases the first step is to remove the GetX().
If a user really does need to be able to call SetSize(43) then you should write a wrapper function in each of the classes that passes the new value down. Then if you choose to implement C so that it uses a Q instead of D then no one apart from C will have to know about it.
A a;
a->SetSize(43);
class A
{
SetSize(int size){b.SetSize(size);}
};
etc.
If the user of the class shouldn't need to call SetSize then just don't implement a wrapper for it.
If you find that most of D's functions need to be pulled up to A then this may indicate that your design is starting to smell, see if there is a way to rewrite C and B so they don't rely directly on D.
Naming by Type
It is totally useless to name a variable, member or parameter by type. In former times this may had sense, but today a simple click on or hover over the name in the IDE shows the declaration.
Better to use the meaning of the value for the context.
Unfortunately some IDEs like Idea have the bad "functionality" of completing a name by writing the type. Never use this!
Smell:
File1:
def myvariable;
File2:
def my_variable;
File3:
def myVariable;
Problem: Spaghetti inclusions (if it's not just poor variable naming). Someone is sidestepping redefinition compiler errors (in a statically typed language) or data loss (in a dynamically typed language) by using different variables for the same purpose in each file.
Solution: clean up the inclusion mess, narrow the scope of each instance of the variable (preferably through some form of encapsulation) or both.
A ridiculous number of compiler warnings.
Even compilers dislike smelly code. I use Visual Studio with Resharper. Many of items posted in this thread would be warnings. i.e. unused variables, variables assigned a value that is never used, unused private methods, hiding inherited members etc.
Too many big blocks of code.
Variable naming inside classes
Variable names are good enough when understood within the scope of the class.
I shouldn't have to write Shader.ShaderTextureSamplers[i].SampleTexture.TextureWidth
, when Shader.Samplers[i].Texture.Width
is just as readable.
Unnecessary if .. else block
Code like this should be better avoided:
if(map.find(key) != map.end())
return true;
else
return false;
Replace it with just one statement:
return map.find(key) != map.end();
Got a couple lately that defy logic and boy do they smell.
php mysql calls that use $_POST and $_GET directly in the sql.
function calls in PHP function($value=false); then there is comments in the function saying that this never worked right and they didn't know why.
mysql calls that have "or die(mysql_error());" on the end.
No abstraction layer, each database call goes through the motions of creating the database connection.
I could just go on and on about this code, but I have a rm -rf to run.
Multi function functions
If a function is doing more than one thing. This has been mentioned implicitly in other answers but it should get a mention on its own.
Example:
// if function altList = null function returns a list of X ids
// if altList is set the function returns a list of objects of type Y
Function getData(searchStr, altList=null)
{
// common code
some code;
if(altList == null)
{
other code;
return array(int);
}
else
{
more other code;
return array(object y);
}
}
This really should be three functions:
Give your code a good bath using quality soap, then air out for a week, the smell will be gone.
loops with lots of exit points:
for (...)
{
if (... ) {
continue;
}
else if ( .. )
{
if (... ){
break;
}
}
}
Smell: In C++, an object is dynaimically created and assigned to a bare pointer.
Problem: These must be explicitly deleted in all paths out of the program, including exceptions.
Solution: Manage the object in a smart pointer object, including std::auto_ptr, or one from the Boost libraries.
Ignoring all the fundementals or any particular construct; if it feels wrong then you've just got a whiff of a smell. Why - well because you will need to use the code you've just made and that 'wrong' feeling will give you little confidence when using it. A second opinion may be required if you're having trouble refactoring.
Too short code fragments or structured to death
If I see a lot of short source files, object definitions and methods with only one row real content, I feel this project was structured to death. More brackets then code lines is the first sign.
Dare to write complete (but small !) code snippets/methods/etc. without feeling the pressure to stamp out into unreadable particles.
Defensive coding
Code where there are a lot of null checks is a smell. If this is the case then there is some issue with your design.
Use of specific class instead of interface when declaring variable. Smells especially bad in combination with Java's collection framework.
Instead of
ArrayList list = new ArrayList();
use
List list = new ArrayList();
When Map is used to hold unique elements. E.g.
Map map = new HashMap();
map.put(name, name);
Use Set instead:
Set set = new HashSet();
set.add(name);
Use of Vector and Hashtable in Java. This usually means that programmer knows nothing about collection framework and newer versions. Use List and Map interfaces instead and any implementation you like.
"Orphaned" functions that are all put into a file without any organization/order. For example, an 8000 line ASP file of 100 functions that look like spaghetti code or the beginnings of it. There may be more than one smell to this, but when I come across it, there is some pain in having to maintain legacy applications that have this "feature" ;)
The fix for this is to create some classes that group the functions and determine which are useful functions that go into a class and which should be refactored into something else.
Usually happens when using boolean-like enums ( enum { Off, On } ).
Consider using if-else statements
Usually means that code author deserves to be beaten hard with baseball bat.
Don't even try to do it unless you have a very good reason. Did I mention baseball bat, no?
Protected Member Variables
The only place I have ever come across these is when I find the source of bug.
When these are used in code there is normally no distinction between a private member and a protected member. A developer could easily change its value thinking they can see all uses of it in the current class so it wouldn't cause a problem.
You can replace protected
with private
and add some protected accessors functions. Developers are used to calling base class functions so it would be easier to understand what is happening.
A comment containing the word TODO
{
//TODO clean this up a bit
if (klwgh || jkhdfgdf || ksdfjghdk || bit << 8 == askfhsdkl)
if (klusdhfg)
return 1 + blah;
}
In OO languages, private members exposed through getter and setter methods. I'm frightened how many times I've seen students write code with members that are supposedly private but can be read or written through public getter and setter methods. Such classes 'expose the rep' just as thoroughly as public members, but with a lot more cognitive overhead.
Remove the smell by figuring out what secret the module or class is supposed to hide and reducing the number of methods accordingly.
Huge methods/functions. This is always a sure sign of impending failure.
Huge methods should be refactored into smaller methods and functions, with more generic uses.
Potential Solution:
I've found that a really great way to avoid this is to make sure you are thinking about and writing unit tests as you go.
You find whenever a method/function gets too large, it becomes very difficult to work out how to unit test it well, and very clear on when and how to break it up.
This is also a good approach to refactoring an existing huge function.
Weird constructs designed to get around best practice
e.g.
do
{
...
if (...)
break;
...
} while (false);
That's still a goto, even in disguise.
The reek [1] project will help automatically check for code smells in Ruby.
[1] http://github.com/kevinrutherford/reek/Dynamically generated SQL smells. It has its limited uses, but in general makes me nauseous.
A very simple example below ...
Don't:
DECLARE @sql VARCHAR(MAX), @somefilter VARCHAR(100)
SET @sql = 'SELECT * FROM Table WHERE Column = ''' + @somefilter + ''''
EXEC(@sql)
Do:
SELECT * FROM Table WHERE Column = @somefilter
Lack of ASSERT()
New guys never seem to use them. I prefer to break my code as early as possible. At the very least, ASSERT() all input parameters (and some return values from functions which you call).
MY own assert is a macro which calls the standard #ASSERT() if I am testing (#ifdef TESTING) otherwise, in a production system it writes something to the system log file.
Switch cases that fall through
switch (value)
{
case 1:
case 2:
case 3: // some - usually very long - block of code
break;
case 4: ...etc
}
These are always worth look at closely, as they invite bugs. And they can generally be worked around by putting the common code into a function. Look for a break
after every case
;
Functions in C++ that take pointers as arguments and then check for NULL on the first line. Pass by reference or const-reference. Makes the contract so much clearer.
My list - http://computinglife.wordpress.com/2008/06/03/what-really-is-bad-code-levels-of-bad-ness/
Excerpts -
From a database perspective (SQL Server to be specific) More than two layers of nested ifs - very easy to have a logic error Dynamic SQl in a stored proc - make sure that you aren't open to sql-injection attacks Cursors - is there a set-based solution? More than 5 joins especially when no columns from some of the joined tables are in the result set (do we perhaps need to denormalize for performance?) Use of subqueries instead of derived tables Over use of user-defined functions use of Select *
everyone is pointing out specific things that bother them and that they consider a "code smell".
I think after a while, once you get the hang of it, you sort of develop a "sixth sense" about what is wrong with a piece of code. You may not be able to immediately pinpoint how to refactor it or make it cleaner, but you know something is wrong. This will drive you to find a better pattern/solution for it.
Understanding all the examples others have posted is a good start for developing this sense.
Any code that is repeated, or any instance when a variable is assigned more than once.
Both are appropriate in certain circumstances, and given the constraints of the environment, but still.
Rediculous comments:
catch
{
/*YES- THIS IS MEANT TO BE HERE! THINK ABOUT IT*/
}
Apologies if this was already mentioned, but I just looked through the answers and didn't see this mentioned.
Code Coupling can make maintenance a nightmare. If you have display code directly tied into with other logic, making anything but routine maintenance will be all but impossible.
When you are designing your display code, think long and hard about your design and try to keep the design part separate from the rest of your app.
Smell: looping through an ordered resultset using a cursor in SQL (or walking an ordered recordset in middleware, etc) aggregating values, setting values based on other values, etc.
Possible problem: the coder hasn't looked for a set based solution. Put another way, they haven't yet had the epiphany that SQL is a declarative language and a SQL DML statement is more like a spec than a piece of code e.g. when I write...
UPDATE MyTable
SET my_column = 100
WHERE my_column > 100;
...I'm telling the DBMS what to do rather than how to do it.
A large number of Ifs or Cases usually begs for creation of new classes that extend some base class.
When you are doing the same thing (manipulating an object, doing similar SQL calls, processing data, changing a control) more than once in more than one place it's time to refactor. That gives way to smelly redundant code.
Second-guessing assertions.
Ran across this recently:
assert(vector.size() == 1);
for(int i = 0; i < vector.size(); ++i) {
do_something(vector[i]);
}
If you're asserting that there's only one item in the vector, you don't need the loop:
assert(vector.size() == 1);
do_something(vector.front());
I don't want to go into lots of boring detail; there was a good reason for having the vector for other cases, but in this branch of the code it should have always had size 1.
Obviously it's not a hard and fast rule, but to me it increases the complexity of the code (introducing a loop, another level of indentation) when you're saying that you don't ever expect to need it.
A quick search suggests that this post is the first identification of the code smell "Intelliscents":
Extravagantly roundabout code bespeaking the typist's lack of familiarity with the classes and established idioms of some .net namespace, and his/her reliance on Intellisense to solve a problem at hand.
And my code is redolent with (of?) it.
Structure or class with a dozen or more member fields. There is probably not a coherent abstraction here. To remove the smell, break the structure or class into pieces. A good source of ideas is Raymie Stata's dissertation. [1].
[1] http://publications.csail.mit.edu/lcs/pubs/pdf/MIT-LCS-TR-711.pdfeval('some PHP/JS/Perl code')
Common to most scripting languages, and useful occasionally. But nearly every case i've seen in the real world was an ugly, bug-ridden example of arbitrary code injection just waiting to happen. If you see code that's using it, it's almost certainly doing something it shouldn't need to be doing. (The sole exception that springs to mind is dynamically generated classes -- and even then, i'd at least look at other options first.)
The fix:
Don't execute strings.
The code inside your eval
ed string can almost always be converted into real, executable code. If it can't, then question your intentions -- executing code from a database (for example) is almost always a horrible idea.
Lifecycle methods in classes
class Life {
private boolean initialized = false;
public void init() {
try {
// ...
initialized = true;
} catch (XXXException e) {
// ...
}
}
public void doSomething() {
if ( !initialized ) {
throw XXLException(...);
// instead call init() and continue
}
// ...
}
}
smell: Unnecessary recursion
why: You've guessed it --> risk of stack overflow (had to get that one in)
solutions:
1) If you can, rewrite the recursive routine as an iterative routine, for example using divide and conquer techniques.
2) If not, examine the stack frame usage and try to minimise, for example by changing from breadth first to depth first analysis on a tree.
Here's a somewhat obvious one, but I'd had to put up w/ it a few times...
You check out code from source control.... and it doesn't compile. Drives me crazy.
Sample:
enum SwitchState { On, Off }
if (x == SwitchSteate.On) ...
Consider using boolean variables instead of enums in such cases.
bool isTurnedOn = ...
if (isTurnedOn) ...
You see a function that always returns the same thing.
There's lots of classes implementing those methods just for other parts of the code to get the right value. Sometimes even constructors are being defined just to call the base/super/inherited constructor to pass on those constant parameters.
Change the method to return a value from a private field initialized in constructor and use the Factory pattern to construct the objects (for example using DI as a mega Factory pattern implementation in this case). This makes the inheritance structure in most cases obsolete.
Break and Continue are the same as GoTo
One should be able to look at the head or tail of a loop to immediately be able to tell under what conditions it terminates.
What to do: Use descriptive (boolean) variables instead of a direct break/continue and test them in the appropriate place (head/tail) of the loop.
break
and continue
only allow you to jump to specific locations (either to the end of the loop or beginning of the loop) whereas goto
allows you to jump anywhere. They are clearly not the same. - gablin
If
statement without corresponding else
Not all if
statements need an else, but when one isn't present you should check if it really should be there. If one program state needs special handling, often the opposite state does too.
Example where an else
may be necessary:
if (teacher != null) {
addStudents(teacher, period, students);
}
// Else?
else {
// Why is `teacher` null? Is this an error state?
// Will the students be 'lost' for this period?
}
Solutions:
if
check to an assertIf no else is necessary, but this isn't immediately apparent, add an empty else block:
} else {
// Nothing to do here
}
else
condition having been ignored (same applies to case
statements). However, that said; when considering how often if
without else
is perfectly valid, I'm kind of in two-minds as to the feasibility of using it as a code smell. Either way, Guard Conditions should be excluded from this evaluation. - Craig Young