Stack Overflow: Biggest performance improvement you've had with the smallest change?
[+156] [184] JoelFan
[2009-02-13 13:08:12]
[ optimization polls performance ]

What's the biggest performance improvement you've had with the smallest change? For example, I once improved the performance of a certain page on a high-profile web app by a factor of 10, just by moving "where customerID = ?" to a different place inside a complicated SQL statement (before my change it had been selecting all customers in a join, then later selecting out the desired customer).

(1) It is enlightening that a majority of the changes appear to be fixing database issues, query optimizers that don't, missing indexes etc - EvilTeach
(1) @EvilTeach: That's because usually you are dealing with a lot of data, and fixing the database issue results in better complexity, O(log n) vs O(n), through such a simple change. - WW.
(4) Is this question purely for the hell of it? What possible use could any answers be? - skaffman
(11) @skaffman: If you read about some smart fix here, and then later find yourself in a similar situation - there would be a use. - Evgeny
[+190] [2009-02-13 13:30:49] User

The most chat-addicted guy in the room took a day off.

(4) I find it objectionable that in many cases people upvote an answer just because it's funny - In this case, "funny but true". Therefore the more technical and insightful answers get buried. Having said that, +1 for "funny but true". - tsilb
(4) I've already tried a uservoice issue with "technical rep" and general rep where people could distinguish what they are giving, thus allowing for funny answers as well as truly difficult questions. - Spence
@Spence: Hi there, Slashdot 2.0. :-) Let's not make the system more complicated. Over time, the more technically astute answers will get up-voted. I'm not sure I like how questions can't be upvoted after a period of time... - Chris Kaminski
Hang a sign outside your cube/office that says "NO LOITERING". It actually worked for me =) - StingyJack
(46) Isn't this a productivity improvement rather than a performance improvement, though? Or did the chat-addicted guy somehow slow down the application by his very presence? - Dan Tao
+ for funny but true lol! - LnDCobra
I won't work at such a place. - phaedrus
(3) if I had 100 reputation i'd vote this down as "funny but not really an answer". guess i better get busy here and make me some points huh? - msulis
[+137] [2009-02-13 13:13:12] Gerrie Schenck

In some old code I inherited from a coworker, I replaced string concatenations (+ operator) with StringBuilder (.NET). Execution time went from 10 minutes to 10 seconds.

(87) I seriously doubt that. - Bombe
(34) No really. It was a huge method with triple-nested for loops which all appended strings to the main string. In the specific scenario there were thousands and thousands of strings to be concatenated. - Gerrie Schenck
(1) I had the same thing with a Base64 encoding/decoding class I did a long time ago in Java 1.0.2. Changed from String to StringBuilder was huge. - Moose
(36) This is real. With a lot of loops that use the + operator on strings, this will happen very, very easily. I had a program that looped a few thousand times fall from a couple of hours to 15 minutes once by doing this. - Chris
(7) I had the same experience. We had a process drop from ~30min to 5min after making a change like this. - Kevin Tighe
(2) Yep. It adds up. - Adam Jaskiewicz
(17) And I thought that "StringBuilder" was the very first performance thing ever, and everybody knew about it. - Anthony
I'll take note of this... - Aaron
(1) I did that too, maybe 8 years ago, in software that manipulated a lot of XML. All XML -> String methods of this program used basic concatenation (i.e. +). Moving to StringBuffer was a huge performance improvement... - romaintaz
(2) I've had a similar experience. With enough data, it's very possible to go from minutes to seconds. - Richard Hein
I had a similar experience in Classic ASP, but that was changing string concatenation to Response.Write. It makes a huge difference when the size of the string gets to be over a megabyte. 15 minutes -> 30 seconds. - aehiilrs
(3) I did this, also. Due to the ridiculousness of the original implementation, went from "Did Not Finish" (OOM exception) to about 1 minute. My second pass involved turning the StringBuilder into a filestream (it was ultimately being written to a file) that brought the time required down to a couple seconds. - Greg D
(1) If I'm not wrong: Bless javac that replaces looped string concatenations with StringBuilder at compile time. I think Java developers that don't know about the StringBuilder can remain oblivious to it. - Cecil Has a Name
This is the entire reason StringBuilder exists - the performance issues that can arise from liberally using string concatenation in a tight loop were known way back in the early 32-bit Delphi days. Almost any old-school Delphi developer has implemented one of these at some point. .NET then came along and provided one right out of the box. - David
(5) It is a very common problem. Joel Spolsky gave it a name: Shlemiel the painter's algorithm. See - Peter Mortensen
(2) @Cecil: you are both right and wrong. javac replaces concatenation with a StringBuilder, but only within the same expression, so it does not eliminate this kind of problem. - Michael Borgwardt
Happens in PHP, too. I re-wrote an email attachment parsing routine to throw around indices instead of string fragments and the speedup was about two orders of magnitude. - staticsan
@Cecil: Nice application of Curry's Paradox! - nikie
here's a little benchmark i did for stringbuilder vs concatenation - John Boker
The "break even" point for switching to StringBuilder as opposed to a simple concatenation is somewhere around 8 to 12 concatenations, I believe. With more, you're better off with StringBuilder; with fewer you should use concatenate. - tylerl
Remember that going from 10 minutes to 10 seconds sounds like a phenomenally exaggerated improvement, but really it's only a 60x speedup (less than two orders of magnitude even!) I've managed similar levels of improvement when changing a particularly dire naive implementation to a much better thought-out one. - Coxy
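For anyone who wants to see the shape of this fix outside .NET or Java, here is a minimal Python sketch of the same pattern. The function names `concat_naive` and `concat_builder` are made up for illustration, and `io.StringIO` is used here as a stand-in for StringBuilder. CPython can sometimes optimize in-place string concatenation, so timings from this sketch may not reproduce the numbers quoted in this thread; it only shows the two code shapes producing the same result.

```python
import io


def concat_naive(parts):
    """Build the result with repeated +, the pattern the answer replaced."""
    result = ""
    for part in parts:
        # Each + may copy everything accumulated so far,
        # which is quadratic in the worst case.
        result = result + part
    return result


def concat_builder(parts):
    """Append into a growable buffer, analogous to StringBuilder.Append."""
    buffer = io.StringIO()
    for part in parts:
        buffer.write(part)
    return buffer.getvalue()


parts = [str(i) for i in range(10_000)]
# Both approaches must produce identical output; only the cost differs.
same_output = concat_naive(parts) == concat_builder(parts)
```

In Python specifically, `"".join(parts)` is the more idiomatic form of the buffered version when all the pieces are already in a list.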
[+115] [2009-02-13 15:43:28] TM.

Changing a lot of logging to check log levels first.

From this:

log.debug("some" + big + "string of" + stuff.toString());

To this:

if (log.isDebugEnabled()) {
    log.debug("some" + big + "string of" + stuff.toString());
}

Made a HUGE impact on production performance. Even though log.debug() only logs when debug logging is enabled anyway, the string is built BEFORE it is passed to log.debug() as a parameter, so there was loads and loads of string building that got completely eliminated in production.

Especially considering that some of our toString() methods produced about 10 lines worth of info, by calling toString() on fields, which call toString() on their fields... and so on.

thats a good tip, thank you. - Orentet
(5) Yeah, string-building and formatting looks like just a 1-liner, so how bad could it be? It's easy to overlook that it exercises a major chunk of the run time library. - Mike Dunlavey
(2) For cleanness wouldn't it be better to add the log.isDebugEnabled() to the log.debug method? - Dscoduc
(7) The point is that the argument still has to be created before the method can be called. Deciding inside the method not to do anything is too late as the parameter work has already been done. - jackrabbit
(2) I usually create a utility wrapper method that takes a format string and builds the output using String.Format(), similar to Console.WriteLine() in .NET or System.out.printf() in Java. This would avoid most of the cost (except when toString() and similar methods are used explicitly). - Hosam Aly
(3) @Dscoduc log.debug already does that check on whether debug is enabled. So yes, we are checking a boolean twice (unless it's false). As @jackrabbit said, the issue is that the parameters are evaluated BEFORE they are sent into the function, so the real "work" isn't avoided. - TM.
same applies to printf and (I suppose) every other language. In C++ I tend to replace "log.debug" with "#define logit if (logging) log.debug" - gbjbaanb
We use log4net and it specifically recommends this approach in the documentation - Richard Ev
(1) ... better still, use slf4j's technique... - alex
Do we really need to still give this tip ? Come on is everyone an amateur. - mP.
(1) Everyone has to learn somewhere. - GMan
(6) In Dotnet you have the Conditional Attribute which the compiler will use to remove the method completely at compile time - see - benPearce
(8) @mP - not everyone knows everything - benPearce
(1) I deal with this case by making the log function take a Func<String>. Delayed execution allows the debug check to be done in only one place. - Strilanc
(1) @benPearce that is a good idea, but does that allow you to alter the logging level at runtime? Changing log levels to figure out some weird issue can be very helpful in some cases. - TM.
@TM: "Changing a lot of logging" would have been "Changing a single line of code" if your project used Aspects. - Dave Jarvis
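The point made by TM. and jackrabbit above (the argument is built before the logging method is called) can be checked with a small Python sketch using the standard `logging` module. The `Expensive` class and the logger name "demo" are hypothetical; the class simply counts how often its string form is computed. This is an illustration of the idea in a different language, not the original Java code.

```python
import logging


class Expensive:
    """Hypothetical object with a costly string form; counts conversions."""

    def __init__(self):
        self.str_calls = 0

    def __str__(self):
        self.str_calls += 1
        return "ten lines worth of info"


log = logging.getLogger("demo")        # arbitrary logger name
log.addHandler(logging.NullHandler())
log.propagate = False
log.setLevel(logging.INFO)             # debug disabled, as in production

stuff = Expensive()

# Eager: the message string is built before debug() is even entered.
log.debug("some big string of " + str(stuff))
eager_calls = stuff.str_calls

# Guarded: the level check skips the string building entirely.
if log.isEnabledFor(logging.DEBUG):
    log.debug("some big string of " + str(stuff))
guarded_calls = stuff.str_calls - eager_calls

# Deferred: pass the object as an argument and let the logger format it
# only if the record is actually emitted.
log.debug("some big string of %s", stuff)
deferred_calls = stuff.str_calls - eager_calls - guarded_calls
```

The deferred form is the same idea as the SLF4J placeholder style and the Func<String> approach mentioned in the comments: the check lives in one place and the expensive conversion happens only when the message will really be written.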
[+95] [2009-02-13 13:28:38] RoadWarrior

This is the same answer as I gave here [1]:

I was working at Enron UK on a power trading application that had a 2-minute start-up time. This slowness was really annoying the traders using the application, to the point where they were threatening dire retribution if the problem wasn’t fixed. So I decided to explore the issue by using a third-party profiler to look in detail at the start-up performance.

After constructing call graphs and mapping the most expensive procedures, I found a single statement that was occupying no less than 50% of the start-up time! The two grid controls that formed the core of the application’s GUI were referenced by code that marked every other grid column in bold. There was one statement inside a loop that changed the font to bold, and this statement was the culprit. Although the line of code only took milliseconds to run, it was executed over 50,000 times. The original developer had used small volumes of data and hadn’t bothered to check whether the routine was being called redundantly. Over time, as the volume of data grew, the start-up times became slower and slower.

After changing the code so that the grid columns were set to bold only once, the application’s start-up time dropped by nearly a minute and the day was saved. The moral here is that it’s very easy to spend a lot of time tuning the wrong part of your program. It’s better to get significant portions of your application to work correctly and then use a good profiler to look at where the real speed bumps are hiding. Finally, when your whole application is up and running correctly, use the profiler again to discover any remaining performance issues caused by your system integration.


(26) +1 for the profiler plug... Until you use such tools, you are only guessing at the potential bottlenecks. - DGM
(41) It must be comforting to know how much Enron benefited from your code improvement. ;) - Chris Lutz
Out of curiosity, what profiler did you use? - Kyralessa
@Kyralessa: It was the Compuware one - this was a while ago. Nowadays I use the Ants profiler. - RoadWarrior
@Chris: Enron Europe was quite profitable, especially after electricity de-regulation in the UK. But then the US cut all funding... - RoadWarrior
Had one of those - worse, it was n^2 with the number of rows. Somehow the toolkit rescanned the table from the top for each new formatting element added. - Martin Beckett
[+92] [2009-02-13 13:36:15] Johannes Weiß

Add an index on a field of a table used for a complex SQL query. You can sometimes easily improve the performance by 90% or so.

This article: is useful when identifying indexes that need to be created - get SQL Server to tell you what indexes it needed, but couldn't use! - Paul Suart
I can't count the number of times I've used indexes to strip massive amounts of time off report queries... SQL going from tens of minutes to a few seconds... - Damovisa
(17) Or, not adding an index to a table that is constantly being written to. - Evan Plaice
[+74] [2009-02-13 13:10:14] Peter Štibraný

Enabling gzip compression for a dynamic web page. Uncompressed page had more than 100k ... compressed only about 15k. It felt so fast afterwards :-)

Just out of curiosity, what's the cpu impact of enabling gzip? I really have no idea. - TM.
(8) I never did any measurements myself, but I think that benefits outweigh costs in this case. Sure, gzip will eat some CPU, but will also need less bandwidth to transfer data. See also for similar discussion. - Peter Štibraný
I had something like this. We used to render a huge table (15x10x40 cells, or something like that) using Javascript. Back in the day, JS used to be sloooow. I suggested rendering the table server-side as HTML, use mod_gzip. It was much, much faster... - alex
(2) @alex, you rendered cubic table in HTML? Wow... :) - Constantin
(1) @Constantin: the table showed 15 days, with 10 or so columns for each day and 40 rows... :-p - alex
And you used HTML? - Sneakyness
I've seen this take a page from 6.5 megs down to 500kb. Holy $#%@. - rooskie
(2) I created an HTML Whitespace removing filter to use in addition to a GZIP filter - reduces so much in the generated HTML's file size! - MetroidFan2002
use DEFLATE instead.… - David Murdoch
[+67] [2009-02-13 13:13:54] Bob Moore

Turning off disk compression on a database server. Even accounting for the time taken to slap the sysadm, this was a huge net benefit :-)

(85) +1 for slapping the admin - Matthew Whited
(7) Thankfully Microsoft SQL Management Studio 2005 will actually refuse to let you mount a database on a compressed NTFS volume/folder last time I checked. - David
(1) @David: They may have compressed it later... Plus people tend to dismiss warnings without reading them. - tsilb
[+58] [2009-02-13 13:42:51] Remy Blank

A one-character change yielded an infinite speedup:

int done = 0;
while (!done);
{
    done = areWeDoneYet();
}

Guess what the change was...

(7) int done = 1; cute. - rtperson
(8) @rtperson: wrong. Look at the end of the while() line. - Graeme Perrow
semicolon is misplaced. - ryeguy
did you use a for loop? .. just kidding - community_owned
(28) good argument for why you should do while(x) { instead of newline before { - Joe Philllips
(6) @d03boy: I fail to see how having the open brace on the same line would avoid the bug. If anything, it would obscure it further by hiding it amongst more symbols. - rmeador
(2) I find with the "{" on the same line that I'm a LOT less likely to accidentally type a ";". So, that advice really does work in practice. - Brian Knoblauch
I have to admit that my current coding style is while(x) { (it wasn't at the time) and I deliberately used the other convention here. - Remy Blank
(2) Too bad the compiler couldn't optimize away that empty but infinite loop. :) - Eddie
(8) @rmeador: True, it's no easier to fix when reading the code, but it's definitely easier to avoid in the first place. Your muscle memory is so used to typing semicolon-enter all the time. Typing semicolon-openbrace-enter would feel extremely awkward, you would notice right away. - Adam Bellaire
@Eddie: he he :-) - Al pacino
(2) It is trivial for the compiler to detect an empty statement attached to a loop/'if' followed by a block statement that is not attached to a loop/'if'. The compiler should have warned you. - Tim Matthews
I'm always extra suspicious of While loops... Maybe it's just my nature, but I tend to use For loops or replace them with "while (!(doSomething() && areWeDoneYet())) ; ". Both return bool; doSomething can just return true for no reason or true meaning success... I'm paranoid that way. - tsilb
You did remove the ! ? - Stephan Eggermont
just TODAY that happened to us! F*king semicolon! - ante.sabo
(2) That is not a performance problem, that's a bug. - Tim Büthe
Haha, I didn't see that one right away! - Jake Petroules
One more reason why I like Python - Evan Plaice
[+47] [2009-02-13 13:51:22] community_owned

Just recently I did a Project Euler problem. I used a Python list to look up already computed values. The program took maybe 25 to 30 minutes to run (I didn't measure it). The lookup has to iterate through all values until it finds a matching one in the list. Then I changed the list to a set which basically does a hash lookup. Now the program runs in 15 seconds. The change was simply to put set() around the list.

Moral: choose the right data structure!

(32) This is basically the moral of every Project Euler problem. - Eric
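A minimal sketch of the list-to-set change described in this answer. The helper names and the data (perfect squares) are invented for illustration and are not the original Project Euler code; the only real difference between the two functions is the container that the `in` test runs against.

```python
def count_hits_list(known_values, queries):
    """Membership test against a list: each `in` scans element by element."""
    known = list(known_values)
    return sum(1 for q in queries if q in known)


def count_hits_set(known_values, queries):
    """Same logic, but `in` on a set is a hash lookup."""
    known = set(known_values)
    return sum(1 for q in queries if q in known)


# Hypothetical data: already computed values are the squares below 1,000,000.
known_values = [n * n for n in range(1_000)]
queries = range(2_000)

list_hits = count_hits_list(known_values, queries)
set_hits = count_hits_set(known_values, queries)
```

Both functions return the same count; with larger inputs the list version slows down roughly in proportion to the list length per lookup, while the set version stays close to constant per lookup.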
[+47] [2009-06-23 03:51:29] Dave Jarvis
void slow() {
  if( x % 16 ) {
  }
}

void fast() {
  if( x & 15 ) {
  }
}

Converting modulus by a power of two to an equivalent bitwise AND operation moved a real-time MPEG-to-JPEG transcoder from producing B&W images to producing full colour JPEGs of a movie, with CPU cycles to spare.

Response to Optimization

To determine if a compiler performs an optimization, test it. People have said to me, "The compiler should optimize that." In theory, yes, it could. In practice, a compiler will only optimize code for scenarios for which optimizing code has been written. Some optimizations are not as important as others.

Try It Yourself

For those who insist that the compiler should optimize this, just try it.

$ gcc --version
gcc (Ubuntu 4.3.3-5ubuntu4) 4.3.3

$ cat t.c
#include <stdio.h>

int main( int argc, char **argv ) {
  int i = 0;
  int j = 0;

  for( i = 0; i < 1000; i++ ) {
    for( j = 0; j < 100000; j++ ) {
      int q = j & 15;

      if( q ) {
        printf( "j X 15 = %d\n", q );
      }
    }
  }

  return 0;
}

$ gcc -O3 t.c
$ time ./a.out  > /dev/null

real    0m6.750s
user    0m6.732s
sys     0m0.016s

$ cat t2.c
#include <stdio.h>

int main( int argc, char **argv ) {
  int i = 0;
  int j = 0;

  for( i = 0; i < 1000; i++ ) {
    for( j = 0; j < 100000; j++ ) {
      int q = j % 16;

      if( q ) {
        printf( "j X 16 = %d\n", q );
      }
    }
  }

  return 0;
}

$ gcc -O3 t2.c
$ time ./a.out  > /dev/null

real    0m13.668s
user    0m13.633s
sys     0m0.040s

With signed int operands, gcc cannot simply replace % 16 with & 15, because the two give different results for negative values. Read the comments for details.


(16) This sounds like something that the compiler should be handling - Graphics Noob
nice call - those hardware classes do sometimes come in handy :) - warren
(8) x % 16 and x & 15 are not the same thing when x is an int. They are the same when x is an unsigned int, and I'd guess you'll see GCC compile them to the same code. - Baffe Boyois
(1) You are correct, Baffe. The code I was working with originally had int variables, not unsigned int. The two different sources above result in nearly identical binaries when using unsigned int. - Dave Jarvis
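Baffe Boyois's point about signed values can be checked with a short Python sketch. Python's own % floors rather than truncates, so the helper `c_remainder` (a made-up name) emulates C's behaviour with `math.fmod`, whose result takes the sign of the dividend. The `mask` helper is likewise hypothetical. This only illustrates the arithmetic; it says nothing about what any particular compiler emits.

```python
import math


def c_remainder(x, m):
    """Emulate C's % on ints: the result takes the sign of the dividend."""
    return int(math.fmod(x, m))


def mask(x, m):
    """x & (m - 1) for a power-of-two m; Python ints act as two's complement under &."""
    return x & (m - 1)


# For non-negative x the two operations agree.
agree_non_negative = all(c_remainder(x, 16) == mask(x, 16) for x in range(0, 256))

# For negative x they differ: the remainder keeps the sign, the mask does not.
neg_remainder = c_remainder(-1, 16)
neg_mask = mask(-1, 16)

# In a pure zero / non-zero test, as in `if( x % 16 )`, they still agree.
agree_truthiness = all(
    (c_remainder(x, 16) != 0) == (mask(x, 16) != 0) for x in range(-256, 256)
)
```

So the value stored in `q` differs for negative inputs, which is why the substitution is only safe when the operand is known to be non-negative or unsigned, or when only the zero test matters.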
[+37] [2009-02-13 13:11:28] community_owned

Turned off ODBC logging on a production database (someone had turned it on and forgotten it) - got about a 1000x performance improvement!

ODBC logging is the worst idea ever. If someone needs their server logs in a DB they can use a parser later. :( - Jon Tackabury
I agree, it should be set to run after the fact or on downtime in the middle of the night. - Sneakyness
[+37] [2009-02-13 13:11:33] Goran

Truncate table BigTable.

Queries returned no records but it was faaaaaast!

(3) +1 LOL-Point for funny answer - Zaagmans
(8) Wasn't funny at the time :) - Goran
(2) Had a similar experience with an intern who used LIMIT 1 to achieve that result. - community_owned
Way faster than doing a delete. - Kibbee
(9) God, I thought you truncated Google's BigTable :P - Ionuț G. Stan
i tried tis on my servr and my apps stopped wroking. Plzhelpkthx!!! - Joey Adams
enter "rollback" and pray! - Goran
[+26] [2009-02-13 13:15:04] CraigTP

When maintaining someone else's code, I encountered a stored procedure that was taking approximately 4-5 seconds to run and producing a result with only a few rows. After examining the query in the stored procedure and the table that the query was running against, there was a distinct lack of indexes on the table. Adding just a single index improved that stored procedure from 4-5 seconds to about 0.2 seconds! Since this query was being run many times, it was a big improvement overall!

If an index is added, then the write performance might suffer. Did you measure the write performance? - portoalet
In typical usage, this application had about 1 write for every 1000 reads, so the overall performance of the application was drastically increased. - CraigTP
[+23] [2009-02-13 13:47:03] rtperson

I was writing a Java MergeSort, just to experiment and see how much of my old Data Structures course I could still put into practice. My first time around I implemented my merge routine with ArrayLists, and set it to sort all the words in War and Peace. It took five minutes.

The second time I changed from using the Collection classes to simple arrays. Suddenly the time to sort over 500K words dropped to less than two seconds.

This hammered home to me just how expensive object instantiation can be, especially when you're creating a lot of objects. Now when I'm troubleshooting for performance, one of the first things I check for is whether objects are being instantiated within a loop. It's much cheaper to reinitialize an existing object than it is to create a new one.

Current Java implementations are much better at object instantiation. - Eddie
With 1.5+, is it still better to re-initialize? - cdmckay
Peter Norvig elaborated on this for Java collections once. See - quark
(3) @Eddie, I was using Java 1.6. I'm sure instantiation has gotten faster, but there is still a significant amount of overhead involved in object creation. - rtperson
(1) There were two larger lessons to be learned there too: you're unlikely to beat the Java standard library implementation on your first pass at solving any problem it already covers, and you should never presume you can improve how something performs without profiling it first. - Greg Smith
[+20] [2009-02-13 13:38:23] Robert Gould

Removing some rogue sleep()'s in some Java code.

WTF?? Why were there sleeps there?! - JoelFan
(10) Probably to "fix" race conditions. I enjoyed cleaning up after another developer after they stacked sync locks. They used upwards of 5 locks nested in the same method instead of fixing the real problem. In trying to create a singleton (which really should of been an instance but I won't even get into that) he didn't declare the lock object as static. This caused every call to get its own lock, and the nested locks just slowed the code down enough to stop the 3 to 5 requests from having problems. - Matthew Whited
@Matt: +1 for misuse of Singletons. - tsilb
@Matthew Whited: grammar check - should have been, "Should have been an instance...", and not "should of been". That doesn't even make sense and I am amazed how many people get it wrong. - iamserious
[+19] [2009-02-13 15:37:03] Jacob Adams

When updating WinForms controls realtime, simply doing something like

if (newValue != txtValue.Text)
   txtValue.Text = newValue;

instead of always doing

txtValue.Text = newValue;

took the CPU utilization from 40% down to almost nothing.

+1 I had this problem with a TreeView updating routine which took an age to run! - mdresser
I have tested this myself and found that doing the initial check is just as expensive in time as setting the property - benPearce
(1) @benPearce Is it an ordinary property or something that causes the UI to repaint itself? The expensive part of this is that the UI keeps repainting itself even though it's the same data. - Jacob Adams
@Jacob: SuspendLayout() will stop it from painting. ResumeLayout() will cause it to start again. On some controls, however, this doesn't help much. Telerik controls were horrid in certain WinForms circumstances like this: 45+ seconds to load a form that loaded in 5 seconds with normal .NET controls. - Nazadus
(11) TextBox already does it: if (value != base.Text) { base.Text = value; ... } - Ian Boyd
@Nazadus, SuspendLayout and ResumeLayout only help if you are batching multiple control updates. If each control is being updated real-time, these calls would just add unneeded overhead. - Jacob Adams
@Ianboyd, interesting find. It looks like Control also has this. - Jacob Adams
[+19] [2009-02-13 21:03:22] community_owned

Removed an HTML tag from a web application, gained a 100% performance increase.

At some point I noticed that requests were duplicated. It took me some time to figure out it was caused by an empty image tag lost in a sh*tload of HTML:

<img src="" />

For obvious reasons, Django's template system doesn't throw errors when a variable does not exist, so we didn't notice anything unusual when we inadvertently removed a template variable, which happened to contain an image src (for a small icon).

Removed the tag, the application loaded twice as fast.

(5) +1 for figuring that one out. -1 for using Django. You came out even :) - tsilb
(1) Stepping through a page with <img src="" /> in the debugger can be quite confusing as well. - Matti Virkkunen
[+18] [2009-02-13 15:51:13] Frans
  • Adding two indexes to a table speeded up a stored procedure from 12.5 hours to 5 minutes.
  • Moving a straight data copy operation from SQL's DTS to just a "insert into ... select from" statement reduced copy time from an hour to 4 minutes.

A more common example, however, was when a colleague had used sub-selects on SQL to get certain values from a child table. Worked fine on small datasets, but when the main table grew, the query would take minutes. Replacing the sub-selects with a join on a derived table made the whole thing much, much faster.


SELECT Name,
       (select count(*) from absences a where a.perid = person.perid) as Absencecount
FROM Person

is very bad, as SQL will have to do a new select statement for each row in Person. There are different ways of making the above more efficient but using a derived table can be a very efficient way.

SELECT Name, Absencecount  
FROM Person left join  
   (select perid, count(*) as Absencecount from absences group by perid) as a  
ON a.perid = person.perid

The problem with SQL is that it is very easy to write very bad SQL. SQL Server is so good at optimising stuff that most of the time you don't even realise you are writing bad code until it doesn't scale well. One of the golden rules that I always check is: "Is my inner query referencing anything in the outer query?" If the answer is yes, then you have a non-scaling query.

(2) SELECT p.Name, count(*) FROM Person p INNER JOIN absences a on a.perid = p.perid GROUP BY p.perid, p.Name - Carl Manaster
@Carl, that's not equivalent. Consider a person with no absences. - David B
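To make David B's point concrete, here is a small sketch that runs the three query shapes from this answer and its comments against an in-memory SQLite database from Python. The table contents (Ann, Bob, Cy) are invented, SQLite stands in for SQL Server, and the `COALESCE(..., 0)` is an addition of mine so that the derived-table version reports 0 rather than NULL for a person with no absences. It checks results only and says nothing about relative speed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Person   (perid INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE absences (perid INTEGER);
    INSERT INTO Person   VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO absences VALUES (1), (1), (2);   -- Cy has no absences
""")

# Correlated sub-select: the inner query references the outer row.
correlated = conn.execute("""
    SELECT Name,
           (SELECT count(*) FROM absences a
             WHERE a.perid = Person.perid) AS Absencecount
    FROM Person
    ORDER BY Name
""").fetchall()

# Derived table: aggregate once, then join.
derived = conn.execute("""
    SELECT Name, COALESCE(a.Absencecount, 0) AS Absencecount
    FROM Person LEFT JOIN
         (SELECT perid, count(*) AS Absencecount
            FROM absences GROUP BY perid) AS a
      ON a.perid = Person.perid
    ORDER BY Name
""").fetchall()

# Inner-join variant from the comments: drops people with no absences.
inner = conn.execute("""
    SELECT p.Name, count(*)
    FROM Person p INNER JOIN absences a ON a.perid = p.perid
    GROUP BY p.perid, p.Name
    ORDER BY p.Name
""").fetchall()

conn.close()
```

With this data the correlated and derived-table queries return the same three rows, while the inner-join version returns only two because Cy is missing.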
[+18] [2009-02-13 17:06:05] bogertron

Using a connection pool. Who would have guessed that something that is known to make things faster actually does make things faster?

[+17] [2009-02-13 13:28:22] strager

Go from single-core to quad-core.

(Hey, you didn't strictly say programming related!)

That's a small change? How long did it take? - Michael Myers
(3) @mmyers, It took a simple swap of parts. Maybe an hour at most. - strager
+1 funny. $250 and half an hour of your time is worth it. - tsilb
that's like if you're adding some ram :P - Atmocreations
(2) Even 2 cores on Windows can help A LOT, since some programs take 100% of the available CPU and all you can do is restart the computer. If you have two cores you can actually press CTRL-ALT-DEL and kill the problematic process, since the other core is available for Windows... - ante.sabo
(1) @as That is the whole reason I got my first dual core when they just came out - PeteT
[+17] [2009-04-14 15:32:21] Conrad

Letting go that developer who fondly and erroneously believed that demonstrating how clever you are is the same thing as getting work done.

Sometimes to improve the code -- improve the team.

+1 : I know how you feel. - Deepak Singh Rawat
[+12] [2009-02-13 13:09:41] Brian Knoblauch

Replacing a "MUL" with a "SHL"/"ADD" series in some x86 graphical code also resulted in about an order of magnitude improvement.

(2) That must have been a long time ago… - Bombe
(4) 8088 10mhz clone with an ATI EGAWonder800+ video card! :-) A very long time ago... - Brian Knoblauch
sigh them were the days. - Justicle
[+11] [2009-02-13 13:43:54] Skizz

One project I worked on had a very long build time - over half an hour for a full rebuild. After a bit of investigation I traced it down to the precompiled header settings. I then wrote a small app to scan all the source files and reduce the header file dependencies and correctly set up the precompiled headers. Afterwards, full rebuild time was less than a minute.


Impressive :) How many files? How many projects/solutions and which platform? - Ketan
A hundred or so files, Windows. The problem was, the compiler was rebuilding the PCH file for every source file. - Skizz
[+11] [2009-02-13 14:21:25] Si.

Changing log4net logger level from "DEBUG" to "INFO".

(1) Interesting SO issue, adding a full stop equates to 60% contribution. - Si.
[+11] [2009-02-13 15:11:39] mch

After profiling showed that a large amount of time was being spent in std::map<>::find(), I looked at the key space and found that it was pretty much contiguous and uniform. I replaced the map with a simple array, which reduced the time required by about 80%.

Choosing appropriate data structures and algorithms is the best first step to improving performance.

(1) I used to think hashing would have good performance when the key space is contiguous and uniform, probably near to that of an array (with a good optimizing compiler). What makes the map so much worse than an array? Is it that the hash was being computed every time the key was used? - Hosam Aly
(8) std::map isn't a hash, it's a binary tree. so there was a binary search every time he called find. Replacing that with a straight array lookup would be miles better. - Sol
Supposing it WAS a hash table, if the key space was [mostly?] contiguous that would indicate the method of hashing used wasn't distributing items uniformly throughout hash-space, which is an absolute necessity for a hash map to have good performance. - Brian Vandenberg
[+10] [2009-02-13 13:45:24] ScottStonehouse

Changed a SQL query from a cursor to a set based solution.

(1) Treating databases like arrays is one of the biggest mistakes I see. It's also one of my biggest pet peeves. While cursors aren't always bad, they should only be used when absolutely necessary, which is almost never. - Eric
You can always tell the people who came from a DB2 or Oracle background because they go for a cursor-based solution prior to considering standard CRUD operations. - tsilb
@tsilb - I see it mostly in application developers who work with SQL after only working in their app code. "Use what you know" - StingyJack
[+10] [2009-02-13 15:43:11] Steve

Switch from the VS compiler to the Intel Compiler for some numeric routines. We saw a 60% speedup just by recompiling and adding a few flags. Utilizing OpenMP on the routine's for loops yielded a similarly large speedup.

(2) We did the same for some image analysis algorithms and saw a nice improvement as well. - Ed S.
[+10] [2009-02-14 00:52:17] Ryan Bigg

Indexed a database. Imagine driving a Daewoo Matiz that suddenly morphs into a Lamborghini.

[+7] [2009-02-13 13:13:51] weazl

My biggest performance improvement was gzipping a 700 Kb XML file downloaded by thousands of clients a day and then caching the gzipped output in memory. It dropped bandwidth usage somewhat but, more importantly, dropped server load from about 0.7 to 0.00.

[+7] [2010-10-14 15:46:44] Abe Miessler

Changed a stored proc from this:

@numberParam varchar(16)
FROM ...
WHERE id = CAST(@numberParam as int)

to this:

@numberParam int
FROM ...
WHERE id = @numberParam

Hello indexes!

Ooops, upvotes do not add reputation for community wiki posts - WebMAOhist
[+6] [2009-02-13 16:21:30] Don Branson

In log4j on a server-side app, changing something like this:

log.debug("Stuff" + variable1 + " more stuff " + variable2);

to this:

if (log.isDebugEnabled()) {
    log.debug("Stuff" + variable1 + " more stuff " + variable2);
}

Gave us a 30% boost.
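
The same idea can be sketched with Python's standard `logging` module (an analogue, not the log4j code from the answer). The `Expensive` class and the three `log_*` helpers are hypothetical; they only exist to show when the message string actually gets built.

```python
import logging

logger = logging.getLogger("guard_demo")
logger.setLevel(logging.INFO)  # DEBUG messages are disabled

class Expensive:
    """Counts how often it is turned into a string."""
    def __init__(self):
        self.str_calls = 0

    def __str__(self):
        self.str_calls += 1
        return "expensive value"

def log_eager(value):
    # The message string is built even though DEBUG is off.
    logger.debug("Stuff " + str(value) + " more stuff")

def log_guarded(value):
    # The level check skips the string building entirely.
    if logger.isEnabledFor(logging.DEBUG):
        logger.debug("Stuff " + str(value) + " more stuff")

def log_lazy(value):
    # %-style arguments are only formatted if the record is emitted.
    logger.debug("Stuff %s more stuff", value)
```

The guarded and lazy variants never call `__str__` while DEBUG is disabled, which is where the saving comes from when the arguments are costly to format.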

(2) This is already mentioned in another answer, with some interesting comments. - Hosam Aly
ah, thanks. i didn't look hard enough. - Don Branson
okay, i looked, and only saw the log4net comment. it's a little different - they got a boost by logging less, we got a boost by checking the debug level before building the parm list for the debug call. related, but not the same. - Don Branson
Actually I meant TM's answer:… - Hosam Aly
okay, mine's completely different, because, uh, well, uh, his example uses curly braces. yeah, that's it. ;) Seriously, though, thanks for bringing it to my attention. It's the same. - Don Branson
@Don Branson: I approve of this post :) +1 - TM.
Thanks, TM. I plussed you, too. - Don Branson
I cant believe people are so dumb they find this useful... - mP.
Because you always knew everything, right? - GMan
SLF4J: log.debug("Stuff {} and more stuff {}", var1, var2); Both cleaner and faster. - Tim
[+6] [2009-02-13 23:55:04] Kevin Pang

debug="true" to debug="false" in an ASP.NET web.config file

Don't you mean the other way? - Ian Boyd
Whoops. Yes, you are totally right. Fixed now. - Kevin Pang
[+6] [2009-02-14 17:26:24] Tobias Hertkorn

I strongly urge everybody else to do as I do: make the improvement and forget about it the second you've done it. Otherwise you will do premature optimizations in a subsequent project. ;) Always consult a profiler before doing anything (e.g. the "always use StringBuilder" notion is usually unnecessary, if not harmful). Use the most readable thing, and worry about performance within one tier later on. Make it readable and correct (in that order) and then, maybe, make it faster.

(1) Unfortunately some coders have no idea how to write efficient code. Saying to them that it's ok not to worry about it is worse than premature optimization. - kosoant
(4) Hmm, IMHO the coder you speak of will do even more damage when coding with performance in mind. Because he will do all his programming with the one "proven" performance improvement he "learnt" about while googling. The horror. I'd rather have slow, readable code which a good programmer can easily improve on. - Tobias Hertkorn
[+5] [2009-02-13 17:04:49] Binoj Antony

<%@ OutputCache Duration="3600" VaryByParam="none" %>

[+5] [2009-02-13 19:51:01] dash-tom-bang

A few projects back we were just short of reaching performance targets. I ran the profiler and found that sqrt() was occupying 42% of our frame time! I'm not sure why it was so slow on this hardware (Nintendo Wii), and it was only called a few hundred times per frame but wow.

I replaced it with a 3-iteration sqrt estimator and got almost all of that 42% back! (The estimation was "guess at a reasonable value of the sqrt, then refine by choosing the midpoint between that estimate and the result of dividing the estimate into the initial value." Picking a good initial guess was important, too.)
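
A rough sketch of such an estimator in Python, not the original Wii code. The refinement step is the one described above; the initial guess based on the binary exponent is an assumption of this sketch, since the answer does not say how its guess was chosen.

```python
import math

def sqrt_estimate(value, iterations=3):
    """Approximate sqrt(value) with a few Newton-Raphson (Babylonian) steps."""
    if value <= 0.0:
        return 0.0
    # Initial guess (assumed here): 2**(e // 2), where e is the binary
    # exponent of value. This is within a factor of sqrt(2) of the true root.
    guess = 2.0 ** (math.frexp(value)[1] // 2)
    for _ in range(iterations):
        # Midpoint between the estimate and value divided by the estimate.
        guess = 0.5 * (guess + value / guess)
    return guess
```

With a starting guess this close, three iterations already bring the relative error down to a few parts per million, which is why the quality of the first guess matters so much.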

(3) I think that's called the newton-raphson method. - DavidN
It's also called the quake3-sqrt because it was found in the open-sourced quake3 code, except that the magician who did it there used only one iteration. :) - erjiang
[+5] [2009-02-13 15:30:15] Roger Lipscombe

I recently rewrote a SQL query (for removing duplicates from a table), bringing the runtime down from still-not-finished after 47 hours to 30 seconds.

The trick: realising that it was an upgrade script, and I didn't need to worry about concurrency, since the database was in single-user mode. Thus, instead of removing duplicates from the table, I could just SELECT DISTINCT into a temporary table, TRUNCATE the first one and then move the rows back.
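
As an illustration only, here is the same rebuild-instead-of-delete approach against SQLite from Python. The function name `dedupe_by_rebuild` and the temp table name are hypothetical, and SQLite has no TRUNCATE, so an unqualified DELETE stands in for it.

```python
import sqlite3

def dedupe_by_rebuild(conn, table):
    """Replace the contents of `table` with its distinct rows.

    Only safe when nothing else is using the table, as in a
    single-user upgrade script. `table` must be a trusted identifier.
    """
    cur = conn.cursor()
    cur.execute(f"CREATE TEMP TABLE dedupe_tmp AS SELECT DISTINCT * FROM {table}")
    cur.execute(f"DELETE FROM {table}")  # stand-in for TRUNCATE
    cur.execute(f"INSERT INTO {table} SELECT * FROM dedupe_tmp")
    cur.execute("DROP TABLE dedupe_tmp")
    conn.commit()
```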

was there a specific, important change that did the trick? - pc1oad1etter
(2) why were there duplicates in the first place? - renegadeMind
I had something fairly similar but in a reporting database. I added a column to check off processed rows versus the original way of inserting a row into a second table. Instead of my WHERE IN growing for each row the list to check grew shorter. - Matthew Whited
[+5] [2009-02-13 15:01:59] revs

I once swapped the order of the selection criteria in a database query and the runtime went from 6 or so hours to a few seconds! The customer was pretty happy!!

[+4] [2009-02-13 14:01:39] Mike Dunlavey

On a 68000 [1], some years ago, in this C code:

struct {
   ...
} A[1000];

int i;
for (i = 0; i < 1000; i++){
   ... A[i] ...
}

One very small change caused a 3-times speedup. What was it?

Hint: sampling the call stack a few times showed the program counter in the integer-multiply-subroutine being called in the code from A[i].


wow... my guess is that the 68000 didn't have a register capable of holding a value of 1000 (or perhaps just the memory address pointed to by it?) and thus you switched to using pointer arithmetic to iterate over the array to avoid the multiply... how far off am I? :D - rmeador
@meador: I should have said the code actually allows i to skip around. Pointer indexing would actually be even faster for this code. You're in the ballpark. Let's see if there are any more guesses for the small change. - Mike Dunlavey
Oh, all right. Just declare i as short. That allows it to use the 16-bit multiply instruction, rather than the subroutine. - Mike Dunlavey
I've got an agenda: to show how useful stack-sampling is. If you didn't know that the multiply routine was being called from A[i], you'd be left guessing "why are we multiplying so much?" And it wouldn't help to know the whole loop was taking a lot of time. - Mike Dunlavey
I bet that you declared a pointer to &A[0] outside the loop and then incremented the pointer at each pass through the loop. This way you aren't doing so much multiplication and other arithmetic to find the current array index. (Yes, I've looked at C compiler assembly output!) - Eddie
Oh, really, changing i from int to short made that much of a difference? I would never have guessed, but that makes sense. Especially on the 68000 (and other architectures from that era) where an integer multiply took a long time. - Eddie
(1) Right. Even if a 16-bit multiply instruction is slow, it's a lot faster than a subroutine to do 32-bit multiply. If you like that, they used to do floating-point with libraries also. 300 instructions to do an Add. - Mike Dunlavey
Using a pointer would speed that up, but failing that, counting back to 0 (comparing to 0 is very cheap) would be the best optimization. - Hooked
@Hooked: You're right. Unrolling would make it go even faster. The question was what's the biggest improvement for smallest change. Also, I think the best lesson to draw from this is not how the problem was fixed, but how it was found. - Mike Dunlavey
[+4] [2009-02-13 13:49:19] Jim Blizard

I inherited a time tracking application that was written in VB 3 and used an Access database. It was the first VB application written by a very experienced COBOL programmer. Rather than using SQL and letting the database engine fetch the data efficiently, he opened the table and went from record to record, testing each one to find the one he wanted. This worked okay for a while, but when the table grew to 300,000 records it got a "little slow". Looking up a single programmer's time entries would take about 5 minutes. I replaced his code with a really simple SQL statement and the same search went down to about 10 seconds. The original programmer thought I was a god.

(1) But still, 10 seconds to query a 300K table? Even Access should be able to handle that in milliseconds. - Juliet
(1) Not too surprising if it was on a network share - Matthew Whited
Yes, it was on a network share on a very busy token ring network. - Jim Blizard
[+4] [2009-02-13 22:51:48] The_Fox

I discovered code that built a string containing an IN list, which was then inserted into the WHERE clause of another SQL statement. Creating that string took about 15-20 seconds. It consisted of thousands of ids, split across several IN lists because Firebird only accepts 1500 elements in one IN list.

I removed the code and moved the SQL that fetched the ids directly into the WHERE clause of the other statement. The size of that statement went down from more than 70,000 characters to only 1500 or so.

My main query was faster, and I no longer spent the time building that IN list.


join TABLE_B B on B.A_ID = A.ID 
where B.ID IN (1, 2, 4, 5, ...1496 more) OR 
B.ID IN (2012, 2121, 2122, 2124,  ...1496 more) OR so on...


join TABLE_B B on B.A_ID = A.ID 
where B.FOO = 2

facepalm . - Greg
[+4] [2009-02-13 23:46:45] fionbio

I knew a guy who was running some electron accelerator-related simulations that he wrote in C under Linux. They took about an hour to complete on a Pentium-120 (it was a long time ago), during which he took lunch. I (mis)advised him to put the gcc -O2 option in his Makefile, after which the program started taking several seconds and his nice excuse for a lunch break was gone :) The secret was that the program had lots of nested loops in it, and most calculations were done in the innermost loop even though for most of them that wasn't really necessary. gcc -O2 turned out to be smart enough to move these calculations outside of the loops, causing the unbelievable performance boost.

[+4] [2009-02-14 11:35:20] mrdenny

I removed a Cartesian join from a query and the nightly job went from hours to seconds. It had been tested in QA, but no one ever questioned whether the job was supposed to take hours to complete, so they passed it.

Same company: with some simple re-indexing I took time-sensitive nightly batch processing that was taking 6-8 hours to complete and got it completed in 1-2 hours.

Just yesterday I had a client add a few indexes to a few tables and reduced the run time of a procedure from 8 minutes to 6 minutes. Not great, mind you, but the tables are very large, and the procedure runs every 10 minutes. So over the course of a day I saved the SQL Server 2 hours of processing.

[+4] [2009-09-07 21:12:29] seengee

The best performance improvement I've ever seen is my performance when I turned off twitter :)

(1) That's a productivity improvement, not a performance improvement! ;) - Andrew Grimm
[+4] [2009-11-19 13:36:57] Andreas


Changed:

for( int n = 0; n<things.getSize(); ++n ){ ... }

to:

int count = things.getSize();
for( int n = 0; n<count; ++n ){ ... }

Saved about 11% in the rendering loop. (count was around 50000)

for( int n = things.getSize()-1; n>=0; --n ){ ... } works too if order isn't important - Josh
[+3] [2010-02-10 10:38:51] NLV

I once wrote a sequence of SQL queries that worked on a huge number of records. The performance was really poor: it took 4 minutes to execute. Then I wrapped it in Begin Transaction and End Transaction, and it returned the result in 5 secs.
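
A small sketch of the difference using SQLite from Python (the answer does not name its database, so this is only an analogue). The `items` table and both function names are hypothetical.

```python
import sqlite3

def insert_autocommit(conn, rows):
    """One implicit transaction, and one commit, per statement."""
    for row in rows:
        conn.execute("INSERT INTO items (id, name) VALUES (?, ?)", row)
        conn.commit()

def insert_in_one_transaction(conn, rows):
    """All statements share a single transaction with one commit at the end."""
    with conn:  # commits on success, rolls back on an exception
        conn.executemany("INSERT INTO items (id, name) VALUES (?, ?)", rows)
```

On a disk-backed database each commit typically forces a flush to storage, so committing once per batch rather than once per statement is where the time usually goes.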

[+3] [2011-01-08 15:21:10] Gonzalo Larralde

Maybe the best and shortest improvement ever starts with:



Not always, and not on all databases. Indexes are often (always?) kept in b-trees, and rebalancing those can be costly. It highly depends on what data will be in there. - MPelletier
Yes, but this cost is, AFAIK, absorbed by the db server in idle time, so in the worst case, if you make a wrong decision when creating an index, you won't gain performance but you won't lose any either. I insist, AFAIK :P - Gonzalo Larralde
[+3] [2010-03-01 14:28:46] Mike Trpcic

Database Indexes. We had an application that was using lookup tables fairly heavily, but there were no indexes on any of the appropriate columns. A coworker [1] and I did two things:

  1. Added indexes to all the id columns on the lookup tables
  2. Switched the ORM for our heavier queries to do find_by_sql

Those two changes netted us a roughly 50% speed increase in database access, and made the application noticeably faster. It just goes to show you that you can't disregard good database design because you've got an ORM handling most of the work for you.


[+3] [2009-02-14 00:56:16] community_owned

Moved a function call outside of nested loops. I think that was about 10x improvement. Changed the switch from application server to database from 100Mbps to 1Gbps, this improved performance during high traffic.

[+3] [2009-02-14 01:07:55] DSO

In an ASP.NET application there was a page which displayed a lot of records (order of 1000s) from a SQL database query.

Originally the app was storing results in a DataSet before sending the results to client. This was causing users to have to wait a long time to get the results, as well as causing scalability problems because the server was storing the entire result set in memory (DataSet) before returning it to the client. A long wait would also cause users to constantly hit refresh, worsening the problem.

I removed the DataSet and had the code stream out the query results using Response.Write, and this greatly improved the scalability of the server and the perceived performance from the user's perspective (since they were getting results streamed to them immediately).

(1) Code please ? - Ian Boyd
[+3] [2009-02-14 00:45:55] Chris Lutz

In C, I was writing a subroutine to slurp an entire file into one variable (bad practice, wastes a lot of memory, but it's the best solution and I only do it to one file). It used malloc() to create a 100 char array and realloc() to resize the array dynamically whenever it got full. I tested it on a 118448-byte file, and it took ten seconds to read it. I tried making it a 200 char array and increasing the size by 200 bytes, and it still took 10 seconds. Then I smacked myself and changed this:

if(size == strlen(string)) {

to this:

if(size == counter) { // counter is the index of the last char in the string

It now reads and processes the same file almost instantaneously.

EDIT: Fixed typo.

was going to post something very similar, was trying to figure out why a co-workers code was so slow and I saw this: for (int ixChar=0; ixChar<strlen(reallyfreakinglongstrong); ixChar++) The really annoying thing is that it had already been spotted in a code review. - Andrew Barrett
Increase your memory by amount proportional to the current size (e.g., 1.3*size) instead of fixed amount (200 bytes in your case). It greatly diminishes number of costly realloc() calls. - J.F. Sebastian
It would be faster, but it's not a performance issue right now. I may tweak it a little, but It Works On My Machine™. I also hear that using powers of 2 makes it faster. - Chris Lutz
[+3] [2009-02-14 09:19:17] community_owned

i had a program that was address checking tens of millions of addresses. it could do a few hundred per second but it still took the program about 4 days to finish each run. the problem was that it was doing one address at a time.

we made the program multi-threaded (didn't take much work at all) and had it use 5 threads.

the program went from taking a few days to complete to a few hours.

note: we were making calls to another program that would do the address check
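
A minimal sketch of that setup in Python, assuming each check is an independent call that mostly waits on the external program. `check_address` is a hypothetical stand-in for that call.

```python
from concurrent.futures import ThreadPoolExecutor

def check_address(address):
    """Hypothetical stand-in for the call to the external address checker."""
    return address.strip().upper()

def check_all(addresses, workers=5):
    """Check addresses on a small pool of threads, keeping input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(check_address, addresses))
```

Because the threads spend their time waiting on another process, several checks can be in flight at once even though each individual check is no faster.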

[+3] [2009-05-26 19:17:55] victor hugo

Once an application was having a TERRIBLE performance, it took about 15 secs to display a simple aspx with no complex logic. Three developers were tuning SQL statements, business logic and even the HTML in the page. I checked it out and resolved the issue by changing this attribute in main web.config:

debug="true" to debug="false"

Am I a genius? Hahaha, I'm really not!

[+3] [2009-02-19 08:06:44] Michael Borgwardt

When writing a solver for a game, adding very simple and limited dead-end recognition to prune the search tree brought down solving time for a big level from 15 minutes to near instantaneous.

[+3] [2009-02-19 08:16:34] Coentje

Rewriting a join:


Select from a
  left join b on a.idb = b.id
  left join c on a.idc = c.id
  left join d on d.id = a.idb or d.id = a.idc

to:

Select from a
  left join b on a.idb = b.id
  left join c on a.idc = c.id
  left join d on (case
       when a.idb is not null then b.id
       when a.idc is not null then c.id
       else null
    end) = d.id

The query went from 3+ minutes to 8s; after some more tweaking it eventually came down to about a second, which was acceptable for this one.

[+3] [2009-06-17 21:13:16] womp

I changed a VB6 function that was concatenating hundreds of strings together to output a tree control in the early days of ASP. It called a function that looked like:

mystring = mystring + param1 + param2 + param3 + param4

Adding a single set of parentheses to change the order of concatenation:

mystring = mystring + (param1 + param2 + param3 + param4)

optimized the time the page took to load by over 99%. Went from over 2 minutes to under 1 second.

[+3] [2009-07-27 17:32:19] Steve Wortham

In the early days, I had some code that grabbed hundreds of rows out of a SQL database table based on a where clause. The whole purpose of this code was to get the number of rows returned.

After learning that I can get the number of rows for a given query with SQL's COUNT(*) aggregate, I drastically improved the performance of that page.
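
The two approaches side by side, sketched against SQLite from Python. The `orders` table, its `status` column and the function names are hypothetical.

```python
import sqlite3

def count_by_fetching(conn, status):
    """Slow pattern: pull every matching row to the client just to count them."""
    rows = conn.execute(
        "SELECT * FROM orders WHERE status = ?", (status,)).fetchall()
    return len(rows)

def count_in_database(conn, status):
    """Let the database count and send back a single number."""
    row = conn.execute(
        "SELECT COUNT(*) FROM orders WHERE status = ?", (status,)).fetchone()
    return row[0]
```

Both return the same number, but the second transfers one row instead of hundreds.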

[+3] [2009-09-02 22:56:11] Gavin H

Replacing a frequently hit division by 4 to a bit shift operation.

compiler should do this ... and it's potentially unsafe - aehlke
Why is it potentially unsafe? If it's potentially unsafe, then the compiler wouldn't do it. - erjiang
I think the rule of thumb is not "the compiler will handle that", it's more like "you'd be surprised". For instance, GCC does not optimize division/modulus for signed integers. - Joey Adams
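
On the "potentially unsafe" point: for negative operands, truncating division and a right shift round in different directions. The sketch below models C-style truncating division in Python (whose own `//` floors); both function names are hypothetical.

```python
def div4_truncating(n):
    """Integer division by 4 that truncates toward zero, as C's `/` does."""
    q = abs(n) // 4
    return q if n >= 0 else -q

def div4_shift(n):
    """Arithmetic right shift by 2, which rounds toward negative infinity."""
    return n >> 2
```

For example, -7 divided by 4 truncates to -1, while -7 shifted right by 2 gives -2. The two agree for all non-negative values, which is why a compiler has to add a correction step before it can use a shift for signed operands, and why a hand-written shift is only a safe replacement when the value is known to be non-negative.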
[+3] [2009-02-13 19:24:31] Edison Gustavo Muenz

While programming in CUDA [1] for GPUs you must provide the correct number of threads to be launched. The program was launching with the wrong number of threads, so it was running serially. After changing the line:

kernel <<< numberOfThreads >>> ()

to:

kernel<<< numberOfThreads, numberOfThreads>>>()

the program ran ~500 times faster.


[+3] [2009-02-13 18:38:55] Harold

Took a web page load from 3 minutes to 3 seconds by indexing the primary search term. Problem was the table had 1,000,000+ rows. Their "developer" just couldn't make it go any faster and had them purchase a new Quad Server 8G RAM machine.

[+3] [2009-02-13 13:52:49] LiorH

Installed a profiler on the application server. It makes the plumbing work much more fun.

[+3] [2009-02-13 14:50:42] community_owned

Switched from PHP to Python for pet projects.

-1, quite subjective, not really valuable, right? - Carl Hörberg
[+3] [2009-02-13 13:14:14] Robin Day

Set NOCOUNT on a complex cursor based stored procedure.

It was returning a row count of 1 a few million times even though the application had no need to know it.

The gain was purely in network I/O.

[+2] [2009-02-13 13:25:32] Joachim Sauer

Updating the database statistics on Oracle 9 using DBMS_STATS.GATHER_DATABASE_STATS reduced the runtime of a (rather simple) query from around 12 minutes to 200 ms. Oracle 9 decided that multiple full table scans were a better approach than using the index because the statistics were broken.

[+2] [2009-02-13 14:57:17] David Thornley

A long time ago, I removed an index, and sped up my query by a factor of at least 300. I never did figure out why Oracle 7 figured it needed to do a full Cartesian join if it had the index, and not if it didn't.

[+2] [2009-02-13 14:00:44] community_owned

Until recently we had an intern who had a special method of optimization. He put together a sql statement that took over 20 minutes to run and had to be called quite often. He became aware that the sql statement would finish real fast when he put a LIMIT 1 at the end. I think I destroyed his faith in humanity when I told him that this will not return the results he needs.

[+2] [2009-02-13 20:48:39] Jeremy Frey

Call .Dispose for objects implementing IDisposable. There's a reason why those objects are implementing IDisposable, ya know!

The application (inherited from a former employee) went from needing a restart every day to running like a champ nonstop for the next 2 years.

[+2] [2009-02-13 16:21:29] Dan Howard

Removed the ORDER BY clauses from our SQL statements and moved sorting code to the Objects. This gives you a clean consistent query plan and moves the sorting work from the database to the clients (or web servers) where it's distributed.

[+2] [2009-02-13 17:01:03] Binoj Antony


if( string.Compare( prevValue, nextValue, StringComparison.Ordinal ) != 0 )

Instead of

 if( prevValue != nextValue )

I doubt this, at least in .NET 3.5. Some browsing in Reflector shows that operator== calls Equals(a,b), which in turn calls EqualsHelper, and the latter does an ordinal comparison. The method you're using (equivalent to calling string.CompareOrdinal) does almost the same logic, ... - Hosam Aly
... except that the compare method has to perform additional logic to return a negative or positive value (or 0). - Hosam Aly
[+2] [2009-02-13 15:53:28] Gulzar Nazim

Used some internal caching for a heavily used http module and the performance improved by a big factor.

[+2] [2009-02-13 22:22:56] IanL

Reduce remote calls such as database or web service calls. In most applications this is what produces most if not all of the latency, because it usually involves trips over the network.

[+2] [2009-09-02 22:59:46] CodeByMoonlight

Changing a SQL query against several million rows so that instead of

WHERE dbo.fn_TrimDate(ActionDate) = @Today

I had

WHERE ActionDate BETWEEN @Today AND (@Today + 1)

fn_TrimDate being an ugly function that ripped off the time part of a datetime field.

The query went from an average of 0.5 secs to being almost instantaneous.
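
The effect can be seen in a query plan. The sketch below uses SQLite from Python rather than SQL Server, with SQLite's built-in `date()` standing in for fn_TrimDate; the `actions` table, the index name and the `query_plan` helper are hypothetical.

```python
import sqlite3

def query_plan(conn, sql, params=()):
    """Return the detail text of each EXPLAIN QUERY PLAN row."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE actions (id INTEGER PRIMARY KEY, action_date TEXT, note TEXT)")
conn.execute("CREATE INDEX idx_actions_date ON actions (action_date)")

# Function applied to the column: the index cannot be used to seek.
wrapped = query_plan(
    conn, "SELECT * FROM actions WHERE date(action_date) = ?", ("2009-09-02",))

# Bare column compared against a range: the index can be used.
ranged = query_plan(
    conn,
    "SELECT * FROM actions WHERE action_date >= ? AND action_date < ?",
    ("2009-09-02", "2009-09-03"))
```

The first plan is a full scan and the second is an index search. The sketch uses a half-open range (`>=` and `<`) because BETWEEN is inclusive at both ends and would also match a row stamped exactly at midnight of the following day.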

[+2] [2009-09-02 23:04:44] neilprosser

Recently while writing a Java application which reads REST responses our team were using DOM based XML parsers, mainly because selecting things out by XPath is nice and easy to code. Bad move!

We switched parsing and serialisation over to event-based XML classes (in our case StAX [1]). It vastly improved the memory footprint of the application, which has a massive impact on scalability and sped up the processing by at least an order of magnitude.


Just re-read the actual question... It wasn't really a small change, but it did make things better! - neilprosser
[+2] [2009-09-02 23:06:15] community_owned

The line

new Regex(pattern, RegexOptions.IgnoreCase);

was changed to:

new Regex(pattern);

It improved performance by about 1400% as case sensitivity wasn't required.

[+2] [2009-09-02 22:32:56] community_owned

Spent less time on Stack Overflow for a day.

[+2] [2009-09-02 22:51:17] SebastianK

I doubled the performance of an in memory matrix calculation by storing the matrix row-wise instead of column-wise. This improved cache locality.

[+2] [2009-09-04 05:42:49] monksy

Changing from calling GetPixel on a Bitmap object (.NET) to direct unsafe bit manipulation. The change took the method from 4 minutes to 1 second.

[+2] [2009-09-07 20:54:32] rein

Can't remember the exact code but we changed this:

int readSize = 1024;
result = fread(buffer, readSize, 1, file);

to this:

int readSize = 1024*1024;
result = fread(buffer, readSize, 1, file);

Never underestimate how slow I/O is.
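
To make the effect of the chunk size concrete, this Python sketch counts how many read calls it takes to consume the same data with each buffer size. `count_reads` is a hypothetical helper, and an in-memory stream stands in for the file, so it shows call counts only, not real I/O timings.

```python
import io

def count_reads(stream, chunk_size):
    """Read `stream` to the end in chunks; return (bytes_read, read_calls)."""
    total = 0
    calls = 0
    while True:
        chunk = stream.read(chunk_size)
        calls += 1
        if not chunk:
            break
        total += len(chunk)
    return total, calls

data = bytes(4 * 1024 * 1024)  # 4 MiB of zero bytes as a stand-in for a file
small = count_reads(io.BytesIO(data), 1024)         # 1 KiB per read
large = count_reads(io.BytesIO(data), 1024 * 1024)  # 1 MiB per read
```

The 1 KiB version needs thousands of calls where the 1 MiB version needs a handful. How much that matters in practice depends on how much buffering the I/O layer underneath already does.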

[+2] [2009-07-27 17:35:09] Carl Hörberg

Linq to SQL doesn't cache, so use ToList() when you are enumerating an IQueryable<> multiple times. Changing:

var db = new MyDataContext();
var query = db.Where(a => a.lot.quering == a);

to:

var db = new MyDataContext();
var query = db.Where(a => a.lot.quering == a).ToList();

reduced the time for a regression calculation and graph generation over the data from ~20 sec to <1 sec.

[+2] [2009-07-27 15:42:24] Nelson

I once made a C program twice as fast by changing the array size to be a power of 2, thereby avoiding integer multiplication. In the center of my simulation code I had a 2d array named world stored on the heap. Here are two ways to index into it:

#define worldState(x,y) (*(world + (y) * worldYSize + (x)))    
#define worldState(x, y) (*(world + ((y) << worldYSizeBits) + (x)))

On the 1995-era Sparc I was running this code on integer multiplication took 33 clock cycles; one cycle per bit in the word. The bit shift took 1 cycle. And by far the main thing my code was doing was fetching states out of the world, so I saved 50% of my runtime by constraining my code to only work on world sizes that were powers of 2.

I found it with a profiler; fortunately the multiplication showed up as a call to the function _imul() which the gcc runtime was providing. Compiling with -O would hide that, btw, but at the time the profiler didn't work with optimized code.

[+2] [2009-07-27 15:47:27] Sneakyness

Well the server was taking forever to load, so long that it was timing out!

I plugged in the ethernet cable and everything loaded instantly. It was beautiful.

(Happened in a Network Administration class I was taking, teacher moved the server but forgot to plug everything back in.)

[+2] [2009-07-27 14:52:31] Marco van de Voort

Adding a few SQL Server nolock directives for static tables (prices that were updated once a year).

[+2] [2009-02-14 11:28:22] Rauhotz

In a tight loop, I changed the return type of a function from IEnumerable<int> to int[] and iterated with for instead of foreach. This reduced garbage collection to a minimum and increased performance by a factor of 10.

[+2] [2009-04-14 15:45:11] chris

I was porting a professor's C code for a Travelling Salesman genetic algorithm to Java. The majority of the work was moving from procedural to OO.

We were carrying about 1000 trial solutions, and killing off 100 each generation.

Each solution was simply an object which contained an array of nodes to visit (in order) and a couple of methods to get costs and manipulate the crossover.

First (successful) run took 8 hours -- I blew the memory a few times first.

Instead of dropping references to the objects, I stuck them in a pool for reuse, and the run time dropped to about 5 minutes. Previously, after a few generations the garbage collector was running constantly, and I was spending more cycles cleaning up the mess than processing the data.

[+2] [2009-02-14 20:08:22] community_owned

Switched from using Linq to some older style array looping. :) cut the processing time on a particularly lengthy method nearly in half. (from 940ms to 501ms).

[+2] [2009-10-23 05:19:14] warren

Switched from an OR construct to an IN construct in MySQL - over a 10x speed improvement!

[+2] [2009-09-18 12:22:40] Mayo

An old application started going haywire on submissions of new data when we moved to a new SQL Server silo. It went from 1-2 seconds to several minutes. Obviously something changed on the SQL/network side but after 3 days we weren't able to identify it.

Upon examining the code we noticed that it had a random identifier based on the time (goofy design - not mine - SQL Identity or GUID work fine for me), only it was seeding to the millisecond. So the code only had 100 different seeds meaning it would likely hit the same pattern of randoms and cycle through until it found the next available one.

We seeded to the current time (instead of millisecond) and boom, 1-second submissions.

On a side note, our development environment had the same SQL/network problem but it went unnoticed because the Web server (a VM) was so slow that the random identifier algorithm (20 random characters based on current millisecond) produced an identifier built from several different random seeds whereas prod built from a single random seed. A glorious bug that was kind of fun to uncover / resolve.

[+1] [2009-09-18 12:27:40] Liran Orevi

Dynamic Programming [1]. Sometimes it's amazing how much a simple look-up table of some values can help in a recursive function. For a small example, check this Fibonacci in C++ [2].
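
A short Python sketch of the look-up table idea (not the linked C++ example), using the standard library's `functools.lru_cache` as the table. The function names are hypothetical.

```python
from functools import lru_cache

def fib_naive(n):
    """Plain recursion: recomputes the same subproblems exponentially often."""
    return n if n < 2 else fib_naive(n - 1) + fib_naive(n - 2)

@lru_cache(maxsize=None)
def fib_memo(n):
    """Same recursion, but each value is computed once and then looked up."""
    return n if n < 2 else fib_memo(n - 1) + fib_memo(n - 2)
```

The memoized version makes a linear number of calls, so an input like `fib_memo(90)` returns immediately, while the naive version would take far too long to be practical for the same input.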


[+1] [2009-09-17 03:24:12] Haluk

Dropping Java's array clone method and using other methods instead. It turns out cloning is very resource consuming and it should be used only when definitely necessary.

It dramatically improved my Java code's performance.

What "other method" were you using? - portoalet
[+1] [2009-09-18 12:43:55] Kirill V. Lyadvinsky

Once upon a time I added the /arch:SSE2 option to my Visual C++ project and got +10% performance.

[+1] [2009-10-13 02:53:39] community_owned

Instead of doing all of the lookups against the database in our web app, the lookup information is pulled into a HashTable in memory and kept for an hour:

HttpContext.Current.Cache.Insert(Name, htData, Nothing, DateTime.Now.AddHours(1),

We really don't need anything fresh to the minute, and looking the info up from the DB once an hour (instead of 10 times a second) improved performance tremendously.
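
The expiry logic can be sketched without the ASP.NET cache. `TimedCache` below is a hypothetical, single-threaded Python illustration of "load once, reuse for an hour", not the API used in the answer.

```python
import time

class TimedCache:
    """Keep loaded values in memory and reload them after `ttl_seconds`."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock          # injectable so expiry can be tested
        self._entries = {}          # key -> (expiry_time, value)

    def get(self, key, loader):
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and now < entry[0]:
            return entry[1]         # still fresh: no database round trip
        value = loader()            # missing or expired: load once
        self._entries[key] = (now + self.ttl_seconds, value)
        return value
```

A call such as `cache.get("lookups", load_from_db)` with `ttl_seconds=3600` hits the database at most once an hour per key; `load_from_db` is whatever function fetches the lookup data.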

[+1] [2009-10-16 21:43:59] Alexander

Using SqlCeResultSet instead of INSERT queries. It boosts performance in Pocket PC applications, especially when you deal with bulk inserts.

Using a DataReader instead of a DataTable as the DataGrid data source if you deal with result sets of more than 1000 records.

Partitioning in an Oracle database. It improved performance by 25%.

Using String.Empty instead of "" if you want to check a variable for an empty value.

[+1] [2010-03-01 14:19:32] sankar

I just removed the try/catch block and added an if check so that the exception is never thrown. That code block was executed more than 10K times while deserializing the data and was more or less expected to throw, and the previous developer had simply left it that way. When I had to improve the performance of loading the serialized file, I made this small tweak and the load time went from about 36 secs to 3 secs.

Note: This may already have been mentioned in one of the other answers, but as I could not read all of them to check, I am posting it anyway. Sorry if this is a repeated answer.

in case of Java, Hotspot optimizations apparently are not applied to blocks within try/catch, which might explain this difference in performance. - ccpizza
[+1] [2009-11-26 08:35:56] CodeByMoonlight

Changing this :

WHERE dbo.fn_TrimDate(DateTimeField) = dbo.fn_TrimDate(GetDate())

into this :

DECLARE @StartDay datetime
SELECT @StartDay = dbo.fn_TrimDate(GetDate())
WHERE DateTimeField BETWEEN @StartDay AND @StartDay + 1

[+1] [2010-03-08 17:17:22] keithwarren7

When I put an SSD in my laptop

[+1] [2010-03-08 17:21:45] iKnowKungFoo

Application spiked the CPU on the SQL Server to 100% at 8am as each time zone logged in for the first time. The server had 128GB of RAM and maxed out # of CPUs. New DBA, and by "new DBA" I mean the first DBA they ever hired in their 6 years of operation, found a query with an LTRIM() on a numeric column that was the join between two tables.

Removed the LTRIM and the CPU basically flat-lined.

[+1] [2010-03-08 17:30:06] Gabe

In some C# code I replaced some reflection to dynamically get property values with dynamically compiled lambdas and got about 100-1000x speed increase!

[+1] [2010-03-08 17:33:09] Seva Alekseyev

Created 3 indices in the database. The net performance went up about 25-fold.

[+1] [2011-01-19 13:01:21] Rohit

This code was being called 3000 times in a loop and was pushing the CPU to 100%; even after processing was complete, CPU utilization did not return to normal.

public static void WriteToEventLog(string strMessage,SqlInt16 EntryType)
{
    EventLog log = new EventLog();
    try {
        string EventSource = "EDiscLog";
        ...
    } catch (Exception ex) {
        log.WriteEntry(ex.Message, EventLogEntryType.Error);
    }
}

I changed it to

public static void WriteToEventLog(string strMessage,SqlInt16 EntryType)
{
    using(EventLog log = new EventLog())
    try {
        string EventSource = "EDiscLog";
        ...
    } catch (Exception ex) {
        log.WriteEntry(ex.Message, EventLogEntryType.Error);
    }
}

After that, CPU utilization peaked at 21%.

[+1] [2011-01-08 14:46:56] ajreal

does this count?

switch from IE to Firefox -> chrome

[+1] [2010-03-09 05:06:38] HotTester

When I joined a project midway, with around 80% of the coding already done, I was given the task of looking for any possible optimizations. The first thing I came across was a habit of not disposing of objects after use. So I just introduced the following in the finally block

//Declare some object
MyClass ob1;

    //instantiate the object 
    ob1 = new MyClass();

    //Perform operations
    //perform some operation
    ob1 = null;

And it worked wonders and the application now was working 30% faster.

[+1] [2010-03-11 04:28:04] MPelletier

I once came across this gem:

for (int i = 0; i < count; i++)
{
    var result = dosomething();
}

Of course, dosomething() would always return the same result at every iteration. Moving it out of the loop helped!
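A minimal sketch of the same fix in Python, assuming the call is pure (same result every time). The names `do_something`, `inside_loop`, `hoisted` and the `counter` dictionary are hypothetical and only there to make the number of calls visible.

```python
# Hypothetical call counter, used only to show how often the work is done.
counter = {"calls": 0}

def do_something():
    """Stand-in for an expensive call whose result never changes."""
    counter["calls"] += 1
    return sum(range(1000))

def inside_loop(count):
    """Original shape: the invariant call is repeated on every iteration."""
    results = []
    for _ in range(count):
        results.append(do_something() + 1)
    return results

def hoisted(count):
    """Fixed shape: compute the invariant value once, then reuse it."""
    value = do_something()
    return [value + 1 for _ in range(count)]
```

Both functions return the same list; the hoisted version just calls `do_something()` once instead of `count` times.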

I believe the hotspot compilers are smart enough to do the same - Pangea
[+1] [2010-05-07 21:11:10] FredOverflow

Changing two tokens improved my toy vector performance from O(n^2) to amortized O(n) when inserting n elements.

Slow: new_capacity = old_capacity + 10

Fast: new_capacity = old_capacity * 1.5
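To see why those two tokens matter, here is a rough sketch that counts how many element copies the reallocations cause under each growth rule. The starting capacity of 10 and the helper names are assumptions for illustration, not the original vector code.

```python
def copies_when_appending(n, grow):
    """Count element copies caused by reallocations while appending n items.

    `grow` maps the old capacity to the new one. The starting capacity of 10
    is an arbitrary assumption.
    """
    capacity, size, copies = 10, 0, 0
    for _ in range(n):
        if size == capacity:
            copies += size            # a reallocation copies every live element
            capacity = grow(capacity)
        size += 1
    return copies

def additive(capacity):
    return capacity + 10              # Slow: new_capacity = old_capacity + 10

def geometric(capacity):
    return int(capacity * 1.5)        # Fast: new_capacity = old_capacity * 1.5

n = 100_000
slow = copies_when_appending(n, additive)
fast = copies_when_appending(n, geometric)
print(slow, fast)
```

With additive growth the copy count grows quadratically in n; with geometric growth it stays within a small constant multiple of n, which is where "amortized O(n)" comes from.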

Good ol' powers of 1.5 for resizing arrays. - Joey Adams
[+1] [2010-05-18 08:30:24] Nick Dandoulakis

I recently did such an improvement in a Qt4 project.

Loading 5k lines (or 50k fields, tabular data) from a text file into a QStandardItemModel object took ~5-6 sec. Now it takes ~0.5 sec.

The problem was that the model was attached to a view object.
The solution was to detach the model, load the data and then attach the model again.

I added 2 simple lines of code and sped it up by 10x.

Perhaps there is a proper Qt way for that (like preparing the view for massive updates) but I didn't have the time to discover it and my quick n dirty hack worked great.

[+1] [2010-08-20 17:09:23] user426578

Passed by reference instead of value. A huge structure containing image data.

[+1] [2009-02-14 13:33:39] Limbic System

We had a huge multi-project Maven1 build structure that was just insane, over 200 project modules. Due to inter-dependencies, it was not even possible to do a full automated build- modules had to be "released" to the CM group manually, a process which sometimes took 2 days.

The first optimization was to convert from Maven1 to Ant [1]+ Ivy [2]. This allowed automated builds, taking about 90 minutes for a full release.

The second optimization was to stop doing "scp artifact.jar remote-server:repository" manually for each artifact. I replaced that with a single call to rsync the whole structure up to the repository, which brought the whole build down to 5 minutes. And a totally automated 5 minutes at that. :-)

EDIT: After re-reading the question, I guess this doesn't really count as a "smallest change", but I'll leave it here and risk the down-voting.


[+1] [2009-06-02 21:29:47] ykaganovich

Added buffering to a FileOutputStream that was being written out 1 byte at a time. Took that step of processing down to 4 min from about 1.5 hours. Big difference considering this was for a security-sensitive app where an operator has to be present in the secure room for the duration of the step.
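The answer doesn't show code; in Java the usual way to do this is to wrap the stream in a BufferedOutputStream. As a rough Python analogue, the sketch below counts how many times the underlying sink is hit with and without a buffer. `CountingSink` is a hypothetical stand-in for the file, and the 8192-byte buffer size is an assumption.

```python
import io

class CountingSink(io.RawIOBase):
    """Hypothetical stand-in for a file: counts writes that reach the sink."""

    def __init__(self):
        super().__init__()
        self.calls = 0
        self.size = 0

    def writable(self):
        return True

    def write(self, data):
        self.calls += 1
        self.size += len(data)
        return len(data)

payload = bytes(10_000)

# Unbuffered: every single byte goes straight to the sink.
unbuffered = CountingSink()
for i in range(len(payload)):
    unbuffered.write(payload[i:i + 1])

# Buffered: bytes are collected in memory and handed over in large chunks.
raw = CountingSink()
buffered = io.BufferedWriter(raw, buffer_size=8192)
for i in range(len(payload)):
    buffered.write(payload[i:i + 1])
buffered.flush()

print(unbuffered.calls, raw.calls)
```

The same 10,000 bytes arrive either way; the buffered path just reaches the sink a handful of times instead of once per byte.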

[+1] [2009-06-12 16:15:50] B0rG

There was this stored procedure (Sybase T-SQL) used for reporting purposes that used a temporary table with basically this structure:

CREATE TABLE #temptable (
   position int not null
)

It was joined with other tables (integer to integer join) on the position field, but there were a couple of tables that had the position value declared as a char field. This caused the index not to be used, so the solution was to modify the #temptable structure to:

CREATE TABLE #temptable (
   position      int      not null,
   position_char char(12)     null
)

And just after it was filled in, do the update:

UPDATE #temptable SET position_char = convert(char(12), position)

So, the joins are made without converting the values, plus having extended the index on this table to the additional field made things go much faster.

[+1] [2009-04-14 15:30:48] Jhonny D. Cano -Leftware-

Long ago I rewrote a T-SQL job that used cursors so that it worked with helper tables instead. I no longer have the code, but the run time improved from 2 hours to ten seconds.

[+1] [2009-03-31 13:37:26] flybywire
  • Have apache and not tomcat serve static resources
  • Use gzip compression
  • Minify, compress and stick together multiple .js files
  • Minify, compress and stick together multiple .css files
  • Add caching to resources

Loading of a web page went from 30s to 4s (first time) and to 0.5s (cached).

[+1] [2009-09-07 20:57:36] rein

Removed VIEW STATE from an ASP.NET page. Page went from 800KB per request to about 10KB per request. That view state can be evil.

[+1] [2009-09-07 20:50:27] Stefan Steinegger

Some time ago I had a column in an Oracle database, which had a value when the column had been processed, and was null when not. The table had several hundred thousand items.

Of course there was an index on this column. But Oracle does not (or at least did not in version 8) store null values in the index.

So a query like this

select * from VeryHugeTable where ProcessingId is null

took hours, although it only returned a few records.

We changed the null value to an arbitrary negative number:

select * from VeryHugeTable where ProcessingId = -9

I can't remember how fast it was, but it was incredible, a few minutes if not even faster.

[+1] [2009-07-27 17:22:19] Asaf R

It was a simple network simulator done as a homework assignment (in C#) and meant to run only once. However, when it ran it did so so slowly it would have taken over 24 hours to finish.

A rather quick glance at the code discovered that every simulation step recalculated the average of elements of a list. That list also grew at each step, thus landing a nice O(n^2) complexity. I changed the calculation by keeping the last average and using it to calculate the new, resulting in an O(n) complexity.

The total time decreased from an expected 24+ hours to about 15 minutes, about two orders of magnitude.
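A minimal sketch of that change, in Python rather than the original C#. The function names are hypothetical; the incremental version uses the standard running-mean update, which may differ in detail from what the simulator actually did.

```python
def running_means_naive(values):
    """Recompute the mean from scratch at every step: O(n^2) overall."""
    seen, means = [], []
    for v in values:
        seen.append(v)
        means.append(sum(seen) / len(seen))
    return means

def running_means_incremental(values):
    """Derive each mean from the previous one: O(n) overall."""
    mean, means = 0.0, []
    for count, v in enumerate(values, start=1):
        mean += (v - mean) / count    # new_mean = old_mean + (v - old_mean) / count
        means.append(mean)
    return means
```

Both return the same sequence of averages up to floating-point rounding; only the amount of work per step differs.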

[+1] [2009-09-02 22:34:56] Pavel Shved

Once I didn't change an application, but just "waved the wand" and the speed increased ten times! I ran CPAN [1] update to upgrade to the newest versions of the Perl unofficial modules. This increased my speed due to a bugfix in one of the application-critical modules.


[+1] [2009-09-02 23:12:37] tsilb

I was working with a very long DB2 query. It ran in the Test environment in 30 seconds, but in Production we had to cut it off after running all weekend due to the massive amount of data.

The query was optimized to death and could not be made faster by structure alone.

So we added this to one of its subjugate WHERE clauses:

and (1=1 or 1=1 or 1=1 or 1=1 or 1=1 or 1=1 or 1=1 or 1=1 or 1=1[...])

Doing so caused the DB2 parser to add a couple additional SORTs to the execution path and ended up making it run in two hours.

Doesn't this change the output? - recursive
bah, fixed - tsilb
[+1] [2009-02-13 22:37:35] WildJoe

I took an old process that built a bunch of static HTML pages serially and multi-threaded it. Went from about 4 hours for 10,000-ish pages to about 30 minutes. Saved us from buying another server too. The change was basically to call the same getPage() function the same number of times, but as a ThreadStart delegate.

I also had an instance where someone typo'd the MySQL InnoDB memory setting to 01GB instead of 10GB. Fixing that made a large difference (though admittedly it wasn't code).

[+1] [2009-02-13 22:39:46] Daniel C. Sobral

I reduced processing time on a CDR pre-processing filter from 30 minutes down to 4 seconds by replacing a split in perl with a regex on the fields I wanted -- and excluding all the trailing fields, which represented about 75% of each line.

So, instead of:

@array = split /,/, $line;

I had:

($field1, $field2, ... $field8) = $line =~ /^(?:[^,]*,){5}([^,]*),(?: etc)/;

[+1] [2009-02-13 22:44:24] Spence

Changed a TSQL cursor to a set based query. Same result in seconds not minutes. Bonus from the boss that week :).

[+1] [2009-02-13 21:26:35] Chris Lively

In a C# app, I moved some code that instantiated some xmlserializers to the global.asax application_start method.

There were 10 of these, and it dropped page load times by over 15 seconds each.

[+1] [2009-02-13 15:58:00] jalbert

I was asked to troubleshoot an application which in production was completely pegging the CPU on the database server (SQL Server). After running a trace, it was evident that the table designer wasn't aware of something called a primary key (or any other indexes for that matter). I added the key live. All of a sudden, the clouds parted and the CPU % went down to reasonable levels for the amount of traffic.

[+1] [2009-02-13 14:52:10] cjk

Updating the stats on a MS SQL Server [1] database gave a 90x performance increase on certain queries, i.e. 90 minutes to 1 minute.


Woah, which version of MSSQL - that's just pathetic and awesome at the same time! - EnocNRoll
2005 - the DB had been messed up a bit with running out of space etc. - cjk
My wife had a case where destroying stats helped - the table got real big for a couple of hours, then shrank down, and they were running the stats at night when it was tiny. - David Thornley
[+1] [2009-02-13 13:37:41] Jay S

I refactored a SQL query that was running as a batch job. It had several functions in it that were horribly inefficient, and the query itself was poorly written.

After spending a few days rewriting it, the run time went from 13.5 hours to 1.5 hours. I have still not been able to beat that efficiency increase to this day.

[+1] [2009-02-13 13:14:00] John Leidegren

Cache locality


Harsh guys...

I switched out an object graph for a linear memory representation where cache misses basically went away. With prefetching and some C++ template tricks I could define a nicely laid out memory representation which the CPU would crunch in no time at all.

This optimization wasn't really that much work but it signifies how horrible poor memory access patterns can be and God forbid, reference types...

Would you mind elaborating? - Alex Angas
I think it's pretty obvious what John meant. - kubi
Obvious what he meant, but the request was for examples of dramatic effects, not things to think about when optimizing. If John would like to post about the time he rolled up a loop and got a big performance boost, that would belong here. - David Thornley
Upvoted to counter Harsh Guys, who surely won't return to take back their petty downvotes. - Constantin
[0] [2009-02-13 15:05:25] Alejandro Mezcua

I learned to cache bitmap objects in .NET. The bitmaps were generated on the fly, but many could be reused instead of regenerated. The app went from unusable to pretty performant.

Did you cache them in memory or on disk? I added support to my photo gallery to use isolated storage for resized images and haven't looked back (15K+ x 4MB JPEGs take a while to resize for thumbnails, downscales, and even recompressed images). Also, changing from an HTTPHandler web service to a WCF RESTful service allowed for better caching of images on the client - Matthew Whited
For that particular case bitmaps were cached in memory, for a custom .NET WinForms control... - Alejandro Mezcua
[0] [2009-02-13 15:12:58] Chris Doggett

Turned off automatic row/column resizing on a DataGridView. Due to the way our app was written by another developer, the cell formatting would cause some checkbox column's value to be repopulated, causing the entire grid to recalculate its size every time that column was painted. Clicking a button to add a row to the table took an exponential amount of time. Around 12 seconds to add a row by the time it got to the fourth row.

I turned the AutoRowSize off for the grid, and everything was almost instantaneous, as it should be.

[0] [2009-02-13 14:16:34] Esko Luontola

In a game application, I had an immutable class representing a cell in the game area's grid. It had getter methods which calculated the corners of the cell lazily, which included allocating new objects to represent the coordinates. The profiler showed those getters to be the bottleneck in the AI algorithms. Calculating them eagerly in the class's constructor improved the performance very much (I don't remember the exact numbers, maybe more than doubled the speed).

Before the code was like this:

public Point[] allPoints() {
    return new Point[]{center(), topRight(), topLeft(), bottomLeft(), bottomRight()};
}

public Point center() {
    return new Point(x + inner(width) / 2, y + inner(height) / 2);
}

public Point topLeft() {
    return new Point(x, y);
}

public Point topRight() {
    return new Point(x + inner(width), y);
}

The allPoints() method was the bottleneck. And after optimizing, the creation of all those values was moved to the constructor and stored as instance variables, after which all the getters were trivial.

It's always best to first do the simplest thing that could possibly work [1], and change it to something more complex only when there is evidence that the simplest thing is not good enough.


Were you not persisting the results after lazily loading them the first time? - Simucal
No. I created them with 'new' always when the method was called. It was the simplest thing that could possibly work, and only after the profiler showed that to be the bottleneck did I optimize it. - Esko Luontola
I edited the post to show how it was. - Esko Luontola
[0] [2009-02-13 13:56:25] Mark Struzinski

The best thing I ever did was learn NHibernate and incorporate it into all my projects. My SQL is now always properly formed, and I don't have bottlenecks from that end of the project.

--And properly indexed tables that perform a lot of lookups!

[0] [2009-02-13 16:17:33] EnocNRoll

These tips can each make a huge difference:

  • Added the NOLOCK SQL hint to massively complex SQL.

  • Removed Order By's from nested subqueries within SQL.

  • Refactored SQL to avoid the need for the DISTINCT hint.

[0] [2009-02-13 16:18:01] Steve Levine

Improved the time it took to run Spring JUnit tests under Maven 1.1 by adding the following to the


This was a huge improvement because most of the tests were leveraging the SpringJUnit4ClassRunner, and by setting forkmode to once, the Spring context was only loaded once per Maven invocation instead of once per unit test invocation.

[0] [2009-02-13 15:44:50] vartec

1) switching from in-house application with Expat parser, to XSLT and generic Sablotron, 100-fold improvement in speed and memory consumption

2) hacking Python code by calling directly objects properties, rather than setters/getters. 10-fold improvement in speed (although decreased code readability)

[0] [2009-02-13 19:07:29] community_owned

I added an index to a column in MySQL. This should have been there from the beginning, but everyone else overlooked it. I did some simple query explains, and found it. The main page of the site started loading 33% faster. Pretty nice for a quick index.

[0] [2009-02-13 22:11:30] slacy

The biggest gains will come from using the most appropriate datastructure. For example, I've seen huge improvement gains in C++ when switching an improper use of map<> to hash_map<>. The code was doing random lookups, and map<> is O(N), where hash_map<> lookups are O(1). The speedup was immediate and made the code many many times faster.

map<> should be O(log n), not O(N). Still might be a significant improvement depending on the problem. - Mark Ransom
[0] [2009-02-13 21:07:05] Dan

I wrote some code in work which was used to process large log files. It had to read each entry and match certain parts of it to previous entries. As you can imagine, the more entries were read, the more had to be searched to perform these matches. After quite a while of pulling my hair out, I realized I was able to make some assumptions on the entries which allowed me to store them in a hash table instead of a list. Now instead of needing to search each previous entry every time a new entry was read, it could simply do a hash table lookup.

Performance obviously jumped quite a bit. I believe for a particular log file, the list approach took about an hour and a half to process, while the hash table version took about 30 seconds.
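A rough sketch of the difference, assuming each log entry can be reduced to a `(key, payload)` pair and that a new entry should be matched against the first earlier entry with the same key. Those matching rules and the function names are assumptions for illustration; the original code isn't shown.

```python
def match_with_list(entries):
    """Scan every earlier entry for each new one: O(n^2) overall."""
    earlier, pairs = [], []
    for key, payload in entries:
        for prev_key, prev_payload in earlier:
            if prev_key == key:
                pairs.append((prev_payload, payload))
                break
        earlier.append((key, payload))
    return pairs

def match_with_dict(entries):
    """Look the key up in a hash table: O(n) overall on average."""
    first_seen, pairs = {}, []
    for key, payload in entries:
        if key in first_seen:
            pairs.append((first_seen[key], payload))
        else:
            first_seen[key] = payload
    return pairs
```

The two functions produce identical pairs; the dictionary version simply never walks the history.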

[0] [2009-02-13 21:11:33] muerte

Turning off Compiled flag for RegexOptions on Vista 64-bit.

Due to some strange bug with the .NET 2.0 Framework, Regex parsing is two orders of magnitude slower if the flag is turned on!

Was it a bug, or were you creating new instances of the same regex repeatedly in your application? A buddy of mine experienced the same issue, until someone pointed out that he should create the regex outside of a loop, or else disaster will strike. - Juliet
The regex was called just few hundred times. The problem is in Compiled flag. If we simply remove that flag, everything is drastically faster. - muerte
I have a hard time buying this. - Ed S.
Why don't you then look it up?… - muerte
[0] [2009-02-14 00:40:51] TonyNeallon

Usually when I fine tune an app, I find using stringbuilder for any heavy string work gives a huge performance boost.

[0] [2009-02-14 00:02:44] PeteT

Changing a bit of VB.NET code from looping and running the same SQL statement roughly 50 times and inserting a record each time to one SQL insert statement. 30 secs to 2 seconds.

The previous developer didn't seem to understand SQL (or much of anything), still it got me to a good start on the job.

[0] [2009-09-02 23:30:27] staticsan

Heeding the top-level question, probably the biggest improvement I've had for the smallest change would be to correctly size the settings of a MySQL server for the hardware it was on. The defaults for MySQL - even the 'huge' ones - are extremely conservative. In particular, several of the memory parameters (e.g. sort_buffer) can be increased a thousand times and this will give a significant boost of performance. And table_cache is often way too low. I've had it up at 1500 on some servers.

[0] [2009-09-02 23:47:33] Matt H

Converting some Oracle Pro*C code to built-in PLSQL. Yes, you read that right: converting a C function to PLSQL.

The issue wasn't so much to do with the C code itself, I'm sure that runs fast. The problem is that the abstraction between Oracle and Pro*C is super slow. So, converting the one function sped the rest of it up by about 100 times.

I should add that some Oracle SQL code was calling this external Pro*C code repeatedly. So bringing it into PLSQL meant less call overhead and faster execution.

[0] [2009-09-03 03:04:43] user161433

C# - I used Generic.List instead of ArrayList while migrating from database to database. It saved 6 minutes and a lot of unnecessary reboots.

[0] [2009-09-03 17:37:48] galaktor

After wondering why a window took so long to show up, I once removed the following line from a colleague's code

Thread.Sleep( 5000 );

At some point this must have been meant to have the application wait for some other thread to finish, but that was not an issue anymore because the code had been refactored many times since then.

[0] [2009-09-07 21:10:24] seengee

Converting all MySQL subqueries to use combination of joins and temporary tables. The improvement was unbelievable.

[0] [2009-07-27 15:24:55] yelinna

I had to finish a VB app with Crystal Reports that connected to a database. The original programmer stored the data in the DB as "Field name = X" when a check box in the VB app was checked and "Field name = " when unchecked. The Crystal Reports showed all those strings from the DB, and the fields had to have the correct number of spaces and be in the correct places or everything would be messed up. The format of the report couldn't be changed, but I had to change how the data was stored: I made the app store "X" when the check box is checked and " " when unchecked, and the rest is written in the Crystal Report. Now I can place the fields anywhere I want in the report and nothing gets messed up.

[0] [2009-07-27 15:28:10] redsquare

Renaming the daft long contentplaceholder id in an equally daft ASP.NET page that made use of master pages, nested user controls and nested repeaters. Saved about 30 KB from the rendered markup. Funny.

[0] [2009-07-27 15:33:58] yelinna

For our thesis, my friend and I had to copy a LARGE series of data to an Excel file. This data was generated by a Matlab script. Done by hand, this "copy to Excel" task would have taken an entire day. So I programmed two loops in that Matlab script to write the data to the Excel file (programming this took me a couple of hours), and now the task takes half an hour on my dual-core Toshiba laptop :D (30 minutes... I repeat: it was a LARGE series of data).

[0] [2009-06-17 21:16:42] BlackTigerX

I replaced a dynamically generated "or x='a' or x='b'..." with a dynamically generated "x in ('a', 'b'...)" and was able to make it run fast. Before that, the application was dying when executing that query.

What database? SQL Server turns "IN" into "OR" - Ian Boyd
It was MSSQL. Like I said, before the change it would time out; after the change it ran pretty quickly - BlackTigerX
[0] [2009-06-23 04:38:31] Hans Malherbe

ServicePointManager.DefaultConnectionLimit throttled the connection count between web and app to 4!

[0] [2009-06-23 04:50:23] humble coffee

I was writing a script that would write about 20k files to disk. It was taking about an hour, and I couldn't figure out why. Then I remembered that I was working on an NFS mount. Once I changed the output directory to /tmp and off the network, the script ran in about 2 minutes.

[0] [2009-07-27 15:54:11] Jreeter

Remembering to implement/use an active_flag column in your SQL table/queries to return only active rows, it's a big help when you have hundred thousands of rows.

[0] [2009-07-27 16:09:42] DanSingerman

I changed someone's CF code from this:

<cfloop from="1" to="#a_large_number#">
  <cfoutput><td width="1" bgcolor="#ff000"></td></cfoutput>

(reading it I had a serious WTF moment)

to this:

  <td width="#a_large_number#" bgcolor="#ff000"></td>

(This was in 1999, hence the HTML style)

+1 for the memories! - UpTheCreek
[0] [2009-07-27 16:26:58] Nick Lewis

I was checking for nodes in a tree that could be identical. I compared every one of the 3000-5000 nodes to every other node, and my full script took around 25 minutes to complete for every tree. I then realized that I only needed to check one category of nodes, which amounted to 300 nodes or so. After pruning the tree, the script took around 1.5 minutes. The power of O(n^2).
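As a back-of-the-envelope check on why pruning helps so much, the sketch below counts the distinct pairs that an all-against-all comparison has to visit. The figure of 4000 nodes is an assumed midpoint of the 3000-5000 range given above.

```python
from math import comb

def pairwise_comparisons(n):
    """Number of distinct pairs among n nodes: n * (n - 1) / 2."""
    return comb(n, 2)

all_nodes = pairwise_comparisons(4000)      # assumed midpoint of 3000-5000
one_category = pairwise_comparisons(300)
print(all_nodes, one_category, round(all_nodes / one_category))
```

That is roughly 8 million pairs against roughly 45 thousand, a ratio of about 178. The measured speedup (25 minutes to 1.5 minutes) is smaller, which suggests the rest of the script accounted for a good share of the remaining time, though that part is a guess.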

[0] [2009-07-27 17:08:45] Nicolas Dorier

I don't remember when I've done my best performance improvement but I know what it was in C:

int a = 0;
while(a = 0)


int a = 0;
while(a == 0)

I can't say for sure what the time improvement was between these two versions, because every time I try, I get a TimeoutException...

if you had done 0 = a from the beginning this would never have been an issue. - benPearce
@benPearce: Which reads unnaturally, is inapplicable when both sides of the comparisons are lvalues, and is easy to forget to do. This is something the compiler should flag. - David Thornley
[0] [2009-07-27 18:25:44] Tino Didriksen

Heavily nested Perl text processor where an inner loop had a line of

s/ +/ /g; (replace each run of spaces with a single space)

Profiled the app and noticed that single line accounted for 95% of CPU time. Removed the line, was very happy with the rather explosive speedup...

[0] [2009-09-02 22:20:58] Rodrigo





Sped up the query executed against SQLMobile by about 30x (measured)

[0] [2009-09-02 22:31:24] community_owned

Added nusoap_base::setGlobalDebugLevel(0); in a SOAP Server written in PHP.

It increased the performance of the SOAP server easily by a factor of five.

What is most interesting is this isn't documented anywhere as far as I could tell, and I only came across it after reading an obscure mailing list post where someone suggested this.

[0] [2009-04-14 15:28:07] BlairHippo

The report-builder component of a piece of financial software my team built had a nasty little glitch: it read the field delimiter character out of the DB every time it was inserted. (DB was Oracle 8 -- nice and heavyweight.) Several months later, my boss asked me to take a look at the code and see if I could optimize it so that reports would finish faster than "overnight". I spotted this little oopsie and stored the delimiter in a local variable after the first read. Performance increased literally 100-fold.

The original coder was ordinarily very competent. Dude just had a brain cramp the day he coded that.

[0] [2009-02-26 05:13:08] Dmitriy Matveev

In the first week of my first job I was asked to make some fixes (mainly UI) to an application that was used internally to monitor usage of our ATMs (there were fewer than one hundred of them). The initial load time was very annoying: about ten minutes. I needed to restart the application many times to test my fixes, so I decided to find the reason for such a slow start-up. Without using any profiler I found code that looked very suspicious to me. There was a method that built human-readable information based on the states of the ATMs stored in a local database.
The structure of the queried table was something like:
atmId(actually many columns here), operationId, moneyInAtm(actually many columns here), time
The (operationId, time) pairs were unique in that table.
The idea of the code was the following:

  1. Retrieve the last rows (maximum value of time) for all operations in all ATMs for the last two days.
  2. For each of those rows, run another query to find the row with equal values of atmId and operationId and the minimal value of time.
  3. Calculate the difference in money and add an (atmId, operationId, diffMoney, time) row to a table stored in memory (it is later shown to the user).

I replaced the first two steps in this procedure with something like:

  1. Retrieve all data for the last two days, sorted by operationId and then by time.
  2. Iterate over that result set and find the first and last rows for each operation. (That was pretty easy since the rows were sorted.)

After that change the application's initialization time was reduced to a few seconds, and the users, who used to go for tea or coffee during start-up, simply refused to believe that the program worked correctly with such a fast start-up. One regression was found, however, after a few weeks of use. The data about operations that had started before the range of my query was lost or in some cases corrupted, because I was missing the first row of those operations. That bug was fixed and the users were happy.
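The single-pass version can be sketched roughly like this in Python. The tuple layout `(atm_id, operation_id, money_in_atm, time)`, the function name, and the sign of the difference (last minus first) are all assumptions; `sorted` stands in for the ORDER BY in the query.

```python
from itertools import groupby
from operator import itemgetter

def money_diffs(rows):
    """Return one (atm_id, operation_id, diff_money, time) tuple per operation.

    rows: iterable of (atm_id, operation_id, money_in_atm, time) tuples.
    """
    # Stand-in for "ORDER BY operationId, time" in the single query.
    ordered = sorted(rows, key=itemgetter(1, 3))
    result = []
    for _, group in groupby(ordered, key=itemgetter(1)):
        group = list(group)
        first, last = group[0], group[-1]     # earliest and latest row of the operation
        result.append((last[0], last[1], last[2] - first[2], last[3]))
    return result

rows = [
    (1, "op1", 100, 1),
    (1, "op1", 80, 3),
    (2, "op2", 50, 2),
    (1, "op1", 90, 2),
    (2, "op2", 20, 5),
]
print(money_diffs(rows))
```

One query and one pass replace the query-per-row pattern. As the regression described above shows, an operation that started before the queried window will have the wrong "first" row here, so the time range needs care.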

[0] [2009-06-02 21:21:46] tj111

I had a JavaScript based table sorter that would lock up the entire browser from 10 seconds to a minute on each run (pretty large data set). After profiling (and many complaints) I learned that the mootools adopt method was taking up 88% of that time. All it took was the addition of four letters and instantly I got a massive performance improvement (down to about 1.5 seconds per run, much more acceptable).





[0] [2009-05-26 18:53:23] Jason Baker

I changed an oracle query from this:

FROM ...

To this:

FROM ...

This literally made a 1000-fold difference.

[0] [2009-05-26 19:13:06] David Berger

My company hasn't always been so organized about backgrounding. For simple projects, we just run processes with the bash screen command. Typically, logging is set up with file and console appenders. Logging in to the host machine and detaching running screens once cut the time of a long series of calls by about a factor of four.

Of course, removing an errant sleep statement cut out about a factor of 10. I never figured out what it was doing there.

[0] [2009-04-14 15:37:57] John Myczek

It just happened right here [1]


[0] [2009-02-14 14:02:59] MicSim

I once tweaked a small tool for exporting master/detail customer data written in VB6/ADO on MS Access. Got a 60x performance improvement (from 10 minutes to 10 seconds). It was working like this:

masterRS = getCustomers()

while not master.EOF



Guess what the problem was... :-)

moved the opening/closing of connection2 outside the loop? - community_owned
(1) 100 points! For thousands of records that was a major performance hit. - MicSim
[0] [2009-02-14 16:44:46] Marcus

In terms of web app pages loading faster, we used a filter to strip out all excess white space from the html. This decreased the actual page size 25% which speeds things up quite a bit.

Reason we had so much white space was that there was a big JSP file involved that had lots of pretty printing [1]. Pretty printing is a good thing but can increase your page size/load time in this scenario.


[0] [2009-02-15 01:02:14] Henk

I had a state machine transition function which relied on a local std::stack for temporary values. The stack always emptied before the function returned, and the function didn't need to be re-entrant or thread-safe, so I could make it a static local variable.

This avoided re-allocating/growing the stack each time, resulting in something like a 10x performance improvement.

[0] [2009-02-16 11:26:08] Patrick Manderson

I originally used an ASP.NET DataGridView to display a large and richly formatted dataset which pushed the page beyond the 580k mark.

I later replaced the DataGridView (which is made up of tables by default), with a repeater control and a carefully 'cascading' arrangement of CSS styles. The change brought the size down to the 120k region.

[0] [2009-02-14 19:24:23] user64075


<%= javascript_include_tag :defaults %>

from a Rails app that didn't need it. Even if I needed the scripts, that line was a huge bottleneck. The app, by default, included the javascript files with a random number parameter attached to the end of the filename to prevent caching.

Fixing this dropped the page load time from 7.5 seconds to 1.5 seconds.

[0] [2009-02-14 10:04:10] lmsasu

Adding a compound index on a table. It reduced the time for a select query from 83 seconds to 2 seconds. Note that the SQL Wizard's hints weren't appropriate; I spent one day thinking about which columns to add and their order inside the index.

[0] [2009-02-14 10:19:05] Moshe

Quite similar to what you described in your question, I didn't trust SQL Server's optimizer, and added "OPTION(HASH JOIN)" to a query - over 3 orders of magnitude faster.

[0] [2009-02-14 00:52:08] flussence

The other day I found out how bad Postgres 8.1 is at optimising prepared statements.

I changed the code from SQL ?s to sprintf %s-es, and the query went from taking over 15 minutes to under 7 seconds.

(Then I installed 8.2 on a test box and found out they'd fixed that problem...)

[0] [2009-02-14 05:00:48] Steve Brewer

Changed from using a TreeSet to a HashSet. Was performing lots of set unions. ~40 seconds to ~200 ms.

[0] [2010-03-19 03:57:16] MPelletier

In J (again).

The 'dll' library creates nice verbs to manipulate memory. mema allocates, memr reads, memw writes and memf frees.

If you have a lot of addresses to read, J will evaluate memr at every read.

So that:

memr each BunchOfAdresses

is much slower than:

15!:1 each BunchOfAdresses

The tricky thing with 15!:1 though is that it can read all addresses passed, but will crash like a bitch if you give it a null (15!:1 (0 0 _1)) (Where the first 0 is the address, the second is a byte offset, and the _1 is -1, for length, in this case "read to first null").

So, if you have a lot of addresses, what do you do? Well, you could wrap your reader like so:

memr2 =: 3 : 0
    if. 0 = {.y do.
        memr y

But that's going to be a pain. J will evaluate that every read. Plus another evaluation if you have memr and not 15!:1.

Instead, if reading a lot of addresses, replace the nulls with an address you declare yourself, where you store a default value of your choice.

adrNull =. mema 1  NB. Allocate 1 byte
'' memw adrNull, 0 1 NB. Set byte to null
bunchOfAdresses =. adrNull (bx bunchOfAdresses = 0) } bunchOfAdresses  NB. replace all null addresses with our new address.
result =. 15!:1 bunchOfAdresses,"1 [ 0 _1  NB. append a 0 offset and -1 length to all addresses and read.
memf adrNull NB. always cleanup after

[0] [2010-03-21 17:15:41] tylerl

At my previous company we were using a lot of third-party code to speed development of our main product. We found that one component in particular dramatically increased the startup time of the application. So we spent a few days poring over their source code to figure out how we could improve performance. At one point, I ran into this little gem:

CObjectManager::CObjectManager() {
    Sleep(10000); //Required for multi-threading 

We pressed the company to explain the code, and they insisted that "multi-threaded apps require proper timing," and if you remove the Sleep, it will break.

Apparently the original developer coded himself into a race condition, and to solve it he simply called off the race.

[0] [2010-03-09 05:13:07] EJP

My brother had a case in an Ada program where the bitfield declarations were inadvertently crossing a word boundary. Fixing that improved the method by a factor of 350,000. Yes, no typo, three hundred and fifty thousand times.

[0] [2010-03-09 05:26:31] MPelletier

Example is from J.

I see this often:

v1 ,. v2 ,. v3 ,. v4 ,. v5 ...

And sometimes that's just to take some rows out of it:

idx { v1 ,. v2 ,. v3 ,. v4 ,. v5 ...

Thing is, ,. "stitches" two same-length vectors, or two same-length matrices. But every time you use it, the interpreter has to create a new matrix just one column wider and stitch. And again, and again.

Prefer the following:

> idx & { each v1 ; v2 ; v3 ; v4 ; v5

At some point using boxes gets old, because they have so much overhead. But remember, boxing every single element is often unnecessary, and very slow.

Also fun is the key adverb, /. Most beginners, though, will only think of boxing with it and then applying whatever verb they want to each box.

For example, getting grouped sums can be done with this:

; +/ each x </. y

But should be done like this:

x +//. y

[0] [2010-03-01 14:39:17] David

This is specific to WinForms in .NET. Turn off DataGridView.AutoSizeColumnsMode and AutoSizeRowsMode.

[0] [2011-01-08 15:14:12] Pangea

  1. Replaced native Java object serialization with protobuf [1]. 400+ millis went down to 180.
  2. Disabled XSLT caching [2] at boot time (after first making caching configurable per application). Two and a half minutes came down to almost nothing. We had around 20 XSLTs that are used very rarely.
  3. Replaced application-level caching with Oracle in-memory cache [3]. Moving caching to the DB level reduced the heap size on all the applications in the cluster. As a side effect, the smaller memory footprint means fewer GC cycles, which makes the app more responsive than before.

Point is that we spent more time at a higher level than the code level. Most of the code level optimizations are performed by the JIT compiler anyway.


[0] [2011-01-08 21:29:26] MPelletier

J has two stock string replacement functions. One replaces single characters (charsub), a simple one-to-one swap; the other replaces a string with another string (rplc), possibly of a different length.

Naturally, if you're replacing strings of varying sizes, you need to resize the destination string.

I don't know how many single char swaps I found that were done with the string one, but when you find it in a loop that's called enough times, the resulting gains are very real.

[0] [2010-03-08 17:36:19] JasCav

There was an application I worked on once that output a large amount of data. Basically, anything it did would be written to various files so it could be analyzed later. Depending on the type of work being done, this application could take days to finish running its calculations. The original developer had used 'endl' to terminate every print statement. I replaced the endl's with \n (a careful find/replace) and saw the performance improve by 15 - 20 percent.
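
The difference comes from std::endl doing two things: it writes a newline and then flushes the stream, while '\n' only writes the newline and leaves flushing to the stream's buffer. A minimal sketch of the two styles follows; the function names and the "record" text are made up for illustration, and the size of the gain will depend on the output volume and the device being written to.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Ends every record with std::endl: inserts '\n' AND flushes the stream each time.
void write_with_endl(std::ostream& out, int n) {
    assert(n >= 0);
    for (int i = 0; i < n; ++i) {
        out << "record " << i << std::endl;
    }
}

// Ends every record with '\n' only: the stream flushes when its buffer
// fills up or when it is closed, not after every line.
void write_with_newline(std::ostream& out, int n) {
    assert(n >= 0);
    for (int i = 0; i < n; ++i) {
        out << "record " << i << '\n';
    }
}

// Helper for comparing the two: renders a writer's output into a string.
std::string render(void (*writer)(std::ostream&, int), int n) {
    std::ostringstream buffer;
    writer(buffer, n);
    return buffer.str();
}
```

Both writers produce byte-for-byte identical output; only the number of flushes differs, which is why a careful find/replace is safe as long as nothing depends on the data hitting the file immediately (e.g. another process tailing the log).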

[0] [2010-03-08 17:39:40] tyriker

I matched the TCP packet size between two application servers (by setting the MTU value in the Windows registry on each). For whatever reason, the network between these two servers had a smaller MSS value than the typical default values, causing fragmentation/reassembly at the TCP level. Matching these two servers to the lowest common denominator between them cut execution time to 1/3 of the original for our distributed application.

The network tech could give me no answers so I took matters into my own hands.

[0] [2009-12-12 03:14:42] TheEruditeTroglodyte

Some image processing work I did about 8 or 10 years ago on PPC/AltiVec systems when they first came out. I converted an n×m convolution, originally written for i386, to run on a Mercury MCOS system (very fast PPC/AltiVec processors linked together by a high-speed backplane and a ccNUMA memory architecture). It really sped up after just a simple code port; taking advantage of their parallel processing libraries then boosted it by about 22x over the non-parallel hand-coded version. Moral of the story: vector processors are nice! Although the decrease in run time was not as radical, I saw substantial savings from using their FFT algorithms as well . . . TheEruditeTroglodyte

[0] [2009-11-19 13:47:17] Aif

I didn't do this in practice, but I had to normalize a matrix on a parallel machine, meaning that for each column, each value is divided by the average of the column.

Well, with a direct-mapped cache, if the matrix is stored row by row in contiguous memory, you can get a 100% miss rate. It depends on how the data touched by the innermost loop is laid out in memory.

Another "error" is to invert the "i" and "j" in the nested loops (where i is for the rows and j for the columns). Swapping them back is a very easy optimisation that doesn't require rewriting anything (just cut and paste).
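
To make the loop-order point concrete, here is a minimal sketch assuming a row-major matrix stored in one contiguous array. The Matrix struct and function names are hypothetical. Both functions compute the same column normalization; the first walks memory with a stride of one full row in its inner loop, the second walks it contiguously. How much faster the second is depends on the cache and the matrix size, so treat this as an illustration rather than a benchmark.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major matrix: element (i, j) lives at data[i * cols + j].
struct Matrix {
    std::size_t rows;
    std::size_t cols;
    std::vector<double> data;
};

// Column-by-column traversal: the inner loops jump `cols` elements
// between accesses, which is unfriendly to the cache for large matrices.
void normalize_column_order(Matrix& m) {
    assert(m.rows > 0 && m.data.size() == m.rows * m.cols);
    for (std::size_t j = 0; j < m.cols; ++j) {
        double sum = 0.0;
        for (std::size_t i = 0; i < m.rows; ++i) sum += m.data[i * m.cols + j];
        const double mean = sum / static_cast<double>(m.rows);
        for (std::size_t i = 0; i < m.rows; ++i) m.data[i * m.cols + j] /= mean;
    }
}

// Row-by-row traversal: i is the outer loop, j the inner loop, so the
// inner loop touches consecutive memory. Column sums are accumulated
// in a small side array instead of one column at a time.
void normalize_row_order(Matrix& m) {
    assert(m.rows > 0 && m.data.size() == m.rows * m.cols);
    std::vector<double> mean(m.cols, 0.0);
    for (std::size_t i = 0; i < m.rows; ++i)
        for (std::size_t j = 0; j < m.cols; ++j) mean[j] += m.data[i * m.cols + j];
    for (std::size_t j = 0; j < m.cols; ++j) mean[j] /= static_cast<double>(m.rows);
    for (std::size_t i = 0; i < m.rows; ++i)
        for (std::size_t j = 0; j < m.cols; ++j) m.data[i * m.cols + j] /= mean[j];
}
```

Neither version guards against a column whose average is zero; that case is assumed not to occur here.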

[0] [2009-09-18 12:42:11] Jay

I broke one complex query into two separate, relatively small queries, and it improved performance by an order of magnitude. I was surprised.

[-1] [2009-02-14 19:35:52] James Jones

One time I had a JavaScript function run for about 45 seconds in IE. Chrome crunched it in 1-2 seconds.

Oh, that, and going from a Debug Build to Release Build... That was an eye opener.

[-1] [2009-03-23 08:40:03] Quamis

Removed a delay() instruction from an old DOS game made by a friend, to make it work on a 286 system.

[-1] [2009-02-14 00:19:21] user32378

Putting the OutputCache attribute on a WebMethod

The WebMethod was loading Xml files and de-serializing the data to an object graph.

[-1] [2009-02-14 00:39:04] Richard Ev

Doing some one-time processing of an XML file in Perl was taking minutes. Rewrote the routine in C# and it completed in seconds.

Rewriting an app is the smallest change you did to increase performance? - Constantin
It was a tiny app, and a huge performance change. So the code change : performance change ratio was very high. :-) - Richard Ev