Question

What's the biggest performance improvement you've had with the smallest change? For example, I once improved the performance of a certain page on a high-profile web app by a factor of 10, just by moving "where customerID = ?" to a different place inside a complicated SQL statement (before my change it had been selecting all customers in a join, then later selecting out the desired customer).

Answer 1

The most chat-addicted guy in the room took a day off.

Answer 2

In some old code I inherited from a coworker, I replaced string concatenations (+ operator) with StringBuilder (.NET). Execution time went from 10 minutes to 10 seconds.

Answer 3

Changing a lot of logging to check log levels first.

From this:

log.debug("some" + big + "string of" + stuff.toString());

To this:

if (log.isDebugEnabled()) {
    log.debug("some" + big + "string of" + stuff.toString());
}

Made a HUGE impact on production performance. Even though log.debug() only logs when debug logging is enabled anyway, the string is built BEFORE it is passed to log.debug() as a parameter, so there was loads and loads of string building that got completely eliminated in production.

Especially considering that some of our toString() methods produced about 10 lines worth of info, by calling toString() on fields, which call toString() on their fields... and so on.

Answer 4

This is the same answer as I gave here ^[1]:

I was working at Enron UK on a power trading application that had a 2-minute start-up time. This slowness was really annoying the traders using the application, to the point where they were threatening dire retribution if the problem wasn’t fixed. So I decided to explore the issue by using a third-party profiler to look in detail at the start-up performance.

After constructing call graphs and mapping the most expensive procedures, I found a single statement that was occupying no less than 50% of the start-up time! The two grid controls that formed the core of the application’s GUI were referenced by code that marked every other grid column in bold. There was one statement inside a loop that changed the font to bold, and this statement was the culprit. Although the line of code only took milliseconds to run, it was executed over 50,000 times. The original developer had used small volumes of data and hadn’t bothered to check whether the routine was being called redundantly. Over time, as the volume of data grew, the start-up times became slower and slower.

After changing the code so that the grid columns were set to bold only once, the application’s start-up time dropped by nearly a minute and the day was saved. The moral here is that it’s very easy to spend a lot of time tuning the wrong part of your program. It’s better to get significant portions of your application to work correctly and then use a good profiler to look at where the real speed bumps are hiding. Finally, when your whole application is up and running correctly, use the profiler again to discover any remaining performance issues caused by your system integration.

[1] http://stackoverflow.com/questions/193982/what-is-your-best-funniest-annoying-performance-tuning-experience/194209#194209

Answer 5

Add an index on a field of a table used for a complex SQL query. You can sometimes easily improve the performance by 90% or so.

Answer 6

Enabling gzip compression for a dynamic web page. Uncompressed page had more than 100k ... compressed only about 15k. It felt so fast afterwards :-)

Answer 7

Turning off disk compression on a database server. Even accounting for the time taken to slap the sysadm, this was a huge net benefit :-)

Answer 8

A one-character change yielded an infinite speedup:

int done = 0;
while(!done);
{
    doSomething();
    done = areWeDoneYet();
}

Guess what the change was...

Answer 9

Just recently I did a Project Euler problem. I used a Python list to look up already computed values. The program took maybe 25 to 30 minutes to run (I didn't measure it). The lookup has to iterate through all values until it finds a matching one in the list. Then I changed the list to a set which basically does a hash lookup. Now the program runs in 15 seconds. The change was simply to put set() around the list.

Moral: choose the right data structure!

Answer 10

void slow() {
  if( x % 16 ) {
  }
}

void fast() {
  if( x & 15 ) {
  }
}

Converting modulus of powers of two to an equivalent bitwise and operation moved a real-time MPEG-to-JPEG transcoder from producing B&W images to producing full colour JPEGs of a movie, with CPU cycles to spare.

Response to Optimization

To determine if a compiler performs an optimization, test it. People have said to me, "The compiler should optimize that." In theory, yes, it could. In practice, a compiler will only optimize code for scenarios that optimizing code has been written. Some optimizations are not as important than others.

Try It Yourself

For those who insist that the compiler should optimize this, just try it.

$ gcc --version
gcc (Ubuntu 4.3.3-5ubuntu4) 4.3.3

$ cat t.c
#include <stdio.h>

int main( int argc, char **argv ) {
  int i = 0;
  int j = 0;

  for( i = 0; i < 1000; i++ ) {
    for( j = 0; j < 100000; j++ ) {
      int q = j & 15;

      if( q ) {
        printf( "j X 15 = %d\n", q );
      }
    }
  }

  return 0;
}

$ gcc -O3 t.c
$ time ./a.out  > /dev/null

real    0m6.750s
user    0m6.732s
sys     0m0.016s

$ cat t2.c
#include <stdio.h>

int main( int argc, char **argv ) {
  int i = 0;
  int j = 0;

  for( i = 0; i < 1000; i++ ) {
    for( j = 0; j < 100000; j++ ) {
      int q = j % 16;

      if( q ) {
        printf( "j X 16 = %d\n", q );
      }
    }
  }

  return 0;
}

$ gcc -O3 t2.c
$ time ./a.out  > /dev/null

real    0m13.668s
user    0m13.633s
sys     0m0.040s

This usage of & instead of modulus for powers of two cannot be optimized by gcc. Read the comments for details.

See Also

Bit Twiddling Hacks ^[1]

[1] http://graphics.stanford.edu/~seander/bithacks.html

Answer 11

Turned off ODBC logging on a production database (someone had turned it on and forgotten it) - got about a 1000x performance improvement!

Answer 12

Truncate table BigTable.

Queries returned no records but it was faaaaaast!

Answer 13

When maintaining someone else's code, I encountered a stored procedure that was taking approximately 4-5 seconds to run and producing a result with only a few rows. After examining the query in the stored procedure and the table that the query was running against, there was a distinct lack of indexes on the table. Adding just a single index improved that stored procedure from 4-5 seconds to about 0.2 seconds! Since this query was being run many times, it was a big improvement overall!

Answer 14

I was writing a Java MergeSort, just to experiment and see how much of my old Data Structures course I could still put into practice. My first time around I implemented my merge routine with ArrayLists, and set it to sort all the words in War and Peace. It took five minutes.

The second time I changed from using the Collection classes to simple arrays. Suddenly the time to sort over 500K words dropped to less than two seconds.

This hammered home to me just how expensive object instantiation can be, especially when you're creating a lot of objects. Now when I'm troubleshooting for performance, one of the first things I check for is whether objects are being instantiated within a loop. It's much cheaper to reinitialize an existing object than it is to create a new one.

Answer 15

Removing some rogue sleep()'s in some Java code.

Answer 16

When updating WinForms controls realtime, simply doing something like

if (newValue != txtValue.Text)
   txtValue.Text = newValue;

instead of always doing

txtValue.Text = newValue;

took the CPU utilization from 40% down to almost nothing.

Answer 17

Removed a html tag from a web application, gained 100% performance increase.

At some point I noticed that requests were duplicated. It took me some time to figure out it was caused by an empty image tag lost in sh*tload of HTML;

<img src="" />

For obvious reasons, Django's template system don't throw errors when a variable does not exists, so we didn't notice anything unusual when we inadvertently removed a template variable, which happened to contain an image src (for a small icon).

Removed the tag, the application loaded twice as fast.

Answer 18

Adding two indexes to a table speeded up a stored procedure from 12.5 hours to 5 minutes.
Moving a straight data copy operation from SQL's DTS to just a "insert into ... select from" statement reduced copy time from an hour to 4 minutes.

A more common example, however, was when a colleague had used sub-selects on SQL to get certain values from a child table. Worked fine on small datasets, but when the main table grew, the query would take minutes. Replacing the sub-selects with a join on a derived table made the whole thing much, much faster.

Essentially;

SELECT Name, 
       (count(*) from absences a where a.perid = person.perid) as Absencecount
FROM Person

is very bad, as SQL will have to do a new select statement for each row in Person. There are different ways of making the above more efficient but using a derived table can be a very efficient way.

SELECT Name, Absensecount  
FROM Person left join  
   (select perid, count(*) as Absencecount from absences group by perid) as a  
ON a.perid = person.perid

The problem with SQL is that it is very easy to write very bad SQL. SQL Server is so good at optimising stuff that most of the time you don't even realise you are writing bad code until it doesn't scale well. One of the golden rules that I always look for is; "Is my inner query referencing anything in the outer query"? If the answer is yes then you have a non-scaling query.

Answer 19

Using a connection pool. Who would have guessed that something that is known to makes things faster actually does make things faster?

Answer 20

Go from single-core to quad-core.

(Hey, you didn't strictly say programming related!)

Answer 21

Letting go that developer who fondly and erroneously believed that demonstrating how clever you are is the same thing as getting work done.

Sometimes to improve the code -- improve the team.

Answer 22

Replacing a "MUL" with a "SHL"/"ADD" series in some x86 graphical code also resulted in about an order of magnitude improvement.

Answer 23

One project I worked on had a very long build time - over half an hour for a full rebuild. After a bit of investigation I traced it down to the precompiled header settings. I then wrote a small app to scan all the source files and reduce the header file dependencies and correctly set up the precompiled headers. Afterwards, full rebuild time was less than a minute.

Skizz

Answer 24

Changing log4net logger level from "DEBUG" to "INFO".

Answer 25

After profiling showed that a large amount of time as being spent in std::map<>::find(), I looked at the key space and found that it was pretty much contiguous and uniform. I replaced the map with a simple array, which reduced the time required by about 80%.

Choosing appropriate data structures and algorithms is the best first step to improving performance.

Answer 26

Changed a SQL query from a cursor to a set based solution.

Answer 27

Switch from the VS compiler to the Intel Compiler for some numeric routines. We saw a 60% speedup just by recompiling and adding a few flags. Utilizing OpenMP on the routine's for loops yielded a similarly large speedup.

Answer 28

Indexed a database. Imagine driving a Daewoo Matiz that suddenly morphs into a Lamborghini.

Answer 29

My biggest performance improvement was gzipping a 700 Kb XML file downloaded by thousands of clients a day and then caching the gzipped output in memory, dropped bandwidth usage somewhat but more importantly dropped server load from about 0.7 to 0.00.

Answer 30

Changed a stored proc from this:

@numberParam varchar(16)
...
SELECT ...
FROM ...
WHERE id = CAST(@numberParam as int)

to this:

@numberParam int
...
SELECT ...
FROM ...
WHERE id = @numberParam

Hello indexes!

Answer 31

In log4j on a server-side app, changing something like this:

log.debug("Stuff" + variable1 + " more stuff " + variable2);

to this:

if(log.isDebugEnabled())
    log.debug("Stuff" + variable1 + " more stuff " + variable2);

Gave us a 30% boost.

Answer 32

debug="true" to debug="false" in an ASP.NET web.config file

Answer 33

I strongly urge everybody else to do as I do: Do the improvement and forget about it the second you did it. Otherwise you will do premature optimizations in a subsequent project. ;) Always consult a profile before doing anything (e.g. the "always use stingbuilder" notion is usually not necessary - if not hurtful). Us the best readable thing. and worry about performance within one tier later on. Make it readable and correct (in that order) and then, maybe, make it faster.

Answer 34

<%@ OutputCache Duration="3600" VaryByParam="none" %>

Answer 35

A few projects back we were just short of reaching performance targets. I ran the profiler and found that sqrt() was occupying 42% of our frame time! I'm not sure why it was so slow on this hardware (Nintendo Wii), and it was only called a few hundred times per frame but wow.

I replaced it with a 3-iteration sqrt estimator and got almost all of that 42% back! (The estimation was "guess at a reasonable value of the sqrt, then refine by choosing the midpoint between that estimate and the result of dividing the estimate into the initial value." Picking a good initial guess was important, too.)

Answer 36

I recently rewrote a SQL query (for removing duplicates from a table), bringing the runtime down from still-not-finished after 47 hours to 30 seconds.

The trick: realising that it was an upgrade script, and I didn't need to worry about concurrency, since the database was in single-user mode. Thus, instead of removing duplicates from the table, I could just SELECT DISTINCT into a temporary table, TRUNCATE the first one and then move the rows back.

Answer 37

I swapped around the order of a selection criteria for a database query once and the runtime went from a 6 or so hours to a few seconds! The customer was pretty happy!!

Answer 38

On a 68000 ^[1], some years ago, in this C code:

 struct {
 ...
} A[1000];

...

int i;
for (i = 0; i < 1000; i++){
   ... A[i] ...
}

One very small change caused a 3-times speedup. What was it?

Hint: sampling the call stack a few times showed the program counter in the integer-multiply-subroutine being called in the code from A[i].

[1] http://en.wikipedia.org/wiki/Motorola%5F68000

Answer 39

I inherited a time tracking application that was written in VB 3 and used an Access database. It was the first VB application written by a very experienced COBOL programmer. Rather than using SQL and letting the database engine get the data he wanted efficiently he opened the table and went from record to record testing each one to find the one he wanted. This worked okay for a while, but when the table grew to 300,000 records it got a "little slow". Looking for a single programmers time entries would take about 5 minutes. I replaced his code with a really simple SQL statement and the same search went down to about 10 seconds. The original programmer thought I was a god.

Answer 40

I discovered code that build a string with an IN statement that was inserted in a WHERE clause of another SQL statement. Creating the string with the IN statement took about 15-20 seconds. The IN statement consisted of thousands of ids: the IN statement was splitted into several IN statements because Firebird can only take 1500 elements in one IN statement.

I removed the code and moved the SQL to get the ids to build the IN statement directly into the WHERE clause of the other statement. The size of that statement went down from more than 70.000 characters to only 1500 or so.

My main query was faster and I lost the time to build that IN statement.

Before:

SELECT id FROM TABLE_A A 
join TABLE_B B on B.A_ID = A.ID 
where B.ID IN (1, 2, 4, 5, ...1496 more) AND 
B.ID IN (2012, 2121, 2122, 2124,  ...1496 more) AND so on...

After:

SELECT id FROM TABLE_A A 
join TABLE_B B on B.A_ID = A.ID 
where B.FOO = 2

Answer 41

I knew a guy who was running some electron accelerator-related simulations that he wrote in C under Linux that took about a hour to complete on a Pentium-120 (it was a long time ago), during which he took lunch. I (mis)advised him to put gcc -O2 option in his Makefile, after which the program started taking several seconds and his nice excuse for lunch break was gone :) The secret was that the program had lots of nested loops in it, and most calculations were done in the innermost loop while for most of them it wasn't really necessary. gcc -O2 turned out to be smart enough to move these calculations outside of the loops, causing the unbelievable performance boost.

Answer 42

I removed a Cartesian join from a query and the nightly job went from hours to seconds. Was tested in QA, but no one ever questioned that the job wasn't suppose to take hours to complete so they passed it.

Same company, with some simple re-indexing I took time sensitive nightly batch processing that was taking 6-8 hours to complete and got it completed in 1-2 hours.

Just yesterday I had a client add a few indexes to a few tables and reduced the run time of a procedure from 8 minutes to 6 minute. Not great mind you, but the tables are very large, and the procedure runs every 10 minutes. So over the corse of a day I saved the SQL Server 2 hours of processing.

Answer 43

The best performance improvement I've ever seen is my performance when I turned off twitter :)

Answer 44

Changed

for( int n = 0; n<things.getSize(); ++n ){ ... }

to

int count = things.getSize();
for( int n = 0; n<count; ++n ){ ... }

Saved about 11% in the rendering loop. (count was around 50000)

Answer 45

I once wrote a sequence of SQL queries which worked on huge amount of records. The performance was really poor taking 4 minutes to execute. Then i wrapped it with Begin Transaction and End Transaction. It spit the result in 5 secs.

Answer 46

Maybe the best and shorten improvements ever, starts with:

ALTER TABLE foo ADD INDEX bar

:-)

Answer 47

Database Indexes. We had an application that was using lookup tables fairly heavily, but there were no indexes on any of the appropriate columns. A coworker ^[1] and I did two things:

Added indexes to all the id columns on the lookup tables
Switched the ORM for our heavier queries to do find_by_sql

Those two changes netted us a roughly 50% speed increase in database access, and made the application noticeably faster. It just goes to show you that you can't disregard good database design because you've got an ORM handling most of the work for you.

[1] http://stackoverflow.com/users/117608/christopher-foy

Answer 48

Moved a function call outside of nested loops. I think that was about 10x improvement. Changed the switch from application server to database from 100Mbps to 1Gbps, this improved performance during high traffic.

Answer 49

In an ASP.NET application there was a page which displayed a lot of records (order of 1000s) from a SQL database query.

Originally the app was storing results in a DataSet before sending the results to client. This was causing users to have to wait a long time to get the results, as well as causing scalability problems because the server was storing the entire result set in memory (DataSet) before returning it to the client. A long wait would also cause users to constantly hit refresh, worsening the problem.

I removed the DataSet and had the code stream out the query results using Response.Write, and this greatly improved the scalability of the server and the perceived performance from the user's perspective (since they were getting results streamed to them immediately).

Answer 50

In C, I was writing a subroutine to slurp an entire file into one variable (bad practice, wastes a lot of memory, but it's the best solution and I only do it to one file). It used malloc() to create a 100 char array and realloc() to resize the array dynamically whenever it got full. I tested it on a 118448-byte file, and it took ten seconds to read it. I tried making it a 200 char array and increasing the size by 200 bytes, and it still took 10 seconds. Then I smacked myself and changed this:

if(size == strlen(string)) {

to this:

if(size == counter) { // counter is the index of the last char in the string

It now reads and processes the same file almost instantaneously.

EDIT: Fixed typo.

Answer 51

i had a program that was address checking tens of millions of addresses. it could do a few hundred per second but it still took the program about 4 days to finish each run. the problem was that it was doing one address at a time.

we made the program multi-threaded (didn't take much work at all) and had it use 5 threads.

the program went from taking a few days to complete to a few hours.

note: we were making calls to another program that would do the address check

Answer 52

Once an application was having a TERRIBLE performance, it took about 15 secs to display a simple aspx with no complex logic. Three developers were tuning SQL statements, business logic and even the HTML in the page. I checked it out and resolved the issue by changing this attribute in main web.config:

debug="true" to debug="false"

Am I a genius? Hahaha, I'm really not!

Answer 53

When writing a solver for a game, adding very simple and limited dead-end recognition to prune the search tree brought down solving time for a big level from 15 minutes to near instantaneous.

Answer 54

Rewriting a join:

First:

Select from a
  left join b on a.idb = b.id
  left join c on a.idc = c.id
  left join d on d.id = a.idb or d.id = a.idc

After:

Select from a
  left join b on a.idb = b.id
  left join c on a.idc = c.id
  left join d on 
    (case 
       when a.idb is not null then b.id
       when a.idc is not null then c.id
       else null
    end) = d.id

The query went from 3+ minutes to 8s after some more tweaking it eventually came down to about a second which was acceptable for this one.

Answer 55

I changed a VB6 function that was concatenating hundreds of strings together to output a tree control in the early days of ASP. It called a function that looked like:

mystring = mystring + param1 + param2 + param3 + param4

Adding a single set of parentheses to change the order of concatenation:

mystring = mystring + (param1 + param2 + param3 + param4)

optimized the time the page took to load by over 99%. Went from over 2 minutes to under 1 second.

Answer 56

In the early days, I had some code that grabbed hundreds of rows out of a SQL database table based on a where clause. The whole purpose of this code was to get the number of rows returned.

After learning that I can get the number of rows from a given query with the COUNT(*) statement in SQL, I drastically improved performance of that page.

Answer 57

Replacing a frequently hit division by 4 to a bit shift operation.

Answer 58

While programming in CUDA ^[1] for GPUS you must provide the correct number of threads to be launched. The program was launching with the incorrect number of threads, so it was running in serial. While chaning the line:

kernel <<< numberOfThreads >>> ()

to

kernel<<< numberOfThreads, numberOfThreads>>>()

the program ran ~ 500 times faster

[1] http://www.nvidia.com/cuda

Answer 59

Took a web page load from 3 minutes to 3 seconds by indexing the primary search term. Problem was the table had 1,000,000+ rows. Their "developer" just couldn't make it go any faster and had them purchase a new Quad Server 8G RAM machine.

Answer 60

Installed profiler on the application server. It makes the plumbing work much more fun.

Answer 61

Switched from PHP to Python for pet projects.

Answer 62

Set NOCOUNT on a complex cursor based stored procedure.

It was returning a row count of 1 a few million times even though the application had no need to know it.

The gain was purely in network I/O.

Answer 63

Updating the database statistics on Oracle 9 using DBMS_STATS.GATHER_DATABASE_STATS reduced the runtime of a (rather simple) query from around 12 minutes to 200 ms. Oracle 9 decided that multiple full table scans were a better approach than using the index because the statistics were broken.

Answer 64

A long time ago, I removed an index, and sped up my query by a factor of at least 300. I never did figure out why Oracle 7 figured it needed to do a full Cartesian join if it had the index, and not if it didn't.

Answer 65

Until recently we had an intern who had a special method of optimization. He put together a sql statement that took over 20 minutes to run and had to be called quite often. He became aware that the sql statement would finish real fast when he put a LIMIT 1 at the end. I think I destroyed his faith in humanity when I told him that this will not return the results he needs.

Answer 66

Call .Dispose for objects implementing IDisposable. There's a reason why those objects are implementing IDisposable, ya know!

The application (inherited from a former employee) went from needing a restart every day to running like a champ nonstop for the next 2 years.

Answer 67

Removed the ORDER BY clauses from our SQL statements and moved sorting code to the Objects. This gives you a clean consistent query plan and moves the sorting work from the database to the clients (or web servers) where it's distributed.

Answer 68

used

if( string.Compare( prevValue, nextValue, StringComparison.Ordinal ) != 0 )

Instead of

 if( prevValue == nextValue )

Answer 69

Used some internal caching for a heavily used http module and the performance improved by a big factor.

Answer 70

Reduce remote calls such as database or web service calls. In most applications this is what produces most if not all of the latency, because it usually involves trips over the network.

Answer 71

Changing a SQL query against several million rows so that instead of

WHERE dbo.fn_TrimDate(ActionDate) = @Today

I had

WHERE ActionDate BETWEEN @Today AND (@Today + 1)

fn_TrimDate being an ugly function that ripped off the time part of a datetime field.

The query went from an average of 0.5 secs to being almost instananeous.

Answer 72

Recently while writing a Java application which reads REST responses our team were using DOM based XML parsers, mainly because selecting things out by XPath is nice and easy to code. Bad move!

We switched parsing and serialisation over to event-based XML classes (in our case StAX ^[1]). It vastly improved the memory footprint of the application, which has a massive impact on scalability and sped up the processing by at least an order of magnitude.

[1] http://en.wikipedia.org/wiki/StAX

Answer 73

The line

new Regex(pattern, RegexOptions.IgnoreCase);

was changed to:

new Regex(pattern);

It improved performance by about 1400% as case sensitivity wasn't required.

Answer 74

Spent less time on Stack Overflow for a day.

Answer 75

I doubled the performance of an in memory matrix calculation by storing the matrix row-wise instead of column-wise. This improved cache locality.

Answer 76

Changing getPixel value from Bitmap object (.NET) to direct unsafe bit manpulation. The performance caused the method go from 4 minutes to 1 second.

Answer 77

Can't remember the exact code but we changed this:

int readSize = 1024;
result = fread(buffer, readSize, 1, file);

to

int readSize = 1024*1024;
result = fread(buffer, readSize, 1, file);

Never underestimate how slow I/O is.

Answer 78

Linq to SQL don't cache, so use ToList() when you are enumerating a IQuerable<> multiple times.

var db = MyDataContext();
var query = db.Where(a => a.lot.quering == a);
doThingWithDataManyTimes(query);
doThingWithDataEvenMoreManyTimes(query);

to

var db = MyDataContext();
var query = db.Where(a => a.lot.quering == a).ToList();
doThingWithDataManyTimes(query);
doThingWithDataEvenMoreManyTimes(query);

this reduced the time for a regression calculation and graph generation over the data from ~20sec to <1sek

Answer 79

I once made a C program twice as fast by changing the array size to be a power of 2 and thereby avoid integer multiplication. In the center of my simulation code I had a 2d array named stored on the heap, Here are two ways to index into it:

#define worldState(x,y) (*(world + (y) * worldYSize + (x)))    
#define worldState(x, y) (*(world + ((y) << worldYSizeBits) + (x)))

On the 1995-era Sparc I was running this code on integer multiplication took 33 clock cycles; one cycle per bit in the word. The bit shift took 1 cycle. And by far the main thing my code was doing was fetching states out of the world, so I saved 50% of my runtime by constraining my code to only work on world sizes that were powers of 2.

I found it with a profiler; fortunately the multiplication showed up as a call to the function _imul() which the gcc runtime was providing. Compiling with -O would hide that, btw, but at the time the profiler didn't work with optimized code.

Answer 80

Well the server was taking forever to load, so long that it was timing out!

I plugged in the ethernet cable and everything loaded instantly. It was beautiful.

(Happened in a Network Administration class I was taking, teacher moved the server but forgot to plug everything back in.)

Answer 81

Adding a few SQL Server nolock directives for static tables (prices that were updated once an year).

Answer 82

In a tight loop, i replaced the return value of a function from IEnumerable<int> to int[] and worked with for instead of foreach. This reduced garbage collection to a minimum and increased performance by factor 10.

Answer 83

I was porting a professor's C code for a Travelling Salesman genetic algorithm to Java. The majority of the work was moving from procedural to OO.

We were carrying about 1000 trial solutions, and killing off 100 each generation.

Each solution was simply an object which contained an array of nodes to visit (in order) and a couple of methods to get costs and manipulate the crossover.

First (successful) run took 8 hours -- I blew the memory a few times first.

Instead of de-referencing the objects, I stuck them in a pool for reuse and performance increased to about 5 minutes. after a few generations the Garbage collector was running constantly and I was spending more cycles cleaning up the mess than I was processing the data.

Answer 84

Switched from using Linq to some older style array looping. :) cut the processing time on a particularly lengthy method nearly in half. (from 940ms to 501ms).

Answer 85

Switched from an OR construct to an IN construct in MySQL - over a 10x speed improvement!

Answer 86

An old application started going haywire on submissions of new data when we moved to a new SQL Server silo. It went from 1-2 seconds to several minutes. Obviously something changed on the SQL/network side but after 3 days we weren't able to identify it.

Upon examining the code we noticed that it had a random identifier based on the time (goofy design - not mine - SQL Identity or GUID work fine for me), only it was seeding to the millisecond. So the code only had 100 different seeds meaning it would likely hit the same pattern of randoms and cycle through until it found the next available one.

We seeded to the current time (instead of millisecond) and boom, 1-second submissions.

On a side note, our development environment had the same SQL/network problem but it went unnoticed because the Web server (a VM) was so slow that the random identifier algorithm (20 random characters based on current millisecond) produced an identifier built from several different random seeds whereas prod built from a single random seed. A glorious bug that was kind of fun to uncover / resolve.

Answer 87

Dynamic Programming ^[1], Sometimes It's amazing how the use of a simple look-up table of some values, in a recursive function, can help. for a small example check this Fibonacci in C++ ^[2].

[1] http://en.wikipedia.org/wiki/Dynamic%5Fprogramming
[2] http://talkbinary.com/programming/c/special-functions/fibonacci-in-c/

Answer 88

Dropping Java's array clone method and using other methods instead. It turns out cloning is very resource consuming and it should be used only when definitely necessary.

It dramatically improved my Java code's performance.

Answer 89

Once upon a time I've added /SSE2 option to my Visual C++ project and got +10% performance.

Answer 90

Instead of doing all of the lookups against the database in our web app, the lookup information is pulled into a HashTable in memory and kept for an hour:

HttpContext.Current.Cache.Insert(Name, htData, Nothing, DateTime.Now.AddHours(1),
                                 System.Web.Caching.Cache.NoSlidingExpiration)

We really don't need anything fresh to the minute, and looking the info up from the DB once an hour (instead of 10 times a second) improved performance trememdously.

Answer 91

Using Sqlceresultset instead of Insert Query.It boosts performance in pocket pc applications especially when u deal with Bulk inserts.

Using datareader instead of datatable as datagrid datasource if u deal with above than 1000 records resultset.

Partition in oracle database.It improves %25.

Using string.empty instead of "" if u want to check a variable's "" value.

Answer 92

I just removed the try catch block and put the if condition check so that it wont throw the exception. That code block was executing more than 10K times to deserialize the data and kind of expected to be throwing exception and my previous developer just left that code unremoved. when i had to look for improving performance of loading the serialized file, i did this small tweak and improved a lot from kind of 36 secs to 3 secs.

Note : This would have been mentioned in either of the provided answer, but as I could not read all the answers and confirm myself whether it is already present, I am typing this answer. Sorry about if this is the repeated answer.

Answer 93

Changing this :

WHERE dbo.fn_TrimDate(DateTimeField) = dbo.fn_TrimDate(GetDate())

into this :

DECLARE @StartToday datetime
SELECT @StartDay = dbo.fn_TrimDate(GetDate())
...
WHERE DateTimeField BETWEEN @StartDay AND @StartDay + 1

Answer 94

When I put an SSD in my laptop

Answer 95

Application spiked the CPU on the SQL Server to 100% at 8am as each time zone logged in for the first time. The server had 128GB of RAM and maxed out # of CPUs. New DBA, and by "new DBA" I mean the first DBA they ever hired in their 6 years of operation, found a query with an LTRIM() on a numeric column that was the join between two tables.

Removed the LTRIM and the CPU basically flat-lined.

Answer 96

In some C# code I replaced some reflection to dynamically get property values with dynamically compiled lambdas and got about 100-1000x speed increase!

Answer 97

Created 3 indices in the database. The net performance went up about 25-fold.

Answer 98

This code was getting called 3000 times in a loop and was causing CPU to go to 100% and even after processing was complete,CPU utilization didnot return to normal.

public static void WriteToEventLog(string strMessage,SqlInt16 EntryType)
{
    EventLog log = new EventLog();

        try
        {
            string EventSource = "EDiscLog";
        }
        catch (Exception ex)
        {
            log.WriteEntry(ex.Message, EventLogEntryType.Error);
        }        
}

I changed to

public static void WriteToEventLog(string strMessage,SqlInt16 EntryType)
{
    using(EventLog log = new EventLog())
    {
        try
        {
            string EventSource = "EDiscLog";
        }
        catch (Exception ex)
        {
            log.WriteEntry(ex.Message, EventLogEntryType.Error);
        } 
    }       
}

And then CPU utilization went maximum to 21%.

Answer 99

does this count?

switch from IE to Firefox -> chrome

Answer 100

When i joined a project mid-way where already around 80% of coding had been done, i was given the task of looking into the project for any optimization possible. The first thing that i came across was the habit where the objects were not disposed after their use. So I just introduced the following in the finally block

//Declare some object
MyClass ob1;

try
{
    //instantiate the object 
    ob1 = new MyClass();

    //Perform operations
....
....
}
catch
{
    //perform some operation
....
....    
}
Finally
{
    ob1 = null;
}

And it worked wonders and the application now was working 30% faster.

Answer 101

I once came accross this gem:

for (int i = 0; i < count; i++)
{
    var result = dosomething();
    useresult(result,i);
}

Of course, dosomething() would always return the same result at every iteration. Moving it out of the loop helped!

Answer 102

Changing two tokens improved my toy vector performance from O(n^2) to amortized O(n) when inserting n elements.

Slow: new_capacity = old_capacity + 10

Fast: new_capacity = old_capacity * 1.5

Answer 103

I recently did such an improvement in a Qt4 project.

Loading 5k lines (or 50k fields, tabular data) from a text file into a QStandardItemModel object took ~5-6 sec. Now it takes ~0,5 sec.

The problem was that the model was attached on a view object.
The solution was to detach the model, load the data and then attach the model again.

I added 2 simple lines of code and I speed it up by 10x.

Perhaps there is a proper Qt way for that (like preparing the view for massive updates) but I didn't have the time to discover it and my quick n dirty hack worked great.

Answer 104

Passed by reference instead of value. A huge structure containing image data.

Answer 105

We had a huge multi-project Maven1 build structure that was just insane, over 200 project modules. Due to inter-dependencies, it was not even possible to do a full automated build- modules had to be "released" to the CM group manually, a process which sometimes took 2 days.

The first optimization was to convert from Maven1 to Ant ^[1]+ Ivy ^[2]. This allowed automated builds, taking about 90 minutes for a full release.

The second optimization was to stop doing "scp artifact.jar remote-server:repository" manually for each artifact. I replaced that with a single call to rsync the whole structure up to the repository, which brought the whole build down to 5 minutes. And a totally automated 5 minutes at that. :-)

EDIT: After re-reading the question, I guess this doesn't really count as a "smallest change", but I'll leave it here and risk the down-voting.

[1] http://ant.apache.org/
[2] http://ant.apache.org/ivy/

Answer 106

Added buffering to a FileOutputStream that was being written out 1 byte at a time. Took that step of processing down to 4 min from about 1.5 hours. Big difference considering this was for a security-sensitive app where an operator has to be present in the secure room for the duration of the step.

Answer 107

There was this stored procedure (Sybase T-SQL) used for reporting purposes, that used temporary table, that had basically this structure:

CREATE TABLE #temptable (
   position int not null
)

It was joined with other tables (integer to integer join) on the position field, but there were couple of tables, that had the position value declared as char field, this caused the index not to be used, so, the solution was to modify the #temptable structure to:

CREATE TABLE #temptable (
   position      int      not null,
   position_char char(12)     null  
)

And just after it was filled in, do the update:

UPDATE #temptable SET position_char = convert(char(12), position)

So, the joins are made without converting the values, plus having extended the index on this table to the additional field made things go much faster.

Answer 108

Rewriting a job with cursors on TSql to work with helper tables, a long ago, so, I don't have the code, but it improves from 2 hours to ten seconds

Answer 109

Have apache and not tomcat serve static resources
Use gzip compression
Minify, compress and stick together multiple .js files
Minify, compress and stick together multiple .css files
Add caching to resources

Loading of a web page went from 30s to 4s (first time) and to 0.5s (cached).

Answer 110

Removed VIEW STATE from an ASP.NET page. Page went from 800KB per request to about 10KB per request. That view state can be evil.

Answer 111

Some time ago I had a column in an Oracle database, which had a value when the column had been processed, and was null when not. The table had several hundred thousand items.

Of course there was an index on this column. But Oracle does (at least did in version 8) not store null values in to the index.

So a query like this

select * from VeryHugeTable where ProcessingId is null

took hours, although it only returned a few records.

We changed the null value to an arbitrary negative number:

select * from VeryHugeTable where ProcessingId = -9

I can't remember how fast it was, but it was incredible, a few minutes if not even faster.

Answer 112

It was a simple network simulator done as a homework assignment (in C#) and meant to run only once. However, when it ran it did so so slowly it would have taken over 24 hours to finish.

A rather quick glance at the code discovered that every simulation step recalculated the average of elements of a list. That list also grew at each step, thus landing a nice O(n^2) complexity. I changed the calculation by keeping the last average and using it to calculate the new, resulting in an O(n) complexity.

The total time decreased from an expected over 24H to about 15 minutes, about two orders of magintude.

Answer 113

Once I didn't change an application, but just "waved the wand" and the speed increased ten times! I ran CPAN ^[1] update to upgrade to the newest versions of the Perl unofficial modules. This increased my speed due to a bugfix in one of the application-critical modules.

[1] http://en.wikipedia.org/wiki/CPAN

Answer 114

I was working with a very long DB2 query. It ran in the Test environment in 30 seconds, but in Production we had to cut it off after running all weekend due to the massive amount of data.

The query was optimized to death and could not be made faster by structure alone.

So we added this to one of its subjugate WHERE clauses:

and (1=1 or 1=1 or 1=1 or 1=1 or 1=1 or 1=1 or 1=1 or 1=1 or 1=1[...])

Doing so caused the DB2 parser to add a couple additional SORTs to the execution path and ended up making it run in two hours.

Answer 115

I took an old process that built a bunch of static HTML pages serially and mult-threaded it. Went from about 4 hours for 10,000-ish pages to about 30 minutes. Saved us from buying another server too. The change was basically to call the same getPage() function the same amount of times, but called it as a ThreadStart delegate.

I also had an instance where someone typo'd the mysql InnoDB memory setting to 01GB intstead of 10GB. Fixing that made a large difference (though admittedly it wasn't code).

Answer 116

I reduced processing time on a CDR pre-processing filter from 30 minutes down to 4 seconds by replacing a split in perl with a regex on the fields I wanted -- and excluding all the trailing fields, which represented about 75% of each line.

So, instead of:

@array = split $line, ",";

I had:

($field1, $field2, ... $field8) = $line =~ /^(?:[^,]*,){5}([^,]*),(?: etc)/;

Answer 117

Changed a TSQL cursor to a set based query. Same result in seconds not minutes. Bonus from the boss that week :).

Answer 118

In a C# asp.net app, I moved some code that instantiated some xmlserializers to the global.asax application_start method.

There were 10 of these, and it dropped page load times by over 15 seconds each.

Answer 119

I was asked to troubleshoot an application which in production was completely pegging the CPU on the database server (SQL Server). After running a trace, it was evident that the table designer wasn't aware of something called a primary key (or any other indexes for that matter). I added the key live. All of the sudden, the clouds parted and the CPU % went down to reasonable levels for the amount of traffic.

Answer 120

Updating the stats on a MS SQL Server ^[1] database gave a 90x performance increase on certain queries, i.e. 90 minutes to 1 minute.

[1] http://en.wikipedia.org/wiki/Microsoft%5FSQL%5FServer

Answer 121

I refactored a SQL query that was running as a batch job. It had several functions in it that were horribly inefficient, and the query itself was poorly written.

After spending a few days rewriting it, the run time went from 13.5 hours to 1.5 hours. I have still not been able to beat that efficiency increase to this day.

Answer 122

Cache locality

EDIT:

Harsh guys...

I switched out an object graph for a linear memory representation where cache misses basically went away. With perfetching and some C++ template tricks I could define a nicely laid out memory representation which the CPU would crunch in no time at all.

This optimization wasn't really that much work but it signifies how horrible poor memory access patterns can be and God forbid, reference types...

Answer 123

Learn to cache bitmap objects in .NET. The bitmaps were generated on the fly but many could be reused instead of regenerated. From an unusable app went to a pretty performant one.

Answer 124

Turned off automatic row/column resizing on a DataGridView. Due to the way our app was written by another developer, the cell formatting would cause some checkbox column's value to be repopulated, causing the entire grid to recalculate it's size every time that column was painted. Clicking a button to add a row to the table took an exponential amount of time. Around 12 seconds to add a row by the time it got to the fourth row.

I turned the AutoRowSize off for the grid, and everything was almost instantaneous, as it should be.

Answer 125

In a game application, I had an immutable class representing a cell in the game area's grid. It had getter methods which calculated the corners of the cell lazily, which included allocating new objects to represent the coordinates. The profiler showed those getters to be the bottleneck in the AI algorithms. Calculating them eagerly in the class's constructor improved the performance very much (I don't remember the exact numbers, maybe more than doubled the speed).

Before the code was like this:

public Point[] allPoints() {
    return new Point[]{center(), topRight(), topLeft(), bottomLeft(), bottomRight()};
}

public Point center() {
    return new Point(x + inner(width) / 2, y + inner(height) / 2);
}

public Point topLeft() {
    return new Point(x, y);
}

public Point topRight() {
    return new Point(x + inner(width), y);
}
...

The allPoints() method was the bottleneck. And after optimizing, the creation of all those values was moved to the constructor and stored as instance variables, after which all the getters were trivial.

It's always best to first do the simplest thing that could possibly work ^[1], and change it to something more complex only when there is evidence that the simplest thing is not good enough.

[1] http://www.extremeprogramming.org/rules/simple.html

Answer 126

The best thing I ever did was learn NHibernate and incorporate it into all my projects. My SQL is now always properly formed, and I don't have bottlenecks from that end of the project.

--And properly indexed tables that perform a lot of lookups!

Answer 127

These tips can each make a huge difference:

Added the NOLOCK SQL hint to massively complex SQL.
Removed Order By's from nested subqueries within SQL.
Refactored SQL to avoid the need for the DISTINCT hint.

Answer 128

Improved the time it took to run Spring JUnit tests under Maven 1.1 by adding the following to the project.properties:

maven.junit.forkmode=once

This was a huge improvement because most of the tests were leveraging the SpringJUnit4ClassRunner, and by setting forkmode to once, the Spring context was only loaded once per Maven invocation instead of once per unit test invocation.

Answer 129

1) switching from in-house application with Expat parser, to XSLT and generic Sablotron, 100-fold improvement in speed and memory consumption

2) hacking Python code by calling directly objects properties, rather than setters/getters. 10-fold improvement in speed (although decreased code readability)

Answer 130

I added an index to a column in MySQL. This should have been there from the beginning, but everyone else overlooked it. I did some simple query explains, and found it. The main page of the site started loading 33% faster. Pretty nice for a quick index.

Answer 131

The biggest gains will come from using the most appropriate datastructure. For example, I've seen huge improvement gains in C++ when switching an improper use of map<> to hash_map<>. The code was doing random lookups, and map<> is O(N), where hash_map<> lookups are O(1). The speedup was immediate and made the code many many times faster.

Answer 132

I wrote some code in work which was used to process large log files. It had to read each entry and match certain parts of it to previous entries. As you can imagine, the more entries were read, the more had to be searched to perform these matches. After quite a while of pulling my hair out, I realized I was able to make some assumptions on the entries which allowed me to store them in a hash table instead of a list. Now instead of needing to search each previous entry every time a new entry was read, it could simply do a hash table lookup.

Performance obviously jumped quite a bit. I believe for a particular log file, the list approach took about an hour and a half to process, while the hash table version took about 30 seconds.

Answer 133

Turning off Compiled flag for RegexOptions on Vista 64-bit.

Due to some strange bug with the .NET 2.0 Framework, Regex parsing is two orders of magnitude slower if the flag is turned on!

Answer 134

Usually when I fine tune an app, I find using stringbuilder for any heavy string work gives a huge performance boost.

Answer 135

Changing a bit of VB.NET code from looping and running the same SQL statement roughly 50 times and inserting a record each time to one SQL insert statement. 30 secs to 2 seconds.

The previous developer didn't seem to understand SQL (or much of anything), still it got me to a good start on the job.

Answer 136

Heeding the top-level question, probably the biggest improvement I've had for the smallest change would be to correctly size the settings of a MySQL server for the hardware it was on. The defaults for MySQL - even the 'huge' ones - are extremely conservative. In particular, several of the memory parameters (e.g. sort_buffer) can be increased a thousand times and this will give a significant boost of performance. And table_cache is often way too low. I've had it up at 1500 on some servers.

Answer 137

Converting some Oracle Pro*C code to built-in PLSQL - yes, you read right convert a C function to PLSQL.

The issue wasn't so much to do with the C code itself, I'm sure that runs fast. The problem is that the abstraction between Oracle and Pro*C is super slow. So, converting the one function sped the rest of it up by about 100 times.

I should add that some Oracle SQL code was calling this external Pro*C code repeatedly. So bringing it into PLSQL meant less call overhead and faster execution.

Answer 138

C# - I used Generic.List instead of ArrayList while migrating from database to database. It saved 6 minutes and a lot of unnecessary reboots.

Answer 139

After wondering why a window took so long to show up, I once removed the following line from a colleague's code

Thread.Sleep( 5000 );

At some point this must have been meant to have the application wait for some other thread to finish, but that was not an issue anymore because the code had been refactored many times since then.

Answer 140

Converting all MySQL subqueries to use combination of joins and temporary tables. The improvement was unbelievable.

Answer 141

I have to finish a VB app with crystal reports that connect to a database. The original programmer stored the data in the BD as: "Field name = X" when a check box in the VB app was checked and "Field name = " when unchecked. The crystal reports showed all those strings from the BD and the fields must had the correct number of spaces and be in the correct places or everything would be messed up. The format of the report can't be changed, but I had to changed... I made the app store "X" when the check box is checked and " " when unchecked. the rest is written in the crystal report. Now I can place the fields in the report anywhere I want and nothing gets messed.

Answer 142

Renaming the daft long contentplaceholder id in an equally daft ASP.NET page that made use of master pages, nested user controls and nested repeaters. Saved about 30 KB from the rendered markup. Funny.

Answer 143

For our thesis, my friend and me had to copy a LARGE series of data to an excel file. This data was generated from a Matlab script. Made by hand, this "copy to excel" task will take an entire day. Then I programmed two loops in that matlab script and made it to write the data in the excel file (programming this took me a couple of hours) and now this task takes half of an hour in my dual core toshiba laptop :D (30 minutes... I repeat: It was a LARGE series of data).

Answer 144

I replaced a dynamically generated "or x='a' or x='b'..." to a dynamically generated "x in ('a', 'b'...)" and was able to make it run fast, before that the application was dying when executing that query

Answer 145

ServicePointManager.DefaultConnectionLimit throttled the connection count between web and app to 4!

Answer 146

I was writing a script that would write about 20k files to disk. It was taking about an hour, and I couldn't figure out why. Then I remembered that I was working on an NFS mount. Once I changed the output directory to /tmp and off the network, the script ran in about 2 minutes.

Answer 147

Remembering to implement/use an active_flag column in your SQL table/queries to return only active rows, it's a big help when you have hundred thousands of rows.

Answer 148

I changed someone's CF code from this:

<cfloop from="1" to="#a_large_number#">
  <cfoutput><td width="1" bgcolor="#ff000"></td></cfoutput>
</cfloop>

(reading it I had a serious WTF moment)

to this:

<cfoutput>
  <td width="#a_large_number#" bgcolor="#ff000"></td>
</cfoutput>

(This was in 1999, hence the HTML style)

Answer 149

I was checking for nodes in a tree that could be identical. I compared every one of the 3000-5000 nodes to every other node, and my full script took around 25 minutes to complete for every tree. I then realized that I only needed to check one category of nodes, which amounted to 300 nodes or so. After pruning the tree, the script took around 1.5 minutes. The power of O(n^2).

Answer 150

I don't remember when I've done my best performance improvement but I know what it was in C:

int a = 0;
while(a = 0)
{
    if(ShouldQuit())
    	a++;
}

to

int a = 0;
while(a == 0)
{
    if(ShouldQuit())
    	a++;
}

I can't say for sure what was the time improvement between these two versions, because every times I try, I TimeoutException...

Answer 151

Heavily nested Perl text processor where an inner loop had a line of

s/ +/ /g; (replace all spaces with a single space)

Profiled the app and noticed that single line accounted for 95% of CPU time. Removed the line, was very happy with the rather explosive speedup...

Answer 152

Changing

SELECT SOME_COLUMNS FROM TABLE WHERE ID IN ('A', 'B', 'C', 'D')

to

SELECT SOME_COLUMNS FROM TABLE WHERE (ID = 'A') OR (ID = 'B') OR (ID = 'C') OR (ID = 'D')

Speed up the query executed against SQLMobile in something about 30x (measured)

Answer 153

Added nusoap_base::setGlobalDebugLevel(0); in a SOAP Server written in PHP.

It increased the performance of the SOAP server easily by a factor of five.

What is most interesting is this isn't documented anywhere as far as I could tell, and I only came across it after reading an obscure mailing list post where someone suggested this.

Answer 154

The report-builder component of a piece of financial software my team built had a nasty little glitch: it read the field delimiter character out of the DB every time it was inserted. (DB was Oracle 8 -- nice and heavyweight.) Several months later, my boss asked me to take a look at the code and see if I could optimize it so that reports would finish faster than "overnight". I spotted this little oopsie and stored the delimiter in a local variable after the first read. Performance increased literally 100-fold.

The original coder was ordinarily very competent. Dude just had a brain cramp the day he coded that.

Answer 155

In the first week of work on my first job I was asked to make some fixes (mainly UI) for application that was used internally to monitor usage of our ATMs (the number of them was below one hundred). The initial load time was very annoying, it was about ten minutes. And I was needed to make many restarts of that application to test my fixes, so I decided to find the reason of such slow start up. Without usage of any profiler I found code that was very suspicious to me. There was some method which was used to build human-readable information based on the states of ATMs stored in local database.
Structure of queried table was something like:
atmId(actually many columns here), operationId, moneyInAtm(actually many columns here), time
The (operationId, time) pairs was unique in that table.
The idea of code was following:

Retrieve last (maximum value of time) rows for all operations in all ATMs for last two days.
For each of that rows do another query to find row with equal values of atmId and operationId and with minimal value of time.
Calculate the difference of money and add (atmId, operationId, diffMoney, time) row to some table stored in memory (It later will be shown to user).

I replaced first two steps in this procedure with something like:

Retrieve all data for last two days sorted by operationId and then by time.
Iterate over that result set and find first and last rows for the same operation. (That was pretty easy since the rows sorted)

After that change the application initialization time was reduced to some seconds and the users which used to always go for some tea or coffee during start-up just denied to believe that the program worked correctly with such fast start-up. There are, however, was found one regression after few weeks of usage. The data about operations which was started before range of my query was lost or in some cases corrupted, because I was missing first row of that operations, but that bug was fixed and the users was happy.

Answer 156

I had a JavaScript based table sorter that would lock up the entire browser from 10 seconds to a minute on each run (pretty large data set). After profiling (and many complaints) I learned that the mootools adopt method was taking up 88% of that time. All it took was the addition of four letters and instantly I got a massive performance improvement (down to about 1.5 seconds per run, much more acceptable).

From:

this.body.adopt(rows);

To:

this.body.adopt.pass(rows);

Answer 157

I changed an oracle query from this:

SELECT DISTINCT ...
FROM ...
WHERE ...

To this:

SELECT ...
FROM ...
WHERE ...

This literally made a 1000-fold difference.

Answer 158

My company hasn't always been so organized about backgrounding. For simple projects, we just run processes with the bash screen command. Typically, logging is set up with file and console appenders. Logging in to the host machine and detaching running screens once cut the time of a long series of calls by about a factor of four.

Of course, removing an errant sleep statement cut out about a factor of 10. I never figured out what it was doing there.

Answer 159

It just happened right here ^[1]

[1] http://stackoverflow.com/questions/726943/displaying-streaming-rich-text-with-wpf

Answer 160

I once tweaked a small tool for exporting master/detail customer data written in VB6/ADO on MS Access. Got a 60x performance improve (from 10 minutes to 10 seconds). It was working like this:

openConnection1
masterRS = getCustomers()

while not master.EOF
    openConnection2()
    openDetailRS
    ...
    closeDetailRS 
    closeConnection2

    master.MoveNext
wend

closeMasterRS
closeConnection1

Guess what the problem was... :-)

Answer 161

In terms of web app pages loading faster, we used a filter to strip out all excess white space from the html. This decreased the actual page size 25% which speeds things up quite a bit.

Reason we had so much white space was that there was a big JSP file involved that had lots of pretty printing ^[1]. Pretty printing is a good thing but can increase your page size/load time in this scenario.

[1] http://en.wikipedia.org/wiki/Prettyprint

Answer 162

I had a state machine transition function which relied on a local std::stack for temporary values. The stack always emptied before the function returned, and the function didn't need to be re-entrant or thread-safe, so I could make it a static local variable.

This avoided re-allocating/growing the stack each time, resulting in something like a 10x performance improvement.

Answer 163

I originally used an ASP.NET DataGridView to display a large and richly formatted dataset which pushed the page beyond the 580k mark.

I later replaced the DataGridView (which is made up of tables by default), with a repeater control and a carefully 'cascading' arrangement of CSS styles. The change brought the size down to the 120k region.

Answer 164

Removing

<%= javascript_include_tag :defaults %>

from a Rails app that didn't need it. Even if I needed the scripts, that line was a huge bottleneck. The app, by default, included the javascript files with a random number parameter attached to the end of the filename to prevent caching.

Fixing this dropped the page load time from 7.5 seconds to 1.5 seconds.

Answer 165

Adding a compound index on a table. It reduce the time for a select query from 83 second to 2 seconds. Note that the SQL Wizard's hints weren't appropriate, I spent one day to think about the columns to be added/their order inside the index.

Answer 166

Quite similar to what you described in your question, I didn't trust SQL Server's optimizer, and added "OPTION(HASH JOIN)" to a query - over 3 orders of magnitude faster.

Answer 167

The other day I found out how bad Postgres 8.1 is at optimising prepared statements.

I changed the code from SQL ?s to sprintf %s-es, and the query went from taking over 15 minutes to under 7 seconds.

(Then I installed 8.2 on a test box and found out they'd fixed that problem...)

Answer 168

Changed from using a TreeSet to a HashSet. Was performing lots of set unions. ~40 seconds to ~200 ms.

Answer 169

In J (again).

The 'dll' library creates nice verbs to manipulate memory. mema allocates, memr reads, memw writes and memf frees.

If you have a lot of addresses to read, J will evaluate memr at every read.

So that:

memr each BunchOfAdresses

is much slower than:

15!:1 each BunchOfAdresses

The tricky thing with 15!:1 though is that it can read all adresses passed, but will crash like a bitch if you give it a null (15!:1 (0 0 _1)) (Where the first 0 is the address, the second is a byte offset, and the _1 is -1, for length, in this case "read to first null").

So, if you have a lot of addresses, what do you do? Well, you could wrap your reader like so:

memr2 =: 3 : 0
    if. 0 = {.y do.
        ''
    else.
        memr y
    end.
)

But that's going to be a pain. J will evaluate that every read. Plus another evaluation if you have memr and not 15!:1.

Instead, if reading a lot of addresses, replace the nulls with an address you declare yourself, where you store a default value of your choice.

adrNull =. mema 1  NB. Allocate 1 byte
'' memw adrNull, 0 1 NB. Set byte to null
bunchOfAdresses =. adrNull (bx bunchOfAdresses = 0) } bunchOfAdresses  NB. replace all null addresses with out new address.
result =. 15!:1 bunchOfAdresses,"1 [ 0 _1  NB. append a 0 offset and -1 length to all addresses and read.
memf adrNull NB. always cleanup after

Answer 170

At my previous company we were using a lot of third-party code to speed development of our main product. We found that one component in particular dramatically increased the startup time of the application. So we spent a few days pouring over their source code to figure out how we could improve performance. At one point, I ran into this little gem:

CObjectManager::CObjectManager() {
    components.init();
    Sleep(10000); //Required for multi-threading 
    components.start();
}

We pressed the company to explain the code, and they insisted that "multi-threaded apps require proper timing," and if you remove the Sleep, it will break.

Apparently the original developer coded himself into a race condition, and to solve it he simply called off the race.

Answer 171

My brother had a case in an Ada program where the bitfield declarations were inadvertently crossing a word boundary. Fixing that improved the method by a factor of 350,000. Yes, no typo, three hundred and fifty thousand times.

Answer 172

Example is from J.

I see this often:

v1 ,. v2 ,. v3 ,. v4 ,. v5 ...

And sometimes that's just to take some rows out of it:

idx { v1 ,. v2 ,. v3 ,. v4 ,. v5 ...

Thing is, ,. "stitches" two same length vectors, or two same length matrixes. But every time you use it, the interpretor has to create a new matrix just one more column wider and stitch. And again, and again.

Prefer the following:

> idx & { each v1 ; v2 ; v3 ; v4 ; v5

At some point using boxes gets old, because they have so much overhead. But remember, boxing every single element is often uncessessary, and very slow.

Also fun is the key conjunction /.. But most beginners will only think of boxing with it and then apply whatever verb they want to each box.

For example, getting grouped sums can be done with this:

; +/ each x </. y

But should be done like this:

x +//. y

Answer 173

This is specific for WinForm .NET. Turn off DataGridView.AutoSizeColumnsMode and AutoSizeRowMode.

Answer 174

Replaced native Java object serialization with protobuf ^[1]. 400+ millis went down to 180.
Disabled (first made caching configurable per application) XSLT caching ^[2] during boot time. 2 and a half mins came down to almost nothing. We were having around 20 XSLT's which are used very very rarely
Replaced application-level caching with oracle in memory cache ^[3]. Mpving caching to DB level reduced the heap size on all the applications in a cluster. The side effect of this means reduced memory footprint meaning reduced GC cycles making the app more responsive than earlier.

Point is that we spent more time at a higher level than the code level. Most of the code level optimizations are performed by the JIT compiler anyway.

[1] http://code.google.com/p/protobuf/
[2] http://www.javaworld.com/javaworld/jw-05-2003/jw-0502-xsl.html
[3] http://www.oracle.com/technetwork/database/options/imdb-cache/index.html

Answer 175

J has two stock string replacement functions. One is single characters (charsub) where it's just a 1 to 1 swap, the other is replace a string with another (rplc), of varying lengths.

Naturally, if you're replacing strings of varying sizes, you need to resize the destination string.

I don't know how many single char swaps I found that were done with the string one, but when you find it in a loop that's called enough times, the resulting gains are very real.

Answer 176

There was an application I worked on once that output a large amount of data. Basically, anything it did would be written to various files so it could be analyzed later. Depending on the type of work being done, this application could take days to finish running its calculations. The original developer had used 'endl' to terminate every print statement. I replaced the endl's with \n (a careful find/replace) and saw the performance improve by 15 - 20 percent.

Answer 177

I matched TCP packet size between two application servers (setting MTU value in the Windows registries). For whatever reason, the network between these two servers had a smaller MSS value than the typical default values, causing fragmentation/reassembly at the TCP level. Matching these two servers to the lowest common denominator between them decreased execution time down to 1/3 the original time for our distributed application.

The network tech could give me no answers so I took matters into my own hands.

Answer 178

Some image processing work I did about 8 or 10 years ago on PPC/AltiVec system when they first came out. Converted an nXm convolution originally written for i386 ported to a mercury mcos system (Very fast PPC/Altivec processors linked together by a highspeed backplane and CCNUMA memory arch). Really sped up just after a simple code port, when taking advantage of their parallel processing libraries it boosted it by about 22x over the non-parallel hand-coded version. Moral of the story - vector processors are nice! Although not as radical of decrease in run-time, I saw substantial savings in using their FFT algorithms as well . . . TheEruditeTroglodyte

Answer 179

Didn't did it in practice, but I had to normalize a matrix on a parallele machine : meaning for each column, divide each value by the average of the column.

Well with Direct Maping Cache, if the matrix is stored by continuous values of rows in memory, you can get a 100% miss. It depends the most inner loop's data storage strategy.

An other "error" is to invert the "i" and "j" in the nested loops (where i is for the lines and j for columns) which makes a very easy optimisation on the code withou rewriting anything (just cut and paste)

Answer 180

I broke one complex query into to two separate relatively small queries, and it improved performance by an order of magnitude, I was surprised.

Answer 181

One time I had a JavaScript function run for about 45 seconds in IE. Chrome crunched it between 1-2 seconds.

Oh, that, and going from a Debug Build to Release Build... That was an eye opener.

Answer 182

Removed a delay() instruction from an old DOS game made by a friend, to make it work on a 286 system.

Answer 183

Putting the OutputCache attribute on a WebMethod

The WebMethod was loading Xml files and de-serializing the data to an object graph.

Answer 184

Doing some one-time processing of an XML file in Perl was taking minutes. Rewrote the routine in C# and it completed in seconds.