20 tricks for Python performance optimization

Optimization algorithm time complexity

The time complexity of an algorithm has the greatest influence on a program's execution efficiency. In Python, time complexity can often be improved by choosing the appropriate data structure. For example, finding an element in a list takes O(n) time, while finding it in a set takes O(1). Different scenarios call for different optimization methods; in general, the main approaches are divide and conquer, branch and bound, greedy algorithms, and dynamic programming.
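As an illustration of the list-vs-set lookup gap (a sketch using the standard timeit module rather than IPython's %timeit; the variable names are mine, not from the article):

```python
import timeit

data_list = list(range(100000))
data_set = set(data_list)

# membership test: O(n) linear scan for a list, O(1) hash lookup for a set
t_list = timeit.timeit(lambda: 99999 in data_list, number=100)
t_set = timeit.timeit(lambda: 99999 in data_set, number=100)

print(t_list > t_set)  # True -- the set lookup is dramatically faster
```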

Reduce redundant data

For example, store a large symmetric matrix using only its upper or lower triangle. Use a sparse representation for matrices in which most of the elements are zero.
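A minimal sketch of triangular storage for a symmetric matrix (the helper name tri_index is hypothetical, not from the article): only n*(n+1)/2 cells are stored instead of n*n.

```python
def tri_index(i, j):
    # Map (i, j) to a position in a flat list holding the lower triangle.
    # Symmetry means (i, j) and (j, i) share the same cell.
    if i < j:
        i, j = j, i
    return i * (i + 1) // 2 + j

n = 1000
tri = [0.0] * (n * (n + 1) // 2)   # 500500 cells instead of 1000000
tri[tri_index(3, 7)] = 4.2
print(tri[tri_index(7, 3)])        # 4.2 -- same cell by symmetry
```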

Use copy and deepcopy rationally

For data structures such as dict and list, direct assignment only copies a reference. When you need to copy the entire object, you can use copy() and deepcopy() from the copy module. The difference between these two functions is that deepcopy copies recursively, and their efficiency differs: (the following programs run in IPython)

import copy
a = range(100000)
%timeit -n 10 copy.copy(a)
%timeit -n 10 copy.deepcopy(a)
10 loops, best of 3: 1.55 ms per loop
10 loops, best of 3: 151 ms per loop

The -n after timeit indicates the number of runs, and the last two lines are the output of the two timeit calls (same below). This shows that deepcopy is two orders of magnitude slower.

Use dict or set to find elements

Both Python's dict and set are implemented with hash tables (similar to unordered_map in the C++11 standard library), so the time complexity of finding an element is O(1).

a = range(1000)
s = set(a)
d = dict((i, 1) for i in a)
%timeit -n 10000 100 in d
%timeit -n 10000 100 in s
10000 loops, best of 3: 43.5 ns per loop
10000 loops, best of 3: 49.6 ns per loop

dict is slightly more efficient (but takes up more space).

Reasonable use of generators and yield

%timeit -n 100 a = (i for i in range(100000))
%timeit -n 100 b = [i for i in range(100000)]
100 loops, best of 3: 1.54 ms per loop
100 loops, best of 3: 4.56 ms per loop

Using () produces a generator object; the memory required is independent of the size of the list, so efficiency is higher. In concrete applications, for example, set(i for i in range(100000)) is faster than set([i for i in range(100000)]).

But for situations that need to loop through:

%timeit -n 10 for x in (i for i in range(100000)): pass
%timeit -n 10 for x in [i for i in range(100000)]: pass
10 loops, best of 3: 6.51 ms per loop
10 loops, best of 3: 5.54 ms per loop

Here the list comprehension is more efficient, but if the loop may break early, the benefit of the generator is obvious. yield can also be used to create generators:

def yield_func(ls):
    for i in ls:
        yield i + 1

def not_yield_func(ls):
    return [i + 1 for i in ls]

ls = range(1000000)
%timeit -n 10 for i in yield_func(ls): pass
%timeit -n 10 for i in not_yield_func(ls): pass
10 loops, best of 3: 63.8 ms per loop
10 loops, best of 3: 62.9 ms per loop

For lists that are not very large in memory, you can return a list directly, but yield is more readable (a matter of personal preference).
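To illustrate the early-break case mentioned above (a sketch; first_match is my own name): a generator stops producing values as soon as the loop exits, while a list comprehension pays for building the whole list up front.

```python
def first_match(pred, items):
    # The generator expression yields lazily, so iteration can stop
    # at the first match without touching the remaining items.
    for x in (i for i in items if pred(i)):
        return x
    return None

# Only the first 997 elements are examined before returning,
# even though the input has almost a million elements.
print(first_match(lambda i: i % 997 == 0, range(1, 1000000)))  # 997
```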

Python 2.x's built-in generator facilities include the xrange function, the itertools package, and so on.
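For example, itertools builds lazy iterators in both Python 2 and Python 3 (in Python 3, range itself already behaves like xrange):

```python
import itertools

# itertools.count() is an infinite generator; islice consumes it lazily,
# so the full sequence is never materialized in memory
evens = (x for x in itertools.count() if x % 2 == 0)
first_five = list(itertools.islice(evens, 5))
print(first_five)  # [0, 2, 4, 6, 8]
```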

Optimize loops

Move work that can be done outside the loop out of the loop. For example, the following optimization roughly doubles the speed:

a = range(10000)
size_a = len(a)
%timeit -n 1000 for i in a: k = len(a)
%timeit -n 1000 for i in a: k = size_a
1000 loops, best of 3: 569 μs per loop
1000 loops, best of 3: 256 μs per loop

Optimize the evaluation order of compound conditions

For and, put the condition that is least likely to be satisfied first; for or, put the condition that is most likely to be satisfied first, so the expression short-circuits as early as possible. For example:

a = range(2000)
%timeit -n 100 [i for i in a if 10 < i < 20 or 1000 < i < 2000]
%timeit -n 100 [i for i in a if 1000 < i < 2000 or 100 < i < 20]
%timeit -n 100 [i for i in a if i % 2 == 0 and i > 1900]
%timeit -n 100 [i for i in a if i > 1900 and i % 2 == 0]
100 loops, best of 3: 287 μs per loop
100 loops, best of 3: 214 μs per loop
100 loops, best of 3: 128 μs per loop
100 loops, best of 3: 56.1 μs per loop

Use join to merge strings in iterators

In [1]: %%timeit
   ...: s = ''
   ...: for i in a:
   ...:     s += i
   ...:
10000 loops, best of 3: 59.8 μs per loop

In [2]: %%timeit
   ...: s = ''.join(a)
   ...:
100000 loops, best of 3: 11.8 μs per loop

join gives roughly a 5x speedup over cumulative concatenation.

Choose the appropriate string formatting method

s1, s2 = 'ax', 'bx'
%timeit -n 100000 'abc%s%s' % (s1, s2)
%timeit -n 100000 'abc{0}{1}'.format(s1, s2)
%timeit -n 100000 'abc' + s1 + s2
100000 loops, best of 3: 183 ns per loop
100000 loops, best of 3: 169 ns per loop
100000 loops, best of 3: 103 ns per loop

Of the three methods, the % style is the slowest, but the gap among them is not large (all are very fast). (Personally, I find % the most readable.)

Swap the values of two variables without an intermediate variable

In [3]: %%timeit -n 10000 a, b = 1, 2
   ....: c = a; a = b; b = c
   ....:
10000 loops, best of 3: 172 ns per loop

In [4]: %%timeit -n 10000 a, b = 1, 2
   ....: a, b = b, a
   ....:
10000 loops, best of 3: 86 ns per loop

Using a, b = b, a instead of c = a; a = b; b = c to swap the values of a and b is about twice as fast.

Use if is instead of if ==

a = range(10000)
%timeit -n 100 [i for i in a if i == True]
%timeit -n 100 [i for i in a if i is True]
100 loops, best of 3: 531 μs per loop
100 loops, best of 3: 362 μs per loop

Using if is True is almost twice as fast as if == True.

Use chained comparison x < y < z

x, y, z = 1, 2, 3
%timeit -n 1000000 if x < y < z: pass
%timeit -n 1000000 if x < y and y < z: pass
1000000 loops, best of 3: 101 ns per loop
1000000 loops, best of 3: 121 ns per loop

x < y < z is slightly more efficient and more readable.

while 1 is faster than while True

def while_1():
    n = 100000
    while 1:
        n -= 1
        if n <= 0: break

def while_true():
    n = 100000
    while True:
        n -= 1
        if n <= 0: break

m, n = 1000000, 1000000
%timeit -n 100 while_1()
%timeit -n 100 while_true()
100 loops, best of 3: 3.69 ms per loop
100 loops, best of 3: 5.61 ms per loop

while 1 is much faster than while True because in Python 2.x, True is a global variable rather than a keyword, so it must be looked up on every iteration.

Use ** instead of pow

%timeit -n 10000 c = pow(2, 20)
%timeit -n 10000 c = 2**20
10000 loops, best of 3: 284 ns per loop
10000 loops, best of 3: 16.9 ns per loop

** is 10 times faster!

Use cProfile, cStringIO, and cPickle, the C implementations of the same functionality (profile, StringIO, and pickle, respectively)

import cPickle
import pickle
a = range(10000)
%timeit -n 100 x = cPickle.dumps(a)
%timeit -n 100 x = pickle.dumps(a)
100 loops, best of 3: 1.58 ms per loop
100 loops, best of 3: 17 ms per loop

The packages implemented in C are 10 times faster!

Use the best deserialization method

The following is a comparison of the efficiency of deserialization of the corresponding string using the eval, cPickle, and json methods:

import json
import cPickle
a = range(10000)
s1 = str(a)
s2 = cPickle.dumps(a)
s3 = json.dumps(a)
%timeit -n 100 x = eval(s1)
%timeit -n 100 x = cPickle.loads(s2)
%timeit -n 100 x = json.loads(s3)
100 loops, best of 3: 16.8 ms per loop
100 loops, best of 3: 2.02 ms per loop
100 loops, best of 3: 798 μs per loop

It can be seen that json is almost 3 times faster than cPickle and 20 times faster than eval.

Use C Extension

The main options are the CPython (the most common Python implementation) native API, ctypes, Cython, and cffi. Their role is to let Python programs call dynamic link libraries compiled from C. Their characteristics are:

CPython native API: by including the Python.h header file, C code can directly use Python's data structures. The implementation process is relatively complicated, but its scope of application is relatively wide.

ctypes: usually used to wrap C libraries, allowing pure Python programs to call functions in dynamic link libraries (DLLs on Windows or .so files on Unix). If you want to use an existing C library from Python, ctypes is a good choice; in some benchmarks, Python 2 + ctypes performs best among these options.
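A minimal ctypes sketch, assuming a Unix system where find_library can locate the C math library (Python 3 syntax):

```python
import ctypes
import ctypes.util

# Locate and load the C math library (e.g. libm.so.6 on Linux)
libm = ctypes.CDLL(ctypes.util.find_library("m"))

# Declare the C signature: double sqrt(double)
libm.sqrt.argtypes = [ctypes.c_double]
libm.sqrt.restype = ctypes.c_double

print(libm.sqrt(2.0))  # 1.4142135623730951
```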

Cython: Cython is a superset of Python that simplifies the process of writing C extensions. Its advantages are concise syntax and good compatibility with libraries containing many C extensions, such as numpy. Cython is typically used to optimize a single algorithm or hot path in a project; in some tests, it can bring a performance improvement of several hundred times.

cffi: cffi is the implementation of ctypes in PyPy (see below) and is also compatible with CPython. cffi provides a way to use C libraries from Python: you can write C declarations directly in Python code and link against existing C libraries.

Using these optimization methods is generally aimed at the optimization of existing project performance bottleneck modules, which can greatly improve the overall program's operating efficiency with a small number of changes to the original project.

Parallel programming

Because of the GIL, it is difficult for Python to take full advantage of multi-core CPUs. However, the built-in multiprocessing module supports the following parallel modes:

Multi-process: for CPU-intensive programs, classes such as Process and Pool in multiprocessing can implement parallel computation with multiple processes. However, because inter-process communication is relatively expensive, programs that require a large amount of data exchange between processes may not see much improvement.

Multi-threading: For IO-intensive programs, the multiprocessing.dummy module uses multiprocessing's interface to encapsulate threading, making multi-threaded programming also very easy (for example, Pool's map interface can be used, concise and efficient).
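A sketch of the thread-pool variant (fetch here is a placeholder name for a real I/O-bound task such as an HTTP request):

```python
from multiprocessing.dummy import Pool as ThreadPool

def fetch(url):
    # placeholder: a real task would perform network or disk I/O here
    return len(url)

urls = ["https://example.com/a", "https://example.com/bb"]

pool = ThreadPool(2)             # 2 worker threads, same API as Pool
lengths = pool.map(fetch, urls)  # concise map-style dispatch
pool.close()
pool.join()
print(lengths)  # [21, 22]
```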

Distributed: The Managers class in multiprocessing provides a way to share data in different processes and based on which you can develop distributed programs.

Different business scenarios can choose one or several combinations to optimize program performance.

Final killer: PyPy

PyPy is Python implemented in RPython (a subset of Python); according to the official site's benchmark data, it is more than six times faster than CPython. It is fast because it uses a Just-in-Time (JIT) compiler, a dynamic compiler that, unlike static compilers (such as gcc or javac), optimizes using data gathered while the program runs. For historical reasons, PyPy still has the GIL, but the ongoing STM project attempts to turn PyPy into a Python without a GIL.

If a Python program contains C extensions (written in a non-cffi way), the JIT's optimization effect is greatly reduced, and the program may even be slower than under CPython (for example, with Numpy). So in PyPy it is better to use pure Python or cffi extensions.

With the completion of the STM, Numpy, and other projects, it is believed that PyPy will eventually replace CPython.

Use performance analysis tools

In addition to the %timeit magic used above in IPython, there is cProfile. cProfile is also very simple to use: run python -m cProfile filename.py, where filename.py is the program to profile. The standard output then shows how many times each function was called and how long it ran, letting you find the program's performance bottlenecks and optimize them in a targeted way.
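cProfile can also be driven from inside a program (Python 3 syntax; names like slow_sum are my own illustration, not from the article):

```python
import cProfile
import io
import pstats

def slow_sum(n):
    total = 0
    for i in range(n):
        total += i
    return total

profiler = cProfile.Profile()
profiler.enable()
slow_sum(100000)
profiler.disable()

# Render per-function call counts and timings into a string,
# sorted by cumulative time, keeping the top 5 entries
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print("slow_sum" in report)  # True
```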
