High Performance Python

Understanding Performant Python

Profiling

Lists and Tuples

Are both implemented internally as arrays?

Dictionaries and Sets

Dictionary keys: __hash__ + __eq__/__cmp__
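As a minimal sketch of the note above: a class becomes usable as a dictionary key when it defines `__hash__` and `__eq__` consistently (`__cmp__` only exists in Python 2). The `Point` class and the `visits` dictionary are illustrative names, not taken from the book.

```python
class Point:
    """A 2-D point usable as a dictionary key (illustrative class)."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        # Equality decides whether two keys with the same hash are the same key.
        if not isinstance(other, Point):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        # Equal objects must hash equally, so hash the fields __eq__ compares.
        return hash((self.x, self.y))


visits = {Point(1, 2): "first"}
visits[Point(1, 2)] = "second"   # same key, so this overwrites the entry
```

Without `__hash__` and `__eq__`, the two `Point(1, 2)` objects would be treated as different keys because the defaults are identity-based.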
entropy
locals() globals() __builtin__
List comprehensions vs. generator expressions (one uses [], the other uses ()):
[<value> for <item> in <sequence> if <condition>] vs (<value> for <item> in <sequence> if <condition>)
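The two forms above can be compared directly. This is a small sketch on CPython; the variable names and the size of the range are arbitrary choices for illustration.

```python
import sys

squares_list = [n * n for n in range(100_000)]   # built in full, up front
squares_gen = (n * n for n in range(100_000))    # produces values lazily

list_size = sys.getsizeof(squares_list)   # grows with the number of items
gen_size = sys.getsizeof(squares_gen)     # small and constant

total_from_list = sum(squares_list)
total_from_gen = sum(squares_gen)         # consumes the generator

leftover = list(squares_gen)              # a generator can only be iterated once
```

Both sums agree, but the generator never holds all values at once, and it is empty after the first pass.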
itertools:
1. imap, ireduce, ifilter, izip, islice, chain, takewhile, cycle
p95 Knuth's online mean algorithm?
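The book's listing is not reproduced in these notes. As an assumption, the following sketch uses the commonly cited running-mean update, mean_n = mean_(n-1) + (x_n - mean_(n-1)) / n, written as a generator so that only a counter and the current mean are kept in memory. The function name `online_mean` is illustrative.

```python
def online_mean(values):
    """Yield the running mean after each value, keeping O(1) state."""
    count = 0
    mean = 0.0
    for value in values:
        count += 1
        # Move the mean toward the new value by 1/count of the difference.
        mean += (value - mean) / count
        yield mean


running = list(online_mean([2.0, 4.0, 6.0, 8.0]))
```

Because the input can be any iterable, the same function works on a stream that is too large to store.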

Iterators and Generators

Matrix and Vector Computation

The book keeps using "loop invariant" examples; isn't that just the compiler failing to optimize?
$ perf stat -e cycles,stalled-cycles-frontend,stalled-cycles-backend,instructions,\
cache-references,cache-misses,branches,branch-misses,task-clock,faults,\
minor-faults,cs,migrations -r 3 python diffusion_python_memory.py
numpy
1. np.roll([[1,2,3],[4,5,6]], 1, axis=1)
2. ? Can Cython optimize data structures, or can it only handle code?
3. In-place operations, such as +=, *=
  1. => numexpr
    1. from numexpr import evaluate
    2. evaluate("next_grid*D*dt+grid", out=next_grid)
4. ? Creating our own roll function
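To make the `np.roll(..., 1, axis=1)` call above concrete, here is a pure-Python sketch of the same wrap-around shift on a list of lists. It is not the book's custom roll function; `roll_rows` is a hypothetical name used only to show the semantics without requiring numpy.

```python
def roll_rows(grid, shift):
    """Shift every row to the right by `shift` positions, wrapping around."""
    rolled = []
    for row in grid:
        k = shift % len(row) if row else 0
        # The last k items move to the front; k == 0 leaves the row unchanged.
        rolled.append(row[-k:] + row[:-k] if k else list(row))
    return rolled


result = roll_rows([[1, 2, 3], [4, 5, 6]], 1)
```

Here `result` is `[[3, 1, 2], [6, 4, 5]]`, the same layout `np.roll` gives for this input. A negative shift moves items to the left.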
scipy
1. from scipy.ndimage.filters import laplace
2. laplace(grid, out, mode='wrap')
3. Do the page-faults show that scipy allocates a lot of memory? Do the instructions counts show that the scipy function is too general?

Compiling to C

Compiling to C:
1. Cython
  1. Also used by zmq?
  2. setup.py
    from distutils.core import setup
    from distutils.extension import Extension
    from Cython.Distutils import build_ext
    setup(cmdclass={'build_ext': build_ext},
          ext_modules=[Extension("calculate", ["cythonfn.pyx"])])
  3. $ python setup.py build_ext --inplace
  4. Cython Annotations: a yellower line means "more calls into the Python virtual machine"
  5. Add type annotations
    1. cdef unsigned int i, n
  6. Disable bounds checking: #cython: boundscheck=False (applied to the function)
  7. The buffer annotation protocol?
    1. def calculate_z(int maxiter, double complex[:] zs, double complex[:] cs): ...
  8. OpenMP
    1. prange
    2. -fopenmp (for GCC?)
    3. schedule="guided"
2. Shed Skin: for non-numpy code
  1. shedskin --extmod test.py
  2. The extra 0.05s is spent copying data from the Python environment
3. Pythran
  1. #pythran export evolve(float64[][], float)
LLVM-based Numba: specialized for numpy
1. Use Continuum's Anaconda distribution
2. from numba import jit
  1. @jit()
3. Experimental GPU support is also available?
VM & JIT: PyPy
1. GC behavior: Whereas CPython uses reference counting, PyPy uses a modified mark and sweep (so objects may not be reclaimed promptly)
2. Note that PyPy 2.3 runs as Python 2.7.3.
3. STM: an attempt to remove the GIL
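The GC difference noted for PyPy can be observed with a weak reference callback. This is a minimal sketch; `Resource` and `events` are illustrative names. On CPython the callback is expected to fire as soon as the last reference is dropped, while a tracing collector may wait until a collection runs.

```python
import gc
import weakref


class Resource:
    """Placeholder object whose reclamation we want to observe."""


events = []
obj = Resource()
# The callback runs when the object is actually reclaimed.
watcher = weakref.ref(obj, lambda ref: events.append("reclaimed"))

del obj                                  # drop the only strong reference
reclaimed_immediately = bool(events)     # expected True under reference counting

gc.collect()                             # give a tracing collector a chance to run
reclaimed_after_collect = bool(events)
```

Code that relies on prompt cleanup, such as closing files when an object goes out of scope, is safer with an explicit `with` block on either interpreter.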
Other tools: Theano, Parakeet, PyViennaCL, Nuitka, Pyston (from Dropbox), ~~PyCUDA~~ (low-level code is not portable?)
ctypes, cffi (from PyPy), f2py, CPython modules
1. $ f2py -c -m diffusion --fcompiler=gfortran --opt='-O3' diffusion.f90
JIT Versus AOT

Concurrency

Concurrency: avoid wasting time in I/O wait
In Python, coroutines are implemented as generators.
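In the generator-based style these notes describe, a coroutine is a generator that receives values through `send()`. The sketch below is illustrative; `running_total` is a hypothetical name.

```python
def running_total():
    """Generator-based coroutine: receives numbers, yields the sum so far."""
    total = 0
    while True:
        received = yield total   # pause here until the caller sends a value
        total += received


coro = running_total()
next(coro)                # prime the coroutine: run to the first yield
first = coro.send(10)
second = coro.send(5)
coro.close()
```

An event loop plays the role of the caller here: it resumes each paused coroutine when the I/O it was waiting for is ready.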
For Python 2.7 implementations of future-based concurrency, ... ？
1. gevent (suited to mainly CPU-based problems that sometimes involve heavy I/O)
  1. gevent monkey-patches the standard I/O functions to be asynchronous
  2. Greenlet
    1. wait
    2. The futures are created with gevent.spawn
    3. Limit the number of simultaneously open resources: from gevent.coros import Semaphore
      1. requests = [gevent.spawn(download, u, semaphore) for u in urls]
  3. import grequests?
  4. A 69x speedup? Does that imply a corresponding amount of unnecessary I/O wait?
  5. The event loop may be either underutilized or overutilized
2. tornado (by Facebook, suited to mostly I/O-bound asynchronous applications)
  1. from tornado import ioloop, gen
  2. from functools import partial
  3. AsyncHTTPClient.configure("tornado.curl_httpclient.CurlAsyncHTTPClient", max_clients=100)
  4. @gen.coroutine
    1. ... responses = yield [http_client.fetch(url) for url in urls] # produces Future objects?
    2. response_sum = sum(len(r.body) for r in responses)
    3. raise gen.Return(value=response_sum)
  5. _ioloop = ioloop.IOLoop.instance()
  6. run_func = partial(run_experiment, base_url, num_iter)
  7. result = _ioloop.run_sync(run_func)
  8. Drawback: tracebacks can no longer hold valuable information

Python 3.4 introduced new machinery to easily create coroutines and have them still return values

asyncio

yield from: there is no longer any need to raise an exception in order to return a result from a coroutine

very low-level => import aiohttp

@asyncio.coroutine
def http_get(url): #
    nonlocal semaphore
    with (yield from semaphore):
        response = yield from aiohttp.request('GET', url)
        body = yield from response.content.read()
        yield from response.wait_for_close()
    return body
return http_get

tasks = [http_client(url) for url in urls]
for future in asyncio.as_completed(tasks):
    data = yield from future

asyncio allows us to unify modules like tornado and gevent by having them run in the same event loop
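The snippets above use the Python 3.4 `@asyncio.coroutine` / `yield from` style, which was later removed from the standard library (Python 3.11). A sketch of the same semaphore and `as_completed` pattern in `async`/`await` syntax follows. `fake_fetch` is a hypothetical stand-in that sleeps instead of making an HTTP request, and the URLs are placeholders.

```python
import asyncio


async def fake_fetch(url, semaphore):
    """Hypothetical stand-in for an HTTP GET; no network access."""
    async with semaphore:              # at most max_concurrent requests in flight
        await asyncio.sleep(0.01)
        return "body of " + url


async def fetch_all(urls, max_concurrent=2):
    semaphore = asyncio.Semaphore(max_concurrent)
    tasks = [asyncio.ensure_future(fake_fetch(url, semaphore)) for url in urls]
    bodies = []
    for future in asyncio.as_completed(tasks):
        bodies.append(await future)    # results arrive in completion order
    return bodies


urls = ["http://example.invalid/%d" % i for i in range(5)]
bodies = asyncio.run(fetch_all(urls))
```

With a real client such as aiohttp, only `fake_fetch` would change; the semaphore and the completion loop stay the same.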

multiprocessing

Process, Pool, Queue, Pipe, Manager, ctypes (used for IPC?)
In Python 3.2, the concurrent.futures module was introduced (via PEP 3148)
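A minimal sketch of the `concurrent.futures` interface mentioned above, using a thread pool so that it runs anywhere. The `word_length` function and the input list are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor


def word_length(word):
    """Trivial task standing in for real work."""
    return len(word)


words = ["gevent", "tornado", "asyncio"]
with ThreadPoolExecutor(max_workers=4) as executor:
    # submit() returns a Future; result() blocks until it is done.
    future = executor.submit(word_length, "multiprocessing")
    single = future.result()
    # map() keeps the input order, like the built-in map.
    lengths = list(executor.map(word_length, words))
```

Swapping in `ProcessPoolExecutor` keeps the same calls but runs the tasks in separate processes, which is the variant that helps for CPU-bound work.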
PyPy fully supports multiprocessing and runs faster
from multiprocessing.dummy import Pool (the multithreaded version?)
hyperthreading can give up to a 30% perf gain, if there are enough spare compute resources
It is worth noting that the negative effect of threads on CPU-bound problems is reasonably solved in Python 3.2+
Use an external queue implementation: Gearman, 0MQ, Celery (with RabbitMQ as the message broker), PyRes, SQS or HotQueue
manager = multiprocessing.Manager()
value = manager.Value(b'c', FLAG_CLEAR)
rds = redis.StrictRedis()
rds[FLAG_NAME] = FLAG_SET
value = multiprocessing.RawValue(b'c', FLAG_CLEAR) # no synchronization mechanism?
sh_mem = mmap.mmap(-1, 1) # memory map 1 byte as a flag
sh_mem.seek(0)
flag = sh_mem.read_byte()
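The three mmap lines above can be completed into a runnable single-process sketch. The integer values chosen for `FLAG_CLEAR` and `FLAG_SET` are assumptions; the notes do not give them.

```python
import mmap

FLAG_CLEAR = 0   # assumed value; a fresh anonymous map is zero-filled
FLAG_SET = 1     # assumed value

sh_mem = mmap.mmap(-1, 1)      # anonymous memory map, 1 byte long
sh_mem.seek(0)
initial = sh_mem.read_byte()   # reading advances the position

sh_mem.seek(0)
sh_mem.write_byte(FLAG_SET)    # writing advances the position too
sh_mem.seek(0)                 # so rewind before reading
flag = sh_mem.read_byte()      # an int in Python 3
sh_mem.close()
```

On Unix, child processes forked after the map is created see the same byte, which is what makes it usable as a cross-process flag.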
Using mmap as a Flag Redux (? I could not quite follow this part, so skipped)
$ ps -A -o pid,size,vsize,cmd | grep np_shared
lock = lockfile.FileLock(filename)
lock.acquire/release()
lock = multiprocessing.Lock()
value = multiprocessing.Value('i', 0)
lock.acquire()
value.value += 1
lock.release()

Clusters and Job Queues

$462 Million Wall Street Loss Through Poor Cluster Upgrade Strategy
1. Inconsistency caused by a version upgrade? But APIs should be versioned...
Skype's 24-Hour Global Outage
1. some versions of the Windows client didn’t properly handle the delayed responses and crashed.
To reliably start the cluster's components when the machine boots, we tend to use either a cron job, Circus or supervisord, or sometimes Upstart (which is being replaced by systemd)
you might want to introduce a random-killer tool like Netflix's ChaosMonkey
Make sure it is cheap in time and money to deploy updates to the system
Make sure you use a deployment system like Fabric, Salt, Chef, or Puppet
Early warning: Pingdom and ServerDensity
Status monitoring: Ganglia
3 Clustering Solutions
1. Parallel Python
  1. ppservers = ("*",) # set IP list to be autodiscovered
  2. job_server = pp.Server(ppservers=ppservers, ncpus=NBR_LOCAL_CPUS)
  3. ... job = job_server.submit(calculate_pi, (input_args,), (), ("random",))
2. IPython Parallel
  1. via ipcluster
  2. ？Schedulers hide the synchronous nature of the engines and provide an asynchronous interface
3. NSQ (a distributed messaging system written in Go)
  1. Pub/sub: Topics -> Channels -> Consumers
  2. writer = nsq.Writer(['127.0.0.1:4150', ])
  3. handler = partial(calculate_prime, writer=writer)
  4. reader = nsq.Reader(message_handler = handler, nsqd_tcp_addresses = ['127.0.0.1:4150', ], topic = 'numbers', channel = 'worker_group_a',)
  5. nsq.run()
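The Topics -> Channels -> Consumers flow can be pictured with a toy in-memory model. This is not the pynsq API: the `Topic` class, the `archive` channel, and the strict round-robin delivery are all simplifying assumptions made for illustration.

```python
from collections import defaultdict
from itertools import cycle


class Topic:
    """Toy model of topic -> channels -> consumers (not the pynsq API)."""

    def __init__(self):
        self.channels = defaultdict(list)   # channel name -> consumer callables
        self._next = {}                     # channel name -> round-robin iterator

    def subscribe(self, channel, consumer):
        self.channels[channel].append(consumer)
        self._next[channel] = cycle(self.channels[channel])

    def publish(self, message):
        # Every channel gets a copy; within a channel one consumer handles it.
        for channel in self.channels:
            next(self._next[channel])(message)


worker_a1, worker_a2, archive = [], [], []
numbers = Topic()
numbers.subscribe("worker_group_a", worker_a1.append)
numbers.subscribe("worker_group_a", worker_a2.append)
numbers.subscribe("archive", archive.append)
for n in range(4):
    numbers.publish(n)
```

The two consumers on `worker_group_a` split the messages between them, while the separate `archive` channel receives all four.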
Other clustering tools

Using Less RAM

IPython %memit
array module
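A rough sketch of why the array module saves RAM: a list stores pointers to separate float objects, while `array('d')` stores the raw doubles inline. The element count is arbitrary, and `sys.getsizeof` accounting on CPython is only an approximation of real memory use.

```python
import sys
from array import array

n = 100_000
as_list = [float(i) for i in range(n)]
as_array = array("d", as_list)          # 'd' = C double, stored inline

# Count the list's pointer table plus every float object it refers to.
list_bytes = sys.getsizeof(as_list) + sum(sys.getsizeof(x) for x in as_list)
array_bytes = sys.getsizeof(as_array)
```

The trade-off is that each element read from the array is boxed into a temporary Python float, so arrays save memory rather than time in pure-Python loops.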
DAWG/DAFSA
Marisa trie (a static trie)
Datrie (requires an alphabet that covers all the keys?)
HAT trie
HTTP microservice (using Flask): https://github.com/j4mie/postcodeserver/
Probabilistic Data Structures
1. The HyperLogLog++ structure?
2. Very Approximate Counting with a 1-byte Morris Counter
  1. 2^exponent, updated with a probabilistic rule: random(0,1) <= 2^-exponent
3. K-Minimum Values/KMV (remember the k smallest hash values, assuming the hash values are uniformly distributed)
4. Bloom Filters
  1. This method gives us no false negatives and a controllable rate of false positives (it may wrongly report an item as present)
  2. ? Use 2 independent hashes to simulate arbitrarily many hashes
  3. very sensitive to initial capacity
  4. scalable Bloom filters: By chaining together multiple bloom filters ...
5. LogLog Counter
  bit_index = trailing_zeros(item_hash)
  if bit_index > self.counter:
    self.counter = bit_index
  1. Variants: SuperLogLog, HyperLogLog
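The Bloom filter notes above, including the trick of deriving many hash functions from two, can be sketched as follows. This is an illustration, not the book's implementation: the class name, the default sizes, and the use of two halves of one SHA-256 digest as the two base hashes are all assumptions.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter; names and sizes are illustrative."""

    def __init__(self, num_bits=1024, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _indexes(self, item):
        # Take two base hashes from one digest, then combine them as
        # h1 + i*h2 to simulate num_hashes hash functions.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for idx in self._indexes(item):
            self.bits[idx // 8] |= 1 << (idx % 8)

    def __contains__(self, item):
        return all(self.bits[idx // 8] & (1 << (idx % 8))
                   for idx in self._indexes(item))


seen = BloomFilter()
for word in ["numpy", "cython", "pypy"]:
    seen.add(word)
```

Every added item is always reported as present, so there are no false negatives. An item that was never added is usually reported as absent, but it can collide on all of its bits and be reported as present, which is the false positive case.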

Lessons from the Field

Sentry is used to log and diagnose Python stack traces
Aho-Corasick trie?
We use Graphite with collectd and statsd to allow us to draw pretty graphs of what's going on
Gunicorn was used as a WSGI server and its IO loop was executed by Tornado
