High Performance Python Notes (Python is a fine language; full-stack programmers should just use it!)
High Performance Python
Contents
- 1. Understanding Performant Python
- 2. Profiling
- 3. Lists and Tuples
- 4. Dictionaries and Sets
- 5. Iterators and Generators
- 6. Matrix and Vector Computation
- 7. Compiling to C
- 8. Concurrency
- 9. multiprocessing
- 10. Clusters and Job Queues
- 11. Using Less RAM
- 12. Lessons from the Field
Understanding Performant Python
Profiling
Lists and Tuples
- Both are internally implemented as arrays? (see the sketch below)
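A quick way to check this, not from the book: CPython's list is a dynamic array of pointers that over-allocates on append, while a tuple allocates exactly the space it needs. sys.getsizeof makes the difference visible (sizes are CPython- and version-specific):

import sys

l = []
for i in range(10):
    l.append(i)
    print(i, sys.getsizeof(l))  # size grows in jumps: the array over-allocates

t = tuple(range(10))
print(sys.getsizeof(t))  # a tuple of the same length is smaller: no spare slots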
Dictionaries and Sets
- Dictionary keys: __hash__ plus __eq__/__cmp__
- entropy
- locals(), globals(), __builtin__
- List comprehensions vs. generator expressions (one uses [], the other ()); see the sketch below
- [<value> for <item> in <sequence> if <condition>] vs (<value> for <item> in <sequence> if <condition>)
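A minimal illustration of the difference, not from the book (sizes are CPython-specific):

import sys

squares_list = [x * x for x in range(1000000)]   # materializes all 1M values
print(sys.getsizeof(squares_list))               # megabytes for the pointer array alone

squares_gen = (x * x for x in range(1000000))    # lazy: computes on demand
print(sys.getsizeof(squares_gen))                # ~100 bytes regardless of range
print(sum(squares_gen))                          # values produced one at a time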
- itertools:
- imap, ireduce, ifilter, izip, islice, chain, takewhile, cycle
- p95: Knuth's online mean algorithm? (sketch below)
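A minimal sketch of the online (running) mean, assuming the note refers to the one-pass update rule attributed to Knuth/Welford:

def online_mean(sequence):
    # One pass, O(1) memory; avoids the precision loss of summing
    # everything first and dividing at the end.
    mean = 0.0
    for n, x in enumerate(sequence, start=1):
        mean += (x - mean) / n
    return mean

print(online_mean([1.0, 2.0, 3.0, 4.0]))  # 2.5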
Iterators and Generators
Matrix and Vector Computation
- The book keeps using 'loop-invariant code motion' examples; isn't that just the compiler failing to optimize?
- $ perf stat -e cycles,stalled-cycles-frontend,stalled-cycles-backend,instructions,\
  cache-references,cache-misses,branches,branch-misses,task-clock,faults,\
  minor-faults,cs,migrations -r 3 python diffusion_python_memory.py
- numpy
- np.roll([[1,2,3],[4,5,6]], 1, axis=1)
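For reference, what that call produces (each row shifts right by one, wrapping around):

import numpy as np

grid = np.array([[1, 2, 3],
                 [4, 5, 6]])
print(np.roll(grid, 1, axis=1))
# [[3 1 2]
#  [6 4 5]]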
- ?Can Cython optimize data structures, or does it only handle code?
- In-place operations, such as +=, *=
- => numexpr
- from numexpr import evaluate
- evaluate("next_grid*D*dt+grid", out=next_grid)
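A self-contained sketch of the evaluate call above (shapes and values are made up; in the book's diffusion example next_grid holds the Laplacian term):

import numpy as np
from numexpr import evaluate

D, dt = 1.0, 0.1
grid = np.random.random((64, 64))
next_grid = np.random.random((64, 64))  # stand-in for the Laplacian term

# numexpr compiles the expression and evaluates it in one cache-friendly
# pass with no intermediate temporaries; out= writes the result in place.
evaluate("next_grid * D * dt + grid", out=next_grid)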
- ?Creating our own roll function
- scipy
- from scipy.ndimage.filters import laplace
- laplace(grid, out, mode='wrap')
- Do the page-faults show that scipy allocates a lot of memory? Do the instruction counts show that the scipy function is too general-purpose?
Compiling to C
- Compiling to C:
- Cython
- zmq uses it too?
- setup.py
- from distutils.core import setup
- from distutils.extension import Extension
- from Cython.Distutils import build_ext
- setup(cmdclass={'build_ext': build_ext},
        ext_modules=[Extension("calculate", ["cythonfn.pyx"])])
- $ python setup.py build_ext --inplace
- Cython annotations: the more yellow a line, the "more calls into the Python virtual machine"
- Adding type annotations (see the .pyx sketch below)
- cdef unsigned int i, n
- Disable bounds checking: #cython: boundscheck=False (as a decorator on the function)
- The buffer protocol?
- def calculate_z(int maxiter, double complex[:] zs, double complex[:] cs): ...
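Putting the pieces above together, a hedged sketch of what cythonfn.pyx might look like (the loop body is illustrative, in the spirit of the book's Julia-set kernel, not its verbatim code):

#cython: boundscheck=False

def calculate_z(int maxiter, double complex[:] zs, double complex[:] cs):
    """Julia-style escape-time loop over typed memoryviews."""
    cdef unsigned int i, n
    cdef double complex z, c
    output = [0] * len(zs)
    for i in range(len(zs)):
        n = 0
        z = zs[i]
        c = cs[i]
        # typed locals keep this loop in pure C, with no Python-object boxing
        while n < maxiter and (z.real * z.real + z.imag * z.imag) < 4:
            z = z * z + c
            n += 1
        output[i] = n
    return output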
- OpenMP
- prange
- -fopenmp (for GCC?)
- schedule="guided"
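A hedged .pyx sketch of prange (do_sum is my own illustrative function; pass -fopenmp as extra_compile_args/extra_link_args for GCC):

from cython.parallel import prange

def do_sum(double[:] data):
    cdef int i
    cdef double total = 0
    # prange releases the GIL and spreads iterations over OpenMP threads;
    # Cython recognizes total += ... as a reduction. schedule="guided"
    # hands out progressively smaller chunks to balance the load.
    for i in prange(data.shape[0], nogil=True, schedule="guided"):
        total += data[i]
    return total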
- Shed Skin: for non-numpy code
- shedskin --extmod test.py
- The extra 0.05 s: spent copying data out of the Python environment
- Pythran
- #pythran export evolve(float64[][], float)
- Cython
- Numba (LLVM-based): specialized for numpy
- use Continuum's Anaconda distribution
- from numba import jit
- @jit()
- Experimental GPU support is also available? (sketch below)
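A minimal Numba sketch (sum2d is my own example, not the book's code); the first call triggers LLVM compilation, later calls run at C-like speed:

import numpy as np
from numba import jit

@jit()
def sum2d(arr):
    m, n = arr.shape
    total = 0.0
    for i in range(m):          # explicit loops are fine: Numba compiles them
        for j in range(n):
            total += arr[i, j]
    return total

print(sum2d(np.arange(12, dtype=np.float64).reshape(3, 4)))  # 66.0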
- VM & JIT: PyPy
- GC behavior: whereas CPython uses reference counting, PyPy uses a modified mark-and-sweep (so reclamation may come later)
- Note that PyPy 2.3 runs as Python 2.7.3.
- STM: an attempt to remove the GIL
- Other tools: Theano, Parakeet, PyViennaCL, Nuitka, Pyston (Dropbox's), PyCUDA (low-level code isn't portable?)
- ctypes, cffi (from PyPy), f2py, CPython modules
- $ f2py -c -m diffusion --fcompiler=gfortran --opt='-O3' diffusion.f90
- JIT Versus AOT
Concurrency
- Concurrency: avoid wasting time in I/O wait
- In Python, coroutines are implemented as generators.
- For Python 2.7 implementations of future-based concurrency, ... ?
- gevent (suited to mainly CPU-based problems that sometimes involve heavy I/O)
- gevent monkey-patches the standard I/O functions to be asynchronous
- Greenlet
- wait
- The futures are created with gevent.spawn
- Limit the number of simultaneously open resources: from gevent.coros import Semaphore (assembled sketch below)
- requests = [gevent.spawn(download, u, semaphore) for u in urls]
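The fragments above assembled into one runnable sketch (download/urls are stand-ins; gevent.coros is the book-era module name, newer gevent exposes it as gevent.lock; Python 2 urllib2 as in the book):

import gevent
from gevent import monkey
monkey.patch_socket()  # make blocking socket I/O cooperative
from gevent.coros import Semaphore
import urllib2

semaphore = Semaphore(100)  # at most 100 downloads in flight at once

def download(url, semaphore):
    with semaphore:
        return urllib2.urlopen(url).read()

urls = ["http://example.com/"] * 10
requests = [gevent.spawn(download, u, semaphore) for u in urls]
gevent.joinall(requests)
bodies = [r.value for r in requests]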
- import grequests?
- A 69x speedup? Does that imply a matching amount of unnecessary I/O wait before?
- The event loop may be either underutilizing or overutilizing resources
- tornado (by Facebook, suited to mostly I/O-bound asynchronous applications)
- from tornado import ioloop, gen
- from functools import partial
- AsyncHTTPClient.configure("tornado.curl_httpclient.CurlAsyncHTTPClient", max_clients=100)
- @gen.coroutine
- ... responses = yield [http_client.fetch(url) for url in urls] # yields Future objects?
- response_sum = sum(len(r.body) for r in responses)
- raise gen.Return(value=response_sum)
- _ioloop = ioloop.IOLoop.instance()
- run_func = partial(run_experiment, base_url, num_iter)
- result = _ioloop.run_sync(run_func)
- Downside: tracebacks can no longer hold valuable information
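The tornado fragments above, assembled into one sketch (tornado 3-era / Python 2 API as in the book; base_url and the URL scheme are stand-ins):

from functools import partial
from tornado import ioloop, gen
from tornado.httpclient import AsyncHTTPClient

AsyncHTTPClient.configure("tornado.curl_httpclient.CurlAsyncHTTPClient",
                          max_clients=100)

@gen.coroutine
def run_experiment(base_url, num_iter):
    http_client = AsyncHTTPClient()
    urls = [base_url % i for i in range(num_iter)]
    # yielding a list of fetch() futures runs all requests concurrently
    # and resumes this coroutine once every one of them has completed
    responses = yield [http_client.fetch(url) for url in urls]
    response_sum = sum(len(r.body) for r in responses)
    raise gen.Return(value=response_sum)

_ioloop = ioloop.IOLoop.instance()
run_func = partial(run_experiment, "http://localhost:8080/?id=%d", 100)
result = _ioloop.run_sync(run_func)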
- In Python 3.4, new machinery was introduced to easily create coroutines and have them still return values
- asyncio
- yield from: no longer need to raise an exception to return a result from a coroutine
- very low-level => import aiohttp
@asyncio.coroutine
def http_get(url):  # (defined inside a factory in the book, hence the nonlocal and the trailing return)
    nonlocal semaphore
    with (yield from semaphore):
        response = yield from aiohttp.request('GET', url)
        body = yield from response.content.read()
        yield from response.wait_for_close()
    return body
return http_get

tasks = [http_client(url) for url in urls]
for future in asyncio.as_completed(tasks):
    data = yield from future
- allows us to unify modules like tornado and gevent by having them run in the same event loop
multiprocessing
- Process, Pool, Queue, Pipe, Manager, ctypes (for IPC?)
- In Python 3.2, the concurrent.futures module was introduced (via PEP 3148)
- PyPy fully supports multiprocessing and runs faster
- from multiprocessing.dummy import Pool (the thread-backed version?)
- hyperthreading can give up to a 30% performance gain, given enough compute resources
- It is worth noting that the downside of threads on CPU-bound problems is reasonably solved in Python 3.2+
- External queue implementations: Gearman, 0MQ, Celery (using RabbitMQ as the message broker), PyRes, SQS, or HotQueue
- manager = multiprocessing.Manager()
  value = manager.Value(b'c', FLAG_CLEAR)
- rds = redis.StrictRedis()
  rds[FLAG_NAME] = FLAG_SET
- value = multiprocessing.RawValue(b'c', FLAG_CLEAR)  # no synchronization?
- sh_mem = mmap.mmap(-1, 1)  # memory map 1 byte as a flag
  sh_mem.seek(0)
  flag = sh_mem.read_byte()
- Using mmap as a Flag Redux (?I didn't quite follow this; skipped)
- $ ps -A -o pid,size,vsize,cmd | grep np_shared
- lock = lockfile.FileLock(filename)
  lock.acquire/release()
- lock = multiprocessing.Lock()
  value = multiprocessing.Value('i', 0)
  lock.acquire()
  value.value += 1
  lock.release()
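The Lock + Value pattern above as a complete runnable example (my own minimal version): four processes increment a shared counter, and the Lock makes the read-modify-write of value.value atomic across processes:

import multiprocessing

def work(value, lock, n):
    for _ in range(n):
        with lock:           # without this, increments would be lost
            value.value += 1

if __name__ == "__main__":
    lock = multiprocessing.Lock()
    value = multiprocessing.Value('i', 0)
    procs = [multiprocessing.Process(target=work, args=(value, lock, 1000))
             for _ in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print(value.value)  # 4000 every time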
Clusters and Job Queues
- $462 Million Wall Street Loss Through Poor Cluster Upgrade Strategy
- Inconsistency caused by the version upgrade? But APIs should be versioned...
- Skype's 24-Hour Global Outage
- some versions of the Windows client didn’t properly handle the delayed responses and crashed.
- To reliably start the cluster's components when the machine boots, we tend to use either a cron job, Circus or supervisord, or sometimes Upstart (which is being replaced by systemd)
- you might want to introduce a random-killer tool like Netflix's ChaosMonkey
- Make sure it is cheap in time and money to deploy updates to the system
- Make sure you use a deployment system like Fabric, Salt, Chef, or Puppet
- Early warning: Pingdom and ServerDensity
- Status monitoring: Ganglia
- 3 Clustering Solutions
- Parallel Python
- ppservers = ("*",) # set IP list to be autodiscovered
- job_server = pp.Server(ppservers=ppservers, ncpus=NBR_LOCAL_CPUS)
- ... job = job_server.submit(calculate_pi, (input_args,), (), ("random",))
- IPython Parallel
- via ipcluster
- ?Schedulers hide the synchronous nature of the engines and provide an asynchronous interface
- NSQ (a distributed messaging system written in Go)
- Pub/sub: Topics -> Channels -> Consumers
- writer = nsq.Writer(['127.0.0.1:4150', ])
- handler = partial(calculate_prime, writer=writer)
- reader = nsq.Reader(message_handler=handler, nsqd_tcp_addresses=['127.0.0.1:4150', ], topic='numbers', channel='worker_group_a',)
- nsq.run()
- Other clustering tools
Using Less RAM
- IPython %memit
- The array module
- DAWG/DAFSA
- Marisa trie (static trie)
- Datrie (requires an alphabet covering all keys?)
- HAT trie
- An HTTP microservice (using Flask): https://github.com/j4mie/postcodeserver/
- Probabilistic Data Structures
- The HyperLogLog++ structure?
- Very Approximate Counting with a 1-byte Morris Counter
- Stores 2^exponent; update with a probabilistic rule: increment when random(0,1) <= 2^-exponent (sketch below)
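A minimal Morris-counter sketch following the rule above (the estimate is rough; real deployments keep several counters and average them):

import random

class MorrisCounter(object):
    def __init__(self):
        self.exponent = 0  # the only state: 1 byte in principle
    def add(self):
        # increment the exponent with probability 2**-exponent
        if random.random() <= 2.0 ** -self.exponent:
            self.exponent += 1
    def __len__(self):
        return 2 ** self.exponent  # the (very approximate) count

c = MorrisCounter()
for _ in range(10000):
    c.add()
print(len(c))  # around 10000, give or take a power of two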
- K-Minimum Values/KMV (remember the k smallest hash values; assumes hash values are uniformly distributed)
- Bloom Filters
- This method gives us no false negatives and a controllable rate of false positives (an absent item may be reported as present)
- ?Simulating arbitrarily many hash functions with 2 independent hashes (see the sketch after this list)
- very sensitive to initial capacity
- Scalable Bloom filters: by chaining together multiple Bloom filters ...
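A sketch of the two-hash trick (the Kirsch-Mitzenmacher construction): derive hash i as h1 + i*h2 mod m. mmh3 is an assumption here; any two independent hash functions would do:

import mmh3  # MurmurHash3 bindings

def bloom_indexes(key, num_hashes, num_bits):
    h1 = mmh3.hash(key, seed=0)
    h2 = mmh3.hash(key, seed=1)
    # each simulated hash picks one bit position in the filter
    return [(h1 + i * h2) % num_bits for i in range(num_hashes)]

print(bloom_indexes("example", num_hashes=5, num_bits=1 << 20))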
- LogLog Counter
- bit_index = trailing_zeros(item_hash)
- if bit_index > self.counter:
self.counter = bit_index
- Variants: SuperLogLog, HyperLogLog (a fuller single-register sketch below)
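The fragment above fleshed out into a single-register sketch (mmh3 assumed for hashing; real LogLog/HyperLogLog keep many registers and combine them to cut the variance):

import mmh3

def trailing_zeros(n):
    if n == 0:
        return 32
    count = 0
    while n & 1 == 0:
        count += 1
        n >>= 1
    return count

class LogLogRegister(object):
    counter = 0
    def add(self, item):
        item_hash = mmh3.hash(str(item)) & 0xFFFFFFFF
        bit_index = trailing_zeros(item_hash)
        if bit_index > self.counter:
            self.counter = bit_index
    def __len__(self):
        return 2 ** self.counter

r = LogLogRegister()
for i in range(100000):
    r.add(i)
print(len(r))  # a single register gives a very noisy estimate of 100000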
Lessons from the Field
- Sentry is used to log and diagnose Python stack traces
- Aho-Corasick trie?
- We use Graphite with collectd and statsd to allow us to draw pretty graphs of what's going on
- Gunicorn was used as a WSGI server and its I/O loop was executed by Tornado