Table of Contents
- Introduction
- Quick recap
- grequests with gevent-openssl code change:
- Stage-3
- Stage-3 - Why is Python2.7 fast and Python3.7 slow?
- Stage-4
- Stage-4 - What did we do to make Python3.7 fast?
- Summary
Introduction
This is the second part of a two-part post describing how the Python modules installed on your system can affect the run time of a grequests-based program.
Read Part-I here.
Quick recap
- Problem: Making GET calls to hundreds of URLs with requests is slow because the calls are processed serially.
- Potential solution: grequests, which builds on gevent, lets us make those calls concurrently.
- Issue: If pyopenssl is installed on your system, urllib3 re-patches the SSLContext and your code runs slower. Stage-1 and Stage-2 demonstrated this.
- Workaround:
  - Don’t use pyopenssl, OR
  - Use gevent-openssl, which was created to make pyopenssl gevent compatible.
Using gevent-openssl makes Python 2.7 code run faster, but it gives no speed gain on Python 3.7. The coming sections, which explore Stage-3 and Stage-4, explain why.
grequests with gevent-openssl code change:
- As per bin/test_grequests_v2.py, if you plan to use gevent_openssl, you have to add the following code before importing grequests:
- We let the exception raised when gevent_openssl is absent pass silently, because we want to use the same code in every stage.
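A minimal sketch of what such a guard can look like. It assumes gevent_openssl exposes a monkey_patch() function, as its README describes; the helper name patch_pyopenssl_if_available is hypothetical and is not taken from bin/test_grequests_v2.py.

```python
def patch_pyopenssl_if_available():
    """Apply gevent_openssl's patch when the module is installed.

    Returns True if the patch was applied, False if gevent_openssl
    (or one of its dependencies) could not be imported.
    """
    try:
        import gevent_openssl
    except ImportError:
        # gevent_openssl is not installed in this stage; carry on unpatched.
        return False
    # Assumption: monkey_patch() swaps in the gevent-aware connection class.
    gevent_openssl.monkey_patch()
    return True


# This has to run before grequests is imported.
PATCHED = patch_pyopenssl_if_available()
# import grequests  # would follow here in the real script
```

Swallowing the import failure keeps the script identical across stages: stages without gevent_openssl simply run unpatched.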
Stage-3
- Repo path: here
- Modules installed (same for both):
- Experiment output
Stage-3 - Why is Python2.7 fast and Python3.7 slow?
- Python 2.7 and Python 3.7 use the same socket class, urllib3.contrib.pyopenssl.WrappedSocket, which means that the gevent issue we saw in the earlier stages is not at play.
- There is one subtle difference between the Python 2.7 profiling output and the Python 3.7 profiling output.
- Python2.7 profile output shows calls to pyopenssl recv:
- Python3.7 profile output shows calls to pyopenssl recv_into:
- You can also trace the Python 2.7 function calls and the Python 3.7 function calls to verify this.
- Checking gevent_openssl/SSL.py:recv, we see that it overrides urllib3/contrib/pyopenssl.py:recv
- There is no function in gevent_openssl/SSL.py that overrides urllib3/contrib/pyopenssl.py:recv_into.
- Python 2.7 uses the recv call and can leverage gevent-openssl's patched recv function. Python 3.7 uses recv_into, which has no corresponding gevent-openssl function, so it falls back to urllib3/contrib/pyopenssl.py:recv_into, which is slow.
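The fallback described above can be illustrated with stand-in classes. This is only a sketch: BlockingConnection and CooperativeWrapper are hypothetical names, and the assumption that the wrapper forwards unknown attributes to the wrapped connection through __getattr__ is mine, not something shown in this post.

```python
class BlockingConnection:
    """Stand-in for the unpatched pyopenssl-backed connection (hypothetical)."""

    def recv(self, bufsiz):
        return "blocking recv"

    def recv_into(self, buffer):
        return "blocking recv_into"


class CooperativeWrapper:
    """Stand-in for a wrapper that provides its own recv only (hypothetical)."""

    def __init__(self, connection):
        self._connection = connection

    def __getattr__(self, name):
        # Called only when normal lookup fails, i.e. for every method
        # the wrapper does not define itself.
        return getattr(self._connection, name)

    def recv(self, bufsiz):
        return "cooperative recv"


wrapped = CooperativeWrapper(BlockingConnection())
RESULTS = {
    "recv": wrapped.recv(1024),                    # handled by the wrapper
    "recv_into": wrapped.recv_into(bytearray(8)),  # falls through to the blocking version
}
print(RESULTS)
```

A caller that only ever uses recv gets the cooperative path, while a caller that uses recv_into silently ends up on the blocking one.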
Stage-4
- Repo path: here
- Modules installed (same for both):
- Experiment output
Stage-4 - What did we do to make Python3.7 fast?
- We patched our local copy of gevent_openssl/SSL.py so that it overrides urllib3/contrib/pyopenssl.py:recv_into.
- The patch adds the following function:
- It is similar to the existing gevent_openssl/SSL.py:recv function.
- We wrap self._connection.recv_into in self.__iowait, which, as per gevent_openssl/SSL.py:__iowait, does the following:
- It calls the passed io_func (e.g. recv or recv_into) and, as soon as it gets OpenSSL.SSL.WantReadError, which is raised by OpenSSL/SSL.py:_raise_ssl_error, it returns control to gevent/_hub_primitives.pxd.
- A .pxd file is like a C header file, which means that when we call wait_read we enter the C extensions built by gevent, which makes our code faster.
- You can toggle the patching and see a corresponding impact on the execution time.
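The retry-and-wait pattern described above can be sketched without gevent or pyopenssl installed. Everything here is a stand-in: WantReadError mimics OpenSSL.SSL.WantReadError, FakeConnection is a hypothetical connection that is not ready for its first few calls, and the wait_read callback takes the place of gevent's wait_read. The method is named _iowait rather than __iowait only to keep the example free of name mangling. It is not the code from the patch.

```python
class WantReadError(Exception):
    """Stand-in for OpenSSL.SSL.WantReadError."""


class FakeConnection:
    """Hypothetical connection whose recv_into is not ready at first."""

    def __init__(self, payload, not_ready_calls):
        self._payload = payload
        self._not_ready = not_ready_calls

    def recv_into(self, buffer, nbytes=None):
        if self._not_ready > 0:
            self._not_ready -= 1
            raise WantReadError()
        limit = len(buffer) if nbytes is None else min(nbytes, len(buffer))
        n = min(len(self._payload), limit)
        buffer[:n] = self._payload[:n]
        return n


class CooperativeConnection:
    """Wraps a connection and waits cooperatively whenever data is not ready."""

    def __init__(self, connection, wait_read):
        self._connection = connection
        self._wait_read = wait_read

    def _iowait(self, io_func, *args, **kwargs):
        while True:
            try:
                return io_func(*args, **kwargs)
            except WantReadError:
                # In the real setup this is where control goes back to
                # gevent so that other greenlets can run in the meantime.
                self._wait_read()

    def recv_into(self, buffer, nbytes=None):
        return self._iowait(self._connection.recv_into, buffer, nbytes)


waits = []
conn = CooperativeConnection(FakeConnection(b"hello", 2), lambda: waits.append("wait_read"))
buf = bytearray(8)
received = conn.recv_into(buf)
print(received, bytes(buf[:received]), len(waits))
```

The fake connection refuses twice, so the wrapper waits twice before the third call succeeds and copies the five bytes into the buffer. Dropping the recv_into method from the wrapper removes the waiting entirely, which mirrors toggling the patch.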
I have opened a pull request to add this functionality to gevent_openssl.
Summary
- Stage-0 is unpredictable because we are not using virtualenv.
- Stage-1 verifies that the absence of pyopenssl makes both Python 2.7 and Python 3.7 fast.
- Stage-2 verifies that the presence of pyopenssl makes both Python 2.7 and Python 3.7 slow.
- Stage-3 verifies that the presence of gevent_openssl with pyopenssl makes Python 2.7 fast but Python 3.7 slow.
- Stage-4 patches the existing gevent_openssl to make Python 3.7 fast.