tl;dr: I have a lot of code that does response = requests.get(...) in various Python projects. This is nice and simple, but the problem is that networks are unreliable. So it's a good idea to wrap these network calls with retries. Here's one such implementation.

The First Hack


import time
import requests

# DON'T ACTUALLY DO THIS. 
# THERE ARE BETTER WAYS. HANG ON!

def get(url):
    try:
        return requests.get(url)
    except Exception:
        # sleep for a bit in case that helps
        time.sleep(1)
        # try again
        return get(url)

This, above, is a terrible solution. It might fail for sooo many reasons. For example SSL errors due to missing Python libraries. Or the URL might have a typo in it, like get('http:/www.example.com').

Also, perhaps it did work but the response is a 500 error from the server and you know that if you just tried again, the problem would go away.



# ALSO A TERRIBLE SOLUTION

while True:
    response = get('http://www.example.com')
    if response.status_code != 500:
        break
    else:
        # Hope it won't 500 a little later
        time.sleep(1)

What we need is a solution that does this right. Both for 500 errors and for various network errors.

The Solution

Here's what I propose:


import requests
from requests.adapters import HTTPAdapter
from requests.packages.urllib3.util.retry import Retry
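# Note: on newer versions of requests/urllib3 it may be preferable to
# import Retry directly: from urllib3.util.retry import Retry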


def requests_retry_session(
    retries=3,
    backoff_factor=0.3,
    status_forcelist=(500, 502, 504),
    session=None,
):
    session = session or requests.Session()
    retry = Retry(
        total=retries,
        read=retries,
        connect=retries,
        backoff_factor=backoff_factor,
        status_forcelist=status_forcelist,
    )
    adapter = HTTPAdapter(max_retries=retry)
    session.mount('http://', adapter)
    session.mount('https://', adapter)
    return session

Usage example...


response = requests_retry_session().get('https://www.peterbe.com/')
print(response.status_code)

s = requests.Session()
s.auth = ('user', 'pass')
s.headers.update({'x-test': 'true'})

response = requests_retry_session(session=s).get(
    'https://www.peterbe.com'
)

It's an opinionated solution, but its existence demonstrates how it all works so that you can copy and modify it.

Testing The Solution

Suppose you try to connect to a URL that will definitely never work, like this:


import time

t0 = time.time()
try:
    response = requests_retry_session().get(
        'http://localhost:9999',
    )
except Exception as x:
    print('It failed :(', x.__class__.__name__)
else:
    print('It eventually worked', response.status_code)
finally:
    t1 = time.time()
    print('Took', t1 - t0, 'seconds')

There is no server running on port 9999 here on localhost. So the outcome of this is...

It failed :( ConnectionError
Took 1.8215010166168213 seconds

Where...

1.8 = 0 + 0.6 + 1.2

The algorithm for that backoff is documented here and it says:

A backoff factor to apply between attempts after the second try (most errors are resolved immediately by a second try without a delay). urllib3 will sleep for: {backoff factor} * (2 ^ ({number of total retries} - 1)) seconds. If the backoff_factor is 0.1, then sleep() will sleep for [0.0s, 0.2s, 0.4s, ...] between retries. It will never be longer than Retry.BACKOFF_MAX. By default, backoff is disabled (set to 0).

It does 3 retry attempts after the first failure, with backoff sleeps of 0s, 0.6s and 1.2s.
So if the server never responds at all, it will raise an error after a total of ~1.8 seconds of sleeping.

In this example, the measured time matches the expectation (1.82 seconds) because my laptop's DNS lookup for localhost is near instant. If it had to do a real DNS lookup, the first failure would potentially take slightly longer.
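To make that arithmetic concrete, here's a minimal sketch that reproduces the documented formula for the sleep before each retry. It is not part of urllib3; the helper name backoff_sleeps is made up purely for illustration.


def backoff_sleeps(retries=3, backoff_factor=0.3):
    # Documented formula: {backoff factor} * (2 ^ ({number of retries so far} - 1)),
    # with no sleep at all before the very first retry.
    sleeps = []
    for retry_number in range(1, retries + 1):
        if retry_number == 1:
            sleeps.append(0)
        else:
            sleeps.append(backoff_factor * (2 ** (retry_number - 1)))
    return sleeps

print(backoff_sleeps())       # [0, 0.6, 1.2]
print(sum(backoff_sleeps()))  # ~1.8 seconds of sleeping in total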

Works In Conjunction With timeout

Timeout configuration is not something you set up in the session; it's done on a per-request basis. httpbin makes this easy to test. With a server-side delay of 10 seconds and a timeout of 5 seconds the request will never succeed, but this time the timeout actually gets used. Same code as above but with a 5 second timeout:


t0 = time.time()
try:
    response = requests_retry_session().get(
        'http://httpbin.org/delay/10',
        timeout=5
    )
except Exception as x:
    print('It failed :(', x.__class__.__name__)
else:
    print('It eventually worked', response.status_code)
finally:
    t1 = time.time()
    print('Took', t1 - t0, 'seconds')

And the output of this is:

It failed :( ConnectionError
Took 21.829053163528442 seconds

That makes sense. Same backoff algorithm as before, but now each of the 4 attempts (the original plus 3 retries) spends 5 seconds waiting for the timeout, with the backoff sleeps in between:

21.8 = 5 + 0 + 5 + 0.6 + 5 + 1.2 + 5

Works For 500ish Errors Too

This time, let's run into a 500 error:


t0 = time.time()
try:
    response = requests_retry_session().get(
        'http://httpbin.org/status/500',
    )
except Exception as x:
    print('It failed :(', x.__class__.__name__)
else:
    print('It eventually worked', response.status_code)
finally:
    t1 = time.time()
    print('Took', t1 - t0, 'seconds')

The output becomes:

It failed :( RetryError
Took 2.353440046310425 seconds

Here, the reason the total time is 2.35 seconds and not the expected 1.8 is that there's network latency between my laptop and httpbin.org. I tested with a local Flask server doing the same thing, and then it took a total of 1.8 seconds.
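For reference, here's a minimal sketch of such a local test server, assuming Flask is installed; the route and port are made up for illustration. Pointing requests_retry_session().get('http://localhost:5000/status/500') at it should reproduce the ~1.8 second RetryError without any network latency.


# A minimal always-500 endpoint for local testing (assumes Flask is installed)
from flask import Flask

app = Flask(__name__)

@app.route('/status/500')
def always_500():
    return 'Simulated server error', 500

if __name__ == '__main__':
    app.run(port=5000)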

Discussion

Yes, this suggested implementation is very opinionated. But when you've understood how it works, understood your choices and have the documentation at hand you can easily implement your own solution.

Personally, I'm trying to replace all my requests.get(...) with requests_retry_session().get(...) and when I'm making this change I make sure I set a timeout on the .get() too.
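In practice that change looks something like this (the URL and the 10 second timeout are just example values):


# Before
response = requests.get('https://www.peterbe.com/')

# After: same call, but with retries and an explicit per-request timeout
response = requests_retry_session().get('https://www.peterbe.com/', timeout=10)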

The choice to consider 500, 502 and 504 errors "retry'able" is actually very arbitrary. It totally depends on what kind of service you're reaching for. Some services only return 500'ish errors if something really is broken and is likely to stay like that for a long time. But in this day and age, with load balancers protecting a cluster of web heads, a lot of 500 errors are just temporary. Obviously, if you're trying to do something very specific like requests_retry_session().post(...) with very specific parameters you probably don't want to retry on 5xx errors.
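Since the function takes those choices as arguments, adjusting them per service is straightforward. For example (the particular values below are purely illustrative), a service behind a flaky load balancer might warrant more attempts and an extra status code:


# Example: also treat 503 (Service Unavailable) as retryable and allow more attempts
session = requests_retry_session(
    retries=5,
    status_forcelist=(500, 502, 503, 504),
)
response = session.get('https://www.peterbe.com/', timeout=10)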

Comments

Anonymous

Your first (ostensibly "horrible") solution works the best for me, the rest is too verbose.

Anonymous

"robust code is too verbose"

yikes

Anonymous

"Obivously, if you're trying to do something very specific like requests_retry_session().post(...) with very specific parameters you probably don't want to retry on 5xx errors."

Actually, this wouldn't work with the current solution, retry is not applied for POST by default - it needs to be specifically white listed if it's wanted (bite my ass ;) )

Otherwise, thanks for the great article!

Alexander

I had to make more than 8 000 requests. My script had been stumbling after several hundred requests. Your solution—requests_retry_session()—saved my day. Thanks!

Franck

really cool thx !

Tai

nice, thank you!

serena

cool!

Alexander

This is awesome! Thank you!

Felipe Dornelas

Networks are unreliable, but TCP is fault-tolerant. The problem is that application servers are unreliable.

Anonymous

How do I also handle 404 errors with this?

Anonymous

Exactly, this hangs when hitting 404 errors...

Anonymous

The author has said - figure out what errors are retry-able and retry those. is 404 retry-able ?!

Kiprono

Why would you want to retry on 404?

Anonymous

There is no need to set connect retries, or read retries, total retries takes precedent over the rest of the retries, so set it once there and it works for read, redirect, connect, status retries

Rob

Awesome! This resolved my issue!

Max

Pretty cool! Thank you!!!

Alex

You set status_forcelist, but status kwarg is set to None as default (according to urllib3.util.retry.Retry docs), so retries on bad-statuses-reason will never be made.
Should we specify connect=retries or I have misunderstanding?
P.S. sorry for my english

suineg

came across this review because I'm getting this problem, how to solve it?

suineg

I decided so but I'm not sure if it's right because get an error:

Add:
status=3,
method_whitelist=frozenset(['POST'])

err: requests.exceptions.RetryError: HTTPConnectionPool(host='httpbin.org', port=80): Max retries exceeded with url: /status/504 (Caused by ResponseError('too many 504 error responses',))

Jack

Did you ever sort this?

Pasystem

Good job!!! Thank you!!!

Miro Hrončok

Googled this and it worked like a charm! Thank You.

Alex

how to use proxy?

Chris

I use this:

# sample proxy, does not work, set your own proxy
proxies = {"https": "https://381.177.84.291:9040"}

# create session
session = self.requests_retry_session()

 # get request
response = session.get(url, proxies=proxies)

Alex

Thank you, this option also works:
resp = requests_retry_session().post(
    'http://httpbin.org/post',
    proxies= {"http": "http://381.177.84.291:9040"}
)

Ron

Love it!

However, is there a way to print/log all reponses?
E.g. When it retries 3 times, print the status code of all three requests?

Peter Bengtsson

I doubt it but requests uses logging. You just need to configure your logging to turn it up so you can see these kinds of things happening.

Jeenge

Thanks! made my code much more reliable. Thanks for posting this for everyone to use.

evandrix

there is a typo - "sesssion"

Anonymous

trying the time out i get
NameError: global name 'time' is not defined

Peter Bengtsson

You need to inject ‘import time’ first.

OwN

How do you propose dealing with this situation?

https://stackoverflow.com/questions/56482980/python-requests-not-throwing-an-exception-when-using-session-with-httpadapter

I can't seem to get anyone to respond, and my script is totally broken at the moment.

Anthony Camilo

Did you ever get an answer, i'm on the same boat.

Brikend Rama

Thank you for your code snippet. Works great

Jeff Walters

Excellent solution. Thank you for posting this article/solution! I had no idea that the HTTPAdapters existed. You just saved me a few hours of my life.

DEnilson Grupp Fernandes

Excellent. Thanks for that

David

Does not work if requests fails to read a chunked response :(

David

The following will setup an HTTP server to repro (set the sleep to be greater than your read timeout):
import ssl
from time import sleep
from BaseHTTPServer import BaseHTTPRequestHandler, HTTPServer

PORT = 8001

class CustomHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        print "SLEEP"
        self.send_response(200)
        self.send_header('Transfer-Encoding', 'chunked')
        self.send_header('Content-type', 'text/plain')
        self.end_headers()
        self.wfile.write('Hello')
        sleep(5)
        self.wfile.write(', world!')
        print "WAKE"
        pass

httpd = HTTPServer(("", PORT), CustomHandler)
httpd.socket = ssl.wrap_socket(httpd.socket, keyfile='/home/local/ANT/schdavid/tmp/key.pem', certfile='/home/local/ANT/schdavid/tmp/cert.pem')

try:
    httpd.serve_forever()
except KeyboardInterrupt:
    print
    print 'Goodbye'
    httpd.socket.close()

VJ

Thanks... it worked well for me. Good Article...

Anonymous

It was simple but elegant. It covered almost everything. Keep up the good work!

Guillermo Chussir

Very good idea. I'll try this on my scripts. Thanks!

Anonymous

How to do mock unit testing on request_retry_session?

Peter Bengtsson

Do you have to? Also, doesn't that depend greatly on how you mock `requests`?

Shill Shocked

How could you apply this to the Spotipy library?

Peter Bengtsson

Use https://pypi.org/project/redo/ instead and watch for certain HTTPErrors

Shill Shocked

Thanks man. I will look into, so far I'm having luck with catching http_status from exceptions in preliminary testing. I'll see if redo is easier to implement.

Anonymous

I think you just saved my bachelor thesis

Xavier Bustamante Talavera

Thank you for this! Networks are unreliable systems, so it is strange this is just not the default behavior.
I took this + session timeouts to make a mini package: https://pypi.org/project/retry-requests/

Jamie

Thank you for the blog post, it is very helpful. What is the license of the code in the blog post?

Peter Bengtsson

No license. Help yourself.

Dirk Bangel

This mean we can use it for commercial purpose without any restriction?

Peter Bengtsson

Yes.

Stas

I hope you would consider giving back too, if you dont do it already :-)

Shad Sterling

5xx error responses might include a retry-after header, which you should honor. See https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Retry-After

Peter Bengtsson

Good point! But it belongs to requests.packages.urllib3.util.retry.Retry
Do you know if it supports it already?

Shad Sterling

It shouldn't. retry-after could give you a date next week, that lib shouldn't hang your script until then.

Daniela

Great solution! How can it be adapted so that if the request fails for a SSL certificate issue it retries but this time with verify=false. I've also asked on Stackoverflow but receive no reply: https://stackoverflow.com/questions/62258005/requests-retry-with-verify-false-if-sslerror
 Thanks for your help.

Peter Bengtsson

Something like this?

```
from requests.exceptions import SSLError
session = requests_retry_session()
try:
   r = session.get(url)
except SSLError:
   r = session.get(url, verify=False)
```

Daniela

Thank you so much Peter. I'll give it a go!

Fernando

Thank you! I am implementing something like this, but how can I have a 2 minute interval between retries? Should I use timeout = 120?

Swee Tat Lim

Hi,

I tried your code with the following:

```
def get_session():
    result = requests.Session()
    retries = Retry(
        total=3, connect=3, read=3,
        redirect=3, status=3, backoff_factor=1,
        status_forcelist=[429, 500, 502, 503, 504],
        method_whitelist=[
            "HEAD", "GET", "PUT", "DELETE", "OPTIONS", "TRACE"],
    )
    adapter = HTTPAdapter(max_retries=retries)
    result.mount("http://", adapter)
    result.mount("https://", adapter)
    return result
```

In my unit test, I did the following:
```
    @responses.activate
    def test_get_session_500_retry(self):
        responses.add(responses.POST,
                      self.url,
                      status=500,
                      json={'something': 'nothing'}
                      )

        session = get_session()
        session.hooks["response"] = [logging_hook]
        print(f"url({self.url})")
        wait_time = datetime.now() + timedelta(seconds=10)
        r = session.post(self.url, timeout=10)
        waited_time = wait_time - datetime.now()
        self.assertGreaterEqual(waited_time, timedelta(seconds=0))
        self.assertEqual(r.status_code, 500)
        assert responses.assert_call_count(
            self.url, 1) is True
```

The strange part to me is that the assert_call_count is 1 instead of 3 which I set in the config

Peter Bengtsson

I think all bets are off when you use one of those request/response mocking libs.

Swee Tat Lim

How do you test reliably that the retries in the call works as expected?

Anonymous

use something like mockoon (https://mockoon.com) and set up HTTP routes for 200 OK and a couple statuses in your status_forcelist. turn on random responses and you'll see it working in the logs.

Stas

Very nice piece of code :-)
And thank you for sharing.

Anonymous

POST not working

Anonymous

I like your first solution.

Anonymous

great article! works great inside a custom api wrapper.

DFP

If the server response is a 503 (Service Unavailable) mainly because an update or maintenance, I would check for the header "Retry-After" and if there I will retry after the those seconds. Hope this helps anyone :)
