
  iOS Background Transfer

    The certificate fiasco, or how we solved the mystery of the resume rate limiter

    Posted by Agnes Vasarhelyi


    I first heard about background transfer when I was giving a ⚡️ talk at Realm back in 2015. Gwendolyn Weston gave an excellent presentation on background transfer services later that day. Her talk was the first resource that I looked up a few weeks ago when we decided to implement background transfer support into our most precious Topology Eyewear app. 📱 👉 👓

    It’s been a journey full of surprises and learnings, and it felt like something worth sharing.

    The idea

    Let me give you a little context on why we wanted background processing. The main user journey in our app is

    1. Our user takes a video selfie.
    2. She waits with the app open while the video is uploaded.
    3. The server processes the video.
    4. Once it’s finished processing, she can virtually try on, design, and purchase her perfectly fit Topology glasses right in the app.
    5. We warm up the robots to start the manufacturing.

    So the app has a video to upload to a server, and then needs to download the result once the server has finished processing that video. The result is the 3D model of the user’s face that we render the glasses on, in the AR view you can see below.


    Bad network

    Since it’s critical to be able to upload and download files in this flow, we used to ask users to “stay on this screen” during step 2. That works but it’s not the most premium experience. And when someone’s on a bad network, it gets more and more annoying. We are talking about a few megabytes up and then a few megabytes down, but that translates to minutes on a slower network connection.

    Since it “takes forever”, users background the app no matter what you ask of them. They have other stuff to do on their phones, but backgrounding the app when you don’t support background transfer simply causes the transfer to fail.

    Accurate notifications

    Once the upload succeeds, the computer vision algorithms work on the video. It takes approximately 1 minute per video, but this is not always the case. It’s not that the processing work is so unpredictable; there might be all kinds of other delays adding up to the final processing time, delays which the iOS app is unable to anticipate. But since the server keeps working, it can deliver background content updates that let the app fire local user notifications once everything’s done.

    So the plan was to support a flow like this:

    1. User takes a selfie video, immediately presses home (backgrounds the app).
    2. App uploads selfie in the background.
    3. When processing is done, the server sends a silent push notification to the app.
    4. App downloads the 3D face model.
    5. User sees a notification, opens the app, and everything’s ready.

    Good news, this is how the app works today as of v1.1.4! 🎉

    Bad news, it took us weeks to figure out because certain pieces of this puzzle were under-documented, almost impossible to debug, and nerve-wracking to test. In the end, we needed to implement some fallbacks because the flow above turned out to be the very, very optimistic scenario.

    The best news is, we are here to share the story, so you can save yourself some time and headache in case you were about to jump into implementing background transfer yourself.

    “Background transfer to the rescue!” – famous last words

    Background transfer is the ability to transfer data over the network when the app is in the background.

    As a reminder, iOS applications have several app states:

    1. Active: receiving events in the foreground, executing code.
    2. Inactive: not receiving events, but can execute code for short periods of time. A brief state between active and background.
    3. Background: in the background, executing code if background transfer is enabled and implemented.
    4. Suspended: in the background, not executing code.
    5. Not running: application is not running at all.

    When your app is not in the foreground (active, inactive), it normally goes to sleep (suspended) quite fast. When memory runs low, the system might decide to terminate your sleeping app, especially if it consumes large amounts of memory in the background.

    Memory consumption is not the only factor when iOS is judging your app. What also counts is the time your app spends executing in the background and the number of times it wants to be woken up. Keep these in mind when implementing background transfer, because you will want to be a good citizen. iOS ranking your app down means delays in your background execution window and, as mentioned, can sometimes lead to termination.

    Switching to background sessions will just work, right?

    A background session is what lets you manage the work to be done in the background through the URLSession API.

    I started implementing this feature after reading the Apple docs. I updated all our URLSessionConfigurations, replacing “default” with “background,” and waited for everything to work. It was compiling fine, with no sign of the weeks of work ahead. Clear skies, sunshine – I was full of hope. 🏖
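
    Here’s a minimal sketch of what that change amounts to; the session identifier and the empty delegate class are made-up placeholders:

    import Foundation

    // Background sessions require a delegate (more on that below).
    final class TransferDelegate: NSObject, URLSessionDelegate {}

    // A made-up, stable identifier; the system uses it to reconnect you to in-flight transfers.
    let configuration = URLSessionConfiguration.background(withIdentifier: "com.example.background-transfer")

    let session = URLSession(configuration: configuration,
                             delegate: TransferDelegate(),
                             delegateQueue: nil)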

    And then nothing worked, for reasons. Not just one reason, but many, many reasons. In the end building this feature required diving a lot deeper than “just looking at the API docs”. It ends up providing an interesting tour of parts of iOS that are not very well known, since they work so hard to disappear… into the background. Let’s take them one by one.

    Fancy using completion handlers? Don’t!

    “With background sessions, the actual transfer is performed by a separate process.”

    A background task is the unit of work represented by an URLSessionTask object, created on and managed by a background session. The task doesn’t know about the parent session being a background session; it’s the same kind of task you’d use either way.

    But since your app’s process is not running this transfer, if your tasks are using the callback-based URLSession APIs, it’s time for a refactor! Background sessions don’t support the callback-based APIs, and they’ll produce a runtime error if you try them. They cannot call back to a closure that no longer exists, which is the case if your process was terminated.

    You can make it work with URLSession delegates. The system decides whom to notify, and when, after all tasks for a given session are complete. If the tasks finish while your app is running in the foreground or the background, the session immediately notifies the delegate you previously set. If your app was terminated in the background, the system relaunches it at some point and notifies the AppDelegate that all tasks are complete, so you can process the results there.
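
    Here’s roughly what the delegate-based shape looks like; the class name is hypothetical, but the two methods are the standard download and task delegate callbacks:

    import Foundation

    final class BackgroundTransferDelegate: NSObject, URLSessionDownloadDelegate {

        // Called for every finished download, whether the app is in the foreground,
        // in the background, or was relaunched because of this event.
        func urlSession(_ session: URLSession,
                        downloadTask: URLSessionDownloadTask,
                        didFinishDownloadingTo location: URL) {
            // Move the file out of its temporary location before returning.
        }

        // Called once per task with the final result (or error).
        func urlSession(_ session: URLSession,
                        task: URLSessionTask,
                        didCompleteWithError error: Error?) {
            if let error = error {
                print("Task failed: \(error)")
            }
        }
    }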

    I started suspecting this was going to require a bit more than just changing one configuration. And then more problems started to surface.


    “Why do my background tasks complete 30 mins late?”

    Our next question was why background processing sometimes took so long.

    “For time-insensitive tasks, you can set the isDiscretionary property. The system takes this property as a hint to wait for optimal conditions to perform the transfer. For example, the system might wait until the device is plugged in or connected to Wi-Fi.”

    This was something we realized way too late – the importance of the isDiscretionary property on URLSessionConfiguration. It defaults to false. But if it’s set to, or treated as, true, then it tells iOS, hey, no rush, take your time with delivering the information that our app is waiting for in the background.

    “In cases where the transfer is initiated while the app is in the background, the configuration object’s isDiscretionary property is treated as being true.”
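
    If your transfers start while the app is still in the foreground, it doesn’t hurt to set the property explicitly to document your intent, even though false is the default; for transfers started in the background, iOS ignores it anyway. A tiny sketch:

    import Foundation

    let configuration = URLSessionConfiguration.background(withIdentifier: "com.example.background-transfer")
    // false is the default, and it's treated as true regardless for transfers started in the background.
    configuration.isDiscretionary = false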

    If you scroll up to our very, very optimistic processing plan, you can see that the foundation of our background flow is based on immediate stage progression. Meaning, once the background upload is done, we initiate a task in the background to launch the compute job on the server. Once that’s done, we initiate the download task in the background. Now you see what the problem is. It all gets delayed based on the iPhone’s mood and now we’re back to square one with our premium user experience. 🤦

    There is no way around this.

    😬 😱 😳 🤔

    Let’s just take a deep breath and think about the reasons.

    iOS provides a premium experience to its users. Part of it is keeping our greedy hands off the resources. Imagine when your battery is dying, you barely have network, and you are waiting for your Lyft to arrive. Do you want iOS to upload videos in the background, or do you want it to suspend everything else and let you do your thing? You’re paying full attention to the app that is in the foreground, after all. This is how iOS helps you as a user to achieve that. That’s the reason why people buy iPhones – because it works.

    As a third party app developer, you need to understand the limitations this implies.

    So how do I know if my tasks eventually went through?

    To find out when your tasks eventually completed after your app was terminated during a background transfer, you can implement that one function on the AppDelegate for handling the results of those sessions. The tricky part is that it hands you a completion handler that’s very important to call as soon as possible. Remember, iOS judges you if you spend too much time executing code in the background, and this is how it measures that time.
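
    Here’s a sketch of how those pieces fit together; the stored property is made up, and BackgroundTransferDelegate is the hypothetical session delegate from earlier:

    import UIKit

    class AppDelegate: UIResponder, UIApplicationDelegate {

        // Stored until all delegate messages for the background session have been delivered.
        var backgroundSessionCompletionHandler: (() -> Void)?

        func application(_ application: UIApplication,
                         handleEventsForBackgroundURLSession identifier: String,
                         completionHandler: @escaping () -> Void) {
            // Recreate the background session for `identifier` if needed, then keep the handler.
            backgroundSessionCompletionHandler = completionHandler
        }
    }

    // On the session delegate, call the stored handler as soon as all events are processed.
    extension BackgroundTransferDelegate {
        func urlSessionDidFinishEvents(forBackgroundURLSession session: URLSession) {
            DispatchQueue.main.async {
                let appDelegate = UIApplication.shared.delegate as? AppDelegate
                appDelegate?.backgroundSessionCompletionHandler?()
                appDelegate?.backgroundSessionCompletionHandler = nil
            }
        }
    }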

    If you’re interested in how much time you have left in the background, there is a public var on UIApplication.shared, called backgroundTimeRemaining. It’s good to know about, but you’d better aim to never reach that limit.
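
    Checking it is a one-liner, useful purely as a diagnostic:

    import UIKit

    // Remaining background execution time in seconds (a very large value while in the foreground).
    print("Background time remaining: \(UIApplication.shared.backgroundTimeRemaining)")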

    Silent pushes – they arrive whenever they feel like it!

    How does iOS punish you? One way is by not waking you up in the background to receive silent push notifications.

    Silent pushes are the only way to wake up your app from your server, indicating there’s work to do, such as downloading data. This was exactly what we needed, to tell the app to download the server’s computer vision results.

    We already know the system can terminate our app when it’s in the background or suspended, and the user can also terminate it using the multitasking UI. In both cases, the app’s process is killed, and then it never receives silent notifications again.

    But even if the system does not terminate your app, it can still simply never wake it up, or wait hours to wake it up, for delivering silent notifications.

    Before describing some of the normal practices that can surprisingly provoke this punishment, let’s review the details of silent push.

    No need to ask for permissions

    First of all: it’s unrelated to user permission. You can call registerForRemoteNotifications and receive a device token that allows you to send silent remote notifications to a device, without asking the user for permission.

    To do this, you just need to make sure to format your APNs payload correctly on your server. Setting content-available to 1, and not adding any alert, sound, or badge keys, is how you let iOS know what kind of push message it is. See Configuring a Silent Notification in the docs for more details.
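
    On the client side this amounts to very little code. A sketch, with hypothetical names; the payload itself is assembled on your server:

    import UIKit

    // The silent payload only needs content-available, with no alert, sound, or badge keys:
    //
    //   { "aps": { "content-available": 1 } }

    extension AppDelegate {

        // Call this on every launch (e.g. from didFinishLaunching); the token can change.
        func registerForSilentPushes() {
            // No user-facing permission prompt is needed for silent notifications.
            UIApplication.shared.registerForRemoteNotifications()
        }

        func application(_ application: UIApplication,
                         didRegisterForRemoteNotificationsWithDeviceToken deviceToken: Data) {
            // Hypothetical: forward the token to your server so it can target this device.
            let token = deviceToken.map { String(format: "%02x", $0) }.joined()
            print("APNs device token: \(token)")
        }
    }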

    You will be notified when your push arrives in your app, through a function on the AppDelegate. It’s the same game as with the function for background session completion: you will be passed a completion handler that you must call, the sooner the better. Remember, iOS is watching you. ⏱👀
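
    That function looks something like this; what you kick off inside it is up to you, but call the handler as soon as the work is handed off:

    import UIKit

    extension AppDelegate {
        func application(_ application: UIApplication,
                         didReceiveRemoteNotification userInfo: [AnyHashable: Any],
                         fetchCompletionHandler completionHandler: @escaping (UIBackgroundFetchResult) -> Void) {
            // Hypothetical: start the background download task for the 3D face model here.
            completionHandler(.newData)
        }
    }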

    Background fetch, update, refresh?

    To support silent notifications, you’ll need to turn the Background Modes capability on and select “Remote notifications”. Despite what others might tell you online, you won’t need to turn “Background fetch” on. It’s a slightly different thing, though it’s handled essentially the same way. Background fetch is also a way to update content in the background, but it’s scheduled by the app (poll), not initiated by a server (push).

    After enabling the silent push capability, you can see that your app appears in the list of apps using Background App Refresh in the Settings app. If the user turns that switch off there, you won’t receive the silent notifications.
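
    You can check for that case at runtime and fall back gracefully; the fallback here is just a placeholder:

    import UIKit

    if UIApplication.shared.backgroundRefreshStatus != .available {
        // e.g. fall back to the old "please stay on this screen" flow
    }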

    Device tokens do not identify the device

    To support any kind of push notification, silent or not, you need to install certificates on the server that communicates with the APNs servers. You will need to point your app built for development to a server that has the development certificate installed and communicates with the sandbox APNs. You will need to point your production app to a server that has the distribution push certificate installed and communicates with production APNs. This could be the same server, but then the server needs to know whether it’s a development or production app talking to it.

    This is because anything built from Xcode, no matter the configuration (yes, Release too!), is considered a development build and will result in a Bad device token error on your server if you try to send push messages to the production APNs environment. iOS gives you a device token valid for development when a dev build asks, and a device token only valid for production APNs when a production build asks. So a device token alone doesn’t identify the device; for that, you need the token plus the knowledge of whether it came from a development or production build.

    Don’t forget to call registerForRemoteNotifications on every launch, since the token can change across launches.

    Client authentication does not work 🔓

    The biggest surprise of this journey was when authentication between our app and our server broke the minute we switched to background sessions. It stopped working in both the foreground and the background.

    According to Apple, client authentication doesn’t work with background sessions. Don’t even try. I filed a new bug report (35126178) to make sure we increase the counter representing the number of people wanting this to work.

    Custom server-trust does not work either 🔓

    Since we believe in the highest level of security possible, we don’t trust anyone 🙃. We had implemented certificate pinning to ensure that the app only talked to our server, and client certificate authentication to ensure that our server only talked to our app. This meant we did not even need to trust the certificate authorities.

    But it failed silently inside the completion handler of the URLSession callback for authentication challenges, and produced a log message on the console like CredStore - copyIdentPrefs - Error copying Identity cred. There was no way to understand what the problem was. But we had a wild guess, and it turned out that using a CA-signed certificate instead of pinning to a self-signed one works.

    But according to Apple, server trust validation should work in the background. Bug filed (35126815), because it only seems to be working with CA-signed certificates. Pinning to a CA-signed certificate is possible too, even if you have a rolling certificate (like Let’s Encrypt) in place, as long as you’re pinning against the public key, rather than the certificate itself. You just need to make sure the public key stays the same across all new certificates. Here’s a great article explaining why certificate pinning is a good way to defend yourself from man-in-the-middle attacks.
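
    For the record, here’s roughly what public key pinning looks like in a URLSession delegate. The class name and the way the pinned key is loaded are made up, a real implementation would also evaluate the certificate chain before trusting it, and, as the next sections explain, answering any authentication challenge from a background session costs extra resumes:

    import Foundation
    import Security

    final class PinnedTrustDelegate: NSObject, URLSessionDelegate {

        // DER-encoded public key shipped with the app; must stay the same across certificate renewals.
        private let pinnedKeyData: Data

        init(pinnedKeyData: Data) {
            self.pinnedKeyData = pinnedKeyData
        }

        func urlSession(_ session: URLSession,
                        didReceive challenge: URLAuthenticationChallenge,
                        completionHandler: @escaping (URLSession.AuthChallengeDisposition, URLCredential?) -> Void) {
            guard challenge.protectionSpace.authenticationMethod == NSURLAuthenticationMethodServerTrust,
                  let trust = challenge.protectionSpace.serverTrust,
                  let serverKey = SecTrustCopyPublicKey(trust),
                  let serverKeyData = SecKeyCopyExternalRepresentation(serverKey, nil) else {
                completionHandler(.cancelAuthenticationChallenge, nil)
                return
            }

            if (serverKeyData as Data) == pinnedKeyData {
                completionHandler(.useCredential, URLCredential(trust: trust))
            } else {
                completionHandler(.cancelAuthenticationChallenge, nil)
            }
        }
    }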

    Peter needed to change our servers to use a new certificate and new authentication methods. That’s quite some work on the backend for supporting the app’s background processing capabilities. 😬

    Even standard server-trust authentication will punish you

    But even after re-engineering for only what works, we learned that iOS will still punish you for some approaches that are fully supported. Any server-trust authentication is unwise.

    Please make sure to read Quinn the eskimo’s responses to Alexis near the bottom of the dev forum thread carefully, if you want to understand why Apple doesn’t encourage this kind of authentication method when implementing background sessions.

    “For background sessions specifically, the resumes necessary to handle this authentication challenge counter against the resume rate limiter.
    Note This applies to all authentication challenges, not just server trust challenges.”

    Just one last thought before jumping to our next section about the mysterious resume rate limiter… 👻

    As Quinn suggests,

    “I generally recommend that folks avoid authentication challenges and instead implement a custom authentication scheme.”

    For example, you could send a custom header field with a transfer token as authentication, one that was previously provided by the server in a foreground session. There are more suggestions in that thread for providing better security without risking smooth background processing by hitting the resume rate limit.
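
    A sketch of what that could look like; the header name, the endpoint, and the surrounding function are all hypothetical:

    import Foundation

    func enqueueUpload(of fileURL: URL, with transferToken: String, on backgroundSession: URLSession) {
        var request = URLRequest(url: URL(string: "https://example.com/api/videos")!)
        request.httpMethod = "POST"
        // The short-lived token was obtained earlier, in a foreground session.
        request.setValue(transferToken, forHTTPHeaderField: "X-Transfer-Token")

        // No authentication challenge to answer, so completing this task costs a single resume.
        backgroundSession.uploadTask(with: request, fromFile: fileURL).resume()
    }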

    The little-known Resume Rate Limiter is the thing that will punish you

    Remember metric number three from the list of what to look out for with background transfer – the number of times your app wants to be woken up in the background? That is what the resume rate limiter is there to watch and to regulate.

    It all started to make sense… 💡

    You give iOS a piece of data to upload to, or download from a server. Your app goes to the background. iOS has a great deal of resource management implemented, so it is able to find a way to transfer that data with low priority, when it’s convenient to do so. All it needs to do is wake your app up and notify the proper delegate.

    “N downloads, 1 resume.”

    Except if you’re implementing things like authentication challenges for those URL requests:

    “N downloads you need N+1 resumes.”

    These things add up, and iOS will just rank your app down, delaying your execution window. Let me show you how bad we, at Topology, are doing with resumes:

    1. Upload task completed
    2. Data task for job submission authentication challenge for server trust validation
    3. Data task for job submission completed
    4. Silent remote notification arrived
    5. Download task completed

    This is 5 background resumes if the user has only submitted one selfie before going to the background.

    The punishment is (1) hidden by Xcode, (2) varies by phone, and (3) is severe and random

    These three facts make testing very frustrating, because things get worse the more you test. Unfortunately, we only realized this after a few days of struggling to debug the uncatchable issues.

    A funny sign was that after building the first internal testing version, others saw the whole flow happening, but not me. You can imagine what iOS on my phone was thinking about the Topology app after working on this feature constantly for weeks. My resume rate limiter was probably like “Lol, Topology app. No way! Too much!”

    But your app will always resume in the background at the expected times if you’re plugged into Xcode. 🚨

    That was a shocking find too, because we had no idea about the rate limiter until that point, when we realized background features just don’t work reliably in TestFlight builds. It’s because you have a free pass with the rate limiter when running in the development environment. In practice, it meant that all of the planned background processing worked in production as expected, except that the time between stages sometimes stretched to hours.

    “dasd” is the best friend you didn’t know you needed

    My teammate Christopher showed us this great debugging move: on the device’s console, you can find the logs of the Duet Activity Scheduler Daemon, which is responsible for delivering notifications in iOS. From those logs you can extract information about whether your messages went through or not, and why. There is a score, e.g. 0.76, for every notification, and a decision, like Can Proceed. That verified all the theories we’d been putting together from our guesses about how the system works.

    Nov 6 14:19:32 vasarhelyias-iPhone-7-Plus dasd(DuetActivitySchedulerDaemon)[123] ...:[
    {name: ApplicationPolicy, policyWeight: 5.000, response: {Decision: Can Proceed, Score: 0.58}}
    {name: DeviceActivityPolicy, policyWeight: 10.000, response: {Decision: Can Proceed, Score: 0.25}}
    {name: ChargerPluggedInPolicy, policyWeight: 10.000, response: {Decision: Can Proceed, Score: 0.50, Rationale: }}
    ] sumScores:32.260200,
    denominator:46.910000,
    FinalDecision: Can Proceed FinalScore: 0.687704 }

    A new hope

    Since we can’t rely on users walking around with fully charged iPhones connected to excellent WiFi with all their apps closed and not supporting background updates, we needed to add some fallbacks to support the non-optimistic but common scenarios. Still, this is already a huge improvement over how our app used to work, especially for the worst case scenario. 🎊

    The UX workaround we have now is to schedule local notifications with estimated end dates for the computing phase, so we don’t rely only on the silent push to trigger the stage progression of the selfie. That allows users to come back and open the app when most of the work is done, and lets them watch the app download the 3D face model.
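
    A sketch of that fallback, with made-up copy and identifiers; note that, unlike silent pushes, showing a local notification does require the user’s notification permission:

    import UserNotifications

    func scheduleProcessingReminder(after estimatedProcessingTime: TimeInterval) {
        let content = UNMutableNotificationContent()
        content.title = "Your glasses are almost ready"
        content.body = "Open the app to see your 3D face model."

        // Fire at the estimated end of the computing phase, even if no silent push arrives.
        let trigger = UNTimeIntervalNotificationTrigger(timeInterval: estimatedProcessingTime, repeats: false)
        let request = UNNotificationRequest(identifier: "processing-done-estimate",
                                            content: content,
                                            trigger: trigger)
        UNUserNotificationCenter.current().add(request, withCompletionHandler: nil)
    }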

    The most obvious improvement for the future is to decrease the number of resume attempts necessary. That involves changes on our backend to provide a different API and functionality to support fewer stages of job processing on the client. Also, we could make changes to how authentication works, like implementing the suggestions by Quinn about the transfer token. Good thing we always have more coding to do! 😬

    I hope this was helpful in understanding our process and what we learned about background transfers. If you have any questions or suggestions, please do let me know! 🙏 You can find me as @vasarhelyia on Twitter.

    Special thanks to Alexis and Greg for helping me make this article awesome! 🤗

Topology makes custom eyeglasses and sunglasses, perfectly sculpted to fit one person at a time. Our app combines video capture, 3D rendering, and Core Motion (among other things) to create a premium experience for our users. Sound interesting? We’re hiring, please get in touch!

© 2017–2018 - Topology Eyewear