API for setting limits for use by Python thread pools #215

Description

@itamarst

See #208 for discussion of why the current API is insufficient.

Here is a demonstration of the problem with the current API. It will eventually become a unit test for the new API:

from sys import thread_info
from threading import Thread, current_thread

import sklearn
from threadpoolctl import threadpool_info, threadpool_limits

def assert_limit_is(limit):
    print("Checking", current_thread())
    for i in threadpool_info():
        assert i["num_threads"] == limit
    print("OK")


threadpool_limits(limits=2)
assert_limit_is(2)

t = Thread(target=lambda: assert_limit_is(2))
t.start()
t.join()

When run:

Checking <_MainThread(MainThread, started 128633514360960)>
OK
Checking <Thread(Thread-1, started 128632355026624)>
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib/python3.9/threading.py", line 980, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.9/threading.py", line 917, in run
    self._target(*self._args, **self._kwargs)
  File "/home/itamarst/devel/threadpoolctl/theproblem.py", line 17, in <lambda>
    t = Thread(target=lambda: assert_limit_is(2))
  File "/home/itamarst/devel/threadpoolctl/theproblem.py", line 10, in assert_limit_is
    assert i["num_threads"] == limit
AssertionError
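
One plausible reading of this failure, assuming at least one of the loaded libraries keeps its thread limit per calling thread (OpenMP's `omp_set_num_threads` behaves this way): the limit set in the main thread is not visible from a thread started afterwards. The stdlib-only sketch below mimics that with `threading.local`. `set_limit`, `get_limit` and `DEFAULT_LIMIT` are hypothetical stand-ins, not threadpoolctl functions.

```python
import threading

DEFAULT_LIMIT = 8  # stand-in for a library's default thread count

_state = threading.local()


def set_limit(n):
    """Hypothetical stand-in for a per-thread limit setter."""
    _state.limit = n


def get_limit():
    """Return the calling thread's limit, or the default if it was never set."""
    return getattr(_state, "limit", DEFAULT_LIMIT)


def limit_seen_in_new_thread():
    """Start a fresh thread and report the limit it observes."""
    seen = []
    t = threading.Thread(target=lambda: seen.append(get_limit()))
    t.start()
    t.join()
    return seen[0]


set_limit(2)
main_limit = get_limit()                   # the main thread sees its own setting
worker_limit = limit_seen_in_new_thread()  # the new thread never called set_limit
```

Here `main_limit` is the value just set, while `worker_limit` falls back to the default, which is the same shape as the assertion failure above.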

Proposed solutions

Note that for all options, nested Python thread pools are going to be half-broken, given that some limiting APIs are process-wide. They will kinda-sorta work if all threads take the same amount of time to run... I have an idea for a completely different approach to deal with that, though.

Option 1: Single function custom API

Sketch of proposed API, updated version of #208:

from multiprocessing.pool import ThreadPool
from os import cpu_count

from threadpoolctl import per_thread_limit


def run(n_threads: int):
    # Keep the budget at 1 or more, even with more threads than cores
    # (cpu_count() can also return None).
    cores_per_thread = max(1, (cpu_count() or 1) // n_threads)

    with per_thread_limit(cores_per_thread):
        with ThreadPool(n_threads, initializer=lambda: per_thread_limit(cores_per_thread)) as pool:
            # ... do business logic here ...
            pass

Unfortunately, per_thread_limit may sometimes need to set limits on the whole process, so initially it may actually be identical implementation-wise to threadpool_limits...
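
The per-thread budget in these sketches is an integer division of cores by worker threads. A minimal helper for that arithmetic, under the hypothetical name `cores_per_thread` (not part of threadpoolctl), that keeps the result at 1 or more when there are more workers than cores or when `os.cpu_count()` returns `None`:

```python
from os import cpu_count


def cores_per_thread(n_threads, n_cores=None):
    """Split the available cores evenly across n_threads, never below 1."""
    if n_threads < 1:
        raise ValueError("n_threads must be at least 1")
    if n_cores is None:
        # os.cpu_count() may return None when the count cannot be determined.
        n_cores = cpu_count() or 1
    return max(1, n_cores // n_threads)
```

For example, 8 cores split across 4 workers gives 2 each, and 8 cores across 16 workers still gives 1 rather than 0.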

Option 2: Two-phase custom API

This might allow for some tricks I want to try (varied priority levels per thread). I will expand on this after I do some experiments:

from multiprocessing.pool import ThreadPool
from os import cpu_count

from threadpoolctl import per_thread_limit


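def run(n_threads: int):
    # Keep the budget at 1 or more, even with more threads than cores
    # (cpu_count() can also return None).
    with per_thread_limit(max(1, (cpu_count() or 1) // n_threads)) as limiter:
        with ThreadPool(n_threads, initializer=limiter.limit_in_current_thread) as pool:
            # ... do business logic here ...
            pass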

Option 3: Reuse current API

from multiprocessing.pool import ThreadPool
from os import cpu_count

from threadpoolctl import threadpool_limits


def run(n_threads: int):
    # Keep the budget at 1 or more, even with more threads than cores
    # (cpu_count() can also return None).
    cores_per_thread = max(1, (cpu_count() or 1) // n_threads)

    # This will set process-wide limits, for libraries that have that sort of API, or maybe just
    # the main thread's limit, depending on setup:
    with threadpool_limits(limits=cores_per_thread):
        # Tell the ThreadPool to run per-thread limiting, for libraries that have that sort of API:
        with ThreadPool(n_threads, initializer=lambda: threadpool_limits(limits=cores_per_thread)) as pool:
            # ... do business logic here ...
            pass

The positive: no new APIs, it's just documentation.

The negative:

  • If we get better affordances in the future, this is bad because it ends up using process-wide APIs when it shouldn't. E.g. mkl has both process-wide and current-thread get/set-num-threads APIs. threadpoolctl could choose to expose both, and then use both as needed, but this API option would preclude taking advantage of that.
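
To illustrate what is at stake in that last point, here is a toy model of a library with both a process-wide limit and a per-thread override. It is a sketch under assumptions, not MKL bindings: `TwoLevelLimit` and its methods are hypothetical names, and the rule that a per-thread override wins over the process-wide value is assumed for illustration.

```python
import threading


class TwoLevelLimit:
    """Toy model: a process-wide limit plus an optional per-thread override."""

    def __init__(self, default):
        self._global = default
        self._local = threading.local()

    def set_global(self, n):
        self._global = n

    def set_local(self, n):
        # Passing None clears the override for the calling thread.
        self._local.value = n

    def effective(self):
        override = getattr(self._local, "value", None)
        return self._global if override is None else override


def effective_in_new_thread(limits):
    """Report the effective limit as seen from a freshly started thread."""
    seen = []
    t = threading.Thread(target=lambda: seen.append(limits.effective()))
    t.start()
    t.join()
    return seen[0]


limits = TwoLevelLimit(default=8)
limits.set_global(4)  # visible to every thread
limits.set_local(1)   # visible to the calling thread only
```

An API that can reach only the process-wide setter has no way to express the second call, which is the limitation described above.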
