See #208 for discussion of why the current API is insufficient.
Here is a demonstration of the problem with the current API; it will eventually become a unit test for the new API:
from sys import thread_info
from threading import Thread, current_thread

import sklearn  # imported only so that native thread pools (BLAS/OpenMP) get loaded
from threadpoolctl import threadpool_info, threadpool_limits

def assert_limit_is(limit):
    print("Checking", current_thread())
    for i in threadpool_info():
        assert i["num_threads"] == limit
    print("OK")

threadpool_limits(limits=2)
assert_limit_is(2)


t = Thread(target=lambda: assert_limit_is(2))
t.start()
t.join()
When run:
Checking <_MainThread(MainThread, started 128633514360960)>
OK
Checking <Thread(Thread-1, started 128632355026624)>
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib/python3.9/threading.py", line 980, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.9/threading.py", line 917, in run
    self._target(*self._args, **self._kwargs)
  File "/home/itamarst/devel/threadpoolctl/theproblem.py", line 17, in <lambda>
    t = Thread(target=lambda: assert_limit_is(2))
  File "/home/itamarst/devel/threadpoolctl/theproblem.py", line 10, in assert_limit_is
    assert i["num_threads"] == limit
AssertionError
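One way to picture the failure, as a toy model only: assume the native library keeps its limit per thread, so a limit set from the main thread is invisible to a thread started later. The `set_limit`/`get_limit` helpers and the default of 8 are made up for illustration; they are not threadpoolctl's or any real library's API.

```python
import threading

# Toy stand-in for a native library whose thread limit is stored per thread.
_native_state = threading.local()
DEFAULT_LIMIT = 8  # hypothetical library default


def set_limit(n):
    """Set the limit for the calling thread only."""
    _native_state.num_threads = n


def get_limit():
    """Return the calling thread's limit, or the default if none was set."""
    return getattr(_native_state, "num_threads", DEFAULT_LIMIT)


seen = {}

set_limit(2)  # applies to the main thread only


def worker():
    # The new thread never called set_limit(), so it sees the default.
    seen["worker"] = get_limit()


t = threading.Thread(target=worker)
t.start()
t.join()
seen["main"] = get_limit()

print(seen)
```

In this model the main thread reports 2 and the worker reports 8, which is the same shape as the `AssertionError` above.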
Proposed solutions
Note that for all options, nested Python thread pools are going to be half-broken, given that some limiting APIs are process-wide. It will kinda-sorta work if all threads take the same amount of time to run... I have an idea for a completely different approach to deal with that, though.
Option 1: Single function custom API
Sketch of proposed API, updated version of #208:
from multiprocessing.pool import ThreadPool
from os import cpu_count

from threadpoolctl import per_thread_limit  # proposed API, does not exist yet

def run(n_threads: int):
    # Clamp after dividing so the result is never 0; cpu_count() may return None.
    cores_per_thread = max(1, (cpu_count() or 1) // n_threads)
    with per_thread_limit(cores_per_thread):
        with ThreadPool(n_threads, initializer=lambda: per_thread_limit(cores_per_thread)) as pool:
            # ... do business logic here ...
            pass
Unfortunately, per_thread_limit may sometimes need to set limits on the whole process, so initially it may actually be identical implementation-wise to threadpool_limits...
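The per-thread budget in these sketches is just integer division of the core count by the number of pool threads. A small worked example, assuming a hypothetical `cores_per_thread` helper (not part of threadpoolctl) that clamps after dividing so the budget never drops to 0 when there are more threads than cores:

```python
from os import cpu_count
from typing import Optional


def cores_per_thread(n_threads: int, total_cores: Optional[int] = None) -> int:
    """Hypothetical helper: split cores evenly across pool threads, never below 1."""
    if n_threads < 1:
        raise ValueError("n_threads must be at least 1")
    if total_cores is None:
        total_cores = cpu_count() or 1  # cpu_count() may return None
    return max(1, total_cores // n_threads)


print(cores_per_thread(4, total_cores=16))   # 4
print(cores_per_thread(3, total_cores=16))   # 5 (one core is left unassigned)
print(cores_per_thread(32, total_cores=16))  # 1 (oversubscribed, but never 0)
```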
Option 2: Two-phase custom API
This might allow for some tricks I want to try (varied priority levels per thread); I will expand on this after I do some experiments:
from multiprocessing.pool import ThreadPool
from os import cpu_count

from threadpoolctl import per_thread_limit  # proposed API, does not exist yet

def run(n_threads: int):
    # Clamp after dividing so the result is never 0; cpu_count() may return None.
    with per_thread_limit(max(1, (cpu_count() or 1) // n_threads)) as limiter:
        with ThreadPool(n_threads, initializer=limiter.limit_in_current_thread) as pool:
            # ... do business logic here ...
            pass
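To make the two-phase shape concrete, here is a stdlib-only mock, not a proposed implementation. It assumes the limit is purely per-thread and stores it in a `threading.local` instead of talking to any native library; `_Limiter`, `current_limit`, and the `limit` argument to `run` are names invented for this sketch.

```python
import threading
from contextlib import contextmanager
from multiprocessing.pool import ThreadPool

# Stand-in for per-thread limit state inside a native library.
_state = threading.local()


def current_limit():
    """Return the calling thread's limit, or None if nothing was set."""
    return getattr(_state, "limit", None)


class _Limiter:
    """Phase two: carries the limit so worker threads can apply it to themselves."""

    def __init__(self, limit):
        self.limit = limit

    def limit_in_current_thread(self):
        _state.limit = self.limit


@contextmanager
def per_thread_limit(limit):
    """Phase one: limit the calling thread and hand back a limiter for other threads."""
    previous = current_limit()
    limiter = _Limiter(limit)
    limiter.limit_in_current_thread()
    try:
        yield limiter
    finally:
        _state.limit = previous  # restore the calling thread's earlier limit


def run(n_threads, limit):
    with per_thread_limit(limit) as limiter:
        # ThreadPool calls the initializer once in each worker thread before any task.
        with ThreadPool(n_threads, initializer=limiter.limit_in_current_thread) as pool:
            return pool.map(lambda _: current_limit(), range(8))


observed = run(n_threads=4, limit=2)
print(observed)
```

Every task reports the limit applied by its worker's initializer, and the main thread's limit goes back to what it was once the outer `with` block exits.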
Option 3: Reuse current API
from multiprocessing.pool import ThreadPool
from os import cpu_count

from threadpoolctl import threadpool_limits

def run(n_threads: int):
    # Clamp after dividing so the result is never 0; cpu_count() may return None.
    cores_per_thread = max(1, (cpu_count() or 1) // n_threads)
    # This will set process-wide limits, for libraries that have that sort of API, or maybe just
    # the main thread's limit, depending on setup:
    with threadpool_limits(limits=cores_per_thread):
        # Tell the ThreadPool to run per-thread limiting, for libraries that have that sort of API:
        with ThreadPool(n_threads, initializer=lambda: threadpool_limits(limits=cores_per_thread)) as pool:
            # ... do business logic here ...
            pass
The positive: no new APIs, it's just documentation.
The negative:
- If we get better affordances in the future, this is bad because it ends up using process-wide APIs when it shouldn't. E.g. mkl has both process-wide and current-thread get/set-num-threads APIs. threadpoolctl could choose to expose both, and then use each as needed, but this API option would preclude taking advantage of that.
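A toy model of why that distinction matters, assuming a made-up `ToyLibrary` with one process-wide setter and one current-thread setter (the names and semantics are illustrative and are not mkl's or threadpoolctl's API). A per-thread initializer that can only reach the process-wide setter leaks its limit to every other thread:

```python
import threading


class ToyLibrary:
    """Made-up stand-in for a native library that has both kinds of limit."""

    def __init__(self, default: int):
        self._process_wide = default
        self._per_thread = threading.local()

    def set_num_threads(self, n: int) -> None:
        """Process-wide setter: visible from every thread."""
        self._process_wide = n

    def set_num_threads_local(self, n: int) -> None:
        """Current-thread setter: overrides the process-wide value for the caller only."""
        self._per_thread.value = n

    def get_num_threads(self) -> int:
        return getattr(self._per_thread, "value", self._process_wide)


def call_in_new_thread(fn):
    """Run fn in a fresh thread and return its result."""
    box = []
    t = threading.Thread(target=lambda: box.append(fn()))
    t.start()
    t.join()
    return box[0]


lib = ToyLibrary(default=8)


def limit_locally():
    lib.set_num_threads_local(2)
    return lib.get_num_threads()


def limit_process_wide():
    lib.set_num_threads(2)
    return lib.get_num_threads()


local_worker = call_in_new_thread(limit_locally)
main_after_local = lib.get_num_threads()    # main thread is unaffected
global_worker = call_in_new_thread(limit_process_wide)
main_after_global = lib.get_num_threads()   # main thread's limit changed too

print(local_worker, main_after_local, global_worker, main_after_global)
```

With the current-thread setter the worker gets 2 and the main thread keeps 8; with the process-wide setter both end up at 2.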