API for setting limits for use by Python thread pools

See #208 for discussion of why the current API is insufficient.

Here is a demonstration of the problem with the current API; it will eventually become a unit test for the new API:

```python
from sys import thread_info
from threading import Thread, current_thread

import sklearn
from threadpoolctl import threadpool_info, threadpool_limits

def assert_limit_is(limit):
    print("Checking", current_thread())
    for i in threadpool_info():
        assert i["num_threads"] == limit
    print("OK")


threadpool_limits(limits=2)
assert_limit_is(2)

t = Thread(target=lambda: assert_limit_is(2))
t.start()
t.join()
```

When run:

```
Checking <_MainThread(MainThread, started 128633514360960)>
OK
Checking <Thread(Thread-1, started 128632355026624)>
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib/python3.9/threading.py", line 980, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.9/threading.py", line 917, in run
    self._target(*self._args, **self._kwargs)
  File "/home/itamarst/devel/threadpoolctl/theproblem.py", line 17, in <lambda>
    t = Thread(target=lambda: assert_limit_is(2))
  File "/home/itamarst/devel/threadpoolctl/theproblem.py", line 10, in assert_limit_is
    assert i["num_threads"] == limit
AssertionError
```
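One plausible reading of this failure, assuming at least one of the loaded native libraries keeps its thread-count setting per calling thread rather than per process, is that the limit set in the main thread never reaches the new thread. The sketch below reproduces that behaviour with a stand-in setting stored in `threading.local`; `set_num_threads`, `get_num_threads` and `DEFAULT_LIMIT` are hypothetical names, not part of `threadpoolctl` or any native library:

```python
import threading

# Hypothetical default a native library would use when no limit has been set.
DEFAULT_LIMIT = 8

# Stand-in for a native library whose setting lives in per-thread state.
_settings = threading.local()


def set_num_threads(limit):
    # Only affects the thread that makes the call.
    _settings.limit = limit


def get_num_threads():
    # Threads that never called set_num_threads() fall back to the default.
    return getattr(_settings, "limit", DEFAULT_LIMIT)


set_num_threads(2)  # called in the main thread only

seen = {}


def check():
    seen["worker"] = get_num_threads()


t = threading.Thread(target=check)
t.start()
t.join()
seen["main"] = get_num_threads()

print(seen)  # {'worker': 8, 'main': 2}
```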

## Proposed solutions

Note that for all options, nested Python thread pools are going to be half-broken, given that some limiting APIs are process-wide. They will still more or less work if all threads take about the same amount of time to run. I have an idea for a completely different approach to deal with that, though.
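The option sketches below all split the machine's cores evenly among the Python threads. A minimal sketch of that arithmetic follows, clamped so the per-thread limit is never 0. The helper name `cores_per_thread` and its `total_cores` override are hypothetical and exist only to make the example deterministic:

```python
from os import cpu_count


def cores_per_thread(n_threads, total_cores=None):
    """Split the available cores evenly across n_threads, never returning less than 1."""
    if n_threads < 1:
        raise ValueError("n_threads must be at least 1")
    if total_cores is None:
        # os.cpu_count() may return None when the count cannot be determined.
        total_cores = cpu_count() or 1
    return max(1, total_cores // n_threads)


# With 8 cores, 4 Python threads get 2 native threads each (4 * 2 == 8).
print(cores_per_thread(4, total_cores=8))   # 2
# More Python threads than cores still yields a limit of 1, not 0.
print(cores_per_thread(16, total_cores=8))  # 1
```

With uneven work per thread this even split leaves cores idle, which is the half-broken case described above.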

### Option 1: Single function custom API

Sketch of the proposed API, an updated version of #208:

```python
from multiprocessing.pool import ThreadPool
from os import cpu_count

from threadpoolctl import per_thread_limit


def run(n_threads: int):
    # cpu_count() can return None; clamp so the limit is never 0.
    cores_per_thread = max(1, (cpu_count() or 1) // n_threads)

    with per_thread_limit(cores_per_thread):
        with ThreadPool(n_threads, initializer=lambda: per_thread_limit(cores_per_thread)) as pool:
            # ... do business logic here ...
            ...
```

Unfortunately, `per_thread_limit` may sometimes need to set limits on the whole process, so initially it may be identical implementation-wise to `threadpool_limits`.

### Option 2: Two-phase custom API

This might allow for some tricks I want to try (varied priority levels per thread); I will expand on this after I do some experiments:

```python
from multiprocessing.pool import ThreadPool
from os import cpu_count

from threadpoolctl import per_thread_limit


def run(n_threads: int):
    # cpu_count() can return None; clamp so the limit is never 0.
    with per_thread_limit(max(1, (cpu_count() or 1) // n_threads)) as limiter:
        with ThreadPool(n_threads, initializer=limiter.limit_in_current_thread) as pool:
            # ... do business logic here ...
            ...
```

### Option 3: Reuse current API

```python
from multiprocessing.pool import ThreadPool
from os import cpu_count

from threadpoolctl import threadpool_limits


def run(n_threads: int):
    # cpu_count() can return None; clamp so the limit is never 0.
    cores_per_thread = max(1, (cpu_count() or 1) // n_threads)

    # This will set process-wide limits, for libraries that have that sort of API, or maybe just
    # the main thread's limit, depending on setup:
    with threadpool_limits(limits=cores_per_thread):
        # Tell the ThreadPool to run per-thread limiting, for libraries that have that sort of API:
        with ThreadPool(n_threads, initializer=lambda: threadpool_limits(limits=cores_per_thread)) as pool:
            # ... do business logic here ...
            ...
```

The positive: no new APIs, it's just documentation.

The negative:

* If we get better affordances in the future, this is bad because it ends up using process-wide APIs when it shouldn't. E.g. MKL has both process-wide and current-thread get/set-num-threads APIs. `threadpoolctl` could choose to expose both and use each as needed, but this API option would preclude taking advantage of that.
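As a rough illustration of what "use both as needed" could mean, the sketch below prefers a current-thread setter when a library offers one and falls back to the process-wide setter otherwise. `NativeLibrary` and `limit_for_worker` are hypothetical names, and the setters are stubs that only record calls rather than touching MKL or any other real library:

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class NativeLibrary:
    """Hypothetical description of one native library's thread-limit setters."""

    name: str
    set_process_wide: Callable[[int], None]
    # None when the library offers no current-thread setter.
    set_current_thread: Optional[Callable[[int], None]] = None


def limit_for_worker(lib, limit):
    """Apply `limit` from inside a worker thread, preferring the narrowest scope."""
    if lib.set_current_thread is not None:
        lib.set_current_thread(limit)
        return "current-thread"
    lib.set_process_wide(limit)
    return "process-wide"


calls = []  # records which stub setter was invoked

both = NativeLibrary(
    "has-both",
    set_process_wide=lambda n: calls.append(("global", n)),
    set_current_thread=lambda n: calls.append(("local", n)),
)
only_global = NativeLibrary(
    "global-only",
    set_process_wide=lambda n: calls.append(("global", n)),
)

print(limit_for_worker(both, 2))         # current-thread
print(limit_for_worker(only_global, 2))  # process-wide
```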

