Implement SLF4Jm JUL, and Log4J2 providers for Java SDK#68725
Implement SLF4Jm JUL, and Log4J2 providers for Java SDK#68725uranusjr wants to merge 13 commits into
Conversation
d73ec53 to
7c0f104
Compare
A new airflow-sdk-slf4j artifact is added to allow SLF4J to be seamlessly forwarded to Airflow's task log infrastructure. Use 'org.apache.airflow:airflow-sdk-slf4j' to enable this.
This allows log4j users to write logs into Airflow directly. Note that this uses Java since the appender is registered with an annotation, and the annotation processor can only handle Java. (There's a Kotlin bridge, but the class is small enough it doesn't make much sense to pull it in.)
This is too fancy for the spell checker. Use Dumb English instead.
f35217b to
3ac5294
Compare
Systen.log annoyingly pass a null as an array to a vararg method implemented in Kotlin. This breaks Kotlin since it expects an empty array instead. Fortunately, there seems to be a way to work around this, according to Claude? Let's hope this works.
3ac5294 to
d48fe00
Compare
| s.split(Regex("""[\s,]+""")).forEach { | ||
| val parts = it.split(Regex("""\s*=\s*"""), 2) | ||
| val level = parse(parts[1]) | ||
| if (level != null) put(parts[0], level) |
There was a problem hiding this comment.
| s.split(Regex("""[\s,]+""")).forEach { | |
| val parts = it.split(Regex("""\s*=\s*"""), 2) | |
| val level = parse(parts[1]) | |
| if (level != null) put(parts[0], level) | |
| s.split(Regex("""[\s,]+""")).forEach { | |
| if (it.isEmpty()) return@forEach | |
| val parts = it.split(Regex("""\s*=\s*"""), 2) | |
| if (parts.size != 2) return@forEach // <- guards parts[1] | |
| val level = parse(parts[1]) | |
| if (level != null) put(parts[0], level) | |
| } |
There was a problem hiding this comment.
The happy path works fine, however,
Say a deployment sets:
AIRFLOW__LOGGING__NAMESPACE_LEVELS="botocore=debug,"
A trailing comma — extremely common when people build up a comma list. Now trace it:
Step 1 — split "botocore=debug," on [\s,]+:
["botocore=debug", ""]
The trailing comma produces an empty-string token at the end.
Step 2 — the loop reaches the "" token and splits it on =:
"".split(Regex("""\s*=\s*"""), 2) -> [""] // size 1, no "=" to split on
Step 3 — parse(parts[1]) reads index 1 of a 1-element list:
java.lang.IndexOutOfBoundsException: Index 1 out of bounds for length 1
The same thing happens for "botocore" (a bare name, no =), " botocore=debug" (leading space), ",botocore=debug" (leading comma), or the env var set to "".
The example value in config.yml for this exact option is:
example: "sqlalchemy=INFO sqlalchemy.engine=DEBUG, botocor"
That last token, botocor, has no =level — so a user who copies the documented example verbatim hits the crash
The Python parser (structlog.py) has the same parsing gap, but it fails loudly with a ValueError at config time naming the bad value
There was a problem hiding this comment.
That example is broken and got truncated in my PR that I added. It should fail with an error, but a nice one.
There was a problem hiding this comment.
I’ll add some nicer errors.
The Python implementation currently also just use regex without error handling. Should we improve it too?
airflow/shared/logging/src/airflow_shared/logging/structlog.py
Lines 560 to 564 in d74fbff
There was a problem hiding this comment.
yes we should, but may be in a separate PR

Additional mechanism is added Logger in sdk.execution to support sending arbitrary log messages to LogSender. Once that's in place, we can implement various adapters to support common Java logging providers with a bit boilerplate code for each.
Those providers are implemented in separate artifacts since nobody needs all of them at once. The user can choose what they want. Documentation is added to describe how to do it.
I changed examples to prefer the Java Platform Logging interface instead of SLF4J too. This is built-in since Java 9. All the logging tools are more or less the same, so users should adapt pretty easily.