Description
When running a Spark job on a Databricks Spark-managed cluster, the component manager cannot be instantiated because of the error above.
At the point of creating the enriched Java proxy for the JsonGeneratorFactory, the thread context classloader is an instance of org.apache.spark.repl.ExecutorClassLoader, whereas the JsonGeneratorFactory itself is loaded by an org.apache.spark.util.MutableURLClassLoader holding the container manager classpath.
Since the two loaders do not match, the selection logic here ends up falling back to the system classloader, which does not have the required JsonGeneratorFactory class available, causing the error above.
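To illustrate why the mismatch triggers the parent/system-classloader fallback, here is a self-contained sketch (hypothetical class and method names, not the actual tacokit code) that reproduces the selection logic with a plain URLClassLoader:

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.stream.Stream;

public class LoaderSelection {
    // Simplified version of the selection logic described above: if none of
    // the api classes were loaded by the candidate loader, fall back to its
    // parent (which may ultimately be the system classloader).
    static ClassLoader selectLoader(final Class<?>[] api, final ClassLoader loader) {
        return Stream.of(api).anyMatch(t -> t.getClassLoader() == loader)
                || loader.getParent() == null
                || loader == ClassLoader.getSystemClassLoader()
                ? loader : loader.getParent();
    }

    public static void main(String[] args) {
        // A child loader that did NOT load any of the api classes
        // (String is loaded by the bootstrap loader, not by `child`).
        ClassLoader child = new URLClassLoader(new URL[0], ClassLoader.getSystemClassLoader());
        ClassLoader picked = selectLoader(new Class<?>[]{String.class}, child);
        // Mismatch: the child loader is skipped in favour of its parent.
        System.out.println(picked == ClassLoader.getSystemClassLoader());
    }
}
```

On a Databricks executor the "child" is the ExecutorClassLoader, so the fallback lands on a loader that has never seen the JsonGeneratorFactory class.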
I'm not sure whether this is a consequence of running Spark standalone, or of Databricks jobs being launched through a notebook-like API (note the `repl` in the executor's thread context classloader), but it breaks tacokit.
The following snippet is a crude workaround, but it works on Databricks:

```java
private ClassLoader selectLoader(final Class[] api, final ClassLoader loader) {
    // Workaround: special-case the Spark REPL executor classloader and use
    // the loader that loaded the factory itself instead.
    if ("org.apache.spark.repl.ExecutorClassLoader".equals(loader.getClass().getCanonicalName())) {
        return JavaProxyEnricherFactory.class.getClassLoader();
    }
    return Stream.of(api).anyMatch(t -> t.getClassLoader() == loader)
            || loader.getParent() == null
            || loader == getSystemClassLoader()
            ? loader : loader.getParent();
}
```