[core] Introduce file resource management #8179

Open. gavin9402 wants to merge 5 commits into apache:master from gavin9402:resource-management

@gavin9402 commented Jun 9, 2026

Contributor

Purpose

Introduce resource management capabilities to the REST Catalog, providing a unified way to manage file resources (FILE, JAR, PY, ARCHIVE) associated with databases. This lays the foundation for upcoming ML model and function features, where users will need to reference and manage external file resources such as model artifacts, UDF JARs, and Python scripts.

Changes

Resource Model (paimon-api)

  • Resource interface and AbstractResource base class — define the resource abstraction with properties like name, type, description, URI, and custom properties
  • FileResource, JarResource, PyResource, ArchiveResource — concrete resource types for FILE/JAR/PY/ARCHIVE
  • ResourceType enum — four supported resource types
  • ResourceChange — change operations for altering resources (setProperty, removeProperty, setDescription, setUri)
  • ResourceDeserializer — Jackson deserializer for polymorphic resource deserialization
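
To make the polymorphic deserialization concrete, here is a minimal Python sketch of the idea. It is not the PR's Java code: the JSON field names (`type`, `name`, `uri`, `description`, `properties`) and the helper `resource_from_json` are assumptions chosen for illustration, and the real wire format is whatever ResourceDeserializer defines.

```python
import json
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional


class ResourceType(Enum):
    FILE = "FILE"
    JAR = "JAR"
    PY = "PY"
    ARCHIVE = "ARCHIVE"


@dataclass
class Resource:
    """Fields shared by every resource type (hypothetical layout)."""
    name: str
    uri: str
    description: Optional[str] = None
    properties: Dict[str, str] = field(default_factory=dict)


class FileResource(Resource):
    type = ResourceType.FILE


class JarResource(Resource):
    type = ResourceType.JAR


class PyResource(Resource):
    type = ResourceType.PY


class ArchiveResource(Resource):
    type = ResourceType.ARCHIVE


# Dispatch table: the "type" discriminator selects the concrete class.
_BY_TYPE = {
    cls.type: cls
    for cls in (FileResource, JarResource, PyResource, ArchiveResource)
}


def resource_from_json(text: str) -> Resource:
    """Build the concrete resource named by the JSON "type" field."""
    data = json.loads(text)
    resource_type = ResourceType(data["type"])  # ValueError on unknown types
    cls = _BY_TYPE[resource_type]
    return cls(
        name=data["name"],
        uri=data["uri"],
        description=data.get("description"),
        properties=dict(data.get("properties") or {}),
    )
```

Keeping the discriminator in one dispatch table means a new resource type only needs a new enum value and one new subclass.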

REST API (paimon-api)

  • ResourcePaths — URL path builders for resource endpoints (/resources, /resource-details, /resources/{name})
  • RESTApi — 8 new resource management API methods: listResources, listResourcesPaged, listResourceDetailsPaged, getResource, createResource, dropResource, alterResource, listResourcesPagedGlobally
  • Request/Response classes: CreateResourceRequest, AlterResourceRequest, GetResourceResponse, ListResourcesResponse, ListResourceDetailsResponse, ListResourcesGloballyResponse

Catalog Interface (paimon-core)

  • Catalog — 8 new interface methods for resource CRUD + ResourceAlreadyExistException and ResourceNotExistException inner exception classes
  • AbstractCatalog — default UnsupportedOperationException implementations
  • DelegateCatalog — delegation implementations
  • RESTCatalog — full REST-backed implementations
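
The CRUD contract and the two exception classes can be sketched with an in-memory stand-in. This is a hedged Python illustration, not the PR's implementation: the snake_case method names, the `ignore_if_exists` / `ignore_if_not_exists` flags, and the tuple encoding of changes are assumptions. Only the four change kinds and the exception names come from the description above.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, List, Optional, Tuple


class ResourceAlreadyExistException(Exception):
    pass


class ResourceNotExistException(Exception):
    pass


@dataclass(frozen=True)
class Resource:
    name: str
    uri: str
    description: Optional[str] = None
    properties: Dict[str, str] = field(default_factory=dict)


def apply_change(resource: Resource, change: Tuple[str, ...]) -> Resource:
    """Return a copy of the resource with one change applied."""
    kind = change[0]
    if kind == "setProperty":
        _, key, value = change
        return replace(resource, properties={**resource.properties, key: value})
    if kind == "removeProperty":
        kept = {k: v for k, v in resource.properties.items() if k != change[1]}
        return replace(resource, properties=kept)
    if kind == "setDescription":
        return replace(resource, description=change[1])
    if kind == "setUri":
        return replace(resource, uri=change[1])
    raise ValueError(f"unknown change: {kind}")


class InMemoryCatalog:
    """Toy catalog keyed by (database, resource name)."""

    def __init__(self) -> None:
        self._resources: Dict[Tuple[str, str], Resource] = {}

    def create_resource(self, database: str, resource: Resource,
                        ignore_if_exists: bool = False) -> None:
        key = (database, resource.name)
        if key in self._resources:
            if ignore_if_exists:
                return
            raise ResourceAlreadyExistException(f"{database}.{resource.name}")
        self._resources[key] = resource

    def get_resource(self, database: str, name: str) -> Resource:
        try:
            return self._resources[(database, name)]
        except KeyError:
            raise ResourceNotExistException(f"{database}.{name}") from None

    def alter_resource(self, database: str, name: str,
                       changes: List[Tuple[str, ...]]) -> None:
        updated = self.get_resource(database, name)
        for change in changes:
            updated = apply_change(updated, change)
        self._resources[(database, name)] = updated

    def drop_resource(self, database: str, name: str,
                      ignore_if_not_exists: bool = False) -> None:
        removed = self._resources.pop((database, name), None)
        if removed is None and not ignore_if_not_exists:
            raise ResourceNotExistException(f"{database}.{name}")

    def list_resources(self, database: str) -> List[str]:
        return sorted(n for (db, n) in self._resources if db == database)
```

Applying changes to an immutable copy and storing the result only at the end keeps a failed change list from leaving a half-altered resource behind.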

Tests (paimon-core)

  • RESTApiJsonTest — JSON serialization/deserialization tests for resource request/response classes
  • RESTCatalogTest — integration tests for resource CRUD operations
  • RESTCatalogServer — mock REST server with resource management route handlers
  • MockRESTMessage — test helper methods for constructing resource test data

API Summary

| Operation | Method | Endpoint |
| --- | --- | --- |
| List resources | GET | /v1/{prefix}/databases/{db}/resources |
| List resource details | GET | /v1/{prefix}/databases/{db}/resource-details |
| Get resource | GET | /v1/{prefix}/databases/{db}/resources/{name} |
| Create resource | POST | /v1/{prefix}/databases/{db}/resources |
| Drop resource | DELETE | /v1/{prefix}/databases/{db}/resources/{name} |
| Alter resource | POST | /v1/{prefix}/databases/{db}/resources/{name} |
| List resources globally | GET | /v1/{prefix}/resources |
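
The endpoints in the API summary can be produced by small path builders. The sketch below is in Python rather than the Java of ResourcePaths, and the function names, the `ROUTES` table, and the percent-encoding of each path segment are assumptions for illustration only.

```python
from urllib.parse import quote


def _segment(value: str) -> str:
    """Percent-encode one path segment (the encoding is an assumption)."""
    return quote(value, safe="")


def resources_path(prefix: str, database: str) -> str:
    return f"/v1/{_segment(prefix)}/databases/{_segment(database)}/resources"


def resource_details_path(prefix: str, database: str) -> str:
    return f"/v1/{_segment(prefix)}/databases/{_segment(database)}/resource-details"


def resource_path(prefix: str, database: str, name: str) -> str:
    return f"{resources_path(prefix, database)}/{_segment(name)}"


def global_resources_path(prefix: str) -> str:
    return f"/v1/{_segment(prefix)}/resources"


# Operation -> (HTTP method, path builder), mirroring the summary table.
# Get, drop, and alter share one path and differ only in the HTTP method.
ROUTES = {
    "list": ("GET", resources_path),
    "list-details": ("GET", resource_details_path),
    "get": ("GET", resource_path),
    "create": ("POST", resources_path),
    "drop": ("DELETE", resource_path),
    "alter": ("POST", resource_path),
    "list-globally": ("GET", global_resources_path),
}
```

For example, `ROUTES["drop"]` yields the `DELETE` method together with `resource_path`, which a client would call with the prefix, database, and resource name.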

Tests

mvn -pl paimon-core -am -Pfast-build -DfailIfNoTests=false -Dtest="RESTApiJsonTest,RESTCatalogTest" test

@TheR1sing3un

Member

This is a very surprising PR. Is there a relevant PIP that provides more background?

@gavin9402

Contributor Author

> This is a very surprising PR. Is there a relevant PIP that provides more background?

Thank you for your suggestion. I will submit the PIP as soon as possible.

@JingsongLi

Contributor

For the first PR, I think you can focus on introducing Resource.

@gavin9402 force-pushed the resource-management branch from 57a260d to 81f5235 (Compare) June 10, 2026 02:03
@gavin9402

Contributor Author

> For the first PR, I think you can focus on introducing Resource.

Sure, let me revise it.

@gavin9402 force-pushed the resource-management branch from 81f5235 to 25c4603 (Compare) June 10, 2026 02:50
@gavin9402 changed the title from "[core] Introduce resource && ML model management" to "[core] Introduce file resource management" Jun 10, 2026
@gavin9402 requested a review from JingsongLi June 10, 2026 09:44

private final Identifier identifier;
@Nullable private final String comment;
private final String uri;

Contributor

How is this URI used? How do we get a REST token for this file?

Contributor Author

The current design delegates URI handling to the file systems integrated with the engine. The metastore service is only responsible for permission management of the resource entity.

For example, in Daft, we assign the URI to the resources field of the PyFileResourceFunction instance.

    def _get_function(self, ident: Identifier) -> Function:
        ...
        paimon_daft_func: FunctionDefinition = self._inner.get_function(str(ident)).definitions()["daft"]
        ...

        # file_resources may be a list attribute or a callable method
        raw_resources = paimon_daft_func.file_resources
        resources = raw_resources() if callable(raw_resources) else raw_resources

        return PyFileResourceFunction(
            identifier=ident,
            module_name=paimon_daft_func.class_name,
            binding_name=paimon_daft_func.function_name,
            resources=[item.uri for item in resources],
        )

During execution, the engine resolves each URI by fetching the resource through the corresponding file system, which also handles file permissions.

    async def run_plan(
        self,
        plan: LocalPhysicalPlan,
        exec_cfg: PyDaftExecutionConfig,
        context: dict[str, str] | None,
        added_resources: dict[str, int] | None = None,
        **inputs: (
            Input | list[ray.ObjectRef]
        ),  # PyMicroPartitions are separated from Inputs because they are Ray ObjectRefs, which will be resolved by Ray.
    ) -> AsyncGenerator[MicroPartition | FlightPartitions | SwordfishTaskMetadata, None]:
        """Run a plan on swordfish and yield partitions."""
        if added_resources:
            file_resource_manager.resolve(added_resources)

A more direct option would be to use Paimon FileIO for this. In fact, the engine's behavior is similar.

Contributor

For our REST Catalog, file system permissions should be managed by the Catalog, so it feels like Resource also needs to expose FileIO here.

Contributor Author

Your suggestion is absolutely right. I moved Resource to the paimon-core module and implemented the toBytes and newInputStream methods for it.

However, this has a small side effect: if Function needs to use Resource in the future, Function would also have to move into the paimon-core module.
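
A minimal Python sketch of the idea, not the PR's Java implementation: a resource that reads its content through a FileIO supplied by the catalog, so that access control stays on the catalog side. The `FileIO` protocol, the `LocalFileIO` class, and the snake_case names `new_input_stream` and `to_bytes` are hypothetical stand-ins for the Java `newInputStream` and `toBytes` methods mentioned above.

```python
from typing import BinaryIO, Protocol


class FileIO(Protocol):
    """Anything that can open a URI for reading (hypothetical interface)."""

    def new_input_stream(self, uri: str) -> BinaryIO: ...


class LocalFileIO:
    """Toy FileIO that only understands local paths and file:// URIs."""

    def new_input_stream(self, uri: str) -> BinaryIO:
        path = uri[len("file://"):] if uri.startswith("file://") else uri
        return open(path, "rb")


class Resource:
    """A resource that delegates all reads to the FileIO it was given."""

    def __init__(self, name: str, uri: str, file_io: FileIO) -> None:
        self.name = name
        self.uri = uri
        self._file_io = file_io

    def new_input_stream(self) -> BinaryIO:
        # The caller is responsible for closing the returned stream.
        return self._file_io.new_input_stream(self.uri)

    def to_bytes(self) -> bytes:
        # Read the whole resource into memory and close the stream.
        with self.new_input_stream() as stream:
            return stream.read()
```

Because the resource never opens the URI itself, swapping `LocalFileIO` for a catalog-issued, token-aware FileIO would change how reads are authorized without changing callers.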

@gavin9402 requested a review from JingsongLi June 13, 2026 09:01

3 participants