Point-In-Context: Understanding Point Cloud via In-Context Learning

Mengyuan Liu1, Zhongbin Fang2, Xia Li3, Joachim M. Buhmann3, Xiangtai Li4, Chen Change Loy4
1National Key Laboratory of General Artificial Intelligence, Shenzhen Graduate School, Peking University
2Sun Yat-sen University  3Department of Computer Science, ETH Zurich  4S-Lab, Nanyang Technological University 

Our work is the first to explore in-context learning for 3D point cloud understanding.


This work is the extended version of our conference paper: Explore In-Context Learning for 3D Point Cloud Understanding, in NeurIPS (Spotlight), 2023. This extension adds the following contributions toward a fuller exploration of in-context learning for 3D point clouds:

  • Within the 3D in-context learning framework, we further propose Point-In-Context-Segmenter (PIC-S). In particular, we propose In-Context Labeling and In-Context Enhancing to improve both performance and generalization capability in 3D point cloud part segmentation tasks. Furthermore, PIC-S can seamlessly integrate additional segmentation datasets without redundant label points.
  • We establish the Human & Object Segmentation benchmark, which comprises four available point cloud datasets on human and object segmentation, including ShapeNetPart, Human3D, BEHAVE, and AKB-48. Our goal is to fully evaluate the performance of models trained jointly on multiple segmentation datasets, as well as their generalization on unseen datasets.
  • We conduct extensive experiments on the Human & Object Segmentation benchmark to validate our PIC-S model. Compared to other models, PIC-S achieves state-of-the-art (SOTA) performance. Furthermore, we show that PIC-S produces strong results on out-of-domain part segmentation datasets, which makes our work more applicable in real-world scenarios.
    Abstract

    With the rise of large-scale models trained on diverse datasets, in-context learning has emerged as a promising paradigm for multitasking, notably in natural language processing and image processing. However, its application to 3D point cloud tasks remains largely unexplored. In this work, we introduce Point-In-Context (PIC), a novel framework for 3D point cloud understanding via in-context learning. We address the technical challenge of effectively extending masked point modeling to 3D point clouds by introducing a Joint Sampling module, and we propose a vanilla version of PIC called Point-In-Context-Generalist (PIC-G). PIC-G is designed as a generalist model for various 3D point cloud tasks, with both inputs and outputs modeled as coordinates. In this paradigm, the challenging segmentation task is handled by assigning a coordinate to each category; for each predicted point, the category whose assigned coordinate lies closest to the prediction is chosen as the final label.
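The nearest-coordinate decoding described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `decode_part_labels`, the three label coordinates, and the sample predictions are all hypothetical.

```python
import numpy as np

def decode_part_labels(pred_points, label_points):
    """Map each predicted point to the index of its nearest label coordinate.

    pred_points:  (N, 3) coordinates produced by the model.
    label_points: (C, 3) one coordinate per part category (assumed values).
    """
    # Pairwise Euclidean distances between predictions and label coordinates.
    diffs = pred_points[:, None, :] - label_points[None, :, :]  # (N, C, 3)
    dists = np.linalg.norm(diffs, axis=-1)                      # (N, C)
    return dists.argmin(axis=1)                                 # (N,)

# Hypothetical label coordinates for three part categories.
label_points = np.array([[0.0, 0.0, 0.0],
                         [1.0, 0.0, 0.0],
                         [0.0, 1.0, 0.0]])
# Hypothetical model outputs that land near, but not exactly on, the labels.
pred = np.array([[0.10, -0.05, 0.0],
                 [0.90, 0.10, 0.0],
                 [0.05, 0.80, 0.1]])
labels = decode_part_labels(pred, label_points)
print(labels)
```

Because the output space is plain coordinates, segmentation needs no classification head in this sketch; the class decision is a post-processing step on the regressed points.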

    The fixed label-coordinate assignment generalizes poorly to novel classes. To overcome this limitation, we propose two novel training strategies, In-Context Labeling and In-Context Enhancing, which form an extended version of PIC named Point-In-Context-Segmenter (PIC-S) and aim to improve dynamic context labeling and model training. By utilizing dynamic in-context labels and extra in-context pairs, PIC-S achieves enhanced performance and generalization capability within and across part segmentation datasets. PIC is a general framework, so other tasks or datasets can be seamlessly introduced through a unified data format. We conduct extensive experiments to validate the versatility and adaptability of our proposed methods in handling a wide range of tasks and segmenting multiple datasets. PIC-S is especially capable of generalizing to unseen datasets and performing novel part segmentation through customized prompts.
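One way to picture dynamic in-context labels, as opposed to a fixed label-coordinate table, is sketched below. The details are assumptions made for illustration: the label bank, the per-sample random draw, and the function name `make_incontext_targets` are hypothetical and may differ from the actual In-Context Labeling strategy.

```python
import numpy as np

def make_incontext_targets(prompt_part_ids, query_part_ids, label_bank, rng):
    """Draw a fresh part-to-coordinate mapping for one prompt/query sample.

    prompt_part_ids, query_part_ids: (N,) integer part id per point.
    label_bank: (B, 3) candidate label coordinates (assumed, hypothetical).
    """
    parts = np.union1d(prompt_part_ids, query_part_ids)
    # Assumption: each part receives a distinct, randomly drawn bank entry.
    chosen = rng.choice(len(label_bank), size=len(parts), replace=False)
    part_to_coord = {int(p): label_bank[i] for p, i in zip(parts, chosen)}
    # Prompt and query share the mapping, so the prompt defines the labels.
    prompt_target = np.stack([part_to_coord[int(p)] for p in prompt_part_ids])
    query_target = np.stack([part_to_coord[int(p)] for p in query_part_ids])
    return prompt_target, query_target, part_to_coord

rng = np.random.default_rng(0)
label_bank = rng.uniform(-1.0, 1.0, size=(8, 3))  # hypothetical bank
prompt_ids = np.array([0, 0, 1, 2])
query_ids = np.array([2, 1, 1, 0])
prompt_tgt, query_tgt, mapping = make_incontext_targets(
    prompt_ids, query_ids, label_bank, rng)
```

Under this assumption the model cannot memorize which coordinate means which part; it has to read the assignment from the prompt pair, which is the property that would let a customized prompt define a novel part.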

    Point-In-Context-Generalist (PIC-G)

    Overall scheme of our Point-In-Context-Generalist. Top: Training pipeline of the Masked Point Modeling (MPM) framework. During training, each sample comprises two pairs of input and target point clouds that tackle the same task. These pairs are randomly masked and fed into the transformer model, which performs the masked point reconstruction task. Bottom: In-context inference on multiple tasks. Point-In-Context can infer results on various downstream point cloud tasks, including reconstruction, denoising, registration, and part segmentation.
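The sample layout in the caption above can be sketched as a patch sequence plus a boolean mask. This is a simplified sketch under stated assumptions: patches are taken as already extracted, masking is restricted to target patches, and the names `build_training_sample`, `build_inference_mask`, and the `mask_ratio` value are hypothetical.

```python
import numpy as np

def build_training_sample(prompt_in, prompt_tgt, query_in, query_tgt,
                          mask_ratio, rng):
    """Stack two input/target pairs and randomly mask some target patches.

    Each point cloud argument is a (G, K, 3) array: G patches of K points.
    """
    seq = np.concatenate([prompt_in, prompt_tgt, query_in, query_tgt], axis=0)
    g = prompt_in.shape[0]
    # Assumption: only target patches are eligible for masking.
    target_idx = np.concatenate([np.arange(g, 2 * g), np.arange(3 * g, 4 * g)])
    n_mask = int(round(mask_ratio * len(target_idx)))
    masked_idx = rng.choice(target_idx, size=n_mask, replace=False)
    mask = np.zeros(4 * g, dtype=bool)
    mask[masked_idx] = True  # True marks a patch the model must reconstruct
    return seq, mask

def build_inference_mask(g):
    """At inference the whole query target is unknown, so mask all of it."""
    mask = np.zeros(4 * g, dtype=bool)
    mask[3 * g:] = True
    return mask

rng = np.random.default_rng(0)
G, K = 4, 8
clouds = [rng.normal(size=(G, K, 3)) for _ in range(4)]
seq, train_mask = build_training_sample(*clouds, mask_ratio=0.5, rng=rng)
infer_mask = build_inference_mask(G)
```

The same model and sequence layout serve both phases in this sketch; only the mask changes between random target patches in training and the full query target at inference.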


    In-context learning for 3D understanding
    • The first work to explore the application of in-context learning in the 3D domain.
    • A new framework that tackles four tasks, all unified into the same input-output space.
    • Selecting higher-quality prompts further improves the performance of Point-In-Context (Sep & Cat).
    New benchmark for 3D point cloud multi-tasking
    • A new multi-task benchmark for evaluating the capability of processing multiple tasks, including reconstruction, denoising, registration, and part segmentation.
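To illustrate what a shared coordinate-to-coordinate format means, the sketch below builds an (input, target) pair for two of the tasks. The corruption models are assumptions chosen for brevity (Gaussian jitter for denoising, a rotation about the z-axis for registration) and may differ from how the benchmark actually generates its data.

```python
import numpy as np

def make_denoising_pair(points, noise_std, rng):
    """Input: jittered cloud. Target: clean cloud. Both are (N, 3)."""
    noisy = points + rng.normal(0.0, noise_std, size=points.shape)
    return noisy, points

def make_registration_pair(points, angle_rad):
    """Input: cloud rotated about z. Target: cloud in its original pose."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    rot_z = np.array([[c, -s, 0.0],
                      [s, c, 0.0],
                      [0.0, 0.0, 1.0]])
    return points @ rot_z.T, points

rng = np.random.default_rng(0)
cloud = rng.uniform(-1.0, 1.0, size=(128, 3))  # hypothetical object
noisy_in, clean_tgt = make_denoising_pair(cloud, noise_std=0.02, rng=rng)
rotated_in, upright_tgt = make_registration_pair(cloud, np.pi / 2)
```

Since every pair has the same shape and type on both sides, one model can consume any of these tasks without task-specific heads.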
    Impressive performance and strong generalization capability
    • Surpasses classical models (PointNet, DGCNN, PCT, PointMAE), which are equipped with multi-task heads.
    • Surpasses even task-specific models (siMLPe, EqMotion, STCFormer, GLA-GCN, MotionBERT) on some tasks.
    • Surpasses even task-specific models (PointNet, DGCNN, PCT) on registration when given higher-quality prompts.
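The text above does not specify how higher-quality prompts are chosen. As one plausible illustration only, the sketch below picks the candidate whose input is closest to the query under a symmetric Chamfer distance; the criterion and the names `chamfer_distance` and `select_prompt` are assumptions.

```python
import numpy as np

def chamfer_distance(a, b):
    """Symmetric Chamfer distance between point sets a (Na, 3) and b (Nb, 3)."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)  # (Na, Nb)
    return d.min(axis=1).mean() + d.min(axis=0).mean()

def select_prompt(query_input, candidate_inputs):
    """Return the index of the candidate most similar to the query input."""
    scores = [chamfer_distance(query_input, c) for c in candidate_inputs]
    return int(np.argmin(scores))

rng = np.random.default_rng(0)
query = rng.uniform(-1.0, 1.0, size=(64, 3))
candidates = [
    query + 5.0,   # same shape, far away
    query + 0.01,  # nearly identical to the query
    query * 3.0,   # same shape at a different scale
]
best = select_prompt(query, candidates)
print(best)
```

The brute-force distance matrix costs O(Na·Nb) memory, which is fine for small clouds; a spatial index would be the usual substitute for larger ones.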


    Visualization of PIC-G
    Visualization of predictions from PIC-G-Sep on ShapeNet In-Context Datasets for different tasks, including reconstruction, denoising, registration, and part segmentation. For part segmentation, we visualize the generated target together with its mapping back to part labels, both colored by category for clarity.
    Comparison between PIC-G and multitask models on reconstruction (rows 1-2), denoising (rows 3-4), and registration (rows 5-6). Our models generate more accurate predictions than the other multitask models.

    Point-In-Context-Segmenter (PIC-S)


    New benchmark for 3D point cloud multi-dataset part segmentation
    • A new multi-dataset joint training benchmark comprising four available point cloud datasets on human and object segmentation, including ShapeNetPart, Human3D, BEHAVE, and AKB-48.
    Superior segmentation performance and stronger generalization capability
    • PIC-S achieves SOTA results on the multi-dataset segmentation benchmark. Compared to PIC-G, PIC-S is more adept at integrating multiple datasets in segmentation tasks.
    • PIC-S achieves SOTA results in one-shot testing on an out-of-domain dataset (AKB-48), which is not included in the training set.
    • PIC-S can accurately generate unique part segmentation results via customized prompts.


    Visualization of PIC-S
    Visualization of predictions obtained by PIC-S-Sep and their corresponding ground truth on Human & Object Segmentation In-Context Datasets.

    Comparison with PIC-G
    Comparison between the two versions of PIC: PIC-S (extended version) and PIC-G (vanilla version).

    Generalization of PIC-S
    We use customized prompts to guide the model to perform a specified part segmentation. The red boxes indicate the output of PIC-S.


    If you find our work useful in your research, please consider citing:
            title={Point-In-Context: Understanding Point Cloud via In-Context Learning},
            author={Liu, Mengyuan and Fang, Zhongbin and Li, Xia and Buhmann, Joachim M and Li, Xiangtai and Loy, Chen Change},
            journal={arXiv preprint arXiv:2401.08210},
            year={2024}

            title={Explore in-context learning for 3d point cloud understanding},
            author={Fang, Zhongbin and Li, Xiangtai and Li, Xia and Buhmann, Joachim M and Loy, Chen Change and Liu, Mengyuan},
            journal={Advances in Neural Information Processing Systems},
            year={2023}