Mask R-CNN

We present a conceptually simple, flexible, and general framework for object instance segmentation. Our approach efficiently detects objects in an image while simultaneously generating a high-quality segmentation mask for each instance. The method, called Mask R-CNN, extends Faster R-CNN by adding a branch for predicting an object mask in parallel with the existing branch for bounding box recognition. Mask R-CNN is simple to train and adds only a small overhead to Faster R-CNN, running at 5 fps. Moreover, Mask R-CNN is easy to generalize to other tasks, e.g., allowing us to estimate human poses in the same framework. We show top results in all three tracks of the COCO suite of challenges, including instance segmentation, bounding-box object detection, and person keypoint detection. Without tricks, Mask R-CNN outperforms all existing, single-model entries on every task, including the COCO 2016 challenge winners. We hope our simple and effective approach will serve as a solid baseline and help ease future research in instance-level recognition. Code will be made available. Predicting Deeper into the Future of Semantic Segmentation

We develop an autoregressive convolutional neural network that learns to iteratively generate multiple frames.

We explored five different models for this task relying on RGB and/or segmentations from previous frames. For prediction beyond a single future frame, we considered batch models that predict all future frames at once, and autoregressive models that sequentially predict the future frames. We found that autoregressive training produces the best results for our problem, and that models predicting in the segmentation space work better than those relying on the RGB frames. Pixel-Level Matching for Video Object Segmentation using Convolutional Neural Networks

he proposed network represents a target object using features from different depth layers in order to take advantage of both the spatial details and the category-level semantic information. Furthermore, we propose a feature compression technique that drastically reduces the memory requirements while maintaining the capability of feature representation. Deeply supervised salient object detection with short connections

In this paper, we propose a new method for saliency detection by introducing short connections to the skip-layer structures within the HED architecture. Our framework provides rich multi-scale feature maps at each layer, a property that is critically needed to perform segment detection. Learning Hierarchical Semantic Image Manipulation through Structured Representations

In this paper, we presented a hierarchical framework for semantic image manipulation. We first learn to generate the pixel-wise semantic label maps given the initial object bounding boxes. Then we learn to generate the manipulated image from the predicted label maps. Such framework allows the user to manipulate images at object-level by adding, removing, and moving an object bounding box at a time. Experimental evaluations demonstrate the advantages of the hierarchical manipulation framework over existing image generation and context hole-filing models, both qualitatively and quantitatively. We further demonstrate its practical benefits in semantic object manipulation, interactive image editing and data-driven image editing. Future research directions include preserving the object identity and providing affordance as additional user input during image manipulation.

Data Augmentation Using GAN, Semantic segmentation And inpainting

Semantic Soft segmentation - Technique + semantic segmentation.