mmflow.apis¶
mmflow.core¶
evaluation¶
- class mmflow.core.evaluation.DistEvalHook(dataloader: torch.utils.data.dataloader.DataLoader, interval: int = 1, tmpdir: Optional[str] = None, gpu_collect: bool = False, by_epoch: bool = False, dataset_name: Optional[Union[str, Sequence[str]]] = None, **eval_kwargs: Any)[source]¶
Distributed evaluation hook.
- Parameters
dataloader (DataLoader) – A PyTorch dataloader.
interval (int) – Evaluation interval (by epochs). Default: 1.
tmpdir (str | None) – Temporary directory to save the results of all processes. Default: None.
gpu_collect (bool) – Whether to use gpu or cpu to collect results. Default: False.
by_epoch (bool) – Whether to perform evaluation by epoch or by iteration. If True, evaluation runs by epoch; otherwise, by iteration. Default: False.
dataset_name (str, list, optional) – The name of the dataset this evaluation hook will evaluate.
eval_kwargs (any) – Evaluation arguments fed into the evaluate function of the dataset.
- class mmflow.core.evaluation.EvalHook(dataloader: torch.utils.data.dataloader.DataLoader, interval: int = 1, by_epoch: bool = False, dataset_name: Optional[Union[str, Sequence[str]]] = None, **eval_kwargs: Any)[source]¶
Evaluation hook.
- Parameters
dataloader (DataLoader) – A PyTorch dataloader.
interval (int) – Evaluation interval (by epochs). Default: 1.
by_epoch (bool) – Whether to perform evaluation by epoch or by iteration. If True, evaluation runs by epoch; otherwise, by iteration. Default: False.
dataset_name (str, list, optional) – The name of the dataset this evaluation hook will evaluate.
eval_kwargs (any) – Evaluation arguments fed into the evaluate function of the dataset.
- after_train_epoch(runner: mmcv.runner.iter_based_runner.IterBasedRunner) → None[source]¶
After train epoch.
- mmflow.core.evaluation.end_point_error(flow_pred: Sequence[numpy.ndarray], flow_gt: Sequence[numpy.ndarray], valid_gt: Sequence[numpy.ndarray]) → float[source]¶
Calculate end point errors between prediction and ground truth.
- Parameters
flow_pred (list) – List of flow maps predicted by the flow estimator, each with the shape (H, W, 2).
flow_gt (list) – List of ground truth flow maps, each with the shape (H, W, 2).
valid_gt (list) – List of valid masks for the ground truth, each with the shape (H, W).
- Returns
end point error for output.
- Return type
float
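As a rough illustration of the metric, the sketch below computes an average end point error with NumPy. It is not the MMFlow implementation: the name `epe_sketch`, the `valid >= 0.5` threshold, and averaging over the valid pixels of all images at once are assumptions made for this example.

```python
import numpy as np


def epe_sketch(flow_pred, flow_gt, valid_gt):
    """Mean end point error over the valid pixels of all flow maps (illustrative)."""
    errors = []
    for pred, gt, valid in zip(flow_pred, flow_gt, valid_gt):
        # Per-pixel Euclidean distance between predicted and true flow vectors.
        epe_map = np.sqrt(np.sum((pred - gt) ** 2, axis=-1))
        # Keep only pixels marked valid (assumed threshold: 0.5).
        errors.append(epe_map[valid >= 0.5])
    return float(np.concatenate(errors).mean())


gt = np.zeros((2, 2, 2), dtype=np.float32)
pred = gt + np.array([3.0, 4.0], dtype=np.float32)  # every pixel is off by (3, 4)
valid = np.ones((2, 2), dtype=np.float32)
print(epe_sketch([pred], [gt], [valid]))  # 5.0
```

Each pixel contributes the length of its error vector, so an error of (3, 4) at every pixel gives an average of 5.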
- mmflow.core.evaluation.end_point_error_map(flow_pred: numpy.ndarray, flow_gt: numpy.ndarray) → numpy.ndarray[source]¶
Calculate end point error map.
- Parameters
flow_pred (ndarray) – The predicted optical flow with the shape (H, W, 2).
flow_gt (ndarray) – The ground truth of optical flow with the shape (H, W, 2).
- Returns
End point error map with the shape (H, W).
- Return type
ndarray
- mmflow.core.evaluation.eval_metrics(results: Sequence[numpy.ndarray], flow_gt: Sequence[numpy.ndarray], valid_gt: Sequence[numpy.ndarray], metrics: Union[Sequence[str], str] = ['EPE']) → Dict[str, numpy.ndarray][source]¶
Calculate evaluation metrics.
- Parameters
results (list) – List of predicted flow maps.
flow_gt (list) – List of ground truth flow maps.
valid_gt (list) – List of valid masks for the ground truth.
metrics (list, str) – Metrics to be evaluated. Defaults to [‘EPE’], end-point error.
- Returns
metrics and their values.
- Return type
dict
- mmflow.core.evaluation.multi_gpu_online_evaluation(model: torch.nn.modules.module.Module, data_loader: torch.utils.data.dataloader.DataLoader, metric: Union[str, Sequence[str]] = 'EPE', tmpdir: Optional[str] = None, gpu_collect: bool = False) → Dict[str, numpy.ndarray][source]¶
Evaluate model with multiple gpus online.
This function does not save the flow, so it performs no IO operations. As a result, online mode generally evaluates faster. However, when using this function, the img_metas must include the ground truth, e.g. flow_gt, or flow_fw_gt and flow_bw_gt.
- Parameters
model (nn.Module) – The optical flow estimator model.
data_loader (DataLoader) – The test dataloader.
metric (str, list) – Metrics to be evaluated. Default: ‘EPE’.
tmpdir (str) – Path of directory to save the temporary results from different gpus under cpu mode.
gpu_collect (bool) – Option to use either gpu or cpu to collect results.
- Returns
The evaluation result.
- Return type
dict
- mmflow.core.evaluation.online_evaluation(model: torch.nn.modules.module.Module, data_loader: torch.utils.data.dataloader.DataLoader, metric: Union[str, Sequence[str]] = 'EPE', **kwargs: Any) → Dict[str, numpy.ndarray][source]¶
Evaluate model online.
- Parameters
model (nn.Module) – The optical flow estimator model.
data_loader (DataLoader) – The test dataloader.
metric (str, list) – Metrics to be evaluated. Default: ‘EPE’.
kwargs (any) – Evaluation arguments fed into the evaluate function of the dataset.
- Returns
The evaluation result.
- Return type
dict
- mmflow.core.evaluation.optical_flow_outliers(flow_pred: Sequence[numpy.ndarray], flow_gt: Sequence[numpy.ndarray], valid_gt: Sequence[numpy.ndarray]) → float[source]¶
Calculate percentage of optical flow outliers for KITTI dataset.
- Parameters
flow_pred (list) – List of flow maps predicted by the flow estimator, each with the shape (H, W, 2).
flow_gt (list) – List of ground truth flow maps, each with the shape (H, W, 2).
valid_gt (list) – List of valid masks for the ground truth, each with the shape (H, W).
- Returns
optical flow outliers for output.
- Return type
float
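The sketch below shows one way such an outlier percentage can be computed. It assumes the commonly cited KITTI 2015 convention, in which a valid pixel is an outlier when its end point error exceeds both 3 pixels and 5% of the ground truth magnitude. The function name, the threshold arguments and the `valid >= 0.5` mask are illustrative and not taken from MMFlow.

```python
import numpy as np


def fl_outliers_sketch(flow_pred, flow_gt, valid_gt,
                       abs_thresh=3.0, rel_thresh=0.05):
    """Percentage of valid pixels counted as outliers (illustrative)."""
    num_outliers = 0
    num_valid = 0
    for pred, gt, valid in zip(flow_pred, flow_gt, valid_gt):
        epe = np.linalg.norm(pred - gt, axis=-1)   # per-pixel end point error
        magnitude = np.linalg.norm(gt, axis=-1)    # ground truth flow magnitude
        mask = valid >= 0.5
        # Outlier: error above abs_thresh AND above rel_thresh of the magnitude.
        outlier = (epe > abs_thresh) & (epe > rel_thresh * magnitude)
        num_outliers += int(outlier[mask].sum())
        num_valid += int(mask.sum())
    return 100.0 * num_outliers / max(num_valid, 1)


gt = np.zeros((1, 4, 2), dtype=np.float32)
gt[..., 0] = 10.0
pred = gt.copy()
pred[0, 0, 0] += 4.0  # error of 4 px: outlier
pred[0, 1, 0] += 2.0  # error of 2 px: inlier
valid = np.ones((1, 4), dtype=np.float32)
print(fl_outliers_sketch([pred], [gt], [valid]))  # 25.0
```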
- mmflow.core.evaluation.single_gpu_online_evaluation(model: torch.nn.modules.module.Module, data_loader: torch.utils.data.dataloader.DataLoader, metric: Union[str, Sequence[str]] = 'EPE') → Dict[str, numpy.ndarray][source]¶
Evaluate model with single gpu online.
This function does not save the flow, so it performs no IO operations. As a result, online mode generally evaluates faster. However, when using this function, the img_metas must include the ground truth, e.g. flow_gt, or flow_fw_gt and flow_bw_gt.
- Parameters
model (nn.Module) – The optical flow estimator model.
data_loader (DataLoader) – The test dataloader.
metric (str, list) – Metrics to be evaluated. Default: ‘EPE’.
- Returns
The evaluation result.
- Return type
dict
hooks¶
- class mmflow.core.hooks.LiteFlowNetStageLoadHook(src_level: str, dst_level: str)[source]¶
Stage loading hook for LiteFlowNet.
This hook loads the weights trained in the previous stage into the stage added in the current training.
- Parameters
src_level (str) – The source level to be loaded.
dst_level (str) – The level that will load the weights.
- class mmflow.core.hooks.MultiStageLrUpdaterHook(milestone_lrs: Sequence[float], milestone_iters: Sequence[int], steps: Sequence[Sequence[int]], gammas: Sequence[float], **kwargs: Any)[source]¶
Multi-Stage Learning Rate Hook.
- Parameters
milestone_lrs (Sequence[float]) – The base LR for multi-stages.
milestone_iters (Sequence[int]) – The first iterations in different stages.
steps (Sequence[Sequence[int]]) – The steps to decay the LR in stages.
gammas (Sequence[float]) – The list of decay LR ratios.
kwargs (any) – The arguments of LrUpdaterHook.
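To make the interplay of the four arguments concrete, here is a minimal sketch of how a multi-stage schedule could be evaluated. It assumes that `steps` holds absolute iteration numbers, that each stage restarts from its own base LR, and that the LR is multiplied by the stage's gamma at every step reached. These semantics and the function name `multi_stage_lr_sketch` are assumptions for illustration, not the hook's confirmed behavior.

```python
from bisect import bisect_right


def multi_stage_lr_sketch(cur_iter, milestone_lrs, milestone_iters, steps, gammas):
    """Learning rate at cur_iter under the assumed multi-stage schedule."""
    # Stage = index of the last milestone that has already started.
    stage = bisect_right(milestone_iters, cur_iter) - 1
    if stage < 0:
        raise ValueError('cur_iter precedes the first milestone')
    # Number of decay steps of this stage that have been reached.
    num_decays = bisect_right(steps[stage], cur_iter)
    return milestone_lrs[stage] * gammas[stage] ** num_decays


schedule = dict(
    milestone_lrs=[1e-4, 5e-5],
    milestone_iters=[0, 100],
    steps=[[50, 80], [150]],
    gammas=[0.5, 0.5],
)
for it in (0, 50, 80, 100, 150):
    print(it, multi_stage_lr_sketch(it, **schedule))
```

With this example schedule, the LR halves at iterations 50 and 80, resets to the second base LR at iteration 100, and halves again at iteration 150.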
mmflow.datasets¶
datasets¶
- class mmflow.datasets.Collect(keys: collections.abc.Sequence, meta_keys: collections.abc.Sequence = ('filename1', 'filename2', 'ori_filename1', 'ori_filename2', 'filename_flow', 'ori_filename_flow', 'ori_shape', 'img_shape', 'pad_shape', 'scale_factor', 'flip', 'flip_direction', 'img_norm_cfg'))[source]¶
Collect data from the loader relevant to the specific task.
This is usually the last stage of the data loader pipeline. Typically keys is set to some subset of “img”, “flow_gt”.
The “img_meta” item is always populated. The contents of the “img_meta” dictionary depend on “meta_keys”. By default this includes:
- “img_shape”: shape of the image input to the network as a tuple (h, w, c). Note that images may be zero padded on the bottom/right if the batch tensor is larger than this shape.
- “scale_factor”: a float indicating the preprocessing scale
- “flip”: a boolean indicating if image flip transform was used
- “filename1”: path to the image1 file
- “filename2”: path to the image2 file
- “ori_filename1”: image1 file name
- “ori_filename2”: image2 file name
- “ori_shape”: original shape of the image as a tuple (h, w, c)
- “pad_shape”: image shape after padding
- “img_norm_cfg”: a dict of normalization information:
  - mean - per channel mean subtraction
  - std - per channel std divisor
  - to_rgb - bool indicating if bgr was converted to rgb
- Parameters
keys (Sequence[str]) – Keys of results to be collected in data.
meta_keys (Sequence[str], optional) – Meta keys to be converted to mmcv.DataContainer and collected in data[img_metas]. Default: ('filename1', 'filename2', 'ori_filename1', 'ori_filename2', 'ori_shape', 'img_shape', 'pad_shape', 'scale_factor', 'flip', 'flip_direction', 'img_norm_cfg')
- class mmflow.datasets.ColorJitter(asymmetric_prob=0.0, brightness=0.0, contrast=0.0, saturation=0.0, hue=0.0)[source]¶
Randomly change the brightness, contrast, saturation and hue of an image.
- Parameters
asymmetric_prob (float) – The probability of applying color jitter to the two images asymmetrically.
brightness (float, tuple) – How much to jitter brightness. brightness_factor is chosen uniformly from [max(0, 1 - brightness), 1 + brightness] or the given [min, max]. Should be non negative numbers.
contrast (float, tuple) – How much to jitter contrast. contrast_factor is chosen uniformly from [max(0, 1 - contrast), 1 + contrast] or the given [min, max]. Should be non negative numbers.
saturation (float, tuple) – How much to jitter saturation. saturation_factor is chosen uniformly from [max(0, 1 - saturation), 1 + saturation] or the given [min, max]. Should be non negative numbers.
hue (float, tuple) – How much to jitter hue. hue_factor is chosen uniformly from [-hue, hue] or the given [min, max]. Should have 0<= hue <= 0.5 or -0.5 <= min <= max <= 0.5.
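The ranges described above can be derived mechanically from either form of the argument. The sketch below is a simplified illustration of that rule; the helper `jitter_range_sketch` and its `center` and `clip_at_zero` arguments are hypothetical names, not part of MMFlow.

```python
import random


def jitter_range_sketch(value, center=1.0, clip_at_zero=True):
    """Turn a float or a (min, max) pair into a sampling interval (illustrative)."""
    if isinstance(value, (tuple, list)):
        low, high = value  # an explicit [min, max] is used as given
    else:
        low, high = center - value, center + value
        if clip_at_zero:
            low = max(0.0, low)  # factors such as brightness cannot go below 0
    return float(low), float(high)


# Brightness, contrast and saturation are centered on 1 and clipped at 0.
brightness_range = jitter_range_sketch(0.4)
# Hue is centered on 0 and may be negative.
hue_range = jitter_range_sketch(0.1, center=0.0, clip_at_zero=False)

brightness_factor = random.uniform(*brightness_range)
hue_factor = random.uniform(*hue_range)
print(brightness_range, hue_range)
```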
- class mmflow.datasets.Compose(transforms: Sequence)[source]¶
Compose multiple transforms sequentially.
- Parameters
transforms (Sequence[dict | callable]) – Sequence of transform object or config dict to be composed.
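Sequential composition can be sketched in a few lines of plain Python. The example below only handles callables (building transforms from config dicts is left out), and the early return on `None` is an assumed convention under which a transform may discard a sample. `compose_sketch`, `add_one` and `double` are hypothetical names.

```python
def compose_sketch(transforms):
    """Return a callable that applies the given transforms in order."""
    def run(results):
        for transform in transforms:
            results = transform(results)
            if results is None:  # assumed convention: a transform dropped the sample
                return None
        return results
    return run


def add_one(results):
    results['value'] += 1
    return results


def double(results):
    results['value'] *= 2
    return results


pipeline = compose_sketch([add_one, double])
print(pipeline({'value': 3}))  # {'value': 8}
```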
- class mmflow.datasets.ConcatDataset(datasets: Sequence[torch.utils.data.dataset.Dataset], separate_eval: bool = True)[source]¶
A wrapper of concatenated dataset.
Same as torch.utils.data.dataset.ConcatDataset, but also concatenates the group flag for image aspect ratio.
- Parameters
datasets (list[Dataset]) – A list of datasets.
separate_eval (bool) – Whether to evaluate the results separately if it is used as validation dataset. Defaults to True.
- evaluate(results: dict, logger: Optional[Union[str, logging.Logger]] = None, **kwargs: Any)[source]¶
Evaluate the results.
- Parameters
results (list[list | tuple]) – Testing results of the dataset.
logger (logging.Logger | str | None) – Logger used for printing related information during evaluation. Default: None.
- Returns
AP results of the total dataset or each separate dataset if self.separate_eval=True.
- Return type
dict[str, float]
- class mmflow.datasets.DefaultFormatBundle[source]¶
Default formatting bundle.
It simplifies the pipeline of formatting common fields, including “img” and “flow_gt”. These fields are formatted as follows.
img1: (1)transpose, (2)to tensor, (3)to DataContainer (stack=True)
img2: (1)transpose, (2)to tensor, (3)to DataContainer (stack=True)
flow_gt: (1)transpose, (2)to tensor, (3)to DataContainer (stack=True)
- class mmflow.datasets.DistributedSampler(dataset: torch.utils.data.dataset.Dataset, num_replicas: Optional[int] = None, rank: Optional[int] = None, shuffle: bool = True, seed=0)[source]¶
DistributedSampler inheriting from torch.utils.data.DistributedSampler.
This distributed sampler is compatible with PyTorch==1.5, as there is no seed argument in PyTorch==1.5.
- Parameters
dataset (Dataset) – The dataset to be loaded.
num_replicas (int, optional) – Number of processes participating in distributed training. By default, world_size is retrieved from the current distributed group.
rank (int, optional) – Rank of the current process within num_replicas. By default, rank is retrieved from the current distributed group.
shuffle (bool) – If True (default), sampler will shuffle the indices.
seed (int) – Random seed used to shuffle the sampler if shuffle=True. This number should be identical across all processes in the distributed group. Default: 0.
- class mmflow.datasets.Erase(prob: float, bounds: Sequence = [50, 100], max_num: int = 3)[source]¶
Erase transform from RAFT, which randomly erases rectangular regions in img2 to simulate occlusions.
- Parameters
prob (float) – the probability for erase transform.
bounds (list, tuple) – the bounds for erase regions (bound_x, bound_y).
max_num (int) – the max number of erase regions.
- Returns
revised results, ‘img2’ and ‘erase_num’ are added into results.
- Return type
dict
- class mmflow.datasets.FlyingChairs(*args, split_file: str, **kwargs)[source]¶
FlyingChairs dataset.
- Parameters
split_file (str) – File name of train-validation split file for FlyingChairs.
- load_ann_info(filename: Sequence[str], filename_key: str) → None[source]¶
Load information of optical flow.
This function splits the dataset into two subsets, training subset and testing subset.
- Parameters
filename (list) – ordered list of abstract file path of annotation.
filename_key (str) – the annotation e.g. ‘flow’.
- class mmflow.datasets.FlyingChairsOcc(*args, **kwargs)[source]¶
FlyingChairsOcc dataset.
- load_ann_info(filename, filename_key)[source]¶
Load information of optical flow.
This function splits the dataset into two subsets, training subset and testing subset.
- Parameters
filename (list) – ordered list of abstract file path of annotation.
filename_key (str) – the annotation key for FlyingChairsOcc dataset ‘flow_fw’, ‘flow_bw’, ‘occ_fw’, and ‘occ_bw’.
- class mmflow.datasets.FlyingThings3D(*args, direction: Union[str, Sequence[str]] = ['forward', 'backward'], scene: Union[str, Sequence[str]] = 'left', pass_style: str = 'clean', **kwargs)[source]¶
FlyingThings3D dataset.
- Parameters
direction (str) – Direction of flow. The 4 options are ‘forward’, ‘backward’, ‘bidirection’ and [‘forward’, ‘backward’]. Default: [‘forward’, ‘backward’].
scene (list, str) – Scene in FlyingThings3D dataset. Default: ‘left’. This default value is for RAFT: FlyingThings3D is very large and not often used, and only RAFT uses the ‘left’ data in it.
pass_style (str) – Pass style for FlyingThings3D dataset. The 2 options are ‘clean’ and ‘final’. Default: ‘clean’.
- class mmflow.datasets.FlyingThings3DSubset(*args, direction: Union[str, Sequence[str]] = ['forward', 'backward'], scene: Optional[Union[str, Sequence[str]]] = None, **kwargs)[source]¶
FlyingThings3D subset dataset.
- Parameters
direction (str) – Direction of flow. The 4 options are ‘forward’, ‘backward’, ‘bidirection’ and [‘forward’, ‘backward’]. Default: [‘forward’, ‘backward’].
scene (list, str, optional) – Scene in FlyingThings3D dataset. If scene is None, data is collected from all scenes of the FlyingThings3D dataset. Default: None.
- class mmflow.datasets.GaussianNoise(sigma_range=(0, 0.04), clamp_range=(- inf, inf))[source]¶
Add Gaussian Noise to images.
Add Gaussian noise with mean 0 and std sigma, where sigma is uniformly sampled from sigma_range, to images. Then clamp the images to clamp_range.
- Parameters
sigma_range (list(float) | tuple(float)) – Uniformly sample sigma of gaussian noise in sigma_range. Default: (0, 0.04)
clamp_range (list(float) | tuple(float)) – The min and max value to clamp the images after adding gaussian noise. Default: (float(‘-inf’), float(‘inf’)).
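A minimal NumPy sketch of the described behavior follows. It operates on a single image array; whether the real transform draws one sigma for both images or one per image is not stated here, so that detail is left out. The function name and the `rng` argument are assumptions for illustration.

```python
import numpy as np


def gaussian_noise_sketch(img, sigma_range=(0, 0.04),
                          clamp_range=(-np.inf, np.inf), rng=None):
    """Add zero-mean Gaussian noise with a randomly drawn std, then clamp."""
    rng = np.random.default_rng() if rng is None else rng
    # Draw the noise standard deviation uniformly from sigma_range.
    sigma = rng.uniform(*sigma_range)
    noisy = img + rng.normal(0.0, sigma, size=img.shape)
    # Clamp the result to [min, max].
    return np.clip(noisy, *clamp_range)


img = np.full((4, 4, 3), 0.5)
out = gaussian_noise_sketch(img, sigma_range=(0.5, 0.5), clamp_range=(0.0, 1.0),
                            rng=np.random.default_rng(0))
print(out.min() >= 0.0, out.max() <= 1.0)  # True True
```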
- class mmflow.datasets.ImageToTensor(keys: collections.abc.Sequence)[source]¶
Convert image to torch.Tensor by given keys.
The dimension order of the input image is (H, W, C). The pipeline will convert it to (C, H, W). If only 2 dimensions (H, W) are given, the output would be (1, H, W).
- Parameters
keys (Sequence[str]) – Key of images to be converted to Tensor.
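The axis handling described above can be illustrated with NumPy alone. The real transform returns a torch.Tensor; this sketch stops at the reordered array to stay dependency-free, and `to_chw_sketch` is a hypothetical name.

```python
import numpy as np


def to_chw_sketch(img):
    """Reorder (H, W, C) to (C, H, W); treat (H, W) as a single-channel image."""
    if img.ndim == 2:
        img = img[..., None]  # (H, W) -> (H, W, 1)
    return np.ascontiguousarray(img.transpose(2, 0, 1))


print(to_chw_sketch(np.zeros((4, 5, 3))).shape)  # (3, 4, 5)
print(to_chw_sketch(np.zeros((4, 5))).shape)     # (1, 4, 5)
```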
- class mmflow.datasets.InputPad(exponent, mode='edge', position='center', **kwargs)[source]¶
Pad images such that their dimensions are divisible by 2^n. Used in testing.
- Parameters
exponent (int) – The exponent n of 2^n.
mode (str) – Mode for numpy.pad(). Defaults to ‘edge’.
position (str) – One of ‘center’, ‘left’, ‘right’, ‘top’ and ‘down’. Defaults to ‘center’.
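As a sketch of the padding arithmetic, the function below pads height and width up to the next multiple of 2^n with `numpy.pad`. It covers only a centered layout, splitting the padding as evenly as possible, which is an assumption about what ‘center’ means; the function name is hypothetical.

```python
import numpy as np


def pad_to_multiple_sketch(img, exponent, mode='edge'):
    """Pad H and W up to the next multiple of 2**exponent (centered layout)."""
    multiple = 2 ** exponent
    h, w = img.shape[:2]
    pad_h = (-h) % multiple  # rows missing to reach the next multiple
    pad_w = (-w) % multiple  # columns missing to reach the next multiple
    top, left = pad_h // 2, pad_w // 2
    pad_width = [(top, pad_h - top), (left, pad_w - left)]
    pad_width += [(0, 0)] * (img.ndim - 2)  # never pad the channel axis
    return np.pad(img, pad_width, mode=mode)


img = np.zeros((436, 1024, 3), dtype=np.uint8)
print(pad_to_multiple_sketch(img, exponent=3).shape)  # (440, 1024, 3)
```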
- class mmflow.datasets.InputResize(exponent)[source]¶
Resize images such that dimensions are divisible by 2^n.
- Parameters
exponent (int) – The exponent n of 2^n.
- Returns
Resized results, ‘img_shape’, ‘scale_factor’ keys are added into result dict.
- Return type
dict
- class mmflow.datasets.LoadImageFromFile(to_float32: bool = False, color_type: str = 'color', file_client_args: dict = {'backend': 'disk'}, imdecode_backend: str = 'cv2')[source]¶
Load image1 and image2 from file.
Required keys are “img1_info” (dict that must contain the key “filename” and “filename2”). Added or updated keys are “img1”, “img2”, “img_shape”, “ori_shape” (same as img_shape), “pad_shape” (same as img_shape), “scale_factor” (1.0, 1.0) and “img_norm_cfg” (means=0 and stds=1).
- Parameters
to_float32 (bool) – Whether to convert the loaded image to a float32 numpy array. If set to False, the loaded image is an uint8 array. Defaults to False.
color_type (str) – The flag argument for mmcv.imfrombytes(). Defaults to ‘color’.
file_client_args (dict) – Arguments to instantiate a FileClient. See mmcv.fileio.FileClient for details. Defaults to dict(backend='disk').
imdecode_backend (str) – Backend for mmcv.imdecode(). Default: ‘cv2’
- class mmflow.datasets.MixedBatchDistributedSampler(datasets: Sequence[torch.utils.data.dataset.Dataset], sample_ratio: Sequence[float], num_replicas: Optional[int] = None, rank: Optional[int] = None, shuffle: bool = True, seed: int = 0)[source]¶
Distributed Sampler for mixed data batch.
- Parameters
datasets (list) – List of datasets to be loaded.
sample_ratio (list) – List of the ratio of each dataset in a batch, e.g. datasets=[DatasetA, DatasetB], sample_ratio=[0.25, 0.75], sample_per_gpu=1, gpus=8, it means 2 gpus load DatasetA, and 6 gpus load DatasetB. The length of datasets must be equal to length of sample_ratio.
num_replicas (int, optional) – Number of processes participating in distributed training. By default, world_size is retrieved from the current distributed group.
rank (int, optional) – Rank of the current process within num_replicas. By default, rank is retrieved from the current distributed group.
shuffle (bool) – If True (default), sampler will shuffle the indices.
seed (int) – Random seed used to shuffle the sampler if shuffle=True. This number should be identical across all processes in the distributed group. Default: 0.
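The sample_ratio example above (8 GPUs, ratios 0.25 and 0.75, one sample per GPU) amounts to the following arithmetic. This is only a sketch of the bookkeeping under those assumptions, not the sampler's actual code; `gpus_per_dataset_sketch` is a hypothetical helper.

```python
def gpus_per_dataset_sketch(sample_ratio, num_replicas):
    """Number of replicas assigned to each dataset for the given ratios."""
    if abs(sum(sample_ratio) - 1.0) > 1e-6:
        raise ValueError('sample_ratio must sum to 1')
    counts = [round(ratio * num_replicas) for ratio in sample_ratio]
    if sum(counts) != num_replicas:
        raise ValueError('sample_ratio does not split the replicas evenly')
    return counts


print(gpus_per_dataset_sketch([0.25, 0.75], num_replicas=8))  # [2, 6]
```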
- class mmflow.datasets.Normalize(mean, std, to_rgb=True)[source]¶
Normalize the image.
Added key is “img_norm_cfg”.
- Parameters
mean (sequence) – Mean values of 3 channels.
std (sequence) – Std values of 3 channels.
to_rgb (bool) – Whether to convert the image from BGR to RGB. Default: True.
- class mmflow.datasets.PhotoMetricDistortion(brightness_delta=32, contrast_range=(0.5, 1.5), saturation_range=(0.5, 1.5), hue_delta=18)[source]¶
Apply photometric distortion to the image sequentially. Every transformation is applied with a probability of 0.5.
Random contrast is applied either second or second to last.
1. random brightness
2. random contrast (mode 0)
3. convert color from BGR to HSV
4. random saturation
5. random hue
6. convert color from HSV to BGR
7. random contrast (mode 1)
8. randomly swap channels
- Parameters
brightness_delta (int) – Delta of brightness.
contrast_range (tuple) – Range of contrast.
saturation_range (tuple) – Range of saturation.
hue_delta (int) – Delta of hue.
- class mmflow.datasets.RandomAffine(global_transform: Optional[dict] = None, relative_transform: Optional[dict] = None, preserve_valid: bool = True, check_bound: bool = False)[source]¶
Random affine transformation of images, flow map and occlusion map (if available).
Keys of global_transform and relative_transform should be a subset of (‘translates’, ‘zoom’, ‘shear’, ‘rotate’). Each key and its corresponding values have to satisfy the following rules:
- translates: the translation ratios along x axis and y axis. Defaults to (0., 0.).
- zoom: the min and max zoom ratios. Defaults to (1.0, 1.0).
- shear: the min and max shear ratios. Defaults to (1.0, 1.0).
- rotate: the min and max rotate degree. Defaults to (0., 0.).
- Parameters
global_transform (dict) – A dict which contains keys: translates, zoom, shear, rotate. global_transform will transform both img1 and img2.
relative_transform (dict) – A dict which contains keys: translates, zoom, shear, rotate. relative_transform will only transform img2, after global_transform has been applied to both images.
preserve_valid (bool) – Whether to continue transforming until both images are valid. A valid affine transform is an affine transform which guarantees the transformed image covers the whole original picture frame. Defaults to True.
check_bound (bool) – Whether to check out of bound for transformed occlusion maps. If True, all pixels in borders of img1 but not in borders of img2 will be marked occluded. Defaults to False.
- class mmflow.datasets.RandomCrop(crop_size)[source]¶
Randomly crop the image and flow.
- Parameters
crop_size (tuple) – Expected size after cropping, (h, w).
- class mmflow.datasets.RandomFlip(prob, direction='horizontal')[source]¶
Flip the image and flow map.
- Parameters
prob (float) – The flipping probability.
direction (str) – The flipping direction. Options are ‘horizontal’ and ‘vertical’. Default: ‘horizontal’.
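Flipping a flow map involves more than mirroring the array: the displacement component along the flipped axis also changes sign. The sketch below illustrates this for a flow array of shape (H, W, 2) with channels ordered (u, v); the function name is hypothetical, and the random `prob` check is omitted.

```python
import numpy as np


def flip_flow_sketch(flow, direction='horizontal'):
    """Mirror a (H, W, 2) flow map and negate the affected component."""
    if direction == 'horizontal':
        flipped = flow[:, ::-1].copy()
        flipped[..., 0] *= -1  # u (x displacement) changes sign
    elif direction == 'vertical':
        flipped = flow[::-1].copy()
        flipped[..., 1] *= -1  # v (y displacement) changes sign
    else:
        raise ValueError(f'unsupported direction: {direction}')
    return flipped


flow = np.array([[[1.0, 2.0], [3.0, 4.0]]])  # shape (1, 2, 2)
print(flip_flow_sketch(flow, 'horizontal').tolist())  # [[[-3.0, 4.0], [-1.0, 2.0]]]
```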
- class mmflow.datasets.RandomRotation(prob, angle, auto_bound=False)[source]¶
Random rotation of the image and optical flow data from -angle to angle (in degrees).
- Parameters
prob (float) – The rotation probability.
angle (float) – max angle of the rotation in the range from -180 to 180.
auto_bound (bool) – Whether to adjust the image size to cover the whole rotated image. Default: False
- class mmflow.datasets.RandomTranslate(prob=0.0, x_offset=0.0, y_offset=0.0)[source]¶
Random translation of the images and optical flow data.
- Parameters
prob (float) – the probability to do translation.
x_offset (float | tuple) – Translation ratio on the x axis, chosen randomly from [-x_offset, x_offset] or the given [min, max]. Default: 0.
y_offset (float | tuple) – Translation ratio on the y axis, chosen randomly from [-y_offset, y_offset] or the given [min, max]. Default: 0.
- class mmflow.datasets.RepeatDataset(dataset, times)[source]¶
A wrapper of repeated dataset.
The length of the repeated dataset will be the length of the original dataset multiplied by times. This is useful when the data loading time is long but the dataset is small. Using RepeatDataset can reduce the data loading time between epochs.
- Parameters
dataset (Dataset) – The dataset to be repeated.
times (int) – Repeat times.
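A repeated dataset can be sketched as a thin wrapper that maps every index back into the original range. The class below is an illustration under that assumption, not the MMFlow class itself.

```python
class RepeatDatasetSketch:
    """Expose `dataset` as if it were `times` copies placed back to back."""

    def __init__(self, dataset, times):
        self.dataset = dataset
        self.times = times
        self._ori_len = len(dataset)

    def __getitem__(self, idx):
        # Wrap the index around the original dataset length.
        return self.dataset[idx % self._ori_len]

    def __len__(self):
        return self.times * self._ori_len


repeated = RepeatDatasetSketch(['a', 'b', 'c'], times=2)
print(len(repeated), repeated[4])  # 6 b
```

Because one epoch now spans several passes over the data, the per-epoch dataloader restart happens less often.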
- class mmflow.datasets.Rerange(min_value=0, max_value=255)[source]¶
Rerange the image pixel value.
- Parameters
min_value (float or int) – Minimum value of the reranged image. Default: 0.
max_value (float or int) – Maximum value of the reranged image. Default: 255.
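One plausible reading of "rerange" is a linear rescale of the pixel values so that the image minimum maps to min_value and the image maximum maps to max_value. The sketch below implements that reading; the formula and the function name are assumptions rather than the transform's confirmed behavior.

```python
import numpy as np


def rerange_sketch(img, min_value=0, max_value=255):
    """Linearly rescale img so its values span [min_value, max_value]."""
    img = img.astype(np.float32)
    lo, hi = img.min(), img.max()
    if hi == lo:
        raise ValueError('cannot rerange a constant image')
    img = (img - lo) / (hi - lo)  # now in [0, 1]
    return img * (max_value - min_value) + min_value


print(rerange_sketch(np.array([10, 20, 30])).tolist())  # [0.0, 127.5, 255.0]
```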
- class mmflow.datasets.Sintel(*args, pass_style: str = 'clean', scene: Optional[Union[str, Sequence[str]]] = None, **kwargs)[source]¶
Sintel optical flow dataset.
- Parameters
pass_style (str) – Pass style for Sintel dataset, and it has 2 options [‘clean’, ‘final’]. Default: ‘clean’.
scene (str, list, optional) – Scene in Sintel dataset. If scene is None, data is collected from all scenes of the Sintel dataset. Default: None.
- class mmflow.datasets.SpacialTransform(spacial_prob: float, stretch_prob: float, crop_size: Sequence, min_scale: float = - 0.2, max_scale: float = 0.5, max_stretch: float = 0.2)[source]¶
Spacial Transform API for RAFT.
- Parameters
spacial_prob (float) – Probability to do spacial transform.
stretch_prob (float) – Probability to do stretch.
crop_size (tuple, list) – The base size for resize.
min_scale (float) – The exponent for min scale. Defaults to -0.2.
max_scale (float) – The exponent for max scale. Defaults to 0.5.
- Returns
Resized results, ‘img_shape’,
- Return type
dict
- resize_sparse_flow_map(flow: numpy.ndarray, valid: numpy.ndarray, fx: float = 1.0, fy: float = 1.0, x0: int = 0, y0: int = 0) → Sequence[numpy.ndarray][source]¶
Resize sparse optical flow function.
- Parameters
flow (ndarray) – optical flow data will be resized.
valid (ndarray) – valid mask for sparse optical flow.
fx (float, optional) – horizontal scale factor. Defaults to 1.0.
fy (float, optional) – vertical scale factor. Defaults to 1.0.
x0 (int, optional) – Abscissa of the left-top point where the flow map will be cropped from. Defaults to 0.
y0 (int, optional) – Ordinate of the left-top point where the flow map will be cropped from. Defaults to 0.
- Returns
the transformed flow map and valid mask.
- Return type
Sequence[ndarray]
- spacial_transform(imgs: numpy.ndarray) → Tuple[numpy.ndarray, float, float, int, int][source]¶
Spacial transform function.
- Parameters
imgs (ndarray) – the images that will be transformed.
- Returns
The transformed images, horizontal scale factor, vertical scale factor, and coordinates of the left-top point where the image maps will be cropped from.
- Return type
Tuple[ndarray, float, float, int, int]
- class mmflow.datasets.ToDataContainer(fields: collections.abc.Sequence = ({'key': 'img1', 'stack': True}, {'key': 'img2', 'stack': True}, {'key': 'flow_gt'}))[source]¶
Convert results to mmcv.DataContainer by given fields.
- Parameters
fields (Sequence[dict]) – Each field is a dict like dict(key='xxx', **kwargs). The key in result will be converted to mmcv.DataContainer with **kwargs. Default: (dict(key='img1', stack=True), dict(key='img2', stack=True), dict(key='flow_gt')).
- class mmflow.datasets.ToTensor(keys: collections.abc.Sequence)[source]¶
Convert some results to torch.Tensor by given keys.
- Parameters
keys (Sequence[str]) – Keys that need to be converted to Tensor.
- class mmflow.datasets.Transpose(keys: collections.abc.Sequence, order: collections.abc.Sequence)[source]¶
Transpose some results by given keys.
- Parameters
keys (Sequence[str]) – Keys of results to be transposed.
order (Sequence[int]) – Order of transpose.
- class mmflow.datasets.Validation(max_flow: Union[float, int])[source]¶
Validation transform from RAFT, which returns a mask marking where the flow is less than max_flow.
- Parameters
max_flow (float, int) – The maximum flow value for a valid flow.
- Returns
Resized results, ‘valid’ and ‘max_flow’ keys are added into result dict.
- Return type
dict
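A mask of this kind can be sketched as below. The sketch assumes that "less than max_flow" is checked per component on the absolute values of u and v; checking the vector magnitude instead would be an equally plausible reading. The function name is hypothetical.

```python
import numpy as np


def valid_mask_sketch(flow_gt, max_flow):
    """Return 1.0 where both flow components are below max_flow in magnitude."""
    valid = (np.abs(flow_gt[..., 0]) < max_flow) & (np.abs(flow_gt[..., 1]) < max_flow)
    return valid.astype(np.float32)


flow_gt = np.array([[[1.0, 2.0], [500.0, 0.0]]], dtype=np.float32)  # shape (1, 2, 2)
print(valid_mask_sketch(flow_gt, max_flow=400).tolist())  # [[1.0, 0.0]]
```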
- mmflow.datasets.build_dataloader(dataset: torch.utils.data.dataset.Dataset, samples_per_gpu: int, workers_per_gpu: int, sample_ratio: Optional[Sequence] = None, num_gpus: int = 1, dist: bool = True, shuffle: bool = True, seed: Optional[int] = None, persistent_workers: bool = False, **kwargs)[source]¶
Build PyTorch DataLoader.
In distributed training, each GPU/process has a dataloader. In non-distributed training, there is only one dataloader for all GPUs.
- Parameters
dataset (Dataset) – A PyTorch dataset.
samples_per_gpu (int) – Number of training samples on each GPU, i.e., batch size of each GPU.
workers_per_gpu (int) – How many subprocesses to use for data loading for each GPU.
sample_ratio (list, optional) – The ratio of samples from each dataset in a mixed batch. The sum of sample_ratio must be equal to 1, and its length must be equal to the number of datasets, e.g. batch=8, sample_ratio=(0.5, 0.25, 0.25) means one batch holds 4 samples from dataset1, 2 samples from dataset2 and 2 samples from dataset3.
num_gpus (int) – Number of GPUs. Only used in non-distributed training.
dist (bool) – Distributed training/test or not. Default: True.
shuffle (bool) – Whether to shuffle the data at every epoch. Default: True.
seed (int, optional) – the seed for generating random numbers for data workers. Default to None.
persistent_workers (bool) – If True, the data loader will not shut down the worker processes after a dataset has been consumed once. This keeps the workers' Dataset instances alive. The argument only takes effect in PyTorch>=1.7.0. Default: False.
kwargs – any keyword argument to be used to initialize DataLoader
- Returns
A PyTorch dataloader.
- Return type
DataLoader
- mmflow.datasets.build_dataset(cfg: Union[mmcv.utils.config.Config, Sequence[mmcv.utils.config.Config]], default_args: Optional[dict] = None) → torch.utils.data.dataset.Dataset[source]¶
Build Pytorch dataset.
- Parameters
cfg (mmcv.Config) – Config dict of dataset or list of config dict. It should at least contain the key “type”.
default_args (dict, optional) – Default initialization arguments.
Note
If the input config is a list, this function will concatenate them automatically.
- Returns
The built dataset based on the input config.
- Return type
dataset
- mmflow.datasets.read_flow(name: str) → numpy.ndarray[source]¶
Read flow file with the suffix ‘.flo’.
This function is modified from https://lmb.informatik.uni-freiburg.de/resources/datasets/IO.py Copyright (c) 2011, LMB, University of Freiburg.
- Parameters
name (str) – Optical flow file path.
- Returns
Optical flow
- Return type
ndarray
- mmflow.datasets.read_flow_kitti(name: str) → Tuple[numpy.ndarray, numpy.ndarray][source]¶
Read sparse flow file from KITTI dataset.
This function is modified from https://github.com/princeton-vl/RAFT/blob/master/core/utils/frame_utils.py. Copyright (c) 2020, princeton-vl Licensed under the BSD 3-Clause License
- Parameters
name (str) – The flow file
- Returns
flow and valid map
- Return type
Tuple[ndarray, ndarray]
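KITTI stores flow in 16-bit PNG files, where each component is encoded as value * 64 + 2^15 and a third channel holds the valid flag. The sketch below decodes an already loaded array and assumes the channels arrive in (u, v, valid) order; reading the PNG and reordering channels is left out, and the function name is hypothetical.

```python
import numpy as np


def decode_kitti_flow_sketch(raw):
    """Decode a (H, W, 3) uint16 KITTI flow array into flow and valid map."""
    raw = raw.astype(np.float32)
    flow = (raw[..., :2] - 2 ** 15) / 64.0  # undo the fixed-point encoding
    valid = raw[..., 2]                     # 1 where ground truth exists
    return flow, valid


# One pixel encoding u = 1.5 and v = -2.0 with a valid flag of 1.
raw = np.array([[[32864, 32640, 1]]], dtype=np.uint16)
flow, valid = decode_kitti_flow_sketch(raw)
print(flow.tolist(), valid.tolist())  # [[[1.5, -2.0]]] [[1.0]]
```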
- mmflow.datasets.render_color_wheel(save_file: str = 'color_wheel.png') → numpy.ndarray[source]¶
Render color wheel.
- Parameters
save_file (str) – The saved file name. Defaults to ‘color_wheel.png’.
- Returns
color wheel image.
- Return type
ndarray
- mmflow.datasets.visualize_flow(flow: numpy.ndarray, save_file: Optional[str] = None) → numpy.ndarray[source]¶
Flow visualization function.
- Parameters
flow (ndarray) – The flow to be rendered.
save_file (str, optional) – The file to save the rendered flow map to. Defaults to None.
- Returns
flow map image with RGB order.
- Return type
ndarray
- mmflow.datasets.write_flow(flow: numpy.ndarray, flow_file: str) → None[source]¶
Write the flow to disk.
This function is modified from https://lmb.informatik.uni-freiburg.de/resources/datasets/IO.py Copyright (c) 2011, LMB, University of Freiburg.
- Parameters
flow (ndarray) – The optical flow that will be saved.
flow_file (str) – The file for saving optical flow.
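For orientation, the ‘.flo’ (Middlebury) layout is a float32 magic number 202021.25, followed by the int32 width and height, followed by the interleaved (u, v) float32 values in row-major order. The round trip below is a sketch of that layout with NumPy, assuming little-endian storage; it is not the MMFlow code, and both function names are hypothetical.

```python
import os
import tempfile

import numpy as np

FLO_MAGIC = 202021.25  # sanity-check value at the start of every .flo file


def write_flo_sketch(flow, path):
    """Write a (H, W, 2) flow array in .flo layout."""
    h, w = flow.shape[:2]
    with open(path, 'wb') as f:
        np.array([FLO_MAGIC], dtype='<f4').tofile(f)
        np.array([w, h], dtype='<i4').tofile(f)
        flow.astype('<f4').tofile(f)


def read_flo_sketch(path):
    """Read a .flo file back into a (H, W, 2) float32 array."""
    with open(path, 'rb') as f:
        magic = np.fromfile(f, dtype='<f4', count=1)[0]
        if magic != FLO_MAGIC:
            raise ValueError('invalid .flo file: wrong magic number')
        w, h = np.fromfile(f, dtype='<i4', count=2)
        data = np.fromfile(f, dtype='<f4', count=2 * int(w) * int(h))
    return data.reshape(int(h), int(w), 2)


flow = np.arange(12, dtype=np.float32).reshape(2, 3, 2)
with tempfile.TemporaryDirectory() as tmp_dir:
    flo_path = os.path.join(tmp_dir, 'example.flo')
    write_flo_sketch(flow, flo_path)
    restored = read_flo_sketch(flo_path)
print(restored.shape, np.array_equal(restored, flow))  # (2, 3, 2) True
```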
- mmflow.datasets.write_flow_kitti(uv: numpy.ndarray, filename: str)[source]¶
Write the flow to disk.
This function is modified from https://github.com/princeton-vl/RAFT/blob/master/core/utils/frame_utils.py. Copyright (c) 2020, princeton-vl Licensed under the BSD 3-Clause License
- Parameters
uv (ndarray) – The optical flow that will be saved.
filename (str) – The file for saving optical flow.
pipelines¶
- class mmflow.datasets.pipelines.Collect(keys: collections.abc.Sequence, meta_keys: collections.abc.Sequence = ('filename1', 'filename2', 'ori_filename1', 'ori_filename2', 'filename_flow', 'ori_filename_flow', 'ori_shape', 'img_shape', 'pad_shape', 'scale_factor', 'flip', 'flip_direction', 'img_norm_cfg'))[source]¶
Collect data from the loader relevant to the specific task.
This is usually the last stage of the data loader pipeline. Typically keys is set to some subset of “img”, “flow_gt”.
The “img_meta” item is always populated. The contents of the “img_meta” dictionary depend on “meta_keys”. By default this includes:
- “img_shape”: shape of the image input to the network as a tuple (h, w, c). Note that images may be zero padded on the bottom/right if the batch tensor is larger than this shape.
- “scale_factor”: a float indicating the preprocessing scale
- “flip”: a boolean indicating if image flip transform was used
- “filename1”: path to the image1 file
- “filename2”: path to the image2 file
- “ori_filename1”: image1 file name
- “ori_filename2”: image2 file name
- “ori_shape”: original shape of the image as a tuple (h, w, c)
- “pad_shape”: image shape after padding
- “img_norm_cfg”: a dict of normalization information:
  - mean - per channel mean subtraction
  - std - per channel std divisor
  - to_rgb - bool indicating if bgr was converted to rgb
- Parameters
keys (Sequence[str]) – Keys of results to be collected in data.
meta_keys (Sequence[str], optional) – Meta keys to be converted to mmcv.DataContainer and collected in data[img_metas]. Default: ('filename1', 'filename2', 'ori_filename1', 'ori_filename2', 'filename_flow', 'ori_filename_flow', 'ori_shape', 'img_shape', 'pad_shape', 'scale_factor', 'flip', 'flip_direction', 'img_norm_cfg')
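The collection step can be pictured with plain dictionaries. This is a minimal sketch, not the library implementation: `collect_sketch` is a hypothetical name, the meta dict is left unwrapped instead of being placed in an `mmcv.DataContainer`, and skipping meta keys that are absent from the results is a choice made here for brevity.

```python
def collect_sketch(results, keys, meta_keys):
    """Keep only `keys` and gather the available `meta_keys` under 'img_metas'."""
    data = {key: results[key] for key in keys}
    # The real transform wraps this dict in mmcv.DataContainer; a plain dict is used here.
    data["img_metas"] = {key: results[key] for key in meta_keys if key in results}
    return data


results = {
    "img1": "first image",
    "img2": "second image",
    "flow_gt": "ground truth flow",
    "filename1": "frame_0001.png",
    "flip": False,
    "unused": 1,
}
data = collect_sketch(
    results,
    keys=["img1", "img2", "flow_gt"],
    meta_keys=["filename1", "flip", "img_shape"],
)
```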
- class mmflow.datasets.pipelines.ColorJitter(asymmetric_prob=0.0, brightness=0.0, contrast=0.0, saturation=0.0, hue=0.0)[source]¶
Randomly change the brightness, contrast, saturation and hue of an image.
- Parameters
asymmetric_prob (float) – The probability of applying color jitter to the two images asymmetrically.
brightness (float, tuple) – How much to jitter brightness. brightness_factor is chosen uniformly from [max(0, 1 - brightness), 1 + brightness] or the given [min, max]. Should be non negative numbers.
contrast (float, tuple) – How much to jitter contrast. contrast_factor is chosen uniformly from [max(0, 1 - contrast), 1 + contrast] or the given [min, max]. Should be non negative numbers.
saturation (float, tuple) – How much to jitter saturation. saturation_factor is chosen uniformly from [max(0, 1 - saturation), 1 + saturation] or the given [min, max]. Should be non negative numbers.
hue (float, tuple) – How much to jitter hue. hue_factor is chosen uniformly from [-hue, hue] or the given [min, max]. Should have 0<= hue <= 0.5 or -0.5 <= min <= max <= 0.5.
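The sampling intervals described for these parameters can be written out as a small helper. This is only a sketch of the documented ranges; `factor_range` is a hypothetical name and the way the class validates its inputs is not shown.

```python
import random


def factor_range(value, center=1.0, clip_at_zero=True):
    """Return the (low, high) interval a jitter factor is drawn from."""
    if isinstance(value, (tuple, list)):
        low, high = value  # an explicit [min, max] is used as given
    else:
        low, high = center - value, center + value
        if clip_at_zero:
            low = max(0.0, low)
    return float(low), float(high)


rng = random.Random(0)
low, high = factor_range(0.5)  # brightness, contrast, saturation style
brightness_factor = rng.uniform(low, high)
hue_low, hue_high = factor_range(0.25, center=0.0, clip_at_zero=False)  # hue style
hue_factor = rng.uniform(hue_low, hue_high)
```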
- class mmflow.datasets.pipelines.Compose(transforms: Sequence)[source]¶
Compose multiple transforms sequentially.
- Parameters
transforms (Sequence[dict | callable]) – Sequence of transform object or config dict to be composed.
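Sequential composition amounts to threading a results dict through each transform in turn. The sketch below handles callables only (building transforms from config dicts needs the library registry and is omitted); `ComposeSketch` and the two toy transforms are hypothetical, and stopping early when a transform returns `None` is an assumption borrowed from similar pipelines.

```python
class ComposeSketch:
    """Apply a sequence of callables to a results dict, one after another."""

    def __init__(self, transforms):
        self.transforms = list(transforms)

    def __call__(self, results):
        for transform in self.transforms:
            results = transform(results)
            if results is None:  # assumed convention: a transform may drop a sample
                return None
        return results


def add_one(results):
    return {**results, "x": results["x"] + 1}


def double(results):
    return {**results, "x": results["x"] * 2}


pipeline = ComposeSketch([add_one, double])
output = pipeline({"x": 1})
```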
- class mmflow.datasets.pipelines.DefaultFormatBundle[source]¶
Default formatting bundle.
It simplifies the pipeline of formatting common fields, including “img1”, “img2” and “flow_gt”. These fields are formatted as follows.
img1: (1)transpose, (2)to tensor, (3)to DataContainer (stack=True)
img2: (1)transpose, (2)to tensor, (3)to DataContainer (stack=True)
flow_gt: (1)transpose, (2)to tensor, (3)to DataContainer (stack=True)
- class mmflow.datasets.pipelines.Erase(prob: float, bounds: Sequence = [50, 100], max_num: int = 3)[source]¶
Erase transform from RAFT, which randomly erases rectangular regions in img2 to simulate occlusions.
- Parameters
prob (float) – the probability for erase transform.
bounds (list, tuple) – the bounds for erase regions (bound_x, bound_y).
max_num (int) – the max number of erase regions.
- Returns
revised results, ‘img2’ and ‘erase_num’ are added into results.
- Return type
dict
- class mmflow.datasets.pipelines.GaussianNoise(sigma_range=(0, 0.04), clamp_range=(- inf, inf))[source]¶
Add Gaussian Noise to images.
Add Gaussian Noise, with mean 0 and std sigma uniformly sampled from sigma_range, to images. And then clamp the images to clamp_range.
- Parameters
sigma_range (list(float) | tuple(float)) – Uniformly sample sigma of gaussian noise in sigma_range. Default: (0, 0.04)
clamp_range (list(float) | tuple(float)) – The min and max value to clamp the images after adding gaussian noise. Default: (float(‘-inf’), float(‘inf’)).
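The two steps (sample sigma uniformly, add zero-mean noise, then clamp) can be sketched on a flat list of pixel values. `add_gaussian_noise` is a hypothetical helper; drawing one sigma per call is an assumption of this sketch.

```python
import random


def add_gaussian_noise(pixels, sigma_range=(0.0, 0.04),
                       clamp_range=(float("-inf"), float("inf")), rng=random):
    """Add zero-mean Gaussian noise with a uniformly sampled sigma, then clamp."""
    sigma = rng.uniform(*sigma_range)  # one sigma for the whole call (assumption)
    low, high = clamp_range
    return [min(max(p + rng.gauss(0.0, sigma), low), high) for p in pixels]


rng = random.Random(0)
noisy = add_gaussian_noise([0.0, 0.5, 1.0], clamp_range=(0.0, 1.0), rng=rng)
unchanged = add_gaussian_noise([0.2, 0.8], sigma_range=(0.0, 0.0), rng=rng)
```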
- class mmflow.datasets.pipelines.ImageToTensor(keys: collections.abc.Sequence)[source]¶
Convert image to torch.Tensor by given keys.
The dimension order of input image is (H, W, C). The pipeline will convert it to (C, H, W). If only 2 dimension (H, W) is given, the output would be (1, H, W).
- Parameters
keys (Sequence[str]) – Key of images to be converted to Tensor.
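The axis reordering can be illustrated without any array library. In this sketch nested lists stand in for the image, and `hwc_to_chw` is a hypothetical helper; the actual transform produces a `torch.Tensor`.

```python
def hwc_to_chw(image):
    """Reorder a nested-list image from (H, W, C) to (C, H, W)."""
    if not isinstance(image[0][0], (list, tuple)):
        return [[list(row) for row in image]]  # (H, W) becomes (1, H, W)
    channels = len(image[0][0])
    return [[[pixel[c] for pixel in row] for row in image] for c in range(channels)]


color = [[(1, 2, 3), (4, 5, 6)]]  # H=1, W=2, C=3
gray = [[1, 2], [3, 4]]           # H=2, W=2
color_chw = hwc_to_chw(color)
gray_chw = hwc_to_chw(gray)
```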
- class mmflow.datasets.pipelines.InputPad(exponent, mode='edge', position='center', **kwargs)[source]¶
Pad images so that their dimensions are divisible by 2^n; used at test time.
- Parameters
exponent (int) – the exponent n of 2^n
mode (str) – mode for numpy.pad(). Defaults to ‘edge’.
position (str) – ‘center’, ‘left’, ‘right’, ‘top’ and ‘down’. Defaults to ‘center’
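The amount of padding needed for a given exponent follows from simple modular arithmetic. The sketch below computes pad widths in the `((before, after), (before, after))` form accepted by `numpy.pad`; `pad_amounts` is a hypothetical helper, and splitting the padding evenly is an assumption about what the 'center' position means.

```python
def pad_amounts(height, width, exponent):
    """Return ((top, bottom), (left, right)) padding so both sides divide 2**exponent."""
    multiple = 2 ** exponent
    pad_h = -height % multiple  # smallest non-negative amount reaching a multiple
    pad_w = -width % multiple
    # Assumed 'center' placement: split the padding between the two sides.
    return ((pad_h // 2, pad_h - pad_h // 2), (pad_w // 2, pad_w - pad_w // 2))


sintel_pad = pad_amounts(436, 1024, 3)
kitti_pad = pad_amounts(375, 1242, 6)
```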
- class mmflow.datasets.pipelines.InputResize(exponent)[source]¶
Resize images such that dimensions are divisible by 2^n.
- Parameters
exponent (int) – the exponent n of 2^n
- Returns
Resized results, ‘img_shape’, ‘scale_factor’ keys are added into result dict.
- Return type
dict
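A possible way to pick the target size is to round each dimension up to the next multiple of 2^n. Rounding up (as opposed to down or to nearest) and the `(w_scale, h_scale)` ordering of the scale factor are assumptions of this sketch, and `resized_shape` is a hypothetical name.

```python
import math


def resized_shape(height, width, exponent):
    """Round both dimensions up to a multiple of 2**exponent and report the scales."""
    multiple = 2 ** exponent
    new_h = math.ceil(height / multiple) * multiple
    new_w = math.ceil(width / multiple) * multiple
    scale_factor = (new_w / width, new_h / height)  # assumed (w_scale, h_scale) order
    return (new_h, new_w), scale_factor


shape, scale_factor = resized_shape(436, 1024, 6)
```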
- class mmflow.datasets.pipelines.LoadAnnotations(with_occ: bool = False, sparse: bool = False, file_client_args: dict = {'backend': 'disk'})[source]¶
Load optical flow from file.
- Parameters
with_occ (bool) – whether to parse and load occlusion mask. Default to False.
sparse (bool) – whether the flow is sparse. Default to False.
file_client_args (dict) – Arguments to instantiate a FileClient. See mmcv.fileio.FileClient for details. Defaults to dict(backend='disk').
- class mmflow.datasets.pipelines.LoadImageFromFile(to_float32: bool = False, color_type: str = 'color', file_client_args: dict = {'backend': 'disk'}, imdecode_backend: str = 'cv2')[source]¶
Load image1 and image2 from file.
Required keys are “img1_info” (dict that must contain the key “filename” and “filename2”). Added or updated keys are “img1”, “img2”, “img_shape”, “ori_shape” (same as img_shape), “pad_shape” (same as img_shape), “scale_factor” (1.0, 1.0) and “img_norm_cfg” (means=0 and stds=1).
- Parameters
to_float32 (bool) – Whether to convert the loaded image to a float32 numpy array. If set to False, the loaded image is an uint8 array. Defaults to False.
color_type (str) – The flag argument for mmcv.imfrombytes(). Defaults to ‘color’.
file_client_args (dict) – Arguments to instantiate a FileClient. See mmcv.fileio.FileClient for details. Defaults to dict(backend='disk').
imdecode_backend (str) – Backend for mmcv.imdecode(). Default: ‘cv2’
- class mmflow.datasets.pipelines.Normalize(mean, std, to_rgb=True)[source]¶
Normalize the image.
Added key is “img_norm_cfg”.
- Parameters
mean (sequence) – Mean values of 3 channels.
std (sequence) – Std values of 3 channels.
to_rgb (bool) – Whether to convert the image from BGR to RGB, default is true.
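Per pixel, the normalization reduces to a channel-wise subtract and divide. The sketch below works on a single three-channel pixel; `normalize_pixel` is a hypothetical helper, and applying the BGR to RGB swap before the mean and std is an assumption about the order of operations.

```python
def normalize_pixel(pixel, mean, std, to_rgb=True):
    """Normalize one 3-channel pixel: optional BGR->RGB swap, then (x - mean) / std."""
    if to_rgb:
        pixel = pixel[::-1]  # reverse channel order, BGR -> RGB
    return [(value - m) / s for value, m, s in zip(pixel, mean, std)]


swapped = normalize_pixel([10, 20, 30], mean=[30, 20, 10], std=[2, 2, 2])
kept = normalize_pixel([10, 20, 30], mean=[30, 20, 10], std=[2, 2, 2], to_rgb=False)
```

Note that under this assumption `mean` and `std` are given in the output channel order.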
- class mmflow.datasets.pipelines.PhotoMetricDistortion(brightness_delta=32, contrast_range=(0.5, 1.5), saturation_range=(0.5, 1.5), hue_delta=18)[source]¶
Apply photometric distortions to the image sequentially; each transformation is applied with a probability of 0.5.
Random contrast is applied either second or second to last.
1. random brightness
2. random contrast (mode 0)
3. convert color from BGR to HSV
4. random saturation
5. random hue
6. convert color from HSV to BGR
7. random contrast (mode 1)
8. randomly swap channels
- Parameters
brightness_delta (int) – delta of brightness.
contrast_range (tuple) – range of contrast.
saturation_range (tuple) – range of saturation.
hue_delta (int) – delta of hue.
- class mmflow.datasets.pipelines.RandomAffine(global_transform: Optional[dict] = None, relative_transform: Optional[dict] = None, preserve_valid: bool = True, check_bound: bool = False)[source]¶
Random affine transformation of images, flow map and occlusion map (if available).
Keys of global_transform and relative_transform should be a subset of (‘translates’, ‘zoom’, ‘shear’, ‘rotate’). Each key and its corresponding values have to satisfy the following rules:
- translates: the translation ratios along x axis and y axis. Defaults to (0., 0.).
- zoom: the min and max zoom ratios. Defaults to (1.0, 1.0).
- shear: the min and max shear ratios. Defaults to (1.0, 1.0).
- rotate: the min and max rotate degree. Defaults to (0., 0.).
- Parameters
global_transform (dict) – A dict which contains keys: translates, zoom, shear, rotate. global_transform will transform both img1 and img2.
relative_transform (dict) – A dict which contains keys: translates, zoom, shear, rotate. relative_transform will only transform img2 after global_transform to both images.
preserve_valid (bool) – Whether continue transforming until both images are valid. A valid affine transform is an affine transform which guarantees the transformed image covers the whole original picture frame. Defaults to True.
check_bound (bool) – Whether to check out of bound for transformed occlusion maps. If True, all pixels in borders of img1 but not in borders of img2 will be marked occluded. Defaults to False.
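A pipeline entry for this transform might look like the following config dict. The numeric values are purely illustrative, and the `type` key assumes the registry-style config dicts that `Compose` accepts; the key check at the end simply restates the subset rule given above.

```python
ALLOWED_KEYS = {"translates", "zoom", "shear", "rotate"}

# Illustrative values only; tune them for the dataset at hand.
random_affine_cfg = dict(
    type="RandomAffine",  # assumed registry name, matching the class name
    global_transform=dict(
        translates=(0.05, 0.05),
        zoom=(1.0, 1.5),
        shear=(0.9, 1.1),
        rotate=(-10.0, 10.0),
    ),
    relative_transform=dict(
        translates=(0.005, 0.005),
        zoom=(0.98, 1.02),
        rotate=(-1.0, 1.0),
    ),
    preserve_valid=True,
    check_bound=False,
)

keys_ok = all(
    set(random_affine_cfg[name]) <= ALLOWED_KEYS
    for name in ("global_transform", "relative_transform")
)
```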
- class mmflow.datasets.pipelines.RandomCrop(crop_size)[source]¶
Randomly crop the images and flow.
- Parameters
crop_size (tuple) – Expected size after cropping, (h, w).
- class mmflow.datasets.pipelines.RandomFlip(prob, direction='horizontal')[source]¶
Flip the image and flow map.
- Parameters
prob (float) – The flipping probability.
direction (str) – The flipping direction. Options are ‘horizontal’ and ‘vertical’. Default: ‘horizontal’.
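Flipping a flow map differs from flipping an image: besides mirroring pixel positions, the flow component along the flipped axis changes sign. A minimal sketch with nested lists of `(u, v)` pairs (`flip_flow` is a hypothetical helper, not the class method):

```python
def flip_flow(flow, direction="horizontal"):
    """Mirror a flow map and negate the component along the flipped axis."""
    if direction == "horizontal":
        # Columns are reversed and horizontal motion u changes sign.
        return [[(-u, v) for u, v in reversed(row)] for row in flow]
    if direction == "vertical":
        # Rows are reversed and vertical motion v changes sign.
        return [[(u, -v) for u, v in row] for row in reversed(flow)]
    raise ValueError(f"unsupported direction: {direction}")


flow = [[(1.0, 2.0), (3.0, 4.0)],
        [(5.0, 6.0), (7.0, 8.0)]]
flipped_h = flip_flow(flow, "horizontal")
flipped_v = flip_flow(flow, "vertical")
```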
- class mmflow.datasets.pipelines.RandomRotation(prob, angle, auto_bound=False)[source]¶
Random rotation of the images and optical flow data from -angle to angle (in degrees).
- Parameters
prob (float) – The rotation probability.
angle (float) – max angle of the rotation in the range from -180 to 180.
auto_bound (bool) – Whether to adjust the image size to cover the whole rotated image. Default: False
- class mmflow.datasets.pipelines.RandomTranslate(prob=0.0, x_offset=0.0, y_offset=0.0)[source]¶
Random translation of the images and flow map.
- Parameters
prob (float) – the probability to do translation.
x_offset (float | tuple) – Translation ratio along the x axis, randomly chosen from [-x_offset, x_offset] or the given [min, max]. Default: 0.
y_offset (float | tuple) – Translation ratio along the y axis, randomly chosen from [-y_offset, y_offset] or the given [min, max]. Default: 0.
- class mmflow.datasets.pipelines.Rerange(min_value=0, max_value=255)[source]¶
Rerange the image pixel value.
- Parameters
min_value (float or int) – Minimum value of the reranged image. Default: 0.
max_value (float or int) – Maximum value of the reranged image. Default: 255.
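One common reading of "rerange" is a linear min-max mapping of the pixel values onto `[min_value, max_value]`. The sketch below implements that reading on a flat list; treating the transform as min-max scaling is an assumption, and `rerange` here is a hypothetical stand-alone function.

```python
def rerange(pixels, min_value=0, max_value=255):
    """Linearly map pixel values so that their min and max hit the target range."""
    low, high = min(pixels), max(pixels)
    if high == low:
        raise ValueError("cannot rerange a constant image")
    span = max_value - min_value
    return [(p - low) / (high - low) * span + min_value for p in pixels]


scaled = rerange([2.0, 4.0, 6.0], min_value=0, max_value=1)
```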
- class mmflow.datasets.pipelines.SpacialTransform(spacial_prob: float, stretch_prob: float, crop_size: Sequence, min_scale: float = - 0.2, max_scale: float = 0.5, max_stretch: float = 0.2)[source]¶
Spacial Transform API for RAFT.
- Parameters
spacial_prob (float) – probability to do spacial transform.
stretch_prob (float) – probability to do stretch.
crop_size (tuple, list) – the base size for resize.
min_scale (float) – the exponent for min scale. Defaults to -0.2.
max_scale (float) – the exponent for max scale. Defaults to 0.5.
- Returns
Resized results, ‘img_shape’,
- Return type
dict
- resize_sparse_flow_map(flow: numpy.ndarray, valid: numpy.ndarray, fx: float = 1.0, fy: float = 1.0, x0: int = 0, y0: int = 0) → Sequence[numpy.ndarray][source]¶
Resize sparse optical flow function.
- Parameters
flow (ndarray) – optical flow data to be resized.
valid (ndarray) – valid mask for sparse optical flow.
fx (float, optional) – horizontal scale factor. Defaults to 1.0.
fy (float, optional) – vertical scale factor. Defaults to 1.0.
x0 (int, optional) – abscissa of left-top point where the flow map will be cropped from. Defaults to 0.
y0 (int, optional) – ordinate of left-top point where the flow map will be cropped from. Defaults to 0.
- Returns
the transformed flow map and valid mask.
- Return type
Sequence[ndarray]
- spacial_transform(imgs: numpy.ndarray) → Tuple[numpy.ndarray, float, float, int, int][source]¶
Spacial transform function.
- Parameters
imgs (ndarray) – the images that will be transformed.
- Returns
the transformed images, horizontal scale factor, vertical scale factor, coordinate of left-top point where the image maps will be cropped from.
- Return type
Tuple[ndarray, float, float, int, int]
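Because min_scale and max_scale are described as exponents, the scale factors are presumably drawn as powers of two. The sketch below follows the RAFT-style augmentation under that assumption: a shared scale `2 ** uniform(min_scale, max_scale)`, optionally stretched per axis by `2 ** uniform(-max_stretch, max_stretch)`. `sample_scales` is a hypothetical helper, the `stretch_prob` default is illustrative, and the clipping to `crop_size` is omitted.

```python
import random


def sample_scales(min_scale=-0.2, max_scale=0.5, max_stretch=0.2,
                  stretch_prob=0.8, rng=random):
    """Draw (fx, fy) scale factors from power-of-two exponents."""
    scale = 2 ** rng.uniform(min_scale, max_scale)
    scale_x = scale_y = scale
    if rng.random() < stretch_prob:
        # Independent per-axis stretch on top of the shared scale.
        scale_x *= 2 ** rng.uniform(-max_stretch, max_stretch)
        scale_y *= 2 ** rng.uniform(-max_stretch, max_stretch)
    return scale_x, scale_y


rng = random.Random(0)
samples = [sample_scales(rng=rng) for _ in range(1000)]
```

With the defaults above, every factor lies between 2^-0.4 and 2^0.7.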
- class mmflow.datasets.pipelines.TestFormatBundle[source]¶
Default formatting bundle.
It simplifies the pipeline of formatting common fields, including “img1” and “img2”. These fields are formatted as follows.
img1: (1)transpose, (2)to tensor, (3)to DataContainer (stack=True)
img2: (1)transpose, (2)to tensor, (3)to DataContainer (stack=True)
- class mmflow.datasets.pipelines.ToDataContainer(fields: collections.abc.Sequence = ({'key': 'img1', 'stack': True}, {'key': 'img2', 'stack': True}, {'key': 'flow_gt'}))[source]¶
Convert results to mmcv.DataContainer by given fields.
- Parameters
fields (Sequence[dict]) – Each field is a dict like dict(key='xxx', **kwargs). The key in result will be converted to mmcv.DataContainer with **kwargs. Default: (dict(key='img1', stack=True), dict(key='img2', stack=True), dict(key='flow_gt')).
- class mmflow.datasets.pipelines.ToTensor(keys: collections.abc.Sequence)[source]¶
Convert some results to torch.Tensor by given keys.
- Parameters
keys (Sequence[str]) – Keys that need to be converted to Tensor.
- class mmflow.datasets.pipelines.Transpose(keys: collections.abc.Sequence, order: collections.abc.Sequence)[source]¶
Transpose some results by given keys.
- Parameters
keys (Sequence[str]) – Keys of results to be transposed.
order (Sequence[int]) – Order of transpose.
- class mmflow.datasets.pipelines.Validation(max_flow: Union[float, int])[source]¶
Validation transform from RAFT that returns a mask marking where the flow is less than max_flow.
- Parameters
max_flow (float, int) – the maximum flow magnitude for a flow value to be considered valid.
- Returns
Results with ‘valid’ and ‘max_flow’ keys added into the result dict.
- Return type
dict
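The mask can be sketched as a per-pixel comparison. This version assumes, following the RAFT convention, that a pixel is valid when the absolute value of each flow component is below max_flow; `valid_mask` is a hypothetical helper operating on nested lists of `(u, v)` pairs.

```python
def valid_mask(flow, max_flow):
    """Mark pixels whose u and v components are both below max_flow in magnitude."""
    return [[abs(u) < max_flow and abs(v) < max_flow for u, v in row] for row in flow]


flow = [[(1.0, -2.0), (500.0, 0.0)],
        [(0.0, -450.0), (399.9, 399.9)]]
mask = valid_mask(flow, max_flow=400)
```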