Container

Container Auditor

class swift.container.auditor.ContainerAuditor(conf, logger=None)

Bases: swift.common.db_auditor.DatabaseAuditor

Audit containers.

broker_class

alias of swift.container.backend.ContainerBroker

server_type = 'container'

Container Backend

Pluggable Back-ends for Container Server

class swift.container.backend.ContainerBroker(db_file, timeout=25, logger=None, account=None, container=None, pending_timeout=None, stale_reads_ok=False, skip_commits=False, force_db_file=False)

Bases: swift.common.db.DatabaseBroker

Encapsulates working with a container database.

Note that this may involve multiple on-disk DB files if the container becomes sharded:

  • _db_file is the path to the legacy container DB name, i.e. <hash>.db. This file should exist for an initialised broker that has never been sharded, but will not exist once a container has been sharded.

  • db_files is a list of existing db files for the broker. This list should have at least one entry for an initialised broker, and should have two entries while a broker is in SHARDING state.

  • db_file is the path to whichever db is currently authoritative for the container. Depending on the container’s state, this may not be the same as the db_file argument given to __init__(); however, if force_db_file is True then db_file is always equal to the db_file argument given to __init__().

  • pending_file is always equal to _db_file extended with .pending, i.e. <hash>.db.pending.
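The rules for db_file and pending_file can be illustrated with a small pure-Python sketch. The most recent epoch's file is authoritative unless no db files exist or force_db_file is True, in which case the constructor's db_file argument is used. The helpers resolve_db_file and pending_file_for are hypothetical and not part of ContainerBroker. They assume existing_db_files is ordered by ascending epoch, as db_files is documented to be.

```python
def resolve_db_file(db_file_arg, existing_db_files, force_db_file=False):
    """Hypothetical helper mirroring the documented db_file rules.

    existing_db_files stands in for the broker's db_files list: paths that
    exist on disk, ordered by ascending epoch (may be empty).
    """
    if force_db_file or not existing_db_files:
        # Fall back to the path given to the constructor.
        return db_file_arg
    # Otherwise the db for the most recent epoch is authoritative.
    return existing_db_files[-1]


def pending_file_for(legacy_db_file):
    """Hypothetical helper: pending_file derives from the legacy <hash>.db."""
    return legacy_db_file + '.pending'
```

Note that the pending file is always derived from the legacy path, so it does not move when a newer epoch db becomes authoritative.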

classmethod create_broker(device_path, part, account, container, logger=None, epoch=None, put_timestamp=None, storage_policy_index=None)

Create a ContainerBroker instance. If the db doesn’t exist, initialize the db file.

Parameters
  • device_path – device path

  • part – partition number

  • account – account name string

  • container – container name string

  • logger – a logger instance

  • epoch – a timestamp to include in the db filename

  • put_timestamp – initial timestamp if broker needs to be initialized

  • storage_policy_index – the storage policy index

Returns

a tuple of (broker, initialized) where broker is an instance of swift.container.backend.ContainerBroker and initialized is True if the db file was initialized, False otherwise.

create_container_info_table(conn, put_timestamp, storage_policy_index)

Create the container_info table which is specific to the container DB. Not a part of Pluggable Back-ends, internal to the baseline code. Also creates the container_stat view.

Parameters
  • conn – DB connection object

  • put_timestamp – put timestamp

  • storage_policy_index – storage policy index

create_object_table(conn)

Create the object table which is specific to the container DB. Not a part of Pluggable Back-ends, internal to the baseline code.

Parameters

conn – DB connection object

create_policy_stat_table(conn, storage_policy_index=0)

Create policy_stat table.

Parameters
  • conn – DB connection object

  • storage_policy_index – the policy_index the container is being created with

create_shard_range_table(conn)

Create the shard_range table which is specific to the container DB.

Parameters

conn – DB connection object

db_contains_type = 'object'
property db_epoch
property db_file

Get the path to the primary db file for this broker. This is typically the db file for the most recent sharding epoch. However, if no db files exist on disk, or if force_db_file was True when the broker was constructed, then the primary db file is the file passed to the broker constructor.

Returns

A path to a db file; the file does not necessarily exist.

property db_files

Gets the cached list of valid db files that exist on disk for this broker.

The cached list may be refreshed by calling reload_db_files().

Returns

A list of paths to db files ordered by ascending epoch; the list may be empty.

db_reclaim_timestamp = 'created_at'
db_type = 'container'
delete_meta_whitelist = ['x-container-sysmeta-shard-quoted-root', 'x-container-sysmeta-shard-root', 'x-container-sysmeta-sharding']
delete_object(name, timestamp, storage_policy_index=0)

Mark an object deleted.

Parameters
  • name – object name to be deleted

  • timestamp – timestamp when the object was marked as deleted

  • storage_policy_index – the storage policy index for the object

empty()

Check if container DB is empty.

This method uses more stringent checks on object count than is_deleted(): this method checks that there are no objects in any policy; if the container is in the process of sharding then both fresh and retiring databases are checked to be empty; if a root container has shard ranges then they are checked to be empty.

Returns

True if the database has no active objects, False otherwise

enable_sharding(epoch)

Updates this broker’s own shard range with the given epoch, sets its state to SHARDING and persists it in the DB.

Parameters

epoch – a Timestamp

Returns

the broker’s updated own shard range.

find_shard_ranges(shard_size, limit=-1, existing_ranges=None, minimum_shard_size=1)

Scans the container db for shard ranges. Scanning will start at the upper bound of the last of any existing_ranges that are given, otherwise at ShardRange.MIN. Scanning will stop when limit shard ranges have been found or when no more shard ranges can be found. In the latter case, the upper bound of the final shard range will be equal to the upper bound of the container namespace.

This method does not modify the state of the db; callers are responsible for persisting any shard range data in the db.

Parameters
  • shard_size – the size of each shard range

  • limit – the maximum number of shard points to be found; a negative value (default) implies no limit.

  • existing_ranges – an optional list of existing ShardRanges; if given, this list should be sorted in order of upper bounds; the scan for new shard ranges will start at the upper bound of the last existing ShardRange.

  • minimum_shard_size – Minimum size of the final shard range. If this is greater than one then the final shard range may be extended to more than shard_size in order to avoid a further shard range with fewer than minimum_shard_size rows.

Returns

a tuple; the first value in the tuple is a list of dicts each having keys {‘index’, ‘lower’, ‘upper’, ‘object_count’} in order of ascending ‘upper’; the second value in the tuple is a boolean which is True if the last shard range has been found, False otherwise.
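The scanning rule can be modelled over an in-memory, sorted list of object names. This is a simplified sketch, not the broker's SQL-based implementation: find_shard_ranges_sketch is a hypothetical name, '' stands in for both ShardRange.MIN and ShardRange.MAX, and existing_ranges is not modelled.

```python
def find_shard_ranges_sketch(names, shard_size, limit=-1, minimum_shard_size=1):
    """Hypothetical model of find_shard_ranges over sorted object names.

    '' is used for ShardRange.MIN (as a lower bound) and ShardRange.MAX
    (as an upper bound). existing_ranges is not modelled.
    """
    ranges = []
    lower, start = '', 0
    while limit < 0 or len(ranges) < limit:
        remaining = len(names) - start
        if remaining < shard_size + minimum_shard_size:
            # Final range: absorb the tail (possibly more than shard_size
            # objects) and extend to the end of the namespace.
            ranges.append({'index': len(ranges), 'lower': lower,
                           'upper': '', 'object_count': remaining})
            return ranges, True
        # Each range ends at (and includes) its shard_size-th object name.
        upper = names[start + shard_size - 1]
        ranges.append({'index': len(ranges), 'lower': lower,
                       'upper': upper, 'object_count': shard_size})
        lower, start = upper, start + shard_size
    # Stopped by limit before the end of the namespace was reached.
    return ranges, False
```

For example, with five names, shard_size=2 and minimum_shard_size=2 the sketch yields ranges of 2 and 3 objects, because splitting again would leave a final range of a single object.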

get_all_shard_range_data()

Returns a list of all shard range data, including own shard range and deleted shard ranges.

Returns

A list of dict representations of a ShardRange.

get_brokers()

Return a list of brokers for component dbs. The list has two entries while the db state is sharding: the first entry is a broker for the retiring db with skip_commits set to True; the second entry is a broker for the fresh db with skip_commits set to False. For any other db state the list has one entry.

Returns

a list of ContainerBroker

get_db_state()

Returns the current state of on disk db files.

get_db_version(conn)
get_info()

Get global data for the container.

Returns

dict with keys: account, container, created_at, put_timestamp, delete_timestamp, status, status_changed_at, object_count, bytes_used, reported_put_timestamp, reported_delete_timestamp, reported_object_count, reported_bytes_used, hash, id, x_container_sync_point1, x_container_sync_point2, storage_policy_index and db_state.

get_info_is_deleted()

Get the is_deleted status and info for the container.

Returns

a tuple in the form (info, is_deleted), where info is a dict as returned by get_info and is_deleted is a boolean.

get_misplaced_since(start, count)

Get a list of objects which are in a storage policy different from the container’s storage policy.

Parameters
  • start – last reconciler sync point

  • count – maximum number of entries to get

Returns

list of dicts with keys: name, created_at, size, content_type, etag, storage_policy_index

get_objects(limit=None, marker='', end_marker='', include_deleted=None, since_row=None)

Returns a list of objects, including deleted objects, in all policies. Each object in the list is described by a dict with keys {‘name’, ‘created_at’, ‘size’, ‘content_type’, ‘etag’, ‘deleted’, ‘storage_policy_index’}.

Parameters
  • limit – maximum number of entries to get

  • marker – if set, objects with names less than or equal to this value will not be included in the list.

  • end_marker – if set, objects with names greater than or equal to this value will not be included in the list.

  • include_deleted – if True, include only deleted objects; if False, include only undeleted objects; otherwise (default), include both deleted and undeleted objects.

  • since_row – include only items whose ROWID is greater than the given row id; by default all rows are included.

Returns

a list of dicts, each describing an object.
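The marker, end_marker, include_deleted and since_row filters can be sketched over in-memory rows. filter_objects and its row dicts (with hypothetical 'rowid', 'name' and 'deleted' keys) are illustrative assumptions; the real method queries the object table.

```python
def filter_objects(rows, limit=None, marker='', end_marker='',
                   include_deleted=None, since_row=None):
    """Hypothetical sketch applying the documented get_objects() filters."""
    out = []
    for row in rows:
        # marker and end_marker are both exclusive bounds on the name.
        if marker and row['name'] <= marker:
            continue
        if end_marker and row['name'] >= end_marker:
            continue
        # Tri-state: True -> only deleted, False -> only undeleted,
        # None (default) -> both.
        if include_deleted is not None and bool(row['deleted']) != include_deleted:
            continue
        if since_row is not None and row['rowid'] <= since_row:
            continue
        out.append(row)
    return out if limit is None else out[:limit]
```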

get_own_shard_range(no_default=False)

Returns a shard range representing this broker’s own shard range. If no such range has been persisted in the broker’s shard ranges table then a default shard range representing the entire namespace will be returned.

The object_count and bytes_used of the returned shard range are not guaranteed to be up-to-date with the current object stats for this broker. Callers that require up-to-date stats should use the get_info method.

Parameters

no_default – if True and the broker’s own shard range is not found in the shard ranges table then None is returned, otherwise a default shard range is returned.

Returns

an instance of ShardRange

get_policy_stats()
get_reconciler_sync()
get_replication_info()

Get information about the DB required for replication.

Returns

dict containing keys from get_info plus max_row and metadata

Note: get_info’s <db_contains_type>_count is translated to just “count”, and metadata is the raw string.

get_shard_ranges(marker=None, end_marker=None, includes=None, reverse=False, include_deleted=False, states=None, include_own=False, exclude_others=False, fill_gaps=False)

Returns a list of persisted shard ranges.

Parameters
  • marker – restricts the returned list to shard ranges whose namespace includes or is greater than the marker value.

  • end_marker – restricts the returned list to shard ranges whose namespace includes or is less than the end_marker value.

  • includes – restricts the returned list to the shard range that includes the given value; if includes is specified then marker and end_marker are ignored.

  • reverse – reverse the result order.

  • include_deleted – include items that have the delete marker set

  • states – if specified, restricts the returned list to shard ranges that have the given state(s); can be a list of ints or a single int.

  • include_own – boolean that governs whether the row whose name matches the broker’s path is included in the returned list. If True, that row is included, otherwise it is not included. Default is False.

  • exclude_others – boolean that governs whether the rows whose names do not match the broker’s path are included in the returned list. If True, those rows are not included, otherwise they are included. Default is False.

  • fill_gaps – if True, insert a modified copy of own shard range to fill any gap between the end of any found shard ranges and the upper bound of own shard range. Gaps enclosed within the found shard ranges are not filled.

Returns

a list of instances of swift.common.utils.ShardRange
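A shard range's namespace excludes its lower bound and includes its upper bound, i.e. (lower, upper]. The marker, end_marker and includes rules can be sketched over (lower, upper) tuples as below. The helper names are hypothetical, and '' is assumed to stand for the unbounded MIN/MAX ends.

```python
def in_namespace(name, lower, upper):
    """True if name falls in (lower, upper]; '' means unbounded (hypothetical)."""
    return (lower == '' or name > lower) and (upper == '' or name <= upper)


def filter_shard_ranges(ranges, marker=None, end_marker=None, includes=None):
    """Hypothetical sketch of marker/end_marker/includes for (lower, upper)."""
    if includes is not None:
        # includes takes precedence: marker and end_marker are ignored.
        return [r for r in ranges if in_namespace(includes, *r)]
    out = []
    for lower, upper in ranges:
        # Keep ranges that include, or are greater than, the marker.
        if marker and not (upper == '' or upper >= marker):
            continue
        # Keep ranges that include, or are less than, the end_marker.
        if end_marker and not (lower == '' or lower < end_marker):
            continue
        out.append((lower, upper))
    return out
```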

get_shard_usage()

Get the aggregate object stats for all shard ranges in states ACTIVE, SHARDING or SHRINKING.

Returns

a dict with keys {bytes_used, object_count}

get_sharding_sysmeta(key=None)

Returns sharding specific info from the broker’s metadata.

Parameters

key – if given the value stored under key in the sharding info will be returned.

Returns

either a dict of sharding info or the value stored under key in that dict.
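Sharding info is kept in the broker's ordinary metadata under a common key prefix. The sketch below assumes metadata is a dict mapping keys to (value, timestamp) pairs and that the prefix is 'X-Container-Sysmeta-Shard-' (consistent with 'x-container-sysmeta-shard-root' in delete_meta_whitelist). get_sharding_sysmeta_sketch is a hypothetical helper, not the broker method.

```python
# Assumed sharding sysmeta key prefix.
SHARDING_PREFIX = 'X-Container-Sysmeta-Shard-'


def get_sharding_sysmeta_sketch(metadata, key=None):
    """Hypothetical: collect sharding sysmeta from key -> (value, timestamp)."""
    info = {}
    for meta_key, (value, _timestamp) in metadata.items():
        # Compare case-insensitively, since header-style keys vary in case.
        if meta_key.lower().startswith(SHARDING_PREFIX.lower()):
            info[meta_key[len(SHARDING_PREFIX):]] = value
    if key is not None:
        return info.get(key)
    return info
```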

get_sharding_sysmeta_with_timestamps()

Returns sharding specific info from the broker’s metadata with timestamps.

Returns

a dict of sharding info with their timestamps.

has_multiple_policies()
is_empty_enough_to_reclaim()
is_old_enough_to_reclaim(now, reclaim_age)
is_own_shard_range(shard_range)
is_reclaimable(now, reclaim_age)

Check if the broker abstraction is empty, and has been marked deleted for at least a reclaim age.
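The reclaim check combines three conditions: the container is empty, it is marked deleted, and the deletion is at least reclaim_age old. The sketch below is a simplified, hypothetical model using float timestamps. In particular, treating the container as deleted when delete_timestamp is newer than put_timestamp is an assumption, not a statement of the broker's exact logic.

```python
def is_reclaimable_sketch(now, reclaim_age, put_timestamp, delete_timestamp,
                          object_count):
    """Hypothetical model of is_reclaimable() with float timestamps."""
    is_empty = object_count == 0
    # Assumption: deleted means the delete is newer than the last PUT.
    is_deleted = delete_timestamp > put_timestamp
    # The deletion must be at least reclaim_age seconds in the past.
    old_enough = delete_timestamp < now - reclaim_age
    return is_empty and is_deleted and old_enough
```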

is_root_container()

Returns True if this container is a root container, False otherwise.

A root container is a container that is not a shard of another container.

is_sharded()
list_objects_iter(limit, marker, end_marker, prefix, delimiter, path=None, storage_policy_index=0, reverse=False, include_deleted=False, since_row=None, transform_func=None, all_policies=False, allow_reserved=False)

Get a list of objects sorted by name starting at marker onward, up to limit entries. Entries will begin with the prefix and will not have the delimiter after the prefix.

Parameters
  • limit – maximum number of entries to get

  • marker – marker query

  • end_marker – end marker query

  • prefix – prefix query

  • delimiter – delimiter for query

  • path – if defined, will set the prefix and delimiter based on the path

  • storage_policy_index – storage policy index for query

  • reverse – reverse the result order.

  • include_deleted – if True, include only deleted objects; if False (default), include only undeleted objects; otherwise, include both deleted and undeleted objects.

  • since_row – include only items whose ROWID is greater than the given row id; by default all rows are included.

  • transform_func – an optional function that if given will be called for each object to get a transformed version of the object to include in the listing; should have same signature as _transform_record(); defaults to _transform_record().

  • all_policies – if True, include objects for all storage policies ignoring any value given for storage_policy_index

  • allow_reserved – if True, include names containing the reserved byte; by default such names are excluded.

Returns

list of tuples of (name, created_at, size, content_type, etag, deleted)
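The prefix and delimiter behaviour can be sketched over a plain list of names. This hypothetical helper returns strings rather than the documented tuples, reports each distinct "prefix up to and including the next delimiter" once as a subdir entry, and ignores marker, end_marker and storage policy filtering.

```python
def list_with_delimiter(names, prefix='', delimiter='', limit=None):
    """Hypothetical sketch of prefix/delimiter listing over object names."""
    out = []
    for name in sorted(names):
        if limit is not None and len(out) >= limit:
            break
        if not name.startswith(prefix):
            continue
        entry = name
        if delimiter:
            # Look for the delimiter only after the prefix.
            idx = name.find(delimiter, len(prefix))
            if idx >= 0:
                # Collapse to a subdir entry ending with the delimiter.
                entry = name[:idx + len(delimiter)]
        # Names sharing a subdir are adjacent once sorted, so one check
        # against the last entry is enough to report each subdir once.
        if not out or out[-1] != entry:
            out.append(entry)
    return out
```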

make_tuple_for_pickle(record)

Turn this db record dict into the format this service uses for pending pickles.

merge_items(item_list, source=None)

Merge items into the object table.

Parameters
  • item_list – list of dictionaries of {‘name’, ‘created_at’, ‘size’, ‘content_type’, ‘etag’, ‘deleted’, ‘storage_policy_index’, ‘ctype_timestamp’, ‘meta_timestamp’}

  • source – if defined, update incoming_sync with the source

merge_shard_ranges(shard_ranges)

Merge shard ranges into the shard range table.

Parameters

shard_ranges – a shard range or a list of shard ranges; each shard range should be an instance of ShardRange or a dict representation of a shard range having SHARD_RANGE_KEYS.

property path
put_object(name, timestamp, size, content_type, etag, deleted=0, storage_policy_index=0, ctype_timestamp=None, meta_timestamp=None)

Creates an object in the DB with its metadata.

Parameters
  • name – object name to be created

  • timestamp – timestamp of when the object was created

  • size – object size

  • content_type – object content-type

  • etag – object etag

  • deleted – if True, marks the object as deleted and sets the deleted_at timestamp to timestamp

  • storage_policy_index – the storage policy index for the object

  • ctype_timestamp – timestamp of when content_type was last updated

  • meta_timestamp – timestamp of when metadata was last updated

reload_db_files()

Reloads the cached list of valid on disk db files for this broker.

remove_objects(lower, upper, max_row=None)

Removes object records in the given namespace range from the object table.

Note that objects are removed regardless of their storage_policy_index.

Parameters
  • lower – defines the lower bound of object names that will be removed; names greater than this value will be removed; names less than or equal to this value will not be removed.

  • upper – defines the upper bound of object names that will be removed; names less than or equal to this value will be removed; names greater than this value will not be removed. The empty string is interpreted as there being no upper bound.

  • max_row – if specified, only rows with a ROWID less than or equal to max_row will be removed.
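The bounds used by remove_objects() can be checked with a pure-Python model. remove_objects_sketch is hypothetical and operates on in-memory row dicts rather than the object table; it returns the rows that would survive. Like the documented method, it ignores storage policy.

```python
def remove_objects_sketch(rows, lower, upper, max_row=None):
    """Hypothetical: return the rows left after remove_objects(lower, upper)."""
    def doomed(row):
        # Names in (lower, upper] are removed; '' as upper means no bound.
        in_range = row['name'] > lower and (upper == '' or row['name'] <= upper)
        # With max_row, only rows whose rowid is <= max_row are candidates.
        row_ok = max_row is None or row['rowid'] <= max_row
        return in_range and row_ok
    return [row for row in rows if not doomed(row)]
```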

reported(put_timestamp, delete_timestamp, object_count, bytes_used)

Update reported stats, available with container’s get_info.

Parameters
  • put_timestamp – put_timestamp to update

  • delete_timestamp – delete_timestamp to update

  • object_count – object_count to update

  • bytes_used – bytes_used to update

classmethod resolve_shard_range_states(states)

Given a list of values each of which may be the name of a state, the number of a state, or an alias, return the set of state numbers described by the list.

The following alias values are supported: ‘listing’ maps to all states that are considered valid when listing objects; ‘updating’ maps to all states that are considered valid for redirecting an object update; ‘auditing’ maps to all states that are considered valid for a shard container that is updating its own shard range table from a root (this currently maps to all states except FOUND).

Parameters

states – a list of values each of which may be the name of a state, the number of a state, or an alias

Returns

a set of integer state numbers, or None if no states are given

Raises

ValueError – if any value in the given list is neither a valid state nor a valid alias
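A sketch of the resolution logic, assuming the caller supplies both the state-name table and the alias table. resolve_states_sketch is hypothetical; the real tables belong to Swift, so any state numbers or alias contents used with this sketch should be taken from there rather than invented.

```python
def resolve_states_sketch(states, state_numbers, aliases):
    """Hypothetical: resolve names, numbers or aliases to state numbers.

    state_numbers maps state name -> int; aliases maps alias -> iterable
    of ints. Both tables are supplied by the caller.
    """
    if not states:
        return None
    valid_numbers = set(state_numbers.values())
    resolved = set()
    for value in states:
        if value in aliases:
            resolved.update(aliases[value])
        elif value in state_numbers:
            resolved.add(state_numbers[value])
        else:
            # Fall back to treating the value as a state number.
            try:
                number = int(value)
            except (TypeError, ValueError):
                raise ValueError('Invalid state %r' % (value,))
            if number not in valid_numbers:
                raise ValueError('Invalid state %r' % (value,))
            resolved.add(number)
    return resolved
```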

property root_account
property root_container
property root_path
set_sharded_state()

Unlinks the broker’s retiring DB file.

Returns

True if the retiring DB was successfully unlinked, False otherwise.

set_sharding_state()

Creates and initializes a fresh DB file in preparation for sharding a retiring DB. The broker’s own shard range must have an epoch timestamp for this method to succeed.

Returns

True if the fresh DB was successfully created, False otherwise.

set_sharding_sysmeta(key, value)

Updates the broker’s metadata stored under the given key prefixed with a sharding specific namespace.

Parameters
  • key – metadata key in the sharding metadata namespace.

  • value – metadata value

set_storage_policy_index(policy_index, timestamp=None)

Update the container_stat policy_index and status_changed_at.

set_x_container_sync_points(sync_point1, sync_point2)
sharding_initiated()

Returns True if a broker has shard range state that would be necessary for sharding to have been initiated, False otherwise.

sharding_required()

Returns True if a broker has shard range state that would be necessary for sharding to have been initiated but has not yet completed sharding, False otherwise.

property storage_policy_index
update_reconciler_sync(point)
swift.container.backend.merge_shards(shard_data, existing)

Compares shard_data with existing and updates shard_data with any items of existing that take precedence over the corresponding item in shard_data.

Parameters
  • shard_data – a dict representation of shard range that may be modified by this method.

  • existing – a dict representation of shard range.

Returns

True if shard data has any item(s) that are considered to take precedence over the corresponding item in existing

swift.container.backend.sift_shard_ranges(new_shard_ranges, existing_shard_ranges)

Compares new and existing shard ranges, updating the new shard ranges with any more recent state from the existing, and returns shard ranges sorted into those that need adding because they contain new or updated state and those that need deleting because their state has been superseded.

Parameters
  • new_shard_ranges – a list of dicts, each of which represents a shard range.

  • existing_shard_ranges – a dict mapping shard range names to dicts representing a shard range.

Returns

a tuple (to_add, to_delete); to_add is a list of dicts, each of which represents a shard range that is to be added to the existing shard ranges; to_delete is a set of shard range names that are to be deleted.

swift.container.backend.update_new_item_from_existing(new_item, existing)

Compare the data and meta related timestamps of a new object item with the timestamps of an existing object record, and update the new item with data and/or meta related attributes from the existing record if their timestamps are newer.

The multiple timestamps are encoded into a single string for storing in the ‘created_at’ column of the objects db table.

Parameters
  • new_item – A dict of object update attributes

  • existing – A dict of existing object attributes

Returns

True if any attributes of the new item dict were found to be newer than the existing and therefore not updated, otherwise False implying that the updated item is equal to the existing.
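The per-group comparison can be sketched with the three timestamps held in separate keys. This is a simplification: the real function decodes and re-encodes them from the created_at string and also handles swift_bytes, neither of which is modelled here. The key names data_ts, ctype_ts and meta_ts are hypothetical. In this sketch a tie goes to the existing record.

```python
def update_new_item_sketch(new_item, existing):
    """Hypothetical per-group last-writer-wins merge of object attributes."""
    newer = [True, True, True]
    if existing['data_ts'] >= new_item['data_ts']:
        # Existing data wins: copy its data attributes and timestamp.
        for key in ('size', 'etag', 'deleted', 'data_ts'):
            new_item[key] = existing[key]
        newer[0] = False
    if existing['ctype_ts'] >= new_item['ctype_ts']:
        # Existing content-type wins.
        new_item['content_type'] = existing['content_type']
        new_item['ctype_ts'] = existing['ctype_ts']
        newer[1] = False
    if existing['meta_ts'] >= new_item['meta_ts']:
        # Existing metadata timestamp wins.
        new_item['meta_ts'] = existing['meta_ts']
        newer[2] = False
    # True if any group in the new item was newer than the existing record.
    return any(newer)
```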

Container Replicator

class swift.container.replicator.ContainerReplicator(conf, logger=None)

Bases: swift.common.db_replicator.Replicator

brokerclass

alias of swift.container.backend.ContainerBroker

cleanup_post_replicate(broker, orig_info, responses)

Clean up the non-primary database from disk if needed.

Parameters
  • broker – the broker for the database we’re replicating

  • orig_info – snapshot of the broker replication info dict taken before replication

  • responses – a list of boolean success values for each replication request to other nodes

Returns

False if deletion of the database was attempted but unsuccessful, otherwise True.

datadir = 'containers'
default_port = 6201
delete_db(broker)

Ensure that reconciler databases are only cleaned up at the end of the replication run.

dump_to_reconciler(broker, point)

Look for object rows for object updates in the wrong storage policy in broker, with a ROWID greater than the rowid given as point.

Parameters
  • broker – the container broker with misplaced objects

  • point – the last verified reconciler_sync_point

Returns

the last successful enqueued rowid

feed_reconciler(container, item_list)

Add queue entries for rows in item_list to the local reconciler container database.

Parameters
  • container – the name of the reconciler container

  • item_list – the list of rows to enqueue

Returns

True if successfully enqueued

find_local_handoff_for_part(part)

Find a device in the ring that is on this node on which to place a partition. Preference is given to a device that is a primary location for the partition. If no such device is found then a local device with weight is chosen, and failing that any local device.

Parameters

part – a partition

Returns

a node entry from the ring
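The documented preference order can be sketched as below. pick_local_device and the dict shapes for ring nodes and local devices are hypothetical. The sketch returns None when there are no local devices at all, which is an assumption about an edge case the text does not cover.

```python
def pick_local_device(part_nodes, local_devs):
    """Hypothetical sketch of the local-handoff preference order.

    part_nodes: primary node dicts for the partition, each with an 'id'.
    local_devs: device dicts on this node, each with 'id' and 'weight'.
    """
    primary_ids = {node['id'] for node in part_nodes}
    # 1. Prefer a local device that is a primary for the partition.
    for dev in local_devs:
        if dev['id'] in primary_ids:
            return dev
    # 2. Otherwise, any local device with weight.
    for dev in local_devs:
        if dev['weight']:
            return dev
    # 3. Failing that, any local device at all.
    return local_devs[0] if local_devs else None
```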

get_reconciler_broker(timestamp)

Get a local instance of the reconciler container broker that is appropriate to enqueue the given timestamp.

Parameters

timestamp – the timestamp of the row to be enqueued

Returns

a local reconciler broker
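Reconciler queue entries are grouped into containers by timestamp, which is why the broker depends on the timestamp being enqueued. The sketch below assumes hourly buckets (a divisor of 3600 seconds) and a '<policy_index>:/<account>/<container>/<object>' entry name; both are assumptions about Swift's conventions, and the helper names are hypothetical.

```python
# Assumed bucket width, in seconds, for reconciler queue containers.
MISPLACED_DIVISOR = 3600


def reconciler_container_name(timestamp):
    """Hypothetical: map a row timestamp to its queue container name."""
    seconds = int(float(timestamp))
    # Round down to the start of the bucket.
    return str(seconds // MISPLACED_DIVISOR * MISPLACED_DIVISOR)


def reconciler_obj_name(policy_index, account, container, obj):
    """Hypothetical queue entry name encoding policy index and object path."""
    return '%d:/%s/%s/%s' % (policy_index, account, container, obj)
```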

replicate_reconcilers()

Ensure any items merged to reconciler containers during replication are pushed out to correct nodes and any reconciler containers that do not belong on this node are removed.

report_up_to_date(full_info)
run_once(*args, **kwargs)

Run a replication pass once.

server_type = 'container'
class swift.container.replicator.ContainerReplicatorRpc(root, datadir, broker_class, mount_check=True, logger=None)

Bases: swift.common.db_replicator.ReplicatorRpc

get_shard_ranges(broker, args)
merge_shard_ranges(broker, args)

Container Server

class swift.container.server.ContainerController(conf, logger=None)

Bases: swift.common.base_storage_server.BaseStorageServer

WSGI Controller for the container server.

DELETE(req)

Handle HTTP DELETE request.

GET(req)

Handle HTTP GET request.

The body of the response to a successful GET request contains a listing of either objects or shard ranges. The exact content of the listing is determined by a combination of request headers and query string parameters, as follows:

  • The type of the listing is determined by the X-Backend-Record-Type header. If this header has value shard then the response body will be a list of shard ranges; if this header has value auto, and the container state is sharding or sharded, then the listing will be a list of shard ranges; otherwise the response body will be a list of objects.

  • Both shard range and object listings may be filtered according to the constraints described below. However, the X-Backend-Ignore-Shard-Name-Filter header may be used to override the application of the marker, end_marker, includes and reverse parameters to shard range listings. These parameters will be ignored if the header has the value ‘sharded’ and the current db sharding state is also ‘sharded’. Note that this header does not override the states constraint on shard range listings.

  • The order of both shard range and object listings may be reversed by using a reverse query string parameter with a value in swift.common.utils.TRUE_VALUES.

  • Both shard range and object listings may be constrained to a name range by the marker and end_marker query string parameters. Object listings will only contain objects whose names are greater than any marker value and less than any end_marker value. Shard range listings will only contain shard ranges whose namespace is greater than or includes any marker value and is less than or includes any end_marker value.

  • Shard range listings may also be constrained by an includes query string parameter. If this parameter is present the listing will only contain shard ranges whose namespace includes the value of the parameter; any marker or end_marker parameters are ignored.

  • The length of an object listing may be constrained by the limit parameter. Object listings may also be constrained by prefix, delimiter and path query string parameters.

  • Shard range listings will include deleted shard ranges if and only if the X-Backend-Include-Deleted header value is one of swift.common.utils.TRUE_VALUES. Object listings never include deleted objects.

  • Shard range listings may be constrained to include only shard ranges whose state is specified by a query string states parameter. If present, the states parameter should be a comma-separated list of either the string or integer representation of STATES.

    Two alias values may be used in a states parameter value: listing will cause the listing to include all shard ranges in a state suitable for contributing to an object listing; updating will cause the listing to include all shard ranges in a state suitable to accept an object update.

    If either of these aliases is used then the shard range listing will, if necessary, be extended with a synthesised ‘filler’ range in order to satisfy the requested name range when insufficient actual shard ranges are found. Any ‘filler’ shard range will cover the otherwise uncovered tail of the requested name range and will point back to the same container.

  • Listings are not normally returned from a deleted container. However, the X-Backend-Override-Deleted header may be used with a value in swift.common.utils.TRUE_VALUES to force a shard range listing to be returned from a deleted container whose DB file still exists.

Parameters

req – an instance of swift.common.swob.Request

Returns

an instance of swift.common.swob.Response
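The listing-type and filter rules for GET can be condensed into a small decision function. plan_listing is hypothetical and takes plain dicts for headers and query parameters. TRUE_VALUES is assumed to match swift.common.utils.TRUE_VALUES, and db_state strings such as 'sharded' are assumed to be lowercase.

```python
# Assumed to mirror swift.common.utils.TRUE_VALUES.
TRUE_VALUES = {'true', '1', 'yes', 'on', 't', 'y'}


def plan_listing(headers, params, db_state):
    """Hypothetical sketch of how GET chooses and filters a listing."""
    record_type = headers.get('X-Backend-Record-Type')
    if record_type == 'shard' or (
            record_type == 'auto' and db_state in ('sharding', 'sharded')):
        listing = 'shard'
    else:
        listing = 'object'
    # The ignore header only affects shard range listings, and only when
    # both the header value and the db state are 'sharded'.
    ignore_header = headers.get('X-Backend-Ignore-Shard-Name-Filter')
    ignore_names = (listing == 'shard' and ignore_header == 'sharded'
                    and db_state == 'sharded')
    reverse = params.get('reverse', '').lower() in TRUE_VALUES
    return {
        'listing': listing,
        # marker, end_marker, includes and reverse are dropped together.
        'apply_name_filters': not ignore_names,
        'reverse': reverse and not ignore_names,
    }
```

The states constraint is deliberately absent from the returned plan: as the text notes, the ignore header does not override it.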

HEAD(req)

Handle HTTP HEAD request.

POST(req)

Handle HTTP POST request.

PUT(req)

Handle HTTP PUT request.

REPLICATE(req)

Handle HTTP REPLICATE request (JSON-encoded RPC calls for replication).

UPDATE(req)

Handle HTTP UPDATE request (merge_items RPCs coming from the proxy).

account_update(req, account, container, broker)

Update the account server(s) with latest container info.

Parameters
  • req – swob.Request object

  • account – account name

  • container – container name

  • broker – container DB broker object

Returns

an HTTPNotFound response object if all the account requests return a 404 error code; an HTTPBadRequest response object if the account cannot be updated due to a malformed header; otherwise None.

allowed_sync_hosts

The list of hosts we’re allowed to send syncs to. This can be overridden by data in self.realms_conf.

check_free_space(drive)
create_listing(req, out_content_type, info, resp_headers, metadata, container_list, container)
get_and_validate_policy_index(req)

Validate that the index supplied maps to a policy.

Returns

policy index from request, or None if not present

Raises

HTTPBadRequest – if the supplied index is bogus

realms_conf

ContainerSyncCluster instance for validating sync-to values.

save_headers = ['x-container-read', 'x-container-write', 'x-container-sync-key', 'x-container-sync-to']
server_type = 'container-server'
update_data_record(record)

Perform any mutations to container listing records that are common to all serialization formats, and return the record as a dict.

Converts created time to iso timestamp. Replaces size with ‘swift_bytes’ content type parameter.

Parameters

record – object entry record

Returns

modified record
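A sketch of the two conversions described above: the created time becomes an ISO-style timestamp, and a swift_bytes content-type parameter replaces the size. It assumes created_at is a plain float timestamp string (not an encoded multi-timestamp value) and uses illustrative listing keys (name, hash, bytes, content_type, last_modified). update_data_record_sketch is a hypothetical helper.

```python
from datetime import datetime, timezone


def update_data_record_sketch(record):
    """Hypothetical: turn (name, created_at, size, content_type, etag) into a dict."""
    name, created_at, size, content_type, etag = record[:5]
    parts = [p.strip() for p in content_type.split(';')]
    kept = [parts[0]]
    for param in parts[1:]:
        key, _, value = param.partition('=')
        if key == 'swift_bytes':
            # swift_bytes replaces the stored size and is dropped from the type.
            size = int(value)
        else:
            kept.append(param)
    # Assumes created_at is a plain float timestamp in seconds (UTC).
    modified = datetime.fromtimestamp(float(created_at), tz=timezone.utc)
    return {'name': name, 'hash': etag, 'bytes': size,
            'content_type': ';'.join(kept),
            'last_modified': modified.strftime('%Y-%m-%dT%H:%M:%S.%f')}
```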

swift.container.server.app_factory(global_conf, **local_conf)

paste.deploy app factory for creating WSGI container server apps

swift.container.server.gen_resp_headers(info, is_deleted=False)

Convert container info dict to headers.

swift.container.server.get_container_name_and_placement(req)

Split and validate path for a container.

Parameters

req – a swob request

Returns

a tuple of path parts as strings

swift.container.server.get_obj_name_and_placement(req)

Split and validate path for an object.

Parameters

req – a swob request

Returns

a tuple of path parts as strings

Container Reconciler

class swift.container.reconciler.ContainerReconciler(conf, logger=None, swift=None)

Bases: swift.common.daemon.Daemon

Move objects that are in the wrong storage policy.

can_reconcile_policy(policy_index)
ensure_object_in_right_location(q_policy_index, account, container, obj, q_ts, path, container_policy_index, source_ts, source_obj_status, source_obj_info, source_obj_iter, **kwargs)

Validate that the source object will satisfy the misplaced object queue entry and move it to the destination.

Parameters
  • q_policy_index – the policy_index for the source object

  • account – the account name of the misplaced object

  • container – the container name of the misplaced object

  • obj – the name of the misplaced object

  • q_ts – the timestamp of the misplaced object

  • path – the full path of the misplaced object for logging

  • container_policy_index – the policy_index of the destination

  • source_ts – the timestamp of the source object

  • source_obj_status – the HTTP status of the source object request

  • source_obj_info – the HTTP headers of the source object request

  • source_obj_iter – the body iter of the source object request

ensure_tombstone_in_right_location(q_policy_index, account, container, obj, q_ts, path, container_policy_index, source_ts, **kwargs)

Issue a DELETE request against the destination to match the misplaced DELETE against the source.

log_route = 'container-reconciler'
log_stats(force=False)

Dump stats to logger; this is a no-op when stats have already been logged in the last minute.

pop_queue(container, obj, q_ts, q_record)

Issue a delete object request to the container for the misplaced object queue entry.

Parameters
  • container – the misplaced objects container

  • obj – the name of the misplaced object

  • q_ts – the timestamp of the misplaced object

  • q_record – the timestamp of the queue entry

N.B. q_ts will normally be the same time as q_record except when an object was manually re-enqueued.

process_queue_item(q_container, q_entry, queue_item)

Process an entry and remove it from the queue on success.

Parameters
  • q_container – the queue container

  • q_entry – the raw_obj name from the q_container

  • queue_item – a parsed entry from the queue

reconcile()

Main entry point for concurrent processing of misplaced objects.

Iterate over all queue entries and delegate processing to spawned workers in the pool.

reconcile_object(info)

Process a possibly misplaced object write request. Determine the correct destination storage policy by checking with the primary containers. Check the source and destination, copying or deleting into the destination and cleaning up the source as needed.

This method wraps _reconcile_object for exception handling.

Parameters

info – a queue entry dict

Returns

True to indicate the request is fully processed successfully, otherwise False.

run_forever(*args, **kwargs)

Run the reconciler continuously until stopped.

run_once(*args, **kwargs)

Process every entry in the queue.

should_process(queue_item)

Check if a given entry should be handled by this process.

Parameters

queue_item – an entry from the queue

stats_log(metric, msg, *args, **kwargs)

Update stats tracking for metric and emit log message.

throw_tombstones(account, container, obj, timestamp, policy_index, path)

Issue a delete object request to the given storage_policy.

Parameters
  • account – the account name

  • container – the container name

  • obj – the object name

  • timestamp – the timestamp of the object to delete

  • policy_index – the policy index to direct the request to

  • path – the path to be used for logging

swift.container.reconciler.add_to_reconciler_queue(container_ring, account, container, obj, obj_policy_index, obj_timestamp, op, force=False, conn_timeout=5, response_timeout=15)

Add an object to the container reconciler’s queue. This will cause the container reconciler to move it from its current storage policy index to the correct storage policy index.

Parameters
  • container_ring – container ring

  • account – the misplaced object’s account

  • container – the misplaced object’s container

  • obj – the misplaced object

  • obj_policy_index – the policy index where the misplaced object currently is

  • obj_timestamp – the misplaced object’s X-Timestamp. We need this to ensure that the reconciler doesn’t overwrite a newer object with an older one.

  • op – the method of the operation (DELETE or PUT)

  • force – overwrite queue entries newer than obj_timestamp

  • conn_timeout – max time to wait for connection to container server

  • response_timeout – max time to wait for response from container server

Returns

the .misplaced_object container name on success, or False on failure. “Success” means a majority of containers got the update.

swift.container.reconciler.best_policy_index(headers)
swift.container.reconciler.cmp_policy_info(info, remote_info)

You have to squint to see it, but the general strategy is just:

    if either has been recreated:
        return the newest (of the recreated)
    else:
        return the oldest

I tried cleaning it up for a while, but settled on just writing a bunch of tests instead. Once you get an intuitive sense for the nuance here you can try to see whether there’s a better way to spell the boolean logic, but it all ends up looking sorta hairy.

Returns

-1 if info is correct, 1 if remote_info is better
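A minimal sketch of the general strategy only, written under loud assumptions: the helper names are hypothetical, timestamps are plain floats, "recreated" is taken to mean a PUT newer than a non-zero DELETE, and put_timestamp is used as the age key. The real function handles further cases (for example, deleted containers) that this sketch ignores.

```python
from typing import Mapping


def has_been_recreated(info: Mapping[str, float]) -> bool:
    # Hypothetical definition: a PUT newer than a real (non-zero) DELETE
    return info['put_timestamp'] > info['delete_timestamp'] > 0


def cmp_policy_info_sketch(info: Mapping[str, float],
                           remote_info: Mapping[str, float]) -> int:
    """Return -1 if info wins, 1 if remote_info wins (simplified strategy)."""
    recreated = has_been_recreated(info)
    remote_recreated = has_been_recreated(remote_info)
    if recreated or remote_recreated:
        if not recreated:
            return 1
        if not remote_recreated:
            return -1
        # Both recreated: the newest one wins
        return -1 if info['put_timestamp'] >= remote_info['put_timestamp'] else 1
    # Neither recreated: the oldest one wins
    return -1 if info['put_timestamp'] <= remote_info['put_timestamp'] else 1
```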

swift.container.reconciler.direct_delete_container_entry(container_ring, account_name, container_name, object_name, headers=None)

Talk directly to the primary container servers to delete a particular object listing. Does not talk to object servers; use this only when a container entry does not actually have a corresponding object.

swift.container.reconciler.get_reconciler_container_name(obj_timestamp)

Get the name of a container into which a misplaced object should be enqueued. The name is the object’s last modified time rounded down to the nearest hour.

Parameters

obj_timestamp – a string representation of the object’s ‘created_at’ time from its container db row.

Returns

a container name
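The hour bucketing can be sketched as below, assuming obj_timestamp is a plain decimal string of seconds since the epoch. Swift's internal timestamps can carry extra offset information that this hypothetical helper does not handle.

```python
HOUR = 3600  # bucket width: one hour, per the description above


def reconciler_container_name_sketch(obj_timestamp: str) -> str:
    # Hypothetical helper: truncate to whole seconds, then round down to the hour
    seconds = int(float(obj_timestamp))
    return str(seconds // HOUR * HOUR)
```

Every object modified within the same hour therefore lands in the same queue container, for example "1400001234.56789" maps to "1400000400".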

swift.container.reconciler.get_reconciler_content_type(op)
swift.container.reconciler.get_reconciler_obj_name(policy_index, account, container, obj)
swift.container.reconciler.get_row_to_q_entry_translator(broker)
swift.container.reconciler.incorrect_policy_index(info, remote_info)

Compare remote_info to info and decide if the remote storage policy index should be used instead of ours.

swift.container.reconciler.parse_raw_obj(obj_info)

Translate a reconciler container listing entry to a dictionary containing the parts of the misplaced object queue entry.

Parameters

obj_info – an entry in a container listing with the required keys: name, content_type, and hash

Returns

a queue entry dict with the keys: q_policy_index, account, container, obj, q_op, q_ts, q_record, and path
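The following sketch shows one way such an entry could be built and parsed. The object-name layout "<policy_index>:/<account>/<container>/<obj>" and the content types application/x-put and application/x-delete are assumptions made for illustration, the helper names are hypothetical, and q_ts and q_record (which the real function derives from other listing fields) are omitted.

```python
from typing import Dict, Union

# Assumed mapping between queue-entry content types and operations
OP_BY_CONTENT_TYPE = {
    'application/x-put': 'PUT',
    'application/x-delete': 'DELETE',
}


def make_queue_obj_name(policy_index: int, account: str,
                        container: str, obj: str) -> str:
    # Assumed layout: "<policy_index>:/<account>/<container>/<obj>"
    return '%d:/%s/%s/%s' % (policy_index, account, container, obj)


def parse_queue_entry_sketch(obj_info: Dict[str, str]) -> Dict[str, Union[int, str]]:
    policy_index, obj_path = obj_info['name'].split(':', 1)
    # Split off account and container; the object name may itself contain '/'
    account, container, obj = obj_path.lstrip('/').split('/', 2)
    q_op = OP_BY_CONTENT_TYPE[obj_info['content_type']]
    return {
        'q_policy_index': int(policy_index),
        'account': account,
        'container': container,
        'obj': obj,
        'q_op': q_op,
        'path': '/%s/%s/%s' % (account, container, obj),
    }
```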

swift.container.reconciler.slightly_later_timestamp(ts, offset=1)
swift.container.reconciler.translate_container_headers_to_info(headers)

Container Sharder

class swift.container.sharder.CleavingContext(ref, cursor='', max_row=None, cleave_to_row=None, last_cleave_to_row=None, cleaving_done=False, misplaced_done=False, ranges_done=0, ranges_todo=0)

Bases: object

Encapsulates metadata associated with the process of cleaving a retiring DB. This metadata includes:

  • ref: The unique part of the key that is used when persisting a serialized CleavingContext as sysmeta in the DB. The unique part of the key is based on the DB id. This ensures that each context is associated with a specific DB file. The unique part of the key is included in the CleavingContext but should not be modified by any caller.

  • cursor: the upper bound of the last shard range to have been cleaved from the retiring DB.

  • max_row: the retiring DB’s max row; this is updated to the value of the retiring DB’s max_row every time a CleavingContext is loaded for that DB, and may change during the process of cleaving the DB.

  • cleave_to_row: the value of max_row at the moment when cleaving starts for the DB. When cleaving completes (i.e. the cleave cursor has reached the upper bound of the cleaving namespace), cleave_to_row is compared to the current max_row: if the two values are not equal then rows have been added to the DB which may not have been cleaved, in which case the CleavingContext is reset and cleaving is re-started.

  • last_cleave_to_row: the minimum DB row from which cleaving should select objects to cleave; this is initially set to None i.e. all rows should be cleaved. If the CleavingContext is reset then the last_cleave_to_row is set to the current value of cleave_to_row, which in turn is set to the current value of max_row by a subsequent call to start. The repeated cleaving therefore only selects objects in rows greater than the last_cleave_to_row, rather than cleaving the whole DB again.

  • ranges_done: the number of shard ranges that have been cleaved from the retiring DB.

  • ranges_todo: the number of shard ranges that are yet to be cleaved from the retiring DB.
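The interplay of max_row, cleave_to_row and last_cleave_to_row can be sketched with a small hypothetical dataclass. It models only the row bookkeeping described above (not persistence, misplaced-object handling or the full done() check), and the method names needs_restart and row_selected are invented for this illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CleavingProgressSketch:
    """Hypothetical stand-in for the row bookkeeping of CleavingContext."""
    max_row: int
    cursor: str = ''
    cleave_to_row: Optional[int] = None
    last_cleave_to_row: Optional[int] = None
    ranges_done: int = 0
    ranges_todo: int = 0

    def start(self) -> None:
        # Remember how far the DB had got when this cleaving pass began
        self.cursor = ''
        self.ranges_done = 0
        self.ranges_todo = 0
        self.cleave_to_row = self.max_row

    def range_done(self, new_cursor: str) -> None:
        self.ranges_done += 1
        self.ranges_todo -= 1
        self.cursor = new_cursor

    def needs_restart(self) -> bool:
        # Rows were added during cleaving if max_row has moved on
        return self.cleave_to_row != self.max_row

    def reset(self) -> None:
        # The next pass only needs rows newer than those already cleaved
        self.cursor = ''
        self.ranges_done = 0
        self.ranges_todo = 0
        self.last_cleave_to_row = self.cleave_to_row

    def row_selected(self, row_id: int) -> bool:
        return self.last_cleave_to_row is None or row_id > self.last_cleave_to_row
```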

property cursor
delete(broker)
done()
classmethod load(broker)

Returns a CleavingContext tracking the cleaving progress of the given broker’s DB.

Parameters

broker – an instance of ContainerBroker

Returns

An instance of CleavingContext.

classmethod load_all(broker)

Returns all cleaving contexts stored in the broker’s DB.

Parameters

broker – an instance of ContainerBroker

Returns

list of tuples of (CleavingContext, timestamp)

property marker
range_done(new_cursor)
reset()
start()
store(broker)

Persists the serialized CleavingContext as sysmeta in the given broker’s DB.

Parameters

broker – an instance of ContainerBroker

class swift.container.sharder.ContainerSharder(conf, logger=None)

Bases: swift.container.sharder.ContainerSharderConf, swift.container.replicator.ContainerReplicator

Shards containers.

debug(broker, msg, *args, **kwargs)
error(broker, msg, *args, **kwargs)
exception(broker, msg, *args, **kwargs)
info(broker, msg, *args, **kwargs)
log_route = 'container-sharder'
run_forever(*args, **kwargs)

Run the container sharder until stopped.

run_once(*args, **kwargs)

Run the container sharder once.

warning(broker, msg, *args, **kwargs)
yield_objects(broker, src_shard_range, since_row=None, batch_size=None)

Iterates through all object rows in src_shard_range in name order yielding them in lists of up to batch_size in length. All batches of rows that are not marked deleted are yielded before all batches of rows that are marked deleted.

Parameters
  • broker – A ContainerBroker.

  • src_shard_range – A ShardRange describing the source range.

  • since_row – include only object rows whose ROWID is greater than the given row id; by default all object rows are included.

  • batch_size – The maximum number of object rows to include in each yielded batch; defaults to cleave_row_batch_size.

Returns

a generator of tuples of (list of rows, broker info dict)
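A minimal sketch of the batching order, assuming rows are plain dicts with ROWID, name and deleted keys. The helper is hypothetical, reads from an in-memory list rather than a broker, and yields only the row batches, not the broker info dict.

```python
from typing import Any, Dict, Iterator, List, Optional

Row = Dict[str, Any]


def yield_object_batches_sketch(rows: List[Row], since_row: Optional[int] = None,
                                batch_size: int = 2) -> Iterator[List[Row]]:
    # Keep only rows newer than since_row (all rows when since_row is None)
    selected = [r for r in rows if since_row is None or r['ROWID'] > since_row]
    # Undeleted rows first, then deleted rows; each group in name order
    for deleted in (0, 1):
        group = sorted((r for r in selected if r['deleted'] == deleted),
                       key=lambda r: r['name'])
        for i in range(0, len(group), batch_size):
            yield group[i:i + batch_size]
```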

yield_objects_to_shard_range(broker, src_shard_range, dest_shard_ranges)

Iterates through all object rows in src_shard_range to place them in destination shard ranges provided by the dest_shard_ranges function. Yields tuples of (batch of object rows, destination shard range in which those object rows belong, broker info).

If no destination shard range exists for a batch of object rows then tuples are yielded of (batch of object rows, None, broker info). This indicates to the caller that there are a non-zero number of object rows for which no destination shard range was found.

Note that the same destination shard range may be referenced in more than one yielded tuple.

Parameters
  • broker – A ContainerBroker.

  • src_shard_range – A ShardRange describing the source range.

  • dest_shard_ranges – A function which should return a list of destination shard ranges sorted in the order defined by sort_key().

Returns

a generator of tuples of (object row list, shard range, broker info dict) where shard_range may be None.
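The grouping into destination ranges can be sketched as below. Assumptions: names are placed in sorted order, a range is a (lower, upper) tuple whose lower bound is exclusive and upper bound inclusive, an upper of None means unbounded, and a plain list of ranges stands in for the dest_shard_ranges function. All names here are hypothetical.

```python
from typing import Iterator, List, Optional, Tuple

Range = Tuple[str, Optional[str]]  # (lower, upper); upper None means unbounded


def contains(rng: Range, name: str) -> bool:
    lower, upper = rng
    # Assumed semantics: lower bound exclusive, upper bound inclusive
    return name > lower and (upper is None or name <= upper)


def assign_names_sketch(names: List[str], dest_ranges: List[Range]
                        ) -> Iterator[Tuple[List[str], Optional[Range]]]:
    batch: List[str] = []
    current: Optional[Range] = None
    for name in sorted(names):
        dest = next((r for r in dest_ranges if contains(r, name)), None)
        # Start a new batch whenever the destination changes
        if batch and dest != current:
            yield batch, current
            batch = []
        batch.append(name)
        current = dest
    if batch:
        yield batch, current
```

Names falling in a gap between ranges come out with None, mirroring the "no destination shard range" case described above.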

class swift.container.sharder.ContainerSharderConf(conf=None)

Bases: object

percent_of_threshold(val)
classmethod validate_conf(namespace)
swift.container.sharder.combine_shard_ranges(new_shard_ranges, existing_shard_ranges)

Combines new and existing shard ranges based on most recent state.

Parameters
  • new_shard_ranges – a list of ShardRange instances.

  • existing_shard_ranges – a list of ShardRange instances.

Returns

a list of ShardRange instances.

swift.container.sharder.finalize_shrinking(broker, acceptor_ranges, donor_ranges, timestamp)

Update donor shard ranges to shrinking state and merge donors and acceptors to broker.

Parameters
  • broker – A ContainerBroker.

  • acceptor_ranges – A list of ShardRange that are to be acceptors.

  • donor_ranges – A list of ShardRange that are to be donors; these will have their state and timestamp updated.

  • timestamp – timestamp to use when updating donor state

swift.container.sharder.find_compactible_shard_sequences(broker, shrink_threshold, expansion_limit, max_shrinking, max_expanding, include_shrinking=False)

Find sequences of shard ranges that could be compacted into a single acceptor shard range.

This function does not modify shard ranges.

Parameters
  • broker – A ContainerBroker.

  • shrink_threshold – the number of rows below which a shard may be considered for shrinking into another shard

  • expansion_limit – the maximum number of rows that an acceptor shard range should have after other shard ranges have been compacted into it

  • max_shrinking – the maximum number of shard ranges that should be compacted into each acceptor; -1 implies unlimited.

  • max_expanding – the maximum number of acceptors to be found (i.e. the maximum number of sequences to be returned); -1 implies unlimited.

  • include_shrinking – if True then existing compactible sequences are included in the results; default is False.

Returns

A list of ShardRangeList, each containing a sequence of neighbouring shard ranges that may be compacted; the final shard range in the list is the acceptor.
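The idea of compacting neighbours into an acceptor can be illustrated with a simplified greedy sketch over a list of object counts. It is hypothetical and deliberately incomplete: it ignores shard states, max_expanding and include_shrinking, and returns index lists instead of ShardRangeList instances.

```python
from typing import List


def find_sequences_sketch(counts: List[int], shrink_threshold: int,
                          expansion_limit: int, max_shrinking: int = -1
                          ) -> List[List[int]]:
    """Group indexes of neighbouring ranges; the last index is the acceptor."""
    sequences: List[List[int]] = []
    i = 0
    while i < len(counts):
        if counts[i] >= shrink_threshold:
            i += 1  # too big to be a donor, so it cannot start a sequence
            continue
        seq, total = [i], counts[i]
        j = i + 1
        # Extend while the last member is small enough to donate, the
        # combined rows stay within the limit, and donors are not capped
        while (j < len(counts) and counts[seq[-1]] < shrink_threshold
               and total + counts[j] <= expansion_limit
               and (max_shrinking < 0 or len(seq) <= max_shrinking)):
            seq.append(j)
            total += counts[j]
            j += 1
        if len(seq) > 1:
            sequences.append(seq)
        i = seq[-1] + 1
    return sequences
```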

swift.container.sharder.find_overlapping_ranges(shard_ranges, exclude_parent_child=False, time_period=0)

Find all pairs of overlapping ranges in the given list.

Parameters
  • shard_ranges – A list of ShardRange

  • exclude_parent_child – If True then overlapping pairs that have a parent-child relationship within the past time period time_period are excluded from the returned set. Default is False.

  • time_period – the specified past time period in seconds. Value of 0 means all time in the past.

Returns

a set of tuples, each tuple containing ranges that overlap with each other.
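A pairwise overlap check can be sketched as below, assuming each range is a (lower, upper) tuple of strings with an exclusive lower bound and an inclusive upper bound. The helper names are hypothetical, and the sketch ignores exclude_parent_child and time_period.

```python
from itertools import combinations
from typing import List, Set, Tuple

Range = Tuple[str, str]  # (lower, upper); assumed lower exclusive, upper inclusive


def overlaps(a: Range, b: Range) -> bool:
    # (a_lower, a_upper] and (b_lower, b_upper] share some names
    return a[0] < b[1] and b[0] < a[1]


def find_overlapping_sketch(ranges: List[Range]) -> Set[Tuple[Range, ...]]:
    found: Set[Tuple[Range, ...]] = set()
    for a, b in combinations(sorted(ranges), 2):
        if overlaps(a, b):
            found.add((a, b))
    return found
```

Adjacent ranges such as ('', 'd') and ('d', 'k') do not overlap under these bound semantics, since 'd' belongs only to the first.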

swift.container.sharder.find_paths(shard_ranges)

Returns a list of all continuous paths through the shard ranges. An individual path may not necessarily span the entire namespace, but it will span a continuous namespace without gaps.

Parameters

shard_ranges – A list of ShardRange.

Returns

A list of ShardRangeList.

swift.container.sharder.find_paths_with_gaps(shard_ranges, within_range=None)

Find gaps in the shard ranges and pairs of shard range paths that lead to and from those gaps. For each gap a single pair of adjacent paths is selected. The concatenation of all selected paths and gaps will span the entire namespace with no overlaps.

Parameters
  • shard_ranges – a list of instances of ShardRange.

  • within_range – an optional ShardRange that constrains the search space; the method will only return gaps within this range. The default is the entire namespace.

Returns

A list of tuples of (start_path, gap_range, end_path) where start_path is a list of ShardRanges leading to the gap, gap_range is a ShardRange synthesized to describe the namespace gap, and end_path is a list of ShardRanges leading from the gap. When gaps start or end at the namespace minimum or maximum bounds, start_path and end_path may be ‘null’ paths that contain a single ShardRange covering either the minimum or maximum of the namespace.

swift.container.sharder.find_sharding_candidates(broker, threshold, shard_ranges=None)
swift.container.sharder.find_shrinking_candidates(broker, shrink_threshold, expansion_limit)
swift.container.sharder.is_sharding_candidate(shard_range, threshold)
swift.container.sharder.is_shrinking_candidate(shard_range, shrink_threshold, expansion_limit, states=None)
swift.container.sharder.make_shard_ranges(broker, shard_data, shards_account_prefix)
swift.container.sharder.process_compactible_shard_sequences(broker, sequences)

Transform the given sequences of shard ranges into a list of acceptors and a list of shrinking donors. For each given sequence the final ShardRange in the sequence (the acceptor) is expanded to accommodate the other ShardRanges in the sequence (the donors). The donors and acceptors are then merged into the broker.

Parameters
  • broker – A ContainerBroker.

  • sequences – A list of ShardRangeList.

swift.container.sharder.random()

Returns a random float x in the interval [0, 1).
swift.container.sharder.rank_paths(paths, shard_range_to_span)

Sorts the given list of paths such that the most preferred path is the first item in the list.

Parameters
  • paths – A list of ShardRangeList.

  • shard_range_to_span – An instance of ShardRange that describes the namespace that would ideally be spanned by a path. Paths that include this namespace will be preferred over those that do not.

Returns

A sorted list of ShardRangeList.

swift.container.sharder.sharding_enabled(broker)
swift.container.sharder.update_own_shard_range_stats(broker, own_shard_range)

Update the own_shard_range with the up-to-date object stats from the broker.

Note: this method does not persist the updated own_shard_range; callers should use broker.merge_shard_ranges if the updated stats need to be persisted.

Parameters
  • broker – an instance of ContainerBroker.

  • own_shard_range – an instance of ShardRange.

Returns

own_shard_range with up-to-date object_count and bytes_used.

Container Sync

class swift.container.sync.ContainerSync(conf, container_ring=None, logger=None)

Bases: swift.common.daemon.Daemon

Daemon to sync syncable containers.

This is done by scanning the local devices for container databases and checking for x-container-sync-to and x-container-sync-key metadata values. If they exist, newer rows since the last sync will trigger PUTs or DELETEs to the other container.

The actual syncing is slightly more complicated: it makes use of the three (or number-of-replicas) main nodes for a container without each node doing the exact same work, but also without missing work if one node happens to be down.

Two sync points are kept per container database. All rows between the two sync points trigger updates. Any rows newer than both sync points cause updates depending on the node’s position for the container (with three replicas each primary node does one third, and so on for other replica counts). After a sync run, the first sync point is set to the newest ROWID known and the second sync point is set to the newest ROWID for which all updates have been sent.

An example may help. Assume replica count is 3 and perfectly matching ROWIDs starting at 1.

First sync run, database has 6 rows:

  • SyncPoint1 starts as -1.

  • SyncPoint2 starts as -1.

  • No rows between points, so no “all updates” rows.

  • Six rows newer than SyncPoint1, so a third of the rows are sent by node 1, another third by node 2, remaining third by node 3.

  • SyncPoint1 is set as 6 (the newest ROWID known).

  • SyncPoint2 is left as -1 since no “all updates” rows were synced.

Next sync run, database has 12 rows:

  • SyncPoint1 starts as 6.

  • SyncPoint2 starts as -1.

  • The rows between -1 and 6 all trigger updates (most of which should short-circuit on the remote end as having already been done).

  • Six more rows newer than SyncPoint1, so a third of the rows are sent by node 1, another third by node 2, remaining third by node 3.

  • SyncPoint1 is set as 12 (the newest ROWID known).

  • SyncPoint2 is set as 6 (the newest “all updates” ROWID).

In this way, under normal circumstances each node sends its share of updates each run and just sends a batch of older updates to ensure nothing was missed.
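The two-sync-point bookkeeping in the example above can be sketched as follows. The function is hypothetical: it assumes gap-free ROWIDs starting at 1 and a zero-based node ordinal, and it divides the newer rows by ROWID modulo the replica count purely for illustration; the real daemon may divide rows differently.

```python
from typing import List, Tuple


def plan_sync_run_sketch(max_row: int, sync_point1: int, sync_point2: int,
                         replicas: int, ordinal: int
                         ) -> Tuple[List[int], List[int], int, int]:
    """Return (all_updates, my_share, new_sync_point1, new_sync_point2)."""
    row_ids = range(1, max_row + 1)  # assumes ROWIDs 1..max_row with no gaps
    # Rows between the two sync points: every node sends all of them
    all_updates = [r for r in row_ids if sync_point2 < r <= sync_point1]
    # Rows newer than SyncPoint1: each node sends only its own share.
    # Assumption: the share is picked by ROWID modulo the replica count.
    newer = [r for r in row_ids if r > sync_point1]
    my_share = [r for r in newer if r % replicas == ordinal]
    # SyncPoint2 advances past the "all updates" rows; SyncPoint1 to the newest row
    new_sync_point2 = all_updates[-1] if all_updates else sync_point2
    new_sync_point1 = newer[-1] if newer else sync_point1
    return all_updates, my_share, new_sync_point1, new_sync_point2
```

Run with the numbers from the example, the first run leaves SyncPoint2 at -1 and moves SyncPoint1 to 6; the second run resends rows 1 to 6 and ends with SyncPoint1 at 12 and SyncPoint2 at 6.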

Parameters
  • conf – The dict of configuration values from the [container-sync] section of the container-server.conf

  • container_ring – If None, the <swift_dir>/container.ring.gz will be loaded. This is overridden by unit tests.

allowed_sync_hosts

The list of hosts we’re allowed to send syncs to. This can be overridden by data in self.realms_conf

conf

The dict of configuration values from the [container-sync] section of the container-server.conf.

container_deletes

Number of successful DELETEs triggered.

container_failures

Number of containers that had a failure of some type.

container_puts

Number of successful PUTs triggered.

container_report(start, end, sync_point1, sync_point2, info, max_row)
container_ring

swift.common.ring.Ring for locating containers.

container_skips

Number of containers whose sync has been turned off, but are not yet cleared from the sync store.

container_stats

Per container stats. These are collected per container:

  • puts – the number of puts that were done for the container

  • deletes – the number of deletes that were done for the container

  • bytes – the total number of bytes transferred for the container

container_sync(path)

Checks the given path for a container database, determines if syncing is turned on for that database and, if so, sends any updates to the other container.

Parameters

path – the path to a container db

container_sync_row(row, sync_to, user_key, broker, info, realm, realm_key)

Sends the update indicated by the row to the sync_to container. The update can be either a delete or a put.

Parameters
  • row – The updated row in the local database triggering the sync update.

  • sync_to – The URL to the remote container.

  • user_key – The X-Container-Sync-Key to use when sending requests to the other container.

  • broker – The local container database broker.

  • info – The get_info result from the local container database broker.

  • realm – The realm from self.realms_conf, if there is one. If None, fall back to the older allowed_sync_hosts way of syncing.

  • realm_key – The realm key from self.realms_conf, if there is one. If None, fall back to the older allowed_sync_hosts way of syncing.

Returns

True on success

container_syncs

Number of containers with sync turned on that were successfully synced.

container_time

Maximum amount of time to spend syncing a container before moving on to the next one. If a container sync hasn’t finished in this time, it will be resumed on the next scan.

devices

Path to the local device mount points.

interval

Minimum time between full scans. This is to keep the daemon from running wild on near empty systems.

log_route = 'container-sync'
logger

Logger to use for container-sync log lines.

mount_check

Indicates whether mount points should be verified as actual mount points (normally true, false for tests and SAIO).

realms_conf

ContainerSyncCluster instance for validating sync-to values.

report()

Writes a report of the stats to the logger and resets the stats for the next report.

reported

Time of last stats report.

run_forever(*args, **kwargs)

Runs container sync scans until stopped.

run_once(*args, **kwargs)

Runs a single container sync scan.

select_http_proxy()
sync_store

ContainerSyncStore instance for iterating over synced containers

swift.container.sync.random()

Returns a random float x in the interval [0, 1).

Container Updater

class swift.container.updater.ContainerUpdater(conf, logger=None)

Bases: swift.common.daemon.Daemon

Update container information in account listings.

container_report(node, part, container, put_timestamp, delete_timestamp, count, bytes, storage_policy_index)

Report container info to an account server.

Parameters
  • node – node dictionary from the account ring

  • part – partition the account is on

  • container – container name

  • put_timestamp – put timestamp

  • delete_timestamp – delete timestamp

  • count – object count in the container

  • bytes – bytes used in the container

  • storage_policy_index – the policy index for the container

container_sweep(path)

Walk the path looking for container DBs and process them.

Parameters

path – path to walk

get_account_ring()

Get the account ring. Load it if it hasn’t been yet.

get_paths()

Get paths to all of the partitions on each drive to be processed.

Returns

a list of paths

process_container(dbfile)

Process a container, and update the information in the account.

Parameters

dbfile – container DB to process

run_forever(*args, **kwargs)

Run the updater continuously.

run_once(*args, **kwargs)

Run the updater once.

swift.container.updater.random()

Returns a random float x in the interval [0, 1).