Large Object Support¶
Overview¶
Swift has a limit on the size of a single uploaded object; by default this is 5GB. However, the download size of a single object is virtually unlimited with the concept of segmentation. Segments of the larger object are uploaded and a special manifest file is created that, when downloaded, sends all the segments concatenated as a single object. This also offers much greater upload speed with the possibility of parallel uploads of the segments.
Dynamic Large Objects¶
Middleware that will provide Dynamic Large Object (DLO) support.
Using swift¶
The quickest way to try out this feature is to use the swift command-line tool included with the python-swiftclient library. You can use the -S option to specify the segment size to use when splitting a large file. For example:
swift upload test_container -S 1073741824 large_file
This would split the large_file into 1 GiB segments and begin uploading those segments in parallel. Once all the segments have been uploaded, swift will then create the manifest file so the segments can be downloaded as one.
So now, the following swift command would download the entire large object:
swift download test_container large_file
The swift command uses a strict convention for its segmented object support. In the above example it will upload all the segments into a second container named test_container_segments. These segments will have names like large_file/1290206778.25/21474836480/00000000, large_file/1290206778.25/21474836480/00000001, etc.
The main benefit of using a separate container is that the main container listings will not be polluted with all the segment names. The reason for using the segment name format of <name>/<timestamp>/<size>/<segment> is so that an upload of a new file with the same name won’t overwrite the contents of the first until the last moment when the manifest file is updated.
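The naming convention can be illustrated with a short sketch. The helper names below are hypothetical and are not part of python-swiftclient; the sketch only assumes the <name>/<timestamp>/<size>/<segment> layout and the eight-digit segment counter seen in the example names above.

```python
import time


def segment_name(object_name, timestamp, total_size, index):
    """Build a name of the form <name>/<timestamp>/<size>/<segment>."""
    # The index is zero-padded to eight digits so names sort in upload order.
    return "%s/%s/%d/%08d" % (object_name, timestamp, total_size, index)


def segment_names(object_name, total_size, segment_size, timestamp=None):
    """List the segment names for one upload of object_name."""
    if timestamp is None:
        # Assumption: any value that differs between uploads would do here.
        timestamp = "%.2f" % time.time()
    count = -(-total_size // segment_size)  # ceiling division
    return [segment_name(object_name, timestamp, total_size, i)
            for i in range(count)]
```

Because the timestamp differs between uploads, a second upload of the same file writes under a different prefix and leaves the earlier segments untouched until the manifest is switched over.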
swift will manage these segment files for you, deleting old segments on deletes and overwrites, etc. You can override this behavior with the --leave-segments option if desired; this is useful if you want to have multiple versions of the same large object available.
Direct API¶
You can also work with the segments and manifests directly with HTTP requests instead of having swift do that for you. You can just upload the segments like you would any other object and the manifest is just a zero-byte (not enforced) file with an extra X-Object-Manifest header.
All the object segments need to be in the same container, have a common object name prefix, and sort in the order in which they should be concatenated. Object names are sorted lexicographically as UTF-8 byte strings. They don’t have to be in the same container as the manifest file will be, which is useful to keep container listings clean as explained above with swift.
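The sorting rule is why segment names are usually zero-padded. The following sketch (the function name is illustrative only) sorts names as UTF-8 byte strings and shows how unpadded numeric suffixes end up out of order.

```python
def concatenation_order(names):
    """Sort names lexicographically as UTF-8 byte strings."""
    return sorted(names, key=lambda name: name.encode("utf-8"))


# Unpadded suffixes: "10" sorts before "2" because "1" < "2" byte-wise.
unpadded = ["myobject/%d" % i for i in (1, 2, 10)]

# Zero-padded suffixes keep numeric order and byte order aligned.
padded = ["myobject/%08d" % i for i in (1, 2, 10)]
```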
The manifest file is simply a zero-byte (not enforced) file with the extra X-Object-Manifest: <container>/<prefix> header, where <container> is the container the object segments are in and <prefix> is the common prefix for all the segments.
It is best to upload all the segments first and then create or update the manifest. In this way, the full object won’t be available for downloading until the upload is complete. Also, you can upload a new set of segments to a second location and then update the manifest to point to this new location. During the upload of the new segments, the original manifest will still be available to download the first set of segments.
Note
When updating a manifest object using a POST request, an X-Object-Manifest header must be included for the object to continue to behave as a manifest object.
The manifest file should have no content. However, this is not enforced. If the manifest path itself conforms to the container/prefix specified in X-Object-Manifest, and if the manifest has some content/data in it, it would also be considered a segment and the manifest’s content will be part of the concatenated GET response. The order of concatenation follows the usual DLO logic: segments are concatenated in the order returned when segment names are sorted.
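The concatenation rule can be modeled in memory. This is only a sketch of the behavior described above, not the middleware's code; the containers dictionary is a hypothetical stand-in for a cluster, mapping container names to {object name: bytes}.

```python
def dlo_body(containers, x_object_manifest):
    """Concatenate every object whose name starts with the manifest prefix."""
    # The header value has the form <container>/<prefix>.
    container, _, prefix = x_object_manifest.partition("/")
    objects = containers.get(container, {})
    # Select by prefix, then order names as UTF-8 byte strings.
    names = sorted(
        (name for name in objects if name.startswith(prefix)),
        key=lambda name: name.encode("utf-8"),
    )
    return b"".join(objects[name] for name in names)
```

With a prefix of myobject/ the manifest object named myobject is not selected; with a prefix of myobject (no trailing slash) it is, so any content it holds becomes part of the response.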
Here’s an example using curl with tiny 1-byte segments:
# First, upload the segments
curl -X PUT -H 'X-Auth-Token: <token>' http://<storage_url>/container/myobject/00000001 --data-binary '1'
curl -X PUT -H 'X-Auth-Token: <token>' http://<storage_url>/container/myobject/00000002 --data-binary '2'
curl -X PUT -H 'X-Auth-Token: <token>' http://<storage_url>/container/myobject/00000003 --data-binary '3'
# Next, create the manifest file
curl -X PUT -H 'X-Auth-Token: <token>' -H 'X-Object-Manifest: container/myobject/' http://<storage_url>/container/myobject --data-binary ''
# And now we can download the segments as a single object
curl -H 'X-Auth-Token: <token>' http://<storage_url>/container/myobject
class swift.common.middleware.dlo.GetContext(dlo, logger)¶
Bases: swift.common.wsgi.WSGIContext
get_or_head_response(req, x_object_manifest)¶
- Parameters
req – user’s request
x_object_manifest – as unquoted, native string
handle_request(req, start_response)¶
Take a GET or HEAD request, and if it is for a dynamic large object manifest, return an appropriate response.
Otherwise, simply pass it through.
Static Large Objects¶
Middleware that will provide Static Large Object (SLO) support.
This feature is very similar to Dynamic Large Object (DLO) support in that it allows the user to upload many objects concurrently and afterwards download them as a single object. It is different in that it does not rely on eventually consistent container listings to do so. Instead, a user defined manifest of the object segments is used.
Uploading the Manifest¶
After the user has uploaded the objects to be concatenated, a manifest is uploaded. The request must be a PUT with the query parameter:
?multipart-manifest=put
The body of this request will be an ordered list of segment descriptions in JSON format. The data to be supplied for each segment is either:
Key | Description
---|---
path | the path to the segment object (not including account), in the form /container/object_name
etag | (optional) the ETag given back when the segment object was PUT
size_bytes | (optional) the size of the complete segment object in bytes
range | (optional) the (inclusive) range within the object to use as a segment. If omitted, the entire object is used
Or:
Key | Description
---|---
data | base64-encoded data to be returned
Note
At least one object-backed segment must be included. If you’d like to create a manifest consisting purely of data segments, consider uploading a normal object instead.
The format of the list will be:
[{"path": "/cont/object",
"etag": "etagoftheobjectsegment",
"size_bytes": 10485760,
"range": "1048576-2097151"},
{"data": base64.b64encode("interstitial data")},
{"path": "/cont/another-object", ...},
...]
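A manifest body in this format could be assembled as follows. This is a minimal sketch: the helper names are hypothetical, and the paths and etag are the placeholder values from the example above, not real objects.

```python
import base64
import json


def object_segment(path, etag=None, size_bytes=None, byte_range=None):
    """Describe an object-backed segment; only path is required."""
    segment = {"path": path}
    if etag is not None:
        segment["etag"] = etag
    if size_bytes is not None:
        segment["size_bytes"] = size_bytes
    if byte_range is not None:
        segment["range"] = byte_range
    return segment


def data_segment(raw_bytes):
    """Describe an inline data segment; the value must be base64 text."""
    return {"data": base64.b64encode(raw_bytes).decode("ascii")}


# The JSON text that would be sent as the body of the manifest PUT.
manifest_body = json.dumps([
    object_segment("/cont/object", "etagoftheobjectsegment",
                   10485760, "1048576-2097151"),
    data_segment(b"interstitial data"),
    object_segment("/cont/another-object"),
])
```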
The number of object-backed segments is limited to max_manifest_segments (configurable in proxy-server.conf, default 1000). Each segment must be at least 1 byte. On upload, the middleware will HEAD every object-backed segment passed in to verify:
- the segment exists (i.e. the HEAD was successful);
- the segment meets minimum size requirements;
- if the user provided a non-null etag, the etag matches;
- if the user provided a non-null size_bytes, the size_bytes matches; and
- if the user provided a range, it is a singular, syntactically correct range that is satisfiable given the size of the object referenced.
For inlined data segments, the middleware verifies each is valid, non-empty base64-encoded binary data. Note that data segments do not count against max_manifest_segments.
Note that the etag and size_bytes keys are optional; if omitted, the verification is not performed. If any of the objects fail to verify (not found, size/etag mismatch, below minimum size, invalid range) then the user will receive a 4xx error response. If everything does match, the user will receive a 2xx response and the SLO object is ready for downloading.
Note that large manifests may take a long time to verify; historically, clients would need to use a long read timeout for the connection to give Swift enough time to send a final 201 Created or 400 Bad Request response. Now, clients should use the query parameters:
?multipart-manifest=put&heartbeat=on
to request that Swift send an immediate 202 Accepted response and periodic whitespace to keep the connection alive. A final response code will appear in the body. The format of the response body defaults to text/plain but can be either json or xml depending on the Accept header. An example body is as follows:
Response Status: 201 Created
Response Body:
Etag: "8f481cede6d2ddc07cb36aa084d9a64d"
Last Modified: Wed, 25 Oct 2017 17:08:55 GMT
Errors:
Or, as a json response:
{"Response Status": "201 Created",
"Response Body": "",
"Etag": "\"8f481cede6d2ddc07cb36aa084d9a64d\"",
"Last Modified": "Wed, 25 Oct 2017 17:08:55 GMT",
"Errors": []}
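Since the HTTP status of a heartbeat response is always 202 Accepted, a client has to read the real outcome from the body. The sketch below assumes the JSON form shown above and assumes the keep-alive whitespace simply precedes the JSON document; the function name is illustrative.

```python
import json


def parse_heartbeat_body(body):
    """Return (status_code, parsed_dict) from a JSON heartbeat body."""
    # json.loads skips leading and trailing whitespace, so the keep-alive
    # padding does not need to be stripped by hand.
    result = json.loads(body)
    # "201 Created" -> 201
    status_code = int(result["Response Status"].split()[0])
    return status_code, result
```

A caller would then treat a 2xx code as success and otherwise inspect the Errors list.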
Behind the scenes, on success, a JSON manifest generated from the user input is sent to object servers with an extra X-Static-Large-Object: True header and a modified Content-Type. The items in this manifest will include the etag and size_bytes for each segment, regardless of whether the client specified them for verification. The parameter swift_bytes=$total_size will be appended to the existing Content-Type, where $total_size is the sum of all the included segments’ size_bytes. This extra parameter will be hidden from the user.
Manifest files can reference objects in separate containers, which will improve concurrent upload speed. Objects can be referenced by multiple manifests. The segments of a SLO manifest can even be other SLO manifests. Treat them as any other object i.e., use the Etag and Content-Length given on the PUT of the sub-SLO in the manifest to the parent SLO.
While uploading a manifest, a user can send Etag for verification. It needs to be the md5 of the segments’ etags, if there is no range specified. For example, if the manifest to be uploaded looks like this:
[{"path": "/cont/object1",
"etag": "etagoftheobjectsegment1",
"size_bytes": 10485760},
{"path": "/cont/object2",
"etag": "etagoftheobjectsegment2",
"size_bytes": 10485760}]
The Etag of the above manifest would be the md5 of etagoftheobjectsegment1 and etagoftheobjectsegment2 concatenated. This could be computed in the following way:
echo -n 'etagoftheobjectsegment1etagoftheobjectsegment2' | md5sum
If a manifest to be uploaded with a segment range looks like this:
[{"path": "/cont/object1",
"etag": "etagoftheobjectsegmentone",
"size_bytes": 10485760,
"range": "1-2"},
{"path": "/cont/object2",
"etag": "etagoftheobjectsegmenttwo",
"size_bytes": 10485760,
"range": "3-4"}]
While computing the Etag of the above manifest, internally each segment’s etag will be taken in the form of etagvalue:rangevalue;. Hence the Etag of the above manifest would be:
echo -n 'etagoftheobjectsegmentone:1-2;etagoftheobjectsegmenttwo:3-4;' \
| md5sum
For the purposes of Etag computations, inlined data segments are considered to have an etag of the md5 of the raw data (i.e., not base64-encoded).
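The three rules above (plain etags, etag:range; pairs, and data segments) can be combined into one sketch. It follows only what this section states; the function name is hypothetical, and any normalization the middleware may apply to range strings is not modeled.

```python
import base64
import hashlib


def slo_etag(segments):
    """Compute the manifest Etag from a list of segment descriptions."""
    md5 = hashlib.md5()
    for seg in segments:
        if "data" in seg:
            # Inline data: the etag is the md5 of the raw (decoded) bytes.
            raw = base64.b64decode(seg["data"])
            etag = hashlib.md5(raw).hexdigest()
        else:
            etag = seg["etag"]
        if "range" in seg:
            # Ranged segments contribute "etagvalue:rangevalue;".
            md5.update(("%s:%s;" % (etag, seg["range"])).encode("ascii"))
        else:
            md5.update(etag.encode("ascii"))
    return md5.hexdigest()
```

Applied to the two example manifests above, this yields the same digests as the md5sum commands.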
Range Specification¶
Users now have the ability to specify ranges for SLO segments. Users can include an optional range field in segment descriptions to specify which bytes from the underlying object should be used for the segment data. Only one range may be specified per segment.
Note
The etag and size_bytes fields still describe the backing object as a whole.
If a user uploads this manifest:
[{"path": "/con/obj_seg_1", "size_bytes": 2097152, "range": "0-1048576"},
{"path": "/con/obj_seg_2", "size_bytes": 2097152,
"range": "512-1550000"},
{"path": "/con/obj_seg_1", "size_bytes": 2097152, "range": "-2048"}]
The large object will consist of bytes 0 through 1048576 of /con/obj_seg_1 (1048577 bytes, since range offsets are zero-based and inclusive), followed by bytes 512 through 1550000 (inclusive) of /con/obj_seg_2, and finally bytes 2095104 through 2097151 (i.e., the last 2048 bytes) of /con/obj_seg_1.
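To check which bytes a given range selects, the arithmetic can be written out. This sketch assumes HTTP-style byte ranges (zero-based, inclusive offsets, with "-N" meaning the last N bytes); the function name is illustrative and this is not the middleware's own parser.

```python
def resolve_range(range_str, size):
    """Return the (first, last) inclusive byte offsets a range selects."""
    first, sep, last = range_str.partition("-")
    if not sep or (first == "" and last == ""):
        raise ValueError("not of the form M-N, M-, or -N: %r" % range_str)
    if first == "":
        # Suffix form "-N": the last N bytes of the object.
        length = min(int(last), size)
        start, end = size - length, size - 1
    else:
        start = int(first)
        # "M-" runs to the end; "M-N" is clamped to the object size.
        end = size - 1 if last == "" else min(int(last), size - 1)
    if start > end:
        raise ValueError("range %r is not satisfiable for size %d"
                         % (range_str, size))
    return start, end
```

For a 2097152-byte object, "0-1048576" selects offsets 0 through 1048576 and "-2048" selects offsets 2095104 through 2097151.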
Note
The minimum sized range is 1 byte. This is the same as the minimum segment size.
Inline Data Specification¶
When uploading a manifest, users can include ‘data’ segments that should be included along with objects. The data in these segments must be base64-encoded binary data and will be included in the etag of the resulting large object exactly as if that data had been uploaded and referenced as separate objects.
Note
This feature is primarily aimed at reducing the need for storing many tiny objects, and as such any supplied data must fit within the maximum manifest size (default is 8MiB). This maximum size can be configured via max_manifest_size in proxy-server.conf.
Retrieving a Large Object¶
A GET request to the manifest object will return the concatenation of the objects from the manifest much like DLO. If any of the segments from the manifest are not found or their Etag/Content-Length have changed since upload, the connection will drop. In this case a 409 Conflict will be logged in the proxy logs and the user will receive incomplete results. Note that this will be enforced regardless of whether the user performed per-segment validation during upload.
The headers from this GET or HEAD request will return the metadata attached to the manifest object itself with some exceptions:
Header | Value
---|---
Content-Length | the total size of the SLO (the sum of the sizes of the segments in the manifest)
X-Static-Large-Object | the string “True”
Etag | the etag of the SLO (generated the same way as DLO)
A GET request with the query parameter:
?multipart-manifest=get
will return a transformed version of the original manifest, containing additional fields and different key names. For example, the first manifest in the example above would look like this:
[{"name": "/cont/object",
"hash": "etagoftheobjectsegment",
"bytes": 10485760,
"range": "1048576-2097151"}, ...]
As you can see, some of the fields are renamed compared to the put request: path is name, etag is hash, size_bytes is bytes. The range field remains the same (if present).
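The renaming can be reversed mechanically, for example when a fetched listing is to be reused as the starting point for a new manifest. This is a sketch under the assumption that only the four keys named above are wanted; any additional fields in the listing are dropped. The function and constant names are hypothetical.

```python
# Key names in the ?multipart-manifest=get output mapped to the PUT names.
GET_TO_PUT_KEYS = {
    "name": "path",
    "hash": "etag",
    "bytes": "size_bytes",
    "range": "range",
}


def to_put_format(listing):
    """Rename the keys of a fetched manifest listing to the PUT key names."""
    converted = []
    for entry in listing:
        converted.append({
            GET_TO_PUT_KEYS[key]: value
            for key, value in entry.items()
            if key in GET_TO_PUT_KEYS
        })
    return converted
```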
A GET request with the query parameters:
?multipart-manifest=get&format=raw
will return the contents of the original manifest as it was sent by the client. The main purpose for both calls is solely debugging.
When the manifest object is uploaded you are more or less guaranteed that every segment in the manifest exists and matches the specifications. However, there is nothing that prevents the user from breaking the SLO download by deleting/replacing a segment referenced in the manifest. It is left to the user to use caution in handling the segments.
Deleting a Large Object¶
A DELETE request will just delete the manifest object itself. The segment data referenced by the manifest will remain unchanged.
A DELETE with a query parameter:
?multipart-manifest=delete
will delete all the segments referenced in the manifest and then the manifest itself. The failure response will be similar to the bulk delete middleware.
A DELETE with the query parameters:
?multipart-manifest=delete&async=yes
will schedule all the segments referenced in the manifest to be deleted asynchronously and then delete the manifest itself. Note that segments will continue to appear in listings and be counted for quotas until they are cleaned up by the object-expirer. This option is only available when all segments are in the same container and none of them are nested SLOs.
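The query strings used throughout this section can be built with the standard library rather than by hand. The helper below is a hypothetical convenience, and the storage URL in the usage note is a placeholder, not a real endpoint.

```python
from urllib.parse import quote, urlencode


def slo_url(storage_url, container, obj, params=None):
    """Build an object URL with optional SLO query parameters."""
    url = "%s/%s/%s" % (
        storage_url.rstrip("/"),
        quote(container, safe=""),
        quote(obj),  # "/" inside object names is left as-is
    )
    if params:
        # A dict keeps insertion order, so parameters appear as written.
        url += "?" + urlencode(params)
    return url
```

For instance, passing {"multipart-manifest": "delete", "async": "yes"} for an object named "big object" in container "cont" under http://example.invalid/v1/AUTH_test produces a URL ending in /cont/big%20object?multipart-manifest=delete&async=yes.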
Modifying a Large Object¶
PUT and POST requests will work as expected; PUTs will just overwrite the manifest object, for example.
Container Listings¶
In a container listing the size listed for SLO manifest objects will be the total_size of the concatenated segments in the manifest. The overall X-Container-Bytes-Used for the container (and subsequently for the account) will not reflect total_size of the manifest but the actual size of the JSON data stored. The reason for this somewhat confusing discrepancy is we want the container listing to reflect the size of the manifest object when it is downloaded. We do not, however, want to count the bytes-used twice (for both the manifest and the segments it’s referring to) in the container and account metadata which can be used for stats and billing purposes.
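The two numbers can be contrasted with a small calculation. This is a rough sketch: the exact JSON that Swift stores is not reproduced here, so the second function only approximates the stored size, and both function names are hypothetical. Segments without a range are assumed.

```python
import json


def listing_size(manifest):
    """Size shown for the manifest object in a container listing."""
    # total_size: the sum of the sizes of the referenced segments.
    return sum(segment["size_bytes"] for segment in manifest)


def bytes_used_contribution(manifest):
    """Approximate bytes the manifest adds to X-Container-Bytes-Used."""
    # Only the stored JSON document counts, not the segment data.
    return len(json.dumps(manifest).encode("utf-8"))
```

For a manifest referencing two 10485760-byte segments, the listing shows 20971520 bytes while the bytes-used total grows by only the length of the JSON document.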
class swift.common.middleware.slo.SloGetContext(slo)¶
Bases: swift.common.wsgi.WSGIContext
convert_segment_listing(resp_headers, resp_iter)¶
Converts the manifest data to match with the format that was put in through ?multipart-manifest=put
- Parameters
resp_headers – response headers
resp_iter – a response iterable
handle_slo_get_or_head(req, start_response)¶
Takes a request and a start_response callable and does the normal WSGI thing with them. Returns an iterator suitable for sending up the WSGI chain.
- Parameters
req – Request object; is a GET or HEAD request aimed at what may (or may not) be a static large object manifest.
start_response – WSGI start_response callable
class swift.common.middleware.slo.StaticLargeObject(app, conf, max_manifest_segments=1000, max_manifest_size=8388608, yield_frequency=10, allow_async_delete=False)¶
Bases: object
StaticLargeObject Middleware
See above for a full description.
The proxy logs created for any subrequests made will have swift.source set to “SLO”.
- Parameters
app – The next WSGI filter or app in the paste.deploy chain.
conf – The configuration dict for the middleware.
max_manifest_segments – The maximum number of segments allowed in newly-created static large objects.
max_manifest_size – The maximum size (in bytes) of newly-created static-large-object manifests.
yield_frequency – If the client included heartbeat=on in the query parameters when creating a new static large object, the period of time to wait between sending whitespace to keep the connection alive.
get_segments_to_delete_iter(req)¶
A generator function to be used to delete all the segments and sub-segments referenced in a manifest.
- Parameters
req – a Request with an SLO manifest in path
- Raises
HTTPPreconditionFailed – on invalid UTF8 in request path
HTTPBadRequest – on too many buffered sub segments and on invalid SLO manifest path
get_slo_segments(obj_name, req)¶
Performs a Request and returns the SLO manifest’s segments.
- Parameters
obj_name – the name of the object being deleted, as /container/object
req – the base Request
- Raises
HTTPServerError – on unable to load obj_name or on unable to load the SLO manifest data.
HTTPBadRequest – on not an SLO manifest
HTTPNotFound – on SLO manifest not found
- Returns
SLO manifest’s segments
handle_multipart_delete(req)¶
Will delete all the segments in the SLO manifest and then, if successful, will delete the manifest file.
- Parameters
req – a Request with an obj in path
- Returns
swob.Response whose app_iter is set to Bulk.handle_delete_iter
handle_multipart_get_or_head(req, start_response)¶
Handles the GET or HEAD of a SLO manifest.
The response body (only on GET, of course) will consist of the concatenation of the segments.
- Parameters
req – a Request with a path referencing an object
start_response – WSGI start_response callable
- Raises
HttpException – on errors
handle_multipart_put(req, start_response)¶
Will handle the PUT of a SLO manifest. Heads every object in the manifest to check if it is valid and if so will save a manifest generated from the user input. Uses WSGIContext to call self and start_response and returns a WSGI iterator.
- Parameters
req – a Request with an obj in path
start_response – WSGI start_response callable
- Raises
HttpException – on errors
swift.common.middleware.slo.parse_and_validate_input(req_body, req_path)¶
Given a request body, parses it and returns a list of dictionaries.
The output structure is nearly the same as the input structure, but it is not an exact copy. Given a valid object-backed input dictionary d_in, its corresponding output dictionary d_out will be as follows:
- d_out['etag'] == d_in['etag']
- d_out['path'] == d_in['path']
- d_in['size_bytes'] can be a string ("12") or an integer (12), but d_out['size_bytes'] is an integer.
- (optional) d_in['range'] is a string of the form "M-N", "M-", or "-N", where M and N are non-negative integers. d_out['range'] is the corresponding swob.Range object. If d_in does not have a key 'range', neither will d_out.
Inlined data dictionaries will have any extraneous padding stripped.
- Raises
HTTPException on parse errors or semantic errors (e.g. bogus JSON structure, syntactically invalid ranges)
- Returns
a list of dictionaries on success
Direct API¶
SLO support centers around the user generated manifest file. After the user has uploaded the segments into their account, a manifest file needs to be built and uploaded. All object segments must be at least 1 byte in size. Please see the SLO docs for Static Large Objects for further details.
Additional Notes¶
- With a GET or HEAD of a manifest file, the X-Object-Manifest: <container>/<prefix> header will be returned with the concatenated object so you can tell where it’s getting its segments from.
- When updating a manifest object using a POST request, an X-Object-Manifest header must be included for the object to continue to behave as a manifest object.
- The response’s Content-Length for a GET or HEAD on the manifest file will be the sum of all the segments in the <container>/<prefix> listing, dynamically. So, uploading additional segments after the manifest is created will cause the concatenated object to be that much larger; there’s no need to recreate the manifest file.
- The response’s Content-Type for a GET or HEAD on the manifest will be the same as the Content-Type set during the PUT request that created the manifest. You can easily change the Content-Type by reissuing the PUT.
- The response’s ETag for a GET or HEAD on the manifest file will be the MD5 sum of the concatenated string of ETags for each of the segments in the manifest (for DLO, from the listing <container>/<prefix>). Usually in Swift the ETag is the MD5 sum of the contents of the object, and that holds true for each segment independently. But it’s not meaningful to generate such an ETag for the manifest itself so this method was chosen to at least offer change detection.
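The Content-Length and ETag rules for a DLO can be sketched from a listing. The input shape here is an assumption: a list of entries with hash and bytes keys, already in sorted-name order, standing in for the <container>/<prefix> listing. The function name is illustrative.

```python
import hashlib


def dlo_summary(listing):
    """Return (content_length, etag) derived from a segment listing."""
    # Content-Length: the sum of the sizes of all listed segments.
    content_length = sum(entry["bytes"] for entry in listing)
    # ETag: MD5 of the concatenated string of the segments' ETags.
    joined = "".join(entry["hash"] for entry in listing)
    etag = hashlib.md5(joined.encode("ascii")).hexdigest()
    return content_length, etag
```

Uploading one more segment under the prefix changes both values on the next request, which is what makes the ETag usable for change detection.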
Note
If you are using the container sync feature you will need to ensure both your manifest file and your segment files are synced if they happen to be in different containers.
History¶
Dynamic large object support has gone through various iterations before settling on this implementation.
The primary factor driving the limitation of object size in Swift is maintaining balance among the partitions of the ring. To maintain an even dispersion of disk usage throughout the cluster the obvious storage pattern was to simply split larger objects into smaller segments, which could then be glued together during a read.
Before the introduction of large object support some applications were already splitting their uploads into segments and re-assembling them on the client side after retrieving the individual pieces. This design allowed the client to support backup and archiving of large data sets, but was also frequently employed to improve performance or reduce errors due to network interruption. The major disadvantage of this method is that knowledge of the original partitioning scheme is required to properly reassemble the object, which is not practical for some use cases, such as CDN origination.
In order to eliminate any barrier to entry for clients wanting to store objects larger than 5GB, initially we also prototyped fully transparent support for large object uploads. A fully transparent implementation would support a larger max size by automatically splitting objects into segments during upload within the proxy without any changes to the client API. All segments were completely hidden from the client API.
This solution introduced a number of challenging failure conditions into the cluster, wouldn’t provide the client with any option to do parallel uploads, and had no basis for a resume feature. The transparent implementation was deemed just too complex for the benefit.
The current “user manifest” design was chosen in order to provide a transparent download of large objects to the client and still provide the uploading client a clean API to support segmented uploads.
To meet as many use cases as possible Swift supports two types of large object manifests. Dynamic and static large object manifests both support the same idea of allowing the user to upload many segments to be later downloaded as a single file.
Dynamic large objects rely on a container listing to provide the manifest. This has the advantage of allowing the user to add or remove segments from the manifest at any time. It has the disadvantage of relying on eventually consistent container listings. All three copies of the container dbs must be updated for a complete list to be guaranteed. Also, all segments must be in a single container, which can limit concurrent upload speed.
Static large objects rely on a user provided manifest file. A user can upload objects into multiple containers and then reference those objects (segments) in a self generated manifest file. Future GETs to that file will download the concatenation of the specified segments. This has the advantage of being able to immediately download the complete object once the manifest has been successfully PUT. Being able to upload segments into separate containers also improves concurrent upload speed. It has the disadvantage that the manifest is finalized once PUT. Any change to it means it has to be replaced.
Between these two methods the user has great flexibility in how they choose to upload large objects to Swift and retrieve them. Swift does not, however, stop the user from harming themselves. In both cases the segments are deletable by the user at any time. If a segment was deleted by mistake, a dynamic large object, having no way of knowing it was ever there, would happily ignore the deleted file and the user would get an incomplete file. A static large object would, when failing to retrieve the object specified in the manifest, drop the connection and the user would receive partial results.