Wednesday, November 18, 2015

Recursively Find all the Files and Sizes of a Bucket on S3

Say I want to recurse through an S3 bucket, find all the file sizes, and sum them up. Easy:

s3cmd ls s3://your-s3-bucket/ --recursive | awk -F' ' '{s +=$3} END {print s}' 

The output of s3cmd ls looks like:

2015-11-15 12:22   4482528   s3://bucket/-4878692415071619643--6245724311294558574_479343588_data.0
2015-11-15 12:34  34398163   s3://bucket/-6827273792407145391--2667978502585357890_1957252193_data.0
2015-11-15 12:46   4558355   s3://bucket/2184012989583635362-3242759126742622102_1630577622_data.0
2015-11-15 12:59  13297607   s3://bucket/6147240539106964522-4824521201578762651_240049741_data.0

So you want to split each line on whitespace, take the size (the third field), and add it to a running total. That's what awk -F' ' '{s +=$3}' does: the -F' ' splits on whitespace (which is also awk's default), and s +=$3 adds each line's size to s. The END {print s} block prints the sum, in bytes, once the last line has been read.
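To try the summing step without touching a real bucket, here is a minimal sketch that feeds the four sample lines above into the same awk program. The listing variable is only a stand-in for the output of s3cmd ls. The printf "%.0f" in the END block is a small change from the original print s; it is there on the assumption that some awk implementations may print very large totals in exponent notation.

```shell
# Stand-in for the output of `s3cmd ls ... --recursive` (the sample lines above).
listing='2015-11-15 12:22   4482528   s3://bucket/-4878692415071619643--6245724311294558574_479343588_data.0
2015-11-15 12:34  34398163   s3://bucket/-6827273792407145391--2667978502585357890_1957252193_data.0
2015-11-15 12:46   4558355   s3://bucket/2184012989583635362-3242759126742622102_1630577622_data.0
2015-11-15 12:59  13297607   s3://bucket/6147240539106964522-4824521201578762651_240049741_data.0'

# Add up the third field (size in bytes) of every line.
# printf "%.0f" keeps the total in plain digits even when it gets large.
total=$(printf '%s\n' "$listing" | awk '{s += $3} END {printf "%.0f\n", s}')

echo "$total"   # prints 56736653
```

For the real thing, replace the printf '%s\n' "$listing" part with the s3cmd ls command from earlier.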
