author: Arun Isaac, 2025-01-03 22:13:03 +0000
committer: Arun Isaac, 2025-01-03 22:13:03 +0000
commit: 997bd4952195e80a080e4480be7006ddda6ac23e
tree: eec50e82879a2f4575644ac7e00e0661b036bebc
Initial commit
-rw-r--r--  README.md        28
-rw-r--r--  UNLICENSE        24
-rwxr-xr-x  globus-weblinks  42
-rw-r--r--  manifest.scm      2
4 files changed, 96 insertions, 0 deletions
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..fd98fab
--- /dev/null
+++ b/README.md
@@ -0,0 +1,28 @@
+This Python script is a quick hack to download data from Globus over HTTPS without setting up any Globus-specific tools. This is convenient if you simply want to download data from a Globus collection and don't wish to set up their complex proprietary tooling.
+
+# Dependencies
+
+[globus-sdk](https://pypi.org/project/globus-sdk/) is the only dependency. The easiest way to get it is with GNU Guix, for which you will need the [guix-bioinformatics channel](https://git.genenetwork.org/guix-bioinformatics/about/).
+```
+guix shell -m manifest.scm
+```
+
+# Find the endpoint ID of your collection
+
+Log in to the Globus web app, go to the `Collections` page, and find the collection you are interested in. When you click on it, you will be taken to an `Overview` page which will show the `UUID` of the collection. That is the endpoint ID.
+
+# Authorize app and get HTTPS links for all files in your collection
+
+Run the `globus-weblinks` script, passing in your endpoint ID. The script will prompt you for authorization. Once authorized, it will print HTTPS links to all your files. The prompts go to stderr, so redirecting stdout to a file captures only the links.
+```
+./globus-weblinks <YOUR-ENDPOINT-ID> > weblinks
+```
+
+# Download your files using wget
+
+You can now download your files using `wget`. But first, you will need cookies to authenticate the download. These have to be extracted from a browser session, which is somewhat cumbersome. Here's one way to do it. In the Globus web app, download any file from your collection while inspecting network traffic in the browser's developer tools. Copy the HTTPS request for the file by right-clicking it and selecting "Copy as cURL". One of the parameters in the copied curl command should be the cookie header we need. Use it with wget like so.
+```
+wget --header 'Cookie: mod_globus_OIDC=aloooooooooooongrandomcookiestring' -i weblinks
+```
+
+Enjoy!
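
Note on the README's wget step: the same download can be sketched in Python with only the standard library, which may help where `wget` is unavailable. This is a minimal sketch and not part of the commit. The helper names (`read_links`, `make_request`, `local_name`, `download_all`) are hypothetical, and it assumes, as the README describes, that a single `Cookie` header copied from the browser is enough to authenticate each request. Like `wget -i`, it saves everything into the current directory, so files with the same name in different directories would overwrite each other.

```python
import shutil
import urllib.request
from pathlib import PurePosixPath
from urllib.parse import unquote, urlsplit


def read_links(lines):
    """Return the non-empty, stripped lines of a weblinks listing."""
    return [line.strip() for line in lines if line.strip()]


def make_request(url, cookie):
    """Build a GET request carrying the browser session cookie."""
    return urllib.request.Request(url, headers={"Cookie": cookie})


def local_name(url):
    """Derive a local file name from the last component of the URL path."""
    return unquote(PurePosixPath(urlsplit(url).path).name)


def download_all(links, cookie):
    """Fetch every link into the current directory (performs network I/O)."""
    for url in links:
        with urllib.request.urlopen(make_request(url, cookie)) as response, \
                open(local_name(url), "wb") as out:
            shutil.copyfileobj(response, out)
```

To use it, read the `weblinks` file with `read_links(open("weblinks"))` and pass the result to `download_all` together with the cookie string from the copied curl command, for example `"mod_globus_OIDC=aloooooooooooongrandomcookiestring"`.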
diff --git a/UNLICENSE b/UNLICENSE
new file mode 100644
index 0000000..efb9808
--- /dev/null
+++ b/UNLICENSE
@@ -0,0 +1,24 @@
+This is free and unencumbered software released into the public domain.
+
+Anyone is free to copy, modify, publish, use, compile, sell, or
+distribute this software, either in source code form or as a compiled
+binary, for any purpose, commercial or non-commercial, and by any
+means.
+
+In jurisdictions that recognize copyright laws, the author or authors
+of this software dedicate any and all copyright interest in the
+software to the public domain. We make this dedication for the benefit
+of the public at large and to the detriment of our heirs and
+successors. We intend this dedication to be an overt act of
+relinquishment in perpetuity of all present and future rights to this
+software under copyright law.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
+IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR
+OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+OTHER DEALINGS IN THE SOFTWARE.
+
+For more information, please refer to <https://unlicense.org/>
diff --git a/globus-weblinks b/globus-weblinks
new file mode 100755
index 0000000..74fd6f0
--- /dev/null
+++ b/globus-weblinks
@@ -0,0 +1,42 @@
+#! /usr/bin/env python3
+
+import argparse
+from pathlib import PurePath
+import sys
+import globus_sdk
+
+# This is the tutorial client ID from
+# https://globus-sdk-python.readthedocs.io/en/stable/tutorial.html.
+# Let's not bother to create our own.
+CLIENT_ID = "61338d24-54d5-408f-a10d-66c06b59f6d2"
+
+def get_transfer_token():
+    client = globus_sdk.NativeAppAuthClient(CLIENT_ID)
+    client.oauth2_start_flow()
+
+    authorize_url = client.oauth2_get_authorize_url()
+    print(f"Please go to this URL and login:\n\n{authorize_url}\n",
+          file=sys.stderr)
+
+    print("Please enter the code you get after login here: ",
+          end="", file=sys.stderr)
+    auth_code = input().strip()
+    return (client.oauth2_exchange_code_for_tokens(auth_code)
+            .by_resource_server["transfer.api.globus.org"]["access_token"])
+
+def find_files(transfer_client, endpoint_id, path=PurePath("/")):
+    for file in transfer_client.operation_ls(endpoint_id, path=str(path))["DATA"]:
+        if file["type"] == "dir":
+            yield from find_files(transfer_client, endpoint_id, path / file["name"])
+        else:
+            yield path / file["name"]
+
+parser = argparse.ArgumentParser(description="Get web links for Globus collection")
+parser.add_argument("endpoint_id", metavar="endpoint-id", help="Endpoint ID of collection")
+args = parser.parse_args()
+
+transfer_client = globus_sdk.TransferClient(
+    authorizer=globus_sdk.AccessTokenAuthorizer(get_transfer_token()))
+endpoint = transfer_client.get_endpoint(args.endpoint_id)
+for path in find_files(transfer_client, args.endpoint_id):
+ print(endpoint["https_server"] + str(path))
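
Note on `find_files`: the recursive walk can be exercised without Globus access by swapping in a stub client. The sketch below is illustrative and not part of the commit. `FakeTransferClient` and `build_link` are hypothetical names; the stub only mimics the `operation_ls(...)["DATA"]` shape that the script relies on. Two deliberate differences from the script are assumptions of this sketch: it uses `PurePosixPath` so that paths keep forward slashes on any platform, and it percent-encodes the path with `urllib.parse.quote` so that names containing spaces still form valid URLs.

```python
from pathlib import PurePosixPath
from urllib.parse import quote


class FakeTransferClient:
    """Hypothetical stand-in for globus_sdk.TransferClient, for illustration only."""

    def __init__(self, tree):
        # tree maps a directory path to the entries a listing would return there.
        self.tree = tree

    def operation_ls(self, endpoint_id, path):
        return {"DATA": self.tree[path]}


def find_files(transfer_client, endpoint_id, path=PurePosixPath("/")):
    """Yield the path of every non-directory entry under path, depth first."""
    for entry in transfer_client.operation_ls(endpoint_id, path=str(path))["DATA"]:
        if entry["type"] == "dir":
            yield from find_files(transfer_client, endpoint_id, path / entry["name"])
        else:
            yield path / entry["name"]


def build_link(https_server, path):
    """Join the server base URL and a file path, percent-encoding the path."""
    return https_server + quote(str(path))


# A tiny made-up collection: one subdirectory and one top-level file.
tree = {
    "/": [{"type": "dir", "name": "reads"}, {"type": "file", "name": "README.txt"}],
    "/reads": [{"type": "file", "name": "sample 1.fastq"}],
}
client = FakeTransferClient(tree)
links = [build_link("https://example.org", p) for p in find_files(client, "dummy-id")]
```

Because the walk is a generator, links are produced as each directory listing arrives, so a large collection does not need to be held in memory before printing starts.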
diff --git a/manifest.scm b/manifest.scm
new file mode 100644
index 0000000..d07898a
--- /dev/null
+++ b/manifest.scm
@@ -0,0 +1,2 @@
+(specifications->manifest
+ (list "python" "python-globus-sdk"))