Repository-based Job Launches, Ivan Furic, University of Florida (PowerPoint presentation)


SLIDE 1

Repository-based Job Launches

Ivan Furic University of Florida

SLIDE 2

Current approach (up to and including DC1.5)

  • Edit scripts, fcl drivers, etc. in a subdirectory of dunepro@dune-offline
  • The launch GUI logs into dune-offline
  • It executes the shell commands specified in the launch template:
    • cd to a specific subdirectory
    • execute a specific script
  • Issues:
    • Unsafe: anyone with dunepro k5login access can edit the files at any time, with no record of what happened
    • Not very reusable: launching from a different subdirectory, or executing a different script, requires creating a new launch template

SLIDE 3

Repository-based technique

  • Script specified in the launch template:
    • creates a new temporary directory in /tmp
    • checks out (clones) a repository
    • runs a pre-defined script: “scripts/submissionScript.sh”
  • Launch behavior is driven by a maximally generic launch script plus a number of command-line parameters specified in the POMS GUI
  • Previously, the launch script was a wrapper for jobsub_submit that changed behavior based on a few command-line parameters:
    • gensim / reco / mergeana, etc.
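The three steps of the repository-based technique (temporary directory, clone, fixed entry-point script) can be sketched in bash as below. This is a minimal illustration, not the script actually stored in the POMS launch template. The function name repo_launch is hypothetical, and the sketch assumes that the command-line parameters from the POMS GUI are forwarded unchanged to scripts/submissionScript.sh. Printing the commit hash is an addition made here. Because the temporary directory disappears after the launch, the hash ties the launch log back to the repository state.

```bash
#!/usr/bin/env bash
# Hypothetical helper repo_launch: clone a repository into a fresh temporary
# directory under /tmp, run its scripts/submissionScript.sh, then clean up.
repo_launch() {
    local repo_url="$1"
    shift                                   # remaining args go to the script
    local workdir rc

    workdir=$(mktemp -d /tmp/dunepro_launch.XXXXXX) || return 1

    if ! git clone --quiet "$repo_url" "$workdir/repo"; then
        echo "repo_launch: could not clone $repo_url" >&2
        rm -rf "$workdir"
        return 1
    fi

    # The temp directory is deleted below, so log which commit was launched.
    echo "Launching from commit $(git -C "$workdir/repo" rev-parse HEAD)"

    # Run the fixed entry point from inside the clone, forwarding the
    # command-line parameters supplied by the launch template.
    (cd "$workdir/repo" && bash scripts/submissionScript.sh "$@")
    rc=$?

    rm -rf "$workdir"
    return "$rc"
}
```

For the proof-of-principle repository, the call would look like repo_launch dunepro@dune-offline.fnal.gov:git/prod-repo.git followed by the POMS parameters.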
SLIDE 4

Proof-of-principle

  • Git repository is dunepro@dune-offline.fnal.gov:git/prod-repo.git
  • Read and write access requires dunepro k5login access
  • Test launch POMS template in POMS GUI: ikf_test_repo_prod
  • Test launch POMS job type in POMS GUI: ikf_test_prod_repo
  • Test launch POMS campaign in POMS GUI: ikf_test_prod_repo
  • Working “Hello World, here’s my env dump” launch:

    https://pomsgpvm01.fnal.gov/poms/list_launch_file?campaign_id=1177&fname=20180224_200131_ikfuric
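A scripts/submissionScript.sh that produces this kind of "Hello World" plus environment dump can be very small. The sketch below is a guess at what such a test script might contain, not a copy of the one used in the test launch. The wrapper function hello_env_dump is hypothetical and exists only so that the snippet can be sourced and tested. In the repository, the body would simply be the script itself.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for a "Hello World" scripts/submissionScript.sh.
# Wrapped in a function only so it can be sourced and tested.
hello_env_dump() {
    echo "Hello World from $(uname -n) in $(pwd)"
    echo "Arguments: $*"
    echo "---- environment (sorted) ----"
    env | sort
}
```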

SLIDE 5

New launch script

    #!/bin/bash
    # Set up the POMS jobsub wrapper from UPS
    . $(ups setup -z /grid/fermiapp/products/common/db poms_jobsub_wrapper)

    # args: NOTE: argument order matters
    echo -e "\nRunning\n $(basename "$0") $*"

    cd /dune/app/home/dunepro/protodune-sp/DC1.5 || exit 1

    if [ x"$1" = "x--recovery" ]
    then
        dataset="$2"
    else
        dataset=$(./new_files_in.sh -e dune -d dc1.5_input)
    fi

    # IKF: uncomment for debugging
    # (NB: currently active, so it overrides the dataset chosen above)
    dataset=dc1.5_input

    # numfiles=$(samweb -e dune count-definition-files ${dataset})
    numfiles=10

    echo "dataset=${dataset}"
    echo "numfiles=${numfiles}"

    # IKF: This was in the jobsub_submit; figure out later how it interacts
    # with role & subgroup. For starters, remove:
    # -l "priority=5" \
    jobsub_submit \
        -G dune \
        -e SAM_EXPERIMENT=dune \
        --role=Production \
        --subgroup=prod \
        --resource-provides=usage_model=OPPORTUNISTIC,DEDICATED \
        --expected-lifetime=24h \
        --memory 8000MB \
        --dataset_definition="${dataset}" \
        -N "${numfiles}" \
        file:///dune/app/home/dunepro/protodune-sp/DC1.5/mini_reco_lar.sh
    EXITCODE=$?
    echo "jobsub_submit terminated with exit code ${EXITCODE}"

SLIDE 6

Launch Example Output

Convention: environment variables whose names start with DUNEPRO_ modify launch behavior. NB: the launch script has not yet been modified to use this.
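The launch script could honour this convention, and also cover the "pass DUNEPRO_* env vars to worker nodes" to-do, by turning every exported DUNEPRO_* variable into an extra jobsub_submit argument. The sketch below uses the same -e NAME=VALUE form that the launch script already uses for SAM_EXPERIMENT=dune. The helper name dunepro_env_args is hypothetical. Whether values containing spaces reach the worker node intact has not been checked here.

```bash
#!/usr/bin/env bash
# Hypothetical helper dunepro_env_args: print one jobsub_submit argument per
# line, "-e" followed by NAME=VALUE, for every exported DUNEPRO_* variable.
dunepro_env_args() {
    local name
    # compgen -e PREFIX (bash builtin) lists exported variable names that start
    # with PREFIX; it prints nothing when none are set.
    while IFS= read -r name; do
        printf '%s\n' "-e" "${name}=${!name}"
    done < <(compgen -e DUNEPRO_)
}
```

In the launch script, this could be used as mapfile -t extra < <(dunepro_env_args), followed by adding "${extra[@]}" to the jobsub_submit argument list.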

SLIDE 7

Launch Example Output (2)

The temporary directory exists only during the launch. The only permanent record is in the git repository.

SLIDE 8

Modifying the behavior of a launch

  • POMS GUI:
    • Compose Campaign Stages
    • Click on the Edit button
    • Click on the Parameter Overrides Edit button
  • Parameter convention:
    • Key of format --name=
    • Value of format "many words"
    • scripts/cmd_ln_to_env.sh converts this to DUNEPRO_NAME="many words"
  • To do: pass the DUNEPRO_* environment variables to the worker nodes through jobsub_submit
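The bash function below illustrates the conversion performed by scripts/cmd_ln_to_env.sh. It is a reimplementation under assumptions, not the repository's script. Upper-casing the name follows from the --name= to DUNEPRO_NAME example. Mapping '-' to '_' and skipping anything that is not --key=value are choices made here.

```bash
#!/usr/bin/env bash
# Hypothetical reimplementation of the scripts/cmd_ln_to_env.sh convention:
#   --name="many words"  ->  export DUNEPRO_NAME="many words"
# Must run in the launch shell itself (sourced), so the exports persist.
cmd_ln_to_env() {
    local arg key value
    for arg in "$@"; do
        case "$arg" in
            --*=*)
                key=${arg%%=*}      # "--name"  (text before the first '=')
                key=${key#--}       # "name"
                value=${arg#*=}     # "many words" (text after the first '=')
                key=${key//-/_}     # assumption: dashes become underscores
                key=$(printf '%s' "$key" | tr '[:lower:]' '[:upper:]')
                if [[ ! $key =~ ^[A-Z0-9_]+$ ]]; then
                    echo "cmd_ln_to_env: skipping invalid key '$key'" >&2
                    continue
                fi
                export "DUNEPRO_${key}=${value}"
                ;;
            *)
                echo "cmd_ln_to_env: ignoring '$arg' (not --key=value)" >&2
                ;;
        esac
    done
}
```

The function has to run in the launch shell itself rather than in a subshell. Otherwise the exported variables are not visible to the later jobsub_submit call.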

SLIDE 9

Goal

One repository for job launch

  • Likely a single branch, reusable for all launches

One repository for worker node scripts

  • Multiple branches, likely at least 4:
    • Generator type (no input SAM dataset)
    • Reco / processing type (1 input file -> 1 output file)
    • Merge type (multiple input files -> 1 output file)
    • Analysis type (1 input file -> 1 “histogram” output file)
  • Keep everything as general as possible, and modify behavior via POMS parameter overrides (previous slide)
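With one branch per job type, a launch needs a small lookup from job type to worker-node branch, plus knowledge of whether that type takes an input SAM dataset. The sketch below shows one possible form. The four types come from the slide. The function names and branch names are hypothetical placeholders, since the worker-node repository does not exist yet.

```bash
#!/usr/bin/env bash
# Hypothetical mapping from job type to worker-node repository branch.
# Branch names are placeholders; only the four types come from the slide.
worker_branch_for() {
    case "$1" in
        gen|generator) echo "generator" ;;  # no input SAM dataset
        reco)          echo "reco" ;;       # 1 input file -> 1 output file
        merge)         echo "merge" ;;      # N input files -> 1 output file
        ana|analysis)  echo "analysis" ;;   # 1 input file -> 1 histogram file
        *)
            echo "worker_branch_for: unknown job type '$1'" >&2
            return 1
            ;;
    esac
}

# Succeeds (status 0) if the job type needs an input SAM dataset,
# fails with 1 for generator jobs and with 2 for unknown types.
needs_input_dataset() {
    local branch
    branch=$(worker_branch_for "$1") || return 2
    [ "$branch" != "generator" ]
}
```

A launch could then fetch only the branch it needs with git clone --branch "$(worker_branch_for reco)" followed by the worker-node repository URL, which is still to be decided.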

SLIDE 10

Next steps:

  • Launch from the dune-offline.fnal.gov DNS round-robin alias (done, thanks to Ken Herner’s intervention on pomsgpvm01)
  • Use submissionScript.sh command-line parameters to modify jobsub_submit behavior
  • Launch a “DC1.5-type” campaign
  • Retrieve log files and outputs for 10 worker node jobs
  • Split the worker node repository from the launch repository
  • Use the git archive option to generate the worker node tarball
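git archive writes the files of a single commit or branch to an archive. The archive leaves out the .git directory and any uncommitted edits, which suits a worker node tarball. A minimal sketch follows. The function name, the worker/ prefix and the argument order are assumptions, and the layout of the worker node repository is still open.

```bash
#!/usr/bin/env bash
# Hypothetical helper make_worker_tarball REPO REF OUT: write the files of
# commit/branch REF of repository REPO to the gzipped tarball OUT.
make_worker_tarball() {
    local repo="$1" ref="$2" out="$3"
    # git archive exports only committed content (no .git, no local edits);
    # --prefix places everything under worker/ inside the tarball.
    # Redirecting stdout keeps OUT relative to the caller's directory
    # (git -C changes git's working directory).
    git -C "$repo" archive --format=tar.gz --prefix=worker/ "$ref" > "$out"
}
```

Passing a branch name as REF would give one tarball per worker node branch.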