Commandline Options

Commandline options are grouped by General, Actions, Programs, Configs and Styles.

Some Actions are single actions: they exit after executing and ignore any other Actions. Others are sequential actions. See Actions.

Many options relate to only one specific Action. Options unrelated to the Actions being run are simply ignored.

Configs and Styles options are mostly the same as Config Options. You have to format commandline strings, following the rules specified there (according to Value Functions).

Note

For [LINE] options, it may not be easy to actually supply multiple values.

For example, an option in a configuration file becomes like this in bash:

opt=    aaa
        bbb

--opt $'aaa\nbbb'
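In shells without `$'...'` quoting, the same two-line value can be built another way. The following is a minimal sketch, not taken from the program's documentation: it uses `printf` inside a command substitution, and only prints the value instead of passing it to the program. `opt` is the placeholder option name from the example above.

```shell
# Build a two-line value without bash-specific $'...' quoting.
# printf interprets \n in its format string; command substitution strips
# only trailing newlines, so the newline between aaa and bbb is kept.
value="$(printf 'aaa\nbbb')"

# Show the value that would follow --opt as ONE argument (note the quotes).
printf '%s\n' "$value"
```

When passing it for real, keep the double quotes (`--opt "$value"`), otherwise the shell splits the value into two separate arguments.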

General

-i INPUT, --input INPUT

input rsrc (URL or file path). it can be specified multiple times

-f FILE, --file FILE

file to read inputs. only one file

default=rsrcs.txt

-h, --help

show this help message and exit

-v, --verbose

print out more detailed log messages

-q, --quiet

suppress non-critical log messages

-V, --version

print version and exit

--userdir USERDIR

specify user configuration directory

--nouserdir

disable user configuration (intended for testing)
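As an illustration of repeating `--input`, here is a minimal dry-run sketch. The executable name `tosixinch` is an assumption (this section does not name the program), and the URLs are placeholders. `--check` is added only because it needs rsrc supplied in some way.

```shell
# "tosixinch" is an ASSUMED executable name; substitute the real one.
# The URLs are placeholders.
set -- tosixinch \
    --input https://example.com/first.html \
    --input https://example.com/second.html \
    --check

# Dry run: print the command line instead of executing it.
# To run it for real, replace the next line with:  "$@"
printf '%s\n' "$*"
```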

Actions

-1, --download

download by downloader

-2, --extract

extract by extractor

-3, --convert

convert by converter

-4, --view

open a pdf viewer if configured

-a, --appcheck

print application settings after command line evaluation, and exit

-b, --browser

open first extracted html (efile) in browser, and exit

-c, --check

print matched rsrc settings, and exit (so you have to supply rsrc some way)

--toc

create toc htmls and a toc rsrc list file. conflicts with '--input'.

--inspect

parse downloaded htmls (dfiles), and do arbitrary things user specified

--printout {0,1,2,3,all}

print filenames the program's actions would create (0=rsrc, 1=dfiles, 2=efiles, 3=pdfname, all=0<tab>1<tab>2)

choices=0, 1, 2, 3, all
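A minimal dry-run sketch of combining several Actions in one call. It assumes the numbered Actions (`-1` to `-4`) are the sequential ones, which the numbering suggests but this section does not state. The executable name `tosixinch` and the URL are also assumptions.

```shell
# "tosixinch" is an ASSUMED executable name; the URL is a placeholder.
# Long forms are used for readability; -1, -2 and -3 are the short forms.
set -- tosixinch \
    --input https://example.com/article.html \
    --download --extract --convert

# Dry run: print the command line instead of executing it.
# To run it for real, replace the next line with:  "$@"
printf '%s\n' "$*"
```

By contrast, an Action described as "and exit" (for example `--appcheck` or `--check`) would end the run on its own.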

Programs

--urllib

set downloader to urllib (default)

--headless

set downloader to one of the headless browser engines (see --browser-engine)

--lxml

set extractor to lxml (default, and currently the only option)

--prince

set converter to princexml

--weasyprint

set converter to weasyprint

--cnvpath CNVPATH

specify converter executable path. you also need to select the converter itself (e.g. --prince or --weasyprint)

--css2 CSS2

specify css files, for converter commandline css option

--cnvopts CNVOPTS

specify additional converter commandline options
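Since `--cnvpath` does not select a converter by itself, a converter option has to accompany it. A minimal dry-run sketch, assuming the executable is called `tosixinch`; the URL and the converter path are placeholders:

```shell
# "tosixinch" is an ASSUMED executable name.
# The URL and the weasyprint path are placeholders for illustration.
set -- tosixinch \
    --input https://example.com/article.html \
    --download --extract --convert \
    --weasyprint \
    --cnvpath /usr/local/bin/weasyprint

# Dry run: print the command line instead of executing it.
# To run it for real, replace the next line with:  "$@"
printf '%s\n' "$*"
```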

Configs

--user-agent USER_AGENT

set http request user-agent (only for urllib)

--timeout TIMEOUT

set http request timeout (only for urllib)

--interval INTERVAL

interval for each download

--browser-engine {selenium-chrome,selenium-firefox}

specify the browser engine when ‘headless’ (default: selenium-firefox)

choices=selenium-chrome, selenium-firefox

--selenium-chrome-path SELENIUM_CHROME_PATH

specify the path of chromedriver for selenium

--selenium-firefox-path SELENIUM_FIREFOX_PATH

specify the path of geckodriver for selenium

--encoding ENCODING

specify encoding candidates for file opening when extracting (f: comma)

--encoding-errors {strict,ignore,replace,xmlcharrefreplace,backslashreplace,namereplace,surrogateescape,surrogatepass}

specify encoding error handler (default: strict)

choices=strict, ignore, replace, xmlcharrefreplace, backslashreplace, namereplace, surrogateescape, surrogatepass

--parts-download

download components (images etc.) before PDF conversion (default: True)

--no-parts-download

do not download components before PDF conversion

--force-download

force '--download' and '--parts-download' even if the file already exists

--guess GUESS

if there is no matched option, use this XPath for content selection (f: line)

--full-image FULL_IMAGE

pixel size to add special class attributes to images (default: 200)

--add-binary-extensions ADD_BINARY_EXTENSIONS

add or subtract to-skip-binaries-extension list (f: plus_binaries)

--add-clean-tags ADD_CLEAN_TAGS

add or subtract to-delete-tag list (f: plus)

--add-clean-attrs ADD_CLEAN_ATTRS

add or subtract to-delete-attribute list (f: plus)

--elements-to-keep-attrs ELEMENTS_TO_KEEP_ATTRS

specify elements (XPath) in which you want to keep attributes (default: <math>, <svg> and some mathjax tags) (f: line)

--ftype {html,prose,nonprose,python}

specify file type

choices=html, prose, nonprose, python

--textwidth TEXTWIDTH

width (number of characters) for rendering non-prose text

--textindent TEXTINDENT

line continuation marker for rendering non-prose text

--trimdirs TRIMDIRS

remove leading directories from the local text name in the PDF TOC. without a sign, remove that number of leading directories. with a minus sign, remove leading directories until the number of path segments is reduced to the absolute value. (default: 3)

--raw

use input paths as is (no filename transformation)

--pdfname PDFNAME

specify pdf file name

--precmd1 PRECMD1

run arbitrary commands before download action

--postcmd1 POSTCMD1

run arbitrary commands after download action

--precmd2 PRECMD2

run arbitrary commands before extract action

--postcmd2 POSTCMD2

run arbitrary commands after extract action

--precmd3 PRECMD3

run arbitrary commands before convert action

--postcmd3 POSTCMD3

run arbitrary commands after convert action

--viewcmd VIEWCMD

commandline string to open the pdf viewer (f: cmds)

--pre-each-cmd1 PRE_EACH_CMD1

run arbitrary commands before each download

--post-each-cmd1 POST_EACH_CMD1

run arbitrary commands after each download

--pre-each-cmd2 PRE_EACH_CMD2

run arbitrary commands before each extract

--post-each-cmd2 POST_EACH_CMD2

run arbitrary commands after each extract

--download-dir DOWNLOAD_DIR

specify root directory for download and extract (default: ‘_htmls’)

--keep-html

do not extract; keep the html as is, and only download components to make a complete local version of the html

--overwrite-html

do not create new ‘efile’ (overwrite ‘dfile’)
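A minimal dry-run sketch of a headless download, combining the `--headless` program option with the related Configs options above. The executable name `tosixinch`, the URL and the chromedriver path are assumptions for illustration.

```shell
# "tosixinch" is an ASSUMED executable name.
# The URL and the chromedriver path are placeholders.
set -- tosixinch \
    --input https://example.com/article.html \
    --download \
    --headless \
    --browser-engine selenium-chrome \
    --selenium-chrome-path /usr/local/bin/chromedriver

# Dry run: print the command line instead of executing it.
# To run it for real, replace the next line with:  "$@"
printf '%s\n' "$*"
```

`--user-agent` and `--timeout` are left out on purpose, since they are documented as applying only to urllib.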

Styles

--orientation {portrait,landscape}

portrait (default) or landscape, determine which size data to use

choices=portrait, landscape

--portrait-size PORTRAIT_SIZE

portrait size for css, e.g. ‘90mm 118mm’

--landscape-size LANDSCAPE_SIZE

landscape size for css, e.g. ‘118mm 90mm’

--toc-depth TOC_DEPTH

specify depth of table of contents

--font-family FONT_FAMILY

main font for css, e.g. ‘“DejaVu Sans”, sans-serif’

--font-mono FONT_MONO

monospace font for css

--font-serif FONT_SERIF

serif font for css (not used by sample)

--font-sans FONT_SANS

sans font for css (not used by sample)

--font-size FONT_SIZE

main font size for css, e.g. ‘9px’

--font-size-mono FONT_SIZE_MONO

monospace font size for css

--font-scale FONT_SCALE

number like 1.5 to scale base font sizes (default: 1.0)

--line-height LINE_HEIGHT

adjust line height (default: 1.3)
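Style values that contain spaces or double quotes have to reach the program as single arguments. The following is a minimal sketch of shell quoting for the example values above; it prints each argument in brackets rather than running anything. The executable name `tosixinch` and the URL are assumptions.

```shell
# "tosixinch" is an ASSUMED executable name; the URL is a placeholder.
# Single quotes keep spaces and inner double quotes inside one argument.
set -- tosixinch \
    --input https://example.com/article.html \
    --convert \
    --orientation landscape \
    --landscape-size '118mm 90mm' \
    --font-family '"DejaVu Sans", sans-serif' \
    --font-size 9px

# Dry run: print one bracketed argument per line to make boundaries visible.
# To run it for real, replace the next line with:  "$@"
printf '[%s]\n' "$@"
```

In the printed list, `[118mm 90mm]` and `["DejaVu Sans", sans-serif]` each appear on one line, confirming that they are passed as single values.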