VisiData Architecture for Developers
VisiData is like a powerful spreadsheet from an alternate textpunk reality, in which data can be easily manipulated from the keyboard and terminal. Unlike a spreadsheet, however, the data is well-structured, so that the data model is closer to Pandas or an RDBMS.
- The main unit of functionality is the sheet. A vd instance contains a stack of sheets, the left-most (!) of which (vd.sheets[0]) is the one displayed.
- Sheets have rows and columns.
- Each sheet has a homogeneous list of rows, which can be any kind of Python object.
- Individual cells do not contain arbitrary values. Instead, each cell's value is extracted by the column from the Python object for that row. Unlike in a traditional spreadsheet, this extraction is the primary function of a VisiData column.
Constraining the data to fit within this architecture simplifies the implementation and allows for some radical optimizations to data workflow.
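To make the model concrete, here is a toy sketch in plain Python. It is not VisiData's implementation: the class names ToyColumn, ToySheet and ToyVisiData are hypothetical. It models only the relationships described above, namely a sheet stack whose element 0 is displayed, homogeneous rows, and columns that extract cell values from rows.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass
class ToyColumn:
    # A column stores no cells; it extracts a value from each row on demand.
    name: str
    getter: Callable[[Any], Any]


@dataclass
class ToySheet:
    name: str
    rows: List[Any] = field(default_factory=list)          # homogeneous rows
    columns: List[ToyColumn] = field(default_factory=list)

    def cell(self, row: Any, col: ToyColumn) -> Any:
        # A "cell" is just the column's getter applied to the row object.
        return col.getter(row)


class ToyVisiData:
    def __init__(self) -> None:
        self.sheets: List[ToySheet] = []

    def push(self, vs: ToySheet) -> ToySheet:
        # New sheets go to the left; sheets[0] is the displayed sheet.
        self.sheets.insert(0, vs)
        return vs


vd = ToyVisiData()
people = vd.push(ToySheet(
    'people',
    rows=[{'name': 'Ada', 'age': 36}, {'name': 'Alan', 'age': 41}],
    columns=[ToyColumn('name', lambda r: r['name']),
             ToyColumn('age', lambda r: r['age'])]))
grid = [[people.cell(r, c) for c in people.columns] for r in people.rows]
print(grid)  # [['Ada', 36], ['Alan', 41]]
```

Note that the grid is never stored. It is recomputed from rows and getters each time it is needed, which is the point of the architecture.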
One project, two licenses
vd.py is a stand-alone library. It is meant for use in other projects. It is distributed under the MIT free software license.
The rest of this project is distributed under the more restrictive GPLv3 free software license.
Columns
Note that each Column object is detached from any sheets in which it
appears. Think of it as a lens through which each individual row of a sheet
is viewed. Every Column must have a name and a getter method.
name and other properties
Columns have a few properties, all optional except for name:
- name: should be a valid Python identifier and unique among the column names on the sheet. Some features may not work if these conditions are not met.
- type: defaults to str; other values are int, float, date, and currency. There is also a dummy type, anytype, to produce a stringified version of anything not in these categories.
- width: specifies the default width for the column; 0 means hidden.
- fmtstr: format string for use with type when type is a date.
- aggregators: a dictionary providing a few simple statistical functions (sum, mean, max, etc.).
- expr: Python expression that generates values if the column is a “computed column”.
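Here is a minimal sketch of how these properties might fit together. It assumes that a type is simply a callable that converts a raw value, as the built-in int and float do. PropColumn and its hidden and aggregate members are hypothetical; VisiData's own Column class differs.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Any, Callable, Dict, List


@dataclass
class PropColumn:
    name: str
    getter: Callable[[Any], Any]
    type: Callable[[Any], Any] = str      # assumption: a type is a converter callable
    width: int = 10                       # 0 means hidden
    aggregators: Dict[str, Callable[[List[Any]], Any]] = field(
        default_factory=lambda: {'sum': sum, 'mean': mean, 'max': max})

    @property
    def hidden(self) -> bool:
        return self.width == 0

    def aggregate(self, how: str, rows: List[Any]) -> Any:
        # Convert every value to the declared type before aggregating.
        values = [self.type(self.getter(r)) for r in rows]
        return self.aggregators[how](values)


rows = [{'qty': '3'}, {'qty': '5'}, {'qty': '10'}]
qty = PropColumn('qty', getter=lambda r: r['qty'], type=int)
print(qty.aggregate('sum', rows), qty.aggregate('max', rows))  # 18 10
```

Without type=int the values would remain strings, and max would return '5' because strings compare lexicographically. This is one reason a column's declared type matters.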
Getter and setter
Each Column object has getter and setter methods; both are lambdas.
These lambdas are the “lenses” mentioned above — they are used on the fly to
display the cells of each row that (apparently) intersects with the column.
Getter
This lambda function is required. It takes a row as input and returns the value
for that column. This is the essential functionality of a Column.
The getter is wrapped by the Column methods getValue and getDisplayValue,
which represent a value as its declared type or format it properly for
display.
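The following simplified model sketches the division of labor between the getter and its two wrappers. It assumes that a type is a converting callable, and SketchColumn is a hypothetical name. The real method bodies in VisiData may differ.

```python
from datetime import date
from typing import Any, Callable, Optional


class SketchColumn:
    def __init__(self, name: str, getter: Callable[[Any], Any],
                 type: Callable[[Any], Any] = str, fmtstr: Optional[str] = None):
        self.name = name
        self.getter = getter        # the essential part: row -> raw value
        self.type = type
        self.fmtstr = fmtstr

    def getValue(self, row: Any) -> Any:
        # Raw value from the getter, converted to the declared type.
        return self.type(self.getter(row))

    def getDisplayValue(self, row: Any) -> str:
        # Typed value rendered as a string; dates use fmtstr if given.
        value = self.getValue(row)
        if self.fmtstr and isinstance(value, date):
            return value.strftime(self.fmtstr)
        return str(value)


row = ('2017-03-01', '42')
when = SketchColumn('when', lambda r: r[0], type=date.fromisoformat, fmtstr='%Y/%m/%d')
n = SketchColumn('n', lambda r: r[1], type=int)
print(when.getDisplayValue(row), n.getValue(row) + 1)  # 2017/03/01 43
```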
Setter
The setter lambda function allows a row to be modified by the user through
the Sheet.editCell method. It takes a row object and a new value, and sets
the value for that column.
When a new Column object is initialized, setter defaults to None,
making the column read-only (see Column.setValues).
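The read-only default can be illustrated with a small hypothetical class. EditableColumn and its set_cell helper are illustrative names and not part of VisiData. The sketch refuses edits whenever no setter was supplied.

```python
from typing import Any, Callable, Optional


class EditableColumn:
    def __init__(self, name: str, getter: Callable[[Any], Any],
                 setter: Optional[Callable[[Any, Any], None]] = None):
        self.name = name
        self.getter = getter
        self.setter = setter  # None (the default) makes the column read-only

    def set_cell(self, row: Any, value: Any) -> None:
        # Hypothetical helper: refuse edits when no setter was supplied.
        if self.setter is None:
            raise AttributeError(f'column {self.name!r} is read-only')
        self.setter(row, value)


row = {'city': 'Oslo'}
read_only = EditableColumn('city', lambda r: r['city'])
writable = EditableColumn('city', lambda r: r['city'],
                          setter=lambda r, v: r.__setitem__('city', v))
writable.set_cell(row, 'Bergen')
print(row)  # {'city': 'Bergen'}
```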
Normally when a
Column object is instantiated in code rather than being read from a source,
the setter is defined as an argument to Column. For example:
def ColumnAttr(attrname, type=anytype, **kwargs):
    'Return Column object with `attrname` from current row Python object.'
    return Column(attrname,
                  type=type,
                  getter=lambda r,b=attrname: getattr(r,b),
                  setter=lambda r,v,b=attrname: setattr(r,b,v),
                  **kwargs)
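The b=attrname default arguments in ColumnAttr serve a purpose. A Python lambda looks up its free variables when it is called, not when it is created. As a result, lambdas built in a loop would all see the loop variable's final value, whereas a default argument freezes the value at creation time. The following plain-Python demonstration does not require VisiData:

```python
from types import SimpleNamespace

names = ['x', 'y']
# Closes over the variable `name`: every lambda sees its final value, 'y'.
late = [lambda r: getattr(r, name) for name in names]
# Default argument captures the current value of `name` at creation time.
bound = [lambda r, b=name: getattr(r, b) for name in names]

row = SimpleNamespace(x=1, y=2)
print([g(row) for g in late])   # [2, 2]
print([g(row) for g in bound])  # [1, 2]
```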
Built-in methods for column-creation
ColumnAttr above is one of several built-in methods for constructing a
Column object:
- ColumnAttr gets an attribute from the row object using visidata.getattr (and allows it to be set with visidata.setattr). This is useful when the rows are Python objects.
- Another is ColumnItem(colname, itemkey). It uses visidata.getitem, which is useful when the rows are mapping objects.
- Others include combineColumns and SubrowColumn.
Because cell values are computed on the fly by lambdas, they are hard to
observe in a REPL or with a conventional debugger. It may be useful to press
Ctrl-o and enter sheet or vd.sheets to inspect the sheets’
attributes visually.
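For mapping rows, ColumnItem plays the role that ColumnAttr plays for object rows. The sketch below imitates that idea with a hypothetical item_lens helper that returns a getter/setter pair. It is not VisiData's ColumnItem, but it shows the same lens pattern working on both dict rows and list rows. Because the getter is an ordinary function, you can also call it directly in a REPL to check what a column would show for a given row.

```python
from typing import Any, Callable, NamedTuple


class Lens(NamedTuple):
    name: str
    getter: Callable[[Any], Any]
    setter: Callable[[Any, Any], None]


def item_lens(name: str, key: Any) -> Lens:
    # Hypothetical analogue of ColumnItem(colname, itemkey): row[key] access.
    def setter(r: Any, v: Any, k: Any = key) -> None:
        r[k] = v
    return Lens(name, lambda r, k=key: r[k], setter)


rows = [{'host': 'a', 'ms': 12}, {'host': 'b', 'ms': 30}]
latency = item_lens('latency', 'ms')
latency.setter(rows[0], 15)
print([latency.getter(r) for r in rows])  # [15, 30]

second = item_lens('second', 1)           # the same lens on list rows
print(second.getter(['a', 'b', 'c']))     # b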
Commands
Keyboard commands are the primary interface for the user to control VisiData.
Add new commands using the global command() function within a .py file.
Syntax
command() takes three arguments:
- command sequence: the sequence of keys pressed to trigger the action. (Note that if the control key is involved, control is represented by ^ and the following key must be upper-case. This is a stricture of Curses.)
- exec string: a string containing valid Python code that will be passed to exec. This string is limited to a single line of Python; longer code must be placed in a separate “add-on” module (see Extending VisiData).
- help string: help text provided to users on the help sheet.
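The bookkeeping behind these three arguments can be modeled in a few lines. The function below is a hypothetical stand-in for the global command() and is not VisiData's code. It stores each exec string and help string under its key sequence, enforces the single-line rule and the upper-case control-key convention described above, and builds the rows a help sheet might show. The exec strings 'save()' and 'sample()' are placeholders and are never executed.

```python
from typing import Dict, List, Tuple

# key sequence -> (exec string, help string)
commands: Dict[str, Tuple[str, str]] = {}


def command(keystrokes: str, execstr: str, helpstr: str) -> None:
    # Hypothetical stand-in for VisiData's global command().
    if '\n' in execstr:
        raise ValueError('exec string must be a single line of Python')
    if keystrokes.startswith('^') and keystrokes[1:] != keystrokes[1:].upper():
        raise ValueError('write control keys as ^ plus an upper-case key')
    commands[keystrokes] = (execstr, helpstr)


def help_rows() -> List[Tuple[str, str]]:
    # What a help sheet could list: key sequence and help text.
    return [(keys, helpstr) for keys, (_, helpstr) in sorted(commands.items())]


command('^W', 'save()', 'save this sheet')
command('P', 'sample()', 'push a random sample')
print(help_rows())  # [('P', 'push a random sample'), ('^W', 'save this sheet')]
```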
Example
For example, VisiData has a builtin command Shift-P to take a random sample
of rows from the current sheet:
command('P',
        # adjacent string literals join into one single-line exec string
        'vd.push(sheet.copy("_sample")).rows = '
        'random.sample(rows, int(input("random population size: ")))',
        'push duplicate sheet with a random sample of <N> rows')
Here the command sequence is regular ASCII P, but it could include one or
more prefixes or consist of a Curses key constant (e.g.
KEY_HOME).
The exec string in this example illustrates the basic interface for
commands. Below we dissect various elements in the example.
- The global VisiData singleton object is available as vd in the exec string (and as vd() in other contexts).
- The VisiData.push method pushes a Sheet object onto the sheets stack, making it the currently visible sheet. It returns that same sheet, so that a member (in this case, rows) may be conveniently set without using a temporary variable.
- The current sheet is available as sheet.
- The current sheet is also passed as the locals dict to exec, so all Sheet members and methods can be read and called without referencing sheet explicitly. Note: due to the implementation of Sheet.exec_command, setting a sheet member requires naming sheet explicitly. That is, when a sheet member variable is on the LHS of an assignment, it must be referred to as sheet.member or the assignment will not stick.
- The Sheet.copy member function takes a string, which is appended to the original sheet name to make the new sheet’s name.
- random.sample is a function from the Python standard library. The random module is imported by VisiData (and thus available to all extensions automatically); other packages may be imported at the top level of the .py extension.
- input is a global function that displays a prompt and gets a string of input from the user (on the bottom line).
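The LHS caveat above follows from how exec treats its locals mapping. Reads can be redirected to the sheet, but a plain assignment simply binds a new key in the mapping. The sketch below reproduces that behavior with a hypothetical SheetLocals mapping and exec_command function. It models the effect described above and is not the actual code of Sheet.exec_command.

```python
class Sheet:
    def __init__(self, name, rows):
        self.name = name
        self.rows = rows


class SheetLocals(dict):
    # Hypothetical locals mapping: reads fall back to the sheet's attributes,
    # but assignments are stored in this dict and never reach the sheet.
    def __init__(self, sheet):
        super().__init__(sheet=sheet)
        self._sheet = sheet

    def __missing__(self, key):
        try:
            return getattr(self._sheet, key)
        except AttributeError:
            raise KeyError(key) from None  # let exec fall back to globals/builtins


def exec_command(sheet, execstr):
    # Hypothetical dispatcher: fresh locals for every command.
    exec(execstr, {}, SheetLocals(sheet))


vs = Sheet('demo', [1, 2, 3])
exec_command(vs, 'rows = rows[:1]')        # binds a new local; vs.rows unchanged
print(vs.rows)                             # [1, 2, 3]
exec_command(vs, 'sheet.rows = rows[:1]')  # assigns through the sheet object
print(vs.rows)                             # [1]
```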
What can be done with commands
Anything is possible! However, the exec string limits functionality to
Python one-liners. More complicated commands require a custom sheet (“add-on”)
to implement longer Python functions.
There will eventually be a VisiData API reference. In the meantime, please see the source code for examples of how to accomplish most tasks.
Extending VisiData
Extend VisiData by defining custom sheets in an “add-on”. An add-on is a
non-core Python module, which becomes available to VisiData when placed in
visidata/addons. It provides a top-level key binding, available on all
sheets, that pushes one of its specialized Sheet objects onto the
VisiData.sheets stack.
Outline of syntax
The skeleton of an add-on, apart from its actual functionality, is as follows:
- Add a command (using command()) that instantiates the class and pushes it onto a vd instance. You may also like to add options, using the global option() function.
- Subclass Sheet. In __init__:
  - Call super to define the name of the new sheet. The constructor passes the name of the sheet and any source sheets (available later as Sheet.source).
  - Populate self.columns with a list of all possible columns. Each entry should be a Column object (or subclass) and should have a name.
  - Define any sheet-specific commands, using self.command() within the constructor. The arguments are identical to those of the global command() function (see Commands).
- Define reload so as to recompute the values of the rows (see The reload() method below).
- Consider whether the sheet may be so large or slow to recompute that you don’t want the user to be blocked waiting for reloading to finish. Some sheets, such as the help sheet, cannot become that large, so there is no need for asynchronous handling. But if it may become large, then:
  - Use genProgress to display a progress bar showing the percentage of rows recomputed.
  - Decorate reload with @async (see The @async decorator below).
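The progress-bar idea can be sketched with a generator that counts items as the loop consumes them. The names gen_progress and ProgressSheet are hypothetical, and VisiData's actual genProgress may have a different signature. The progressMade/progressTotal attribute names follow the ones this document uses for the progress meter.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar('T')


class ProgressSheet:
    def __init__(self) -> None:
        self.rows: List[int] = []
        self.progressMade = 0
        self.progressTotal = 0

    def progressPct(self) -> float:
        # Percentage a progress meter would show.
        if not self.progressTotal:
            return 0.0
        return 100.0 * self.progressMade / self.progressTotal


def gen_progress(sheet: ProgressSheet, items: Sequence[T]) -> Iterator[T]:
    # Hypothetical analogue of genProgress: yield items, counting each one.
    sheet.progressTotal = len(items)
    sheet.progressMade = 0
    for item in items:
        yield item
        sheet.progressMade += 1


vs = ProgressSheet()
for x in gen_progress(vs, range(250)):
    vs.rows.append(x * 2)
print(vs.progressPct())  # 100.0
```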
Example
Here is a simple add-on that defines a t command to “take” the current
cell from any sheet and append it to a predefined “journal” sheet. This
sheet can be viewed with Shift-T and then dumped to a .tsv file with
^W (Ctrl-w).
from visidata import *
command('t',
        'vd.journal.rows.append([sheet, cursorCol, cursorRow])',
        'take this cell and append it to the journal')
command('T', 'vd.push(vd.journal)', 'push the journal')
option('fn_journal', 'journal.tsv', 'default journal output file')
class JournalSheet(Sheet):
    def __init__(self):
        super().__init__('journal')
        self.columns = [
            Column('sheet', getter=lambda r: r[0].name),
            Column('column', getter=lambda r: r[1].name),
            Column('value', getter=lambda r: r[1].getValue(r[2])),
        ]
        self.command('^W',
                     'appendToJournalFile(); sheet.rows = []',
                     'append to existing journal and clear sheet')
    def appendToJournalFile(self):
        p = Path(options.fn_journal)
        writeHdr = not p.exists()
        with p.open_text('a') as fp:
            if writeHdr:
                # str.join takes a single iterable; also end the header line
                fp.write('\t'.join(['sheet', 'column', 'value']) + '\n')
                status('created journal at %s' % str(p))
            for r in self.rows:
                fp.write('\t'.join(col.getDisplayValue(r)
                                   for col in self.columns) + '\n')
        status('saved %d rows' % len(self.rows))
vd().journal = JournalSheet()
Note that the t command includes cursorRow in the list instead of
cursorValue, and when the journal is saved the value in the column of
the referenced row is retrieved using Column.getValue. This is the
desired pattern for appending rows based on existing sheets, so that
changes to the source row are automatically reflected in the subsheets.
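The difference between the two patterns is easy to see outside VisiData. In this plain-Python sketch, where the minimal Column class is hypothetical, one list stores a snapshot of the value and the other stores a (column, row) pair, as the t command does. Only the second reflects a later edit to the source row.

```python
class Column:
    # Minimal hypothetical column: just a name and a getter.
    def __init__(self, name, getter):
        self.name = name
        self.getter = getter

    def getValue(self, row):
        return self.getter(row)


price = Column('price', lambda r: r['price'])
source_rows = [{'item': 'tea', 'price': 3}]

by_value = [price.getValue(source_rows[0])]   # snapshot, like storing cursorValue
by_reference = [(price, source_rows[0])]      # like storing cursorCol and cursorRow

source_rows[0]['price'] = 4                   # the user edits the source sheet
print(by_value)                                          # [3] (stale)
print([col.getValue(row) for col, row in by_reference])  # [4]
```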
Custom VisiData applications
Import the visidata package into a Python script to create a custom
VisiData application.
Other functionality
Status bar
The VisiData singleton has a list, statuses, that stores status messages in
the order they were added. Add a status message using VisiData.status; there
is also a module-level wrapper, status, available to lambdas and eval.
The on-screen status bar is composed in two parts, with VisiData.leftStatus
and VisiData.rightStatus; the two parts are drawn separately, with
VisiData.drawLeftStatus and VisiData.drawRightStatus.
The Sheet object has a statusLine method, which returns the
number of rows and the numbers of selected rows and columns.
Errors and debugging
The VisiData singleton maintains a list, lastErrors, containing the ten most
recent tracebacks. A traceback is added by VisiData.exceptionCaught,
which is normally called in the except clause of a try/except block.
There is a module-level error function for use with lambdas and eval.
The developer will find it useful to toggle debug mode on with Ctrl-d, to
display error messages (without traceback) on the left side of the status bar.
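Keeping only the ten most recent tracebacks maps naturally onto collections.deque with maxlen. The sketch below is a hypothetical model of lastErrors and exceptionCaught, not VisiData's code. It relies on traceback.format_exc(), which formats the exception currently being handled. For that reason, it must be called inside an except clause.

```python
import traceback
from collections import deque


class ErrorKeeper:
    def __init__(self, keep: int = 10) -> None:
        # The oldest traceback is discarded once `keep` entries are stored.
        self.lastErrors = deque(maxlen=keep)

    def exceptionCaught(self) -> None:
        # Must run inside an except clause so format_exc() sees the exception.
        self.lastErrors.append(traceback.format_exc())


ek = ErrorKeeper()
for i in range(12):
    try:
        {}['missing-%d' % i]
    except KeyError:
        ek.exceptionCaught()
print(len(ek.lastErrors))  # 10
```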
Hooks
Hooks for special functionality are stored in VisiData.hooks and supported with VisiData.addHook and VisiData.callHook. At the moment, hooks are used mainly in editText, the optional editlog addon, and before redrawing the screen.
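A hook registry of this kind can be as small as a dictionary of lists. The sketch below is a hypothetical model of addHook and callHook, not VisiData's implementation. The hook name 'preedit' is made up for the example.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List


class HookRegistry:
    def __init__(self) -> None:
        self.hooks: Dict[str, List[Callable[..., Any]]] = defaultdict(list)

    def addHook(self, name: str, fn: Callable[..., Any]) -> None:
        self.hooks[name].append(fn)

    def callHook(self, name: str, *args: Any, **kwargs: Any) -> List[Any]:
        # Call every function registered under `name`, in registration order.
        return [fn(*args, **kwargs) for fn in self.hooks[name]]


reg = HookRegistry()
log = []
reg.addHook('preedit', lambda value: log.append(('saw', value)))
reg.callHook('preedit', 'abc')
reg.callHook('unused')  # no hooks registered: returns []
print(log)  # [('saw', 'abc')]
```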
Adding a new data source
In the JournalSheet example above, the rows are added incrementally
during a user’s workflow, so no custom reload() method is needed; the
example does not define one.
New data sources can also be integrated into VisiData, and the primary
difference is the reload() method. There are several existing
examples in the visidata/addons directory, and the general structure
looks like this:
Example
from visidata import *
class open_xlsx(Sheet):
    def __init__(self, path):
        super().__init__(path.name, path)
        self.workbook = None
        self.command(ENTER,
                     'vd.push(sheet.getSheet(cursorRow))',
                     'push this sheet')
    @async
    def reload(self):
        import openpyxl
        self.columns = [Column('name')]
        self.workbook = openpyxl.load_workbook(str(self.source),
                                               data_only=True,
                                               read_only=True)
        self.rows = list(self.workbook.sheetnames)
    def getSheet(self, sheetname):
        # workbook[name] replaces the deprecated get_sheet_by_name()
        worksheet = self.workbook[sheetname]
        return xlsxSheet(join_sheetnames(self.source, sheetname),
                         worksheet)
class xlsxSheet(Sheet):
    @async
    def reload(self):
        worksheet = self.source
        self.columns = ArrayColumns(worksheet.max_column)
        self.progressTotal = worksheet.max_row
        self.rows = []
        for row in worksheet.iter_rows():
            self.progressMade += 1
            self.rows.append([cell.value for cell in row])
New data sources are generally implemented with one or more subclasses of Sheet.
To have a data source apply to files with extension .foo, create a
class (or function) called open_foo. This should return a new sheet
constructed from the given source, which will be a Path object
instead of a parent sheet.
This .xlsx example is fairly typical of real-world data sources,
which often contain multiple datasets. In such a case, an index sheet is
pushed first, with an ENTER command to push one of the contained
sheets. The getSheet in this example is just a sheet-specific method
on the index sheet that constructs the chosen sheet.
Custom options
The option() global function allows a user-modifiable option to be
specified instead of using a hard-coded value.
- The arguments are the option name, a default value, and a help string.
- Options are available as attributes on the options object.
- Options should always have a usable default.
- Options should not be cached, as the user can change them while the program is running.
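The following hypothetical model illustrates these rules. Options, its set method, and journal_filename are illustrative names, not VisiData's API. Options are registered with a default and a help string and read as attributes. Each value is looked up at the point of use, so a change made while the program runs takes effect immediately.

```python
from typing import Any, Dict


class Options:
    def __init__(self) -> None:
        self._values: Dict[str, Any] = {}
        self._help: Dict[str, str] = {}

    def __getattr__(self, name: str) -> Any:
        # Called only for names not found normally, i.e. option names.
        try:
            return self.__dict__['_values'][name]
        except KeyError:
            raise AttributeError(name) from None

    def set(self, name: str, value: Any) -> None:
        if name not in self._values:
            raise KeyError('unknown option %r' % name)
        self._values[name] = value


options = Options()


def option(name: str, default: Any, helpstr: str) -> None:
    # Hypothetical stand-in for the global option(): register with a usable default.
    options._values[name] = default
    options._help[name] = helpstr


option('fn_journal', 'journal.tsv', 'default journal output file')


def journal_filename() -> str:
    return options.fn_journal  # looked up on every call, never cached


print(journal_filename())  # journal.tsv
options.set('fn_journal', 'notes.tsv')
print(journal_filename())  # notes.tsv
```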
The reload() method
The reload() method (invoked with ^R, Ctrl-r) should in general
reset the sheet to its starting rowset, without changing the column
layout.
In the above example, reload() clears Sheet.rows before
reloading, to prevent the sheet from growing in size with every ^R.
reload() is not called until the sheet is first viewed.
Note that non-standard Python packages should be imported just before
their first use. In the case of data sources, that means in the reload()
method itself. This way, vd does not require external packages to be
installed unless they are actually needed for parsing a specific data
source.
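The three reload() conventions above can be combined in a small sketch: reset the rows rather than append to them, load only when the sheet is first viewed, and import parsing libraries inside reload. LazySheet and view() are hypothetical. The standard-library csv module stands in for a non-standard parser.

```python
class LazySheet:
    def __init__(self, name: str, text: str) -> None:
        self.name = name
        self.source = text
        self.rows: list = []
        self.loaded = False

    def reload(self) -> None:
        # Deferred import: a real source would import its external package here.
        import csv
        import io
        self.rows = []  # start from scratch so each reload does not grow the sheet
        for row in csv.reader(io.StringIO(self.source)):
            self.rows.append(row)

    def view(self) -> list:
        # reload() is deferred until the sheet is first viewed.
        if not self.loaded:
            self.reload()
            self.loaded = True
        return self.rows


vs = LazySheet('tiny', 'a,b\n1,2\n')
print(vs.rows)       # []
print(vs.view())     # [['a', 'b'], ['1', '2']]
vs.reload()
print(len(vs.rows))  # 2
```

The final reload shows why rows are cleared first: without the reset, a second ^R would have doubled the sheet to four rows.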
The @async decorator
Functions which can take a long time to execute may be decorated with
@async, which spawns a managed Task in a new thread to run the
function. This is especially useful for data sources which may require
loading large amounts of data.
Async functions should initialize Sheet.progressTotal to some
reasonable measure of total work, and they should also be structured to
frequently update Sheet.progressMade with the amount of work already
done. This is used for the progress meter on the right status line.
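The pattern can be sketched with the standard threading module. The decorator is named async_thread here because async has been a reserved keyword since Python 3.7 and cannot be used as a plain identifier in current Python. Both async_thread and BigSheet are hypothetical, and VisiData's managed Task is not modeled.

```python
import functools
import threading


def async_thread(func):
    # Hypothetical stand-in for @async: run func in a background thread.
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        t = threading.Thread(target=func, args=args, kwargs=kwargs, daemon=True)
        t.start()
        return t  # the caller may join() it; a UI would keep redrawing instead
    return wrapper


class BigSheet:
    def __init__(self, n: int) -> None:
        self.n = n
        self.rows: list = []
        self.progressMade = 0
        self.progressTotal = 0

    @async_thread
    def reload(self) -> None:
        self.progressTotal = self.n  # reasonable measure of total work
        self.progressMade = 0
        self.rows = []
        for i in range(self.n):
            self.rows.append(i * i)
            self.progressMade += 1   # updated often for the progress meter


vs = BigSheet(1000)
worker = vs.reload()  # returns immediately
worker.join()
print(vs.progressMade, vs.progressTotal)  # 1000 1000
```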
Curses line-editing: editText
The module-level function editText is a hack to replace curses.textpad
for line-editing functionality. It supplies a subset of standard GNU
Readline key-bindings: ^a for
start of line, ^e for end of line,
and so on. One innovation is ^r to reload the initial value of a cell.
Module-level editText is wrapped by VisiData.editText and
Sheet.editCell.
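The key handling can be modeled as a pure function over a list of keystrokes. edit_text is hypothetical and covers only the three bindings named above (^a, ^e and ^r) plus plain character insertion. The real editText works on a live curses window.

```python
from typing import List


def edit_text(initial: str, keys: List[str]) -> str:
    # Hypothetical model of a line editor; '^A' etc. denote control keys.
    buf, pos = list(initial), len(initial)
    for key in keys:
        if key == '^A':
            pos = 0                                  # start of line
        elif key == '^E':
            pos = len(buf)                           # end of line
        elif key == '^R':
            buf, pos = list(initial), len(initial)   # reload the initial value
        else:
            buf.insert(pos, key)                     # insert a character at the cursor
            pos += 1
    return ''.join(buf)


print(edit_text('world', ['^A', 'h', 'i', ' ']))  # hi world
print(edit_text('abc', ['x', 'y', '^R']))         # abc
```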
Regular expressions (RegEx)
Developers may enjoy using regular expressions (RegEx) to select rows.
VisiData.searchRegex is available for that purpose. The flavor of RegEx is
that of Python, similar to that
of Perl rather than that of vi.
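A hypothetical row-selection helper in the spirit of VisiData.searchRegex shows what the Python flavor means in practice. The search_regex function below does not have the real method's signature. re.search matches anywhere in the string, and syntax such as \d and inline flags like (?i) is available. Word boundaries are written \b, not vi's \< and \>.

```python
import re
from typing import Any, Callable, List, Sequence


def search_regex(rows: Sequence[Any],
                 getters: List[Callable[[Any], Any]],
                 pattern: str) -> List[int]:
    # Hypothetical: indices of rows where any column's value, as a string, matches.
    rx = re.compile(pattern)
    return [i for i, r in enumerate(rows)
            if any(rx.search(str(getter(r))) for getter in getters)]


rows = [('alpha', 1), ('beta', 22), ('gamma', 333)]
cols = [lambda r: r[0], lambda r: r[1]]
print(search_regex(rows, cols, r'^\d{2,}$'))   # [1, 2]
print(search_regex(rows, cols, r'(?i)ALPHA'))  # [0]
```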
Drawing
(Not yet documented. Topics include colLayout and visibleCols.)
Colorizing
Control of the colors of foreground and background text is in need of work and is not yet documented.
Theme colors and characters
(Not yet documented.)
Making VisiData apps
(Not yet documented. Topics include set_global and the helper sheets
TextSheet and DirSheet.)
Making VisiData sources
(Not yet documented. Topics include Path objects, openSource, and
open_*.)
Common variables
Following are some variable names used frequently in the codebase, together with their usual associations:
- c: column
- expr: Python expression
- D, d: dict
- f: function
- fn: filename
- i: target variable of iterator or generator
- idx: index
- L: list
- p: path
- pv: present value
- r: row
- ret: return value
- rng: range
- s: string
- scr: “screen” object in Curses
- v: name of variable
- vd: visidata.VisiData, normally constructed as a singleton (one-time-only instance) as VisiData()
- vs: sheet, constructed as visidata.Sheet(name, path) or returned from some function as openURL(path), open_tsv(path), DirSheet(name, path), etc.
- w: width
- x: horizontal position on the screen
- y: vertical position on the screen
Unresolved hacks
Your insight as to how to improve these is most welcome.
chooseOne
chooseOne should be a proper chooser.
Adding properties to vd in extensions
- Adding a property to the VisiData singleton in an extension is done as in visidata/status_history.py:
vd().statusHistory = []
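This works because vd() always returns the same object, so an attribute set once by an extension is visible everywhere. The following is a hypothetical model of the singleton accessor; the VisiData class here is an empty stand-in.

```python
class VisiData:
    # Empty stand-in for the real VisiData class.
    pass


_vd_instance = None


def vd() -> VisiData:
    # Create the one-time-only instance on first use, then keep returning it.
    global _vd_instance
    if _vd_instance is None:
        _vd_instance = VisiData()
    return _vd_instance


# What an extension such as status_history.py does at import time:
vd().statusHistory = []

# Any later code sees the same list through the same singleton.
vd().statusHistory.append('loaded foo.tsv')
print(vd().statusHistory)  # ['loaded foo.tsv']
```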
Globals
For the exec strings of an extension’s commands to see the extension’s own global objects, those objects must be registered explicitly. The extension requires a statement like
setGlobal('g_client', g_client)
which calls a setter for a global dict in vd.py:
g_globals = None
def setGlobal(k, v):
    'Manually set global key-value pair in `g_globals`.'
    g_globals[k] = v
That yogic maneuver allows instances of command() in the extension to refer to g_client by name in the strings they pass to exec.
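The effect can be reproduced in a few lines. In this hypothetical model, g_globals starts as an empty dict rather than None. Exec strings are evaluated with it as their globals, so a name registered via setGlobal becomes visible inside the exec string. Client and run_command are illustrative names.

```python
g_globals = {}  # simplified: a dict from the start


def setGlobal(k, v):
    'Manually set global key-value pair in `g_globals`.'
    g_globals[k] = v


def run_command(execstr, sheet=None):
    # Hypothetical: exec strings see g_globals as their global namespace.
    exec(execstr, g_globals, {'sheet': sheet})


class Client:
    def __init__(self):
        self.requests = []

    def fetch(self, what):
        self.requests.append(what)


g_client = Client()
setGlobal('g_client', g_client)          # register the extension's global
run_command("g_client.fetch('issues')")  # the exec string names g_client
print(g_client.requests)                 # ['issues']
```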
Deviations from PEP8
- One-line docstrings are surrounded by single quotes ('...').
- Multi-line docstrings are surrounded by three single quotes ('''...''').
- Names of functions and variables are mostly in camel case, with some exceptions.