Translating Source Code, part 2

Morgan Wahl

2023-09-08 00:00

See the previous post to have any clue what I'm going on about here.

Japanese

Our first attempt at "translating" Python from English was to German. While there were a few issues to consider, it was relatively easy since the two languages are somewhat similar. They have almost identical writing systems, and their word order and syntactic structures are mostly the same. Neither of those things is true of Japanese.

Firstly, let's start with writing. The Japanese writing system makes use of three or four different scripts, but that's not a huge issue for us. What's more important is that written Japanese rarely uses spaces between words. The different scripts somewhat help delineate the different parts of a sentence. And the head-last syntax (more on that later) works somewhat like a reverse-Polish calculator to help parsing.

This means one of the fundamental syntactic elements of a programming language, spaces separating tokens, is arguably out of place. To fit with this, we could just not use spaces, and when parsing look for keywords at the beginning or ends of identifiers. This would place odd restrictions on identifiers though, and maybe would make writing a parser more difficult.

Instead, when we need to separate tokens within a line, we'll use a middle-dot ・. This is used in Japanese for those situations where separate words need to be distinguished, typically separate words within a phrase that are all written with the same script.

The upside of no spaces is that identifiers can just be whole phrases, no need for using underscores or camel casing or any other transformations.

The other big difference with Japanese is that its word order is almost entirely the opposite of English. Verbs come last in a sentence.[2] Also, the equivalents of English's prepositions occur after their noun phrases instead of before them.[1]

This word order presents an opportunity to simplify the language. In English Python, conditionals have the syntax:

if conditional_expression:
    ...

In Japanese, the "if" naturally comes after the condition, which eliminates the need for the : character.

conditional_expression・と
    ...

The ・ is used to delimit an expression ending in an identifier from the と conditional keyword.
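As a rough sketch of how a tokenizer might lean on the middle-dot, here's a little Python that splits a line on ・ and checks for a trailing と. The names split_tokens and parse_conditional are made up for illustration; nothing here is a real implementation.

```python
MIDDLE_DOT = "・"  # U+30FB, used here as the explicit token separator
IF_KEYWORD = "と"  # the keyword-final "if" from the example above


def split_tokens(line: str) -> list[str]:
    """Split one logical line into tokens on the middle dot."""
    return [token for token in line.strip().split(MIDDLE_DOT) if token]


def parse_conditional(line: str):
    """Return the condition's tokens if the line ends with と, else None."""
    tokens = split_tokens(line)
    if len(tokens) >= 2 and tokens[-1] == IF_KEYWORD:
        return tokens[:-1]
    return None


print(parse_conditional("この数・無・だ・と"))
print(parse_conditional("無を返せ"))
```

Because the keyword is always the last token, the parser only has to look at one position to know what kind of statement it's dealing with.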

Japanese also has its own repertoire of punctuation, which I've swapped in for the ASCII punctuation.

Here's some example syntax to demonstrate a different word-order and different punctuation:

Python	ニシキヘビ	nishikihebi
object.attribute	属性。物体	zokusei.buttai
"string"	『列』	"retsu"
'string'	「列」	'retsu'
function_call(arg1, arg2)	【話1、話2】呼び出された関数	(hanashi1, hanashi2)yobidasareta kansuu
(item1, item2)	（要素1、要素2）	(youso1, youso2)
[item1, item2]	［要素1、要素2］	[youso1, youso2]
{thing1, thing2}	｛物1、物2｝	{mono1, mono2}
{thing: "stuff"}	｛物：『やつ』｝	{mono: "yatsu"}

The third column has transliterations of the Japanese column.

Function calls have their arguments before the name of the function. This matches Japanese syntax where verbs come at the end of their clause. Japanese has more kinds of brackets, so I took the opportunity to use the lenticular brackets to distinguish function calls from other uses of parentheses.

I've taken the probably controversial approach of swapping the arguments to the . operator. With the 。 operator, the attribute name comes first, then the object. I think this matches with the general pattern of the syntax, but I could be wrong.
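To make the two reorderings concrete, here's a small sketch that rewrites a single argument-first call or attribute-first lookup back into ordinary Python order. The function name to_python_expression is hypothetical, and the regex deliberately ignores nested brackets; it's only meant to show the shape of the transformation.

```python
import re

# 【args】target: lenticular brackets hold the arguments, the callee comes last.
CALL = re.compile(r"^【(?P<args>.*)】(?P<target>.+)$")


def to_python_expression(expression: str) -> str:
    """Rewrite one argument-first, attribute-first expression into Python order."""
    call = CALL.match(expression)
    target = call.group("target") if call else expression
    # Attribute-first 属性。物体 becomes object-first 物体.属性
    dotted = ".".join(reversed(target.split("。")))
    if not call:
        return dotted
    # Arguments are separated by the ideographic comma 、
    args = ", ".join(arg for arg in call.group("args").split("、") if arg)
    return f"{dotted}({args})"


print(to_python_expression("属性。物体"))
print(to_python_expression("【話1、話2】呼び出された関数"))
print(to_python_expression("【】負だ。この数"))
```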

Also, 話 is probably a terrible translation of "function argument".

To see our full example, we'll use this "translation table":

Python	ニシキヘビ	nishikihebi
x y (placeholders for identifiers)	X Y	X Y
class x:	X・種類	X shurui
def x(y1, y2):	【Y1、Y2】X・関数	(Y1, Y2)X kansuu
if x:	X・と	X to
x is y	X・Y・だ	X Y da
return x	Xを返せ	X okaese
None	無	mu

For the identifiers below, I've put transliterations in square brackets.

# program-translation
# from: en
# to: ja
# identifiers:
#   absolute_value: 絶対値 [zettaichi]
#   is_negative: 負だ [fu da]
#   is_odd: 奇数だ [kisuu da]
#   negated_number: ネゲートした数 [negaato shita kazu]
#   number: この数 [kono kazu]
#   Number: 数 [kazu]
#   self: 己 [ono]

数・種類
    【己】負だ・関数
        …

    【己】奇数だ・関数
        …

【この数:数】絶対値・関数
    この数・無・だ・と
        無を返せ
    【】負だ。この数・と
        ネゲートした数 = -この数
        ネゲートした数を返せ
    この数を返せ

The result is delightfully compact, even with the negated_number variable becoming a phrase that could be translated "number that was negated", and the number argument having to become "this number" to avoid collisions with the Number class.

A second approach to Japanese could use the "halfwidth" katakana that were used on computers back when they required 1-byte-per-character text encodings. To my eyes, this gives the language a bit of an all-caps FORTRAN feel, which isn't Pythonic, but could work for other languages. I think ASCII punctuation is more appropriate here, but I'm not sure.

Python	ﾆｼｷﾍﾋﾞ	nishikihebi
x y (placeholders for identifiers)	X Y	X Y
class x:	X･ｼｭﾛｲ	X shuroi
def x(y1, y2):	(Y1､Y2)X･ｶﾝｽｰ	(Y1, Y2)X kansuu
if x:	X･ﾄ	X to
x is y	X･Y･ﾀﾞ	X Y da
return x	Xｦｶｴｾ	X okaese
None	ﾑ	mu

# program-translation
# from: en
# to: ja-Kana-x-halfwidth
# identifiers:
#   absolute_value: ｾﾞｯﾀｲﾁ [zettaichi]
#   is_negative: ﾊﾟﾀﾞ [fu da]
#   is_odd: ｷｽｰﾀﾞ [kisuu da]
#   negated_number: ﾈｹﾞｰﾄｼﾀｶｽﾞ [negaato shita kazu]
#   number: ｺﾉｶｽﾞ [kono kazu]
#   Number: ｶｽﾞ [kazu]
#   self: ｵﾉ [ono]

ｶｽﾞ･ｼｭﾛｲ
    (ｵﾉ)ﾊﾟﾀﾞ･ｶﾝｽｰ
        ...

    (ｵﾉ)ｷｽｰﾀﾞ･ｶﾝｽｰ
        ...

(ｺﾉｶｽﾞ:ｶｽﾞ)ｾﾞｯﾀｲﾁ･ｶﾝｽｰ
    ｺﾉｶｽﾞ･ﾑ･ﾀﾞ･ﾄ
        ﾑｦｶｴｾ
    ()ﾊﾟﾀﾞ.ｺﾉｶｽﾞ･ﾄ
        ﾈｹﾞｰﾄｼﾀｶｽﾞ = -ｺﾉｶｽﾞ
        ﾈｹﾞｰﾄｼﾀｶｽﾞｦｶｴｾ
    ｺﾉｶｽﾞｦｶｴｾ

Despite having more characters, this is even more compact. I have no idea how readable it is, however.

So, lessons from Japanese: explicit token separators aren't so bad, and having keywords at the end lets you use a lot less punctuation. Also, more kinds of brackets means easier-to-read code.

Hindi

Syntactically, Hindi is very similar to Japanese, so there are a few places we can use the "keyword-final" syntax again. Hindi doesn't have the complicated four-script writing system, and it uses spaces between words, so that's a little more similar to English.

Hindi does have grammatical gender, with only masculine and feminine options. It doesn't come up as often as in German (or Spanish, for example), but could occasionally present a headache for choosing an identifier.

Python	अजगर	ajgar
x y (placeholders for identifiers)	X Y
class x:	TBD
def x(y1, y2):	TBD
if x:	यदि X:	yadi X:
x is y	X Y है	X Y hai
return x	X दे	X de
None	TBD

I'm examining Hindi mainly as a nice stop on the road to a language that uses a right-to-left script (see below).

TODO: full Hindi example.

Urdu

Hindi is the relatively modern language that draws on the much older tradition of Urdu. While Hindi is written largely phonemically in Devanagari script, Urdu uses its own variant of the Perso-Arabic script. Since Hindi and Urdu share most of their vocabulary, we can use mostly the same keywords. The main thing to contend with is the right-to-left ordering of the script.

Also, I'm not sure what "monospaced" looks like in Arabic, especially compared to the Nastaliq style used for Urdu.

TODO: flesh out Urdu example.

Translating Source Code

Morgan Wahl

2023-09-01 00:00

I'd like to jot down a few ideas about how to enable computer programming without English.

Why would you want to do that? I can think of a few reasons:

To lower the barrier of entry for people who are fluent in a written language that isn't English.
To explore what programming languages can be like when removed from the constraints of English (and are placed in the constraints of another natural language).
To simply imagine other timelines where English is not the lingua franca of computers.

There are two parts to enabling programming in a natural language: the identifiers devised by the programmer, and the keywords and syntax of the language the program itself is written in.

Let's start with the simpler case: identifiers.

Identifiers

In many modern programming languages, you can use just about any string as an identifier. Identifiers typically can't contain any punctuation besides "_" (I'm including spaces as punctuation), but otherwise you can stick more or less whatever in there.

So nothing to do here, right?

Well, the fact is English is the lingua franca of programming, so even if you have some, say, Japanese words in mind for your identifiers, you know you have to "translate" those to English if you want to actually work with anyone else.

What if you could document in a machine-readable way the identifiers you would use in a particular natural language, alongside the English ones that are your baseline for collaboration? This could be helpful to readers who are also familiar with that language. And maybe text editing software could make the substitutions when displaying the code.

We just need a way to annotate each scope to specify the non-English version of each of its identifiers. The data involved is not particularly complicated. We can use BCP-47 to specify a natural written language, and the rest is basically just one mapping for each scope.

One way to store that data might be in special comments:

# program-translation
# from: en-US
# to: en-x-piglatin

# pt: Umbernay
class Number:
    # pt: isway_egativenay(elfsay)
    def is_negative(self):
        ...

    # pt: isway_oddway(elfsay)
    def is_odd(self):
        ...

# pt: absoluteway_aluevay(umbernay)
def absolute_value(number: Number):
    if number is None:
        return None
    if number.is_negative():
        # pt: egatedway_umbernay
        negated_number = -number
        return negated_number
    return number

This code is quite silly, but serves to demonstrate some of the challenges. Each "pt" comment gives translations for the identifier that is defined on the next line. Python's semantic whitespace means we can be sure there's at most one assignment per logical line, so writing a parser to match up the comments to the identifiers shouldn't be too difficult. Other languages with more free-form line-breaking might be trickier.
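Here's a minimal sketch of that comment-matching parser, just to show it really is small. It assumes each "pt" comment sits directly above the line that defines the identifier, and it only recognizes class, def, and simple assignments. The names pair_translations, PT_COMMENT, and DEFINITION are mine, not part of any existing tool.

```python
import re

# A "pt" comment: "# pt: <translation>"
PT_COMMENT = re.compile(r"^\s*#\s*pt:\s*(?P<translation>.+?)\s*$")
# The identifier introduced by a class, a def, or a simple assignment.
DEFINITION = re.compile(r"^\s*(?:class\s+|def\s+)?(?P<name>[^\W\d]\w*)")


def pair_translations(source: str) -> dict[str, str]:
    """Map each defined identifier to the "pt" comment on the line above it."""
    pairs = {}
    pending = None  # translation waiting for the next line
    for line in source.splitlines():
        comment = PT_COMMENT.match(line)
        if comment:
            pending = comment.group("translation")
            continue
        if pending is not None:
            definition = DEFINITION.match(line)
            if definition:
                pairs[definition.group("name")] = pending
            pending = None
    return pairs


sample = """\
# pt: Umbernay
class Number:
    # pt: isway_egativenay(elfsay)
    def is_negative(self):
        ...
"""

print(pair_translations(sample))
```

A real version would want Python's own tokenizer rather than regexes, but the one-comment-per-logical-line rule is what keeps the matching simple.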

One obvious problem with the comment-based approach is the sheer amount of noise this adds to the code. With a little more verbosity and some YAML, we could move the mappings to the comment at the top of the file where we already specified languages:

# program-translation
# from: en-US
# to: en-x-piglatin
# identifiers:
#   Number:
#     _: Umbernay
#     is_negative:
#       _: isway_egativenay
#       self: elfsay
#     is_odd:
#       _: isway_oddway
#       self: elfsay
#   absolute_value:
#     _: absoluteway_aluevay
#     number: umbernay
#     negated_number: egatedway_umbernay

class Number:
    def is_negative(self):
        ...

    def is_odd(self):
        ...

def absolute_value(number: Number):
    if number is None:
        return None
    if number.is_negative():
        negated_number = -number
        return negated_number
    return number

Well, that's kind of ugly, but at least it's in one place.

This approach has the advantage of being easily extensible to multiple languages: just use multiple comments.

There's still a problem though. In the implementation of absolute_value, how do we know what translation to use for the is_negative attribute on the number variable? In a statically typed language this wouldn't be an issue. With Python's duck typing, we could look at type hints, but that means pulling in an entire library like mypy just to do some string replacements. And it still wouldn't work if the types aren't hinted.

Let's take Python's duck-typing at face value, and just say "if it's a particular identifier in English, then there's only one translation".

# program-translation
# from: en-US
# to: en-x-piglatin
# identifiers:
#   absolute_value: absoluteway_aluevay
#   is_negative: isway_egativenay
#   is_odd: isway_oddway
#   negated_number: egatedway_umbernay
#   number: umbernay
#   Number: Umbernay
#   self: elfsay

class Number:
    def is_negative(self):
        ...

    def is_odd(self):
        ...

def absolute_value(number: Number):
    if number is None:
        return None
    if number.is_negative():
        negated_number = -number
        return negated_number
    return number

Hey, that's a bit nicer. Now we're just doing dumb string-substitution on the identifiers. It also de-duplicated the translation of self.
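For the curious, the "dumb" substitution can be sketched with Python's standard tokenize module, which has the nice side effect of leaving strings and comments alone. The function name translate_identifiers is hypothetical, and the mapping is assumed to have already been read out of the header comment.

```python
import io
import keyword
import tokenize


def translate_identifiers(source: str, identifiers: dict[str, str]) -> str:
    """Replace every NAME token found in `identifiers`, leaving keywords,
    strings, and comments untouched."""
    translated = []
    for token in tokenize.generate_tokens(io.StringIO(source).readline):
        if (
            token.type == tokenize.NAME
            and not keyword.iskeyword(token.string)
            and token.string in identifiers
        ):
            token = token._replace(string=identifiers[token.string])
        translated.append(token)
    # untokenize() rebuilds the gaps between tokens from the original
    # positions, so spacing survives even though the lengths changed.
    return tokenize.untokenize(translated)


piglatin = {
    "absolute_value": "absoluteway_aluevay",
    "number": "umbernay",
    "Number": "Umbernay",
}
english = "def absolute_value(number: Number):\n    return number\n"
print(translate_identifiers(english, piglatin))
```

Swapping the keys and values of the mapping gives you the reverse direction, as long as no two identifiers share a translation.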

Now, we may actually want different translations in different scopes. We could enable that with inline comments again, but then we have a noise problem again. Instead let's see if we can specify a scope and then some overrides.

# program-translation
# from: en-US
# to: en-x-piglatin
# identifiers:
#   absolute_value: absoluteway_aluevay
#   is_negative: isway_egativenay
#   is_odd: isway_oddway
#   negated_number: egatedway_umbernay
#   number: umbernay
#   Number: Umbernay
#   self: elfsay
# scopes:
#   absolute_value:
#     number: umberney

class Number:
    def is_negative(self):
        ...

    def is_odd(self):
        ...

def absolute_value(number: Number):
    if number is None:
        return None
    if number.is_negative():
        negated_number = -number
        return negated_number
    return number

Here, we're saying specifically in the scope of the absolute_value function, translate the identifier number differently.
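One way to picture the lookup is as a layered mapping where the scope's entries shadow the file-wide ones. This sketch uses collections.ChainMap from the standard library; translations_for is a made-up helper name, and the two dictionaries stand in for the parsed "identifiers" and "scopes" sections.

```python
from collections import ChainMap

identifiers = {"number": "umbernay", "self": "elfsay"}
scopes = {"absolute_value": {"number": "umberney"}}


def translations_for(scope_name, identifiers, scopes):
    """Scope-specific entries shadow the file-wide ones."""
    return ChainMap(scopes.get(scope_name, {}), identifiers)


inside = translations_for("absolute_value", identifiers, scopes)
outside = translations_for("is_odd", identifiers, scopes)
print(inside["number"], outside["number"], inside["self"])
```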

What about imports? Well, the importing file could provide translations of all the imported identifiers too, but I think it would be better to use separate files to give those translations in one place. For example, assuming our code above is in a module called number, the translations could live in a file called number.en-x-piglatin.translations.yaml. This file could be either next to the Python module, or in the same place in a hierarchy rooted at a different directory. That would allow you to provide your own translations for other people's code if they aren't packaged with them.
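As a sketch of that lookup order, here's how a tool might list the places to check for a module's translation file. Everything about it is an assumption on my part: the function name translation_candidates, the idea of a "source root", and the notion of overlay directories are just one way to read the paragraph above.

```python
from pathlib import PurePosixPath


def translation_candidates(module, source_root, language, overlay_roots=()):
    """List places to look for a module's translation file, nearest first."""
    module = PurePosixPath(module)
    filename = f"{module.stem}.{language}.translations.yaml"
    # Where the module sits relative to the root of its source tree.
    relative_dir = module.parent.relative_to(source_root)
    candidates = [module.parent / filename]  # right next to the module
    for root in overlay_roots:
        # Same relative location, but under a separate translations tree.
        candidates.append(PurePosixPath(root) / relative_dir / filename)
    return candidates


for path in translation_candidates(
    "/src/pkg/number.py", "/src", "en-x-piglatin", ["/home/me/translations"]
):
    print(path)
```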

When displaying this source code, whatever software is doing that could give an "en-x-piglatin" option to show translated identifiers. I'm not sure what a good UX would be for editing, but I think you could imagine it.

For some commonly used identifiers, such as self, there might be some translations configured at the package level, or even globally.

Syntax

Now for the really interesting part.

So, you've got some editor or other piece of software that can show you the source code "translated" by replacing identifiers. Let's see what that would look like, using Shavian as a "translation" of our usual Latin-script English:

# program-translation
# from: en-US
# to: en-US-Shaw
# identifiers:
#   absolute_value: 𐑨𐑚𐑕𐑩𐑤𐑵𐑑_𐑝𐑨𐑤𐑘𐑵
#   is_negative: 𐑦𐑟_𐑯𐑧𐑜𐑩𐑛𐑦𐑝
#   is_odd: 𐑦𐑟_𐑷𐑛
#   negated_number: 𐑯𐑩𐑜𐑱𐑛𐑩𐑑_𐑯𐑳𐑥𐑚𐑻
#   number: 𐑳_𐑯𐑳𐑥𐑚𐑻
#   Number: 𐑯𐑳𐑥𐑚𐑻
#   self: 𐑕𐑧𐑤𐑓

class 𐑯𐑳𐑥𐑚𐑻:
    def 𐑦𐑟_𐑯𐑧𐑜𐑩𐑛𐑦𐑝(𐑕𐑧𐑤𐑓):
        ...

    def 𐑦𐑟_𐑷𐑛(𐑕𐑧𐑤𐑓):
        ...

def 𐑨𐑚𐑕𐑩𐑤𐑵𐑑_𐑝𐑨𐑤𐑘𐑵(𐑳_𐑯𐑳𐑥𐑚𐑻: 𐑯𐑳𐑥𐑚𐑻):
    if 𐑳_𐑯𐑳𐑥𐑚𐑻 is None:
        return None
    if 𐑳_𐑯𐑳𐑥𐑚𐑻.𐑦𐑟_𐑯𐑧𐑜𐑩𐑛𐑦𐑝():
        𐑯𐑩𐑜𐑱𐑛𐑩𐑑_𐑯𐑳𐑥𐑚𐑻 = -𐑳_𐑯𐑳𐑥𐑚𐑻
        return 𐑯𐑩𐑜𐑱𐑛𐑩𐑑_𐑯𐑳𐑥𐑚𐑻
    return 𐑳_𐑯𐑳𐑥𐑚𐑻

I've used Shavian so it's extra clear what "untranslated" English remains after dealing with identifiers.

This is progress, but there's still a ways to go! Even in this trivial example there are six keywords (class, def, if, is, return, and None). Also there's the "embedded" English syntax; for example: "if" comes before a condition, "return" comes before the thing being returned, "is" is between the two things being copula'd.

This may also be a good time to point out that "def" isn't exactly English.[1] It's short for "define", but function definition is such a common and fundamental thing that someone (van Rossum?) decided to abbreviate it. So, while Python certainly has an English influence, it's not entirely beholden to it, especially when the convenience of the language is at stake. This principle will come up later when thinking about what Python would look like under the influence of other languages.

If you're super observant, you'll also notice our first translation wrinkle. The class Number was glossed as 𐑯𐑳𐑥𐑚𐑻, yet the variable number was changed to 𐑳_𐑯𐑳𐑥𐑚𐑻, which is my Shavian spelling of "a number". This is because Shavian has no casing distinctions, hence there's no way to have a capitalized convention for class names.[2] Even when dealing with just English in a different alphabet, we're encountering a way that the ergonomics of the programming language will have to change.
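A translation tool targeting a caseless script could flag these cases ahead of time. Here's a small sketch that groups identifiers differing only by case, so a translator knows which ones need distinct spellings; case_only_collisions is a name I've invented for the example.

```python
from collections import defaultdict


def case_only_collisions(names):
    """Group identifiers that would be indistinguishable without casing."""
    groups = defaultdict(list)
    for name in names:
        groups[name.casefold()].append(name)
    return [sorted(group) for group in groups.values() if len(group) > 1]


english_identifiers = [
    "absolute_value", "is_negative", "is_odd",
    "negated_number", "number", "Number", "self",
]
print(case_only_collisions(english_identifiers))
```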

If we wanted to complete our Shavian version of Python, we would need to pick spellings for the 38 or so keywords in the language. Keywords aren't typically added often, so, unlike identifiers, this information could be standardized and provided from a central place.

To "translate" code into a Shavian Python, we'll use these rules for keywords and syntax:

Python (en-US)	·𐑐𐑲𐑔𐑷𐑯 (en-US-Shaw Python)
x y (placeholders for identifiers)	𐑙 𐑣
class x:	𐑒𐑤𐑨𐑕 𐑙:
def x(y1, y2):	𐑛𐑧𐑓 𐑙(𐑣1, 𐑣2):
if x:	𐑦𐑓 𐑙:
x is y	𐑙 𐑦𐑟 𐑣
return x	𐑮𐑩𐑑𐑻𐑯 𐑙
None	𐑯𐑳𐑯

# program-translation
# from: en-US
# to: en-US-Shaw
# identifiers:
#   absolute_value: 𐑨𐑚𐑕𐑩𐑤𐑵𐑑_𐑝𐑨𐑤𐑘𐑵
#   is_negative: 𐑦𐑟_𐑯𐑧𐑜𐑩𐑛𐑦𐑝
#   is_odd: 𐑦𐑟_𐑷𐑛
#   negated_number: 𐑯𐑩𐑜𐑱𐑛𐑩𐑑_𐑯𐑳𐑥𐑚𐑻
#   number: 𐑳_𐑯𐑳𐑥𐑚𐑻
#   Number: 𐑯𐑳𐑥𐑚𐑻
#   self: 𐑕𐑧𐑤𐑓

𐑒𐑤𐑨𐑕 𐑯𐑳𐑥𐑚𐑻:
    𐑛𐑧𐑓 𐑦𐑟_𐑯𐑧𐑜𐑩𐑛𐑦𐑝(𐑕𐑧𐑤𐑓):
        ...

    𐑛𐑧𐑓 𐑦𐑟_𐑷𐑛(𐑕𐑧𐑤𐑓):
        ...

𐑛𐑧𐑓 𐑨𐑚𐑕𐑩𐑤𐑵𐑑_𐑝𐑨𐑤𐑘𐑵(𐑳_𐑯𐑳𐑥𐑚𐑻: 𐑯𐑳𐑥𐑚𐑻):
    𐑦𐑓 𐑳_𐑯𐑳𐑥𐑚𐑻 𐑦𐑟 𐑯𐑳𐑯:
        𐑮𐑩𐑑𐑻𐑯 𐑯𐑳𐑯
    𐑦𐑓 𐑳_𐑯𐑳𐑥𐑚𐑻.𐑦𐑟_𐑯𐑧𐑜𐑩𐑛𐑦𐑝():
        𐑯𐑩𐑜𐑱𐑛𐑩𐑑_𐑯𐑳𐑥𐑚𐑻 = -𐑳_𐑯𐑳𐑥𐑚𐑻
        𐑮𐑩𐑑𐑻𐑯 𐑯𐑩𐑜𐑱𐑛𐑩𐑑_𐑯𐑳𐑥𐑚𐑻
    𐑮𐑩𐑑𐑻𐑯 𐑳_𐑯𐑳𐑥𐑚𐑻

Well, it's lost the syntax highlighting of course, but there we can finally see a Python program fully "translated".

While Shavian programming is a neat trick, we didn't really learn much new about programming languages. Let's try a few more natural languages that have some more substantial differences from English.

German

German presents an interesting wrinkle: in standard written German, all nouns are capitalized. Let's think about what that would mean in a programming language.

Programming languages can be thought of in (at least) two different ways.

One way is the perspective of functional programming: a function tells a computer what to do. You can define functions and invoke functions. That's it. All "data" is just defined but not-yet-invoked functions. In this perspective, we can relate function invocation to verbs, specifically imperative verbs. When the source code invokes the function, it's saying "do(this)". When the source code defines a function (or other data), we can think of that as a noun, with the copula assigning the noun to its meaning. Thus, a given identifier might have a "noun" meaning in one context, and a "verb" one in another.

In reality, even in very function-oriented languages, there are types of data besides functions, such as numbers or strings. Since those aren't executable, their identifiers always have the "noun" sense.

Another way to think of a program is as a state machine with limitless room to store its state. The state is the nouns, and the different transitions between those are verbs.

So let's define a "verb" as a function (or other "callable"), and a "noun" as everything else. Conceptually, the callables are nouns until you invoke them, but most of the time the only thing you do with them is invoke them.

If you wanted to make a Very German™ programming language, you might have identifiers always be nouns when being defined (and thus capitalized), but if their value is a function, they are lowercased on function invocation:

Tun = (x): ...
Sachen = 4
tu Sachen

This defines a function called Tun ("do") that takes one argument. It then defines a variable Sachen ("stuff") that is assigned the value 4. Then Tun is invoked with Sachen as its argument. No parentheses are necessary in the function invocation since the lowercased name indicates it's being invoked. There's also a -n suffix in the noun form. While using casing to distinguish definition from invocation is a cool idea, this is probably an example of embedding too much of a natural language in a programming language.

Let's try "translating" Python into German. Here's our keyword and syntax table:

Python (en-US)	Python[3] (de Python)
x y (placeholders for identifiers)	x y
class x:	Klasse x:
def x(y1, y2):	def x(y1, y2):
if x:	wenn x:
x is y	x ist y
return x	gib x
None	Nichts

This is pretty straightforward. The keyword for class definition, Klasse, is capitalized since it's a noun. The keyword for callable definition is still "def", but now it's short for definier.

Please note that, in general, I'm not fluent in these other languages I'm going to use in examples. What you see here is based on what I know from reading dictionaries and grammars. If you have suggestions for better translations, feel free to share.

Let's look at our little example program:

# program-translation
# from: en
# to: de
# identifiers:
#   absolute_value: Absolutwert
#   is_negative: ist_negative
#   is_odd: ist_ungerade
#   negated_number: negiert_Zahl
#   number: ein_Zahl
#   Number: Zahl
#   self: Selbst

Klasse Zahl:
    def ist_negative(Selbst):
        ...

    def ist_ungerade(Selbst):
        ...

def Absolutwert(ein_Zahl: Zahl):
    wenn ein_Zahl ist Nichts:
        gib Nichts
    wenn ein_Zahl.ist_negative():
        negiert_Zahl = -ein_Zahl
        gib negiert_Zahl
    gib ein_Zahl

Note the function with a noun name Absolutwert. In code typically a callable with a noun name means "make one of this noun out of this other noun I'm giving you". If we wanted to be pedantic, we could call it mach_Absolutwert, "make absolute value".

The verbs should be in the imperative mood, since the program is telling the computer to take some action. They can also use the "familiar" form instead of the "formal" one. The computer is du instead of Sie.[4]

Now let's consider grammatical gender. Verbs in German must change form depending on the "gender" of the subject of the clause. Thankfully, since all the verbs are orders to the computer, we can just use whatever gender "computer" is assigned in German. And really, we don't even need to worry about that, because imperative verbs don't do gender-agreement.

However, adjectives also must have gender-agreement. Since the word Zahl has a "feminine" grammatical gender, the adjective negative in the method name ist_negative is the feminine form, and not ist_negativer (masculine) or ist_negatives (neuter).

However, let's imagine we write (in English) an "is negative" function that just returns whether something is less than zero.

def is_negative(thing):
    return thing < 0

How do we translate this? We don't know what gender "thing" will have. We could assume feminine since that's what Zahl is, but that's kind of against the spirit of duck typing. "thing" can be anything for which the < operator is defined. For example, you could define a Vector type which works with this function. Its name would be translated as Vektor, which is masculine. Should you make three aliases for the function, ist_negatives, ist_negativer, and ist_negative? Should the language support some kind of regex-identifier like def ist_negative[rs]?(Ding): ..., that would concisely define those three aliases?
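In today's Python you could fake the three-aliases idea with a decorator that registers one function under several names. This is only a sketch of the concept: with_aliases and the aliases dictionary are hypothetical, and a real translation layer would presumably hook into name lookup rather than a plain dict.

```python
def with_aliases(namespace, stem, suffixes):
    """Register the decorated function under stem + each suffix."""
    def decorator(func):
        for suffix in suffixes:
            namespace[stem + suffix] = func
        return func
    return decorator


aliases = {}


# "" is the feminine form, "r" the masculine, "s" the neuter.
@with_aliases(aliases, "ist_negative", ("", "r", "s"))
def is_negative(thing):
    return thing < 0


print(sorted(aliases))
print(aliases["ist_negativer"](-3))
```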

We should probably just pick a convention and stick with it. Python doesn't always match English grammar conventions, and that's OK. I'm not a native speaker, but to me either the shortest form negative or the neuter one negatives seems like a good choice.

To Be Continued

Next time I'll look at what happens when we try to "translate" into Japanese, Hindi, and Urdu.

Landlord Registry

Morgan Wahl

2022-09-03 00:00

Policy proposal: Implement a registry of landlords to track rents, lease terms, and landlord income.

Land ownership is considered a matter of public record, and details about property size, building, value, tax, mortgages, and ownership history are carefully maintained by the town. Information on the rent paid and lease terms is not. Nor is there any record of who _is_ a landlord and therefore responsible for providing good housing to their tenants.

A registry of landlords would list all properties rented, the amount paid in rent, and the lease terms. The identity of individual renters would not be included if they lived there, but the renters of non-residential properties and corporate renters would be identified.

Landlords would be required to keep this information up-to-date.

Such a registry would provide a very valuable source of data on the current state of rental housing.

The registry could also provide digital services to tenants and landlords, such as easy rent payments or lease signatures and recording.

Residential Deductions

Morgan Wahl

2022-09-03 00:00

Policy proposal: Implement residential property-tax deductions as simple payments to all residents.

In Massachusetts, towns are largely funded by property taxes. The town picks the tax rate (per assessed value of the property) and an amount to deduct from the tax for each property if the owner lives at that property.

For people who rent, the property tax is effectively included in the rent (otherwise the landlord would be losing money). However, they don't get the resident deduction. (Yes, if the landlord lives in the building, the building does get it. But the amount per housing unit, and per person, is much lower.) This makes the tax somewhat regressive.

Instead of subsidizing housing by paying property owners, simply make a payment to each resident of the town. (With a 1099 for income tax.) This means renters get the same deduction as owners. It also makes the property tax slightly progressive.

State Routes as Bus Routes

Morgan Wahl

2022-09-03 00:00

Policy proposal: Every state route/state highway (including federally funded ones) is also a bus route.

Since the roads are publicly paid for, and designed for use by motor vehicles, there should be public motor vehicles available to use them. This is most efficiently done using buses.

Making state routes reliably also bus routes allows residents to easily navigate whether driving themselves or taking a bus. It also means development that is accessible by road is more likely to be accessible by people who can't or don't drive.

A state route network is probably not a well-designed bus network. Some route redesigns would make sense. The state might choose to no longer fund the maintenance of some highways if they don't make sense as bus routes. The towns such routes run through might not like this, but ultimately if it doesn't make sense to run a bus there, then it doesn't make sense for the whole state to pay for the road itself.

The state buses would ideally be well-integrated with other public transportation. That other public transportation may adjust its services to take the state ones into account. They might be able to provide more and better service since some of their load is now taken on by state buses.

Interstates in particular lend themselves to local (stops at every exit) and express (stops at cities?) service.

This policy would allow for some efficiencies by doubling-up on signage and route design.

Sustainable Hardware

Morgan Wahl

2022-09-03 00:00

Here's a proposed policy to address several problems:

All computers sold with pre-loaded software must meet at least one of the two following conditions:

Open: The source code of the software and the development tools required to build and install the software are publicly available, in perpetuity, and are licensed under an open-source license.
Replaceable: Documentation is made publicly available, in perpetuity, that is sufficient to write and install software on the computer that accomplishes all the same tasks as the pre-loaded software, and it is possible to replace the pre-loaded software without special tools.

If the pre-loaded software on the computer connects to any services on other computers, the communications protocol must be documented and the computer's owner must be able to choose what computers it connects to.

This would apply to the software in things like laptops, but also embedded software in cars, appliances, phones, and the myriad other things with computers in them. The manufacturer can choose to either "show their work" (the open approach) so that the public can see exactly what the devices will do, or keep the software secret, but facilitate its replacement.

For connections to other services, only the "replaceable" option really makes sense.

This would allow hardware to continue to function even when abandoned by its manufacturer, while still giving manufacturers the option of keeping their software secret if they choose. Either one would be minimal effort, since it just requires the publishing of what the manufacturer would already have a copy of for their own use. Public organizations such as libraries could even handle the archiving and making available "in perpetuity" of the software or documentation.

For some systems, it would greatly complicate the design to make the software easily replaceable after manufacturing, or replacing the software would require regulatory re-approval. These would have to go with the "Open" option. This makes sense, because these systems are effectively static tools, and exactly what they will do should be thoroughly knowable by the people using them.

Sometimes manufacturers want to keep their software secret. This tends to be much more complicated software, and thus it's less onerous to provide a mechanism to replace it. Thus, things like phones and laptops would likely go the "Replaceable" route. This is more or less the status quo; the main improvement of this policy is that hardware documentation would be available to anyone writing software for that hardware.

These requirements apply to each computer individually, of which there may be many inside a single manufactured "product". In a laptop for example, the controller in an SSD may be an embedded computer. The manufacturer could either make it easily replaceable (perhaps with a USB port and documented debugging interface?), or simply release its source code. The OS and bootloader, on the other hand, may remain secret, but the hardware interfaces they use would then need to be documented.

Another example might be a car. Some of the computers would simply have their source code released, particularly critical ones with safety concerns. This allows for more public scrutiny of such systems, which is a good thing. Other computers in the car, such as the one controlling the stereo or telephone, might stay secret, but be replaceable.

The "connections to other services" should apply to connections between computers _within_ a product as well. This allows owners to replace parts to repair the product.

The definition of "computer" is Turing-completeness and programmability. Even if it is only programmed once, that still counts. If it can _only_ be programmed once, then only the "Open" option is available.

Robots

Morgan Wahl

2021-03-04 00:00

The trope of "robots" in stories is a way to include slavery in a story without having to deal with any of the real-life terrible things about enslaving humans.

In fact the real-life obsession among some people of making machines as human-like as they can may stem from a desire to have control over real humans. There aren't many engineering problems that call for a machine to take the form of a human, certainly not enough to justify the interest in such-shaped machines.

The Jetsons is a white American fantasy of subservient humans with none of the "downsides" (from a white perspective) of actual black people.

The "robot uprising" trope is perhaps an echo of fear of slave revolts in the U.S.

"Robots" are almost always depicted as just impractically (and impossibly) human. The main thing that separates the idea of a "robot" from a "machine" is some attempt at simulating a human, even if it's just a face.

Experiment: put a face on your dishwasher. Does it seem like a robot now?

There are definitions of "robot" that refer to some idea of a machine that senses its environment and responds, and is somehow more "independent" of humans. But this is not a very precise distinction, and is often just one of degree. A furnace with a thermostat senses its environment and responds, but most people don't think of it as a robot.

The robot trope may be a re-casting of the older trope of animals that act like just differently-shaped humans. It's _possible_ there's some of the same motivation there: if you're used to working with domesticated animals, you might think an animal that was more human would be like a human you could boss around. But I think the reverse is more likely: putting human attributes on animals is a way to describe their _uncontrollableness_. It's a smaller example of the pattern of putting human thoughts in various things (e.g. gods controlling weather, yokai, etc.) to explain their unpredictability or complexity, and make them seem less scary.

Representing Discrete Probabilities in RDF

Morgan Wahl

2019-11-13 23:33

Many RDF datasets are meant to be interpreted as simply "statements in the graph are true". In order to record more nuanced interpretations, we'll make meta-statements: statements about statements and graphs.

We'll use a predicate that assigns a probability to a statement or graph. When its subject is a statement, it means that statement is true with that probability. When its subject is a graph, it means all statements in that graph are true with at least that probability.

Assume we have a dataset with a variety of statements, some of which we think are true, some are false, and some are true only with some probability. We can't just start inferring new statements from this dataset, since we have no idea what they would mean.

First create a graph of "assumed true" statements. Let's call it G₁. Add to it a statement "G₁ probability 1". We can add to our "true" graph probabilities for other statements not present in that graph itself. Within this graph we can safely let loose an inference process to find more true statements.

Next comes the fun part. We want to allow inferences about statements with a known, non-1 probability. To do that, let's say we have a statement outside G₁, S_a, and there's a statement in G₁ "S_a probability .9". We'll create a new supergraph of G₁ consisting of G₁ plus S_a. Let's call that G_a. We can add to G₁ a statement "G_a probability .9". Then, we can safely let an inference process run in that new supergraph.

Let's say we have another statement, called S_~a, known to be the inverse of S_a, either because a human manually added a statement to G₁ creating that relationship between them, or because an automated process was able to infer that. We can infer its probability is .1, and carry out the same supergraph process, this time producing a graph with probability .1.

Let's say we have another statement, S_b, also with probability .9. We could perform the supergraph process with it on G₁ to produce yet another graph with probability .9. We could also perform the supergraph process on G_a, producing a graph that represents everything in G₁ being true, plus S_a, plus S_b. We'll call that graph G_ab. We can then add to G₁ a statement "G_ab probability .81" (.9 × .9). This assumes the probabilities of the two statements are independent.
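The supergraph process for independent statements can be sketched in a few lines of Python. This is only an illustration under my own assumptions: a graph is modelled as a frozenset of statement names, the dictionaries stand in for the "probability" statements recorded in G₁, and the names `extend`, `graph_probability` and `statement_probability` are made up for the example, not part of any RDF library.

```python
# Hypothetical model: a graph is a frozenset of statement names.
G1 = frozenset({"S_x"})                            # statements assumed true
graph_probability = {G1: 1.0}                      # "G1 probability 1"
statement_probability = {"S_a": 0.9, "S_b": 0.9}   # statements made in G1


def extend(graph, statement):
    """Build the supergraph of `graph` that also contains `statement`."""
    new_graph = graph | {statement}
    # Independence assumption: the new graph's probability is the product
    # of the old graph's probability and the added statement's probability.
    graph_probability[new_graph] = (
        graph_probability[graph] * statement_probability[statement]
    )
    return new_graph


G_a = extend(G1, "S_a")     # G1 plus S_a, probability .9
G_ab = extend(G_a, "S_b")   # G1 plus S_a plus S_b, probability .9 * .9
print(round(graph_probability[G_ab], 2))
```

Because the probabilities are just multiplied, extending G₁ with S_b first and S_a second gives the same graph with the same probability.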

To represent dependent probabilities, instead of making statements about probability in G₁, we can make them in the supergraphs that suppose those statements are true. E.g. if S_a being true implies S_b has a probability of .99, then we add a statement to G_a saying as much. The supergraph construction then proceeds the same, but the resulting graph has a probability of .891 (.9 × .99).
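Here's a rough sketch of the dependent case, again with made-up names. The assumption is that each graph carries its own table of probability statements, and that the probability of a chain of supergraph steps is the product of the probability found in whichever graph each step starts from.

```python
# Hypothetical tables of the probability statements made in each graph.
G1_probs = {"S_a": 0.9, "S_b": 0.9}   # stated in G1
G_a_probs = {"S_b": 0.99}             # stated in G_a: given S_a, S_b is .99


def chain_probability(steps):
    """Multiply the probabilities along a chain of supergraph steps.

    Each step is (probabilities stated in the starting graph, statement added).
    """
    p = 1.0
    for local_probs, statement in steps:
        p *= local_probs[statement]
    return p


# G1 -> G_a uses G1's statement about S_a;
# G_a -> G_ab uses G_a's own statement about S_b.
p_G_ab = chain_probability([(G1_probs, "S_a"), (G_a_probs, "S_b")])
print(round(p_G_ab, 3))
```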

Let's try a more formal approach.

We'll define three predicates:

n hasProbability p: n is a statement or graph, p is a number between 0 and 1 (inclusive).
n hasProbabilityAtLeast p: n is a graph or statement, p is a number between 0 and 1 (inclusive).
s isOppositeOf t: s and t are statements.

You can make several inferences that are obvious. In addition:

if n isOppositeOf m and n hasProbability p then m hasProbability 1 − p
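As a minimal sketch, that rule could be applied to a set of (subject, predicate, object) tuples like this. The tuple representation and the function name `infer_opposites` are assumptions for illustration; `Fraction` is used only so that 1 − p comes out exact.

```python
from fractions import Fraction


def infer_opposites(triples):
    """Apply: n isOppositeOf m, n hasProbability p  =>  m hasProbability 1 - p."""
    known = {s: o for s, p, o in triples if p == "hasProbability"}
    result = set(triples)
    for s, p, o in triples:
        # The rule is applied in the direction it is written; to go the other
        # way, the data needs an explicit "m isOppositeOf n" statement too.
        if p == "isOppositeOf" and s in known:
            result.add((o, "hasProbability", 1 - known[s]))
    return result


triples = {
    ("S_a", "hasProbability", Fraction(9, 10)),
    ("S_a", "isOppositeOf", "S_not_a"),
}
inferred = infer_opposites(triples)
```

After this runs, `inferred` holds the two original statements plus `("S_not_a", "hasProbability", Fraction(1, 10))`.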

Let's think about the largest possible graph, the graph that contains all possible statements. Let's call it G_🌌. Assigning meaning to G_🌌 is pretty much impossible; it contains every possible contradiction! You could apply a certain kind of inference engine to it that just constructs subgraphs that contain no contradictions, but it would have an infinite amount of work to do. So instead, we'll carve out subgraphs that we can assign some meaning to. We can also nicely represent subjectivity while we're at it.

Let's say I want to mark some statement (S_a) as true. I create a new graph of statements I think are true, and add the statement to it. Let's call it G_morgan:1. I can also add to G_morgan:1 the statement "G_morgan:1 hasProbability 1". I can let an inference engine loose in this graph and have it add everything it can derive from statements already in the graph to it.

Now let's say I want to say another statement (S_b) in G_🌌 (but not G_morgan:1) has probability 0.8. I add a statement to G_morgan:1: "S_b hasProbability 0.8". This can trigger the automatic creation of a new graph G_morgan:b, which contains S_b, and is also a supergraph of G_morgan:1. We can then infer in G_morgan:1 "G_morgan:b hasProbabilityAtLeast 0.8".
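The automatic step might look something like the following sketch. The split between plain statements and meta-statements, the `S_` naming convention, and the function `expand` are all assumptions made for the example.

```python
def expand(prefix, statements, meta):
    """Create one supergraph per probability statement about a statement
    that is not already in the graph.

    Returns the new graphs (by name) and the meta-statements to add to
    the original graph.
    """
    new_graphs = {}
    new_meta = set()
    for subject, predicate, p in meta:
        if (predicate == "hasProbability"
                and subject.startswith("S_")
                and subject not in statements):
            name = f"{prefix}:{subject[2:]}"            # S_b -> G_morgan:b
            new_graphs[name] = statements | {subject}   # supergraph plus S_b
            new_meta.add((name, "hasProbabilityAtLeast", p))
    return new_graphs, new_meta


G_morgan_1 = frozenset({"S_a"})
meta = {
    ("G_morgan:1", "hasProbability", 1.0),
    ("S_b", "hasProbability", 0.8),
}
graphs, inferred = expand("G_morgan", G_morgan_1, meta)
```

Here `graphs` maps "G_morgan:b" to the graph containing S_a and S_b, and `inferred` contains the single statement "G_morgan:b hasProbabilityAtLeast 0.8".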