Wednesday, May 18, 2016

Adding type annotations for fspath

Type annotations for fspath

Python 3.6 will have a new dunder protocol, __fspath__() , which should be supported by classes that represent filesystem paths. Example of such classes are the pathlib.Path family and os.DirEntry  (returned by os.scandir() ).

You can read more about this protocol in the brand new PEP 519. In this blog post I’m going to discuss how we would add type annotations for these additions to the standard library.

I’m making frequent use of AnyStr , a quite magical type variable predefined in the typing module. If you’re not familiar with it, I recommend reading my blog post about AnyStr . You may also want to read up on generics in PEP 484 (or read mypy’s docs on the subject).

Adding os.scandir() to the stubs for os.py

For practice, let’s see if we can add something to the stub file for os.py. As of this writing there’s no typeshed information for os.scandir() , which I think is a shame. I think the following will do nicely. Note how we only define DirEntry  and scandir() for Python versions >= 3.5. (Mypy doesn’t support this yet, but it will soon, and the example here still works — it just doesn’t realize scandir()  is only available in Python 3.5.) This could be added to the end of stdlib/3/os/__init__.pyi:

from typing import Generic, AnyStr, overload, Iterator

if sys.version_info >= (3, 5):

    class DirEntry(Generic[AnyStr]):
        name = ...  # type: AnyStr
        path = ...  # type: AnyStr
        def inode(self) -> int: ...
        def is_dir(self, *, follow_symlinks: bool = ...) -> bool: ...
        def is_file(self, *, follow_symlinks: bool = ...) -> bool: ...
        def is_symlink(self) -> bool: ...
        def stat(self, *, follow_symlinks: bool = ...) -> stat_result: ...

    @overload
    def scandir() -> Iterator[DirEntry[str]]: ...
    @overload
    def scandir(path: AnyStr) -> Iterator[DirEntry[AnyStr]]: ...

Deconstructing this a bit, we see a generic class (that’s what the Generic[AnyStr]  base class means) and an overloaded function.  The scandir() definition uses @overload because it can also be called without arguments. We could also write it as follows; it’ll work either way:

    @overload
    def scandir(path: str = ...) -> Iterator[DirEntry[str]]: ...
    @overload
    def scandir(path: bytes) -> Iterator[DirEntry[bytes]]: ...

Either way there really are three ways to call scandir() , all three returning an iterable of DirEntry objects:

  • scandir() -> Iterator[DirEntry[str]] 
  • scandir(str) -> Iterator[DirEntry[str]] 
  • scandir(bytes) -> Iterator[DirEntry[bytes]] 

Adding os.fspath()

Next I’ll show how to add os.fspath() and how to add support for the __fspath__()  protocol to DirEntry .

PEP 519 defines a simple ABC (abstract base class), PathLike , with one method, __fspath__() . We need to add this to the stub for os.py , as follows:

class PathLike(Generic[AnyStr]):
    @abstractmethod
    def __fspath__(self) -> AnyStr: ...

That’s really all there is to it (except for the sys.version_info  check, which I’ll leave out here since it doesn’t really work yet). Next we define os.fspath() , which wraps this protocol. It’s slightly more complicated than just calling its argument’s __fspath__()  method, because it also handles strings and bytes. So here it is:

@overload
def fspath(path: PathLike[AnyStr]) -> AnyStr: ...
@overload
def fspath(path: AnyStr) -> AnyStr: ...

Easy enough! Next is update the definition of DirEntry . That’s easy too — in fact we only need to make it inherit from PathLike[AnyStr] , the rest is the same as the definition I gave above:

class DirEntry(PathLike[AnyStr], Generic[AnyStr]):
    # Everything else unchanged!

The only slightly complicated bit here is the extra base class Generic[AnyStr] . This seems redundant, and in fact PEP 484 says we can leave it off, but mypy doesn’t support that yet, and it’s quite harmless — this just rubs into mypy’s face that this is a generic class of one type variable (the by-now famous AnyStr ).

Finally we need to make a similar change to the stub for pathlib.py . Again, all we need to do is to make PurePath  inherit from PathLike[str] , like so:

from os import PathLike

class PurePath(PathLike[str]):
    # Everything else unchanged!

However, here we don’t add Generic , because this is not a generic class! It inherits from PathLike[str] , which is quite un-generic, since it’s PathLike specialized for just str .

Note that we don’t actually have to define the __fspath__()  method in these stubs — we’re not supposed to call them directly, and stubs don’t provide implementations, only interfaces.

Putting it all together, we see that it’s quite elegant:

for a in os.scandir('.'):
    b = os.fspath(a)
    # Here, the typechecker will know that the type of b is str!

The derivation that b has type str  is not too complicated: first, os.scandir('.')  has a str  argument, so it returns an iterator of DirEntry  objects parameterized with str , which we write as DirEntry[str] . Passing this DirEntry[str]  to os.fspath()  then takes the first of that function’s two overloads (the one with PathLike[AnyStr] ), since it doesn’t match the second one ( DirEntry  doesn’t inherit from AnyStr , because it’s neither a str  nor bytes ). Further the AnyStr type variable in PathLike[AnyStr] is solved to stand for just str , because DirEntry[str]  inherits from PathLike[str] . This is the specialized version of what the code says: DirEntry[AnyStr]  inherits from PathLike[AnyStr] .

Okay, so maybe that last paragraph was intermediate or advanced. And maybe it could be expanded. Maybe I’ll write another blog about how type inference works, but there’s a lot on that topic, and other authors have probably already written better introductory material about generics (in other languages, though).

Making things accept PathLike

There’s a bit of cleanup work that I’ve left out. PEP 519 says that many stdlib functions that currently take strings for pathnames will be modified to also accept PathLike . For example, here’s how the signatures for os.scandir()  would change:

@overload
def scandir() -> Iterator[DirEntry[str]]: ...
@overload
def scandir(path: AnyStr) -> Iterator[DirEntry[AnyStr]]: ...
@overload
def scandir(path: PathLike[AnyStr]) -> Iterator[DirEntry[AnyStr]]: ...

The first two entries are unchanged; I’ve just added a third overload. (Note that the alternative way of defining scandir() would require more changes — an indication that this way is more natural.)

I also tried doing this with a union:

@overload
def scandir() -> Iterator[DirEntry[str]]: ...
@overload
def scandir(path: Union[AnyStr, PathLike[AnyStr]]) -> Iterator[DirEntry[AnyStr]]: ...

But I couldn’t get this to work, so the extra overload is probably the best we can do. Quite a few functions will require a similar treatment, sometimes introducing overloading where none exists today (but that shouldn’t hurt anything).

A note about pathlib : since it only deals with strings, its methods (the ones that PEP 519 says should be changed anyway) should use PathLike[str]  rather than PathLike[AnyStr] .

Acknowledgments

(Thanks for comments on the draft to Stephen Turnbull, Koos Zevenhoven, Ethan Furman, and Brett Cannon.)

3 comments:

Unknown said...

I think this a premature push. While progressive type checking is a step toward capturing and enforcing developer intent, this standardization is too soon. Consider type annotation mini-languages, e.g., def graph(rotate:"int>0<180", xs:[int+], data:[[[float?]]"), which has terrible run-time characteristics but allows a developer to convey constraints and complex types. It may be a foolish path, but we don't know yet.

Torsten Bronger said...

I know that you are very much excited about type annotation and mypy, and I assume these things are good ideas for many use cases. But I could hardly follow your last two blog entries. For me, all this is complicated. Possibly, everyday use is simpler ... I remember how I hated the idea of decorators when they were first published (and the @ syntax for that matter) and how I enjoy using them today. I truly hope to be positively surprised again. ;-)

Guido van Rossum said...

This is definitely advanced material! I wrote these blog posts because I had written about this in the discussion for PEP 519 and Brett Cannon said he hadn't known about AnyStr. But you should definitely start with a gentler intro -- not PEP 484 itself. Perhaps PEP 483 for the theory, and mypy.readthedocs.io for a tutorial.