Concepts:
We have already seen how to define classes / models using simple types, such as int
, float
, str
, bool
, etc. However in practice data structures are often more complex. They include for examples dictionaries, list of lists, or mutiple allowed types. Of course Pydantic also supports these more complex types, such as lists, dictionaries, enums, and unions. In the following we will see an overview of those types and how to use them:
Lists and dictionaries are very common data structures in Python. Pydantic supports typed lists and dictionaries, which means that we can also define the type of the elements in the list or the type of the values in the dictionary.
Typed lists and dictionaries are defined using the list
and dict
generic types. For example, we can define a model with a list of floats as follows:
from pydantic import BaseModel
class LineV1(BaseModel):
"""A Line object represented by two lists of coordinates"""
x: list[float]
y: list[float]
def length(self):
"""Line length computed by summing over the distance between points"""
length = 0
for idx in range(len(self.x) - 1):
length += (
(self.x[idx] - self.x[idx + 1]) ** 2
+ (self.y[idx] - self.y[idx + 1]) ** 2
) ** 0.5
return length
Instantiation of the model works as expected:
line_v1 = LineV1(x=[0, 1, 3], y=[0, 1, 2])
display(line_v1)
LineV1(x=[0.0, 1.0, 3.0], y=[0.0, 1.0, 2.0])
print(line_v1.length())
3.6502815398728847
The behavior is exactly the same as for simple types. So values are converted to the specified type if possible:
line_v1 = LineV1(x=[0, 1, "3"], y=[0, True, 2])
display(line_v1)
LineV1(x=[0.0, 1.0, 3.0], y=[0.0, 1.0, 2.0])
If the type cannot be converted a ValidationError
is raised:
line_v1 = LineV1(x=[0, 1, "10^-2"], y=[0, 1, 2])
display(line_v1)
---------------------------------------------------------------------------
ValidationError Traceback (most recent call last)
Cell In[5], line 1
----> 1 line_v1 = LineV1(x=[0, 1, "10^-2"], y=[0, 1, 2])
2 display(line_v1)
File /opt/hostedtoolcache/Python/3.9.17/x64/lib/python3.9/site-packages/pydantic/main.py:341, in pydantic.main.BaseModel.__init__()
ValidationError: 1 validation error for LineV1
x -> 2
value is not a valid float (type=type_error.float)
Note that pydantic indicates the index of the invalid value in the error message using x -> 2
.
Typed dictionaries are defined in a similar way using the dict
generic type:
from pydantic import BaseModel
class LineV2(BaseModel):
"""A Line object represented by two lists of coordinates"""
coordinates: dict[str, list[float]]
def length(self):
"""Line length computed by summing over the distance between points"""
length = 0
x = self.coordinates["x"]
y = self.coordinates["y"]
for idx in range(len(x) - 1):
length += ((x[idx] - x[idx + 1]) ** 2 + (y[idx] - y[idx + 1]) ** 2) ** 0.5
return length
Now we define some data an create the model:
coordinates = {
"x": [0, 1, 3],
"y": [0, 1, 2],
}
line_v2 = LineV2(coordinates=coordinates)
For illustration purposes we also create a model with invalid data:
coordinates = {
"x": [0, "one", 3],
"y": [0, 1, "three"],
}
line_v2 = LineV2(coordinates=coordinates)
---------------------------------------------------------------------------
ValidationError Traceback (most recent call last)
Cell In[8], line 6
1 coordinates = {
2 "x": [0, "one", 3],
3 "y": [0, 1, "three"],
4 }
----> 6 line_v2 = LineV2(coordinates=coordinates)
File /opt/hostedtoolcache/Python/3.9.17/x64/lib/python3.9/site-packages/pydantic/main.py:341, in pydantic.main.BaseModel.__init__()
ValidationError: 2 validation errors for LineV2
coordinates -> x -> 1
value is not a valid float (type=type_error.float)
coordinates -> y -> 2
value is not a valid float (type=type_error.float)
Again pydantic raises a meaningful error message and also indicates the keys and indices of the invalid values.
In many cases it is useful to provide users with a selection of valid values, such as a choice of strings. The data structure to handle this is called Enum
. Enums are defined using the Enum
generic type in Python. For example, we can define a selection for the addtional property of a line color:
from pydantic import BaseModel
from enum import Enum
class LineColor(str, Enum):
"""Line color enum"""
red = "red"
green = "green"
blue = "blue"
And add this to the definiton of the line class:
class ColoredLine(BaseModel):
"""A Line object that can be used to represent a line."""
x: list[float]
y: list[float]
color: LineColor = LineColor.red
def length(self):
"""Length of the line"""
length = 0
for idx in range(len(self.x) - 1):
length += (
(self.x[idx] - self.x[idx + 1]) ** 2
+ (self.y[idx] - self.y[idx + 1]) ** 2
) ** 0.5
return length
On initialisation we can now pass a color:
colored_line = ColoredLine(x=[0, 1, 2], y=[0, 1, 2], color="red")
display(colored_line)
ColoredLine(x=[0.0, 1.0, 2.0], y=[0.0, 1.0, 2.0], color=<LineColor.red: 'red'>)
Now let’s try an invalid color:
colored_line = ColoredLine(x=[0, 1, 2], y=[0, 1, 2], color="purple")
---------------------------------------------------------------------------
ValidationError Traceback (most recent call last)
Cell In[12], line 1
----> 1 colored_line = ColoredLine(x=[0, 1, 2], y=[0, 1, 2], color="purple")
File /opt/hostedtoolcache/Python/3.9.17/x64/lib/python3.9/site-packages/pydantic/main.py:341, in pydantic.main.BaseModel.__init__()
ValidationError: 1 validation error for ColoredLine
color
value is not a valid enumeration member; permitted: 'red', 'green', 'blue' (type=type_error.enum; enum_values=[<LineColor.red: 'red'>, <LineColor.green: 'green'>, <LineColor.blue: 'blue'>])
Pydantic now gives a useful error message, saying that the value is not a valid enumeration member; and it also lists the valid choices.
Now we try to modify the color of an exising instance:
colored_line.color = "violet"
By default pydantic only validates / parses the values on initialisation. If we want to validate the values on modification we can use the validate_assignment
configuaration option:
class ColoredLine(BaseModel):
"""A Line object that can be used to represent a line."""
x: list[float]
y: list[float]
color: LineColor = LineColor.red
class Config:
"""Pydantic Config object"""
validate_assignment = True
colored_line = ColoredLine(x=[0, 1, 2], y=[0, 1, 2], color="red")
colored_line.color = "purple"
---------------------------------------------------------------------------
ValidationError Traceback (most recent call last)
Cell In[15], line 2
1 colored_line = ColoredLine(x=[0, 1, 2], y=[0, 1, 2], color="red")
----> 2 colored_line.color = "purple"
File /opt/hostedtoolcache/Python/3.9.17/x64/lib/python3.9/site-packages/pydantic/main.py:384, in pydantic.main.BaseModel.__setattr__()
ValidationError: 1 validation error for ColoredLine
color
value is not a valid enumeration member; permitted: 'red', 'green', 'blue' (type=type_error.enum; enum_values=[<LineColor.red: 'red'>, <LineColor.green: 'green'>, <LineColor.blue: 'blue'>])
Note: that there is https://github.com/pydantic/pydantic-extra-types, which provides support for validation of CSS colors. from pydantic.color import Color
, so there is actually no reason to implement this ourselves.
from typing import Union
from pydantic import BaseModel
from enum import Enum
class LineColor(str, Enum):
"""Line color enum"""
red = "red"
green = "green"
blue = "blue"
class LineV2(BaseModel):
"""A Line object that can be used to represent a line."""
x: list[float]
y: list[float]
color: Union[LineColor, None] = LineColor.red
Another very useful advanced type is the datetime
, which lets users handle dates and times. Pydantic natively supports parsing and validation of datetime types. This relies on the Python standard library (see https://docs.python.org/3/library/datetime.html). Let’s take a look at an example:
from datetime import datetime
class LonLatTimeVector(BaseModel):
"""Represents a position on earth with time."""
lon: float = 0.0
lat: float = 0.0
time: datetime = None
position = LonLatTimeVector(lon=1.0, lat=2.0, time=datetime.now())
display(position)
LonLatTimeVector(lon=1.0, lat=2.0, time=datetime.datetime(2023, 7, 25, 15, 18, 23, 759435))
Remember you can always check the type using type()
:
type(position.time)
datetime.datetime
So datetime.now()
already creates a datetime
object, but pydantic also supports other valid formats, such as:
# Time in ISO format
position = LonLatTimeVector(time="2021-01-01T00:00:00")
display(position)
LonLatTimeVector(lon=0.0, lat=0.0, time=datetime.datetime(2021, 1, 1, 0, 0))
# Ints or floats interpred as unix time, i.e. seconds since 1970-01-01T00:00:00
position = LonLatTimeVector(time=1609459200)
display(position)
LonLatTimeVector(lon=0.0, lat=0.0, time=datetime.datetime(2021, 1, 1, 0, 0, tzinfo=datetime.timezone.utc))
Let’s try to pass an invalid datetime:
position = LonLatTimeVector(time="2021-01-01")
---------------------------------------------------------------------------
ValidationError Traceback (most recent call last)
Cell In[21], line 1
----> 1 position = LonLatTimeVector(time="2021-01-01")
File /opt/hostedtoolcache/Python/3.9.17/x64/lib/python3.9/site-packages/pydantic/main.py:341, in pydantic.main.BaseModel.__init__()
ValidationError: 1 validation error for LonLatTimeVector
time
invalid datetime format (type=value_error.datetime)
There is variety of other date and time related quantities, which might be useful:
from datetime import date, datetime, time, timedelta
from pydantic import BaseModel
class Model(BaseModel):
d: date = None
dt: datetime = None
t: time = None
td: timedelta = None
m = Model(
d=1966280412345.6789, # Unix time in seconds
dt="2032-04-23T10:20:30.400+02:30", # Time in ISO format
t=time(4, 8, 16), # Time object hours, minutes, seconds, [milliseconds]
td="P3DT12H30M5S", # ISO 8601 duration format
)
display(m)
Model(d=datetime.date(2032, 4, 22), dt=datetime.datetime(2032, 4, 23, 10, 20, 30, 400000, tzinfo=datetime.timezone(datetime.timedelta(seconds=9000))), t=datetime.time(4, 8, 16), td=datetime.timedelta(days=3, seconds=45005))
Pydantic has a few useful factory functions for building types that have additional validations built in. For example, pydantic.conint
is a function with parameters that allow you to build a constrained integer. This could be used to create a MovieRating
type that only allows integers from 0 to 100:
from pydantic import BaseModel, conint
MovieRating = conint(ge=0, le=100)
class Movie(BaseModel):
name: str
year: int
rating: MovieRating
On top of the normal integer validations, Movie
will throw a validation error if the value provided is an integer less than 0 or greater than 100.
Other useful type factories exist, such as conlist
, confloat
, constr
, condate
, and more. Pydantic has also exposed specific instantiations of some of these type factories that are commonly used. A few examples include, but are not limited to, NegativeInt
, NonNegativeInt
, PositiveInt
, NonPositiveInt
, and the float
counterparts. You can guess how each of these are created from conint
and confloat
.
There are a variety of other useful types built into pydantic. We have already mentioned several above. Most of the types not mentioned above are mostly relevant for web development but it is nonetheless good to know that they exist. For an overview of those types see https://docs.pydantic.dev/1.10/usage/types/#pydantic-types and also https://github.com/pydantic/pydantic-extra-types (unreleased).
One of the most powerful features of Pydantic is the ability to combine models to create hierarchical structures. This is done simply by using an existing pydantic model as a new type. For example we easily define a triangle using three points:
from pydantic import BaseModel
class Point(BaseModel):
"""Representation of a two-dimensional point coordinate."""
x: float
y: float
def distance_to(self, other: "Point") -> float:
"""Computes the distance to another `PointV3`."""
dx = self.x - other.x
dy = self.y - other.y
return (dx**2 + dy**2) ** 0.5
class Triangle(BaseModel):
"""Representation of a triangle"""
point_a: Point = Point(x=0, y=0)
point_b: Point = Point(x=1, y=0)
point_c: Point = Point(x=0, y=1)
def circumference(self):
"""Circumference of the triangle"""
return (
self.point_a.distance_to(self.point_b)
+ self.point_b.distance_to(self.point_c)
+ self.point_c.distance_to(self.point_a)
)
triangle = Triangle(
point_a=Point(x=0, y=0),
point_b=Point(x=0.5, y=0),
point_c=Point(x=0, y=0.5),
)
display(triangle)
Triangle(point_a=Point(x=0.0, y=0.0), point_b=Point(x=0.5, y=0.0), point_c=Point(x=0.0, y=0.5))
print(triangle.circumference())
1.7071067811865475
Alternatively in this case we can also pass a dictionary to the model without creating the Point
instances first:
triangle = Triangle(
point_a={"x": 0, "y": 0.5},
point_b={"x": 0.5, "y": 0},
point_c={"x": 0, "y": 0},
)
print(triangle.circumference())
1.7071067811865475
Pydantic will automatically convert the dictionaries to Point
instances by passing the arguments to Point(**kwargs)
. This is already a preview of the serialization and deserialization of models into dictionaries and hierarchical languages such as JSON, YAML and TOML. Let’s quickly verify the type of the point:
type(triangle.point_a)
__main__.Point
Of course this also works for list of Point
objects. For example we can rewrite the definition of the Line
class we introduced above as follows:
from pydantic import BaseModel
class Point(BaseModel):
"""Representation of a two-dimensional point coordinate."""
x: float
y: float
def distance_to(self, other: "Point") -> float:
"""Computes the distance to another `PointV3`."""
dx = self.x - other.x
dy = self.y - other.y
return (dx**2 + dy**2) ** 0.5
class LineV3(BaseModel):
"""Line object represented by a list of points"""
points: list[Point]
def length(self):
"""Line length computed by summing over the distance between points"""
length = 0
for point, next_point in zip(self.points[:-1], self.points[1:]):
length += point.distance_to(next_point)
return length
Which can now be used as follows:
line_v3 = LineV3(points=[Point(x=0, y=0), Point(x=1, y=1)])
display(line_v3)
LineV3(points=[Point(x=0.0, y=0.0), Point(x=1.0, y=1.0)])
print(line_v3.length())
1.4142135623730951
If you compare to our first implementation at the beginning, LineV3
is more compact, readable and elegant.
Because of the way model attributes are defined in Pydantic, we cannot simply define a normal class attribute or “normal” Python attribute for a class. However in some cases we might want to introduce e.g. an internal attribute that is not part of the Pydantic model. The usual Python convention for non-public attributes is to prefix them with an underscore _
. By convention attributes starting with an underscore are excluded from the Pydantic model.
Let’s try the following first:
from pydantic import BaseModel
class LineV4(BaseModel):
"""Line object represented by a list of points"""
points: list[Point]
_color: LineColor = LineColor.red
def is_red(self):
return self._color == LineColor.red
Let’s instantiate the model:
line_with_hidden_color = LineV4(
points=[Point(x=0, y=0), Point(x=1, y=1)],
)
You can see that the _color
attribute is not part of the model:
line_with_hidden_color.__fields__
{'points': ModelField(name='points', type=List[Point], required=True)}
But you can access it internally as usual, e.g. in the is_red()
method we defined above:
line_with_hidden_color.is_red()
True
However it cannot be modified from the outside:
line_with_hidden_color._color = "blue"
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[34], line 1
----> 1 line_with_hidden_color._color = "blue"
File /opt/hostedtoolcache/Python/3.9.17/x64/lib/python3.9/site-packages/pydantic/main.py:357, in pydantic.main.BaseModel.__setattr__()
ValueError: "LineV4" object has no field "_color"
This behaviour is obviously is different from a “normal” Python class attribute.
Howver if you need to vary or manipulate internal attributes on instances of the model, you can declare them using PrivateAttr
instead:
from pydantic import BaseModel, PrivateAttr
class LineV5(BaseModel):
"""Line object represented by a list of points"""
points: list[Point]
_color: LineColor = PrivateAttr(LineColor.red)
# as this now behaves as a "normal" Python attribute, we need to set it on init
def __init__(self, **data):
super().__init__(**data)
self._color = data.get("_color", LineColor.red)
def is_red(self):
return self._color == LineColor.red
Let’s try the behaviour now:
line_with_hidden_color = LineV5(
points=[Point(x=0, y=0), Point(x=1, y=1)], _color="violet"
)
line_with_hidden_color.is_red()
False
line_with_hidden_color._color = "red"
line_with_hidden_color.is_red()
True
Now this behaves like a normal Python class attribute. But note that it is also not being validated, just as a normal Python attribute. That’s why we could set it to violet, even though it is not a valid color.
In case you have many private attributes you can also use the config setting to achieve the equivalent behaviour:
from pydantic import BaseModel
class LineV6(BaseModel):
"""Line object represented by a list of points"""
points: list[Point]
_color: LineColor = LineColor.red
class Config:
underscore_attrs_are_private = True
# as this now behaves as a "normal" Python attribute, we need to set it on init
def __init__(self, **data):
super().__init__(**data)
self._color = data.get("_color", LineColor.red)
def is_red(self):
return self._color == LineColor.red
line_with_hidden_color = LineV5(points=[Point(x=0, y=0), Point(x=1, y=1)], _color="red")
line_with_hidden_color.is_red()
True
Theres is one more rare case to be covered: in some cases you might want to define a class variable, which is shared by all instances of the model. The “normal” syntax is occupied by the Pydantic model declaration. However we can still define class variables using the ClassVar
type:
from typing import ClassVar
from pydantic import BaseModel
class LineV6(BaseModel):
"""Line object represented by a list of points"""
points: list[Point]
color: ClassVar[LineColor] = LineColor.red
def is_red(self):
return self._color == LineColor.red
Again the variable is not part of the pydantic model:
line_global_color = LineV6(
points=[Point(x=0, y=0), Point(x=1, y=1)],
)
line_global_color.__fields__
{'points': ModelField(name='points', type=List[Point], required=True)}
However it can be accessed and modified as a normal Python class variable:
other_line_global_color = LineV6(
points=[Point(x=0, y=0), Point(x=1, y=1)],
)
# We modify the class variable
LineV6.color = LineColor.green
# The change is reflected in both instances
print(line_global_color.color)
print(other_line_global_color.color)
LineColor.green
LineColor.green
We will be using the open-meteo free weather API to make a request to the https://api.open-meteo.com/v1/forecast endpoint. We will fetch temperature data for Austin, TX on an hourly cadence for all of 2023.
To do this we will use the requests third-party Python library. For anyone new to requests
, here is a quick primer. Imagine you want to receive the response from the example URL: https://example.com/api/v2/foo?bar=1&bap=scipy2023&baz=false. Then the response can be obtained with:
import requests
api_endpoint = "https://example.com/api/v2/foo"
params = {"bar": 1, "bap": "scipy2023", "baz": False}
response = requests.get(url=api_endpoint, params=params)
print(response.content)
The requests library has many additional features that you can find in the documentation linked above, but for this exercise, this should be all you need.
Task: Write a script that makes a request to the open-meteo API enpoint with the following constraints:
You should create a Pydantic model that parses and validates the response from the request. Your Pydantic model should be precise (i.e. include as many validations and specific type hints as possible) but also flexible (examples may include (a) allowing for accepting either celsius or fahrenheit or (b) allowing for either ISO8601 or unix time). Your model should also be hierarchical to account for the nested structure of the API response. In order to understand the expected response needed to build your model, you may want to spend a few minutes reading the open-meteo documentation (linked above) and/or playing around with the open-meteo UI (also at the linked documentation).
After receiving the response, compute the daily average temperature and plot it as a function of time. You can use any plotting library you like, but we recommend matplotlib and for the averaging you could use e.g. pandas.