author: Sam Derbyshire, Well-Typed title: GHC's renamer subtitle: GHC Contributors' Workshop date: June 7th, 2023
- First part: survey of the renamer and how information about identifiers flows through the compiler pipeline.
- Second part: getting our hands dirty, fixing a bug in the renamer
(interaction of
COMPLETE
sets of pattern synonyms anddo
notation).
⭲
-
Gives all names unique identifiers.
- Resolve namespacing (type constructor vs variable, module qualification, record fields).
- Handle shadowing.
-
Usage & dependency analysis.
- Redundant/unused imports, unused declarations, ...
- Dependency analysis: determines order of typechecking.
⭲
There are many ways that GHC can refer to an identifier depending on how much it knows about it.
:::{.element: class="fragment"}
OccName
::: :::{.element: class="fragment"}Name
::: :::{.element: class="fragment"}RdrName
::: :::{.element: class="fragment"}GlobalRdrElt
::: :::{.element: class="fragment"}Var
,TyVar
,Id
⭲
::::::{.element: class="fragment"}
data OccName = OccName
{ occNameSpace :: NameSpace
, occNameFS :: FastString }
data NameSpace
= VarName
| FldName { fldParent :: FastString }
| DataName
| TvName
| TcClsName
newtype OccEnv a = MkOccEnv (FastStringEnv (UniqFM NameSpace a))
⭲
::::::{.element: class="fragment"}
data Name = Name
{ n_sort :: NameSort
, n_occ :: OccName
, n_uniq :: Unique
, n_loc :: SrcSpan
}
data NameSort
= External Module
| WiredIn Module TyThing BuiltInSyntax
| Internal
| System
⭲
::::::{.element: class="fragment"}
data RdrName
= Unqual OccName
| Qual ModuleName OccName
| Orig Module OccName
| Exact Name
⭲
::::::{.element: class="fragment"}
data Var -- slightly abridged
= TyVar { varName :: Name
, varType :: Kind }
| TcTyVar { varName :: Name
, varType :: Kind
, tc_tv_details :: TcTyVarDetails }
| Id { varName :: Name
, varType :: Type
, varMult :: Mult
, idScope :: IdScope
, id_details :: IdDetails
, id_info :: IdInfo
}
data IdScope = GlobalId | LocalId ExportFlag
data ExportFlag = NotExported | Exported
data IdDetails = VanillaId | RecSelId {} | PrimOpId {} | CoVarId {} | ...
type TyVar = Var
type Id = Var
:::
:::{.element: class="fragment"}
data TyThing
= AnId Id
| AConLike ConLike
| ATyCon TyCon
| ACoAxiom (CoAxiom Branched)
:::
:::{.element: class="fragment"}
data GREInfo
= Vanilla
| IAmTyCon (TyConFlavour Name)
| IAmConLike ConInfo
| IAmRecField RecFieldInfo
⭲
:::The renamer primarily deals with GlobalRdrElt
, which consists of a Name
,
information about how it's in scope in the renamer, and additional information
that the renamer might need to know.
:::{.element: class="fragment"}
data GlobalRdrElt
= GRE { gre_name :: Name
, gre_par :: Parent
, gre_lcl :: Bool
, gre_imp :: Bag ImportSpec
, gre_info :: GREInfo }
type GlobalRdrEnv = OccEnv [GlobalRdrElt]
⭲
:::Renaming and typechecking happens in a shared monad,
TcM
(also called TcRn
or RnM
).
:::{.element: class="fragment" data-fragment-index="1"}
type TcRnIf a b = IOEnv (Env a b)
type TcRn = TcRnIf TcGblEnv TcLclEnv
type TcM = TcRn
type RnM = TcRn
:::
:::{.element: class="fragment" data-fragment-index="2"}
This is ReaderT
over IO
, with access to:
HscEnv
– per-module options (e.g. flags passed to GHC) and environment
e.g.UnitEnv
, currently loaded modules;TcGblEnv
– generated during typechecking and passed on
e.g.TypeEnv
,InstEnv
,GlobalRdrEnv
;TcLclEnv
– changes as we move inside expressions
e.g.SrcSpan
,TcLevel
,LocalRdrEnv
.⭲
:::
:::{.element: class="fragment"}
Start by looking at rnExpr (RecordUpd {..})
:
:::
:::{.element: class="fragment"}
rnExpr (RecordUpd { rupd_expr = L l expr, rupd_flds = rbinds })
= setSrcSpanA l $
case rbinds of
RegularRecUpdFields { recUpdFields = flds } ->
do { (e, fv_e) <- rnExpr expr
; (parents, flds, fv_flds) <- rnHsRecUpdFields flds
; ... }
-- ...
⭲
:::After handling duplicates, rnHsRecUpdFields
starts by calling lookupRecUpdFields
to look up each field individually.
:::{.element: class="fragment"}
lookupRecUpdFields :: NE.NonEmpty (LHsRecUpdField GhcPs GhcPs)
-> RnM (NE.NonEmpty (HsRecUpdParent GhcRn))
lookupRecUpdFields flds
= do { gre_env <- getGlobalRdrEnv
; fld1_gres NE.:| other_flds_gres <-
mapM (lookupFieldGREs gre_env . getFieldUpdLbl) flds
; -- ...
}
:::
:::{.element: class="fragment"}
We retrieve GlobalRdrElt
s for each record field.
These have GREInfo
s which specify which constructors have that field.
We can use that to disambiguate.
⭲
::::::{.element: class="fragment"}
data S = MkS1 { x, y :: Float } | MkS2 { x, y, z :: Float }
data T = MkT1 { x :: Word } | MkT2 { y :: Int }
foo r = r { x = 3, y = 4 }
:::
:::{.element: class="fragment"}
Lookup: [(x, [MkS1, MkS2, MkT1]), (y, [MkS1, MkS2, MkT2])]
.
:::
:::{.element: class="fragment"}
Intersect all the possible data constructors: [MkS1, MkS2]
.
:::
:::{.element: class="fragment"}
Take parents (removing duplicates): [S]
.
:::
:::{.element: class="fragment"}
There is a single parent: the record update is unambiguous!
⭲
::::::{.element: class="fragment"}
type family IdP p
type LIdP p = XRec p (IdP p)
type family XRec p a
:::
:::{.element: class="fragment"}
type instance IdP (GhcPass p) = IdGhcP p
type family IdGhcP pass where
IdGhcP 'Parsed = RdrName
IdGhcP 'Renamed = Name
IdGhcP 'Typechecked = Id
:::
:::{.element: class="fragment"}
type instance XRec (GhcPass p) a = GenLocated (Anno a) a
data GenLocated l e = L l e
type instance Anno RdrName = SrcSpanAnnN -- SrcSpan, but with
type instance Anno Name = SrcSpanAnnN -- some extra stuff
type instance Anno Id = SrcSpanAnnN -- for exact-print
⭲
:::We start off by lexing. See GHC.Parser.Lexer.x
:
:::{.element: class="fragment"}
@varid = $small $idchar* -- variable identifiers
@conid = $large $idchar* -- constructor identifiers
@varsym = ($symbol # \:) $symbol* -- variable (operator) symbol
@consym = \: $symbol* -- constructor (operator) symbol
$small = [a-z \_]
$large = [A-Z]
$symbol = [\!\#\$\%\&\*\+\.\/\<\=\>\?\@\\\^\|\-\~\:]
$idchar = [$small $large $digit \']
⭲
:::Then parsing, in GHC.Parser.y
.
-
Parse occurrences into `RdrName`.
- Use `mkQual`/`mkUnqual` depending on qualification.
- Resolve `NameSpace` using context (e.g. whether we are parsing a term or a type).
- Fix up after the fact when necessary.
:::{.element: class="fragment"} Example: parsing a type constructor.
tyconsym :: { RdrName } -- (actually LocatedN RdrName)
: CONSYM { mkUnqual tcClsName (getCONSYM $1) }
| VARSYM { mkUnqual tcClsName (getVARSYM $1) }
| ':' { consDataCon_RDR }
| '-' { mkUnqual tcClsName (fsLit "-") }
| '.' { mkUnqual tcClsName (fsLit ".") }
⭲
::::::{.element: class="fragment"}
data D a b c d = Q a => b :+: c :*: d
:::
:::{.element: class="fragment"}
Parse this as a type first. Then, after properly associating using the
fixities, we find that :+:
should be namespaced as a data constructor,
and the others remain type constructors.
:::
:::{.element: class="fragment"}
We also need to resolve ambiguities between expression and patterns.
See Note [Ambiguous syntactic categories] in GHC.Parser.PostProcess
.
⭲
::::::{.element: class="fragment"}
The main entry point is GHC.Tc.Module.tcRnModuleTcRnM
.
:::
- Rename the import list in `tcRnImports`.
-
Rename and typecheck local declarations and the export list in `tcRnSrcDecls`.
- Rename local declarations in `rnTopSrcDecls`.
- Typecheck these declarations in `tcTopSrcDecls`.
- Rename the exports in `rnExportList`. This allows us to assemble a final `TcGblEnv` which contains everything provided by the module.
-
Assemble the final typechecked module, to be passed onto the
next stage of the compiler pipeline. Everything is extracted from
the `TcGblEnv`.
<p class="indicator">⭲</p>
Entry point: GHC.Rename.Names.rnImportDecl
.
- Load the module we're importing; this will load its interface file from disk, or directly load the information if it's available in memory.
-
Figure out what we are importing, in `filterImports`.
- For a blanket import, import everything.
-
For an explicit import list, accumulate everything that is mentioned.
`import M( A(x, ..), B(..) )`
-
`import M hiding ( .. )`
Works the same, except now we filter out instead.
-
Add all the imported identifiers to the `GlobalRdrEnv`. We will look up
in this environment when renaming the body of the module.
<p class="indicator">⭲</p>
:::{.element: class="fragment" data-fragment-index="1"}
The main entry point to renaming local declarations is GHC.Rename.Module.rnSrcDecls
.
Control flow:
:::
-
Generate new `Name`s for all binders.
-
"Non-value" binders `getLocalNonValBinders`.
- Type synonyms, type families, data declarations, data families, class declarations.
- Data family instances and class instances, including associated types and methods.
- Foreign imports.
- Pattern synonyms `extendPatSynEnv`.
-
Generate new `Name`s and rename the LHS of top-level value bindings
(`rnTopBindsLHS`).
⤞
-
"Non-value" binders `getLocalNonValBinders`.
-
Rename declaration bodies.
-
Type declarations `rnTyClDecls`.
- Type synonyms, type families, data declarations, data families, class declarations.
- Standalone kind signatures.
- Type family instances, data family instances, class instances.
- Role annotations.
- Top-level value bindings `rnValBindsRHS`.
-
Everything else: `RULES`, foreign import/exports, default declarations...
⭲
-
Type declarations `rnTyClDecls`.
:::{.element: class="fragment"}
rnTyClDecls
- renames all types/classes defined in the module
- uses this information to compute dependency groups (strongly-connected components). :::
:::{.element: class="fragment"}
rnTyClDecls :: [TyClGroup GhcPs] -> RnM ([TyClGroup GhcRn], FreeVars)
rnTyClDecls tycl_ds =
do { tycls_w_fvs <- mapM rnTyClDecl (tyClGroupTyClDecls tycl_ds)
; kisigs_w_fvs <- rnStandaloneKindSigs (tyClGroupKindSigs tycl_ds)
; instds_w_fvs <- mapM rnSrcInstDecl (tyClGroupInstDecls tycl_ds)
; let tycl_sccs = depAnalTyClDecls kisig_fv_env tycls_w_fvs
; .. }
:::
:::{.element: class="fragment"}
See Note [Dependency analysis of type, class, and instance decls] in GHC.Rename.Module
.
:::
:::{.element: class="fragment"} This approach is known to have shortcomings, as it doesn't properly account for type instances when kind-checking.
⭲
See wiki page: Type-&-Class-Dependency-Analysis.
:::
:::{.element: class="fragment"}
data HsBindLR idL idR
= FunBind {..} -- used for function/variable bindings
| PatBind {..} -- (pattern is never a simple variable)
| VarBind {..} -- (introduced by the typechecker)
| PatSynBind {..}
| XHsBindsLR !(XXHsBindsLR idL idR) -- (introduced by the typechecker)
:::
:::{.element: class="fragment"}
rnValBindsRHS
- renames all value bindings
- performs dependency analysis :::
:::{.element: class="fragment"}
rnValBindsRHS ctxt (ValBinds _ mbinds sigs)
= do { (sigs', sig_fvs) <- renameSigs ctxt sigs
; binds_w_dus <- mapBagM (rnLBind (mkScopedTvFn sigs')) mbinds
; let !(anal_binds, anal_dus) = depAnalBinds binds_w_dus
; ... }
⭲
::::::{.element: class="fragment"}
- Template Haskell ::: :::{.element: class="fragment"}
- Backpack :::
Slides available online: sheaf.github.io/ghc-renamer.